Smart City Gnosys

Smart city article details

Title Gradient-Enhanced Focal-Pooling Vision Transformer With Adaptive Tuning For Robust And Accurate Vehicle Detection In Smart Environments
ID_Doc 28240
Authors Ankireddy P.; Gopalakrishnan S.; Reddy V.L.
Year 2025
Published Iran Journal of Computer Science
DOI http://dx.doi.org/10.1007/s42044-025-00279-z
Abstract Accurate and efficient vehicle detection is critical for safety, traffic optimization, and the development of intelligent transportation. However, robust detection under occlusion, changing illumination, and large intra-class variation in vehicle shape and size remains challenging. Conventional object detection systems often lose accuracy in complex urban scenes, where false alarms and missed detections make them unreliable. This work introduces a new deep learning model, the focal-pooling vision transformer (FoPViT), which addresses this problem by combining the advantages of focal transformers and pooling-based vision transformers. It integrates the gradient-aware pooling tuner (GAPT), a new mechanism that dynamically adjusts the sizes of pooling kernels according to gradient signals produced during training. This adaptive policy supports effective feature extraction across scales, preserving fine details and enabling correct detection of vehicles of different sizes and orientations. The model's innovation is twofold: focal attention excludes non-vital regions, while GAPT optimizes spatial feature pooling for more accurate results at lower computational cost. The proposed system extends the functionality of vehicle detection models and provides a new standard for intelligent detection systems in moving scenes. The proposed model improves vehicle detection accuracy: experimental results show that FoPViT reaches 98% accuracy, with precision, recall, and F1-score of 97.5%, 97.8%, and 98%, respectively. In terms of efficiency, inference takes 15 ms, which makes the model usable in real time. © The Author(s), under exclusive licence to Springer Nature Switzerland AG 2025.
Author Keywords Deep learning; Image processing; Object detection; Optimization; Smart city; Vehicle detection
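
The abstract quotes precision, recall, and F1-score side by side. As a rough sanity check, the sketch below recomputes F1 from the quoted precision and recall, assuming the standard single-class definition of F1 as their harmonic mean. The abstract does not say how the scores were averaged, so that definition is an assumption here, and the function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard single-class F1).

    Assumption: the abstract does not state its averaging scheme, so this
    uses the textbook definition rather than the authors' exact procedure.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract (97.5% and 97.8%).
reported_precision = 0.975
reported_recall = 0.978

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from quoted precision and recall: {f1:.4f}")
```

Under this assumption the result is about 0.9765, slightly below the 98% quoted in the abstract. The gap may come from rounding or from a different averaging scheme across classes; the abstract gives no detail either way.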


Similar Articles


Id 38029
Similarity 0.859
Authors Qiu Y.; Yang J.; Shao R.; Sha Q.
Title Mrpvt: A Novel Multiscale Rain-Cutting And Pooling Vision Transformer Model For Object Detection On Drone-Captured Scenarios
Published 2024 2nd International Conference on Computer, Vision and Intelligent Technology, ICCVIT 2024 - Proceedings (2024)