Smart City Gnosys

Smart city article details

Title UAVformer: A Composite Transformer Network For Urban Scene Segmentation Of UAV Images
ID_Doc 59303
Authors Yi S.; Liu X.; Li J.; Chen L.
Year 2023
Published Pattern Recognition, 133
DOI http://dx.doi.org/10.1016/j.patcog.2022.109019
Abstract Urban scene segmentation from the UAV (unmanned aerial vehicle) view is a fundamental task for smart city applications such as city planning, land use monitoring, traffic monitoring, and crowd estimation. However, urban scenes in UAV images are characterised by large variation in object scale and by complex backgrounds, which pose challenges to segmentation. The feature-extraction backbones of existing networks cannot extract the complex features of UAV images effectively, which limits segmentation performance. To design a segmentation network capable of extracting features from urban ground scenes with large scale variation, this study proposes a novel composite transformer network for urban scene segmentation of UAV images. A composite backbone with aggregation windows multi-head self-attention transformer blocks is proposed to make the extracted features more representative through adaptive multi-level feature fusion and full utilisation of contextual and local information. Position attention modules are inserted at each stage between the encoder and decoder to further enhance the spatial attention of the extracted feature maps. Finally, a V-shaped decoder capable of utilising multi-level features is designed to obtain accurate dense predictions. In this way, the accuracy of urban scene segmentation is significantly improved, and objects with large scale variation are successfully segmented from UAV views. Extensive ablation and comparative experiments have been conducted on publicly available urban scene segmentation datasets for UAV imagery. The experimental results demonstrate the effectiveness of the designed network structure and the superiority of the proposed network over state-of-the-art methods. Specifically, the proposed network reaches 53.2% mIoU on the UAVid dataset and 77.6% mIoU on the UDD6 dataset. © 2022 Elsevier Ltd
Author Keywords Aggregation windows multi-head self-attention transformer block; Composite backbone; UAV image; Urban scenes segmentation; V-shaped decoder
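
Note on the reported metric: the abstract gives results as mIoU (mean intersection over union). The following is a minimal sketch of how mIoU is commonly computed from per-pixel class labels. It is not the authors' evaluation code. The function names `confusion_matrix` and `mean_iou`, the flat label lists, and the choice to skip classes absent from both ground truth and prediction are assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int],
             num_classes: int) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN).

    Assumption: classes that appear in neither the ground truth nor the
    prediction are skipped rather than counted as zero.
    """
    cm = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # predicted c, truly other
        fn = sum(cm[c]) - tp                                  # truly c, predicted other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three classes and six "pixels" (hypothetical data).
    truth = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(f"mIoU = {mean_iou(truth, pred, num_classes=3):.3f}")
```

In practice the label sequences would be the flattened ground-truth and predicted segmentation masks accumulated over a whole test set.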


Similar Articles


Id | Similarity | Authors | Title | Published
35283 | 0.876 | Yang F.; Jia L.; Purwanto E.; Smith J.; Man K.L.; Yue Y. | Lightweight UAV Image Segmentation Model Design With Edge Feature Aggregation | 2023 International Conference on Platform Technology and Service, PlatCon 2023 - Proceedings (2023)
38029 | 0.859 | Qiu Y.; Yang J.; Shao R.; Sha Q. | Mrpvt: A Novel Multiscale Rain-Cutting And Pooling Vision Transformer Model For Object Detection On Drone-Captured Scenarios | 2024 2nd International Conference on Computer, Vision and Intelligent Technology, ICCVIT 2024 - Proceedings (2024)
15925 | 0.851 | Said Y.; Alassaf Y.; Saidani T.; Ghodhbani R.; Rhaiem O.B.; Alalawi A.A. | Context-Aware Feature Extraction Network For High-Precision UAV-Based Vehicle Detection In Urban Environments | Computers, Materials and Continua, 81, 3 (2024)