Smart City Gnosys

Smart city article details

Title MVR: Synergizing Large And Vision Transformer For Multimodal Natural Language-Driven Vehicle Retrieval
ID_Doc 38752
Authors Alzubi T.M.; Mukhtar U.R.
Year 2025
Published IEEE Access, 13
DOI http://dx.doi.org/10.1109/ACCESS.2024.3524392
Abstract In recent years, intelligent transportation systems have played a pivotal role in the development of smart cities, with vehicle retrieval becoming a critical component of traffic management and surveillance. Traditional vehicle retrieval systems rely heavily on image-based matching techniques derived from vehicle re-identification (VReID) tasks. However, these approaches are limited by their dependency on image queries, which may not always be available in real-world scenarios. Natural language (NL)-based vehicle retrieval systems offer a more flexible and accessible alternative by enabling users to query vehicles using textual descriptions. Despite progress in NL-based retrieval, existing methods face challenges in fully capturing multi-granularity information and aligning heterogeneous visual and linguistic inputs. This paper addresses these limitations by proposing a robust Multimodal Vehicle Retrieval (MVR) model that integrates both visual and textual data through a dual-stream architecture. Our model captures complementary local features, alongside global information including motion and environmental context. We utilize InfoNCE and instance losses to align the visual and textual modalities within a shared feature space, while post-processing modules, including Granular Vehicle Feature Refinement and Spatial Relationship Modeling, further enhance retrieval performance by refining vehicle attributes and contextual relationships. Our experiments, conducted on the CityFlow-NL dataset, demonstrate that our model achieves a 35.6% improvement in Mean Reciprocal Rank (MRR), a 41.3% increase in recall at 5 (R@5), and a 22.9% improvement in recall at 10 (R@10) compared to the baseline, and overcomes the inherent challenges of cross-modal retrieval in improving real-world VReID. © 2013 IEEE.
Author Keywords intelligent transportation; multimodal; smart city; Traffic surveillance; vehicle tracking
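
The abstract reports results as Mean Reciprocal Rank (MRR) and recall at K (R@5, R@10) but does not define them. The sketch below uses the standard definitions and assumes each text query has exactly one correct vehicle track. The function names and the example ranks are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank, where rank is the 1-based position of the correct item."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct item appears within the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct vehicle track for four text queries
# (illustrative values only, not results from the paper).
example_ranks = [1, 2, 4, 20]

mrr = mean_reciprocal_rank(example_ranks)
r5 = recall_at_k(example_ranks, 5)
r10 = recall_at_k(example_ranks, 10)
print(f"MRR={mrr:.3f} R@5={r5:.2f} R@10={r10:.2f}")
```

With these definitions, a query whose correct track is ranked first contributes 1 to MRR, while one ranked twentieth contributes only 0.05 and counts toward neither R@5 nor R@10.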


Similar Articles


Id | Similarity | Authors | Title | Published
12878 | 0.923 | Bo X.; Liu J.; Yang D.; Ma W. | Bridging The Gap: Multi-Granularity Representation Learning For Text-Based Vehicle Retrieval | Complex and Intelligent Systems, 11, 1 (2025)
47389 | 0.916 | Sadiq T.; Omlin C.W. | Scene Retrieval In Traffic Videos With Contrastive Multimodal Learning | Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI (2023)
57381 | 0.914 | Sebastian C.; Imbriaco R.; Meletis P.; Dubbelman G.; Bondarev E.; De With P.H.N. | TIED: A Cycle Consistent Encoder-Decoder Model For Text-To-Image Retrieval | IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2021)
39699 | 0.908 | Du Y.; Zhang B.; Ruan X.; Su F.; Zhao Z.; Chen H. | OMG: Observe Multiple Granularities For Natural Language-Based Vehicle Retrieval | IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2022-June (2022)
7243 | 0.877 | Scribano C.; Sapienza D.; Franchini G.; Verucchi M.; Bertogna M. | All You Can Embed: Natural Language Based Vehicle Retrieval With Spatio-Temporal Transformers | IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2021)
3230 | 0.873 | Shankaranarayan N.; Sowmya Kamath S. | A Novel Approach For Real-Time Vehicle Re-Identification Using Content-Based Image Retrieval With Relevance Feedback | Springer Proceedings in Mathematics and Statistics, 401 (2023)
18090 | 0.873 | Shankaranarayan N.; Kamath Sowmya S. | Deep Vision Based Vehicle Retrieval For Automated Smart Traffic Surveillance Systems | Proceedings - 2022 3rd International Conference on Computation, Automation and Knowledge Management, ICCAKM 2022 (2022)
60942 | 0.859 | Xiang X.; Ma Z.; Li X.; Zhang L.; Zhen X. | Vehicle Re-Identification With Large Separable Kernel Attention And Hybrid Channel Attention | Image and Vision Computing, 155 (2025)
58804 | 0.854 | Taleb H.; Wang C. | Transformer-Based Vehicle Re-Identification With Multiple Details | Proceedings of SPIE - The International Society for Optical Engineering, 13074 (2024)
6641 | 0.854 | Yi X.; Wang Q.; Liu Q.; Rui Y.; Ran B. | Advances In Vehicle Re-Identification Techniques: A Survey | Neurocomputing, 614 (2025)