Smart City Gnosys

Smart city article details

Title	TIED: A Cycle Consistent Encoder-Decoder Model For Text-To-Image Retrieval
ID_Doc	57381
Authors	Sebastian C.; Imbriaco R.; Meletis P.; Dubbelman G.; Bondarev E.; De With P.H.N.
Year	2021
Published	IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
DOI	http://dx.doi.org/10.1109/CVPRW53098.2021.00467
Abstract	Retrieving specific vehicle tracks from Natural Language (NL)-based descriptions is a convenient way to monitor vehicle movement patterns and traffic-related events. NL-based image retrieval has several applications, including smart cities and traffic control. In this work, we propose TIED, a text-to-image encoder-decoder model for the simultaneous extraction of visual and textual information for vehicle track retrieval. The model consists of an encoder network that maps the two modalities into a common latent space and a decoder network that performs an inverse mapping back to the text descriptions. The method exploits visual semantic attributes of the target vehicle together with a cycle-consistency loss, and it uses both intra-modal and inter-modal relationships to improve retrieval performance. Our system achieves competitive performance, placing 7th in the Natural Language-Based Vehicle Retrieval public track of the 2021 NVIDIA AI City Challenge. We show that the proposed TIED model obtains a Mean Reciprocal Rank (MRR) six times higher than the baseline, reaching an MRR of 15.48. The code and models will be made publicly available. © 2021 IEEE.
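The abstract reports retrieval quality as Mean Reciprocal Rank (MRR): for each NL query, take 1/rank of the correct vehicle track in the returned list (0 if it is missing), then average over all queries. The sketch below is a hedged illustration of that generic metric, not the challenge's official evaluation script. It assumes each query has exactly one correct track; the function names and toy track IDs are hypothetical. Reading the reported 15.48 as a value on a 0-100 scale (that is, 0.1548) is also an assumption.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranking: Sequence[Hashable], target: Hashable) -> float:
    """Return 1/rank of `target` in `ranking` (1-based), or 0.0 if it is absent."""
    for position, item in enumerate(ranking, start=1):
        if item == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]], targets: Sequence[Hashable]
) -> float:
    """Average reciprocal rank over queries; assumes one correct track per query."""
    if len(rankings) != len(targets):
        raise ValueError("need exactly one ground-truth target per ranked list")
    if not rankings:
        raise ValueError("at least one query is required")
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, targets))
    return total / len(rankings)


if __name__ == "__main__":
    # Hypothetical toy data: three NL queries, each ranking four candidate tracks.
    rankings = [
        ["t1", "t2", "t3", "t4"],  # correct track "t1" at rank 1 -> 1.0
        ["t3", "t2", "t1", "t4"],  # correct track "t2" at rank 2 -> 0.5
        ["t2", "t3", "t1", "t4"],  # correct track "t4" at rank 4 -> 0.25
    ]
    targets = ["t1", "t2", "t4"]
    mrr = mean_reciprocal_rank(rankings, targets)
    # (1.0 + 0.5 + 0.25) / 3 = 0.5833...
    print(f"MRR = {mrr:.4f} ({100 * mrr:.2f} on a 0-100 scale)")
```

Because MRR only credits the position of the first correct hit, a model that moves the correct track from rank 10 to rank 2 gains far more (0.1 to 0.5) than one that moves it from rank 50 to rank 20 (0.02 to 0.05), which is why it suits single-target retrieval tasks like this one.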
Author Keywords

Similar Articles

Id	Similarity	Authors	Title	Published
38752	0.914	Alzubi T.M.; Mukhtar U.R.	MVR: Synergizing Large And Vision Transformer For Multimodal Natural Language-Driven Vehicle Retrieval	IEEE Access, 13 (2025)
39699	0.885	Du Y.; Zhang B.; Ruan X.; Su F.; Zhao Z.; Chen H.	OMG: Observe Multiple Granularities For Natural Language-Based Vehicle Retrieval	IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2022-June (2022)
7243	0.878	Scribano C.; Sapienza D.; Franchini G.; Verucchi M.; Bertogna M.	All You Can Embed: Natural Language Based Vehicle Retrieval With Spatio-Temporal Transformers	IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2021)
47389	0.877	Sadiq T.; Omlin C.W.	Scene Retrieval In Traffic Videos With Contrastive Multimodal Learning	Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI (2023)
12878	0.875	Bo X.; Liu J.; Yang D.; Ma W.	Bridging The Gap: Multi-Granularity Representation Learning For Text-Based Vehicle Retrieval	Complex and Intelligent Systems, 11, 1 (2025)