Smart City Gnosys

Smart city article details

Title Automatic Estimation For Visual Quality Changes Of Street Space Via Street-View Images And Multimodal Large Language Models
ID_Doc 11315
Authors Liang H.; Zhang J.; Li Y.; Wang B.; Huang J.
Year 2024
Published IEEE Access, 12
DOI http://dx.doi.org/10.1109/ACCESS.2024.3408843
Abstract Estimating Visual Quality of Street Space (VQoSS) is pivotal for urban design, environmental sustainability, civic engagement, etc. Recent advancements, notably in deep learning, have enabled large-scale analysis. However, traditional deep learning approaches are hampered by extensive data annotation requirements and limited adaptability across diverse VQoSS tasks. Multimodal Large Language Models (MLLMs) have recently demonstrated proficiency in various computer vision tasks, positioning them as promising tools for automated VQoSS assessment. In this paper, we pioneer the application of MLLMs to VQoSS change estimation, with our empirical findings affirming their effectiveness. In addition, we introduce Street Quality Generative Pre-trained Transformer (SQ-GPT), a model that distills knowledge from GPT-4V, currently the most powerful model but not freely accessible, and requires no human effort. SQ-GPT approaches GPT-4V's performance and is viable for large-scale VQoSS change estimation. In a case study of Nanjing, we showcase the practicality of SQ-GPT and the knowledge distillation pipeline. Our work promises to be a valuable asset for future urban studies research. © 2013 IEEE.
Author Keywords deep learning; multimodal large language models; Smart city; visual quality


Similar Articles


Id | Similarity | Authors | Title | Published
53202 | 0.865 | Zhao C.; Ogawa Y.; Chen S.; Oki T.; Sekimoto Y. | Street Space Quality Improvement: Fusion Of Subjective Perception In Street View Image Generation | Information Fusion, 125 (2026)
60280 | 0.852 | Li Z.; Xia L.; Tang J.; Xu Y.; Shi L.; Xia L.; Yin D.; Huang C. | Urbangpt: Spatio-Temporal Large Language Models | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2024)
48265 | 0.852 | Arulananth T.S.; Kuppusamy P.G.; Ayyasamy R.K.; Alhashmi S.M.; Mahalakshmi M.; Vasanth K.; Chinnasamy P. | Semantic Segmentation Of Urban Environments: Leveraging U-Net Deep Learning Model For Cityscape Image Analysis | PLoS ONE, 19, 4 April (2024)