| Abstract |
Accurately delineating and identifying the functional areas of a city (e.g., commercial, residential, and industrial zones) is crucial for urban development and the creation of smart cities. With rapid advances in remote sensing technology and big data, auxiliary data such as Points of Interest (POI) have become key resources for improving the accuracy of functional area recognition. However, remote sensing imagery and POI data are heterogeneous and unevenly distributed in space, which presents significant challenges for data fusion and functional area identification. Although traditional CNNs have been highly successful in image classification, they have difficulty handling multi-source data fusion. This paper proposes a solution that addresses these challenges. First, we extend the traditional CNN with a spatiotemporal rule-based fusion mechanism (ST-CNN) that incorporates recent deep learning techniques. Second, we employ a Transformer to improve the model's ability to capture complex spatial patterns for functional area recognition. The combined framework, ST-Former (ST-CNN + Transformer), improves accuracy by 3%-15% compared to other classic CNN models. Additionally, feeding data fused by ST-CNN into other methods improves their accuracy by 1%-5%, balancing reconstruction performance with network complexity. © 2025 Copyright held by the owner/author(s). |