Abstract
Person re-identification aims to recognize a target pedestrian across non-overlapping camera views given a query. The Internet of Things (IoT) provides a wide range of application scenarios for person re-identification technology, including smart city management, resource optimization, and multi-source data fusion. The task is crucial for IoT applications such as intelligent video surveillance, but it remains challenging due to low image resolution, viewpoint variation, illumination changes, and occlusion. In this paper, we propose a multi-task learning approach that integrates textual information to enhance recognition accuracy. Using a dual-stream Transformer encoder, we extract both image and text features. To improve feature interaction and learning, we perform multimodal interaction for fine-grained alignment and share features across modalities to learn modality-invariant representations. Our method, TFTI, outperforms state-of-the-art person re-identification techniques, as validated on the CUHK-PEDES dataset.
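
Below is a minimal sketch of the dual-stream encoding and cross-modal alignment idea summarized in the abstract. It is not the authors' TFTI implementation: the layer sizes, patch/token embedding scheme, mean pooling, shared projection head, and the contrastive alignment objective are illustrative assumptions.

```python
# Illustrative sketch only: a two-stream Transformer encoder (image patches vs.
# text tokens) projected into a shared space, with a symmetric contrastive loss
# standing in for fine-grained image-text alignment.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualStreamEncoder(nn.Module):
    """Separate Transformer encoders for image patches and text tokens,
    followed by a shared projection that encourages modality-invariant features."""

    def __init__(self, vocab_size=30522, embed_dim=256, num_layers=4, num_heads=4,
                 patch_dim=3 * 16 * 16, max_text_len=64, max_patches=196):
        super().__init__()
        # Image stream: linear patch embedding + Transformer encoder.
        self.patch_embed = nn.Linear(patch_dim, embed_dim)
        self.img_pos = nn.Parameter(torch.zeros(1, max_patches, embed_dim))
        img_layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.img_encoder = nn.TransformerEncoder(img_layer, num_layers)
        # Text stream: token embedding + Transformer encoder.
        self.tok_embed = nn.Embedding(vocab_size, embed_dim)
        self.txt_pos = nn.Parameter(torch.zeros(1, max_text_len, embed_dim))
        txt_layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.txt_encoder = nn.TransformerEncoder(txt_layer, num_layers)
        # Shared projection head: maps both modalities into one embedding space.
        self.shared_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, patches, token_ids):
        # patches: (B, N, patch_dim); token_ids: (B, T)
        img = self.img_encoder(self.patch_embed(patches) + self.img_pos[:, :patches.size(1)])
        txt = self.txt_encoder(self.tok_embed(token_ids) + self.txt_pos[:, :token_ids.size(1)])
        # Global features via mean pooling, then the shared projection.
        img_feat = F.normalize(self.shared_proj(img.mean(dim=1)), dim=-1)
        txt_feat = F.normalize(self.shared_proj(txt.mean(dim=1)), dim=-1)
        return img_feat, txt_feat


def alignment_loss(img_feat, txt_feat, temperature=0.07):
    """Symmetric InfoNCE-style loss: matched image/text pairs should agree."""
    logits = img_feat @ txt_feat.t() / temperature
    targets = torch.arange(img_feat.size(0), device=img_feat.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2


if __name__ == "__main__":
    model = DualStreamEncoder()
    patches = torch.randn(8, 196, 3 * 16 * 16)          # toy image patches
    token_ids = torch.randint(0, 30522, (8, 64))         # toy caption tokens
    img_feat, txt_feat = model(patches, token_ids)
    print(alignment_loss(img_feat, txt_feat).item())
```

At retrieval time, such a model would rank gallery images by the similarity between their projected image features and the query's projected text feature; the shared projection is the piece intended to make those two feature types directly comparable.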