Abstract
In smart cities, pedestrian trajectory prediction is a key component of autonomous driving technology: by modeling the multimodality of future motion states, it addresses the uncertainty of human behavior and thereby enhances traffic safety and efficiency. To address this challenge, we propose a novel model, Enhanced Multimodal Prediction via Feature Fusion and Momentum Buffering (FFMB), which extracts richer features from limited datasets and handles motion uncertainty effectively to improve prediction performance. Specifically, we construct vector coordinates to compute and analyze the correlation features between different time steps, thereby capturing the coupling of short-term and long-term dependencies. In addition, by fusing the temporal frame features and keypoint features of agents, we achieve deep cross-modal information fusion. To handle motion uncertainty, we introduce a stochastic trajectory prediction correction module that selects and refines the best path among multiple candidate predictions. Moreover, we adopt an effective sparse self-attention mechanism to ensure accurate learning. Our method has been extensively validated on multiple pedestrian trajectory prediction benchmarks, including the ActEV/VIRAT, Forking Paths, Stanford Drone, and ETH/UCY datasets, and the experimental results show that our approach offers significant advantages.