Smart City Gnosys

Smart city article details

Title MTrain: Enable Efficient CNN Training on Heterogeneous FPGA-Based Edge Servers
ID_Doc 38056
Authors Tang Y.; Jones A.K.; Xiong J.; Zhou P.; Hu J.
Year 2025
Published IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
DOI http://dx.doi.org/10.1109/TCAD.2025.3541486
Abstract FPGA-based edge servers are used in many applications in smart cities, hospitals, retail, etc. Equipped with heterogeneous FPGA-based accelerator cards, these servers can perform multiple tasks, including efficient video preprocessing, machine learning algorithm acceleration, etc. The servers are required to perform inference during the daytime while re-training the model during the night to adapt to new environments, domains, or users. During re-training, conventionally, the incoming data are transmitted to the cloud and the updated machine learning models are then transferred back to the edge server. Such a process is inefficient and cannot protect users' privacy, so it is desirable for the models to be trained directly on the edge servers. Deploying convolutional neural network (CNN) training on heterogeneous resource-constrained FPGAs is challenging, since it must consider both the complex data dependency of the training process and the communication bottleneck among different FPGAs. Previous multi-accelerator training algorithms select optimal scheduling strategies for data parallelism, tensor parallelism, and pipeline parallelism. However, pipeline parallelism cannot deal with batch normalization (BN), an essential CNN operator, while purely applying data parallelism and tensor parallelism suffers from resource under-utilization and intensive communication costs. In this work, we propose MTrain, a novel multi-accelerator training scheduling strategy that transforms the training process into a multi-branch workflow, so that independent sub-operations of different branches are executed on different training accelerators in parallel for better utilization and reduced communication overhead. Experimental results show that MTrain achieves efficient CNN training on heterogeneous FPGA-based edge servers, with a 1.07x-2.21x speedup under 15 GB/s peer-to-peer bandwidth compared to the state-of-the-art work. © 2025 IEEE.
Author Keywords CNN training; edge server; heterogeneous FPGAs
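
Illustration: the abstract describes scheduling independent sub-operations of a multi-branch training workflow onto heterogeneous accelerator cards. As a rough intuition only, the minimal Python sketch below greedily places branch workloads on whichever card would finish them earliest (a longest-processing-time list schedule). All names (Accelerator, schedule_branches, rel_speed) and all costs are hypothetical illustrations, not MTrain's actual algorithm or data.

# Hypothetical sketch of multi-branch scheduling on heterogeneous cards.
# This is NOT the MTrain algorithm from the paper; it only illustrates
# running independent branch sub-operations in parallel across cards.
from dataclasses import dataclass
import heapq

@dataclass
class Accelerator:
    name: str
    rel_speed: float  # assumed relative throughput of this FPGA card

def schedule_branches(branch_costs, accelerators):
    """Greedy LPT list schedule: give each branch (abstract compute cost)
    to the card that would finish it earliest; return the per-card
    assignment and the overall makespan."""
    ready = [(0.0, i) for i in range(len(accelerators))]  # (busy-until, card index)
    heapq.heapify(ready)
    assignment = {acc.name: [] for acc in accelerators}
    for branch, cost in sorted(branch_costs.items(), key=lambda kv: -kv[1]):
        finish, i = heapq.heappop(ready)            # least-loaded card
        finish += cost / accelerators[i].rel_speed  # slower card -> longer run
        assignment[accelerators[i].name].append(branch)
        heapq.heappush(ready, (finish, i))
    return assignment, max(t for t, _ in ready)

if __name__ == "__main__":
    cards = [Accelerator("fpga0", 1.0), Accelerator("fpga1", 0.6)]
    branches = {"branch_a": 8.0, "branch_b": 5.0, "branch_c": 3.0}
    plan, makespan = schedule_branches(branches, cards)
    print(plan, makespan)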


Similar Articles


Id | Similarity | Authors | Title | Published
6307 | 0.884 | Zhou L.; Samavatian M.H.; Bacha A.; Majumdar S.; Teodorescu R. | Adaptive Parallel Execution of Deep Neural Networks on Heterogeneous Edge Devices | Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, SEC 2019 (2019)
5923 | 0.868 | Xiang T.; Feng Y.; Ye X.; Tan X.; Li W.; Zhu Y.; Wu M.; Zhang H.; Fan D. | Accelerating CNN Algorithm with Fine-Grained Dataflow Architectures | Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018 (2019)
13852 | 0.861 | Prashanthi S.K.; Kesanapalli S.A.; Simmhan Y. | Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models | Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6, 3 (2022)
38608 | 0.853 | Chen Z.; Luo L.; Quan W.; Shi Y.; Yu J.; Wen M.; Zhang C. | Multiple CNN-Based Tasks Scheduling Across Shared GPU Platform in Research and Development Scenarios | Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018 (2019)
20553 | 0.852 | Li Q.; Huang L.; Tong Z.; Du T.-T.; Zhang J.; Wang S.-C. | Dissec: A Distributed Deep Neural Network Inference Scheduling Strategy for Edge Clusters | Neurocomputing, 500 (2022)
34376 | 0.852 | Wang H.; Chen X.; Xu H.; Liu J.; Huang L. | Joint Job Offloading and Resource Allocation for Distributed Deep Learning in Edge Computing | Proceedings - 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019 (2019)
21859 | 0.851 | Xue F.; Fang W.; Xu W.; Wang Q.; Ma X.; Ding Y. | EdgeLD: Locally Distributed Deep Learning Inference on Edge Device Clusters | Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020 (2020)