Smart City Gnosys

Smart city article details

Title MTrain: Enable Efficient CNN Training on Heterogeneous FPGA-Based Edge Servers
ID_Doc 38056
Authors Tang Y.; Jones A.K.; Xiong J.; Zhou P.; Hu J.
Year 2025
Published IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
DOI http://dx.doi.org/10.1109/TCAD.2025.3541486
Abstract FPGA-based edge servers are used in many applications in smart cities, hospitals, retail, etc. Equipped with heterogeneous FPGA-based accelerator cards, these servers can perform multiple tasks, including efficient video preprocessing, machine learning algorithm acceleration, etc. The servers are required to perform inference during the daytime while re-training the model during the night to adapt to new environments, domains, or users. During re-training, conventionally, the incoming data are transmitted to the cloud and the updated machine learning models are then transferred back to the edge server. Such a process is inefficient and cannot protect users' privacy, so it is desirable for the models to be trained directly on the edge servers. Deploying convolutional neural network (CNN) training on heterogeneous resource-constrained FPGAs is challenging, since it must consider both the complex data dependency of the training process and the communication bottleneck among different FPGAs. Previous multi-accelerator training algorithms select optimal scheduling strategies for data parallelism, tensor parallelism, and pipeline parallelism. However, pipeline parallelism cannot deal with batch normalization (BN), an essential CNN operator, while purely applying data parallelism and tensor parallelism suffers from resource under-utilization and intensive communication costs. In this work, we propose MTrain, a novel multi-accelerator training scheduling strategy that transforms the training process into a multi-branch workflow, so that independent sub-operations of different branches are executed on different training accelerators in parallel for better utilization and reduced communication overhead. Experimental results show that MTrain achieves efficient CNN training on heterogeneous FPGA-based edge servers, with a 1.07x-2.21x speedup under 15 GB/s peer-to-peer bandwidth compared to the state-of-the-art work. © 2025 IEEE.
Author Keywords CNN training; edge server; heterogeneous FPGAs
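
Illustration: the abstract describes scheduling independent sub-operations of a multi-branch training workflow onto heterogeneous accelerator cards. As a rough intuition only, the minimal Python sketch below greedily places branch workloads on whichever card would finish them earliest (a longest-processing-time list schedule). All names (Accelerator, schedule_branches, rel_speed) and all costs are hypothetical illustrations, not MTrain's actual algorithm or data.

# Hypothetical sketch of multi-branch scheduling on heterogeneous cards.
# This is NOT the MTrain algorithm from the paper; it only illustrates
# running independent branch sub-operations in parallel across cards.
from dataclasses import dataclass
import heapq

@dataclass
class Accelerator:
    name: str
    rel_speed: float  # assumed relative throughput of this FPGA card

def schedule_branches(branch_costs, accelerators):
    """Greedy LPT list schedule: give each branch (abstract compute cost)
    to the card that would finish it earliest; return the per-card
    assignment and the overall makespan."""
    ready = [(0.0, i) for i in range(len(accelerators))]  # (busy-until, card index)
    heapq.heapify(ready)
    assignment = {acc.name: [] for acc in accelerators}
    for branch, cost in sorted(branch_costs.items(), key=lambda kv: -kv[1]):
        finish, i = heapq.heappop(ready)            # least-loaded card
        finish += cost / accelerators[i].rel_speed  # slower card -> longer run
        assignment[accelerators[i].name].append(branch)
        heapq.heappush(ready, (finish, i))
    return assignment, max(t for t, _ in ready)

if __name__ == "__main__":
    cards = [Accelerator("fpga0", 1.0), Accelerator("fpga1", 0.6)]
    branches = {"branch_a": 8.0, "branch_b": 5.0, "branch_c": 3.0}
    plan, makespan = schedule_branches(branches, cards)
    print(plan, makespan)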


Similar Articles


Id | Similarity | Authors | Title | Published
6307 | 0.884 | Zhou L.; Samavatian M.H.; Bacha A.; Majumdar S.; Teodorescu R. | Adaptive Parallel Execution of Deep Neural Networks on Heterogeneous Edge Devices | Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, SEC 2019 (2019)
5923 | 0.868 | Xiang T.; Feng Y.; Ye X.; Tan X.; Li W.; Zhu Y.; Wu M.; Zhang H.; Fan D. | Accelerating CNN Algorithm with Fine-Grained Dataflow Architectures | Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018 (2019)
13852 | 0.861 | Prashanthi S.K.; Kesanapalli S.A.; Simmhan Y. | Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models | Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6, 3 (2022)
38608 | 0.853 | Chen Z.; Luo L.; Quan W.; Shi Y.; Yu J.; Wen M.; Zhang C. | Multiple CNN-Based Tasks Scheduling Across Shared GPU Platform in Research and Development Scenarios | Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018 (2019)
20553 | 0.852 | Li Q.; Huang L.; Tong Z.; Du T.-T.; Zhang J.; Wang S.-C. | Dissec: A Distributed Deep Neural Network Inference Scheduling Strategy for Edge Clusters | Neurocomputing, 500 (2022)
34376 | 0.852 | Wang H.; Chen X.; Xu H.; Liu J.; Huang L. | Joint Job Offloading and Resource Allocation for Distributed Deep Learning in Edge Computing | Proceedings - 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019 (2019)
21859 | 0.851 | Xue F.; Fang W.; Xu W.; Wang Q.; Ma X.; Ding Y. | EdgeLD: Locally Distributed Deep Learning Inference on Edge Device Clusters | Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020 (2020)