Smart City Gnosys

Smart city article details

Title Characterizing The Performance Of Accelerated Jetson Edge Devices For Training Deep Learning Models
ID_Doc 13852
Authors Prashanthi S.K.; Kesanapalli S.A.; Simmhan Y.
Year 2022
Published Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6, 3
DOI http://dx.doi.org/10.1145/3570604
Abstract Deep Neural Networks (DNNs) have had a significant impact on domains like autonomous vehicles and smart cities through low-latency inferencing on edge computing devices close to the data source. However, DNN training on the edge is poorly explored. Techniques like federated learning and the growing capacity of GPU-accelerated edge devices like NVIDIA Jetson motivate the need for a holistic characterization of DNN training on the edge. Training DNNs is resource-intensive and can stress an edge device's GPU, CPU, memory and storage capacities. Edge devices also have different resources compared to workstations and servers, such as slower shared memory and diverse storage media. Here, we perform a principled study of DNN training on individual devices of three contemporary Jetson device types: AGX Xavier, Xavier NX and Nano, for three diverse DNN model–dataset combinations. We vary device and training parameters such as I/O pipelining and parallelism, storage media, mini-batch sizes and power modes, and examine their effect on CPU and GPU utilization, fetch stalls, training time, energy usage, and variability. Our analysis exposes several resource inter-dependencies and counter-intuitive insights, while also helping quantify known wisdom. Our rigorous study can help tune training performance on the edge, trade off time and energy usage on constrained devices, and even select an ideal edge hardware for a DNN workload, and, in the future, extend to federated learning. As an illustration, we use these results to build a simple model to predict the training time and energy per epoch for any given DNN across different power modes, with minimal additional profiling.
Author Keywords dnn training; edge accelerators; performance characterization
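
The abstract's final point — predicting per-epoch training time across power modes from minimal profiling — can be illustrated with a minimal sketch. The linear form (epoch time as an affine function of inverse GPU clock), the function names, and all numbers below are illustrative assumptions, not the paper's actual model or measurements:

```python
# Hypothetical sketch: fit t_epoch = c0 + c1 / f_gpu from a few profiled
# power modes, then predict epoch time at an unprofiled mode.
# All clock frequencies and timings are made-up example values.

def fit_epoch_time_model(profiles):
    """Least-squares fit of t = c0 + c1 / f over (f_gpu_mhz, t_sec) pairs."""
    xs = [1.0 / f for f, _ in profiles]
    ys = [t for _, t in profiles]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    c0 = my - c1 * mx
    return c0, c1

def predict_epoch_time(model, f_gpu_mhz):
    """Predicted seconds per epoch at the given GPU clock."""
    c0, c1 = model
    return c0 + c1 / f_gpu_mhz

# Profile two power modes (hypothetical measurements: MHz, sec/epoch),
# then estimate a third, unprofiled mode.
measured = [(1377, 120.0), (900, 170.0)]
model = fit_epoch_time_model(measured)
est = predict_epoch_time(model, 1100)
```

With only two profiled modes the fit is exact through both points; with more modes the least-squares step averages out measurement noise. The same shape of model could be fit for energy per epoch.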


Similar Articles


Id | Similarity | Authors | Title | Published
13850 | 0.964 | Prashanthi S.K.; Kesanapalli S.A.; Simmhan Y. | Characterizing The Performance Of Accelerated Jetson Edge Devices For Training Deep Learning Models | SIGMETRICS 2023 - Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (2023)
13851 | 0.964 | Prashanthi S.K.; Kesanapalli S.A.; Simmhan Y. | Characterizing The Performance Of Accelerated Jetson Edge Devices For Training Deep Learning Models | Performance Evaluation Review, 51, 1 (2023)
11308 | 0.897 | Gutierrez-Torre A.; Bahadori K.; Baig S.-U.-R.; Iqbal W.; Vardanega T.; Berral J.L.; Carrera D. | Automatic Distributed Deep Learning Using Resource-Constrained Edge Devices | IEEE Internet of Things Journal, 9, 16 (2022)
21859 | 0.881 | Xue F.; Fang W.; Xu W.; Wang Q.; Ma X.; Ding Y. | Edgeld: Locally Distributed Deep Learning Inference On Edge Device Clusters | Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020 (2020)
6307 | 0.864 | Zhou L.; Samavatian M.H.; Bacha A.; Majumdar S.; Teodorescu R. | Adaptive Parallel Execution Of Deep Neural Networks On Heterogeneous Edge Devices | Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, SEC 2019 (2019)
38056 | 0.861 | Tang Y.; Jones A.K.; Xiong J.; Zhou P.; Hu J. | Mtrain: Enable Efficient Cnn Training On Heterogeneous Fpga-Based Edge Servers | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2025)
41334 | 0.86 | Xiao D.; Wang X.; Yang Z.; Huang C. | Partial Distributed Deep Learning Inference Model For Image Based Edge Device Cluster | Proceedings of 2024 8th International Conference on Electronic Information Technology and Computer Engineering, EITCE 2024 (2025)
13948 | 0.852 | Zhang Z.; Li F.; Lin C.; Wen S.; Liu X.; Liu J. | Choosing Appropriate Ai-Enabled Edge Devices, Not The Costly Ones | Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, 2021-December (2021)
20553 | 0.852 | Li Q.; Huang L.; Tong Z.; Du T.-T.; Zhang J.; Wang S.-C. | Dissec: A Distributed Deep Neural Network Inference Scheduling Strategy For Edge Clusters | Neurocomputing, 500 (2022)