Smart City Gnosys

Smart city article details

Title Preemptive Scheduling For Distributed Machine Learning Jobs In Edge-Cloud Networks
ID_Doc 42904
Authors Wang N.; Zhou R.; Jiao L.; Zhang R.; Li B.; Li Z.
Year 2022
Published IEEE Journal on Selected Areas in Communications, 40, 8
DOI http://dx.doi.org/10.1109/JSAC.2022.3180772
Abstract Recent advances in 5G and edge computing enable rapid development and deployment of edge-cloud systems, which are ideal for delay-sensitive machine learning (ML) applications such as autonomous driving and smart city. Distributed ML jobs often need to train a large model with enormous datasets, which can only be handled by deploying a distributed set of workers in an edge-cloud system. One common approach is to employ a parameter server (PS) architecture, in which training is carried out at multiple workers, while PSs are used for aggregation and model updates. In this architecture, one of the fundamental challenges is how to dispatch ML jobs to workers and PSs such that the average job completion time (JCT) can be minimized. In this work, we propose a novel online preemptive scheduling framework to decide the location and the execution time window of concurrent workers and PSs upon each job arrival. Specifically, our proposed scheduling framework consists of: i) a job dispatching and scheduling algorithm that assigns each ML job to workers and decides the schedule to train each data chunk; ii) a PS assignment algorithm that determines the placement of PS. We prove theoretically that our proposed algorithm is D_max(1 + 1/ε)-competitive with (1 + ε)-speed augmentation, where D_max is the maximal number of data chunks in any job. Extensive testbed experiments and trace-driven simulations show that our algorithm can reduce the average JCT by up to 30% compared with state-of-the-art baselines. © 1983-2012 IEEE.
Author Keywords Distributed machine learning; Edge-cloud networks; Parameter server architecture; Preemptive scheduling


Similar Articles


Id | Similarity | Authors | Title | Published
20985 | 0.914 | Zhou R.; Wang N.; Huang Y.; Pang J.; Chen H. | Dps: Dynamic Pricing And Scheduling For Distributed Machine Learning Jobs In Edge-Cloud Networks | IEEE Transactions on Mobile Computing, 22, 11 (2023)
34376 | 0.885 | Wang H.; Chen X.; Xu H.; Liu J.; Huang L. | Joint Job Offloading And Resource Allocation For Distributed Deep Learning In Edge Computing | Proceedings - 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019 (2019)
21112 | 0.876 | Zhang R.; Shen G.; Gong L.; Guo C. | Dsana: A Distributed Machine Learning Acceleration Solution Based On Dynamic Scheduling And Network Acceleration | Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020 (2020)
57704 | 0.87 | Zhu K.; Zhang Z.; Sun F. | Toward Intelligent Cooperation At The Edge: Improving The Qos Of Workflow Scheduling With The Competitive Cooperation Of Edge Servers | Wireless Networks, 30, 6 (2024)
24315 | 0.865 | Pang J.; Han Z.; Zhou R.; Zhang R.; Lui J.C.S.; Chen H. | Eris: An Online Auction For Scheduling Unbiased Distributed Learning Over Edge Networks | IEEE Transactions on Mobile Computing, 23, 6 (2024)
34323 | 0.861 | Chen B.; Yang Y.; Xu M. | Job-Aware Communication Scheduling For Dml Training In Shared Cluster | Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020 (2020)
40078 | 0.859 | Han Z.; Zhou R.; Pang J.; Cao Y.; Tan H. | Online Scheduling Unbiased Distributed Learning Over Wireless Edge Networks | Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, 2021-December (2021)
41334 | 0.858 | Xiao D.; Wang X.; Yang Z.; Huang C. | Partial Distributed Deep Learning Inference Model For Image Based Edge Device Cluster | Proceedings of 2024 8th International Conference on Electronic Information Technology and Computer Engineering, EITCE 2024 (2025)
18096 | 0.856 | Qadeer A.; Lee M.J. | Deep-Deterministic Policy Gradient Based Multi-Resource Allocation In Edge-Cloud System: A Distributed Approach | IEEE Access, 11 (2023)
28911 | 0.856 | Mdemaya G.B.J.; Sindjoung M.L.F.; Ndadji M.M.Z.; Velempini M. | Hercule: High-Efficiency Resource Coordination Using Kubernetes And Machine Learning In Edge Computing For Improved Qos And Qoe | IEEE Access, 13 (2025)