Hierarchical Coded Gradient Aggregation for Learning at the Edge
Saurav Prakash, University of Southern California, United States; Amirhossein Reisizadeh, Ramtin Pedarsani, UC Santa Barbara, United States; Amir Salman Avestimehr, University of Southern California, United States
L.6: Gradient-Based Distributed Learning
Statistics and Learning Theory
Client devices at the edge are generating increasingly large amounts of rich data suitable for learning powerful statistical models. However, privacy concerns and heavy communication load make it infeasible to move the client data to a centralized location for training. In many distributed learning setups, client nodes carry out gradient computations on their local data, while the central master server receives the local gradients and aggregates them to take the global model update step. To guarantee robustness against straggling communication links, we consider a hierarchical setup with n_e clients and n_h reliable helper nodes that are available to aid in gradient aggregation at the master. To achieve resiliency against straggling client-to-helpers links, we propose two approaches that leverage coded redundancy. The first is Aligned Repetition Coding (ARC), which repeats gradient components on the helper links, allowing significant partial aggregation at the helpers and resulting in a helpers-to-master communication load (C_HM) of O(n_h). ARC, however, incurs a client-to-helpers communication load (C_EH) of Θ(n_h), which is prohibitive for client nodes due to limited and costly bandwidth. We thus propose Aligned Minimum Distance Separable Coding (AMC), which achieves the optimal C_EH of Θ(1) for a given resiliency threshold by applying an MDS code over the gradient components, while achieving a C_HM of O(n_e).
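To make the aligned MDS idea concrete, here is a minimal pure-Python sketch (not the paper's construction) of how MDS coding over gradient chunks lets the master recover the aggregate from any k of n_h helper links. The toy parameters (3 clients, k = 2 scalar chunks per gradient, n_h = 4 helpers), the Vandermonde evaluation points, and all function names are illustrative assumptions. Because the same code is used at every client ("alignment"), sums of codewords received at a helper are themselves codewords of the summed gradient, so each client sends only one coded chunk per helper link.

```python
from fractions import Fraction

def mds_encode(chunks, point):
    # Evaluate the "chunk polynomial" g_0 + g_1*x + ... at an evaluation
    # point: a Vandermonde-style MDS encoding of the k gradient chunks.
    return sum(c * point ** i for i, c in enumerate(chunks))

def solve(A, b):
    # Tiny exact Gaussian elimination for a k x k linear system (Fractions).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

# Toy setup: 3 clients, gradients split into k = 2 chunks; n_h = 4 helpers.
k, n_h = 2, 4
points = [Fraction(p) for p in range(1, n_h + 1)]   # distinct evaluation points
client_grads = [[Fraction(1), Fraction(2)],
                [Fraction(3), Fraction(5)],
                [Fraction(4), Fraction(7)]]

# Each client sends ONE coded chunk per helper link (C_EH stays constant in n_h).
helper_rx = [[mds_encode(g, x) for g in client_grads] for x in points]
# Helpers partially aggregate: alignment makes the sum of codewords a codeword
# of the summed gradient chunks.
helper_sums = [sum(rx) for rx in helper_rx]

# Master hears from ANY k helpers (here helpers at indices 1 and 3 survive;
# the other two straggle) and interpolates the aggregate chunks.
survivors = [1, 3]
A = [[points[j] ** i for i in range(k)] for j in survivors]
b = [helper_sums[j] for j in survivors]
agg = solve(A, b)
print([int(a) for a in agg])  # chunkwise aggregate gradient: [8, 14]
```

The exact `Fraction` arithmetic stands in for finite-field MDS decoding; the key property shown is that any k surviving helper links determine the full aggregate, giving resiliency against the remaining n_h - k straggling links.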