Technical Program

Paper Detail

Paper ID L.3.2
Paper Title Communication Efficient and Byzantine Tolerant Distributed Learning
Authors Avishek Ghosh, University of California, Berkeley, United States; Raj Kumar Maity, University of Massachusetts, Amherst, United States; Swanand Kadhe, University of California, Berkeley, United States; Arya Mazumdar, University of Massachusetts, Amherst, United States; Kannan Ramchandran, University of California, Berkeley, United States
Session L.3: Distributed Learning Performance Analysis
Presentation Lecture
Track Statistics and Learning Theory
Abstract We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the (statistical) error rate of our algorithm matches that of Yin et al., 2018, which uses more complicated schemes (such as coordinate-wise median or trimmed mean), and is thus optimal. Furthermore, for communication efficiency, we consider a generic class of $\delta$-approximate compressors from Karimireddy et al., 2019 that encompasses sign-based compressors and top-$k$ sparsification. Our algorithm uses compressed gradients and gradient norms for aggregation and Byzantine removal, respectively. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss functions. We show that, in a certain regime of $\delta$, the rate of convergence is not affected by the compression operation. We experimentally validate our results and demonstrate good convergence performance on convex (least-squares regression) and non-convex (neural network training) problems.
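The abstract's core idea (norm-based thresholding to drop suspected Byzantine workers, combined with a $\delta$-approximate compressor such as top-$k$ sparsification for the gradients that are aggregated) can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the trimming fraction, and the toy data below are all illustrative assumptions.

```python
import numpy as np

def top_k_compress(g, k):
    # Top-k sparsification: keep only the k largest-magnitude entries.
    # This is one member of the delta-approximate compressor class
    # referenced in the abstract (for top-k, delta = k/d).
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def robust_aggregate(gradients, trim_frac, k):
    # Sketch of the paper's scheme under illustrative assumptions:
    # each worker reports a gradient norm and a compressed gradient;
    # the server discards the trim_frac fraction of workers with the
    # largest norms (suspected Byzantine) and averages the rest.
    m = len(gradients)
    norms = np.array([np.linalg.norm(g) for g in gradients])
    keep = np.argsort(norms)[: m - int(trim_frac * m)]
    compressed = [top_k_compress(gradients[i], k) for i in keep]
    return np.mean(compressed, axis=0)

# Toy demo: 8 honest workers with gradients near the all-ones vector,
# plus 2 Byzantine workers sending huge gradients.
rng = np.random.default_rng(0)
d = 10
honest = [np.ones(d) + 0.01 * rng.standard_normal(d) for _ in range(8)]
byzantine = [100.0 * np.ones(d) for _ in range(2)]
agg = robust_aggregate(honest + byzantine, trim_frac=0.2, k=5)
print(agg)  # entries stay near the honest gradients' scale
```

Because the two Byzantine gradients have much larger norms, the thresholding step removes exactly those workers here; without it, their contribution would dominate the average.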



2021 IEEE International Symposium on Information Theory

11-16 July 2021 | Melbourne, Victoria, Australia
