Paper ID | L.3.2
Paper Title | Communication Efficient and Byzantine Tolerant Distributed Learning
Authors | Avishek Ghosh, University of California, Berkeley, United States; Raj Kumar Maity, University of Massachusetts, Amherst, United States; Swanand Kadhe, University of California, Berkeley, United States; Arya Mazumdar, University of Massachusetts, Amherst, United States; Kannan Ramchandran, University of California, Berkeley, United States
Session | L.3: Distributed Learning Performance Analysis
Presentation | Lecture | ||
Track | Statistics and Learning Theory | ||
Abstract | We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the (statistical) error rate of our algorithm matches that of Yin et al., 2018, which uses more complicated schemes (such as coordinate-wise median or trimmed mean), and is thus optimal. Furthermore, for communication efficiency, we consider a generic class of $\delta$-approximate compressors from Karimireddy et al., 2019 that encompasses sign-based compressors and top-$k$ sparsification. Our algorithm uses compressed gradients and gradient norms for aggregation and Byzantine removal, respectively. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss functions. We show that, in a certain regime of $\delta$, the rate of convergence is not affected by the compression operation. We experimentally validate our results and show good convergence performance for convex (least-squares regression) and non-convex (neural network training) problems.
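The following is a minimal Python sketch of the two ingredients the abstract describes: a $\delta$-approximate compressor (top-$k$ sparsification here) and a norm-based trimming step that discards suspected Byzantine workers before averaging. It is not the authors' implementation; the function names `top_k_compress` and `robust_aggregate`, the `trim_fraction` parameter, and the exact trimming rule are illustrative assumptions.

```python
# Sketch (not the paper's code) of norm-based Byzantine removal combined with
# a top-k sparsification compressor, under the assumptions stated above.
import numpy as np


def top_k_compress(grad: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest-magnitude coordinates (a delta-approximate compressor)."""
    compressed = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    compressed[idx] = grad[idx]
    return compressed


def robust_aggregate(worker_grads, worker_norms, trim_fraction: float) -> np.ndarray:
    """Drop the workers whose reported gradient norms fall in the largest
    trim_fraction of the population, then average the remaining gradients."""
    m = len(worker_grads)
    n_keep = m - int(np.ceil(trim_fraction * m))
    keep_idx = np.argsort(worker_norms)[:n_keep]  # smallest norms are kept
    return np.mean([worker_grads[i] for i in keep_idx], axis=0)


# Toy usage: 10 workers, 2 of them Byzantine (sending inflated gradients).
rng = np.random.default_rng(0)
d, m, k = 50, 10, 5
grads = [rng.normal(size=d) for _ in range(m)]
grads[0] *= 100.0  # Byzantine worker
grads[1] *= 100.0  # Byzantine worker

compressed = [top_k_compress(g, k) for g in grads]
norms = [np.linalg.norm(g) for g in grads]  # norms used only for removal
update = robust_aggregate(compressed, norms, trim_fraction=0.2)
print(update.shape)
```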
2021 IEEE International Symposium on Information Theory
11-16 July 2021 | Melbourne, Victoria, Australia