|Communication Efficient and Byzantine Tolerant Distributed Learning
|Avishek Ghosh, University of California, Berkeley, United States; Raj Kumar Maity, University of Massachusetts, Amherst, United States; Swanand Kadhe, University of California, Berkeley, United States; Arya Mazumdar, University of Massachusetts, Amherst, United States; Kannan Ramchandran, University of California, Berkeley, United States
|L.3: Distributed Learning Performance Analysis
|Statistics and Learning Theory
|Click here to download the manuscript
|Click here to watch in the Virtual Symposium
|We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show the (statistical) error-rate of our algorithm matches that of Yin et al., 2018, which uses more complicated schemes (like coordinate-wise median or trimmed mean) and thus optimal. Furthermore, for communication efficiency, we consider a generic class of $\delta$-approximate compressors from Karimireddy et al., 2019 that encompasses sign-based compressors and top-$k$ sparsification. Our algorithm uses compressed gradients and gradient norms for aggregation and Byzantine removal respectively. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss function. We show that, in certain regime of $\delta$, the rate of convergence is not affected by the compression operation. We have experimentally validated our results and shown good performance in convergence for convex (least-square regression) and non-convex (neural network training) problems.