Paper ID | L.6.3
Paper Title | On Byzantine-Resilient High-Dimensional Stochastic Gradient Descent
Authors | Deepesh Data, Suhas Diggavi, University of California, Los Angeles, United States
Session | L.6: Gradient-Based Distributed Learning
Presentation | Lecture
Track | Statistics and Learning Theory
Abstract | We study stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. Building on recent advances in algorithmic high-dimensional robust statistics, in each SGD iteration the master employs a non-trivial decoding procedure to estimate the true gradient from the unbiased stochastic gradients received from the workers, some of which may be corrupt. We provide convergence analyses for both strongly-convex and non-convex smooth objectives under standard SGD assumptions. In both settings, the approximation error of our solution is controlled by the mini-batch size of the stochastic gradients and can be made arbitrarily small, provided the workers use a sufficiently large mini-batch size. Our algorithm can tolerate fewer than a $\frac{1}{3}$ fraction of Byzantine workers. It approximately finds the optimal parameters in the strongly-convex setting {\em exponentially fast}, and reaches an approximate stationary point in the non-convex setting at a {\em linear speed}, i.e., at a rate of $\frac{1}{T}$, thus matching the convergence rates of vanilla SGD in the Byzantine-free setting.
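The iteration described in the abstract is easy to illustrate: workers send mini-batch stochastic gradients, the master robustly decodes an estimate of the true gradient, and then takes a descent step. Below is a minimal sketch on a toy strongly-convex quadratic objective, using a coordinate-wise trimmed mean as a stand-in for the paper's high-dimensional robust decoder; the function names, hyper-parameters, and Byzantine attack model are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the Byzantine-resilient master-worker SGD loop from the abstract.
# The paper's decoder comes from high-dimensional robust statistics; here a
# coordinate-wise trimmed mean stands in as a placeholder robust aggregator.
import numpy as np

rng = np.random.default_rng(0)

d, n_workers, byz_frac = 10, 15, 0.2      # dimension, #workers, Byzantine fraction (< 1/3)
n_byz = int(byz_frac * n_workers)
x_star = rng.normal(size=d)               # optimum of a toy strongly-convex objective

def honest_gradient(x, batch_size=32):
    """Unbiased stochastic gradient of f(x) = 0.5 * ||x - x_star||^2 plus noise.
    A larger batch_size shrinks the gradient noise, mirroring the abstract's
    mini-batch control of the approximation error."""
    noise = rng.normal(scale=1.0 / np.sqrt(batch_size), size=d)
    return (x - x_star) + noise

def byzantine_gradient(x):
    """Adversarial worker: sends an arbitrary (sign-flipped, scaled, noisy) vector."""
    return -10.0 * (x - x_star) + rng.normal(scale=5.0, size=d)

def trimmed_mean(grads, trim_frac):
    """Placeholder robust aggregator: per coordinate, drop the largest and
    smallest values before averaging."""
    g = np.sort(np.stack(grads), axis=0)
    k = int(trim_frac * g.shape[0])
    return g[k: g.shape[0] - k].mean(axis=0)

# Master-worker SGD: each iteration the master collects one gradient per worker,
# robustly aggregates them, and takes a descent step.
x = np.zeros(d)
lr = 0.1
for t in range(200):
    grads = [byzantine_gradient(x) if i < n_byz else honest_gradient(x)
             for i in range(n_workers)]
    g_hat = trimmed_mean(grads, trim_frac=byz_frac)   # robust estimate of the true gradient
    x = x - lr * g_hat

print("distance to optimum:", np.linalg.norm(x - x_star))
```

In this sketch the distance to the optimum shrinks geometrically despite the corrupt workers, echoing the exponentially fast convergence claimed for the strongly-convex setting; the trimmed mean is only a simple proxy for the paper's decoding step.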
2021 IEEE International Symposium on Information Theory
11-16 July 2021 | Melbourne, Victoria, Australia