Sebastian Stich (EPFL) - Error Feedback for Communication Efficient SGD
Large-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e. algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders perfect scalability. Various recent works have proposed quantization or sparsification techniques to reduce the amount of data that needs to be communicated. We analyze Stochastic Gradient Descent (SGD) with k-sparsification (for instance top-k or random-k) and compression (for instance quantization) and show that these schemes converge at the same rate as vanilla SGD when equipped with error compensation (i.e. keeping track of accumulated errors in memory). That is, communication can be reduced by a factor of the dimension of the problem (sometimes even more) whilst still converging at the same rate.
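To make the error-compensation idea concrete, here is a minimal sketch in Python/NumPy of one step of SGD with top-k sparsification and an error-feedback memory, in the spirit of the scheme described in the abstract. Function and variable names (e.g. ef_sgd_step, memory) are illustrative, not the speaker's implementation.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd_step(x, stochastic_grad, memory, lr, k):
    """One step of SGD with top-k sparsification and error compensation.

    The `memory` vector accumulates whatever the compressor discards,
    so that no gradient information is lost permanently; it is added
    back before the next compression (illustrative sketch only).
    """
    g = stochastic_grad(x)
    # Add the previously accumulated compression error to the new update.
    corrected = lr * g + memory
    # Only the k largest coordinates would be communicated.
    compressed = top_k(corrected, k)
    # Remember what was dropped, to be re-injected at the next step.
    memory = corrected - compressed
    # Apply the sparse update.
    x = x - compressed
    return x, memory
```

Only the k nonzero coordinates of the compressed update need to be sent over the network, which is where the reduction in communication by (up to) a factor of the problem dimension comes from.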
Sebastian Stich
I am a scientist at EPFL, working with Prof. Martin Jaggi in the Machine Learning and Optimization Laboratory (MLO). My research interests include the complexity analysis of (randomized) optimization algorithms in serial, parallel, and distributed settings, as well as optimization algorithms for high-dimensional and/or structured problems.