Gradient coding -- When coding theory and distributed computing meet
Gradient Descent, and its variants, are a popular method for solving empirical risk minimization problems in machine learning. However, if the size of the training set is large, a computational bottleneck is the computation of the gradient, and hence, it is common to distribute the training set among worker nodes. Doing this in a synchronous fashion faces yet another challenge of stragglers (i.e., slow or unavailable nodes), which might cause a considerable delay. Hence, schemes for mitigation of stragglers are essential.
It was recently shown by Tandon et al. that stragglers can be avoided by carefully assigning redundant computations to the worker nodes and coding across partial gradients, and a randomized construction for the coding matrix was given. In this talk we present a more efficient deterministic scheme by employing cyclic MDS codes over the real or complex field. In addition, we propose to replace the exact computation of the gradient with an approximate one; a technique which drastically increases the straggler tolerance, and stems from adjacency matrices of expander graphs.