Two New Bootstrap Methods for High-Dimensional and Large-Scale Data
Bootstrap methods are among the most broadly applicable tools for statistical inference and uncertainty quantification. Although these methods have an extensive literature, much remains to be understood about their applicability in modern settings, where observations are high-dimensional, or where the quantity of data outstrips computational resources. In this talk, I will present a couple of new bootstrap methods that are tailored to these settings. First, I will discuss the topic of "spectral statistics" arising from high-dimensional sample covariance matrices, and describe a method for approximating the distributions of such statistics. Second, in the context of large-scale data, I will discuss a more unconventional application of the bootstrap -- dealing with the tradeoff between accuracy and computational cost for randomized numerical linear algebra. This will include joint work from a paper with Alexander Aue and Andrew Blandino; https://arxiv.org/abs/1709.08251 (to appear at Biometrika), and a paper with Michael Mahoney, and Shusen Wang; https://arxiv.org/abs/1708.01945 (to appear at JMLR).