Applied Mathematics Colloquium

Monday, May 7, 2012, 4:15 PM

Statistics and Computation in the Age of Massive Data

Speaker: Michael Jordan, EECS & Statistics, UC Berkeley
Location: Annenberg 105
There are many issues remaining to be addressed, or even formulated, at the interface of statistics and computation. One way to capture the current state of affairs is the following: If we view data as a resource, how can it be that in many practical problems of interest we find ourselves embarrassed by being given too much data? Our inferential procedures typically use polynomial amounts of time and space, but that doesn't suffice; we need to be able to guarantee that on a fixed computational budget the statistical risk decreases as the number of data points grows (without bound). A general theory not yet being available, in this talk I present three vignettes that describe various lines of attack on the problem: one involving the bootstrap, another involving matrix completion algorithms, and the third involving phylogenetic analysis in the regime of large numbers of taxa. All three vignettes involve divide-and-conquer strategies, with the third vignette being particularly interesting in this regard (divide-and-conquer arises from Poisson thinning). [Joint work with Alexandre Bouchard-Cote, Ariel Kleiner, Lester Mackey, Purna Sarkar and Ameet Talwalkar.]
Series: Applied Mathematics Colloquium Series

Contact: Sydney Garstang at x4555 sydney@caltech.edu
For more information visit: http://www.acm.caltech.edu