Special CMX Seminar
How Many Labels Do You Need For Semi-Supervised Learning?
Given a data set of which a small subset are labelled, the goal of semi-supervised learning is to find the unknown labels. A popular method is to minimise a discrete p-Dirichlet energy defined on a graph constructed from the data. As the size of the data set increases one hopes that solutions of the discrete problem converge to a continuum variational problem with the continuum p-Dirichlet energy. It follows from Sobolev regularity that one cannot impose constraints if p is less than the dimension of the data hence, in this regime, one must also increase the number of labels in order to avoid labels "dissappearing" in the limit. In this talk I will address the question of what is the minimal number of labels. To compare labelling functions on different domains we use a metric based on optimal transport which then allows for the application of methods from the calculus of variation, in particular Gamma-convergence, and methods from PDE's, such as constructing barrier functions in order to apply the maximum principle. We can further show rates of convergence. This is joint work with Jeff Calder (Minnesota) and Dejan Slepcev (CMU).