Frontiers in Computing and Mathematical Sciences
Monday, January 10, 2022
Welcome & Introduction
Adam Wierman, Caltech
Tractable Probabilistic Reasoning for Trustworthy AI
YooJung Choi, UCLA
Automated decision-making systems are increasingly being deployed in areas with personal and societal impacts, from financial services to healthcare and criminal justice. This has led to growing interest in and need for trustworthy AI and ML systems -- that is, models that are robust, explainable, fair, and so on. It is important to note that these guarantees hold only with respect to a certain model of the world, with inherent uncertainties. In this talk, I will present how probabilistic modeling and inference, by incorporating a distribution, offer a principled way to handle different kinds of uncertainty when reasoning about the behavior of decision-making systems. For example, labels in training data may be biased; I will show that probabilistic circuits, a class of tractable probabilistic models (TPMs), can effectively enforce and audit fairness properties by explicitly modeling a latent unbiased label. Another common source of uncertainty is missing values at prediction time; fairness and robustness queries that account for such missingness are computationally hard inference tasks. I will also demonstrate how TPMs can again answer these more complex queries tractably. Finally, I will conclude with my future work toward a framework for flexibly reasoning about and enforcing trustworthy behavior in AI/ML systems.
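The tractable-marginalization idea behind probabilistic circuits can be seen in a toy sketch (the structure and all parameters below are invented for illustration, not taken from the talk): a tiny mixture-of-products circuit answers a marginal query with a missing feature in a single bottom-up pass, by letting each missing leaf evaluate to 1.

```python
# Toy sum-product ("probabilistic circuit") over two binary variables X1, X2:
# a sum node mixing two product nodes, each a product of Bernoulli leaves.
# All parameters are made up for this sketch.

w = [0.4, 0.6]        # sum-node mixture weights
p_x1 = [0.9, 0.2]     # P(X1 = 1) under each mixture component
p_x2 = [0.3, 0.7]     # P(X2 = 1) under each component

def leaf(p, value):
    """Bernoulli leaf; a missing value (None) is marginalized out as 1."""
    if value is None:
        return 1.0
    return p if value == 1 else 1.0 - p

def circuit(x1, x2):
    """Evaluate the circuit bottom-up; None marks a missing value."""
    return sum(wk * leaf(p1, x1) * leaf(p2, x2)
               for wk, p1, p2 in zip(w, p_x1, p_x2))

full = circuit(1, 1)     # joint probability P(X1=1, X2=1)
marg = circuit(1, None)  # marginal P(X1=1), in one feed-forward pass
```

The same single pass that evaluates the joint also evaluates the marginal, which is exactly the kind of query that is hard for general models but tractable for circuits.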
Bridging Safety and Learning in Human-Robot Interaction
Andrea Bajcsy, UC Berkeley
From autonomous cars in cities to mobile manipulators at home, robots must interact with people. What makes this hard is that human behavior—especially when interacting with other agents—is vastly complex, varying between individuals, environments, and over time. Thus, robots rely on data and machine learning throughout the design process and during deployment to build and refine models of humans. However, by blindly trusting their data-driven human models, today's robots confidently plan unsafe behaviors around people, resulting in anything from miscoordination to dangerous collisions.
My research aims to ensure safety in human-robot interaction, particularly when robots learn from and about people. In this talk, I will discuss how treating robot learning algorithms as dynamical systems driven by human data enables safe human-robot interaction. I will first introduce a Bayesian monitor which infers online if the robot's learned human model can evolve to well-explain observed human data. I will then discuss how control-theoretic tools enable us to formally quantify what the robot could learn online from human data and how quickly it could learn it. Coupling these ideas with robot motion planning algorithms, I will demonstrate how robots can safely and automatically adapt their behavior based on how trustworthy their learned human models are. I will end this talk by taking a step back and raising the question: "What is the 'right' notion of safety when robots interact with people?" and discussing opportunities for how rethinking our notions of safety can capture more subtle aspects of human-robot interaction.
Learning to Generate Data by Estimating Gradients of the Data Distribution
Yang Song, Stanford
Generating data with complex patterns, such as images, audio, and molecular structures, requires fitting very flexible statistical models to the data distribution. Even in the age of deep neural networks, building such models is difficult because they typically require an intractable normalization procedure to represent a probability distribution. To address this challenge, I propose to model the vector field of gradients of the data distribution (known as the score function), which does not require normalization and therefore can take full advantage of the flexibility of deep neural networks. I will show how to (1) estimate the score function from data with principled statistical methods, (2) generate new data using stochastic differential equations and numerical techniques, and even (3) evaluate probabilities as in a traditional statistical model. The resulting method, called score-based generative modeling, achieves record-breaking performance in applications including image synthesis, image compression, text-to-speech generation, time series prediction, and point cloud generation, challenging the long-time dominance of generative adversarial networks (GANs) on many of these tasks. Furthermore, unlike GANs, score-based generative models are suitable for Bayesian reasoning tasks such as solving ill-posed inverse problems, and I have demonstrated their superior performance on examples like sparse-view computed tomography and accelerated magnetic resonance imaging. Finally, I will discuss how score-based generative modeling opens up new opportunities for solving data generation problems in various disciplines of science and engineering.
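The sampling step in (2) can be illustrated with a minimal, hypothetical sketch: unadjusted Langevin dynamics driven by a known score function. Here the analytic score of a Gaussian stands in for a learned score network, and all step sizes and counts are illustrative.

```python
import math, random, statistics

random.seed(0)

# Target: N(mu, sigma^2). Its score is d/dx log p(x) = -(x - mu) / sigma^2.
# In score-based generative modeling this score would be a trained network.
mu, sigma = 2.0, 0.5
def score(x):
    return -(x - mu) / sigma**2

# Unadjusted Langevin dynamics: x <- x + eps*score(x) + sqrt(2*eps)*noise.
eps, steps, chains = 1e-3, 1500, 500
samples = []
for _ in range(chains):
    x = 0.0                         # arbitrary initialization
    for _ in range(steps):
        x += eps * score(x) + math.sqrt(2 * eps) * random.gauss(0, 1)
    samples.append(x)

est_mean = statistics.fmean(samples)
est_std = statistics.stdev(samples)
```

After enough steps the chains are distributed approximately as the target, so the empirical mean and standard deviation approach mu and sigma, which is the essence of generating data from an estimated score.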
A New Toolbox for Scheduling Theory
Ziv Scully, Carnegie Mellon
Queueing-induced delays due to resource contention are ubiquitous in many domains, including computer systems, service systems, supply chains, and transportation. One of the main tools system designers have to reduce delay is scheduling. However, scheduling is complicated; designing policies and evaluating their performance impact are difficult tasks. The role of queueing theory, and in particular scheduling theory, is to provide a rigorous mathematical basis for understanding scheduling, including evaluating policy performance and guiding policy design.
Unfortunately, the scope of today's queueing and scheduling theory fails to address many practical concerns of modern computer systems. For example, scheduling theory seldom explicitly models nontrivial preemption limitations, instead assuming that preemption is either always or never possible. As another example, there is very little theory for scheduling in multiserver queues.
We present two new, broadly applicable tools that greatly expand the reach of scheduling theory, applying each to solve multiple open problems.
The first tool, called "SOAP", is a new unifying theory of single-server scheduling, specifically the M/G/1 queueing model. SOAP characterizes the delay distribution of a broad space of policies, most of which have never been analyzed before. Among the newly analyzed policies are the Gittins index policy, which minimizes mean delay in low-information settings, and many policies with nontrivial preemption limitations.
The second tool, called "KIWI", is a new queueing identity that complements Little's law. KIWI enables a new method of analyzing complex queueing systems, most notably the M/G/k multiserver queue, by relating them to simpler systems, such as the M/G/1. This results in the first delay bounds for the M/G/k under scheduling policies like SRPT (shortest remaining processing time) and the Gittins index policy.
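Little's law itself, which KIWI complements, is easy to check numerically. The following sketch (illustrative parameters, plain FCFS service rather than SRPT or Gittins) simulates an M/M/1 queue and verifies that the time-average number in system equals the arrival rate times the mean sojourn time.

```python
import random
random.seed(1)

# Simulate an M/M/1 FCFS queue and check Little's law, L = lambda * W.
lam, mu = 0.7, 1.0          # arrival and service rates (load rho = 0.7)
n = 200_000                 # number of jobs

arrival, depart = 0.0, 0.0
total_sojourn = 0.0
for _ in range(n):
    arrival += random.expovariate(lam)
    start = max(arrival, depart)          # FCFS: wait for previous departure
    depart = start + random.expovariate(mu)
    total_sojourn += depart - arrival

W = total_sojourn / n       # mean time in system (sojourn time)
T_end = depart
L = total_sojourn / T_end   # time-average number in system (area argument)
lam_hat = n / T_end         # empirical arrival rate

# Theory for M/M/1 under FCFS: W = 1/(mu - lam); Little's law: L = lam * W.
```

With the empirical arrival rate, Little's law holds as an exact identity of the simulation; the closed-form W = 1/(mu - lam) is matched only up to sampling error.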
Two Topics in Monte Carlo for Scientific Computing
Michael Lindsey, NYU
In the first part of the talk, I consider the problem of sampling from a probability distribution with known density, which arises almost ubiquitously in the mathematical sciences. The most generic and widely-used method for this task is Markov chain Monte Carlo (MCMC), though this method typically suffers from extremely long autocorrelation times when the target density has many modes that are separated by regions of low probability. I will present an ensemble MCMC approach with interacting walkers designed to overcome this obstacle.
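For context, here is a minimal random-walk Metropolis sampler on a bimodal target (a generic stand-in example, not the ensemble method from the talk): it shows the basic MCMC loop whose difficulty hopping between well-separated modes motivates the interacting-walker approach.

```python
import math, random, statistics
random.seed(2)

# Random-walk Metropolis on a bimodal 1D density
# p(x) proportional to exp(-(x-3)^2/2) + exp(-(x+3)^2/2).
# With a small proposal step, the chain rarely crosses the
# low-probability region near 0 separating the two modes.

def unnorm_density(x):
    return math.exp(-(x - 3)**2 / 2) + math.exp(-(x + 3)**2 / 2)

def metropolis(x0, step, iters):
    x, accepted, trace = x0, 0, []
    for _ in range(iters):
        y = x + random.gauss(0, step)                 # symmetric proposal
        if random.random() < min(1.0, unnorm_density(y) / unnorm_density(x)):
            x, accepted = y, accepted + 1             # accept move
        trace.append(x)
    return trace, accepted / iters

trace, acc_rate = metropolis(x0=3.0, step=0.5, iters=20_000)
mean_abs = statistics.fmean(abs(x) for x in trace)    # distance from 0
```

The chain samples accurately within a mode (mean |x| near 3) while transitions between modes are rare, which is exactly the autocorrelation problem the ensemble method targets.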
In the second part of the talk, I consider the problem of computing the ground-state energy and wavefunction of a quantum many-body system, i.e., the lowest eigenpair of an exponentially high-dimensional Hermitian operator. I present a new optimization approach within the framework of variational Monte Carlo (VMC), in which the problem is solved by stochastic optimization over a parametric class of wavefunctions. Of particular interest are recently introduced neural-network-based parametrizations for which our approach yields state-of-the-art results. I will also discuss a new optimization approach for simultaneously computing the lowest several eigenpairs while avoiding intractable wavefunction orthogonality constraints.
Transport and Beyond: Efficient Optimization Over Probability Distributions
Jason Altschuler, MIT
The core of classical optimization focuses on the setting where decision variables are vectors in R^d. However, modern applications throughout machine learning, applied mathematics, and engineering demand high-dimensional optimization problems where decision variables are probability distributions. Can such optimization problems be solved efficiently? This talk presents two vignettes in this direction.
The first vignette concerns entropic optimal transport and related problems including Min-Mean-Cycle and Matrix Preconditioning. We present approximation algorithms that are faster in both theory and practice, yielding near-linear runtimes in general, and even faster runtimes in commonly arising geometric settings. The second vignette concerns Wasserstein barycenters and more generally, multimarginal optimal transport problems. Despite considerable attention, even in dimension as low as 2, it remained unknown whether Wasserstein barycenters can be computed in polynomial time. We uncover the subtle dependence of the answer on the dimension: yes in fixed dimension and no in general. Taken together, these two vignettes illustrate the growing interface of optimization, probability, and efficient algorithms.
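Entropic optimal transport is commonly solved by Sinkhorn's matrix-scaling iterations. The sketch below (toy-sized inputs and a basic implementation, not the accelerated algorithms from the talk) recovers a transport plan whose marginals match the prescribed distributions.

```python
import math

# Basic Sinkhorn iterations for entropic optimal transport between two
# small discrete distributions r and c with a given cost matrix.
def sinkhorn(r, c, cost, reg=0.1, iters=500):
    n, m = len(r), len(c)
    # Gibbs kernel: K_ij = exp(-cost_ij / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternate scaling so the plan's marginals match r and c.
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

r = [0.5, 0.5]
c = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
P = sinkhorn(r, c, cost)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(2)) for j in range(2)]
```

Each iteration is a pair of cheap rescalings, which is why near-linear-time analyses of this family of methods are possible.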
Monday, February 7, 2022
Welcome & Introduction
Adam Wierman, Computing and Mathematical Sciences, Caltech
Spectral Independence: A New Tool to Analyze Markov Chains
Kuikui Liu, University of Washington
Sampling from high-dimensional probability distributions is a fundamental and challenging problem encountered throughout science and engineering. One of the most popular approaches to tackle such problems is the Markov chain Monte Carlo (MCMC) paradigm. While MCMC algorithms are often simple to implement and widely used in practice, analyzing the fidelity of the generated samples remains a difficult problem.
In this talk, I will describe a new technique called "spectral independence" that my collaborators and I developed over the last couple of years to analyze Markov chains. This technique has allowed us to break long-standing barriers and resolve several decades-old open problems in MCMC theory. Our work has opened up numerous connections with other areas of computer science, mathematics, and statistical physics, leading to dozens of new developments as well as exciting new directions of inquiry. I will then discuss how these connections have allowed us to "unify" nearly all major algorithmic paradigms for approximate counting and sampling. Finally, I will conclude with a wide variety of future directions and open problems at the frontier of this research.
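As a concrete instance of the chains such techniques analyze, here is Glauber dynamics for the hardcore (independent-set) model at fugacity 1 on a 3-vertex path, a toy example invented for illustration. The stationary distribution is uniform over the path's five independent sets, so the empty set has probability 1/5.

```python
import random
random.seed(3)

# Glauber dynamics for the hardcore model at fugacity lam = 1 on the
# path 0 - 1 - 2. Update: pick a vertex uniformly; with probability
# lam/(1+lam) = 1/2 set it occupied if no neighbor is occupied,
# otherwise set it unoccupied.
edges = {0: [1], 1: [0, 2], 2: [1]}
state = [0, 0, 0]
count_empty, samples = 0, 200_000
for _ in range(samples):
    v = random.randrange(3)
    if random.random() < 0.5:
        state[v] = 1 if all(state[u] == 0 for u in edges[v]) else 0
    else:
        state[v] = 0
    count_empty += (sum(state) == 0)

p_empty = count_empty / samples   # should approach 1/5
```

The chain is trivially fast on this tiny graph; spectral independence is about certifying comparable fidelity guarantees on large instances where direct analysis fails.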
Training of Deep Neural Networks: Neural Tangent Kernel and Beyond
Arthur Jacot, École Polytechnique Fédérale de Lausanne (EPFL)
Modern deep learning has popularized the use of very large neural networks, but the theoretical tools to study such networks are still lacking. The Neural Tangent Kernel (NTK) describes how the network's outputs evolve during training, allowing a precise description of the convergence and generalization of deep neural networks (DNNs). In the infinite-width limit (when the number of neurons in each hidden layer grows to infinity), the NTK converges to a deterministic, fixed limit, ensuring exponential convergence to a global minimum. The NTK allows us to relate the generalization properties of DNNs to those of kernel methods, which are already well studied. The NTK also plays a role in understanding the so-called double-descent phenomenon observed in DNNs, wherein the generalization error exhibits surprising non-monotonic behavior as a function of the number of parameters.
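At finite width, the empirical NTK is simply the Gram matrix of parameter gradients, K(x, x') = <grad_theta f(x), grad_theta f(x')>. A toy one-hidden-layer network (invented for illustration) makes this concrete.

```python
import math, random
random.seed(4)

# Empirical NTK of a toy one-hidden-layer network
# f(x) = sum_k a_k * tanh(w_k * x), with parameters theta = (w, a).
width = 50
w = [random.gauss(0, 1) for _ in range(width)]
a = [random.gauss(0, 1) / math.sqrt(width) for _ in range(width)]

def grad(x):
    """Gradient of f(x) with respect to all parameters (w's then a's)."""
    g_w = [a[k] * (1 - math.tanh(w[k] * x)**2) * x for k in range(width)]
    g_a = [math.tanh(w[k] * x) for k in range(width)]
    return g_w + g_a

def ntk(x, y):
    gx, gy = grad(x), grad(y)
    return sum(p * q for p, q in zip(gx, gy))

xs = [0.5, -1.0]
K = [[ntk(x, y) for y in xs] for x in xs]   # 2x2 empirical NTK Gram matrix
```

Being a Gram matrix, K is symmetric and positive semidefinite; the infinite-width theory says that as `width` grows, K concentrates on a deterministic kernel that stays fixed during training.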
How to handle Biased Data and Multiple Agents in Machine Learning?
Manolis Zampetakis, UC Berkeley
Modern machine learning (ML) methods commonly rest on strong assumptions, such as: (1) access to data that adequately captures the application environment, and (2) that the goal is to optimize the objective function of a single agent, with the application environment isolated and unaffected by the outcome chosen by the ML system. In this talk I will present methods with theoretical guarantees that remain applicable in the absence of (1) and (2), as well as corresponding fundamental lower bounds. In the context of (1), I will focus on how to deal with truncation and self-selection bias; in the context of (2), I will present a foundational comparison between min-max and single-objective optimization.
Toward Mathematical Understanding of Real-life Deep Learning
Zhiyuan Li, Princeton University
There is great interest in developing a mathematical understanding of the tremendous success of deep learning. Most of this understanding has been developed in simplified settings (depth 2 or 3; the NTK regime). This talk presents my recent work providing a mathematical understanding of real-life networks and losses, incorporating the effects of normalization, architectural features, stochasticity, and finite learning rate (LR). It leverages insights from continuous mathematics (including stochastic differential equations (SDEs)), which I will use to exhibit interesting new mechanisms for implicit regularization during training. I will finish by presenting a new practical advance arising from these theoretical insights: a robust variant of BERT (a language model at the heart of the ongoing revolution in natural language processing) called SIBERT, which uses a new scale-invariant architecture and is trainable with vanilla SGD.
Foundations of Cryptographic Proof Systems
Alex Lombardi, MIT
One of computer science's greatest insights has been in understanding the power and versatility of *proofs*, which were revolutionized in the 1980s to utilize *interaction* as well as other resources such as randomization and computational hardness. Today, they form the backbone of both theoretical and practical cryptography and are simultaneously the source of deep connections to areas such as complexity theory, game theory, and quantum computation.
In this talk, I will introduce general-purpose tools, techniques, and abstractions for two key aspects of cryptographic proof systems that have been poorly understood for decades:
1) Can we remove interaction from interactive proofs? Already in the 1980s, Fiat and Shamir proposed a heuristic *but unproven* methodology for removing interaction from interactive proofs, which is now ubiquitous and essential for practical applications. However, it remained open for over 30 years to prove the security of this transformation in essentially any setting of interest.
My work on the Fiat-Shamir transform has led to resolutions of many long-standing open problems, including (i) building non-interactive zero-knowledge proof systems based on lattice cryptography, (ii) establishing the existence of highly efficient and succinct non-interactive proof systems, and (iii) demonstrating that foundational protocols from the 1980s fail to compose in parallel.
2) Are classical interactive protocols secure against quantum computers? At its heart, the problem of analyzing and ruling out quantum attacks on cryptographic protocols is the issue of "rewinding." The inability to rewind a quantum attack stems from the no-cloning theorem, a fundamental property of quantum information. As a result, very few classical protocols were known to be secure against quantum attacks.
In a recent work, I showed how to overcome these difficulties and settle many foundational questions on post-quantum cryptographic proof systems. Our main technique is showing how to efficiently extract certain pieces of (classical) information from a quantum attacker without disturbing its internal state.
Perceiving the World in 2D and 3D
Georgia Gkioxari, Facebook AI Research
Images are powerful storytellers as they capture events, memorable or mundane, from our everyday lives. Humans have the ability to perceive images effortlessly but for machines to do the same, they need to build an understanding of the world, a world composed of complex objects, humans and their rich interactions. In this talk, I will present my work towards enabling machines to recognize and localize objects and their interactions from images, work that is powering products in industry used by millions of people, such as Portal. The advances in 2D visual understanding are unprecedented but the world is 3D and objects have 3D properties which modern recognition models ignore. Toward 3D perception, I will present my work on inferring 3D object shapes from real-world images and understanding 3D scenes via multi-view 2D supervision. To this end, I will present PyTorch3D, our efficient and modular 3D deep learning library which efficiently fuses advances in deep learning with geometry and is widely adopted within the academic and industry research community.