DOLCIT/RSRG Seminar

Wednesday, November 14, 2018
12:00pm to 1:00pm
Annenberg 213
Protein and Small Molecule Engineering by Machine Learning (and Other Topics)
Jennifer Listgarten, Electrical Engineering and Computer Science & Center for Computational Biology, UC Berkeley

With the advent of more and more high-throughput assays to measure protein properties of interest, such as binding, expression, and fluorescence, the time for machine learning to act synergistically with protein design is here. In particular, one can obtain a proxy assay by building a predictive machine learning model (a stochastic oracle) that maps from the "input design space" to the properties of interest. For example, an oracle might be a neural network model that predicts protein fluorescence from DNA sequence, but it could just as well be any black-box, human-derived, or physics-based model. However, inverting such a model to perform "input design" (that is, finding a sequence that satisfies stipulated property desiderata) is a dramatically more difficult problem owing to the combinatorial search space. Furthermore, one should account for uncertainty in the oracle: in general, the oracle should provide different levels of confidence according to its knowledge of different parts of the search space. I will present our new method, DbAS, whose goal is precisely to tackle this problem. One can think of our approach as a form of computational directed evolution. If time permits, I will also discuss some other work at the intersection of statistical genetics and medical imaging, as well as CRISPR guide design.