Computing + Mathematical Sciences Faculty Candidate Seminar
New Systems, Algorithms, and Data Structures for High Availability
Abstract : Users of Internet services are increasingly intolerant of delays and outages, while demanding an increasingly rich and consistent online experience. One screenshot, and a website that is down or misbehaving spreads through the news like wildfire. Yet protecting against such failures -- whether due to misconfigurations, bugs, or even malice -- is notoriously expensive. At the same time, the pressure to increase availability pushes system designers to cut corners, sometimes with disastrous consequences.
In this talk, I will describe new systems and theory for delivering high availability to users, inspired by real-world debacles, some of which are unknown to the public. A unifying approach in my research is to integrate theoretical innovations into the final stages of system design, giving robust guarantees that are also practical. First, I will describe a system that allows any Internet service to tolerate arbitrary faults scalably on read-mostly workloads, as well as new algorithms to scale this fault tolerance to a million nodes on general workloads. Then, I will describe an indexing technique used by database systems to increase transaction concurrency, and theoretically explain why it failed for one company, leading to the design of new data structures that are being incorporated into algorithms textbooks. I will also briefly discuss my other work on high availability, such as a system for saving energy on enterprise desktops, and my ongoing work on a system for rich analysis of globally distributed, real-time data.
Contact: Lucinda Acosta at 4843 email@example.com