skip to main content

Rigorous Systems Research Group (RSRG) Seminar

Friday, November 6, 2015
10:00am to 11:00am
Add to Cal
Annenberg 213
Scheduling in Hadoop clusters: Novel algorithms, state space collapse and delay minimization in heavy traffic
Yi Lu, Electrical & Computer Engineering, University of Illinois Urbana-Champaign,

A fundamental problem to all data-parallel applications is data locality. There are multiple levels of data locality, as a data block can be accessed in local memory or disk, within a rack or a data center, or across data centers. Scheduling with data locality is an affinity scheduling problem with an explosive number of task types and unknown task arrival rates. As a result, existing algorithms do not apply and the recently proposed JSQ-MaxWeight (Wang et. al. 2014) algorithm for two-level locality is delay-optimal only for a special heavy traffic scenario.

We found that for two-level locality, a simple priority algorithm is delay-optimal for all heavy traffic scenarios. With multi-level locality, we propose a novel algorithm. The key insight is that instead of focusing on scheduling alone, what if we co-design routing and scheduling? The insight is potentially applicable to affinity scheduling in general. 

For both algorithms, we provide rigorous proofs for heavy-traffic optimality that use an ideal load decomposition and an interesting state space collapse. We implemented our algorithm in Hadoop clusters, and showed that it accelerates the data-processing phase of jobs by 11 times with hot-spots and 2.4 times without hot-spots.

 

 

For more information, please contact Sydney Garstang by email at [email protected].