CRS4

Dynamic Hadoop clusters on HPC scheduling systems

Michele Muggiri, Luca Pireddu, Simone Leo, Gianluigi Zanetti
Misc - August 2013
Download the publication: presentation.pdf [1 MB]
With the rising interest in Hadoop, there is an increasingly common requirement for HPC centers to be able to run Hadoop jobs on shared clusters where other types of workloads (e.g., MPI applications) are also run on a regular basis. In this work, we present a strategy for enabling flexible and efficient Hadoop node allocation that allows Hadoop jobs to coexist with standard HPC workloads. Our approach is based on decoupling computing and storage resources: we use a statically allocated storage infrastructure, which may be a Hadoop Distributed File System (HDFS) or a general-purpose shared storage, and allocate and deallocate Hadoop computing nodes dynamically through the cluster's standard job queuing system. Our solution works with popular HPC schedulers, such as Open Grid Engine, and is being used in production at CRS4 to run bioinformatics pipelines and other workloads on a cluster shared with other HPC jobs.
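
The dynamic allocation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows how a request for Hadoop compute nodes might be expressed as a Grid Engine array job, so that the scheduler treats the workers like any other HPC job. The worker script name `start_hadoop_worker.sh`, the job name, and both helper functions are hypothetical. The sketch also assumes that the site's queue configuration places each array task on its own node, which this code does not enforce.

```python
import shlex


def build_qsub_command(num_nodes, worker_script, job_name="hadoop-workers", queue=None):
    """Build a Grid Engine qsub command requesting num_nodes worker tasks.

    worker_script is a hypothetical script that starts one Hadoop compute
    daemon and points it at the statically allocated storage.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # -N names the job, -t submits an array job with one task per worker,
    # -b y tells qsub to treat the command as an executable, not a job script.
    cmd = ["qsub", "-N", job_name, "-t", f"1-{num_nodes}", "-b", "y"]
    if queue is not None:
        cmd += ["-q", queue]  # optional target queue
    cmd.append(worker_script)
    return cmd


def build_qdel_command(job_id):
    """Build the command that releases the workers by deleting their job."""
    return ["qdel", str(job_id)]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # on a machine without a scheduler installed.
    print(shlex.join(build_qsub_command(4, "./start_hadoop_worker.sh")))
    print(shlex.join(build_qdel_command(12345)))
```

Because the storage layer is allocated statically, deleting the array job releases the compute nodes back to the scheduler without touching the data.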

BibTeX reference

@Misc{MPLZ13,
  author       = {Muggiri, M. and Pireddu, L. and Leo, S. and Zanetti, G.},
  title        = {Dynamic Hadoop clusters on HPC scheduling systems},
  month        = {august},
  year         = {2013},
  keywords     = {hadoop, hpc, scheduling},
  url          = {https://publications.crs4.it/pubdocs/2013/MPLZ13},
}
