Dynamic Hadoop clusters on HPC scheduling systems

Michele Muggiri, Luca Pireddu, Simone Leo, Gianluigi Zanetti

Misc - August 2013

With the rising interest in Hadoop, there is an increasingly common requirement for HPC centers to be able to run Hadoop jobs on shared clusters where other types of workloads (e.g., MPI applications) are also run on a regular basis. In this work, we present a strategy for enabling flexible and efficient Hadoop node allocation that allows Hadoop jobs to coexist with standard HPC workloads. Our approach is based on decoupling computing and storage resources: we use a statically allocated storage infrastructure, which may be a Hadoop Distributed File System (HDFS) or a general-purpose shared storage, and allocate and deallocate Hadoop computing nodes dynamically through the cluster's standard job queuing system. Our solution works with popular HPC schedulers, such as Open Grid Engine, and is being used in production at CRS4 to run bioinformatics pipelines and other workloads on a cluster shared with other HPC jobs.
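The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the scheduler hands the job a Grid Engine style host file (one line per host, host name first), that the first allocated host acts as the JobTracker, and that the Hadoop 1.x property names `fs.default.name` and `mapred.job.tracker` apply. The function names, host names, port, and storage URI are hypothetical.

```python
from xml.sax.saxutils import escape


def parse_pe_hostfile(text):
    """Return the unique host names from a PE_HOSTFILE-style text (assumed format)."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is taken to be the host name.
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def hadoop_site_xml(properties):
    """Render a dict of properties as a Hadoop *-site.xml document."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


def plan_cluster(hostfile_text, static_fs_uri, jobtracker_port=9001):
    """Map dynamically allocated hosts to Hadoop roles; storage stays static."""
    hosts = parse_pe_hostfile(hostfile_text)
    if not hosts:
        raise ValueError("the scheduler allocated no hosts")
    jobtracker = hosts[0]
    # With a single allocated host, it runs both roles.
    tasktrackers = hosts[1:] or hosts
    return {
        "jobtracker": jobtracker,
        "tasktrackers": tasktrackers,
        # Storage is NOT part of the allocation: it points at a fixed URI.
        "core-site.xml": hadoop_site_xml({"fs.default.name": static_fs_uri}),
        # Compute follows whatever nodes the queuing system granted.
        "mapred-site.xml": hadoop_site_xml(
            {"mapred.job.tracker": f"{jobtracker}:{jobtracker_port}"}
        ),
    }


# Hypothetical allocation of three nodes with eight slots each.
EXAMPLE_HOSTFILE = """\
node01.example.org 8 all.q@node01.example.org UNDEFINED
node02.example.org 8 all.q@node02.example.org UNDEFINED
node03.example.org 8 all.q@node03.example.org UNDEFINED
"""

if __name__ == "__main__":
    plan = plan_cluster(EXAMPLE_HOSTFILE, "hdfs://namenode.example.org:8020")
    print("JobTracker:", plan["jobtracker"])
    print("TaskTrackers:", ", ".join(plan["tasktrackers"]))
    print(plan["mapred-site.xml"])
```

Because only the compute configuration depends on the allocation, the nodes can be returned to the queue when the job ends without touching the data, which lives on the statically allocated storage.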

BibTeX reference

@Misc{MPLZ13,
  author       = {Muggiri, M. and Pireddu, L. and Leo, S. and Zanetti, G.},
  title        = {Dynamic Hadoop clusters on HPC scheduling systems},
  month        = {august},
  year         = {2013},
  keywords     = {hadoop, hpc, scheduling},
  url          = {https://publications.crs4.it/pubdocs/2013/MPLZ13},
}

Other publications in the database

» Michele Muggiri
» Luca Pireddu
» Simone Leo
» Gianluigi Zanetti