Suspending, migrating and resuming HPC virtual clusters
Paolo Anedda,
Simone Leo,
Simone Manca,
Massimo Gaggero,
Gianluigi Zanetti
Future Generation Computer Systems, Volume 26, Number 8, pages 1063--1072, October 2010
A systematic study of issues related to suspending, migrating and resuming virtual clusters for data-driven HPC applications is presented. The focus is on nontrivial virtual clusters, that is, clusters where the running computation is expected to be coordinated and strongly coupled. It is shown that this requires all cluster-level operations, such as start and save, to be performed as synchronously as possible on all virtual cluster nodes, introducing the need for barriers at the virtual cluster computing meta-level. Once a synchronization mechanism is provided and appropriate transport strategies have been set up, it is possible to suspend, migrate and resume whole virtual clusters composed of "heavy" (4 GB RAM, 6 GB disk images) virtual machines in times of the order of a few minutes without disrupting the parallel computation -- albeit of the MapReduce type -- running inside them. The approach is intrinsically parallel, and it should scale without problems to larger virtual clusters.
BibTeX references
@Article{ALMGZ10a,
author = {Anedda, P. and Leo, S. and Manca, S. and Gaggero, M. and Zanetti, G.},
title = {Suspending, migrating and resuming {HPC} virtual clusters},
journal = {Future Generation Computer Systems},
number = {8},
volume = {26},
pages = {1063--1072},
month = {october},
year = {2010},
note = {idxproject: CYBERSAR},
keywords = {Virtual cluster,Data-driven application,HPC},
url = {10.1016/j.future.2010.05.007}
}