Using virtual clusters to decouple computation and data management in high throughput analysis applications
Simone Leo,
Paolo Anedda,
Massimo Gaggero,
Gianluigi Zanetti
Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing, pages 411--415, 2010
The rapid growth in the throughput-to-cost ratio of experimental data production technologies is generating vast amounts of scientific data, often organized into "large" objects (genomes, CT scans) exhibiting complex internal structures. Frequently, datasets must be shared between multiple research groups interested not only in the final results, but also in how they are produced. The practical difficulties of moving terabytes or more of data across the network, as well as the need to maintain a clear separation between the software stack and the storage infrastructure, are thus raising interest in the use of virtual clusters for HPC and data-intensive applications. In this paper we employ a MapReduce implementation of an image analysis pipeline used by deep sequencing platforms to analyse different virtual cluster scenarios and their impact on system performance.
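As a rough illustration of the kind of MapReduce job the abstract refers to (this is not the authors' pipeline), the sketch below shows a hypothetical Hadoop Streaming mapper in Python that processes one image tile per input record; the record format, tile_id, image_path and analyze_tile are all assumptions made for the example.

#!/usr/bin/env python
"""Hypothetical sketch of a Hadoop Streaming mapper that handles one
image tile per input record.  Not the pipeline described in the paper:
the record format and analyze_tile are illustrative placeholders."""
import sys


def analyze_tile(image_path):
    # Placeholder for per-tile image analysis (e.g. spot detection,
    # intensity extraction); here it just returns the path length.
    return len(image_path)


def main():
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        # Each record is assumed to be "<tile_id>\t<image_path>".
        tile_id, image_path = line.split("\t", 1)
        result = analyze_tile(image_path)
        # Emit "<tile_id>\t<result>" for the reduce phase to aggregate.
        print("%s\t%s" % (tile_id, result))


if __name__ == "__main__":
    main()

Such a mapper would be launched over a distributed file system holding the tile images, so that each virtual cluster node analyses the tiles stored closest to it.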
BibTeX reference
@InProceedings{LAGZ10a,
author = {Leo, S. and Anedda, P. and Gaggero, M. and Zanetti, G.},
title = {Using virtual clusters to decouple computation and data management in high throughput analysis applications},
booktitle = {Proceedings Of The 18th Euromicro Conference On Parallel, Distributed And Network-Based Processing},
pages = {411--415},
year = {2010},
editor = {M. Danelutto et al.},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
note = {idxproject: CYBERSAR},
keywords = {Virtual cluster, Data-driven application, HPC},
url = {http://www.computer.org/portal/web/csdl/doi/10.1109/PDP.2010.29}
}