Biodoop: Bioinformatics on Hadoop

Simone Leo, Federico Andrea Santoni, Gianluigi Zanetti

The 38th International Conference on Parallel Processing Workshops (ICPPW 2009), pages 415-422, 2009

Bioinformatics applications currently require both the processing of huge amounts of data and heavy computation. Fulfilling these requirements calls for simple ways to implement parallel computing. MapReduce is a general-purpose parallelization model that seems particularly well suited to this task, and an open source implementation of it (Hadoop) is available. Here we report on its application to three relevant algorithms: BLAST, GSEA and GRAMMAR. The first is characterized by relatively lightweight computation on large data sets, while the second requires heavy processing of relatively small data sets. The third can be considered a mixture of these two computational profiles. Our results are encouraging and indicate that the framework could have a wide range of bioinformatics applications while maintaining good computational efficiency, scalability and ease of maintenance.
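To make the MapReduce model mentioned in the abstract concrete, here is a minimal, single-process sketch of its three phases (map, shuffle/group by key, reduce) applied to a BLAST-like search: each query sequence is mapped independently against a small reference database, and the reducer keeps the best hit per query. This is not the Biodoop code and does not use the Hadoop API. The shared 3-mer count is a deliberately simple stand-in for BLAST scoring, and the names `map_reduce`, `DATABASE`, `blast_like_mapper` and `best_hit_reducer` are hypothetical, chosen for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Toy single-process MapReduce: map, shuffle (group by key), reduce."""
    # Map phase: every record is processed independently, which is the part
    # a framework such as Hadoop would distribute across worker nodes.
    pairs = chain.from_iterable(mapper(r) for r in records)
    # Shuffle phase: group all intermediate values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def kmers(seq, k=3):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Hypothetical in-memory reference database (a real BLAST run would use a
# formatted sequence database instead).
DATABASE = {
    "ref1": "ACGTACGTGA",
    "ref2": "TTGACCGTAA",
}


def blast_like_mapper(record):
    """Score one query against every reference; shared 3-mers stand in for BLAST."""
    query_id, seq = record
    query_kmers = kmers(seq)
    for ref_id, ref_seq in DATABASE.items():
        score = len(query_kmers & kmers(ref_seq))
        if score > 0:  # emit only actual hits
            yield query_id, (score, ref_id)


def best_hit_reducer(query_id, hits):
    """Keep the highest-scoring (score, ref_id) pair for a query."""
    return max(hits)


queries = [("q1", "ACGTACG"), ("q2", "GACCGT"), ("q3", "CCCCCC")]
result = map_reduce(queries, blast_like_mapper, best_hit_reducer)
print(result)  # {'q1': (4, 'ref1'), 'q2': (4, 'ref2')}
```

Query `q3` shares no 3-mer with either reference, so the mapper emits nothing for it and it is absent from the output. Because the map phase touches each query independently, this pattern fits the "lightweight computation on large data sets" profile the abstract attributes to BLAST: adding more queries adds more independent map tasks rather than more work per task.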

BibTeX reference

@InProceedings{LSZ09a,
  author       = {Leo, S. and Santoni, F. and Zanetti, G.},
  title        = {Biodoop: bioinformatics on hadoop},
  booktitle    = {The 38th International Conference On Parallel Processing Workshops (ICPPW 2009)},
  pages        = {415--422},
  year         = {2009},
  publisher    = {IEEE Computer Society},
  address      = {Los Alamitos, CA, USA},
  note         = {idxproject: CYBERSAR},
  keywords     = {bioinformatics,hadoop},
  url          = {http://www.computer.org/portal/web/csdl/doi/10.1109/ICPPW.2009.37}
}
