CRS4

A Distributed and High-Throughput Short Read Processing Suite

Luca Pireddu, Simone Leo, Gianluigi Zanetti
Misc - July 2011
Download the publication: poster.pdf [5 MB]
Seal is a suite of open source tools for short read processing, designed for high-throughput sequencing operations. The suite currently includes scalable, distributed tools for demultiplexing output from Illumina multiplexed sequencing runs; read filtering and format conversion; read mapping (based on the popular BWA aligner); duplicate read identification and removal; and sorting read mappings. These tools follow the MapReduce scalable computing model, and their throughput scales linearly with the number of computing nodes, so capacity can grow with the amount of data to be processed. Seal leverages the Hadoop open source MapReduce distributed computing platform to provide resilience to node failures and to transient events such as peaks in cluster load. In short, Seal provides tools that can harness all available computational resources to process large amounts of data efficiently with little operator effort. The Seal suite is currently used to implement most of the production pipeline at the CRS4 Sequencing and Genotyping Platform, which processes data from 6 Illumina sequencing machines. Seal is available online at http://biodoop-seal.sourceforge.net/
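
As an illustration of how a step such as duplicate read removal can be phrased in the MapReduce model, here is a minimal single-process sketch. It is not Seal's implementation: the `Mapping` record, the choice of key (chromosome, position, strand) and the rule of keeping the highest-quality mapping are assumptions made for the example, and the `shuffle` function only stands in for the grouping that a framework such as Hadoop would perform across nodes.

```python
from collections import defaultdict
from typing import NamedTuple


class Mapping(NamedTuple):
    """Hypothetical record for one aligned read (not Seal's data model)."""
    read_id: str
    chrom: str
    pos: int
    strand: str
    quality: int


def map_phase(mappings):
    """Emit (key, value) pairs; reads aligned to the same place share a key."""
    for m in mappings:
        yield (m.chrom, m.pos, m.strand), m


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Keep the highest-quality mapping for a key; the rest are duplicates."""
    best = max(values, key=lambda m: m.quality)
    duplicates = [m for m in values if m is not best]
    return best, duplicates


def remove_duplicates(mappings):
    """Run map, shuffle and reduce in sequence and return the kept mappings."""
    kept = []
    for key, values in sorted(shuffle(map_phase(mappings)).items()):
        best, _ = reduce_phase(key, values)
        kept.append(best)
    return kept


reads = [
    Mapping("r1", "chr1", 100, "+", 30),
    Mapping("r2", "chr1", 100, "+", 40),  # same position and strand as r1
    Mapping("r3", "chr1", 200, "-", 20),
]
kept = remove_duplicates(reads)
```

Because each key is reduced independently of the others, the groups can be spread over as many nodes as are available, which is the property that lets throughput grow with cluster size.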

BibTeX references

@Misc{PLZ11b,
  author       = {Pireddu, L. and Leo, S. and Zanetti, G.},
  title        = {A Distributed and High-Throughput Short Read Processing Suite},
  month        = {July},
  year         = {2011},
  type         = {poster},
  keywords     = {bioinformatics,sequencing,hadoop},
  url          = {https://publications.crs4.it/pubdocs/2011/PLZ11b},
}

Other publications in the database

» Luca Pireddu
» Simone Leo
» Gianluigi Zanetti