A Distributed and High-Throughput Short Read Processing Suite

Luca Pireddu, Simone Leo, Gianluigi Zanetti
Misc - July 2011
Download the publication: poster.pdf [5 MB]
Seal is a suite of open-source tools for short read processing, designed for high-throughput sequencing operations. The suite currently includes scalable, distributed tools for demultiplexing output from Illumina multiplexed sequencing runs; read filtering and format conversion; read mapping (based on the popular BWA aligner); duplicate read identification and removal; and sorting read mappings. These tools follow the MapReduce scalable computing model, and their throughput scales linearly with the number of computing nodes, so capacity can grow with the amount of data to be processed. Seal leverages Hadoop, the open-source MapReduce distributed computing platform, to provide resilience to node failures and to transient events such as peaks in cluster load. In conclusion, Seal provides tools that can harness all available computational resources to process large amounts of data efficiently and with limited operator effort. The Seal suite is currently used to implement most of the production pipeline at the CRS4 Sequencing and Genotyping Platform, which processes data from six Illumina sequencing machines. Seal is available online at
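
As an illustration of how a step such as duplicate read removal can be expressed in the MapReduce model, the following is a minimal in-memory sketch in Python. It is not Seal's code and does not use Hadoop. The `Mapping` record, its field names, and the duplicate criterion (same chromosome, position and strand, keeping the mapping with the highest quality) are assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Mapping(NamedTuple):
    """Hypothetical record for one mapped read (not Seal's data model)."""
    read_id: str
    chrom: str
    pos: int
    strand: str
    quality: int


def map_phase(mappings):
    """Map: emit (key, value) pairs keyed on the mapping coordinates."""
    for m in mappings:
        yield (m.chrom, m.pos, m.strand), m


def shuffle(pairs):
    """Shuffle: group values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(group):
    """Reduce: keep one mapping per key (assumed rule: highest quality wins)."""
    return max(group, key=lambda m: m.quality)


def remove_duplicates(mappings):
    """Run map, shuffle and reduce in sequence on a single machine."""
    groups = shuffle(map_phase(mappings))
    # Sorting the keys only makes the output order deterministic.
    return [reduce_phase(groups[key]) for key in sorted(groups)]
```

Because each key is reduced independently of the others, the groups could be spread across many nodes, which is the property that lets throughput grow with cluster size.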

BibTeX references

  author       = {Pireddu, L. and Leo, S. and Zanetti, G.},
  title        = {A Distributed and High-Throughput Short Read Processing Suite},
  month        = {july},
  year         = {2011},
  type         = {poster},
  keywords     = {bioinformatics,sequencing,hadoop},
  url          = {},
