Exploiting a large scale biodata management system to support NGS variant detection studies

Gianmauro Cuccuru, Paolo Uva, Stefano Onano, Rossano Atzeni, Simone Leo, Luca Lianas, Manuela Oppo, Luca Pireddu, Andrea Angius, Laura Crisponi, Gianluigi Zanetti, Giorgio Fotia
Misc - 2015
Tracing, analysing and updating the results of data processing protocols can significantly improve turnaround time and reproducibility in any large-scale NGS-based clinical study. We present an infrastructure that automates and tracks all data-related procedures and clinical information at the CRS4 high-throughput sequencing platform, the largest in Italy (20 TB of raw sequencing data every ten days). To allow automated data processing, this facility is directly interconnected to the CRS4 computational resources (3000 cores, 4.5 PB of storage). Our infrastructure was recently used in a clinical exome sequencing program for mutation identification in patients with syndromic intellectual disability, a large group of disorders with variable phenotypes. An average of 60 million 100 bp paired-end reads were generated per sample, with a mean 100-fold coverage across 62 Mb of target regions. Sequences were automatically processed according to the GATK Best Practices, then annotated and filtered by allele frequency and predicted impact to select likely pathogenic variants from an average of 170,000 variants per family. One of the core components of the CRS4 infrastructure is OMERO.biobank, a robust, extensible and scalable traceability biodata management system. This component provides a way to store, query and retrieve a full description of biomedical datasets, the computational procedures that derived them, and the graph of dependencies between datasets. These functionalities are essential to support dataset traceability, reproducibility and updating.
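
The dependency graph mentioned in the abstract can be illustrated with a minimal sketch. This is not the OMERO.biobank API: the ProvenanceGraph class, its methods and the file names below are hypothetical, chosen only to show how recording each processing step as an edge between datasets makes it possible to trace where a result came from and which results need recomputing when an upstream dataset changes.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Toy provenance store (hypothetical, not OMERO.biobank):
    datasets are nodes, processing steps link inputs to outputs."""

    def __init__(self):
        self._parents = defaultdict(set)   # dataset -> datasets it was derived from
        self._children = defaultdict(set)  # dataset -> datasets derived from it
        self._action = {}                  # dataset -> step that produced it

    def record(self, output, inputs, action):
        """Register that `action` produced `output` from `inputs`."""
        self._action[output] = action
        for src in inputs:
            self._parents[output].add(src)
            self._children[src].add(output)

    @staticmethod
    def _walk(start, edges):
        """Breadth-first traversal returning every node reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def ancestors(self, dataset):
        """Traceability: everything `dataset` was derived from."""
        return self._walk(dataset, self._parents)

    def descendants(self, dataset):
        """Update: everything that depends on `dataset`."""
        return self._walk(dataset, self._children)

    def action(self, dataset):
        """Reproducibility: the step that produced `dataset`, if recorded."""
        return self._action.get(dataset)


# Illustrative pipeline with made-up file names.
graph = ProvenanceGraph()
graph.record("aligned.bam", ["sample_R1.fastq", "sample_R2.fastq"], "alignment")
graph.record("raw.vcf", ["aligned.bam"], "variant calling")
graph.record("annotated.vcf", ["raw.vcf"], "annotation")
graph.record("candidates.tsv", ["annotated.vcf"], "frequency and impact filter")

lineage = graph.ancestors("candidates.tsv")  # full upstream history
stale = graph.descendants("aligned.bam")     # results to recompute if the BAM changes
```

In this sketch, asking for the ancestors of the final candidate list returns every upstream file back to the raw reads, while asking for the descendants of the alignment returns the three downstream results that would have to be regenerated after a realignment.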

BibTeX reference

  author       = {Cuccuru, G. and Uva, P. and Onano, S. and Atzeni, R. and Leo, S. and Lianas, L. and Oppo, M. and Pireddu, L. and Angius, A. and Crisponi, L. and Zanetti, G. and Fotia, G.},
  title        = {Exploiting a large scale biodata management system to support NGS variant detection studies},
  year         = {2015},
  keywords     = {variant detection, ngs},
  url          = {},
