Printed from https://ideas.repec.org/a/plo/pone00/0139868.html

NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data

Authors

  • Mohan A V S K Katta
  • Aamir W Khan
  • Dadakhalandar Doddamani
  • Mahendar Thudi
  • Rajeev K Varshney

Abstract

The rapid popularity and adoption of next generation sequencing (NGS) approaches have generated huge volumes of data. High-throughput platforms such as Illumina HiSeq produce terabytes of raw data that require quick processing, and quality control (QC) of these data is an important step prior to downstream analyses. To address these issues, we have developed a quality control pipeline, NGS-QCbox, that scales up to process hundreds or thousands of samples.

Raspberry is an in-house tool, developed in C using HTSlib (v1.2.1) (http://htslib.org), for computing read- and base-level statistics. It can be used as a stand-alone application and can process both compressed and uncompressed FASTQ files. NGS-QCbox integrates Raspberry with other open-source tools for alignment (Bowtie2), SNP calling (SAMtools) and other utilities (bedtools) to analyze raw NGS data efficiently and in a high-throughput manner. The pipeline runs batches of jobs in parallel using Bpipe (https://github.com/ssadedin/bpipe) and, internally, applies fine-grained task parallelization with OpenMP. It reports read and base statistics, along with genome coverage and variants, in a user-friendly format.

The pipeline presents a simple menu-driven interface and can be used in either quick or complete mode. In quick mode, the pipeline is faster than other similar existing QC pipelines and tools. The NGS-QCbox pipeline, the Raspberry tool and associated scripts are available at https://github.com/CEG-ICRISAT/NGS-QCbox and https://github.com/CEG-ICRISAT/Raspberry for rapid quality control analysis of large-scale next generation sequencing (Illumina) data.
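To make "read- and base-level statistics" from compressed or uncompressed FASTQ concrete, here is a minimal Python sketch. It is not Raspberry's code (Raspberry is written in C on HTSlib), and the statistics chosen (read count, length range, GC and N content, mean quality, Q20/Q30 counts) are an illustrative set rather than Raspberry's exact report. Assumptions: four-line FASTQ records, Phred+33 quality encoding, and gzip detection by magic bytes; the names `FastqStats`, `open_fastq`, `fastq_records` and `fastq_stats` are hypothetical.

```python
import gzip
from dataclasses import dataclass
from typing import IO, Iterator, Optional, Tuple

# Assumption: qualities are Phred+33 (Sanger / Illumina 1.8+ encoding).
PHRED_OFFSET = 33


@dataclass
class FastqStats:
    """Read- and base-level counters (illustrative, not Raspberry's exact output)."""
    reads: int = 0
    bases: int = 0
    gc_bases: int = 0
    n_bases: int = 0
    quality_sum: int = 0
    q20_bases: int = 0
    q30_bases: int = 0
    min_len: Optional[int] = None
    max_len: int = 0

    @property
    def mean_len(self) -> float:
        return self.bases / self.reads if self.reads else 0.0

    @property
    def gc_percent(self) -> float:
        # GC percentage relative to all bases (N bases are included in the denominator).
        return 100 * self.gc_bases / self.bases if self.bases else 0.0

    @property
    def mean_quality(self) -> float:
        return self.quality_sum / self.bases if self.bases else 0.0


def open_fastq(path: str) -> IO[str]:
    """Open a FASTQ file as text, decompressing gzip input transparently.

    Compression is detected from the gzip magic bytes, not the file extension.
    """
    with open(path, "rb") as raw:
        magic = raw.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_records(handle: IO[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header.strip()!r}")
        yield seq, qual


def fastq_stats(path: str) -> FastqStats:
    """Accumulate read and base statistics in a single streaming pass over one file."""
    stats = FastqStats()
    with open_fastq(path) as handle:
        for seq, qual in fastq_records(handle):
            length = len(seq)
            stats.reads += 1
            stats.bases += length
            stats.min_len = length if stats.min_len is None else min(stats.min_len, length)
            stats.max_len = max(stats.max_len, length)
            upper = seq.upper()
            stats.gc_bases += upper.count("G") + upper.count("C")
            stats.n_bases += upper.count("N")
            for symbol in qual:
                q = ord(symbol) - PHRED_OFFSET
                stats.quality_sum += q
                stats.q20_bases += int(q >= 20)
                stats.q30_bases += int(q >= 30)
    return stats
```

For example, `fastq_stats("sample_R1.fastq.gz")` (a hypothetical file name) returns the same counters whether or not the input is gzipped. Because each file is summarized independently in one pass, many files can be processed side by side; in NGS-QCbox, that parallelism is provided by Bpipe at the job level and by OpenMP inside Raspberry.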

Suggested Citation

  • Mohan A V S K Katta & Aamir W Khan & Dadakhalandar Doddamani & Mahendar Thudi & Rajeev K Varshney, 2015. "NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-9, October.
  • Handle: RePEc:plo:pone00:0139868
    DOI: 10.1371/journal.pone.0139868

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0139868
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0139868&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0139868?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Ravi K Patel & Mukesh Jain, 2012. "NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-7, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dongya Wu & Enhui Shen & Bowen Jiang & Yu Feng & Wei Tang & Sangting Lao & Lei Jia & Han-Yang Lin & Lingjuan Xie & Xifang Weng & Chenfeng Dong & Qinghong Qian & Feng Lin & Haiming Xu & Huabing Lu & Lu, 2022. "Genomic insights into the evolution of Echinochloa species as weed and orphan crop," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    2. Abrar E Al-Shaer & George R Flentke & Mark E Berres & Ana Garic & Susan M Smith, 2019. "Exon level machine learning analyses elucidate novel candidate miRNA targets in an avian model of fetal alcohol spectrum disorder," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-25, April.
    3. Wei Ding & Shougang Wang & Peng Qin & Shen Fan & Xiaoyan Su & Peiyan Cai & Jie Lu & Han Cui & Meng Wang & Yi Shu & Yongming Wang & Hui-Hui Fu & Yu-Zhong Zhang & Yong-Xin Li & Weipeng Zhang, 2023. "Anaerobic thiosulfate oxidation by the Roseobacter group is prevalent in marine biofilms," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    4. Irene Stefanini & Monica Di Paola & Gianni Liti & Andrea Marranci & Federico Sebastiani & Enrico Casalone & Duccio Cavalieri, 2022. "Resistance to Arsenite and Arsenate in Saccharomyces cerevisiae Arises through the Subtelomeric Expansion of a Cluster of Yeast Genes," IJERPH, MDPI, vol. 19(13), pages 1-15, July.
    5. Wenxiu Wang & Weizhi Song & Marwan E. Majzoub & Xiaoyuan Feng & Bu Xu & Jianchang Tao & Yuanqing Zhu & Zhiyong Li & Pei-Yuan Qian & Nicole S. Webster & Torsten Thomas & Lu Fan, 2024. "Decoupling of strain- and intrastrain-level interactions of microbiomes in a sponge holobiont," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    6. Ben Jia & Liming Xuan & Kaiye Cai & Zhiqiang Hu & Liangxiao Ma & Chaochun Wei, 2013. "NeSSM: A Next-Generation Sequencing Simulator for Metagenomics," PLOS ONE, Public Library of Science, vol. 8(10), pages 1-10, October.
    7. Lihong Gu & Feng Wang & Zhemin Lin & Tieshan Xu & Dajie Lin & Manping Xing & Shaoxiong Yang & Zhe Chao & Baoguo Ye & Peng Lin & Chunhui Hui & Lizhi Lu & Shuisheng Hou, 2020. "Genetic characteristics of Jiaji Duck by whole genome re-sequencing," PLOS ONE, Public Library of Science, vol. 15(2), pages 1-15, February.
    8. Pingfen Zhu & Weiqiang Liu & Xiaoxiao Zhang & Meng Li & Gaoming Liu & Yang Yu & Zihao Li & Xuanjing Li & Juan Du & Xiao Wang & Cyril C. Grueter & Ming Li & Xuming Zhou, 2023. "Correlated evolution of social organization and lifespan in mammals," Nature Communications, Nature, vol. 14(1), pages 1-18, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0139868. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.