
MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping

Authors
  • Wan-Ping Lee
  • Michael P Stromberg
  • Alistair Ward
  • Chip Stewart
  • Erik P Garrison
  • Gabor T Marth

Abstract

MOSAIK is a stable, sensitive and open-source program for mapping second- and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific Biosciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the data generated in the 1000 Genomes Project (all sequencing technologies, low-coverage and exome). To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs), and generates outputs tailored to aid SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network-based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient greater than 0.98 between MOSAIK-assigned and actual mapping qualities. To support studies of any genome, a training pipeline is provided to optimize mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
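
The general idea of coupling hash-based seeding with Smith-Waterman can be illustrated with a minimal Python sketch. This is not MOSAIK's implementation: the function names, the k-mer size, the clustering of seed hits by diagonal, the padding, and the scoring values (match 2, mismatch -1, linear gap -2) are all assumptions chosen for illustration. The sketch hashes every k-mer of the reference, groups the seed hits of a read by diagonal to pick a candidate region, and then scores that region with a plain Smith-Waterman local alignment.

```python
from collections import defaultdict


def build_hash_index(reference, k):
    """Map each k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def cluster_hits(read, index, k):
    """Count seed hits per diagonal (ref_pos - read_pos).

    Returns (diagonal, hit_count) for the best-supported diagonal,
    or None when no k-mer of the read occurs in the index.
    """
    diagonals = defaultdict(int)
    for i in range(len(read) - k + 1):
        for ref_pos in index.get(read[i:i + k], ()):
            diagonals[ref_pos - i] += 1
    if not diagonals:
        return None
    return max(diagonals.items(), key=lambda item: item[1])


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a against b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def map_read(read, reference, index, k, pad=4):
    """Pick a candidate window from the seed cluster, then align within it.

    Returns (window_start, window_end, score), or None if the read has no seeds.
    """
    cluster = cluster_hits(read, index, k)
    if cluster is None:
        return None
    diagonal, _hits = cluster
    start = max(0, diagonal - pad)
    end = min(len(reference), diagonal + len(read) + pad)
    return start, end, smith_waterman(read, reference[start:end])


if __name__ == "__main__":
    # Toy reference and a read copied from it with one substitution.
    reference = "TTGACCGATAGGCTCAAGTCCGTAAGCTTGCA"
    index = build_hash_index(reference, k=4)
    print(map_read("TAGGCTGAAGTC", reference, index, k=4))
```

Restricting the dynamic-programming step to a small window around the clustered seeds is what keeps this kind of approach tractable: the full Smith-Waterman matrix is only computed for regions the hash lookup already supports, while the alignment step still tolerates mismatches and short gaps inside the window.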

Suggested Citation

  • Wan-Ping Lee & Michael P Stromberg & Alistair Ward & Chip Stewart & Erik P Garrison & Gabor T Marth, 2014. "MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-11, March.
  • Handle: RePEc:plo:pone00:0090581
    DOI: 10.1371/journal.pone.0090581

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090581
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0090581&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lars Hahn & Chris-André Leimeister & Rachid Ounit & Stefano Lonardi & Burkhard Morgenstern, 2016. "rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison," PLOS Computational Biology, Public Library of Science, vol. 12(10), pages 1-18, October.
    2. Meznah Almutairy & Eric Torng, 2017. "The effects of sampling on the efficiency and accuracy of k−mer indexes: Theoretical and empirical comparisons using the human genome," PLOS ONE, Public Library of Science, vol. 12(7), pages 1-23, July.
    3. Jack Hu & Fareeha Safir & Kai Chang & Sahil Dagli & Halleh B. Balch & John M. Abendroth & Jefferson Dixon & Parivash Moradifar & Varun Dolia & Malaya K. Sahoo & Benjamin A. Pinsky & Stefanie S. Jeffre, 2023. "Rapid genetic screening with high quality factor metasurfaces," Nature Communications, Nature, vol. 14(1), pages 1-9, December.
    4. Zheng Sun & Weidong Tian, 2012. "SAP—A Sequence Mapping and Analyzing Program for Long Sequence Reads Alignment and Accurate Variants Discovery," PLOS ONE, Public Library of Science, vol. 7(8), pages 1-6, August.
    5. Ilianna Barbayianni & Paraskevi Kanellopoulou & Dionysios Fanidis & Dimitris Nastos & Eleftheria-Dimitra Ntouskou & Apostolos Galaris & Vaggelis Harokopos & Pantelis Hatzis & Eliza Tsitoura & Robert H, 2023. "SRC and TKS5 mediated podosome formation in fibroblasts promotes extracellular matrix invasion and pulmonary fibrosis," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    6. Francesca Cordero & Marco Beccuti & Maddalena Arigoni & Susanna Donatelli & Raffaele A Calogero, 2012. "Optimizing a Massive Parallel Sequencing Workflow for Quantitative miRNA Expression Analysis," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-10, February.
