
FastBLAST: Homology Relationships for Millions of Proteins

Author

Listed:
  • Morgan N Price
  • Paramvir S Dehal
  • Adam P Arkin

Abstract

Background: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding.

Methodology/Principal Findings: We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database (“NR”), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query.

Conclusions/Significance: FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast.
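
The second-stage idea described in the abstract, finding a query's likely homologs through the families it is aligned to rather than by comparing it against every sequence, can be illustrated with a toy sketch. This is not the FastBLAST implementation: the function names, the protein and family identifiers, and the rule that sharing one family makes two proteins candidates are all assumptions made for illustration, and the real tool works from the family alignments and then generates pairwise alignments.

```python
from collections import defaultdict


def build_family_index(memberships):
    """Invert protein -> families into family -> set of member proteins."""
    index = defaultdict(set)
    for protein, families in memberships.items():
        for family in families:
            index[family].add(protein)
    return index


def candidate_homologs(query, memberships, index):
    """Return proteins that share at least one family with the query.

    Hypothetical shortcut: only these candidates would then need a
    pairwise alignment, instead of every protein in the database.
    """
    candidates = set()
    for family in memberships.get(query, ()):
        candidates |= index.get(family, set())
    candidates.discard(query)  # a protein is not its own homolog hit
    return sorted(candidates)


# Made-up family assignments, standing in for PSI-BLAST/HMMer output.
memberships = {
    "protA": {"fam1", "fam2"},
    "protB": {"fam1"},
    "protC": {"fam3"},
    "protD": {"fam2", "fam3"},
}
index = build_family_index(memberships)
print(candidate_homologs("protA", memberships, index))  # ['protB', 'protD']
```

In this toy data set, protA is compared against only two candidates rather than all three other proteins, which is the kind of saving that grows as the database does.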

Suggested Citation

  • Morgan N Price & Paramvir S Dehal & Adam P Arkin, 2008. "FastBLAST: Homology Relationships for Millions of Proteins," PLOS ONE, Public Library of Science, vol. 3(10), pages 1-8, October.
  • Handle: RePEc:plo:pone00:0003589
    DOI: 10.1371/journal.pone.0003589

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0003589
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0003589&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Shibu Yooseph & Granger Sutton & Douglas B Rusch & Aaron L Halpern & Shannon J Williamson & Karin Remington & Jonathan A Eisen & Karla B Heidelberg & Gerard Manning & Weizhong Li & Lukasz Jaroszewski , 2007. "The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families," PLOS Biology, Public Library of Science, vol. 5(3), pages 1-35, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Natarajan Kannan & Susan S Taylor & Yufeng Zhai & J Craig Venter & Gerard Manning, 2007. "Structural and Functional Diversity of the Microbial Kinome," PLOS Biology, Public Library of Science, vol. 5(3), pages 1-12, March.
    2. Meishun Yu & Menghui Zhang & Runying Zeng & Ruolin Cheng & Rui Zhang & Yanping Hou & Fangfang Kuang & Xuejin Feng & Xiyang Dong & Yinfang Li & Zongze Shao & Min Jin, 2024. "Diversity and potential host-interactions of viruses inhabiting deep-sea seamount sediments," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    3. Katharina Mir & Steffen Schober, 2014. "Selection Pressure in Alternative Reading Frames," PLOS ONE, Public Library of Science, vol. 9(10), pages 1-7, October.
    4. Yael Baran & Eran Halperin, 2012. "Joint Analysis of Multiple Metagenomic Samples," PLOS Computational Biology, Public Library of Science, vol. 8(2), pages 1-11, February.
    5. Armstrong, Claire W. & Foley, Naomi S. & Tinch, Rob & van den Hove, Sybille, 2012. "Services from the deep: Steps towards valuation of deep sea goods and services," Ecosystem Services, Elsevier, vol. 2(C), pages 2-13.
