Printed from https://ideas.repec.org/a/nat/natcom/v15y2024i1d10.1038_s41467-024-49587-1.html

Discovering type I cis-AT polyketides through computational mass spectrometry and genome mining with Seq2PKS

Author

Listed:
  • Donghui Yan

    (Carnegie Mellon University)

  • Muqing Zhou

    (Carnegie Mellon University)

  • Abhinav Adduri

    (Carnegie Mellon University)

  • Yihao Zhuang

    (University of Michigan)

  • Mustafa Guler

    (Carnegie Mellon University)

  • Sitong Liu

    (Carnegie Mellon University)

  • Hyonyoung Shin

    (Carnegie Mellon University)

  • Torin Kovach

    (Carnegie Mellon University)

  • Gloria Oh

    (Carnegie Mellon University)

  • Xiao Liu

    (Carnegie Mellon University)

  • Yuting Deng

    (Carnegie Mellon University)

  • Xiaofeng Wang

    (University of Michigan)

  • Liu Cao

    (Carnegie Mellon University)

  • David H. Sherman

    (University of Michigan)

  • Pamela J. Schultz

    (University of Michigan)

  • Roland D. Kersten

    (University of Michigan)

  • Jason A. Clement

    (Baruch S. Blumberg Institute)

  • Ashootosh Tripathi

    (University of Michigan)

  • Bahar Behsaz

    (Carnegie Mellon University
    Chemia Biosciences Inc)

  • Hosein Mohimani

    (Carnegie Mellon University)

Abstract

Type 1 polyketides are a major class of natural products used as antiviral, antibiotic, antifungal, antiparasitic, immunosuppressive, and antitumor drugs. Analysis of public microbial genomes leads to the discovery of over sixty thousand type 1 polyketide gene clusters. However, the molecular products of only about a hundred of these clusters are characterized, leaving most metabolites unknown. Characterizing polyketides relies on bioactivity-guided purification, which is expensive and time-consuming. To address this, we present Seq2PKS, a machine learning algorithm that predicts chemical structures derived from Type 1 polyketide synthases. Seq2PKS predicts numerous putative structures for each gene cluster to enhance accuracy. The correct structure is identified using a variable mass spectral database search. Benchmarks show that Seq2PKS outperforms existing methods. Applying Seq2PKS to Actinobacteria datasets, we discover biosynthetic gene clusters for monazomycin, oasomycin A, and 2-aminobenzamide-actiphenol.
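The abstract's last step, picking the correct structure among many predicted candidates by a variable mass spectral database search, can be illustrated with a toy sketch. This is not the Seq2PKS implementation. It assumes that "variable" means tolerating an unexplained mass offset between a predicted structure and the observed molecule, and it compares precursor masses only, whereas a real search would also score fragment peaks. The names `Candidate` and `rank_candidates`, and the parameters `tol` and `max_offset`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A predicted structure (hypothetical type, for illustration only)."""
    name: str
    mass: float  # monoisotopic mass in Da


def rank_candidates(candidates, precursor_mass, tol=0.02, max_offset=50.0):
    """Rank predicted structures against one observed precursor mass.

    A candidate is an "exact" hit if its mass is within `tol` Da of the
    observed mass, and a "variable" hit if the difference is larger but
    still within `max_offset` Da (assumed to be one unknown modification).
    Exact hits come first; ties are broken by the smaller absolute offset.
    """
    hits = []
    for cand in candidates:
        offset = precursor_mass - cand.mass
        if abs(offset) <= tol:
            hits.append((cand, offset, "exact"))
        elif abs(offset) <= max_offset:
            hits.append((cand, offset, "variable"))
    # False sorts before True, so exact hits lead the list.
    hits.sort(key=lambda h: (h[2] != "exact", abs(h[1])))
    return hits


if __name__ == "__main__":
    # Made-up masses, not data from the article.
    predicted = [
        Candidate("structure_A", 500.00),
        Candidate("structure_B", 514.00),
        Candidate("structure_C", 700.00),
    ]
    for cand, offset, kind in rank_candidates(predicted, precursor_mass=514.01):
        print(f"{cand.name}: {kind} match, offset {offset:+.2f} Da")
```

Generating several candidates per gene cluster and letting the spectrum choose among them means a single wrong prediction step need not rule out the right structure, which is the rationale the abstract gives for predicting numerous putative structures.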

Suggested Citation

  • Donghui Yan & Muqing Zhou & Abhinav Adduri & Yihao Zhuang & Mustafa Guler & Sitong Liu & Hyonyoung Shin & Torin Kovach & Gloria Oh & Xiao Liu & Yuting Deng & Xiaofeng Wang & Liu Cao & David H. Sherman, 2024. "Discovering type I cis-AT polyketides through computational mass spectrometry and genome mining with Seq2PKS," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-49587-1
    DOI: 10.1038/s41467-024-49587-1

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-49587-1
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-024-49587-1?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Chad W. Johnston & Michael A. Skinnider & Morgan A. Wyatt & Xiang Li & Michael R. M. Ranieri & Lian Yang & David L. Zechel & Bin Ma & Nathan A. Magarvey, 2015. "An automated Genomes-to-Natural Products platform (GNP) for the discovery of modular natural products," Nature Communications, Nature, vol. 6(1), pages 1-11, December.
    2. Bahar Behsaz & Edna Bode & Alexey Gurevich & Yan-Ni Shi & Florian Grundmann & Deepa Acharya & Andrés Mauricio Caraballo-Rodríguez & Amina Bouslimani & Morgan Panitchpakdi & Annabell Linck & Changhui G, 2021. "Integrating genomics and metabolomics for scalable non-ribosomal peptide discovery," Nature Communications, Nature, vol. 12(1), pages 1-17, December.
    3. Hosein Mohimani & Alexey Gurevich & Alexander Shlemov & Alla Mikheenko & Anton Korobeynikov & Liu Cao & Egor Shcherbin & Louis-Felix Nothias & Pieter C. Dorrestein & Pavel A. Pevzner, 2018. "Dereplication of microbial metabolites through database search of mass spectra," Nature Communications, Nature, vol. 9(1), pages 1-12, December.
    4. Bahar Behsaz & Edna Bode & Alexey Gurevich & Yan-Ni Shi & Florian Grundmann & Deepa Acharya & Andrés Mauricio Caraballo-Rodríguez & Amina Bouslimani & Morgan Panitchpakdi & Annabell Linck & Changhui G, 2021. "Publisher Correction: Integrating genomics and metabolomics for scalable non-ribosomal peptide discovery," Nature Communications, Nature, vol. 12(1), pages 1-1, December.
    5. Yi-Yuan Lee & Mustafa Guler & Desnor N. Chigumba & Shen Wang & Neel Mittal & Cameron Miller & Benjamin Krummenacher & Haodong Liu & Liu Cao & Aditya Kannan & Keshav Narayan & Samuel T. Slocum & Bryan , 2023. "HypoRiPPAtlas as an Atlas of hypothetical natural products for mass spectrometry database search," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    6. Michael A. Skinnider & Chad W. Johnston & Mathusan Gunabalasingam & Nishanth J. Merwin & Agata M. Kieliszek & Robyn J. MacLellan & Haoxin Li & Michael R. M. Ranieri & Andrew L. H. Webster & My P. T. C, 2020. "Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences," Nature Communications, Nature, vol. 11(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi-Yuan Lee & Mustafa Guler & Desnor N. Chigumba & Shen Wang & Neel Mittal & Cameron Miller & Benjamin Krummenacher & Haodong Liu & Liu Cao & Aditya Kannan & Keshav Narayan & Samuel T. Slocum & Bryan , 2023. "HypoRiPPAtlas as an Atlas of hypothetical natural products for mass spectrometry database search," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    2. Gita Naseri, 2023. "A roadmap to establish a comprehensive platform for sustainable manufacturing of natural products in yeast," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    3. Nicholas J. Morehouse & Trevor N. Clark & Emily J. McMann & Jeffrey A. Santen & F. P. Jake Haeckl & Christopher A. Gray & Roger G. Linington, 2023. "Annotation of natural product compound families using molecular networking topology and structural similarity fingerprinting," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
