
Single-step retrosynthesis prediction by leveraging commonly preserved substructures

Author

Listed:
  • Lei Fang (Microsoft Research Asia)
  • Junren Li (Peking University)
  • Ming Zhao (Waseda University)
  • Li Tan (Mincui Therapeutix)
  • Jian-Guang Lou (Microsoft Research Asia)

Abstract

Retrosynthesis analysis is an important task in organic chemistry with numerous industrial applications. Previously, machine learning approaches employing natural language processing techniques achieved promising results in this task by first representing product molecules as strings and subsequently predicting reactant molecules using text generation or machine translation models. Chemists cannot readily derive useful insights from traditional approaches that rely largely on atom-level decoding in the string representations, because human experts tend to interpret reactions by analyzing substructures that comprise a molecule. It is well-established that some substructures are stable and remain unchanged in reactions. In this paper, we developed a substructure-level decoding model, where commonly preserved portions of product molecules were automatically extracted with a fully data-driven approach. Our model achieves improvement over previously reported models, and we demonstrate that its performance can be boosted further by enhancing the accuracy of these substructures. Analyzing substructures extracted from our machine learning model can provide human experts with additional insights to assist decision-making in retrosynthesis analysis.

Suggested Citation

  • Lei Fang & Junren Li & Ming Zhao & Li Tan & Jian-Guang Lou, 2023. "Single-step retrosynthesis prediction by leveraging commonly preserved substructures," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-37969-w
    DOI: 10.1038/s41467-023-37969-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-37969-w
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Igor V. Tetko & Pavel Karpov & Ruud Deursen & Guillaume Godin, 2020. "State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis," Nature Communications, Nature, vol. 11(1), pages 1-11, December.

    Citations


    Cited by:

    1. Yuqiang Han & Xiaoyang Xu & Chang-Yu Hsieh & Keyan Ding & Hongxia Xu & Renjun Xu & Tingjun Hou & Qiang Zhang & Huajun Chen, 2024. "Retrosynthesis prediction with an iterative string editing model," Nature Communications, Nature, vol. 15(1), pages 1-16, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yuqiang Han & Xiaoyang Xu & Chang-Yu Hsieh & Keyan Ding & Hongxia Xu & Renjun Xu & Tingjun Hou & Qiang Zhang & Huajun Chen, 2024. "Retrosynthesis prediction with an iterative string editing model," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    2. Yasuhiro Yoshikai & Tadahaya Mizuno & Shumpei Nemoto & Hiroyuki Kusuhara, 2024. "Difficulty in chirality recognition for Transformer architectures learning chemical structures from string representations," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    3. Jinho Chang & Jong Chul Ye, 2024. "Bidirectional generation of structure and properties through a single molecular foundation model," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    4. Umit V. Ucak & Islambek Ashyrmamatov & Junsu Ko & Juyong Lee, 2022. "Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    5. Yu Shee & Haote Li & Pengpeng Zhang & Andrea M. Nikolic & Wenxin Lu & H. Ray Kelly & Vidhyadhar Manee & Sanil Sreekumar & Frederic G. Buono & Jinhua J. Song & Timothy R. Newhouse & Victor S. Batista, 2024. "Site-specific template generative approach for retrosynthetic planning," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    6. Yu Wang & Chao Pang & Yuzhe Wang & Junru Jin & Jingjie Zhang & Xiangxiang Zeng & Ran Su & Quan Zou & Leyi Wei, 2023. "Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    7. Weihe Zhong & Ziduo Yang & Calvin Yu-Chian Chen, 2023. "Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
