Printed from https://ideas.repec.org/a/nat/natcom/v14y2023i1d10.1038_s41467-023-39396-3.html

Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language

Author

Listed:
  • Nathaniel H. Park

    (IBM Research–Almaden)

  • Matteo Manica

    (IBM Research–Zurich)

  • Jannis Born

    (IBM Research–Zurich; ETH Zurich, Mattenstrasse 26)

  • James L. Hedrick

    (IBM Research–Almaden)

  • Tim Erdmann

    (IBM Research–Almaden)

  • Dmitry Yu. Zubarev

    (IBM Research–Almaden)

  • Nil Adell-Mill

    (IBM Research–Zurich; Arctoris, 120E Olympic Avenue)

  • Pedro L. Arrechea

    (IBM Research–Almaden)

Abstract

Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is critical for integrating ML into research workflows, yet many data models are too rigid to accommodate the broad array of experiment and data types found in polymer science. This inflexibility is a significant barrier to researchers leveraging their historical data in ML development. Here we show that a domain-specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. We demonstrate the utility of this approach through the generation and experimental validation of catalysts and polymers in the context of ring-opening polymerization, and we provide examples of how CMDL can be applied more broadly to other polymer classes. Critically, we show that the CMDL-tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates the translation of historical data into meaningful predictive and generative models that produce experimentally actionable output.

Suggested Citation

  • Nathaniel H. Park & Matteo Manica & Jannis Born & James L. Hedrick & Tim Erdmann & Dmitry Yu. Zubarev & Nil Adell-Mill & Pedro L. Arrechea, 2023. "Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-39396-3
    DOI: 10.1038/s41467-023-39396-3

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-39396-3
    File Function: Abstract
    Download Restriction: no



