
Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science

Author

Listed:
  • Charlotte Loh

    (Massachusetts Institute of Technology)

  • Thomas Christensen

    (Massachusetts Institute of Technology)

  • Rumen Dangovski

    (Massachusetts Institute of Technology)

  • Samuel Kim

    (Massachusetts Institute of Technology)

  • Marin Soljačić

    (Massachusetts Institute of Technology)

Abstract

Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labeled data needed to train the model. This poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Noting that problems in natural sciences often benefit from easily obtainable auxiliary information sources, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three inexpensive and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: abundant unlabeled data, prior knowledge of symmetries or invariances, and surrogate data obtained at near-zero cost. We demonstrate SIB-CL’s effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently results in orders of magnitude reduction in the number of labels needed to achieve the same network accuracies.
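As a rough illustration of how prior knowledge of symmetries can enter contrastive learning, the sketch below builds two symmetry-transformed views of each sample and scores them with a contrastive loss. The abstract does not specify the loss or the transformations, so everything here is an assumption: the NT-Xent (SimCLR-style) loss, the choice of periodic translations, 90-degree rotations, and mirror flips as invariances, and the names `random_symmetry_view`, `nt_xent_loss`, and `toy_encoder` are hypothetical and not taken from the paper.

```python
import numpy as np


def random_symmetry_view(x, rng):
    """Return a randomly transformed copy of a square 2D array.

    Assumed invariances (illustrative only): periodic translation,
    rotation by multiples of 90 degrees, and a mirror flip.
    """
    shift = rng.integers(0, x.shape[0], size=2)
    x = np.roll(x, shift=(int(shift[0]), int(shift[1])), axis=(0, 1))
    x = np.rot90(x, k=int(rng.integers(0, 4)))
    if rng.random() < 0.5:
        x = np.flip(x, axis=1)
    return x


def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent loss for two batches of embeddings, where row i of z1
    and row i of z2 come from the same underlying sample."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    # Index of the positive partner for every row of z.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over all non-self entries.
    m = sim.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(sim - m).sum(axis=1))
    return float(np.mean(lse - sim[np.arange(2 * n), pos]))


def toy_encoder(batch, weights):
    """Hypothetical stand-in for a neural network: a fixed linear map."""
    return batch.reshape(batch.shape[0], -1) @ weights


rng = np.random.default_rng(0)
samples = rng.random((8, 16, 16))         # 8 unlabeled 16x16 inputs
weights = rng.normal(size=(16 * 16, 32))  # untrained projection

views_a = np.stack([random_symmetry_view(s, rng) for s in samples])
views_b = np.stack([random_symmetry_view(s, rng) for s in samples])
loss = nt_xent_loss(toy_encoder(views_a, weights), toy_encoder(views_b, weights))
print(f"contrastive loss on untrained encoder: {loss:.3f}")
```

In a real training loop, the encoder parameters would be updated to reduce this loss on unlabeled (and, in the setting described above, surrogate) data before fine-tuning on the scarce labeled set; that training step is omitted here.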

Suggested Citation

  • Charlotte Loh & Thomas Christensen & Rumen Dangovski & Samuel Kim & Marin Soljačić, 2022. "Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-31915-y
    DOI: 10.1038/s41467-022-31915-y

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-31915-y
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yusong Wang & Tong Wang & Shaoning Li & Xinheng He & Mingyu Li & Zun Wang & Nanning Zheng & Bin Shao & Tie-Yan Liu, 2024. "Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    2. Albert Musaelian & Simon Batzner & Anders Johansson & Lixin Sun & Cameron J. Owen & Mordechai Kornbluth & Boris Kozinsky, 2023. "Learning local equivariant representations for large-scale atomistic dynamics," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    3. Niklas W. A. Gebauer & Michael Gastegger & Stefaan S. P. Hessmann & Klaus-Robert Müller & Kristof T. Schütt, 2022. "Inverse design of 3d molecular structures with conditional generative neural networks," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    4. Adil Kabylda & Valentin Vassilev-Galindo & Stefan Chmiela & Igor Poltavsky & Alexandre Tkatchenko, 2023. "Efficient interatomic descriptors for accurate machine learning force fields of extended molecules," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    5. Yuanming Bai & Leslie Vogt-Maranto & Mark E. Tuckerman & William J. Glover, 2022. "Machine learning the Hohenberg-Kohn map for molecular excited states," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    6. Simon Batzner & Albert Musaelian & Lixin Sun & Mario Geiger & Jonathan P. Mailoa & Mordechai Kornbluth & Nicola Molinari & Tess E. Smidt & Boris Kozinsky, 2022. "E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    7. Lauren L. Porter & Allen K. Kim & Swechha Rimal & Loren L. Looger & Ananya Majumdar & Brett D. Mensh & Mary R. Starich & Marie-Paule Strub, 2022. "Many dissimilar NusG protein domains switch between α-helix and β-sheet folds," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    8. Zachary C. Drake & Justin T. Seffernick & Steffen Lindert, 2022. "Protein complex prediction using Rosetta, AlphaFold, and mass spectrometry covalent labeling," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    9. Nicolae Sapoval & Amirali Aghazadeh & Michael G. Nute & Dinler A. Antunes & Advait Balaji & Richard Baraniuk & C. J. Barberan & Ruth Dannenfelser & Chen Dun & Mohammadamin Edrisi & R. A. Leo Elworth &, 2022. "Current progress and open challenges for applying deep learning across the biosciences," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    10. Krzysztof Rusek & Agnieszka Kleszcz & Albert Cabellos-Aparicio, 2022. "Bayesian inference of spatial and temporal relations in AI patents for EU countries," Papers 2201.07168, arXiv.org.
    11. Krzysztof Rusek & Agnieszka Kleszcz & Albert Cabellos-Aparicio, 2023. "Bayesian inference of spatial and temporal relations in AI patents for EU countries," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(6), pages 3313-3335, June.
    12. Benoit Stijlemans & Patrick Baetselier & Inge Molle & Laurence Lecordier & Erika Hendrickx & Ema Romão & Cécile Vincke & Wendy Baetens & Steve Schoonooghe & Gholamreza Hassanzadeh-Ghassabeh & Hannelie, 2024. "Q586B2 is a crucial virulence factor during the early stages of Trypanosoma brucei infection that is conserved amongst trypanosomatids," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    13. Lisa Van den Broeck & Dinesh Kiran Bhosale & Kuncheng Song & Cássio Flavio Fonseca de Lima & Michael Ashley & Tingting Zhu & Shanshuo Zhu & Brigitte Van De Cotte & Pia Neyt & Anna C. Ortiz & Tiffany R, 2023. "Functional annotation of proteins for signaling network inference in non-model species," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    14. Li Zheng & Konstantinos Karapiperis & Siddhant Kumar & Dennis M. Kochmann, 2023. "Unifying the design space and optimizing linear and nonlinear truss metamaterials by generative modeling," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    15. Nikita Moshkov & Tim Becker & Kevin Yang & Peter Horvath & Vlado Dancik & Bridget K. Wagner & Paul A. Clemons & Shantanu Singh & Anne E. Carpenter & Juan C. Caicedo, 2023. "Predicting compound activity from phenotypic profiles and chemical structures," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    16. Januschowski, Tim & Wang, Yuyang & Torkkola, Kari & Erkkilä, Timo & Hasson, Hilaf & Gasthaus, Jan, 2022. "Forecasting with trees," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1473-1481.
    17. Huziel E. Sauceda & Luis E. Gálvez-González & Stefan Chmiela & Lauro Oliver Paz-Borbón & Klaus-Robert Müller & Alexandre Tkatchenko, 2022. "BIGDML—Towards accurate quantum machine learning force fields for materials," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    18. Xing Chen & Flavio Abreu Araujo & Mathieu Riou & Jacob Torrejon & Dafiné Ravelosona & Wang Kang & Weisheng Zhao & Julie Grollier & Damien Querlioz, 2022. "Forecasting the outcome of spintronic experiments with Neural Ordinary Differential Equations," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    19. Hajkowicz, Stefan & Naughtin, Claire & Sanderson, Conrad & Schleiger, Emma & Karimi, Sarvnaz & Bratanova, Alexandra & Bednarz, Tomasz, 2022. "Artificial intelligence for science – adoption trends and future development pathways," MPRA Paper 115464, University Library of Munich, Germany.
    20. Qiufen Chen & Yuanzhao Guo & Jiuhong Jiang & Jing Qu & Li Zhang & Han Wang, 2023. "The Relative Distance Prediction of Transmembrane Protein Surface Residue Based on Improved Residual Networks," Mathematics, MDPI, vol. 11(3), pages 1-16, January.

