
Transfer learning enables predictions in network biology

Authors

Listed:
  • Christina V. Theodoris

    (Dana-Farber Cancer Institute
    Broad Institute of MIT and Harvard
    Boston Children’s Hospital
    Harvard Medical School Genetics Training Program)

  • Ling Xiao

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Anant Chopra

    (Bayer US LLC)

  • Mark D. Chaffin

    (Broad Institute of MIT and Harvard)

  • Zeina R. Al Sayed

    (Broad Institute of MIT and Harvard)

  • Matthew C. Hill

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Helene Mantineo

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Elizabeth M. Brydon

    (Bayer US LLC)

  • Zexian Zeng

    (Dana-Farber Cancer Institute
    Harvard T.H. Chan School of Public Health)

  • X. Shirley Liu

    (Dana-Farber Cancer Institute
Harvard T.H. Chan School of Public Health)

  • Patrick T. Ellinor

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

Abstract

Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding [1,2] and computer vision [3] by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
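
The transfer-learning recipe the abstract describes (self-supervised pretraining on a large corpus, then fine-tuning on limited labelled data) follows the standard encoder-plus-task-head transformer workflow. The Python sketch below illustrates that workflow with the Hugging Face transformers library; the hub checkpoint ID, the toy ranked-gene dataset and the two-class labels are illustrative assumptions rather than the authors' published pipeline, which ships its own tokenizer and fine-tuning utilities.

# Minimal sketch of the pretrain-then-fine-tune workflow from the abstract:
# load a BERT-style encoder pretrained with self-supervision, attach a fresh
# classification head, and fine-tune on a small labelled dataset.
# The checkpoint ID and toy data below are assumptions for illustration.

import torch
from torch.utils.data import Dataset
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

class RankedGeneDataset(Dataset):
    # Each cell is a fixed-length sequence of gene-token IDs ranked by
    # expression, mirroring the rank-based encoding idea; real inputs would
    # come from a transcriptome tokenizer.
    def __init__(self, token_ids, labels):
        self.token_ids = token_ids
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        ids = torch.tensor(self.token_ids[idx], dtype=torch.long)
        return {"input_ids": ids,
                "attention_mask": torch.ones_like(ids),
                "labels": torch.tensor(self.labels[idx])}

# Load pretrained encoder weights; the sequence-classification head is newly
# initialized and will be trained on the downstream task.
model = BertForSequenceClassification.from_pretrained(
    "ctheodoris/Geneformer",   # assumed Hugging Face hub ID of the checkpoint
    num_labels=2,              # e.g. diseased vs. non-failing cardiomyocytes
)

# Optionally freeze the input embeddings so the limited labelled data mostly
# updates the task head -- a common low-data fine-tuning strategy.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False

# Toy training set: 32 "cells", 4 gene tokens each, alternating labels.
train_ds = RankedGeneDataset(token_ids=[[5, 17, 42, 8]] * 32, labels=[0, 1] * 16)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="geneformer_finetune",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_ds,
)
trainer.train()

Freezing part of the encoder is one way to preserve the network knowledge gained during pretraining when task-specific data are scarce; the paper's central claim is that fine-tuning a pretrained model in this manner consistently boosts predictive accuracy relative to training on the limited data alone.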

Suggested Citation

  • Christina V. Theodoris & Ling Xiao & Anant Chopra & Mark D. Chaffin & Zeina R. Al Sayed & Matthew C. Hill & Helene Mantineo & Elizabeth M. Brydon & Zexian Zeng & X. Shirley Liu & Patrick T. Ellinor, 2023. "Transfer learning enables predictions in network biology," Nature, Nature, vol. 618(7965), pages 616-624, June.
  • Handle: RePEc:nat:nature:v:618:y:2023:i:7965:d:10.1038_s41586-023-06139-9
    DOI: 10.1038/s41586-023-06139-9

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-023-06139-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1038/s41586-023-06139-9?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to a copy you can access through your library subscription.

    As access to this document is restricted, you may want to search for a different version of it.

    Citations

    Citations are extracted by the CitEc project; subscribe to its RSS feed for this item.


    Cited by:

    1. Hao Li & Zebei Han & Yu Sun & Fu Wang & Pengzhen Hu & Yuang Gao & Xuemei Bai & Shiyu Peng & Chao Ren & Xiang Xu & Zeyu Liu & Hebing Chen & Yang Yang & Xiaochen Bo, 2024. "CGMega: explainable graph neural network framework with attention mechanisms for cancer gene module dissection," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    2. Felix Fischer & David S. Fischer & Roman Mukhin & Andrey Isaev & Evan Biederstedt & Alexandra-Chloé Villani & Fabian J. Theis, 2024. "scTab: Scaling cross-tissue single-cell annotation models," Nature Communications, Nature, vol. 15(1), pages 1-15, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:nat:nature:v:618:y:2023:i:7965:d:10.1038_s41586-023-06139-9. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help by adding them using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.nature.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.