Author
Listed:
- Christina V. Theodoris
(Dana-Farber Cancer Institute
Broad Institute of MIT and Harvard
Boston Children’s Hospital
Harvard Medical School Genetics Training Program)
- Ling Xiao
(Broad Institute of MIT and Harvard
Massachusetts General Hospital)
- Anant Chopra
(Bayer US LLC)
- Mark D. Chaffin
(Broad Institute of MIT and Harvard)
- Zeina R. Al Sayed
(Broad Institute of MIT and Harvard)
- Matthew C. Hill
(Broad Institute of MIT and Harvard
Massachusetts General Hospital)
- Helene Mantineo
(Broad Institute of MIT and Harvard
Massachusetts General Hospital)
- Elizabeth M. Brydon
(Bayer US LLC)
- Zexian Zeng
(Dana-Farber Cancer Institute
Harvard T.H. Chan School of Public Health)
- X. Shirley Liu
(Dana-Farber Cancer Institute
Harvard T.H. Chan School of Public Health)
- Patrick T. Ellinor
(Broad Institute of MIT and Harvard
Massachusetts General Hospital)
Abstract
Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding [1,2] and computer vision [3] by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
Suggested Citation
Christina V. Theodoris & Ling Xiao & Anant Chopra & Mark D. Chaffin & Zeina R. Al Sayed & Matthew C. Hill & Helene Mantineo & Elizabeth M. Brydon & Zexian Zeng & X. Shirley Liu & Patrick T. Ellinor, 2023.
"Transfer learning enables predictions in network biology,"
Nature, Nature, vol. 618(7965), pages 616-624, June.
Handle:
RePEc:nat:nature:v:618:y:2023:i:7965:d:10.1038_s41586-023-06139-9
DOI: 10.1038/s41586-023-06139-9
Citations
Cited by:
- Hao Li & Zebei Han & Yu Sun & Fu Wang & Pengzhen Hu & Yuang Gao & Xuemei Bai & Shiyu Peng & Chao Ren & Xiang Xu & Zeyu Liu & Hebing Chen & Yang Yang & Xiaochen Bo, 2024.
"CGMega: explainable graph neural network framework with attention mechanisms for cancer gene module dissection,"
Nature Communications, Nature, vol. 15(1), pages 1-15, December.
- Felix Fischer & David S. Fischer & Roman Mukhin & Andrey Isaev & Evan Biederstedt & Alexandra-Chloé Villani & Fabian J. Theis, 2024.
"scTab: Scaling cross-tissue single-cell annotation models,"
Nature Communications, Nature, vol. 15(1), pages 1-15, December.