Author
Listed:
- Hanwen Xu
(Microsoft Research; University of Washington)
- Naoto Usuyama
(Microsoft Research)
- Jaspreet Bagga
(Microsoft Research)
- Sheng Zhang
(Microsoft Research)
- Rajesh Rao
(Microsoft Research)
- Tristan Naumann
(Microsoft Research)
- Cliff Wong
(Microsoft Research)
- Zelalem Gero
(Microsoft Research)
- Javier González
(Microsoft Research)
- Yu Gu
(Microsoft Research)
- Yanbo Xu
(Microsoft Research)
- Mu Wei
(Microsoft Research)
- Wenhui Wang
(Microsoft Research)
- Shuming Ma
(Microsoft Research)
- Furu Wei
(Microsoft Research)
- Jianwei Yang
(Microsoft Research)
- Chunyuan Li
(Microsoft Research)
- Jianfeng Gao
(Microsoft Research)
- Jaylen Rosemon
(Providence Genomics)
- Tucker Bower
(Providence Genomics)
- Soohee Lee
(Providence Research Network)
- Roshanthi Weerasinghe
(Providence Research Network)
- Bill J. Wright
(Providence Research Network)
- Ari Robicsek
(Providence Research Network)
- Brian Piening
(Providence Genomics; Providence Cancer Institute)
- Carlo Bifulco
(Providence Genomics; Providence Cancer Institute)
- Sheng Wang
(University of Washington)
- Hoifung Poon
(Microsoft Research)
Abstract
Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles [1–3]. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context [4]. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000 patients covering 31 major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides. To scale GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath adapts the newly developed LongNet [5] method to digital pathology. To evaluate Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data [6]. With large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains state-of-the-art performance on 25 out of 26 tasks, with significant improvement over the second-best method on 18 tasks. We further demonstrate the potential of Prov-GigaPath on vision–language pretraining for pathology [7,8] by incorporating the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modelling.
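For intuition, the following is a minimal PyTorch-style sketch, not the authors' released code, of the two-stage design the abstract describes: a tile encoder turns each 256 × 256 tile into an embedding, and a slide-level encoder with LongNet-style dilated attention aggregates tens of thousands of those embeddings in a single pass. All module names, segment lengths, and dilation rates below are illustrative assumptions; real LongNet also distributes attention heads across dilation rates and weights the branch outputs by their softmax denominators, which this sketch replaces with a simple average.

```python
# Illustrative sketch (not the authors' code) of slide-level aggregation with
# LongNet-style dilated attention over tile embeddings from a gigapixel slide.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedSelfAttention(nn.Module):
    """One (segment_len, dilation) branch of dilated attention: split the tile
    sequence into segments and run full attention only on every r-th token."""
    def __init__(self, dim: int, num_heads: int, segment_len: int, dilation: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.segment_len = segment_len
        self.dilation = dilation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        s, r = self.segment_len, self.dilation
        pad = (-n) % s                       # pad so the sequence splits evenly
        x = F.pad(x, (0, 0, 0, pad))
        x = x.view(b, -1, s, d)              # (batch, num_segments, segment_len, dim)
        sparse = x[:, :, ::r, :]             # keep every r-th token in each segment
        q = sparse.reshape(-1, sparse.shape[2], d)
        out, _ = self.attn(q, q, q)          # full attention inside the sparse segment
        x = x.clone()                        # untouched positions pass through
        x[:, :, ::r, :] = out.reshape(b, -1, out.shape[1], d)
        return x.view(b, -1, d)[:, :n, :]

class SlideEncoder(nn.Module):
    """Mixes branches with growing segment length and dilation, so cost stays
    near-linear in sequence length while long-range context is preserved."""
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.branches = nn.ModuleList([
            DilatedSelfAttention(dim, num_heads, segment_len=s, dilation=r)
            for s, r in [(128, 1), (512, 4), (2048, 16)]   # assumed schedule
        ])

    def forward(self, tiles: torch.Tensor) -> torch.Tensor:
        # tiles: (batch, num_tiles, dim) embeddings from a pretrained tile encoder
        mixed = sum(branch(tiles) for branch in self.branches) / len(self.branches)
        return mixed.mean(dim=1)             # one slide-level embedding

# Usage: ~20,000 tile embeddings from one slide fit in a single forward pass.
slide_encoder = SlideEncoder()
tile_embeddings = torch.randn(1, 20000, 768)  # placeholder tile-encoder output
print(slide_encoder(tile_embeddings).shape)   # torch.Size([1, 768])
```

The design point the sketch illustrates is why subsampling becomes unnecessary: each branch attends within bounded sparse segments, so the whole tile sequence, rather than a sampled subset, contributes to the slide-level representation.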
Suggested Citation
Hanwen Xu & Naoto Usuyama & Jaspreet Bagga & Sheng Zhang & Rajesh Rao & Tristan Naumann & Cliff Wong & Zelalem Gero & Javier González & Yu Gu & Yanbo Xu & Mu Wei & Wenhui Wang & Shuming Ma & Furu Wei , 2024.
"A whole-slide foundation model for digital pathology from real-world data,"
Nature, Nature, vol. 630(8015), pages 181-188, June.
Handle:
RePEc:nat:nature:v:630:y:2024:i:8015:d:10.1038_s41586-024-07441-w
DOI: 10.1038/s41586-024-07441-w
Download full text from publisher
As access to this document is restricted, you may want to search for a different version of it.
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:nat:nature:v:630:y:2024:i:8015:d:10.1038_s41586-024-07441-w. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help add them by using this form.
If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.nature.com .
Please note that corrections may take a couple of weeks to filter through the various RePEc services.