Printed from https://ideas.repec.org/a/nat/natcom/v14y2023i1d10.1038_s41467-023-43651-y.html

PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants

Author

Listed:
  • Zhuoran Xu

    (University of Pennsylvania Perelman School of Medicine;
    Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia;
    Weill Cornell Medicine)

  • Quan Li

    (Princess Margaret Cancer Centre, University Health Network, University of Toronto)

  • Luigi Marchionni

    (Weill Cornell Medicine)

  • Kai Wang

    (Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia;
    University of Pennsylvania)

Abstract

Structural variants (SVs) represent a major source of genetic variation associated with phenotypic diversity and disease susceptibility. While long-read sequencing can discover over 20,000 SVs per human genome, interpreting their functional consequences remains challenging. Existing methods for identifying disease-related SVs focus on deletion/duplication only and cannot prioritize individual genes affected by SVs, especially for noncoding SVs. Here, we introduce PhenoSV, a phenotype-aware machine-learning model that interprets all major types of SVs and genes affected. PhenoSV segments and annotates SVs with diverse genomic features and employs a transformer-based architecture to predict their impacts under a multiple-instance learning framework. With phenotype information, PhenoSV further utilizes gene-phenotype associations to prioritize phenotype-related SVs. Evaluation on extensive human SV datasets covering all SV types demonstrates PhenoSV’s superior performance over competing methods. Applications in diseases suggest that PhenoSV can determine disease-related genes from SVs. A web server and a command-line tool for PhenoSV are available at https://phenosv.wglab.org.
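
The multiple-instance idea in the abstract (an SV is split into segments, each segment is scored, and the scores are combined) can be illustrated with a toy sketch. This is not PhenoSV's code or architecture: the `Segment` fields, the max-pooling rule, and the multiplicative phenotype weighting are all assumptions chosen for illustration, and the scores are hand-written rather than produced by a transformer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    """One piece of an SV (hypothetical structure, for illustration only)."""
    gene: str               # gene the segment overlaps or may regulate
    pathogenicity: float    # assumed per-segment score in [0, 1]
    phenotype_match: float  # assumed gene-phenotype relevance in [0, 1]


def sv_score(segments: List[Segment]) -> float:
    """Max pooling over the bag: the SV scores as high as its worst segment."""
    return max((s.pathogenicity for s in segments), default=0.0)


def rank_genes(segments: List[Segment]) -> List[Tuple[str, float]]:
    """Rank genes by segment score weighted by phenotype relevance."""
    best: Dict[str, float] = {}
    for s in segments:
        weighted = s.pathogenicity * s.phenotype_match
        # Keep the highest weighted score seen for each gene.
        best[s.gene] = max(best.get(s.gene, 0.0), weighted)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Toy input: gene names and scores are made up.
example = [
    Segment("GENE_A", 0.9, 0.1),
    Segment("GENE_B", 0.6, 0.8),
    Segment("GENE_A", 0.2, 0.5),
]
```

In this toy input, `GENE_A` has the most damaging segment, but `GENE_B` ranks first once phenotype relevance is applied, which is the kind of reordering a phenotype-aware step is meant to produce.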

Suggested Citation

  • Zhuoran Xu & Quan Li & Luigi Marchionni & Kai Wang, 2023. "PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-43651-y
    DOI: 10.1038/s41467-023-43651-y

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-43651-y
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alexander Martinez-Fundichely & Austin Dixon & Ekta Khurana, 2022. "Modeling tissue-specific breakpoint proximity of structural variations from whole-genomes to identify cancer drivers," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    2. Joanna Hui Juan Tan & Zhihui Li & Mar Gonzalez Porta & Ramesh Rajaby & Weng Khong Lim & Ye An Tan & Rodrigo Toro Jimenez & Renyi Teo & Maxime Hebrard & Jack Ling Ow & Shimin Ang & Justin Jeyakani & Ya, 2024. "A Catalogue of Structural Variation across Ancestrally Diverse Asian Genomes," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    3. Parithi Balachandran & Isha A. Walawalkar & Jacob I. Flores & Jacob N. Dayton & Peter A. Audano & Christine R. Beck, 2022. "Transposable element-mediated rearrangements are prevalent in human genomes," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    4. Yirong Shi & Yiwei Niu & Peng Zhang & Huaxia Luo & Shuai Liu & Sijia Zhang & Jiajia Wang & Yanyan Li & Xinyue Liu & Tingrui Song & Tao Xu & Shunmin He, 2023. "Characterization of genome-wide STR variation in 6487 human genomes," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    5. Ting Fu & Kofi Amoah & Tracey W. Chan & Jae Hoon Bahn & Jae-Hyung Lee & Sari Terrazas & Rockie Chong & Sriram Kosuri & Xinshu Xiao, 2024. "Massively parallel screen uncovers many rare 3′ UTR variants regulating mRNA abundance of cancer driver genes," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    6. Liyuan Zhou & Qiongzi Qiu & Qing Zhou & Jianwei Li & Mengqian Yu & Kezhen Li & Lingling Xu & Xiaohui Ke & Haiming Xu & Bingjian Lu & Hui Wang & Weiguo Lu & Pengyuan Liu & Yan Lu, 2022. "Long-read sequencing unveils high-resolution HPV integration and its oncogenic progression in cervical cancer," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    7. Andrea Wilderman & Eva D’haene & Machteld Baetens & Tara N. Yankee & Emma Wentworth Winchester & Nicole Glidden & Ellen Roets & Jo Dorpe & Sandra Janssens & Danny E. Miller & Miranda Galey & Kari M. B, 2024. "A distant global control region is essential for normal expression of anterior HOXA genes during mouse and human craniofacial development," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
    8. Oriol Pich & Iker Reyes-Salazar & Abel Gonzalez-Perez & Nuria Lopez-Bigas, 2022. "Discovering the drivers of clonal hematopoiesis," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    9. Christos Miliotis & Yuling Ma & Xanthi-Lida Katopodi & Dimitra Karagkouni & Eleni Kanata & Kaia Mattioli & Nikolas Kalavros & Yered H. Pita-Juárez & Felipe Batalini & Varune R. Ramnarine & Shivani Nan, 2024. "Determinants of gastric cancer immune escape identified from non-coding immune-landscape quantitative trait loci," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    10. Ada J. S. Chan & Worrawat Engchuan & Miriam S. Reuter & Zhuozhi Wang & Bhooma Thiruvahindrapuram & Brett Trost & Thomas Nalpathamkalam & Carol Negrijn & Sylvia Lamoureux & Giovanna Pellecchia & Rohan , 2022. "Genome-wide rare variant score associates with morphological subtypes of autism spectrum disorder," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    11. Theodore Sakellaropoulos & Catherine Do & Guimei Jiang & Giulia Cova & Peter Meyn & Dacia Dimartino & Sitharam Ramaswami & Adriana Heguy & Aristotelis Tsirigos & Jane A. Skok, 2024. "MethNet: a robust approach to identify regulatory hubs and their distal targets from cancer data," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    12. Remo Monti & Pia Rautenstrauch & Mahsa Ghanbari & Alva Rani James & Matthias Kirchler & Uwe Ohler & Stefan Konigorski & Christoph Lippert, 2022. "Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    13. Peter H. Dixon & Adam P. Levine & Inês Cebola & Melanie M. Y. Chan & Aliya S. Amin & Anshul Aich & Monika Mozere & Hannah Maude & Alice L. Mitchell & Jun Zhang & Jenny Chambers & Argyro Syngelaki & Je, 2022. "GWAS meta-analysis of intrahepatic cholestasis of pregnancy implicates multiple hepatic genes and regulatory elements," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    14. Thomas E. Wilson & Samreen Ahmed & Amanda Winningham & Thomas W. Glover, 2024. "Replication stress induces POLQ-mediated structural variant formation throughout common fragile sites after entry into mitosis," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    15. Katelyn L. Mortenson & Courtney Dawes & Emily R. Wilson & Nathan E. Patchen & Hailey E. Johnson & Jason Gertz & Swneke D. Bailey & Yang Liu & Katherine E. Varley & Xiaoyang Zhang, 2024. "3D genomic analysis reveals novel enhancer-hijacking caused by complex structural alterations that drive oncogene overexpression," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    16. Wenmin Sun & Dan Xiong & Jiamin Ouyang & Xueshan Xiao & Yi Jiang & Yingwei Wang & Shiqiang Li & Ziying Xie & Junwen Wang & Zhonghui Tang & Qingjiong Zhang, 2024. "Altered chromatin topologies caused by balanced chromosomal translocation lead to central iris hypoplasia," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    17. Arthur S. Lee & Lauren J. Ayers & Michael Kosicki & Wai-Man Chan & Lydia N. Fozo & Brandon M. Pratt & Thomas E. Collins & Boxun Zhao & Matthew F. Rose & Alba Sanchis-Juan & Jack M. Fu & Isaac Wong & X, 2024. "A cell type-aware framework for nominating non-coding variants in Mendelian regulatory disorders," Nature Communications, Nature, vol. 15(1), pages 1-26, December.
    18. Jinlong Shi & Zhilong Jia & Jinxiu Sun & Xiaoreng Wang & Xiaojing Zhao & Chenghui Zhao & Fan Liang & Xinyu Song & Jiawei Guan & Xue Jia & Jing Yang & Qi Chen & Kang Yu & Qian Jia & Jing Wu & Depeng Wa, 2023. "Structural variants involved in high-altitude adaptation detected using single-molecule long-read sequencing," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    19. Fengju Chen & Yiqun Zhang & Darshan S. Chandrashekar & Sooryanarayana Varambally & Chad J. Creighton, 2023. "Global impact of somatic structural variation on the cancer proteome," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    20. Mischan Vali-Pour & Solip Park & Jose Espinosa-Carrasco & Daniel Ortiz-Martínez & Ben Lehner & Fran Supek, 2022. "The impact of rare germline variants on human somatic mutation processes," Nature Communications, Nature, vol. 13(1), pages 1-21, December.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.