
Principled distillation of UK Biobank phenotype data reveals underlying structure in human variation

Author

Listed:
  • Caitlin E. Carey

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Rebecca Shafee

    (Broad Institute of MIT and Harvard
    Harvard Medical School
    National Institute of Mental Health)

  • Robbee Wedow

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital
    Purdue University
    Indiana University School of Medicine)

  • Amanda Elliott

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital
    Harvard Medical School)

  • Duncan S. Palmer

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital
    Medical Sciences Division University of Oxford)

  • John Compitello

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Masahiro Kanai

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Liam Abbott

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Patrick Schultz

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Konrad J. Karczewski

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Samuel C. Bryant

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Caroline M. Cusick

    (Broad Institute of MIT and Harvard)

  • Claire Churchhouse

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Daniel P. Howrigan

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Daniel King

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • George Davey Smith

    (Broad Institute of MIT and Harvard
    University of Bristol, Oakfield House
    University of Bristol)

  • Benjamin M. Neale

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • Raymond K. Walters

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital
    Harvard Medical School)

  • Elise B. Robinson

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

Abstract

Data within biobanks capture broad yet detailed indices of human variation, but biobank-wide insights can be difficult to extract due to complexity and scale. Here, using large-scale factor analysis, we distill hundreds of variables (diagnoses, assessments and survey items) into 35 latent constructs, using data from unrelated individuals with predominantly estimated European genetic ancestry in UK Biobank. These factors recapitulate known disease classifications, disentangle elements of socioeconomic status, highlight the relevance of psychiatric constructs to health and improve measurement of pro-health behaviours. We go on to demonstrate the power of this approach to clarify genetic signal, enhance discovery and identify associations between underlying phenotypic structure and health outcomes. In building a deeper understanding of ways in which constructs such as socioeconomic status, trauma, or physical activity are structured in the dataset, we emphasize the importance of considering the interwoven nature of the human phenome when evaluating public health patterns.

Suggested Citation

  • Caitlin E. Carey & Rebecca Shafee & Robbee Wedow & Amanda Elliott & Duncan S. Palmer & John Compitello & Masahiro Kanai & Liam Abbott & Patrick Schultz & Konrad J. Karczewski & Samuel C. Bryant & Caro, 2024. "Principled distillation of UK Biobank phenotype data reveals underlying structure in human variation," Nature Human Behaviour, Nature, vol. 8(8), pages 1599-1615, August.
  • Handle: RePEc:nat:nathum:v:8:y:2024:i:8:d:10.1038_s41562-024-01909-5
    DOI: 10.1038/s41562-024-01909-5

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41562-024-01909-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1038/s41562-024-01909-5?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gianmarco Mignogna & Caitlin E. Carey & Robbee Wedow & Nikolas Baya & Mattia Cordioli & Nicola Pirastu & Rino Bellocco & Kathryn Fiuza Malerbi & Michel G. Nivard & Benjamin M. Neale & Raymond K. Walte, 2023. "Patterns of item nonresponse behaviour to survey questionnaires are systematic and associated with genetic loci," Nature Human Behaviour, Nature, vol. 7(8), pages 1371-1387, August.
    2. Jordi Manuello & Joosung Min & Paul McCarthy & Fidel Alfaro-Almagro & Soojin Lee & Stephen Smith & Lloyd T. Elliott & Anderson M. Winkler & Gwenaëlle Douaud, 2024. "The effects of genetic and modifiable risk factors on brain regions vulnerable to ageing and disease," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    3. Ruoyu Tian & Tian Ge & Hyeokmoon Kweon & Daniel B. Rocha & Max Lam & Jimmy Z. Liu & Kritika Singh & Daniel F. Levey & Joel Gelernter & Murray B. Stein & Ellen A. Tsai & Hailiang Huang & Christopher F., 2024. "Whole-exome sequencing in UK Biobank reveals rare genetic architecture for depression," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    4. Bingxin Zhao & Yujue Li & Zirui Fan & Zhenyi Wu & Juan Shu & Xiaochen Yang & Yilin Yang & Xifeng Wang & Bingxuan Li & Xiyao Wang & Carlos Copana & Yue Yang & Jinjie Lin & Yun Li & Jason L. Stein & Joa, 2024. "Eye-brain connections revealed by multimodal retinal and brain imaging genetics," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    5. Mattia Marchi & Anne Alkema & Charley Xia & Chris H. L. Thio & Li-Yu Chen & Winni Schalkwijk & Gian M. Galeazzi & Silvia Ferrari & Luca Pingani & Hyeokmoon Kweon & Sara Evans-Lacko & W. David Hill & M, 2024. "Investigating the impact of poverty on mental illness in the UK Biobank using Mendelian randomization," Nature Human Behaviour, Nature, vol. 8(9), pages 1771-1783, September.
    6. Xiao-Yu He & Bang-Sheng Wu & Liu Yang & Yu Guo & Yue-Ting Deng & Ze-Yu Li & Chen-Jie Fei & Wei-Shi Liu & Yi-Jun Ge & Jujiao Kang & Jianfeng Feng & Wei Cheng & Qiang Dong & Jin-Tai Yu, 2024. "Genetic associations of protein-coding variants in venous thromboembolism," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    7. Max Lam & Chia-Yen Chen & W. David Hill & Charley Xia & Ruoyu Tian & Daniel F. Levey & Joel Gelernter & Murray B. Stein & Alexander S. Hatoum & Hailiang Huang & Anil K. Malhotra & Heiko Runz & Tian Ge, 2022. "Collective genomic segments with differential pleiotropic patterns between cognitive dimensions and psychopathology," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    8. Shahram Bahrami & Kaja Nordengen & Jaroslav Rokicki & Alexey A. Shadrin & Zillur Rahman & Olav B. Smeland & Piotr P. Jaholkowski & Nadine Parker & Pravesh Parekh & Kevin S. O’Connell & Torbjørn Elvsås, 2024. "The genetic landscape of basal ganglia and implications for common brain disorders," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    9. Xingjie Hao & Zhonghe Shao & Ning Zhang & Minghui Jiang & Xi Cao & Si Li & Yunlong Guan & Chaolong Wang, 2023. "Integrative genome-wide analyses identify novel loci associated with kidney stones and provide insights into its genetic architecture," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    10. Wendiao Zhang & Ming Zhang & Zhenhong Xu & Hongye Yan & Huimin Wang & Jiamei Jiang & Juan Wan & Beisha Tang & Chunyu Liu & Chao Chen & Qingtuan Meng, 2023. "Human forebrain organoid-based multi-omics analyses of PCCB as a schizophrenia associated gene linked to GABAergic pathways," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    11. Peter Zhukovsky & Earvin S. Tio & Gillian Coughlan & David A. Bennett & Yanling Wang & Timothy J. Hohman & Diego A. Pizzagalli & Benoit H. Mulsant & Aristotle N. Voineskos & Daniel Felsky, 2024. "Genetic influences on brain and cognitive health and their interactions with cardiovascular conditions and depression," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    12. Dmitrii Usoltsev & Nikita Kolosov & Oxana Rotar & Alexander Loboda & Maria Boyarinova & Ekaterina Moguchaya & Ekaterina Kolesova & Anastasia Erina & Kristina Tolkunova & Valeriia Rezapova & Ivan Molot, 2024. "Complex trait susceptibilities and population diversity in a sample of 4,145 Russians," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    13. Guanghao Qi & Surya B. Chhetri & Debashree Ray & Diptavo Dutta & Alexis Battle & Samsiddhi Bhattacharjee & Nilanjan Chatterjee, 2024. "Genome-wide large-scale multi-trait analysis characterizes global patterns of pleiotropy and unique trait-specific variants," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    14. Jakub Kopal & Kuldeep Kumar & Kimia Shafighi & Karin Saltoun & Claudia Modenato & Clara A. Moreau & Guillaume Huguet & Martineau Jean-Louis & Charles-Olivier Martin & Zohra Saci & Nadine Younis & Elis, 2024. "Using rare genetic mutations to revisit structural brain asymmetry," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    15. William R. Reay & Dylan J. Kiltschewskij & Maria A. Biase & Zachary F. Gerring & Kousik Kundu & Praveen Surendran & Laura A. Greco & Erin D. Clarke & Clare E. Collins & Alison M. Mondul & Demetrius Al, 2024. "Genetic influences on circulating retinol and its relationship to human health," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    16. Linda Ottensmann & Rubina Tabassum & Sanni E. Ruotsalainen & Mathias J. Gerl & Christian Klose & Elisabeth Widén & Kai Simons & Samuli Ripatti & Matti Pirinen, 2023. "Genome-wide association analysis of plasma lipidome identifies 495 genetic associations," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    17. Charley Xia & Sarah J. Pickett & David C. M. Liewald & Alexander Weiss & Gavin Hudson & W. David Hill, 2023. "The contributions of mitochondrial and nuclear mitochondrial genetic variation to neuroticism," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    18. Chamlee Cho & Beomsu Kim & Dan Say Kim & Mi Yeong Hwang & Injeong Shim & Minku Song & Yeong Chan Lee & Sang-Hyuk Jung & Sung Kweon Cho & Woong-Yang Park & Woojae Myung & Bong-Jo Kim & Ron Do & Hyon K., 2024. "Large-scale cross-ancestry genome-wide meta-analysis of serum urate," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    19. Andrew D. Grotzinger & Travis T. Mallard & Zhaowen Liu & Jakob Seidlitz & Tian Ge & Jordan W. Smoller, 2023. "Multivariate genomic architecture of cortical thickness and surface area at multiple levels of analysis," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    20. Andrew D. Grotzinger & Javier de la Fuente & Gail Davies & Michel G. Nivard & Elliot M. Tucker-Drob, 2022. "Transcriptome-wide and stratified genomic structural equation modeling identify neurobiological pathways shared across diverse cognitive traits," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
