Printed from https://ideas.repec.org/a/plo/pone00/0214311.html

ukbtools: An R package to manage and query UK Biobank data

Author

Listed:
  • Ken B Hanscombe
  • Jonathan R I Coleman
  • Matthew Traylor
  • Cathryn M Lewis

Abstract

Introduction: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names. Results: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; to query disease diagnoses for one or more individuals and estimate disease frequency relative to a reference variable; and to retrieve genetic metadata. Conclusion: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file will rapidly enable UKB users to undertake their research.
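
The merge-and-rename step described in the abstract can be illustrated with a small sketch. This is not the ukbtools implementation (ukbtools is an R package); it is a minimal Python illustration of the general idea, assuming a hypothetical lookup table from numeric field codes to descriptive names and two hypothetical data files that share a participant id column. The function names, the column name "eid", the field codes, and all values are assumptions made for the example.

```python
# Illustrative sketch only: not the ukbtools implementation.
# Field codes, descriptive names, and values below are made-up examples.

def rename_columns(rows, lookup):
    """Return copies of `rows` with coded column names replaced via `lookup`."""
    return [{lookup.get(col, col): val for col, val in row.items()} for row in rows]


def merge_on_id(*tables, key="eid"):
    """Full outer merge of several row lists on a shared participant id column."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {key: row[key]}).update(row)
    # Give every row every column, using None where a participant has no value.
    columns = []
    for row in merged.values():
        for col in row:
            if col not in columns:
                columns.append(col)
    return [{col: row.get(col) for col in columns} for row in merged.values()]


# Hypothetical lookup from numeric field codes to descriptive names.
lookup = {"31-0.0": "sex", "34-0.0": "year_of_birth", "21001-0.0": "body_mass_index"}

# Two hypothetical data files that share the participant id column "eid".
file_a = [{"eid": 1, "31-0.0": "Female", "34-0.0": 1955},
          {"eid": 2, "31-0.0": "Male", "34-0.0": 1962}]
file_b = [{"eid": 1, "21001-0.0": 24.3},
          {"eid": 3, "21001-0.0": 29.1}]

dataset = merge_on_id(rename_columns(file_a, lookup), rename_columns(file_b, lookup))
for row in dataset:
    print(row)
```

Renaming before merging keeps the lookup independent of the merge, so columns that have no entry in the lookup simply pass through with their original code.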

Suggested Citation

  • Ken B Hanscombe & Jonathan R I Coleman & Matthew Traylor & Cathryn M Lewis, 2019. "ukbtools: An R package to manage and query UK Biobank data," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-6, May.
  • Handle: RePEc:plo:pone00:0214311
    DOI: 10.1371/journal.pone.0214311

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0214311
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0214311&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Clare Bycroft & Colin Freeman & Desislava Petkova & Gavin Band & Lloyd T. Elliott & Kevin Sharp & Allan Motyer & Damjan Vukcevic & Olivier Delaneau & Jared O’Connell & Adrian Cortes & Samantha Welsh &, 2018. "The UK Biobank resource with deep phenotyping and genomic data," Nature, Nature, vol. 562(7726), pages 203-209, October.

    Citations

    Cited by:

    1. Clark, Stephen & Birkin, Mark & Lomax, Nik & Morris, Michelle, 2020. "Developing a whole systems obesity classification for the UK Biobank Cohort," OSF Preprints 7nqgd, Center for Open Science.
    2. Daniel E. Vosberg & Igor Jurisica & Zdenka Pausova & Tomáš Paus, 2024. "Intrauterine growth and the tangential expansion of the human cerebral cortex in times of food scarcity and abundance," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    3. Remo Monti & Pia Rautenstrauch & Mahsa Ghanbari & Alva Rani James & Matthias Kirchler & Uwe Ohler & Stefan Konigorski & Christoph Lippert, 2022. "Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes," Nature Communications, Nature, vol. 13(1), pages 1-16, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matteo Di Scipio & Mohammad Khan & Shihong Mao & Michael Chong & Conor Judge & Nazia Pathan & Nicolas Perrot & Walter Nelson & Ricky Lali & Shuang Di & Robert Morton & Jeremy Petch & Guillaume Paré, 2023. "A versatile, fast and unbiased method for estimation of gene-by-environment interaction effects on biobank-scale datasets," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    2. Jacob Joseph & Chang Liu & Qin Hui & Krishna Aragam & Zeyuan Wang & Brian Charest & Jennifer E. Huffman & Jacob M. Keaton & Todd L. Edwards & Serkalem Demissie & Luc Djousse & Juan P. Casas & J. Micha, 2022. "Genetic architecture of heart failure with preserved versus reduced ejection fraction," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    3. Vincent Michaud & Eulalie Lasseaux & David J. Green & Dave T. Gerrard & Claudio Plaisant & Tomas Fitzgerald & Ewan Birney & Benoît Arveiler & Graeme C. Black & Panagiotis I. Sergouniotis, 2022. "The contribution of common regulatory and protein-coding TYR variants to the genetic architecture of albinism," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    4. Dick Schijven & Sourena Soheili-Nezhad & Simon E. Fisher & Clyde Francks, 2024. "Exome-wide analysis implicates rare protein-altering variants in human handedness," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    5. Lili Liu & Atlas Khan & Elena Sanchez-Rodriguez & Francesca Zanoni & Yifu Li & Nicholas Steers & Olivia Balderes & Junying Zhang & Priya Krithivasan & Robert A. LeDesma & Clara Fischman & Scott J. Heb, 2022. "Genetic regulation of serum IgA levels and susceptibility to common immune, infectious, kidney, and cardio-metabolic traits," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    6. Sylvia Hartmann & Summaira Yasmeen & Benjamin M. Jacobs & Spiros Denaxas & Munir Pirmohamed & Eric R. Gamazon & Mark J. Caulfield & Harry Hemingway & Maik Pietzner & Claudia Langenberg, 2023. "ADRA2A and IRX1 are putative risk genes for Raynaud’s phenomenon," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    7. Mit Shah & Marco H. A. Inácio & Chang Lu & Pierre-Raphaël Schiratti & Sean L. Zheng & Adam Clement & Antonio Marvao & Wenjia Bai & Andrew P. King & James S. Ware & Martin R. Wilkins & Johanna Mielke &, 2023. "Environmental and genetic predictors of human cardiovascular ageing," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    8. Mathias Seviiri & Matthew H. Law & Jue-Sheng Ong & Puya Gharahkhani & Pierre Fontanillas & Catherine M. Olsen & David C. Whiteman & Stuart MacGregor, 2022. "A multi-phenotype analysis reveals 19 susceptibility loci for basal cell carcinoma and 15 for squamous cell carcinoma," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    9. Zhening Liu & Hangkai Huang & Jiarong Xie & Yingying Xu & Chengfu Xu, 2024. "Circulating fatty acids and risk of hepatocellular carcinoma and chronic liver disease mortality in the UK Biobank," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    10. Junqing Xie & Shuo Feng & Xintong Li & Ester Gea-Mallorquí & Albert Prats-Uribe & Dani Prieto-Alhambra, 2022. "Comparative effectiveness of the BNT162b2 and ChAdOx1 vaccines against Covid-19 in people over 50," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    11. Erik Schoenmakers & Federica Marelli & Helle F. Jørgensen & W. Edward Visser & Carla Moran & Stefan Groeneweg & Carolina Avalos & Sean J. Jurgens & Nichola Figg & Alison Finigan & Neha Wali & Maura Ag, 2023. "Selenoprotein deficiency disorder predisposes to aortic aneurysm formation," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    12. Harry D Green & Alistair Jones & Jonathan P Evans & Andrew R Wood & Robin N Beaumont & Jessica Tyrrell & Timothy M Frayling & Christopher Smith & Michael N Weedon, 2021. "A genome-wide association study identifies 5 loci associated with frozen shoulder and implicates diabetes as a causal risk factor," PLOS Genetics, Public Library of Science, vol. 17(6), pages 1-13, June.
    13. Zhen Qiao & Julia Sidorenko & Joana A. Revez & Angli Xue & Xueling Lu & Katri Pärna & Harold Snieder & Peter M. Visscher & Naomi R. Wray & Loic Yengo, 2023. "Estimation and implications of the genetic architecture of fasting and non-fasting blood glucose," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    14. Xiaoyi Raymond Gao & Marion Chiariglione & Alexander J. Arch, 2022. "Whole-exome sequencing study identifies rare variants and genes associated with intraocular pressure and glaucoma," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    15. Romain Fournier & Zoi Tsangalidou & David Reich & Pier Francesco Palamara, 2023. "Haplotype-based inference of recent effective population size in modern and ancient DNA samples," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    16. Nicole Deflaux & Margaret Sunitha Selvaraj & Henry Robert Condon & Kelsey Mayo & Sara Haidermota & Melissa A. Basford & Chris Lunt & Anthony A. Philippakis & Dan M. Roden & Joshua C. Denny & Anjene Mu, 2023. "Demonstrating paths for unlocking the value of cloud genomics through cross cohort analysis," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
    17. George B. Busby & Scott Kulm & Alessandro Bolli & Jen Kintzle & Paolo Di Domenico & Giordano Bottà, 2023. "Ancestry-specific polygenic risk scores are risk enhancers for clinical cardiovascular disease assessments," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    18. Dixon, Padraig & Hollingworth, William & Harrison, Sean & Davies, Neil M. & Davey Smith, George, 2020. "Mendelian Randomization analysis of the causal effect of adiposity on hospital costs," Journal of Health Economics, Elsevier, vol. 70(C).
    19. van den Berg, Gerard J. & von Hinke, Stephanie & Wang, R. Adele H., 2022. "Prenatal Sugar Consumption and Late-Life Human Capital and Health: Analyses Based on Postwar Rationing and Polygenic Scores," IZA Discussion Papers 15544, Institute of Labor Economics (IZA).
    20. Jordi Manuello & Joosung Min & Paul McCarthy & Fidel Alfaro-Almagro & Soojin Lee & Stephen Smith & Lloyd T. Elliott & Anderson M. Winkler & Gwenaëlle Douaud, 2024. "The effects of genetic and modifiable risk factors on brain regions vulnerable to ageing and disease," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
