
A hierarchical approach to removal of unwanted variation for large-scale metabolomics data

Author

Listed:
  • Taiyun Kim

    (The University of Sydney
    Children’s Medical Research Institute)

  • Owen Tang

    (The University of Sydney
    Royal North Shore Hospital)

  • Stephen T. Vernon

    (The University of Sydney
    Royal North Shore Hospital)

  • Katharine A. Kott

    (The University of Sydney
    Royal North Shore Hospital)

  • Yen Chin Koay

    (The University of Sydney
    Heart Research Institute)

  • John Park

    (The University of Sydney
    Royal North Shore Hospital)

  • David E. James

    (The University of Sydney)

  • Stuart M. Grieve

    (The University of Sydney
    Royal Prince Alfred Hospital)

  • Terence P. Speed

    (Walter and Eliza Hall Institute
    University of Melbourne)

  • Pengyi Yang

    (The University of Sydney
    Children’s Medical Research Institute)

  • Gemma A. Figtree

    (The University of Sydney
    Royal North Shore Hospital)

  • John F. O’Sullivan

    (The University of Sydney
    Heart Research Institute
    Royal Prince Alfred Hospital)

  • Jean Yee Hwa Yang

    (The University of Sydney)

Abstract

Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, whose data acquisition runs for several weeks or even years. This inevitably introduces unwanted intra- and inter-batch variations over time that can overshadow true biological signals and thus hinder potential biological discoveries. To date, normalisation approaches have struggled to mitigate the variability introduced by technical factors whilst preserving biological variance, especially for protracted acquisitions. Here, we propose a study design framework with an arrangement for embedding biological sample replicates to quantify variance within and between batches and a workflow that uses these replicates to remove unwanted variation in a hierarchical manner (hRUV). We use this design to produce a dataset of more than 1000 human plasma samples run over an extended period of time. We demonstrate significant improvement of hRUV over existing methods in preserving biological signals whilst removing unwanted variation for large-scale metabolomics studies. Our tools not only provide a strategy for large-scale data normalisation, but also provide guidance on the design strategy for large omics studies.

Suggested Citation

  • Taiyun Kim & Owen Tang & Stephen T. Vernon & Katharine A. Kott & Yen Chin Koay & John Park & David E. James & Stuart M. Grieve & Terence P. Speed & Pengyi Yang & Gemma A. Figtree & John F. O’Sullivan & Jean Yee Hwa Yang, 2021. "A hierarchical approach to removal of unwanted variation for large-scale metabolomics data," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
  • Handle: RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-25210-5
    DOI: 10.1038/s41467-021-25210-5

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-021-25210-5
    File Function: Abstract
    Download Restriction: no


    Citations


    Cited by:

    1. Yongjie Deng & Yao Yao & Yanni Wang & Tiantian Yu & Wenhao Cai & Dingli Zhou & Feng Yin & Wanli Liu & Yuying Liu & Chuanbo Xie & Jian Guan & Yumin Hu & Peng Huang & Weizhong Li, 2024. "An end-to-end deep learning method for mass spectrometry data analysis to reveal disease-specific metabolic profiles," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    2. Yingxin Lin & Yue Cao & Elijah Willie & Ellis Patrick & Jean Y. H. Yang, 2023. "Atlas-scale single-cell multi-sample multi-condition data integration using scMerge2," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
