IDEAS home Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-31007-x.html

HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values

Authors
  • Hannah Voß

    (University Medical Center Hamburg-Eppendorf (UKE))

  • Simon Schlumbohm

    (Helmut Schmidt University
    University Medical Center Hamburg-Eppendorf)

  • Philip Barwikowski

    (University Medical Center Hamburg-Eppendorf (UKE))

  • Marcus Wurlitzer

    (University Medical Center Hamburg-Eppendorf)

  • Matthias Dottermusch

    (University Medical Center Hamburg-Eppendorf)

  • Philipp Neumann

    (Helmut Schmidt University)

  • Hartmut Schlüter

    (University Medical Center Hamburg-Eppendorf (UKE))

  • Julia E. Neumann

    (University Medical Center Hamburg-Eppendorf)

  • Christoph Krisp

    (University Medical Center Hamburg-Eppendorf (UKE))

Abstract

Dataset integration is common practice for overcoming the limitations of statistically underpowered omics datasets. Proteome datasets show high technical variability and frequent missing values. Sophisticated strategies for batch effect reduction are lacking or rely on error-prone data imputation. Here we introduce HarmonizR, a data harmonization tool with appropriate missing value handling. The method exploits the structure of the available data and uses matrix dissection to minimize data loss, without imputing any values. The strategy implements two common batch effect reduction methods: ComBat and limma (removeBatchEffect()). Evaluated on four example datasets with up to 23 batches, HarmonizR successfully harmonized data across different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches. Compared with data imputation methods, HarmonizR was more efficient and performed better at detecting significant proteins. HarmonizR is an efficient tool for reducing experimental variance in the presence of missing data and is easily adjusted to individual dataset properties and user preferences.
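The abstract describes the dissection strategy only at a high level. The sketch below illustrates the general idea under stated assumptions; it is not HarmonizR's code (the published tool builds on ComBat and limma, which are R/Bioconductor packages). Proteins (rows) are grouped by the set of batches that observe them, and each group is corrected only across those batches, so no values are imputed. All names (`dissect`, `harmonize_sketch`, `MIN_VALUES_PER_BATCH`) and the two-values-per-batch threshold are hypothetical. For the correction step it swaps in a simple location-only adjustment (subtract each batch's mean, add back the mean of batch means) in place of ComBat or removeBatchEffect().

```python
import math
from collections import defaultdict
from statistics import fmean

# Hypothetical threshold (an assumption, not taken from the paper): a batch
# "covers" a protein only if it holds at least this many observed values.
MIN_VALUES_PER_BATCH = 2


def is_observed(x):
    """True for a measured value, False for a missing one (NaN)."""
    return not math.isnan(x)


def coverage_pattern(row, batches, min_values=MIN_VALUES_PER_BATCH):
    """Sorted tuple of batch labels with enough observed values in this row."""
    counts = defaultdict(int)
    for value, batch in zip(row, batches):
        if is_observed(value):
            counts[batch] += 1
    return tuple(sorted(b for b, n in counts.items() if n >= min_values))


def dissect(matrix, batches):
    """Group protein (row) indices by their batch-coverage pattern.

    Every protein in a group is covered by exactly the same batches, so each
    group forms a sub-matrix that can be corrected without imputation.
    """
    groups = defaultdict(list)
    for i, row in enumerate(matrix):
        pattern = coverage_pattern(row, batches)
        if len(pattern) >= 2:  # at least two batches are needed to correct
            groups[pattern].append(i)
    return dict(groups)


def center_row(row, batches, pattern):
    """Location-only batch correction for one protein (a simplified stand-in).

    Subtracts each covered batch's mean and adds back the mean of the batch
    means; values from batches outside the pattern become NaN in this sketch.
    """
    batch_means = {}
    for b in pattern:
        vals = [v for v, bb in zip(row, batches) if bb == b and is_observed(v)]
        batch_means[b] = fmean(vals)
    overall = fmean(batch_means.values())
    out = []
    for v, b in zip(row, batches):
        if b in batch_means and is_observed(v):
            out.append(v - batch_means[b] + overall)
        else:
            out.append(math.nan)
    return out


def harmonize_sketch(matrix, batches):
    """Dissect, correct each sub-matrix, and reassemble in the original order.

    Rows covered by fewer than two batches are returned as all-NaN here.
    """
    result = [[math.nan] * len(batches) for _ in matrix]
    for pattern, rows in dissect(matrix, batches).items():
        for i in rows:
            result[i] = center_row(matrix[i], batches, pattern)
    return result


if __name__ == "__main__":
    nan = math.nan
    batches = ["A", "A", "B", "B", "C", "C"]
    matrix = [
        [1.0, 2.0, 11.0, 12.0, nan, nan],  # covered by A and B
        [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],   # covered by A, B and C
        [5.0, nan, 7.0, 9.0, nan, nan],    # only B reaches the threshold
    ]
    print(dissect(matrix, batches))
    for row in harmonize_sketch(matrix, batches):
        print(row)
```

Here the first two proteins land in different sub-matrices because different batches observe them, and the third is left uncorrected rather than imputed. Grouping matters most for a method like ComBat, whose empirical Bayes step pools information across the proteins of a sub-matrix; with the per-protein location adjustment used here, the grouping only decides which batches each protein is corrected across.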

Suggested Citation

  • Hannah Voß & Simon Schlumbohm & Philip Barwikowski & Marcus Wurlitzer & Matthias Dottermusch & Philipp Neumann & Hartmut Schlüter & Julia E. Neumann & Christoph Krisp, 2022. "HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-31007-x
    DOI: 10.1038/s41467-022-31007-x

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-31007-x
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Vanessa R. Marcelino & Caitlin Welsh & Christian Diener & Emily L. Gulliver & Emily L. Rutten & Remy B. Young & Edward M. Giles & Sean M. Gibbons & Chris Greening & Samuel C. Forster, 2023. "Disease-specific loss of microbial cross-feeding interactions in the human gut," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    2. Henry Webel & Lili Niu & Annelaura Bach Nielsen & Marie Locard-Paulet & Matthias Mann & Lars Juhl Jensen & Simon Rasmussen, 2024. "Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    3. Simon J. Pelletier & Mickaël Leclercq & Florence Roux-Dalvai & Matthijs B. Geus & Shannon Leslie & Weiwei Wang & TuKiet T. Lam & Angus C. Nairn & Steven E. Arnold & Becky C. Carlyle & Frédéric Precios, 2024. "BERNN: Enhancing classification of Liquid Chromatography Mass Spectrometry data with batch effect removal neural networks," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    4. Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
    5. Cemal Erdem & Sean M. Gross & Laura M. Heiser & Marc R. Birtwistle, 2023. "MOBILE pipeline enables identification of context-specific networks and regulatory mechanisms," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    6. Martin, Manon & Govaerts, Bernadette, 2019. "Feature Selection in metabolomics with PLS-derived methods," LIDAM Discussion Papers ISBA 2019020, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    7. Gaoxiang Zhu & Dengfeng Gao & Linzi Li & Yixuan Yao & Yingjie Wang & Minglei Zhi & Jinying Zhang & Xinze Chen & Qianqian Zhu & Jie Gao & Tianzhi Chen & Xiaowei Zhang & Tong Wang & Suying Cao & Aijin M, 2023. "Generation of three-dimensional meat-like tissue from stable pig epiblast stem cells," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    8. Tomás Clive Barker-Tejeda & Elisa Zubeldia-Varela & Andrea Macías-Camero & Lola Alonso & Isabel Adoración Martín-Antoniano & María Fernanda Rey-Stolle & Leticia Mera-Berriatua & Raphaëlle Bazire & Pau, 2024. "Comparative characterization of the infant gut microbiome and their maternal lineage by a multi-omics approach," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    9. Signe Schmidt Kjølner Hansen & Robert Krautz & Daria Rago & Jesper Havelund & Arnaud Stigliani & Nils J. Færgeman & Audrey Prézelin & Julie Rivière & Anne Couturier-Tarrade & Vyacheslav Akimov & Blago, 2024. "Pulmonary maternal immune activation does not cross the placenta but leads to fetal metabolic adaptation," Nature Communications, Nature, vol. 15(1), pages 1-24, December.
    10. Gaowen Yang & Masahiro Ryo & Julien Roy & Daniel R. Lammel & Max-Bernhard Ballhausen & Xin Jing & Xuefeng Zhu & Matthias C. Rillig, 2022. "Multiple anthropogenic pressures eliminate the effects of soil microbial diversity on ecosystem functions in experimental microcosms," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    11. Efrat Muller & Itamar Shiryan & Elhanan Borenstein, 2024. "Multi-omic integration of microbiome data for identifying disease-associated modules," Nature Communications, Nature, vol. 15(1), pages 1-13, December.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.