Printed from https://ideas.repec.org/a/nat/natcom/v15y2024i1d10.1038_s41467-024-49094-3.html

Restricting datasets to classifiable samples augments discovery of immune disease biomarkers

Author

Listed:
  • Gunther Glehr (University Hospital Regensburg)
  • Paloma Riquelme (University Hospital Regensburg)
  • Katharina Kronenberg (University Hospital Regensburg)
  • Robert Lohmayer (Leibniz Institute for Immunotherapy)
  • Víctor J. López-Madrona (Inst Neurosci Syst)
  • Michael Kapinsky (Beckman Coulter Life Sciences GmbH)
  • Hans J. Schlitt (University Hospital Regensburg)
  • Edward K. Geissler (University Hospital Regensburg)
  • Rainer Spang (University of Regensburg)
  • Sebastian Haferkamp (University Hospital Regensburg)
  • James A. Hutchinson (University Hospital Regensburg)

Abstract

Immunological diseases are typically heterogeneous in clinical presentation, severity and response to therapy. Biomarkers of immune diseases often reflect this variability, especially compared to their regulated behaviour in health. This leads to a common difficulty that frustrates biomarker discovery and interpretation – namely, unequal dispersion of immune disease biomarker expression between patient classes necessarily limits a biomarker’s informative range. To solve this problem, we introduce dataset restriction, a procedure that splits datasets into classifiable and unclassifiable samples. Applied to synthetic flow cytometry data, restriction identifies biomarkers that are otherwise disregarded. In advanced melanoma, restriction finds biomarkers of immune-related adverse event risk after immunotherapy and enables us to build multivariate models that accurately predict immunotherapy-related hepatitis. Hence, dataset restriction augments discovery of immune disease biomarkers, increases predictive certainty for classifiable samples and improves multivariate models incorporating biomarkers with a limited informative range. This principle can be directly extended to any classification task.
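
As a rough illustration of the idea described in the abstract, and not the authors' implementation, the Python sketch below compares a biomarker's AUC on a full toy dataset with its AUC after keeping only samples above a cut-off. The values, the function names `auc` and `restrict`, and the threshold of 5 are all assumptions made for this example. Controls are tightly clustered while cases are widely dispersed, which mimics the unequal dispersion the abstract describes.

```python
# Toy illustration of dataset restriction; all names and values are hypothetical.

def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs in which the case
    scores higher than the control; ties count as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def restrict(scores, labels, threshold, keep="high"):
    """Keep samples at or above (keep="high") or at or below (keep="low")
    the threshold; the kept samples play the role of the classifiable set."""
    if keep not in ("high", "low"):
        raise ValueError("keep must be 'high' or 'low'")
    kept = [
        (s, y)
        for s, y in zip(scores, labels)
        if (s >= threshold if keep == "high" else s <= threshold)
    ]
    return [s for s, _ in kept], [y for _, y in kept]


# Controls (label 0) are tightly regulated; cases (label 1) are dispersed.
control_values = [4, 5, 5, 6, 6, 7]
case_values = [1, 2, 3, 5, 6, 8, 9, 10]
scores = control_values + case_values
labels = [0] * len(control_values) + [1] * len(case_values)

full_auc = auc(scores, labels)
high_scores, high_labels = restrict(scores, labels, threshold=5, keep="high")
restricted_auc = auc(high_scores, high_labels)

print(f"AUC on all {len(scores)} samples: {full_auc:.2f}")
print(f"AUC on {len(high_scores)} samples with value >= 5: {restricted_auc:.2f}")
```

On this toy data the unrestricted AUC is 0.50, so the biomarker looks uninformative, whereas the AUC among the ten samples with values of 5 or more is 0.76. How a restriction value should be chosen in practice is not addressed by this sketch.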

Suggested Citation

  • Gunther Glehr & Paloma Riquelme & Katharina Kronenberg & Robert Lohmayer & Víctor J. López-Madrona & Michael Kapinsky & Hans J. Schlitt & Edward K. Geissler & Rainer Spang & Sebastian Haferkamp & Jame, 2024. "Restricting datasets to classifiable samples augments discovery of immune disease biomarkers," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-49094-3
    DOI: 10.1038/s41467-024-49094-3

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-49094-3
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Andrew Cron & Cécile Gouttefangeas & Jacob Frelinger & Lin Lin & Satwinder K Singh & Cedrik M Britten & Marij J P Welters & Sjoerd H van der Burg & Mike West & Cliburn Chan, 2013. "Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples," PLOS Computational Biology, Public Library of Science, vol. 9(7), pages 1-14, July.
    2. Thomas Liechti & Yaser Iftikhar & Massimo Mangino & Margaret Beddall & Charles W. Goss & Jane A. O’Halloran & Philip A. Mudd & Mario Roederer, 2022. "Immune phenotypes that are associated with subsequent COVID-19 severity inferred from post-recovery samples," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    3. Belinda Phipson & Gordon K. Smyth, 2010. "Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-16, October.
    4. Lori E. Dodd & Margaret S. Pepe, 2003. "Partial AUC Estimation and Regression," Biometrics, The International Biometric Society, vol. 59(3), pages 614-623, September.
    5. B. Rosner & R. J. Glynn, 2009. "Power and Sample Size Estimation for the Wilcoxon Rank Sum Test with Application to Comparisons of C Statistics from Alternative Prediction Models," Biometrics, The International Biometric Society, vol. 65(1), pages 188-197, March.
    6. Eirini Arvaniti & Manfred Claassen, 2017. "Sensitive detection of rare disease-associated cell subsets via representation learning," Nature Communications, Nature, vol. 8(1), pages 1-10, April.
    7. Zia Khan & Christian Hammer & Jonathan Carroll & Flavia Nucci & Sergio Ley Acosta & Vidya Maiya & Tushar Bhangale & Julie Hunkapiller & Ira Mellman & Matthew L. Albert & Mark I. McCarthy & G. Scott Ch, 2021. "Genetic variation associated with thyroid autoimmunity shapes the systemic immune response to PD-1 checkpoint blockade," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    8. Michael Conroy & Jarushka Naidoo, 2022. "Immune-related adverse events and the balancing act of immunotherapy," Nature Communications, Nature, vol. 13(1), pages 1-4, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pooja Middha & Rohit Thummalapalli & Michael J. Betti & Lydia Yao & Zoe Quandt & Karmugi Balaratnam & Cosmin A. Bejan & Eduardo Cardenas & Christina J. Falcon & David M. Faleck & Matthew A. Gubens & S, 2024. "Polygenic risk score for ulcerative colitis predicts immune checkpoint inhibitor-mediated colitis," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    2. Margaret Sullivan Pepe & Tianxi Cai, 2004. "The Analysis of Placement Values for Evaluating Discriminatory Measures," Biometrics, The International Biometric Society, vol. 60(2), pages 528-535, June.
    3. Holly Janes & Gary Longton & Margaret S. Pepe, 2009. "Accommodating covariates in receiver operating characteristic analysis," Stata Journal, StataCorp LP, vol. 9(1), pages 17-39, March.
    4. Saedis Saevarsdottir & Kristbjörg Bjarnadottir & Thorsteinn Markusson & Jonas Berglund & Thorunn A. Olafsdottir & Gisli H. Halldorsson & Gudrun Rutsdottir & Kristbjorg Gunnarsdottir & Asgeir Orn Arnth, 2024. "Start codon variant in LAG3 is associated with decreased LAG-3 expression and increased risk of autoimmune thyroid disease," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    5. Margaret S. Pepe & Gary Longton & Holly Janes, 2009. "Estimation and comparison of receiver operating characteristic curves," Stata Journal, StataCorp LP, vol. 9(1), pages 1-16, March.
    6. Mei-Cheng Wang & Shanshan Li, 2012. "Bivariate Marker Measurements and ROC Analysis," Biometrics, The International Biometric Society, vol. 68(4), pages 1207-1218, December.
    7. Merve Basol & Dincer Goksuluk & Ergun Karaagaoglu, 2023. "Comparing the diagnostic performance of methods used in a full-factorial design multi-reader multi-case studies," Computational Statistics, Springer, vol. 38(3), pages 1537-1553, September.
    8. Jialiang Li & Jason P. Fine, 2010. "Weighted area under the receiver operating characteristic curve and its application to gene selection," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 59(4), pages 673-692, August.
    9. Ángel Beade & Manuel Rodríguez & José Santos, 2024. "Business failure prediction models with high and stable predictive power over time using genetic programming," Operational Research, Springer, vol. 24(3), pages 1-41, September.
    10. Tianxi Cai & Yingye Zheng, 2007. "Model Checking for ROC Regression Analysis," Biometrics, The International Biometric Society, vol. 63(1), pages 152-163, March.
    11. Nimisha Chaturvedi & Renée X. de Menezes & Jelle J. Goeman & Wessel van Wieringen, 2018. "A test for detecting differential indirect trans effects between two groups of samples," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 17(5), pages 1-11, October.
    12. Julian Romero & Yaroslav Rosokha, 2018. "Constructing strategies in the indefinitely repeated prisoner’s dilemma game," European Economic Review, Elsevier, vol. 104(C), pages 185-219.
    13. Jolene S. Ranek & Wayne Stallaert & J. Justin Milner & Margaret Redick & Samuel C. Wolff & Adriana S. Beltran & Natalie Stanley & Jeremy E. Purvis, 2024. "DELVE: feature selection for preserving biological trajectories in single-cell data," Nature Communications, Nature, vol. 15(1), pages 1-26, December.
    14. Yuan Qi & Youhan Fang & David R Sinclair & Shangqin Guo & Meritxell Alberich-Jorda & Jun Lu & Daniel G Tenen & Michael G Kharas & Saumyadipta Pyne, 2020. "High-speed automatic characterization of rare events in flow cytometric data," PLOS ONE, Public Library of Science, vol. 15(2), pages 1-18, February.
    15. Greg Finak & Jacob Frelinger & Wenxin Jiang & Evan W Newell & John Ramey & Mark M Davis & Spyros A Kalams & Stephen C De Rosa & Raphael Gottardo, 2014. "OpenCyto: An Open Source Infrastructure for Scalable, Robust, Reproducible, and Automated, End-to-End Flow Cytometry Data Analysis," PLOS Computational Biology, Public Library of Science, vol. 10(8), pages 1-12, August.
    16. Sergio Picart-Armada & Steven J Barrett & David R Willé & Alexandre Perera-Lluna & Alex Gutteridge & Benoit H Dessailly, 2019. "Benchmarking network propagation methods for disease gene identification," PLOS Computational Biology, Public Library of Science, vol. 15(9), pages 1-24, September.
    17. Kristina Handler & Karsten Bach & Costanza Borrelli & Salvatore Piscuoglio & Xenia Ficht & Ilhan E. Acar & Andreas E. Moor, 2023. "Fragment-sequencing unveils local tissue microenvironments at single-cell resolution," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    18. Xiao Song & Xiao-Hua Zhou, 2004. "A Marginal Model Approach for Analysis of Multi-reader Multi-test Receiver Operating Characteristic (ROC) Data," UW Biostatistics Working Paper Series 1067, Berkeley Electronic Press.
    19. Ilenia Lovato & Alessia Pini & Aymeric Stamm & Simone Vantini, 2020. "Model-free two-sample test for network-valued data," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    20. Jesse Hemerik & Jelle Goeman, 2018. "Exact testing with random permutations," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 27(4), pages 811-825, December.

