Printed from https://ideas.repec.org/a/nat/natcom/v15y2024i1d10.1038_s41467-024-49094-3.html

Restricting datasets to classifiable samples augments discovery of immune disease biomarkers

Author

Listed:
  • Gunther Glehr

    (University Hospital Regensburg)

  • Paloma Riquelme

    (University Hospital Regensburg)

  • Katharina Kronenberg

    (University Hospital Regensburg)

  • Robert Lohmayer

    (Leibniz Institute for Immunotherapy)

  • Víctor J. López-Madrona

    (Inst Neurosci Syst)

  • Michael Kapinsky

    (Beckman Coulter Life Sciences GmbH)

  • Hans J. Schlitt

    (University Hospital Regensburg)

  • Edward K. Geissler

    (University Hospital Regensburg)

  • Rainer Spang

    (University of Regensburg)

  • Sebastian Haferkamp

    (University Hospital Regensburg)

  • James A. Hutchinson

    (University Hospital Regensburg)

Abstract

Immunological diseases are typically heterogeneous in clinical presentation, severity and response to therapy. Biomarkers of immune diseases often reflect this variability, especially compared to their regulated behaviour in health. This leads to a common difficulty that frustrates biomarker discovery and interpretation – namely, unequal dispersion of immune disease biomarker expression between patient classes necessarily limits a biomarker’s informative range. To solve this problem, we introduce dataset restriction, a procedure that splits datasets into classifiable and unclassifiable samples. Applied to synthetic flow cytometry data, restriction identifies biomarkers that are otherwise disregarded. In advanced melanoma, restriction finds biomarkers of immune-related adverse event risk after immunotherapy and enables us to build multivariate models that accurately predict immunotherapy-related hepatitis. Hence, dataset restriction augments discovery of immune disease biomarkers, increases predictive certainty for classifiable samples and improves multivariate models incorporating biomarkers with a limited informative range. This principle can be directly extended to any classification task.
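The idea of restricting a dataset to its classifiable samples can be illustrated with a minimal Python sketch. This is not the authors' published procedure; it is a simplified, assumed variant for a single biomarker: scan candidate thresholds, keep only the samples on one side of the threshold, and report the restriction whose AUC deviates most from 0.5. The function names `auc` and `restrict_biomarker` and the parameter `min_per_class` are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional, Tuple


def auc(pos: List[float], neg: List[float]) -> float:
    """Fraction of (positive, negative) pairs where the positive is larger.

    Ties count as half a win. This is the Mann-Whitney estimate of the AUC.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def restrict_biomarker(
    values: List[float],
    labels: List[int],
    min_per_class: int = 3,  # assumed safeguard against tiny subsets
) -> Tuple[str, Optional[float], float, List[int]]:
    """Return (side, threshold, auc, kept_indices) for the best restriction.

    side is "all" (no restriction), "high" (keep values >= threshold)
    or "low" (keep values <= threshold). Samples outside kept_indices are
    treated as unclassifiable by this biomarker.
    """
    everyone = list(range(len(values)))
    pos_all = [v for v, y in zip(values, labels) if y == 1]
    neg_all = [v for v, y in zip(values, labels) if y == 0]
    best = ("all", None, auc(pos_all, neg_all), everyone)

    for t in sorted(set(values)):
        for side in ("high", "low"):
            keep = [
                i for i, v in enumerate(values)
                if (v >= t if side == "high" else v <= t)
            ]
            pos = [values[i] for i in keep if labels[i] == 1]
            neg = [values[i] for i in keep if labels[i] == 0]
            if len(pos) < min_per_class or len(neg) < min_per_class:
                continue  # too few samples to estimate an AUC
            a = auc(pos, neg)
            # Keep the restriction that separates the classes best.
            if abs(a - 0.5) > abs(best[2] - 0.5):
                best = (side, t, a, keep)
    return best
```

For example, with controls at 1 to 6 and widely dispersed cases at 1.5, 2.5, 3.5, 7, 8 and 9, the unrestricted AUC is modest, while restricting to values of at least 4 separates the classes completely. Because the threshold is selected to maximise the statistic, the restricted AUC is optimistically biased; a real analysis would need to account for that selection, for instance with a permutation test.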

Suggested Citation

  • Gunther Glehr & Paloma Riquelme & Katharina Kronenberg & Robert Lohmayer & Víctor J. López-Madrona & Michael Kapinsky & Hans J. Schlitt & Edward K. Geissler & Rainer Spang & Sebastian Haferkamp & James A. Hutchinson, 2024. "Restricting datasets to classifiable samples augments discovery of immune disease biomarkers," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-49094-3
    DOI: 10.1038/s41467-024-49094-3

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-49094-3
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pooja Middha & Rohit Thummalapalli & Michael J. Betti & Lydia Yao & Zoe Quandt & Karmugi Balaratnam & Cosmin A. Bejan & Eduardo Cardenas & Christina J. Falcon & David M. Faleck & Matthew A. Gubens & S, 2024. "Polygenic risk score for ulcerative colitis predicts immune checkpoint inhibitor-mediated colitis," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    2. Masha Shunko & Julie Niederhoff & Yaroslav Rosokha, 2018. "Humans Are Not Machines: The Behavioral Impact of Queueing Design on Service Time," Management Science, INFORMS, vol. 64(1), pages 453-473, January.
    3. Peterson, A. Townsend & Papeş, Monica & Soberón, Jorge, 2008. "Rethinking receiver operating characteristic analysis applications in ecological niche modeling," Ecological Modelling, Elsevier, vol. 213(1), pages 63-72.
    4. Margaret Sullivan Pepe & Tianxi Cai, 2004. "The Analysis of Placement Values for Evaluating Discriminatory Measures," Biometrics, The International Biometric Society, vol. 60(2), pages 528-535, June.
    5. Romero, Julian & Rosokha, Yaroslav, 2018. "Constructing strategies in the indefinitely repeated prisoner’s dilemma game," European Economic Review, Elsevier, vol. 104(C), pages 185-219.
    6. Silke Janitza & Ender Celik & Anne-Laure Boulesteix, 2018. "A computationally fast variable importance test for random forests for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(4), pages 885-915, December.
    7. Man-Jen Hsu & Huey-Miin Hsueh, 2013. "The linear combinations of biomarkers which maximize the partial area under the ROC curves," Computational Statistics, Springer, vol. 28(2), pages 647-666, April.
    8. Ross J Burton & Raya Ahmed & Simone M Cuff & Sarah Baker & Andreas Artemiou & Matthias Eberl, 2021. "CytoPy: An autonomous cytometry analysis framework," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-21, June.
    9. Soutik Ghosal & Zhen Chen, 2022. "Discriminatory Capacity of Prenatal Ultrasound Measures for Large-for-Gestational-Age Birth: A Bayesian Approach to ROC Analysis Using Placement Values," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 14(1), pages 1-22, April.
    10. Holly Janes & Gary Longton & Margaret S. Pepe, 2009. "Accommodating covariates in receiver operating characteristic analysis," Stata Journal, StataCorp LP, vol. 9(1), pages 17-39, March.
    11. Angela L. Riffo-Campos & Guillermo Ayala & Juan Domingo, 2021. "Ordering of Omics Features Using Beta Distributions on Montecarlo p -Values," Mathematics, MDPI, vol. 9(11), pages 1-18, June.
    12. De Capitani, L. & De Martini, D., 2011. "On stochastic orderings of the Wilcoxon Rank Sum test statistic--With applications to reproducibility probability estimation testing," Statistics & Probability Letters, Elsevier, vol. 81(8), pages 937-946, August.
    13. Baddeley, Adrian & Hardegen, Andrew & Lawrence, Thomas & Milne, Robin K. & Nair, Gopalan & Rakshit, Suman, 2017. "On two-stage Monte Carlo tests of composite hypotheses," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 75-87.
    14. Gigliarano, Chiara & Figini, Silvia & Muliere, Pietro, 2014. "Making classifier performance comparisons when ROC curves intersect," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 300-312.
    15. Yu, Wenbao & Park, Taesung, 2015. "Two simple algorithms on linear combination of multiple biomarkers to maximize partial area under the ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 88(C), pages 15-27.
    16. Jesse Hemerik & Jelle J. Goeman, 2021. "Another Look at the Lady Tasting Tea and Differences Between Permutation Tests and Randomisation Tests," International Statistical Review, International Statistical Institute, vol. 89(2), pages 367-381, August.
    17. Fabian J.E. Telschow & Michael R. Pierrynowski & Stephan F. Huckemann, 2021. "Functional inference on rotational curves under sample‐specific group actions and identification of human gait," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(4), pages 1256-1276, December.
    18. Chunpeng Fan & Donghui Zhang & Cun-Hui Zhang, 2011. "On Sample Size of the Kruskal–Wallis Test with Application to a Mouse Peritoneal Cavity Study," Biometrics, The International Biometric Society, vol. 67(1), pages 213-224, March.
    19. Hivert, Benjamin & Agniel, Denis & Thiébaut, Rodolphe & Hejblum, Boris P., 2024. "Post-clustering difference testing: Valid inference and practical considerations with applications to ecological and biological data," Computational Statistics & Data Analysis, Elsevier, vol. 193(C).
    20. Yousef, Waleed A., 2013. "Assessing classifiers in terms of the partial area under the ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 64(C), pages 51-70.
