Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-33071-9.html

Batch effects removal for microbiome data via conditional quantile regression

Authors
  • Wodan Ling

    (Fred Hutchinson Cancer Center)

  • Jiuyao Lu

    (Johns Hopkins Bloomberg School of Public Health)

  • Ni Zhao

    (Johns Hopkins Bloomberg School of Public Health)

  • Anju Lulla

    (University of North Carolina)

  • Anna M. Plantinga

    (Williams College)

  • Weijia Fu

    (University of Washington)

  • Angela Zhang

    (Fred Hutchinson Cancer Center
    University of Washington)

  • Hongjiao Liu

    (Fred Hutchinson Cancer Center
    University of Washington)

  • Hoseung Song

    (Fred Hutchinson Cancer Center)

  • Zhigang Li

    (University of Florida)

  • Jun Chen

    (Mayo Clinic)

  • Timothy W. Randolph

    (Fred Hutchinson Cancer Center)

  • Wei Li A. Koay

    (Children’s National Hospital
    George Washington University)

  • James R. White

    (Resphera Biosciences)

  • Lenore J. Launer

    (Laboratory of Epidemiology and Population Science, NIA)

  • Anthony A. Fodor

    (University of North Carolina at Charlotte)

  • Katie A. Meyer

    (University of North Carolina)

  • Michael C. Wu

    (Fred Hutchinson Cancer Center
    University of Washington)

Abstract

Batch effects in microbiome data arise from differential processing of specimens and can lead to spurious findings and obscure true signals. Strategies designed for genomic data to mitigate batch effects usually fail to address the zero-inflated and over-dispersed microbiome data. Most strategies tailored for microbiome data are restricted to association testing or specialized study designs, failing to allow other analytic goals or general designs. Here, we develop the Conditional Quantile Regression (ConQuR) approach to remove microbiome batch effects using a two-part quantile regression model. ConQuR is a comprehensive method that accommodates the complex distributions of microbial read counts by non-parametric modeling, and it generates batch-removed zero-inflated read counts that can be used in and benefit usual subsequent analyses. We apply ConQuR to simulated and real microbiome datasets and demonstrate its advantages in removing batch effects while preserving the signals of interest.
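The two-part idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes a single taxon, ignores covariates, and uses per-batch empirical distributions where ConQuR fits regression models. The function names (`composite_level`, `composite_quantile`, `align_to_reference`) and the midpoint convention for zeros are hypothetical choices made for this illustration. Part one is the probability of a zero count, part two is the quantiles of the non-zero counts; each observation is located at a percentile level in its own batch and replaced by the count at the same level in a reference batch.

```python
import numpy as np


def composite_level(x, p_zero, pos_sorted):
    """Percentile level of count x in a zero-inflated empirical distribution."""
    if x == 0:
        # Toy convention (an assumption): place zeros at the midpoint of the zero mass.
        return p_zero / 2.0
    below = np.searchsorted(pos_sorted, x, side="left")
    at_or_below = np.searchsorted(pos_sorted, x, side="right")
    mid_rank = (below + at_or_below) / 2.0  # mid-rank handles ties
    return p_zero + (1.0 - p_zero) * mid_rank / len(pos_sorted)


def composite_quantile(tau, p_zero, pos_sorted):
    """Count at level tau: zero inside the zero mass, else a quantile of the positives."""
    if tau <= p_zero or len(pos_sorted) == 0:
        return 0
    level = (tau - p_zero) / (1.0 - p_zero)
    level = min(max(level, 0.0), 1.0)
    return int(np.rint(np.quantile(pos_sorted, level)))


def align_to_reference(counts, batch, ref_batch):
    """Map each non-reference batch onto the reference batch's distribution."""
    counts = np.asarray(counts)
    batch = np.asarray(batch)
    ref = counts[batch == ref_batch]
    ref_p_zero = np.mean(ref == 0)       # part 1: zero probability in the reference
    ref_pos = np.sort(ref[ref > 0])      # part 2: non-zero counts in the reference
    corrected = counts.copy()
    for b in np.unique(batch):
        if b == ref_batch:
            continue
        idx = np.flatnonzero(batch == b)
        vals = counts[idx]
        p_zero = np.mean(vals == 0)
        pos = np.sort(vals[vals > 0])
        for i in idx:
            tau = composite_level(counts[i], p_zero, pos)
            corrected[i] = composite_quantile(tau, ref_p_zero, ref_pos)
    return corrected


if __name__ == "__main__":
    # Batch "B" has the same zero fraction as "A" but inflated non-zero counts.
    counts = [0, 0, 1, 2, 3, 4, 0, 0, 2, 4, 6, 8]
    batch = ["A"] * 6 + ["B"] * 6
    print(align_to_reference(counts, batch, ref_batch="A"))
```

Because the corrected values are again zero-inflated integer counts, they can be passed to the same downstream analyses as the raw data, which is the property the abstract emphasizes. A covariate-aware version would replace the two empirical summaries with fitted conditional models.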

Suggested Citation

  • Wodan Ling & Jiuyao Lu & Ni Zhao & Anju Lulla & Anna M. Plantinga & Weijia Fu & Angela Zhang & Hongjiao Liu & Hoseung Song & Zhigang Li & Jun Chen & Timothy W. Randolph & Wei Li A. Koay & James R. Whi, 2022. "Batch effects removal for microbiome data via conditional quantile regression," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-33071-9
    DOI: 10.1038/s41467-022-33071-9

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-33071-9
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Machado, Jose A.F. & Silva, J. M. C. Santos, 2005. "Quantiles for Counts," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 1226-1237, December.
    2. Koenker, Roger W & Bassett, Gilbert, Jr, 1978. "Regression Quantiles," Econometrica, Econometric Society, vol. 46(1), pages 33-50, January.
    3. András Maifeld & Hendrik Bartolomaeus & Ulrike Löber & Ellen G. Avery & Nico Steckhan & Lajos Markó & Nicola Wilck & Ibrahim Hamad & Urša Šušnjar & Anja Mähler & Christoph Hohmann & Chia-Yu Chen & Hol, 2021. "Fasting alters the gut microbiome reducing blood pressure and body weight in metabolic syndrome patients," Nature Communications, Nature, vol. 12(1), pages 1-20, December.
    4. Peter J. Turnbaugh & Micah Hamady & Tanya Yatsunenko & Brandi L. Cantarel & Alexis Duncan & Ruth E. Ley & Mitchell L. Sogin & William J. Jones & Bruce A. Roe & Jason P. Affourtit & Michael Egholm & Be, 2009. "A core gut microbiome in obese and lean twins," Nature, Nature, vol. 457(7228), pages 480-484, January.
    5. Duan, Naihua, et al, 1983. "A Comparison of Alternative Models for the Demand for Medical Care," Journal of Business & Economic Statistics, American Statistical Association, vol. 1(2), pages 115-126, April.
    6. Junjie Qin & Yingrui Li & Zhiming Cai & Shenghui Li & Jianfeng Zhu & Fan Zhang & Suisha Liang & Wenwei Zhang & Yuanlin Guan & Dongqian Shen & Yangqing Peng & Dongya Zhang & Zhuye Jie & Wenxian Wu & Yo, 2012. "A metagenome-wide association study of gut microbiota in type 2 diabetes," Nature, Nature, vol. 490(7418), pages 55-60, October.
    7. Sean M Gibbons & Claire Duvallet & Eric J Alm, 2018. "Correcting for batch effects in case-control microbiome studies," PLOS Computational Biology, Public Library of Science, vol. 14(4), pages 1-17, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Machado, José A.F. & Santos Silva, J.M.C. & Wei, Kehai, 2016. "Quantiles, corners, and the extensive margin of trade," European Economic Review, Elsevier, vol. 89(C), pages 73-84.
    2. Paul Hewson & Keming Yu, 2008. "Quantile regression for binary performance indicators," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 24(5), pages 401-418, September.
    3. Victor Chernozhukov & Iván Fernández-Val & Blaise Melly & Kaspar Wüthrich, 2020. "Generic Inference on Quantile and Quantile Effect Functions for Discrete Outcomes," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(529), pages 123-137, January.
    4. Cariou, Pierre & Wolff, Francois-Charles, 2015. "Identifying substandard vessels through Port State Control inspections: A new methodology for Concentrated Inspection Campaigns," Marine Policy, Elsevier, vol. 60(C), pages 27-39.
    5. Paul Contoyannis & Jinhu Li, 2017. "The dynamics of adolescent depression: an instrumental variable quantile regression with fixed effects approach," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 180(3), pages 907-922, June.
    6. Henry R. Scharf & Xinyi Lu & Perry J. Williams & Mevin B. Hooten, 2022. "Constructing Flexible, Identifiable and Interpretable Statistical Models for Binary Data," International Statistical Review, International Statistical Institute, vol. 90(2), pages 328-345, August.
    7. Guodong Li & Yang Li & Chih-Ling Tsai, 2015. "Quantile Correlations and Quantile Autoregressive Modeling," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 246-261, March.
    8. Jay Dev Dubey, 2021. "Measuring Income Elasticity of Healthcare-Seeking Behavior in India: A Conditional Quantile Regression Approach," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 19(4), pages 767-793, December.
    9. Luke B. Smith & Brian J. Reich & Amy H. Herring & Peter H. Langlois & Montserrat Fuentes, 2015. "Multilevel quantile function modeling with application to birth outcomes," Biometrics, The International Biometric Society, vol. 71(2), pages 508-519, June.
    10. Andrew Chesher, 2005. "Nonparametric Identification under Discrete Variation," Econometrica, Econometric Society, vol. 73(5), pages 1525-1550, September.
    11. S. Ghasemzadeh & M. Ganjali & T. Baghfalaki, 2022. "Quantile regression via the EM algorithm for joint modeling of mixed discrete and continuous data based on Gaussian copula," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 31(5), pages 1181-1202, December.
    12. Gustav Kjellsson & Dennis Petrie & Tom (T.G.M.) van Ourti, 2018. "Measuring income-related inequalities in risky health prospects," Tinbergen Institute Discussion Papers 18-007/V, Tinbergen Institute.
    13. Barnes, Kayleigh & Mukherji, Arnab & Mullen, Patrick & Sood, Neeraj, 2017. "Financial risk protection from social health insurance," Journal of Health Economics, Elsevier, vol. 55(C), pages 14-29.
    14. Paolo Frumento & Nicola Salvati, 2021. "Parametric modeling of quantile regression coefficient functions with count data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1237-1258, October.
    15. Machado, Jose A F & Santos Silva, Joao M C, 2008. "Quantiles for Fractions and Other Mixed Data," Economics Discussion Papers 3550, University of Essex, Department of Economics.
    16. Giovanni Dosi & Dario Guarascio & Andrea Ricci & Maria Enrica Virgillito, 2021. "Neodualism in the Italian business firms: training, organizational capabilities, and productivity distributions," Small Business Economics, Springer, vol. 57(1), pages 167-189, June.
    17. Alguacil, Maite & Martí, Josep & Orts, Vicente, 2017. "Firm heterogeneity and the market scope of European multinational activity," International Review of Economics & Finance, Elsevier, vol. 51(C), pages 645-659.
    18. Stijn Kelchtermans & Reinhilde Veugelers, 2011. "The great divide in scientific productivity: why the average scientist does not exist," Industrial and Corporate Change, Oxford University Press and the Associazione ICC, vol. 20(1), pages 295-336, February.
    19. Xuejun Jiang & Yunxian Li & Aijun Yang & Ruowei Zhou, 2020. "Bayesian semiparametric quantile regression modeling for estimating earthquake fatality risk," Empirical Economics, Springer, vol. 58(5), pages 2085-2103, May.
    20. Fuzi, Mohd Fadzli Mohd & Jemain, Abdul Aziz & Ismail, Noriszura, 2016. "Bayesian quantile regression model for claim count data," Insurance: Mathematics and Economics, Elsevier, vol. 66(C), pages 124-137.
