
Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods

Authors

  • Chao Chen
  • Kay Grennan
  • Judith Badner
  • Dandan Zhang
  • Elliot Gershon
  • Li Jin
  • Chunyu Liu

Abstract

The expression microarray is a frequently used approach to study gene expression on a genome-wide scale. However, the data produced by the thousands of microarray studies published annually are confounded by “batch effects,” the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. A number of programs are now available to adjust microarray data for batch effects prior to analysis. We systematically evaluated six of these programs using multiple measures of precision, accuracy and overall performance. ComBat, an Empirical Bayes method, outperformed the other five programs by most metrics. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, due to a sizeable probe effect in microarray data that can inflate the correlation among replicates and unrelated samples.
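The two technical points in the abstract can be illustrated with a small NumPy sketch on simulated data. The function names (`standardize_probes`, `adjust_batches`) and the simulation settings are hypothetical and are not taken from the paper. `standardize_probes` z-scores each probe across samples, which removes the probe effect that inflates correlations between unrelated samples. `adjust_batches` is a deliberately simplified per-probe location/scale correction of the additive and multiplicative batch effects that ComBat models. It omits ComBat's empirical Bayes shrinkage and covariate handling, so it is not ComBat and should not be used in its place.

```python
import numpy as np


def standardize_probes(expr):
    """Z-score each probe (row) across samples.

    Removes the probe effect: differences in baseline level and spread
    between probes that are shared by every sample on the array.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    return (expr - mean) / sd


def adjust_batches(expr, batches):
    """Simplified per-probe location/scale batch correction (hypothetical helper).

    Within each batch, every probe is centred on its batch mean, scaled by its
    batch SD, then mapped back to the overall probe mean and the pooled
    within-batch SD. Unlike ComBat, there is no empirical Bayes shrinkage and
    no covariates, so biology confounded with batch would be removed too.
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    levels = np.unique(batches)
    grand_mean = expr.mean(axis=1, keepdims=True)

    # Residuals around each probe's batch-specific mean.
    resid = np.empty_like(expr)
    for b in levels:
        cols = batches == b
        resid[:, cols] = expr[:, cols] - expr[:, cols].mean(axis=1, keepdims=True)
    pooled_sd = np.sqrt((resid ** 2).sum(axis=1, keepdims=True)
                        / (expr.shape[1] - len(levels)))

    # Rescale each batch to the pooled SD and re-centre on the grand mean.
    out = np.empty_like(expr)
    for b in levels:
        cols = batches == b
        batch_sd = resid[:, cols].std(axis=1, ddof=1, keepdims=True)
        out[:, cols] = grand_mean + resid[:, cols] / batch_sd * pooled_sd
    return out


# Simulated data: 1000 probes x 20 unrelated samples with a strong probe effect.
rng = np.random.default_rng(0)
n_probes, n_samples = 1000, 20
probe_means = rng.normal(8.0, 2.0, size=(n_probes, 1))
expr = probe_means + rng.normal(0.0, 0.5, size=(n_probes, n_samples))

# Raw correlation between two unrelated samples is inflated by the shared probe means.
raw_r = np.corrcoef(expr[:, 0], expr[:, 1])[0, 1]
z = standardize_probes(expr)
std_r = np.corrcoef(z[:, 0], z[:, 1])[0, 1]
print(f"raw r = {raw_r:.2f}, probe-standardized r = {std_r:.2f}")

# Add a per-probe additive shift to the second batch of 10 samples, then adjust.
batches = np.repeat([0, 1], n_samples // 2)
expr_b = expr.copy()
expr_b[:, batches == 1] += rng.normal(1.0, 0.3, size=(n_probes, 1))
adjusted = adjust_batches(expr_b, batches)
```

Because the simulated samples are independent, any strong correlation in the raw data comes only from the shared probe means; after probe-level standardization it falls to roughly zero. For real analyses, an established implementation such as `ComBat` in the Bioconductor `sva` package is the appropriate tool.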

Suggested Citation

  • Chao Chen & Kay Grennan & Judith Badner & Dandan Zhang & Elliot Gershon & Li Jin & Chunyu Liu, 2011. "Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods," PLOS ONE, Public Library of Science, vol. 6(2), pages 1-10, February.
  • Handle: RePEc:plo:pone00:0017238
    DOI: 10.1371/journal.pone.0017238

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017238
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0017238&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0017238?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. M. Kathleen Kerr, 2003. "Design Considerations for Efficient and Effective Microarray Studies," Biometrics, The International Biometric Society, vol. 59(4), pages 822-828, December.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Xia Qing & Thompson Jeffrey A. & Koestler Devin C., 2021. "Batch effect reduction of microarray data with dependent samples using an empirical Bayes approach (BRIDGE)," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 20(4-6), pages 101-119, December.
    2. Charlotte Soneson & Sarah Gerster & Mauro Delorenzi, 2014. "Batch Effect Confounding Leads to Strong Bias in Performance Estimates Obtained by Cross-Validation," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-13, June.
    3. Romain Banchereau & Alejandro Jordan-Villegas & Monica Ardura & Asuncion Mejias & Nicole Baldwin & Hui Xu & Elizabeth Saye & Jose Rossello-Urgell & Phuong Nguyen & Derek Blankenship & Clarence B Creec, 2012. "Host Immune Transcriptional Profiles Reflect the Variability in Clinical Disease Manifestations in Patients with Staphylococcus aureus Infections," PLOS ONE, Public Library of Science, vol. 7(4), pages 1-11, April.
    4. Qi Su & Qin Liu & Raphaela Iris Lau & Jingwan Zhang & Zhilu Xu & Yun Kit Yeoh & Thomas W. H. Leung & Whitney Tang & Lin Zhang & Jessie Q. Y. Liang & Yuk Kam Yau & Jiaying Zheng & Chengyu Liu & Mengjin, 2022. "Faecal microbiome-based machine learning for multi-class disease diagnosis," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    5. Sean M Gibbons & Claire Duvallet & Eric J Alm, 2018. "Correcting for batch effects in case-control microbiome studies," PLOS Computational Biology, Public Library of Science, vol. 14(4), pages 1-17, April.
    6. Nazifa Ahmed Moumi & Badhan Das & Zarin Tasnim Promi & Nishat Anjum Bristy & Md Shamsuzzoha Bayzid, 2019. "Quartet-based inference of cell differentiation trees from ChIP-Seq histone modification data," PLOS ONE, Public Library of Science, vol. 14(9), pages 1-25, September.
    7. Jacopo Umberto Verga & Matthew Huff & Diarmuid Owens & Bethany J. Wolf & Gary Hardiman, 2022. "Integrated Genomic and Bioinformatics Approaches to Identify Molecular Links between Endocrine Disruptors and Adverse Outcomes," IJERPH, MDPI, vol. 19(1), pages 1-24, January.
    8. Aline Talhouk & Stefan Kommoss & Robertson Mackenzie & Martin Cheung & Samuel Leung & Derek S Chiu & Steve E Kalloger & David G Huntsman & Stephanie Chen & Maria Intermaggio & Jacek Gronwald & Fong C , 2016. "Single-Patient Molecular Testing with NanoString nCounter Data Using a Reference-Based Strategy for Batch Effect Correction," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-18, April.
    9. Raihan K Uddin & Shiva M Singh, 2013. "Hippocampal Gene Expression Meta-Analysis Identifies Aging and Age-Associated Spatial Learning Impairment (ASLI) Genes and Pathways," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-16, July.
    10. Kejian Wang & Jiazhi Sun & Shufeng Zhou & Chunling Wan & Shengying Qin & Can Li & Lin He & Lun Yang, 2013. "Prediction of Drug-Target Interactions for Drug Repositioning Only Based on Genomic Expression Similarity," PLOS Computational Biology, Public Library of Science, vol. 9(11), pages 1-9, November.
    11. Samir Dou & Nathalie Villa-Vialaneix & Laurence Liaubet & Yvon Billon & Mario Giorgi & Hélène Gilbert & Jean-Luc Gourdine & Juliette Riquet & David Renaudeau, 2017. "1HNMR-Based metabolomic profiling method to develop plasma biomarkers for sensitivity to chronic heat stress in growing pigs," PLOS ONE, Public Library of Science, vol. 12(11), pages 1-18, November.
    12. Christian Müller & Arne Schillert & Caroline Röthemeier & David-Alexandre Trégouët & Carole Proust & Harald Binder & Norbert Pfeiffer & Manfred Beutel & Karl J Lackner & Renate B Schnabel & Laurence T, 2016. "Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data," PLOS ONE, Public Library of Science, vol. 11(6), pages 1-23, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kerr Kathleen F., 2012. "Optimality Criteria for the Design of 2-Color Microarray Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(1), pages 1-9, January.
    2. Landgrebe, Jobst & Bretz, Frank & Brunner, Edgar, 2006. "Efficient design and analysis of two colour factorial microarray experiments," Computational Statistics & Data Analysis, Elsevier, vol. 50(2), pages 499-517, January.
    3. Oskar Bruning & Wendy Rodenburg & Paul F K Wackers & Conny van Oostrom & Martijs J Jonker & Rob J Dekker & Han Rauwerda & Wim A Ensink & Annemieke de Vries & Timo M Breit, 2016. "Confounding Factors in the Transcriptome Analysis of an In-Vivo Exposure Experiment," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-23, January.
    4. Agnes Herzberg & Richard Jarrett, 2007. "A-Optimal Block Designs with Additional Singly Replicated Treatments," Journal of Applied Statistics, Taylor & Francis Journals, vol. 34(1), pages 61-70.
    5. Zhang Runchu & Mukerjee Rahul, 2013. "Highly efficient factorial designs for cDNA microarray experiments: use of approximate theory together with a step-up step-down procedure," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(4), pages 489-503, August.
    6. R. A. Bailey, 2007. "Designs for two‐colour microarray experiments," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 56(4), pages 365-394, August.
    7. Frédéric Reynier & Fabien Petit & Malick Paye & Fanny Turrel-Davin & Pierre-Emmanuel Imbert & Arnaud Hot & Bruno Mougin & Pierre Miossec, 2011. "Importance of Correlation between Gene Expression Levels: Application to the Type I Interferon Signature in Rheumatoid Arthritis," PLOS ONE, Public Library of Science, vol. 6(10), pages 1-8, October.
    8. Richard G. Jarrett & Katya Ruggiero, 2008. "Design and Analysis of Two-Phase Experiments for Gene Expression Microarrays—Part I," Biometrics, The International Biometric Society, vol. 64(1), pages 208-216, March.
