Printed from https://ideas.repec.org/a/bpj/sagmbi/v5y2006i1n11.html

Issues of Processing and Multiple Testing of SELDI-TOF MS Proteomic Data

Author

Listed:
  • Birkner Merrill D.

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Hubbard Alan E.

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • van der Laan Mark J.

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Skibola Christine F.

    (Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley)

  • Hegedus Christine M.

    (Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley)

  • Smith Martyn T.

    (Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley)

Abstract

We describe a new data filtering method for SELDI-TOF MS proteomic spectra. We examined technical repeats (2 per subject) of intensity versus m/z (mass/charge) for bone marrow cell lysate from two groups of childhood leukemia patients: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). As others have noted, the choice of data processing, as well as experimental variability, can have a disproportionate impact on the list of "interesting" proteins (see Baggerly et al. (2004)). We propose a sequence of processing and multiple testing techniques: 1) correction for background drift; 2) filtering using smooth regression with cross-validated bandwidth selection; 3) peak finding; and 4) correction for multiple testing (van der Laan et al. (2005)). The result is a list of proteins (indexed by m/z) whose average expression differs significantly among disease (or treatment, etc.) groups. The procedures are intended to provide a sensible, statistically driven algorithm for producing this list. Given no sources of unmeasured bias (such as confounding of experimental conditions with disease status), proteins found to be statistically significant using this technique have a low probability of being false positives.
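
The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the estimators, so every function name and technique here is an assumption chosen for illustration. Drift correction is a rolling minimum, the smooth regression is a Gaussian-kernel (Nadaraya-Watson) smoother with leave-one-out cross-validation, peaks are plain local maxima, and multiple testing uses Holm's step-down adjustment followed by an augmentation step in the spirit of the cited generalized family-wise error rate work (reject the k next most significant hypotheses).

```python
import math
from statistics import NormalDist, mean, variance

def subtract_baseline(y, half_window):
    """Crude drift correction (assumed): subtract a rolling minimum."""
    n = len(y)
    return [y[i] - min(y[max(0, i - half_window):min(n, i + half_window + 1)])
            for i in range(n)]

def nw_estimate(x, y, x0, h, exclude=None):
    """Gaussian-kernel estimate at x0; `exclude` leaves one point out."""
    num = den = 0.0
    for j, (xj, yj) in enumerate(zip(x, y)):
        if j == exclude:
            continue
        w = math.exp(-0.5 * ((x0 - xj) / h) ** 2)
        num += w * yj
        den += w
    return num / den if den > 0 else 0.0

def loo_cv_bandwidth(x, y, grid):
    """Bandwidth in `grid` with the smallest leave-one-out squared error."""
    def cv_error(h):
        return sum((y[i] - nw_estimate(x, y, x[i], h, exclude=i)) ** 2
                   for i in range(len(x)))
    return min(grid, key=cv_error)

def find_peaks(y, min_height):
    """Indices of local maxima at or above min_height."""
    return [i for i in range(1, len(y) - 1)
            if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] >= min_height]

def welch_p_value(a, b):
    """Two-sided p-value for a mean difference (normal approximation)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 1.0 if mean(a) == mean(b) else 0.0
    z = abs(mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(z))

def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted

def augment_rejections(pvals, adjusted, alpha, k):
    """Holm rejections at level alpha plus the k next most significant."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    n_reject = sum(1 for p in adjusted if p <= alpha)
    return sorted(order[:n_reject + k])
```

In use, one would average the two technical repeats per subject, apply `subtract_baseline` and the smoother to each spectrum, locate peaks with `find_peaks`, compute one `welch_p_value` per peak comparing AML and ALL intensities, and pass the resulting p-values through `holm_adjust` and `augment_rejections`. The normal approximation is a simplification for small samples; a resampling-based null distribution would be a natural replacement.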

Suggested Citation

  • Birkner Merrill D. & Hubbard Alan E. & van der Laan Mark J. & Skibola Christine F. & Hegedus Christine M. & Smith Martyn T., 2006. "Issues of Processing and Multiple Testing of SELDI-TOF MS Proteomic Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 5(1), pages 1-24, April.
  • Handle: RePEc:bpj:sagmbi:v:5:y:2006:i:1:n:11
    DOI: 10.2202/1544-6115.1198

    Download full text from publisher

    File URL: https://doi.org/10.2202/1544-6115.1198
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    References listed on IDEAS

    1. van der Laan Mark J. & Dudoit Sandrine & Pollard Katherine S., 2004. "Augmentation Procedures for Control of the Generalized Family-Wise Error Rate and Tail Probabilities for the Proportion of False Positives," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 3(1), pages 1-27, June.
    2. Mark van der Laan & Sandrine Dudoit & Katherine Pollard, 2004. "Multiple Testing. Part III. Procedures for Control of the Generalized Family-Wise Error Rate and Proportion of False Positives," U.C. Berkeley Division of Biostatistics Working Paper Series 1140, Berkeley Electronic Press.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Günther Fink & Margaret McConnell & Sebastian Vollmer, 2014. "Testing for heterogeneous treatment effects in experimental data: false discovery risks and correction procedures," Journal of Development Effectiveness, Taylor & Francis Journals, vol. 6(1), pages 44-57, January.
    2. Irene Castro-Conde & Jacobo Uña-Álvarez, 2015. "Power, FDR and conservativeness of BB-SGoF method," Computational Statistics, Springer, vol. 30(4), pages 1143-1161, December.
    3. Joseph Romano & Azeem Shaikh & Michael Wolf, 2008. "Control of the false discovery rate under dependence using the bootstrap and subsampling," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 17(3), pages 417-442, November.
    4. Wang, Li & Xu, Xingzhong, 2012. "Step-up procedure controlling generalized family-wise error rate," Statistics & Probability Letters, Elsevier, vol. 82(4), pages 775-782.
    5. Christina C. Bartenschlager & Michael Krapp, 2015. "Theorie und Methoden multipler statistischer Vergleiche," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 9(2), pages 107-129, November.
    6. Gordon, Alexander Y., 2009. "Inequalities between generalized familywise error rates of a multiple testing procedure," Statistics & Probability Letters, Elsevier, vol. 79(19), pages 1996-2004, October.
    7. Guo Wenge & Peddada Shyamal, 2008. "Adaptive Choice of the Number of Bootstrap Samples in Large Scale Multiple Testing," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-21, March.
    8. Montazeri Zahra & Yanofsky Corey M. & Bickel David R., 2010. "Shrinkage Estimation of Effect Sizes as an Alternative to Hypothesis Testing Followed by Estimation in High-Dimensional Biology: Applications to Differential Gene Expression," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-33, June.
    9. Joseph P. Romano & Azeem M. Shaikh & Michael Wolf, 2010. "Hypothesis Testing in Econometrics," Annual Review of Economics, Annual Reviews, vol. 2(1), pages 75-104, September.
    10. Somerville, Paul N. & Hemmelmann, Claudia, 2008. "Step-up and step-down procedures controlling the number and proportion of false positives," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1323-1334, January.
    11. Merrill Birkner & Sandra Sinisi & Mark van der Laan, 2004. "Multiple Testing and Data Adaptive Regression: An Application to HIV-1 Sequence Data," U.C. Berkeley Division of Biostatistics Working Paper Series 1161, Berkeley Electronic Press.
    12. Frank Emmert-Streib & Galina V Glazko, 2011. "Pathway Analysis of Expression Data: Deciphering Functional Building Blocks of Complex Diseases," PLOS Computational Biology, Public Library of Science, vol. 7(5), pages 1-6, May.
    13. Mathur, Maya B & VanderWeele, Tyler, 2018. "Statistical methods for evidence synthesis," Thesis Commons kd6ja, Center for Open Science.
    14. Cerioli, Andrea & Farcomeni, Alessio, 2011. "Error rates for multivariate outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 544-553, January.
    15. de Uña-Alvarez Jacobo, 2012. "The Beta-Binomial SGoF method for multiple dependent tests," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-32, May.
    16. L. Finos & A. Farcomeni, 2011. "k-FWER Control without p-value Adjustment, with Application to Detection of Genetic Determinants of Multiple Sclerosis in Italian Twins," Biometrics, The International Biometric Society, vol. 67(1), pages 174-181, March.
    17. Debashis Ghosh, 2006. "Shrunken p-Values for Assessing Differential Expression with Applications to Genomic Data Analysis," Biometrics, The International Biometric Society, vol. 62(4), pages 1099-1106, December.
    18. Wang, Li, 2022. "New testing procedures with k-FWER control for discrete data," Statistics & Probability Letters, Elsevier, vol. 180(C).
    19. Schumi Jennifer & DiRienzo A. Gregory & DeGruttola Victor, 2008. "Testing for Associations with Missing High-Dimensional Categorical Covariates," The International Journal of Biostatistics, De Gruyter, vol. 4(1), pages 1-19, September.
    20. Alessio Farcomeni, 2009. "Generalized Augmentation to Control the False Discovery Exceedance in Multiple Testing," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(3), pages 501-517, September.
