Printed from https://ideas.repec.org/p/pra/mprapa/59642.html

A Two Sample Test for High Dimensional Data with Applications to Gene-set Testing

Author

Listed:
  • Chen, Song Xi
  • Qin, Yingli

Abstract

We propose a two-sample test for the means of high-dimensional data when the data dimension is much larger than the sample size. The classical Hotelling's $T^2$ test does not work in this "large p, small n" situation. The proposed test does not require explicit conditions on the relationship between the data dimension and the sample size, which offers much flexibility in analyzing high-dimensional data. One application of the proposed test is testing the significance of sets of genes, which we demonstrate in an empirical study on a leukemia data set.
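
The abstract does not say why Hotelling's $T^2$ fails here. One standard reason is that the statistic needs the inverse of the pooled sample covariance matrix, which is singular once the dimension exceeds n1 + n2 - 2. The following is a minimal sketch of that point only, not the test proposed in the paper; the sample sizes, the dimension, the simulated Gaussian data and the helper name `pooled_covariance` are all illustrative assumptions.

```python
import numpy as np


def pooled_covariance(x, y):
    """Pooled sample covariance of two samples (rows are observations)."""
    n1, n2 = x.shape[0], y.shape[0]
    xc = x - x.mean(axis=0)  # center each sample at its own mean
    yc = y - y.mean(axis=0)
    return (xc.T @ xc + yc.T @ yc) / (n1 + n2 - 2)


# Hypothetical "large p, small n" setting: 22 observations, 200 variables.
rng = np.random.default_rng(0)
n1, n2, p = 10, 12, 200
x = rng.standard_normal((n1, p))
y = rng.standard_normal((n2, p))

s_pooled = pooled_covariance(x, y)
rank = np.linalg.matrix_rank(s_pooled)

# The p x p matrix has rank at most n1 + n2 - 2, so it cannot be inverted
# when p is larger than that bound, and T^2 is then undefined.
print(f"dimension p = {p}, rank bound = {n1 + n2 - 2}, numerical rank = {rank}")
```

In this setting the rank can never reach p, whatever the data are, which is why a test for the "large p, small n" case has to avoid inverting the sample covariance matrix.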

Suggested Citation

  • Chen, Song Xi & Qin, Yingli, 2010. "A Two Sample Test for High Dimensional Data with Applications to Gene-set Testing," MPRA Paper 59642, University Library of Munich, Germany.
  • Handle: RePEc:pra:mprapa:59642

    Download full text from publisher

    File URL: https://mpra.ub.uni-muenchen.de/59642/1/MPRA_paper_59642.pdf
    File Function: original version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Songxi, 2012. "Two Sample Tests for High Dimensional Covariance Matrices," MPRA Paper 46026, University Library of Munich, Germany.
    2. Nott, David J. & Yu, Zeming & Chan, Eva & Cotsapas, Chris & Cowley, Mark J. & Pulvers, Jeremy & Williams, Rohan & Little, Peter, 2007. "Hierarchical Bayes variable selection and microarray experiments," Journal of Multivariate Analysis, Elsevier, vol. 98(4), pages 852-872, April.
    3. Wang, Guanghui & Zou, Changliang & Wang, Zhaojun, 2013. "A necessary test for complete independence in high dimensions using rank-correlations," Journal of Multivariate Analysis, Elsevier, vol. 121(C), pages 224-232.
    4. Fan, Jianqing & Hall, Peter & Yao, Qiwei, 2007. "To How Many Simultaneous Hypothesis Tests Can Normal, Student's t or Bootstrap Calibration Be Applied?," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1282-1288, December.
    5. You, Jinhong & Zhou, Haibo, 2008. "A two-stage approach to semilinear in-slide models," Journal of Multivariate Analysis, Elsevier, vol. 99(8), pages 1610-1634, September.
    6. Wang, Siyang & Cui, Hengjian, 2013. "Generalized F test for high dimensional linear regression coefficients," Journal of Multivariate Analysis, Elsevier, vol. 117(C), pages 134-149.
    7. Liping Zhu & Jinhong You & Qunfang Xu, 2014. "Statistical Inference for Single-index Panel Data Models," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(3), pages 830-843, September.
    8. Jianqing Fan & Xu Han, 2017. "Estimation of the false discovery proportion with unknown dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 1143-1164, September.
    9. Shigeyuki Matsui & Hisashi Noma, 2011. "Estimating Effect Sizes of Differentially Expressed Genes for Power and Sample-Size Assessments in Microarray Experiments," Biometrics, The International Biometric Society, vol. 67(4), pages 1225-1235, December.
    10. Lianming Wang & David B. Dunson, 2010. "Semiparametric Bayes Multiple Testing: Applications to Tumor Data," Biometrics, The International Biometric Society, vol. 66(2), pages 493-501, June.
    11. Hanck, Christoph, 2011. "Now, whose schools are really better (or weaker) than Germany's? A multiple testing approach," Economic Modelling, Elsevier, vol. 28(4), pages 1739-1746, July.
    12. Ghosh Debashis, 2012. "Incorporating the Empirical Null Hypothesis into the Benjamini-Hochberg Procedure," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-21, July.
    13. Dean Palejev & Mladen Savov, 2021. "On the Convergence of the Benjamini–Hochberg Procedure," Mathematics, MDPI, vol. 9(17), pages 1-19, September.
    14. Moon, H.R. & Perron, B., 2012. "Beyond panel unit root tests: Using multiple testing to determine the nonstationarity properties of individual series in a panel," Journal of Econometrics, Elsevier, vol. 169(1), pages 29-33.
    15. Psaradellis, Ioannis & Laws, Jason & Pantelous, Athanasios A. & Sermpinis, Georgios, 2023. "Technical analysis, spread trading, and data snooping control," International Journal of Forecasting, Elsevier, vol. 39(1), pages 178-191.
    16. Jingxin Zhao & Heng Peng & Tao Huang, 2018. "Variance estimation for semiparametric regression models by local averaging," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 27(2), pages 453-476, June.
    17. repec:hum:wpaper:sfb649dp2012-049 is not listed on IDEAS
    18. Dette, Holger & Hoderlein, Stefan & Neumeyer, Natalie, 2016. "Testing multivariate economic restrictions using quantiles: The example of Slutsky negative semidefiniteness," Journal of Econometrics, Elsevier, vol. 191(1), pages 129-144.
    19. Gharad Bryan & James J Choi & Dean Karlan, 2021. "Randomizing Religion: the Impact of Protestant Evangelism on Economic Outcomes," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 136(1), pages 293-380.
    20. Qiu, Yumou & Chen, Songxi, 2012. "Test for Bandedness of High Dimensional Covariance Matrices with Bandwidth Estimation," MPRA Paper 46242, University Library of Munich, Germany.
    21. Wang Chamont & Gevertz Jana L., 2016. "Finding causative genes from high-dimensional data: an appraisal of statistical and machine learning approaches," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(4), pages 321-347, August.

    More about this item

    Keywords

    large p small n; martingale central limit theorem; multiple comparison

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
    • C12 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Hypothesis Testing: General
    • C14 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Semiparametric and Nonparametric Methods: General
