
A high-dimensional two-sample test for the mean using random subspaces

Author

Listed:
  • Thulin, Måns

Abstract

A common problem in genetics is that of testing whether a set of highly dependent gene expressions differ between two populations, typically in a high-dimensional setting where the data dimension is larger than the sample size. Most high-dimensional tests for the equality of two mean vectors rely on naive diagonal or trace estimators of the covariance matrix, ignoring dependences between variables. A test using random subspaces is proposed, which offers higher power when the variables are dependent and is invariant under linear transformations of the marginal distributions. The p-values for the test are obtained using permutations. The test does not rely on assumptions about normality or the structure of the covariance matrix. It is shown by simulation that the new test has higher power than competing tests in realistic settings motivated by microarray gene expression data. Computational aspects of high-dimensional permutation tests are also discussed and an efficient R implementation of the proposed test is provided.
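The abstract names two ingredients: a statistic built from random subspaces and p-values obtained by permutation. The Python sketch below is one plausible reading of that combination, not the paper's R implementation. It assumes that a "random subspace" is a random subset of `k` variables, that a Hotelling-type T^2 statistic is computed on each subset and averaged, and that the same subsets are reused for every permutation. The function names and the parameters `k`, `n_subsets` and `n_perm` are hypothetical choices made for illustration.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 with pooled covariance.

    Requires the number of columns to be smaller than n1 + n2 - 2,
    otherwise the pooled covariance matrix is singular.
    """
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # np.cov returns a scalar when k == 1
    return (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(pooled, diff)


def random_subspace_stat(x, y, subsets):
    """Average the T^2 statistic over the given variable subsets."""
    return float(np.mean([hotelling_t2(x[:, s], y[:, s]) for s in subsets]))


def random_subspace_test(x, y, k=5, n_subsets=50, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value).

    Assumption: subsets are drawn once and reused for all permutations,
    so every permuted statistic is comparable to the observed one.
    """
    rng = np.random.default_rng(seed)
    p = x.shape[1]
    subsets = [rng.choice(p, size=k, replace=False) for _ in range(n_subsets)]
    observed = random_subspace_stat(x, y, subsets)

    data = np.vstack([x, y])
    n1 = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(data.shape[0])  # shuffle group labels
        xp, yp = data[idx[:n1]], data[idx[n1:]]
        if random_subspace_stat(xp, yp, subsets) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `random_subspace_test(x, y)` with two arrays of shape `(n1, p)` and `(n2, p)` works even when `p` exceeds `n1 + n2`, because each T^2 statistic only inverts a `k`-by-`k` pooled covariance matrix. Using the full covariance within each subset is what lets a statistic of this kind account for dependence between variables, in contrast to the diagonal or trace estimators mentioned in the abstract.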

Suggested Citation

  • Thulin, Måns, 2014. "A high-dimensional two-sample test for the mean using random subspaces," Computational Statistics & Data Analysis, Elsevier, vol. 74(C), pages 26-38.
  • Handle: RePEc:eee:csdana:v:74:y:2014:i:c:p:26-38
    DOI: 10.1016/j.csda.2013.12.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947313004726
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Fraiman, Ricardo & Svarc, Marcela, 2013. "Resistant estimates for high dimensional and functional data based on random projections," Computational Statistics & Data Analysis, Elsevier, vol. 58(C), pages 326-338.
    2. Eddelbuettel, Dirk & Sanderson, Conrad, 2014. "RcppArmadillo: Accelerating R with high-performance C++ linear algebra," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 1054-1063.
    3. Mielniczuk, Jan & Teisseyre, Paweł, 2014. "Using random subspace method for prediction and variable importance assessment in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 725-742.
    4. Efron, B. & Tibshirani, R. & Storey, J.D. & Tusher, V., 2001. "Empirical Bayes Analysis of a Microarray Experiment," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1151-1160, December.
    5. Chen, Songxi, 2012. "Two Sample Tests for High Dimensional Covariance Matrices," MPRA Paper 46026, University Library of Munich, Germany.
    6. Schott, James R., 2007. "A test for the equality of covariance matrices when the dimension is large relative to the sample sizes," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6535-6542, August.
    7. Tony Cai & Weidong Liu & Yin Xia, 2013. "Two-Sample Covariance Matrix Testing and Support Recovery in High-Dimensional and Sparse Settings," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(501), pages 265-277, March.
    8. Chen, Song Xi & Qin, Yingli, 2010. "A Two Sample Test for High Dimensional Data with Applications to Gene-set Testing," MPRA Paper 59642, University Library of Munich, Germany.
    9. Srivastava, Muni S. & Du, Meng, 2008. "A test for the mean vector with fewer observations than the dimension," Journal of Multivariate Analysis, Elsevier, vol. 99(3), pages 386-402, March.
    10. Efron, Bradley, 2007. "Correlation and Large-Scale Simultaneous Significance Testing," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 93-103, March.
    11. Srivastava, Muni S. & Katayama, Shota & Kano, Yutaka, 2013. "A two sample test in high dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 114(C), pages 349-358.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Wang, Wei & Lin, Nan & Tang, Xiang, 2019. "Robust two-sample test of high-dimensional mean vectors under dependence," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 312-329.
    2. Zhang, Huaiyu & Wang, Haiyan, 2021. "A more powerful test of equality of high-dimensional two-sample means," Computational Statistics & Data Analysis, Elsevier, vol. 164(C).
    3. Tzviel Frostig & Yoav Benjamini, 2022. "Testing the equality of multivariate means when p > n by combining the Hotelling and Simes tests," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(2), pages 390-415, June.
    4. Wang, Rui & Xu, Xingzhong, 2018. "On two-sample mean tests under spiked covariances," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 225-249.
    5. Mingxiang Cao & Yuanjing He, 2022. "A high-dimensional test on linear hypothesis of means under a low-dimensional factor model," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 85(5), pages 557-572, July.
    6. Qiu, Tao & Xu, Wangli & Zhu, Liping, 2021. "Two-sample test in high dimensions through random selection," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
    7. Feng, Long & Sun, Fasheng, 2015. "A note on high-dimensional two-sample test," Statistics & Probability Letters, Elsevier, vol. 105(C), pages 29-36.
    8. Yuanyuan Jiang & Xingzhong Xu, 2022. "A Two-Sample Test of High Dimensional Means Based on Posterior Bayes Factor," Mathematics, MDPI, vol. 10(10), pages 1-23, May.
    9. Zhao, Junguang & Xu, Xingzhong, 2016. "A generalized likelihood ratio test for normal mean when p is greater than n," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 91-104.
    10. Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2018. "Hotelling’s T² in separable Hilbert spaces," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 284-305.
    11. Timothy I. Cannings & Richard J. Samworth, 2017. "Random-projection ensemble classification," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 959-1035, September.
    12. Nicolas Städler & Sach Mukherjee, 2017. "Two-sample testing in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 225-246, January.
    13. Zhang, Jie & Pan, Meng, 2016. "A high-dimension two-sample test for the mean using cluster subspaces," Computational Statistics & Data Analysis, Elsevier, vol. 97(C), pages 87-97.
    14. Harrar, Solomon W. & Kong, Xiaoli, 2022. "Recent developments in high-dimensional inference for multivariate data: Parametric, semiparametric and nonparametric approaches," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    15. Huang, Yuan & Li, Changcheng & Li, Runze & Yang, Songshan, 2022. "An overview of tests on high-dimensional means," Journal of Multivariate Analysis, Elsevier, vol. 188(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yin, Yanqing, 2021. "Test for high-dimensional mean vector under missing observations," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    2. Muni S. Srivastava & Hirokazu Yanagihara & Tatsuya Kubokawa, 2014. "Tests for Covariance Matrices in High Dimension with Less Sample Size," CIRJE F-Series CIRJE-F-933, CIRJE, Faculty of Economics, University of Tokyo.
    3. Jiang Hu & Zhidong Bai & Chen Wang & Wei Wang, 2017. "On testing the equality of high dimensional mean vectors with unequal covariance matrices," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(2), pages 365-387, April.
    4. Feng, Long & Sun, Fasheng, 2015. "A note on high-dimensional two-sample test," Statistics & Probability Letters, Elsevier, vol. 105(C), pages 29-36.
    5. Davy Paindaveine & Thomas Verdebout, 2013. "Universal Asymptotics for High-Dimensional Sign Tests," Working Papers ECARES ECARES 2013-40, ULB -- Universite Libre de Bruxelles.
    6. Huiqin Li & Jiang Hu & Zhidong Bai & Yanqing Yin & Kexin Zou, 2017. "Test on the linear combinations of mean vectors in high-dimensional data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(1), pages 188-208, March.
    7. Harrar, Solomon W. & Kong, Xiaoli, 2022. "Recent developments in high-dimensional inference for multivariate data: Parametric, semiparametric and nonparametric approaches," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    8. Li, Yang & Wang, Zhaojun & Zou, Changliang, 2016. "A simpler spatial-sign-based two-sample test for high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 149(C), pages 192-198.
    9. Zhang, Yangchun & Zhou, Yirui & Liu, Xiaowei, 2023. "Applications on linear spectral statistics of high-dimensional sample covariance matrix with divergent spectrum," Computational Statistics & Data Analysis, Elsevier, vol. 178(C).
    10. Ley, Christophe & Paindaveine, Davy & Verdebout, Thomas, 2015. "High-dimensional tests for spherical location and spiked covariance," Journal of Multivariate Analysis, Elsevier, vol. 139(C), pages 79-91.
    11. Chen, Song Xi & Guo, Bin & Qiu, Yumou, 2023. "Testing and signal identification for two-sample high-dimensional covariances via multi-level thresholding," Journal of Econometrics, Elsevier, vol. 235(2), pages 1337-1354.
    12. Jamshid Namdari & Debashis Paul & Lili Wang, 2021. "High-Dimensional Linear Models: A Random Matrix Perspective," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(2), pages 645-695, August.
    13. Dong, Kai & Pang, Herbert & Tong, Tiejun & Genton, Marc G., 2016. "Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 127-142.
    14. Zhidong Bai & Jiang Hu & Chen Wang & Chao Zhang, 2021. "Test on the linear combinations of covariance matrices in high-dimensional data," Statistical Papers, Springer, vol. 62(2), pages 701-719, April.
    15. Feng, Long & Zhang, Xiaoxu & Liu, Binghui, 2020. "A high-dimensional spatial rank test for two-sample location problems," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    16. Li, Jun, 2023. "Finite sample t-tests for high-dimensional means," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
    17. Zhengbang Li & Fuxiang Liu & Luanjie Zeng & Guoxin Zuo, 2021. "A stationary bootstrap test about two mean vectors comparison with somewhat dense differences and fewer sample size than dimension," Computational Statistics, Springer, vol. 36(2), pages 941-960, June.
    18. Tao Zhang & Zhiwen Wang & Yanling Wan, 2021. "Functional test for high-dimensional covariance matrix, with application to mitochondrial calcium concentration," Statistical Papers, Springer, vol. 62(3), pages 1213-1230, June.
    19. Zhang, Jin-Ting & Guo, Jia & Zhou, Bu, 2017. "Linear hypothesis testing in high-dimensional one-way MANOVA," Journal of Multivariate Analysis, Elsevier, vol. 155(C), pages 200-216.
    20. Yuanyuan Jiang & Xingzhong Xu, 2022. "A Two-Sample Test of High Dimensional Means Based on Posterior Bayes Factor," Mathematics, MDPI, vol. 10(10), pages 1-23, May.
