Printed from https://ideas.repec.org/a/eee/jmvana/v174y2019ics0047259x19300521.html

A two-sample test for the equality of univariate marginal distributions for high-dimensional data

Author

Listed:
  • Cousido-Rocha, Marta
  • de Uña-Álvarez, Jacobo
  • Hart, Jeffrey D.

Abstract

A recurring theme in modern statistics is dealing with high-dimensional data whose main feature is a large number, p, of variables but a small sample size. In this context our aim is to address the problem of testing the null hypothesis that the marginal distributions of p variables are the same for two groups. We propose a test statistic motivated by the simple idea of comparing, for each of the p variables, the empirical characteristic functions computed from the two samples. The asymptotic normality of the test statistic is derived under mixing conditions. In our asymptotic analysis the number of variables tends to infinity, while the size of individual samples remains fixed. In order to obtain a practical test several estimators of the variance are proposed, leading to three somewhat different versions of the test. An alternative global test based on the P-values derived from permutation tests is also proposed. A simulation study to investigate the finite sample properties of the proposed tests is carried out, and a practical illustration involving microarray data is provided.
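The idea of comparing empirical characteristic functions (ECFs) variable by variable can be illustrated with a short sketch. This is not the authors' test statistic, and it uses none of their variance estimators: it assumes a Gaussian weight function (for which the weighted squared distance between two ECFs has a closed form), averages the per-variable distances, and calibrates the result by permuting group labels rather than through the asymptotic normality described in the abstract. The function names and the `scale` parameter are hypothetical.

```python
import numpy as np


def ecf_distance(x, y, scale=1.0):
    """Weighted squared distance between the ECFs of two 1-D samples.

    Assumption: the weight is a N(0, scale^2) density, so the integral of
    |ecf_x(t) - ecf_y(t)|^2 reduces to means of a Gaussian kernel.
    """
    def mean_kernel(a, b):
        diff = a[:, None] - b[None, :]
        return np.exp(-0.5 * (scale * diff) ** 2).mean()

    return mean_kernel(x, x) + mean_kernel(y, y) - 2.0 * mean_kernel(x, y)


def global_statistic(X, Y, scale=1.0):
    """Average the per-variable ECF distances over the p columns."""
    p = X.shape[1]
    return float(np.mean([ecf_distance(X[:, j], Y[:, j], scale) for j in range(p)]))


def permutation_pvalue(X, Y, n_perm=200, scale=1.0, seed=0):
    """Permutation p-value obtained by reshuffling group labels.

    Whole rows (subjects) are permuted, so dependence between the
    p variables is left intact within each subject.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    n = X.shape[0]
    observed = global_statistic(X, Y, scale)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = global_statistic(pooled[idx[:n]], pooled[idx[n:]], scale)
        exceed += stat >= observed
    return (1 + exceed) / (1 + n_perm)
```

Here `X` and `Y` are arrays of shape (n, p) and (m, p), one row per subject and one column per variable, which matches the setting of many variables and small samples. A small p-value from `permutation_pvalue(X, Y)` suggests that at least some marginal distributions differ between the two groups.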

Suggested Citation

  • Cousido-Rocha, Marta & de Uña-Álvarez, Jacobo & Hart, Jeffrey D., 2019. "A two-sample test for the equality of univariate marginal distributions for high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 174(C).
  • Handle: RePEc:eee:jmvana:v:174:y:2019:i:c:s0047259x19300521
    DOI: 10.1016/j.jmva.2019.104537

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X19300521
    Download Restriction: Full text for ScienceDirect subscribers only



    Citations


    Cited by:

    1. David M. Ritzwoller & Joseph P. Romano & Azeem M. Shaikh, 2024. "Randomization Inference: Theory and Applications," Papers 2406.09521, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shin-ichi Tsukada, 2019. "High dimensional two-sample test based on the inter-point distance," Computational Statistics, Springer, vol. 34(2), pages 599-615, June.
    2. Paul, Biplab & De, Shyamal K. & Ghosh, Anil K., 2022. "Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    3. Saha, Enakshi & Sarkar, Soham & Ghosh, Anil K., 2017. "Some high-dimensional one-sample tests based on functions of interpoint distances," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 83-95.
    4. M. D. Jiménez-Gamero & M. Cousido-Rocha & M. V. Alba-Fernández & F. Jiménez-Jiménez, 2022. "Testing the equality of a large number of populations," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 1-21, March.
    5. Nicolas Städler & Sach Mukherjee, 2017. "Two-sample testing in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 225-246, January.
    6. Qiu, Tao & Zhang, Qintong & Fang, Yuanyuan & Xu, Wangli, 2024. "Testing homogeneity in high dimensional data through random projections," Journal of Multivariate Analysis, Elsevier, vol. 200(C).
    7. Mondal, Pronoy K. & Biswas, Munmun & Ghosh, Anil K., 2015. "On high dimensional two-sample tests based on nearest neighbors," Journal of Multivariate Analysis, Elsevier, vol. 141(C), pages 168-178.
    8. Lee, Sangyeol & Meintanis, Simos G. & Pretorius, Charl, 2022. "Monitoring procedures for strict stationarity based on the multivariate characteristic function," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    9. Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2018. "Hotelling’s T² in separable Hilbert spaces," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 284-305.
    10. Yue, Mu & Li, Jialiang & Cheng, Ming-Yen, 2019. "Two-step sparse boosting for high-dimensional longitudinal data with varying coefficients," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 222-234.
    11. Zhang, Jin-Ting & Guo, Jia & Zhou, Bu, 2024. "Testing equality of several distributions in separable metric spaces: A maximum mean discrepancy based approach," Journal of Econometrics, Elsevier, vol. 239(2).
    12. Ata Assaf & Luis Alberiko Gil-Alana & Khaled Mokni, 2022. "True or spurious long memory in the cryptocurrency markets: evidence from a multivariate test and other Whittle estimation methods," Empirical Economics, Springer, vol. 63(3), pages 1543-1570, September.
    13. Reza Modarres, 2024. "Hotelling T² test in high dimensions with application to Wilks outlier method," Statistical Papers, Springer, vol. 65(8), pages 5203-5218, October.
    14. Bill Russell & Dooruj Rambaccussing, 2019. "Breaks and the statistical process of inflation: the case of estimating the ‘modern’ long-run Phillips curve," Empirical Economics, Springer, vol. 56(5), pages 1455-1475, May.
    15. Chkili, Walid & Aloui, Chaker & Nguyen, Duc Khuong, 2012. "Asymmetric effects and long memory in dynamic volatility relationships between stock returns and exchange rates," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 22(4), pages 738-757.
    16. Christian Peretti, 2007. "Long Memory and Hysteresis," Springer Books, in: Gilles Teyssière & Alan P. Kirman (ed.), Long Memory in Economics, pages 363-389, Springer.
    17. Shi, Yanlin & Ho, Kin-Yip, 2015. "Long memory and regime switching: A simulation study on the Markov regime-switching ARFIMA model," Journal of Banking & Finance, Elsevier, vol. 61(S2), pages 189-204.
    18. Zhou, Bu & Guo, Jia, 2017. "A note on the unbiased estimator of Σ²," Statistics & Probability Letters, Elsevier, vol. 129(C), pages 141-146.
    19. Surgailis, Donatas & Teyssière, Gilles & Vaiciulis, Marijus, 2008. "The increment ratio statistic," Journal of Multivariate Analysis, Elsevier, vol. 99(3), pages 510-541, March.
    20. TEYSSIERE, Gilles, 2003. "Interaction models for common long-range dependence in asset price volatilities," LIDAM Discussion Papers CORE 2003026, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
