
Graph-theoretic multisample tests of equality in distribution for high dimensional data

Author

Listed:
  • Petrie, Adam

Abstract

Testing whether two or more independent samples arise from a common distribution is a classic problem in statistics. Several multivariate two-sample tests of equality are based on graphs such as the minimum spanning tree, nearest neighbor, and optimal nonbipartite perfect matching. Here, the samples are pooled and the test statistic is the number of edges in the graph that connect points with different sample identities. These tests are typically unbiased and perform well when estimates of underlying probability densities are poor. However, these tests have not been thoroughly studied when data is very high dimensional or in the multisample case. We introduce the use of orthogonal perfect matchings for testing equality in distribution. A suite of Monte Carlo simulations on artificial and real data shows that orthogonal perfect matchings and spanning trees typically have higher power than other graphs and are also more effective at discerning when samples have differences in their covariance structure compared to other nonparametric tests such as the energy and triangle tests.
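The edge-count statistic described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distances, a minimum spanning tree as the graph, and a label-permutation null distribution; it does not implement the orthogonal perfect matchings introduced in the paper. The function names (mst_edges, cross_edge_count, mst_permutation_test) and the parameters n_perm and seed are hypothetical and chosen only for this illustration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_edges(points):
    """Return the endpoint indices of the Euclidean MST of the pooled points."""
    dist = squareform(pdist(points))            # dense pairwise distance matrix
    mst = minimum_spanning_tree(dist).tocoo()   # N - 1 edges when points are distinct
    return mst.row, mst.col


def cross_edge_count(rows, cols, labels):
    """Count edges whose two endpoints carry different sample labels."""
    return int(np.sum(labels[rows] != labels[cols]))


def mst_permutation_test(samples, n_perm=499, seed=0):
    """Pool the samples, count cross-sample MST edges, and permute the labels.

    Few cross-sample edges suggest the samples occupy different regions,
    so the p-value is the share of permutations with a count at most as
    large as the observed one (assumed one-sided convention).
    """
    rng = np.random.default_rng(seed)
    points = np.vstack(samples)
    labels = np.concatenate([np.full(len(s), k) for k, s in enumerate(samples)])
    rows, cols = mst_edges(points)              # the graph does not depend on labels
    observed = cross_edge_count(rows, cols, labels)
    permuted = [cross_edge_count(rows, cols, rng.permutation(labels))
                for _ in range(n_perm)]
    p_value = (1 + sum(c <= observed for c in permuted)) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three artificial samples in 5 dimensions; the third is shifted in location.
    samples = [rng.normal(size=(20, 5)),
               rng.normal(size=(20, 5)),
               rng.normal(loc=10.0, size=(20, 5))]
    print(mst_permutation_test(samples))
```

Because the graph is built from the pooled points alone, it is computed once and only the labels are permuted, which keeps the permutation loop cheap. Swapping in a nearest-neighbor graph or a perfect matching would change only mst_edges.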

Suggested Citation

  • Petrie, Adam, 2016. "Graph-theoretic multisample tests of equality in distribution for high dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 96(C), pages 145-158.
  • Handle: RePEc:eee:csdana:v:96:y:2016:i:c:p:145-158
    DOI: 10.1016/j.csda.2015.11.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947315002716
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2015.11.003?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Lu, Bo & Greevy, Robert & Xu, Xinyi & Beck, Cole, 2011. "Optimal Nonbipartite Matching and Its Statistical Applications," The American Statistician, American Statistical Association, vol. 65(1), pages 21-30.
    2. Paul R. Rosenbaum, 2005. "An exact distribution‐free test comparing two multivariate distributions based on adjacency," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(4), pages 515-530, September.
    3. Justel, Ana & Peña, Daniel & Zamar, Rubén, 1997. "A multivariate Kolmogorov-Smirnov test of goodness of fit," Statistics & Probability Letters, Elsevier, vol. 35(3), pages 251-259, October.
    4. Dinh Pham & Joachim Möcks & Lothar Sroka, 1989. "Asymptotic normality of double-indexed linear permutation statistics," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 41(3), pages 415-427, September.
    5. Zhenyu Liu & Reza Modarres, 2011. "A triangle test for equality of distribution functions in high dimensions," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 23(3), pages 605-615.
    6. Dale L. Zimmerman, 1993. "A Bivariate Cramér–Von Mises Type of Test for Spatial Randomness," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 42(1), pages 43-54, March.
    7. Nettleton, Dan & Banerjee, T., 2001. "Testing the equality of distributions of random vectors with categorical components," Computational Statistics & Data Analysis, Elsevier, vol. 37(2), pages 195-208, August.
    8. Rousson, Valentin, 2002. "On Distribution-Free Tests for the Multivariate Two-Sample Location-Scale Model," Journal of Multivariate Analysis, Elsevier, vol. 80(1), pages 43-57, January.
    9. Baringhaus, L. & Franz, C., 2004. "On a new multivariate two-sample test," Journal of Multivariate Analysis, Elsevier, vol. 88(1), pages 190-206, January.
    10. Anderson, N. H. & Hall, P. & Titterington, D. M., 1994. "Two-Sample Test Statistics for Measuring Discrepancies Between Two Multivariate Probability Density Functions Using Kernel-Based Density Estimates," Journal of Multivariate Analysis, Elsevier, vol. 50(1), pages 41-54, July.

    Citations

    Cited by:

    1. Luai Al-Labadi & Forough Fazeli Asl & Zahra Saberi, 2022. "A Bayesian nonparametric multi-sample test in any dimension," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(2), pages 217-242, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Biswas, Munmun & Ghosh, Anil K., 2014. "A nonparametric two-sample test applicable to high dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 123(C), pages 160-171.
    2. Modarres, Reza, 2014. "On the interpoint distances of Bernoulli vectors," Statistics & Probability Letters, Elsevier, vol. 84(C), pages 215-222.
    3. Shin-ichi Tsukada, 2019. "High dimensional two-sample test based on the inter-point distance," Computational Statistics, Springer, vol. 34(2), pages 599-615, June.
    4. Mondal, Pronoy K. & Biswas, Munmun & Ghosh, Anil K., 2015. "On high dimensional two-sample tests based on nearest neighbors," Journal of Multivariate Analysis, Elsevier, vol. 141(C), pages 168-178.
    5. Petrie, Adam & Willemain, Thomas R., 2013. "An empirical study of tests for uniformity in multidimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 64(C), pages 253-268.
    6. Paul, Biplab & De, Shyamal K. & Ghosh, Anil K., 2022. "Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    7. Carole Bernard & Oleg Bondarenko & Steven Vanduffel, 2021. "A model-free approach to multivariate option pricing," Review of Derivatives Research, Springer, vol. 24(2), pages 135-155, July.
    8. Jie Shi & Arno P. J. M. Siebes & Siamak Mehrkanoon, 2023. "TransCORALNet: A Two-Stream Transformer CORAL Networks for Supply Chain Credit Assessment Cold Start," Papers 2311.18749, arXiv.org.
    9. Squalli, Jay, 2017. "Renewable energy, coal as a baseload power source, and greenhouse gas emissions: Evidence from U.S. state-level data," Energy, Elsevier, vol. 127(C), pages 479-488.
    10. Qiu, Tao & Zhang, Qintong & Fang, Yuanyuan & Xu, Wangli, 2024. "Testing homogeneity in high dimensional data through random projections," Journal of Multivariate Analysis, Elsevier, vol. 200(C).
    11. Chiragiev, Arthur & Landsman, Zinoviy, 2009. "Multivariate flexible Pareto model: Dependency structure, properties and characterizations," Statistics & Probability Letters, Elsevier, vol. 79(16), pages 1733-1743, August.
    12. Torri, Gabriele & Giacometti, Rosella & Paterlini, Sandra, 2018. "Robust and sparse banking network estimation," European Journal of Operational Research, Elsevier, vol. 270(1), pages 51-65.
    13. Yue, Zenghui & Xu, Haiyun & Yuan, Guoting & Pang, Hongshen, 2019. "Modeling study of knowledge diffusion in scientific collaboration networks based on differential dynamics: A case study in graphene field," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 524(C), pages 375-391.
    14. Langrené, Nicolas & Warin, Xavier, 2021. "Fast multivariate empirical cumulative distribution function with connection to kernel density estimation," Computational Statistics & Data Analysis, Elsevier, vol. 162(C).
    15. Cheng, Qixiu & Lin, Yuqian & Zhou, Xuesong (Simon) & Liu, Zhiyuan, 2024. "Analytical formulation for explaining the variations in traffic states: A fundamental diagram modeling perspective with stochastic parameters," European Journal of Operational Research, Elsevier, vol. 312(1), pages 182-197.
    16. R. N. Rattihalli, 2023. "A Class of Multivariate Power Skew Symmetric Distributions: Properties and Inference for the Power-Parameter," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(2), pages 1356-1393, August.
    17. Zuliqar Ali & Ijaz Hussain & Muhammad Faisal & Hafiza Mamona Nazir & Mitwali Abd-el Moemen & Tajammal Hussain & Sadaf Shamsuddin, 2017. "A Novel Multi-Scalar Drought Index for Monitoring Drought: the Standardized Precipitation Temperature Index," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 31(15), pages 4957-4969, December.
    18. Jean-David Fermanian, 2003. "Goodness of Fit Tests for Copulas," Working Papers 2003-34, Center for Research in Economics and Statistics.
    19. Abul Kalam Azad & Mohammad Golam Rasul & Talal Yusaf, 2014. "Statistical Diagnosis of the Best Weibull Methods for Wind Power Assessment for Agricultural Applications," Energies, MDPI, vol. 7(5), pages 1-30, May.
    20. Audrius Kabašinskas & Leonidas Sakalauskas & Ingrida Vaičiulytė, 2021. "An Analytical EM Algorithm for Sub-Gaussian Vectors," Mathematics, MDPI, vol. 9(9), pages 1-20, April.
