
A fast and accurate kernel-based independence test with applications to high-dimensional and functional data

Authors

  • Zhang, Jin-Ting
  • Zhu, Tianming

Abstract

Testing the dependency between two random variables is an important inference problem in statistics since many statistical procedures rely on the assumption that the two samples are independent. To test whether two samples are independent, a so-called HSIC (Hilbert–Schmidt Independence Criterion)-based test has been proposed. Its null distribution is approximated either by permutation or by a Gamma approximation. In this paper, a new HSIC-based test is proposed. Its asymptotic null and alternative distributions are established. It is shown that the proposed test is root-n consistent. A three-cumulant matched chi-squared approximation is adopted to approximate the null distribution of the test statistic. By choosing a proper reproducing kernel, the proposed test can be applied to many different types of data including multivariate, high-dimensional, and functional data. Three simulation studies and two real data applications show that in terms of level accuracy, power, and computational cost, the proposed test outperforms several existing tests for multivariate, high-dimensional, and functional data.
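
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch. It is not the authors' proposed test. It shows (a) the classical biased HSIC statistic with a permutation null, which is the existing approach the abstract refers to, and (b) the generic algebra of matching three cumulants to a shifted and scaled chi-squared variable. The Gaussian kernel, the median-heuristic bandwidth, and all function names are assumptions made for illustration, and how the paper estimates the cumulants is not stated in the abstract, so they are taken as given inputs here.

```python
import numpy as np


def gaussian_gram(x, bandwidth=None):
    """Gaussian-kernel Gram matrix for the rows of x (n observations)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq_norms = np.sum(x**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against rounding error
    if bandwidth is None:
        # A median heuristic; an illustrative choice, not taken from the paper.
        med = np.median(sq_dists[np.triu_indices_from(sq_dists, k=1)])
        bandwidth = np.sqrt(0.5 * med) if med > 0 else 1.0
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def hsic_permutation_test(x, y, n_perm=500, seed=0):
    """Biased HSIC statistic trace(K H L H) / n^2 with a permutation p-value."""
    rng = np.random.default_rng(seed)
    K = gaussian_gram(x)
    L = gaussian_gram(y)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc = H @ K @ H
    # trace(K H L H) equals the elementwise sum of (H K H) * L for symmetric L.
    observed = np.sum(Kc * L) / n**2
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting the y-sample breaks any dependence on x.
        stat = np.sum(Kc * L[np.ix_(p, p)]) / n**2
        exceed += stat >= observed
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


def match_three_cumulants(k1, k2, k3):
    """Find (beta0, beta1, d) so that beta0 + beta1 * chi2_d has
    mean k1, variance k2 and third cumulant k3.

    Uses the chi-squared cumulants: mean d, variance 2d, third cumulant 8d.
    """
    beta1 = k3 / (4.0 * k2)
    d = 8.0 * k2**3 / k3**2
    beta0 = k1 - 2.0 * k2**2 / k3
    return beta0, beta1, d


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(100, 1))
    y = x + 0.1 * rng.normal(size=(100, 1))  # strongly dependent on x
    stat, p_value = hsic_permutation_test(x, y, n_perm=200)
    print(f"HSIC = {stat:.4f}, permutation p-value = {p_value:.4f}")
```

The permutation loop costs one pass over an n-by-n matrix per permutation, which is the kind of computational burden that a closed-form approximation to the null distribution, such as a matched chi-squared variable, is meant to avoid.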

Suggested Citation

  • Zhang, Jin-Ting & Zhu, Tianming, 2024. "A fast and accurate kernel-based independence test with applications to high-dimensional and functional data," Journal of Multivariate Analysis, Elsevier, vol. 202(C).
  • Handle: RePEc:eee:jmvana:v:202:y:2024:i:c:s0047259x24000277
    DOI: 10.1016/j.jmva.2024.105320

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X24000277
    Download Restriction: Full text for ScienceDirect subscribers only



