IDEAS home Printed from https://ideas.repec.org/a/bla/scjsta/v48y2021i4p1314-1343.html

A new Gini correlation between quantitative and qualitative variables

Author

Listed:
  • Xin Dang
  • Dao Nguyen
  • Yixin Chen
  • Junying Zhang

Abstract

We propose a new Gini correlation to measure dependence between a categorical variable and a numerical variable. Analogous to Pearson's R² in the ANOVA model, the Gini correlation is interpreted as the ratio of the between-group variation to the total variation, but, unlike R², it characterizes independence: the Gini correlation is zero if and only if the two variables are independent. The Gini correlation is closely related to the distance correlation, yet it has a simpler formulation because it exploits the nature of the categorical variable. As a result, the proposed Gini correlation is simpler to compute than the distance correlation and makes inference more straightforward. Simulations and real data applications demonstrate these advantages.
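
The ratio described in the abstract can be sketched in code. The abstract does not state the formula, so the sketch below rests on assumptions: it takes the Gini mean difference Δ = E|X1 − X2| as the measure of total variation, takes the weighted within-group Gini mean differences Σ p_k Δ_k as the within-group variation, and computes (Δ − Σ p_k Δ_k) / Δ for a univariate numerical variable. The function names `gini_mean_difference` and `gini_correlation` are hypothetical, and the pairwise-average estimator is one plausible choice rather than the authors' confirmed implementation.

```python
import numpy as np


def gini_mean_difference(x):
    """Estimate E|X1 - X2| by averaging |x_i - x_j| over all pairs i != j."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        # A single observation carries no pairwise variation.
        return 0.0
    diffs = np.abs(x[:, None] - x[None, :])  # n x n matrix, zero diagonal
    return diffs.sum() / (n * (n - 1))


def gini_correlation(x, y):
    """Assumed form: (total GMD - weighted within-group GMD) / total GMD."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    total = gini_mean_difference(x)
    if total == 0.0:
        raise ValueError("x is constant, so the ratio is undefined")
    within = 0.0
    for label in np.unique(y):
        group = x[y == label]
        # Weight each group's variation by its sample proportion n_k / n.
        within += (group.size / x.size) * gini_mean_difference(group)
    return (total - within) / total
```

For example, `gini_correlation([0, 0, 1, 1], ["a", "a", "b", "b"])` compares the spread of the pooled sample with the spread inside each category; when the categories separate the values perfectly, the within-group term vanishes and the ratio reaches its maximum.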

Suggested Citation

  • Xin Dang & Dao Nguyen & Yixin Chen & Junying Zhang, 2021. "A new Gini correlation between quantitative and qualitative variables," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(4), pages 1314-1343, December.
  • Handle: RePEc:bla:scjsta:v:48:y:2021:i:4:p:1314-1343
    DOI: 10.1111/sjos.12490

    Download full text from publisher

    File URL: https://doi.org/10.1111/sjos.12490
    Download Restriction: no


    References listed on IDEAS

    1. Dorfman, Robert, 1979. "A Formula for the Gini Coefficient," The Review of Economics and Statistics, MIT Press, vol. 61(1), pages 146-149, February.
    2. Xin Dang & Hailin Sang & Lauren Weatherall, 2019. "Gini covariance matrix and its affine equivariant version," Statistical Papers, Springer, vol. 60(3), pages 641-666, June.
    3. Koshevoy, G. A. & Mosler, K., 1997. "Multivariate Gini Indices," Journal of Multivariate Analysis, Elsevier, vol. 60(2), pages 252-276, February.
    4. Cui, Hengjian & Zhong, Wei, 2019. "A distribution-free test of independence based on mean variance index," Computational Statistics & Data Analysis, Elsevier, vol. 139(C), pages 117-133.
    5. Corrado Gini, 2005. "On the measurement of concentration and variability of characters," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(1), pages 1-38.
    6. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    7. Runze Li & Wei Zhong & Liping Zhu, 2012. "Feature Screening via Distance Correlation Learning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(499), pages 1129-1139, September.
    8. Brian C Ross, 2014. "Mutual Information between Discrete and Continuous Data Sets," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-5, February.
    9. Tjur, Tue, 2009. "Coefficients of Determination in Logistic Regression Models—A New Proposal: The Coefficient of Discrimination," The American Statistician, American Statistical Association, vol. 63(4), pages 366-372.
    10. Gabor J. Szekely & Maria L. Rizzo, 2005. "Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method," Journal of Classification, Springer;The Classification Society, vol. 22(2), pages 151-183, September.
    11. Frick, Joachim R. & Goebel, Jan & Schechtman, Edna & Wagner, Gert G. & Yitzhaki, Shlomo, 2006. "Using Analysis of Gini (ANOGI) for Detecting Whether Two Subsamples Represent the Same Universe: The German Socio-Economic Panel Study (SOEP) Experience," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 34(4), pages 427-468.
    12. E. Schechtman & S. Yitzhaki, 2003. "A Family of Correlation Coefficients Based on the Extended Gini Index," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 1(2), pages 129-146, August.

    Citations


    Cited by:

    1. Yongli Sang & Xin Dang, 2024. "Asymptotic normality of a modified estimator of Gini distance correlation," Statistical Papers, Springer, vol. 65(8), pages 4843-4860, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xi Wu & Shifeng Xiong & Weiyan Mu, 2023. "An Ensemble Method for Feature Screening," Mathematics, MDPI, vol. 11(2), pages 1-14, January.
    2. Jing Zhang & Qihua Wang & Xuan Wang, 2022. "Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(2), pages 379-397, April.
    3. Zhaoyu Xing & Yang Wan & Juan Wen & Wei Zhong, 2024. "GOLFS: feature selection via combining both global and local information for high dimensional clustering," Computational Statistics, Springer, vol. 39(5), pages 2651-2675, July.
    4. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    5. Linh H. Nghiem & Francis K.C. Hui & Samuel Müller & A.H. Welsh, 2023. "Screening methods for linear errors‐in‐variables models in high dimensions," Biometrics, The International Biometric Society, vol. 79(2), pages 926-939, June.
    6. Hung Hung & Su‐Yun Huang, 2019. "Sufficient dimension reduction via random‐partitions for the large‐p‐small‐n problem," Biometrics, The International Biometric Society, vol. 75(1), pages 245-255, March.
    7. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    8. Jingyuan Liu & Runze Li & Rongling Wu, 2014. "Feature Selection for Varying Coefficient Models With Ultrahigh-Dimensional Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 266-274, March.
    9. Zhang, Jing & Wang, Qihua & Kang, Jian, 2020. "Feature screening under missing indicator imputation with non-ignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    10. Lu, Jun & Lin, Lu, 2018. "Feature screening for multi-response varying coefficient models with ultrahigh dimensional predictors," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 242-254.
    11. Wang, Pei & Yin, Xiangrong & Yuan, Qingcong & Kryscio, Richard, 2021. "Feature filter for estimating central mean subspace and its sparse solution," Computational Statistics & Data Analysis, Elsevier, vol. 163(C).
    12. Jun Lu & Dan Wang & Qinqin Hu, 2022. "Interaction screening via canonical correlation," Computational Statistics, Springer, vol. 37(5), pages 2637-2670, November.
    13. Lai, Peng & Song, Fengli & Chen, Kaiwen & Liu, Zhi, 2017. "Model free feature screening with dependent variable in ultrahigh dimensional binary classification," Statistics & Probability Letters, Elsevier, vol. 125(C), pages 141-148.
    14. Xiaochao Xia & Hao Ming, 2022. "A Flexibly Conditional Screening Approach via a Nonparametric Quantile Partial Correlation," Mathematics, MDPI, vol. 10(24), pages 1-32, December.
    15. Zhao, Bangxin & Liu, Xin & He, Wenqing & Yi, Grace Y., 2021. "Dynamic tilted current correlation for high dimensional variable screening," Journal of Multivariate Analysis, Elsevier, vol. 182(C).
    16. Zhenghui Feng & Lu Lin & Ruoqing Zhu & Lixing Zhu, 2020. "Nonparametric variable selection and its application to additive models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(3), pages 827-854, June.
    17. Kim, Arlene Kyoung Hee & Shin, Seung Jun, 2017. "The cumulative Kolmogorov filter for model-free screening in ultrahigh dimensional data," Statistics & Probability Letters, Elsevier, vol. 126(C), pages 238-243.
    18. Liu, Jingyuan & Sun, Ao & Ke, Yuan, 2024. "A generalized knockoff procedure for FDR control in structural change detection," Journal of Econometrics, Elsevier, vol. 239(2).
    19. Li, Lu & Ke, Chenlu & Yin, Xiangrong & Yu, Zhou, 2023. "Generalized martingale difference divergence: Detecting conditional mean independence with applications in variable screening," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    20. He, Yong & Zhang, Liang & Ji, Jiadong & Zhang, Xinsheng, 2019. "Robust feature screening for elliptical copula regression model," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 568-582.
