Printed from https://ideas.repec.org/a/eee/csdana/v152y2020ics0167947320301341.html

A new correlation coefficient between categorical, ordinal and interval variables with Pearson characteristics

Author

Listed:
  • Baak, M.
  • Koopman, R.
  • Snoek, H.
  • Klous, S.

Abstract

A prescription is presented for a new and practical correlation coefficient, ϕK, based on several refinements to Pearson’s hypothesis test of independence of two variables. The combined features of ϕK form an advantage over existing coefficients. Primarily, it works consistently between categorical, ordinal and interval variables, in essence by treating each variable as categorical, and can therefore be used to calculate correlations between variables of mixed type. Second, it captures nonlinear dependency. The strength of ϕK is similar to Pearson’s correlation coefficient, and is equivalent in case of a bivariate normal input distribution. These are useful properties when studying the correlations between variables with mixed types, where some are categorical. Two more innovations are presented: to the proper evaluation of statistical significance of correlations, and to the interpretation of variable relationships in a contingency table, in particular in case of sparse or low statistics samples and significant dependencies. Two practical applications are discussed. The presented algorithms are easy to use and available through a public Python library (https://github.com/KaveIO/PhiK).
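
The abstract's starting point, Pearson's test of independence applied to variables that are treated as categorical, can be illustrated with a short sketch. This is not the ϕK algorithm and does not use the PhiK library: it only computes the plain Pearson chi-squared statistic on a contingency table of binned values, and compares it with Pearson's correlation coefficient on a deliberately nonlinear toy relationship. The function names (`chi2_statistic`, `pearson_r`, `bin_equal_width`), the equal-width binning, the choice of 5 bins, and the data are all assumptions made for illustration.

```python
import math
from collections import Counter


def chi2_statistic(xs, ys):
    """Pearson's chi-squared statistic for two sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of the first variable
    col = Counter(ys)              # marginal counts of the second variable
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n   # count expected under independence
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient for two numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def bin_equal_width(values, n_bins):
    """Turn an interval variable into category labels 0..n_bins-1 (assumed binning)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Toy data (hypothetical): a purely nonlinear dependency, y = x^2 on a symmetric grid.
xs = [i / 10 for i in range(-50, 51)]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
chi2 = chi2_statistic(bin_equal_width(xs, 5), bin_equal_width(ys, 5))

print(f"Pearson r on raw values:   {r:.3f}")
print(f"chi-squared on 5x5 bins:   {chi2:.1f}")
```

On this toy input the linear coefficient is essentially zero because the relationship is symmetric, while the chi-squared statistic on the binned table is far from zero, which is the kind of nonlinear dependency a contingency-table approach can register. Turning such a statistic into a coefficient with Pearson-like strength is the part the paper contributes and is not reproduced here.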

Suggested Citation

  • Baak, M. & Koopman, R. & Snoek, H. & Klous, S., 2020. "A new correlation coefficient between categorical, ordinal and interval variables with Pearson characteristics," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
  • Handle: RePEc:eee:csdana:v:152:y:2020:i:c:s0167947320301341
    DOI: 10.1016/j.csda.2020.107043

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947320301341
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. P. M. Kroonenberg & Albert Verbeek, 2018. "The Tale of Cochran's Rule: My Contingency Table has so Many Expected Values Smaller than 5, What Am I to Do?," The American Statistician, Taylor & Francis Journals, vol. 72(2), pages 175-183, April.
    2. Garrido, J. & Genest, C. & Schulz, J., 2016. "Generalized linear models for dependent frequency and severity of insurance claims," Insurance: Mathematics and Economics, Elsevier, vol. 70(C), pages 205-215.
    3. W. M. Patefield, 1981. "An Efficient Method of Generating Random R × C Tables with Given Row and Column Totals," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 30(1), pages 91-97, March.
    4. Yoo, Yeawon & Escobedo, Adolfo R. & Skolfield, J. Kyle, 2020. "A new correlation coefficient for comparing and aggregating non-strict and incomplete rankings," European Journal of Operational Research, Elsevier, vol. 285(3), pages 1025-1041.
    5. Kim, Donguk & Agresti, Alan, 1997. "Nearly exact tests of conditional independence and marginal homogeneity for sparse contingency tables," Computational Statistics & Data Analysis, Elsevier, vol. 24(1), pages 89-104, March.

    Citations


    Cited by:

    1. Jialiang Cui & Vanessa Hoi Mei Cheung & Wenjie Huang & Wan Sang Kan, 2022. "Mental Distress during the COVID-19 Pandemic: A Cross-Sectional Study of Women Receiving the Comprehensive Social Security Allowance in Hong Kong," IJERPH, MDPI, vol. 19(16), pages 1-13, August.
    2. Cimpoeru Smaranda & Roman Monica & Kobeissi Amira & Mohammad Heba, 2020. "How are European Migrants from the MENA Countries Affected by COVID-19? Insights from an Online Survey," Journal of Social and Economic Statistics, Sciendo, vol. 9(1), pages 128-143, August.
    3. Zhou, Yu & Chen, Ben & Meng, Kai & Zhou, Haoran & Chen, Wenshang & Zhang, Ning & Deng, Qihao & Yang, Guanghua & Tu, Zhengkai, 2023. "Optimal design of a cathode flow field for performance enhancement of PEM fuel cell," Applied Energy, Elsevier, vol. 343(C).
    4. Yuan Liu & Chuyao Liao & Li Zhuo & Haiyan Tao, 2022. "Evaluating Effects of Dynamic Interventions to Control COVID-19 Pandemic: A Case Study of Guangdong, China," IJERPH, MDPI, vol. 19(16), pages 1-17, August.
    5. Leng, Lijian & Li, Tanghao & Zhan, Hao & Rizwan, Muhammad & Zhang, Weijin & Peng, Haoyi & Yang, Zequn & Li, Hailong, 2023. "Machine learning-aided prediction of nitrogen heterocycles in bio-oil from the pyrolysis of biomass," Energy, Elsevier, vol. 278(PB).
    6. Cesar de Lima Nogueira, Silvio & Och, Stephan Hennings & Moura, Luis Mauro & Domingues, Eric & Coelho, Leandro dos Santos & Mariani, Viviana Cocco, 2023. "Prediction of the NOx and CO2 emissions from an experimental dual fuel engine using optimized random forest combined with feature engineering," Energy, Elsevier, vol. 280(C).
    7. Tianqi Zhang & Yue Zhou & Ming Li & Haoran Zhang & Tong Wang & Yu Tian, 2022. "Impacts of Urbanization on Drainage System Health and Sustainable Drainage Recommendations for Future Scenarios—A Small City Case in China," Sustainability, MDPI, vol. 14(24), pages 1-24, December.
    8. Choi, Insu & Lee, Myounggu & Kim, Hyejin & Kim, Woo Chang, 2023. "Elucidating Directed Statistical Dependencies: Investigating Global Financial Market Indices' Influence on Korean Short Selling Activities," Pacific-Basin Finance Journal, Elsevier, vol. 79(C).
    9. Cosimo Russo & Alberto Castro & Andrea Gioia & Vito Iacobellis & Angela Gorgoglione, 2023. "A Stormwater Management Framework for Predicting First Flush Intensity and Quantifying its Influential Factors," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 37(3), pages 1437-1459, February.
    10. Bas Bosma & Arjen Witteloostuijn, 2024. "Machine learning in international business," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 55(6), pages 676-702, August.
    11. Podani, János & Patonai, Katalin & Szabó, Péter & Szilágyi, András, 2022. "Coefficients of association between nominal and fully ranked ordinal variables with applications to ecological network analysis," Ecological Modelling, Elsevier, vol. 466(C).
    12. Alla Yu. Vladova, 2022. "Remote Geotechnical Monitoring of a Buried Oil Pipeline," Mathematics, MDPI, vol. 10(11), pages 1-14, May.
    13. dos Santos Ferreira, Greicili & Martins dos Santos, Deilson & Luciano Avila, Sérgio & Viana Luiz Albani, Vinicius & Cardoso Orsi, Gustavo & Cesar Cordeiro Vieira, Pedro & Nilson Rodrigues, Rafael, 2023. "Short- and long-term forecasting for building energy consumption considering IPMVP recommendations, WEO and COP27 scenarios," Applied Energy, Elsevier, vol. 339(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Agresti, Alan & Coull, Brent A., 1998. "Order-restricted inference for monotone trend alternatives in contingency tables," Computational Statistics & Data Analysis, Elsevier, vol. 28(2), pages 139-155, August.
    2. Yang Lu, 2019. "Flexible (panel) regression models for bivariate count–continuous data with an insurance application," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 182(4), pages 1503-1521, October.
    3. Borkowf, Craig B., 2004. "An efficient algorithm for generating two-way contingency tables with fixed marginal totals and arbitrary mean proportions, with applications to permutation tests," Computational Statistics & Data Analysis, Elsevier, vol. 44(3), pages 431-449, January.
    4. Roland R. Ramsahai, 2020. "Connecting actuarial judgment to probabilistic learning techniques with graph theory," Papers 2007.15475, arXiv.org.
    5. Ramon Alemany & Catalina Bolancé & Roberto Rodrigo & Raluca Vernic, 2020. "Bivariate Mixed Poisson and Normal Generalised Linear Models with Sarmanov Dependence—An Application to Model Claim Frequency and Optimal Transformed Average Severity," Mathematics, MDPI, vol. 9(1), pages 1-18, December.
    6. Yeawon Yoo & Adolfo R. Escobedo, 2021. "A New Binary Programming Formulation and Social Choice Property for Kemeny Rank Aggregation," Decision Analysis, INFORMS, vol. 18(4), pages 296-320, December.
    7. Kim, Donguk & Agresti, Alan, 1997. "Nearly exact tests of conditional independence and marginal homogeneity for sparse contingency tables," Computational Statistics & Data Analysis, Elsevier, vol. 24(1), pages 89-104, March.
    8. Pierre-Olivier Goffard & Patrick Laub, 2021. "Approximate Bayesian Computations to fit and compare insurance loss models," Working Papers hal-02891046, HAL.
    9. Enrique Garcia Tejeda, 2022. "La concentracion espacial de los reportes de disparos al 911 en la Ciudad de Mexico: ¿Comportamiento racional en el uso de armas durante la pandemia Covid-19?," Sobre México. Revista de Economía, Sobre México. Temas en economía, vol. 3(5), pages 69-93.
    10. Salazar García, Juan Fernando & Guzmán Aguilar, Diana Sirley & Hoyos Nieto, Daniel Arturo, 2023. "Modelación de una prima de seguros mediante la aplicación de métodos actuariales, teoría de fallas y Black-Scholes en la salud en Colombia [Modelling of an insurance premium through the application," Revista de Métodos Cuantitativos para la Economía y la Empresa = Journal of Quantitative Methods for Economics and Business Administration, Universidad Pablo de Olavide, Department of Quantitative Methods for Economics and Business Administration, vol. 35(1), pages 330-359, June.
    11. Marco Riani & Anthony C. Atkinson & Francesca Torti & Aldo Corbellini, 2022. "Robust correspondence analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1381-1401, November.
    12. Akbari, Sina & Escobedo, Adolfo R., 2023. "Beyond kemeny rank aggregation: A parameterizable-penalty framework for robust ranking aggregation with ties," Omega, Elsevier, vol. 119(C).
    13. Xu, Shuzhe & Zhang, Chuanlong & Hong, Don, 2022. "BERT-based NLP techniques for classification and severity modeling in basic warranty data study," Insurance: Mathematics and Economics, Elsevier, vol. 107(C), pages 57-67.
    14. repec:jss:jstsof:06:i04 is not listed on IDEAS
    15. Hasan Zakaria & Shinya Numata & Katsuya Hihara, 2021. "Expenditure Patterns of Foreign Resident Visitors and Foreign Tourist Visitors at a Day-Trip Nature-Based Destination," Tourism and Hospitality, MDPI, vol. 2(2), pages 1-11, June.
    16. Yuan, Meng & Lu, Dawei, 2023. "Asymptotics for a time-dependent by-claim model with dependent subexponential claims," Insurance: Mathematics and Economics, Elsevier, vol. 112(C), pages 120-141.
    17. Pierre-Olivier Goffard & Patrick Laub, 2021. "Approximate Bayesian Computations to fit and compare insurance loss models," Post-Print hal-02891046, HAL.
    18. Park, Sojung C. & Kim, Joseph H.T. & Ahn, Jae Youn, 2018. "Does hunger for bonuses drive the dependence between claim frequency and severity?," Insurance: Mathematics and Economics, Elsevier, vol. 83(C), pages 32-46.
    19. Sarra Ghaddab & Manel Kacem & Christian Peretti & Lotfi Belkacem, 2023. "Extreme severity modeling using a GLM-GPD combination: application to an excess of loss reinsurance treaty," Empirical Economics, Springer, vol. 65(3), pages 1105-1127, September.
    20. Jeong, Himchan & Valdez, Emiliano A., 2020. "Predictive compound risk models with dependence," Insurance: Mathematics and Economics, Elsevier, vol. 94(C), pages 182-195.
    21. Michel Denuit & Yang Lu, 2021. "Wishart‐gamma random effects models with applications to nonlife insurance," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 88(2), pages 443-481, June.
