Assessment of the Influence of Dependent Variable Distribution on Selected Goodness of Fit Measures Using the Example of Customer Churn Model

Author

  • Migut Grzegorz

    (StatSoft Polska sp. z o.o.)

Abstract

Classification models make it possible to take optimal actions at every stage of the customer lifecycle. One factor that affects both the model-building process and the assessment of a model's discriminatory power is an unbalanced distribution of the dichotomous dependent variable. The article focuses on how to assess goodness of fit reliably. It first reviews measures of predictive power and then assesses how the distribution of the dependent variable affects selected goodness-of-fit measures. Several measures, such as lift, accuracy (ACC) and the F-score, proved highly sensitive to that distribution. The Matthews correlation coefficient (MCC) and Cohen's kappa were also found to be sensitive. Sensitivity (SENS), specificity (SPEC), Youden's index and measures based on ROC curves showed no such sensitivity. These conclusions may help avoid misjudging the predictive power of models built both for learning purposes and for business practice.
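The abstract's central claim, that some measures move with the share of churners while others do not, can be illustrated with a small sketch. It assumes a hypothetical classifier whose sensitivity (0.8) and specificity (0.9) stay fixed, builds the expected confusion matrix for several prevalences of the positive (churn) class, and computes the measures named above. The values 0.8 and 0.9, the helper names, and the definition of lift as precision divided by the base rate are illustrative assumptions, not the author's procedure or data. A last step computes the ROC AUC pairwise and shows that it does not change when every negative case is replicated.

```python
import math


def confusion_counts(sens, spec, prevalence, n=10_000):
    """Expected confusion-matrix counts for a hypothetical classifier with fixed
    sensitivity and specificity applied to n cases at the given prevalence."""
    pos = prevalence * n
    neg = n - pos
    tp = sens * pos
    fn = pos - tp
    tn = spec * neg
    fp = neg - tn
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Goodness-of-fit measures computed from one confusion matrix."""
    n = tp + fp + tn + fn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)
    acc = (tp + tn) / n
    f1 = 2 * prec * sens / (prec + sens)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement (acc) vs. agreement expected by chance
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (acc - p_exp) / (1 - p_exp)
    # Assumed lift definition: precision of predicted positives / base rate
    lift = prec / ((tp + fn) / n)
    return {
        "SENS": sens, "SPEC": spec, "Youden": sens + spec - 1,
        "ACC": acc, "F1": f1, "MCC": mcc, "Kappa": kappa, "lift": lift,
    }


def auc(pos_scores, neg_scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly,
    counting ties as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    for prev in (0.5, 0.2, 0.05):
        m = metrics(*confusion_counts(sens=0.8, spec=0.9, prevalence=prev))
        print(f"prevalence={prev:.2f} " + " ".join(f"{k}={v:.3f}" for k, v in m.items()))

    # Replicating every negative 10 times lowers the churn rate but not the AUC.
    pos = [0.9, 0.8, 0.4]
    neg = [0.7, 0.3, 0.2, 0.1]
    print(auc(pos, neg), auc(pos, neg * 10))
```

With these inputs, SENS, SPEC and Youden's index stay constant across prevalences, while ACC, F1, MCC, kappa and lift change; ACC and lift rise and F1 falls as churn becomes rarer. The expected confusion matrix is used instead of a simulated sample so that the effect of prevalence is not mixed with sampling noise.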

Suggested Citation

  • Migut Grzegorz, 2020. "Assessment of the Influence of Dependent Variable Distribution on Selected Goodness of Fit Measures Using the Example of Customer Churn Model," Econometrics. Advances in Applied Data Analysis, Sciendo, vol. 24(1), pages 51-70, March.
  • Handle: RePEc:vrs:eaiada:v:24:y:2020:i:1:p:51-70:n:5
    DOI: 10.15611/eada.2020.1.05

    Download full text from publisher

    File URL: https://doi.org/10.15611/eada.2020.1.05
    Download Restriction: no

    References listed on IDEAS

    1. Sabri Boughorbel & Fethi Jarray & Mohammed El-Anbari, 2017. "Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric," PLOS ONE, Public Library of Science, vol. 12(6), pages 1-17, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mirza Rizwan Sajid & Bader A. Almehmadi & Waqas Sami & Mansour K. Alzahrani & Noryanti Muhammad & Christophe Chesneau & Asif Hanif & Arshad Ali Khan & Ahmad Shahbaz, 2021. "Development of Nonlaboratory-Based Risk Prediction Models for Cardiovascular Diseases Using Conventional and Machine Learning Approaches," IJERPH, MDPI, vol. 18(23), pages 1-16, November.
    2. Wang, Xinlin & Yao, Zhihao & Papaefthymiou, Marios, 2023. "A real-time electrical load forecasting and unsupervised anomaly detection framework," Applied Energy, Elsevier, vol. 330(PA).
    3. Christian Kauten & Ashish Gupta & Xiao Qin & Glenn Richey, 2022. "Predicting Blood Donors Using Machine Learning Techniques," Information Systems Frontiers, Springer, vol. 24(5), pages 1547-1562, October.
    4. Ruchika Malhotra & Megha Khanna, 2023. "On the applicability of search-based algorithms for software change prediction," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 14(1), pages 55-73, February.
    5. David Cemernek & Sandra Cemernek & Heimo Gursch & Ashwini Pandeshwar & Thomas Leitner & Matthias Berger & Gerald Klösch & Roman Kern, 2022. "Machine learning in continuous casting of steel: a state-of-the-art survey," Journal of Intelligent Manufacturing, Springer, vol. 33(6), pages 1561-1579, August.
    6. Zhou, Xiaoyi & Lu, Pan & Zheng, Zijian & Tolliver, Denver & Keramati, Amin, 2020. "Accident Prediction Accuracy Assessment for Highway-Rail Grade Crossings Using Random Forest Algorithm Compared with Decision Tree," Reliability Engineering and System Safety, Elsevier, vol. 200(C).
    7. Bikeri Adline & Kazushi Ikeda, 2023. "A Hawkes Model Approach to Modeling Price Spikes in the Japanese Electricity Market," Energies, MDPI, vol. 16(4), pages 1-20, February.
    8. Manuel Casal-Guisande & María Torres-Durán & Mar Mosteiro-Añón & Jorge Cerqueiro-Pequeño & José-Benito Bouza-Rodríguez & Alberto Fernández-Villar & Alberto Comesaña-Campos, 2023. "Design and Conceptual Proposal of an Intelligent Clinical Decision Support System for the Diagnosis of Suspicious Obstructive Sleep Apnea Patients from Health Profile," IJERPH, MDPI, vol. 20(4), pages 1-31, February.
    9. Schade, Philipp & Schuhmacher, Monika C., 2023. "Predicting entrepreneurial activity using machine learning," Journal of Business Venturing Insights, Elsevier, vol. 19(C).
    10. Wang, Delu & Tong, Xian & Wang, Yadong, 2020. "An early risk warning system for Outward Foreign Direct Investment in Mineral Resource-based enterprises using multi-classifiers fusion," Resources Policy, Elsevier, vol. 66(C).
    11. Li, Yang & Zhang, Meng & Chen, Chen, 2022. "A Deep-Learning intelligent system incorporating data augmentation for Short-Term voltage stability assessment of power systems," Applied Energy, Elsevier, vol. 308(C).
    12. Gnekpe, Christian & Tchuente, Dieudonné & Nyawa, Serge & Dey, Prasanta Kumar, 2024. "Energy Performance of Building Refurbishments: Predictive and Prescriptive AI-based Machine Learning Approaches," Journal of Business Research, Elsevier, vol. 183(C).
    13. Kouadri, Abdelmalek & Hajji, Mansour & Harkat, Mohamed-Faouzi & Abodayeh, Kamaleldin & Mansouri, Majdi & Nounou, Hazem & Nounou, Mohamed, 2020. "Hidden Markov model based principal component analysis for intelligent fault diagnosis of wind energy converter systems," Renewable Energy, Elsevier, vol. 150(C), pages 598-606.
    14. López-Díaz, María Concepción & López-Díaz, Miguel & Martínez-Fernández, Sergio, 2023. "On the optimal binary classifier with an application," Computational Statistics & Data Analysis, Elsevier, vol. 181(C).
    15. Manuel Casal-Guisande & Jorge Cerqueiro-Pequeño & José-Benito Bouza-Rodríguez & Alberto Comesaña-Campos, 2023. "Integration of the Wang & Mendel Algorithm into the Application of Fuzzy Expert Systems to Intelligent Clinical Decision Support Systems," Mathematics, MDPI, vol. 11(11), pages 1-33, May.
    16. Salvatore Carta & Alessandro Sebastian Podda & Diego Reforgiato Recupero & Roberto Saia, 2020. "A Local Feature Engineering Strategy to Improve Network Anomaly Detection," Future Internet, MDPI, vol. 12(10), pages 1-30, October.
    17. Tan Kai Noel Quah & Yi Wei Daniel Tay & Jian Hui Lim & Ming Jen Tan & Teck Neng Wong & King Ho Holden Li, 2023. "Concrete 3D Printing: Process Parameters for Process Control, Monitoring and Diagnosis in Automation and Construction," Mathematics, MDPI, vol. 11(6), pages 1-34, March.
    18. Nader Mahmoudi & Łukasz P. Olech & Paul Docherty, 2022. "A comprehensive study of domain-specific emoji meanings in sentiment classification," Computational Management Science, Springer, vol. 19(2), pages 159-197, June.
    19. Bruno Faria & Fernao Vistulo de Abreu, 2019. "Cellular frustration algorithms for anomaly detection applications," PLOS ONE, Public Library of Science, vol. 14(7), pages 1-31, July.
    20. Shaniel Chotkan & Raymond van der Meij & Wouter Jan Klerk & Phil J. Vardon & Juan Pablo Aguilar-López, 2022. "A Data-Driven Method for Identifying Drought-Induced Crack-Prone Levees Based on Decision Trees," Sustainability, MDPI, vol. 14(11), pages 1-23, June.

    More about this item

    Keywords

    classification models; goodness of fit; unbalanced datasets; customer churn analysis;

    JEL classification:

    • C10 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - General
    • C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.