Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i19p2990-d1485801.html

Zero-Inflated Binary Classification Model with Elastic Net Regularization

Authors
  • Hua Xin

    (School of Mathematics and Statistics, Northeast Petroleum University, Daqing 163318, China)

  • Yuhlong Lio

    (Department of Mathematical Sciences, University of South Dakota, Vermillion, SD 57069, USA)

  • Hsien-Ching Chen

    (Department of Statistics, Tamkang University, Tamsui District, New Taipei City 251301, Taiwan)

  • Tzong-Ru Tsai

    (Department of Statistics, Tamkang University, Tamsui District, New Taipei City 251301, Taiwan)

Abstract

Zero inflation and overfitting can reduce the accuracy of machine learning models used to characterize binary data sets. A zero-inflated Bernoulli (ZIBer) model can be an appropriate model for zero-inflated binary data sets. When the ZIBer model is used for such data, overcoming overfitting remains an open question. To mitigate overfitting in the ZIBer model, this study proposes a loss function that combines the negative log-likelihood of the ZIBer model with an elastic net regularization penalty. An estimation procedure that minimizes this loss function is developed using the gradient descent method (GDM) with a momentum learning rate. The proposed estimation method has two advantages. First, it is a general method that simultaneously uses L1- and L2-norm penalty terms and includes ridge regression and the least absolute shrinkage and selection operator (LASSO) as special cases. Second, the momentum learning rate can accelerate the convergence of the GDM and improve the computational efficiency of the proposed estimation procedure. A parameter selection strategy is studied, and the performance of the proposed method is evaluated using Monte Carlo simulations. A diabetes data set is used as an illustrative example.
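The estimation idea summarized above can be illustrated with a minimal sketch in plain Python (standard library only). It is not the authors' implementation, and several details are assumptions: logistic links for both the Bernoulli event probability p_i = σ(x_iᵀβ) and the zero-inflation probability π_i = σ(z_iᵀγ), so that P(Y_i = 1) = (1 − π_i)p_i; a glmnet-style elastic net penalty λ[α‖w‖₁ + (1 − α)‖w‖₂²/2] on the non-intercept coefficients of both parts (α = 1 gives LASSO, α = 0 gives ridge); a subgradient for the L1 term; and classical heavy-ball momentum. The function names (`penalty`, `loss`, `grad`, `fit`, `simulate`), the parameters (`lam`, `alpha`, `lr`, `momentum`), and the coefficients of the synthetic data are all hypothetical.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def penalty(coef, lam, alpha):
    """Elastic net penalty on coef[1:] (coef[0] is an unpenalized intercept).
    alpha = 1 gives the LASSO penalty; alpha = 0 gives the ridge penalty."""
    w = coef[1:]
    return lam * (alpha * sum(abs(c) for c in w)
                  + 0.5 * (1.0 - alpha) * sum(c * c for c in w))


def loss(beta, gamma, X, Z, y, lam, alpha):
    """Mean ZIBer negative log-likelihood plus elastic net penalties (assumed form)."""
    nll = 0.0
    for xi, zi, yi in zip(X, Z, y):
        # P(Y = 1) = (1 - pi) * p: not a structural zero, then a Bernoulli success
        q = (1.0 - sigmoid(dot(zi, gamma))) * sigmoid(dot(xi, beta))
        q = min(max(q, 1e-12), 1.0 - 1e-12)  # guard the logarithms
        nll -= yi * math.log(q) + (1 - yi) * math.log(1.0 - q)
    return nll / len(y) + penalty(beta, lam, alpha) + penalty(gamma, lam, alpha)


def grad(beta, gamma, X, Z, y, lam, alpha):
    """Gradient of loss(); the L1 term uses the subgradient sign(c), with sign(0) = 0."""
    n = len(y)
    gb, gg = [0.0] * len(beta), [0.0] * len(gamma)
    for xi, zi, yi in zip(X, Z, y):
        p, pi = sigmoid(dot(xi, beta)), sigmoid(dot(zi, gamma))
        q = (1.0 - pi) * p
        # Chain rule through q = (1 - pi) * p for each linear predictor
        rb = (q - yi) * (1.0 - p) / (1.0 - q)
        rg = -(q - yi) * pi / (1.0 - q)
        for j, x in enumerate(xi):
            gb[j] += rb * x / n
        for k, z in enumerate(zi):
            gg[k] += rg * z / n
    for g, coef in ((gb, beta), (gg, gamma)):
        for j in range(1, len(coef)):  # intercepts are not penalized
            sign = (coef[j] > 0) - (coef[j] < 0)
            g[j] += lam * (alpha * sign + (1.0 - alpha) * coef[j])
    return gb, gg


def fit(X, Z, y, lam=0.01, alpha=0.5, lr=0.1, momentum=0.9, iters=500):
    """Gradient descent with classical (heavy-ball) momentum, starting from zeros."""
    beta, gamma = [0.0] * len(X[0]), [0.0] * len(Z[0])
    vb, vg = [0.0] * len(beta), [0.0] * len(gamma)
    for _ in range(iters):
        gb, gg = grad(beta, gamma, X, Z, y, lam, alpha)
        # v <- momentum * v - lr * g;  theta <- theta + v
        vb = [momentum * v - lr * g for v, g in zip(vb, gb)]
        vg = [momentum * v - lr * g for v, g in zip(vg, gg)]
        beta = [b + v for b, v in zip(beta, vb)]
        gamma = [c + v for c, v in zip(gamma, vg)]
    return beta, gamma


def simulate(n, seed=0):
    """Hypothetical synthetic ZIBer data with one covariate in each part."""
    rng = random.Random(seed)
    X, Z, y = [], [], []
    for _ in range(n):
        x1, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        p, pi = sigmoid(0.5 + 1.5 * x1), sigmoid(-1.0 + z1)
        X.append([1.0, x1])
        Z.append([1.0, z1])
        # y = 1 only if the unit is not a structural zero AND the Bernoulli trial succeeds
        y.append(1 if rng.random() >= pi and rng.random() < p else 0)
    return X, Z, y


if __name__ == "__main__":
    X, Z, y = simulate(300)
    beta, gamma = fit(X, Z, y, lam=0.01, alpha=0.5)
    print("beta:", beta, "gamma:", gamma)
```

Setting `alpha=1.0` or `alpha=0.0` gives the LASSO and ridge special cases named in the abstract. Because the L1 term enters through a subgradient, these plain gradient steps shrink coefficients toward zero but seldom set them exactly to zero; a proximal (soft-thresholding) update is a common alternative when exact sparsity matters.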

Suggested Citation

  • Hua Xin & Yuhlong Lio & Hsien-Ching Chen & Tzong-Ru Tsai, 2024. "Zero-Inflated Binary Classification Model with Elastic Net Regularization," Mathematics, MDPI, vol. 12(19), pages 1-17, September.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:19:p:2990-:d:1485801

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/19/2990/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/19/2990/
    Download Restriction: no

    References listed on IDEAS

    1. Minggen Lu & Chin-Shang Li & Karla D. Wagner, 2024. "Penalised estimation of partially linear additive zero-inflated Bernoulli regression models," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 36(3), pages 863-890, July.
    2. Simon, Noah & Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2011. "Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 39(i05).
    3. Chin-Shang Li & Minggen Lu, 2022. "Semiparametric zero-inflated Bernoulli regression with applications," Journal of Applied Statistics, Taylor & Francis Journals, vol. 49(11), pages 2845-2869, August.
    4. Harris, Mark N. & Zhao, Xueyan, 2007. "A zero-inflated ordered probit model, with an application to modelling tobacco consumption," Journal of Econometrics, Elsevier, vol. 141(2), pages 1073-1099, December.
    5. Daniel B. Hall, 2000. "Zero-Inflated Poisson and Binomial Regression with Random Effects: A Case Study," Biometrics, The International Biometric Society, vol. 56(4), pages 1030-1039, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Das, Ujjwal & Das, Kalyan, 2018. "Inference on zero inflated ordinal models with semiparametric link," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 104-115.
    2. Cristian Roner & Claudia Di Caterina & Davide Ferrari, 2021. "Exponential Tilting for Zero-inflated Interval Regression with Applications to Cyber Security Survey Data," BEMPS - Bozen Economics & Management Paper Series BEMPS85, Faculty of Economics and Management at the Free University of Bozen.
    3. J Paul Dunne & Nan Tian, 2016. "Determinants of Civil War and Excess Zeroes," SALDRU Working Papers 191, Southern Africa Labour and Development Research Unit, University of Cape Town.
    4. Dunne J. Paul & Tian Nan, 2017. "Working Paper 274 - Conflict and Fragile States in Africa," Working Paper Series 2391, African Development Bank.
    5. William Greene, 2009. "Models for count data with endogenous participation," Empirical Economics, Springer, vol. 36(1), pages 133-173, February.
    6. Luiz Paulo Fávero & Joseph F. Hair & Rafael de Freitas Souza & Matheus Albergaria & Talles V. Brugni, 2021. "Zero-Inflated Generalized Linear Mixed Models: A Better Way to Understand Data Relationships," Mathematics, MDPI, vol. 9(10), pages 1-28, May.
    7. Sarah Brown & Mark N Harris & Jake Prendergast & Preety Srivastava, 2015. "Pharmaceutical Drug Misuse, Industry of Employment and Occupation," Bankwest Curtin Economics Centre Working Paper series WP1501, Bankwest Curtin Economics Centre (BCEC), Curtin Business School.
    8. Borowiecki, Karol J. & Bakhshi, Hasan, 2018. "Did you really take a hit? Understanding how video games playing affects individuals," Research in Economics, Elsevier, vol. 72(2), pages 313-326.
    9. Cho, Daegon & Hwang, Youngdeok & Park, Jongwon, 2018. "More buzz, more vibes: Impact of social media on concert distribution," Journal of Economic Behavior & Organization, Elsevier, vol. 156(C), pages 103-113.
    10. Cheng, Zhiming & Smyth, Russell & Zhang, Le, 2024. "Does childhood adversity affect household portfolio decisions? Evidence from the Chinese Great Famine," China Economic Review, Elsevier, vol. 87(C).
    11. Greene, William, 2007. "Functional Form and Heterogeneity in Models for Count Data," Foundations and Trends(R) in Econometrics, now publishers, vol. 1(2), pages 113-218, August.
    12. Soave, David & Lawless, Jerald F., 2023. "Regularized regression for two phase failure time studies," Computational Statistics & Data Analysis, Elsevier, vol. 182(C).
    13. Niklas Elert, 2014. "What determines entry? Evidence from Sweden," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 53(1), pages 55-92, August.
    14. Massimo Florio & Francesco Giffoni & Gelsomina Catalano, 2020. "Should governments fund basic science? Evidence from a willingness-to-pay experiment in five universities," Journal of Economic Policy Reform, Taylor and Francis Journals, vol. 23(1), pages 16-33, January.
    15. Sarah Brown & Pulak Ghosh & Bhuvanesh Pareek & Karl Taylor, 2017. "Financial Hardship and Saving Behaviour: Bayesian Analysis of British Panel Data," Working Papers 2017011, The University of Sheffield, Department of Economics.
    16. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L0-Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    17. Karol Borowiecki & Juan Prieto-Rodriguez, 2015. "Video games playing: A substitute for cultural consumptions?," Journal of Cultural Economics, Springer;The Association for Cultural Economics International, vol. 39(3), pages 239-258, August.
    18. Yanling Li & Zita Oravecz & Shuai Zhou & Yosef Bodovski & Ian J. Barnett & Guangqing Chi & Yuan Zhou & Naomi P. Friedman & Scott I. Vrieze & Sy-Miin Chow, 2022. "Bayesian Forecasting with a Regime-Switching Zero-Inflated Multilevel Poisson Regression Model: An Application to Adolescent Alcohol Use with Spatial Covariates," Psychometrika, Springer;The Psychometric Society, vol. 87(2), pages 376-402, June.
    19. Simon Bussy & Mokhtar Z. Alaya & Anne‐Sophie Jannot & Agathe Guilloux, 2022. "Binacox: automatic cut‐point detection in high‐dimensional Cox model with applications in genetics," Biometrics, The International Biometric Society, vol. 78(4), pages 1414-1426, December.
    20. Payandeh Najafabadi Amir T. & MohammadPour Saeed, 2018. "A k-Inflated Negative Binomial Mixture Regression Model: Application to Rate–Making Systems," Asia-Pacific Journal of Risk and Insurance, De Gruyter, vol. 12(2), pages 1-31, July.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:12:y:2024:i:19:p:2990-:d:1485801. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager. General contact details of provider: https://www.mdpi.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.