Printed from https://ideas.repec.org/a/gam/jrisks/v11y2023i9p163-d1238092.html

Modelling Motor Insurance Claim Frequency and Severity Using Gradient Boosting

Authors
  • Carina Clemente

    (NOVA IMS—Information Management School, Universidade Nova de Lisboa, 1070-312 Lisbon, Portugal)

  • Gracinda R. Guerreiro

    (FCT NOVA, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
    CMA-FCT-UNL, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal)

  • Jorge M. Bravo

    (NOVA IMS—Information Management School, Universidade Nova de Lisboa, MagIC, 1070-312 Lisbon, Portugal
    Department of Economics, University Paris-Dauphine PSL, 75016 Paris, France
    CEFAGE-UE, 7000-809 Évora, Portugal
    BRU-ISCTE-IUL, 1649-026 Lisbon, Portugal)

Abstract

Modelling claim frequency and claim severity is of great interest in property-casualty insurance for supporting actuarial decisions in underwriting, ratemaking, and reserving. Standard frequency–severity models based on Generalized Linear Models (GLMs) assume a linear relationship between a function of the response variable and the predictors, assume independence between claim frequency and claim severity, and assign full credibility to the data. To overcome some of these restrictions, this paper investigates the predictive performance of Gradient Boosting, with decision trees as base learners, in modelling the claim frequency and claim severity distributions of a large auto insurance dataset, and compares it with that of a standard GLM. The out-of-sample performance results show that the Gradient Boosting Model (GBM) outperforms the standard GLM in the Poisson claim frequency model. In contrast, in the claim severity model, the classical GLM outperforms the GBM. The findings suggest that gradient boosting models can capture the non-linear relationship between the response variable and the features, as well as their complex interactions, and are therefore a valuable tool for insurers in feature engineering and in developing a data-driven approach to risk management and insurance.

Suggested Citation

  • Carina Clemente & Gracinda R. Guerreiro & Jorge M. Bravo, 2023. "Modelling Motor Insurance Claim Frequency and Severity Using Gradient Boosting," Risks, MDPI, vol. 11(9), pages 1-20, September.
  • Handle: RePEc:gam:jrisks:v:11:y:2023:i:9:p:163-:d:1238092

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-9091/11/9/163/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-9091/11/9/163/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiaoshan Su & Manying Bai, 2020. "Stochastic gradient boosting frequency-severity model of insurance claims," PLOS ONE, Public Library of Science, vol. 15(8), pages 1-24, August.
    2. Jeong, Himchan & Valdez, Emiliano A., 2020. "Predictive compound risk models with dependence," Insurance: Mathematics and Economics, Elsevier, vol. 94(C), pages 182-195.
    3. Junhao Liu & Anita Mukherjee, 2021. "Medicaid and long‐term care: The effects of penalizing strategic asset transfers," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 88(1), pages 53-77, March.
    4. Pierre-Olivier Goffard & Patrick Laub, 2021. "Approximate Bayesian Computations to fit and compare insurance loss models," Working Papers hal-02891046, HAL.
    5. Marie Michaelides & Mathieu Pigeon & Hélène Cossette, 2022. "Individual Claims Reserving using Activation Patterns," Papers 2208.08430, arXiv.org, revised Aug 2023.
    6. Kevin Kuo & Daniel Lupton, 2020. "Towards Explainability of Machine Learning Models in Insurance Pricing," Papers 2003.10674, arXiv.org.
    7. Garrido, J. & Genest, C. & Schulz, J., 2016. "Generalized linear models for dependent frequency and severity of insurance claims," Insurance: Mathematics and Economics, Elsevier, vol. 70(C), pages 205-215.
    8. Peng Shi & Wei Zhang, 2011. "A copula regression model for estimating firm efficiency in the insurance industry," Journal of Applied Statistics, Taylor & Francis Journals, vol. 38(10), pages 2271-2287.
    9. Hua, Lei, 2015. "Tail negative dependence and its applications for aggregate loss modeling," Insurance: Mathematics and Economics, Elsevier, vol. 61(C), pages 135-145.
    10. Zifeng Zhao & Peng Shi & Xiaoping Feng, 2021. "Knowledge Learning of Insurance Risks Using Dependence Models," INFORMS Journal on Computing, INFORMS, vol. 33(3), pages 1177-1196, July.
    11. Zhiyu Quan & Changyue Hu & Panyi Dong & Emiliano A. Valdez, 2024. "Improving Business Insurance Loss Models by Leveraging InsurTech Innovation," Papers 2401.16723, arXiv.org.
    12. Qianhong Lu & Xiaoqing Gan & Zhensheng Chen, 2023. "The Impact of Medical Insurance Payment Policy Reform on Medical Cost and Medical Burden in China," Sustainability, MDPI, vol. 15(3), pages 1-18, January.
    13. Anja Breuer & Yves Staudt, 2022. "Equalization Reserves for Reinsurance and Non-Life Undertakings in Switzerland," Risks, MDPI, vol. 10(3), pages 1-41, March.
    14. Pechon, Florian & Denuit, Michel & Trufin, Julien, 2019. "Home and Motor insurance joined at a household level using multivariate credibility," LIDAM Discussion Papers ISBA 2019013, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    15. Tsyganov, Aleksander & Baskakov, Valery & Yazykov, Andrey & Sheparnev, Nikolay & Yanenko, Evgeny & Grysenkova, Yulia, 2019. "The impact of the bonus-malus system on the insurance ratemaking in the system of compulsory insurance of the responsibility of transport owners in Russia," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 56, pages 123-141.
    16. Edward W. Frees & Gee Lee & Lu Yang, 2016. "Multivariate Frequency-Severity Regression Models in Insurance," Risks, MDPI, vol. 4(1), pages 1-36, February.
    17. Yaojun Zhang & Lanpeng Ji & Georgios Aivaliotis & Charles Taylor, 2023. "Bayesian CART models for insurance claims frequency," Papers 2303.01923, arXiv.org, revised Dec 2023.
    18. Araichi, Sawssen & Peretti, Christian de & Belkacem, Lotfi, 2017. "Reserve modelling and the aggregation of risks using time varying copula models," Economic Modelling, Elsevier, vol. 67(C), pages 149-158.
    19. Iris Meulman & Bette Loef & Niek Stadhouders & Tron Anders Moger & Albert Wong & Johan J. Polder & Ellen Uiters, 2023. "Estimating healthcare expenditures after becoming divorced or widowed using propensity score matching," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 24(7), pages 1047-1060, September.
    20. Ahmed, Hanan, 2022. "Extreme value statistics using related variables," Other publications TiSEM 246f0f13-701c-4c0d-8e09-e, Tilburg University, School of Economics and Management.
