Printed from https://ideas.repec.org/a/gam/jrisks/v9y2021i3p53-d517868.html

Assessing the Performance of Random Forests for Modeling Claim Severity in Collision Car Insurance

Authors

  • Yves Staudt

    (Department Alpine Region Development, Institute for Tourism and Leisure, University of Applied Sciences of the Grisons, Comercialstrasse 19, 7000 Chur, Switzerland
    Center of Data Analysis, Simulation and Visualization, Department Applied Future Technologies, University of Applied Sciences of the Grisons, Ringstrasse 34, 7000 Chur, Switzerland
    These authors contributed equally to this work.)

  • Joël Wagner

    (Department of Actuarial Science, Faculty of Business and Economics (HEC Lausanne), University of Lausanne, Extranef, 1015 Lausanne, Switzerland
    Swiss Finance Institute, University of Lausanne, 1015 Lausanne, Switzerland
    These authors contributed equally to this work.)

Abstract

To calculate non-life insurance premiums, actuaries traditionally rely on separate severity and frequency models that use covariates to explain the claims loss exposure. In this paper, we focus on the claim severity. First, we build two reference models, a generalized linear model and a generalized additive model, both assuming a log-normal distribution for the severity and including the most significant factors. In doing so, we relate the continuous variables to the response nonlinearly. In the second step, we tune two random forest models, one for the claim severity and one for the log-transformed claim severity; the latter requires transforming its predictions back to the original scale. We compare the predictive performance of the models using the relative error, the root mean squared error and goodness-of-lift statistics, in combination with goodness-of-fit statistics. Our application uses a dataset from a Swiss collision insurance portfolio covering the loss exposure from 2011 to 2015, with observations from 81,309 settled claims totaling CHF 184 million. In the analysis, we use the data from 2011 to 2014 for training and the data from 2015 for testing. Our results indicate that a log transformation of the severity does not lead to performance gains with random forests. However, random forests with a log transformation are the preferred choice for explaining right-skewed claims. Finally, when considering all indicators, we conclude that the generalized additive model has the best overall performance.
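The abstract notes that a model fitted to the log-transformed severity needs its predictions transformed back to the original scale. Simply exponentiating a log-scale prediction targets the median rather than the mean, which understates expected severity for right-skewed claims; this is the retransformation problem discussed by Manning (1998) in the reference list below. The following is a minimal sketch, not the authors' procedure (the abstract does not say which correction they use): it contrasts the naive back-transform with two common corrections, the log-normal variance adjustment exp(mu + sigma^2/2) and Duan's smearing factor, and scores them with the root mean squared error. The function names, the synthetic data and the parameters mu = 7.0 and sigma = 1.2 are hypothetical illustrations.

```python
import numpy as np


def lognormal_retransform(log_pred, sigma2):
    """Mean on the original scale if log(Y) ~ N(mu, sigma2): E[Y] = exp(mu + sigma2 / 2)."""
    return np.exp(np.asarray(log_pred, dtype=float) + sigma2 / 2.0)


def smearing_retransform(log_pred, log_residuals):
    """Duan's smearing estimator: exp(prediction) times the mean of exp(residuals).

    Unlike the variance adjustment, it does not assume normal log-scale residuals.
    """
    factor = np.mean(np.exp(np.asarray(log_residuals, dtype=float)))
    return np.exp(np.asarray(log_pred, dtype=float)) * factor


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted severities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical synthetic severities, not data from the paper.
    y = rng.lognormal(mean=7.0, sigma=1.2, size=200_000)
    log_y = np.log(y)

    # Stand-in for a log-scale model: predict the mean of log(y) for every claim.
    log_pred = np.full_like(log_y, log_y.mean())
    residuals = log_y - log_pred

    naive = np.exp(log_pred)  # targets the log-normal median, not the mean
    adjusted = lognormal_retransform(log_pred, residuals.var())
    smeared = smearing_retransform(log_pred, residuals)

    print("sample mean:", y.mean())
    print("naive / adjusted / smeared:", naive[0], adjusted[0], smeared[0])
    print("RMSE naive vs adjusted:", rmse(y, naive), rmse(y, adjusted))
```

With a constant log-scale prediction, the smearing estimate reproduces the sample mean exactly, and both corrections lower the RMSE relative to the naive back-transform. In practice, `log_pred` would come from a fitted model such as a random forest on log severities, and the residuals would be taken on the training years only.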

Suggested Citation

  • Yves Staudt & Joël Wagner, 2021. "Assessing the Performance of Random Forests for Modeling Claim Severity in Collision Car Insurance," Risks, MDPI, vol. 9(3), pages 1-28, March.
  • Handle: RePEc:gam:jrisks:v:9:y:2021:i:3:p:53-:d:517868

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-9091/9/3/53/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-9091/9/3/53/
    Download Restriction: no

    References listed on IDEAS

    1. Denuit, Michel & Hainaut, Donatien & Trufin, Julien, 2020. "Effective Statistical Learning Methods for Actuaries II: Tree-Based Methods and Extensions," LIDAM Reprints ISBA 2020035, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    2. Grubinger, Thomas & Zeileis, Achim & Pfeiffer, Karl-Peter, 2014. "evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 61(i01).
    3. Daniela Laas & Hato Schmeiser & Joël Wagner, 2016. "Empirical Findings on Motor Insurance Pricing in Germany, Austria and Switzerland," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 41(3), pages 398-431, July.
    4. Dalkilic, Turkan Erbay & Tank, Fatih & Kula, Kamile Sanli, 2009. "Neural networks approach for determining total claim amounts in insurance," Insurance: Mathematics and Economics, Elsevier, vol. 45(2), pages 236-241, October.
    5. Kuhn, Max, 2008. "Building Predictive Models in R Using the caret Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i05).
    6. Eling, Martin, 2014. "Fitting asset returns to skewed distributions: Are the skew-normal and skew-student good models?," Insurance: Mathematics and Economics, Elsevier, vol. 59(C), pages 45-56.
    7. Victor Chernozhukov & Christian Hansen & Martin Spindler, 2015. "Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach," Annual Review of Economics, Annual Reviews, vol. 7(1), pages 649-688, August.
    8. Zhiyu Quan & Emiliano A. Valdez, 2018. "Predictive analytics of insurance claims using multivariate decision trees," Dependence Modeling, De Gruyter, vol. 6(1), pages 377-407, December.
    9. Denuit, Michel & Sznajder, Dominik & Trufin, Julien, 2019. "Model selection based on Lorenz and concentration curves, Gini indices and convex order," Insurance: Mathematics and Economics, Elsevier, vol. 89(C), pages 128-139.
    10. Klein, Nadja & Denuit, Michel & Lang, Stefan & Kneib, Thomas, 2014. "Nonlife ratemaking and risk management with Bayesian generalized additive models for location, scale, and shape," Insurance: Mathematics and Economics, Elsevier, vol. 55(C), pages 225-249.
    11. Manning, Willard G., 1998. "The logged dependent variable, heteroscedasticity, and the retransformation problem," Journal of Health Economics, Elsevier, vol. 17(3), pages 283-295, June.
    12. Denuit, Michel & Lang, Stefan, 2004. "Non-life rate-making with Bayesian GAMs," Insurance: Mathematics and Economics, Elsevier, vol. 35(3), pages 627-647, December.
    13. Katrien Antonio & Emiliano Valdez, 2012. "Statistical concepts of a priori and a posteriori risk classification in insurance," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 96(2), pages 187-224, June.
    14. Edward W. Frees, 2015. "Analytics of Insurance Markets," Annual Review of Financial Economics, Annual Reviews, vol. 7(1), pages 253-277, December.
    15. Denuit, Michel & Sznajder, Dominik & Trufin, Julien, 2019. "Model selection based on Lorenz and concentration curves, Gini indices and convex order," LIDAM Discussion Papers ISBA 2019006, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    16. Klein, Nadja & Denuit, Michel & Lang, Stefan & Kneib, Thomas, 2014. "Nonlife ratemaking and risk management with Bayesian generalized additive models for location, scale, and shape," LIDAM Reprints ISBA 2014006, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    17. Denuit, Michel & Sznajder, Dominik & Trufin, Julien, 2019. "Model selection based on Lorenz and concentration curves, Gini indices and convex order," LIDAM Reprints ISBA 2019046, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    18. Ai, Chunrong & Norton, Edward C., 2000. "Standard errors for the retransformation problem with heteroscedasticity," Journal of Health Economics, Elsevier, vol. 19(5), pages 697-718, September.
    19. Edward W. Frees & Gee Lee & Lu Yang, 2016. "Multivariate Frequency-Severity Regression Models in Insurance," Risks, MDPI, vol. 4(1), pages 1-36, February.
    20. Jean-Philippe Boucher & Michel Denuit & Montserrat Guillén, 2007. "Risk Classification for Claim Counts," North American Actuarial Journal, Taylor & Francis Journals, vol. 11(4), pages 110-131.
    21. Manning, Willard G. & Mullahy, John, 2001. "Estimating log models: to transform or not to transform?," Journal of Health Economics, Elsevier, vol. 20(4), pages 461-494, July.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Zuleyka Díaz Martínez & José Fernández Menéndez & Luis Javier García Villalba, 2023. "Tariff Analysis in Automobile Insurance: Is It Time to Switch from Generalized Linear Models to Generalized Additive Models?," Mathematics, MDPI, vol. 11(18), pages 1-16, September.
    2. Mogens Steffensen, 2022. "Special Issue “Risks: Feature Papers 2021”," Risks, MDPI, vol. 10(3), pages 1-2, March.
    3. Ahmed, Hanan, 2022. "Extreme value statistics using related variables," Other publications TiSEM 246f0f13-701c-4c0d-8e09-e, Tilburg University, School of Economics and Management.
    4. Anja Breuer & Yves Staudt, 2022. "Equalization Reserves for Reinsurance and Non-Life Undertakings in Switzerland," Risks, MDPI, vol. 10(3), pages 1-41, March.
    5. Carina Clemente & Gracinda R. Guerreiro & Jorge M. Bravo, 2023. "Modelling Motor Insurance Claim Frequency and Severity Using Gradient Boosting," Risks, MDPI, vol. 11(9), pages 1-20, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jolien Ponnet & Robin Van Oirbeek & Tim Verdonck, 2021. "Concordance Probability for Insurance Pricing Models," Risks, MDPI, vol. 9(10), pages 1-26, October.
    2. Mihaela Covrig & Iulian Mircea & Gheorghita Zbaganu & Alexandru Coser & Alexandru Tindeche, 2015. "Using R In Generalized Linear Models," Romanian Statistical Review, Romanian Statistical Review, vol. 63(3), pages 33-45, September.
    3. Christopher Blier-Wong & Hélène Cossette & Luc Lamontagne & Etienne Marceau, 2020. "Machine Learning in P&C Insurance: A Review for Pricing and Reserving," Risks, MDPI, vol. 9(1), pages 1-26, December.
    4. Denuit, Michel & Trufin, Julien & Verdebout, Thomas, 2021. "Testing for more positive expectation dependence with application to model comparison," Insurance: Mathematics and Economics, Elsevier, vol. 101(PB), pages 163-172.
    5. George Tzougas, 2020. "EM Estimation for the Poisson-Inverse Gamma Regression Model with Varying Dispersion: An Application to Insurance Ratemaking," Risks, MDPI, vol. 8(3), pages 1-23, September.
    6. Tzougas, George & Vrontos, Spyridon D. & Frangos, Nickolaos E., 2015. "Risk classification for claim counts and losses using regression models for location, scale and shape," LSE Research Online Documents on Economics 70921, London School of Economics and Political Science, LSE Library.
    7. Willame, Gireg & Trufin, Julien & Denuit, Michel, 2023. "Boosted Poisson regression trees: A guide to the BT package in R," LIDAM Discussion Papers ISBA 2023008, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    8. Tzougas, George, 2020. "EM estimation for the Poisson-Inverse Gamma regression model with varying dispersion: an application to insurance ratemaking," LSE Research Online Documents on Economics 106539, London School of Economics and Political Science, LSE Library.
    9. Denuit, Michel & Trufin, Julien & Verdebout, Thomas, 2021. "Testing for more positive expectation dependence with application to model comparison," LIDAM Discussion Papers ISBA 2021021, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    10. Denuit, Michel & Trufin, Julien, 2022. "Autocalibration by balance correction in nonlife insurance pricing," LIDAM Discussion Papers ISBA 2022041, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    11. Michel Denuit & Christian Y. Robert, 2021. "Risk sharing under the dominant peer‐to‐peer property and casualty insurance business models," Risk Management and Insurance Review, American Risk and Insurance Association, vol. 24(2), pages 181-205, June.
    12. Denuit, Michel & Robert, Christian Y., 2021. "Risk sharing under the dominant peer-to-peer property and casualty insurance business models," LIDAM Discussion Papers ISBA 2021001, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    13. Hao Yu, 2017. "China’s medical savings accounts: an analysis of the price elasticity of demand for health care," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 18(6), pages 773-785, July.
    14. Sarra Ghaddab & Manel Kacem & Christian Peretti & Lotfi Belkacem, 2023. "Extreme severity modeling using a GLM-GPD combination: application to an excess of loss reinsurance treaty," Empirical Economics, Springer, vol. 65(3), pages 1105-1127, September.
    15. Deprez, Laurens & Antonio, Katrien & Boute, Robert, 2023. "Empirical risk assessment of maintenance costs under full-service contracts," European Journal of Operational Research, Elsevier, vol. 304(2), pages 476-493.
    16. Roel Verbelen & Katrien Antonio & Gerda Claeskens, 2018. "Unravelling the predictive power of telematics data in car insurance pricing," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 67(5), pages 1275-1304, November.
    17. Mihaela DAVID, 2014. "Modeling The Frequency Of Claims In Auto Insurance With Application To A French Case," Review of Economic and Business Studies, Alexandru Ioan Cuza University, Faculty of Economics and Business Administration, issue 13, pages 69-85, June.
    18. Denuit, Michel & Legrand, Catherine, 2016. "Risk Classification in Life Insurance: Extension to Continuous Covariates," LIDAM Discussion Papers ISBA 2016045, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    19. Michel Denuit & Arthur Charpentier & Julien Trufin, 2021. "Autocalibration and Tweedie-dominance for Insurance Pricing with Machine Learning," Papers 2103.03635, arXiv.org, revised Jul 2021.
    20. Nadja Klein & Thomas Kneib & Stefan Lang, 2015. "Bayesian Generalized Additive Models for Location, Scale, and Shape for Zero-Inflated and Overdispersed Count Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 405-419, March.
