Printed from https://ideas.repec.org/p/aiz/louvad/2021012.html

Response versus gradient boosting trees, GLMs and neural networks under Tweedie loss and log-link

Author

Listed:
  • Hainaut, Donatien

    (Université catholique de Louvain, LIDAM/ISBA, Belgium)

  • Trufin, Julien

    (Université Libre de Bruxelles)

  • Denuit, Michel

    (Université catholique de Louvain, LIDAM/ISBA, Belgium)

Abstract

Thanks to its outstanding performance, boosting has rapidly gained wide acceptance among actuaries. To speed up calculations, boosting is often applied to the gradients of the loss function rather than to the responses (hence the name gradient boosting). When the model is trained by minimizing the Poisson deviance, this amounts to applying the least-squares principle to raw residuals. This exposes gradient boosting to the same problems that led to replacing least squares with Poisson GLMs for analyzing low counts (typically, the number of reported claims at policy level in personal lines). This paper shows that boosting can be conducted directly on the responses under the Tweedie loss function and log-link by adapting the weights at each step. Numerical illustrations demonstrate improved performance compared to gradient boosting when trees, GLMs and neural networks are used as weak learners.
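The response-boosting idea summarized in the abstract can be sketched for the Poisson special case of the Tweedie family with log-link: at each step, instead of least-squares fitting a weak learner to raw gradient residuals, the weak learner is fitted to a working response with weights adapted to the current fit. This is a minimal illustrative sketch with regression stumps as weak learners; the toy data, variable names and hyperparameters are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, size=(n, 2))
true_mu = np.exp(0.5 + X[:, 0] - 0.8 * X[:, 1])
y = rng.poisson(true_mu)  # low counts, as in claim-frequency data

def poisson_deviance(y, mu):
    # mean Poisson deviance: 2 * mean( y*log(y/mu) - (y - mu) ),
    # with the convention y*log(y/mu) = 0 when y = 0
    term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    return 2.0 * np.mean(term - (y - mu))

# Response boosting under Poisson deviance and log-link:
# fit each weak learner to the working response y / mu with weights mu,
# then update the score additively on the log scale. A leaf then predicts
# sum(y)/sum(mu), whose log is the deviance-optimal leaf update.
F = np.full(n, np.log(y.mean()))  # initial score (log of overall mean)
n_steps, shrinkage = 50, 0.3
for m in range(n_steps):
    mu = np.exp(F)
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, y / mu, sample_weight=mu)  # adapted weights, response as target
    F += shrinkage * np.log(np.clip(tree.predict(X), 1e-6, None))

mu_hat = np.exp(F)
print("mean Poisson deviance:", round(poisson_deviance(y, mu_hat), 4))
```

With weights mu, a leaf's weighted mean of y/mu equals sum(y)/sum(mu) over the leaf, so taking its log recovers the optimal additive update on the score scale; gradient boosting would instead fit the tree to the raw residuals y - mu by unweighted least squares.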

Suggested Citation

  • Hainaut, Donatien & Trufin, Julien & Denuit, Michel, 2021. "Response versus gradient boosting trees, GLMs and neural networks under Tweedie loss and log-link," LIDAM Discussion Papers ISBA 2021012, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
  • Handle: RePEc:aiz:louvad:2021012

    Download full text from publisher

    File URL: https://dial.uclouvain.be/pr/boreal/fr/object/boreal%3A244222/datastream/PDF_01/view
    Download Restriction: no
    ---><---


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christopher Blier-Wong & Hélène Cossette & Luc Lamontagne & Etienne Marceau, 2020. "Machine Learning in P&C Insurance: A Review for Pricing and Reserving," Risks, MDPI, vol. 9(1), pages 1-26, December.
    2. Freek Holvoet & Katrien Antonio & Roel Henckaerts, 2023. "Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff," Papers 2310.12671, arXiv.org, revised Aug 2024.
    3. Trufin, Julien & Denuit, Michel, 2021. "Boosting cost-complexity pruned trees on Tweedie responses: the ABT machine," LIDAM Discussion Papers ISBA 2021015, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    4. Marra, Giampiero & Wood, Simon N., 2011. "Practical variable selection for generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2372-2387, July.
    5. Stefanie Hieke & Axel Benner & Richard F Schlenk & Martin Schumacher & Lars Bullinger & Harald Binder, 2016. "Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-18, May.
    6. Faisal Zahid & Gerhard Tutz, 2013. "Multinomial logit models with implicit variable selection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(4), pages 393-416, December.
    7. Zhiyu Quan & Changyue Hu & Panyi Dong & Emiliano A. Valdez, 2024. "Improving Business Insurance Loss Models by Leveraging InsurTech Innovation," Papers 2401.16723, arXiv.org.
    8. Willame, Gireg & Trufin, Julien & Denuit, Michel, 2023. "Boosted Poisson regression trees: A guide to the BT package in R," LIDAM Discussion Papers ISBA 2023008, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    9. Riccardo De Bin, 2016. "Boosting in Cox regression: a comparison between the likelihood-based and the model-based approaches with focus on the R-packages CoxBoost and mboost," Computational Statistics, Springer, vol. 31(2), pages 513-531, June.
    10. Sariyar Murat & Schumacher Martin & Binder Harald, 2014. "A boosting approach for adapting the sparsity of risk prediction signatures based on different molecular levels," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(3), pages 343-357, June.
    11. Simon CK Lee, 2020. "Delta Boosting Implementation of Negative Binomial Regression in Actuarial Pricing," Risks, MDPI, vol. 8(1), pages 1-21, February.
    12. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    13. Wang Zhu & Wang C.Y., 2010. "Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-33, June.
    14. Philip Kostov, 2010. "Do Buyers’ Characteristics and Personal Relationships Affect Agricultural Land Prices?," Land Economics, University of Wisconsin Press, vol. 86(1), pages 48-65.
    15. Kevin Kuo & Daniel Lupton, 2020. "Towards Explainability of Machine Learning Models in Insurance Pricing," Papers 2003.10674, arXiv.org.
    16. Osamu Komori, 2011. "A boosting method for maximization of the area under the ROC curve," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 63(5), pages 961-979, October.
    17. Belitz, Christiane & Lang, Stefan, 2008. "Simultaneous selection of variables and smoothing parameters in structured additive regression models," Computational Statistics & Data Analysis, Elsevier, vol. 53(1), pages 61-81, September.
    18. Yves Staudt & Joël Wagner, 2021. "Assessing the Performance of Random Forests for Modeling Claim Severity in Collision Car Insurance," Risks, MDPI, vol. 9(3), pages 1-28, March.
    19. Gerhard Tutz & Gunther Schauberger, 2015. "A Penalty Approach to Differential Item Functioning in Rasch Models," Psychometrika, Springer;The Psychometric Society, vol. 80(1), pages 21-43, March.
    20. Battauz, Michela & Vidoni, Paolo, 2022. "A likelihood-based boosting algorithm for factor analysis models with binary data," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).

    More about this item

    Keywords

    Risk classification; Boosting; Gradient boosting; Regression trees; GLM; Neural networks

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:aiz:louvad:2021012. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Nadja Peiffer (email available below). General contact details of provider: https://edirc.repec.org/data/isuclbe.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.