Printed from https://ideas.repec.org/p/arx/papers/1903.08920.html

Feature quantization for parsimonious and interpretable predictive models

Authors

  • Adrien Ehrhardt
  • Christophe Biernacki
  • Vincent Vandewalle
  • Philippe Heinrich

Abstract

For regulatory and interpretability reasons, logistic regression is still widely used. To improve prediction accuracy and interpretability, a preprocessing step quantizing both continuous and categorical data is usually performed: continuous features are discretized and, if numerous, levels of categorical features are grouped. Even better predictive accuracy can be reached by embedding this quantization estimation step directly into the predictive estimation step itself. Doing so, however, requires optimizing the predictive loss over a huge set of candidate quantizations. To overcome this difficulty, we introduce a specific two-step optimization strategy: first, the optimization problem is relaxed by approximating discontinuous quantization functions by smooth functions; second, the resulting relaxed optimization problem is solved via a particular neural network. The good performance of this approach, which we call glmdisc, is illustrated on simulated data and on real data from the UCI library and Crédit Agricole Consumer Finance (a major, long-established European player in the consumer credit market).
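
The relaxation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which smooth functions are used, so the softmax over linear scores below is an assumption chosen for illustration, and the names `hard_quantize`, `params_from_cutpoints`, `soft_quantize` and `temperature` are hypothetical; this is not the glmdisc implementation. The sketch only shows how a step-function encoding of a continuous feature can be approximated by a differentiable one.

```python
import math

def hard_quantize(x, cutpoints):
    """One-hot encoding of the interval containing x (a step function of x)."""
    level = sum(x > c for c in cutpoints)  # number of sorted cutpoints below x
    return [1.0 if h == level else 0.0 for h in range(len(cutpoints) + 1)]

def params_from_cutpoints(cutpoints):
    """Illustrative slopes/intercepts whose softmax argmax matches the hard intervals.

    Level h gets score h * x - (c_1 + ... + c_h), so consecutive scores
    differ by x - c_h: level h beats level h - 1 exactly when x > c_h.
    """
    slopes = [float(h) for h in range(len(cutpoints) + 1)]
    intercepts = [0.0]
    for c in cutpoints:
        intercepts.append(intercepts[-1] - c)
    return slopes, intercepts

def soft_quantize(x, slopes, intercepts, temperature=1.0):
    """Assumed smooth relaxation: softmax over one linear score per level."""
    scores = [(a * x + b) / temperature for a, b in zip(slopes, intercepts)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With a small `temperature` the soft encoding approaches the one-hot encoding, while larger values give a smooth function of `x` whose parameters can be fitted by gradient-based methods. Feeding such soft encodings into a logistic output layer is one way to read "solved via a particular neural network", though the exact architecture is not given here.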

Suggested Citation

  • Adrien Ehrhardt & Christophe Biernacki & Vincent Vandewalle & Philippe Heinrich, 2019. "Feature quantization for parsimonious and interpretable predictive models," Papers 1903.08920, arXiv.org.
  • Handle: RePEc:arx:papers:1903.08920

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1903.08920
    File Function: Latest version
    Download Restriction: no


    Citations

    Cited by:

    1. Gero Szepannek, 2022. "An Overview on the Landscape of R Packages for Open Source Scorecard Modelling," Risks, MDPI, vol. 10(3), pages 1-33, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Strobl, Carolin & Boulesteix, Anne-Laure & Augustin, Thomas, 2007. "Unbiased split selection for classification trees based on the Gini Index," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 483-501, September.
    2. Hache, Emmanuel & Leboullenger, Déborah & Mignon, Valérie, 2017. "Beyond average energy consumption in the French residential housing market: A household classification approach," Energy Policy, Elsevier, vol. 107(C), pages 82-95.
    3. Ghosh, Atish R. & Qureshi, Mahvash S. & Kim, Jun Il & Zalduendo, Juan, 2014. "Surges," Journal of International Economics, Elsevier, vol. 92(2), pages 266-285.
      • Mahvash S Qureshi & Atish R. Ghosh & Juan Zalduendo & Jun I Kim, 2012. "Surges," IMF Working Papers 2012/022, International Monetary Fund.
    4. Tomàs Aluja-Banet & Eduard Nafria, 2003. "Stability and scalability in decision trees," Computational Statistics, Springer, vol. 18(3), pages 505-520, September.
    5. I. Albarrán & P. Alonso-González & J. M. Marin, 2017. "Some criticism to a general model in Solvency II: an explanation from a clustering point of view," Empirical Economics, Springer, vol. 52(4), pages 1289-1308, June.
    6. E. Keith Smith & Adam Mayer, 2019. "Anomalous Anglophones? Contours of free market ideology, political polarization, and climate change attitudes in English-speaking countries, Western European and post-Communist states," Climatic Change, Springer, vol. 152(1), pages 17-34, January.
    7. Schwartz, Ira M. & York, Peter & Nowakowski-Sims, Eva & Ramos-Hernandez, Ana, 2017. "Predictive and prescriptive analytics, machine learning and child welfare risk assessment: The Broward County experience," Children and Youth Services Review, Elsevier, vol. 81(C), pages 309-320.
    8. Yousaf Muhammad & Dey Sandeep Kumar, 2022. "Best proxy to determine firm performance using financial ratios: A CHAID approach," Review of Economic Perspectives, Sciendo, vol. 22(3), pages 219-239, September.
    9. Ralf Elsner & Manfred Krafft & Arnd Huchzermeier, 2003. "Optimizing Rhenania's Mail-Order Business Through Dynamic Multilevel Modeling (DMLM)," Interfaces, INFORMS, vol. 33(1), pages 50-66, February.
    10. Faicel Chamroukhi, 2016. "Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 374-411, October.
    11. Serrano-Cinca, Carlos & Gutiérrez-Nieto, Begoña & Bernate-Valbuena, Martha, 2019. "The use of accounting anomalies indicators to predict business failure," European Management Journal, Elsevier, vol. 37(3), pages 353-375.
    12. Adam William Chalmers & Lisa Maria Dellmuth, 2015. "Fiscal redistribution and public support for European integration," European Union Politics, vol. 16(3), pages 386-407, September.
    13. Inoue, Hitoshi & Nakashima, Kiyotaka & Takahashi, Koji, 2016. "Comment on Peek and Rosengren (2005) “Unnatural Selection: Perverse Incentives and the Allocation of Credit in Japan”," MPRA Paper 72726, University Library of Munich, Germany.
    14. Osman Taylan & Abdulaziz S. Alkabaa & Mustafa Tahsin Yılmaz, 2022. "Impact of COVID-19 on G20 countries: analysis of economic recession using data mining approaches," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-30, December.
    15. Archana R. Panhalkar & Dharmpal D. Doye, 2020. "An approach of improving decision tree classifier using condensed informative data," DECISION: Official Journal of the Indian Institute of Management Calcutta, Springer;Indian Institute of Management Calcutta, vol. 47(4), pages 431-445, December.
    16. Hessami, Zohal & Resnjanskij, Sven, 2019. "Complex ballot propositions, individual voting behavior, and status quo bias," European Journal of Political Economy, Elsevier, vol. 58(C), pages 82-101.
    17. Bas Donkers & Richard Paap & Jedid‐Jah Jonker & Philip Hans Franses, 2006. "Deriving target selection rules from endogenously selected samples," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 21(5), pages 549-562, July.
    18. Vicente-Cera, Isaías & Acevedo-Merino, Asunción & Nebot, Enrique & López-Ramírez, Juan Antonio, 2020. "Analyzing cruise ship itineraries patterns and vessels diversity in ports of the European maritime region: A hierarchical clustering approach," Journal of Transport Geography, Elsevier, vol. 85(C).
    19. Edward Kozłowski & Anna Borucka & Andrzej Świderski & Przemysław Skoczyński, 2021. "Classification Trees in the Assessment of the Road–Railway Accidents Mortality," Energies, MDPI, vol. 14(12), pages 1-15, June.
    20. Javad Hassannataj Joloudari & Edris Hassannataj Joloudari & Hamid Saadatfar & Mohammad Ghasemigol & Seyyed Mohammad Razavi & Amir Mosavi & Narjes Nabipour & Shahaboddin Shamshirband & Laszlo Nadai, 2020. "Coronary Artery Disease Diagnosis; Ranking the Significant Features Using a Random Trees Model," IJERPH, MDPI, vol. 17(3), pages 1-24, January.
