Feature quantization for parsimonious and interpretable predictive models

Author

Listed:

Adrien Ehrhardt
Christophe Biernacki
Vincent Vandewalle
Philippe Heinrich

Abstract

For regulatory and interpretability reasons, logistic regression is still widely used. To improve prediction accuracy and interpretability, a preprocessing step quantizing both continuous and categorical data is usually performed: continuous features are discretized and, if numerous, levels of categorical features are grouped. Even better predictive accuracy can be reached by embedding this quantization estimation step directly into the predictive estimation step itself. In doing so, however, the predictive loss has to be optimized over a huge set of candidate quantizations. To overcome this difficulty, we introduce a specific two-step optimization strategy: first, the optimization problem is relaxed by approximating discontinuous quantization functions by smooth functions; second, the resulting relaxed optimization problem is solved via a particular neural network. The good performance of this approach, which we call glmdisc, is illustrated on simulated and real data from the UCI library and Crédit Agricole Consumer Finance (a major European historic player in the consumer credit market).
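The relaxation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which smooth functions are used, so the softmax over affine scores below is an assumption made for illustration, as are the names `hard_quantize`, `soft_quantize`, `cutpoints` and `sharpness`. The sketch only shows how a step-function encoding of a continuous feature can be approximated by a differentiable one. It is not the glmdisc implementation.

```python
import math


def hard_quantize(x, cutpoints):
    """One-hot encode x by the interval it falls into (a step function of x).

    `cutpoints` is assumed to be sorted in increasing order.
    """
    level = sum(x > c for c in cutpoints)
    return [1.0 if j == level else 0.0 for j in range(len(cutpoints) + 1)]


def soft_quantize(x, cutpoints, sharpness=1.0):
    """Smooth stand-in for hard_quantize (illustrative assumption).

    Level j gets the affine score j * x - (c_1 + ... + c_j), so two adjacent
    scores tie exactly at the cutpoint that separates their intervals.
    A softmax turns the scores into weights that sum to one.
    """
    scores = []
    intercept = 0.0
    for j in range(len(cutpoints) + 1):
        scores.append(sharpness * (j * x + intercept))
        if j < len(cutpoints):
            intercept -= cutpoints[j]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    cutpoints = [30.0, 50.0]  # hypothetical thresholds, e.g. on an age feature
    print(hard_quantize(42.0, cutpoints))
    print(soft_quantize(42.0, cutpoints, sharpness=0.1))
    print(soft_quantize(42.0, cutpoints, sharpness=50.0))
```

With a small `sharpness` the weights are spread across all levels and vary smoothly with `x`, which is what makes gradient-based fitting possible. As `sharpness` grows, the weights concentrate on the level that `hard_quantize` would select.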

Suggested Citation

Adrien Ehrhardt & Christophe Biernacki & Vincent Vandewalle & Philippe Heinrich, 2019. "Feature quantization for parsimonious and interpretable predictive models," Papers 1903.08920, arXiv.org.

Handle: RePEc:arx:papers:1903.08920

Citations

Cited by:

Gero Szepannek, 2022. "An Overview on the Landscape of R Packages for Open Source Scorecard Modelling," Risks, MDPI, vol. 10(3), pages 1-33, March.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-CMP-2019-03-25 (Computational Economics)
