
Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms

Author

Listed:
  • Siong Thye Goh

    (Lee Kong Chian School of Business, Singapore Management University, Singapore 178899)

  • Lesia Semenova

    (Department of Computer Science, Duke University, Durham, North Carolina 27708)

  • Cynthia Rudin

    (Department of Computer Science, Duke University, Durham, North Carolina 27708)

Abstract

We present sparse tree-based and list-based density estimation methods for binary/categorical data. Our density estimation models are higher-dimensional analogues of variable bin-width histograms. In each leaf of the tree (or list), the density is constant, similar to the flat density within the bin of a histogram. Histograms, however, cannot easily be visualized in more than two dimensions, whereas our models can. The accuracy of histograms fades as dimensions increase, whereas our models have priors that help with generalization. Our models are sparse, unlike high-dimensional fixed-bin histograms. We present three generative modeling methods. The first allows the user to specify the preferred number of leaves in the tree within a Bayesian prior. The second allows the user to specify the preferred number of branches within the prior. The third returns density lists (rather than trees) and allows the user to specify the preferred number of rules and the length of rules within the prior. The new approaches often yield a better balance between sparsity and accuracy of density estimates than other methods for this task. We present an application to crime analysis, where we estimate how unusual each type of modus operandi is for a house break-in.
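
The idea of a constant density within each leaf can be illustrated with a minimal sketch. This is not the authors' method: it omits the Bayesian priors and the search over tree or list structures entirely, and simply assumes a fixed, hand-written partition of a binary feature space. The function names (`leaf_densities`, `density_at`), the leaf encoding, and the toy data are all hypothetical. Each leaf's density is taken to be the fraction of observations falling in the leaf divided by the number of feature combinations the leaf covers, which plays the role of a histogram bin's width.

```python
from itertools import product


def leaf_densities(data, leaves, n_features):
    """Constant density per leaf over the binary space {0,1}^n_features.

    Each leaf is a dict {feature_index: required_value}; features that a
    leaf does not mention are unconstrained. The leaves are assumed
    (not checked) to partition the whole space.
    """
    n = len(data)
    densities = []
    for conditions in leaves:
        # Number of observations that satisfy every condition of this leaf.
        count = sum(all(x[j] == v for j, v in conditions.items()) for x in data)
        # "Volume" of the leaf: how many distinct binary vectors it covers.
        volume = 2 ** (n_features - len(conditions))
        densities.append(count / (n * volume))
    return densities


def density_at(x, leaves, densities):
    """Look up the density of the leaf that contains point x."""
    for conditions, d in zip(leaves, densities):
        if all(x[j] == v for j, v in conditions.items()):
            return d
    raise ValueError("point is not covered by any leaf")


# Hypothetical toy data: 8 observations with 3 binary features.
data = [
    (1, 0, 0), (1, 1, 0), (1, 0, 1), (0, 0, 0),
    (0, 1, 1), (1, 1, 1), (1, 0, 0), (0, 0, 1),
]

# A hand-written 3-leaf partition: split on feature 0, then on feature 1.
leaves = [{0: 1}, {0: 0, 1: 0}, {0: 0, 1: 1}]

dens = leaf_densities(data, leaves, n_features=3)

# Summing the density over all 2^3 points should give 1.
total = sum(density_at(x, leaves, dens) for x in product((0, 1), repeat=3))
```

Here three leaves describe a space of eight cells, whereas a fixed-bin histogram would need one bin per cell; this is the kind of sparsity the abstract refers to, although the sketch does nothing to choose the partition.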

Suggested Citation

  • Siong Thye Goh & Lesia Semenova & Cynthia Rudin, 2024. "Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms," INFORMS Journal on Data Science, INFORMS, vol. 3(1), pages 28-48, April.
  • Handle: RePEc:inm:orijds:v:3:y:2024:i:1:p:28-48
    DOI: 10.1287/ijds.2021.0001

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijds.2021.0001
    Download Restriction: no



