Printed from https://ideas.repec.org/a/inm/orijds/v3y2024i1p28-48.html

Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms

Authors

  • Siong Thye Goh

    (Lee Kong Chian School of Business, Singapore Management University, Singapore 178899)

  • Lesia Semenova

    (Department of Computer Science, Duke University, Durham, North Carolina 27708)

  • Cynthia Rudin

    (Department of Computer Science, Duke University, Durham, North Carolina 27708)

Abstract

We present sparse tree-based and list-based density estimation methods for binary/categorical data. Our density estimation models are higher-dimensional analogues of variable-bin-width histograms. In each leaf of the tree (or list), the density is constant, similar to the flat density within the bin of a histogram. Histograms, however, cannot easily be visualized in more than two dimensions, whereas our models can. The accuracy of histograms degrades as the number of dimensions increases, whereas our models have priors that help with generalization. Our models are sparse, unlike high-dimensional fixed-bin histograms. We present three generative modeling methods. The first allows the user to specify the preferred number of leaves in the tree within a Bayesian prior. The second allows the user to specify the preferred number of branches within the prior. The third returns density lists (rather than trees) and allows the user to specify the preferred number of rules and the length of rules within the prior. The new approaches often yield a better balance between sparsity and accuracy of density estimates than other methods for this task. We present an application to crime analysis, where we estimate how unusual each type of modus operandi is for a house break-in.
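The histogram analogy in the abstract can be made concrete with a small sketch: each leaf of a tree over binary features covers a set of binary vectors, and the density is constant within the leaf, equal to the leaf's share of the data divided by the number of vectors it covers (its "bin volume"). This is a minimal illustration only, assuming a hand-specified tree and plain empirical (maximum-likelihood) leaf frequencies. The paper's methods learn the tree or list under a Bayesian prior, which this sketch does not implement, and the names `Node`, `leaf_key`, `fit_leaf_counts`, and `leaf_density` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product
from typing import Optional, Tuple

@dataclass
class Node:
    # Hypothetical tree node: feature is None for a leaf, otherwise the index
    # of the binary feature tested at this node.
    feature: Optional[int] = None
    left: Optional["Node"] = None   # branch taken when x[feature] == 0
    right: Optional["Node"] = None  # branch taken when x[feature] == 1

def leaf_key(node: Node, x: Tuple[int, ...]) -> Tuple[Tuple[int, int], ...]:
    """Identify the leaf reached by x by the (feature, value) pairs on its path."""
    key = []
    while node.feature is not None:
        v = x[node.feature]
        key.append((node.feature, v))
        node = node.right if v == 1 else node.left
    return tuple(key)

def fit_leaf_counts(tree: Node, data) -> Counter:
    """Count how many observations fall into each leaf (empirical frequencies)."""
    return Counter(leaf_key(tree, x) for x in data)

def leaf_density(tree: Node, counts: Counter, n: int, d: int, x) -> float:
    """Constant within-leaf density: (leaf count / n) / (number of vectors in the leaf)."""
    key = leaf_key(tree, x)
    volume = 2 ** (d - len(key))  # features not fixed on the path are free
    return counts.get(key, 0) / (n * volume)

# Toy example with d = 3 binary features: split on feature 0,
# and when feature 0 is 1, split again on feature 2.
tree = Node(feature=0, left=Node(), right=Node(feature=2, left=Node(), right=Node()))
data = [(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 0, 0),
        (1, 1, 1), (1, 0, 1), (1, 1, 1), (0, 0, 1)]
n, d = len(data), 3
counts = fit_leaf_counts(tree, data)

print(leaf_density(tree, counts, n, d, (0, 1, 0)))  # 4 / (8 * 4) = 0.125
print(leaf_density(tree, counts, n, d, (1, 0, 0)))  # 1 / (8 * 2) = 0.0625
print(leaf_density(tree, counts, n, d, (1, 1, 1)))  # 3 / (8 * 2) = 0.1875

# The densities over all 2**d binary vectors sum to 1, as for a histogram.
print(sum(leaf_density(tree, counts, n, d, x) for x in product([0, 1], repeat=d)))
```

In the spirit of the crime-analysis application, a low density value for an observed combination of features would mark it as comparatively unusual; how the paper itself scores unusualness is not reproduced here.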

Suggested Citation

  • Siong Thye Goh & Lesia Semenova & Cynthia Rudin, 2024. "Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms," INFORMS Journal on Data Science, INFORMS, vol. 3(1), pages 28-48, April.
  • Handle: RePEc:inm:orijds:v:3:y:2024:i:1:p:28-48
    DOI: 10.1287/ijds.2021.0001

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijds.2021.0001
    Download Restriction: no

    File URL: https://libkey.io/10.1287/ijds.2021.0001?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Matias D. Cattaneo & Michael Jansson & Xinwei Ma, 2020. "Simple Local Polynomial Density Estimators," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(531), pages 1449-1455, July.
    2. Kaiyuan Wu & Wei Hou & Hongbo Yang, 2018. "Density estimation via the random forest method," Communications in Statistics - Theory and Methods, Taylor & Francis Journals, vol. 47(4), pages 877-889, February.
    3. Luo Lu & Hui Jiang & Wing H. Wong, 2013. "Multivariate Density Estimation by Bayesian Sequential Partitioning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(504), pages 1402-1410, December.
    4. Tao Chen & Julian Morris & Elaine Martin, 2006. "Probability density estimation via an infinite Gaussian mixture model: application to statistical process monitoring," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 55(5), pages 699-715, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francesco Decarolis & Raymond Fisman & Paolo Pinotti & Silvia Vannutelli, 2019. "Rules, Discretion, and Corruption in Procurement: Evidence from Italian Government Contracting," Boston University - Department of Economics - The Institute for Economic Development Working Papers Series dp-344, Boston University - Department of Economics.
    2. Luca Bellodi & Frederic Docquier & Stefano Iandolo & Massimo Morelli & Riccardo Turati, 2024. "Digging Up Trenches: Populism, Selective Mobility, and the Political Polarization of Italian Municipalities," BAFFI CAREFIN Working Papers 24216, BAFFI CAREFIN, Centre for Applied Research on International Markets Banking Finance and Regulation, Universita' Bocconi, Milano, Italy.
    3. Eibich, Peter & Siedler, Thomas, 2020. "Retirement, intergenerational time transfers, and fertility," European Economic Review, Elsevier, vol. 124(C).
    4. Yoichi Arai & Yu‐Chin Hsu & Toru Kitagawa & Ismael Mourifié & Yuanyuan Wan, 2022. "Testing identifying assumptions in fuzzy regression discontinuity designs," Quantitative Economics, Econometric Society, vol. 13(1), pages 1-28, January.
    5. Luis R. Martinez & Jonas Jessen & Guo Xu, 2023. "A Glimpse of Freedom: Allied Occupation and Political Resistance in East Germany," American Economic Journal: Applied Economics, American Economic Association, vol. 15(1), pages 68-106, January.
    6. Bagues, Manuel & Campa, Pamela, 2021. "Can gender quotas in candidate lists empower women? Evidence from a regression discontinuity design," Journal of Public Economics, Elsevier, vol. 194(C).
    7. Aaron Albert & Nathan Wozny, 2024. "The Impact of Academic Probation: Do Intensive Interventions Help?," Journal of Human Resources, University of Wisconsin Press, vol. 59(3), pages 852-878.
    8. Augusto Cerqua & Guido Pellegrini & Ornella Tarola, 2022. "Can regional policies shape migration flows?," Papers in Regional Science, Wiley Blackwell, vol. 101(3), pages 515-536, June.
    9. Kirschenmann, T.H. & Damien, P. & Walker, S.G., 2015. "A note on the e–a histogram," Statistics & Probability Letters, Elsevier, vol. 103(C), pages 105-109.
    10. Federico Cingano & Filippo Palomba & Paolo Pinotti & Enrico Rettore, 2022. "Making Subsidies Work: Rules vs. Discretion," CESifo Working Paper Series 9560, CESifo.
    11. Annika Lindskog & Dick Durevall, 2021. "To educate a woman and to educate a man: Gender‐specific sexual behavior and human immunodeficiency virus responses to an education reform in Botswana," Health Economics, John Wiley & Sons, Ltd., vol. 30(3), pages 642-658, March.
    12. De Benedetto, Marco Alberto & De Paola, Maria & Scoppa, Vincenzo & Smirnova, Janna, 2022. "The long-run effects of college remedial education," Economics Letters, Elsevier, vol. 216(C).
    13. Gonzalez, Robert & Maffioli, Elisa M., 2024. "Is the phone mightier than the virus? Cellphone access and epidemic containment efforts," Journal of Development Economics, Elsevier, vol. 167(C).
    14. Albanese, Andrea & Picchio, Matteo & Ghirelli, Corinna, 2020. "Timed to Say Goodbye: Does Unemployment Benefit Eligibility Affect Worker Layoffs?," Labour Economics, Elsevier, vol. 65(C).
    15. Matteo Bobba & Tim Ederer & Gianmarco León-Ciliotta & Christopher A. Neilson & Marco Nieddu, 2021. "Teacher compensation and structural inequality: Evidence from centralized teacher school choice in Perú," Economics Working Papers 1788, Department of Economics and Business, Universitat Pompeu Fabra.
    16. Canaan, Serena & Mouganie, Pierre & Zhang, Peng, 2022. "The Long-Run Educational Benefits of High-Achieving Classrooms," IZA Discussion Papers 15039, Institute of Labor Economics (IZA).
    17. Johnsen, Julian V. & Willén, Alexander, 2022. "The effect of negative income shocks on pensioners," Labour Economics, Elsevier, vol. 76(C).
    18. Abel Brodeur & Nikolai M. Cook & Anthony Heyes, 2022. "We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell Us about Publication Bias and p-Hacking in Online Experiments," LCERPA Working Papers am0133, Laurier Centre for Economic Research and Policy Analysis.
    19. Chen, Tao & Martin, Elaine & Montague, Gary, 2009. "Robust probabilistic PCA with missing data and contribution analysis for outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3706-3716, August.
    20. Abel Brodeur & Scott Carrell & David Figlio & Lester Lusher, 2023. "Unpacking P-hacking and Publication Bias," American Economic Review, American Economic Association, vol. 113(11), pages 2974-3002, November.
