
Ordered Correlation Forest

Author

  • Riccardo Di Francesco

Abstract

Empirical studies in various social sciences often involve categorical outcomes with an inherent ordering, such as self-evaluations of subjective well-being and self-assessments of health. While ordered choice models, such as the ordered logit and ordered probit, are popular tools for analyzing these outcomes, they may impose restrictive parametric and distributional assumptions. This paper introduces a novel estimator, the ordered correlation forest, that naturally handles non-linearities in the data and does not assume a specific distribution for the error term. The proposed estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class. Under an "honesty" condition, predictions are consistent and asymptotically normal. The weights induced by each forest are used to obtain standard errors for the predicted probabilities and for the covariates' marginal effects. Evidence from synthetic data shows that the proposed estimator achieves better prediction performance than alternative forest-based estimators and that it can construct valid confidence intervals for the covariates' marginal effects.
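The abstract describes predictions and standard errors obtained from the weights each forest induces. As a minimal sketch, assuming the weights alpha_i(x) that a class-m forest assigns to the honest-sample observations are already available (neither the forests nor the paper's modified splitting criterion are reproduced here), the predicted probability of class m can be written as the weighted average of the indicators 1{Y_i = m}. The plug-in standard error that treats the weights as fixed, the central finite-difference marginal effect with a user-chosen `step`, and all function names below are illustrative assumptions, not the author's implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def weighted_class_probability(
    weights: Sequence[float], outcomes: Sequence[int], m: int
) -> Tuple[float, float]:
    """Estimate P(Y = m | X = x) and a plug-in standard error.

    `weights` are hypothetical forest weights alpha_i(x) over the honest
    sample; they are rescaled to sum to one. The estimate is the weighted
    average of the indicators 1{Y_i = m}. The standard error
    sqrt(sum_i alpha_i^2 * (1{Y_i = m} - p)^2) treats the weights as fixed,
    which is an illustrative assumption.
    """
    if len(weights) != len(outcomes):
        raise ValueError("weights and outcomes must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    alphas = [w / total for w in weights]
    indicators = [1.0 if y == m else 0.0 for y in outcomes]
    p = sum(a * d for a, d in zip(alphas, indicators))
    variance = sum(a * a * (d - p) ** 2 for a, d in zip(alphas, indicators))
    return p, math.sqrt(variance)


def predict_class_probabilities(
    weights_by_class: Dict[int, Sequence[float]], outcomes: Sequence[int]
) -> Dict[int, Tuple[float, float]]:
    """Use one (hypothetical) weight vector per class, mimicking one forest per class."""
    return {
        m: weighted_class_probability(w, outcomes, m)
        for m, w in sorted(weights_by_class.items())
    }


def finite_difference_effect(
    predict: Callable[[List[float]], float], x: List[float], j: int, step: float
) -> float:
    """Central finite difference of predict(x) with respect to covariate j."""
    if step <= 0:
        raise ValueError("step must be positive")
    up = list(x)
    down = list(x)
    up[j] += step
    down[j] -= step
    return (predict(up) - predict(down)) / (2.0 * step)


if __name__ == "__main__":
    outcomes = [1, 2, 2, 3]
    equal = [0.25, 0.25, 0.25, 0.25]
    results = predict_class_probabilities({1: equal, 2: equal, 3: equal}, outcomes)
    for m, (p, se) in results.items():
        print(f"P(Y={m}|x) = {p:.3f} (s.e. {se:.3f})")
```

For example, with four honest observations whose outcomes are 1, 2, 2 and 3 and equal weights of 0.25, class 2 receives an estimated probability of 0.5 with a plug-in standard error of 0.25. Keeping a separate weight vector per class mirrors the abstract's "one forest per class" structure; how the weights are actually produced is left to the forest.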

Suggested Citation

  • Riccardo Di Francesco, 2023. "Ordered Correlation Forest," Papers 2309.08755, arXiv.org.
  • Handle: RePEc:arx:papers:2309.08755

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2309.08755
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Franco Peracchi & Claudio Rossetti, 2013. "The heterogeneous thresholds ordered response model: identification and inference," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 176(3), pages 703-722, June.
    2. Alexandre Belloni & Victor Chernozhukov, 2011. "High Dimensional Sparse Econometric Models: An Introduction," Papers 1106.5242, arXiv.org, revised Sep 2011.
    3. Bruno S. Frey & Alois Stutzer, 2002. "What Can Economists Learn from Happiness Research?," Journal of Economic Literature, American Economic Association, vol. 40(2), pages 402-435, June.
    4. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    5. Franco Peracchi & Claudio Rossetti, 2012. "Heterogeneity in health responses and anchoring vignettes," Empirical Economics, Springer, vol. 42(2), pages 513-538, April.
    6. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2011. "Inference for High-Dimensional Sparse Econometric Models," Papers 1201.0220, arXiv.org.
    7. Silke Janitza & Gerhard Tutz & Anne-Laure Boulesteix, 2016. "Random forest for ordinal responses: Prediction and variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 96(C), pages 57-73.
    8. Michael Lechner & Gabriel Okasa, 2019. "Random Forest Estimation of the Ordered Choice Model," Economics Working Paper Series 1908, University of St. Gallen, School of Economics and Political Science.
    9. Marvin N. Wright & Andreas Ziegler, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marco Bertoni, 2015. "Hungry today, unhappy tomorrow? Childhood hunger and subjective wellbeing later in life," Journal of Health Economics, Elsevier, vol. 40(C), pages 40-53.
    2. Michael Lechner & Gabriel Okasa, 2019. "Random Forest Estimation of the Ordered Choice Model," Economics Working Paper Series 1908, University of St. Gallen, School of Economics and Political Science.
    3. Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
    4. William H. Greene & Mark N. Harris & Rachel J. Knott & Nigel Rice, 2021. "Specification and testing of hierarchical ordered response models with anchoring vignettes," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 31-64, January.
    5. Björn Bokelmann & Stefan Lessmann, 2024. "Improving uplift model evaluation on randomized controlled trial data," European Journal of Operational Research, Elsevier, vol. 313(2), pages 691-707.
    6. Ning Xu & Jian Hong & Timothy C. G. Fisher, 2016. "Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso," Papers 1606.00142, arXiv.org.
    7. Domenico Giannone & Michele Lenza & Giorgio E. Primiceri, 2021. "Economic Predictions With Big Data: The Illusion of Sparsity," Econometrica, Econometric Society, vol. 89(5), pages 2409-2437, September.
    8. Philipp Bach & Victor Chernozhukov & Malte S. Kurz & Martin Spindler & Sven Klaassen, 2021. "DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R," Papers 2103.09603, arXiv.org, revised Jun 2024.
    9. Damian Kozbur, 2013. "Inference in additively separable models with a high-dimensional set of conditioning variables," ECON - Working Papers 284, Department of Economics - University of Zurich, revised Apr 2018.
    10. Aureo de Paula & Imran Rasul & Pedro Souza, 2018. "Identifying Network Ties from Panel Data: Theory and an Application to Tax Competition," CEPR Discussion Papers 12792, C.E.P.R. Discussion Papers.
    11. Roman Hornung, 2020. "Ordinal Forests," Journal of Classification, Springer;The Classification Society, vol. 37(1), pages 4-17, April.
    12. Alexandre Belloni & Victor Chernozhukov & Lie Wang, 2013. "Pivotal estimation via square-root lasso in nonparametric regression," CeMMAP working papers CWP62/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    13. Martin Huber & Jonas Meier & Hannes Wallimann, 2022. "Business analytics meets artificial intelligence: Assessing the demand effects of discounts on Swiss train tickets," Transportation Research Part B: Methodological, Elsevier, vol. 163(C), pages 22-39.
    14. Rachel J. Knott & Paula K. Lorgelly & Nicole Black & Bruce Hollingsworth, 2017. "Differential item functioning in quality of life measurement: An analysis using anchoring vignettes," Social Science & Medicine, Elsevier, vol. 190(C), pages 247-255.
    15. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    16. Anders Bredahl Kock, 2016. "Oracle inequalities, variable selection and uniform inference in high-dimensional correlated random effects panel data models," Journal of Econometrics, Elsevier, vol. 195(1), pages 71-85.
    17. Federico A. Bugni & Mehmet Caner & Anders Bredahl Kock & Soumendra Lahiri, 2016. "Inference in partially identified models with many moment inequalities using Lasso," CREATES Research Papers 2016-12, Department of Economics and Business Economics, Aarhus University.
    18. Daniel Felix Ahelegbey & Monica Billio & Roberto Casarin, 2016. "Sparse Graphical Vector Autoregression: A Bayesian Approach," Annals of Economics and Statistics, GENES, issue 123-124, pages 333-361.
    19. André Nunes Maranhão & Nicole Rennó Castro, 2023. "Dissecting Brazilian agriculture business cycles in high-dimensional and time-irregular span contexts," Empirical Economics, Springer, vol. 65(4), pages 1543-1578, October.
    20. Arthur van Soest & Hana Vonkova, 2014. "Testing the specification of parametric models by using anchoring vignettes," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 177(1), pages 115-133, January.

    More about this item

    JEL classification:

    • C14 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Semiparametric and Nonparametric Methods: General
    • C25 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions; Probabilities
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.