Printed from https://ideas.repec.org/a/gam/jstats/v7y2024i3p61-1050d1479789.html

Copula Approximate Bayesian Computation Using Distribution Random Forests

Author

Listed:
  • George Karabatsos

    (Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, 1040 W. Harrison St. (MC 147), Chicago, IL 60607, USA
    Department of Educational Psychology in Statistics and Measurement, University of Illinois at Chicago, 1040 W. Harrison St. (MC 147), Chicago, IL 60607, USA)

Abstract

Ongoing computational advances continue to make it easier to collect increasingly large and complex datasets, which can often be realistically analyzed only with models defined by intractable likelihood functions. This Stats invited feature article introduces, and provides an extensive simulation study of, a new approximate Bayesian computation (ABC) framework for estimating the posterior distribution and the maximum likelihood estimate (MLE) of the parameters of models defined by intractable likelihoods. The framework unifies and extends previous ABC methods that were proposed separately. This framework, copulaABCdrf, aims to accurately estimate and describe the possibly skewed and high-dimensional posterior distribution by a novel multivariate copula-based meta-t distribution based on univariate marginal posterior distributions that can be accurately estimated by distribution random forests (drf), while performing automatic selection of summary statistics (covariates) based on robustly estimated copula dependence parameters. The copulaABCdrf framework also provides a novel multivariate mode estimator for MLE and posterior mode estimation, and an optional step for model selection from a given set of models using posterior probabilities estimated by drf. The posterior estimation accuracy of the framework is illustrated and compared with previous standard ABC methods through several simulation studies. These studies involve low- and high-dimensional models with computable posterior distributions, which are either unimodal, skewed, or multimodal, as well as exponential random graph and mechanistic network models, each defined by an intractable likelihood from which it is costly to simulate large network datasets.
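For orientation, the general textbook form of a meta-t density is sketched below. This is not taken from the article: the symbols F_j and f_j (marginal posterior distribution and density functions), R (copula correlation matrix), and ν (degrees of freedom) are assumed notation, and the article may parameterize the distribution differently.

```latex
% Joint density = t-copula density evaluated at the marginal CDFs,
% times the product of the marginal densities.
\[
  f(\theta_1,\dots,\theta_d)
  = c_{\nu,R}\bigl(F_1(\theta_1),\dots,F_d(\theta_d)\bigr)
    \prod_{j=1}^{d} f_j(\theta_j),
\]
% t-copula density: t_{\nu,R} is the d-variate Student-t density with
% correlation matrix R; t_\nu and T_\nu are the univariate Student-t
% density and distribution function, each with \nu degrees of freedom.
\[
  c_{\nu,R}(u_1,\dots,u_d)
  = \frac{t_{\nu,R}\bigl(T_\nu^{-1}(u_1),\dots,T_\nu^{-1}(u_d)\bigr)}
         {\prod_{j=1}^{d} t_\nu\bigl(T_\nu^{-1}(u_j)\bigr)}.
\]
```

In this form the marginals can be estimated separately from the dependence structure, which is consistent with the abstract's description of drf-estimated marginals combined through a copula.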
This paper also proposes and studies a new solution to the simulation cost problem in ABC: estimating the posterior distribution of the parameters from datasets simulated from the given model that are smaller than the potentially large dataset being analyzed. This proposal is motivated by the fact that, for many models defined by intractable likelihoods, such as network models applied to massive networks, the repeated simulation of large datasets (networks) for posterior-based parameter estimation can be computationally prohibitive, greatly slowing or preventing the use of standard ABC methods. The copulaABCdrf framework and standard ABC methods are further illustrated through analyses of large real-life networks ranging in size from 28,000 to 65.6 million nodes (from 3 million to 1.8 billion edges), including a large multilayer network with weighted directed edges. The results of the simulation studies show that, in settings where the true posterior distribution is not highly multimodal, copulaABCdrf usually produced point estimates from the posterior distribution for low-dimensional parametric models that were similar to those of previous ABC methods. The copula-based method can, however, produce more accurate estimates from the posterior distribution for high-dimensional models, and in both the low- and high-dimensional cases it usually produced more accurate estimates of the univariate marginal posterior distributions of the parameters. Posterior estimation accuracy also usually improved when the important summary statistics were pre-selected using drf, compared with ABC that used no pre-selection of summaries. For all ABC methods studied, accurate estimation of a highly multimodal posterior distribution was challenging. In light of the results of all the simulation studies, this article concludes by discussing how the copulaABCdrf framework can be improved in future research.
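As a point of reference for the "standard ABC methods" mentioned above, the following is a minimal sketch of plain rejection ABC on a toy model. It is not the copulaABCdrf algorithm and is not taken from the article. The function `rejection_abc`, the toy normal-mean model, the uniform prior, and all parameter values are hypothetical choices made for illustration. The simulated datasets are deliberately smaller (20 points) than the observed one (200 points), loosely echoing the reduced-size simulation idea, and this works here only because the chosen summaries (mean and median) are on the same scale for both sizes.

```python
import numpy as np

def rejection_abc(observed, simulate, summarize, prior_sample,
                  n_sims=5000, keep_frac=0.01, seed=0):
    """Plain rejection ABC: keep the prior draws whose simulated
    summaries fall closest to the observed summaries."""
    rng = np.random.default_rng(seed)
    s_obs = summarize(observed)
    # Draw parameters from the prior and simulate one dataset per draw.
    thetas = np.array([prior_sample(rng) for _ in range(n_sims)])
    sims = np.array([summarize(simulate(t, rng)) for t in thetas])
    # Standardize each summary so no single one dominates the distance.
    scale = sims.std(axis=0)
    scale[scale == 0] = 1.0
    dist = np.linalg.norm((sims - s_obs) / scale, axis=1)
    # Accept the closest keep_frac fraction of the draws.
    n_keep = max(1, int(keep_frac * n_sims))
    return thetas[np.argsort(dist)[:n_keep]]

# Hypothetical toy model: Normal(theta, 1) data with unknown mean theta.
N_OBS, N_SIM = 200, 20  # simulated datasets are smaller than the observed one

def simulate(theta, rng):
    return rng.normal(theta, 1.0, size=N_SIM)

def summarize(x):
    # Mean and median are comparable across dataset sizes.
    return np.array([x.mean(), np.median(x)])

def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)

observed = np.random.default_rng(1).normal(2.0, 1.0, size=N_OBS)
posterior = rejection_abc(observed, simulate, summarize, prior_sample)
print("accepted draws:", posterior.shape[0])
print("approximate posterior mean:", posterior.mean())
```

Because the simulated datasets are smaller than the observed one, the accepted draws are more dispersed than a posterior based on full-size simulations would be, so any real use of reduced-size simulation needs a justification of the kind the article studies.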

Suggested Citation

  • George Karabatsos, 2024. "Copula Approximate Bayesian Computation Using Distribution Random Forests," Stats, MDPI, vol. 7(3), pages 1-49, September.
  • Handle: RePEc:gam:jstats:v:7:y:2024:i:3:p:61-1050:d:1479789

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/7/3/61/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/7/3/61/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. L. L. Henn, 2022. "Limitations and performance of three approaches to Bayesian inference for Gaussian copula regression models of discrete data," Computational Statistics, Springer, vol. 37(2), pages 909-946, April.
    2. Smith, Michael Stanley, 2023. "Implicit Copulas: An Overview," Econometrics and Statistics, Elsevier, vol. 28(C), pages 81-104.
    3. Michael Stanley Smith, 2021. "Implicit Copulas: An Overview," Papers 2109.04718, arXiv.org.
    4. Jong-Min Kim & Hyunsu Ju & Yoonsung Jung, 2020. "Copula Approach for Developing a Biomarker Panel for Prediction of Dengue Hemorrhagic Fever," Annals of Data Science, Springer, vol. 7(4), pages 697-712, December.
    5. Aristidis Nikoloulopoulos & Dimitris Karlis, 2010. "Regression in a copula model for bivariate count data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 37(9), pages 1555-1568.
    6. Fokianos, Konstantinos, 2024. "Multivariate Count Time Series Modelling," Econometrics and Statistics, Elsevier, vol. 31(C), pages 100-116.
    7. Nadja Klein & Michael Stanley Smith & David J. Nott, 2020. "Deep Distributional Time Series Models and the Probabilistic Forecasting of Intraday Electricity Prices," Papers 2010.01844, arXiv.org, revised May 2021.
    8. Shi, Peng & Valdez, Emiliano A., 2014. "Multivariate negative binomial models for insurance claim counts," Insurance: Mathematics and Economics, Elsevier, vol. 55(C), pages 18-29.
    9. Craiu, V. Radu & Sabeti, Avideh, 2012. "In mixed company: Bayesian inference for bivariate conditional copula models with discrete and continuous outcomes," Journal of Multivariate Analysis, Elsevier, vol. 110(C), pages 106-120.
    10. Samrachana Adhikari & Beau Dabbs, 2018. "Social Network Analysis in R: A Software Review," Journal of Educational and Behavioral Statistics, vol. 43(2), pages 225-253, April.
    11. Lu Yang & Claudia Czado, 2022. "Two‐part D‐vine copula models for longitudinal insurance claim data," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(4), pages 1534-1561, December.
    12. Azam, Kazim, 2014. "Effects of Marginal Specifcations on Copula Estimation," Economic Research Papers 270230, University of Warwick - Department of Economics.
    13. Xiang, Pengcheng & Zhou, Ling & Tang, Lu, 2024. "Transfer learning via random forests: A one-shot federated approach," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
    14. Li, J. & Nott, D.J. & Fan, Y. & Sisson, S.A., 2017. "Extending approximate Bayesian computation methods to high dimensions via a Gaussian copula model," Computational Statistics & Data Analysis, Elsevier, vol. 106(C), pages 77-89.
    15. Azam, Kazim & Pitt, Michael, 2014. "Bayesian Inference for a Semi-Parametric Copula-based Markov Chain," The Warwick Economics Research Paper Series (TWERPS) 1051, University of Warwick, Department of Economics.
    16. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    17. Gaonkar, Shweta & Mele, Angelo, 2023. "A model of inter-organizational network formation," Journal of Economic Behavior & Organization, Elsevier, vol. 214(C), pages 82-104.
    18. Geenens Gery, 2020. "Copula modeling for discrete random vectors," Dependence Modeling, De Gruyter, vol. 8(1), pages 417-440, January.
    19. Shirong Zhao & Jeremy Losak, 2024. "Two-tiered stochastic frontier models: a Bayesian perspective," Journal of Productivity Analysis, Springer, vol. 61(2), pages 85-106, April.
    20. Fokianos, Konstantinos & Fried, Roland & Kharin, Yuriy & Voloshko, Valeriy, 2022. "Statistical analysis of multivariate discrete-valued time series," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
