IDEAS home Printed from https://ideas.repec.org/a/bla/jorssc/v71y2022i1p194-218.html

Pólya‐gamma data augmentation and latent variable models for multivariate binomial data

Author

Listed:
  • John B. Holmes
  • Matthew R. Schofield
  • Richard J. Barker

Abstract

New Zealand police have long been suspected of systematic bias against the indigenous Māori. One resource available for investigating this possibility is the annual counts of police apprehensions and prosecutions by offence type. However, model specification and fitting are complicated because these data are constrained counts that are interdependent and multivariate. For example, there are limited options for factor models beyond continuous or binary data. This is a serious limitation because, in our dataset, measurements are clustered but different individuals are measured for each variable. Focusing on principal component and factor analysis representations, we show that, under the canonical logit link, latent variable models can be fitted via Gibbs sampling to multivariate binomial data of arbitrary trial size by applying Pólya‐gamma augmentation to the binomial likelihood. We demonstrate that this modelling approach, by incorporating shrinkage, produces a fit with lower mean square error than the deviance-minimization techniques commonly employed for binary datasets. By exploring theoretical properties of the proposed models, we demonstrate that a larger range of latent structures can be estimated and that the presence of hidden replication improves prediction when data are multivariate binomial, which gives us greater flexibility for investigating associations between ethnicity and prosecution probability.
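
To make the augmentation step in the abstract concrete, the following is a minimal sketch of a Pólya‐gamma Gibbs sampler for the simplest possible case: a single shared logit ψ for several binomial counts, with a normal prior on ψ. It is not the authors' factor model or their code. The function names, the toy counts, the prior standard deviation and the truncation level are all assumptions made for illustration, and the Pólya‐gamma draw uses a truncated version of the infinite sum-of-gammas representation, so it is only approximate.

```python
import math
import random


def sample_pg_truncated(b, c, rng, n_terms=100):
    """Approximate draw from PG(b, c) using the first n_terms of the
    sum-of-gammas series (an approximation; exact samplers exist)."""
    scale = 1.0 / (2.0 * math.pi ** 2)
    shift = (c * c) / (4.0 * math.pi ** 2)
    total = 0.0
    for k in range(1, n_terms + 1):
        # Each term is an independent Gamma(b, 1) draw with a decaying weight.
        total += rng.gammavariate(b, 1.0) / ((k - 0.5) ** 2 + shift)
    return scale * total


def gibbs_binomial_logit(successes, trials, n_iter=1000, burn_in=200,
                         prior_sd=10.0, seed=0):
    """Gibbs sampler for y_i ~ Binomial(n_i, inv_logit(psi)),
    psi ~ Normal(0, prior_sd^2). Hypothetical helper for illustration."""
    rng = random.Random(seed)
    # kappa_i = y_i - n_i / 2 is the linear term after augmentation.
    kappa_sum = sum(y - n / 2.0 for y, n in zip(successes, trials))
    psi = 0.0
    draws = []
    for it in range(n_iter):
        # Step 1: omega_i | psi ~ PG(n_i, psi), one per binomial count.
        omega_sum = sum(sample_pg_truncated(n, psi, rng) for n in trials)
        # Step 2: psi | omega is Gaussian because the augmented
        # likelihood is proportional to exp(kappa * psi - omega * psi^2 / 2).
        post_var = 1.0 / (omega_sum + 1.0 / prior_sd ** 2)
        post_mean = post_var * kappa_sum
        psi = rng.gauss(post_mean, math.sqrt(post_var))
        if it >= burn_in:
            draws.append(psi)
    return draws


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Made-up counts, not the police data described in the abstract.
    y = [12, 30, 7, 45]
    n = [40, 100, 25, 150]
    draws = gibbs_binomial_logit(y, n)
    post_p = sum(inv_logit(d) for d in draws) / len(draws)
    print("posterior mean probability:", round(post_p, 3))
    print("pooled proportion:", round(sum(y) / sum(n), 3))
```

Because the trial sizes enter only through the first Pólya‐gamma parameter, the same two-step update applies to counts of any size. In a latent variable model the Gaussian step would be replaced by conditionally Gaussian updates for loadings and scores, which this sketch does not attempt.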

Suggested Citation

  • John B. Holmes & Matthew R. Schofield & Richard J. Barker, 2022. "Pólya‐gamma data augmentation and latent variable models for multivariate binomial data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(1), pages 194-218, January.
  • Handle: RePEc:bla:jorssc:v:71:y:2022:i:1:p:194-218
    DOI: 10.1111/rssc.12528
