
Pólya‐gamma data augmentation and latent variable models for multivariate binomial data

Author

Listed:
  • John B. Holmes
  • Matthew R. Schofield
  • Richard J. Barker

Abstract

The New Zealand Police have long been suspected of systematic bias against the indigenous Māori. One resource available to investigate this possibility is the annual counts of police apprehensions and prosecutions, by offence type. However, model specification and fitting are complicated because these data are constrained counts, interdependent and multivariate. For example, there are limited options for factor models beyond continuous or binary data. This is a serious limitation because, in our dataset, although measurements are clustered, different individuals are measured for each variable. Focusing on principal component/factor analysis representations, we show that, under the canonical logit link, latent variable models can be fitted via Gibbs sampling to multivariate binomial data of arbitrary trial size by applying Pólya‐gamma augmentation to the binomial likelihood. We demonstrate that this modelling approach, by incorporating shrinkage, will produce a fit with lower mean square error than techniques based on deviance minimization commonly employed for binary datasets. By exploring theoretical properties of the proposed models, we demonstrate that a larger range of latent structures can be estimated and that the presence of hidden replication improves prediction when data are multivariate binomial, which gives us greater flexibility for investigating associations between ethnicity and prosecution probability.
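
For readers unfamiliar with the augmentation step mentioned in the abstract, the following is a minimal sketch of the general Pólya‐gamma identity introduced by Polson, Scott and Windle (2013), applied to a single binomial count. The notation is an assumption made for illustration and is not taken from the article: y is a count out of n trials, ψ is the linear predictor on the logit scale (in a factor model it would be a function of loadings and latent scores), ω is the auxiliary Pólya‐gamma variable, and p(ψ) stands for whatever prior is placed on ψ.

```latex
\begin{align*}
y \mid \psi &\sim \mathrm{Binomial}\!\left(n,\ \frac{e^{\psi}}{1+e^{\psi}}\right), \\
p(y \mid \psi) &\propto \frac{(e^{\psi})^{y}}{(1+e^{\psi})^{n}}
  = 2^{-n}\, e^{\kappa\psi} \int_{0}^{\infty} e^{-\omega\psi^{2}/2}\, p(\omega)\, d\omega,
  \qquad \kappa = y - \tfrac{n}{2},\quad \omega \sim \mathrm{PG}(n, 0), \\
\omega \mid \psi &\sim \mathrm{PG}(n, \psi), \\
p(\psi \mid \omega, y) &\propto p(\psi)\,
  \exp\!\left\{ -\tfrac{\omega}{2}\left(\psi - \frac{\kappa}{\omega}\right)^{2} \right\}.
\end{align*}
```

Conditional on ω, the binomial likelihood has a Gaussian form in ψ, so Gaussian priors on the latent quantities yield Gaussian full conditionals. This is what makes a Gibbs sampler available for any trial size n, which enters only through the first parameter of the Pólya‐gamma draw.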

Suggested Citation

  • John B. Holmes & Matthew R. Schofield & Richard J. Barker, 2022. "Pólya‐gamma data augmentation and latent variable models for multivariate binomial data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(1), pages 194-218, January.
  • Handle: RePEc:bla:jorssc:v:71:y:2022:i:1:p:194-218
    DOI: 10.1111/rssc.12528

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssc.12528
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Landgraf, Andrew J. & Lee, Yoonkyung, 2020. "Dimensionality reduction for binary data through the projection of natural parameters," Journal of Multivariate Analysis, Elsevier, vol. 180(C).
    2. Veronika Ročková & Edward I. George, 2016. "Fast Bayesian Factor Analysis via Automatic Rotations to Sparsity," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1608-1622, October.
    3. Aman Agrawal & Alec M Chiu & Minh Le & Eran Halperin & Sriram Sankararaman, 2020. "Scalable probabilistic PCA for large-scale genetic variation data," PLOS Genetics, Public Library of Science, vol. 16(5), pages 1-19, May.
    4. Ji Seung Yang & Li Cai, 2014. "Estimation of Contextual Effects Through Nonlinear Multilevel Latent Variable Modeling With a Metropolis–Hastings Robbins–Monro Algorithm," Journal of Educational and Behavioral Statistics, vol. 39(6), pages 550-582, December.
    5. Buddhavarapu, Prasad & Bansal, Prateek & Prozzi, Jorge A., 2021. "A new spatial count data model with time-varying parameters," Transportation Research Part B: Methodological, Elsevier, vol. 150(C), pages 566-586.
    6. Niko Hauzenberger & Florian Huber, 2020. "Model instability in predictive exchange rate regressions," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(2), pages 168-186, March.
    7. Wang, Zihan & Daeipour, Mohamad & Xu, Hongyi, 2023. "Quantification and propagation of Aleatoric uncertainties in topological structures," Reliability Engineering and System Safety, Elsevier, vol. 233(C).
    8. Rubén Loaiza-Maya & Didier Nibbering, 2022. "Fast variational Bayes methods for multinomial probit models," Papers 2202.12495, arXiv.org, revised Oct 2022.
    9. Gyaneshwer Chaubey & Anurag Kadian & Saroj Bala & Vadlamudi Raghavendra Rao, 2015. "Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-11, June.
    10. Anindya Bhadra & Arvind Rao & Veerabhadran Baladandayuthapani, 2018. "Inferring network structure in non‐normal and mixed discrete‐continuous genomic data," Biometrics, The International Biometric Society, vol. 74(1), pages 185-195, March.
    11. Daniel Svensson & Matilda Rentoft & Anna M Dahlin & Emma Lundholm & Pall I Olason & Andreas Sjödin & Carin Nylander & Beatrice S Melin & Johan Trygg & Erik Johansson, 2020. "A whole-genome sequenced control population in northern Sweden reveals subregional genetic differences," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-18, September.
    12. Haoying Wang & Guohui Wu, 2022. "Modeling discrete choices with large fine-scale spatial data: opportunities and challenges," Journal of Geographical Systems, Springer, vol. 24(3), pages 325-351, July.
    13. Alberto Maydeu-Olivares & Rosa Montaño, 2013. "How Should We Assess the Fit of Rasch-Type Models? Approximating the Power of Goodness-of-Fit Statistics in Categorical Data Analysis," Psychometrika, Springer;The Psychometric Society, vol. 78(1), pages 116-133, January.
    14. Matteo Barigozzi & Matteo Luciani, 2019. "Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm," Papers 1910.03821, arXiv.org, revised Sep 2024.
    15. Sylvia Fruhwirth-Schnatter, 2023. "Generalized Cumulative Shrinkage Process Priors with Applications to Sparse Bayesian Factor Analysis," Papers 2303.00473, arXiv.org.
    16. Xin Xu & Yang Lu & Yupeng Zhou & Zhiguo Fu & Yanjie Fu & Minghao Yin, 2021. "An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks," Mathematics, MDPI, vol. 9(15), pages 1-14, July.
    17. Wang, Fa, 2017. "Maximum likelihood estimation and inference for high dimensional nonlinear factor models with application to factor-augmented regressions," MPRA Paper 93484, University Library of Munich, Germany, revised 19 May 2019.
    18. Estavoyer, Maxime & François, Olivier, 2022. "Theoretical analysis of principal components in an umbrella model of intraspecific evolution," Theoretical Population Biology, Elsevier, vol. 148(C), pages 11-21.
    19. Dorota Toczydlowska & Gareth W. Peters & Man Chung Fung & Pavel V. Shevchenko, 2017. "Stochastic Period and Cohort Effect State-Space Mortality Models Incorporating Demographic Factors via Probabilistic Robust Principal Components," Risks, MDPI, vol. 5(3), pages 1-77, July.
    20. Felsenstein, Joseph, 2015. "Covariation of gene frequencies in a stepping-stone lattice of populations," Theoretical Population Biology, Elsevier, vol. 100(C), pages 88-97.
