
Simulating longer vectors of correlated binary random variables via multinomial sampling

Author

Listed:
  • Shults, Justine

Abstract

The ability to simulate correlated binary data is important for sample size calculation and for comparing methods of analyzing clustered and longitudinal data with dichotomous outcomes. One available approach for simulating vectors of length n of dichotomous random variables is to sample them from the multinomial distribution of all possible length n permutations of zeros and ones. However, the multinomial sampling method has only been implemented in a general form (without making initial restrictive assumptions) for vectors of length 2 and 3, because constructing the multinomial distribution is very challenging for longer vectors. This difficulty is overcome by presenting an algorithm for simulating correlated binary data via multinomial sampling that can easily be used to compute the multinomial distribution directly for any value of n. To demonstrate the approach, vectors of length 4 and 8 are simulated to assess power during the planning phase of a study and to evaluate the choice of working correlation structure in an analysis with generalized estimating equations.
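
To make the multinomial sampling idea concrete, the following is a minimal sketch for the simplest case, n = 2, where the joint distribution over the four outcomes (0,0), (0,1), (1,0), (1,1) is fully determined by the two marginal means and the correlation. It is not the algorithm from the article, which targets longer vectors; the function names `bivariate_binary_pmf` and `sample_pairs` and the example values 0.3, 0.5 and 0.2 are assumptions chosen for illustration.

```python
import math
import random


def bivariate_binary_pmf(p1, p2, rho):
    """Joint pmf of (Y1, Y2) in {0,1}^2 with means p1, p2 and correlation rho.

    Illustrative helper (hypothetical name), valid for n = 2 only.
    """
    # P(Y1=1, Y2=1) follows from Cov(Y1, Y2) = rho * sd(Y1) * sd(Y2).
    p11 = p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    pmf = {
        (0, 0): 1 - p1 - p2 + p11,
        (0, 1): p2 - p11,
        (1, 0): p1 - p11,
        (1, 1): p11,
    }
    # Not every correlation is compatible with a given pair of means.
    if min(pmf.values()) < 0:
        raise ValueError("correlation is outside the feasible range for these means")
    return pmf


def sample_pairs(pmf, size, seed=0):
    """Draw `size` outcomes from the multinomial distribution defined by pmf."""
    rng = random.Random(seed)
    outcomes = list(pmf)
    weights = [pmf[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=size)


pmf = bivariate_binary_pmf(0.3, 0.5, 0.2)
pairs = sample_pairs(pmf, 10000)
```

For general n the same sampling step applies once the probabilities of all 2^n outcomes are known; obtaining those probabilities is the difficulty the abstract describes.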

Suggested Citation

  • Shults, Justine, 2017. "Simulating longer vectors of correlated binary random variables via multinomial sampling," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 1-11.
  • Handle: RePEc:eee:csdana:v:114:y:2017:i:c:p:1-11
    DOI: 10.1016/j.csda.2017.04.002

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947317300750
    Download Restriction: Full text for ScienceDirect subscribers only.



    References listed on IDEAS

    1. N. Rao Chaganty & Harry Joe, 2006. "Range of correlation matrices for dependent Bernoulli random variables," Biometrika, Biometrika Trust, vol. 93(1), pages 197-206, March.
    2. Matthew W. Guerra & Justine Shults, 2014. "A Note on the Simulation of Overdispersed Random Variables With Specified Marginal Means and Product Correlations," The American Statistician, Taylor & Francis Journals, vol. 68(2), pages 104-107, May.
    3. Patrick J. Farrell & Katrina Rogers‐Stewart, 2008. "Methods for Generating Longitudinally Correlated Binary Data," International Statistical Review, International Statistical Institute, vol. 76(1), pages 28-38, April.
    4. Bahjat F. Qaqish, 2003. "A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations," Biometrika, Biometrika Trust, vol. 90(2), pages 455-463, June.

    Citations



    Cited by:

    1. Jorge A. Sefair & Oscar Guaje & Andrés L. Medaglia, 2021. "A column-oriented optimization approach for the generation of correlated random vectors," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 43(3), pages 777-808, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Modarres, Reza, 2011. "High-dimensional generation of Bernoulli random vectors," Statistics & Probability Letters, Elsevier, vol. 81(8), pages 1136-1142, August.
    2. Sergei Leonov & Bahjat Qaqish, 2020. "Correlated endpoints: simulation, modeling, and extreme correlations," Statistical Papers, Springer, vol. 61(2), pages 741-766, April.
    3. Berman, Oded & Krass, Dmitry & Menezes, Mozart B.C., 2013. "Location and reliability problems on a line: Impact of objectives and correlated failures on optimal location patterns," Omega, Elsevier, vol. 41(4), pages 766-779.
    4. Matthew W. Guerra & Justine Shults, 2014. "A Note on the Simulation of Overdispersed Random Variables With Specified Marginal Means and Product Correlations," The American Statistician, Taylor & Francis Journals, vol. 68(2), pages 104-107, May.
    5. Fontana, Roberto & Semeraro, Patrizia, 2018. "Representation of multivariate Bernoulli distributions with a given set of specified moments," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 290-303.
    6. Krause, Daniel & Scherer, Matthias & Schwinn, Jonas & Werner, Ralf, 2018. "Membership testing for Bernoulli and tail-dependence matrices," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 240-260.
    7. Oman, Samuel D., 2009. "Easily simulated multivariate binary distributions with given positive and negative correlations," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 999-1005, February.
    8. Jorge A. Sefair & Oscar Guaje & Andrés L. Medaglia, 2021. "A column-oriented optimization approach for the generation of correlated random vectors," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 43(3), pages 777-808, September.
    9. Serge Darolles & Gaëlle Le Fol & Yang Lu & Ran Sun, 2018. "Bivariate integer-autoregressive process with an application to mutual fund flows," Post-Print hal-04590149, HAL.
    10. Tsung-Shan Tsou & Wan-Chen Chen, 2013. "Estimation of intra-cluster correlation coefficient via the failure of Bartlett’s second identity," Computational Statistics, Springer, vol. 28(4), pages 1681-1698, August.
    11. Moysiadis, Theodoros & Fokianos, Konstantinos, 2014. "On binary and categorical time series models with feedback," Journal of Multivariate Analysis, Elsevier, vol. 131(C), pages 209-228.
    12. B. C. Sutradhar, 2008. "On auto-regression type dynamic mixed models for binary panel data," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(2), pages 209-221.
    13. Farrell, Patrick J. & Sutradhar, Brajendra C., 2006. "A non-linear conditional probability model for generating correlated binary data," Statistics & Probability Letters, Elsevier, vol. 76(4), pages 353-361, February.
    14. Lennart Bondesson & Daniel Thorburn, 2008. "A List Sequential Sampling Method Suitable for Real‐Time Sampling," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 35(3), pages 466-483, September.
    15. Bruce J. Swihart & Brian S. Caffo & Ciprian M. Crainiceanu, 2014. "A Unifying Framework for Marginalised Random-Intercept Models of Correlated Binary Outcomes," International Statistical Review, International Statistical Institute, vol. 82(2), pages 275-295, August.
    16. Wang, Bin & Wang, Ruodu & Wang, Yuming, 2019. "Compatible matrices of Spearman’s rank correlation," Statistics & Probability Letters, Elsevier, vol. 151(C), pages 67-72.
    17. Kari R. Hart & Teng Fei & John J. Hanfelt, 2021. "Scalable and robust latent trajectory class analysis using artificial likelihood," Biometrics, The International Biometric Society, vol. 77(3), pages 1118-1128, September.
    18. Hammill, Bradley G. & Preisser, John S., 2006. "A SAS/IML software program for GEE and regression diagnostics," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 1197-1212, November.
    19. Molin Wang & John M. Williamson, 2005. "Generalization of the Mantel–Haenszel Estimating Function for Sparse Clustered Binary Data," Biometrics, The International Biometric Society, vol. 61(4), pages 973-981, December.
    20. Deng, Yihao & Sabo, Roy T. & Chaganty, N. Rao, 2012. "Multivariate probit analysis of binary familial data using stochastic representations," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 656-663.
