IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v114y2017icp130-145.html

A family of block-wise one-factor distributions for modeling high-dimensional binary data

Author

Listed:
  • Marbac, Matthieu
  • Sedki, Mohammed

Abstract

A new family of one-factor distributions for modeling high-dimensional binary data is introduced. The model provides an explicit probability for each event, thus avoiding the numerical approximations often required by existing methods. Model interpretation is easy because each variable is described by two continuous parameters (corresponding to the marginal probability and to the strength of its dependency on the other variables) and by one binary parameter (indicating whether the dependencies are positive or negative). The model is extended by splitting the variables into independent blocks, where each block follows the new one-factor distribution. Finally, a parsimonious version of the model, which imposes some equality constraints between the dependency parameters, is proposed. Parameter estimation is carried out by an inference margin procedure, where the second step is achieved by an expectation–maximization algorithm. Model selection is performed by a deterministic approach, which strongly reduces the number of competing models. This consistent approach uses a hierarchical ascendant classification of the variables to select a narrow subset of models, based on the empirical version of Cramér's V. The new model is evaluated in numerical experiments and on a real data set. The procedure is implemented in the R package MvBinary.
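
The model-selection idea in the abstract (cluster the variables hierarchically using the empirical Cramér's V, then consider only the nested partitions along that hierarchy) can be illustrated with a minimal Python sketch. This is not the MvBinary implementation and does not reproduce the authors' exact procedure: the function names, the dissimilarity 1 - V, and the single-linkage rule are assumptions chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def cramers_v(x, y):
    """Empirical Cramér's V of two 0/1 sequences (for a 2x2 table it equals |phi|)."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:  # a constant variable shows no measurable association
        return 0.0
    return abs(n11 * n00 - n10 * n01) / sqrt(denom)


def nested_partitions(columns, linkage=min):
    """Agglomerative clustering of variables with dissimilarity 1 - V.

    Returns the d nested partitions, from d singleton blocks down to one block.
    `linkage=min` gives single linkage (an assumption, not the paper's choice).
    """
    d = len(columns)
    dist = {
        frozenset((i, j)): 1.0 - cramers_v(columns[i], columns[j])
        for i, j in combinations(range(d), 2)
    }

    def block_dist(a, b):
        return linkage(dist[frozenset((i, j))] for i in a for j in b)

    blocks = [(i,) for i in range(d)]
    path = [list(blocks)]
    while len(blocks) > 1:
        # Merge the two closest blocks and record the resulting partition.
        a, b = min(combinations(blocks, 2), key=lambda pair: block_dist(*pair))
        blocks = [blk for blk in blocks if blk not in (a, b)]
        blocks.append(tuple(sorted(a + b)))
        path.append(sorted(blocks))
    return path


# Toy data: variables 0 and 1 are identical, as are variables 2 and 3,
# while the two pairs are empirically unassociated.
cols = [
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
path = nested_partitions(cols)
for partition in path:
    print(partition)
```

With d variables, the hierarchy yields only d candidate partitions into blocks, rather than every possible partition of the variables, which is the sense in which a deterministic search of this kind narrows the set of competing models.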

Suggested Citation

  • Marbac, Matthieu & Sedki, Mohammed, 2017. "A family of block-wise one-factor distributions for modeling high-dimensional binary data," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 130-145.
  • Handle: RePEc:eee:csdana:v:114:y:2017:i:c:p:130-145
    DOI: 10.1016/j.csda.2017.04.010

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947317300932
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2017.04.010?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


