Printed from https://ideas.repec.org/p/ucm/doicae/1724.html

Automatic regrouping of strata in the chi-square test

Authors

  • Juan Manuel Pérez-Salamero González

    (Department of Financial Economics and Actuarial Science, University of Valencia, Spain)

  • Marta Regúlez-Castillo

    (Department of Applied Economics III, University of the Basque Country (UPV/EHU), Bilbao, Spain)

  • Manuel Ventura-Marco

    (Department of Financial Economics and Actuarial Science, University of Valencia, Spain)

  • Carlos Vidal-Meliá

    (Department of Financial Economics and Actuarial Science, University of Valencia and Research Institute of Economic Analysis (ICAE), Complutense University of Madrid.)

Abstract

Pearson's chi-square test is widely used in the social and health sciences to analyze categorical data and contingency tables and to assess sample representativeness. For the test to be valid, the sample must be large enough to provide a minimum number of expected elements per category. When this minimum-size requirement is not met, the researcher may choose to regroup the strata, and automatic regrouping procedures in statistical software would then be very useful, especially when tests are applied sequentially. After comprehensively reviewing the software that can carry out this test, we find that, with a few exceptions, no automatic regrouping of strata is available to meet this requirement. This paper develops functions that regroup strata automatically, wherever they are located, so that the test can be performed within an iterative procedure. The functions are written in Excel VBA (Visual Basic for Applications) and in Mathematica, so it would not be hard to implement them in other languages. Their utility is shown using three different datasets. Finally, the functions are applied iteratively to the Continuous Sample of Working Lives, a dataset that has been used in a considerable number of studies, especially on labor economics and the Spanish public pension system.
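
The regrouping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' VBA or Mathematica code: the function names (regroup_strata, chi_square_statistic), the default threshold of 5 expected elements, and the rule of pooling the smallest stratum with its smaller adjacent neighbour are all assumptions made for illustration only.

```python
def regroup_strata(observed, expected, min_expected=5.0):
    """Pool adjacent strata until every expected count is >= min_expected.

    Hypothetical helper, not the paper's procedure. Returns the regrouped
    observed counts, the regrouped expected counts, and, for each new
    stratum, the list of original stratum indices it contains.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    obs = list(observed)
    exp = list(expected)
    groups = [[i] for i in range(len(exp))]
    while len(exp) > 1 and min(exp) < min_expected:
        # Locate the stratum with the smallest expected count,
        # wherever it sits in the list.
        i = exp.index(min(exp))
        # Assumed rule: merge it with the adjacent stratum that has
        # the smaller expected count.
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(exp)]
        j = min(neighbours, key=lambda k: exp[k])
        lo, hi = min(i, j), max(i, j)
        obs[lo:hi + 1] = [obs[lo] + obs[hi]]
        exp[lo:hi + 1] = [exp[lo] + exp[hi]]
        groups[lo:hi + 1] = [groups[lo] + groups[hi]]
    return obs, exp, groups


def chi_square_statistic(observed, expected):
    """Return Pearson's statistic and its degrees of freedom (strata - 1)."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Made-up counts: strata 0 and 2 fall below the assumed minimum of 5.
observed = [2, 30, 1, 40, 27]
expected = [3.0, 28.0, 2.0, 42.0, 25.0]
obs, exp, groups = regroup_strata(observed, expected)
stat, dof = chi_square_statistic(obs, exp)
print(groups, obs, exp, round(stat, 4), dof)
```

Because the loop re-checks the minimum after every merge, the function can be called repeatedly inside an iterative procedure. Note that merging adjacent strata only makes sense when the ordering of the strata is meaningful, and other pooling rules are equally possible.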

Suggested Citation

  • Juan Manuel Pérez-Salamero González & Marta Regúlez-Castillo & Manuel Ventura-Marco & Carlos Vidal-Meliá, 2017. "Automatic regrouping of strata in the chi-square test," Documentos de Trabajo del ICAE 2017-24, Universidad Complutense de Madrid, Facultad de Ciencias Económicas y Empresariales, Instituto Complutense de Análisis Económico.
  • Handle: RePEc:ucm:doicae:1724

    Download full text from publisher

    File URL: https://eprints.ucm.es/id/eprint/45317/1/1724.pdf
    Download Restriction: no

    References listed on IDEAS

    1. McCullough, B.D., 2008. "Special section on Microsoft Excel 2007," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4568-4569, June.
    2. Anton Grafström & Lina Schelin, 2014. "How to Select Representative Samples," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(2), pages 277-290, June.
    3. Shalabh, 2006. "Exact Analysis of Discrete Data by K. F. Hirji," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 169(4), pages 1009-1009, October.
    4. David J. Bartholomew & Panagiota Tzamourani, 1999. "The Goodness of Fit of Latent Trait Models in Attitude Measurement," Sociological Methods & Research, vol. 27(4), pages 525-546, May.
    5. Khan, Haseeb Ahmad, 2003. "A Visual Basic Software for Computing Fisher's Exact Probability," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 8(i21).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jan Klaschka & Jenő Reiczigel, 2021. "On matching confidence intervals and tests for some discrete distributions: methodological and computational aspects," Computational Statistics, Springer, vol. 36(3), pages 1775-1790, September.
    2. Xin Zhao & Anton Grafström, 2020. "A sample coordination method to monitor totals of environmental variables," Environmetrics, John Wiley & Sons, Ltd., vol. 31(6), September.
    3. Li Cai, 2010. "A Two-Tier Full-Information Item Factor Analysis Model with Applications," Psychometrika, Springer;The Psychometric Society, vol. 75(4), pages 581-612, December.
    4. Robertson, Blair & Price, Chris, 2024. "One point per cluster spatially balanced sampling," Computational Statistics & Data Analysis, Elsevier, vol. 191(C).
    5. Habiger, Joshua D. & McCann, Melinda H. & Tebbs, Joshua M., 2013. "On optimal confidence sets for parameters in discrete distributions," Statistics & Probability Letters, Elsevier, vol. 83(1), pages 297-303.
    6. Anastasios Evgenidis & Apostolos Fasianos, 2021. "Unconventional Monetary Policy and Wealth Inequalities in Great Britain," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 83(1), pages 115-175, February.
    7. Vicente Núñez-Antón & Juan Manuel Pérez-Salamero González & Marta Regúlez-Castillo & Carlos Vidal-Meliá, 2020. "Improving the Representativeness of a Simple Random Sample: An Optimization Model and Its Application to the Continuous Sample of Working Lives," Mathematics, MDPI, vol. 8(8), pages 1-27, July.
    8. Isabella Morlini, 2012. "A latent variables approach for clustering mixed binary and continuous variables within a Gaussian mixture model," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(1), pages 5-28, April.
    9. P. M. Kroonenberg & Albert Verbeek, 2018. "The Tale of Cochran's Rule: My Contingency Table has so Many Expected Values Smaller than 5, What Am I to Do?," The American Statistician, Taylor & Francis Journals, vol. 72(2), pages 175-183, April.
    10. Carolina Navarro & Luis Ayala & José Labeaga, 2010. "Housing deprivation and health status: evidence from Spain," Empirical Economics, Springer, vol. 38(3), pages 555-582, June.
    11. Yves Tillé, 2022. "Some Solutions Inspired by Survey Sampling Theory to Build Effective Clinical Trials," International Statistical Review, International Statistical Institute, vol. 90(3), pages 481-498, December.
    12. P. Elliott & K. Riggs, 2015. "Confidence regions for two proportions from independent negative binomial distributions," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(1), pages 27-36, January.
    13. Wilmer Prentius, 2024. "Locally correlated Poisson sampling," Environmetrics, John Wiley & Sons, Ltd., vol. 35(2), March.
    14. Silvia Cagnone & Stefania Mignani, 2007. "Assessing the goodness of fit of a latent variable model for ordinal data," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(3), pages 337-361.
    15. Shing-On Leung, 2008. "A Three-Dimensional Latent Variable Model for Attitude Scales," Sociological Methods & Research, vol. 37(1), pages 135-154, August.
    16. Hargreaves, Bruce R. & McWilliams, Thomas P., 2010. "Polynomial Trendline function flaws in Microsoft Excel," Computational Statistics & Data Analysis, Elsevier, vol. 54(4), pages 1190-1196, April.
    17. Krivosheya, Egor, 2020. "The role of financial innovations in consumer behavior in the Russian retail payments market," Technological Forecasting and Social Change, Elsevier, vol. 161(C).
    18. Jesús Henares-Montiel & Vivian Benítez-Hidalgo & Isabel Ruiz-Pérez & Guadalupe Pastor-Moreno & Miguel Rodríguez-Barranco, 2022. "Cyberbullying and Associated Factors in Member Countries of the European Union: A Systematic Review and Meta-Analysis of Studies with Representative Population Samples," IJERPH, MDPI, vol. 19(12), pages 1-13, June.
    19. Alberto Maydeu-Olivares & Rosa Montaño, 2013. "How Should We Assess the Fit of Rasch-Type Models? Approximating the Power of Goodness-of-Fit Statistics in Categorical Data Analysis," Psychometrika, Springer;The Psychometric Society, vol. 78(1), pages 116-133, January.
    20. Juan Manuel Pérez-Salamero González & Marta Regúlez-Castillo & Carlos Vidal-Meliá, 2017. "The continuous sample of working lives: improving its representativeness," SERIEs: Journal of the Spanish Economic Association, Springer;Spanish Economic Association, vol. 8(1), pages 43-95, March.

    More about this item

    Keywords

    Chi-square test; statistical software; VBA; Mathematica; Continuous Sample of Working Lives

    JEL classification:

    • C46 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Specific Distributions
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
    • H55 - Public Economics - - National Government Expenditures and Related Policies - - - Social Security and Public Pensions

