Printed from https://ideas.repec.org/p/ehl/lserod/112499.html

Detection of two-way outliers in multivariate data and application to cheating detection in educational tests

Author

Listed:
  • Chen, Yunxiao
  • Lu, Yan
  • Moustaki, Irini

Abstract

The paper proposes a new latent variable model for the simultaneous (two-way) detection of outlying individuals and items for item-response-type data. The proposed model is a synergy between a factor model for binary responses and continuous response times that captures normal item response behaviour and a latent class model that captures the outlying individuals and items. A statistical decision framework is developed under the proposed model that provides compound decision rules for controlling local false discovery/nondiscovery rates of outlier detection. Statistical inference is carried out under a Bayesian framework, for which a Markov chain Monte Carlo algorithm is developed. The proposed method is applied to the detection of cheating in educational tests due to item leakage using a case study of a computer-based nonadaptive licensure assessment. The performance of the proposed method is evaluated by simulation studies.
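The compound decision rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that posterior outlier probabilities (for example, averages of latent class indicators over MCMC draws) are already available, and it applies a generic local-false-discovery-rate thresholding rule: rank cases by their posterior probability of being non-outlying and flag the largest set whose average stays at or below a target level. The function name `flag_outliers`, the argument `post_outlier_prob`, and the level `alpha` are hypothetical names chosen for this illustration.

```python
import numpy as np

def flag_outliers(post_outlier_prob, alpha=0.05):
    """Flag the largest set of cases whose average local false discovery
    rate (posterior probability of NOT being an outlier) is <= alpha.

    Illustrative sketch only; the inputs are assumed to be posterior
    outlier probabilities obtained elsewhere (e.g. from MCMC output).
    """
    p = np.asarray(post_outlier_prob, dtype=float)
    lfdr = 1.0 - p                                # local false discovery rate per case
    order = np.argsort(lfdr, kind="stable")       # most suspicious cases first
    # Average lfdr over the k most suspicious cases, for k = 1..n.
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, p.size + 1)
    # running_mean is nondecreasing, so the passing k's form a prefix.
    passing = np.nonzero(running_mean <= alpha)[0]
    k = passing[-1] + 1 if passing.size else 0
    flags = np.zeros(p.size, dtype=bool)
    flags[order[:k]] = True
    return flags

# Hypothetical posterior outlier probabilities for five test takers.
probs = [0.99, 0.97, 0.60, 0.10, 0.95]
print(flag_outliers(probs, alpha=0.05))
```

The same rule could be applied separately to individuals and to items, which is one way to read the "two-way" detection described above; how the paper combines the two is not specified in this record.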

Suggested Citation

  • Chen, Yunxiao & Lu, Yan & Moustaki, Irini, 2022. "Detection of two-way outliers in multivariate data and application to cheating detection in educational tests," LSE Research Online Documents on Economics 112499, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:112499

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/112499/
    File Function: Open access version.
    Download Restriction: no

    Citations

    Cited by:

    1. Wallin, Gabriel & Chen, Yunxiao & Moustaki, Irini, 2024. "DIF analysis with unknown groups and anchor items," LSE Research Online Documents on Economics 121991, London School of Economics and Political Science, LSE Library.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yu-Wei Chang & Rung-Ching Tsai & Nan-Jung Hsu, 2014. "A Speeded Item Response Model: Leave the Harder till Later," Psychometrika, Springer;The Psychometric Society, vol. 79(2), pages 255-274, April.
    2. Hyeon-Ah Kang, 2023. "Sequential Generalized Likelihood Ratio Tests for Online Item Monitoring," Psychometrika, Springer;The Psychometric Society, vol. 88(2), pages 672-696, June.
    3. Chun Wang & Gongjun Xu & Zhuoran Shang, 2018. "A Two-Stage Approach to Differentiating Normal and Aberrant Behavior in Computer Based Testing," Psychometrika, Springer;The Psychometric Society, vol. 83(1), pages 223-254, March.
    4. T. Tony Cai & Wenguang Sun & Weinan Wang, 2019. "Covariate‐assisted ranking and screening for large‐scale two‐sample inference," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 187-234, April.
    5. Inhan Kang & Dylan Molenaar & Roger Ratcliff, 2023. "A Modeling Framework to Examine Psychological Processes Underlying Ordinal Responses and Response Times of Psychometric Data," Psychometrika, Springer;The Psychometric Society, vol. 88(3), pages 940-974, September.
    6. Joshua Habiger & Edsel Peña, 2011. "Randomised p-values and nonparametric procedures in multiple testing," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 23(3), pages 583-604.
    7. Pallavi Basu & Luella Fu & Alessio Saretto & Wenguang Sun, 2021. "Empirical Bayes Control of the False Discovery Exceedance," Working Papers 2115, Federal Reserve Bank of Dallas.
    8. Habiger, Joshua D. & Peña, Edsel A., 2014. "Compound p-value statistics for multiple testing procedures," Journal of Multivariate Analysis, Elsevier, vol. 126(C), pages 153-166.
    9. Cai, Jing-Heng & Song, Xin-Yuan & Lam, Kwok-Hap & Ip, Edward Hak-Sing, 2011. "A mixture of generalized latent variable models for mixed mode and heterogeneous data," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2889-2907, November.
    10. Steffi Pohl & Esther Ulitzsch & Matthias von Davier, 2019. "Using Response Times to Model Not-Reached Items due to Time Limits," Psychometrika, Springer;The Psychometric Society, vol. 84(3), pages 892-920, September.
    11. David I. Ohlssen & Linda D. Sharples & David J. Spiegelhalter, 2007. "A hierarchical modelling framework for identifying unusual performance in health care providers," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(4), pages 865-890, October.
    12. Roy Costilla & Ivy Liu & Richard Arnold & Daniel Fernández, 2019. "Bayesian model-based clustering for longitudinal ordinal data," Computational Statistics, Springer, vol. 34(3), pages 1015-1038, September.
    13. Pounds Stanley B. & Gao Cuilan L. & Zhang Hui, 2012. "Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-32, October.
    14. Kai Yang & Qingqing Zhang & Xinyang Yu & Xiaogang Dong, 2023. "Bayesian inference for a mixture double autoregressive model," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 77(2), pages 188-207, May.
    15. Daniel Yekutieli, 2015. "Bayesian tests for composite alternative hypotheses in cross-tabulated data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(2), pages 287-301, June.
    16. Ghosh Debashis, 2012. "Incorporating the Empirical Null Hypothesis into the Benjamini-Hochberg Procedure," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-21, July.
    17. Papastamoulis, Panagiotis, 2018. "Overfitting Bayesian mixtures of factor analyzers with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 220-234.
    18. He, Yi & Pan, Wei & Lin, Jizhen, 2006. "Cluster analysis using multivariate normal mixture models to detect differential gene expression with microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 641-658, November.
    19. Yuan Fang & Dimitris Karlis & Sanjeena Subedi, 2022. "Infinite Mixtures of Multivariate Normal-Inverse Gaussian Distributions for Clustering of Skewed Data," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 510-552, November.
    20. Kazuhiko Kakamu, 2022. "Bayesian analysis of mixtures of lognormal distribution with an unknown number of components from grouped data," Papers 2210.05115, arXiv.org, revised Sep 2023.

    More about this item

    Keywords

    Bayesian hierarchical model; outlier detection; false discovery rate; compound decision; test fairness; item response theory; latent class analysis;

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - Econometric and Statistical Methods and Methodology: General

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
