
Assessing Inter-rater Reliability With Heterogeneous Variance Components Models: Flexible Approach Accounting for Contextual Variables

Authors
  • Patrícia Martinková

    (Institute of Computer Science of the Czech Academy of Sciences, Charles University)

  • František Bartoš

    (Institute of Computer Science of the Czech Academy of Sciences, University of Amsterdam)

  • Marek Brabec

    (Institute of Computer Science of the Czech Academy of Sciences)

Abstract

Inter-rater reliability (IRR), which is a prerequisite of high-quality ratings and assessments, may be affected by contextual variables, such as the rater’s or ratee’s gender, major, or experience. Identification of such heterogeneity sources in IRR is important for the implementation of policies with the potential to decrease measurement error and to increase IRR by focusing on the most relevant subgroups. In this study, we propose a flexible approach for assessing IRR in cases of heterogeneity due to covariates by directly modeling differences in variance components. We use Bayes factors (BFs) to select the best performing model, and we suggest using Bayesian model averaging as an alternative approach for obtaining IRR and variance component estimates, allowing us to account for model uncertainty. We use inclusion BFs considering the whole model space to provide evidence for or against differences in variance components due to covariates. The proposed method is compared with other Bayesian and frequentist approaches in a simulation study, and we demonstrate its superiority in some situations. Finally, we provide real data examples from grant proposal peer review, demonstrating the usefulness of this method and its flexibility in the generalization of more complex designs.
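
The quantities named in the abstract can be illustrated with a minimal sketch. It assumes that single-rater IRR is the share of total variance attributable to ratees, that posterior model probabilities follow from marginal likelihoods and prior model probabilities, and that an inclusion BF is the ratio of posterior to prior inclusion odds taken over the whole model space. All function names and numbers below are hypothetical, and averaging point estimates is a simplification; this is not the authors' implementation.

```python
# Illustrative sketch only: names and numbers are hypothetical,
# not taken from the article or its software.

def irr(var_ratee, var_residual):
    """Single-rater IRR as the share of variance due to ratees (assumed form)."""
    return var_ratee / (var_ratee + var_residual)

def posterior_model_probs(marginal_likelihoods, prior_probs):
    """Posterior model probabilities via Bayes' rule over a finite model space."""
    weights = [m * p for m, p in zip(marginal_likelihoods, prior_probs)]
    total = sum(weights)
    return [w / total for w in weights]

def model_average(estimates, post_probs):
    """Average per-model estimates, weighted by posterior model probability."""
    return sum(e * p for e, p in zip(estimates, post_probs))

def inclusion_bf(post_probs, prior_probs, includes):
    """Posterior inclusion odds divided by prior inclusion odds for one effect."""
    post_in = sum(p for p, inc in zip(post_probs, includes) if inc)
    prior_in = sum(p for p, inc in zip(prior_probs, includes) if inc)
    return (post_in / (1 - post_in)) / (prior_in / (1 - prior_in))

# Hypothetical model space: (1) homogeneous variances, (2) ratee variance
# differs by covariate, (3) residual variance differs, (4) both differ.
priors = [0.25, 0.25, 0.25, 0.25]
marginal_liks = [1.0, 3.0, 1.0, 3.0]      # made-up relative values
post = posterior_model_probs(marginal_liks, priors)

# Made-up per-model IRR estimates for one subgroup.
irr_by_model = [irr(2.0, 2.0), irr(3.0, 2.0), irr(2.0, 2.0), irr(3.0, 2.0)]
print("model-averaged IRR:", model_average(irr_by_model, post))

# Evidence that ratee variance depends on the covariate (models 2 and 4).
print("inclusion BF:", inclusion_bf(post, priors, [False, True, False, True]))
```

Selecting the single model with the highest posterior probability corresponds to the BF-based model selection mentioned in the abstract, while the weighted average reflects model uncertainty.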

Suggested Citation

  • Patrícia Martinková & František Bartoš & Marek Brabec, 2023. "Assessing Inter-rater Reliability With Heterogeneous Variance Components Models: Flexible Approach Accounting for Contextual Variables," Journal of Educational and Behavioral Statistics, vol. 48(3), pages 349-383, June.
  • Handle: RePEc:sae:jedbes:v:48:y:2023:i:3:p:349-383
    DOI: 10.3102/10769986221150517

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.3102/10769986221150517
    Download Restriction: no



