
A Bayesian index of association: comparison with other measures and performance

Author

Listed:
  • Anton Oleinik

    (Memorial University of Newfoundland)

Abstract

The article discusses a Bayesian measure of association, the B-index, and compares it with other existing measures of agreement, association, and similarity, both chance-corrected and non-corrected: Scott’s π, Krippendorff’s α, Cohen’s κ, Bennett, Alpert & Goldstein’s S, cosine similarity, and the Jaccard similarity coefficient. PageRank, adapted to the particularities of annotation, is also added to this list. Two versions of the B-index are considered: one with an informative prior and one with a non-informative prior. An algorithm for calculating the B-index, written in pseudocode, is provided. Particular attention is devoted to the uses of these measures in content analysis, communication studies, computational linguistics, psychology, computer science, and network science. Real-world data gathered using an online platform for content analysis made it possible to compare the behavior of all eight measures included in the scope of analysis. Three short texts (164 data points/sentences in total) were coded by 66 annotators. The behaviors of the B-index with the non-informative prior and of Bennett, Alpert & Goldstein’s S share some common patterns.
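As a rough illustration of how several of the measures named in the abstract differ, the following minimal Python sketch computes Cohen’s κ, Scott’s π, Bennett, Alpert & Goldstein’s S, the Jaccard coefficient, and cosine similarity for two annotators. It uses the standard textbook definitions rather than the article’s own code. The data, the function names, and the binary coding scheme are assumptions made for this example. The B-index, Krippendorff’s α, and the adapted PageRank are left out because the abstract does not give their formulas. Degenerate inputs (for example, both annotators using a single category) are not handled.

```python
from collections import Counter
from math import sqrt

# Hypothetical codes from two annotators for ten sentences (1 = code applied).
a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

def observed_agreement(x, y):
    """Share of items on which the two annotators assign the same code."""
    return sum(u == v for u, v in zip(x, y)) / len(x)

def chance_corrected(p_o, p_e):
    """Common form of chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    return (p_o - p_e) / (1 - p_e)

def cohen_kappa(x, y):
    """Expected agreement from each annotator's own marginal distribution."""
    n = len(x)
    cx, cy = Counter(x), Counter(y)
    p_e = sum(cx[k] * cy[k] for k in set(x) | set(y)) / n**2
    return chance_corrected(observed_agreement(x, y), p_e)

def scott_pi(x, y):
    """Expected agreement from the pooled (averaged) marginal distribution."""
    n = len(x)
    pooled = Counter(x) + Counter(y)
    p_e = sum((c / (2 * n)) ** 2 for c in pooled.values())
    return chance_corrected(observed_agreement(x, y), p_e)

def bennett_s(x, y, q):
    """Expected agreement is 1/q, where q is the number of available categories."""
    return chance_corrected(observed_agreement(x, y), 1 / q)

def jaccard(x, y):
    """Items coded 1 by both annotators over items coded 1 by at least one."""
    both = sum(u == 1 and v == 1 for u, v in zip(x, y))
    either = sum(u == 1 or v == 1 for u, v in zip(x, y))
    return both / either

def cosine(x, y):
    """Cosine of the angle between the two code vectors."""
    dot = sum(u * v for u, v in zip(x, y))
    return dot / (sqrt(sum(u * u for u in x)) * sqrt(sum(v * v for v in y)))

for name, value in [
    ("Cohen's kappa", cohen_kappa(a, b)),
    ("Scott's pi", scott_pi(a, b)),
    ("Bennett's S", bennett_s(a, b, q=2)),
    ("Jaccard", jaccard(a, b)),
    ("Cosine", cosine(a, b)),
]:
    print(f"{name}: {value:.3f}")
```

In this made-up example the observed agreement is 0.8. κ comes out slightly higher than π and S because the annotators’ marginal distributions differ, while the non-corrected Jaccard and cosine values ignore joint absences of the code.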

Suggested Citation

  • Anton Oleinik, 2024. "A Bayesian index of association: comparison with other measures and performance," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(1), pages 277-305, February.
  • Handle: RePEc:spr:qualqt:v:58:y:2024:i:1:d:10.1007_s11135-023-01639-2
    DOI: 10.1007/s11135-023-01639-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11135-023-01639-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Anton Oleinik, 2022. "Relevance in Web search: between content, authority and popularity," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(1), pages 173-194, February.
    2. Kostovicova Denisa & Kerr Rachel & Sokolić Ivor & Fairey Tiffany & Redwood Henry & Subotić Jelena, 2022. "The “Digital Turn” in Transitional Justice Research: Evaluating Image and Text as Data in the Western Balkans," Comparative Southeast European Studies, De Gruyter, vol. 70(1), pages 24-46, March.
    3. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    4. Stefano Tonellato & Andrea Pastore, 2013. "On the comparison of model-based clustering solutions," Working Papers 2013:05, Department of Economics, University of Venice "Ca' Foscari".
    5. Martin Haselmayer & Marcelo Jenny, 2017. "Sentiment analysis of political communication: combining a dictionary approach with crowdcoding," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(6), pages 2623-2646, November.
    6. Eyal Eckhaus & Zachary Sheaffer, 2018. "Managerial hubris detection: the case of Enron," Risk Management, Palgrave Macmillan, vol. 20(4), pages 304-325, November.
    7. Anton Oleinik, 2015. "On content analysis of images of mass protests: a case of data triangulation," Quality & Quantity: International Journal of Methodology, Springer, vol. 49(5), pages 2203-2220, September.
    8. Keren Weinshall & Lee Epstein, 2020. "Developing High‐Quality Data Infrastructure for Legal Analytics: Introducing the Israeli Supreme Court Database," Journal of Empirical Legal Studies, John Wiley & Sons, vol. 17(2), pages 416-434, June.
    9. Elio Amicarelli & Jessica Di Salvatore, 2021. "Introducing the PeaceKeeping Operations Corpus (PKOC)," Journal of Peace Research, Peace Research Institute Oslo, vol. 58(5), pages 1137-1148, September.
    10. Isabella Morlini & Sergio Zani, 2012. "Dissimilarity and similarity measures for comparing dendrograms and their applications," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(2), pages 85-105, July.
    11. Matthijs J. Warrens & Hanneke Hoef, 2022. "Understanding the Adjusted Rand Index and Other Partition Comparison Indices Based on Counting Object Pairs," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 487-509, November.
    12. Yu Lim Lee & Minji Jung & Robert Jeyakumar Nathan & Jae-Eun Chung, 2020. "Cross-National Study on the Perception of the Korean Wave and Cultural Hybridity in Indonesia and Malaysia Using Discourse on Social Media," Sustainability, MDPI, vol. 12(15), pages 1-33, July.
    13. Richard Hanania, 2021. "The Humanitarian Turn at the UNSC: Explaining the development of international norms through machine learning algorithms," Journal of Peace Research, Peace Research Institute Oslo, vol. 58(4), pages 655-670, July.
    14. Nazila Zarghi, 2021. "Evidence-Based Social Sciences: A New Emerging Field," European Journal of Social Sciences Education and Research Articles, Revistia Research and Publishing, vol. 8, January -.
    15. Yunpeng Zhao & Qing Pan & Chengan Du, 2019. "Logistic regression augmented community detection for network data with application in identifying autism‐related gene pathways," Biometrics, The International Biometric Society, vol. 75(1), pages 222-234, March.
    16. Bernhardt, Lea & Dewenter, Ralf & Thomas, Tobias, 2023. "Measuring partisan media bias in US newscasts from 2001 to 2012," European Journal of Political Economy, Elsevier, vol. 78(C).
    17. Ntentas, Raphael, 2021. "Quantifying political populism and examining the link with economic insecurity: evidence from Greece," LSE Research Online Documents on Economics 112579, London School of Economics and Political Science, LSE Library.
    18. Wu, Han-Ming & Tien, Yin-Jing & Chen, Chun-houh, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    19. F. Marta L. Di Lascio & Andrea Menapace & Roberta Pappadà, 2024. "A spatially‐weighted AMH copula‐based dissimilarity measure for clustering variables: An application to urban thermal efficiency," Environmetrics, John Wiley & Sons, Ltd., vol. 35(1), February.
    20. Yifan Zhu & Chongzhi Di & Ying Qing Chen, 2019. "Clustering Functional Data with Application to Electronic Medication Adherence Monitoring in HIV Prevention Trials," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(2), pages 238-261, July.
