Printed from https://ideas.repec.org/a/gam/jmathe/v9y2021i8p875-d536894.html

A Conceptual Probabilistic Framework for Annotation Aggregation of Citizen Science Data

Authors listed:
  • Jesus Cerquides

    (Institut d’Investigació en Intel·ligència Artificial (IIIA), CSIC, 08193 Cerdanyola, Spain)

  • Mehmet Oğuz Mülâyim

    (Institut d’Investigació en Intel·ligència Artificial (IIIA), CSIC, 08193 Cerdanyola, Spain)

  • Jerónimo Hernández-González

    (Department de Matemàtiques, Universitat de Barcelona, 08007 Barcelona, Spain)

  • Amudha Ravi Shankar

    (Citizen Cyberlab, CUI, University of Geneva, CH-1227 Geneva, Switzerland)

  • Jose Luis Fernandez-Marquez

    (Citizen Cyberlab, CUI, University of Geneva, CH-1227 Geneva, Switzerland)

Abstract

Over the last decade, hundreds of thousands of volunteers have contributed to science by collecting or analyzing data. This public participation in science, also known as citizen science, has contributed to significant discoveries and led to publications in major scientific journals. However, little attention has been paid to data quality issues. In this work we argue that being able to determine the accuracy of data obtained by crowdsourcing is a fundamental question and we point out that, for many real-life scenarios, mathematical tools and processes for the evaluation of data quality are missing. We propose a probabilistic methodology for the evaluation of the accuracy of labeling data obtained by crowdsourcing in citizen science. The methodology builds on an abstract probabilistic graphical model formalism, which is shown to generalize some already existing label aggregation models. We show how to make practical use of the methodology through a comparison of data obtained from different citizen science communities analyzing the earthquake that took place in Albania in 2019.
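The abstract states that the proposed formalism generalizes some existing label aggregation models, and the reference list includes Dawid and Skene (1979), the classic EM estimator of observer error rates. As a rough illustration of that kind of model only (this is not the authors' framework or code), the following is a minimal pure-Python sketch of Dawid-Skene aggregation. It assumes labels arrive as (item, annotator, label) triples with integer classes in `range(n_classes)`; the function name `dawid_skene` and its `n_iter` and `smoothing` parameters are hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Sketch of Dawid-Skene EM label aggregation (illustrative only).

    labels: list of (item, annotator, label) triples, label in range(n_classes).
    Returns (posteriors per item, confusion matrix per annotator, class prior).
    """
    items = sorted({i for i, _, _ in labels})
    annotators = sorted({a for _, a, _ in labels})
    by_item = defaultdict(list)
    for i, a, l in labels:
        by_item[i].append((a, l))

    # Initialise each item's posterior over the true class with a soft majority vote.
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, l in by_item[i]:
            counts[l] += 1.0
        total = sum(counts)
        post[i] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step (1): class prior from expected class counts, lightly smoothed.
        prior = [smoothing] * n_classes
        for i in items:
            for k in range(n_classes):
                prior[k] += post[i][k]
        z = sum(prior)
        prior = [p / z for p in prior]

        # M-step (2): confusion matrix per annotator, conf[a][true][given].
        # Smoothing keeps every entry positive so the logs below are defined.
        conf = {a: [[smoothing] * n_classes for _ in range(n_classes)]
                for a in annotators}
        for i in items:
            for a, l in by_item[i]:
                for k in range(n_classes):
                    conf[a][k][l] += post[i][k]
        for a in annotators:
            for k in range(n_classes):
                row_sum = sum(conf[a][k])
                conf[a][k] = [v / row_sum for v in conf[a][k]]

        # E-step: posterior over each item's true class, computed in log space
        # as log prior + sum of log confusion entries for the observed labels.
        for i in items:
            logp = [math.log(prior[k]) for k in range(n_classes)]
            for a, l in by_item[i]:
                for k in range(n_classes):
                    logp[k] += math.log(conf[a][k][l])
            m = max(logp)
            w = [math.exp(v - m) for v in logp]
            s = sum(w)
            post[i] = [v / s for v in w]

    return post, conf, prior
```

In a toy run with three annotators who always agree with the true label and a fourth who always gives the opposite binary label, the estimated confusion matrix for the fourth annotator puts most of its mass off the diagonal, and the item posteriors recover the true labels. Working in log space and smoothing the counts are design choices made here to avoid numerical underflow and `log(0)`; they are not taken from the paper.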

Suggested Citation

  • Jesus Cerquides & Mehmet Oğuz Mülâyim & Jerónimo Hernández-González & Amudha Ravi Shankar & Jose Luis Fernandez-Marquez, 2021. "A Conceptual Probabilistic Framework for Annotation Aggregation of Citizen Science Data," Mathematics, MDPI, vol. 9(8), pages 1-15, April.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:8:p:875-:d:536894

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/8/875/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/8/875/
    Download Restriction: no

    References listed on IDEAS

    1. A. P. Dawid & A. M. Skene, 1979. "Maximum Likelihood Estimation of Observer Error‐Rates Using the EM Algorithm," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 28(1), pages 20-28, March.
    2. Bob Carpenter & Andrew Gelman & Matthew D. Hoffman & Daniel Lee & Ben Goodrich & Michael Betancourt & Marcus Brubaker & Jiqiang Guo & Peter Li & Allen Riddell, 2017. "Stan: A Probabilistic Programming Language," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 76(i01).
    3. Andrei P. Kirilenko & Travis Desell & Hany Kim & Svetlana Stepchenkova, 2017. "Crowdsourcing Analysis of Twitter Data on Climate Change: Paid Workers vs. Volunteers," Sustainability, MDPI, vol. 9(11), pages 1-15, November.
    4. Trisha Gura, 2013. "Citizen science: Amateur experts," Nature, Nature, vol. 496(7444), pages 259-261, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francis, David C. & Kubinec, Robert, 2022. "Beyond Political Connections: A Measurement Model Approach to Estimating Firm-level Political Influence in 41 Economies," Policy Research Working Paper Series 10119, The World Bank.
    2. Martinovici, A., 2019. "Revealing attention - how eye movements predict brand choice and moment of choice," Other publications TiSEM 7dca38a5-9f78-4aee-bd81-c, Tilburg University, School of Economics and Management.
    3. Yongping Bao & Ludwig Danwitz & Fabian Dvorak & Sebastian Fehrler & Lars Hornuf & Hsuan Yu Lin & Bettina von Helversen, 2022. "Similarity and Consistency in Algorithm-Guided Exploration," CESifo Working Paper Series 10188, CESifo.
    4. Heinrich, Torsten & Yang, Jangho & Dai, Shuanping, 2020. "Growth, development, and structural change at the firm-level: The example of the PR China," MPRA Paper 105011, University Library of Munich, Germany.
    5. van Kesteren, Erik-Jan & Bergkamp, Tom, 2023. "Bayesian analysis of Formula One race results: disentangling driver skill and constructor advantage," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 19(4), pages 273-293, December.
    6. Xin Xu & Yang Lu & Yupeng Zhou & Zhiguo Fu & Yanjie Fu & Minghao Yin, 2021. "An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks," Mathematics, MDPI, vol. 9(15), pages 1-14, July.
    7. Xiaoyue Xi & Simon E. F. Spencer & Matthew Hall & M. Kate Grabowski & Joseph Kagaayi & Oliver Ratmann & Rakai Health Sciences Program and PANGEA‐HIV, 2022. "Inferring the sources of HIV infection in Africa from deep‐sequence data with semi‐parametric Bayesian Poisson flow models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(3), pages 517-540, June.
    8. Kuschnig, Nikolas, 2021. "Bayesian Spatial Econometrics and the Need for Software," Department of Economics Working Paper Series 318, WU Vienna University of Economics and Business.
    9. Deniz Aksoy & David Carlson, 2022. "Electoral support and militants’ targeting strategies," Journal of Peace Research, Peace Research Institute Oslo, vol. 59(2), pages 229-241, March.
    10. Luo, Nanyu & Ji, Feng & Han, Yuting & He, Jinbo & Zhang, Xiaoya, 2024. "Fitting item response theory models using deep learning computational frameworks," OSF Preprints tjxab, Center for Open Science.
    11. Richard Hunt & Shelton Peiris & Neville Weber, 2022. "Estimation methods for stationary Gegenbauer processes," Statistical Papers, Springer, vol. 63(6), pages 1707-1741, December.
    12. D. Fouskakis & G. Petrakos & I. Rotous, 2020. "A Bayesian longitudinal model for quantifying students’ preferences regarding teaching quality indicators," METRON, Springer;Sapienza Università di Roma, vol. 78(2), pages 255-270, August.
    13. Joseph B. Bak-Coleman & Ian Kennedy & Morgan Wack & Andrew Beers & Joseph S. Schafer & Emma S. Spiro & Kate Starbird & Jevin D. West, 2022. "Combining interventions to reduce the spread of viral misinformation," Nature Human Behaviour, Nature, vol. 6(10), pages 1372-1380, October.
    14. Jonas Moss & Riccardo De Bin, 2023. "Modelling publication bias and p‐hacking," Biometrics, The International Biometric Society, vol. 79(1), pages 319-331, March.
    15. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    16. David M. Phillippo & Sofia Dias & A. E. Ades & Mark Belger & Alan Brnabic & Alexander Schacht & Daniel Saure & Zbigniew Kadziola & Nicky J. Welton, 2020. "Multilevel network meta‐regression for population‐adjusted treatment comparisons," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(3), pages 1189-1210, June.
    17. Matthias Breuer & Harm H. Schütt, 2023. "Accounting for uncertainty: an application of Bayesian methods to accruals models," Review of Accounting Studies, Springer, vol. 28(2), pages 726-768, June.
    18. Alina Ferecatu & Arnaud Bruyn & Prithwiraj Mukherjee, 2024. "Silently killing your panelists one email at a time: The true cost of email solicitations," Journal of the Academy of Marketing Science, Springer, vol. 52(4), pages 1216-1239, July.
    19. Loke Schmalensee & Pauline Caillault & Katrín Hulda Gunnarsdóttir & Karl Gotthard & Philipp Lehmann, 2023. "Seasonal specialization drives divergent population dynamics in two closely related butterflies," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    20. Edgar Santos‐Fernandez & Erin E. Peterson & Julie Vercelloni & Em Rushworth & Kerrie Mengersen, 2021. "Correcting misclassification errors in crowdsourced ecological data: A Bayesian perspective," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(1), pages 147-173, January.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:9:y:2021:i:8:p:875-:d:536894. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the MDPI Indexing Manager. General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.