
Mining team characteristics to predict Wikipedia article quality

Authors

  • Grace Gimon Betancourt

    (LUSSI - Département Logique des Usages, Sciences sociales et Sciences de l'Information - UEB - Université européenne de Bretagne - European University of Brittany - Télécom Bretagne - IMT - Institut Mines-Télécom [Paris])

  • Armando Segnini

    (LUSSI - Département Logique des Usages, Sciences sociales et Sciences de l'Information - UEB - Université européenne de Bretagne - European University of Brittany - Télécom Bretagne - IMT - Institut Mines-Télécom [Paris])

  • Carlos Trabuco

    (LUSSI - Département Logique des Usages, Sciences sociales et Sciences de l'Information - UEB - Université européenne de Bretagne - European University of Brittany - Télécom Bretagne - IMT - Institut Mines-Télécom [Paris])

  • Amira Rezgui

    (LUSSI - Département Logique des Usages, Sciences sociales et Sciences de l'Information - UEB - Université européenne de Bretagne - European University of Brittany - Télécom Bretagne - IMT - Institut Mines-Télécom [Paris], MARSOUIN - Môle Armoricain de Recherche sur la SOciété de l'information et des usages d'INternet - UR - Université de Rennes - UEB - Université européenne de Bretagne - European University of Brittany - UBS - Université de Bretagne Sud - ENSAI - Ecole Nationale de la Statistique et de l'Analyse de l'Information [Bruz] - UBO - Université de Brest - Télécom Bretagne - IMT - Institut Mines-Télécom [Paris] - UR2 - Université de Rennes 2, ICI - Laboratoire Information, Coordination, Incitations - UEB - Université européenne de Bretagne - European University of Brittany - UBO - Université de Brest - Télécom Bretagne - IMT - Institut Mines-Télécom [Paris] - IBSHS - Institut Brestois des Sciences de l'Homme et de la Société - UBO - Université de Brest)

  • Nicolas Jullien

    (LUSSI - Département Logique des Usages, Sciences sociales et Sciences de l'Information - UEB - Université européenne de Bretagne - European University of Brittany - Télécom Bretagne - IMT - Institut Mines-Télécom [Paris], MARSOUIN - Môle Armoricain de Recherche sur la SOciété de l'information et des usages d'INternet - UR - Université de Rennes - UEB - Université européenne de Bretagne - European University of Brittany - UBS - Université de Bretagne Sud - ENSAI - Ecole Nationale de la Statistique et de l'Analyse de l'Information [Bruz] - UBO - Université de Brest - Télécom Bretagne - IMT - Institut Mines-Télécom [Paris] - UR2 - Université de Rennes 2, ICI - Laboratoire Information, Coordination, Incitations - UEB - Université européenne de Bretagne - European University of Brittany - UBO - Université de Brest - Télécom Bretagne - IMT - Institut Mines-Télécom [Paris] - IBSHS - Institut Brestois des Sciences de l'Homme et de la Société - UBO - Université de Brest)

Abstract

This study examines which characteristics of virtual teams are good predictors of the quality of their output. We obtained the Spanish Wikipedia database dump and applied data mining techniques suited to large data sets to label every article according to its quality, by comparison with the Featured/Good Articles (FA/GA). We then built attributes describing the characteristics of the team that produced each article and, using decision tree methods, identified the characteristics most relevant to the teams that produced FA/GA. The team's maximum efficiency and the total length of contribution are the most important predictors. This article contributes to the literature on virtual team organization.
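
The decision-tree step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it ranks features by the Gini impurity reduction of their best single-threshold split, which is the criterion a CART-style tree uses to choose its root node. The feature names (`max_efficiency`, `total_length`, `team_size`), the helper functions, and the six rows of data are all hypothetical and hand-made for illustration; none of them come from the paper or from the Spanish Wikipedia dump.

```python
from typing import Dict, List, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = FA/GA, 0 = other article)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split_gain(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (impurity reduction, threshold) for the best single-threshold split."""
    n = len(labels)
    parent = gini(labels)
    best_gain, best_thr = 0.0, float("nan")
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v <= thr]
        right = [y for v, y in zip(values, labels) if v > thr]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - child
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr


def rank_features(rows: List[Dict[str, float]], labels: Sequence[int]) -> List[Tuple[str, float]]:
    """Rank features by the impurity reduction of their best split, highest first."""
    gains = {
        name: best_split_gain([row[name] for row in rows], labels)[0]
        for name in rows[0]
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one row per article, hand-made, NOT taken from the paper.
ROWS = [
    {"max_efficiency": 0.9, "total_length": 5000, "team_size": 2},
    {"max_efficiency": 0.8, "total_length": 4000, "team_size": 6},
    {"max_efficiency": 0.7, "total_length": 1000, "team_size": 10},
    {"max_efficiency": 0.3, "total_length": 1200, "team_size": 4},
    {"max_efficiency": 0.2, "total_length": 300, "team_size": 8},
    {"max_efficiency": 0.1, "total_length": 200, "team_size": 12},
]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = labelled FA/GA-like, 0 = not

if __name__ == "__main__":
    for name, gain in rank_features(ROWS, LABELS):
        print(f"{name}: impurity reduction {gain:.3f}")
```

On this toy data `max_efficiency` ranks first because a single threshold separates the two classes perfectly. A full decision tree would apply the same criterion recursively to each resulting subset, and a library implementation would normally be preferred on a data set the size of a Wikipedia dump.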

Suggested Citation

  • Grace Gimon Betancourt & Armando Segnini & Carlos Trabuco & Amira Rezgui & Nicolas Jullien, 2016. "Mining team characteristics to predict Wikipedia article quality," Post-Print hal-01354368, HAL.
  • Handle: RePEc:hal:journl:hal-01354368
    Note: View the original document on HAL open archive server: https://hal.science/hal-01354368v1

    Download full text from publisher

    File URL: https://hal.science/hal-01354368v1/document
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nicolas Jullien, 2012. "What We Know About Wikipedia: A Review of the Literature Analyzing the Project(s)," Post-Print hal-00857208, HAL.
    2. Kevin Crowston & Nicolas Jullien & Felipe Ortega, 2013. "Is Wikipedia Inefficient? Modelling Effort and Participation in Wikipedia," Post-Print hal-00947731, HAL.
    3. Dejean, Sylvain & Jullien, Nicolas, 2015. "Big from the beginning: Assessing online contributors’ behavior by their first contribution," Research Policy, Elsevier, vol. 44(6), pages 1226-1239.
    4. Dirk Lewandowski, 2015. "Evaluating the retrieval effectiveness of web search engines using a representative query sample," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(9), pages 1763-1775, September.
    5. Morgan Bazilian & Patrick Nussbaumer & Hans-Holger Rogner & Abeeku Brew-Hammond & Vivien Foster & Shonali Pachauri & Eric Williams & Mark Howells & Philippe Niyongabo & Lawrence Musaba & Brian Ó Galla, 2011. "Energy Access Scenarios to 2030 for the Power Sector in Sub-Saharan Africa," Working Papers 2011.68, Fondazione Eni Enrico Mattei.
    6. Sorin Matei & Nicolas Jullien & Amira Rezgui & Diane Jackson, 2019. "The evolution of online co-production groups and its effects on content quality," Post-Print hal-01985702, HAL.
    7. Chris Desmond & Janet Seeley & Candice Groenewald & Nothando Ngwenya & Kate Rich & Tony Barnett, 2019. "Interpreting social determinants: Emergent properties and adolescent risk behaviour," PLOS ONE, Public Library of Science, vol. 14(12), pages 1-17, December.
    8. Evans, Garen K., 2008. "Spatial Shift-Share Analysis of the Leisure and Hospitality Sector on the Gulf Coast following Hurricane Katrina," 2008 Annual Meeting, February 2-6, 2008, Dallas, Texas 6744, Southern Agricultural Economics Association.
    9. Li, Yung-Ming & Lee, Yi-Lin, 2010. "Pricing peer-produced services: Quality, capacity, and competition issues," European Journal of Operational Research, Elsevier, vol. 207(3), pages 1658-1668, December.
    10. Bernhard Christoph, 2010. "The Relation Between Life Satisfaction and the Material Situation: A Re-Evaluation Using Alternative Measures," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 98(3), pages 475-499, September.
    11. Olimpia Markiewicz & Anna Bartczak & Agnieszka Markowska, 2007. "Wartość dodatkowego roku przeżycia w Polsce," Ekonomia journal, Faculty of Economic Sciences, University of Warsaw, vol. 19.
    12. Shuchih Ernest Chang & Hueimin Louis Luo & YiChian Chen, 2019. "Blockchain-Enabled Trade Finance Innovation: A Potential Paradigm Shift on Using Letter of Credit," Sustainability, MDPI, vol. 12(1), pages 1-16, December.
    13. repec:iab:iabfda:201307(en is not listed on IDEAS
    14. Freeman, Christopher & Soete, Luc, 2009. "Developing science, technology and innovation indicators: What we can learn from the past," Research Policy, Elsevier, vol. 38(4), pages 583-589, May.
    15. Knieps, Günter, 2010. "Network Neutrality and the Evolution of the Internet," 21st European Regional ITS Conference, Copenhagen 2010: Telecommunications at new crossroads - Changing value configurations, user roles, and regulation 19, International Telecommunications Society (ITS).
    16. Kenneth Shelton Aikins & Richard Ametefe, 2017. "Ethnic Factor and Politics in the Asuogyaman District of Ghana," SAGE Open, vol. 7(3), pages 21582440177, July.
    17. Natina Yaduma & Mika Kortelainen & Ada Wossink, 2013. "Estimating Mortality and Economic Costs of Particulate Air Pollution in Developing Countries: The Case of Nigeria," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 54(3), pages 361-387, March.
    18. Kody T. Ponds & Ali Arefi & Ali Sayigh & Gerard Ledwich, 2018. "Aggregator of Demand Response for Renewable Integration and Customer Engagement: Strengths, Weaknesses, Opportunities, and Threats," Energies, MDPI, vol. 11(9), pages 1-20, September.
    19. Justesen, Mogens K. & Bjørnskov, Christian, 2014. "Exploiting the Poor: Bureaucratic Corruption and Poverty in Africa," World Development, Elsevier, vol. 58(C), pages 106-115.
    20. Anke Becker, 2019. "On the Economic Origins of Restrictions on Women's Sexuality," CESifo Working Paper Series 7770, CESifo.
    21. Krikke, H.R. & van der Laan, E., 2009. "Last Time Buy and Control Policies With Phase-Out Returns : A Case Study in Plant Control Systems," Discussion Paper 2009-66, Tilburg University, Center for Economic Research.
