Printed from https://ideas.repec.org/a/plo/pcbi00/1007652.html

Collective intelligence defines biological functions in Wikipedia as communities in the hidden protein connection network

Authors

  • Andrei Zinovyev
  • Urszula Czerwinska
  • Laura Cantini
  • Emmanuel Barillot
  • Klaus M Frahm
  • Dima L Shepelyansky

Abstract

English Wikipedia, containing more than five million articles, has approximately eleven thousand web pages devoted to proteins or genes, most of which were generated by the Gene Wiki project. These pages contain information about interactions between proteins and their functional relationships. At the same time, they are interconnected with other Wikipedia pages describing biological functions, diseases, drugs and other topics curated by independent, uncoordinated collective efforts. Therefore, Wikipedia contains a directed network of protein functional relations or physical interactions embedded in the global network of the encyclopedia terms, which defines hidden (indirect) functional proximity between proteins. We applied the recently developed reduced Google Matrix (REGOMAX) algorithm to extract the network of hidden functional connections between proteins in Wikipedia. In this network we discovered tight communities which reflect areas of interest in molecular biology or medicine and can be considered as definitions of biological functions shaped by collective intelligence. Moreover, by comparing two snapshots of the Wikipedia graph (from 2013 and 2017), we studied the evolution of the network of direct and hidden protein connections. We concluded that the hidden connections are more dynamic than the direct ones and that the size of the hidden interaction communities grows with time. We recapitulate the results of the Wikipedia protein community analysis and annotation in the form of an interactive online map, which can serve as a portal to the Gene Wiki project.

Author summary

The long-standing effort to annotate protein functions from published experimental evidence is still far from complete, partly because of the limited number of biocurators involved. Wikipedia was thought to be a suitable platform for crowdsourcing protein function curation by exploiting the wisdom-of-the-crowd principle. Starting in 2008, English Wikipedia was automatically populated with thousands of protein pages and links between them (the Gene Wiki project), which created a useful and rapidly evolving knowledge resource. However, it remains unclear what benefit comes from hyperlinking protein pages with the whole Wikipedia knowledge corpus. We applied the recently introduced network analysis method called reduced Google Matrix (REGOMAX) to study the structure of direct and indirect (hidden) links between protein pages through the rest of the global Wikipedia network. As expected, the network of direct links had a node degree distribution approximately following a power law. In contrast, the network of hidden links was characterized by larger-than-expected tight communities of proteins related to their known functions, such as involvement in the immune system. The “friendship network” of these protein groups can be used for automated annotation of their functions from non-protein Wikipedia pages. We estimated the size of the expert Wikipedia contributor community working specifically on protein and associated pages to be nearly 1000 Wikipedians with a primarily biomedical background. We conclude that the structure of the global Wikipedia network can improve the annotation of protein functions by amplifying the wisdom-of-the-crowd effect.
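
The idea of direct versus hidden links can be illustrated with a minimal sketch of the reduced Google matrix, G_R = G_rr + G_rs (1 - G_ss)^(-1) G_sr, where r indexes the selected pages (for example, protein pages) and s indexes the rest of the network. This is not the authors' implementation: the function names, the five-page toy graph, the damping factor 0.85 and the plain Neumann-series solver are all assumptions made for illustration. Published REGOMAX work further splits the indirect term into a projector part and a residual part; that split is omitted here.

```python
def google_matrix(adj, alpha=0.85):
    """Column-stochastic Google matrix; adj[i][j] = 1 if page j links to page i."""
    n = len(adj)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        out = sum(adj[i][j] for i in range(n))
        for i in range(n):
            # Dangling pages (no outgoing links) get a uniform column.
            s = adj[i][j] / out if out else 1.0 / n
            G[i][j] = alpha * s + (1.0 - alpha) / n
    return G


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def block(G, rows, cols):
    return [[G[i][j] for j in cols] for i in rows]


def reduced_google_matrix(G, subset, tol=1e-12, max_iter=10000):
    """Return (G_R, G_rr, G_hidden) for the selected node indices in `subset`."""
    r = list(subset)
    r_set = set(r)
    s = [i for i in range(len(G)) if i not in r_set]
    G_rr = block(G, r, r)
    if not s:  # nothing to eliminate, so there is no indirect contribution
        return G_rr, G_rr, [[0.0] * len(r) for _ in r]
    G_rs, G_sr, G_ss = block(G, r, s), block(G, s, r), block(G, s, s)
    # X = (1 - G_ss)^(-1) G_sr as the series sum_k G_ss^k G_sr.
    # It converges because every column of G_ss sums to less than 1.
    term = G_sr
    X = [row[:] for row in G_sr]
    for _ in range(max_iter):
        term = matmul(G_ss, term)
        X = [[x + t for x, t in zip(xr, tr)] for xr, tr in zip(X, term)]
        if max(max(row) for row in term) < tol:
            break
    G_hidden = matmul(G_rs, X)  # all paths that pass through the rest of the network
    G_R = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(G_rr, G_hidden)]
    return G_R, G_rr, G_hidden


# Hypothetical toy graph: pages 0 and 1 stand for two protein pages that do
# not link to each other directly but are connected through page 2.
links = [(0, 2), (2, 1), (1, 3), (3, 4), (4, 0)]  # (source, target)
n = 5
adj = [[0] * n for _ in range(n)]
for src, dst in links:
    adj[dst][src] = 1

G = google_matrix(adj)
G_R, G_rr, G_hidden = reduced_google_matrix(G, [0, 1])
print("direct  0 -> 1:", G_rr[1][0])
print("hidden  0 -> 1:", G_hidden[1][0])
```

In this toy graph the direct entry from page 0 to page 1 contains only the teleportation term, while the hidden entry is much larger because of the path through page 2. On a network the size of Wikipedia the plain series would converge slowly, so a real computation would need a more careful solver.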

Suggested Citation

  • Andrei Zinovyev & Urszula Czerwinska & Laura Cantini & Emmanuel Barillot & Klaus M Frahm & Dima L Shepelyansky, 2020. "Collective intelligence defines biological functions in Wikipedia as communities in the hidden protein connection network," PLOS Computational Biology, Public Library of Science, vol. 16(2), pages 1-19, February.
  • Handle: RePEc:plo:pcbi00:1007652
    DOI: 10.1371/journal.pcbi.1007652

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007652
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1007652&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Demidov, Denis & Frahm, Klaus M. & Shepelyansky, Dima L., 2020. "What is the central bank of Wikipedia?," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 542(C).
    2. Célestin Coquidé & Leonardo Ermann & José Lages & D. L. Shepelyansky, 2019. "Influence of petroleum and gas trade on EU economies from the reduced Google matrix analysis of UN COMTRADE data," Papers 1903.01820, arXiv.org.
    3. Frahm, Klaus M. & Shepelyansky, Dima L., 2020. "Google matrix analysis of bi-functional SIGNOR network of protein–protein interactions," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 559(C).
    4. Denis Demidov & Klaus M. Frahm & Dima L. Shepelyansky, 2019. "What is the central bank of Wikipedia?," Papers 1902.07920, arXiv.org.
    5. Célestin Coquidé & José Lages & Dima Shepelyansky, 2020. "Interdependence of sectors of economic activities for world countries from the reduced Google matrix analysis of WTO data," Post-Print hal-02132487, HAL.
    6. Célestin Coquidé & José Lages & Leonardo Ermann & Dima Shepelyansky, 2022. "COVID-19 impact on the international trade," Post-Print hal-03536528, HAL.
    7. Guillaume Rollin & José Lages & Dima L Shepelyansky, 2019. "Wikipedia network analysis of cancer interactions and world influence," PLOS ONE, Public Library of Science, vol. 14(9), pages 1-26, September.
    8. Stephany, Fabian & Braesemann, Fabian, 2017. "An Exploration of Wikipedia Data as a Measure of Regional Knowledge Distribution," SocArXiv c2gd8, Center for Open Science.