Printed from https://ideas.repec.org/a/eee/phsmap/v562y2021ics037843712030707x.html

Using virtual edges to improve the discriminability of co-occurrence text networks

Author

Listed:
  • Quispe, Laura V.C.
  • Tohalino, Jorge A.V.
  • Amancio, Diego R.

Abstract

Word co-occurrence networks have been employed to analyze texts in both practical and theoretical scenarios. Despite their relative success in several applications, traditional co-occurrence networks fail to establish links between similar words whenever they appear far apart in the text. Here we investigate whether using word embeddings to create virtual links in co-occurrence networks can improve the quality of classification systems. Our results reveal that discriminability in the stylometry task improves when GloVe, Word2Vec and FastText are used. In addition, we found that the best results are obtained when stopwords are retained and a simple global thresholding strategy is used to establish virtual links. Because the proposed approach improves the representation of texts as complex networks, we believe it could be extended to other natural language processing tasks. Likewise, theoretical language studies could benefit from the enriched representation of word co-occurrence networks.
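
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not give the embedding models' settings or the threshold values, so the 2-d vectors, the threshold of 0.95, and the helper names (cooccurrence_edges, virtual_edges) are all assumptions made for illustration. The sketch links adjacent words, then adds a virtual edge between any non-adjacent pair whose embedding cosine similarity reaches a single global threshold. Stopwords are kept, as the abstract reports works best.

```python
import math
from itertools import combinations


def cooccurrence_edges(tokens):
    """Undirected edges between each pair of adjacent, distinct words."""
    edges = set()
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            edges.add(frozenset((a, b)))
    return edges


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def virtual_edges(vocab, embeddings, existing, threshold):
    """Edges for unlinked word pairs whose similarity meets a global threshold."""
    new = set()
    for a, b in combinations(sorted(vocab), 2):
        pair = frozenset((a, b))
        if pair in existing:
            continue  # already linked by co-occurrence
        if a not in embeddings or b not in embeddings:
            continue  # no vector available for one of the words
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            new.add(pair)
    return new


# Stopwords ("the", "and", "then") are deliberately kept in the token stream.
tokens = "the king spoke and then the queen answered".split()

# Toy 2-d vectors invented for this example; real embeddings would come from
# a pretrained model such as GloVe, Word2Vec or FastText.
toy_embeddings = {
    "king": (0.9, 0.1), "queen": (0.85, 0.2),
    "spoke": (0.1, 0.9), "answered": (0.2, 0.85),
    "the": (0.5, 0.5), "and": (0.5, -0.5), "then": (-0.5, 0.5),
}

real = cooccurrence_edges(tokens)
virtual = virtual_edges(set(tokens), toy_embeddings, real, threshold=0.95)
enriched = real | virtual  # network used for feature extraction
```

With these toy vectors, "king" and "queen" never appear side by side, yet they end up connected because their vectors are nearly parallel. The same holds for "spoke" and "answered". Raising or lowering the threshold trades off how many virtual edges are added.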

Suggested Citation

  • Quispe, Laura V.C. & Tohalino, Jorge A.V. & Amancio, Diego R., 2021. "Using virtual edges to improve the discriminability of co-occurrence text networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 562(C).
  • Handle: RePEc:eee:phsmap:v:562:y:2021:i:c:s037843712030707x
    DOI: 10.1016/j.physa.2020.125344

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S037843712030707X
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.


    References listed on IDEAS

    1. Mayra Z Rodriguez & Cesar H Comin & Dalcimar Casanova & Odemir M Bruno & Diego R Amancio & Luciano da F Costa & Francisco A Rodrigues, 2019. "Clustering algorithms: A comparative approach," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-34, January.
    2. Marcelo A Montemurro & Damián H Zanette, 2013. "Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis," PLOS ONE, Public Library of Science, vol. 8(6), pages 1-9, June.
    3. Diego R Amancio, 2015. "Probing the Topological Properties of Complex Networks Modeling Short Written Texts," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-17, February.
    4. Ren, Fu-Xin & Shen, Hua-Wei & Cheng, Xue-Qi, 2012. "Modeling the clustering in citation networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(12), pages 3533-3539.
    5. Gao, Yuyang & Liang, Wei & Shi, Yuming & Huang, Qiuling, 2014. "Comparison of directed and weighted co-occurrence networks of six languages," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 393(C), pages 579-589.
    6. Garg, Muskan & Kumar, Mukesh, 2018. "The structure of word co-occurrence network for microblogs," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 512(C), pages 698-720.
    7. David Liben‐Nowell & Jon Kleinberg, 2007. "The link‐prediction problem for social networks," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 58(7), pages 1019-1031, May.
    8. Yu, Shuiyuan & Liu, Haitao & Xu, Chunshan, 2011. "Statistical properties of Chinese phonemic networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 390(7), pages 1370-1380.
    9. Barbieri, Andre L. & de Arruda, G.F. & Rodrigues, Francisco A. & Bruno, Odemir M. & Costa, Luciano da Fontoura, 2011. "An entropy-based approach to automatic image segmentation of satellite images," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 390(3), pages 512-518.
    10. Mehri, Ali & Darooneh, Amir H. & Shariati, Ashrafalsadat, 2012. "The complex networks approach for authorship attribution of books," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(7), pages 2429-2437.
    11. Liu, Haitao, 2008. "The complexity of Chinese syntactic dependency networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 387(12), pages 3048-3058.

    Citations



    Cited by:

    1. Guerreiro, Lucas & Silva, Filipi N. & Amancio, Diego R., 2024. "Recovering network topology and dynamics from sequences: A machine learning approach," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 638(C).
    2. Stefan Claus & Massimo Stella, 2022. "Natural Language Processing and Cognitive Networks Identify UK Insurers’ Trends in Investor Day Transcripts," Future Internet, MDPI, vol. 14(10), pages 1-18, October.
    3. Heng Chen, 2023. "A lexical network approach to second language development," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Corrêa, Edilson A. & Marinho, Vanessa Q. & Amancio, Diego R., 2020. "Semantic flow in language networks discriminates texts by genre and publication date," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 557(C).
    2. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.
    3. Mehri, Ali & Jamaati, Maryam, 2021. "Statistical metrics for languages classification: A case study of the Bible translations," Chaos, Solitons & Fractals, Elsevier, vol. 144(C).
    4. Akimushkin, Camilo & Amancio, Diego R. & Oliveira, Osvaldo N., 2018. "On the role of words in the network structure of texts: Application to authorship attribution," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 495(C), pages 49-58.
    5. de Arruda, Henrique F. & Marinho, Vanessa Q. & Lima, Thales S. & Amancio, Diego R. & Costa, Luciano da F., 2018. "An image analysis approach to text analytics based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 110-120.
    6. Cynthia S. Q. Siew & Dirk U. Wulff & Nicole M. Beckage & Yoed N. Kenett, 2019. "Cognitive Network Science: A Review of Research on Cognition through the Lens of Network Representations, Processes, and Dynamics," Complexity, Hindawi, vol. 2019, pages 1-24, June.
    7. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    8. Jeong, Yujin & Park, Inchae & Yoon, Byungun, 2019. "Identifying emerging Research and Business Development (R&BD) areas based on topic modeling and visualization with intellectual property right data," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 655-672.
    9. Liang, Wei & Shi, Yuming & Huang, Qiuling, 2014. "Modeling the Chinese language as an evolving network," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 393(C), pages 268-276.
    10. Yifei Zhou & Shaoyong Li & Yaping Liu, 2020. "Graph-based Method for App Usage Prediction with Attributed Heterogeneous Network Embedding," Future Internet, MDPI, vol. 12(3), pages 1-16, March.
    11. Karimi, Fatemeh & Lotfi, Shahriar & Izadkhah, Habib, 2021. "Community-guided link prediction in multiplex networks," Journal of Informetrics, Elsevier, vol. 15(4).
    12. Xiaofang Wo & Guichen Li & Yuantian Sun & Jinghua Li & Sen Yang & Haoran Hao, 2022. "The Changing Tendency and Association Analysis of Intelligent Coal Mines in China: A Policy Text Mining Study," Sustainability, MDPI, vol. 14(18), pages 1-14, September.
    13. Gamallo, Pablo & Pichel, José Ramom & Alegria, Iñaki, 2017. "From language identification to language distance," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 484(C), pages 152-162.
    14. Lahmiri, Salim, 2016. "Image characterization by fractal descriptors in variational mode decomposition domain: Application to brain magnetic resonance," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 456(C), pages 235-243.
    15. Fernandez Martinez, Roberto & Lostado Lorza, Ruben & Santos Delgado, Ana Alexandra & Piedra, Nelson, 2021. "Use of classification trees and rule-based models to optimize the funding assignment to research projects: A case study of UTPL," Journal of Informetrics, Elsevier, vol. 15(1).
    16. Xu, Hua & Wang, Minggang & Jiang, Shumin & Yang, Weiguo, 2020. "Carbon price forecasting with complex network and extreme learning machine," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 545(C).
    17. Andreas Spitz & Anna Gimmler & Thorsten Stoeck & Katharina Anna Zweig & Emőke-Ágnes Horvát, 2016. "Assessing Low-Intensity Relationships in Complex Networks," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-17, April.
    18. Jiang, Jingchi & Zheng, Jichuan & Zhao, Chao & Su, Jia & Guan, Yi & Yu, Qiubin, 2016. "Clinical-decision support based on medical literature: A complex network approach," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 459(C), pages 42-54.
    19. Qiaoran Yang & Zhiliang Dong & Yichi Zhang & Man Li & Ziyi Liang & Chao Ding, 2021. "Who Will Establish New Trade Relations? Looking for Potential Relationship in International Nickel Trade," Sustainability, MDPI, vol. 13(21), pages 1-15, October.
    20. Nora Connor & Albert Barberán & Aaron Clauset, 2017. "Using null models to infer microbial co-occurrence networks," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-23, May.
