
Using virtual edges to improve the discriminability of co-occurrence text networks

Author

Listed:
  • Quispe, Laura V.C.
  • Tohalino, Jorge A.V.
  • Amancio, Diego R.

Abstract

Word co-occurrence networks have been employed to analyze texts in both practical and theoretical scenarios. Despite their relative success in several applications, traditional co-occurrence networks fail to establish links between similar words that appear far apart in the text. Here we investigate whether using word embeddings to create virtual links in co-occurrence networks can improve the quality of classification systems. Our results reveal that discriminability in the stylometry task improves when GloVe, Word2Vec and FastText embeddings are used. In addition, we found that the best results are obtained when stopwords are retained and a simple global thresholding strategy is used to establish virtual links. Because the proposed approach improves the representation of texts as complex networks, we believe it could be extended to other natural language processing tasks. Likewise, theoretical language studies could benefit from this enriched representation of word co-occurrence networks.
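
The idea of virtual links can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact procedure, so the sketch assumes that co-occurrence edges join adjacent words, that similarity is the cosine between embedding vectors, and that a single global threshold decides which non-adjacent pairs receive a virtual edge. The function names, the toy two-dimensional vectors and the threshold value are all hypothetical; real experiments would load pretrained GloVe, Word2Vec or FastText vectors instead.

```python
import math
from itertools import combinations


def cooccurrence_edges(tokens):
    """Undirected edges between adjacent, distinct words (assumed window of 1)."""
    return {frozenset((a, b)) for a, b in zip(tokens, tokens[1:]) if a != b}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def virtual_edges(tokens, embeddings, threshold):
    """Edges between words that never co-occur but whose embedding
    similarity reaches one global threshold (an assumed strategy)."""
    vocab = sorted(w for w in set(tokens) if w in embeddings)
    existing = cooccurrence_edges(tokens)
    return {
        frozenset((a, b))
        for a, b in combinations(vocab, 2)
        if frozenset((a, b)) not in existing
        and cosine(embeddings[a], embeddings[b]) >= threshold
    }


# Toy, made-up vectors standing in for pretrained embeddings.
toy_embeddings = {
    "cat": (1.0, 0.1),
    "dog": (0.9, 0.2),
    "sat": (0.1, 1.0),
    "ran": (0.2, 0.9),
    "the": (0.5, 0.5),
}

# The stopword "the" is kept, in line with the abstract's finding.
tokens = "the cat sat the dog ran".split()
real = cooccurrence_edges(tokens)
virtual = virtual_edges(tokens, toy_embeddings, threshold=0.95)
```

In this toy text, "cat" and "dog" never appear side by side, yet their vectors are similar enough to receive a virtual edge; the same holds for "sat" and "ran". The enriched network is the union of `real` and `virtual`, from which topological features could then be extracted for classification.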

Suggested Citation

  • Quispe, Laura V.C. & Tohalino, Jorge A.V. & Amancio, Diego R., 2021. "Using virtual edges to improve the discriminability of co-occurrence text networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 562(C).
  • Handle: RePEc:eee:phsmap:v:562:y:2021:i:c:s037843712030707x
    DOI: 10.1016/j.physa.2020.125344

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S037843712030707X
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.



    Citations



    Cited by:

    1. Guerreiro, Lucas & Silva, Filipi N. & Amancio, Diego R., 2024. "Recovering network topology and dynamics from sequences: A machine learning approach," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 638(C).
    2. Stefan Claus & Massimo Stella, 2022. "Natural Language Processing and Cognitive Networks Identify UK Insurers’ Trends in Investor Day Transcripts," Future Internet, MDPI, vol. 14(10), pages 1-18, October.
    3. Heng Chen, 2023. "A lexical network approach to second language development," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.
    2. Corrêa, Edilson A. & Marinho, Vanessa Q. & Amancio, Diego R., 2020. "Semantic flow in language networks discriminates texts by genre and publication date," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 557(C).
    3. Mehri, Ali & Jamaati, Maryam, 2021. "Statistical metrics for languages classification: A case study of the Bible translations," Chaos, Solitons & Fractals, Elsevier, vol. 144(C).
    4. Cynthia S. Q. Siew & Dirk U. Wulff & Nicole M. Beckage & Yoed N. Kenett, 2019. "Cognitive Network Science: A Review of Research on Cognition through the Lens of Network Representations, Processes, and Dynamics," Complexity, Hindawi, vol. 2019, pages 1-24, June.
    5. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    6. Akimushkin, Camilo & Amancio, Diego R. & Oliveira, Osvaldo N., 2018. "On the role of words in the network structure of texts: Application to authorship attribution," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 495(C), pages 49-58.
    7. de Arruda, Henrique F. & Marinho, Vanessa Q. & Lima, Thales S. & Amancio, Diego R. & Costa, Luciano da F., 2018. "An image analysis approach to text analytics based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 110-120.
    8. Liang, Wei & Shi, Yuming & Huang, Qiuling, 2014. "Modeling the Chinese language as an evolving network," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 393(C), pages 268-276.
    9. Yifei Zhou & Shaoyong Li & Yaping Liu, 2020. "Graph-based Method for App Usage Prediction with Attributed Heterogeneous Network Embedding," Future Internet, MDPI, vol. 12(3), pages 1-16, March.
    10. Fernandez Martinez, Roberto & Lostado Lorza, Ruben & Santos Delgado, Ana Alexandra & Piedra, Nelson, 2021. "Use of classification trees and rule-based models to optimize the funding assignment to research projects: A case study of UTPL," Journal of Informetrics, Elsevier, vol. 15(1).
    11. Jiang, Jingchi & Zheng, Jichuan & Zhao, Chao & Su, Jia & Guan, Yi & Yu, Qiubin, 2016. "Clinical-decision support based on medical literature: A complex network approach," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 459(C), pages 42-54.
    12. Nora Connor & Albert Barberán & Aaron Clauset, 2017. "Using null models to infer microbial co-occurrence networks," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-23, May.
    13. Leto Peel & Tiago P. Peixoto & Manlio De Domenico, 2022. "Statistical inference links data and theory in network science," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    14. Rafiee, Samira & Salavati, Chiman & Abdollahpouri, Alireza, 2020. "CNDP: Link prediction based on common neighbors degree penalization," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 539(C).
    15. Bikramjit Das & Tiandong Wang & Gengling Dai, 2022. "Asymptotic Behavior of Common Connections in Sparse Random Networks," Methodology and Computing in Applied Probability, Springer, vol. 24(3), pages 2071-2092, September.
    16. Diego R Amancio, 2015. "Probing the Topological Properties of Complex Networks Modeling Short Written Texts," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-17, February.
    17. Lee, Yan-Li & Zhou, Tao, 2021. "Collaborative filtering approach to link prediction," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 578(C).
    18. Greg Morrison & L Mahadevan, 2012. "Discovering Communities through Friendship," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-9, July.
    19. Liu, Zhenfeng & Feng, Jian & Uden, Lorna, 2023. "Technology opportunity analysis using hierarchical semantic networks and dual link prediction," Technovation, Elsevier, vol. 128(C).
    20. Shugang Li & Ziming Wang & Beiyan Zhang & Boyi Zhu & Zhifang Wen & Zhaoxu Yu, 2022. "The Research of “Products Rapidly Attracting Users” Based on the Fully Integrated Link Prediction Algorithm," Mathematics, MDPI, vol. 10(14), pages 1-19, July.
