
Modeling texts with networks: comparing five approaches to sentence representation

Authors

Listed:
  • Davi Alves Oliveira

    (University of Bahia State (UNEB);
    Federal University of Bahia (UFBA), University of Bahia State (UNEB), Bahia Federal Institute of Education, Science and Technology (IFBA), State University of Feira de Santana (UEFS), National Scientific Computing Laboratory (LNCC), SENAI CIMATEC University Center)

  • Hernane Borges de Barros Pereira

    (University of Bahia State (UNEB);
    SENAI CIMATEC University Center;
    Federal University of Bahia (UFBA), University of Bahia State (UNEB), Bahia Federal Institute of Education, Science and Technology (IFBA), State University of Feira de Santana (UEFS), National Scientific Computing Laboratory (LNCC), SENAI CIMATEC University Center)

Abstract

Complex networks offer a powerful framework for modeling linguistic phenomena. This study compares five distinct methods for representing sentences as networks, each with unique edge definitions: (1) a lines approach, where edges represent token (e.g., word) adjacency; (2) a close-range co-occurrence approach, where edges are based on the probability of tokens co-occurring at distance one or two; (3) a cliques approach, where edges connect tokens co-occurring within the same sentence; (4) a dependency-based approach, where edges are defined by syntactic dependencies extracted by a parser; (5) an IF-trimmed-subgraphs approach, where edges are determined by the Incidence-Fidelity (IF) Index. While the first four approaches are well established in the literature, the last one is a novel proposal. We also examined the effects of limiting the vertices to lemmas (i.e., words with inflections removed) and to lexical lemmas (i.e., nouns, adjectives, verbs, and adverbs) as opposed to the unaltered words. Our results reveal that these approaches yield networks with varying average minimal path lengths and degrees, influencing the interpretation of results. While small-world behavior remains consistent across networks, scale-free behavior analysis is affected. Notably, excluding functional words significantly alters degree distributions. We suggest, in order of relevance and according to the resources available, the dependency-based, the close-range co-occurrence, and the lines approaches for cases in which syntactic relations are central, and the IF-trimmed-subgraphs and the cliques approaches for cases in which semantic relations are central. 
Graphical Abstract

Representation of the sentence “we calculated two sets of adjusted values as follows” using five approaches, namely (1) the lines approach, (2) the close-range co-occurrence approach, (3) the cliques approach, (4) the dependency-based approach, and (5) the IF-trimmed-subgraphs approach, and three vertex definitions: (1) vertices representing unaltered words, (2) vertices representing lemmas, and (3) vertices representing lexical lemmas.
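
To make the edge definitions concrete, here is a minimal Python sketch of the two simplest approaches from the abstract, the lines approach (adjacent tokens) and the cliques approach (all tokens in the same sentence), applied to the example sentence. It is only an illustration under stated assumptions, not the authors' implementation: tokens come from a whitespace split, edges are undirected and unweighted, and the function names `lines_edges`, `cliques_edges`, and `mean_degree` are hypothetical. The co-occurrence, dependency-based, and IF-trimmed approaches are left out because they need probabilities, a parser, or the IF Index, none of which are specified on this page.

```python
from itertools import combinations


def lines_edges(tokens):
    """Lines approach: one undirected edge per pair of adjacent tokens."""
    # Assumption: a token repeated back-to-back does not create a self-loop.
    return {frozenset(pair) for pair in zip(tokens, tokens[1:]) if pair[0] != pair[1]}


def cliques_edges(tokens):
    """Cliques approach: every pair of distinct tokens in the sentence is linked."""
    return {frozenset(pair) for pair in combinations(sorted(set(tokens)), 2)}


def mean_degree(edges):
    """Average vertex degree of an undirected edge set: 2 * |E| / |V|."""
    vertices = set().union(*edges) if edges else set()
    return 2 * len(edges) / len(vertices) if vertices else 0.0


# Example sentence taken from the graphical abstract; vertices are unaltered words.
sentence = "we calculated two sets of adjusted values as follows"
tokens = sentence.split()

line_net = lines_edges(tokens)
clique_net = cliques_edges(tokens)

print("lines:  ", len(line_net), "edges, mean degree", round(mean_degree(line_net), 2))
print("cliques:", len(clique_net), "edges, mean degree", round(mean_degree(clique_net), 2))
```

For this nine-word sentence the lines network is a simple path, while the cliques network is complete, so the same text yields very different degrees depending on the edge definition. This is the kind of difference the abstract reports across approaches.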

Suggested Citation

  • Davi Alves Oliveira & Hernane Borges de Barros Pereira, 2024. "Modeling texts with networks: comparing five approaches to sentence representation," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 97(6), pages 1-12, June.
  • Handle: RePEc:spr:eurphb:v:97:y:2024:i:6:d:10.1140_epjb_s10051-024-00717-0
    DOI: 10.1140/epjb/s10051-024-00717-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1140/epjb/s10051-024-00717-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



