
Robustness of sentence length measures in written texts

Authors

  • Vieira, Denner S.
  • Picoli, Sergio
  • Mendes, Renio S.

Abstract

Hidden structural patterns in written texts have been the subject of considerable research in recent decades. In particular, mapping a text into a time series of sentence lengths is a natural way to investigate text structure. Typically, sentence length has been quantified using measures based on the number of words and the number of characters, but other variations are possible. To quantify the robustness of different sentence length measures, we analyzed a database containing about five hundred books in English. For each book, we extracted six distinct measures of sentence length, including the number of words and the number of characters (taking into account lemmatization and stop-word removal). We compared these six measures for each book by using (i) Pearson’s coefficient to investigate linear correlations; (ii) the Kolmogorov–Smirnov test to compare distributions; and (iii) detrended fluctuation analysis (DFA) to quantify autocorrelations. We found that all six measures exhibit very similar behavior, suggesting that sentence length is a robust measure related to text structure.
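
The three comparisons named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the naive regex sentence splitter, and the z-score standardization applied before the Kolmogorov–Smirnov comparison are all assumptions made for illustration. Only two of the six measures (word count and character count) are shown, and lemmatization and stop-word removal are omitted.

```python
import math
import re
from bisect import bisect_right


def sentence_lengths(text):
    """Map a text to two series: words and characters per sentence.
    Assumption: sentences end at '.', '!' or '?' (a naive splitter)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [len(s.split()) for s in sentences]
    chars = [sum(len(w) for w in s.split()) for s in sentences]
    return words, chars


def linear_fit(x, y):
    """Least-squares line through (x, y); returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def zscore(x):
    """Standardize a series (assumed step, so that measures on different
    scales can be compared by distribution shape). Requires nonzero variance."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical cumulative distribution functions."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


def dfa_exponent(series, scales):
    """First-order DFA: slope of log F(n) versus log n, where F(n) is the
    RMS deviation of the integrated series from per-window linear trends."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for v in series:
        total += v - mean
        profile.append(total)
    log_n, log_f = [], []
    for n in scales:
        idx = list(range(n))
        sq, count = 0.0, 0
        # Non-overlapping windows of length n; a trailing remainder is dropped.
        for start in range(0, len(profile) - n + 1, n):
            window = profile[start:start + n]
            slope, intercept = linear_fit(idx, window)
            sq += sum((w - (slope * i + intercept)) ** 2
                      for i, w in zip(idx, window))
            count += n
        log_n.append(math.log(n))
        log_f.append(math.log(math.sqrt(sq / count)))
    return linear_fit(log_n, log_f)[0]
```

Under these assumptions, one would call `sentence_lengths` on a book's text, then apply `pearson(words, chars)`, `ks_statistic(zscore(words), zscore(chars))`, and `dfa_exponent` to each series with a range of window sizes much shorter than the series. A DFA exponent near 0.5 indicates an uncorrelated series, while larger values indicate persistent long-range correlations.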

Suggested Citation

  • Vieira, Denner S. & Picoli, Sergio & Mendes, Renio S., 2018. "Robustness of sentence length measures in written texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 506(C), pages 749-754.
  • Handle: RePEc:eee:phsmap:v:506:y:2018:i:c:p:749-754
    DOI: 10.1016/j.physa.2018.04.104

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437118305326
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.


    References listed on IDEAS

    1. Ausloos, M., 2010. "Punctuation effects in english and esperanto texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(14), pages 2835-2840.
    2. Yue Yang & Changgui Gu & Qin Xiao & Huijie Yang, 2017. "Evolution of scaling behaviors embedded in sentence series from A Story of the Stone," PLOS ONE, Public Library of Science, vol. 12(2), pages 1-14, February.
    3. Ebeling, Werner & Neiman, Alexander, 1995. "Long-range correlations between letters and sentences in texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 215(3), pages 233-241.
    4. Eduardo G Altmann & Janet B Pierrehumbert & Adilson E Motter, 2009. "Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words," PLOS ONE, Public Library of Science, vol. 4(11), pages 1-7, November.
    5. Ausloos, M., 2012. "Measuring complexity with multifractals in texts. Translation effects," Chaos, Solitons & Fractals, Elsevier, vol. 45(11), pages 1349-1357.
    6. Ausloos, M., 2008. "Equilibrium and dynamic methods when comparing an English text and its Esperanto translation," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 387(25), pages 6411-6420.
    7. Tianguang Yang & Changgui Gu & Huijie Yang, 2016. "Long-Range Correlations in Sentence Series from A Story of the Stone," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-11, September.
    8. Kantelhardt, Jan W & Koscielny-Bunde, Eva & Rego, Henio H.A & Havlin, Shlomo & Bunde, Armin, 2001. "Detecting long-range correlations with detrended fluctuation analysis," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 295(3), pages 441-454.

    Citations



    Cited by:

    1. Liu, Yang & Zhuo, Xuru & Zhou, Xiaozhu, 2024. "Multifractal analysis of Chinese literary and web novels," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 641(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ghosh, Dipak & Chakraborty, Sayantan & Samanta, Shukla, 2019. "Study of translational effect in Tagore’s Gitanjali using Chaos based Multifractal analysis technique," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 1343-1354.
    2. Liu, Yang & Zhuo, Xuru & Zhou, Xiaozhu, 2024. "Multifractal analysis of Chinese literary and web novels," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 641(C).
    3. Kumiko Tanaka-Ishii & Armin Bunde, 2016. "Long-Range Memory in Literary Texts: On the Universal Clustering of the Rare Words," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-14, November.
    4. Heng Chen & Haitao Liu, 2018. "Quantifying Evolution of Short and Long-Range Correlations in Chinese Narrative Texts across 2000 Years," Complexity, Hindawi, vol. 2018, pages 1-12, February.
    5. Ficcadenti, Valerio & Cerqueti, Roy & Ausloos, Marcel & Dhesi, Gurjeet, 2020. "Words ranking and Hirsch index for identifying the core of the hapaxes in political texts," Journal of Informetrics, Elsevier, vol. 14(3).
    6. Ausloos, M., 2012. "Measuring complexity with multifractals in texts. Translation effects," Chaos, Solitons & Fractals, Elsevier, vol. 45(11), pages 1349-1357.
    7. Yue Yang & Changgui Gu & Qin Xiao & Huijie Yang, 2017. "Evolution of scaling behaviors embedded in sentence series from A Story of the Stone," PLOS ONE, Public Library of Science, vol. 12(2), pages 1-14, February.
    8. Yuan, Qianshun & Semba, Sherehe & Zhang, Jing & Weng, Tongfeng & Gu, Changgui & Yang, Huijie, 2021. "Multi-scale transition matrix approach to time series," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 578(C).
    9. Stanisz, Tomasz & Drożdż, Stanisław & Kwapień, Jarosław, 2023. "Universal versus system-specific features of punctuation usage patterns in major Western languages," Chaos, Solitons & Fractals, Elsevier, vol. 168(C).
    10. Suárez-García, Pablo & Gómez-Ullate, David, 2014. "Multifractality and long memory of a financial index," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 394(C), pages 226-234.
    11. Shuntaro Takahashi & Kumiko Tanaka-Ishii, 2017. "Do neural nets learn statistical laws behind natural language?," PLOS ONE, Public Library of Science, vol. 12(12), pages 1-17, December.
    12. Edoardo Magnone, 2014. "A novel graphical representation of sentence complexity: the description and its application," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(2), pages 1301-1329, February.
    13. Lavička, Hynek & Kracík, Jiří, 2020. "Fluctuation analysis of electric power loads in Europe: Correlation multifractality vs. Distribution function multifractality," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 545(C).
    14. Vitanov, Nikolay K. & Sakai, Kenshi & Dimitrova, Zlatinka I., 2008. "SSA, PCA, TDPSC, ACFA: Useful combination of methods for analysis of short and nonstationary time series," Chaos, Solitons & Fractals, Elsevier, vol. 37(1), pages 187-202.
    15. Muchnik, Lev & Bunde, Armin & Havlin, Shlomo, 2009. "Long term memory in extreme returns of financial time series," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 388(19), pages 4145-4150.
    16. Zhong, Meirui & Zhang, Rui & Ren, Xiaohang, 2023. "The time-varying effects of liquidity and market efficiency of the European Union carbon market: Evidence from the TVP-SVAR-SV approach," Energy Economics, Elsevier, vol. 123(C).
    17. Amiri, Babak & Karimianghadim, Ramin, 2024. "A novel text clustering model based on topic modelling and social network analysis," Chaos, Solitons & Fractals, Elsevier, vol. 181(C).
    18. Currenti, Gilda & Negro, Ciro Del & Lapenna, Vincenzo & Telesca, Luciano, 2005. "Fluctuation analysis of the hourly time variability of volcano-magnetic signals recorded at Mt. Etna Volcano, Sicily (Italy)," Chaos, Solitons & Fractals, Elsevier, vol. 23(5), pages 1921-1929.
    19. El Alaoui, Marwane & Benbachir, Saâd, 2013. "Multifractal detrended cross-correlation analysis in the MENA area," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(23), pages 5985-5993.
    20. Gerlich, Nikolas & Rostek, Stefan, 2015. "Estimating serial correlation and self-similarity in financial time series—A diversification approach with applications to high frequency data," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 434(C), pages 84-98.
