Statistical metrics for languages classification: A case study of the Bible translations

Authors

  • Mehri, Ali
  • Jamaati, Maryam

Abstract

Automatic language classification is an important contribution to linguistic research. Four statistical features concerning long-range correlations are applied to classify syntactic properties of languages. We calculate Zipf’s exponent, Heaps’ exponent, fractal dimension and entropy for the Bible translations into one hundred living languages from twenty-eight language families. The Bible conveys the same content regardless of its language, but differences in the grammatical rules of the languages lead to differences in the measures extracted from its various translations. The results show that geographical distance and cultural differences can lead to statistical discrepancies. All extracted features for the Bible translations are normally distributed around their average values. This divides the languages into two groups: a majority of normal languages and a minority of abnormal ones. There is also an evident (anti)correlation between each pair of the mentioned metrics, due to their respective mechanisms. The standard deviation of the considered statistical features over language families is affected by the geographical distance between the communities that speak these languages and by their cultural diversity.
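The abstract names the metrics but not the estimators behind them. As a rough illustration only, the following Python sketch computes three of the four features for a tokenized text. It is not the authors' code and rests on stated assumptions: tokens are whitespace-split words, Zipf's and Heaps' exponents are taken as ordinary least-squares slopes on log-log scales, and entropy is the Shannon entropy (in bits) of the unigram frequency distribution. The helper names (`zipf_exponent`, `heaps_exponent`, `unigram_entropy`) are hypothetical. Fractal dimension is omitted because its definition for texts varies between studies.

```python
import math
from collections import Counter
from typing import Sequence


def _loglog_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of log(y) regressed on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


def zipf_exponent(tokens: Sequence[str]) -> float:
    """Hypothetical helper: estimate alpha in f(r) ~ r**(-alpha) from the rank-frequency list."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    ranks = list(range(1, len(freqs) + 1))
    return -_loglog_slope(ranks, freqs)


def heaps_exponent(tokens: Sequence[str]) -> float:
    """Hypothetical helper: estimate beta in V(n) ~ n**beta, V(n) = vocabulary size after n tokens."""
    seen = set()
    ns, vs = [], []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        ns.append(n)
        vs.append(len(seen))
    return _loglog_slope(ns, vs)


def unigram_entropy(tokens: Sequence[str]) -> float:
    """Hypothetical helper: Shannon entropy (bits) of the word-frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Toy input: Genesis 1:1 (KJV), lower-cased and whitespace-tokenized.
    verse = "In the beginning God created the heaven and the earth".lower().split()
    print(zipf_exponent(verse), heaps_exponent(verse), unigram_entropy(verse))
```

A single verse is far too short for meaningful exponents; in practice each full translation would be tokenized and the fits would usually be restricted to the range where the power law holds (for example, excluding the earliest tokens for Heaps' law), which this sketch does not do.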

Suggested Citation

  • Mehri, Ali & Jamaati, Maryam, 2021. "Statistical metrics for languages classification: A case study of the Bible translations," Chaos, Solitons & Fractals, Elsevier, vol. 144(C).
  • Handle: RePEc:eee:chsofr:v:144:y:2021:i:c:s0960077921000321
    DOI: 10.1016/j.chaos.2021.110679

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0960077921000321
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.chaos.2021.110679?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gamallo, Pablo & Pichel, José Ramom & Alegria, Iñaki, 2017. "From language identification to language distance," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 484(C), pages 152-162.
    2. Jamaati, Maryam & Mehri, Ali, 2018. "Text mining by Tsallis entropy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 490(C), pages 1368-1376.
    3. Mehri, Ali & Agahi, Hamzeh & Mehri-Dehnavi, Hossein, 2019. "A novel word ranking method based on distorted entropy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 521(C), pages 484-492.
    4. Quispe, Laura V.C. & Tohalino, Jorge A.V. & Amancio, Diego R., 2021. "Using virtual edges to improve the discriminability of co-occurrence text networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 562(C).
    5. Espitia, Diego & Larralde, Hernán, 2020. "Universal and non-universal text statistics: Clustering coefficient for language identification," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 553(C).
    6. Ramezani, Zahra & Pourdarvish, Ahmad, 2021. "Transfer learning using Tsallis entropy: An application to Gravity Spy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 561(C).
    7. Ingo Eduard Isphording & Sebastian Otten, 2013. "The Costs of Babylon—Linguistic Distance in Applied Economics," Review of International Economics, Wiley Blackwell, vol. 21(2), pages 354-369, May.
    8. Jennifer A. Byrne & Cyril Labbé, 2017. "Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1471-1493, March.
    9. Isphording, Ingo E. & Otten, Sebastian, 2014. "Linguistic barriers in the destination language acquisition of immigrants," Journal of Economic Behavior & Organization, Elsevier, vol. 105(C), pages 30-50.
    10. de Arruda, Henrique F. & Marinho, Vanessa Q. & Lima, Thales S. & Amancio, Diego R. & Costa, Luciano da F., 2018. "An image analysis approach to text analytics based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 110-120.
    11. Jiang, Jingchi & Zheng, Jichuan & Zhao, Chao & Su, Jia & Guan, Yi & Yu, Qiubin, 2016. "Clinical-decision support based on medical literature: A complex network approach," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 459(C), pages 42-54.
    12. Isphording, Ingo E. & Piopiunik, Marc & Rodríguez-Planas, Núria, 2016. "Speaking in numbers: The effect of reading performance on math performance among immigrants," Economics Letters, Elsevier, vol. 139(C), pages 52-56.
    13. Ghosh, Dipak & Chakraborty, Sayantan & Samanta, Shukla, 2019. "Study of translational effect in Tagore’s Gitanjali using Chaos based Multifractal analysis technique," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 1343-1354.
    14. Ibrahim Bousmah & Gilles Grenier & David M. Gray, 2021. "Linguistic Distance, Languages of Work and Wages of Immigrants in Montreal," Journal of Labor Research, Springer, vol. 42(1), pages 1-28, March.
    15. Louise Bogéa Ribeiro & Anderson Raiol Rodrigues & Kauê Machado Costa & Manoel da Silva Filho, 2019. "Quantification of textual comprehension difficulty with an information theory-based algorithm," Palgrave Communications, Palgrave Macmillan, vol. 5(1), pages 1-9, December.
    16. Carretero-Campos, C. & Bernaola-Galván, P. & Coronado, A.V. & Carpena, P., 2013. "Improving statistical keyword detection in short texts: Entropic and clustering approaches," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(6), pages 1481-1492.
    17. Carlos F Alvarez & Luis E Palafox & Leocundo Aguilar & Mauricio A Sanchez & Luis G Martinez, 2016. "Using Link Disconnection Entropy Disorder to Detect Fast Moving Nodes in MANETs," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-15, May.
    18. Zhao, Na & Li, Jie & Wang, Jian & Li, Tong & Yu, Yong & Zhou, Tao, 2020. "Identifying significant edges via neighborhood information," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 548(C).
    19. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.
    20. Erkan Gören, 2013. "Economic Effects of Domestic and Neighbouring Countries' Cultural Diversity," ZenTra Working Papers in Transnational Studies 16 / 2013, ZenTra - Center for Transnational Studies, revised Apr 2013.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.