
On the Accuracy of Language Trees

Author

Listed:
  • Simone Pompei
  • Vittorio Loreto
  • Francesca Tria

Abstract

Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective the reconstruction of language trees is an example of an inverse problem: starting from present, incomplete and often noisy information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performance of the distance-based algorithms considered. Further, we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally, we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it.
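
The idea of scoring an inferred tree against an expert classification through a distance between trees can be illustrated with a small sketch. The abstract does not name the standard definitions it generalizes, so the Python example below assumes one widely used choice, the Robinson-Foulds distance on rooted trees (the number of clades present in one tree but not in the other). It is not the authors' generalized score. The nested-tuple tree encoding, the function names `clades` and `rf_distance`, and the toy language groupings are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed encoding: a leaf is a language name, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set under every internal node, excluding the root."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    # The root clade is shared by any two trees on the same leaves, so drop it.
    found.discard(all_leaves)
    return found


def rf_distance(a: Tree, b: Tree) -> int:
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clade sets."""
    return len(clades(a) ^ clades(b))


# Toy trees (illustrative groupings only, not taken from Ethnologue).
expert = ((("Italian", "Spanish"), "French"), ("German", "English"))
inferred = ((("Italian", "French"), "Spanish"), ("German", "English"))

print(rf_distance(expert, expert))    # 0: identical trees
print(rf_distance(expert, inferred))  # 2: {Italian, Spanish} vs {Italian, French}
```

A distance of 0 means the two trees contain exactly the same groupings. Because expert classifications are often not fully resolved (a node may have many children), a plain count like this can penalize an inferred binary tree for groupings the expert tree simply does not specify, which is one plausible motivation for generalized scores of the kind the abstract mentions.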

Suggested Citation

  • Simone Pompei & Vittorio Loreto & Francesca Tria, 2011. "On the Accuracy of Language Trees," PLOS ONE, Public Library of Science, vol. 6(6), pages 1-11, June.
  • Handle: RePEc:plo:pone00:0020109
    DOI: 10.1371/journal.pone.0020109

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0020109
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0020109&type=printable
    Download Restriction: no

    Citations

    Cited by:

    1. Job Schepens & Ton Dijkstra & Franc Grootjen & Walter J B van Heuven, 2013. "Cross-Language Distributions of High Frequency and Phonetically Similar Cognates," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-15, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nico Neureiter & Peter Ranacher & Nour Efrat-Kowalsky & Gereon A. Kaiping & Robert Weibel & Paul Widmer & Remco R. Bouckaert, 2022. "Detecting contact in language trees: a Bayesian phylogenetic model with horizontal transfer," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-14, December.
    2. Taraka Rama, 2013. "Phonotactic Diversity Predicts the Time Depth of the World’s Language Families," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-9, May.
    3. Job Schepens & Ton Dijkstra & Franc Grootjen & Walter J B van Heuven, 2013. "Cross-Language Distributions of High Frequency and Phonetically Similar Cognates," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-15, May.
    4. Klaus Desmet & Ignacio Ortuño-Ortín & Romain Wacziarg, 2009. "The political economy of ethnolinguistic cleavages," Working Papers 2009-17, Instituto Madrileño de Estudios Avanzados (IMDEA) Ciencias Sociales.
    5. Victor Ginsburgh & Shlomo Weber, 2020. "The Economics of Language," Journal of Economic Literature, American Economic Association, vol. 58(2), pages 348-404, June.
    6. Gamallo, Pablo & Pichel, José Ramom & Alegria, Iñaki, 2017. "From language identification to language distance," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 484(C), pages 152-162.
    7. Liang Xu & Min Xu & Zehua Jiang & Xin Wen & Yishan Liu & Zaoyi Sun & Hongting Li & Xiuying Qian, 2023. "How have music emotions been described in Google books? Historical trends and corpus differences," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-11, December.
    8. Kristen, Cornelia & Mühlau, Peter & Schacht, Diana, 2016. "Language acquisition of recently arrived immigrants in England, Germany, Ireland, and the Netherlands," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 16(2), pages 180-212.
    9. Joseph Flavian Gomes, 2020. "The health costs of ethnic distance: evidence from sub-Saharan Africa," Journal of Economic Growth, Springer, vol. 25(2), pages 195-226, June.
    10. Dustin S. Stoltz & Marshall A. Taylor, 2019. "Concept Mover’s Distance: measuring concept engagement via word embeddings in texts," Journal of Computational Social Science, Springer, vol. 2(2), pages 293-313, July.
    11. Aparicio Fenoll, Ainoa & Kuehn, Zoë, 2016. "Education Policies and Migration across European Countries," IZA Discussion Papers 9755, Institute of Labor Economics (IZA).
    12. Andrew Dickens, 2022. "Understanding Ethnolinguistic Differences: The Roles of Geography and Trade," The Economic Journal, Royal Economic Society, vol. 132(643), pages 953-980.
    13. Ainhoa Aparicio Fenoll & Zoë Kuehn, 2017. "Compulsory Schooling Laws and Migration Across European Countries," Demography, Springer;Population Association of America (PAA), vol. 54(6), pages 2181-2200, December.
    14. Eduardo G Altmann & Janet B Pierrehumbert & Adilson E Motter, 2011. "Niche as a Determinant of Word Fate in Online Groups," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-12, May.
    15. Stanisz, Tomasz & Drożdż, Stanisław & Kwapień, Jarosław, 2023. "Universal versus system-specific features of punctuation usage patterns in major Western languages," Chaos, Solitons & Fractals, Elsevier, vol. 168(C).
    16. Desmet, Klaus & Ortuño-Ortín, Ignacio & Wacziarg, Romain, 2012. "The political economy of linguistic cleavages," Journal of Development Economics, Elsevier, vol. 97(2), pages 322-338.
    17. Petroni, Filippo & Serva, Maurizio, 2010. "Measures of lexical distance between languages," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(11), pages 2280-2283.
    18. Ginsburgh, Victor & Weber, Shlomo, 2015. "Linguistic Distances and their Use in Economics," CEPR Discussion Papers 10640, C.E.P.R. Discussion Papers.
    19. Matthew J. Baker, 2021. "Foundations of the Age-Area Hypothesis," Palgrave Communications, Palgrave Macmillan, vol. 8(1), pages 1-17, December.
    20. Stelios Michalopoulos, 2012. "The Origins of Ethnolinguistic Diversity," American Economic Review, American Economic Association, vol. 102(4), pages 1508-1539, June.
