
Random sampling of the Zipf–Mandelbrot distribution as a representation of vocabulary growth

Author

Listed:
  • Tunnicliffe, Martin
  • Hunter, Gordon

Abstract

We develop a discrete model of type-token dynamics based on random type selection from the Zipf–Mandelbrot probability distribution, with a view to examining the relationships between the constants of Zipf’s and Heaps’ laws. Analysis of items randomly selected from the Standardised Project Gutenberg Corpus (SPGC) reveals a significant low-frequency “droop” in the β-slope of the types vs. frequency distribution, inconsistent with the model when vocabulary is unlimited: when a finite vocabulary limit is imposed, optimal parameter selection allows the droop to be reproduced. We adjust the parameters of both the limited and unlimited vocabulary models to obtain optimal agreement with the vocabulary growth curves: the limited vocabulary model usually yields the best optimised agreement, but a sizeable minority of items are better represented by an unlimited vocabulary. While the optimised Zipf α indices correlate strongly with the corresponding values obtained directly from document statistics, the former are generally larger than the latter (though this is partially explained by the distorting effect of large values of the Mandelbrot parameter m). The β indices optimised from the limited vocabulary model are also compared with their directly measured equivalents, showing significant positive correlation. The relationship between optimised α and β agrees plausibly with the well-known continuum model, though the degree of agreement depends on how β is defined. The experiments yield repeatable results from each of three 100-item samples, demonstrating the statistical significance of the experiments.
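As a rough illustration of the kind of model the abstract describes, the sketch below draws tokens at random from a finite-vocabulary Zipf–Mandelbrot distribution and records how many distinct types have appeared after each draw. It is not the authors' implementation: the function names, the parameter values, and the use of the common textbook form p(r) ∝ (r + m)^(−α) over ranks r = 1..V are all assumptions made for illustration.

```python
import random

def zipf_mandelbrot_weights(vocab_size, alpha, m):
    """Unnormalised weights (r + m)^(-alpha) for ranks r = 1..vocab_size.

    Assumed textbook form of the Zipf-Mandelbrot law; alpha is the Zipf
    index and m the Mandelbrot parameter named in the abstract.
    """
    return [(r + m) ** -alpha for r in range(1, vocab_size + 1)]

def vocabulary_growth(num_tokens, vocab_size, alpha, m, seed=0):
    """Return a list whose i-th entry is the number of distinct types
    seen after i + 1 random token draws (a simulated growth curve)."""
    rng = random.Random(seed)
    weights = zipf_mandelbrot_weights(vocab_size, alpha, m)
    # random.choices normalises the weights internally.
    ranks = rng.choices(range(1, vocab_size + 1), weights=weights, k=num_tokens)
    seen = set()
    growth = []
    for rank in ranks:
        seen.add(rank)
        growth.append(len(seen))
    return growth

# Hypothetical parameter values, chosen only to exercise the sketch.
growth = vocabulary_growth(num_tokens=10_000, vocab_size=5_000, alpha=1.1, m=2.0)
```

The finite `vocab_size` plays the role of the vocabulary limit mentioned in the abstract; an unlimited vocabulary could only be approximated here by making it very large relative to the number of tokens drawn.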

Suggested Citation

  • Tunnicliffe, Martin & Hunter, Gordon, 2022. "Random sampling of the Zipf–Mandelbrot distribution as a representation of vocabulary growth," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 608(P1).
  • Handle: RePEc:eee:phsmap:v:608:y:2022:i:p1:s0378437122008172
    DOI: 10.1016/j.physa.2022.128259

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437122008172
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.

    File URL: https://libkey.io/10.1016/j.physa.2022.128259?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. repec:cup:cbooks:9780511771576 is not listed on IDEAS
    2. Montemurro, Marcelo A., 2001. "Beyond the Zipf–Mandelbrot law in quantitative linguistics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 300(3), pages 567-578.
    3. H. Bauke, 2007. "Parameter estimation for power-law distributions by maximum likelihood methods," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 58(2), pages 167-173, July.
    4. Linyuan Lü & Zi-Ke Zhang & Tao Zhou, 2010. "Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems," PLOS ONE, Public Library of Science, vol. 5(12), pages 1-11, December.
    5. Eliazar, Iddo, 2011. "The growth statistics of Zipfian ensembles: Beyond Heaps’ law," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 390(20), pages 3189-3203.
    6. Easley,David & Kleinberg,Jon, 2010. "Networks, Crowds, and Markets," Cambridge Books, Cambridge University Press, number 9780521195331, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Blazquez-Soriano, Amparo & Ramos-Sandoval, Rosmery, 2022. "Information transfer as a tool to improve the resilience of farmers against the effects of climate change: The case of the Peruvian National Agrarian Innovation System," Agricultural Systems, Elsevier, vol. 200(C).
    2. Martin L. Weitzman, 2015. "A Voting Architecture for the Governance of Free-Driver Externalities, with Application to Geoengineering," Scandinavian Journal of Economics, Wiley Blackwell, vol. 117(4), pages 1049-1068, October.
    3. Wei Zhong, 2017. "Simulating influenza pandemic dynamics with public risk communication and individual responsive behavior," Computational and Mathematical Organization Theory, Springer, vol. 23(4), pages 475-495, December.
    4. Guo Weilong & Minca Andreea & Wang Li, 2016. "The topology of overlapping portfolio networks," Statistics & Risk Modeling, De Gruyter, vol. 33(3-4), pages 139-155, December.
    5. Kwame Boamah‐Addo & Tomasz J. Kozubowski & Anna K. Panorska, 2023. "A discrete truncated Zipf distribution," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 77(2), pages 156-187, May.
    6. Thomas J. Sargent & John Stachurski, 2022. "Economic Networks: Theory and Computation," Papers 2203.11972, arXiv.org, revised Jul 2022.
    7. Bernd (B.) Heidergott & Jia-Ping Huang & Ines (I.) Lindner, 2018. "Naive Learning in Social Networks with Random Communication," Tinbergen Institute Discussion Papers 18-018/II, Tinbergen Institute.
    8. Johannes M. Bauer & Michael Latzer, 2016. "The economics of the Internet: an overview," Chapters, in: Johannes M. Bauer & Michael Latzer (ed.), Handbook on the Economics of the Internet, chapter 1, pages 3-20, Edward Elgar Publishing.
    9. Kobayashi, Teruyoshi & Takaguchi, Taro, 2018. "Identifying relationship lending in the interbank market: A network approach," Journal of Banking & Finance, Elsevier, vol. 97(C), pages 20-36.
    10. Konstantinos Antoniadis & Kostas Zafiropoulos & Vasiliki Vrana, 2016. "A Method for Assessing the Performance of e-Government Twitter Accounts," Future Internet, MDPI, vol. 8(2), pages 1-18, April.
    11. Maness, Michael & Cirillo, Cinzia, 2016. "An indirect latent informational conformity social influence choice model: Formulation and case study," Transportation Research Part B: Methodological, Elsevier, vol. 93(PA), pages 75-101.
    12. Bauer, Johannes M., 2014. "Platforms, systems competition, and innovation: Reassessing the foundations of communications policy," Telecommunications Policy, Elsevier, vol. 38(8), pages 662-673.
    13. Julia Neidhardt & Nataliia Rümmele & Hannes Werthner, 0. "Predicting happiness: user interactions and sentiment analysis in an online travel forum," Information Technology & Tourism, Springer, vol. 0, pages 1-19.
    14. OKUBO Toshihiro & ONO Yukako & SAITO Yukiko, 2014. "Roles of Wholesalers in Transaction Networks," Discussion papers 14059, Research Institute of Economy, Trade and Industry (RIETI).
    15. Glover, Dominic & Kim, Sung Kyu & Stone, Glenn Davis, 2020. "Golden Rice and technology adoption theory: A study of seed choice dynamics among rice growers in the Philippines," Technology in Society, Elsevier, vol. 60(C).
    16. Daron Acemoglu & Victor Chernozhukov & Iván Werning & Michael D. Whinston, 2021. "Optimal Targeted Lockdowns in a Multigroup SIR Model," American Economic Review: Insights, American Economic Association, vol. 3(4), pages 487-502, December.
    17. Mark Braverman & Jing Chen & Sampath Kannan, 2016. "Optimal Provision-After-Wait in Healthcare," Mathematics of Operations Research, INFORMS, vol. 41(1), pages 352-376, February.
    18. Lomi, Alessandro & Fonti, Fabio, 2012. "Networks in markets and the propensity of companies to collaborate: An empirical test of three mechanisms," Economics Letters, Elsevier, vol. 114(2), pages 216-220.
    19. Zhang, Xuxi & Liu, Xianping & Lewis, Frank L. & Wang, Xia, 2020. "Bipartite tracking consensus of nonlinear multi-agent systems," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 545(C).
    20. Venkat Venkatasubramanian & Yu Luo, 2018. "How much income inequality is fair? Nash bargaining solution and its connection to entropy," Papers 1806.05262, arXiv.org.
