
Bias reduction in the population size estimation of large data sets

Author

Listed:
  • Chu, Jeffrey
  • Zhang, Yuanyuan
  • Chan, Stephen
  • Nadarajah, Saralees

Abstract

Estimation of the population size of large data sets and hard-to-reach populations can be a significant problem. For example, in the military, manpower is limited and the manual processing of large data sets can be time-consuming. In addition, accessing the full population of data may be restricted by factors such as cost, time, and safety. Four new population size estimators are proposed as extensions of existing methods, and their performance is compared in terms of bias with two existing methods in the big data literature. The proposed estimators would be particularly beneficial in the context of time-critical decisions or actions. The comparison is based on a simulation study and an application to five real network data sets (Twitter, LiveJournal, Pokec, YouTube, Wikipedia Talk). Whilst no single estimator (out of the four proposed) generates the most accurate estimates overall, the proposed estimators are shown to produce more accurate population size estimates for small sample sizes, but in some cases show more variability than existing estimators in the literature.

Suggested Citation

  • Chu, Jeffrey & Zhang, Yuanyuan & Chan, Stephen & Nadarajah, Saralees, 2020. "Bias reduction in the population size estimation of large data sets," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
  • Handle: RePEc:eee:csdana:v:145:y:2020:i:c:s0167947320300050
    DOI: 10.1016/j.csda.2020.106914

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947320300050
    Download Restriction: Full text for ScienceDirect subscribers only.



    References listed on IDEAS

    1. Yuanyuan Zhang & Saralees Nadarajah, 2017. "Flexible Heavy Tailed Distributions for Big Data," Annals of Data Science, Springer, vol. 4(3), pages 421-432, September.
    2. Zaman, Asad, 1981. "Estimators without moments : The case of the reciprocal of a normal mean," Journal of Econometrics, Elsevier, vol. 15(2), pages 289-298, February.
    3. Forrest W. Crawford & Jiacheng Wu & Robert Heimer, 2018. "Hidden Population Size Estimation From Respondent-Driven Sampling: A Network Approach," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(522), pages 755-766, April.
    4. Boginski, Vladimir & Butenko, Sergiy & Pardalos, Panos M., 2005. "Statistical analysis of financial networks," Computational Statistics & Data Analysis, Elsevier, vol. 48(2), pages 431-443, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Teh, Boon Kin & Goo, Yik Wen & Lian, Tong Wei & Ong, Wei Guang & Choi, Wen Ting & Damodaran, Mridula & Cheong, Siew Ann, 2015. "The Chinese Correction of February 2007: How financial hierarchies change in a market crash," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 424(C), pages 225-241.
    2. Stosic, Darko & Stosic, Dusan & Ludermir, Teresa B. & Stosic, Tatijana, 2018. "Collective behavior of cryptocurrency price changes," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 507(C), pages 499-509.
    3. Marton Gosztonyi, 2021. "A Snapshot of the Ownership Network of the Budapest Stock Exchange," Financial and Economic Review, Magyar Nemzeti Bank (Central Bank of Hungary), vol. 20(3), pages 31-58.
    4. Lillo, Felipe & Valdés, Rodrigo, 2016. "Dynamics of financial markets and transaction costs: A graph-based study," Research in International Business and Finance, Elsevier, vol. 38(C), pages 455-465.
    5. Xue Guo & Hu Zhang & Tianhai Tian, 2019. "Multi-Likelihood Methods for Developing Stock Relationship Networks Using Financial Big Data," Papers 1906.08088, arXiv.org.
    6. Wang, Gang-Jin & Chen, Yang-Yang & Si, Hui-Bin & Xie, Chi & Chevallier, Julien, 2021. "Multilayer information spillover networks analysis of China’s financial institutions based on variance decompositions," International Review of Economics & Finance, Elsevier, vol. 73(C), pages 325-347.
    7. Erick Treviño Aguilar, 2020. "The interdependency structure in the Mexican stock exchange: A network approach," PLOS ONE, Public Library of Science, vol. 15(10), pages 1-31, October.
    8. Elisa Letizia & Fabrizio Lillo, 2017. "Corporate payments networks and credit risk rating," Papers 1711.07677, arXiv.org, revised Sep 2018.
    9. Nie, Chun-Xiao, 2017. "Correlation dimension of financial market," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 473(C), pages 632-639.
    10. Zugang Liu, 2013. "The co-evolution of integrated corporate financial networks and supply chain networks with insolvency risk," Computational Management Science, Springer, vol. 10(2), pages 253-275, June.
    11. Diebold, Francis X. & Lamb, Russell L., 1997. "Why are estimates of agricultural supply response so variable?," Journal of Econometrics, Elsevier, vol. 76(1-2), pages 357-373.
    12. Frank Emmert-Streib & Matthias Dehmer, 2010. "Influence of the Time Scale on the Construction of Financial Networks," PLOS ONE, Public Library of Science, vol. 5(9), pages 1-9, September.
    13. Radhakrishnan, Srinivasan & Duvvuru, Arjun & Sultornsanee, Sivarit & Kamarthi, Sagar, 2016. "Phase synchronization based minimum spanning trees for analysis of financial time series with nonlinear correlations," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 444(C), pages 259-270.
    14. Seyed Soheil Hosseini & Nick Wormald & Tianhai Tian, 2019. "A Weight-based Information Filtration Algorithm for Stock-Correlation Networks," Papers 1904.06007, arXiv.org.
    15. Oleg Shirokikh & Grigory Pastukhov & Vladimir Boginski & Sergiy Butenko, 2013. "Computational study of the US stock market evolution: a rank correlation-based network model," Computational Management Science, Springer, vol. 10(2), pages 81-103, June.
    16. Nie, Chun-Xiao, 2022. "Analysis of critical events in the correlation dynamics of cryptocurrency market," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 586(C).
    17. Xu, Shiyun & Shao, Menglin & Qiao, Wenxuan & Shang, Pengjian, 2018. "Generalized AIC method based on higher-order moments and entropy of financial time series," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 505(C), pages 1127-1138.
    18. François Caron & Emily B. Fox, 2017. "Sparse graphs using exchangeable random measures," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(5), pages 1295-1366, November.
    19. Biplab Bhattacharjee & Muhammad Shafi & Animesh Acharjee, 2017. "Investigating the Evolution of Linkage Dynamics among Equity Markets Using Network Models and Measures: The Case of Asian Equity Market Integration," Data, MDPI, vol. 2(4), pages 1-28, December.
    20. Nie, Chun-Xiao, 2023. "Time-varying characteristics of information flow networks in the Chinese market: An analysis based on sector indices," Finance Research Letters, Elsevier, vol. 54(C).
