
Demographic research with non-representative internet data

Author

Listed:
  • Emilio Zagheni
  • Ingmar Weber

Abstract

Purpose: Internet data hold great promise for demographic research, but come with severe drawbacks due to several types of bias. The purpose of this paper is to review the literature that uses internet data for demographic studies and to present a general framework for addressing the problem of selection bias in non-representative samples.

Design/methodology/approach: The authors propose two main approaches to reduce bias. When ground truth data are available, the authors suggest a method that relies on calibration of the online data against reliable official statistics. When no ground truth data are available, the authors propose a difference-in-differences approach to evaluate relative trends.

Findings: The authors offer a generalization of existing techniques. Although there is no definitive answer to the question of whether statistical inference can be made from non-representative samples, the authors show that, when certain assumptions are met, signal can be extracted from noisy and biased data.

Research limitations/implications: The methods are sensitive to a number of assumptions. These include some regularities in the way the bias changes across different locations, across different demographic groups and between time steps. The assumptions discussed might not always hold. In particular, the scenario in which bias varies in an unpredictable manner and, at the same time, no "ground truth" is available to continuously calibrate the model remains challenging and is beyond the scope of this paper.

Originality/value: The paper combines a critical review of existing substantive and methodological literature with a generalization of prior techniques. It intends to provide a fresh perspective on the issue and to stimulate methodological discussion among social scientists.
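The two approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give formulas, so the code assumes (a) calibration by a simple per-group ratio of official to online counts, held fixed between periods, and (b) a difference in differences taken on log rates, which cancels any bias that scales both groups by the same factor over time. The function names, age groups and all numbers are hypothetical.

```python
import math


def calibration_factors(online, official):
    """Per-group ratio of official to online counts in a reference period.

    Assumption: the bias of the online source is a multiplicative factor
    that differs by group but stays stable over time.
    """
    return {group: official[group] / online[group] for group in online}


def calibrate(online_new, factors):
    """Rescale new online counts with factors learned in the reference period."""
    return {group: online_new[group] * factors[group] for group in online_new}


def log_diff_in_diff(target_t0, target_t1, ref_t0, ref_t1):
    """Change in the target group's log rate minus that of a reference group.

    Assumption: the unknown bias changes by the same multiplicative factor
    for both groups, so it cancels in the difference.
    """
    target_change = math.log(target_t1) - math.log(target_t0)
    ref_change = math.log(ref_t1) - math.log(ref_t0)
    return target_change - ref_change


# Hypothetical counts: online users vs. official population, by age group.
online_ref = {"18-29": 500, "30-49": 300, "50+": 50}
official_ref = {"18-29": 1000, "30-49": 1200, "50+": 1000}
factors = calibration_factors(online_ref, official_ref)

# Apply the reference-period factors to a later period of online counts.
online_later = {"18-29": 550, "30-49": 330, "50+": 60}
estimates_later = calibrate(online_later, factors)

# Hypothetical biased rates observed online for two groups at two time points.
did = log_diff_in_diff(target_t0=0.010, target_t1=0.015,
                       ref_t0=0.020, ref_t1=0.024)

print(factors)
print(estimates_later)
print(round(did, 4))
```

In this toy setting the calibrated estimates are only as good as the assumption that the factors remain stable, and the difference in differences recovers a relative trend, not a level. Both caveats mirror the limitations stated in the abstract.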

Suggested Citation

  • Emilio Zagheni & Ingmar Weber, 2015. "Demographic research with non-representative internet data," International Journal of Manpower, Emerald Group Publishing Limited, vol. 36(1), pages 13-25, April.
  • Handle: RePEc:eme:ijmpps:v:36:y:2015:i:1:p:13-25
    DOI: 10.1108/IJM-12-2014-0261

    Download full text from publisher

    File URL: https://www.emerald.com/insight/content/doi/10.1108/IJM-12-2014-0261/full/html?utm_source=repec&utm_medium=feed&utm_campaign=repec
    Download Restriction: Access to full text is restricted to subscribers

    File URL: https://www.emerald.com/insight/content/doi/10.1108/IJM-12-2014-0261/full/pdf?utm_source=repec&utm_medium=feed&utm_campaign=repec
    Download Restriction: Access to full text is restricted to subscribers

    File URL: https://libkey.io/10.1108/IJM-12-2014-0261?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations


    Cited by:

    1. Sun, Xiangdong & Yuan, Ouyang & Xu, Zhao & Yin, Yanhui & Liu, Qian & Wu, Ling, 2021. "Did Zipf's Law hold for Chinese cities and why? Evidence from multi-source data," Land Use Policy, Elsevier, vol. 106(C).
    2. Spyridon Spyratos & Michele Vespe & Fabrizio Natale & Ingmar Weber & Emilio Zagheni & Marzia Rango, 2019. "Quantifying international human mobility patterns using Facebook Network data," PLOS ONE, Public Library of Science, vol. 14(10), pages 1-22, October.
    3. Grow, André & Perrotta, Daniela & Del Fava, Emanuele & Cimentada, Jorge & Rampazzo, Francesco & Gil-Clavel, Sofia & Zagheni, Emilio, 2020. "Addressing Public Health Emergencies via Facebook Surveys: Advantages, Challenges, and Practical Considerations," SocArXiv ez9pb, Center for Open Science.
    4. Stefano Breschi & Francesco Lissoni & Ernest Miguelez, 2018. "Return Migrants' Self-Selection: Evidence for Indian Inventors," NBER Chapters, in: The Roles of Immigrants and Foreign Students in US Science, Innovation, and Entrepreneurship, pages 17-48, National Bureau of Economic Research, Inc.
    5. Letizia Mencarini & Delia Irazú Hernández-Farías & Mirko Lai & Viviana Patti & Emilio Sulis & Daniele Vignoli, 2018. "Italian happy parents In Twitter," Working Papers 117, "Carlo F. Dondena" Centre for Research on Social Dynamics (DONDENA), Università Commerciale Luigi Bocconi.
    6. Lawrence M Berger & Giulia Ferrari & Marion Leturcq & Lidia Panico & Anne Solaz, 2021. "COVID-19 lockdowns and demographically-relevant Google Trends: A cross-national analysis," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-28, March.
    7. Klein, Jordan D. & Weber, Ingmar & Zagheni, Emilio, 2022. "Stop, in the name of COVID!," SocArXiv s3ztq, Center for Open Science.
    8. Jan Pablo Burgard & Joscha Krause & Ralf Münnich, 2020. "A Study of Discontinuity Effects in Regression Inference based on Web-Augmented Mixed Mode Surveys," Research Papers in Economics 2020-03, University of Trier, Department of Economics.
    9. Dilek Yildiz & Jo Munson & Agnese Vitali & Ramine Tinati & Jennifer A. Holland, 2017. "Using Twitter data for demographic research," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 37(46), pages 1477-1514.
    10. Mingxiao Li & Song Gao & Feng Lu & Huan Tong & Hengcai Zhang, 2019. "Dynamic Estimation of Individual Exposure Levels to Air Pollution Using Trajectories Reconstructed from Mobile Phone Data," IJERPH, MDPI, vol. 16(22), pages 1-20, November.
    11. Simionescu, Mihaela & Zimmermann, Klaus F., 2017. "Big Data and Unemployment Analysis," GLO Discussion Paper Series 81, Global Labor Organization (GLO).
    12. Michele Tizzoni & Elaine O. Nsoesie & Laetitia Gauvin & Márton Karsai & Nicola Perra & Shweta Bansal, 2022. "Addressing the socioeconomic divide in computational modeling for infectious diseases," Nature Communications, Nature, vol. 13(1), pages 1-7, December.
    13. Letizia Mencarini & Delia Irazú Hernández Farías & Mirko Lai & Viviana Patti & Emilio Sulis & Daniele Vignoli, 2019. "Happy parents’ tweets: An exploration of Italian Twitter data using sentiment analysis," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 40(25), pages 693-724.
    14. Barbara Brollo & Filippo Celata, 2023. "Temporary populations and sociospatial polarisation in the short-term city," Urban Studies, Urban Studies Journal Limited, vol. 60(10), pages 1815-1832, August.
    15. Barslund, Mikkel & Busse, Matthias, 2016. "How mobile is tech talent? A case study of IT professionals based on data from LinkedIn," CEPS Papers 11692, Centre for European Policy Studies.
