
Assessing Spurious Correlations in Big Search Data

Authors
  • Jesse T. Richman

    (Department of Political Science and Geography, Old Dominion University, BAL 7000, Norfolk, VA 23529, USA)

  • Ryan J. Roberts

    (Department of Public Service, Gardner-Webb University, Boiling Springs, NC 28017, USA)

Abstract

Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might play an important role as leading indicators in forecasts and nowcasts. However, it also presents a vast new risk: that scientists or the public will identify meaningless, entirely spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies among the random variables examined, which were based on gamma(1, 1) and Gaussian random walk distributions. Quantifying these spurious correlations and their likely magnitude for various distributions is valuable for several reasons. First, it helps analysts make progress toward accurate inference. Second, it helps them avoid unwarranted credulity. Third, it lets them demand appropriate disclosure from study authors.
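The abstract's central point, that searching a large pool of series for a match turns up strong correlations between variables that are unrelated by construction, can be illustrated with a small simulation. The Python sketch below is not the authors' procedure; the series length (52, roughly a year of weekly data), the number of candidate series (2000), the 0.5 cutoff, and the function names are hypothetical choices. It correlates one target series with many independently generated candidates, drawn either as independent gamma(1, 1) values or as Gaussian random walks, and reports the largest |r| and the share of candidates above the cutoff.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gamma_iid(n, rng):
    """n independent draws from gamma(shape=1, scale=1)."""
    return [rng.gammavariate(1.0, 1.0) for _ in range(n)]


def gaussian_random_walk(n, rng):
    """Running sum of n independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def search_for_matches(generator, n_obs=52, n_candidates=2000, seed=1):
    """Correlate one target series with many independent candidate series.

    Hypothetical helper: every candidate is generated independently of the
    target, so any correlation found is spurious by construction.
    Returns the absolute correlation |r| for each candidate.
    """
    rng = random.Random(seed)
    target = generator(n_obs, rng)
    return [abs(pearson(target, generator(n_obs, rng)))
            for _ in range(n_candidates)]


if __name__ == "__main__":
    for name, gen in [("gamma(1, 1) iid", gamma_iid),
                      ("Gaussian random walk", gaussian_random_walk)]:
        rs = search_for_matches(gen)
        share = sum(r > 0.5 for r in rs) / len(rs)
        print(f"{name}: max |r| = {max(rs):.2f}, share with |r| > 0.5 = {share:.3f}")
```

Random walks typically yield far more large correlations than independent draws, because trending series share slow, low-frequency movement; with enough candidates, even independent draws produce some sizable |r| purely by chance.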

Suggested Citation

  • Jesse T. Richman & Ryan J. Roberts, 2023. "Assessing Spurious Correlations in Big Search Data," Forecasting, MDPI, vol. 5(1), pages 1-12, February.
  • Handle: RePEc:gam:jforec:v:5:y:2023:i:1:p:15-296:d:1082814

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-9394/5/1/15/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-9394/5/1/15/
    Download Restriction: no

    References listed on IDEAS

    1. Adrian Letchford & Tobias Preis & Helen Susannah Moat, 2016. "Quantifying the Search Behaviour of Different Demographics Using Google Correlate," PLOS ONE, Public Library of Science, vol. 11(2), pages 1-11, February.
    2. Hyunyoung Choi & Hal Varian, 2012. "Predicting the Present with Google Trends," The Economic Record, The Economic Society of Australia, vol. 88(s1), pages 2-9, June.
    3. Jeremy Ginsberg & Matthew H. Mohebbi & Rajan S. Patel & Lynnette Brammer & Mark S. Smolinski & Larry Brilliant, 2009. "Detecting influenza epidemics using search engine query data," Nature, Nature, vol. 457(7232), pages 1012-1014, February.
    4. Ahmed Shoukry Rashad, 2022. "The Power of Travel Search Data in Forecasting the Tourism Demand in Dubai," Forecasting, MDPI, vol. 4(3), pages 1-11, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bentzen, Jeanet Sinding, 2021. "In crisis, we pray: Religiosity and the COVID-19 pandemic," Journal of Economic Behavior & Organization, Elsevier, vol. 192(C), pages 541-583.
    2. Grechyna, Daryna, 2025. "Raising awareness of climate change: Nature, activists, politicians?," Ecological Economics, Elsevier, vol. 227(C).
    3. Daniele Barchiesi & Helen Susannah Moat & Christian Alis & Steven Bishop & Tobias Preis, 2015. "Quantifying International Travel Flows Using Flickr," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-8, July.
    4. Breithaupt, Patrick & Kesler, Reinhold & Niebel, Thomas & Rammer, Christian, 2020. "Intangible capital indicators based on web scraping of social media," ZEW Discussion Papers 20-046, ZEW - Leibniz Centre for European Economic Research.
    5. Götz, Thomas B. & Knetsch, Thomas A., 2019. "Google data in bridge equation models for German GDP," International Journal of Forecasting, Elsevier, vol. 35(1), pages 45-66.
    6. Kristina Gligorić & Arnaud Chiolero & Emre Kıcıman & Ryen W. White & Robert West, 2022. "Population-scale dietary interests during the COVID-19 pandemic," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    7. Abay, Kibrom A. & Hirfrfot, Kibrom Tafere & Woldemichael, Andinet, 2020. "Winners and Losers from COVID-19 : Global Evidence from Google Search," Policy Research Working Paper Series 9268, The World Bank.
    8. Stephen L. France & Yuying Shi, 2017. "Aggregating Google Trends: Multivariate Testing and Analysis," Papers 1712.03152, arXiv.org, revised Mar 2018.
    9. Oestmann Marco & Bennöhr Lars, 2015. "Determinants of house price dynamics. What can we learn from search engine data?," Review of Economics, De Gruyter, vol. 66(1), pages 99-127, April.
    10. Georg von Graevenitz & Christian Helmers & Valentine Millot & Oliver Turnbull, 2016. "Does Online Search Predict Sales? Evidence from Big Data for Car Markets in Germany and the UK," Working Papers 71, Queen Mary, University of London, School of Business and Management, Centre for Globalisation Research.
    11. Hulya Bakirtas & Vildan Gulpinar Demirci, 2022. "Can Google Trends data provide information on consumer’s perception regarding hotel brands?," Information Technology & Tourism, Springer, vol. 24(1), pages 57-83, March.
    12. Fantazzini, Dean & Toktamysova, Zhamal, 2015. "Forecasting German car sales using Google data and multivariate models," International Journal of Production Economics, Elsevier, vol. 170(PA), pages 97-135.
    13. Zhongchen Song & Tom Coupé, 2023. "Predicting Chinese consumption series with Baidu," Journal of Chinese Economic and Business Studies, Taylor & Francis Journals, vol. 21(3), pages 429-463, July.
    14. Philip ME Garboden, 2019. "Sources and Types of Big Data for Macroeconomic Forecasting," Working Papers 2019-3, University of Hawaii Economic Research Organization, University of Hawaii at Manoa.
    15. Junzhao Ma & Dewi Tojib & Yelena Tsarenko, 2022. "Sex Robots: Are We Ready for Them? An Exploration of the Psychological Mechanisms Underlying People’s Receptiveness of Sex Robots," Journal of Business Ethics, Springer, vol. 178(4), pages 1091-1107, July.
    16. Qadan, Mahmoud & Nama, Hazar, 2018. "Investor sentiment and the price of oil," Energy Economics, Elsevier, vol. 69(C), pages 42-58.
    17. Jaroslav Pavlicek & Ladislav Kristoufek, 2015. "Nowcasting Unemployment Rates with Google Searches: Evidence from the Visegrad Group Countries," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-11, May.
    18. Fabio Milani, 2021. "COVID-19 outbreak, social response, and early economic effects: a global VAR analysis of cross-country interdependencies," Journal of Population Economics, Springer;European Society for Population Economics, vol. 34(1), pages 223-252, January.
    19. Park, Sungjun & Kim, Jinsoo, 2018. "The effect of interest in renewable energy on US household electricity consumption: An analysis using Google Trends data," Renewable Energy, Elsevier, vol. 127(C), pages 1004-1010.
    20. Rivera, Roberto, 2016. "A dynamic linear model to forecast hotel registrations in Puerto Rico using Google Trends data," Tourism Management, Elsevier, vol. 57(C), pages 12-20.
