Printed from https://ideas.repec.org/a/plo/pcbi00/1007258.html

Reappraising the utility of Google Flu Trends

Authors

  • Sasikiran Kandula
  • Jeffrey Shaman

Abstract

Estimation of influenza-like illness (ILI) using search trends activity was intended to supplement traditional surveillance systems, and was a motivation behind the development of Google Flu Trends (GFT). However, several studies have previously reported large errors in GFT estimates of ILI in the US. Following the recent release of time-stamped surveillance data, which better reflects real-time operational scenarios, we reanalyzed GFT errors. Using three data sources (GFT: an archive of weekly ILI estimates from Google Flu Trends; ILIf: fully-observed ILI rates from ILINet; and ILIp: ILI rates available in real time based on partial reporting), five influenza seasons were analyzed, and the mean square errors (MSE) of GFT and ILIp as estimates of ILIf were computed. To correct GFT errors, a random forest regression model was built with ILI and GFT rates from the previous three weeks as predictors. An overall reduction in error of 44% was observed, and the errors of the corrected GFT are lower than those of ILIp. An 80% reduction in error during 2012/13, when GFT had large errors, shows that extreme failures of GFT could have been avoided. Using autoregressive integrated moving average (ARIMA) models, one- to four-week ahead forecasts were generated with two separate data streams: ILIp alone, and both ILIp and corrected GFT. At all forecast targets and seasons, and for all but two regions, inclusion of GFT lowered MSE. Results from two alternative error measures, mean absolute error and mean absolute proportional error, were largely consistent with results from MSE.
Taken together, these findings provide an error profile of GFT in the US, establish strong evidence for the adoption of search-trends-based 'nowcasts' in influenza forecast systems, and encourage reevaluation of the utility of this data source in diverse domains.

Author summary: Google Flu Trends (GFT) was proposed as a method to estimate influenza-like illness (ILI) in the general population and to be used in conjunction with traditional surveillance systems. Several previous studies have documented that GFT often overestimated ILI. In this study, using a recently released archive of provisional incidence data from a large surveillance system in the US (ILINet), we report errors in GFT alongside errors in ILINet’s initial estimates of ILI. This comparison, based on information available in real time, allows for a more nuanced assessment of GFT errors. Additionally, we describe a method to correct errors in GFT and show that the corrected GFT estimates are at least as accurate as initial estimates from ILINet. Finally, we show that including corrected GFT when forecasting ILI over the next four weeks considerably improves forecast accuracy. Taken together, our results indicate that the GFT model could have added value to traditional surveillance and forecasting systems, and that a reevaluation of the utility of the underlying search trends data, which is now more openly accessible, is warranted in fields beyond influenza.
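The error measures and the lagged-predictor setup named in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code. Several details are assumptions: the mean absolute proportional error is taken as mean(|estimate - observed| / observed), the "reduction in error" is taken as a relative drop in percent, and the lag alignment (ILI and GFT from weeks t-1, t-2 and t-3 used to predict week t) is one reading of "the previous three weeks". The helper `lagged_features` is hypothetical.

```python
from typing import List, Sequence


def _check_lengths(pred: Sequence[float], obs: Sequence[float]) -> None:
    # Both series must cover the same, non-empty set of weeks.
    if len(pred) != len(obs) or len(obs) == 0:
        raise ValueError("pred and obs must be non-empty and of equal length")


def mse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean square error of an estimate (e.g. GFT or ILIp) against ILIf."""
    _check_lengths(pred, obs)
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error."""
    _check_lengths(pred, obs)
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def mape(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute proportional error, ASSUMED to be mean(|pred - obs| / obs).

    Observed values must be non-zero.
    """
    _check_lengths(pred, obs)
    return sum(abs(p - o) / o for p, o in zip(pred, obs)) / len(obs)


def percent_reduction(baseline_err: float, new_err: float) -> float:
    """Relative error reduction in percent (ASSUMED formula)."""
    return 100.0 * (baseline_err - new_err) / baseline_err


def lagged_features(ili: Sequence[float], gft: Sequence[float],
                    n_lags: int = 3) -> List[List[float]]:
    """HYPOTHETICAL helper: one predictor row per target week t >= n_lags.

    Each row is [ili[t-1], ..., ili[t-n_lags], gft[t-1], ..., gft[t-n_lags]].
    This lag alignment is an assumption; the abstract only says
    "the previous three weeks".
    """
    if len(ili) != len(gft):
        raise ValueError("ili and gft must be of equal length")
    rows = []
    for t in range(n_lags, len(ili)):
        ili_lags = [ili[t - k] for k in range(1, n_lags + 1)]
        gft_lags = [gft[t - k] for k in range(1, n_lags + 1)]
        rows.append(ili_lags + gft_lags)
    return rows
```

With made-up numbers, `lagged_features([1.0, 1.2, 1.5, 2.0, 2.6], [1.1, 1.4, 1.9, 2.5, 3.4])` returns two rows, `[1.5, 1.2, 1.0, 1.9, 1.4, 1.1]` and `[2.0, 1.5, 1.2, 2.5, 1.9, 1.4]`, which line up with targets for the fourth and fifth weeks. A matrix like this, together with the matching ILIf targets, could be passed to a regressor such as scikit-learn's `RandomForestRegressor` via `fit(X, y)`. A corrected estimate's MSE can then be compared with the raw GFT MSE using `percent_reduction`.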

Suggested Citation

  • Sasikiran Kandula & Jeffrey Shaman, 2019. "Reappraising the utility of Google Flu Trends," PLOS Computational Biology, Public Library of Science, vol. 15(8), pages 1-16, August.
  • Handle: RePEc:plo:pcbi00:1007258
    DOI: 10.1371/journal.pcbi.1007258

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007258
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1007258&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Kwiatkowski, Denis & Phillips, Peter C. B. & Schmidt, Peter & Shin, Yongcheol, 1992. "Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root?," Journal of Econometrics, Elsevier, vol. 54(1-3), pages 159-178.
    2. Logan C Brooks & David C Farrow & Sangwon Hyun & Ryan J Tibshirani & Roni Rosenfeld, 2018. "Nonmechanistic forecasts of seasonal influenza with iterative one-week-ahead distributions," PLOS Computational Biology, Public Library of Science, vol. 14(6), pages 1-29, June.
    3. Hyndman, Rob J. & Khandakar, Yeasmin, 2008. "Automatic Time Series Forecasting: The forecast Package for R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 27(3).
    4. Canova, Fabio & Hansen, Bruce E, 1995. "Are Seasonal Patterns Constant over Time? A Test for Seasonal Stability," Journal of Business & Economic Statistics, American Statistical Association, vol. 13(3), pages 237-252, July.
    5. Smolinski, M.S. & Crawley, A.W. & Baltrusaitis, K. & Chunara, R. & Olsen, J.M. & Wójcik, O. & Santillana, M. & Nguyen, A. & Brownstein, J.S., 2015. "Flu near you: Crowdsourced symptom reporting spanning 2 influenza seasons," American Journal of Public Health, American Public Health Association, vol. 105(10), pages 2124-2130.
    6. Jeremy Ginsberg & Matthew H. Mohebbi & Rajan S. Patel & Lynnette Brammer & Mark S. Smolinski & Larry Brilliant, 2009. "Detecting influenza epidemics using search engine query data," Nature, Nature, vol. 457(7232), pages 1012-1014, February.
    7. Dave Osthus & Ashlynn R Daughton & Reid Priedhorsky, 2019. "Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited," PLOS Computational Biology, Public Library of Science, vol. 15(2), pages 1-19, February.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Jun, Seung-Pyo & Yoo, Hyoung Sun & Lee, Jae-Seong, 2021. "The impact of the pandemic declaration on public awareness and behavior: Focusing on COVID-19 google searches," Technological Forecasting and Social Change, Elsevier, vol. 166(C).
    2. Marlene Amstad & Giulio Cornelli & Leonardo Gambacorta & Dora Xia, 2020. "Investors' risk attitudes in the pandemic and the stock market: new evidence based on internet searches," BIS Bulletins 25, Bank for International Settlements.
    3. Mostafa Abbas & Thomas B. Morland & Eric S. Hall & Yasser EL-Manzalawy, 2021. "Associations between Google Search Trends for Symptoms and COVID-19 Confirmed and Death Cases in the United States," IJERPH, MDPI, vol. 18(9), pages 1-24, April.
    4. Katsikopoulos, Konstantinos V. & Şimşek, Özgür & Buckmann, Marcus & Gigerenzer, Gerd, 2022. "Transparent modeling of influenza incidence: Big data or a single data point from psychological theory?," International Journal of Forecasting, Elsevier, vol. 38(2), pages 613-619.
    5. Lisa Singh & Carole Roan Gresenz, 2022. "Social Media Data for Firearms Research: Promise and Perils," The ANNALS of the American Academy of Political and Social Science, vol. 704(1), pages 267-291, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hyndman, Rob J. & Khandakar, Yeasmin, 2008. "Automatic Time Series Forecasting: The forecast Package for R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 27(3).
    2. Long Wen & Chang Liu & Haiyan Song, 2019. "Forecasting tourism demand using search query data: A hybrid modelling approach," Tourism Economics, vol. 25(3), pages 309-329, May.
    3. Rice, William L. & Park, So Young & Pan, Bing & Newman, Peter, 2019. "Forecasting campground demand in US national parks," Annals of Tourism Research, Elsevier, vol. 75(C), pages 424-438.
    4. Fröhlich, Markus, 2018. "Nowcasting Austrian Short Term Statistics," Journal of Official Statistics, Sciendo, vol. 34(2), pages 503-522, June.
    5. Joana M. Barros & Ruth Melia & Kady Francis & John Bogue & Mary O’Sullivan & Karen Young & Rebecca A. Bernert & Dietrich Rebholz-Schuhmann & Jim Duggan, 2019. "The Validity of Google Trends Search Volumes for Behavioral Forecasting of National Suicide Rates in Ireland," IJERPH, MDPI, vol. 16(17), pages 1-18, September.
    6. Pulapre Balakrishnan & M Parameswaran, 2019. "Modeling the Dynamics of Inflation in India," Working Papers 16, Ashoka University, Department of Economics.
    7. Kuchler, Theresa & Russel, Dominic & Stroebel, Johannes, 2022. "JUE Insight: The geographic spread of COVID-19 correlates with the structure of social networks as measured by Facebook," Journal of Urban Economics, Elsevier, vol. 127(C).
    8. Meira, Erick & Cyrino Oliveira, Fernando Luiz & de Menezes, Lilian M., 2022. "Forecasting natural gas consumption using Bagging and modified regularization techniques," Energy Economics, Elsevier, vol. 106(C).
    9. Mr. Francis Y Kumah, 2006. "The Role of Seasonality and Monetary Policy in Inflation Forecasting," IMF Working Papers 2006/175, International Monetary Fund.
    10. repec:ebl:ecbull:v:3:y:2006:i:13:p:1-9 is not listed on IDEAS
    11. Shafiqah Azman & Dharini Pathmanathan & Aerambamoorthy Thavaneswaran, 2022. "Forecasting the Volatility of Cryptocurrencies in the Presence of COVID-19 with the State Space Model and Kalman Filter," Mathematics, MDPI, vol. 10(17), pages 1-15, September.
    12. Shipra Banik & Param Silvapulle, 1999. "Testing for Seasonal Stability in Unemployment Series: International Evidence," Empirica, Springer; Austrian Institute for Economic Research; Austrian Economic Association, vol. 26(2), pages 123-139, June.
    13. Athanasopoulos, George & Hyndman, Rob J. & Song, Haiyan & Wu, Doris C., 2011. "The tourism forecasting competition," International Journal of Forecasting, Elsevier, vol. 27(3), pages 822-844.
    14. Junyi Lu & Sebastian Meyer, 2020. "Forecasting Flu Activity in the United States: Benchmarking an Endemic-Epidemic Beta Model," IJERPH, MDPI, vol. 17(4), pages 1-13, February.
    15. L. A. Gil-Alana & P. M. Robinson, 2001. "Testing of seasonal fractional integration in UK and Japanese consumption and income," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 16(2), pages 95-114.
    16. Han Lin Shang, 2017. "Reconciling Forecasts of Infant Mortality Rates at National and Sub-National Levels: Grouped Time-Series Methods," Population Research and Policy Review, Springer; Southern Demographic Association (SDA), vol. 36(1), pages 55-84, February.
    17. repec:iab:iabjlr:v:53:i:1:p:art.3 is not listed on IDEAS
    18. Pulapre Balakrishnan & M. Parameswaran, 2019. "Modeling the Dynamics of Inflation in India," Working Papers 1023, Ashoka University, Department of Economics.
    19. Jaroslav Pavlicek & Ladislav Kristoufek, 2015. "Nowcasting Unemployment Rates with Google Searches: Evidence from the Visegrad Group Countries," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-11, May.
    20. Leonardo Di Gangi & M. Lapucci & F. Schoen & A. Sortino, 2019. "An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series," Computational Optimization and Applications, Springer, vol. 74(3), pages 919-948, December.
    21. Fabio Busetti & Silvestro di Sanzo, 2011. "Bootstrap LR tests of stationarity, common trends and cointegration," Temi di discussione (Economic working papers) 799, Bank of Italy, Economic Research and International Relations Area.
    22. Svend Hylleberg, 2006. "Seasonal Adjustment," Economics Working Papers 2006-04, Department of Economics and Business Economics, Aarhus University.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.