Estimating Weights for Web-Scraped Data in Consumer Price Indices

Author

Listed:
  • Ayoubkhani, Daniel
  • Thomas, Heledd

    (Office for National Statistics, Government Buildings, Cardiff Road, Newport, NP10 8XG, UK.)

Abstract

In recent years, there has been much interest among national statistical agencies in using web-scraped data in consumer price indices, potentially supplementing or replacing manually collected price quotes. Yet one challenge that has received very little attention to date is the estimation of expenditure weights in the absence of quantity information, which would enable the construction of weighted item-level price indices. In this article we propose the novel approach of predicting sales quantities from their ranks (for example, when products are sorted ‘by popularity’ on consumer websites) via appropriate statistical distributions. Using historical transactional data supplied by a UK retailer for two consumer items, we assessed the out-of-sample accuracy of the Pareto, log-normal and truncated log-normal distributions, finding that the last of these resulted in an index series that most closely approximated an expenditure-weighted benchmark. Our results demonstrate the value of supplementing web-scraped price quotes with a simple set of retailer-supplied summary statistics relating to quantities, allowing statistical agencies to realise the benefits of freely available internet data whilst placing minimal burden on retailers. However, further research would need to be undertaken before the approach could be implemented in the compilation of official price indices.
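The rank-to-quantity idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the Pareto distribution only (the abstract reports that the truncated log-normal performed best), and the function names, the mid-rank plotting position, the parameter values `x_min` and `alpha`, the example prices and the weighted geometric index formula are all assumptions made for illustration. In practice the distribution parameters would presumably come from the retailer-supplied summary statistics the abstract mentions.

```python
import math


def pareto_quantity(rank, n_products, x_min=1.0, alpha=1.5):
    """Predict a sales quantity from a popularity rank (1 = best seller).

    x_min and alpha are hypothetical Pareto parameters, not values from the paper.
    """
    if not 1 <= rank <= n_products:
        raise ValueError("rank must lie between 1 and n_products")
    # Assumed plotting position: share of products selling at least this much.
    upper_tail = (rank - 0.5) / n_products
    # Invert the Pareto survival function S(q) = (x_min / q) ** alpha.
    return x_min * upper_tail ** (-1.0 / alpha)


def expenditure_weights(prices, ranks, **pareto_kwargs):
    """Expenditure shares built from prices and rank-predicted quantities."""
    n = len(prices)
    spend = [p * pareto_quantity(r, n, **pareto_kwargs)
             for p, r in zip(prices, ranks)]
    total = sum(spend)
    return [s / total for s in spend]


def weighted_geometric_index(base_prices, current_prices, weights):
    """Weighted geometric mean of price relatives, scaled to base = 100.

    The index formula is an illustrative choice, not taken from the paper.
    """
    log_index = sum(w * math.log(p1 / p0)
                    for w, p0, p1 in zip(weights, base_prices, current_prices))
    return 100.0 * math.exp(log_index)


# Made-up web-scraped prices and popularity ranks for four products.
base_prices = [2.00, 3.50, 1.20, 5.00]
current_prices = [2.10, 3.50, 1.26, 4.75]
ranks = [1, 3, 2, 4]

weights = expenditure_weights(base_prices, ranks)
index = weighted_geometric_index(base_prices, current_prices, weights)
print([round(w, 3) for w in weights], round(index, 2))
```

Because the predicted quantities fall as rank increases, products sorted nearer the top of a "by popularity" listing receive larger weights at a given price. Swapping in a different distribution would only require replacing the quantile inversion inside `pareto_quantity`.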

Suggested Citation

  • Ayoubkhani, Daniel & Thomas, Heledd, 2022. "Estimating Weights for Web-Scraped Data in Consumer Price Indices," Journal of Official Statistics, Sciendo, vol. 38(1), pages 5-21, March.
  • Handle: RePEc:vrs:offsta:v:38:y:2022:i:1:p:5-21:n:13
    DOI: 10.2478/jos-2022-0002

    Download full text from publisher

    File URL: https://doi.org/10.2478/jos-2022-0002
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Patrick Bajari & Zhihao Cen & Victor Chernozhukov & Manoj Manukonda & Jin Wang & Ramon Huerta & Junbo Li & Ling Leng & George Monokroussos & Suhas Vijaykunar & Shan Wan, 2023. "Hedonic prices and quality adjusted price indices powered by AI," CeMMAP working papers 08/23, Institute for Fiscal Studies.
    2. Brian Fabo & Sharon Sarah Belli, 2017. "(Un)beliveable wages? An analysis of minimum wage policies in Europe from a living wage perspective," IZA Journal of Labor Policy, Springer;Forschungsinstitut zur Zukunft der Arbeit GmbH (IZA), vol. 6(1), pages 1-11, December.
    3. David Staines, 2023. "Stochastic Equilibrium the Lucas Critique and Keynesian Economics," Papers 2312.16214, arXiv.org, revised May 2024.
    4. W. Erwin Diewert & Kevin J. Fox, 2020. "Measuring Real Consumption and CPI Bias under Lockdown Conditions," NBER Working Papers 27144, National Bureau of Economic Research, Inc.
    5. Santiago E. Alvarez & Sarah M. Lein, 2020. "Tracking inflation on a daily basis," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 156(1), pages 1-13, December.
    6. Resce, Giuliano & Maynard, Diana, 2018. "What matters most to people around the world? Retrieving Better Life Index priorities on Twitter," Technological Forecasting and Social Change, Elsevier, vol. 137(C), pages 61-75.
    7. Dennis Bonam & Gabriele Galati & Irma Hindrayanto & Marco Hoeberichts & Anna Samarina & Irina Stanga, 2019. "Inflation in the euro area since the Global Financial Crisis," DNB Occasional Studies 1703, Netherlands Central Bank, Research Department.
    8. Hillen, Judith & Fedoseeva, Svetlana, 2021. "E-commerce and the end of price rigidity?," Journal of Business Research, Elsevier, vol. 125(C), pages 63-73.
    9. Alexey S. Evseev & Rodion R. Latypov & Egor A. Postolit & Elena S. Sinelnikova-Muryleva, 2022. "Technical Data on Online Retailers' Prices Are of Enormous Value from the Standpoint of Economic Science: Their Use Makes It Possible to Refine Inflation Forecasts and Anticipate Future Trends in the Moment, ...," Russian Economic Development (in Russian), Gaidar Institute for Economic Policy, issue 11, pages 36-45, November.
    10. Brave, Scott A. & Butters, R. Andrew & Fogarty, Michael, 2022. "The perils of working with big data, and a SMALL checklist you can use to recognize them," Business Horizons, Elsevier, vol. 65(4), pages 481-492.
    11. Stéphane Dupraz, 2024. "A Kinked‐Demand Theory of Price Rigidity," Journal of Money, Credit and Banking, Blackwell Publishing, vol. 56(2-3), pages 325-363, March.
    12. Aparicio, Diego & Bertolotto, Manuel I., 2020. "Forecasting inflation with online prices," International Journal of Forecasting, Elsevier, vol. 36(2), pages 232-247.
    13. Harchaoui, Tarek M. & Janssen, Robert V., 2018. "How can big data enhance the timeliness of official statistics?," International Journal of Forecasting, Elsevier, vol. 34(2), pages 225-234.
    14. Diewert, W. Erwin & Fox, Kevin J., 2022. "Measuring Inflation under Pandemic Conditions," Journal of Official Statistics, Sciendo, vol. 38(1), pages 255-285, March.
    15. Alexey S. Evseev & Rodion R. Latypov & Egor A. Postolit & Elena S. Sinelnikova-Muryleva, 2022. "Technical and Methodological Challenges of Collecting Price Data from Online Retailers [Технические И Методологические Проблемы Сбора Данных О Ценах Онлайн-Ритейлеров]," Russian Economic Development, Gaidar Institute for Economic Policy, issue 11, pages 36-45, November.
    16. Hillen, Judith, 2018. "Web Scraping For Food Price Research," 58th Annual Conference, Kiel, Germany, September 12-14, 2018 275840, German Association of Agricultural Economists (GEWISOLA).
    17. Macias, Paweł & Stelmasiak, Damian & Szafranek, Karol, 2023. "Nowcasting food inflation with a massive amount of online prices," International Journal of Forecasting, Elsevier, vol. 39(2), pages 809-826.
    18. Yim, Sung Taek & Son, Jong Chil & Lee, Jiwon, 2022. "Spread of E-commerce, prices and inflation dynamics: Evidence from online price big data in Korea," Journal of Asian Economics, Elsevier, vol. 80(C).
    19. Ilaria Benedetti & Tiziana Laureti & Luigi Palumbo & Brandon M. Rose, 2022. "Computation of High-Frequency Sub-National Spatial Consumer Price Indexes Using Web Scraping Techniques," Economies, MDPI, vol. 10(4), pages 1-20, April.
    20. Ademmer, Martin & Beckmann, Joscha & Bode, Eckhardt & Boysen-Hogrefe, Jens & Funke, Manuel & Hauber, Philipp & Heidland, Tobias & Hinz, Julian & Jannsen, Nils & Kooths, Stefan & Söder, Mareike & Stame, 2021. "Big Data in der makroökonomischen Analyse," Kieler Beiträge zur Wirtschaftspolitik 32, Kiel Institute for the World Economy (IfW Kiel).
