
Automatic Product Classification Using Supervised Machine Learning Algorithms in Price Statistics

Author

Listed:
  • Bogdan Oancea

    (Department of Applied Economics and Quantitative Analysis, University of Bucharest, 030018 Bucharest, Romania)

Abstract

Modern approaches to computing consumer price indices draw on data sources such as web-scraped data and scanner data, which are very large in volume and need special processing techniques. In this paper, we address one of the main problems in consumer price index calculation, namely product classification, which cannot be performed manually when using large data sources. We therefore conducted an experiment on automatic product classification according to an international classification scheme. We combined 9 different word-embedding techniques with 13 classification methods with the aim of identifying the best combination in terms of the quality of the resulting classification. Because the dataset used in this experiment was significantly imbalanced, we compared these methods not only using accuracy, F1-score, and AUC, but also using a weighted F1-score that better reflected the overall classification quality. Our experiment showed that logistic regression, support vector machines, and random forests, combined with the FastText skip-gram embedding technique, provided the best classification results, with higher values of the performance metrics than those reported in similar studies. An execution time analysis showed that, among these three methods, logistic regression was the fastest, while the random forest took longer to execute. We also provided per-class performance metrics and an error analysis that enabled us to identify methods that could be excluded from the range of choices because they provided less reliable classifications for our purposes.
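
To illustrate why a weighted F1-score can be more informative than a plain average on imbalanced data, the following is a minimal Python sketch. It assumes that "weighted" means the per-class F1 averaged by class support (the number of true items in each class); the abstract does not spell out the exact definition used in the paper. The function names, class labels, and toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (assumed definition)."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical imbalanced sample: 8 "food" items and 2 "clothing" items.
y_true = ["food"] * 8 + ["clothing"] * 2
# The classifier mislabels one of the two "clothing" items as "food".
y_pred = ["food"] * 8 + ["food", "clothing"]

print("per-class:", per_class_f1(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
print("weighted F1:", weighted_f1(y_true, y_pred))
```

In this toy sample the minority class has a much lower F1 than the majority class. The macro average treats the two classes equally, whereas the weighted average reflects how many products each class actually contains.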

Suggested Citation

  • Bogdan Oancea, 2023. "Automatic Product Classification Using Supervised Machine Learning Algorithms in Price Statistics," Mathematics, MDPI, vol. 11(7), pages 1-32, March.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:7:p:1588-:d:1107027

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/7/1588/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/7/1588/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhenkun Zhou & Zikun Song & Tao Ren, 2022. "Predicting China's CPI by Scanner Big Data," Papers 2211.16641, arXiv.org, revised Oct 2023.
    2. Erica L. Groshen & Brian C. Moyer & Ana M. Aizcorbe & Ralph Bradley & David M. Friedman, 2017. "How Government Statistics Adjust for Potential Biases from Quality Change and New Goods in an Age of Digital Technologies: A View from the Trenches," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 187-210, Spring.
    3. Diewert, Erwin & Shimizu, Chihiro, 2015. "Residential Property Price Indices For Tokyo," Macroeconomic Dynamics, Cambridge University Press, vol. 19(8), pages 1659-1714, December.
    4. Diewert, W. Erwin & Fox, Kevin J., 2017. "Substitution Bias in Multilateral Methods for CPI Construction using Scanner Data," Microeconomics.ca working papers erwin_diewert-2017-3, Vancouver School of Economics, revised 23 Mar 2017.
    5. Jacek Białek, 2023. "Improving quality of the scanner CPI: proposition of new multilateral methods," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(3), pages 2893-2921, June.
    6. Kota Watanabe & Tsutomu Watanabe, 2014. "Estimating Daily Inflation Using Scanner Data: A Progress Report," UTokyo Price Project Working Paper Series 020, University of Tokyo, Graduate School of Economics.
    7. Qingxiao Li & Metin Çakır, 2024. "Estimating SNAP purchasing power and its effect on participation," American Journal of Agricultural Economics, John Wiley & Sons, vol. 106(2), pages 779-804, March.
    8. Gabriel Ehrlich & John C. Haltiwanger & Ron S. Jarmin & David Johnson & Ed Olivares & Luke W. Pardue & Matthew D. Shapiro & Laura Zhao, 2023. "Quality Adjustment at Scale: Hedonic vs. Exact Demand-Based Price Indices," NBER Working Papers 31309, National Bureau of Economic Research, Inc.
    9. Fox, Kevin J. & Levell, Peter & O'Connell, Martin, 2023. "Inflation measurement with high frequency data," CEPR Discussion Papers 18539, C.E.P.R. Discussion Papers.
    10. Philip ME Garboden, 2019. "Sources and Types of Big Data for Macroeconomic Forecasting," Working Papers 2019-3, University of Hawaii Economic Research Organization, University of Hawaii at Manoa.
    11. Kozo Ueda & Kota Watanabe & Tsutomu Watanabe, 2020. "Consumer Inventory and the Cost of Living Index: Theory and Some Evidence from Japan," Working Papers on Central Bank Communication 025, University of Tokyo, Graduate School of Economics.
    12. Beck, Günter W. & Carstensen, Kai & Menz, Jan-Oliver & Schnorrenberger, Richard & Wieland, Elisabeth, 2023. "Nowcasting consumer price inflation using high-frequency scanner data: Evidence from Germany," Discussion Papers 34/2023, Deutsche Bundesbank.
    13. Kevin J. Fox & Iqbal A. Syed, 2016. "Price Discounts and the Measurement of Inflation: Further Results," Discussion Papers 2016-05, School of Economics, The University of New South Wales.
    14. Mr. Daniel Leigh & Weicheng Lian & Mr. Marcos Poplawski Ribeiro & Rachel Szymanski & Viktor Tsyrennikov & Hong Yang, 2017. "Exchange Rates and Trade: A Disconnect?," IMF Working Papers 2017/058, International Monetary Fund.
    15. Kota Watanabe & Tsutomu Watanabe, 2014. "We construct a Törnqvist daily price index using Japanese point of sale (POS) scannerdata spanning from 1988 to 2013. We find the following. First, the POS based inflation rate tends to be about 0.5 ," CARF F-Series CARF-F-342, Center for Advanced Research in Finance, Faculty of Economics, The University of Tokyo.
    16. Fox, Kevin J. & Syed, Iqbal A., 2016. "Price discounts and the measurement of inflation," Journal of Econometrics, Elsevier, vol. 191(2), pages 398-406.
    17. Diewert W. Erwin & Fox Kevin J., 2022. "Measuring Inflation under Pandemic Conditions," Journal of Official Statistics, Sciendo, vol. 38(1), pages 255-285, March.
    18. Huang Ning & Wimalaratne Waruna & Pollard Brent, 2017. "The Effects of the Frequency and Implementation Lag of Basket Updates on the Canadian CPI," Journal of Official Statistics, Sciendo, vol. 33(4), pages 979-1004, December.
    19. Kevin J. Fox & Peter Levell & Martin O'Connell, 2022. "Multilateral index number methods for Consumer Price Statistics," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2022-08, Economic Statistics Centre of Excellence (ESCoE).
    20. Sweitzer, Megan & Byrne, Anne T. & Page, Elina T. & Carlson, Andrea & Kantor, Linda & Muth, Mary K. & Karns, Shawn & Zhen, Chen, 2024. "Development of the Food-at-Home Monthly Area Prices Data," Technical Bulletins 342467, United States Department of Agriculture, Economic Research Service.
