
Applying Machine Learning for Automatic Product Categorization

Author

Listed:
  • Roberson Andrea

    (U.S. Census Bureau, 4600 Silver Hill Road, Washington, D.C., 20233, U.S.A.)

Abstract

Every five years, the U.S. Census Bureau conducts the Economic Census, the official count of U.S. businesses and the most extensive collection of data related to business activity. Businesses, policymakers, governments and communities use Economic Census data for economic development, business decisions, and strategic planning. The Economic Census provides key inputs for economic measures such as the Gross Domestic Product and the Producer Price Index. The Economic Census requires businesses to fill out a lengthy questionnaire, including an extended section about the goods and services provided by the business. To address the challenges of high respondent burden and low survey response rates, we devised a strategy to automatically classify goods and services based on product information provided by the business. We asked several businesses to provide a spreadsheet containing Universal Product Codes and associated text descriptions for the products they sell. We then used natural language processing to classify the products according to the North American Product Classification System. This novel strategy classified text with very high accuracy: our best algorithms exceeded 90%.
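
The classification step described in the abstract, mapping a short product description to a category, can be illustrated with a small sketch. The abstract does not name the algorithms used, so the multinomial naive Bayes model below is a stand-in chosen for brevity, not the paper's method. The product descriptions are invented, and the labels "dairy" and "tools" are placeholders rather than real North American Product Classification System codes.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a product description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesProductClassifier:
    """Multinomial naive Bayes over bag-of-words counts with add-one smoothing.

    Illustrative only: the source abstract does not specify its models.
    """

    def fit(self, descriptions, labels):
        self.label_counts = Counter(labels)       # documents per category
        self.token_counts = defaultdict(Counter)  # token counts per category
        self.vocabulary = set()
        for text, label in zip(descriptions, labels):
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, description):
        tokens = tokenize(description)
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training rows, standing in for a business-supplied spreadsheet
# of product descriptions. The labels are placeholders, not NAPCS codes.
training_rows = [
    ("organic whole milk 1 gal", "dairy"),
    ("cheddar cheese block 8 oz", "dairy"),
    ("greek yogurt vanilla 32 oz", "dairy"),
    ("cordless drill 18v kit", "tools"),
    ("claw hammer 16 oz steel", "tools"),
    ("screwdriver set 6 piece", "tools"),
]

classifier = NaiveBayesProductClassifier().fit(
    [text for text, _ in training_rows],
    [label for _, label in training_rows],
)

for description in ["low fat milk 2 gal", "steel hammer drill"]:
    print(description, "->", classifier.predict(description))
```

A real pipeline would train on many labeled descriptions per category and report accuracy on held-out data; six rows are enough only to show the mechanics.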

Suggested Citation

  • Roberson Andrea, 2021. "Applying Machine Learning for Automatic Product Categorization," Journal of Official Statistics, Sciendo, vol. 37(2), pages 395-410, June.
  • Handle: RePEc:vrs:offsta:v:37:y:2021:i:2:p:395-410:n:11
    DOI: 10.2478/jos-2021-0017

    Download full text from publisher

    File URL: https://doi.org/10.2478/jos-2021-0017
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chris Reimann, 2024. "Predicting financial crises: an evaluation of machine learning algorithms and model explainability for early warning systems," Review of Evolutionary Political Economy, Springer, vol. 5(1), pages 51-83, June.
    2. Dario Sansone & Anna Zhu, 2023. "Using Machine Learning to Create an Early Warning System for Welfare Recipients," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 85(5), pages 959-992, October.
    3. Arthur C. Santos & Wesley A. Souza & Gustavo V. Barbara & Marcelo F. Castoldi & Alessandro Goedtel, 2023. "Diagnostics of Early Faults in Wind Generator Bearings Using Hjorth Parameters," Sustainability, MDPI, vol. 15(20), pages 1-17, October.
    4. Yi Yang & Yuting Bai & Xiaoyi Wang & Li Wang & Xuebo Jin & Qian Sun, 2020. "Group Decision-Making Support for Sustainable Governance of Algal Bloom in Urban Lakes," Sustainability, MDPI, vol. 12(4), pages 1-16, February.
    5. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer, 2023. "pystacked: Stacking generalization and machine learning in Stata," Stata Journal, StataCorp LP, vol. 23(4), pages 909-931, December.
    6. Yu, Baojun & Li, Changming & Mirza, Nawazish & Umar, Muhammad, 2022. "Forecasting credit ratings of decarbonized firms: Comparative assessment of machine learning models," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    7. Salman Khalid & Hyunho Hwang & Heung Soo Kim, 2021. "Real-World Data-Driven Machine-Learning-Based Optimal Sensor Selection Approach for Equipment Fault Detection in a Thermal Power Plant," Mathematics, MDPI, vol. 9(21), pages 1-27, November.
    8. Li, Jing-Ping & Mirza, Nawazish & Rahat, Birjees & Xiong, Deping, 2020. "Machine learning and credit ratings prediction in the age of fourth industrial revolution," Technological Forecasting and Social Change, Elsevier, vol. 161(C).
    9. Gründler, Klaus & Krieger, Tommy, 2021. "Using Machine Learning for measuring democracy: A practitioners guide and a new updated dataset for 186 countries from 1919 to 2019," European Journal of Political Economy, Elsevier, vol. 70(C).
    10. Na Tang & Maoxiang Yuan & Zhijun Chen & Jian Ma & Rui Sun & Yide Yang & Quanyuan He & Xiaowei Guo & Shixiong Hu & Junhua Zhou, 2023. "Machine Learning Prediction Model of Tuberculosis Incidence Based on Meteorological Factors and Air Pollutants," IJERPH, MDPI, vol. 20(5), pages 1-17, February.
    11. Hazlee Azil Illias & Wee Zhao Liang, 2018. "Identification of transformer fault based on dissolved gas analysis using hybrid support vector machine-modified evolutionary particle swarm optimisation," PLOS ONE, Public Library of Science, vol. 13(1), pages 1-15, January.
    12. Alkhaleel, Basem A., 2024. "Machine learning applications in the resilience of interdependent critical infrastructure systems—A systematic literature review," International Journal of Critical Infrastructure Protection, Elsevier, vol. 44(C).
    13. McKenzie, David & Sansone, Dario, 2017. "Man vs. Machine in Predicting Successful Entrepreneurs: Evidence from a Business Plan Competition in Nigeria," CEPR Discussion Papers 12523, C.E.P.R. Discussion Papers.
    14. Yousefzadeh Barri, Elnaz & Farber, Steven & Jahanshahi, Hadi & Beyazit, Eda, 2022. "Understanding transit ridership in an equity context through a comparison of statistical and machine learning algorithms," Journal of Transport Geography, Elsevier, vol. 105(C).
    15. Wenninger, Simon & Kaymakci, Can & Wiethe, Christian, 2022. "Explainable long-term building energy consumption prediction using QLattice," Applied Energy, Elsevier, vol. 308(C).
    16. McKenzie, David & Sansone, Dario, 2019. "Predicting entrepreneurial success is hard: Evidence from a business plan competition in Nigeria," Journal of Development Economics, Elsevier, vol. 141(C).
    17. Hakan Gunduz, 2021. "An efficient stock market prediction model using hybrid feature reduction method based on variational autoencoders and recursive feature elimination," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 7(1), pages 1-24, December.
    18. Ma, Dingyuan & Li, Xiaodong & Lin, Borong & Zhu, Yimin, 2023. "An intelligent retrofit decision-making model for building program planning considering tacit knowledge and multiple objectives," Energy, Elsevier, vol. 263(PB).
