IDEAS home Printed from https://ideas.repec.org/a/gam/jijerp/v18y2021i23p12635-d691848.html

Predicting Food Safety Compliance for Informed Food Outlet Inspections: A Machine Learning Approach

Author

Listed:
  • Rachel A. Oldroyd

    (Leeds Institute for Data Analytics, University of Leeds, Leeds LS2 9JT, UK
    School of Geography, University of Leeds, Leeds LS2 9JT, UK)

  • Michelle A. Morris

    (Leeds Institute for Data Analytics, University of Leeds, Leeds LS2 9JT, UK
    School of Medicine, University of Leeds, Leeds LS2 9JT, UK
    Alan Turing Institute, London NW1 2DB, UK)

  • Mark Birkin

    (Leeds Institute for Data Analytics, University of Leeds, Leeds LS2 9JT, UK
    School of Geography, University of Leeds, Leeds LS2 9JT, UK
    Alan Turing Institute, London NW1 2DB, UK)

Abstract

Consumer food environments have transformed dramatically in the last decade. Food outlet prevalence has increased, and people are eating food outside the home more than ever before. Despite these developments, national spending on food control has reduced. The National Audit Office report that only 14% of local authorities are up to date with food business inspections, exposing consumers to unknown levels of risk. Given the scarcity of local authority resources, this paper presents a data-driven approach to predict compliance for newly opened businesses and those awaiting repeat inspections. This work capitalizes on the theory that food outlet compliance is a function of its geographic context, namely the characteristics of the neighborhood within which it sits. We explore the utility of three machine learning approaches to predict non-compliant food outlets in England and Wales using openly accessible socio-demographic, business type, and urbanness features at the output area level. We find that the synthetic minority oversampling technique alongside a random forest algorithm with a 1:1 sampling strategy provides the best predictive power. Our final model retrieves and identifies 84% of total non-compliant outlets in a test set of 92,595 (sensitivity = 0.843, specificity = 0.745, precision = 0.274). The originality of this work lies in its unique and methodological approach which combines the use of machine learning with fine-grained neighborhood data to make robust predictions of compliance.
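The three reported metrics are linked through the share of non-compliant outlets in the test set. The following minimal Python sketch shows how sensitivity, specificity, and precision are computed from confusion-matrix counts, and how the share of non-compliant outlets implied by the three reported values could be backed out. The function names are hypothetical, and the implied prevalence is an inference from the rounded figures in the abstract, not a number stated by the authors.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, precision) from confusion-matrix counts.

    tp/fn: non-compliant outlets correctly flagged / missed.
    tn/fp: compliant outlets correctly passed / wrongly flagged.
    """
    sensitivity = tp / (tp + fn)   # share of non-compliant outlets retrieved
    specificity = tn / (tn + fp)   # share of compliant outlets left unflagged
    precision = tp / (tp + fp)     # share of flagged outlets that are non-compliant
    return sensitivity, specificity, precision


def implied_prevalence(sensitivity, specificity, precision):
    """Solve precision = s*p / (s*p + (1 - spec)*(1 - p)) for the prevalence p."""
    false_positive_rate = 1.0 - specificity
    numerator = precision * false_positive_rate
    denominator = sensitivity * (1.0 - precision) + numerator
    return numerator / denominator


if __name__ == "__main__":
    # Values as reported in the abstract (rounded to three decimals there).
    p = implied_prevalence(sensitivity=0.843, specificity=0.745, precision=0.274)
    print(f"Implied share of non-compliant outlets: {p:.3f}")
    # Assumption: applying that share to the 92,595-outlet test set gives only
    # an approximate count, because the inputs are rounded.
    print(f"Approximate non-compliant outlets: {round(p * 92595)}")
```

Under these assumptions, roughly one outlet in ten in the test set would be non-compliant, which is consistent with a precision well below the sensitivity: most flagged outlets are compliant even though most non-compliant outlets are flagged.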

Suggested Citation

  • Rachel A. Oldroyd & Michelle A. Morris & Mark Birkin, 2021. "Predicting Food Safety Compliance for Informed Food Outlet Inspections: A Machine Learning Approach," IJERPH, MDPI, vol. 18(23), pages 1-20, November.
  • Handle: RePEc:gam:jijerp:v:18:y:2021:i:23:p:12635-:d:691848

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/18/23/12635/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/18/23/12635/
    Download Restriction: no

    References listed on IDEAS

    1. Strobl, Carolin & Boulesteix, Anne-Laure & Augustin, Thomas, 2007. "Unbiased split selection for classification trees based on the Gini Index," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 483-501, September.
    2. Jennifer J. Quinlan, 2013. "Foodborne Illness Incidence Rates and Food Safety Risks for Populations of Low Socioeconomic Status and Minority Race/Ethnicity: A Review of the Literature," IJERPH, MDPI, vol. 10(8), pages 1-19, August.
    3. Susan Arendt & Lakshman Rajagopal & Catherine Strohbehn & Nathan Stokes & Janell Meyer & Steven Mandernach, 2013. "Reporting of Foodborne Illness by U.S. Consumers and Healthcare Professionals," IJERPH, MDPI, vol. 10(8), pages 1-31, August.
    4. Kameshwari Pothukuchi & Rayman Mohamed & David Gebben, 2008. "Explaining disparities in food safety compliance by food stores: does community matter?," Agriculture and Human Values, Springer;The Agriculture, Food, & Human Values Society (AFHVS), vol. 25(3), pages 319-332, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kalbfuss, Jörg & Odermatt, Reto & Stutzer, Alois, 2024. "Medical marijuana laws and mental health in the United States," Health Economics, Policy and Law, Cambridge University Press, vol. 19(3), pages 307-322, July.
    2. Qingrong Tan & Yan Cai & Fen Luo & Dongbo Tu, 2023. "Development of a High-Accuracy and Effective Online Calibration Method in CD-CAT Based on Gini Index," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 103-141, February.
    3. David Podgorelec & Borut Žalik & Domen Mongus & Dino Vlahek, 2024. "A New Alternating Suboptimal Dynamic Programming Algorithm with Applications for Feature Selection," Mathematics, MDPI, vol. 12(13), pages 1-22, June.
    4. Hapfelmeier, A. & Ulm, K., 2013. "A new variable selection approach using Random Forests," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 50-69.
    5. Burim Ramosaj & Markus Pauly, 2019. "Predicting missing values: a comparative study on non-parametric approaches for imputation," Computational Statistics, Springer, vol. 34(4), pages 1741-1764, December.
    6. Limon Barua & Bo Zou & Yan Zhou & Yulin Liu, 2023. "Modeling household online shopping demand in the U.S.: a machine learning approach and comparative investigation between 2009 and 2017," Transportation, Springer, vol. 50(2), pages 437-476, April.
    7. Ravit Bassal & Maya Davidovich-Cohen & Eugenia Yakunin & Assaf Rokney & Shifra Ken-Dror & Merav Strauss & Tamar Wolf & Orli Sagi & Sharon Amit & Jacob Moran-Gilad & Orit Treygerman & Racheli Karyo & L, 2023. "Trends in the Epidemiology of Non-Typhoidal Salmonellosis in Israel between 2010 and 2021," IJERPH, MDPI, vol. 20(9), pages 1-12, April.
    8. Enrico Biffis & Erik Chavez & Alexis Louaas & Pierre Picard, 2022. "Parametric insurance and technology adoption in developing countries," The Geneva Risk and Insurance Review, Palgrave Macmillan;International Association for the Study of Insurance Economics (The Geneva Association), vol. 47(1), pages 7-44, March.
    9. Paola Zuccolotto, 2010. "Evaluating the impact of a grouping variable on Job Satisfaction drivers," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 19(2), pages 287-305, June.
    10. Jennifer J. Quinlan, 2013. "Foodborne Illness Incidence Rates and Food Safety Risks for Populations of Low Socioeconomic Status and Minority Race/Ethnicity: A Review of the Literature," IJERPH, MDPI, vol. 10(8), pages 1-19, August.
    11. Gerhard Tutz & Moritz Berger, 2016. "Item-focussed Trees for the Identification of Items in Differential Item Functioning," Psychometrika, Springer;The Psychometric Society, vol. 81(3), pages 727-750, September.
    12. Montes, Ignacio & Miranda, Enrique & Montes, Susana, 2014. "Stochastic dominance with imprecise information," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 868-886.
    13. Guanghua Han & Yihong Liu, 2018. "Does Information Pattern Affect Risk Perception of Food Safety? A National Survey in China," IJERPH, MDPI, vol. 15(9), pages 1-14, September.
    14. Shu-Fu Kuo & Yu-Shan Shih, 2012. "Variable selection for functional density trees," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(7), pages 1387-1395, December.
    15. Daniel L. Chen & Markus Loecher, 2022. "Mood and the Malleability of Moral Reasoning: The Impact of Irrelevant Factors on Judicial Decisions," Working Papers hal-03864854, HAL.
    16. Xiaomu Ye & Pengfei Ding & Dawei Jin & Chuanyue Zhou & Yi Li & Jin Zhang, 2023. "Intelligent Analysis of Construction Costs of Shield Tunneling in Complex Geological Conditions by Machine Learning Method," Mathematics, MDPI, vol. 11(6), pages 1-22, March.
    17. Wei-Yin Loh, 2014. "Fifty Years of Classification and Regression Trees," International Statistical Review, International Statistical Institute, vol. 82(3), pages 329-348, December.
    18. Achim Zeileis & Torsten Hothorn, 2013. "A toolbox of permutation tests for structural change," Statistical Papers, Springer, vol. 54(4), pages 931-954, November.
    19. Jenine K. Harris & Leslie Hinyard & Kate Beatty & Jared B. Hawkins & Elaine O. Nsoesie & Raed Mansour & John S. Brownstein, 2018. "Evaluating the Implementation of a Twitter-Based Foodborne Illness Reporting Tool in the City of St. Louis Department of Health," IJERPH, MDPI, vol. 15(5), pages 1-13, April.
    20. Archer, Kellie J. & Kimes, Ryan V., 2008. "Empirical characterization of random forest variable importance measures," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2249-2260, January.
