Predicting lead water service lateral locations: Geospatial data science in support of municipal programming

Authors

  • Hajiseyedjavadi, Seyedsaeed
  • Karimi, Hassan A.
  • Blackhurst, Michael

Abstract

We present and discuss machine learning predictions of customers' service line materials in Pittsburgh, PA, and demonstrate the degree to which these predictions and the supporting data can and cannot improve municipal lead programming. As in previous work, the predictive features combine property characteristics, administrative spatial data, and tap water quality samples. Our work also includes labels of service line materials diagnosed from photographs taken at the curb box, which boost predictions but are imperfect as an exclusive diagnostic method. We use sample weighting and spatial cross-validation in an effort to overcome the oversampling of lead service lines characteristic of data collected for regulatory compliance. Cross-validation demonstrates precise predictions (precision >90%) for only 13% of customers, suggesting that predictions could improve short-term replacement decisions by avoiding unnecessary excavations. However, model precision declines when expanding predictions to more customers, limiting the degree to which predictions can estimate system-wide inventories and inform the regulatory decisions requiring complete inventories. We discuss the necessary trade-offs between biased sampling for regulatory compliance, which favors finding and replacing lead, and predictive modeling, which improves with unbiased sampling. We present a flow diagram that can help municipalities balance biased and unbiased sampling when integrating predictive modeling into compliance with federal regulations.
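As a rough illustration of two ideas in the abstract, the sketch below shows one simple way to build spatially blocked cross-validation folds and to measure precision over only the most confidently predicted share of customers. It is not the authors' implementation: the function names, the grid-cell blocking scheme, and the toy scores and labels are all assumptions made up for this example.

```python
# Illustrative sketch only; names, blocking scheme, and data are hypothetical
# and are not taken from the paper.

def spatial_block_folds(points, cell_size, n_folds):
    """Assign each (x, y) point to a fold by grid cell.

    Points in the same cell always share a fold, so near neighbours do not
    end up on both sides of a train/validation split.
    """
    def cell_of(point):
        x, y = point
        return (int(x // cell_size), int(y // cell_size))

    cells = sorted({cell_of(p) for p in points})
    cell_to_fold = {cell: i % n_folds for i, cell in enumerate(cells)}
    return [cell_to_fold[cell_of(p)] for p in points]


def precision_at_coverage(scores, labels, coverage):
    """Precision among the top `coverage` fraction of records ranked by score.

    `labels` are 1 for lead and 0 for non-lead; `scores` are predicted
    probabilities of lead.
    """
    k = max(1, int(round(coverage * len(scores))))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Toy data (invented): precision falls as predictions cover more customers.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.6, 0.9), (0.1, 1.5)]

if __name__ == "__main__":
    print(spatial_block_folds(toy_points, cell_size=1.0, n_folds=2))
    for share in (0.25, 0.5, 1.0):
        print(share, precision_at_coverage(toy_scores, toy_labels, share))
```

On this toy data, precision is highest for the smallest, most confident subset and drops as coverage widens, which mirrors the trade-off the abstract reports between precise predictions for a minority of customers and system-wide inventory estimates.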

Suggested Citation

  • Hajiseyedjavadi, Seyedsaeed & Karimi, Hassan A. & Blackhurst, Michael, 2022. "Predicting lead water service lateral locations: Geospatial data science in support of municipal programming," Socio-Economic Planning Sciences, Elsevier, vol. 82(PB).
  • Handle: RePEc:eee:soceps:v:82:y:2022:i:pb:s0038012122000556
    DOI: 10.1016/j.seps.2022.101277
    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0038012122000556
    Download Restriction: Full text for ScienceDirect subscribers only

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Zhelun & O’Neill, Zheng & Wen, Jin & Pradhan, Ojas & Yang, Tao & Lu, Xing & Lin, Guanjing & Miyata, Shohei & Lee, Seungjae & Shen, Chou & Chiosa, Roberto & Piscitelli, Marco Savino & Capozzoli, , 2023. "A review of data-driven fault detection and diagnostics for building HVAC systems," Applied Energy, Elsevier, vol. 339(C).
    2. Seul-gi Lee & Bashir Adelodun & Mirza Junaid Ahmad & Kyung Sook Choi, 2022. "Multi-Level Prioritization Analysis of Water Governance Components to Improve Agricultural Water-Saving Policy: A Case Study from Korea," Sustainability, MDPI, vol. 14(6), pages 1-18, March.
    3. Jianxu Liu & Xiaoqing Li & Shutong Liu & Sanzidur Rahman & Songsak Sriboonchitta, 2022. "Addressing Rural–Urban Income Gap in China through Farmers’ Education and Agricultural Productivity Growth via Mediation and Interaction Effects," Agriculture, MDPI, vol. 12(11), pages 1-23, November.
    4. Hand, David J., 2009. "Mining the past to determine the future: Problems and possibilities," International Journal of Forecasting, Elsevier, vol. 25(3), pages 441-451, July.
    5. Yizhou Wu & Peilei Fan & Heyuan You, 2018. "Spatial Evolution of Producer Service Sectors and Its Influencing Factors in Cities: A Case Study of Hangzhou, China," Sustainability, MDPI, vol. 10(4), pages 1-23, March.
    6. Dario Aversa & Nino Adamashvili & Mariantonietta Fiore & Alessia Spada, 2022. "Scoping Review (SR) via Text Data Mining on Water Scarcity and Climate Change," Sustainability, MDPI, vol. 15(1), pages 1-13, December.
    7. Chiara Perelli & Giacomo Branca & Chiara Corbari & Marco Mancini, 2024. "Physical and Economic Water Productivity in Agriculture between Traditional and Water-Saving Irrigation Systems: A Case Study in Southern Italy," Sustainability, MDPI, vol. 16(12), pages 1-12, June.
    8. Chinmay Mungi & Dejian Lai & Xianglin L. Du, 2019. "Spatial Analysis of Industrial Benzene Emissions and Cancer Incidence Rates in Texas," IJERPH, MDPI, vol. 16(15), pages 1-13, July.
    9. Jincai Zhao & Yiyao Wang & Xiufeng Zhang & Qianxi Liu, 2022. "Industrial and Agricultural Water Use Efficiency and Influencing Factors in the Process of Urbanization in the Middle and Lower Reaches of the Yellow River Basin, China," Land, MDPI, vol. 11(8), pages 1-18, August.
    10. Gigliarano, Chiara & Figini, Silvia & Muliere, Pietro, 2014. "Making classifier performance comparisons when ROC curves intersect," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 300-312.
    11. Yu, Wenbao & Park, Taesung, 2015. "Two simple algorithms on linear combination of multiple biomarkers to maximize partial area under the ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 88(C), pages 15-27.
    12. Vidoli, Francesco & Pignataro, Giacomo & Benedetti, Roberto, 2022. "Identification of spatial regimes of the production function of Italian hospitals through spatially constrained cluster-wise regression," Socio-Economic Planning Sciences, Elsevier, vol. 82(PA).
    13. Daniel Felsenstein & Eilat Elbaum & Tsafrir Levi & Ran Calvo, 2021. "Post-processing HAZUS earthquake damage and loss assessments for individual buildings," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 105(1), pages 21-45, January.
    14. Wu, Xianhua & Tian, Zhiqing & Kuai, Yun & Song, Shunfeng & Marson, Stephen M., 2022. "Study on spatial correlation of air pollution and control effect of development plan for the city cluster in the Yangtze River Delta," Socio-Economic Planning Sciences, Elsevier, vol. 83(C).
    15. Yousef, Waleed A., 2013. "Assessing classifiers in terms of the partial area under the ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 64(C), pages 51-70.
    16. Erick C. Jones & Benjamin D. Leibowicz, 2022. "Climate risk management in agriculture using alternative electricity and water resources: a stochastic programming framework," Environment Systems and Decisions, Springer, vol. 42(1), pages 117-135, March.
    17. Marina Zusman & Dani Broitman & Boris A. Portnov, 2016. "Application of the double kernel density approach to the multivariate analysis of attributeless event point datasets," Letters in Spatial and Resource Sciences, Springer, vol. 9(3), pages 363-382, October.
    18. Schmid Matthias & Hothorn Torsten & Krause Friedemann & Rabe Christina, 2012. "A PAUC-based Estimation Technique for Disease Classification and Biomarker Selection," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-26, October.
    19. Qinghua Pang & Hailiang Huang & Lina Zhang, 2022. "Characteristics of Spatial–Temporal Variations in Coupling Coordination between Industrial Water Use and Industrial Green Development Systems in China," Sustainability, MDPI, vol. 15(1), pages 1-19, December.
    20. Ma, Hua & Bandos, Andriy I. & Gur, David, 2018. "Informativeness of diagnostic marker values and the impact of data grouping," Computational Statistics & Data Analysis, Elsevier, vol. 117(C), pages 76-89.
