
A Positive-Unlabeled Learning Algorithm for Urban Flood Susceptibility Modeling

Author

Listed:
  • Wenkai Li

    (School of Geography and Planning, Sun Yat-Sen University, Guangzhou 510006, China)

  • Yuanchi Liu

    (School of Geography and Planning, Sun Yat-Sen University, Guangzhou 510006, China)

  • Ziyue Liu

    (School of Geography and Planning, Sun Yat-Sen University, Guangzhou 510006, China)

  • Zhen Gao

    (Guangzhou Institute of Geography, Guangdong Academy of Sciences, Guangzhou 510070, China)

  • Huabing Huang

    (School of Geography and Planning, Sun Yat-Sen University, Guangzhou 510006, China)

  • Weijun Huang

    (School of Geography and Planning, Sun Yat-Sen University, Guangzhou 510006, China)

Abstract

Flood susceptibility modeling helps clarify the relationship between influencing factors and the occurrence of urban flooding, and it provides the spatial distribution of flood risk, which is critical for flood-risk reduction. Machine learning methods have been widely applied in flood susceptibility modeling, but traditional supervised learning requires both positive (flood) and negative (non-flood) samples for model training. Historical flood inventories usually contain only positive data, whereas negative data selected from areas without flood records are prone to contamination by positive data, a situation referred to as case-control sampling with contaminated controls. To address this problem, we propose applying a novel positive-unlabeled learning algorithm, positive and background learning with constraints (PBLC), to flood susceptibility modeling. PBLC trains a binary classifier from case-control positive and unlabeled samples without requiring truly labeled negative data. Using historical records of flood locations and environmental covariates (elevation, slope, aspect, plan curvature, profile curvature, slope length factor, stream power index, topographic position index, topographic wetness index, distance to rivers, distance to roads, land use, normalized difference vegetation index, and precipitation), we compared the performance of a traditional artificial neural network (ANN) and PBLC in flood susceptibility modeling for the city of Guangzhou, China. Experimental results show that PBLC can produce better-calibrated probabilistic predictions, more accurate binary predictions, and more reliable susceptibility maps of urban flooding than the traditional ANN, indicating that PBLC is effective in addressing the problem of case-control sampling with contaminated controls and can be successfully applied to urban flood susceptibility mapping.
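
To make the calibration issue concrete, the sketch below illustrates why a classifier trained naively on positive versus unlabeled samples is miscalibrated under case-control sampling with contaminated controls. It is not the PBLC algorithm, whose details are not given in this record. It only encodes the standard identity for this sampling design and assumes the flood prevalence (`prior`) is known; the function names and the numbers used are hypothetical.

```python
# Illustrative sketch only; NOT the PBLC algorithm from the paper.
# Assumption: n_pos flood samples are drawn from p(x | flood) and n_unl
# unlabeled samples are drawn from the whole study area p(x), which still
# contains floods at a known prevalence `prior` (hypothetical value below).

def labeled_probability(f, prior, n_pos, n_unl):
    """Probability that a sample at x carries the 'positive' label (s = 1).

    f is the true flood probability P(y = 1 | x). Under the sampling design
    above, the odds of being labeled are (n_pos / (n_unl * prior)) * f.
    """
    odds = (n_pos / (n_unl * prior)) * f
    return odds / (1.0 + odds)


def recover_posterior(g, prior, n_pos, n_unl):
    """Invert the identity to get P(y = 1 | x) from the naive score g.

    Requires g < 1. The result is clipped to [0, 1] because estimated
    scores or a misspecified prior can push it outside that range.
    """
    f = prior * (n_unl / n_pos) * g / (1.0 - g)
    return min(max(f, 0.0), 1.0)


if __name__ == "__main__":
    prior, n_pos, n_unl = 0.2, 500, 1000  # hypothetical settings
    for f_true in (0.1, 0.5, 0.8):
        g = labeled_probability(f_true, prior, n_pos, n_unl)
        f_back = recover_posterior(g, prior, n_pos, n_unl)
        print(f"true={f_true:.2f}  naive={g:.3f}  corrected={f_back:.3f}")
```

The naive score generally differs from the true flood probability, so it cannot be read directly as a susceptibility value; the correction shown here depends on knowing the prevalence, which is an assumption of this sketch rather than a claim about how PBLC works.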

Suggested Citation

  • Wenkai Li & Yuanchi Liu & Ziyue Liu & Zhen Gao & Huabing Huang & Weijun Huang, 2022. "A Positive-Unlabeled Learning Algorithm for Urban Flood Susceptibility Modeling," Land, MDPI, vol. 11(11), pages 1-17, November.
  • Handle: RePEc:gam:jlands:v:11:y:2022:i:11:p:1971-:d:962994

    Download full text from publisher

    File URL: https://www.mdpi.com/2073-445X/11/11/1971/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2073-445X/11/11/1971/
    Download Restriction: no
