
Column generation-based prototype learning for optimizing area under the receiver operating characteristic curve

Author

Listed:
  • Ozcan, Erhan C.
  • Görgülü, Berk
  • Baydogan, Mustafa G.

Abstract

Traditional classification algorithms focus on maximizing classification accuracy, which can lead to poor performance in practice by forcing classifiers to overfit to the majority class. To overcome this issue, various approaches optimize alternative loss functions such as the Area Under the Curve (AUC). AUC is a Receiver Operating Characteristic (ROC) metric that has been widely used to measure classification performance, especially when there are class imbalances. In this work, we propose a column generation (CG)-based algorithm called Ranking-CG, which learns a model, similar to the popular Ranking SVM, through approximate maximization of the AUC. Unlike the Ranking SVM, our algorithm uses a column generation method that iteratively adds features to control model complexity, effectively working as an internal feature selection procedure. Our experiments show that column generation can be an important tool to prevent overfitting. We extend Ranking-CG by proposing a prototype generation method, denoted Ranking-CG Prototype, that constructs reference points by solving a non-linear optimization problem. Based on extensive experiments conducted on 74 binary classification problems, Ranking-CG Prototype yields the best average test AUC among all competing methods while using significantly fewer features than the other benchmarks.
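
As a rough illustration of why AUC is preferred over accuracy under class imbalance, the sketch below computes the empirical AUC as the fraction of (positive, negative) pairs that a scoring function orders correctly, together with a pairwise hinge loss of the kind commonly used as a surrogate in ranking methods. It is not the authors' Ranking-CG formulation: the function names, the toy data, and the choice of a unit-margin hinge are assumptions made for this example only.

```python
from itertools import product


def split_by_class(scores, labels):
    """Separate scores of positive (label 1) and negative (label 0) examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def pairwise_auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos, neg = split_by_class(scores, labels)
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def pairwise_hinge(scores, labels):
    """Mean hinge loss over pairs, max(0, 1 - (s_pos - s_neg)).

    Illustrative convex surrogate for 1 - AUC (an assumption of this sketch,
    not a formulation taken from the paper).
    """
    pos, neg = split_by_class(scores, labels)
    total = sum(max(0.0, 1.0 - (p - n)) for p, n in product(pos, neg))
    return total / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Accuracy of the rule 'predict 1 if score >= threshold'."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical imbalanced sample: 8 negatives followed by 2 positives.
labels = [0] * 8 + [1] * 2
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.4, 0.35, 0.2, 0.45]

print(accuracy(scores, labels))      # every example is predicted negative
print(pairwise_auc(scores, labels))  # reflects how well positives are ranked
```

On this toy sample the thresholded classifier never predicts the minority class yet still scores 80% accuracy, whereas the pairwise AUC depends only on how the two positives are ranked against the eight negatives.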

Suggested Citation

  • Ozcan, Erhan C. & Görgülü, Berk & Baydogan, Mustafa G., 2024. "Column generation-based prototype learning for optimizing area under the receiver operating characteristic curve," European Journal of Operational Research, Elsevier, vol. 314(1), pages 297-307.
  • Handle: RePEc:eee:ejores:v:314:y:2024:i:1:p:297-307
    DOI: 10.1016/j.ejor.2023.11.016

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221723008573
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Jiménez-Cordero, Asunción & Morales, Juan Miguel & Pineda, Salvador, 2021. "A novel embedded min-max approach for feature selection in nonlinear Support Vector Machine classification," European Journal of Operational Research, Elsevier, vol. 293(1), pages 24-35.
    2. Aytug, Haldun, 2015. "Feature selection for support vector machines using Generalized Benders Decomposition," European Journal of Operational Research, Elsevier, vol. 244(1), pages 210-218.
    3. Fu, Saiji & Tian, Yingjie & Tang, Long, 2023. "Robust regression under the general framework of bounded loss functions," European Journal of Operational Research, Elsevier, vol. 310(3), pages 1325-1339.
    4. Patrick J. Heagerty & Thomas Lumley & Margaret S. Pepe, 2000. "Time-Dependent ROC Curves for Censored Survival Data and a Diagnostic Marker," Biometrics, The International Biometric Society, vol. 56(2), pages 337-344, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chin-Tsang Chiang & Shr-Yan Huang, 2009. "Estimation for the Optimal Combination of Markers without Modeling the Censoring Distribution," Biometrics, The International Biometric Society, vol. 65(1), pages 152-158, March.
    2. Te-Ling Ma & Tsung-Hui Hu & Chao-Hung Hung & Jing-Houng Wang & Sheng-Nan Lu & Chien-Hung Chen, 2019. "Incidence and predictors of retreatment in chronic hepatitis B patients after discontinuation of entecavir or tenofovir treatment," PLOS ONE, Public Library of Science, vol. 14(10), pages 1-16, October.
    3. Yingye Zheng & Patrick Heagerty, 2004. "Semiparametric Estimation of Time-Dependent: ROC Curves for Longitudinal Marker Data," UW Biostatistics Working Paper Series 1052, Berkeley Electronic Press.
    4. Shannon M Lynch & Elizabeth Handorf & Kristen A Sorice & Elizabeth Blackman & Lisa Bealin & Veda N Giri & Elias Obeid & Camille Ragin & Mary Daly, 2020. "The effect of neighborhood social environment on prostate cancer development in black and white men at high risk for prostate cancer," PLOS ONE, Public Library of Science, vol. 15(8), pages 1-18, August.
    5. Weining Shen & Jing Ning & Ying Yuan, 2015. "A direct method to evaluate the time-dependent predictive accuracy for biomarkers," Biometrics, The International Biometric Society, vol. 71(2), pages 439-449, June.
    6. Si Cheng & Kathleen F Kerr & Heather Thiessen-Philbrook & Steven G Coca & Chirag R Parikh, 2020. "BioPETsurv: Methodology and open source software to evaluate biomarkers for prognostic enrichment of time-to-event clinical trials," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-11, September.
    7. Yuxun Wang & Liang Fang & Chao Liu & Lanxin Wang & Huimei Xu, 2023. "The Influential Factors of the Habitat Quality of the Red-crowned Crane: A Case Study of Yancheng, Jiangsu Province, China," Land, MDPI, vol. 12(6), pages 1-20, June.
    8. Lori E. Dodd, 2010. "ROC Curves for Continuous Data by KRZANOWSKI, W. J. and HAND, D. J," Biometrics, The International Biometric Society, vol. 66(2), pages 657-658, June.
    9. Yingye Zheng & Tianxi Cai & Ziding Feng, 2006. "Application of the Time-Dependent ROC Curves for Prognostic Accuracy with Multiple Biomarkers," Biometrics, The International Biometric Society, vol. 62(1), pages 279-287, March.
    10. C. Jason Liang & Patrick J. Heagerty, 2017. "Rejoinder to discussions on: A risk-based measure of time-varying prognostic discrimination for survival models," Biometrics, The International Biometric Society, vol. 73(3), pages 745-748, September.
    11. C. Jason Liang & Patrick J. Heagerty, 2017. "A risk-based measure of time-varying prognostic discrimination for survival models," Biometrics, The International Biometric Society, vol. 73(3), pages 725-734, September.
    12. Foucher Yohann & Danger Richard, 2012. "Time Dependent ROC Curves for the Estimation of True Prognostic Capacity of Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(6), pages 1-22, November.
    13. Yingye Zheng & Patrick J. Heagerty, 2007. "Prospective Accuracy for Longitudinal Markers," Biometrics, The International Biometric Society, vol. 63(2), pages 332-341, June.
    14. Jie Xiong & Zhitong Bing & Yanlin Su & Defeng Deng & Xiaoning Peng, 2014. "An Integrated mRNA and microRNA Expression Signature for Glioblastoma Multiforme Prognosis," PLOS ONE, Public Library of Science, vol. 9(5), pages 1-8, May.
    15. Yingye Zheng & Patrick J. Heagerty, 2005. "Partly Conditional Survival Models for Longitudinal Data," Biometrics, The International Biometric Society, vol. 61(2), pages 379-391, June.
    16. Engler David & Li Yi, 2009. "Survival Analysis with High-Dimensional Covariates: An Application in Microarray Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-24, February.
    17. Patrick J. Heagerty & Yingye Zheng, 2005. "Survival Model Predictive Accuracy and ROC Curves," Biometrics, The International Biometric Society, vol. 61(1), pages 92-105, March.
    18. Tianxi Cai & Thomas A Gerds & Yingye Zheng & Jinbo Chen, 2011. "Robust Prediction of t-Year Survival with Data from Multiple Studies," Biometrics, The International Biometric Society, vol. 67(2), pages 436-444, June.
    19. Mogensen, Ulla B. & Ishwaran, Hemant & Gerds, Thomas A., 2012. "Evaluating Random Forests for Survival Analysis Using Prediction Error Curves," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 50(i11).
    20. Yang, Dongchuan & Guo, Ju-e & Li, Yanzhao & Sun, Shaolong & Wang, Shouyang, 2023. "Short-term load forecasting with an improved dynamic decomposition-reconstruction-ensemble approach," Energy, Elsevier, vol. 263(PA).
