
Combining Predictors for Classification Using the Area Under the ROC Curve

Authors

  • Margaret Pepe (University of Washington)
  • Tianxi Cai (Harvard University)
  • Zheng Zhang (University of Washington)

Abstract

We compare simple logistic regression with an alternative robust procedure for constructing linear predictors to be used for the two-state classification task. Theoretical advantages of the robust procedure over logistic regression are: (i) although it assumes a generalized linear model for the dichotomous outcome variable, it does not require specification of the link function; (ii) it accommodates case-control designs even when the model is not logistic; and (iii) it yields sensible results even when the generalized linear model assumption fails to hold. Surprisingly, we find that the linear predictor derived from the logistic regression likelihood is very robust in the following sense: it yields prediction performance comparable with our theoretically robust procedure when the logistic model fails and even when the form of the linear predictor is incorrectly specified. This raises some intriguing questions about using logistic regression for prediction. Some preliminary explanations are given that draw from recent literature.

Next we suggest that it may not be necessary to fit the linear function over the whole predictor space to achieve adequate classification properties. Procedures that restrict modeling to a subspace defined by minimally acceptable false-positive and false-negative error rates are suggested. We find that relaxing linearity assumptions to a subspace confers further robustness and that the logistic likelihood calculated over the restricted region provides a robust objective function for determining classification rules.

Overall, our new procedure performs well but not substantially better than logistic regression. Further work is warranted to clarify the relationship between the two conceptually distinct procedures, and may provide a new conceptual basis for using the logistic likelihood to combine predictors.

Note: This Working Paper is a revised version of the previously posted "Robust Binary Regression for Optimally Combining Predictors."
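As a rough illustration of the idea named in the title, the sketch below picks a linear combination of two predictors by maximizing the empirical area under the ROC curve (the Mann-Whitney statistic) rather than a likelihood. It is not the authors' implementation: the function names `empirical_auc` and `best_linear_combination`, the simulated data, and the plain grid search over directions are all assumptions made for illustration. Because the AUC is unchanged by positive rescaling of the score, only the direction of the coefficient vector matters, which is why the search runs over unit vectors.

```python
import numpy as np


def empirical_auc(scores_cases, scores_controls):
    """Mann-Whitney estimate of P(case score > control score); ties count 1/2."""
    diff = scores_cases[:, None] - scores_controls[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def best_linear_combination(x_cases, x_controls, n_angles=360):
    """Grid search over unit vectors (cos t, sin t) for two predictors.

    Hypothetical helper: returns the direction with the largest empirical
    AUC and that AUC. A simple stand-in for a proper optimizer.
    """
    best_auc, best_beta = -1.0, None
    for t in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        beta = np.array([np.cos(t), np.sin(t)])
        auc = empirical_auc(x_cases @ beta, x_controls @ beta)
        if auc > best_auc:
            best_auc, best_beta = auc, beta
    return best_beta, best_auc


# Simulated case-control data (assumed for illustration only).
rng = np.random.default_rng(0)
x_controls = rng.normal(size=(200, 2))
x_cases = rng.normal(size=(150, 2)) + np.array([1.0, 0.5])

beta_hat, auc_hat = best_linear_combination(x_cases, x_controls)
print("direction:", beta_hat, "empirical AUC:", auc_hat)
```

The empirical AUC is a step function of the coefficients, so gradient-based optimizers do not apply directly; a grid search is workable for two predictors but scales poorly beyond that.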

Suggested Citation

  • Margaret Pepe & Tianxi Cai & Zheng Zhang, 2004. "Combining Predictors for Classification Using the Area Under the ROC Curve," UW Biostatistics Working Paper Series 1021, Berkeley Electronic Press.
  • Handle: RePEc:bep:uwabio:1021
    Note: oai:bepress.com:uwbiostat-1021

    Download full text from publisher

    File URL: http://www.bepress.com/cgi/viewcontent.cgi?article=1021&context=uwbiostat
    Download Restriction: no

    References listed on IDEAS

    1. J. B. Copas, 2002. "Overestimation of the receiver operating characteristic curve for logistic regression," Biometrika, Biometrika Trust, vol. 89(2), pages 315-331, June.
    2. Stuart G. Baker, 2000. "Identifying Combinations of Cancer Markers for Further Study as Triggers of Early Intervention," Biometrics, The International Biometric Society, vol. 56(4), pages 1082-1087, December.
    3. Martin W. McIntosh & Margaret Sullivan Pepe, 2002. "Combining Several Screening Tests: Optimality of the Risk Score," Biometrics, The International Biometric Society, vol. 58(3), pages 657-664, September.

    Citations

    Cited by:

    1. Wang, Hansheng, 2007. "A note on iterative marginal optimization: a simple algorithm for maximum rank correlation estimation," Computational Statistics & Data Analysis, Elsevier, vol. 51(6), pages 2803-2812, March.
    2. Shuangge Ma & Michael R. Kosorok & Jason P. Fine, 2006. "Additive Risk Models for Survival Data with High-Dimensional Covariates," Biometrics, The International Biometric Society, vol. 62(1), pages 202-210, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Margaret Sullivan Pepe & Tianxi Cai & Gary Longton, 2006. "Combining Predictors for Classification Using the Area under the Receiver Operating Characteristic Curve," Biometrics, The International Biometric Society, vol. 62(1), pages 221-229, March.
    2. Drehmann, Mathias & Juselius, Mikael, 2014. "Evaluating early warning indicators of banking crises: Satisfying policy requirements," International Journal of Forecasting, Elsevier, vol. 30(3), pages 759-780.
    3. Debashis Ghosh, 2004. "Semiparametric methods for the binormal model with multiple biomarkers," The University of Michigan Department of Biostatistics Working Paper Series 1046, Berkeley Electronic Press.
    4. Yuxin Zhu & Mei‐Cheng Wang, 2022. "Obtaining optimal cutoff values for tree classifiers using multiple biomarkers," Biometrics, The International Biometric Society, vol. 78(1), pages 128-140, March.
    5. Carol Y. Lin & Lance A. Waller & Robert H. Lyles, 2012. "The likelihood approach for the comparison of medical diagnostic system with multiple binary tests," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(7), pages 1437-1454, December.
    6. Debashis Ghosh & Moulinath Banerjee & Pinaki Biswas, 2004. "Binary isotonic regression procedures, with application to cancer biomarkers," The University of Michigan Department of Biostatistics Working Paper Series 1037, Berkeley Electronic Press.
    7. Yue Wang & Jeremy Taylor, 2004. "Monotone Constrained Tensor-product B-spline with application to screening studies," The University of Michigan Department of Biostatistics Working Paper Series 1022, Berkeley Electronic Press.
    8. Pablo Martínez-Camblor & Sonia Pérez-Fernández & Susana Díaz-Coto, 2021. "Optimal classification scores based on multivariate marker transformations," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(4), pages 581-599, December.
    9. Yingye Zheng & Tianxi Cai & Ziding Feng, 2006. "Application of the Time-Dependent ROC Curves for Prognostic Accuracy with Multiple Biomarkers," Biometrics, The International Biometric Society, vol. 62(1), pages 279-287, March.
    10. Debashis Ghosh, 2004. "Semiparametic models and estimation procedures for binormal ROC curves with multiple biomarkers," The University of Michigan Department of Biostatistics Working Paper Series 1038, Berkeley Electronic Press.
    11. Debashis Ghosh & Arul Chinnaiyan, 2004. "Classification and selection of biomarkers in genomic data using LASSO," The University of Michigan Department of Biostatistics Working Paper Series 1041, Berkeley Electronic Press.
    12. Yanqing Wang & Yingqi Zhao & Yingye Zheng, 2022. "Targeted Search for Individualized Clinical Decision Rules to Optimize Clinical Outcomes," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 14(3), pages 564-581, December.
    13. Yanqing Wang & Ying‐Qi Zhao & Yingye Zheng, 2020. "Learning‐based biomarker‐assisted rules for optimized clinical benefit under a risk constraint," Biometrics, The International Biometric Society, vol. 76(3), pages 853-862, September.
    14. Binbing Yu, 2009. "Approximating the risk score for disease diagnosis using MARS," Journal of Applied Statistics, Taylor & Francis Journals, vol. 36(7), pages 769-778.
    15. Qing Lu & Nancy Obuchowski & Sungho Won & Xiaofeng Zhu & Robert C. Elston, 2010. "Using the Optimal Robust Receiver Operating Characteristic (ROC) Curve for Predictive Genetic Tests," Biometrics, The International Biometric Society, vol. 66(2), pages 586-593, June.
    16. Mei-Cheng Wang & Shanshan Li, 2012. "Bivariate Marker Measurements and ROC Analysis," Biometrics, The International Biometric Society, vol. 68(4), pages 1207-1218, December.
    17. Daniel J. Luckett & Eric B. Laber & Samer S. El‐Kamary & Cheng Fan & Ravi Jhaveri & Charles M. Perou & Fatma M. Shebl & Michael R. Kosorok, 2021. "Receiver operating characteristic curves and confidence bands for support vector machines," Biometrics, The International Biometric Society, vol. 77(4), pages 1422-1430, December.
    18. Ming-Yueh Huang & Chin-Tsang Chiang, 2017. "Estimation and Inference Procedures for Semiparametric Distribution Models with Varying Linear-Index," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 44(2), pages 396-424, June.
    19. Chin-Tsang Chiang & Shr-Yan Huang, 2009. "Estimation for the Optimal Combination of Markers without Modeling the Censoring Distribution," Biometrics, The International Biometric Society, vol. 65(1), pages 152-158, March.
    20. Jin, Hua & Lu, Ying, 2009. "Permutation test for non-inferiority of the linear to the optimal combination of multiple tests," Statistics & Probability Letters, Elsevier, vol. 79(5), pages 664-669, March.
