IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0204897.html
   My bibliography  Save this article

Comparison of the modified unbounded penalty and the LASSO to select predictive genes of response to chemotherapy in breast cancer

Author

Listed:
  • Olivier Collignon
  • Jeongseop Han
  • Hyungmi An
  • Seungyoung Oh
  • Youngjo Lee

Abstract

Covariate selection is a fundamental step when building sparse prediction models in order to avoid overfitting and to gain a better interpretation of the classifier without losing its predictive accuracy. In practice the LASSO regression of Tibshirani, which penalizes the likelihood of the model by the L1 norm of the regression coefficients, has become the gold-standard to reach these objectives. Recently Lee and Oh developed a novel random-effect covariate selection method called the modified unbounded penalty (MUB) regression, whose penalization function can equal minus infinity at 0 in order to produce very sparse models. We sought to compare the predictive accuracy and the number of covariates selected by these two methods in several high-dimensional datasets, consisting in genes expressions measured to predict response to chemotherapy in breast cancer patients. These comparisons were performed by building the Receiver Operating Characteristics (ROC) curves of the classifiers obtained with the selected genes and by comparing their area under the ROC curve (AUC) corrected for optimism using several variants of bootstrap internal validation and cross-validation. We found consistently in all datasets that the MUB penalization selected a remarkably smaller number of covariates than the LASSO while offering a similar—and encouraging—predictive accuracy. The models selected by the MUB were actually nested in the ones obtained with the LASSO. Similar findings were observed when comparing these results to those obtained in their first publication by other authors or when using the area under the Precision-Recall curve (AUCPR) as another measure of predictive performance. In conclusion, the MUB penalization seems therefore to be one of the best options when sparsity is required in high-dimension. Further investigation in other datasets is however required to validate these findings.

Suggested Citation

  • Olivier Collignon & Jeongseop Han & Hyungmi An & Seungyoung Oh & Youngjo Lee, 2018. "Comparison of the modified unbounded penalty and the LASSO to select predictive genes of response to chemotherapy in breast cancer," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-15, October.
  • Handle: RePEc:plo:pone00:0204897
    DOI: 10.1371/journal.pone.0204897
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0204897
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0204897&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0204897?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    2. Park, Trevor & Casella, George, 2008. "The Bayesian Lasso," Journal of the American Statistical Association, American Statistical Association, vol. 103, pages 681-686, June.
    3. Kwon, Sunghoon & Oh, Seungyoung & Lee, Youngjo, 2016. "The use of random-effect models for high-dimensional variable selection problems," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 401-412.
    4. Lee, Youngjo & Oh, Hee-Seok, 2014. "A new sparse variable selection via random-effect model," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 89-99.
    5. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    6. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Mehmet Güney Celbiş & Pui-Hang Wong & Karima Kourtit & Peter Nijkamp, 2021. "Innovativeness, Work Flexibility, and Place Characteristics: A Spatial Econometric and Machine Learning Approach," Sustainability, MDPI, vol. 13(23), pages 1-29, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shutes, Karl & Adcock, Chris, 2013. "Regularized Extended Skew-Normal Regression," MPRA Paper 58445, University Library of Munich, Germany, revised 09 Sep 2014.
    2. Yu-Zhu Tian & Man-Lai Tang & Wai-Sum Chan & Mao-Zai Tian, 2021. "Bayesian bridge-randomized penalized quantile regression for ordinal longitudinal data, with application to firm’s bond ratings," Computational Statistics, Springer, vol. 36(2), pages 1289-1319, June.
    3. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    4. Philip Kostov & Thankom Arun & Samuel Annim, 2014. "Financial Services to the Unbanked: the case of the Mzansi intervention in South Africa," Contemporary Economics, University of Economics and Human Sciences in Warsaw., vol. 8(2), June.
    5. Ruggieri, Eric & Lawrence, Charles E., 2012. "On efficient calculations for Bayesian variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1319-1332.
    6. Mogliani, Matteo & Simoni, Anna, 2021. "Bayesian MIDAS penalized regressions: Estimation, selection, and prediction," Journal of Econometrics, Elsevier, vol. 222(1), pages 833-860.
    7. Young Joo Yoon & Cheolwoo Park & Erik Hofmeister & Sangwook Kang, 2012. "Group variable selection in cardiopulmonary cerebral resuscitation data for veterinary patients," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(7), pages 1605-1621, January.
    8. Posch, Konstantin & Arbeiter, Maximilian & Pilz, Juergen, 2020. "A novel Bayesian approach for variable selection in linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    9. Tanin Sirimongkolkasem & Reza Drikvandi, 2019. "On Regularisation Methods for Analysis of High Dimensional Data," Annals of Data Science, Springer, vol. 6(4), pages 737-763, December.
    10. van Erp, Sara & Oberski, Daniel L. & Mulder, Joris, 2018. "Shrinkage priors for Bayesian penalized regression," OSF Preprints cg8fq, Center for Open Science.
    11. Kshitij Khare & Malay Ghosh, 2022. "MCMC Convergence for Global-Local Shrinkage Priors," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 20(1), pages 211-234, September.
    12. Tian, Yuzhu & Song, Xinyuan, 2020. "Bayesian bridge-randomized penalized quantile regression," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    13. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    14. Sweata Sen & Damitri Kundu & Kiranmoy Das, 2023. "Variable selection for categorical response: a comparative study," Computational Statistics, Springer, vol. 38(2), pages 809-826, June.
    15. Li, Jiahan & Chen, Weiye, 2014. "Forecasting macroeconomic time series: LASSO-based approaches and their forecast combinations with dynamic factor models," International Journal of Forecasting, Elsevier, vol. 30(4), pages 996-1015.
    16. Mike K. P. So & Wing Ki Liu & Amanda M. Y. Chu, 2018. "Bayesian Shrinkage Estimation Of Time-Varying Covariance Matrices In Financial Time Series," Advances in Decision Sciences, Asia University, Taiwan, vol. 22(1), pages 369-404, December.
    17. Shutes, Karl & Adcock, Chris, 2013. "Regularized Skew-Normal Regression," MPRA Paper 52217, University Library of Munich, Germany, revised 11 Dec 2013.
    18. Banerjee, Sayantan, 2022. "Horseshoe shrinkage methods for Bayesian fusion estimation," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    19. Matthew Gentzkow & Bryan T. Kelly & Matt Taddy, 2017. "Text as Data," NBER Working Papers 23276, National Bureau of Economic Research, Inc.
    20. Cox Lwaka Tamba & Yuan-Li Ni & Yuan-Ming Zhang, 2017. "Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies," PLOS Computational Biology, Public Library of Science, vol. 13(1), pages 1-20, January.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0204897. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.