Printed from https://ideas.repec.org/a/gam/jmathe/v9y2021i3p222-d485789.html

Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer

Author

Listed:
  • Juan C. Laria

    (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain)

  • M. Carmen Aguilera-Morillo

    (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
    Department of Applied Statistics and Operational Research and Quality, Universitat Politècnica de València, 46022 Valencia, Spain)

  • Enrique Álvarez

    (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain)

  • Rosa E. Lillo

    (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
    Department of Statistics, University Carlos III of Madrid, 28903 Getafe, Spain)

  • Sara López-Taruella

    (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
    CiberOnc—Centro de Investigación Biomédica en Red, 28029 Madrid, Spain
    Universidad Complutense de Madrid, 28040 Madrid, Spain
    GEICAM—Grupo Español de Investigación en Cáncer de Mama, 28703 San Sebastián de los Reyes, Spain)

  • María del Monte-Millán

    (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
    CiberOnc—Centro de Investigación Biomédica en Red, 28029 Madrid, Spain)

  • Antonio C. Picornell

    (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain)

  • Miguel Martín

    (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
    CiberOnc—Centro de Investigación Biomédica en Red, 28029 Madrid, Spain
    Universidad Complutense de Madrid, 28040 Madrid, Spain
    GEICAM—Grupo Español de Investigación en Cáncer de Mama, 28703 San Sebastián de los Reyes, Spain)

  • Juan Romo

    (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
    Department of Statistics, University Carlos III of Madrid, 28903 Getafe, Spain)

Abstract

Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced statistics and can use an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.

Suggested Citation

  • Juan C. Laria & M. Carmen Aguilera-Morillo & Enrique Álvarez & Rosa E. Lillo & Sara López-Taruella & María del Monte-Millán & Antonio C. Picornell & Miguel Martín & Juan Romo, 2021. "Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer," Mathematics, MDPI, vol. 9(3), pages 1-14, January.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:3:p:222-:d:485789

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/3/222/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/3/222/
    Download Restriction: no

    References listed on IDEAS

    1. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    2. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    3. Ming Yuan & Yi Lin, 2006. "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(1), pages 49-67, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    2. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    3. Yize Zhao & Matthias Chung & Brent A. Johnson & Carlos S. Moreno & Qi Long, 2016. "Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1427-1439, October.
    4. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    5. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    6. Yu-Min Yen, 2010. "A Note on Sparse Minimum Variance Portfolios and Coordinate-Wise Descent Algorithms," Papers 1005.5082, arXiv.org, revised Sep 2013.
    7. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.
    8. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    9. Osamu Komori & Shinto Eguchi & John B. Copas, 2015. "Generalized t-statistic for two-group classification," Biometrics, The International Biometric Society, vol. 71(2), pages 404-416, June.
    10. Murat Genç & M. Revan Özkale, 2021. "Usage of the GO estimator in high dimensional linear models," Computational Statistics, Springer, vol. 36(1), pages 217-239, March.
    11. Victor Chernozhukov & Christian Hansen & Yuan Liao, 2015. "A lava attack on the recovery of sums of dense and sparse signals," CeMMAP working papers CWP56/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    12. Wang, Shixuan & Syntetos, Aris A. & Liu, Ying & Di Cairano-Gilfedder, Carla & Naim, Mohamed M., 2023. "Improving automotive garage operations by categorical forecasts using a large number of variables," European Journal of Operational Research, Elsevier, vol. 306(2), pages 893-908.
    13. Zhang, Tonglin, 2024. "Variables selection using L0 penalty," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
    14. Takumi Saegusa & Tianzhou Ma & Gang Li & Ying Qing Chen & Mei-Ling Ting Lee, 2020. "Variable Selection in Threshold Regression Model with Applications to HIV Drug Adherence Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(3), pages 376-398, December.
    15. Zeng, Yaohui & Yang, Tianbao & Breheny, Patrick, 2021. "Hybrid safe–strong rules for efficient optimization in lasso-type problems," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    16. Ruidi Chen & Ioannis Ch. Paschalidis, 2022. "Robust Grouped Variable Selection Using Distributionally Robust Optimization," Journal of Optimization Theory and Applications, Springer, vol. 194(3), pages 1042-1071, September.
    17. Korobilis, Dimitris, 2013. "Hierarchical shrinkage priors for dynamic regressions with many predictors," International Journal of Forecasting, Elsevier, vol. 29(1), pages 43-59.
    18. Yoshiki Nakajima & Naoya Sueishi, 2022. "Forecasting the Japanese macroeconomy using high-dimensional data," The Japanese Economic Review, Springer, vol. 73(2), pages 299-324, April.
    19. Sophie Lambert-Lacroix & Laurent Zwald, 2016. "The adaptive BerHu penalty in robust regression," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 28(3), pages 487-514, September.
    20. Huicong Yu & Jiaqi Wu & Weiping Zhang, 2024. "Simultaneous subgroup identification and variable selection for high dimensional data," Computational Statistics, Springer, vol. 39(6), pages 3181-3205, September.
