
Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer

Authors listed:
  • Juan C. Laria

    (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain)

  • M. Carmen Aguilera-Morillo

    (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
    Department of Applied Statistics and Operational Research and Quality, Universitat Politècnica de València, 46022 Valencia, Spain)

  • Enrique Álvarez

    (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain)

  • Rosa E. Lillo

    (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
    Department of Statistics, University Carlos III of Madrid, 28903 Getafe, Spain)

  • Sara López-Taruella

    (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
    CiberOnc—Centro de Investigación Biomédica en Red, 28029 Madrid, Spain
    Universidad Complutense de Madrid, 28040 Madrid, Spain
    GEICAM—Grupo Español de Investigación en Cáncer de Mama, 28703 San Sebastián de los Reyes, Spain)

  • María del Monte-Millán

    (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
    CiberOnc—Centro de Investigación Biomédica en Red, 28029 Madrid, Spain)

  • Antonio C. Picornell

    (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain)

  • Miguel Martín

    (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
    CiberOnc—Centro de Investigación Biomédica en Red, 28029 Madrid, Spain
    Universidad Complutense de Madrid, 28040 Madrid, Spain
    GEICAM—Grupo Español de Investigación en Cáncer de Mama, 28703 San Sebastián de los Reyes, Spain)

  • Juan Romo

    (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
    Department of Statistics, University Carlos III of Madrid, 28903 Getafe, Spain)

Abstract

Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced statistics and can use an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.

Suggested Citation

  • Juan C. Laria & M. Carmen Aguilera-Morillo & Enrique Álvarez & Rosa E. Lillo & Sara López-Taruella & María del Monte-Millán & Antonio C. Picornell & Miguel Martín & Juan Romo, 2021. "Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer," Mathematics, MDPI, vol. 9(3), pages 1-14, January.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:3:p:222-:d:485789

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/3/222/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/3/222/
    Download Restriction: no

    References listed on IDEAS

    1. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    2. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    3. Ming Yuan & Yi Lin, 2006. "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(1), pages 49-67, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    2. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    3. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    4. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    5. Yu-Min Yen, 2010. "A Note on Sparse Minimum Variance Portfolios and Coordinate-Wise Descent Algorithms," Papers 1005.5082, arXiv.org, revised Sep 2013.
    6. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.
    7. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    8. Osamu Komori & Shinto Eguchi & John B. Copas, 2015. "Generalized t-statistic for two-group classification," Biometrics, The International Biometric Society, vol. 71(2), pages 404-416, June.
    9. Victor Chernozhukov & Christian Hansen & Yuan Liao, 2015. "A lava attack on the recovery of sums of dense and sparse signals," CeMMAP working papers CWP56/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    10. Zhang, Tonglin, 2024. "Variables selection using L0 penalty," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
    11. Takumi Saegusa & Tianzhou Ma & Gang Li & Ying Qing Chen & Mei-Ling Ting Lee, 2020. "Variable Selection in Threshold Regression Model with Applications to HIV Drug Adherence Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(3), pages 376-398, December.
    12. Ruidi Chen & Ioannis Ch. Paschalidis, 2022. "Robust Grouped Variable Selection Using Distributionally Robust Optimization," Journal of Optimization Theory and Applications, Springer, vol. 194(3), pages 1042-1071, September.
    13. Sophie Lambert-Lacroix & Laurent Zwald, 2016. "The adaptive BerHu penalty in robust regression," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 28(3), pages 487-514, September.
    14. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
    15. Qingliang Fan & Yaqian Wu, 2020. "Endogenous Treatment Effect Estimation with some Invalid and Irrelevant Instruments," Papers 2006.14998, arXiv.org.
    16. Matteo Barigozzi & Marc Hallin, 2017. "A network analysis of the volatility of high dimensional financial series," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 66(3), pages 581-605, April.
    17. Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
    18. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    19. David Degras, 2021. "Sparse group fused lasso for model segmentation: a hybrid approach," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(3), pages 625-671, September.
    20. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.