
Machine Learning at the Service of Survival Analysis: Predictions Using Time-to-Event Decomposition and Classification Applied to a Decrease of Blood Antibodies against COVID-19

Author

Listed:
  • Lubomír Štěpánek

    (Department of Statistics and Probability, Faculty of Informatics and Statistics, Prague University of Economics and Business, W. Churchill’s Square 1938/4, 130 67 Prague, Czech Republic)

  • Filip Habarta

    (Department of Statistics and Probability, Faculty of Informatics and Statistics, Prague University of Economics and Business, W. Churchill’s Square 1938/4, 130 67 Prague, Czech Republic)

  • Ivana Malá

    (Department of Statistics and Probability, Faculty of Informatics and Statistics, Prague University of Economics and Business, W. Churchill’s Square 1938/4, 130 67 Prague, Czech Republic)

  • Ladislav Štěpánek

    (Department of Occupational Medicine, University Hospital Olomouc and Faculty of Medicine and Dentistry, Palacký University Olomouc, I. P. Pavlova 185/6, 779 00 Olomouc, Czech Republic)

  • Marie Nakládalová

    (Department of Occupational Medicine, University Hospital Olomouc and Faculty of Medicine and Dentistry, Palacký University Olomouc, I. P. Pavlova 185/6, 779 00 Olomouc, Czech Republic)

  • Alena Boriková

    (Department of Occupational Medicine, University Hospital Olomouc and Faculty of Medicine and Dentistry, Palacký University Olomouc, I. P. Pavlova 185/6, 779 00 Olomouc, Czech Republic)

  • Luboš Marek

    (Department of Statistics and Probability, Faculty of Informatics and Statistics, Prague University of Economics and Business, W. Churchill’s Square 1938/4, 130 67 Prague, Czech Republic)

Abstract

The Cox proportional hazard model may predict whether an individual belonging to a given group would likely register an event of interest at a given time. However, the Cox model is limited by relatively strict statistical assumptions. In this study, we propose decomposing the time-to-event variable into “time” and “event” components and using the latter as a target variable for various machine-learning classification algorithms, which are almost assumption-free, unlike the Cox model. The time component is continuous and serves as one of the covariates, i.e., input variables, for classification algorithms such as logistic regression, naïve Bayes classifiers, decision trees, random forests, and artificial neural networks; the event component is binary and thus may be modeled using these algorithms. We apply the proposed method to predict whether IgG and IgM blood antibodies against COVID-19 (SARS-CoV-2) decrease below a laboratory cut-off for a given individual at a given time point. Using train-test splitting of the COVID-19 dataset (n = 663 individuals), models for the mentioned algorithms, including the Cox proportional hazard model, are built on the train subsets and tested on the test subsets. To make the performance evaluation more robust, the models’ predictive accuracies are estimated using 10-fold cross-validation on the split dataset. Even though the time-to-event variable decomposition might ignore the effect of individual data censoring, many algorithms show similar or even higher predictive accuracy compared to the traditional Cox proportional hazard model. In COVID-19 IgG decrease prediction, multivariate logistic regression (accuracy 0.811), support vector machines (accuracy 0.845), random forests (accuracy 0.836), and artificial neural networks (accuracy 0.806) outperform the Cox proportional hazard model (accuracy 0.796), while in COVID-19 IgM antibody decrease prediction, neither Cox regression nor the other algorithms perform well (the best accuracy is 0.627, for Cox regression). An accurate prediction of, mainly, COVID-19 IgG antibody decrease can help the healthcare system identify, with no need for extensive blood testing, individuals who, for instance, could postpone booster vaccination if a new COVID-19 variant emerges or who should be flagged as high risk due to low COVID-19 antibodies.
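
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the names `decompose`, `fit_logistic`, `cv_accuracy`, and `make_synthetic` are hypothetical, and a hand-written logistic regression stands in for the library classifiers the study compares. Each record is split so that the time component joins the covariates and the binary event component becomes the classification target; accuracy is then estimated by 10-fold cross-validation. As the abstract notes, treating censored records simply as "no event" may ignore the effect of censoring.

```python
import math
import random


def decompose(records):
    """Split (covariates, time, event) records into features and labels.

    The time component is appended to the covariates as one more input;
    the binary event component becomes the classification target.
    """
    X = [list(cov) + [t] for cov, t, _ in records]
    y = [e for _, _, e in records]
    return X, y


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Logistic regression fitted by plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, X):
    """Predict 1 when the fitted event probability is at least 0.5."""
    w, b = model
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def cv_accuracy(X, y, k=10, seed=0):
    """Mean held-out accuracy over k cross-validation folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in fold])
        scores.append(sum(p == y[i] for p, i in zip(preds, fold)) / len(fold))
    return sum(scores) / k


def make_synthetic(n=300, seed=1):
    """Hypothetical data: one covariate and a time; events get likelier with time."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = rng.uniform(0.0, 2.0)
        p = sigmoid(3.0 * (t - 1.0) + x)
        records.append(((x,), t, 1 if rng.random() < p else 0))
    return records


records = make_synthetic()
X, y = decompose(records)
accuracy = cv_accuracy(X, y, k=10)
print(f"10-fold CV accuracy: {accuracy:.3f}")
```

Any other binary classifier could replace `fit_logistic` and `predict` without changing `decompose`, since after the decomposition the problem is an ordinary classification task with time as one of the inputs.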

Suggested Citation

  • Lubomír Štěpánek & Filip Habarta & Ivana Malá & Ladislav Štěpánek & Marie Nakládalová & Alena Boriková & Luboš Marek, 2023. "Machine Learning at the Service of Survival Analysis: Predictions Using Time-to-Event Decomposition and Classification Applied to a Decrease of Blood Antibodies against COVID-19," Mathematics, MDPI, vol. 11(4), pages 1-27, February.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:4:p:819-:d:1059153

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/4/819/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/4/819/
    Download Restriction: no


    Citations



    Cited by:

    1. Ana Ezquerro & Brais Cancela & Ana López-Cheda, 2023. "On the Reliability of Machine Learning Models for Survival Analysis When Cure Is a Possibility," Mathematics, MDPI, vol. 11(19), pages 1-21, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peter A. F. Fraser‐Mackenzie & Tiejun Ma & Ming‐Chien Sung & Johnnie E. V. Johnson, 2019. "Let's Call it Quits: Break‐Even Effects in the Decision to Stop Taking Risks," Risk Analysis, John Wiley & Sons, vol. 39(7), pages 1560-1581, July.
    2. Ryo Kato & Takahiro Hoshino, 2020. "Semiparametric Bayesian multiple imputation for regression models with missing mixed continuous–discrete covariates," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(3), pages 803-825, June.
    3. Ryo Kato & Takahiro Hoshino, 2018. "Semiparametric Bayes Multiple Imputation for Regression Models with Missing Mixed Continuous-Discrete Covariates," Discussion Paper Series DP2018-15, Research Institute for Economics & Business Administration, Kobe University.
    4. Chen, Ming-Hui & Ibrahim, Joseph G. & Shao, Qi-Man, 2009. "Maximum likelihood inference for the Cox regression model with applications to missing covariates," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 2018-2030, October.
    5. Joseph Ibrahim & Geert Molenberghs, 2009. "Missing data methods in longitudinal studies: a review," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 18(1), pages 1-43, May.
    6. Mário de Castro & Ming‐Hui Chen & Yuanye Zhang & Anthony V. D'Amico, 2020. "A Bayesian multi‐risks survival (MRS) model in the presence of double censorings," Biometrics, The International Biometric Society, vol. 76(4), pages 1297-1309, December.
    7. Joseph Ibrahim & Geert Molenberghs, 2009. "Rejoinder on: Missing data methods in longitudinal studies: a review," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 18(1), pages 68-75, May.
