Printed from https://ideas.repec.org/a/eee/csdana/v93y2016icp373-387.html

Bayesian network data imputation with application to survival tree analysis

Authors

  • Rancoita, Paola M.V.
  • Zaffalon, Marco
  • Zucca, Emanuele
  • Bertoni, Francesco
  • de Campos, Cassio P.

Abstract

Retrospective clinical datasets are often characterized by a relatively small sample size and many missing data. In this case, a common way of handling the missingness consists in discarding patients with missing covariates from the analysis, which further reduces the sample size. Alternatively, if the mechanism that generated the missing data allows it, incomplete data can be imputed on the basis of the observed data, which avoids reducing the sample size and allows methods designed for complete data to be applied afterwards. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists in modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied to the completed dataset. The Bayesian network is learned directly from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is the EM formulation itself. On both simulated and real data, the proposed methodology usually outperformed several existing data imputation methods, and the resulting imputation improved the stratification estimated by the survival tree (especially compared with using surrogate splits).
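The abstract's core idea is to learn a Bayesian network from incomplete data with EM and then use it to fill in the missing entries. The following is a minimal Python sketch of that idea under strong, loudly stated assumptions: binary variables; a hypothetical toy network over made-up variables A, B, C whose structure is fixed by hand; parameter-only EM instead of the structural EM with an exact anytime maximization step described above; Laplace smoothing with a hypothetical parameter alpha; and imputation by the single most probable joint completion, computed by brute-force enumeration. It is an illustration of the general technique, not the authors' implementation, and it omits the survival-tree stage entirely.

```python
import itertools
from collections import defaultdict

# Toy setup (hypothetical, not from the paper): three binary variables
# with a hand-fixed DAG A -> B, A -> C, B -> C. No structure search is done.
VALUES = (0, 1)
PARENTS = {"A": (), "B": ("A",), "C": ("A", "B")}
ORDER = ["A", "B", "C"]  # a topological order of the DAG


def parent_configs(var):
    """All joint values of var's parents (a single empty tuple for a root)."""
    return itertools.product(VALUES, repeat=len(PARENTS[var]))


def joint_prob(cpts, full):
    """P(full assignment) as the product of CPT entries (BN factorization)."""
    p = 1.0
    for var in ORDER:
        pa = tuple(full[q] for q in PARENTS[var])
        p *= cpts[var][pa][full[var]]
    return p


def posterior_completions(cpts, row):
    """Exact P(completion | observed entries of row), by enumeration.

    Missing entries are None; the cost is exponential in their number.
    """
    missing = [v for v in ORDER if row[v] is None]
    comps = []
    for combo in itertools.product(VALUES, repeat=len(missing)):
        full = dict(row)  # copy, so the input row is never modified
        full.update(zip(missing, combo))
        comps.append(full)
    weights = [joint_prob(cpts, c) for c in comps]
    total = sum(weights)
    return [(c, w / total) for c, w in zip(comps, weights)]


def em(data, iterations=50, alpha=1.0):
    """Parameter-only EM for the fixed DAG (not structural EM), Laplace-smoothed."""
    cpts = {var: {pa: {v: 1.0 / len(VALUES) for v in VALUES}
                  for pa in parent_configs(var)} for var in ORDER}
    for _ in range(iterations):
        # E-step: expected counts N[var][parent config][value].
        counts = {var: defaultdict(lambda: defaultdict(float)) for var in ORDER}
        for row in data:
            for full, w in posterior_completions(cpts, row):
                for var in ORDER:
                    pa = tuple(full[q] for q in PARENTS[var])
                    counts[var][pa][full[var]] += w
        # M-step: smoothed relative frequencies from the expected counts.
        cpts = {}
        for var in ORDER:
            cpts[var] = {}
            for pa in parent_configs(var):
                denom = sum(counts[var][pa][v] for v in VALUES) + alpha * len(VALUES)
                cpts[var][pa] = {v: (counts[var][pa][v] + alpha) / denom for v in VALUES}
    return cpts


def impute(cpts, data):
    """Replace each row by its most probable joint completion (returns copies)."""
    return [max(posterior_completions(cpts, row), key=lambda cw: cw[1])[0]
            for row in data]


# Usage on a tiny made-up dataset (None marks a missing value).
data = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": None},
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": None, "C": 0},
    {"A": None, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 0},
]
cpts = em(data)
completed = impute(cpts, data)
for row in completed:
    print(row)
```

Exact enumeration keeps the E-step faithful but scales exponentially with the number of missing entries per row; a realistic tool would use proper Bayesian network inference instead. Imputing the single most probable completion, rather than drawing several completions, is only one possible design choice, and this sketch makes no claim that it matches the paper's.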

Suggested Citation

  • Rancoita, Paola M.V. & Zaffalon, Marco & Zucca, Emanuele & Bertoni, Francesco & de Campos, Cassio P., 2016. "Bayesian network data imputation with application to survival tree analysis," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 373-387.
  • Handle: RePEc:eee:csdana:v:93:y:2016:i:c:p:373-387
    DOI: 10.1016/j.csda.2014.12.008

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947314003569
    Download Restriction: Full text for ScienceDirect subscribers only.



    References listed on IDEAS

    1. Marco Di Zio & Mauro Scanu & Lucia Coppola & Orietta Luzi & Alessandra Ponti, 2004. "Bayesian networks for imputation," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 167(2), pages 309-322, May.
    2. Fan, Juanjuan & Nunn, Martha E. & Su, Xiaogang, 2009. "Multivariate exponential survival trees and their application to tooth prognosis," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1110-1121, February.
    3. Simon, Noah & Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2011. "Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 39(i05).
    4. Ciampi, Antonio & Thiffault, Johanne & Nakache, Jean-Pierre & Asselain, Bernard, 1986. "Stratification by stepwise regression, correspondence analysis and recursive partition: a comparison of three methods of analysis for survival data with covariates," Computational Statistics & Data Analysis, Elsevier, vol. 4(3), pages 185-204, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Soave, David & Lawless, Jerald F., 2023. "Regularized regression for two phase failure time studies," Computational Statistics & Data Analysis, Elsevier, vol. 182(C).
    2. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L0-Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    3. Simon Bussy & Mokhtar Z. Alaya & Anne‐Sophie Jannot & Agathe Guilloux, 2022. "Binacox: automatic cut‐point detection in high‐dimensional Cox model with applications in genetics," Biometrics, The International Biometric Society, vol. 78(4), pages 1414-1426, December.
    4. Biagini, Francesca & Groll, Andreas & Widenmann, Jan, 2013. "Intensity-based premium evaluation for unemployment insurance products," Insurance: Mathematics and Economics, Elsevier, vol. 53(1), pages 302-316.
    5. Benedicte Sjo Tislevoll & Monica Hellesøy & Oda Helen Eck Fagerholt & Stein-Erik Gullaksen & Aashish Srivastava & Even Birkeland & Dimitrios Kleftogiannis & Pilar Ayuda-Durán & Laure Piechaczyk & Dagi, 2023. "Early response evaluation by single cell signaling profiling in acute myeloid leukemia," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    6. Leandro C. Hermida & E. Michael Gertz & Eytan Ruppin, 2022. "Predicting cancer prognosis and drug response from the tumor microbiome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    7. Takumi Saegusa & Tianzhou Ma & Gang Li & Ying Qing Chen & Mei-Ling Ting Lee, 2020. "Variable Selection in Threshold Regression Model with Applications to HIV Drug Adherence Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(3), pages 376-398, December.
    8. Wenhua Liang & Jianhua Yao & Ailan Chen & Qingquan Lv & Mark Zanin & Jun Liu & SookSan Wong & Yimin Li & Jiatao Lu & Hengrui Liang & Guoqiang Chen & Haiyan Guo & Jun Guo & Rong Zhou & Limin Ou & Niyun, 2020. "Early triage of critically ill COVID-19 patients using deep learning," Nature Communications, Nature, vol. 11(1), pages 1-7, December.
    9. Tatyana Deryugina & Garth Heutel & Nolan H. Miller & David Molitor & Julian Reif, 2019. "The Mortality and Medical Costs of Air Pollution: Evidence from Changes in Wind Direction," American Economic Review, American Economic Association, vol. 109(12), pages 4178-4219, December.
    10. Andreas Groll & Gerhard Tutz, 2017. "Variable selection in discrete survival models including heterogeneity," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(2), pages 305-338, April.
    11. Kevin He & Yue Wang & Xiang Zhou & Han Xu & Can Huang, 2019. "An improved variable selection procedure for adaptive Lasso in high-dimensional survival analysis," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(3), pages 569-585, July.
    12. Yan Zhou & John McArdle, 2015. "Rationale and Applications of Survival Tree and Survival Ensemble Methods," Psychometrika, Springer;The Psychometric Society, vol. 80(3), pages 811-833, September.
    13. Besse, Philippe & Leconte, Eve & Walschaerts, Marie, 2012. "Stable variable selection for right censored data: comparison of methods," TSE Working Papers 12-486, Toulouse School of Economics (TSE).
    14. Lidia Ceriani & Chiara Gigliarano, 2020. "Multidimensional Well-Being: A Bayesian Networks Approach," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 152(1), pages 237-263, November.
    15. Das Ujjwal & Ebrahimi Nader, 2018. "A New Method For Covariate Selection In Cox Model," Statistics in Transition New Series, Polish Statistical Association, vol. 19(2), pages 297-314, June.
    16. Matthew F Dixon, 2017. "Sequence Classification of the Limit Order Book using Recurrent Neural Networks," Papers 1707.05642, arXiv.org.
    17. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    18. Thierry Chekouo & Francesco C. Stingo & James D. Doecke & Kim-Anh Do, 2017. "A Bayesian integrative approach for multi-platform genomic data: A kidney cancer case study," Biometrics, The International Biometric Society, vol. 73(2), pages 615-624, June.
    19. Jie Xiong & Zhitong Bing & Yanlin Su & Defeng Deng & Xiaoning Peng, 2014. "An Integrated mRNA and microRNA Expression Signature for Glioblastoma Multiforme Prognosis," PLOS ONE, Public Library of Science, vol. 9(5), pages 1-8, May.
    20. Liao Zhu & Robert A. Jarrow & Martin T. Wells, 2021. "Time-Invariance Coefficients Tests with the Adaptive Multi-Factor Model," Quarterly Journal of Finance (QJF), World Scientific Publishing Co. Pte. Ltd., vol. 11(04), pages 1-30, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:93:y:2016:i:c:p:373-387. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Catherine Liu. General contact details of the provider: http://www.elsevier.com/locate/csda

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.