
Risk prediction with imperfect survival outcome information from electronic health records

Author

Listed:
  • Jue Hou
  • Stephanie F. Chan
  • Xuan Wang
  • Tianxi Cai

Abstract

Readily available proxies for the time of disease onset, such as the time of the first diagnostic code, can lead to substantial risk prediction error when analyses are based on poor proxies. Because detailed documentation is lacking and manual annotation is labor intensive, it is often feasible to ascertain, for a small subset only, the current status of the disease at a follow‐up time rather than the exact onset time. In this paper, we aim to develop risk prediction models for the onset time that efficiently leverage both a small number of labels on the current status and a large number of unlabeled observations on imperfect proxies. Under a semiparametric transformation model for onset and a highly flexible measurement error model for the proxy onset time, we propose a semisupervised risk prediction method that efficiently combines information from the proxies and the limited labels. Starting from an initial estimator based solely on the labeled subset, we perform a one‐step correction with the full data, augmenting against a mean zero rank correlation score derived from the proxies. We establish the consistency and asymptotic normality of the proposed semisupervised estimator and provide a resampling procedure for interval estimation. Simulation studies demonstrate that the proposed estimator performs well in finite samples. We illustrate the proposed estimator by developing a genetic risk prediction model for obesity using data from the Mass General Brigham Healthcare Biobank.
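The augmentation idea in the abstract can be illustrated with a deliberately simplified toy, which is not the estimator proposed in the paper. The sketch below estimates a plain mean rather than transformation-model coefficients, and it replaces the rank correlation score with the difference between the labeled-sample and full-sample proxy means, which has mean zero when the labeled subset is a random sample of the full data. The function name `augmented_mean` and its arguments are hypothetical.

```python
from statistics import mean


def augmented_mean(y_labeled, proxy_labeled, proxy_all):
    """Toy semisupervised estimate of E[Y] (illustrative assumption, not the paper's method).

    y_labeled     : gold-standard outcomes, available only on the labeled subset
    proxy_labeled : proxy values for the same labeled observations
    proxy_all     : proxy values for the full data (labeled and unlabeled)
    """
    y_bar = mean(y_labeled)        # initial estimator from labeled data only
    s_bar = mean(proxy_labeled)    # proxy mean on the labeled subset

    # Mean-zero augmentation term: labeled proxy mean minus full-data proxy mean.
    augmentation = s_bar - mean(proxy_all)

    # Variance-minimising weight Cov(Y, S) / Var(S), estimated on the labeled set.
    cov = sum((y - y_bar) * (s - s_bar) for y, s in zip(y_labeled, proxy_labeled))
    var = sum((s - s_bar) ** 2 for s in proxy_labeled)
    if var == 0:
        raise ValueError("proxy has no variation in the labeled subset")
    weight = cov / var

    # One-step correction of the initial estimator.
    return y_bar - weight * augmentation


# Usage: the labeled subset under-represents large proxy values, so the
# labeled-only mean is shifted toward what the full-data proxies suggest.
estimate = augmented_mean([1, 2, 3], [1, 2, 3], [1, 2, 3, 4, 5, 6])
```

When the proxy is strongly correlated with the outcome, the correction removes much of the sampling noise in the labeled-only estimate; when the proxy is uninformative, the weight is near zero and the estimate falls back to the labeled-only mean.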

Suggested Citation

  • Jue Hou & Stephanie F. Chan & Xuan Wang & Tianxi Cai, 2023. "Risk prediction with imperfect survival outcome information from electronic health records," Biometrics, The International Biometric Society, vol. 79(1), pages 190-202, March.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:1:p:190-202
    DOI: 10.1111/biom.13599

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13599
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Choi, Taehwa & Kim, Arlene K.H. & Choi, Sangbum, 2021. "Semiparametric least-squares regression with doubly-censored data," Computational Statistics & Data Analysis, Elsevier, vol. 164(C).
    2. Patrick Bajari & Jeremy Fox & Stephen Ryan, 2008. "Evaluating wireless carrier consolidation using semiparametric demand estimation," Quantitative Marketing and Economics (QME), Springer, vol. 6(4), pages 299-338, December.
    3. Lavergne, Pascal & Patilea, Valentin, 2013. "Smooth minimum distance estimation and testing with conditional estimating equations: Uniform in bandwidth theory," Journal of Econometrics, Elsevier, vol. 177(1), pages 47-59.
    4. Jochmans, Koen, 2012. "The variance of a rank estimator of transformation models," Economics Letters, Elsevier, vol. 117(1), pages 168-169.
    5. Khan, Shakeeb & Tamer, Elie, 2007. "Partial rank estimation of duration models with general forms of censoring," Journal of Econometrics, Elsevier, vol. 136(1), pages 251-280, January.
    6. Jochmans, Koen, 2015. "Multiplicative-error models with sample selection," Journal of Econometrics, Elsevier, vol. 184(2), pages 315-327.
    7. Jiannan Lu & Peng Ding & Tirthankar Dasgupta, 2018. "Treatment Effects on Ordinal Outcomes: Causal Estimands and Sharp Bounds," Journal of Educational and Behavioral Statistics, vol. 43(5), pages 540-567, October.
    8. Qingning Zhou & Jianwen Cai & Haibo Zhou, 2018. "Outcome‐dependent sampling with interval‐censored failure time data," Biometrics, The International Biometric Society, vol. 74(1), pages 58-67, March.
    9. Ming-Yueh Huang & Chin-Tsang Chiang, 2017. "Estimation and Inference Procedures for Semiparametric Distribution Models with Varying Linear-Index," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 44(2), pages 396-424, June.
    10. repec:hal:wpspec:info:hdl:2441/3vl5fe4i569nbr005tctlc8ll5 is not listed on IDEAS
    11. Shakeeb Khan & Arnaud Maurel & Yichong Zhang, 2023. "Informational Content of Factor Structures in Simultaneous Binary Response Models," Advances in Econometrics, in: Essays in Honor of Joon Y. Park: Econometric Methodology in Empirical Applications, volume 45, pages 385-410, Emerald Group Publishing Limited.
    12. Chin-Tsang Chiang & Shr-Yan Huang, 2009. "Estimation for the Optimal Combination of Markers without Modeling the Censoring Distribution," Biometrics, The International Biometric Society, vol. 65(1), pages 152-158, March.
    13. repec:spo:wpmain:info:hdl:2441/dambferfb7dfprc9m01h6f4h2 is not listed on IDEAS
    14. Koen Jochmans, 2011. "Identification in Bivariate binary-choice Models with elliptical innovations," Working Papers hal-01069483, HAL.
    15. Isaiah Andrews & Toru Kitagawa & Adam McCloskey, 2024. "Inference on Winners," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 139(1), pages 305-358.
    16. Sokbae Lee & Myung Hwan Seo & Youngki Shin, 2017. "Correction," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 883-883, April.
    17. Margaret Sullivan Pepe & Tianxi Cai & Gary Longton, 2006. "Combining Predictors for Classification Using the Area under the Receiver Operating Characteristic Curve," Biometrics, The International Biometric Society, vol. 62(1), pages 221-229, March.
    18. Beilin Jia & Donglin Zeng & Jason J. Z. Liao & Guanghan F. Liu & Xianming Tan & Guoqing Diao & Joseph G. Ibrahim, 2022. "Mixture survival trees for cancer risk classification," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 28(3), pages 356-379, July.
    19. Gorgens, Tue & Horowitz, Joel L., 1999. "Semiparametric estimation of a censored regression model with an unknown transformation of the dependent variable," Journal of Econometrics, Elsevier, vol. 90(2), pages 155-191, June.
    20. repec:spo:wpmain:info:hdl:2441/3vl5fe4i569nbr005tctlc8ll5 is not listed on IDEAS
    21. Youngki Shin & Zvezdomir Todorov, 2021. "Exact computation of maximum rank correlation estimator," The Econometrics Journal, Royal Economic Society, vol. 24(3), pages 589-607.
    22. Xin Qiu & Donglin Zeng & Yuanjia Wang, 2018. "Estimation and evaluation of linear individualized treatment rules to guarantee performance," Biometrics, The International Biometric Society, vol. 74(2), pages 517-528, June.
    23. Lewbel, Arthur & McFadden, Daniel & Linton, Oliver, 2011. "Estimating features of a distribution from binomial data," Journal of Econometrics, Elsevier, vol. 162(2), pages 170-188, June.
