Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients

Authors

  • Wenbo Wu

    (University of Michigan)

  • Jeremy M. G. Taylor

    (University of Michigan)

  • Andrew F. Brouwer

    (University of Michigan)

  • Lingfeng Luo

    (University of Michigan)

  • Jian Kang

    (University of Michigan)

  • Hui Jiang

    (University of Michigan)

  • Kevin He

    (University of Michigan)

Abstract

Survival modeling with time-varying coefficients has proven useful in analyzing time-to-event data with one or more distinct failure types. When studying the cause-specific etiology of breast and prostate cancers using the large-scale data from the Surveillance, Epidemiology, and End Results (SEER) Program, we encountered two major challenges that existing methods for estimating time-varying coefficients cannot tackle. First, these methods, dependent on expanding the original data in a repeated measurement format, result in formidable time and memory consumption as the sample size escalates to over one million. In this case, even a well-configured workstation cannot accommodate their implementations. Second, when the large-scale data under analysis include binary predictors with near-zero variance (e.g., only 0.6% of patients in our SEER prostate cancer data had tumors regional to the lymph nodes), existing methods suffer from numerical instability due to ill-conditioned second-order information. The estimation accuracy deteriorates further with multiple competing risks. To address these issues, we propose a proximal Newton algorithm with a shared-memory parallelization scheme and tests of significance and nonproportionality for the time-varying effects. A simulation study shows that our scalable approach reduces the time and memory costs by orders of magnitude and enjoys improved estimation accuracy compared with alternative approaches. Applications to the SEER cancer data demonstrate the real-world performance of the proximal Newton algorithm.
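
The numerical instability described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' implementation. It assumes one common form of a proximal (damped) Newton step, in which a quadratic penalty on the step size adds lam * I to the information matrix before solving. The matrix `hess`, the vector `grad`, the value of `lam`, and both function names are hypothetical and chosen only to mimic a predictor with almost no variance.

```python
import numpy as np


def newton_direction(grad, hess):
    """Plain Newton direction: solve H d = g."""
    return np.linalg.solve(hess, grad)


def proximal_newton_direction(grad, hess, lam):
    """Damped direction: solve (H + lam * I) d = g.

    This d maximizes g'd - 0.5 * d'Hd - 0.5 * lam * ||d||^2, i.e. the
    local quadratic model of the log-likelihood with a penalty that
    keeps the step short (assumed form, not taken from the paper).
    """
    p = hess.shape[0]
    return np.linalg.solve(hess + lam * np.eye(p), grad)


# Hypothetical 2x2 information matrix: the second predictor carries
# almost no information, as a rare binary covariate might.
hess = np.array([[1.0, 0.0],
                 [0.0, 1e-10]])
grad = np.array([0.1, 1e-6])
lam = 1e-2  # hypothetical damping parameter

plain = newton_direction(grad, hess)
damped = proximal_newton_direction(grad, hess, lam)

cond_plain = np.linalg.cond(hess)
cond_damped = np.linalg.cond(hess + lam * np.eye(2))

print("plain step:", plain, "condition number:", cond_plain)
print("damped step:", damped, "condition number:", cond_damped)
```

In this toy setting the plain Newton step for the second coefficient is very large because the corresponding diagonal entry of the information matrix is nearly zero, while the damped system is far better conditioned and yields a bounded step. How the damping parameter should be chosen or updated across iterations is left open here.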

Suggested Citation

  • Wenbo Wu & Jeremy M. G. Taylor & Andrew F. Brouwer & Lingfeng Luo & Jian Kang & Hui Jiang & Kevin He, 2022. "Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 28(2), pages 194-218, April.
  • Handle: RePEc:spr:lifeda:v:28:y:2022:i:2:d:10.1007_s10985-021-09544-2
    DOI: 10.1007/s10985-021-09544-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10985-021-09544-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wilson J. Wright & Peter N. Neitlich & Alyssa E. Shiel & Mevin B. Hooten, 2022. "Mechanistic spatial models for heavy metal pollution," Environmetrics, John Wiley & Sons, Ltd., vol. 33(8), December.
    2. François Bachoc & Marc G Genton & Klaus Nordhausen & Anne Ruiz-Gazen & Joni Virta, 2020. "Spatial blind source separation," Biometrika, Biometrika Trust, vol. 107(3), pages 627-646.
    3. James Joseph Balamuta & Steven Andrew Culpepper, 2022. "Exploratory Restricted Latent Class Models with Monotonicity Requirements under Pólya–Gamma Data Augmentation," Psychometrika, Springer;The Psychometric Society, vol. 87(3), pages 903-945, September.
    4. Athanasios C. Micheas & Jiaxun Chen, 2018. "sppmix: Poisson point process modeling using normal mixture models," Computational Statistics, Springer, vol. 33(4), pages 1767-1798, December.
    5. Helmut Lütkepohl & Fei Shang & Luis Uzeda & Tomasz Woźniak, 2024. "Partial Identification of Heteroskedastic Structural VARs: Theory and Bayesian Inference," Papers 2404.11057, arXiv.org.
    6. Battauz, Michela & Vidoni, Paolo, 2022. "A likelihood-based boosting algorithm for factor analysis models with binary data," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    7. Francis J. DiTraglia, 2011. "Using Invalid Instruments on Purpose: Focused Moment Selection and Averaging for GMM, Second Version," PIER Working Paper Archive 14-045, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 09 Dec 2014.
    8. José Vinícius de Miranda Cardoso & Jiaxi Ying & Daniel Perez Palomar, 2020. "Algorithms for Learning Graphs in Financial Markets," Papers 2012.15410, arXiv.org.
    9. Batarce, Marco, 2024. "Estimation of discrete choice models with error in variables: An application to revealed preference data with aggregate service level variables," Transportation Research Part B: Methodological, Elsevier, vol. 185(C).
    10. Shen, Yunyi & Olson, Erik R. & Van Deelen, Timothy R., 2021. "Spatially explicit modeling of community occupancy using Markov Random Field models with imperfect observation: Mesocarnivores in Apostle Islands National Lakeshore," Ecological Modelling, Elsevier, vol. 459(C).
    11. Xiaotian Zhu & David R. Hunter, 2019. "Clustering via finite nonparametric ICA mixture models," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 65-87, March.
    12. Matthew Pietrosanu & Jueyu Gao & Linglong Kong & Bei Jiang & Di Niu, 2021. "Advanced algorithms for penalized quantile and composite quantile regression," Computational Statistics, Springer, vol. 36(1), pages 333-346, March.
    13. DiTraglia, Francis J., 2016. "Using invalid instruments on purpose: Focused moment selection and averaging for GMM," Journal of Econometrics, Elsevier, vol. 195(2), pages 187-208.
    14. Lee, Xing Ju & Hainy, Markus & McKeone, James P. & Drovandi, Christopher C. & Pettitt, Anthony N., 2018. "ABC model selection for spatial extremes models applied to South Australian maximum temperature data," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 128-144.
    15. Sbrana, Giacomo & Pelagatti, Matteo, 2024. "Optimal hierarchical EWMA forecasting," International Journal of Forecasting, Elsevier, vol. 40(2), pages 616-625.
    16. Francis DiTraglia, 2011. "Using Invalid Instruments on Purpose: Focused Moment Selection and Averaging for GMM, Second Version," PIER Working Paper Archive 15-027, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 10 Aug 2015.
    17. Dirk Eddelbuettel & James Joseph Balamuta, 2018. "Extending R with C++: A Brief Introduction to Rcpp," The American Statistician, Taylor & Francis Journals, vol. 72(1), pages 28-36, January.
    18. Erick da Conceição Amorim & Vinícius Diniz Mayrink, 2020. "Clustering non-linear interactions in factor analysis," METRON, Springer;Sapienza Università di Roma, vol. 78(3), pages 329-352, December.
    19. Fernández de Marcos Giménez de los Galanes, Alberto, 2022. "Data-driven stabilizations of goodness-of-fit tests," DES - Working Papers. Statistics and Econometrics. WS 35324, Universidad Carlos III de Madrid. Departamento de Estadística.
    20. Sloot Henrik, 2022. "Implementing Markovian models for extendible Marshall–Olkin distributions," Dependence Modeling, De Gruyter, vol. 10(1), pages 308-343, January.
