Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients

Author

Wenbo Wu (University of Michigan)
Jeremy M. G. Taylor (University of Michigan)
Andrew F. Brouwer (University of Michigan)
Lingfeng Luo (University of Michigan)
Jian Kang (University of Michigan)
Hui Jiang (University of Michigan)
Kevin He (University of Michigan)

Abstract

Survival modeling with time-varying coefficients has proven useful in analyzing time-to-event data with one or more distinct failure types. When studying the cause-specific etiology of breast and prostate cancers using the large-scale data from the Surveillance, Epidemiology, and End Results (SEER) Program, we encountered two major challenges that existing methods for estimating time-varying coefficients cannot tackle. First, these methods, dependent on expanding the original data in a repeated measurement format, result in formidable time and memory consumption as the sample size escalates to over one million. In this case, even a well-configured workstation cannot accommodate their implementations. Second, when the large-scale data under analysis include binary predictors with near-zero variance (e.g., only 0.6% of patients in our SEER prostate cancer data had tumors regional to the lymph nodes), existing methods suffer from numerical instability due to ill-conditioned second-order information. The estimation accuracy deteriorates further with multiple competing risks. To address these issues, we propose a proximal Newton algorithm with a shared-memory parallelization scheme and tests of significance and nonproportionality for the time-varying effects. A simulation study shows that our scalable approach reduces the time and memory costs by orders of magnitude and enjoys improved estimation accuracy compared with alternative approaches. Applications to the SEER cancer data demonstrate the real-world performance of the proximal Newton algorithm.
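
The abstract attributes the numerical instability to ill-conditioned second-order information. As a rough illustration only, the sketch below shows one common form of a proximally regularized Newton iteration, in which each step solves (H + lam * I) d = g instead of H d = g. It is not taken from the paper: the function name `proximal_newton`, the fixed penalty `lam`, the stopping rule, and the toy quadratic objective are all assumptions made for this example, and the authors' algorithm, spline parameterization, and parallelization scheme may differ.

```python
import numpy as np


def proximal_newton(grad, hess, beta0, lam=1e-3, tol=1e-10, max_iter=500):
    """Minimize a smooth convex function with proximally regularized Newton steps.

    Hypothetical helper for illustration. Each step minimizes the local
    quadratic model plus (lam / 2) * ||b - beta||^2, which amounts to
    solving (H + lam * I) step = g.
    """
    beta = np.asarray(beta0, dtype=float).copy()
    eye = np.eye(beta.size)
    for _ in range(max_iter):
        # Adding lam * I keeps the linear system solvable even when H is singular.
        step = np.linalg.solve(hess(beta) + lam * eye, grad(beta))
        beta -= step
        if np.linalg.norm(step) < tol:
            break
    return beta


# Toy objective f(b) = 0.5 * b'Ab - c'b whose Hessian A is exactly singular,
# a stand-in for the near-singular Hessians a rare binary predictor can cause.
A = np.diag([1.0, 0.0])
c = np.array([1.0, 0.0])

beta_hat = proximal_newton(
    grad=lambda b: A @ b - c,
    hess=lambda b: A,
    beta0=[0.0, 0.5],
)
print(beta_hat)
```

In this toy problem the first coordinate converges to its minimizer at 1, while the second coordinate, along which the objective is flat, stays at its starting value. A plain Newton solve with `A` alone would fail here because the matrix is singular.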

Suggested Citation

Wenbo Wu & Jeremy M. G. Taylor & Andrew F. Brouwer & Lingfeng Luo & Jian Kang & Hui Jiang & Kevin He, 2022. "Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 28(2), pages 194-218, April.

Handle: RePEc:spr:lifeda:v:28:y:2022:i:2:d:10.1007_s10985-021-09544-2
DOI: 10.1007/s10985-021-09544-2

Keywords

Kronecker product; B-spline; Proximal algorithm; Parallel computing; Breast cancer; Prostate cancer;