
A cross‐validation statistical framework for asymmetric data integration

Authors

  • Lam Tran
  • Kevin He
  • Di Wang
  • Hui Jiang

Abstract

The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context‐dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave‐one‐out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real‐world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.
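As an illustration of the idea for the linear model, the following minimal sketch chooses external data set weights by minimizing the local LOOCV error. It is not the authors' implementation: the weighted least-squares estimator, the coarse grid search (a stand-in for a proper optimizer), the simulated data, and the names `loocv_error`, `select_weights`, and `simulate` are all assumptions made for this example. It uses the standard leave-one-out identity for least squares, where the held-out residual equals the full-fit residual divided by one minus the leverage.

```python
import itertools
import numpy as np


def loocv_error(weights, X, y, externals):
    """Mean squared LOOCV error on the local data for given external weights.

    Local rows have weight 1; every row of external data set k has weight
    weights[k]. Assumed estimator: weighted least squares on the pooled data.
    """
    A = X.T @ X
    b = X.T @ y
    for w, (Xk, yk) in zip(weights, externals):
        A = A + w * (Xk.T @ Xk)
        b = b + w * (Xk.T @ yk)
    A_inv = np.linalg.inv(A)
    beta = A_inv @ b
    resid = y - X @ beta
    # Leverage of each local observation under the pooled fit.
    h = np.einsum("ij,jk,ik->i", X, A_inv, X)
    # Leave-one-out residual = full-fit residual / (1 - leverage).
    return float(np.mean((resid / (1.0 - h)) ** 2))


def select_weights(X, y, externals, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid search (illustrative only) for the weights with the smallest LOOCV error."""
    candidates = itertools.product(grid, repeat=len(externals))
    return min(candidates, key=lambda w: loocv_error(w, X, y, externals))


def simulate(n, beta, rng):
    """Hypothetical data generator: Gaussian covariates, unit-variance noise."""
    X = rng.normal(size=(n, beta.size))
    y = X @ beta + rng.normal(size=n)
    return X, y


rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
X_loc, y_loc = simulate(30, beta_true, rng)           # small local sample
externals = [
    simulate(200, beta_true, rng),                    # shares the local parameters
    simulate(200, beta_true + 3.0, rng),              # generated from shifted parameters
]
best = select_weights(X_loc, y_loc, externals)
print("selected weights:", best)
```

In this sketch a weight of 0 discards an external data set and a weight of 1 pools it with the local data on equal terms; intermediate values borrow partially. Because only the local observations are held out, the criterion reflects the abstract's assumption that the local data are generated from the true parameters.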

Suggested Citation

  • Lam Tran & Kevin He & Di Wang & Hui Jiang, 2023. "A cross‐validation statistical framework for asymmetric data integration," Biometrics, The International Biometric Society, vol. 79(2), pages 1280-1292, June.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:2:p:1280-1292
    DOI: 10.1111/biom.13685

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Thomas A. Murray & Brian P. Hobbs & Theodore C. Lystig & Bradley P. Carlin, 2014. "Semiparametric Bayesian commensurate survival model for post-market medical device surveillance with non-exchangeable historical data," Biometrics, The International Biometric Society, vol. 70(1), pages 185-191, March.
    2. Huangdi Yi & Qingzhao Zhang & Cunjie Lin & Shuangge Ma, 2022. "Information‐incorporated Gaussian graphical model for gene expression data," Biometrics, The International Biometric Society, vol. 78(2), pages 512-523, June.
    3. Kristóf Németh & Dániel Hadházi, 2024. "Generating density nowcasts for U.S. GDP growth with deep learning: Bayes by Backprop and Monte Carlo dropout," Papers 2405.15579, arXiv.org.
    4. Chen, Shunjie & Yang, Sijia & Wang, Pei & Xue, Liugen, 2023. "Two-stage penalized algorithms via integrating prior information improve gene selection from omics data," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 628(C).
    5. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    6. Haitao Pan & Ying Yuan & Jielai Xia, 2017. "A calibrated power prior approach to borrow information from historical data with application to biosimilar clinical trials," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 66(5), pages 979-996, November.
    7. Stavros Nikolakopoulos & Ingeborg van der Tweel & Kit C. B. Roes, 2018. "Dynamic borrowing through empirical power priors that control type I error," Biometrics, The International Biometric Society, vol. 74(3), pages 874-880, September.
    8. Andrés R. Masegosa & Darío Ramos-López & Antonio Salmerón & Helge Langseth & Thomas D. Nielsen, 2020. "Variational Inference over Nonstationary Data Streams for Exponential Family Models," Mathematics, MDPI, vol. 8(11), pages 1-27, November.
    9. Yu-Fang Chien & Haiming Zhou & Timothy Hanson & Theodore Lystig, 2023. "Informative g-Priors for Mixed Models," Stats, MDPI, vol. 6(1), pages 1-23, January.
    10. Yimei Li & Ying Yuan, 2020. "PA‐CRM: A continuous reassessment method for pediatric phase I oncology trials with concurrent adult trials," Biometrics, The International Biometric Society, vol. 76(4), pages 1364-1373, December.
    11. Lee, Juyong & Reiner, David M., 2023. "Determinants of public preferences on low-carbon energy sources: Evidence from the United Kingdom," Energy, Elsevier, vol. 284(C).
    12. Haibo Chu & Jiahua Wei & Yuan Jiang, 2021. "Middle- and Long-Term Streamflow Forecasting and Uncertainty Analysis Using Lasso-DBN-Bootstrap Model," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 35(8), pages 2617-2632, June.
    13. Fan, Xianqiu & Cheng, Jun & Wang, Hailing & Zhang, Bin & Chen, Zhenzhen, 2024. "A fast trans-lasso algorithm with penalized weighted score function," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    14. Sourish Das & Dipak K. Dey, 2013. "On Dynamic Generalized Linear Models with Applications," Methodology and Computing in Applied Probability, Springer, vol. 15(2), pages 407-421, June.
    15. Xu, Ganggang & Zhu, Huirong & Lee, J. Jack, 2020. "Borrowing strength and borrowing index for Bayesian hierarchical models," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    16. Kristoffer Pons Bertelsen, 2022. "The Prior Adaptive Group Lasso and the Factor Zoo," CREATES Research Papers 2022-05, Department of Economics and Business Economics, Aarhus University.
    17. D. Kurz & H. Lewitschnig & J. Pilz, 2017. "Failure probability estimation under additional subsystem information with application to semiconductor burn-in," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(6), pages 955-967, April.
    18. Keying Ye & Yuyan Duan, 2008. "Normalized Power Prior Bayesian Analysis," Working Papers 0058, College of Business, University of Texas at San Antonio.
    19. Md. Tuhin Sheikh & Ming-Hui Chen & Jonathan A. Gelfond & Joseph G. Ibrahim, 2022. "A Power Prior Approach for Leveraging External Longitudinal and Competing Risks Survival Data Within the Joint Modeling Framework," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 14(2), pages 318-336, July.
    20. Asadi, Majid & Ebrahimi, Nader & Soofi, Ehsan S., 2018. "Optimal hazard models based on partial information," European Journal of Operational Research, Elsevier, vol. 270(2), pages 723-733.
