IDEAS home Printed from https://ideas.repec.org/a/taf/jnlasa/v111y2016i513p107-117.html

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-Level Information From External Big Data Sources

Authors

Listed:
  • Nilanjan Chatterjee
  • Yi-Hau Chen
  • Paige Maas
  • Raymond J. Carroll

Abstract

Information from various public and private data sources of extremely large sample sizes is now increasingly available for research purposes. Statistical methods are needed for using information from such big data sources while analyzing data from individual studies that may collect more detailed information required for addressing specific hypotheses of interest. In this article, we consider the problem of building regression models based on individual-level data from an “internal” study while using summary-level information, such as information on parameters for reduced models, from an “external” big data source. We identify a set of very general constraints that link internal and external models. These constraints are used to develop a framework for semiparametric maximum likelihood inference that allows the distribution of covariates to be estimated using either the internal sample or an external reference sample. We develop extensions for handling complex stratified sampling designs, such as case-control sampling, for the internal study. Asymptotic theory and variance estimators are developed for each case. We use simulation studies and a real data application to assess the performance of the proposed methods in contrast to the generalized regression calibration methodology that is popular in the sample survey literature. Supplementary materials for this article are available online.

Suggested Citation

  • Nilanjan Chatterjee & Yi-Hau Chen & Paige Maas & Raymond J. Carroll, 2016. "Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-Level Information From External Big Data Sources," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 107-117, March.
  • Handle: RePEc:taf:jnlasa:v:111:y:2016:i:513:p:107-117
    DOI: 10.1080/01621459.2015.1123157

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2015.1123157
    Download Restriction: Access to full text is restricted to subscribers.


    Citations

    Cited by:

    1. Fei Gao & K. C. G. Chan, 2023. "Noniterative adjustment to regression estimators with population‐based auxiliary information for semiparametric models," Biometrics, The International Biometric Society, vol. 79(1), pages 140-150, March.
    2. Jie He & Hui Li & Shumei Zhang & Xiaogang Duan, 2019. "Additive hazards model with auxiliary subgroup survival information," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(1), pages 128-149, January.
    3. Prosenjit Kundu & Nilanjan Chatterjee, 2023. "Logistic regression analysis of two‐phase studies using generalized method of moments," Biometrics, The International Biometric Society, vol. 79(1), pages 241-252, March.
    4. Han Zhang & Lu Deng & William Wheeler & Jing Qin & Kai Yu, 2022. "Integrative analysis of multiple case‐control studies," Biometrics, The International Biometric Society, vol. 78(3), pages 1080-1091, September.
    5. Debashis Ghosh & Michael S. Sabel, 2022. "A Weighted Sample Framework to Incorporate External Calculators for Risk Modeling," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 14(3), pages 363-379, December.
    6. Chixiang Chen & Ming Wang & Shuo Chen, 2023. "An efficient data integration scheme for synthesizing information from multiple secondary datasets for the parameter inference of the main analysis," Biometrics, The International Biometric Society, vol. 79(4), pages 2947-2960, December.
    7. Albert S. Berahas & Jiahao Shi & Zihong Yi & Baoyu Zhou, 2023. "Accelerating stochastic sequential quadratic programming for equality constrained optimization using predictive variance reduction," Computational Optimization and Applications, Springer, vol. 86(1), pages 79-116, September.
    8. Ying Sheng & Yifei Sun & Detian Deng & Chiung‐Yu Huang, 2020. "Censored linear regression in the presence or absence of auxiliary survival information," Biometrics, The International Biometric Society, vol. 76(3), pages 734-745, September.
    9. Ying Sheng & Yifei Sun & Chiung‐Yu Huang & Mi‐Ok Kim, 2022. "Synthesizing external aggregated information in the presence of population heterogeneity: A penalized empirical likelihood approach," Biometrics, The International Biometric Society, vol. 78(2), pages 679-690, June.
    10. Saegusa Takumi, 2020. "Confidence bands for a distribution function with merged data from multiple sources," Statistics in Transition New Series, Statistics Poland, vol. 21(4), pages 144-158, August.
    11. Ziqi Chen & Jing Ning & Yu Shen & Jing Qin, 2021. "Combining primary cohort data with external aggregate information without assuming comparability," Biometrics, The International Biometric Society, vol. 77(3), pages 1024-1036, September.
    12. Jan Pablo Burgard & Joscha Krause & Simon Schmaus, 2019. "Estimation of Regional Transition Probabilities for Spatial Dynamic Microsimulations from Survey Data Lacking in Regional Detail," Research Papers in Economics 2019-12, University of Trier, Department of Economics.
    13. Cao, Yongxiu & Yu, Jichang, 2023. "Adjusting for unmeasured confounding in survival causal effect using validation data," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    14. Ruoyu Wang & Qihua Wang & Wang Miao, 2023. "A robust fusion-extraction procedure with summary statistics in the presence of biased sources," Biometrika, Biometrika Trust, vol. 110(4), pages 1023-1040.
    15. Takumi Saegusa, 2020. "Confidence bands for a distribution function with merged data from multiple sources," Statistics in Transition New Series, Polish Statistical Association, vol. 21(4), pages 144-158, August.
    16. Tian Gu & Jeremy Michael George Taylor & Bhramar Mukherjee, 2023. "A synthetic data integration framework to leverage external summary‐level information from heterogeneous populations," Biometrics, The International Biometric Society, vol. 79(4), pages 3831-3845, December.
    17. Yu‐Jen Cheng & Yen‐Chun Liu & Chang‐Yu Tsai & Chiung‐Yu Huang, 2023. "Semiparametric estimation of the transformation model by leveraging external aggregate data in the presence of population heterogeneity," Biometrics, The International Biometric Society, vol. 79(3), pages 1996-2009, September.
    18. Bo Han & Ingrid Van Keilegom & Xiaoguang Wang, 2022. "Semiparametric estimation of the nonmixture cure model with auxiliary survival information," Biometrics, The International Biometric Society, vol. 78(2), pages 448-459, June.
    19. Burgard, Jan Pablo & Krause, Joscha & Schmaus, Simon, 2021. "Estimation of regional transition probabilities for spatial dynamic microsimulations from survey data lacking in regional detail," Computational Statistics & Data Analysis, Elsevier, vol. 154(C).
    20. Sahar Z. Zangeneh & Roderick J. Little, 2022. "Likelihood‐Based Inference for the Finite Population Mean with Post‐Stratification Information Under Non‐Ignorable Non‐Response," International Statistical Review, International Statistical Institute, vol. 90(S1), pages 17-36, December.
