
A robust fusion-extraction procedure with summary statistics in the presence of biased sources

Author

Listed:
  • Ruoyu Wang
  • Qihua Wang
  • Wang Miao

Abstract

Information from multiple data sources is increasingly available. However, some data sources may produce biased estimates due to biased sampling, data corruption or model misspecification. Thus there is a need for robust data combination methods that can be used with biased sources. In this paper, a robust data fusion-extraction method is proposed. Unlike existing methods, the proposed method can be applied in the important case where researchers have no knowledge of which data sources are unbiased. The proposed estimator is easy to compute and employs only summary statistics; hence it can be applied in many different fields, such as meta-analysis, Mendelian randomization and distributed systems. The proposed estimator is consistent, even if many data sources are biased, and is asymptotically equivalent to the oracle estimator that uses only unbiased data. Asymptotic normality of the proposed estimator is also established. In contrast to existing meta-analysis methods, the theoretical properties are guaranteed for our estimator, even if the number of data sources and the dimension of the parameter diverge as the sample size increases. Furthermore, the proposed method provides consistent selection for unbiased data sources with probability approaching 1. Simulation studies demonstrate the efficiency and robustness of the proposed method empirically. The method is applied to a meta-analysis dataset to evaluate surgical treatment for moderate periodontal disease and to a Mendelian randomization dataset to study the risk factors for head and neck cancer.
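To make the setting concrete, the sketch below contrasts naive inverse-variance pooling of per-source summary statistics with an "oracle" pool that uses only the unbiased sources. It also includes a deliberately simple screening rule (keep sources whose estimate lies within a few standard errors of the median). This is an illustrative stand-in only and is not the fusion-extraction procedure of the paper. The numbers, the function names and the threshold of 3 are all hypothetical.

```python
from statistics import median


def ivw(estimates, std_errors):
    """Inverse-variance weighted mean of scalar estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


def screen_then_pool(estimates, std_errors, threshold=3.0):
    """Illustrative stand-in, NOT the paper's method.

    Keep sources whose estimate is within `threshold` standard errors
    of the median estimate, then pool the kept sources by inverse variance.
    """
    center = median(estimates)
    keep = [
        k
        for k, (e, se) in enumerate(zip(estimates, std_errors))
        if abs(e - center) <= threshold * se
    ]
    pooled = ivw([estimates[k] for k in keep], [std_errors[k] for k in keep])
    return pooled, keep


# Hypothetical summary statistics: the true value is taken to be 1.0,
# and the last two sources are biased upwards.
estimates = [1.02, 0.97, 1.05, 0.99, 2.10, 1.80]
std_errors = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05]

naive = ivw(estimates, std_errors)            # pulled towards the biased sources
oracle = ivw(estimates[:4], std_errors[:4])   # uses knowledge of which sources are unbiased
pooled, keep = screen_then_pool(estimates, std_errors)

print(f"naive={naive:.4f} oracle={oracle:.4f} screened={pooled:.4f} kept={keep}")
```

In this toy example the screened pool coincides with the oracle pool because the screening step happens to discard exactly the two biased sources. A median-based screen of this kind depends on most sources being unbiased, so it should not be read as reproducing the guarantees stated in the abstract.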

Suggested Citation

  • Ruoyu Wang & Qihua Wang & Wang Miao, 2023. "A robust fusion-extraction procedure with summary statistics in the presence of biased sources," Biometrika, Biometrika Trust, vol. 110(4), pages 1023-1040.
  • Handle: RePEc:oup:biomet:v:110:y:2023:i:4:p:1023-1040.

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1093/biomet/asad013
    Download Restriction: Access to full text is restricted to subscribers.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiaoran Liang & Eleanor Sanderson & Frank Windmeijer, 2022. "Selecting Valid Instrumental Variables in Linear Models with Multiple Exposure Variables: Adaptive Lasso and the Median-of-Medians Estimator," Papers 2208.05278, arXiv.org.
    2. Qingliang Fan & Yaqian Wu, 2020. "Endogenous Treatment Effect Estimation with some Invalid and Irrelevant Instruments," Papers 2006.14998, arXiv.org.
    3. Nicolas Apfel, 2019. "Relaxing the Exclusion Restriction in Shift-Share Instrumental Variable Estimation," Papers 1907.00222, arXiv.org, revised Jul 2022.
    4. Yiqi Lin & Frank Windmeijer & Xinyuan Song & Qingliang Fan, 2022. "On the instrumental variable estimation with many weak and invalid instruments," Papers 2207.03035, arXiv.org, revised Dec 2023.
    5. Nicolas Apfel & Helmut Farbmacher & Rebecca Groh & Martin Huber & Henrika Langen, 2022. "Detecting Grouped Local Average Treatment Effects and Selecting True Instruments," Papers 2207.04481, arXiv.org, revised Oct 2023.
    6. Fei Gao & K. C. G. Chan, 2023. "Noniterative adjustment to regression estimators with population‐based auxiliary information for semiparametric models," Biometrics, The International Biometric Society, vol. 79(1), pages 140-150, March.
    7. Tang, Lu & Zhou, Ling & Song, Peter X.-K., 2020. "Distributed simultaneous inference in generalized linear models via confidence distribution," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    8. Guang Yang & Dungang Liu & Junyuan Wang & Min‐ge Xie, 2016. "Meta‐analysis framework for exact inferences with application to the analysis of rare events," Biometrics, The International Biometric Society, vol. 72(4), pages 1378-1386, December.
    9. Frank Windmeijer & Helmut Farbmacher & Neil Davies & George Davey Smith, 2019. "On the Use of the Lasso for Instrumental Variables Estimation with Some Invalid Instruments," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(527), pages 1339-1350, July.
    10. Chixiang Chen & Ming Wang & Shuo Chen, 2023. "An efficient data integration scheme for synthesizing information from multiple secondary datasets for the parameter inference of the main analysis," Biometrics, The International Biometric Society, vol. 79(4), pages 2947-2960, December.
    11. Jiang, Rong & Yu, Keming, 2020. "Single-index composite quantile regression for massive data," Journal of Multivariate Analysis, Elsevier, vol. 180(C).
    12. Elena Kulinskaya & Stephan Morgenthaler & Robert G. Staudte, 2014. "Combining Statistical Evidence," International Statistical Review, International Statistical Institute, vol. 82(2), pages 214-242, August.
    13. Wei Wang & Shou‐En Lu & Jerry Q. Cheng & Minge Xie & John B. Kostis, 2022. "Multivariate survival analysis in big data: A divide‐and‐combine approach," Biometrics, The International Biometric Society, vol. 78(3), pages 852-866, September.
    14. Tian Gu & Jeremy Michael George Taylor & Bhramar Mukherjee, 2023. "A synthetic data integration framework to leverage external summary‐level information from heterogeneous populations," Biometrics, The International Biometric Society, vol. 79(4), pages 3831-3845, December.
    15. Han Zhang & Lu Deng & William Wheeler & Jing Qin & Kai Yu, 2022. "Integrative analysis of multiple case‐control studies," Biometrics, The International Biometric Society, vol. 78(3), pages 1080-1091, September.
    16. Frank Windmeijer & Xiaoran Liang & Fernando P. Hartwig & Jack Bowden, 2021. "The confidence interval method for selecting valid instrumental variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(4), pages 752-776, September.
    17. Qingliang Fan & Zijian Guo & Ziwei Mei, 2022. "A Heteroskedasticity-Robust Overidentifying Restriction Test with High-Dimensional Covariates," Papers 2205.00171, arXiv.org, revised May 2024.
    18. Ying Sheng & Yifei Sun & Chiung‐Yu Huang & Mi‐Ok Kim, 2022. "Synthesizing external aggregated information in the presence of population heterogeneity: A penalized empirical likelihood approach," Biometrics, The International Biometric Society, vol. 78(2), pages 679-690, June.
    19. Kumari, Meena & Bao, Yanchun & S. Clarke, Paul & Smart, Melissa, 2018. "A comparison of robust methods for Mendelian randomization using multiple genetic variants," ISER Working Paper Series 2018-08, Institute for Social and Economic Research.
    20. Breunig, Christoph & Mammen, Enno & Simoni, Anna, 2020. "Ill-posed estimation in high-dimensional models with instrumental variables," Journal of Econometrics, Elsevier, vol. 219(1), pages 171-200.
