
Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates

Authors

  • Zhou, Ping
  • Yu, Zhen
  • Ma, Jingyi
  • Tian, Maozai
  • Fan, Ye

Abstract

Large-scale data sets are now commonly stored in a distributed fashion across a great number of clients. The aim of the study is to develop a distributed estimator for generalized linear models (GLMs) in the “large n, diverging p_n” framework with a weak assumption on the number of clients. When the dimension diverges at the rate of o(n), the asymptotic efficiency of the global maximum likelihood estimator (MLE), the one-step MLE, and the aggregated estimating equation (AEE) estimator for GLMs is established. A novel distributed estimator is then proposed that requires two rounds of communication. It has the same asymptotic efficiency as the global MLE under p_n = o(n). The assumption on the number of clients is weaker than that required by the AEE estimator, so the proposed method is more practical for real-world applications. Simulations and a case study demonstrate the satisfactory finite-sample performance of the proposed estimator.
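
The abstract names the AEE estimator and a one-step MLE but does not spell out any algorithm, so the following is only a generic sketch of those two ideas for logistic regression, written under stated assumptions. It is not the authors' proposed estimator. All function names are hypothetical, each client is assumed to hold a pair (X, y), and the initial estimate in the two-round variant is assumed to be a plain average of local MLEs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_score_and_information(X, y, beta):
    """Score vector and Fisher information of the logistic log-likelihood on one client."""
    mu = sigmoid(X @ beta)
    score = X.T @ (y - mu)
    info = (X * (mu * (1.0 - mu))[:, None]).T @ X
    return score, info

def local_mle(X, y, n_iter=25):
    """Newton-Raphson for the logistic MLE on a single data block, started from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        score, info = local_score_and_information(X, y, beta)
        beta = beta + np.linalg.solve(info, score)
    return beta

def aee_estimator(clients):
    """AEE-style aggregation: local MLEs weighted by their local information matrices."""
    p = clients[0][0].shape[1]
    info_sum = np.zeros((p, p))
    weighted_sum = np.zeros(p)
    for X, y in clients:
        beta_k = local_mle(X, y)
        _, info_k = local_score_and_information(X, y, beta_k)
        info_sum += info_k
        weighted_sum += info_k @ beta_k
    return np.linalg.solve(info_sum, weighted_sum)

def two_round_one_step(clients):
    """Two communication rounds: average local MLEs, then take one aggregated Newton step."""
    # Round 1 (assumption): each client sends its local MLE; the server averages them.
    beta0 = np.mean([local_mle(X, y) for X, y in clients], axis=0)
    # Round 2: each client sends its score and information evaluated at beta0.
    p = beta0.size
    score_sum = np.zeros(p)
    info_sum = np.zeros((p, p))
    for X, y in clients:
        score_k, info_k = local_score_and_information(X, y, beta0)
        score_sum += score_k
        info_sum += info_k
    return beta0 + np.linalg.solve(info_sum, score_sum)
```

In this sketch, `clients` is a list of `(X, y)` tuples, one per client, and either function returns a coefficient vector that can be compared with `local_mle` run on the pooled data. Only p-vectors and p-by-p matrices cross the network, never the raw observations.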

Suggested Citation

  • Zhou, Ping & Yu, Zhen & Ma, Jingyi & Tian, Maozai & Fan, Ye, 2021. "Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
  • Handle: RePEc:eee:csdana:v:157:y:2021:i:c:s0167947320302450
    DOI: 10.1016/j.csda.2020.107154

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947320302450
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Bin Guo & Song Xi Chen, 2016. "Tests for high dimensional generalized linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(5), pages 1079-1102, November.
    2. Michael I. Jordan & Jason D. Lee & Yun Yang, 2019. "Communication-Efficient Distributed Statistical Inference," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 668-681, April.
    3. Bai, Z. D. & Wu, Y., 1994. "Limiting Behavior of M-Estimators of Regression Coefficients in High Dimensional Linear Models I. Scale Dependent Case," Journal of Multivariate Analysis, Elsevier, vol. 51(2), pages 211-239, November.
    4. Bai, Z. D. & Wu, Y., 1994. "Limiting Behavior of M-Estimators of Regression-Coefficients in High Dimensional Linear Models II. Scale-Invariant Case," Journal of Multivariate Analysis, Elsevier, vol. 51(2), pages 240-251, November.
    5. He, Xuming & Shao, Qi-Man, 2000. "On Parameters of Increasing Dimensions," Journal of Multivariate Analysis, Elsevier, vol. 73(1), pages 120-135, April.
    6. HaiYing Wang & Rong Zhu & Ping Ma, 2018. "Optimal Subsampling for Large Sample Logistic Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(522), pages 829-844, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Miaomiao Wang & Xinyu Zhang & Alan T. K. Wan & Kang You & Guohua Zou, 2023. "Jackknife model averaging for high‐dimensional quantile regression," Biometrics, The International Biometric Society, vol. 79(1), pages 178-189, March.
    2. Ding, Hao & Wang, Zhanfeng & Wu, Yaohua, 2017. "Tobit regression model with parameters of increasing dimensions," Statistics & Probability Letters, Elsevier, vol. 120(C), pages 1-7.
    3. Ioannis Kalogridis, 2022. "Asymptotics for M-type smoothing splines with non-smooth objective functions," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(2), pages 373-389, June.
    4. Zhang, Haixiang & Wang, HaiYing, 2021. "Distributed subdata selection for big data via sampling-based approach," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    5. He, Xuming & Shao, Qi-Man, 2000. "On Parameters of Increasing Dimensions," Journal of Multivariate Analysis, Elsevier, vol. 73(1), pages 120-135, April.
    6. Luo, Jiyu & Sun, Qiang & Zhou, Wen-Xin, 2022. "Distributed adaptive Huber regression," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
    7. Hafner, Christian M. & Linton, Oliver B. & Tang, Haihan, 2020. "Estimation of a multiplicative correlation structure in the large dimensional case," Journal of Econometrics, Elsevier, vol. 217(2), pages 431-470.
    8. Tang, Linjun & Zhou, Zhangong & Wu, Changchun, 2012. "Weighted composite quantile estimation and variable selection method for censored regression model," Statistics & Probability Letters, Elsevier, vol. 82(3), pages 653-663.
    9. Feifei Wang & Danyang Huang & Tianchen Gao & Shuyuan Wu & Hansheng Wang, 2022. "Sequential one‐step estimator by sub‐sampling for customer churn analysis with massive data sets," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1753-1786, November.
    10. Yingying Jiang & Fuming Lin & Yong Zhou, 2021. "The kth power expectile regression," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(1), pages 83-113, February.
    11. Kalogridis, Ioannis & Van Aelst, Stefan, 2024. "Robust penalized spline estimation with difference penalties," Econometrics and Statistics, Elsevier, vol. 29(C), pages 169-188.
    12. Zhijie Xiao & Roger Koenker, 2009. "Conditional Quantile Estimation for GARCH Models," Boston College Working Papers in Economics 725, Boston College Department of Economics.
    13. Ding, Hao & Qin, Shanshan & Wu, Yuehua & Wu, Yaohua, 2021. "Asymptotic properties on high-dimensional multivariate regression M-estimation," Journal of Multivariate Analysis, Elsevier, vol. 183(C).
    14. Lulu Zuo & Haixiang Zhang & HaiYing Wang & Liuquan Sun, 2021. "Optimal subsample selection for massive logistic regression with distributed data," Computational Statistics, Springer, vol. 36(4), pages 2535-2562, December.
    15. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2019. "Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 749-758, April.
    16. Kato, Kengo & F. Galvao, Antonio & Montes-Rojas, Gabriel V., 2012. "Asymptotics for panel quantile regression models with individual effects," Journal of Econometrics, Elsevier, vol. 170(1), pages 76-91.
    17. Adam C. Sales & Ben B. Hansen, 2020. "Limitless Regression Discontinuity," Journal of Educational and Behavioral Statistics, vol. 45(2), pages 143-174, April.
    18. Xiang, Pengcheng & Zhou, Ling & Tang, Lu, 2024. "Transfer learning via random forests: A one-shot federated approach," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
    19. Ji-Yeon Yang & Xuming He, 2011. "A Multistep Protein Lysate Array Quantification Method and its Statistical Properties," Biometrics, The International Biometric Society, vol. 67(4), pages 1197-1205, December.
    20. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2013. "Uniform post selection inference for LAD regression and other z-estimation problems," CeMMAP working papers CWP74/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
