IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v160y2021ics0167947321000773.html
   My bibliography  Save this article

Parallel integrative learning for large-scale multi-response regression with incomplete outcomes

Author

Listed:
  • Dong, Ruipeng
  • Li, Daoji
  • Zheng, Zemin

Abstract

Multi-task learning is increasingly used to investigate the association structure between multiple responses and a single set of predictor variables in many applications. In the era of big data, the coexistence of incomplete outcomes, large number of responses, and high dimensionality in predictors poses unprecedented challenges in estimation, prediction and computation. In this paper, we propose a scalable and computationally efficient procedure, called PEER, for large-scale multi-response regression with incomplete outcomes, where both the numbers of responses and predictors can be high-dimensional. Motivated by sparse factor regression, we convert the multi-response regression into a set of univariate-response regressions, which can be efficiently implemented in parallel. Under some mild regularity conditions, we show that PEER enjoys nice sampling properties including consistency in estimation, prediction, and variable selection. Extensive simulation studies show that our proposal compares favorably with several existing methods in estimation accuracy, variable selection, and computation efficiency.

Suggested Citation

  • Dong, Ruipeng & Li, Daoji & Zheng, Zemin, 2021. "Parallel integrative learning for large-scale multi-response regression with incomplete outcomes," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
  • Handle: RePEc:eee:csdana:v:160:y:2021:i:c:s0167947321000773
    DOI: 10.1016/j.csda.2021.107243
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947321000773
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2021.107243?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Yunzhang Zhu & Xiaotong Shen & Changqing Ye, 2016. "Personalized Prediction and Sparsity Pursuit in Latent Factor Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 241-252, March.
    3. Ma, Shujie & Linton, Oliver & Gao, Jiti, 2021. "Estimation and inference in semiparametric quantile factor models," Journal of Econometrics, Elsevier, vol. 222(1), pages 295-323.
    4. Luo, Chongliang & Liang, Jian & Li, Gen & Wang, Fei & Zhang, Changshui & Dey, Dipak K. & Chen, Kun, 2018. "Leveraging mixed and incomplete outcomes via reduced-rank modeling," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 378-394.
    5. Fan, Jianqing & Gong, Wenyan & Zhu, Ziwei, 2019. "Generalized high-dimensional trace regression via nuclear norm regularization," Journal of Econometrics, Elsevier, vol. 212(1), pages 177-202.
    6. Yingying Fan & Cheng Yong Tang, 2013. "Tuning parameter selection in high dimensional penalized likelihood," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(3), pages 531-552, June.
    7. Zemin Zheng & Jinchi Lv & Wei Lin, 2021. "Nonsparse Learning with Latent Variables," Operations Research, INFORMS, vol. 69(1), pages 346-359, January.
    8. Izenman, Alan Julian, 1975. "Reduced-rank regression for the multivariate linear model," Journal of Multivariate Analysis, Elsevier, vol. 5(2), pages 248-264, June.
    9. Yingying Fan & Jinchi Lv, 2013. "Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(503), pages 1044-1061, September.
    10. Zhu, Xuening & Huang, Danyang & Pan, Rui & Wang, Hansheng, 2020. "Multivariate spatial autoregressive model for large scale social networks," Journal of Econometrics, Elsevier, vol. 215(2), pages 591-606.
    11. Kun Chen & Kung‐Sik Chan & Nils Chr. Stenseth, 2012. "Reduced rank stochastic regression with a sparse singular value decomposition," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 74(2), pages 203-221, March.
    12. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    13. Zhang, Jun & Feng, Zhenghui & Peng, Heng, 2018. "Estimation and hypothesis test for partial linear multiplicative models," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 87-103.
    14. Kejun He & Heng Lian & Shujie Ma & Jianhua Z. Huang, 2018. "Dimensionality Reduction and Variable Selection in Multivariate Varying-Coefficient Models With a Large Number of Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(522), pages 746-754, April.
    15. Lisha Chen & Jianhua Z. Huang, 2012. "Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(500), pages 1533-1545, December.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lian, Heng & Feng, Sanying & Zhao, Kaifeng, 2015. "Parametric and semiparametric reduced-rank regression with flexible sparsity," Journal of Multivariate Analysis, Elsevier, vol. 136(C), pages 163-174.
    2. Luo, Chongliang & Liang, Jian & Li, Gen & Wang, Fei & Zhang, Changshui & Dey, Dipak K. & Chen, Kun, 2018. "Leveraging mixed and incomplete outcomes via reduced-rank modeling," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 378-394.
    3. Hu, Jianhua & Liu, Xiaoqian & Liu, Xu & Xia, Ningning, 2022. "Some aspects of response variable selection and estimation in multivariate linear regression," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    4. Zhao, Weihua & Jiang, Xuejun & Lian, Heng, 2018. "A principal varying-coefficient model for quantile regression: Joint variable selection and dimension reduction," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 269-280.
    5. Goh, Gyuhyeong & Dey, Dipak K. & Chen, Kun, 2017. "Bayesian sparse reduced rank multivariate regression," Journal of Multivariate Analysis, Elsevier, vol. 157(C), pages 14-28.
    6. Lian, Heng & Kim, Yongdai, 2016. "Nonconvex penalized reduced rank regression and its oracle properties in high dimensions," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 383-393.
    7. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L 0 -Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    8. Zhang, Tonglin, 2024. "Variables selection using L0 penalty," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
    9. Takumi Saegusa & Tianzhou Ma & Gang Li & Ying Qing Chen & Mei-Ling Ting Lee, 2020. "Variable Selection in Threshold Regression Model with Applications to HIV Drug Adherence Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(3), pages 376-398, December.
    10. Mishra, Aditya & Dey, Dipak K. & Chen, Yong & Chen, Kun, 2021. "Generalized co-sparse factor regression," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    11. Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
    12. Jeon, Jong-June & Kwon, Sunghoon & Choi, Hosik, 2017. "Homogeneity detection for the high-dimensional generalized linear model," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 61-74.
    13. Canhong Wen & Zhenduo Li & Ruipeng Dong & Yijin Ni & Wenliang Pan, 2023. "Simultaneous Dimension Reduction and Variable Selection for Multinomial Logistic Regression," INFORMS Journal on Computing, INFORMS, vol. 35(5), pages 1044-1060, September.
    14. Yang Feng & Qingfeng Liu, 2020. "Nested Model Averaging on Solution Path for High-dimensional Linear Regression," Papers 2005.08057, arXiv.org.
    15. Kun Chen & Kung-Sik Chan & Nils Chr. Stenseth, 2014. "Source-Sink Reconstruction Through Regularized Multicomponent Regression Analysis-With Application to Assessing Whether North Sea Cod Larvae Contributed to Local Fjord Cod in Skagerrak," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(506), pages 560-573, June.
    16. Hui Xiao & Yiguo Sun, 2019. "On Tuning Parameter Selection in Model Selection and Model Averaging: A Monte Carlo Study," JRFM, MDPI, vol. 12(3), pages 1-16, June.
    17. Kwon, Sunghoon & Oh, Seungyoung & Lee, Youngjo, 2016. "The use of random-effect models for high-dimensional variable selection problems," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 401-412.
    18. An, Baiguo & Guo, Jianhua & Wang, Hansheng, 2013. "Multivariate regression shrinkage and selection by canonical correlation analysis," Computational Statistics & Data Analysis, Elsevier, vol. 62(C), pages 93-107.
    19. Luo, Ruiyan & Qi, Xin, 2017. "Signal extraction approach for sparse multivariate response regression," Journal of Multivariate Analysis, Elsevier, vol. 153(C), pages 83-97.
    20. Kaixu Yang & Tapabrata Maiti, 2022. "Ultrahigh‐dimensional generalized additive model: Unified theory and methods," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(3), pages 917-942, September.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:160:y:2021:i:c:s0167947321000773. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/csda .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.