IDEAS home Printed from https://ideas.repec.org/a/bpj/sagmbi/v16y2017i3p159-171n4.html
   My bibliography  Save this article

Regularized estimation in sparse high-dimensional multivariate regression, with application to a DNA methylation study

Author

Listed:
  • Zhang Haixiang

    (Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China)

  • Zheng Yinan

    (Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA)

  • Yoon Grace

    (Department of Statistics, Northwestern University, Chicago, IL 60611, USA)

  • Zhang Zhou

    (Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA)

  • Gao Tao

    (Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA)

  • Joyce Brian

    (Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA)

  • Zhang Wei

    (Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA)

  • Schwartz Joel

    (Department of Environmental Health, Harvard University, Boston, MA 02115, USA)

  • Vokonas Pantel

    (Department of Preventive Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA)

  • Colicino Elena

    (Normative Aging Study, Veterans Affairs Boston Healthcare System and Boston University, Boston, MA 02118, USA)

  • Baccarelli Andrea

    (Department of Environmental Health Sciences, Columbia University, New York, NY 10032, USA)

  • Hou Lifang

    (Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA)

  • Liu Lei

    (Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA)

Abstract

In this article, we consider variable selection for correlated high dimensional DNA methylation markers as multivariate outcomes. A novel weighted square-root LASSO procedure is proposed to estimate the regression coefficient matrix. A key feature of this method is tuning-insensitivity, which greatly simplifies the computation by obviating cross validation for penalty parameter selection. A precision matrix obtained via the constrained ℓ1 minimization method is used to account for the within-subject correlation among multivariate outcomes. Oracle inequalities of the regularized estimators are derived. The performance of our proposed method is illustrated via extensive simulation studies. We apply our method to study the relation between smoking and high dimensional DNA methylation markers in the Normative Aging Study (NAS).

Suggested Citation

  • Zhang Haixiang & Zheng Yinan & Yoon Grace & Zhang Zhou & Gao Tao & Joyce Brian & Zhang Wei & Schwartz Joel & Vokonas Pantel & Colicino Elena & Baccarelli Andrea & Hou Lifang & Liu Lei, 2017. "Regularized estimation in sparse high-dimensional multivariate regression, with application to a DNA methylation study," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(3), pages 159-171, August.
  • Handle: RePEc:bpj:sagmbi:v:16:y:2017:i:3:p:159-171:n:4
    DOI: 10.1515/sagmb-2016-0073
    as

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2016-0073
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.

    File URL: https://libkey.io/10.1515/sagmb-2016-0073?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Jianqing Fan & Yuan Liao & Martina Mincheva, 2013. "Large covariance estimation by thresholding principal orthogonal complements," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(4), pages 603-680, September.
    3. Yingying Fan & Jinchi Lv, 2014. "Asymptotic properties for combined L1 and concave regularization," Biometrika, Biometrika Trust, vol. 101(1), pages 57-70.
    4. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    5. Yuan Jiang & Yunxiao He & Heping Zhang, 2016. "Variable Selection With Prior Information for Generalized Linear Models via the Prior LASSO Method," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 355-376, March.
    6. Ming Yuan & Yi Lin, 2007. "Model selection and estimation in the Gaussian graphical model," Biometrika, Biometrika Trust, vol. 94(1), pages 19-35.
    7. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    8. Wei Lin & Jinchi Lv, 2013. "High-Dimensional Sparse Additive Hazards Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(501), pages 247-264, March.
    9. A. Belloni & V. Chernozhukov & L. Wang, 2011. "Square-root lasso: pivotal recovery of sparse signals via conic programming," Biometrika, Biometrika Trust, vol. 98(4), pages 791-806.
    10. Wang, Hansheng & Leng, Chenlei, 2007. "Unified LASSO Estimation by Least Squares Approximation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1039-1048, September.
    11. Wei Lin & Rui Feng & Hongzhe Li, 2015. "Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 270-288, March.
    12. Cai, Tony & Liu, Weidong & Luo, Xi, 2011. "A Constrained â„“1 Minimization Approach to Sparse Precision Matrix Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 594-607.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    2. Bailey, Natalia & Pesaran, M. Hashem & Smith, L. Vanessa, 2019. "A multiple testing approach to the regularisation of large sample correlation matrices," Journal of Econometrics, Elsevier, vol. 208(2), pages 507-534.
    3. Yang, Yihe & Dai, Hongsheng & Pan, Jianxin, 2023. "Block-diagonal precision matrix regularization for ultra-high dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    4. Jianqing Fan & Yuan Liao & Han Liu, 2016. "An overview of the estimation of large covariance and precision matrices," Econometrics Journal, Royal Economic Society, vol. 19(1), pages 1-32, February.
    5. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    6. Li, Xinjue & Zboňáková, Lenka & Wang, Weining & Härdle, Wolfgang Karl, 2019. "Combining Penalization and Adaption in High Dimension with Application in Bond Risk Premia Forecasting," IRTG 1792 Discussion Papers 2019-030, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    7. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
    8. Wei Qian & Yuhong Yang, 2013. "Model selection via standard error adjusted adaptive lasso," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 65(2), pages 295-318, April.
    9. Zemin Zheng & Jinchi Lv & Wei Lin, 2021. "Nonsparse Learning with Latent Variables," Operations Research, INFORMS, vol. 69(1), pages 346-359, January.
    10. Wei Wang & Shou‐En Lu & Jerry Q. Cheng & Minge Xie & John B. Kostis, 2022. "Multivariate survival analysis in big data: A divide‐and‐combine approach," Biometrics, The International Biometric Society, vol. 78(3), pages 852-866, September.
    11. Ziqi Chen & Chenlei Leng, 2016. "Dynamic Covariance Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 1196-1207, July.
    12. Gold, David & Lederer, Johannes & Tao, Jing, 2020. "Inference for high-dimensional instrumental variables regression," Journal of Econometrics, Elsevier, vol. 217(1), pages 79-111.
    13. Yanxin Wang & Qibin Fan & Li Zhu, 2018. "Variable selection and estimation using a continuous approximation to the $$L_0$$ L 0 penalty," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 70(1), pages 191-214, February.
    14. Liu, Weidong & Luo, Xi, 2015. "Fast and adaptive sparse precision matrix estimation in high dimensions," Journal of Multivariate Analysis, Elsevier, vol. 135(C), pages 153-162.
    15. Laura Freijeiro‐González & Manuel Febrero‐Bande & Wenceslao González‐Manteiga, 2022. "A Critical Review of LASSO and Its Derivatives for Variable Selection Under Dependence Among Covariates," International Statistical Review, International Statistical Institute, vol. 90(1), pages 118-145, April.
    16. Yin, Jianxin & Li, Hongzhe, 2012. "Model selection and estimation in the matrix normal graphical model," Journal of Multivariate Analysis, Elsevier, vol. 107(C), pages 119-140.
    17. Guan Yu & Yufeng Liu, 2016. "Sparse Regression Incorporating Graphical Structure Among Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(514), pages 707-720, April.
    18. Zhixuan Fu & Chirag R. Parikh & Bingqing Zhou, 2017. "Penalized variable selection in competing risks regression," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(3), pages 353-376, July.
    19. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    20. Avagyan, Vahe & Nogales, Francisco J., 2015. "D-trace Precision Matrix Estimation Using Adaptive Lasso Penalties," DES - Working Papers. Statistics and Econometrics. WS 21775, Universidad Carlos III de Madrid. Departamento de Estadística.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bpj:sagmbi:v:16:y:2017:i:3:p:159-171:n:4. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Peter Golla (email available below). General contact details of provider: https://www.degruyter.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.