
A sparse additive model for high-dimensional interactions with an exposure variable

Author

Listed:
  • Bhatnagar, Sahir R.
  • Lu, Tianyuan
  • Lovato, Amanda
  • Olds, David L.
  • Kobor, Michael S.
  • Meaney, Michael J.
  • O'Donnell, Kieran
  • Yang, Archer Y.
  • Greenwood, Celia M.T.

Abstract

The onset of a new disease is often conceptualized as the result of changes in entire biological networks whose states are affected by a complex interaction of genetic and environmental factors. However, when modeling a relevant phenotype as a function of high-dimensional measurements, the power to estimate interactions is low, the number of possible interactions can be enormous, and their effects may be non-linear. A method called sail is proposed for detecting non-linear interactions with a key environmental or exposure variable in high-dimensional settings while respecting the strong or weak heredity constraints. It is proven that, asymptotically, sail possesses the oracle property, i.e., it performs as well as if the true model were known in advance. A computationally efficient fitting algorithm with automatic tuning parameter selection, which scales to high-dimensional datasets, is also proposed. Simulation results show that sail outperforms existing penalized regression methods in terms of prediction accuracy and support recovery when there are non-linear interactions with an exposure variable. sail is applied to detect non-linear interactions between genes and a prenatal psychosocial intervention program on cognitive performance in children at 4 years of age. Results show that individuals who are genetically predisposed to lower educational attainment are those who stand to benefit the most from the intervention. The proposed algorithms are implemented in an R package available on CRAN (https://cran.r-project.org/package=sail).
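
The strong heredity constraint mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common way of encoding the constraint: each interaction coefficient is written as a product of its parent main-effect coefficients and a scaling factor, so an interaction can be non-zero only when both parents are non-zero. The function names, array shapes, and parameter names below (theta, beta_e, gamma) are hypothetical and chosen for illustration; this is not the sail package API and is not claimed to be the exact formulation used in the paper.

```python
import numpy as np


def interaction_coefs(theta, beta_e, gamma):
    """Illustrative reparametrization: alpha_j = gamma_j * beta_e * theta_j.

    theta  : (p, m) array, basis coefficients of each covariate's main effect
    beta_e : float, main-effect coefficient of the exposure variable
    gamma  : (p,) array, scaling factor for each interaction
    """
    theta = np.asarray(theta, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    # Broadcasting scales row j of theta by gamma_j * beta_e.
    return gamma[:, None] * beta_e * theta


def satisfies_strong_heredity(theta, beta_e, alpha, tol=1e-12):
    """Return True if every non-zero interaction has both parents non-zero."""
    main_active = np.linalg.norm(np.asarray(theta, dtype=float), axis=1) > tol
    inter_active = np.linalg.norm(np.asarray(alpha, dtype=float), axis=1) > tol
    exposure_active = abs(beta_e) > tol
    return bool(np.all(~inter_active | (main_active & exposure_active)))


# Toy example with p = 3 covariates and m = 2 basis functions each.
theta = np.array([[0.5, -1.0], [0.0, 0.0], [2.0, 0.3]])
gamma = np.array([1.5, 0.7, 0.0])
alpha = interaction_coefs(theta, beta_e=1.2, gamma=gamma)
```

In this toy example the second interaction vanishes because its main effect is zero, and the third vanishes because its scaling factor is zero, so the product form yields a model that obeys strong heredity by construction.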

Suggested Citation

  • Bhatnagar, Sahir R. & Lu, Tianyuan & Lovato, Amanda & Olds, David L. & Kobor, Michael S. & Meaney, Michael J. & O'Donnell, Kieran & Yang, Archer Y. & Greenwood, Celia M.T., 2023. "A sparse additive model for high-dimensional interactions with an exposure variable," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
  • Handle: RePEc:eee:csdana:v:179:y:2023:i:c:s0167947322002043
    DOI: 10.1016/j.csda.2022.107624

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322002043
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Asad Haris & Ali Shojaie & Noah Simon, 2019. "Nonparametric regression with adaptive truncation via a convex hierarchical penalty," Biometrika, Biometrika Trust, vol. 106(1), pages 87-107.
    3. Ning Hao & Yang Feng & Hao Helen Zhang, 2018. "Model Selection for High-Dimensional Quadratic Regression via Regularization," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(522), pages 615-625, April.
    4. Aysu Okbay & Jonathan P. Beauchamp & Mark Alan Fontana & James J. Lee & Tune H. Pers & Cornelius A. Rietveld & Patrick Turley & Guo-Bo Chen & Valur Emilsson & S. Fleur W. Meddens & Sven Oskarsson & Jo, 2016. "Genome-wide association study identifies 74 loci associated with educational attainment," Nature, Nature, vol. 533(7604), pages 539-542, May.
    5. Hansheng Wang & Guodong Li & Chih‐Ling Tsai, 2007. "Regression coefficient and autoregressive order shrinkage and selection via the lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(1), pages 63-78, February.
    6. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    7. Radchenko, Peter & James, Gareth M., 2010. "Variable Selection Using Adaptive Nonlinear Interaction Structures in High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 105(492), pages 1541-1553.
    8. Choi, Nam Hee & Li, William & Zhu, Ji, 2010. "Variable Selection With the Strong Heredity Constraint and Its Oracle Property," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 354-364.
    9. Pradeep Ravikumar & John Lafferty & Han Liu & Larry Wasserman, 2009. "Sparse additive models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(5), pages 1009-1030, November.
    10. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    11. Hastie, Nicholas D. & van der Loos, Matthijs J. H. M. & Vitart, Veronique & Völzke, Henry & Wellmann, Jürgen & Yu, Lei & Zhao, Wei & Allik, Jüri & Attia, John R. & Bandinelli, Stefania & Bastardot,, 2013. "GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment," Scholarly Articles 13383543, Harvard University Department of Economics.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    2. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    3. Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Post-Print halshs-00917797, HAL.
    4. Zeyu Bian & Erica E. M. Moodie & Susan M. Shortreed & Sahir Bhatnagar, 2023. "Variable selection in regression‐based estimation of dynamic treatment regimes," Biometrics, The International Biometric Society, vol. 79(2), pages 988-999, June.
    5. Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-00917797, HAL.
    6. Li Yun & O’Connor George T. & Dupuis Josée & Kolaczyk Eric, 2015. "Modeling gene-covariate interactions in sparse regression with group structure for genome-wide association studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 14(3), pages 265-277, June.
    7. Jonathan Boss & Alexander Rix & Yin‐Hsiu Chen & Naveen N. Narisetty & Zhenke Wu & Kelly K. Ferguson & Thomas F. McElrath & John D. Meeker & Bhramar Mukherjee, 2021. "A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures," Environmetrics, John Wiley & Sons, Ltd., vol. 32(8), December.
    8. Ryan A. Peterson & Joseph E. Cavanaugh, 2022. "Ranked sparsity: a cogent regularization framework for selecting and estimating feature interactions and polynomials," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(3), pages 427-454, September.
    9. Fabian Scheipl & Thomas Kneib & Ludwig Fahrmeir, 2013. "Penalized likelihood and Bayesian function selection in regression models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 97(4), pages 349-385, October.
    10. Wang, Cheng & Chen, Haozhe & Jiang, Binyan, 2024. "HiQR: An efficient algorithm for high-dimensional quadratic regression with penalties," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    11. Yawei He & Zehua Chen, 2016. "The EBIC and a sequential procedure for feature selection in interactive linear models with high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 68(1), pages 155-180, February.
    12. Camila Epprecht & Dominique Guegan & Álvaro Veiga, 2013. "Comparing variable selection techniques for linear regression: LASSO and Autometrics," Documents de travail du Centre d'Economie de la Sorbonne 13080, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    13. Xia Zheng & Yaohua Rong & Ling Liu & Weihu Cheng, 2021. "A More Accurate Estimation of Semiparametric Logistic Regression," Mathematics, MDPI, vol. 9(19), pages 1-12, September.
    14. Nardi, Y. & Rinaldo, A., 2011. "Autoregressive process modeling via the Lasso procedure," Journal of Multivariate Analysis, Elsevier, vol. 102(3), pages 528-549, March.
    15. Ning Hao & Hao Helen Zhang, 2017. "A Note on High-Dimensional Linear Regression With Interactions," The American Statistician, Taylor & Francis Journals, vol. 71(4), pages 291-297, October.
    16. Feng Li & Yajie Li & Sanying Feng, 2021. "Estimation for Varying Coefficient Models with Hierarchical Structure," Mathematics, MDPI, vol. 9(2), pages 1-18, January.
    17. Wang, Lu & Shen, Jincheng & Thall, Peter F., 2014. "A modified adaptive Lasso for identifying interactions in the Cox model with the heredity constraint," Statistics & Probability Letters, Elsevier, vol. 93(C), pages 126-133.
    18. Yi Liu & Veronika Ročková & Yuexi Wang, 2021. "Variable selection with ABC Bayesian forests," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 453-481, July.
    19. Du, Pang & Cheng, Guang & Liang, Hua, 2012. "Semiparametric regression models with additive nonparametric components and high dimensional parametric components," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 2006-2017.
    20. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
