Printed from https://ideas.repec.org/a/taf/jnlasa/v109y2014i507p1285-1301.html

Interaction Screening for Ultrahigh-Dimensional Data

Authors

  • Ning Hao
  • Hao Helen Zhang

Abstract

In ultrahigh-dimensional data analysis, identifying important interaction effects is extremely challenging, and a top practical concern is computational feasibility. For a dataset with $n$ observations and $p$ predictors, the augmented design matrix including all linear and order-2 terms ($p$ linear terms, $p$ squares, and $p(p-1)/2$ pairwise products) is of size $n \times (p^2 + 3p)/2$. When $p$ is large, say in the thousands or beyond, the number of interactions is enormous and exceeds the storage and analysis capacity of standard machines and software tools. In theory, interaction-selection consistency is hard to achieve in high-dimensional settings: in a random design, interaction effects have heavier tails and more complex covariance structures than main effects, which makes theoretical analysis difficult. In this article, we propose to tackle these issues with forward-selection-based procedures called iFOR, which identify interaction effects in a greedy forward fashion while maintaining the natural hierarchical model structure. Two algorithms, iFORT and iFORM, are studied. Computationally, the iFOR procedures are designed to be simple and fast to implement. No complex optimization tools are needed, since only OLS-type calculations are involved; the iFOR algorithms avoid storing and manipulating the whole augmented matrix, so their memory and CPU requirements are minimal; and their computational complexity is linear in $p$ for sparse models, making them feasible when $p \gg n$. Theoretically, we prove that they possess the sure screening property in ultrahigh-dimensional settings. Numerical examples demonstrate their finite-sample performance. Supplementary materials for this article are available online.
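The abstract's two computational points, the size of the full order-2 design and the idea of screening interactions by greedy forward steps that respect hierarchy, can be illustrated with a short Python sketch. This is not the authors' iFORT or iFORM. It is a generic forward-selection loop under stated assumptions: strong heredity (an interaction or square enters the candidate pool only after both parent main effects are selected), RSS reduction from an OLS fit as the greedy criterion, and a fixed number of steps in place of a data-driven stopping rule. The function names `n_augmented_columns` and `heredity_forward_selection` are hypothetical.

```python
import numpy as np


def n_augmented_columns(p: int) -> int:
    """Columns in the full order-2 design: p linear + p squares + p(p-1)/2 products."""
    return (p * p + 3 * p) // 2


def heredity_forward_selection(X: np.ndarray, y: np.ndarray, n_steps: int) -> list:
    """Hypothetical greedy forward selection under strong heredity (illustrative only).

    An order-2 term X_j * X_k (j <= k, so squares are included) becomes a
    candidate only after both parents j and k have been selected as main
    effects. Returns terms in selection order: (j,) for a main effect,
    (j, k) for an order-2 term.
    """
    n, p = X.shape
    selected: list = []              # terms chosen so far, in order
    columns = [np.ones(n)]           # intercept plus the columns of selected terms

    def build(term):
        # Order-2 columns are computed on demand from X, so the full
        # n x (p^2 + 3p)/2 augmented matrix is never materialized.
        if len(term) == 1:
            return X[:, term[0]]
        return X[:, term[0]] * X[:, term[1]]

    def rss(cols):
        # Residual sum of squares of the OLS fit of y on the given columns.
        A = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        return float(resid @ resid)

    for _ in range(n_steps):
        mains = sorted(t[0] for t in selected if len(t) == 1)
        candidates = [(j,) for j in range(p) if (j,) not in selected]
        # Heredity: only products of already-selected main effects are eligible.
        candidates += [(j, k) for a, j in enumerate(mains) for k in mains[a:]
                       if (j, k) not in selected]
        if not candidates:
            break
        # Greedy step: add the candidate whose inclusion gives the smallest RSS.
        best = min(candidates, key=lambda t: rss(columns + [build(t)]))
        selected.append(best)
        columns.append(build(best))
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    # Toy signal: two main effects and their interaction, plus noise.
    y = 3 * X[:, 0] + 3 * X[:, 1] + 2 * X[:, 0] * X[:, 1] + 0.5 * rng.standard_normal(n)
    print(n_augmented_columns(p))   # full order-2 design size for p = 50: 1325 columns
    print(heredity_forward_selection(X, y, n_steps=3))
```

With $s$ main effects selected, each step scans only $p + s(s+1)/2$ candidates rather than all $(p^2 + 3p)/2$ terms, which is the intuition behind the cost growing roughly linearly in $p$ for sparse models. On the toy data the three greedy steps are expected to pick columns 0, 1 and their product; the fixed `n_steps` is only a placeholder for a proper stopping criterion.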

Suggested Citation

  • Ning Hao & Hao Helen Zhang, 2014. "Interaction Screening for Ultrahigh-Dimensional Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1285-1301, September.
  • Handle: RePEc:taf:jnlasa:v:109:y:2014:i:507:p:1285-1301
    DOI: 10.1080/01621459.2014.881741

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2014.881741
    Download Restriction: Access to full text is restricted to subscribers.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Jun Lu & Dan Wang & Qinqin Hu, 2022. "Interaction screening via canonical correlation," Computational Statistics, Springer, vol. 37(5), pages 2637-2670, November.
    2. Hai-Tang Chiou & Meihui Guo & Ching-Kang Ing, 2020. "Variable selection for high-dimensional regression models with time series and heteroscedastic errors," Journal of Econometrics, Elsevier, vol. 216(1), pages 118-136.
    3. Yao Dong & He Jiang, 2018. "A Two-Stage Regularization Method for Variable Selection and Forecasting in High-Order Interaction Model," Complexity, Hindawi, vol. 2018, pages 1-12, November.
    4. Wei Xiong & Yaxian Chen & Shuangge Ma, 2023. "Unified model-free interaction screening via CV-entropy filter," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    5. He Jiang, 2022. "A novel robust structural quadratic forecasting model and applications," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1156-1180, September.
    6. Debin Qiu & Jeongyoun Ahn, 2020. "Grouped variable screening for ultra-high dimensional data for linear model," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    7. Baiguo An & Guozhong Feng & Jianhua Guo, 2022. "Interaction Identification and Clique Screening for Classification with Ultra-high Dimensional Discrete Features," Journal of Classification, Springer; The Classification Society, vol. 39(1), pages 122-146, March.
    8. Ke Yu & Shan Luo, 2024. "Rank-based sequential feature selection for high-dimensional accelerated failure time models with main and interaction effects," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
    9. Cheng Wang & Haozhe Chen & Binyan Jiang, 2024. "HiQR: An efficient algorithm for high-dimensional quadratic regression with penalties," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    10. Ning Hao & Hao Helen Zhang, 2017. "A Note on High-Dimensional Linear Regression With Interactions," The American Statistician, Taylor & Francis Journals, vol. 71(4), pages 291-297, October.
    11. Hyokyoung G. Hong & Qi Zheng & Yi Li, 2019. "Forward regression for Cox models with high-dimensional covariates," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 268-290.
    12. Shaofei Zhao & Guifang Fu, 2022. "Distribution-free and model-free multivariate feature screening via multivariate rank distance correlation," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    13. Randall Reese & Guifang Fu & Geran Zhao & Xiaotian Dai & Xiaotian Li & Kenneth Chiu, 2022. "Epistasis Detection via the Joint Cumulant," Statistics in Biosciences, Springer; International Chinese Statistical Association, vol. 14(3), pages 514-532, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.