
A Novel Algorithm to Estimate the Significance Level of a Feature Interaction Using the Extreme Gradient Boosting Machine

Author

Listed:
  • Chao-Yu Guo

    (Division of Biostatistics, Institute of Public Health, School of Medicine, National Yang Ming Chiao Tung University, Taipei 112, Taiwan)

  • Ke-Hao Chang

    (Division of Biostatistics, Institute of Public Health, School of Medicine, National Yang Ming Chiao Tung University, Taipei 112, Taiwan)

Abstract

Recent studies have revealed the importance of interaction effects in cardiac research. An analysis can reach an erroneous conclusion when it fails to account for a significant interaction. Regression models handle an interaction by adding the product of the two interacting variables as a term, so standard statistical methods can evaluate the significance and contribution of that term. Machine learning strategies, however, do not provide a p-value for a specific feature interaction. We therefore propose a novel machine learning algorithm to assess the p-value of a feature interaction, named the extreme gradient boosting machine for feature interaction (XGB-FI). The first step borrows from statistical methodology by stratifying the original data into four subgroups according to the two interacting features. The second step builds four XGB machines, using cross-validation to avoid overfitting. The third step calculates a newly defined feature interaction ratio (FIR) for all possible combinations of predictors. Finally, the empirical p-value is calculated from the FIR distribution. Computer simulation studies compared XGB-FI with a multiple regression model that includes an interaction term. The results showed that the type I error of XGB-FI is valid at the nominal level of 0.05 when there is no interaction effect, and that the power of XGB-FI is consistently higher than that of the multiple regression model in all scenarios examined. In conclusion, the new machine learning algorithm outperforms the conventional statistical model when searching for an interaction.
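
The first and last steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the two features are coded 0/1, and it takes the empirical p-value to be the share of reference statistics at least as large as the observed one. The abstract does not define the FIR or describe how the four XGB machines are fitted, so those steps are left out and the statistic values below are made-up placeholders. The names stratify_by_pair and empirical_p_value are hypothetical.

```python
import random


def stratify_by_pair(rows, i, j):
    """Split rows into four subgroups by the values of features i and j.

    Assumption: both features are coded 0/1.
    """
    groups = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for row in rows:
        groups[(row[i], row[j])].append(row)
    return groups


def empirical_p_value(observed, reference):
    """Share of reference statistics at least as large as the observed one.

    Assumption: larger values mean a stronger interaction. This convention
    is illustrative and is not taken from the paper.
    """
    reference = list(reference)
    return sum(r >= observed for r in reference) / len(reference)


# Toy data: 200 rows of three binary features.
random.seed(0)
rows = [tuple(random.randint(0, 1) for _ in range(3)) for _ in range(200)]

# Step 1: four subgroups for the feature pair (0, 1).
groups = stratify_by_pair(rows, 0, 1)
sizes = {key: len(members) for key, members in groups.items()}

# Final step: compare one observed statistic with placeholder reference values.
p_value = empirical_p_value(3.0, [0.5, 1.0, 2.0, 3.5, 4.0])
```

In a fuller version, each of the four subgroups would feed its own cross-validated XGB model, and the reference values would be the FIRs computed for all other combinations of predictors.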

Suggested Citation

  • Chao-Yu Guo & Ke-Hao Chang, 2022. "A Novel Algorithm to Estimate the Significance Level of a Feature Interaction Using the Extreme Gradient Boosting Machine," IJERPH, MDPI, vol. 19(4), pages 1-9, February.
  • Handle: RePEc:gam:jijerp:v:19:y:2022:i:4:p:2338-:d:752349

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/19/4/2338/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/19/4/2338/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Xiaobin & Sengupta, Tuhin & Si Mohammed, Kamel & Jamaani, Fouad, 2023. "Forecasting the lithium mineral resources prices in China: Evidence with Facebook Prophet (Fb-P) and Artificial Neural Networks (ANN) methods," Resources Policy, Elsevier, vol. 82(C).
