Printed from https://ideas.repec.org/a/gam/jijerp/v21y2024i11p1474-d1514993.html

Feature Selection and Machine Learning Approaches in Prediction of Current E-Cigarette Use Among U.S. Adults in 2022

Author

Listed:
  • Wei Fang

    (West Virginia Clinical and Translational Science Institute, Morgantown, WV 26506, USA)

  • Ying Liu

    (Department of Biostatistics and Epidemiology, College of Public Health, East Tennessee State University, Johnson City, TN 37614, USA)

  • Chun Xu

    (Department of Health and Biomedical Sciences, College of Health Professions, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA)

  • Xingguang Luo

    (Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06516, USA)

  • Kesheng Wang

    (Department of Biobehavioral Health & Nursing Science, College of Nursing, University of South Carolina, Columbia, SC 29208, USA)

Abstract

Feature selection is the process of picking informative and relevant features from a larger collection of features. Few studies have used feature selection and machine learning (ML) approaches to identify predictors of current e-cigarette use among U.S. adults. This study aimed to perform feature selection and develop ML models for predicting current e-cigarette use using the 2022 Health Information National Trends Survey (HINTS 6). The Boruta algorithm and the least absolute shrinkage and selection operator (LASSO) were used to perform feature selection on 71 variables. The random over-sampling examples (ROSE) method was used to deal with imbalanced data. Five ML algorithms, namely support vector machines (SVMs), logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost), were applied to develop ML models. The overall prevalence of current e-cigarette use was 4.3%. Using the 15 variables selected by both Boruta and LASSO, the RF algorithm provided the best classifier, with an accuracy of 0.992, sensitivity of 0.985, F1 score of 0.991, and AUC of 0.999. Weighted logistic regression further confirmed that age, education level, smoking status, belief in the harm of e-cigarette use, binge drinking, belief that alcohol increases cancer risk, and the Patient Health Questionnaire-4 (PHQ4) score were associated with e-cigarette use. This study confirmed the strength of ML techniques in survey data, and the findings will guide inquiry into behaviors and mentalities of substance users.
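
The class imbalance mentioned in the abstract (4.3% prevalence) is why sensitivity and F1 are reported alongside accuracy. The Python sketch below is illustrative only and is not the authors' code. It uses made-up toy labels (43 positives out of 1,000) and plain random oversampling with replacement, a simpler stand-in for ROSE, which generates synthetic examples with a smoothed bootstrap. The function names random_oversample and binary_metrics are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows (sampling with replacement) until every
    class has as many rows as the largest class. This is plain random
    oversampling, NOT the smoothed-bootstrap ROSE method named in the abstract."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), precision and F1 from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}


# Toy data (an assumption): 1,000 respondents, 43 current users (4.3%).
labels = [1] * 43 + [0] * 957
rows = [[i] for i in range(len(labels))]

# A useless classifier that always predicts "non-user" still looks accurate.
baseline = binary_metrics(labels, [0] * len(labels))
print(baseline)  # accuracy 0.957, but sensitivity 0.0 and F1 0.0

# After oversampling, both classes have 957 rows.
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

In practice, oversampling is usually applied to the training split only, so that duplicated minority rows do not also appear in the evaluation data.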

Suggested Citation

  • Wei Fang & Ying Liu & Chun Xu & Xingguang Luo & Kesheng Wang, 2024. "Feature Selection and Machine Learning Approaches in Prediction of Current E-Cigarette Use Among U.S. Adults in 2022," IJERPH, MDPI, vol. 21(11), pages 1-14, November.
  • Handle: RePEc:gam:jijerp:v:21:y:2024:i:11:p:1474-:d:1514993

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/21/11/1474/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/21/11/1474/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Štefan Lyócsa & Petra Vašaničová & Branka Hadji Misheva & Marko Dávid Vateha, 2022. "Default or profit scoring credit systems? Evidence from European and US peer-to-peer lending markets," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-21, December.
    2. Foutzopoulos, Giorgos & Pandis, Nikolaos & Tsagris, Michail, 2024. "Predicting full retirement attainment of NBA players," MPRA Paper 121540, University Library of Munich, Germany.
    3. Van Belle, Jente & Guns, Tias & Verbeke, Wouter, 2021. "Using shared sell-through data to forecast wholesaler demand in multi-echelon supply chains," European Journal of Operational Research, Elsevier, vol. 288(2), pages 466-479.
    4. Jun Wang & Jinyong Huang & Yunlong Hu & Qianwen Guo & Shasha Zhang & Jinglin Tian & Yanqin Niu & Ling Ji & Yuzhong Xu & Peijun Tang & Yaqin He & Yuna Wang & Shuya Zhang & Hao Yang & Kang Kang & Xinchu, 2024. "Terminal modifications independent cell-free RNA sequencing enables sensitive early cancer detection and classification," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    5. Faisal Alsayegh & Moh A Alkhamis & Fatima Ali & Sreeja Attur & Nicholas M Fountain-Jones & Mohammad Zubaid, 2022. "Anemia or other comorbidities? using machine learning to reveal deeper insights into the drivers of acute coronary syndromes in hospital admitted patients," PLOS ONE, Public Library of Science, vol. 17(1), pages 1-15, January.
    6. Franck Ramaharo & Fitiavana Randriamifidy, 2023. "Determinants of renewable energy consumption in Madagascar: Evidence from feature selection algorithms," Papers 2401.13671, arXiv.org.
    7. Erik Duijvelaar & Jack Gisby & James E. Peters & Harm Jan Bogaard & Jurjan Aman, 2024. "Longitudinal plasma proteomics reveals biomarkers of alveolar-capillary barrier disruption in critically ill COVID-19 patients," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    8. Paweł Teisseyre & Robert A. Kłopotek & Jan Mielniczuk, 2016. "Random Subspace Method for high-dimensional regression with the R package regRSM," Computational Statistics, Springer, vol. 31(3), pages 943-972, September.
    9. Alexander Kirpich & Elizabeth A Ainsworth & Jessica M Wedow & Jeremy R B Newman & George Michailidis & Lauren M McIntyre, 2018. "Variable selection in omics data: A practical evaluation of small sample sizes," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-19, June.
    10. Sara Saadatmand & Khodakaram Salimifard & Reza Mohammadi & Alex Kuiper & Maryam Marzban & Akram Farhadi, 2023. "Using machine learning in prediction of ICU admission, mortality, and length of stay in the early stage of admission of COVID-19 patients," Annals of Operations Research, Springer, vol. 328(1), pages 1043-1071, September.
    11. Patrick C Eschenfeldt & Uri Kartoun & Curtis R Heberle & Chung Yin Kong & Norman S Nishioka & Kenney Ng & Sagar Kamarthi & Chin Hur, 2018. "Analysis of factors associated with extended recovery time after colonoscopy," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-16, June.
    12. Rachel Sippy & Daniel F Farrell & Daniel A Lichtenstein & Ryan Nightingale & Megan A Harris & Joseph Toth & Paris Hantztidiamantis & Nicholas Usher & Cinthya Cueva Aponte & Julio Barzallo Aguilar & An, 2020. "Severity Index for Suspected Arbovirus (SISA): Machine learning for accurate prediction of hospitalization in subjects suspected of arboviral infection," PLOS Neglected Tropical Diseases, Public Library of Science, vol. 14(2), pages 1-20, February.
    13. Bellotti, Anthony & Brigo, Damiano & Gambetti, Paolo & Vrins, Frédéric, 2021. "Forecasting recovery rates on non-performing loans with machine learning," International Journal of Forecasting, Elsevier, vol. 37(1), pages 428-444.
    14. Joshua P White & Simon Dennis & Martin Tomko & Jessica Bell & Stephan Winter, 2021. "Paths to social licence for tracking-data analytics in university research and services," PLOS ONE, Public Library of Science, vol. 16(5), pages 1-19, May.
    15. Jack S. Gisby & Norzawani B. Buang & Artemis Papadaki & Candice L. Clarke & Talat H. Malik & Nicholas Medjeral-Thomas & Damiola Pinheiro & Paige M. Mortimer & Shanice Lewis & Eleanor Sandhu & Stephen , 2022. "Multi-omics identify falling LRRC15 as a COVID-19 severity marker and persistent pro-thrombotic signals in convalescence," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
    16. Nanna Munck & Patrick Murigu Kamau Njage & Pimlapas Leekitcharoenphon & Eva Litrup & Tine Hald, 2020. "Application of Whole‐Genome Sequences and Machine Learning in Source Attribution of Salmonella Typhimurium," Risk Analysis, John Wiley & Sons, vol. 40(9), pages 1693-1705, September.
    17. Tanzeela Khalid & Raphael Aggio & Paul White & Ben De Lacy Costello & Raj Persad & Huda Al-Kateb & Peter Jones & Chris S Probert & Norman Ratcliffe, 2015. "Urinary Volatile Organic Compounds for the Detection of Prostate Cancer," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-15, November.
    18. Jimmy Semakula & Rene A. Corner-Thomas & Stephen T. Morris & Hugh T. Blair & Paul R. Kenyon, 2021. "Application of Machine Learning Algorithms to Predict Body Condition Score from Liveweight Records of Mature Romney Ewes," Agriculture, MDPI, vol. 11(2), pages 1-20, February.
    19. A. Jiran Meitei & Akanksha Saini & Bibhuti Bhusan Mohapatra & Kh. Jitenkumar Singh, 2022. "Predicting child anaemia in the North-Eastern states of India: a machine learning approach," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(6), pages 2949-2962, December.
    20. Kresova, Svetlana & Hess, Sebastian, 2021. "Determinants of Regional Raw Milk Prices in Russia," 2021 Conference, August 17-31, 2021, Virtual 315064, International Association of Agricultural Economists.