Monotonicity of the $\chi^2$-statistic and Feature Selection

Author

Listed:
  • Firuz Kamalov

    (Canadian University Dubai)

  • Ho Hon Leung

    (UAE University)

  • Sherif Moussa

    (Canadian University Dubai)

Abstract

Feature selection is an important preprocessing step in analyzing large-scale data. In this paper, we prove the monotonicity property of the $\chi^2$-statistic and use it to construct a more robust feature selection method. In particular, we show that $\chi^2_{Y, X_1} \le \chi^2_{Y, (X_1, X_2)}$. This result indicates that a new feature should be added to an existing feature set only if it increases the $\chi^2$-statistic beyond a certain threshold. Our stepwise feature selection algorithm significantly reduces the number of features considered at each stage, making it more efficient than other similar methods. In addition, the selection process has a natural stopping point, which eliminates the need for user input. Numerical experiments confirm that the proposed algorithm can significantly reduce the number of features required for classification and improve classifier accuracy.
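
The monotonicity inequality and the idea of adding a feature only when it raises the statistic by more than a threshold can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the threshold rule or the way the candidate set is pruned at each stage, so the greedy loop below, the function names `chi2_stat` and `forward_select`, and the `min_gain` parameter are all assumptions made for illustration. Features are assumed to be discrete, and a feature pair $(X_1, X_2)$ is treated as a single joint categorical variable.

```python
import numpy as np


def chi2_stat(y, x):
    """Pearson chi-square statistic between labels y and a discrete feature x.

    x may be 1-D (one feature) or 2-D (columns combined into one joint feature).
    """
    y = np.asarray(y)
    x = np.asarray(x)
    if x.ndim == 1:
        x = x[:, None]
    # Each distinct row of x becomes one category of the joint feature.
    _, x_codes = np.unique(x, axis=0, return_inverse=True)
    _, y_codes = np.unique(y, return_inverse=True)
    x_codes, y_codes = x_codes.ravel(), y_codes.ravel()
    # Build the contingency table of observed counts.
    table = np.zeros((y_codes.max() + 1, x_codes.max() + 1))
    np.add.at(table, (y_codes, x_codes), 1)
    # Expected counts under independence; every margin is positive
    # because categories are taken from the observed data.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def forward_select(X, y, min_gain):
    """Greedy forward selection (illustrative only).

    A feature is added only if it raises the joint chi-square statistic
    by more than min_gain, a hypothetical user-chosen threshold.
    """
    X = np.asarray(X)
    selected, current = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: chi2_stat(y, X[:, selected + [j]]) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] - current <= min_gain:
            break  # no candidate improves the statistic enough: stop
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 3))
    y = rng.integers(0, 2, size=200)
    # By monotonicity, the joint statistic is never below the single-feature one.
    print(chi2_stat(y, X[:, 0]), chi2_stat(y, X[:, [0, 1]]))
    print(forward_select(X, y, min_gain=5.0))
```

Because the joint statistic can never decrease when a feature is added, the raw increase is always non-negative, which is why a positive threshold is needed for the loop to stop before all features are selected.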

Suggested Citation

  • Firuz Kamalov & Ho Hon Leung & Sherif Moussa, 2022. "Monotonicity of the $\chi^2$-statistic and Feature Selection," Annals of Data Science, Springer, vol. 9(6), pages 1223-1241, December.
  • Handle: RePEc:spr:aodasc:v:9:y:2022:i:6:d:10.1007_s40745-020-00251-7
    DOI: 10.1007/s40745-020-00251-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40745-020-00251-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.
