IDEAS home Printed from https://ideas.repec.org/a/eee/ejores/v250y2016i2p389-399.html

Integer programming models for feature selection: New extensions and a randomized solution algorithm

Author

Listed:
  • Bertolazzi, P.
  • Felici, G.
  • Festa, P.
  • Fiscon, G.
  • Weitschek, E.

Abstract

Feature selection methods are used in machine learning and data analysis to select a subset of features that may be used successfully to construct a model for the data. These methods rest on the assumption that many of the available features are often redundant for the purpose of the analysis. In this paper, we focus on a particular method for feature selection in supervised learning problems, based on a linear programming model with integer variables. To solve the optimization problem associated with this approach, we propose a novel, robust metaheuristic algorithm that relies on a Greedy Randomized Adaptive Search Procedure (GRASP), extended with short memory and a local search strategy. The performance of our heuristic algorithm compares favorably with that of well-established feature selection methods, on both simulated data and real data from biological applications. The results suggest that our method is particularly suited to problems with a very large number of binary or categorical features.
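
As a rough illustration of the kind of procedure the abstract describes, the sketch below applies a basic GRASP (randomized greedy construction followed by local search) to a set-covering view of feature selection: choose as few features as possible so that every pair of samples with different class labels differs on at least one chosen feature. This formulation is an assumption made for illustration and may not match the paper's exact integer programming model; the short-memory extension mentioned in the abstract is omitted. All function names and parameters (`grasp`, `alpha`, `iterations`, and so on) are hypothetical.

```python
import random
from itertools import combinations


def pairs_to_separate(X, y):
    """Index pairs of samples that carry different class labels."""
    return [(i, j) for i, j in combinations(range(len(X)), 2) if y[i] != y[j]]


def covered_by(X, pairs, f):
    """Set of pairs on which feature f takes different values."""
    return {(i, j) for i, j in pairs if X[i][f] != X[j][f]}


def construct(cover, target, alpha, rng):
    """Greedy randomized construction with a restricted candidate list."""
    uncovered, chosen = set(target), []
    while uncovered:
        # Gain of a feature = number of still-uncovered pairs it separates.
        gains = {f: len(c & uncovered) for f, c in cover.items() if f not in chosen}
        best = max(gains.values())
        # Candidates whose gain is within a fraction alpha of the best gain.
        rcl = [f for f, g in gains.items() if g > 0 and g >= best - alpha * best]
        f = rng.choice(rcl)
        chosen.append(f)
        uncovered -= cover[f]
    return chosen


def local_search(chosen, cover, target):
    """Drop any feature whose removal keeps every target pair covered."""
    sol = list(chosen)
    for f in list(sol):
        rest = [g for g in sol if g != f]
        if set().union(*(cover[g] for g in rest)) >= target:
            sol = rest
    return sol


def grasp(X, y, iterations=50, alpha=0.3, seed=0):
    """Return the smallest separating feature subset found over all iterations."""
    rng = random.Random(seed)
    pairs = pairs_to_separate(X, y)
    cover = {f: covered_by(X, pairs, f) for f in range(len(X[0]))}
    # Only pairs that at least one feature can separate are required.
    target = set().union(*cover.values())
    best = None
    for _ in range(iterations):
        sol = local_search(construct(cover, target, alpha, rng), cover, target)
        if best is None or len(sol) < len(best):
            best = sol
    return sorted(best)
```

With `alpha=0` the construction is purely greedy, while `alpha=1` admits every feature that still separates at least one uncovered pair; intermediate values trade solution quality against diversity across iterations.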

Suggested Citation

  • Bertolazzi, P. & Felici, G. & Festa, P. & Fiscon, G. & Weitschek, E., 2016. "Integer programming models for feature selection: New extensions and a randomized solution algorithm," European Journal of Operational Research, Elsevier, vol. 250(2), pages 389-399.
  • Handle: RePEc:eee:ejores:v:250:y:2016:i:2:p:389-399
    DOI: 10.1016/j.ejor.2015.09.051

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221715008930
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ejor.2015.09.051?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Unler, Alper & Murat, Alper, 2010. "A discrete particle swarm optimization method for feature selection in binary classification problems," European Journal of Operational Research, Elsevier, vol. 206(3), pages 528-539, November.
    2. Piramuthu, Selwyn, 2004. "Evaluating feature selection methods for learning in data mining applications," European Journal of Operational Research, Elsevier, vol. 156(2), pages 483-494, July.
    3. Onur Dagliyan & Fadime Uney-Yuksektepe & I Halil Kavakli & Metin Turkay, 2011. "Optimization Based Tumor Classification from Microarray Gene Expression Data," PLOS ONE, Public Library of Science, vol. 6(2), pages 1-10, February.
    4. Meiri, Ronen & Zahavi, Jacob, 2006. "Using simulated annealing to optimize the feature selection problem in marketing applications," European Journal of Operational Research, Elsevier, vol. 171(3), pages 842-858, June.

    Citations



    Cited by:

    1. Jiménez-Cordero, Asunción & Morales, Juan Miguel & Pineda, Salvador, 2021. "A novel embedded min-max approach for feature selection in nonlinear Support Vector Machine classification," European Journal of Operational Research, Elsevier, vol. 293(1), pages 24-35.
    2. Zhang, Yishi & Zhu, Ruilin & Chen, Zhijun & Gao, Jie & Xia, De, 2021. "Evaluating and selecting features via information theoretic lower bounds of feature inner correlations for high-dimensional data," European Journal of Operational Research, Elsevier, vol. 290(1), pages 235-247.
    3. Li, An-Da & He, Zhen & Wang, Qing & Zhang, Yang, 2019. "Key quality characteristics selection for imbalanced production data using a two-phase bi-objective feature selection method," European Journal of Operational Research, Elsevier, vol. 274(3), pages 978-989.
    4. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2023. "Sparse optimization via vector k-norm and DC programming with an application to feature selection for support vector machines," Computational Optimization and Applications, Springer, vol. 86(2), pages 745-766, November.
    5. Ghaddar, Bissan & Naoum-Sawaya, Joe, 2018. "High dimensional data classification and feature selection using support vector machines," European Journal of Operational Research, Elsevier, vol. 265(3), pages 993-1004.
    6. Daehan Won & Hasan Manzour & Wanpracha Chaovalitwongse, 2020. "Convex Optimization for Group Feature Selection in Networked Data," INFORMS Journal on Computing, INFORMS, vol. 32(1), pages 182-198, January.
    7. Douek-Pinkovich, Yifat & Ben-Gal, Irad & Raviv, Tal, 2022. "The stochastic test collection problem: Models, exact and heuristic solution approaches," European Journal of Operational Research, Elsevier, vol. 299(3), pages 945-959.
    8. Giovanni Felici & Kumar Parijat Tripathi & Daniela Evangelista & Mario Rosario Guarracino, 2017. "A mixed integer programming-based global optimization framework for analyzing gene expression data," Journal of Global Optimization, Springer, vol. 69(3), pages 727-744, November.
    9. Yifat Douek-Pinkovich & Irad Ben-Gal & Tal Raviv, 2021. "The generalized test collection problem," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(2), pages 372-386, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fouskakis, D., 2012. "Bayesian variable selection in generalized linear models using a combination of stochastic optimization methods," European Journal of Operational Research, Elsevier, vol. 220(2), pages 414-422.
    2. Lee, In Gyu & Yoon, Sang Won & Won, Daehan, 2022. "A Mixed Integer Linear Programming Support Vector Machine for Cost-Effective Group Feature Selection: Branch-Cut-and-Price Approach," European Journal of Operational Research, Elsevier, vol. 299(3), pages 1055-1068.
    3. Anzanello, Michel J. & Albin, Susan L. & Chaovalitwongse, Wanpracha A., 2012. "Multicriteria variable selection for classification of production batches," European Journal of Operational Research, Elsevier, vol. 218(1), pages 97-105.
    4. Wang, Xin & Liu, Xiaodong & Pedrycz, Witold & Zhu, Xiaolei & Hu, Guangfei, 2012. "Mining axiomatic fuzzy set association rules for classification problems," European Journal of Operational Research, Elsevier, vol. 218(1), pages 202-210.
    5. Ding‐Wen Tan & William Yeoh & Yee Ling Boo & Soung‐Yue Liew, 2013. "The Impact Of Feature Selection: A Data‐Mining Application In Direct Marketing," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 20(1), pages 23-38, January.
    6. Aytug, Haldun, 2015. "Feature selection for support vector machines using Generalized Benders Decomposition," European Journal of Operational Research, Elsevier, vol. 244(1), pages 210-218.
    7. Yu, Shiwei & Wei, Yi-Ming & Fan, Jingli & Zhang, Xian & Wang, Ke, 2012. "Exploring the regional characteristics of inter-provincial CO2 emissions in China: An improved fuzzy clustering analysis based on particle swarm optimization," Applied Energy, Elsevier, vol. 92(C), pages 552-562.
    8. Schlereth, Christian & Stepanchuk, Tanja & Skiera, Bernd, 2010. "Optimization and analysis of the profitability of tariff structures with two-part tariffs," European Journal of Operational Research, Elsevier, vol. 206(3), pages 691-701, November.
    9. Wen, Hanguan & Liu, Xiufeng & Yang, Ming & Lei, Bo & Xu, Cheng & Chen, Zhe, 2024. "A novel approach for identifying customer groups for personalized demand-side management services using household socio-demographic data," Energy, Elsevier, vol. 286(C).
    10. Moraes, Marcelo Botelho da Costa & Nagano, Marcelo Seido, 2014. "Evolutionary models in cash management policies with multiple assets," Economic Modelling, Elsevier, vol. 39(C), pages 1-7.
    11. Cheng-Yu Ho & Ke-Sheng Cheng & Chi-Hang Ang, 2023. "Utilizing the Random Forest Method for Short-Term Wind Speed Forecasting in the Coastal Area of Central Taiwan," Energies, MDPI, vol. 16(3), pages 1-18, January.
    12. Casado Yusta, Silvia & Núñez Letamendía, Laura & Pacheco Bonrostro, Joaquín Antonio, 2018. "Predicting Corporate Failure: The GRASP-LOGIT Model || Predicción de la quiebra empresarial: el modelo GRASP-LOGIT," Revista de Métodos Cuantitativos para la Economía y la Empresa = Journal of Quantitative Methods for Economics and Business Administration, Universidad Pablo de Olavide, Department of Quantitative Methods for Economics and Business Administration, vol. 26(1), pages 294-314, Diciembre.
    13. Matthias Bogaert & Lex Delaere, 2023. "Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art," Mathematics, MDPI, vol. 11(5), pages 1-28, February.
    14. Juheng Zhang & Selwyn Piramuthu, 2018. "Product recommendation with latent review topics," Information Systems Frontiers, Springer, vol. 20(3), pages 617-625, June.
    15. Du, Wen Sheng & Hu, Bao Qing, 2018. "A fast heuristic attribute reduction approach to ordered decision systems," European Journal of Operational Research, Elsevier, vol. 264(2), pages 440-452.
    16. Bin, Wei & Qinke, Peng & Jing, Zhao & Xiao, Chen, 2012. "A binary particle swarm optimization algorithm inspired by multi-level organizational learning behavior," European Journal of Operational Research, Elsevier, vol. 219(2), pages 224-233.
    17. Juheng Zhang & Selwyn Piramuthu, 0. "Product recommendation with latent review topics," Information Systems Frontiers, Springer, vol. 0, pages 1-9.
    18. Huaijun Wang & Ruomeng Ke & Junhuai Li & Yang An & Kan Wang & Lei Yu, 2018. "A correlation-based binary particle swarm optimization method for feature selection in human activity recognition," International Journal of Distributed Sensor Networks, vol. 14(4), pages 15501477187, April.
    19. Huang, Yuming & Ge, Bingfeng & Hipel, Keith W. & Fang, Liping & Zhao, Bin & Yang, Kewei, 2023. "Solving the inverse graph model for conflict resolution using a hybrid metaheuristic algorithm," European Journal of Operational Research, Elsevier, vol. 305(2), pages 806-819.
    20. Toshiki Sato & Yuichi Takano & Ryuhei Miyashiro & Akiko Yoshise, 2016. "Feature subset selection for logistic regression via mixed integer optimization," Computational Optimization and Applications, Springer, vol. 64(3), pages 865-880, July.
