Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i13p1987-d1423691.html

A New Alternating Suboptimal Dynamic Programming Algorithm with Applications for Feature Selection

Author

Listed:
  • David Podgorelec

    (Faculty of Electrical Engineering and Computer Science, University of Maribor, Koroška Cesta 46, SI-2000 Maribor, Slovenia)

  • Borut Žalik

    (Faculty of Electrical Engineering and Computer Science, University of Maribor, Koroška Cesta 46, SI-2000 Maribor, Slovenia)

  • Domen Mongus

    (Faculty of Electrical Engineering and Computer Science, University of Maribor, Koroška Cesta 46, SI-2000 Maribor, Slovenia)

  • Dino Vlahek

    (Faculty of Electrical Engineering and Computer Science, University of Maribor, Koroška Cesta 46, SI-2000 Maribor, Slovenia)

Abstract

Feature selection is predominantly used in machine learning tasks such as classification, regression, and clustering. It selects, from a larger set, a subset of features (relevant attributes of data points) that contributes as much as possible to the informativeness of the model. Because the number of subsets of a given set grows exponentially, exhaustive search is practical only for problems with at most a few dozen features. There have been earlier attempts to reduce the search space using dynamic programming. However, models that consider similarity between pairs of features alongside the quality of individual features do not provide the required optimal substructure. As a result, such algorithms, which we call suboptimal dynamic programming algorithms, find a solution that may deviate significantly from the optimal one. In this paper, we propose an iterative dynamic programming algorithm that inverts the order of feature processing in each iteration. This alternating approach improves the optimization function by using the score from the previous iteration to estimate the contribution of unprocessed features. The iterative process is proven to converge, and it terminates when the solution does not change in three successive iterations or when the number of iterations reaches the threshold. In more than 95% of tests, the results match those of the exhaustive search approach, and they are competitive with, and often superior to, those of the reference greedy approach. Validation was carried out by comparing the scores of the output feature subsets and by examining the accuracy of different classifiers trained on these features across nine real-world applications, covering scenarios with various numbers of features and samples. In the context of feature selection, the proposed algorithm can be characterized as a robust filter method that can improve machine learning models regardless of dataset size. However, we expect that the idea of alternating suboptimal optimization will soon be generalized to tasks beyond feature selection.
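
The alternating scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the optimization function, so the subset score used here (sum of individual relevances minus a penalty on pairwise similarities), the look-ahead term that stands in for "the contribution of unprocessed features", and all names (`subset_score`, `dp_pass`, `alternating_select`, `penalty`, `max_iter`) are assumptions made for illustration. Only the overall structure follows the abstract: a dynamic programming pass that keeps one candidate per subset size, a reversed feature order in each iteration, and termination after three identical successive solutions or an iteration limit.

```python
from itertools import combinations


def subset_score(subset, relevance, similarity, penalty=1.0):
    """Assumed score: individual relevances minus penalised pairwise similarities."""
    total = sum(relevance[i] for i in subset)
    pairs = combinations(sorted(subset), 2)
    return total - penalty * sum(similarity[i][j] for i, j in pairs)


def dp_pass(order, k, relevance, similarity, previous, penalty=1.0):
    """One suboptimal DP pass: keep a single candidate subset per subset size."""
    best = [frozenset()] + [None] * k
    for pos, feature in enumerate(order):
        # Features of the previous solution that this pass has not reached yet.
        pending = previous.intersection(order[pos + 1:])

        def key(subset):
            # Hypothetical look-ahead: penalise similarity to pending features.
            look_ahead = sum(similarity[i][u] for i in subset for u in pending)
            return subset_score(subset, relevance, similarity, penalty) - penalty * look_ahead

        # Descending sizes so that each feature is added at most once.
        for size in range(min(k, pos + 1), 0, -1):
            candidate = best[size - 1] | {feature}
            if best[size] is None or key(candidate) > key(best[size]):
                best[size] = candidate
    return best[k]


def alternating_select(relevance, similarity, k, max_iter=20, penalty=1.0):
    """Repeat DP passes with reversed order; requires 1 <= k <= len(relevance)."""
    order = list(range(len(relevance)))
    previous, overall, unchanged = frozenset(), None, 0
    for _ in range(max_iter):
        current = dp_pass(order, k, relevance, similarity, previous, penalty)
        if overall is None or (
            subset_score(current, relevance, similarity, penalty)
            > subset_score(overall, relevance, similarity, penalty)
        ):
            overall = current  # remember the best subset by its true score
        unchanged = unchanged + 1 if current == previous else 0
        if unchanged >= 2:  # identical subset in three successive iterations
            break
        previous = current
        order.reverse()  # invert the processing order for the next iteration
    return sorted(overall)


# Toy data (made up for this example): five features, symmetric similarities.
relevance = [0.9, 0.85, 0.6, 0.5, 0.3]
similarity = [
    [0.0, 0.9, 0.2, 0.1, 0.3],
    [0.9, 0.0, 0.3, 0.2, 0.1],
    [0.2, 0.3, 0.0, 0.4, 0.2],
    [0.1, 0.2, 0.4, 0.0, 0.5],
    [0.3, 0.1, 0.2, 0.5, 0.0],
]
selected = alternating_select(relevance, similarity, k=2)
print(selected)
```

Because each pass stores only one subset per size, a single pass can miss the optimum; the reversed order and the look-ahead term give later passes a chance to correct choices that an earlier pass made without knowing which features would follow. For small inputs such as this one, the result can be checked against an exhaustive search over all subsets of size `k`.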

Suggested Citation

  • David Podgorelec & Borut Žalik & Domen Mongus & Dino Vlahek, 2024. "A New Alternating Suboptimal Dynamic Programming Algorithm with Applications for Feature Selection," Mathematics, MDPI, vol. 12(13), pages 1-22, June.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:13:p:1987-:d:1423691

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/13/1987/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/13/1987/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. De Munck, Thomas & Chevalier, Philippe & Tancrez, Jean-Sébastien, 2023. "Managing priorities on on-demand service platforms with waiting time differentiation," International Journal of Production Economics, Elsevier, vol. 266(C).
    2. Satır, Benhür & Erenay, Fatih Safa & Bookbinder, James H., 2018. "Shipment consolidation with two demand classes: Rationing the dispatch capacity," European Journal of Operational Research, Elsevier, vol. 270(1), pages 171-184.
    3. Kalbfuss, Jörg & Odermatt, Reto & Stutzer, Alois, 2024. "Medical marijuana laws and mental health in the United States," Health Economics, Policy and Law, Cambridge University Press, vol. 19(3), pages 307-322, July.
    4. Qingrong Tan & Yan Cai & Fen Luo & Dongbo Tu, 2023. "Development of a High-Accuracy and Effective Online Calibration Method in CD-CAT Based on Gini Index," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 103-141, February.
    5. Daniel R. Jiang & Warren B. Powell, 2015. "An Approximate Dynamic Programming Algorithm for Monotone Value Functions," Operations Research, INFORMS, vol. 63(6), pages 1489-1511, December.
    6. Hapfelmeier, A. & Ulm, K., 2013. "A new variable selection approach using Random Forests," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 50-69.
    7. Burim Ramosaj & Markus Pauly, 2019. "Predicting missing values: a comparative study on non-parametric approaches for imputation," Computational Statistics, Springer, vol. 34(4), pages 1741-1764, December.
    8. Limon Barua & Bo Zou & Yan Zhou & Yulin Liu, 2023. "Modeling household online shopping demand in the U.S.: a machine learning approach and comparative investigation between 2009 and 2017," Transportation, Springer, vol. 50(2), pages 437-476, April.
    9. Yoshiaki Inoue, 2022. "A load-balancing problem for distributed bulk-service queues with size-dependent batch processing times," Queueing Systems: Theory and Applications, Springer, vol. 100(3), pages 449-451, April.
    10. Rachel A. Oldroyd & Michelle A. Morris & Mark Birkin, 2021. "Predicting Food Safety Compliance for Informed Food Outlet Inspections: A Machine Learning Approach," IJERPH, MDPI, vol. 18(23), pages 1-20, November.
    11. Dall'Orto, Leonardo Campo & Crainic, Teodor Gabriel & Leal, Jose Eugenio & Powell, Warren B., 2006. "The single-node dynamic service scheduling and dispatching problem," European Journal of Operational Research, Elsevier, vol. 170(1), pages 1-23, April.
    12. Enrico Biffis & Erik Chavez & Alexis Louaas & Pierre Picard, 2022. "Parametric insurance and technology adoption in developing countries," The Geneva Risk and Insurance Review, Palgrave Macmillan;International Association for the Study of Insurance Economics (The Geneva Association), vol. 47(1), pages 7-44, March.
    13. Paola Zuccolotto, 2010. "Evaluating the impact of a grouping variable on Job Satisfaction drivers," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 19(2), pages 287-305, June.
    14. Gerhard Tutz & Moritz Berger, 2016. "Item-focussed Trees for the Identification of Items in Differential Item Functioning," Psychometrika, Springer;The Psychometric Society, vol. 81(3), pages 727-750, September.
    15. Montes, Ignacio & Miranda, Enrique & Montes, Susana, 2014. "Stochastic dominance with imprecise information," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 868-886.
    16. Shu-Fu Kuo & Yu-Shan Shih, 2012. "Variable selection for functional density trees," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(7), pages 1387-1395, December.
    17. Daniel L. Chen & Markus Loecher, 2022. "Mood and the Malleability of Moral Reasoning: The Impact of Irrelevant Factors on Judicial Decisions," Working Papers hal-03864854, HAL.
    18. Xiaomu Ye & Pengfei Ding & Dawei Jin & Chuanyue Zhou & Yi Li & Jin Zhang, 2023. "Intelligent Analysis of Construction Costs of Shield Tunneling in Complex Geological Conditions by Machine Learning Method," Mathematics, MDPI, vol. 11(6), pages 1-22, March.
    19. Wei-Yin Loh, 2014. "Fifty Years of Classification and Regression Trees," International Statistical Review, International Statistical Institute, vol. 82(3), pages 329-348, December.
    20. Achim Zeileis & Torsten Hothorn, 2013. "A toolbox of permutation tests for structural change," Statistical Papers, Springer, vol. 54(4), pages 931-954, November.
