
Metaheuristics for data mining: survey and opportunities for big data

Authors

  • Clarisse Dhaenens (Univ. Lille, CNRS, Centrale Lille)
  • Laetitia Jourdan (Univ. Lille, CNRS, Centrale Lille)

Abstract

In the context of big data, many scientific communities aim to provide efficient approaches to accommodate large-scale datasets. This is the case for the machine learning community and, more generally, the artificial intelligence community. This article explains how data mining problems can be treated as combinatorial optimization problems and how metaheuristics can be used to address them. Four primary data mining tasks are presented: clustering, association rules, classification, and feature selection. This article follows a 2016 book on the subject (Dhaenens and Jourdan, Metaheuristics for big data, Wiley, Hoboken, 2016) and an article published in 4OR (Dhaenens and Jourdan, 4OR 17(2):115–139, 2019); it adds updated references and an analysis of current trends.
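
To make the idea of a data mining task as a combinatorial optimization problem concrete, the following is a minimal sketch and not the authors' method. It assumes clustering is modelled as assigning each point to one of k groups, that the objective is the within-cluster sum of squared distances, and that the metaheuristic is a plain hill-climbing local search. The names `wcss` and `local_search_clustering`, and the toy data, are hypothetical.

```python
import random


def wcss(points, labels, k):
    """Within-cluster sum of squared distances for one assignment (lower is better)."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return total


def local_search_clustering(points, k, iterations=2000, seed=0):
    """Hill climbing over assignments: move one point to another cluster,
    keep the move only if the objective strictly improves."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]  # random initial solution
    best = wcss(points, labels, k)
    for _ in range(iterations):
        i = rng.randrange(len(points))
        old, new = labels[i], rng.randrange(k)
        if new == old:
            continue
        labels[i] = new  # neighbour: a single reassignment
        cost = wcss(points, labels, k)
        if cost < best:
            best = cost  # accept the improving move
        else:
            labels[i] = old  # reject and restore the previous assignment
    return labels, best


if __name__ == "__main__":
    # Hypothetical toy data: two visually separated groups in the plane.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, cost = local_search_clustering(pts, k=2)
    print(labels, cost)
```

The search space here is the set of all k^n assignments, which is why exact enumeration becomes impractical as the dataset grows. Hill climbing can stall in a local optimum; metaheuristics such as simulated annealing or evolutionary algorithms differ mainly in how they accept or recombine candidate solutions to escape such points.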

Suggested Citation

  • Clarisse Dhaenens & Laetitia Jourdan, 2022. "Metaheuristics for data mining: survey and opportunities for big data," Annals of Operations Research, Springer, vol. 314(1), pages 117-140, July.
  • Handle: RePEc:spr:annopr:v:314:y:2022:i:1:d:10.1007_s10479-021-04496-0
    DOI: 10.1007/s10479-021-04496-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10479-021-04496-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Olafsson, Sigurdur & Li, Xiaonan & Wu, Shuning, 2008. "Operations research and data mining," European Journal of Operational Research, Elsevier, vol. 187(3), pages 1429-1448, June.
    2. Meisel, Stephan & Mattfeld, Dirk, 2010. "Synergies of Operations Research and Data Mining," European Journal of Operational Research, Elsevier, vol. 206(1), pages 1-10, October.
    3. Youcef Gheraibia & Abdelouahab Moussaoui & Sohag Kabir & Smaine Mazouzi, 2016. "Pe-DFA: Penguins Search Optimisation Algorithm for DNA Fragment Assembly," International Journal of Applied Metaheuristic Computing (IJAMC), IGI Global, vol. 7(2), pages 58-70, April.
    4. Corne, David & Dhaenens, Clarisse & Jourdan, Laetitia, 2012. "Synergies between operations research and data mining: The emerging use of multi-objective approaches," European Journal of Operational Research, Elsevier, vol. 221(3), pages 469-479.
    5. Clarisse Dhaenens & Laetitia Jourdan, 2019. "Metaheuristics for data mining," 4OR, Springer, vol. 17(2), pages 115-139, June.
    6. Ahmad Abubaker & Adam Baharum & Mahmoud Alrefaei, 2015. "Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-23, July.
    7. de la Iglesia, B. & Richards, G. & Philpott, M.S. & Rayward-Smith, V.J., 2006. "The application and effectiveness of a multi-objective metaheuristic algorithm for partial classification," European Journal of Operational Research, Elsevier, vol. 169(3), pages 898-917, March.

    Citations


    Cited by:

    1. Qiyi He & Jin Tu & Zhiwei Ye & Mingwei Wang & Ye Cao & Xianjing Zhou & Wanfang Bai, 2023. "Association Rule Mining through Combining Hybrid Water Wave Optimization Algorithm with Levy Flight," Mathematics, MDPI, vol. 11(5), pages 1-19, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Clarisse Dhaenens & Laetitia Jourdan, 2019. "Metaheuristics for data mining," 4OR, Springer, vol. 17(2), pages 115-139, June.
    2. Zhang, Zhiwang & Gao, Guangxia & Shi, Yong, 2014. "Credit risk evaluation using multi-criteria optimization classifier with kernel, fuzzification and penalty factors," European Journal of Operational Research, Elsevier, vol. 237(1), pages 335-348.
    3. Caballini, Claudia & Gracia, Maria D. & Mar-Ortiz, Julio & Sacone, Simona, 2020. "A combined data mining – optimization approach to manage trucks operations in container terminals with the use of a TAS: Application to an Italian and a Mexican port," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 142(C).
    4. Corne, David & Dhaenens, Clarisse & Jourdan, Laetitia, 2012. "Synergies between operations research and data mining: The emerging use of multi-objective approaches," European Journal of Operational Research, Elsevier, vol. 221(3), pages 469-479.
    5. Hauser, Matthias & Flath, Christoph M. & Thiesse, Frédéric, 2021. "Catch me if you scan: Data-driven prescriptive modeling for smart store environments," European Journal of Operational Research, Elsevier, vol. 294(3), pages 860-873.
    6. Mark Gilchrist & Deana Lehmann Mooers & Glenn Skrubbeltrang & Francine Vachon, 2012. "Knowledge Discovery in Databases for Competitive Advantage," Journal of Management and Strategy, Journal of Management and Strategy, Sciedu Press, vol. 3(2), pages 2-15, April.
    7. Raeesi, Ramin & Sahebjamnia, Navid & Mansouri, S. Afshin, 2023. "The synergistic effect of operational research and big data analytics in greening container terminal operations: A review and future directions," European Journal of Operational Research, Elsevier, vol. 310(3), pages 943-973.
    8. Daniel Gartner & Yiye Zhang & Rema Padman, 2018. "Cognitive workload reduction in hospital information systems," Health Care Management Science, Springer, vol. 21(2), pages 224-243, June.
    9. Tom Pape, 2020. "Prioritising data items for business analytics: Framework and application to human resources," Papers 2012.13813, arXiv.org.
    10. Saridakis, Charalampos & Katsikeas, Constantine S. & Angelidou, Sofia & Oikonomidou, Maria & Pratikakis, Polyvios, 2023. "Mining Twitter lists to extract brand-related associative information for celebrity endorsement," European Journal of Operational Research, Elsevier, vol. 311(1), pages 316-332.
    11. Van Nguyen, Truong & Zhang, Jie & Zhou, Li & Meng, Meng & He, Yong, 2020. "A data-driven optimization of large-scale dry port location using the hybrid approach of data mining and complex network theory," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 134(C).
    12. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.
    13. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 2018. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 20(2), pages 401-416, April.
    14. Besseris, George J., 2012. "Profiling effects in industrial data mining by non-parametric DOE methods: An application on screening checkweighing systems in packaging operations," European Journal of Operational Research, Elsevier, vol. 220(1), pages 147-161.
    15. Matteo Fischetti & Ivana Ljubić & Markus Sinnl, 2017. "Redesigning Benders Decomposition for Large-Scale Facility Location," Management Science, INFORMS, vol. 63(7), pages 2146-2162, July.
    16. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 0. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 0, pages 1-16.
    17. Carrizosa, Emilio & Guerrero, Vanesa & Romero Morales, Dolores, 2018. "On Mathematical Optimization for the visualization of frequencies and adjacencies as rectangular maps," European Journal of Operational Research, Elsevier, vol. 265(1), pages 290-302.
    18. Yves Crama & Michel Grabisch & Silvano Martello, 2022. "Preface," Annals of Operations Research, Springer, vol. 314(1), pages 1-3, July.
    19. Davidson, Ian & Tayi, Giri, 2009. "Data preparation using data quality matrices for classification mining," European Journal of Operational Research, Elsevier, vol. 197(2), pages 764-772, September.
    20. Heydari Majeed & Yousefli Amir, 2017. "A new optimization model for market basket analysis with allocation considerations: A genetic algorithm solution approach," Management & Marketing, Sciendo, vol. 12(1), pages 1-11, March.
