
Hybrid Fruit-Fly Optimization Algorithm with K-Means for Text Document Clustering

Authors

  • Timea Bezdan

    (Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11010 Belgrade, Serbia)

  • Catalin Stoean

    (Human Language Technology Research Center, University of Bucharest, 010014 Bucharest, Romania)

  • Ahmed Al Naamany

    (Department for Mathematics and Computer Science, Modern College of Business and Science, Muscat 113, Oman)

  • Nebojsa Bacanin

    (Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11010 Belgrade, Serbia)

  • Tarik A. Rashid

    (Computer Science and Engineering Department, University of Kurdistan Hewler, Erbil 44001, Iraq)

  • Miodrag Zivkovic

    (Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11010 Belgrade, Serbia)

  • K. Venkatachalam

    (Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bangalore 560029, India)

Abstract

The fast-growing Internet produces massive amounts of text data. Because this data is voluminous and unstructured, extracting and analyzing relevant information is very challenging. Text document clustering is a text-mining process that partitions a set of text documents into mutually exclusive clusters, such that documents within the same cluster are similar to each other in content, while documents in different clusters differ. One of the biggest challenges in text clustering is partitioning a collection of text data by measuring the relevance of the content in the documents. To address this issue, this work proposes a hybrid swarm intelligence algorithm for text clustering that combines the fruit-fly optimization algorithm with the K-means algorithm. First, the hybrid fruit-fly optimization algorithm is tested on ten unconstrained CEC2019 benchmark functions. Next, the proposed method is evaluated on six standard benchmark text datasets. The experimental evaluation on both the unconstrained functions and the text document datasets indicated that the proposed approach is robust and superior to the other state-of-the-art methods included in the comparison.
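The abstract does not spell out the operators of the hybrid, so the following is only a rough illustration of how a fruit-fly/K-means hybrid for text clustering can be organized, not the authors' method. It assumes a common setup for metaheuristic text clustering: documents are term-frequency vectors, a candidate solution is a set of k centroids, fitness is the total cosine distance of each document to its nearest centroid, a simplified fruit-fly-style search (flies sample random positions around the swarm location by "smell", then the swarm moves to the best fly by "vision") explores centroid sets, and K-means refines the best set found. All function names, the fitness definition and the parameters (`flies`, `generations`, `step`, `seed`) are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def assign(docs, centroids):
    """Label each document with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: cosine_distance(d, centroids[j]))
            for d in docs]


def fitness(docs, centroids):
    """Hypothetical objective: total distance to the nearest centroid (lower is better)."""
    return sum(min(cosine_distance(d, c) for c in centroids) for d in docs)


def kmeans_refine(docs, centroids, iters=10):
    """Lloyd-style K-means steps: cosine-based assignment, then mean update."""
    for _ in range(iters):
        labels = assign(docs, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(list(old))  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids


def foa_kmeans(docs, k, flies=20, generations=30, step=0.1, seed=0):
    """Simplified FOA-style search over centroid sets, refined by K-means (a sketch)."""
    rng = random.Random(seed)
    # Initial swarm location: k randomly sampled documents used as centroids.
    swarm = [list(d) for d in rng.sample(docs, k)]
    best = fitness(docs, swarm)
    for _ in range(generations):
        # Smell-based search: each fly samples a random position around the swarm.
        candidates = [[[x + rng.uniform(-step, step) for x in c] for c in swarm]
                      for _ in range(flies)]
        score, position = min(((fitness(docs, cand), cand) for cand in candidates),
                              key=lambda t: t[0])
        # Vision-based move: the swarm flies to the best fly only if it improves.
        if score < best:
            best, swarm = score, position
    # Hybrid step: polish the best centroid set with K-means.
    centroids = kmeans_refine(docs, swarm)
    return centroids, assign(docs, centroids)


if __name__ == "__main__":
    # Toy term-count vectors: the first three documents use term 0, the last three term 2.
    docs = [[3, 0, 0, 1], [4, 1, 0, 0], [2, 0, 0, 0],
            [0, 0, 3, 1], [0, 1, 4, 0], [0, 0, 2, 0]]
    centroids, labels = foa_kmeans(docs, k=2)
    print(labels)
```

In this sketch the swarm search only explores the space of centroid sets, while K-means supplies fast local convergence; the actual balance between the two components in the paper may differ.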

Suggested Citation

  • Timea Bezdan & Catalin Stoean & Ahmed Al Naamany & Nebojsa Bacanin & Tarik A. Rashid & Miodrag Zivkovic & K. Venkatachalam, 2021. "Hybrid Fruit-Fly Optimization Algorithm with K-Means for Text Document Clustering," Mathematics, MDPI, vol. 9(16), pages 1-19, August.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:16:p:1929-:d:613680

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/16/1929/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/16/1929/
    Download Restriction: no

    References listed on IDEAS

    1. Andrea Lodi & Silvano Martello & Daniele Vigo, 1999. "Heuristic and Metaheuristic Approaches for a Class of Two-Dimensional Bin Packing Problems," INFORMS Journal on Computing, INFORMS, vol. 11(4), pages 345-357, November.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Mohammed Azmi Al-Betar & Ammar Kamal Abasi & Ghazi Al-Naymat & Kamran Arshad & Sharif Naser Makhadmeh, 2023. "Optimization of scientific publications clustering with ensemble approach for topic extraction," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2819-2877, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francisco Trespalacios & Ignacio E. Grossmann, 2017. "Symmetry breaking for generalized disjunctive programming formulation of the strip packing problem," Annals of Operations Research, Springer, vol. 258(2), pages 747-759, November.
    2. Schmid, Verena & Doerner, Karl F. & Laporte, Gilbert, 2013. "Rich routing problems arising in supply chain management," European Journal of Operational Research, Elsevier, vol. 224(3), pages 435-448.
    3. Gregory S. Taylor & Yupo Chan & Ghulam Rasool, 2017. "A three-dimensional bin-packing model: exact multicriteria solution and computational complexity," Annals of Operations Research, Springer, vol. 251(1), pages 397-427, April.
    4. Bayliss, Christopher & Currie, Christine S.M. & Bennell, Julia A. & Martinez-Sykora, Antonio, 2021. "Queue-constrained packing: A vehicle ferry case study," European Journal of Operational Research, Elsevier, vol. 289(2), pages 727-741.
    5. Oscar Dominguez & Angel A. Juan & Barry Barrios & Javier Faulin & Alba Agustin, 2016. "Using biased randomization for solving the two-dimensional loading vehicle routing problem with heterogeneous fleet," Annals of Operations Research, Springer, vol. 236(2), pages 383-404, January.
    6. Bennell, J.A. & Cabo, M. & Martínez-Sykora, A., 2018. "A beam search approach to solve the convex irregular bin packing problem with guillotine guts," European Journal of Operational Research, Elsevier, vol. 270(1), pages 89-102.
    7. A Ghanmi & R H A D Shaw, 2008. "Modelling and analysis of Canadian Forces strategic lift and pre-positioning options," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 59(12), pages 1591-1602, December.
    8. Teodor Gabriel Crainic & Guido Perboli & Roberto Tadei, 2008. "Extreme Point-Based Heuristics for Three-Dimensional Bin Packing," INFORMS Journal on Computing, INFORMS, vol. 20(3), pages 368-384, August.
    9. Zhang, Zhenzhen & Wei, Lijun & Lim, Andrew, 2015. "An evolutionary local search for the capacitated vehicle routing problem minimizing fuel consumption under three-dimensional loading constraints," Transportation Research Part B: Methodological, Elsevier, vol. 82(C), pages 20-35.
    10. Iori, Manuel & de Lima, Vinícius L. & Martello, Silvano & Miyazawa, Flávio K. & Monaci, Michele, 2021. "Exact solution techniques for two-dimensional cutting and packing," European Journal of Operational Research, Elsevier, vol. 289(2), pages 399-415.
    11. Felix Prause & Kai Hoppmann-Baum & Boris Defourny & Thorsten Koch, 2021. "The maximum diversity assortment selection problem," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 93(3), pages 521-554, June.
    12. Dominguez, Oscar & Guimarans, Daniel & Juan, Angel A. & de la Nuez, Ignacio, 2016. "A Biased-Randomised Large Neighbourhood Search for the two-dimensional Vehicle Routing Problem with Backhauls," European Journal of Operational Research, Elsevier, vol. 255(2), pages 442-462.
    13. M. Muntazir Mehdi & Le Wang & Sean P. Willems, 2022. "Developing a Maximum Inscribed Rectangle Heuristic to Satisfy Rush Orders for Heavy Plate Steel," Interfaces, INFORMS, vol. 52(3), pages 283-294, May.
    14. Lodi, Andrea & Martello, Silvano & Vigo, Daniele, 2002. "Heuristic algorithms for the three-dimensional bin packing problem," European Journal of Operational Research, Elsevier, vol. 141(2), pages 410-420, September.
    15. Zachariadis, Emmanouil E. & Tarantilis, Christos D. & Kiranoudis, Christos T., 2009. "A Guided Tabu Search for the Vehicle Routing Problem with two-dimensional loading constraints," European Journal of Operational Research, Elsevier, vol. 195(3), pages 729-743, June.
    16. Henriette Koch & Maximilian Schlögell & Andreas Bortfeldt, 2020. "A hybrid algorithm for the vehicle routing problem with three-dimensional loading constraints and mixed backhauls," Journal of Scheduling, Springer, vol. 23(1), pages 71-93, February.
    17. Bortfeldt, Andreas, 2013. "A reduction approach for solving the rectangle packing area minimization problem," European Journal of Operational Research, Elsevier, vol. 224(3), pages 486-496.
    18. Henriette Koch & Andreas Bortfeldt & Gerhard Wäscher, 2018. "A hybrid algorithm for the vehicle routing problem with backhauls, time windows and three-dimensional loading constraints," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 40(4), pages 1029-1075, October.
    19. Lodi, Andrea & Martello, Silvano & Monaci, Michele, 2002. "Two-dimensional packing problems: A survey," European Journal of Operational Research, Elsevier, vol. 141(2), pages 241-252, September.
    20. Emmanouil E. Zachariadis & Christos D. Tarantilis & Chris T. Kiranoudis, 2017. "Vehicle routing strategies for pick-up and delivery service under two dimensional loading constraints," Operational Research, Springer, vol. 17(1), pages 115-143, April.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.