IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0201874.html

Balancing effort and benefit of K-means clustering algorithms in Big Data realms

Authors

  • Joaquín Pérez-Ortega
  • Nelva Nely Almanza-Ortega
  • David Romero

Abstract

In this paper we propose a criterion for balancing the processing time and the solution quality of k-means clustering algorithms when they are applied to instances in which the number n of objects is large. Most known strategies for improving the performance of k-means algorithms target the initialization or classification steps. In contrast, our criterion applies to the convergence step: the process stops as soon as the number of objects that change their assigned cluster in an iteration falls below a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n involves a decrease in computing time of about a factor of 4/100, while yielding solutions whose quality decreases by less than two percent. These findings suggest that our criterion is useful in Big Data realms.
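
The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `kmeans_early_stop`, its parameters, the random initialization, and the use of squared Euclidean distance in a plain Lloyd-style iteration are all assumptions made for illustration. Only the idea of stopping once fewer than a threshold number of objects (for example 0.03n) change cluster in an iteration comes from the abstract.

```python
import random


def kmeans_early_stop(points, k, threshold_fraction=0.03, max_iter=100, seed=0):
    """Lloyd-style k-means with an early-stopping convergence test.

    Hypothetical helper: stops when the number of points that changed
    cluster in an iteration is below threshold_fraction * n, or when no
    point changed at all (the classic convergence test).
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])
    # Assumed initialization: k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * n
    threshold = threshold_fraction * n
    iteration = 0

    for iteration in range(1, max_iter + 1):
        # Classification step: assign each point to its nearest centroid
        # and count how many points changed cluster.
        changed = 0
        for i, p in enumerate(points):
            best = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            if best != labels[i]:
                labels[i] = best
                changed += 1

        # Update step: recompute each centroid as the mean of its points.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, lab in zip(points, labels):
            counts[lab] += 1
            for d in range(dim):
                sums[lab][d] += p[d]
        for j in range(k):
            if counts[j]:  # keep the old centroid if a cluster is empty
                centroids[j] = [s / counts[j] for s in sums[j]]

        # Convergence step: stop early once few enough points moved.
        if changed == 0 or changed < threshold:
            break

    return centroids, labels, iteration


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    data += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(500)]
    _, _, early = kmeans_early_stop(data, k=2, threshold_fraction=0.03)
    _, _, full = kmeans_early_stop(data, k=2, threshold_fraction=0.0)
    print("iterations with threshold 0.03n:", early)
    print("iterations with classic stopping:", full)
```

Because both runs start from the same seed and follow the same trajectory, the thresholded run can never take more iterations than the classic one. How much time is saved, and how much quality is lost, depends on the data and is not something this sketch measures.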

Suggested Citation

  • Joaquín Pérez-Ortega & Nelva Nely Almanza-Ortega & David Romero, 2018. "Balancing effort and benefit of K-means clustering algorithms in Big Data realms," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-19, September.
  • Handle: RePEc:plo:pone00:0201874
    DOI: 10.1371/journal.pone.0201874

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0201874
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0201874&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Yordan P Raykov & Alexis Boukouvalas & Fahd Baig & Max A Little, 2016. "What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-28, September.
    2. Konak, Abdullah & Coit, David W. & Smith, Alice E., 2006. "Multi-objective optimization using genetic algorithms: A tutorial," Reliability Engineering and System Safety, Elsevier, vol. 91(9), pages 992-1007.

    Citations

    Cited by:

    1. Iram Parvez & Jianjian Shen & Ishitaq Hassan & Nannan Zhang, 2021. "Generation of Hydro Energy by Using Data Mining Algorithm for Cascaded Hydropower Plant," Energies, MDPI, vol. 14(2), pages 1-28, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gupta, Pankaj & Mittal, Garima & Mehlawat, Mukesh Kumar, 2013. "Expected value multiobjective portfolio rebalancing model with fuzzy parameters," Insurance: Mathematics and Economics, Elsevier, vol. 52(2), pages 190-203.
    2. Weifan Zhong & Lijing Du, 2023. "Predicting Traffic Casualties Using Support Vector Machines with Heuristic Algorithms: A Study Based on Collision Data of Urban Roads," Sustainability, MDPI, vol. 15(4), pages 1-18, February.
    3. Cai, Yuhao & Qian, Xin & Su, Ruihang & Jia, Xiongjie & Ying, Jinhui & Zhao, Tianshou & Jiang, Haoran, 2024. "Thermo-electrochemical modeling of thermally regenerative flow batteries," Applied Energy, Elsevier, vol. 355(C).
    4. Ahmadi, Mohammad H. & Amin Nabakhteh, Mohammad & Ahmadi, Mohammad-Ali & Pourfayaz, Fathollah & Bidi, Mokhtar, 2017. "Investigation and optimization of performance of nano-scale Stirling refrigerator using working fluid as Maxwell–Boltzmann gases," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 483(C), pages 337-350.
    5. Hausken, Kjell & Levitin, Gregory, 2009. "Minmax defense strategy for complex multi-state systems," Reliability Engineering and System Safety, Elsevier, vol. 94(2), pages 577-587.
    6. Akhlaque Ahmad Khan & Ahmad Faiz Minai & Rupendra Kumar Pachauri & Hasmat Malik, 2022. "Optimal Sizing, Control, and Management Strategies for Hybrid Renewable Energy Systems: A Comprehensive Review," Energies, MDPI, vol. 15(17), pages 1-29, August.
    7. Janssens, Jochen & Van den Bergh, Joos & Sörensen, Kenneth & Cattrysse, Dirk, 2015. "Multi-objective microzone-based vehicle routing for courier companies: From tactical to operational planning," European Journal of Operational Research, Elsevier, vol. 242(1), pages 222-231.
    8. H. Liao & Q. Wu, 2013. "Multi-objective optimization by learning automata," Journal of Global Optimization, Springer, vol. 55(2), pages 459-487, February.
    9. Ahmadi, Mohammad H. & Ahmadi, Mohammad-Ali & Maleki, Akbar & Pourfayaz, Fathollah & Bidi, Mokhtar & Açıkkalp, Emin, 2017. "Exergetic sustainability evaluation and multi-objective optimization of performance of an irreversible nanoscale Stirling refrigeration cycle operating with Maxwell–Boltzmann gas," Renewable and Sustainable Energy Reviews, Elsevier, vol. 78(C), pages 80-92.
    10. Abokersh, Mohamed Hany & Vallès, Manel & Cabeza, Luisa F. & Boer, Dieter, 2020. "A framework for the optimal integration of solar assisted district heating in different urban sized communities: A robust machine learning approach incorporating global sensitivity analysis," Applied Energy, Elsevier, vol. 267(C).
    11. Nizami, M.S.H. & Hossain, M.J. & Amin, B.M. Ruhul & Fernandez, Edstan, 2020. "A residential energy management system with bi-level optimization-based bidding strategy for day-ahead bi-directional electricity trading," Applied Energy, Elsevier, vol. 261(C).
    12. Briš, Radim & Byczanski, Petr & Goňo, Radomír & Rusek, Stanislav, 2017. "Discrete maintenance optimization of complex multi-component systems," Reliability Engineering and System Safety, Elsevier, vol. 168(C), pages 80-89.
    13. Schmidt, Adam & Albert, Laura A. & Zheng, Kaiyue, 2021. "Risk management for cyber-infrastructure protection: A bi-objective integer programming approach," Reliability Engineering and System Safety, Elsevier, vol. 205(C).
    14. Zio, E. & Pedroni, N., 2010. "An optimized Line Sampling method for the estimation of the failure probability of nuclear passive systems," Reliability Engineering and System Safety, Elsevier, vol. 95(12), pages 1300-1313.
    15. Juan Carlos Bravo-Rodríguez & Juan Carlos del-Pino-López & Pedro Cruz-Romero, 2019. "A Survey on Optimization Techniques Applied to Magnetic Field Mitigation in Power Systems," Energies, MDPI, vol. 12(7), pages 1-20, April.
    16. Astriani, Yuli & Shafiullah, GM & Shahnia, Farhad, 2021. "Incentive determination of a demand response program for microgrids," Applied Energy, Elsevier, vol. 292(C).
    17. Rezghi, Ali & Riasi, Alireza & Tazraei, Pedram, 2020. "Multi-objective optimization of hydraulic transient condition in a pump-turbine hydropower considering the wicket-gates closing law and the surge tank position," Renewable Energy, Elsevier, vol. 148(C), pages 478-491.
    18. Zeel Maheshwari & Rama Ramakumar, 2017. "Smart Integrated Renewable Energy Systems (SIRES): A Novel Approach for Sustainable Development," Energies, MDPI, vol. 10(8), pages 1-22, August.
    19. S. Mohammad S. Mahmoudi & Sina Salehi & Mortaza Yari & Marc A. Rosen, 2017. "Exergoeconomic Performance Comparison and Optimization of Single-Stage Absorption Heat Transformers," Energies, MDPI, vol. 10(4), pages 1-28, April.
    20. Liu Pai & Tomonobu Senjyu, 2022. "A Yearly Based Multiobjective Park-and-Ride Control Approach Simulation Using Photovoltaic and Battery Energy Storage Systems: Fuxin, China Case Study," Sustainability, MDPI, vol. 14(14), pages 1-19, July.
