IDEAS home Printed from https://ideas.repec.org/a/eee/ejores/v263y2017i2p367-379.html
   My bibliography  Save this article

New diagonal bundle method for clustering problems in large data sets

Author

Listed:
  • Karmitsa, Napsu
  • Bagirov, Adil M.
  • Taheri, Sona

Abstract

Clustering is one of the most important tasks in data mining. Recent developments in computer hardware allow us to store in random access memory (RAM) and repeatedly read data sets with hundreds of thousands and even millions of data points. This makes it possible to use conventional clustering algorithms in such data sets. However, these algorithms may need prohibitively large computational time and fail to produce accurate solutions. Therefore, it is important to develop clustering algorithms which are accurate and can provide real time clustering in large data sets. This paper introduces one of them. Using nonsmooth optimization formulation of the clustering problem the objective function is represented as a difference of two convex (DC) functions. Then a new diagonal bundle algorithm that explicitly uses this structure is designed and combined with an incremental approach to solve this problem. The method is evaluated using real world data sets with both large number of attributes and large number of data points. The proposed method is compared with two other clustering algorithms using numerical results.

Suggested Citation

  • Karmitsa, Napsu & Bagirov, Adil M. & Taheri, Sona, 2017. "New diagonal bundle method for clustering problems in large data sets," European Journal of Operational Research, Elsevier, vol. 263(2), pages 367-379.
  • Handle: RePEc:eee:ejores:v:263:y:2017:i:2:p:367-379
    DOI: 10.1016/j.ejor.2017.06.010
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221717305283
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ejor.2017.06.010?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. A. Bagirov & B. Ordin & G. Ozturk & A. Xavier, 2015. "An incremental clustering algorithm based on hyperbolic smoothing," Computational Optimization and Applications, Springer, vol. 61(1), pages 219-241, May.
    2. Santi, Éverton & Aloise, Daniel & Blanchard, Simon J., 2016. "A model for clustering data from heterogeneous dissimilarities," European Journal of Operational Research, Elsevier, vol. 253(3), pages 659-672.
    3. Rabello, Rômulo Louzada & Mauri, Geraldo Regis & Ribeiro, Glaydston Mattos & Lorena, Luiz Antonio Nogueira, 2014. "A Clustering Search metaheuristic for the Point-Feature Cartographic Label Placement Problem," European Journal of Operational Research, Elsevier, vol. 234(3), pages 802-808.
    4. Bagirov, Adil M. & Yearwood, John, 2006. "A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems," European Journal of Operational Research, Elsevier, vol. 170(2), pages 578-596, April.
    5. Carrizosa, Emilio & Mladenović, Nenad & Todosijević, Raca, 2013. "Variable neighborhood search for minimum sum-of-squares clustering on networks," European Journal of Operational Research, Elsevier, vol. 230(2), pages 356-363.
    6. A. Bagirov & A. Rubinov & N. Soukhoroukova & J. Yearwood, 2003. "Unsupervised and supervised data classification via nonsmooth and global optimization," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 11(1), pages 1-75, June.
    7. Adil Bagirov & Napsu Karmitsa & Marko M. Mäkelä, 2014. "Introduction to Nonsmooth Optimization," Springer Books, Springer, edition 127, number 978-3-319-08114-4, June.
    8. Napsu Karmitsa, 2015. "Diagonal Bundle Method for Nonsmooth Sparse Optimization," Journal of Optimization Theory and Applications, Springer, vol. 166(3), pages 889-905, September.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Veronica Piccialli & Antonio M. Sudoso & Angelika Wiegele, 2022. "SOS-SDP: An Exact Solver for Minimum Sum-of-Squares Clustering," INFORMS Journal on Computing, INFORMS, vol. 34(4), pages 2144-2162, July.
    2. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.
    2. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2020. "Essentials of numerical nonsmooth optimization," 4OR, Springer, vol. 18(1), pages 1-47, March.
    3. Chen, Yi-Ting & Sun, Edward W. & Lin, Yi-Bing, 2020. "Merging anomalous data usage in wireless mobile telecommunications: Business analytics with a strategy-focused data-driven approach for sustainability," European Journal of Operational Research, Elsevier, vol. 281(3), pages 687-705.
    4. Kaisa Joki & Adil M. Bagirov & Napsu Karmitsa & Marko M. Mäkelä, 2017. "A proximal bundle method for nonsmooth DC optimization utilizing nonconvex cutting planes," Journal of Global Optimization, Springer, vol. 68(3), pages 501-535, July.
    5. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2022. "Essentials of numerical nonsmooth optimization," Annals of Operations Research, Springer, vol. 314(1), pages 213-253, July.
    6. Adil Bagirov & Asef Ganjehlou, 2008. "An approximate subgradient algorithm for unconstrained nonsmooth, nonconvex optimization," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 67(2), pages 187-206, April.
    7. A. M. Bagirov & B. Karasözen & M. Sezer, 2008. "Discrete Gradient Method: Derivative-Free Method for Nonsmooth Optimization," Journal of Optimization Theory and Applications, Springer, vol. 137(2), pages 317-334, May.
    8. A. Bagirov & B. Ordin & G. Ozturk & A. Xavier, 2015. "An incremental clustering algorithm based on hyperbolic smoothing," Computational Optimization and Applications, Springer, vol. 61(1), pages 219-241, May.
    9. Outi Montonen & Kaisa Joki, 2018. "Bundle-based descent method for nonsmooth multiobjective DC optimization with inequality constraints," Journal of Global Optimization, Springer, vol. 72(3), pages 403-429, November.
    10. Zhu, Shan & Hu, Xiangpei & Huang, Kai & Yuan, Yufei, 2021. "Optimization of product category allocation in multiple warehouses to minimize splitting of online supermarket customer orders," European Journal of Operational Research, Elsevier, vol. 290(2), pages 556-571.
    11. Gaudioso, Manlio & Giallombardo, Giovanni & Mukhametzhanov, Marat, 2018. "Numerical infinitesimals in a variable metric method for convex nonsmooth optimization," Applied Mathematics and Computation, Elsevier, vol. 318(C), pages 312-320.
    12. Bagirov, Adil M. & Ugon, Julien & Mirzayeva, Hijran, 2013. "Nonsmooth nonconvex optimization approach to clusterwise linear regression problems," European Journal of Operational Research, Elsevier, vol. 229(1), pages 132-142.
    13. Rubinov, A.M. & Soukhorokova, N.V. & Ugon, J., 2006. "Classes and clusters in data analysis," European Journal of Operational Research, Elsevier, vol. 173(3), pages 849-865, September.
    14. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2018. "Minimizing Piecewise-Concave Functions Over Polyhedra," Mathematics of Operations Research, INFORMS, vol. 43(2), pages 580-597, May.
    15. Konstantin Sonntag & Bennet Gebken & Georg Müller & Sebastian Peitz & Stefan Volkwein, 2024. "A Descent Method for Nonsmooth Multiobjective Optimization in Hilbert Spaces," Journal of Optimization Theory and Applications, Springer, vol. 203(1), pages 455-487, October.
    16. Napsu Karmitsa, 2016. "Testing Different Nonsmooth Formulations of the Lennard–Jones Potential in Atomic Clustering Problems," Journal of Optimization Theory and Applications, Springer, vol. 171(1), pages 316-335, October.
    17. Radek Hrebik & Jaromir Kukal & Josef Jablonsky, 2019. "Optimal unions of hidden classes," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 27(1), pages 161-177, March.
    18. Felipe Serrano & Robert Schwarz & Ambros Gleixner, 2020. "On the relation between the extended supporting hyperplane algorithm and Kelley’s cutting plane algorithm," Journal of Global Optimization, Springer, vol. 78(1), pages 161-179, September.
    19. Aktaş, Dilay & Lokman, Banu & İnkaya, Tülin & Dejaegere, Gilles, 2024. "Cluster ensemble selection and consensus clustering: A multi-objective optimization approach," European Journal of Operational Research, Elsevier, vol. 314(3), pages 1065-1077.
    20. Welington Oliveira, 2019. "Proximal bundle methods for nonsmooth DC programming," Journal of Global Optimization, Springer, vol. 75(2), pages 523-563, October.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:ejores:v:263:y:2017:i:2:p:367-379. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/eor .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.