
Robustification of the k-means clustering problem and tailored decomposition methods: when more conservative means more accurate

Author

Listed:
  • Jan Pablo Burgard

    (Trier University)

  • Carina Moreira Costa

    (Trier University)

  • Martin Schmidt

    (Trier University)

Abstract

k-means clustering is a classic method of unsupervised learning with the aim of partitioning a given number of measurements into k clusters. In many modern applications, however, this approach suffers from unstructured measurement errors because the k-means clustering result then represents a clustering of the erroneous measurements instead of retrieving the true underlying clustering structure. We resolve this issue by applying techniques from robust optimization to hedge the clustering result against unstructured errors in the observed data. To this end, we derive the strictly and Γ-robust counterparts of the k-means clustering problem. Since the nominal problem is already NP-hard, global approaches are often not feasible in practice. As a remedy, we develop tailored alternating direction methods by decomposing the search space of the nominal as well as of the robustified problems to quickly obtain feasible points of good quality. Our numerical results reveal an interesting feature: the less conservative Γ-approach is clearly outperformed by the strictly robust clustering method. In particular, the strictly robustified clustering method is able to recover clusterings of the original data even if only erroneous measurements are observed.
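To illustrate what decomposing the search space of the nominal problem can look like, here is a minimal Python sketch of the textbook alternating scheme for k-means (often called Lloyd's algorithm). It alternates between the assignment variables and the centroid variables, minimizing the sum of squared distances over one block while the other is fixed. This is not the authors' tailored method and it does not cover the robust counterparts. The function name `alternating_kmeans`, its parameters, and the random initialization are assumptions made for this illustration.

```python
import numpy as np


def alternating_kmeans(X, k, max_iter=100, seed=0):
    """Textbook alternating scheme for nominal k-means (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Assumption of this sketch: start from k distinct data points as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Block 1: centroids fixed, assign each point to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments no longer change, so neither block can improve
        labels = new_labels
        # Block 2: assignments fixed, move each centroid to its cluster mean.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    # Nominal k-means objective: sum of squared distances to assigned centroids.
    objective = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, objective
```

Each block update cannot increase the objective, so the loop stops at a point that neither block can improve on its own. Such a point is generally not a global minimizer, which is consistent with the abstract's description of quickly obtaining feasible points of good quality rather than solving the NP-hard problem to global optimality.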

Suggested Citation

  • Jan Pablo Burgard & Carina Moreira Costa & Martin Schmidt, 2024. "Robustification of the k-means clustering problem and tailored decomposition methods: when more conservative means more accurate," Annals of Operations Research, Springer, vol. 339(3), pages 1525-1568, August.
  • Handle: RePEc:spr:annopr:v:339:y:2024:i:3:d:10.1007_s10479-022-04818-w
    DOI: 10.1007/s10479-022-04818-w

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10479-022-04818-w
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Thomas Kleinert & Martin Schmidt, 2021. "Computing Feasible Points of Bilevel Problems with a Penalty Alternating Direction Method," INFORMS Journal on Computing, INFORMS, vol. 33(1), pages 198-215, January.
    2. Carina Moreira Costa & Dennis Kreber & Martin Schmidt, 2022. "An Alternating Method for Cardinality-Constrained Optimization: A Computational Study for the Best Subset Selection and Sparse Portfolio Problems," INFORMS Journal on Computing, INFORMS, vol. 34(6), pages 2968-2988, November.
    3. Wenqing Chen & Melvyn Sim & Jie Sun & Chung-Piaw Teo, 2010. "From CVaR to Uncertainty Set: Implications in Joint Chance-Constrained Optimization," Operations Research, INFORMS, vol. 58(2), pages 470-485, April.
    4. Stefan Mišković, 2017. "A VNS-LP algorithm for the robust dynamic maximal covering location problem," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 39(4), pages 1011-1033, October.
    5. Jeong, Jaehee & Premsankar, Gopika & Ghaddar, Bissan & Tarkoma, Sasu, 2024. "A robust optimization approach for placement of applications in edge computing considering latency uncertainty," Omega, Elsevier, vol. 126(C).
    6. Beck, Yasmine & Ljubić, Ivana & Schmidt, Martin, 2023. "A survey on bilevel optimization under uncertainty," European Journal of Operational Research, Elsevier, vol. 311(2), pages 401-426.
    7. Antonio G. Martín & Manuel Díaz-Madroñero & Josefa Mula, 2020. "Master production schedule using robust optimization approaches in an automobile second-tier supplier," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 28(1), pages 143-166, March.
    8. Roberto Gomes de Mattos & Fabricio Oliveira & Adriana Leiras & Abdon Baptista de Paula Filho & Paulo Gonçalves, 2019. "Robust optimization of the insecticide-treated bed nets procurement and distribution planning under uncertainty for malaria prevention and control," Annals of Operations Research, Springer, vol. 283(1), pages 1045-1078, December.
    9. F. Davarian & J. Behnamian, 2022. "Robust finite-horizon scheduling/rescheduling of operating rooms with elective and emergency surgeries under resource constraints," Journal of Scheduling, Springer, vol. 25(6), pages 625-641, December.
    10. Golovkine, Steven & Klutchnikoff, Nicolas & Patilea, Valentin, 2022. "Clustering multivariate functional data using unsupervised binary trees," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    11. Antonio J. Conejo & Nicholas G. Hall & Daniel Zhuoyu Long & Runhao Zhang, 2021. "Robust Capacity Planning for Project Management," INFORMS Journal on Computing, INFORMS, vol. 33(4), pages 1533-1550, October.
    12. Wang, Fan & Zhang, Chao & Zhang, Hui & Xu, Liang, 2021. "Short-term physician rescheduling model with feature-driven demand for mental disorders outpatients," Omega, Elsevier, vol. 105(C).
    13. Zhi Chen & Melvyn Sim & Peng Xiong, 2020. "Robust Stochastic Optimization Made Easy with RSOME," Management Science, INFORMS, vol. 66(8), pages 3329-3339, August.
    14. Zhang, Yan & Fu, Lijun & Zhu, Wanlu & Bao, Xianqiang & Liu, Cang, 2018. "Robust model predictive control for optimal energy management of island microgrids with uncertainties," Energy, Elsevier, vol. 164(C), pages 1229-1241.
    15. Sahar Moazzeni & Sobhan Mostafayi Darmian & Lars Magnus Hvattum, 2023. "Multiple criteria decision making and robust optimization to design a development plan for small and medium-sized enterprises in the east of Iran," Operational Research, Springer, vol. 23(1), pages 1-32, March.
    16. Detienne, Boris & Lefebvre, Henri & Malaguti, Enrico & Monaci, Michele, 2024. "Adjustable robust optimization with objective uncertainty," European Journal of Operational Research, Elsevier, vol. 312(1), pages 373-384.
    17. Ramezanian, Reza & Behboodi, Zahra, 2017. "Blood supply chain network design under uncertainties in supply and demand considering social aspects," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 104(C), pages 69-82.
    18. Dimitris Bertsimas & Xuan Vinh Doan & Karthik Natarajan & Chung-Piaw Teo, 2010. "Models for Minimax Stochastic Linear Optimization Problems with Risk Aversion," Mathematics of Operations Research, INFORMS, vol. 35(3), pages 580-602, August.
    19. Zhao, Yue & Chen, Zhi & Lim, Andrew & Zhang, Zhenzhen, 2022. "Vessel deployment with limited information: Distributionally robust chance constrained models," Transportation Research Part B: Methodological, Elsevier, vol. 161(C), pages 197-217.
    20. Klamroth, Kathrin & Köbis, Elisabeth & Schöbel, Anita & Tammer, Christiane, 2017. "A unified approach to uncertain optimization," European Journal of Operational Research, Elsevier, vol. 260(2), pages 403-420.
