Printed from https://ideas.repec.org/a/inm/oropre/v67y2019i1p215-231.html

Good Clusterings Have Large Volume

Author

Listed:
  • Steffen Borgwardt

    (Department of Mathematical and Statistical Sciences, University of Colorado, Denver, Colorado 80204)

  • Felix Happach

    (Department of Mathematics and TUM School of Management, Technische Universität München, 80333 München, Germany)

Abstract

The clustering of a data set is one of the core tasks in data analytics. Many clustering algorithms exhibit a strong contrast between a favorable performance in practice and bad theoretical worst cases. Prime examples are least-squares assignments and the popular k-means algorithm. We are interested in this contrast and study it through polyhedral theory. Several popular clustering algorithms can be connected to finding a vertex of the so-called bounded-shape partition polytopes. The vertices correspond to clusterings with extraordinary separation properties, in particular allowing the construction of a separating power diagram, defined by its so-called sites, such that each cluster has its own cell. First, we quantitatively measure the space of all sites that allow construction of a separating power diagram for a clustering by the volume of the normal cone at the corresponding vertex. This gives rise to a new quality criterion for clusterings, and explains why good clusterings are also the most likely to be found by some classical algorithms. Second, we characterize the edges of the bounded-shape partition polytopes. Through this, we obtain an explicit description of the normal cones. This allows us to compute measures with respect to the new quality criterion, and even compute “most stable” sites, and thereby “most stable” power diagrams, for the separation of clusters. The hardness of these computations depends on the number of edges incident to a vertex, which may be exponential. However, the computational effort is rewarded with a wealth of information that can be gained from the results, which we highlight through some proof-of-concept computations.
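
To make the volume criterion concrete, here is a minimal brute-force sketch. It is not the authors' method: the abstract states that they derive an explicit description of the normal cones from the polytope edges. The sketch assumes fixed cluster sizes (a special case of bounded shapes) and assumes that a site vector a = (a_1, ..., a_k) scores a clustering by the linear objective sum_i a_i · (sum of the points in cluster i). Under these assumptions, the share of random site vectors for which a clustering is optimal is a Monte Carlo estimate of the normalized volume of its normal cone. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations


def objective(sites, points, labels):
    """Linear objective: sum over points of <site of the point's cluster, point>."""
    return sum(
        sum(s * x for s, x in zip(sites[c], p))
        for p, c in zip(points, labels)
    )


def estimate_cone_shares(points, sizes, samples=2000, seed=0):
    """Estimate, for each labeled clustering with the given cluster sizes,
    the fraction of random site vectors for which it is optimal.

    Brute force: enumerates every clustering, so only usable for tiny inputs.
    """
    rng = random.Random(seed)
    k, d = len(sizes), len(points[0])
    # One label per point; label i appears sizes[i] times.
    labels = [i for i, size in enumerate(sizes) for _ in range(size)]
    candidates = sorted(set(permutations(labels)))
    wins = Counter()
    for _ in range(samples):
        # Independent Gaussian coordinates give a uniformly random direction.
        sites = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
        best = max(candidates, key=lambda lab: objective(sites, points, lab))
        wins[best] += 1
    return {lab: count / samples for lab, count in wins.items()}


if __name__ == "__main__":
    # Two visibly separated groups of three points each (toy data).
    points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    shares = estimate_cone_shares(points, sizes=(3, 3))
    for lab, share in sorted(shares.items(), key=lambda kv: -kv[1])[:4]:
        print(lab, round(share, 3))
```

On this toy data, the two labelings that keep each group together should collect most of the samples, which is the sense in which a well-separated clustering has a large normal cone. Enumeration grows exponentially with the number of points, so this only serves as an illustration on very small instances.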

Suggested Citation

  • Steffen Borgwardt & Felix Happach, 2019. "Good Clusterings Have Large Volume," Operations Research, INFORMS, vol. 67(1), pages 215-231, January.
  • Handle: RePEc:inm:oropre:v:67:y:2019:i:1:p:215-231
    DOI: 10.1287/opre.2018.1779

    Download full text from publisher

    File URL: https://doi.org/10.1287/opre.2018.1779
    Download Restriction: no


    Citations

    Cited by:

    1. Chen, Claire Y.T. & Sun, Edward W. & Miao, Wanyu & Lin, Yi-Bing, 2024. "Reconciling business analytics with graphically initialized subspace clustering for optimal nonlinear pricing," European Journal of Operational Research, Elsevier, vol. 312(3), pages 1086-1107.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Regenwetter, Michel & Marley, A. A. J. & Grofman, Bernard, 2002. "A general concept of majority rule," Mathematical Social Sciences, Elsevier, vol. 43(3), pages 405-428, July.
    2. Marley, A. A. J., 2002. "Random utility models and their applications: recent developments," Mathematical Social Sciences, Elsevier, vol. 43(3), pages 289-302, July.
    3. Suck, Reinhard, 2002. "Independent random utility representations," Mathematical Social Sciences, Elsevier, vol. 43(3), pages 371-389, July.
    4. Johannes von Lindheim, 2023. "Simple approximative algorithms for free-support Wasserstein barycenters," Computational Optimization and Applications, Springer, vol. 85(1), pages 213-246, May.
    5. Smeulders, B., 2018. "Testing a mixture model of single-peaked preferences," Mathematical Social Sciences, Elsevier, vol. 93(C), pages 101-113.
    6. Steffen Borgwardt & Stephan Patterson, 2021. "On the computational complexity of finding a sparse Wasserstein barycenter," Journal of Combinatorial Optimization, Springer, vol. 41(3), pages 736-761, April.
    7. Puccetti, Giovanni & Rüschendorf, Ludger & Vanduffel, Steven, 2020. "On the computation of Wasserstein barycenters," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    8. Steffen Borgwardt, 2022. "An LP-based, strongly-polynomial 2-approximation algorithm for sparse Wasserstein barycenters," Operational Research, Springer, vol. 22(2), pages 1511-1551, April.
