
Optimal clustering under uncertainty

Author

Listed:
  • Lori A Dalton
  • Marco E Benalcázar
  • Edward R Dougherty

Abstract

Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier: whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When the point process is uncertain, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer in two steps. We first find an effective random point process that incorporates all randomness within its own probabilistic structure; the Bayes clusterer derived from this effective process is then an optimal robust clusterer relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers on synthetic Gaussian mixture models, we apply the framework to granular imaging, where we use asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application.
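
To make the idea of an effective point process concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a toy setting that does not appear in the paper: two clusters of one-dimensional Gaussian points with a known standard deviation, labels that are equally likely and independent, and uncertainty expressed as a finite list of candidate mean pairs with prior weights. All function and variable names are hypothetical. The joint density of points and labels is averaged over the candidate models, which plays the role of the effective process here, and the partition minimizing the expected fraction of misclustered points is then found by brute-force enumeration, which is only feasible for a handful of points.

```python
import itertools
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def joint_density(points, labels, means, sd):
    """Density of (points, labels) under one candidate model.

    Assumption: labels are i.i.d. with probability 1/2 each, and a point
    with label y is drawn from N(means[y], sd^2).
    """
    density = 1.0
    for x, y in zip(points, labels):
        density *= 0.5 * normal_pdf(x, means[y], sd)
    return density


def partition_error(a, b):
    """Fraction of misclustered points between two 2-cluster labelings,
    minimized over the two ways of matching cluster names."""
    mismatches = sum(u != v for u, v in zip(a, b))
    return min(mismatches, len(a) - mismatches) / len(a)


def robust_clusterer(points, models, weights, sd=1.0):
    """Brute-force sketch: return (best partition, its expected error)."""
    n = len(points)
    # One representative labeling per partition: the first point gets label 0.
    partitions = [(0,) + rest
                  for rest in itertools.product((0, 1), repeat=n - 1)]

    # Effective process: average the joint density over the uncertain models.
    # Both labelings that induce the same partition contribute to its mass.
    mass = {}
    for p in partitions:
        flipped = tuple(1 - y for y in p)
        mass[p] = sum(
            w * (joint_density(points, p, m, sd)
                 + joint_density(points, flipped, m, sd))
            for m, w in zip(models, weights)
        )
    total = sum(mass.values())
    posterior = {p: v / total for p, v in mass.items()}

    def expected_error(candidate):
        return sum(prob * partition_error(candidate, q)
                   for q, prob in posterior.items())

    best = min(partitions, key=expected_error)
    return best, expected_error(best)


# Hypothetical data and model uncertainty, chosen only for illustration.
points = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
models = [(-1.5, 1.5), (-2.0, 2.0), (-2.5, 2.5)]  # candidate (mean0, mean1)
weights = [0.25, 0.5, 0.25]                       # prior over the candidates
best, err = robust_clusterer(points, models, weights)
print(best, err)
```

With these well-separated points, the sketch groups the first three points together and the last three together. Note that averaging is done on the joint density before normalizing, so candidate models that explain the observed points better carry more weight in the resulting partition probabilities.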

Suggested Citation

  • Lori A Dalton & Marco E Benalcázar & Edward R Dougherty, 2018. "Optimal clustering under uncertainty," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-21, October.
  • Handle: RePEc:plo:pone00:0204627
    DOI: 10.1371/journal.pone.0204627

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0204627
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0204627&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christopher A. Bush & Juhee Lee & Steven N. MacEachern, 2010. "Minimally informative prior distributions for non‐parametric Bayesian analysis," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(2), pages 253-268, March.
    2. An Cheng & Tonghui Chen & Guogang Jiang & Xinru Han, 2021. "Can Major Public Health Emergencies Affect Changes in International Oil Prices?," IJERPH, MDPI, vol. 18(24), pages 1-13, December.
    3. Sylvia Frühwirth-Schnatter & Gertraud Malsiner-Walli, 2019. "From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 33-64, March.
    4. Abel Rodríguez & Enrique ter Horst, 2011. "Measuring expectations in options markets: an application to the S&P500 index," Quantitative Finance, Taylor & Francis Journals, vol. 11(9), pages 1393-1405, July.
    5. Sara Wade & Stephen G. Walker & Sonia Petrone, 2014. "A Predictive Study of Dirichlet Process Mixture Models for Curve Fitting," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(3), pages 580-605, September.
    6. Chai, Jian & Lu, Quanying & Hu, Yi & Wang, Shouyang & Lai, Kin Keung & Liu, Hongtao, 2018. "Analysis and Bayes statistical probability inference of crude oil price change point," Technological Forecasting and Social Change, Elsevier, vol. 126(C), pages 271-283.
    7. Giacomo Bormetti & Maria Elena De Giuli & Danilo Delpini & Claudia Tarantola, 2008. "Bayesian Analysis of Value-at-Risk with Product Partition Models," Papers 0809.0241, arXiv.org, revised May 2009.
    8. Peter Mueller & Fernando Andrés Quintana & Garritt L. Page, 2024. "Regression with Variable Dimension Covariates," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 86(1), pages 185-198, November.
    9. Chen Sui-Pi & Huang Guan-Hua, 2014. "A Bayesian clustering approach for detecting gene-gene interactions in high-dimensional genotype data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(3), pages 275-297, June.
    10. Miguel-Angel Negrín-Hernández & María Martel-Escobar & Francisco-José Vázquez-Polo, 2021. "Bayesian Meta-Analysis for Binary Data and Prior Distribution on Models," IJERPH, MDPI, vol. 18(2), pages 1-18, January.
    11. Loschi, R.H. & Iglesias, P.L. & Arellano-Valle, R.B. & Cruz, F.R.B., 2007. "Full predictivistic modeling of stock market data: Application to change point problems," European Journal of Operational Research, Elsevier, vol. 180(1), pages 282-291, July.
    12. Peter Müller & Fernando A. Quintana & Garritt Page, 2018. "Nonparametric Bayesian inference in applications," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(2), pages 175-206, June.
    13. Ruth Fuentes-García & Ramsés Mena & Stephen Walker, 2010. "A Probability for Classification Based on the Dirichlet Process Mixture Model," Journal of Classification, Springer;The Classification Society, vol. 27(3), pages 389-403, November.
    14. Mark S. Handcock & Adrian E. Raftery & Jeremy M. Tantrum, 2007. "Model‐based clustering for social networks," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(2), pages 301-354, March.
    15. Wang, Ketong & Porter, Michael D., 2018. "Optimal Bayesian clustering using non-negative matrix factorization," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 395-411.
    16. Im, Yunju & Tan, Aixin, 2021. "Bayesian subgroup analysis in regression using mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 162(C).
    17. L. G. Leon-Novelo & B. Nebiyou Bekele & P. Müller & F. Quintana & K. Wathen, 2012. "Borrowing Strength with Nonexchangeable Priors over Subpopulations," Biometrics, The International Biometric Society, vol. 68(2), pages 550-558, June.
    18. Jim Q. Smith & Paul E. Anderson & Silvia Liverani, 2008. "Separation measures and the geometry of Bayes factor selection for classification," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 957-980, November.
    19. Maura Mezzetti & Daniele Borzelli & Andrea d’Avella, 2022. "A Bayesian approach to model individual differences and to partition individuals: case studies in growth and learning curves," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 31(5), pages 1245-1271, December.
    20. Chai Jian & Wang Shubin & Xiao Hao, 2013. "Abrupt Changes of Global Oil Price," Journal of Systems Science and Information, De Gruyter, vol. 1(1), pages 38-59, February.
