Printed from IDEAS: https://ideas.repec.org/p/osf/osfxxx/epb6t.html

Evidence accumulation clustering using combinations of features

Authors

  • Wong, William
  • Tsuchiya, Naotsugu

Abstract

Evidence accumulation clustering (EAC) is an ensemble clustering algorithm that can cluster data containing clusters of arbitrary shape and arbitrary number. Here, we present a variant of EAC designed to better cluster data with a large number of features, many of which may be uninformative. Our method builds on the existing EAC algorithm by populating the clustering ensemble with clusterings that are each computed on a combination of fewer features than the full dataset contains. It also prewhitens each recombined subset of the data and weights the influence of each individual clustering by an estimate of its informativeness. We provide an example implementation of the algorithm in MATLAB and, using synthetic data, demonstrate its effectiveness compared with ordinary evidence accumulation clustering.
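The abstract describes four steps: enumerate combinations of fewer features than the full set, prewhiten each recombined subset, cluster it, and accumulate a co-association matrix in which each clustering's vote is weighted by an estimate of its informativeness. The following is a minimal pure-Python sketch of that pipeline under stated assumptions; it is not the authors' MATLAB implementation. All function names are hypothetical. The assumptions are: Lloyd's k-means with a fixed `k` as the base clusterer; Cholesky whitening in place of whichever whitening transform the paper uses; a hypothetical informativeness weight equal to the between-cluster fraction of the total sum of squares; and a consensus partition taken as the connected components of co-association > 0.5, which is a single-link cut at 0.5 (EAC variants can instead choose the cut by cluster lifetime).

```python
import itertools
import math
import random

# Hypothetical sketch of EAC over feature combinations; not the authors' MATLAB code.

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def whiten(rows):
    """Assumed whitening step (Cholesky): output has zero mean, identity sample covariance."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    xc = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in xc) / (n - 1) for b in range(d)] for a in range(d)]
    L = cholesky(cov)
    out = []
    for x in xc:
        z = [0.0] * d
        for i in range(d):  # forward substitution solves L z = x
            z[i] = (x[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
        out.append(z)
    return out

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means from k randomly chosen data points (assumed base clusterer)."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def explained_fraction(points, labels):
    """Hypothetical informativeness weight: between-cluster SS / total SS."""
    d = len(points[0])
    mu = [sum(p[j] for p in points) / len(points) for j in range(d)]
    total = sum(sq_dist(p, mu) for p in points)
    between = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        cm = [sum(p[j] for p in members) / len(members) for j in range(d)]
        between += len(members) * sq_dist(cm, mu)
    return between / total if total > 0 else 0.0

def co_association(partitions, weights):
    """Weighted fraction of clusterings that put objects i and j in the same cluster."""
    n, total = len(partitions[0]), sum(weights)
    if total <= 0:
        raise ValueError("all clusterings have zero weight")
    C = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += w / total
    return C

def consensus(C, threshold=0.5):
    """Connected components of the graph C > threshold (a single-link cut)."""
    n = len(C)
    labels, current = [-1] * n, 0
    for s in range(n):
        if labels[s] != -1:
            continue
        labels[s], stack = current, [s]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and C[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def eac_feature_combinations(data, subset_size, k, runs_per_subset=5, seed=0):
    """Cluster every whitened combination of `subset_size` features, then vote."""
    rng = random.Random(seed)
    partitions, weights = [], []
    for combo in itertools.combinations(range(len(data[0])), subset_size):
        sub = whiten([[row[j] for j in combo] for row in data])
        for _ in range(runs_per_subset):
            labels = kmeans(sub, k, rng)
            partitions.append(labels)
            weights.append(explained_fraction(sub, labels))
    return consensus(co_association(partitions, weights))
```

For example, `eac_feature_combinations(data, subset_size=2, k=2)` on a list of equal-length rows returns one integer label per row. Cholesky whitening was chosen here only because it needs no eigendecomposition in pure Python; ZCA or PCA whitening would be drop-in alternatives, since any transform that maps the sample covariance to the identity serves the same purpose.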

Suggested Citation

  • Wong, William & Tsuchiya, Naotsugu, 2020. "Evidence accumulation clustering using combinations of features," OSF Preprints epb6t, Center for Open Science.
  • Handle: RePEc:osf:osfxxx:epb6t
    DOI: 10.31219/osf.io/epb6t

    Download full text from publisher

    File URL: https://osf.io/download/5cef897e23fec40017efc1be/
    Download Restriction: no

    File URL: https://libkey.io/10.31219/osf.io/epb6t?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Agnan Kessy & Alex Lewin & Korbinian Strimmer, 2018. "Optimal Whitening and Decorrelation," The American Statistician, Taylor & Francis Journals, vol. 72(4), pages 309-314, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stan Lipovetsky, 2022. "Canonical Concordance Correlation Analysis," Mathematics, MDPI, vol. 11(1), pages 1-12, December.
    2. Schosser, Josef, 2019. "Consistency between principal and agent with differing time horizons: Computing incentives under risk," European Journal of Operational Research, Elsevier, vol. 277(3), pages 1113-1123.
    3. Damiano Brigo & Xiaoshan Huang & Andrea Pallavicini & Haitz Saez de Ocariz Borde, 2021. "Interpretability in deep learning for finance: a case study for the Heston model," Papers 2104.09476, arXiv.org.
    4. Harold Doran, 2023. "A Collection of Numerical Recipes Useful for Building Scalable Psychometric Applications," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 37-69, February.
    5. Loperfido, Nicola, 2024. "The skewness of mean–variance normal mixtures," Journal of Multivariate Analysis, Elsevier, vol. 199(C).
    6. Steen Magnussen, 2018. "An estimation strategy to protect against over-estimating precision in a LiDAR-based prediction of a stand mean," Journal of Forest Science, Czech Academy of Agricultural Sciences, vol. 64(12), pages 497-505.
    7. Minati, Ludovico & Li, Chao & Bartels, Jim & Chakraborty, Parthojit & Li, Zixuan & Yoshimura, Natsue & Frasca, Mattia & Ito, Hiroyuki, 2023. "Accelerometer time series augmentation through externally driving a non-linear dynamical system," Chaos, Solitons & Fractals, Elsevier, vol. 168(C).
    8. Dirk Roeder & Georgi Dimitroff, 2020. "Volatility model calibration with neural networks a comparison between direct and indirect methods," Papers 2007.03494, arXiv.org.
    9. Nikita Moshkov & Michael Bornholdt & Santiago Benoit & Matthew Smith & Claire McQuin & Allen Goodman & Rebecca A. Senft & Yu Han & Mehrtash Babadi & Peter Horvath & Beth A. Cimini & Anne E. Carpenter, 2024. "Learning representations for image-based profiling of perturbations," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    10. Jonathan Gillard & Emily O’Riordan & Anatoly Zhigljavsky, 2023. "Polynomial whitening for high-dimensional data," Computational Statistics, Springer, vol. 38(3), pages 1427-1461, September.
    11. Priddle, Jacob W. & Drovandi, Christopher, 2023. "Transformations in semi-parametric Bayesian synthetic likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.