
Directional co-clustering

Authors

  • Aghiles Salah (SIS, Singapore Management University)
  • Mohamed Nadif (LIPADE, Paris Descartes University)

Abstract

Co-clustering addresses the problem of simultaneously clustering both dimensions of a data matrix. When dealing with high-dimensional sparse data, co-clustering turns out to be more beneficial than one-sided clustering, even if one is interested in clustering along one dimension only. Aside from being high-dimensional and sparse, some datasets, such as document-term matrices, exhibit directional characteristics, and the $L_2$ normalization of such data, so that it lies on the surface of a unit hypersphere, is useful. Popular co-clustering assumptions such as Gaussian or Multinomial are inadequate for this type of data. In this paper, we extend the scope of co-clustering to directional data. We present Diagonal Block Mixture of Von Mises–Fisher distributions (dbmovMFs), a co-clustering model which is well suited for directional data lying on a unit hypersphere. By framing the estimation of the model parameters under the maximum likelihood (ML) and classification ML approaches, we develop a class of EM algorithms for estimating dbmovMFs from data. Extensive experiments on several real-world datasets confirm the advantage of our approach and demonstrate the effectiveness of our algorithms.
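
As a rough illustration of the $L_2$ normalization step mentioned in the abstract, the sketch below projects the rows of a small document-term matrix onto the unit hypersphere. It is not the authors' code: the function name `l2_normalize_rows`, the `eps` guard, and the toy counts are assumptions made purely for illustration. It shows that, after normalization, two documents with proportional term counts coincide, and that the dot product between rows is their cosine similarity.

```python
import numpy as np


def l2_normalize_rows(X, eps=1e-12):
    """Scale each row of X to unit Euclidean norm; all-zero rows stay zero.

    Hypothetical helper for illustration only (not from the paper).
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # eps avoids division by zero for empty documents.
    return X / np.maximum(norms, eps)


# Made-up toy document-term counts: 3 documents, 4 terms.
counts = np.array([
    [2, 0, 1, 0],
    [4, 0, 2, 0],  # same direction as document 0, twice the length
    [0, 3, 0, 1],  # shares no terms with documents 0 and 1
])

unit = l2_normalize_rows(counts)

# On the unit hypersphere, the dot product of two rows is their cosine similarity.
cosine = unit @ unit.T
```

Here documents 0 and 1 differ only in length, so normalization maps them to the same point on the sphere. This is the sense in which such data is directional: only the direction of each row, not its magnitude, carries information.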

Suggested Citation

  • Aghiles Salah & Mohamed Nadif, 2019. "Directional co-clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 591-620, September.
  • Handle: RePEc:spr:advdac:v:13:y:2019:i:3:d:10.1007_s11634-018-0323-4
    DOI: 10.1007/s11634-018-0323-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-018-0323-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Celeux, Gilles & Govaert, Gerard, 1992. "A classification EM algorithm for clustering and two stochastic versions," Computational Statistics & Data Analysis, Elsevier, vol. 14(3), pages 315-332, October.
    2. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    3. van Dijk, A. & van Rosmalen, J.M. & Paap, R., 2009. "A Bayesian approach to two-mode clustering," Econometric Institute Research Papers EI 2009-06, Erasmus University Rotterdam, Erasmus School of Economics (ESE), Econometric Institute.

    Citations



    Cited by:

    1. Lazhar Labiod & Mohamed Nadif, 2021. "Efficient regularized spectral data embedding," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(1), pages 99-119, March.
    2. Paul Riverain & Simon Fossier & Mohamed Nadif, 2023. "Poisson degree corrected dynamic stochastic block model," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 135-162, March.
    3. Arthur Pewsey & Eduardo García-Portugués, 2021. "Recent advances in directional statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(1), pages 1-58, March.
    4. Stanislav Nagy & Houyem Demni & Davide Buttarazzi & Giovanni C. Porzio, 2024. "Theory of angular depth for classification of directional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(3), pages 627-662, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Volodymyr Melnykov & Xuwen Zhu, 2019. "An extension of the K-means algorithm to clustering skewed data," Computational Statistics, Springer, vol. 34(1), pages 373-394, March.
    2. Francesco Dotto & Alessio Farcomeni & Luis Angel García-Escudero & Agustín Mayo-Iscar, 2017. "A fuzzy approach to robust regression clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(4), pages 691-710, December.
    3. Zaheer Ahmed & Alberto Cassese & Gerard Breukelen & Jan Schepers, 2023. "E-ReMI: Extended Maximal Interaction Two-mode Clustering," Journal of Classification, Springer;The Classification Society, vol. 40(2), pages 298-331, July.
    4. Rocci, Roberto & Vichi, Maurizio, 2008. "Two-mode multi-partitioning," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 1984-2003, January.
    5. Sharon M. McNicholas & Paul D. McNicholas & Daniel A. Ashlock, 2021. "An Evolutionary Algorithm with Crossover and Mutation for Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 264-279, July.
    6. Alessandro Casa & Charles Bouveyron & Elena Erosheva & Giovanna Menardi, 2021. "Co-clustering of Time-Dependent Data via the Shape Invariant Model," Journal of Classification, Springer;The Classification Society, vol. 38(3), pages 626-649, October.
    7. Roberto Mari & Salvatore Ingrassia & Antonio Punzo, 2023. "Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models," Journal of Classification, Springer;The Classification Society, vol. 40(2), pages 233-266, July.
    8. Marino, Maria Francesca & Pandolfi, Silvia, 2022. "Hybrid maximum likelihood inference for stochastic block models," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).
    9. Shuchismita Sarkar & Volodymyr Melnykov & Rong Zheng, 2020. "Gaussian mixture modeling and model-based clustering under measurement inconsistency," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 379-413, June.
    10. Heath, Jeffrey W. & Fu, Michael C. & Jank, Wolfgang, 2009. "New global optimization algorithms for model-based clustering," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 3999-4017, October.
    11. Hofmeyr, David P., 2020. "Degrees of freedom and model selection for k-means clustering," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    12. Govaert, Gérard & Nadif, Mohamed, 2008. "Block clustering with Bernoulli mixture models: Comparison of different approaches," Computational Statistics & Data Analysis, Elsevier, vol. 52(6), pages 3233-3245, February.
    13. Xavier Bry & Lionel Cucala, 2022. "A von Mises–Fisher mixture model for clustering numerical and categorical variables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 429-455, June.
    14. Miriam Aparicio, 2021. "Resiliency and Cooperation or Regarding Social and Collective Competencies for University Achievement. An Analysis from a Systemic Perspective," European Journal of Social Sciences Education and Research Articles, Revistia Research and Publishing, vol. 8, ejser_v8_.
    15. Yunpeng Zhao & Qing Pan & Chengan Du, 2019. "Logistic regression augmented community detection for network data with application in identifying autism‐related gene pathways," Biometrics, The International Biometric Society, vol. 75(1), pages 222-234, March.
    16. Wu, Han-Ming & Tien, Yin-Jing & Chen, Chun-houh, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    17. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    18. F. Marta L. Di Lascio & Andrea Menapace & Roberta Pappadà, 2024. "A spatially‐weighted AMH copula‐based dissimilarity measure for clustering variables: An application to urban thermal efficiency," Environmetrics, John Wiley & Sons, Ltd., vol. 35(1), February.
    19. Yifan Zhu & Chongzhi Di & Ying Qing Chen, 2019. "Clustering Functional Data with Application to Electronic Medication Adherence Monitoring in HIV Prevention Trials," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(2), pages 238-261, July.
    20. Irene Vrbik & Paul McNicholas, 2015. "Fractionally-Supervised Classification," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 359-381, October.
