
A node-based index for clustering validation of graph data

Author

Listed:
  • Ali Tosyali

    (Rochester Institute of Technology)

  • Behnam Tavakkol

    (Stockton University)

Abstract

Clustering validity indices evaluate the performance of clustering algorithms in terms of cluster quality. They are also used to detect the correct number of clusters. Numerous clustering validity indices have been proposed in the literature for different types of data. However, to the best of our knowledge, there are only a few well-known clustering validity indices for graph data. In this paper, we propose a new clustering validity index for graph data called the node-based clustering validity index. The main characteristic of our proposed index is that it captures the exclusive contribution of each node to the separation and compactness of the clusters in a graph. This characteristic gives the index an advantage: it detects the correct number of clusters in cases where two sets of nodes are connected to each other through a single node. We provide an illustrative example to show that many existing validity indices fail to detect the correct number of clusters in such cases. We evaluate the performance of our proposed index in several experiments on real-world and synthetic graphs. The experiments show that our proposed node-based index outperforms the existing validity indices on these graph data.
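
The case described in the abstract, two sets of nodes joined through a single node, can be illustrated with a small sketch. The score below is an assumption made purely for illustration and is not the index defined in the paper: each node is scored as the fraction of its own cluster it is linked to (compactness) minus the fraction of the remaining nodes it is linked to (a stand-in for separation). The function names `build_bridged_graph`, `node_scores`, and `mean_score`, and the bridge node label `"b"`, are hypothetical.

```python
from itertools import combinations


def build_bridged_graph(k=4):
    """Two k-node cliques joined only through one bridge node "b"."""
    left = [f"l{i}" for i in range(k)]
    right = [f"r{i}" for i in range(k)]
    adj = {v: set() for v in left + right + ["b"]}

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for group in (left, right):
        for u, v in combinations(group, 2):
            add_edge(u, v)
    add_edge("b", left[0])   # bridge touches one node on each side
    add_edge("b", right[0])
    return adj, left, right


def node_scores(adj, clusters):
    """Illustrative per-node score (NOT the paper's index)."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    n = len(adj)
    scores = {}
    for v, neighbours in adj.items():
        size = len(clusters[label[v]])
        inside = sum(1 for u in neighbours if label[u] == label[v])
        outside = len(neighbours) - inside
        # Compactness: share of own cluster that v is adjacent to.
        compact = inside / (size - 1) if size > 1 else 0.0
        # Spill: share of all other nodes that v is adjacent to.
        spill = outside / (n - size) if n > size else 0.0
        scores[v] = compact - spill
    return scores


def mean_score(adj, clusters):
    """Average of the per-node scores for one partition."""
    scores = node_scores(adj, clusters)
    return sum(scores.values()) / len(scores)


adj, left, right = build_bridged_graph()
two_clusters = [left + ["b"], right]                   # bridge assigned to the left side
three_clusters = [left[:2], left[2:] + ["b"], right]   # left clique split in two

print("two clusters:  ", round(mean_score(adj, two_clusters), 3))
print("three clusters:", round(mean_score(adj, three_clusters), 3))
print("bridge node score:", node_scores(adj, two_clusters)["b"])
```

In this toy graph the two-cluster partition scores higher than the three-cluster one, and the bridge node receives the lowest individual score, which is the kind of node-level information an aggregate, edge-count-based index would not expose.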

Suggested Citation

  • Ali Tosyali & Behnam Tavakkol, 2024. "A node-based index for clustering validation of graph data," Annals of Operations Research, Springer, vol. 341(1), pages 197-221, October.
  • Handle: RePEc:spr:annopr:v:341:y:2024:i:1:d:10.1007_s10479-021-04376-7
    DOI: 10.1007/s10479-021-04376-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10479-021-04376-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Johannes van Der Pol & Jean-Paul Rameshkoumar, 2021. "A method to reduce false positives in a patent query [Une méthode pour réduire les faux positifs dans une requête brevet]," Working Papers hal-03287970, HAL.
    2. Mohamad Alghamdi, 2020. "Economics Performance Under Endogenous Knowledge Spillovers," Asia-Pacific Financial Markets, Springer;Japanese Association of Financial Economics and Engineering, vol. 27(2), pages 175-192, June.
    3. Mohamad Alghamdi, 2023. "Forming Stable R&D Networks in Different Market Structures," Annals of Economics and Finance, Society for AEF, vol. 24(1), pages 91-117, May.
    4. Thomas Rotolo & Scott Frickel, 2019. "When disasters strike environmental science: a case–control study of changes in scientific collaboration networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 301-317, July.
    5. Angelou, K. & Maragakis, M. & Kosmidis, K. & Argyrakis, P., 2021. "The evolution of triangular research and innovation collaborations in the European area," Journal of Informetrics, Elsevier, vol. 15(3).
    6. Patrick Wolf & Tobias Buchmann, 2021. "Analyzing development patterns in research networks and technology," Review of Evolutionary Political Economy, Springer, vol. 2(1), pages 55-81, April.
    7. Johannes Pol, 2019. "Introduction to Network Modeling Using Exponential Random Graph Models (ERGM): Theory and an Application Using R-Project," Computational Economics, Springer;Society for Computational Economics, vol. 54(3), pages 845-875, October.
    8. Shino Iwami & Arto Ojala & Chihiro Watanabe & Pekka Neittaanmäki, 2020. "A bibliometric approach to finding fields that co-evolved with information technology," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(1), pages 3-21, January.
    9. Yi Yu & Jaeseung Baek & Ali Tosyali & Myong K. Jeong, 2024. "Robust asymmetric non-negative matrix factorization for clustering nodes in directed networks," Annals of Operations Research, Springer, vol. 341(1), pages 245-265, October.
    10. Jinseok Kim, 2019. "A fast and integrative algorithm for clustering performance evaluation in author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 661-681, August.
    11. Laura Freeman & Abdul Rahman & Feras A. Batarseh, 2021. "Enabling Artificial Intelligence Adoption through Assurance," Social Sciences, MDPI, vol. 10(9), pages 1-15, August.
    12. Andrej Srakar, 2017. "Prevalence of Diseases and Health Care Utilization of the Self-Employed Artists and Their Empirical Determinants: Evidence From a Slovenian Survey," ACEI Working Paper Series AWP-08-2017, Association for Cultural Economics International, revised Sep 2017.
    13. Olivia Fischer & Loris T. Jeitziner & Dirk U. Wulff, 2024. "Affect in science communication: a data-driven analysis of TED Talks on YouTube," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-9, December.
    14. Jaeseung Baek & Myong K. Jeong & Elsayed A. Elsayed, 2024. "Spatial randomness-based anomaly detection approach for monitoring local variations in multimode surface topography," Annals of Operations Research, Springer, vol. 341(1), pages 173-195, October.
    15. Katy Börner & Adam H. Simpson & Andreas Bueckle & Robert L. Goldstone, 2018. "Science map metaphors: a comparison of network versus hexmap-based visualizations," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 409-426, February.
    16. Marlen Komorowski & Ruxandra Lupu & Sara Pepper & Justin Lewis, 2021. "Joining the Dots—Understanding the Value Generation of Creative Networks for Sustainability in Local Creative Ecosystems," Sustainability, MDPI, vol. 13(22), pages 1-16, November.
    17. Kevin W. Boyack, 2017. "Investigating the effect of global data on topic detection," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 999-1015, May.
    18. Jinseok Kim & Jenna Kim & Jason Owen‐Smith, 2021. "Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(8), pages 979-994, August.
    19. Humaira Waqas & Muhammad Abdul Qadir, 2021. "Multilayer heuristics based clustering framework (MHCF) for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7637-7678, September.
    20. Jinseok Kim & Jenna Kim, 2020. "Effect of forename string on author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(7), pages 839-855, July.
