
Detecting small clusters in the stochastic block model

Authors

  • Fei Ye (Capital University of Economics and Business)
  • Jingsong Xiao (Tsinghua University)
  • Weidong Ma (Tsinghua University)
  • Shiwen Jin (Tsinghua University)
  • Ying Yang (Tsinghua University)

Abstract

In the study of community detection, the stochastic block model (SBM) is frequently used as an idealized model. Many community detection methods have been proposed and proved consistent under the SBM. However, almost all of these consistency results rest on the common assumption that all communities are balanced. When the communities are unbalanced and some of them are small, those methods may be less efficient. In this paper, we consider the SBM with small clusters, in which the communities consist of several large clusters of balanced sizes and some small clusters whose sizes are of smaller order than those of the large clusters. We propose a two-step method to efficiently detect the small clusters as well as the community structure of the large clusters. In the first step, to obtain an initial estimator of the community structure, we treat the nodes in small clusters as outliers and use a robust community detection method to classify the majority of nodes in large clusters correctly. In the second step, we pick out the nodes in small clusters using the entry-wise deviation and update the community structure. We demonstrate that, under mild conditions, our method consistently recovers the large communities and identifies each node in the small clusters. Simulation results show that the proposed approach performs well whether the initial estimator is obtained by semidefinite programming or by regularized spectral clustering. We also illustrate our method on real-world networks.
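The second step described in the abstract can be illustrated with a small sketch. This is not the authors' procedure, whose exact statistic and threshold are not given here. It assumes that the entry-wise deviation of a node is the largest standardized gap between its observed edge count into each large block and the count expected from plug-in block probabilities. The function names `sample_sbm` and `flag_deviating_nodes`, the block probabilities, the threshold of 4.0, and the hand-made initial labels (standing in for the output of a robust first-step method) are all hypothetical choices made for illustration.

```python
import math
import random


def sample_sbm(labels, P, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from an SBM with block matrix P."""
    rng = random.Random(seed)
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[labels[i]][labels[j]]:
                A[i][j] = A[j][i] = 1
    return A


def flag_deviating_nodes(A, init_labels, K, threshold):
    """Flag nodes whose edge counts into some block deviate strongly
    from what the initial K-block fit predicts (assumed statistic)."""
    n = len(A)
    members = [[i for i in range(n) if init_labels[i] == k] for k in range(K)]

    # Plug-in estimates of the block connection probabilities.
    B = [[0.0] * K for _ in range(K)]
    for k in range(K):
        for l in range(K):
            edges = sum(A[i][j] for i in members[k] for j in members[l] if i != j)
            pairs = len(members[k]) * len(members[l])
            if k == l:
                pairs -= len(members[k])  # exclude self-pairs
            B[k][l] = edges / pairs if pairs else 0.0

    flagged = []
    for i in range(n):
        k = init_labels[i]
        deviation = 0.0
        for l in range(K):
            others = [j for j in members[l] if j != i]
            p = B[k][l]
            variance = len(others) * p * (1.0 - p)
            if variance > 0:
                observed = sum(A[i][j] for j in others)
                z = abs(observed - len(others) * p) / math.sqrt(variance)
                deviation = max(deviation, z)
        if deviation > threshold:
            flagged.append(i)
    return flagged


# Toy network: two large clusters of 80 nodes and one small cluster of 6.
true_labels = [0] * 80 + [1] * 80 + [2] * 6
P = [[0.50, 0.10, 0.02],
     [0.10, 0.50, 0.02],
     [0.02, 0.02, 0.90]]
A = sample_sbm(true_labels, P, seed=1)

# Stand-in for step 1: large clusters are labelled correctly, while the
# small-cluster nodes are absorbed into block 0 as if they were outliers.
init_labels = [0] * 80 + [1] * 80 + [0] * 6
small_nodes = list(range(160, 166))

flagged = flag_deviating_nodes(A, init_labels, K=2, threshold=4.0)
print(sorted(flagged))
```

In this toy setting the small-cluster nodes have far fewer edges into block 0 than the fitted block probability predicts, so their deviations sit well above the threshold. A lower threshold flags more large-cluster nodes by chance. The flagged nodes would then be regrouped and the large-cluster labels updated, a step this sketch leaves out.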

Suggested Citation

  • Fei Ye & Jingsong Xiao & Weidong Ma & Shiwen Jin & Ying Yang, 2025. "Detecting small clusters in the stochastic block model," Statistical Papers, Springer, vol. 66(2), pages 1-34, February.
  • Handle: RePEc:spr:stpapr:v:66:y:2025:i:2:d:10.1007_s00362-025-01660-7
    DOI: 10.1007/s00362-025-01660-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-025-01660-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



