IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v60y2013icp12-31.html
   My bibliography  Save this article

Improved Bayesian inference for the stochastic block model with application to large networks

Author

Listed:
  • McDaid, Aaron F.
  • Murphy, Thomas Brendan
  • Friel, Nial
  • Hurley, Neil J.

Abstract

An efficient MCMC algorithm is presented to cluster the nodes of a network such that nodes with similar role in the network are clustered together. This is known as block-modeling or block-clustering. The model is the stochastic blockmodel (SBM) with block parameters integrated out. The resulting marginal distribution defines a posterior over the number of clusters and cluster memberships. Sampling from this posterior is simpler than from the original SBM as transdimensional MCMC can be avoided. The algorithm is based on the allocation sampler. It requires a prior to be placed on the number of clusters, thereby allowing the number of clusters to be directly estimated by the algorithm, rather than being given as an input parameter. Synthetic and real data are used to test the speed and accuracy of the model and algorithm, including the ability to estimate the number of clusters. The algorithm can scale to networks with up to ten thousand nodes and tens of millions of edges.

Suggested Citation

  • McDaid, Aaron F. & Murphy, Thomas Brendan & Friel, Nial & Hurley, Neil J., 2013. "Improved Bayesian inference for the stochastic block model with application to large networks," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 12-31.
  • Handle: RePEc:eee:csdana:v:60:y:2013:i:c:p:12-31
    DOI: 10.1016/j.csda.2012.10.021
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947312003891
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2012.10.021?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Mark S. Handcock & Adrian E. Raftery & Jeremy M. Tantrum, 2007. "Model‐based clustering for social networks," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(2), pages 301-354, March.
    2. Hoff P.D. & Raftery A.E. & Handcock M.S., 2002. "Latent Space Approaches to Social Network Analysis," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 1090-1098, December.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Nan, Dong-Yang & Yu, Wei & Liu, Xiao & Zhang, Yun-Peng & Dai, Wei-Di, 2018. "A framework of community detection based on individual labels in attribute networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 512(C), pages 523-536.
    2. Prasenjit Ghosh & Debdeep Pati & Anirban Bhattacharya, 2020. "Posterior Contraction Rates for Stochastic Block Models," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 82(2), pages 448-476, August.
    3. Marino, Maria Francesca & Pandolfi, Silvia, 2022. "Hybrid maximum likelihood inference for stochastic block models," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).
    4. Rawya Zreik & Pierre Latouche & Charles Bouveyron, 2017. "The dynamic random subgraph model for the clustering of evolving networks," Computational Statistics, Springer, vol. 32(2), pages 501-533, June.
    5. Ludkin, Matthew, 2020. "Inference for a generalised stochastic block model with unknown number of blocks and non-conjugate edge models," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    6. Marco Bertoletti & Nial Friel & Riccardo Rastelli, 2015. "Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion," METRON, Springer;Sapienza Università di Roma, vol. 73(2), pages 177-199, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Laleh Tafakori & Armin Pourkhanali & Riccardo Rastelli, 2022. "Measuring systemic risk and contagion in the European financial network," Empirical Economics, Springer, vol. 63(1), pages 345-389, July.
    2. Samrachana Adhikari & Beau Dabbs, 2018. "Social Network Analysis in R: A Software Review," Journal of Educational and Behavioral Statistics, , vol. 43(2), pages 225-253, April.
    3. Guang Ouyang & Dipak K. Dey & Panpan Zhang, 2020. "Clique-Based Method for Social Network Clustering," Journal of Classification, Springer;The Classification Society, vol. 37(1), pages 254-274, April.
    4. Samrachana Adhikari & Tracy Sweet & Brian Junker, 2021. "Analysis of longitudinal advice‐seeking networks following implementation of high stakes testing," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1475-1500, October.
    5. West, Robert M. & House, Allan O. & Keen, Justin & Ward, Vicky L., 2015. "Using the structure of social networks to map inter-agency relationships in public health services," Social Science & Medicine, Elsevier, vol. 145(C), pages 107-114.
    6. Chiara Di Maria & Antonino Abbruzzo & Gianfranco Lovison, 2022. "Networks as mediating variables: a Bayesian latent space approach," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 31(4), pages 1015-1035, October.
    7. Chen, Mingli & Fernández-Val, Iván & Weidner, Martin, 2021. "Nonlinear factor models for network and panel data," Journal of Econometrics, Elsevier, vol. 220(2), pages 296-324.
    8. Ick Hoon Jin & Minjeong Jeon, 2019. "A Doubly Latent Space Joint Model for Local Item and Person Dependence in the Analysis of Item Response Data," Psychometrika, Springer;The Psychometric Society, vol. 84(1), pages 236-260, March.
    9. Chih‐Sheng Hsieh & Hans van Kippersluis, 2018. "Smoking initiation: Peers and personality," Quantitative Economics, Econometric Society, vol. 9(2), pages 825-863, July.
    10. Salter-Townshend, Michael & Murphy, Thomas Brendan, 2013. "Variational Bayesian inference for the Latent Position Cluster Model for network data," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 661-671.
    11. Fengqin Tang & Chunning Wang & Jinxia Su & Yuanyuan Wang, 2020. "Spectral clustering-based community detection using graph distance and node attributes," Computational Statistics, Springer, vol. 35(1), pages 69-94, March.
    12. Chih‐Sheng Hsieh & Xu Lin, 2021. "Social interactions and social preferences in social networks," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(2), pages 165-189, March.
    13. Irene Crimaldi & Michela Del Vicario & Greg Morrison & Walter Quattrociocchi & Massimo Riccaboni, 2015. "Homophily and Triadic Closure in Evolving Social Networks," Working Papers 3/2015, IMT School for Advanced Studies Lucca, revised May 2015.
    14. Sudhir Voleti & Praveen K. Kopalle & Pulak Ghosh, 2015. "An Interproduct Competition Model Incorporating Branding Hierarchy and Product Similarities Using Store-Level Data," Management Science, INFORMS, vol. 61(11), pages 2720-2738, November.
    15. Teague R. Henry & Kathleen M. Gates & Mitchell J. Prinstein & Douglas Steinley, 2020. "Modeling Heterogeneous Peer Assortment Effects Using Finite Mixture Exponential Random Graph Models," Psychometrika, Springer;The Psychometric Society, vol. 85(1), pages 8-34, March.
    16. Adrian E. Raftery, 2017. "Comment: Extending the Latent Position Model for Networks," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(520), pages 1531-1534, October.
    17. Yingda Lu & Kinshuk Jerath & Param Vir Singh, 2013. "The Emergence of Opinion Leaders in a Networked Online Community: A Dyadic Model with Time Dynamics and a Heuristic for Fast Estimation," Management Science, INFORMS, vol. 59(8), pages 1783-1799, August.
    18. Sosa, Juan & Betancourt, Brenda, 2022. "A latent space model for multilayer network data," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
    19. Babkin, Sergii & Stewart, Jonathan R. & Long, Xiaochen & Schweinberger, Michael, 2020. "Large-scale estimation of random graph models with local dependence," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    20. Michael Braun & André Bonfrer, 2011. "Scalable Inference of Customer Similarities from Interactions Data Using Dirichlet Processes," Marketing Science, INFORMS, vol. 30(3), pages 513-531, 05-06.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:60:y:2013:i:c:p:12-31. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/csda .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.