
SiGMoiD: A super-statistical generative model for binary data

Author

Listed:
  • Xiaochuan Zhao
  • Germán Plata
  • Purushottam D Dixit

Abstract

In modern computational biology, there is great interest in building probabilistic models to describe collections of a large number of co-varying binary variables. However, current approaches to building generative models rely on the modeler's identification of constraints and are computationally expensive to infer when the number of variables is large (N~100). Here, we address both these issues with the Super-statistical Generative Model for binary Data (SiGMoiD). SiGMoiD is a maximum entropy-based framework in which we imagine the data as arising from a super-statistical system: individual binary variables in a given sample are coupled to the same ‘bath’, whose intensive variables vary from sample to sample. Importantly, unlike standard maximum entropy approaches where the modeler specifies the constraints, the SiGMoiD algorithm infers them directly from the data. Due to this optimal choice of constraints, SiGMoiD can model collections of a very large number (N>1000) of binary variables. Finally, SiGMoiD offers a reduced dimensional description of the data, allowing us to identify clusters of similar data points as well as clusters of similar binary variables. We illustrate the versatility of SiGMoiD using several datasets that span multiple time- and length-scales.

Author summary

Collectively varying binary variables are ubiquitous in modern biology. Given that the number of possible configurations of these systems typically far exceeds the number of available samples, generative models have become an essential tool in quantitative descriptions of binary data. The state-of-the-art approaches to building generative models have several conceptual limitations. Specifically, they rely on the modeler choosing system-appropriate constraints, which can be challenging in systems with many complex interactions. Moreover, they are computationally expensive to infer when the number of variables is large (N~100). To address these issues, we propose a theoretical generalization of the maximum entropy approach that allows us to model very high-dimensional data, with at least an order of magnitude more variables than is currently possible. This framework will be a significant advancement in the computational analysis of covarying binary variables.
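
The "shared bath" idea in the abstract can be illustrated with a small sketch. The abstract gives no formula, so the following rests on assumptions: each sample s carries a short vector of intensive variables, each binary variable i carries a matching vector of "energies", and the variables are independent given the sample's intensive variables, with P(variable i is on) = sigmoid(-sum_k beta_sk * E_ik). The names betas, energies, n_latent and fit are hypothetical, and the plain gradient ascent used here is a stand-in, not the authors' inference procedure.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def prob_on(beta_s, energy_i):
    """Assumed form: P(variable i = 1 in sample s) = sigmoid(-sum_k beta_sk * E_ik)."""
    return sigmoid(-sum(b * e for b, e in zip(beta_s, energy_i)))


def log_likelihood(data, betas, energies):
    """Sum of Bernoulli log-probabilities over all samples and variables."""
    total = 0.0
    for s, row in enumerate(data):
        for i, x in enumerate(row):
            p = min(max(prob_on(betas[s], energies[i]), 1e-12), 1.0 - 1e-12)
            total += math.log(p) if x else math.log(1.0 - p)
    return total


def fit(data, n_latent=2, steps=300, lr=0.02, seed=0):
    """Gradient ascent on the log-likelihood (illustrative, not the paper's algorithm).

    Returns per-sample intensive variables and per-variable energies.
    """
    rng = random.Random(seed)
    n_samples, n_vars = len(data), len(data[0])
    betas = [[rng.gauss(0, 0.1) for _ in range(n_latent)] for _ in range(n_samples)]
    energies = [[rng.gauss(0, 0.1) for _ in range(n_latent)] for _ in range(n_vars)]
    for _ in range(steps):
        g_beta = [[0.0] * n_latent for _ in range(n_samples)]
        g_en = [[0.0] * n_latent for _ in range(n_vars)]
        for s in range(n_samples):
            for i in range(n_vars):
                # d logL / d logit = x - p, and logit = -sum_k beta_sk * E_ik
                resid = data[s][i] - prob_on(betas[s], energies[i])
                for k in range(n_latent):
                    g_beta[s][k] -= resid * energies[i][k]
                    g_en[i][k] -= resid * betas[s][k]
        for s in range(n_samples):
            for k in range(n_latent):
                betas[s][k] += lr * g_beta[s][k]
        for i in range(n_vars):
            for k in range(n_latent):
                energies[i][k] += lr * g_en[i][k]
    return betas, energies


# Toy data (made up for illustration): two groups of samples with opposite patterns.
toy = ([[1, 1, 1, 0, 0, 0]] * 5 + [[1, 1, 0, 0, 0, 0]]
       + [[0, 0, 0, 1, 1, 1]] * 5 + [[0, 0, 1, 1, 1, 1]])
betas, energies = fit(toy)
print(round(log_likelihood(toy, betas, energies), 2))
```

In this sketch the rows of betas give a low-dimensional coordinate for each sample and the rows of energies give one for each variable, which is one way to read the abstract's claim that the model supports clustering of both data points and variables.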

Suggested Citation

  • Xiaochuan Zhao & Germán Plata & Purushottam D Dixit, 2021. "SiGMoiD: A super-statistical generative model for binary data," PLOS Computational Biology, Public Library of Science, vol. 17(8), pages 1-13, August.
  • Handle: RePEc:plo:pcbi00:1009275
    DOI: 10.1371/journal.pcbi.1009275

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009275
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1009275&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Elad Schneidman & Michael J. Berry & Ronen Segev & William Bialek, 2006. "Weak pairwise correlations imply strongly correlated network states in a neural population," Nature, Nature, vol. 440(7087), pages 1007-1012, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lipovetsky, Stan, 2018. "Quantum paradigm of probability amplitude and complex utility in entangled discrete choice modeling," Journal of choice modelling, Elsevier, vol. 27(C), pages 62-73.
    2. Mark L Ioffe & Michael J Berry II, 2017. "The structured ‘low temperature’ phase of the retinal population code," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-31, October.
    3. Katarína Bod’ová & Enikő Szép & Nicholas H Barton, 2021. "Dynamic maximum entropy provides accurate approximation of structured population dynamics," PLOS Computational Biology, Public Library of Science, vol. 17(12), pages 1-22, December.
    4. MohammadReza Zahedian & Mahsa Bagherikalhor & Andrey Trufanov & G. Reza Jafari, 2022. "Financial Crisis in the Framework of Non-zero Temperature Balance Theory," Papers 2202.03198, arXiv.org.
    5. Gaëlle Desbordes & Jianzhong Jin & Chong Weng & Nicholas A Lesica & Garrett B Stanley & Jose-Manuel Alonso, 2008. "Timing Precision in Population Coding of Natural Scenes in the Early Visual System," PLOS Biology, Public Library of Science, vol. 6(12), pages 1-11, December.
    6. Yasser Roudi & Sheila Nirenberg & Peter E Latham, 2009. "Pairwise Maximum Entropy Models for Studying Large Biological Systems: When They Can Work and When They Can't," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-18, May.
    7. Maulana, Ardian & Situngkir, Hokky, 2015. "Korelasi Bebas-skala dalam Studi Geo-politik Pemilihan [Scale-free correlation within Geopolitics of Election Studies]," MPRA Paper 66351, University Library of Munich, Germany.
    8. Hideaki Shimazaki & Shun-ichi Amari & Emery N Brown & Sonja Grün, 2012. "State-Space Analysis of Time-Varying Higher-Order Spike Correlation for Multiple Neural Spike Train Data," PLOS Computational Biology, Public Library of Science, vol. 8(3), pages 1-27, March.
    9. Timothy R Lezon & Ivet Bahar, 2010. "Using Entropy Maximization to Understand the Determinants of Structural Dynamics beyond Native Contact Topology," PLOS Computational Biology, Public Library of Science, vol. 6(6), pages 1-12, June.
    10. Xiaoyuan Liu & Hayato Ushijima-Mwesigwa & Avradip Mandal & Sarvagya Upadhyay & Ilya Safro & Arnab Roy, 2022. "Leveraging special-purpose hardware for local search heuristics," Computational Optimization and Applications, Springer, vol. 82(1), pages 1-29, May.
    11. Sacha Jennifer van Albada & Moritz Helias & Markus Diesmann, 2015. "Scalability of Asynchronous Networks Is Limited by One-to-One Mapping between Effective Connectivity and Correlations," PLOS Computational Biology, Public Library of Science, vol. 11(9), pages 1-37, September.
    12. Sahar Gelfman & Quanli Wang & Yi-Fan Lu & Diana Hall & Christopher D Bostick & Ryan Dhindsa & Matt Halvorsen & K Melodi McSweeney & Ellese Cotterill & Tom Edinburgh & Michael A Beaumont & Wayne N Fran, 2018. "meaRtools: An R package for the analysis of neuronal networks recorded on microelectrode arrays," PLOS Computational Biology, Public Library of Science, vol. 14(10), pages 1-20, October.
    13. Jason S Prentice & Olivier Marre & Mark L Ioffe & Adrianna R Loback & Gašper Tkačik & Michael J Berry II, 2016. "Error-Robust Modes of the Retinal Population Code," PLOS Computational Biology, Public Library of Science, vol. 12(11), pages 1-32, November.
    14. Simona Cocco & Remi Monasson & Martin Weigt, 2013. "From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction," PLOS Computational Biology, Public Library of Science, vol. 9(8), pages 1-17, August.
    15. Montani, Fernando & Phoka, Elena & Portesi, Mariela & Schultz, Simon R., 2013. "Statistical modelling of higher-order correlations in pools of neural activity," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(14), pages 3066-3086.
    16. Jan Humplik & Gašper Tkačik, 2017. "Probabilistic models for neural populations that naturally capture global coupling and criticality," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-26, September.
    17. Richard R Stein & Debora S Marks & Chris Sander, 2015. "Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models," PLOS Computational Biology, Public Library of Science, vol. 11(7), pages 1-22, July.
    18. Ross S Williamson & Maneesh Sahani & Jonathan W Pillow, 2015. "The Equivalence of Information-Theoretic and Likelihood-Based Methods for Neural Dimensionality Reduction," PLOS Computational Biology, Public Library of Science, vol. 11(4), pages 1-31, April.
    19. Urs Köster & Jascha Sohl-Dickstein & Charles M Gray & Bruno A Olshausen, 2014. "Modeling Higher-Order Correlations within Cortical Microcolumns," PLOS Computational Biology, Public Library of Science, vol. 10(7), pages 1-12, July.
    20. N Blasco & P Corredor & S Ferreruela, 2011. "Detecting intentional herding: what lies beneath intraday data in the Spanish stock market," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(6), pages 1056-1066, June.
