
Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes

Author

Listed:
  • Mingyuan Zhou
  • Oscar Hernan Madrid Padilla
  • James G. Scott

Abstract

We define a family of probability distributions for random count matrices with a potentially unbounded number of rows and columns. The three distributions we consider are derived from the gamma-Poisson, gamma-negative binomial, and beta-negative binomial processes, which we refer to generically as a family of negative-binomial processes. Because the models lead to closed-form update equations within the context of a Gibbs sampler, they are natural candidates for nonparametric Bayesian priors over count matrices. A key aspect of our analysis is the recognition that although the random count matrices within the family are defined by a row-wise construction, their columns can be shown to be independent and identically distributed (iid). This fact is used to derive explicit formulas for drawing all the columns at once. Moreover, by analyzing these matrices’ combinatorial structure, we describe how to sequentially construct a column-iid random count matrix one row at a time, and derive the predictive distribution of a new row count vector with previously unseen features. We describe the similarities and differences between the three priors, and argue that the greater flexibility of the gamma- and beta-negative binomial processes—especially their ability to model over-dispersed, heavy-tailed count data—makes these well suited to a wide variety of real-world applications. As an example of our framework, we construct a naive-Bayes text classifier to categorize a count vector to one of several existing random count matrices of different categories. The classifier supports an unbounded number of features and, unlike most existing methods, it does not require a predefined finite vocabulary to be shared by all the categories, and needs neither feature selection nor parameter tuning. 
Both the gamma- and beta-negative binomial processes are shown to significantly outperform the gamma-Poisson process when applied to document categorization, with comparable performance to other state-of-the-art supervised text classification algorithms. Supplementary materials for this article are available online.
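
The over-dispersion mentioned in the abstract can be illustrated with the standard gamma-Poisson mixture: drawing a rate from a gamma distribution and then a count from a Poisson with that rate gives a negative binomial marginal whose variance exceeds its mean. The sketch below is a minimal simulation of that scalar fact only; it is not the authors' matrix construction or sampler. The function names and the parameter values `r` and `p` are assumptions chosen for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_gamma_poisson(r, p, rng):
    """Draw lam ~ Gamma(shape=r, scale=p/(1-p)), then a count ~ Poisson(lam)."""
    lam = rng.gammavariate(r, p / (1.0 - p))
    return sample_poisson(lam, rng)


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


rng = random.Random(0)   # fixed seed so the run is repeatable
r, p = 2.0, 0.75         # illustrative values, not taken from the article
counts = [sample_gamma_poisson(r, p, rng) for _ in range(50_000)]
mean, var = mean_and_variance(counts)

# Negative binomial moments implied by the mixture.
theory_mean = r * p / (1 - p)
theory_var = r * p / (1 - p) ** 2

print(f"sample mean {mean:.2f} vs theory {theory_mean:.2f}")
print(f"sample var  {var:.2f} vs theory {theory_var:.2f}")
```

A Poisson distribution with the same mean would have variance equal to that mean, so the gap between the two printed variance and mean values is the extra dispersion contributed by the gamma-distributed rate.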

Suggested Citation

  • Mingyuan Zhou & Oscar Hernan Madrid Padilla & James G. Scott, 2016. "Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 1144-1156, July.
  • Handle: RePEc:taf:jnlasa:v:111:y:2016:i:515:p:1144-1156
    DOI: 10.1080/01621459.2015.1075407

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2015.1075407
    Download Restriction: Access to full text is restricted to subscribers.




    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bono, Pierre-Henri & David, Quentin & Desbordes, Rodolphe & Py, Loriane, 2022. "Metro infrastructure and metropolitan attractiveness," Regional Science and Urban Economics, Elsevier, vol. 93(C).
    2. Kalle Hirvonen & John Hoddinott, 2017. "Agricultural production and children's diets: evidence from rural Ethiopia," Agricultural Economics, International Association of Agricultural Economists, vol. 48(4), pages 469-480, July.
    3. Noel Perceval Assogba & Daowei Zhang, 2020. "An Economic Analysis of Tropical Forest Resource Conservation in a Protected Area," Sustainability, MDPI, vol. 12(14), pages 1-12, July.
    4. Riccardo Crescenzi & Carlo Pietrobelli & Roberta Rabellotti, 2012. "Innovation Drivers, Value Chains and the Geography of Multinational Firms in European Regions," LEQS – LSE 'Europe in Question' Discussion Paper Series 53, European Institute, LSE.
    5. Marco Dueñas & Giorgio Fagiolo, 2013. "Modeling the International-Trade Network: a gravity approach," Journal of Economic Interaction and Coordination, Springer;Society for Economic Science with Heterogeneous Interacting Agents, vol. 8(1), pages 155-178, April.
    6. Carillo, Maria Rosaria & Papagni, Erasmo & Sapio, Alessandro, 2013. "Do collaborations enhance the high-quality output of scientific institutions? Evidence from the Italian Research Assessment Exercise," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 47(C), pages 25-36.
    7. Gamba, Simona & Magazzini, Laura & Pertile, Paolo, 2021. "R&D and market size: Who benefits from orphan drug legislation?," Journal of Health Economics, Elsevier, vol. 80(C).
    8. Darcy Steeg Morris & Kimberly F. Sellers, 2022. "A Flexible Mixed Model for Clustered Count Data," Stats, MDPI, vol. 5(1), pages 1-18, January.
    9. Paul Kwame Nkegbe & Naasegnibe Kuunibe & Samuel Sekyi, 2017. "Poverty and malaria morbidity in the Jirapa District of Ghana: A count regression approach," Cogent Economics & Finance, Taylor & Francis Journals, vol. 5(1), pages 1293472-129, January.
    10. Kenneth W. Moffett & Laurie L. Rice & Ramana Madupalli, 2014. "Young Voters and War: The Iraq War as a Catalyst for Political Participation," Social Science Quarterly, Southwestern Social Science Association, vol. 95(5), pages 1419-1443, December.
    11. Erdogdu, Erkan, 2013. "A cross-country analysis of electricity market reforms: Potential contribution of New Institutional Economics," Energy Economics, Elsevier, vol. 39(C), pages 239-251.
    12. Santos Silva, J.M.C. & Tenreyro, Silvana, 2010. "On the existence of the maximum likelihood estimates in Poisson regression," Economics Letters, Elsevier, vol. 107(2), pages 310-312, May.
    13. Iván Darío Sánchez & Jorge Luis Juliao Rossi & Julio César Zuluaga Jiménez, 2013. "La relación entre las redes externas de trabajo y el desempeño innovador de las pymes colombianas: un análisis del rol moderador del ambiente industrial," Estudios Gerenciales, Universidad Icesi, September.
    14. Morescalchi, Andrea & Pammolli, Fabio & Penner, Orion & Petersen, Alexander M. & Riccaboni, Massimo, 2015. "The evolution of networks of innovators within and across borders: Evidence from patent data," Research Policy, Elsevier, vol. 44(3), pages 651-668.
    15. Ahmet Faruk Aysan & Luis Carlos Castillo-Téllez & Dilek Demirbas & Mustafa Disli, 2021. "Foreign Trade, Education, And Innovative Performance: A Multilevel Analysis," Bulletin of Monetary Economics and Banking, Bank Indonesia, vol. 24(3), pages 413-440, September.
    16. Gerner-Beuerle, Carsten & Mucciarelli, Federico M. & Schuster, Edmund & Siems, Mathias, 2018. "Why do businesses incorporate in other EU Member States? An empirical analysis of the role of conflict of laws rules," International Review of Law and Economics, Elsevier, vol. 56(C), pages 14-27.
    17. J. M. C. Santos Silva & Silvana Tenreyro, 2011. "poisson: Some convergence issues," Stata Journal, StataCorp LP, vol. 11(2), pages 215-225, June.
    18. Roberto León-González, 2019. "Efficient Bayesian inference in generalized inverse gamma processes for stochastic volatility," Econometric Reviews, Taylor & Francis Journals, vol. 38(8), pages 899-920, September.
    19. Sarni Maniar Berliana & Purhadi & Sutikno & Santi Puteri Rahayu, 2020. "Parameter Estimation and Hypothesis Testing of Geographically Weighted Multivariate Generalized Poisson Regression," Mathematics, MDPI, vol. 8(9), pages 1-14, September.
    20. Lluís Bermúdez & Dimitris Karlis & Isabel Morillo, 2020. "Modelling Unobserved Heterogeneity in Claim Counts Using Finite Mixture Models," Risks, MDPI, vol. 8(1), pages 1-13, January.

