
Low-rank matrix denoising for count data using unbiased Kullback-Leibler risk estimation

Author

Listed:
  • Bigot, Jérémie
  • Deledalle, Charles

Abstract

Many statistical studies analyze observations organized in matrix form whose elements are count data. When these observations are assumed to follow a Poisson or a multinomial distribution, it is of interest to estimate either the intensity matrix (Poisson case) or the compositional matrix (multinomial case) under the assumption that it has a low-rank structure. In this setting, it is proposed to construct an estimator that minimizes the negative log-likelihood regularized by a nuclear norm penalty. Such an approach easily yields a low-rank matrix-valued estimator with positive entries, which belongs to the set of row-stochastic matrices in the multinomial case. Then, as a main contribution, a data-driven procedure is constructed to select the regularization parameter of such estimators by minimizing (approximately) unbiased estimates of the Kullback-Leibler (KL) risk in such models, which generalize Stein's unbiased risk estimation originally proposed for Gaussian data. The evaluation of these quantities is a delicate problem, and novel methods are introduced to obtain accurate numerical approximations of such unbiased estimates. Simulated data are used to validate this way of selecting regularization parameters for low-rank matrix estimation from count data. For data following a multinomial distribution, the performance of this approach is also compared with K-fold cross-validation. Examples from a survey study and metagenomics also illustrate the benefits of this methodology for real data analysis.
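
To make the Poisson case of the abstract concrete, the following is a minimal sketch of a nuclear-norm penalized negative log-likelihood fit. It is not the authors' algorithm, and several choices are assumptions that the abstract does not confirm: the intensity is parameterized as X = exp(Z) with the penalty applied to Z, the solver is a plain proximal gradient method with backtracking, and the function names (`svt`, `poisson_nll`, `fit_poisson_nuclear`, `poisson_kl`), the starting point, and the simulation sizes are all hypothetical. The KL value computed in the demo is an oracle quantity that needs the true intensity, so it is only available on simulated data; the unbiased risk estimates that the paper uses to select the regularization parameter are not reproduced here.

```python
import numpy as np


def svt(M, tau):
    """Soft-threshold the singular values of M by tau (prox of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def poisson_nll(Z, Y):
    """Poisson negative log-likelihood in Z = log(intensity), dropping the log(Y!) term."""
    return float(np.sum(np.exp(Z) - Y * Z))


def poisson_kl(X, X_hat):
    """Sum over entries of X * log(X / X_hat) - X + X_hat, for positive X and X_hat."""
    return float(np.sum(X * np.log(X / X_hat) - X + X_hat))


def fit_poisson_nuclear(Y, lam, n_iter=200):
    """Proximal gradient with backtracking for min_Z poisson_nll(Z, Y) + lam * ||Z||_*.

    Assumption: log-link parameterization. Returns exp(Z), positive by construction.
    """
    Z = np.log(np.maximum(Y, 0.5))  # hypothetical start; zero counts are floored at 0.5
    for _it in range(n_iter):
        grad = np.exp(Z) - Y  # gradient of the smooth term with respect to Z
        f_old = poisson_nll(Z, Y)
        t = 1.0
        for _bt in range(60):  # bounded backtracking on the step size t
            Z_new = svt(Z - t * grad, t * lam)
            D = Z_new - Z
            bound = f_old + np.sum(grad * D) + np.sum(D ** 2) / (2.0 * t)
            if poisson_nll(Z_new, Y) <= bound + 1e-12:
                break
            t *= 0.5
        Z = Z_new
    return np.exp(Z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated low-rank log-intensity; sizes and scales are arbitrary illustration values.
    Z_true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20)) * 0.5 + 2.0
    X_true = np.exp(Z_true)
    Y = rng.poisson(X_true).astype(float)
    for lam in [0.1, 1.0, 5.0, 20.0]:
        X_hat = fit_poisson_nuclear(Y, lam)
        rank = np.linalg.matrix_rank(np.log(X_hat), tol=1e-6)
        kl = poisson_kl(X_true, X_hat)
        print(f"lam={lam:5.1f}  rank(log X_hat)={rank:2d}  oracle KL={kl:.2f}")
```

In this sketch, larger values of `lam` shrink more singular values of the log-intensity to zero, which is the trade-off that a data-driven choice of the regularization parameter has to balance.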

Suggested Citation

  • Bigot, Jérémie & Deledalle, Charles, 2022. "Low-rank matrix denoising for count data using unbiased Kullback-Leibler risk estimation," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
  • Handle: RePEc:eee:csdana:v:169:y:2022:i:c:s0167947322000032
    DOI: 10.1016/j.csda.2022.107423

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322000032
    Download Restriction: Full text for ScienceDirect subscribers only.



