IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v169y2022ics0167947322000032.html

Low-rank matrix denoising for count data using unbiased Kullback-Leibler risk estimation

Authors
  • Bigot, Jérémie
  • Deledalle, Charles

Abstract

Many statistical studies are concerned with the analysis of observations organized in matrix form whose entries are count data. When these observations are assumed to follow a Poisson or a multinomial distribution, it is of interest to estimate either the intensity matrix (Poisson case) or the compositional matrix (multinomial case) under the assumption that it has a low-rank structure. In this setting, it is proposed to construct an estimator by minimizing the negative log-likelihood regularized by a nuclear norm penalty. This approach readily yields a low-rank matrix-valued estimator with positive entries which, in the multinomial case, belongs to the set of row-stochastic matrices. As the main contribution, a data-driven procedure is then constructed to select the regularization parameter of such estimators by minimizing (approximately) unbiased estimates of the Kullback-Leibler (KL) risk in these models; these estimates generalize Stein's unbiased risk estimation, originally proposed for Gaussian data. Evaluating these quantities is a delicate problem, and novel methods are introduced to obtain accurate numerical approximations of such unbiased estimates. Simulated data are used to validate this way of selecting regularization parameters for low-rank matrix estimation from count data. For data following a multinomial distribution, the performance of this approach is also compared to K-fold cross-validation. Examples from a survey study and from metagenomics also illustrate the benefits of this methodology for real data analysis.
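The two ingredients of the abstract (a nuclear-norm-penalized Poisson likelihood estimator and selection of the penalty by an unbiased KL risk estimate) can be illustrated with a minimal sketch for the Poisson case only. Everything here is an assumption, not the authors' method: the ADMM solver, its parameters (`rho`, `n_iter`, `eps`), the starting point and all helper names are hypothetical. The risk uses the KL divergence from Poisson(X) to Poisson(X_hat), i.e. the sum of X log(X/X_hat) - X + X_hat, whose expectation equals E[sum(X_hat - X log X_hat)] up to terms in X alone; whether this is the KL direction studied in the paper is also an assumption. The unknown term E[X_ij log X_hat_ij(Y)] is replaced using Hudson's identity for Poisson counts, E[X g(Y)] = E[Y g(Y - 1)], evaluated by brute force with one refit per nonzero count. That is only feasible for tiny matrices, which is why the paper's dedicated numerical approximations matter.

```python
import numpy as np


def svt(M, tau):
    """Singular value soft-thresholding: the prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def poisson_prox(V, Y, t):
    """Entrywise argmin_z t*(z - y*log z) + 0.5*(z - v)**2 (nonnegative root)."""
    return 0.5 * ((V - t) + np.sqrt((V - t) ** 2 + 4.0 * t * Y))


def nuclear_poisson_estimate(Y, lam, rho=1.0, n_iter=100, eps=1e-8):
    """Hypothetical ADMM solver for min_X sum(X - Y*log X) + lam*||X||_*.

    Splits the problem into a low-rank block X and a Poisson block Z with the
    constraint X = Z (W is the scaled dual variable). A fixed starting point
    and iteration count make the output a deterministic function of Y, which
    is all the risk estimate below requires.
    """
    Z = Y.astype(float) + 1.0
    W = np.zeros_like(Z)
    for _ in range(n_iter):
        X = svt(Z - W, lam / rho)             # nuclear-norm step
        Z = poisson_prox(X + W, Y, 1.0 / rho)  # Poisson likelihood step
        W = W + X - Z                          # dual update
    return np.maximum(Z, eps)  # floor keeps log(X_hat) finite


def kl_risk_estimate(Y, lam, **kw):
    """Unbiased estimate, up to a constant in X, of E[sum(X_hat - X*log X_hat)].

    Hudson's identity applied entrywise to independent Poisson counts gives
    E[X_ij * log X_hat_ij(Y)] = E[Y_ij * log X_hat_ij(Y - e_ij)];
    entries with Y_ij = 0 contribute nothing to that term.
    """
    X_hat = nuclear_poisson_estimate(Y, lam, **kw)
    risk = X_hat.sum()
    for i, j in zip(*np.nonzero(Y)):
        Y_minus = Y.copy()
        Y_minus[i, j] -= 1  # remove one count from entry (i, j)
        risk -= Y[i, j] * np.log(nuclear_poisson_estimate(Y_minus, lam, **kw)[i, j])
    return risk


# Synthetic example: a rank-2 positive intensity matrix observed through Poisson noise.
rng = np.random.default_rng(0)
X_true = rng.uniform(0.5, 3.0, size=(8, 2)) @ rng.uniform(0.5, 3.0, size=(2, 6))
Y = rng.poisson(X_true)

lams = [0.5, 2.0, 5.0, 10.0]
risks = [kl_risk_estimate(Y, lam) for lam in lams]
best_lam = lams[int(np.argmin(risks))]
print("estimated risks:", np.round(risks, 2), "selected lambda:", best_lam)
```

`best_lam` is simply the grid value with the smallest estimated risk. The cost grows with the number of nonzero counts times the cost of one fit, so on realistic matrices this brute-force evaluation quickly becomes the bottleneck.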

Suggested Citation

  • Bigot, Jérémie & Deledalle, Charles, 2022. "Low-rank matrix denoising for count data using unbiased Kullback-Leibler risk estimation," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
  • Handle: RePEc:eee:csdana:v:169:y:2022:i:c:s0167947322000032
    DOI: 10.1016/j.csda.2022.107423

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322000032
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2022.107423?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Unknown, 2005. "Forward," 2005 Conference: Slovenia in the EU - Challenges for Agriculture, Food Science and Rural Affairs, November 10-11, 2005, Moravske Toplice, Slovenia 183804, Slovenian Association of Agricultural Economists (DAES).
    2. A. S. Lewis, 1996. "Derivatives of Spectral Functions," Mathematics of Operations Research, INFORMS, vol. 21(3), pages 576-588, August.
    3. Yuanpei Cao & Anru Zhang & Hongzhe Li, 2020. "Multisample estimation of bacterial composition matrices in metagenomics data," Biometrika, Biometrika Trust, vol. 107(1), pages 75-92.
    4. Robin, Geneviève & Josse, Julie & Moulines, Éric & Sardy, Sylvain, 2019. "Low-rank model with covariates for count data with missing values," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 416-434.
    5. Shabalin, Andrey A. & Nobel, Andrew B., 2013. "Reconstruction of a low-rank matrix in the presence of Gaussian noise," Journal of Multivariate Analysis, Elsevier, vol. 118(C), pages 67-76.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; you can subscribe to its RSS feed for this item.

    Cited by:

    1. Kim, Kipoong & Park, Jaesung & Jung, Sungkyu, 2024. "Principal component analysis for zero-inflated compositional data," Computational Statistics & Data Analysis, Elsevier, vol. 198(C).
    2. Li, Xiao & Matsuda, Takeru & Komaki, Fumiyasu, 2024. "Empirical Bayes Poisson matrix completion," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chao Kan & Wen Song, 2015. "Second-order conditions for existence of augmented Lagrange multipliers for eigenvalue composite optimization problems," Journal of Global Optimization, Springer, vol. 63(1), pages 77-97, September.
    2. Christopher Adam & Stephen O’Connell & Edward Buffie, 2007. "Monetary Policy Rules For Managing Aid Surges In Africa," WEF Working Papers 0016, ESRC World Economy and Finance Research Programme, Birkbeck, University of London.
    3. Nida Çakır Melek & Troy Davig & Jun Nie & Andrew Lee Smith & Didem Tuzemen, 2015. "Evaluating a year of oil price volatility," Macro Bulletin, Federal Reserve Bank of Kansas City, pages 1-3, September.
    4. Wakefield, Robin, 2008. "Networks of accounting research: A citation-based structural and network analysis," The British Accounting Review, Elsevier, vol. 40(3), pages 228-244.
    5. Pilar Lopez-Llompart & G. Mathias Kondolf, 2016. "Encroachments in floodways of the Mississippi River and Tributaries Project," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 81(1), pages 513-542, March.
    6. Fucai Lu & Wei He & Yang Cheng & Sihua Chen & Liang Ning & Xiaoan Mei, 2015. "Exploring the Upgrading of Chinese Automotive Manufacturing Industry in the Global Value Chain: An Empirical Study Based on Panel Data," Sustainability, MDPI, vol. 7(5), pages 1-23, May.
    7. Menzies Gordon Douglas & Zizzo Daniel John, 2009. "Inferential Expectations," The B.E. Journal of Macroeconomics, De Gruyter, vol. 9(1), pages 1-27, December.
    8. Cheng, Jianquan & Bertolini, Luca, 2013. "Measuring urban job accessibility with distance decay, competition and diversity," Journal of Transport Geography, Elsevier, vol. 30(C), pages 100-109.
    9. M. De Donno & M. Pratelli, 2006. "A theory of stochastic integration for bond markets," Papers math/0602532, arXiv.org.
    10. Prilly Oktoviany & Robert Knobloch & Ralf Korn, 2021. "A machine learning-based price state prediction model for agricultural commodities using external factors," Decisions in Economics and Finance, Springer;Associazione per la Matematica, vol. 44(2), pages 1063-1085, December.
    11. Michelle Sheran Sylvester, 2007. "The Career and Family Choices of Women: A Dynamic Analysis of Labor Force Participation, Schooling, Marriage and Fertility Decisions," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 10(3), pages 367-399, July.
    12. Henrekson, Magnus & Johansson, Dan, 2010. "Firm Growth, Institutions and Structural Transformation," Ratio Working Papers 150, The Ratio Institute.
    13. Kim, Seonghoon & Deng, Quheng & Fleisher, Belton M. & Li, Shi, 2014. "The Lasting Impact of Parental Early Life Malnutrition on Their Offspring: Evidence from the China Great Leap Forward Famine," World Development, Elsevier, vol. 54(C), pages 232-242.
    14. Karen K. Lewis, 2011. "Global Asset Pricing," Annual Review of Financial Economics, Annual Reviews, vol. 3(1), pages 435-466, December.
    15. DAVID M. BLAU & WILBERT van der KLAAUW, 2013. "What Determines Family Structure?," Economic Inquiry, Western Economic Association International, vol. 51(1), pages 579-604, January.
    16. repec:spo:wpmain:info:hdl:2441/1482 is not listed on IDEAS
    17. Barbara Kotschwar & Kevin Stahler, 2016. "Level the Playing Field to Bolster the Boardroom: Sports as a Springboard for Women's Labor Force Advancement in Asia," Asian Economic Policy Review, Japan Center for Economic Research, vol. 11(1), pages 117-134, January.
    18. D. (Derek) Bond & Michael J. Harrison & Edward J. (Edward Joseph) O'Brien, 2009. "Exploring long memory and nonlinearity in Irish real exchange Rates using tests based on semiparametric estimation," Working Papers 200901, School of Economics, University College Dublin.
    19. Michele Cavallo & Marco Del Negro & W. Scott Frame & Jamie Grasing & Benjamin A. Malin & Carlo Rosa, 2019. "Fiscal Implications of the Federal Reserve's Balance Sheet Normalization," International Journal of Central Banking, International Journal of Central Banking, vol. 15(5), pages 255-306, December.
    20. Martha Jiménez García, 2019. "The Impact of Information and Communication Technologies on Economic Growth in Mexico," International Journal of Business and Social Research, MIR Center for Socio-Economic Research, vol. 9(2), pages 11-22, February.
    21. Bonfim, Diana, 2009. "Credit risk drivers: Evaluating the contribution of firm level information and of macroeconomic dynamics," Journal of Banking & Finance, Elsevier, vol. 33(2), pages 281-299, February.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.