
Nonparametric Multivariate Density Estimation: Case Study of Cauchy Mixture Model

Authors
  • Tomas Ruzgas

    (Department of Applied Mathematics, Faculty of Mathematics and Natural Sciences, Kaunas University of Technology, 44249 Kaunas, Lithuania)

  • Mantas Lukauskas

    (Department of Applied Mathematics, Faculty of Mathematics and Natural Sciences, Kaunas University of Technology, 44249 Kaunas, Lithuania)

  • Gedmantas Čepkauskas

    (Department of Applied Mathematics, Faculty of Mathematics and Natural Sciences, Kaunas University of Technology, 44249 Kaunas, Lithuania)

Abstract

Estimation of probability density functions (pdf) is an essential part of statistical modelling. Heteroskedasticity and outliers make data analysis harder, and the Cauchy mixture model helps to account for both. This paper studies five major types of non-parametric multivariate density estimation techniques algorithmically and empirically, without assuming that the data originate from any known parametric family of distributions. The inversion formula method is modified by including a noise cluster in the general mixture model. The effectiveness of the method is demonstrated through a simulation study. The relationship between estimation accuracy and complicated multidimensional Cauchy mixture models (CMM) is analyzed using the Monte Carlo method. For larger dimensions (d ~ 5) and small samples (n ~ 50), the adaptive kernel method is recommended. For samples of n ~ 100, the modified inversion formula (MIDE) is recommended. For larger samples, semi-parametric kernel estimation is preferable when the distributions overlap, and the modified inversion methods are preferable when the distributions are more isolated. In terms of the mean absolute percentage error, semi-parametric kernel estimation is recommended when the sample has overlapping distributions. In smaller dimensions (d = 2), the semi-parametric kernel method (SKDE) is recommended for samples with overlapping distributions, and the modified inversion formula (MIDE) for isolated distributions. The results of the inversion formula algorithm improve significantly when the noise cluster is included.
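The kind of simulation experiment the abstract describes can be illustrated with a minimal sketch: draw a sample from a Cauchy mixture, estimate the density with an adaptive kernel method, and score the estimate by the mean absolute percentage error. Everything below is an assumption made for illustration, not the authors' implementation: the function names are hypothetical, the adaptive estimator uses Gaussian kernels with Abramson-style square-root local bandwidths, the pilot bandwidth comes from a Silverman-type rule with an interquartile-range scale (the sample standard deviation is unreliable for Cauchy data), and the error is evaluated at the sample points.

```python
import math
import numpy as np

def sample_cauchy_mixture(n, weights, locs, scales, rng):
    """Draw n points from a mixture of spherical multivariate Cauchy components."""
    locs = np.asarray(locs, dtype=float)
    k, d = locs.shape
    labels = rng.choice(k, size=n, p=weights)
    z = rng.standard_normal((n, d))
    w = np.abs(rng.standard_normal((n, 1)))  # sqrt of a chi-square(1) variable
    # Multivariate Cauchy = multivariate Student t with one degree of freedom.
    return locs[labels] + np.asarray(scales)[labels, None] * z / w

def cauchy_mixture_pdf(x, weights, locs, scales):
    """True mixture density at the rows of x (scale matrix s^2 * I per component)."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    const = math.gamma((d + 1) / 2) / math.pi ** ((d + 1) / 2)
    total = np.zeros(x.shape[0])
    for p, mu, s in zip(weights, locs, scales):
        q = np.sum((x - np.asarray(mu)) ** 2, axis=1) / s**2
        total += p * const / s**d * (1 + q) ** (-(d + 1) / 2)
    return total

def variable_bandwidth_kde(points, data, bandwidths):
    """Gaussian kernel estimate with one bandwidth per data point."""
    d = data.shape[1]
    sq = np.sum((points[:, None, :] - data[None, :, :]) ** 2, axis=2)
    h = bandwidths[None, :]
    kernels = np.exp(-0.5 * sq / h**2) / ((2 * math.pi) ** (d / 2) * h**d)
    return kernels.mean(axis=1)

def adaptive_kde(points, data, alpha=0.5):
    """Adaptive kernel estimate: fixed-bandwidth pilot, then local bandwidths."""
    n, d = data.shape
    q75, q25 = np.percentile(data, [75, 25], axis=0)
    sigma = np.median((q75 - q25) / 1.349)            # robust scale (assumption)
    h = sigma * (4 / ((d + 2) * n)) ** (1 / (d + 4))  # Silverman-type rule
    pilot = variable_bandwidth_kde(data, data, np.full(n, h))
    g = np.exp(np.mean(np.log(pilot)))                # geometric mean of the pilot
    lam = (pilot / g) ** (-alpha)                     # wider kernels in sparse regions
    return variable_bandwidth_kde(points, data, h * lam)

def mape(estimate, truth):
    """Mean absolute percentage error of the estimate against the true density."""
    return 100.0 * np.mean(np.abs(estimate - truth) / truth)

rng = np.random.default_rng(0)
weights, locs, scales = [0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [1.0, 0.5]
data = sample_cauchy_mixture(200, weights, locs, scales, rng)
f_true = cauchy_mixture_pdf(data, weights, locs, scales)
f_hat = adaptive_kde(data, data)
print(f"MAPE at the sample points: {mape(f_hat, f_true):.1f}%")
```

Repeating the last block over many seeds, sample sizes, and dimensions would give a Monte Carlo comparison in the spirit of the study. The component parameters used here are arbitrary illustrative values.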

Suggested Citation

  • Tomas Ruzgas & Mantas Lukauskas & Gedmantas Čepkauskas, 2021. "Nonparametric Multivariate Density Estimation: Case Study of Cauchy Mixture Model," Mathematics, MDPI, vol. 9(21), pages 1-22, October.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:21:p:2717-:d:665127

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/21/2717/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/21/2717/
    Download Restriction: no
