
Testing Goodness-of-Fit with the Kernel Density Estimator: GoFKernel

Author

Listed:
  • Pavia, Jose M.

Abstract

To assess the goodness-of-fit of a sample to a continuous random distribution, the most popular approach has been to measure, using either the L∞- or the L2-norm, the distance between the cumulative distribution function under the null hypothesis and the empirical cumulative distribution function. Indeed, as far as I know, almost all the tests currently available in R for this purpose (ks.test in package stats, ad.test in package ADGofTest, and ad.test, ad2.test, ks.test, v.test and w2.test in package truncgof) use one of these two distances on cumulative distribution functions. This paper (i) proposes dgeometric.test, a new implementation of the test that measures, in the L1-norm, the discrepancy between a kernel estimate of the sample density function and the null hypothesis density function, (ii) introduces the GoFKernel package, and (iii) performs a large simulation exercise to assess the calibration and sensitivity of the tests listed above as well as Fan's test (Fan, 1994), fan.test, also implemented in the GoFKernel package. In addition to dgeometric.test and fan.test, the GoFKernel package adds two functions that R users might also find of interest: density.reflected extends density, allowing the computation of consistent kernel density estimates for bounded random variables, and random.function offers an ad hoc, universal sampling method (although computationally expensive and potentially inaccurate for long-tailed distributions).
In light of the simulation results, we can conclude that (i) the tests implemented in the truncgof package should not be used to assess goodness-of-fit (at least for non-truncated distributions), (ii) fan.test is visibly miscalibrated, rejecting the null hypothesis too rarely (at least in its default option, where the bandwidth parameter is estimated using dpik from package KernSmooth), (iii) ks.test and ad.test show similar power, with ad.test being slightly preferable in large samples, and (iv) dgeometric.test is a good alternative given its satisfactory calibration and its generally superior power in samples of medium and large sizes. On the downside, dgeometric.test entails a heavier computational burden when the random generator of the null hypothesis density function is not available in R and random.function must be used.
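
The idea behind an L1-norm test on densities can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the GoFKernel implementation: the function names (`gaussian_kde`, `rule_of_thumb_bandwidth`, `l1_distance`, `l1_gof_test`), the Gaussian kernel, the normal-reference bandwidth rule, the midpoint integration over a finite interval, and the Monte Carlo calibration that draws samples from the null distribution are all assumptions made for illustration.

```python
import math
import random


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * sd * n^(-1/5) (an illustrative choice)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def l1_distance(sample, null_pdf, lower, upper, grid_size=200):
    """Approximate the integral of |kde - null_pdf| over [lower, upper]."""
    kde = gaussian_kde(sample, rule_of_thumb_bandwidth(sample))
    step = (upper - lower) / grid_size
    total = 0.0
    for i in range(grid_size):
        x = lower + (i + 0.5) * step  # midpoint of the i-th cell
        total += abs(kde(x) - null_pdf(x))
    return total * step


def l1_gof_test(sample, null_pdf, null_sampler, lower, upper, n_sim=99, rng=None):
    """Monte Carlo p-value for the L1 distance between KDE and null density.

    null_sampler(rng) must return one draw from the null distribution.
    """
    rng = rng or random.Random()
    observed = l1_distance(sample, null_pdf, lower, upper)
    exceed = 0
    for _ in range(n_sim):
        simulated = [null_sampler(rng) for _ in range(len(sample))]
        if l1_distance(simulated, null_pdf, lower, upper) >= observed:
            exceed += 1
    # The +1 terms count the observed sample among the simulated ones.
    return observed, (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    pdf = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    draw = lambda r: r.gauss(0.0, 1.0)
    data = [rng.gauss(0.0, 1.0) for _ in range(30)]
    print(l1_gof_test(data, pdf, draw, -8.0, 8.0, n_sim=19, rng=rng))
```

Because the null distribution of the statistic is obtained by simulation, the sketch needs a way to draw from the null density, which mirrors the computational cost the abstract mentions for cases where no random generator is available.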

Suggested Citation

  • Pavia, Jose M., 2015. "Testing Goodness-of-Fit with the Kernel Density Estimator: GoFKernel," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 66(c01).
  • Handle: RePEc:jss:jstsof:v:066:c01
    DOI: http://hdl.handle.net/10.18637/jss.v066.c01

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v066c01/v66c01.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v066c01/GoFKernel_2.0-6.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v066c01/v66c01.R
    Download Restriction: no


    References listed on IDEAS

    1. Ricardo Cao & Gábor Lugosi, 2005. "Goodness‐of‐fit Tests Based on the Kernel Density Estimator," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 32(4), pages 599-616, December.
    2. Vexler, Albert & Gurevich, Gregory, 2010. "Empirical likelihood ratios applied to goodness-of-fit tests based on sample entropy," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 531-545, February.
    3. Bruce G. Lindsay & Marianthi Markatou & Surajit Ray, 2014. "Kernels, Degrees of Freedom, and Power Properties of Quadratic Distance Goodness-of-Fit Tests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 395-410, March.
    4. Fan, Yanqin, 1998. "Goodness-Of-Fit Tests Based On Kernel Density Estimators With Fixed Smoothing Parameters," Econometric Theory, Cambridge University Press, vol. 14(5), pages 604-621, October.
    5. Fan, Yanqin, 1994. "Testing the Goodness of Fit of a Parametric Density Function by Kernel Method," Econometric Theory, Cambridge University Press, vol. 10(2), pages 316-356, June.
    6. Miecznikowski, Jeffrey & Vexler, Albert & Shepherd, Lori, 2013. "dbEmpLikeGOF: An R Package for Nonparametric Likelihood Ratio Tests for Goodness-of-Fit and Two-Sample Comparisons Based on Sample Entropy," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 54(i03).
    7. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
    8. Anderson, N. H. & Hall, P. & Titterington, D. M., 1994. "Two-Sample Test Statistics for Measuring Discrepancies Between Two Multivariate Probability Density Functions Using Kernel-Based Density Estimates," Journal of Multivariate Analysis, Elsevier, vol. 50(1), pages 41-54, July.

    Citations


    Cited by:

    1. Gabriel Lang & Eric Marcon & Florence Puech, 2020. "Distance-based measures of spatial concentration: introducing a relative density function," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 64(2), pages 243-265, April.
    2. Sanku Dey & Ahmed Elshahhat & Mazen Nassar, 2023. "Analysis of progressive type-II censored gamma distribution," Computational Statistics, Springer, vol. 38(1), pages 481-508, March.
    3. Ahmed Elshahhat & Mazen Nassar, 2021. "Bayesian survival analysis for adaptive Type-II progressive hybrid censored Hjorth data," Computational Statistics, Springer, vol. 36(3), pages 1965-1990, September.
    4. Refah Alotaibi & Mazen Nassar & Ahmed Elshahhat, 2022. "Computational Analysis of XLindley Parameters Using Adaptive Type-II Progressive Hybrid Censoring with Applications in Chemical Engineering," Mathematics, MDPI, vol. 10(18), pages 1-24, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Graciela Boente & Daniela Rodriguez & Wenceslao González Manteiga, 2014. "Goodness-of-fit Test for Directional Data," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(1), pages 259-275, March.
    2. Martínez-Camblor, Pablo & de Uña-Álvarez, Jacobo, 2009. "Non-parametric k-sample tests: Density functions vs distribution functions," Computational Statistics & Data Analysis, Elsevier, vol. 53(9), pages 3344-3357, July.
    3. Bagkavos, Dimitrios & Patil, Prakash N., 2023. "Goodness-of-fit testing for normal mixture densities," Computational Statistics & Data Analysis, Elsevier, vol. 188(C).
    4. Wenceslao González-Manteiga & Rosa Crujeiras, 2013. "An updated review of Goodness-of-Fit tests for regression models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 22(3), pages 361-411, September.
    5. Carlos Tenreiro, 2022. "On automatic kernel density estimate-based tests for goodness-of-fit," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(3), pages 717-748, September.
    6. Tenreiro, Carlos, 2009. "On the choice of the smoothing parameter for the BHEP goodness-of-fit test," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1038-1053, February.
    7. Scaillet, Olivier, 2007. "Kernel-based goodness-of-fit tests for copulas with fixed smoothing parameters," Journal of Multivariate Analysis, Elsevier, vol. 98(3), pages 533-543, March.
    8. Marcelo Fernandes & Eduardo Mendes & Olivier Scaillet, 2015. "Testing for symmetry and conditional symmetry using asymmetric kernels," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(4), pages 649-671, August.
    9. Delgado, Miguel A. & Song, Xiaojun, 2018. "Nonparametric tests for conditional symmetry," Journal of Econometrics, Elsevier, vol. 206(2), pages 447-471.
    10. Jiménez Gamero, M.D. & Muñoz García, J. & Pino Mejías, R., 2005. "Testing goodness of fit for the distribution of errors in multivariate linear models," Journal of Multivariate Analysis, Elsevier, vol. 95(2), pages 301-322, August.
    11. Bagkavos, Dimitrios & Patil, Prakash N. & Wood, Andrew T.A., 2023. "Nonparametric goodness-of-fit testing for a continuous multivariate parametric model," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
    12. Juan Carlos Pardo-Fernández & María Dolores Jiménez-Gamero & Anouar El Ghouch, 2015. "A Non-parametric ANOVA-type Test for Regression Curves Based on Characteristic Functions," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 42(1), pages 197-213, March.
    13. Li, Qi & Maasoumi, Esfandiar & Racine, Jeffrey S., 2009. "A nonparametric test for equality of distributions with mixed categorical and continuous data," Journal of Econometrics, Elsevier, vol. 148(2), pages 186-200, February.
    14. Meryem Duygun & Silvia Pazzi & Emili Tortosa-Ausina & Simona Zambelli, 2014. "Does local public ownership matter for the efficiency of water utilities? Evidence from Italy," Working Papers 2014/21, Economics Department, Universitat Jaume I, Castellón (Spain).
    15. Pablo Martínez-Camblor & Jacobo Uña-Álvarez, 2013. "Studying the bandwidth in $$k$$ -sample smooth tests," Computational Statistics, Springer, vol. 28(2), pages 875-892, April.
    16. Henze, N. & Klar, B. & Zhu, L. X., 2005. "Checking the adequacy of the multivariate semiparametric location shift model," Journal of Multivariate Analysis, Elsevier, vol. 93(2), pages 238-256, April.
    17. Hadi Alizadeh Noughabi & Albert Vexler, 2016. "An efficient correction to the density-based empirical likelihood ratio goodness-of-fit test for the inverse Gaussian distribution," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(16), pages 2988-3003, December.
    18. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
    19. Hart, Jeffrey D. & Choi, Taeryon & Yi, Seongbaek, 2016. "Frequentist nonparametric goodness-of-fit tests via marginal likelihood ratios," Computational Statistics & Data Analysis, Elsevier, vol. 96(C), pages 120-132.
    20. Fernando Antonio Slaibe Postali, 2016. "Oil windfalls and X-inefficiency: evidence from Brazil," Journal of Economic Studies, Emerald Group Publishing Limited, vol. 43(5), pages 699-718, October.
