
Assessing the effective sample size for large spatial datasets: A block likelihood approach

Author

Listed:
  • Acosta, Jonathan
  • Alegría, Alfredo
  • Osorio, Felipe
  • Vallejos, Ronny

Abstract

The development of new techniques for sample size reduction has attracted growing interest in recent decades. Recent findings allow us to quantify the amount of duplicated information within a sample of spatial data through the so-called effective sample size (ESS), whose definition arises from the Fisher information that is associated with maximum likelihood estimation. However, in all circumstances where the sample size is very large, maximum likelihood estimation and ESS evaluation are challenging from a computational viewpoint. An alternative definition of the ESS, in terms of the Godambe information from a block likelihood estimation approach, is presented. Several theoretical properties satisfied by this quantity are investigated. Our proposal is evaluated in some parametric correlation structures, including the intraclass, AR(1), Matérn, and simultaneous autoregressive models. Simulation experiments show that our proposal provides accurate approximations of the full likelihood-based ESS while maintaining a moderate computational cost. A large dataset is analyzed to quantify the effectiveness and limitations of the proposed framework in practice.
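To make the quantities in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian model with constant mean and known correlation matrix R, for which the Fisher-information-based ESS is commonly written as 1'R⁻¹1. The block version is an assumption made for illustration: it treats the blocks as independent when forming the estimating equation for the mean and then takes the Godambe (sandwich) information H²/J. The paper's exact definition may differ. The function names `ar1_corr`, `ess_full`, and `ess_block` are hypothetical.

```python
import numpy as np

def ar1_corr(n, rho):
    """AR(1) correlation matrix with entries rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def ess_full(R):
    """Full-likelihood ESS for a constant mean: 1' R^{-1} 1."""
    ones = np.ones(R.shape[0])
    return float(ones @ np.linalg.solve(R, ones))

def ess_block(R, block_size):
    """Illustrative block-likelihood ESS (assumed form, see lead-in).

    Consecutive observations are grouped into blocks. Each block
    contributes weights R_b^{-1} 1, so only small systems are solved.
    H is the sensitivity, J the variability of the block score, and
    the Godambe information for the mean is H**2 / J.
    """
    n = R.shape[0]
    w = np.empty(n)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        w[start:stop] = np.linalg.solve(R[start:stop, start:stop],
                                        np.ones(stop - start))
    H = w.sum()      # sum over blocks of 1' R_b^{-1} 1
    J = w @ R @ w    # variance of the block score under the true R
    return float(H ** 2 / J)

if __name__ == "__main__":
    R = ar1_corr(200, 0.5)
    print("full ESS :", ess_full(R))
    print("block ESS:", ess_block(R, block_size=20))
```

With a single block covering all observations, the sketch reduces to the full-likelihood value. With smaller blocks it cannot exceed that value (by the Cauchy-Schwarz inequality), while only block-sized linear systems need to be solved. Note that J is computed here from the full matrix R for clarity, which a large-scale implementation would avoid or approximate.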

Suggested Citation

  • Acosta, Jonathan & Alegría, Alfredo & Osorio, Felipe & Vallejos, Ronny, 2021. "Assessing the effective sample size for large spatial datasets: A block likelihood approach," Computational Statistics & Data Analysis, Elsevier, vol. 162(C).
  • Handle: RePEc:eee:csdana:v:162:y:2021:i:c:s016794732100116x
    DOI: 10.1016/j.csda.2021.107282

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016794732100116X
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2021.107282?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Litvinenko, Alexander & Sun, Ying & Genton, Marc G. & Keyes, David E., 2019. "Likelihood approximation with hierarchical matrices for large spatial datasets," Computational Statistics & Data Analysis, Elsevier, vol. 137(C), pages 115-132.
    2. Faes, Christel & Molenberghs, Geert & Aerts, Marc & Verbeke, Geert & Kenward, Michael G., 2009. "The Effective Sample Size and an Alternative Small-Sample Degrees-of-Freedom Method," The American Statistician, American Statistical Association, vol. 63(4), pages 389-399.
    3. Matthew J. Heaton & Abhirup Datta & Andrew O. Finley & Reinhard Furrer & Joseph Guinness & Rajarshi Guhaniyogi & Florian Gerber & Robert B. Gramacy & Dorit Hammerling & Matthias Katzfuss & Finn Lindgr, 2019. "A Case Study Competition Among Methods for Analyzing Large Spatial Data," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 24(3), pages 398-425, September.
    4. James Berger & M. J. Bayarri & L. R. Pericchi, 2014. "The Effective Sample Size," Econometric Reviews, Taylor & Francis Journals, vol. 33(1-4), pages 197-217, June.
    5. Xu, Ganggang & Genton, Marc G., 2015. "Efficient maximum approximated likelihood inference for Tukey’s g-and-h distribution," Computational Statistics & Data Analysis, Elsevier, vol. 91(C), pages 78-91.
    6. Bachoc, François & Bevilacqua, Moreno & Velandia, Daira, 2019. "Composite likelihood estimation for a Gaussian process under fixed domain asymptotics," Journal of Multivariate Analysis, Elsevier, vol. 174(C).
    7. Sun, Ying & Chang, Xiaohui & Guan, Yongtao, 2018. "Flexible and efficient estimating equations for variogram estimation," Computational Statistics & Data Analysis, Elsevier, vol. 122(C), pages 45-58.
    8. Ganggang Xu & Marc G. Genton, 2017. "Tukey g-and-h Random Fields," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(519), pages 1236-1249, July.
    9. Caragea, Petruta C. & Smith, Richard L., 2007. "Asymptotic properties of computationally efficient alternative estimators for a class of multivariate normal models," Journal of Multivariate Analysis, Elsevier, vol. 98(7), pages 1417-1440, August.

    Citations



    Cited by:

    1. Daniel A. Griffith & Richard E. Plant, 2022. "Statistical Analysis in the Presence of Spatial Autocorrelation: Selected Sampling Strategy Effects," Stats, MDPI, vol. 5(4), pages 1-20, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Caamaño-Carrillo, Christian & Bevilacqua, Moreno & López, Cristian & Morales-Oñate, Víctor, 2024. "Nearest neighbors weighted composite likelihood based on pairs for (non-)Gaussian massive spatial data with an application to Tukey-hh random fields estimation," Computational Statistics & Data Analysis, Elsevier, vol. 191(C).
    2. Huang Huang & Sameh Abdulah & Ying Sun & Hatem Ltaief & David E. Keyes & Marc G. Genton, 2021. "Competition on Spatial Statistics for Large Datasets," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(4), pages 580-595, December.
    3. Felipe Tagle & Marc G. Genton & Andrew Yip & Suleiman Mostamandi & Georgiy Stenchikov & Stefano Castruccio, 2020. "A high‐resolution bilevel skew‐t stochastic generator for assessing Saudi Arabia's wind energy resources," Environmetrics, John Wiley & Sons, Ltd., vol. 31(7), November.
    4. Quan Vu & Yi Cao & Josh Jacobson & Alan R. Pearse & Andrew Zammit-Mangion, 2021. "Discussion on “Competition on Spatial Statistics for Large Datasets”," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(4), pages 614-618, December.
    5. Morales-Oñate, Víctor & Crudu, Federico & Bevilacqua, Moreno, 2021. "Blockwise Euclidean likelihood for spatio-temporal covariance models," Econometrics and Statistics, Elsevier, vol. 20(C), pages 176-201.
    6. Jialuo Liu & Tingjin Chu & Jun Zhu & Haonan Wang, 2022. "Large spatial data modeling and analysis: A Krylov subspace approach," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(3), pages 1115-1143, September.
    7. W. D. Walls & Jordi McKenzie, 2020. "Black swan models for the entertainment industry with an application to the movie business," Empirical Economics, Springer, vol. 59(6), pages 3019-3032, December.
    8. Marco Bee & Julien Hambuckers & Flavio Santi & Luca Trapin, 2021. "Testing a parameter restriction on the boundary for the g-and-h distribution: a simulated approach," Computational Statistics, Springer, vol. 36(3), pages 2177-2200, September.
    9. Moreno Bevilacqua & Christian Caamaño‐Carrillo & Carlo Gaetan, 2020. "On modeling positive continuous data with spatiotemporal dependence," Environmetrics, John Wiley & Sons, Ltd., vol. 31(7), November.
    10. Matthias Katzfuss & Joseph Guinness & Wenlong Gong & Daniel Zilber, 2020. "Vecchia Approximations of Gaussian-Process Predictions," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 25(3), pages 383-414, September.
    11. Masoud Faridi & Majid Jafari Khaledi, 2022. "The polar-generalized normal distribution: properties, Bayesian estimation and applications," Statistical Papers, Springer, vol. 63(2), pages 571-603, April.
    12. Matthew Reimherr & Xiao‐Li Meng & Dan L. Nicolae, 2021. "Prior sample size extensions for assessing prior impact and prior‐likelihood discordance," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 413-437, July.
    13. Hasler Mario, 2013. "Multiple Contrasts for Repeated Measures," The International Journal of Biostatistics, De Gruyter, vol. 9(1), pages 49-61, July.
    14. Isabelle Grenier & Bruno Sansó & Jessica L. Matthews, 2024. "Multivariate nearest‐neighbors Gaussian processes with random covariance matrices," Environmetrics, John Wiley & Sons, Ltd., vol. 35(3), May.
    15. Gonzalo García-Donato & María Eugenia Castellanos & Alicia Quirós, 2021. "Bayesian Variable Selection with Applications in Health Sciences," Mathematics, MDPI, vol. 9(3), pages 1-16, January.
    16. Dorit Hammerling & Brian J. Reich, 2019. "Guest Editors’ Introduction to the Special Issue on “Climate and the Earth System”," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 24(3), pages 395-397, September.
    17. Bhat, Chandra R. & Sener, Ipek N. & Eluru, Naveen, 2010. "A flexible spatially dependent discrete choice model: Formulation and application to teenagers' weekday recreational activity participation," Transportation Research Part B: Methodological, Elsevier, vol. 44(8-9), pages 903-921, September.
    18. Paige, John & Fuglstad, Geir-Arne & Riebler, Andrea & Wakefield, Jon, 2022. "Bayesian multiresolution modeling of georeferenced data: An extension of ‘LatticeKrig’," Computational Statistics & Data Analysis, Elsevier, vol. 173(C).
    19. Denis Allard & Lucia Clarotto & Thomas Opitz & Thomas Romary, 2021. "Discussion on “Competition on Spatial Statistics for Large Datasets”," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(4), pages 604-611, December.
    20. Kelly R. Moran & Matthew W. Wheeler, 2022. "Fast increased fidelity samplers for approximate Bayesian Gaussian process regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1198-1228, September.
