
Spatial bootstrapped microeconometrics: Forecasting for out‐of‐sample geo‐locations in big data

Author

Listed:
  • Katarzyna Kopczewska

Abstract

Spatial econometric models estimated on big geo‐located point data face at least two problems: limited computational capabilities and inefficient forecasting for new out‐of‐sample geo‐points. Both stem from the computational complexity and from the spatial weights matrix W, which is defined for in‐sample observations only. Machine learning models suffer from the same limitations when they use kriging for prediction, so the problem remains unsolved. The paper presents a novel methodology for estimating spatial models on big data and predicting at new locations. The approach uses bootstrap and tessellation to calibrate both the model and the space. The best bootstrapped model is selected with the PAM (Partitioning Around Medoids) algorithm by classifying the regression coefficients jointly, in a nonindependent manner. Voronoi polygons for the geo‐points used in the best model allow for a representative division of space. New out‐of‐sample points are assigned to tessellation tiles and linked to the spatial weights matrix as a replacement for an original point, which makes it feasible to use calibrated spatial models as a forecasting tool for new locations. There is no trade‐off between forecast quality and computational efficiency in this approach. An empirical example illustrates a model for business locations and firms' profitability.
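
The selection and assignment steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a single cluster (k = 1), in which case the PAM medoid is simply the coefficient vector with the smallest total distance to all others, and it relies on the fact that a point lies in the Voronoi tile of its nearest generator point. The function names, the toy coefficients, the four generator points, the weights W, and the values y are all hypothetical.

```python
import math

def medoid_index(vectors):
    """Index of the vector with the smallest total Euclidean distance
    to all others (what PAM returns when k = 1)."""
    totals = [sum(math.dist(v, u) for u in vectors) for v in vectors]
    return totals.index(min(totals))

def voronoi_tile(point, generators):
    """Index of the generator whose Voronoi tile contains `point`,
    found as the nearest generator."""
    dists = [math.dist(point, g) for g in generators]
    return dists.index(min(dists))

def spatial_lag_for_new_point(point, generators, W, y):
    """Let the new point stand in for its tile's generator: borrow that
    generator's row of W and compute the spatial lag of y."""
    i = voronoi_tile(point, generators)
    lag = sum(w * y[j] for j, w in W[i].items())
    return i, lag

# Hypothetical coefficient vectors from four bootstrap replications.
coefs = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (3.0, 0.5)]
best = medoid_index(coefs)

# Hypothetical in-sample points of the selected model (tile generators),
# with a row-standardized W stored as {row: {neighbour: weight}}.
generators = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
W = {
    0: {1: 0.5, 2: 0.5},
    1: {0: 0.5, 3: 0.5},
    2: {0: 0.5, 3: 0.5},
    3: {1: 0.5, 2: 0.5},
}
y = [10.0, 20.0, 30.0, 40.0]

# A new out-of-sample location, assigned to a tile and linked to W.
tile, lag = spatial_lag_for_new_point((3.2, 0.7), generators, W, y)
print(best, tile, lag)
```

In this toy setup the outlying fourth coefficient vector is not selected, and the new location inherits the neighbours of the in-sample point whose tile it falls into, so W never has to be rebuilt for the out-of-sample observation.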

Suggested Citation

  • Katarzyna Kopczewska, 2023. "Spatial bootstrapped microeconometrics: Forecasting for out‐of‐sample geo‐locations in big data," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 50(3), pages 1391-1419, September.
  • Handle: RePEc:bla:scjsta:v:50:y:2023:i:3:p:1391-1419
    DOI: 10.1111/sjos.12636


