IDEAS home Printed from https://ideas.repec.org/a/kap/jgeosy/v22y2020i2d10.1007_s10109-019-00309-y.html

Strategies to access web-enabled urban spatial data for socioeconomic research using R functions

Authors

  • Andrés Vallone

    (Universidad Católica del Norte)

  • Coro Chasco

    (Universidad Autónoma de Madrid; Nebrija University)

  • Beatriz Sánchez

    (Catholic University of Ávila)

Abstract

Since the introduction of the World Wide Web in the 1990s, the information available for research purposes has increased exponentially, leading to a significant proliferation of research based on web-enabled data. Nowadays, the use of internet-enabled databases, obtained from either primary online surveys or secondary official and non-official registers, is common. However, data availability varies by data category and country; in particular, collecting microdata at a low geographical level for urban analysis can be a challenge. The most common difficulties when working with secondary web-enabled data fall into two categories: accessibility and availability problems. Accessibility problems arise when the way data are published on the servers blocks or delays the download process, which becomes a tedious, repetitive task that can introduce errors when building large databases. Availability problems usually arise when official agencies restrict access to the information for statistical confidentiality reasons. To overcome some of these problems, this paper presents different strategies based on URL parsing, PDF text extraction, and web scraping. A set of functions, available under a GPL-2 license, was built into an R package specifically to extract and organize databases at the municipality level (NUTS 5) in Spain for population, unemployment, vehicle fleet, and firm characteristics.
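
As a rough illustration of the URL-parsing strategy mentioned in the abstract, the sketch below builds one download URL per year and province and then parses a URL back into its parameters. It is written in Python rather than R and is not taken from the authors' package; the endpoint `https://example.org/data/download` and the parameter names `dataset`, `year`, and `province` are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, used only for illustration; not a real data server.
BASE_URL = "https://example.org/data/download"


def build_download_urls(dataset, years, province_codes):
    """Return one download URL per (year, province) combination."""
    urls = []
    for year in years:
        for code in province_codes:
            # urlencode escapes the values and joins them as key=value pairs.
            query = urlencode({"dataset": dataset, "year": year, "province": code})
            urls.append(f"{BASE_URL}?{query}")
    return urls


def describe_url(url):
    """Parse a URL back into a dict of its query parameters."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in params.items()}


# Province codes are kept as strings so that leading zeros survive.
urls = build_download_urls("population", [2018, 2019], ["28", "05"])
for url in urls:
    print(url, describe_url(url))
```

Generating the full list of URLs up front, instead of clicking through a portal file by file, is what turns a repetitive manual download into a loop that can be logged and retried.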

Suggested Citation

  • Andrés Vallone & Coro Chasco & Beatriz Sánchez, 2020. "Strategies to access web-enabled urban spatial data for socioeconomic research using R functions," Journal of Geographical Systems, Springer, vol. 22(2), pages 217-239, April.
  • Handle: RePEc:kap:jgeosy:v:22:y:2020:i:2:d:10.1007_s10109-019-00309-y
    DOI: 10.1007/s10109-019-00309-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10109-019-00309-y
    File Function: Abstract
    Download Restriction: Access to full text is restricted to subscribers.


    Cited by:

    1. Sławomir Goliszek, 2021. "GIS tools and programming languages for creating models of public and private transport potential accessibility in Szczecin, Poland," Journal of Geographical Systems, Springer, vol. 23(1), pages 115-137, January.
    2. Boegershausen, Johannes & Datta, Hannes & Borah, Abhishek & Stephen, Andrew, 2022. "Fields of Gold: Web Scraping and APIs for Impactful Marketing Insights," Other publications TiSEM 5f1ed70a-48c3-422c-bc10-0, Tilburg University, School of Economics and Management.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Niklas Elert, 2014. "What determines entry? Evidence from Sweden," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 53(1), pages 55-92, August.
    2. Chandra R. Bhat & Rajesh Paleti & Palvinder Singh, 2014. "A Spatial Multivariate Count Model For Firm Location Decisions," Journal of Regional Science, Wiley Blackwell, vol. 54(3), pages 462-502, June.
    3. Coll Martínez, Eva & Arauzo Carod, Josep Maria, 2015. "Creative Industries: a Preliminary Insight to their Location Determinants," Working Papers 2072/250133, Universitat Rovira i Virgili, Department of Economics.
    4. Jordi Jofre-Monseny & Raquel Marín-López & Elisabet Viladecans-Marsal, 2012. "What underlies localization and urbanization economies? Evidence from the location of new firms," Working Papers 2012/9, Institut d'Economia de Barcelona (IEB).
    5. Robert Huang & Matthew E. Kahn, 2024. "Household carbon dioxide emissions Engel Curve dynamics," Contemporary Economic Policy, Western Economic Association International, vol. 42(3), pages 396-415, July.
    6. Blazquez, Desamparados & Domenech, Josep, 2018. "Big Data sources and methods for social and economic analyses," Technological Forecasting and Social Change, Elsevier, vol. 130(C), pages 99-113.
    7. Combes, Pierre-Philippe & Gobillon, Laurent, 2015. "The Empirics of Agglomeration Economies," Handbook of Regional and Urban Economics, in: Gilles Duranton & J. V. Henderson & William C. Strange (ed.), Handbook of Regional and Urban Economics, edition 1, volume 5, chapter 0, pages 247-348, Elsevier.
    8. Calá, Carla Daniela, 2014. "Regional issues on firm entry and exit in Argentina: core and peripheral regions," Nülan. Deposited Documents 2023, Universidad Nacional de Mar del Plata, Facultad de Ciencias Económicas y Sociales, Centro de Documentación.
    9. Coll-Martínez, Eva, 2019. "Creative industries and firm creation: disentangling causal effects through historical cultural associations," INVESTIGACIONES REGIONALES - Journal of REGIONAL RESEARCH, Asociación Española de Ciencia Regional, issue 43, pages 19-39.
    10. Daniel Liviano & Josep-Maria Arauzo-Carod, 2013. "Industrial location and interpretation of zero counts," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 50(2), pages 515-534, April.
    11. Josep-Maria Arauzo-Carod, 2013. "Location Determinants of New Firms: Does Skill Level of Human Capital Really Matter?," Growth and Change, Wiley Blackwell, vol. 44(1), pages 118-148, March.
    12. Kristin Kronenberg, 2013. "Firm relocations in the Netherlands: Why do firms move, and where do they go?," Papers in Regional Science, Wiley Blackwell, vol. 92(4), pages 691-713, November.
    13. Christian Hilber & Charles Palmer, 2014. "Urban development and air pollution: Evidence from a global panel of cities," GRI Working Papers 175, Grantham Research Institute on Climate Change and the Environment.
    14. Liu, Chang & Bardaka, Eleni, 2023. "Transit-induced commercial gentrification: Causal inference through a difference-in-differences analysis of business microdata," Transportation Research Part A: Policy and Practice, Elsevier, vol. 175(C).
    15. Gehringer, Agnieszka & Krenz, Astrid, 2014. "European market integration and the determinants of firm localization: The case of Poland," University of Göttingen Working Papers in Economics 190, University of Goettingen, Department of Economics.
    16. Liviano Solís, Daniel & Arauzo Carod, Josep Maria, 2011. "Industrial Location and Space: New Insights," Working Papers 2072/152137, Universitat Rovira i Virgili, Department of Economics.
    17. Jordi Jofre-Monseny & Raquel Marín-López & Elisabet Viladecans-Marsal, 2014. "The Determinants Of Localization And Urbanization Economies: Evidence From The Location Of New Firms In Spain," Journal of Regional Science, Wiley Blackwell, vol. 54(2), pages 313-337, March.
    18. Eva Coll-Martínez & Josep-Maria Arauzo-Carod, 2017. "Creative milieu and firm location: An empirical appraisal," Environment and Planning A, vol. 49(7), pages 1613-1641, July.
    19. Christopher Harding & Zachary Patterson & Luis F Miranda-Moreno & Seyed Amir Zahabi, 2014. "A Spatial and Temporal Comparative Analysis of the Effects of Land-Use Clusters on Activity Spaces in Three Quebec Cities," Environment and Planning B, vol. 41(6), pages 1044-1062, December.
    20. Sabreena Anowar & Naveen Eluru & Luis F. Miranda-Moreno, 2014. "Alternative Modeling Approaches Used for Examining Automobile Ownership: A Comprehensive Review," Transport Reviews, Taylor & Francis Journals, vol. 34(4), pages 441-473, July.

    More about this item

    Keywords

    Web scraping; URL parsing; Spatial microdata; Spain;

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
    • R58 - Urban, Rural, Regional, Real Estate, and Transportation Economics - - Regional Government Analysis - - - Regional Development Planning and Policy
