
Reading the city through its neighbourhoods: Deep text embeddings of Yelp reviews as a basis for determining similarity and change

Authors
  • Olson, Alex
  • Calderon-Figueroa, Fernando
  • Bidian, Olimpia
  • Silver, Daniel
  • Sanner, Scott

Abstract

This paper develops novel methods for using Yelp reviews as a window into the collective representations of a city and its neighbourhoods. Basing analysis on social media data such as Yelp is challenging because review data is highly sparse, and direct analysis may fail to uncover hidden trends. To address this, we propose a deep autoencoder approach that embeds the language of neighbourhood-based business reviews into a reduced-dimensional space, which facilitates comparing neighbourhoods' similarity and their change over time. Our model improves the ability to distinguish real from fake neighbourhood descriptions derived from real reviews, raising average accuracy on this task from 0.46 to 0.77. This improvement indicates that this novel application of embedded language analysis permits us to uncover comparative trends in neighbourhood change through the lens of their venues' reviews, providing a computational methodology for reading a city through its neighbourhoods. The resulting toolkit makes it possible to examine a city's current sociological trends in terms of its neighbourhoods' collective identities.
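The abstract leaves the architecture, vocabulary, and training details unspecified. As a minimal sketch under stated assumptions, and not the authors' model, the code below represents each neighbourhood-period as a bag-of-words count vector, compresses it with a small single-hidden-layer autoencoder trained by plain SGD, and then scores similarity as the cosine between two codes and change as 1 minus the cosine between one neighbourhood's codes in two periods. The class `TinyAutoencoder`, the vocabulary, the neighbourhoods `A` and `B`, the years, the counts, and the hyperparameters are all hypothetical toy values; only the Python standard library is used.

```python
import math
import random


def normalize(counts):
    """Scale a term-count vector to unit L2 norm so review volume does not dominate."""
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class TinyAutoencoder:
    """Hypothetical one-hidden-layer autoencoder: h = tanh(W1 x + b1), x_hat = W2 h + b2."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.W2, self.b2)]

    def train_step(self, x, lr):
        """One SGD step on L = 0.5 * ||x_hat - x||^2; returns the loss before the update."""
        h = self.encode(x)
        err = [xh - xi for xh, xi in zip(self.decode(h), x)]  # dL/dx_hat
        loss = 0.5 * sum(e * e for e in err)
        # Gradient w.r.t. hidden pre-activations (tanh' = 1 - h^2), computed before W2 changes.
        grad_z = [sum(self.W2[i][j] * err[i] for i in range(len(err))) * (1 - h[j] ** 2)
                  for j in range(len(h))]
        for i, e in enumerate(err):
            for j, hj in enumerate(h):
                self.W2[i][j] -= lr * e * hj
            self.b2[i] -= lr * e
        for j, g in enumerate(grad_z):
            for k, xk in enumerate(x):
                self.W1[j][k] -= lr * g * xk
            self.b1[j] -= lr * g
        return loss

    def fit(self, data, epochs=500, lr=0.1):
        """Train with per-example SGD; returns the mean loss of each epoch."""
        return [sum(self.train_step(x, lr) for x in data) / len(data) for _ in range(epochs)]


# Toy, invented term counts over a hypothetical vocabulary:
# ["brunch", "cocktails", "vintage", "dim sum", "playground"]
toy_counts = {
    ("A", 2015): [8, 5, 6, 0, 1],
    ("A", 2019): [9, 7, 4, 0, 1],
    ("B", 2015): [1, 0, 1, 9, 6],
    ("B", 2019): [6, 5, 3, 4, 2],
}
keys = list(toy_counts)
data = [normalize(toy_counts[k]) for k in keys]

model = TinyAutoencoder(n_in=5, n_hidden=2)
losses = model.fit(data)
codes = {k: model.encode(x) for k, x in zip(keys, data)}

# Similarity between neighbourhoods in one period, and change within each neighbourhood.
similarity_2015 = cosine(codes[("A", 2015)], codes[("B", 2015)])
change = {n: 1 - cosine(codes[(n, 2015)], codes[(n, 2019)]) for n in ("A", "B")}
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"A vs B similarity (2015): {similarity_2015:.3f}")
print("change 2015->2019:", {n: round(c, 3) for n, c in change.items()})
```

A single hidden layer keeps the backpropagation readable; a deep autoencoder as described in the abstract would stack several encoder and decoder layers, usually with a tensor library computing the gradients. With only two hidden units and four toy vectors, the printed scores are illustrative and carry no substantive meaning.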

Suggested Citation

  • Olson, Alex & Calderon-Figueroa, Fernando & Bidian, Olimpia & Silver, Daniel & Sanner, Scott, 2020. "Reading the city through its neighbourhoods: Deep text embeddings of Yelp reviews as a basis for determining similarity and change," SocArXiv 8jbvg, Center for Open Science.
  • Handle: RePEc:osf:socarx:8jbvg
    DOI: 10.31219/osf.io/8jbvg

    Download full text from publisher

    File URL: https://osf.io/download/5fc94e1aaa60b801d9894305/
    Download Restriction: no

    File URL: https://libkey.io/10.31219/osf.io/8jbvg?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Daniel Arribas-Bel & Karima Kourtit & Peter Nijkamp, 2016. "The sociocultural sources of urban buzz," Environment and Planning C, vol. 34(1), pages 188-204, February.
    2. Edward L. Glaeser & Hyunjin Kim & Michael Luca, 2019. "Nowcasting the Local Economy: Using Yelp Data to Measure Economic Activity," NBER Chapters, in: Big Data for Twenty-First-Century Economic Statistics, pages 249-273, National Bureau of Economic Research, Inc.
    3. Edward L. Glaeser & Hyunjin Kim & Michael Luca, 2018. "Nowcasting Gentrification: Using Yelp Data to Quantify Neighborhood Change," AEA Papers and Proceedings, American Economic Association, vol. 108, pages 77-82, May.
    4. Elizabeth C. Delmelle, 2016. "Mapping the DNA of Urban Neighborhoods: Clustering Longitudinal Sequences of Neighborhood Socioeconomic Change," Annals of the American Association of Geographers, Taylor & Francis Journals, vol. 106(1), pages 36-56, January.
    5. Daniel Arribas-Bel & Jessie Bakens, 2019. "Use and validation of location-based services in urban research: An example with Dutch restaurants," Urban Studies, Urban Studies Journal Limited, vol. 56(5), pages 868-884, April.
    6. Yihong Yuan & Yongmei Lu & T. Edwin Chow & Chao Ye & Abdullatif Alyaqout & Yu Liu, 2020. "The Missing Parts from Social Media–Enabled Smart Cities: Who, Where, When, and What?," Annals of the American Association of Geographers, Taylor & Francis Journals, vol. 110(2), pages 462-475, March.
    7. Balázs Kovács & Glenn R. Carroll & David W. Lehman, 2014. "Authenticity and Consumer Value Ratings: Empirical Tests from the Restaurant Domain," Organization Science, INFORMS, vol. 25(2), pages 458-478, April.
    8. Scott Deerwester & Susan T. Dumais & George W. Furnas & Thomas K. Landauer & Richard Harshman, 1990. "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(6), pages 391-407, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Silver, Daniel & Silva, Thiago H, 2021. "Complex Causal Structures of Neighbourhood Change: Evidence From a Functionalist Model and Yelp Data," SocArXiv wprf8, Center for Open Science.
    2. Susan Athey & Michael Luca, 2019. "Economists (and Economics) in Tech Companies," Journal of Economic Perspectives, American Economic Association, vol. 33(1), pages 209-230, Winter.
    3. Dominik Gutt & Philipp Herrmann & Mohammad S. Rahman, 2018. "Crowd-Driven Competitive Intelligence: Understanding the Relationship Between Local Market Competition and Online Rating Distributions," Working Papers Dissertations 41, Paderborn University, Faculty of Business Administration and Economics.
    4. Mohammed Alyakoob & Mohammad S. Rahman, 2022. "Shared Prosperity (or Lack Thereof) in the Sharing Economy," Information Systems Research, INFORMS, vol. 33(2), pages 638-658, June.
    5. Dominik Gutt & Philipp Herrmann & Mohammad S. Rahman, 2019. "Crowd-Driven Competitive Intelligence: Understanding the Relationship Between Local Market Competition and Online Rating Distributions," Information Systems Research, INFORMS, vol. 30(3), pages 980-994, September.
    6. Curci, Ylenia & Mongeau Ospina, Christian A., 2016. "Investigating biofuels through network analysis," Energy Policy, Elsevier, vol. 97(C), pages 60-72.
    7. Morgan Ubeda, 2020. "Local Amenities, Commuting Costs and Income Disparities Within Cities," Working Papers halshs-03082448, HAL.
    8. Chao Wei & Senlin Luo & Xincheng Ma & Hao Ren & Ji Zhang & Limin Pan, 2016. "Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-20, January.
    9. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    10. Yong Gao & Yuanyuan Chen & Lan Mu & Shize Gong & Pengcheng Zhang & Yu Liu, 2022. "Measuring urban sentiments from social media data: a dual-polarity metric approach," Journal of Geographical Systems, Springer, vol. 24(2), pages 199-221, April.
    11. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    12. Juan Shi & Kin Keung Lai & Ping Hu & Gang Chen, 2018. "Factors dominating individual information disseminating behavior on social networking sites," Information Technology and Management, Springer, vol. 19(2), pages 121-139, June.
    13. Ganesh Dash & Chetan Sharma & Shamneesh Sharma, 2023. "Sustainable Marketing and the Role of Social Media: An Experimental Study Using Natural Language Processing (NLP)," Sustainability, MDPI, vol. 15(6), pages 1-16, March.
    14. Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
    15. Song, Hanqun & Yang, Huijun & Ma, Emily, 2022. "Restaurants’ outdoor signs say more than you think: An enquiry from a linguistic landscape perspective," Journal of Retailing and Consumer Services, Elsevier, vol. 68(C).
    16. Shr-Wei Kao & Pin Luarn, 2020. "Topic Modeling Analysis of Social Enterprises: Twitter Evidence," Sustainability, MDPI, vol. 12(8), pages 1-20, April.
    17. Breithaupt, Patrick & Kesler, Reinhold & Niebel, Thomas & Rammer, Christian, 2020. "Intangible capital indicators based on web scraping of social media," ZEW Discussion Papers 20-046, ZEW - Leibniz Centre for European Economic Research.
    18. Gissler, Stefan & Oldfather, Jeremy & Ruffino, Doriana, 2016. "Lending on hold: Regulatory uncertainty and bank lending standards," Journal of Monetary Economics, Elsevier, vol. 81(C), pages 89-101.
    19. Alina Evstigneeva & Mark Sidorovskiy, 2021. "Assessment of Clarity of Bank of Russia Monetary Policy Communication by Neural Network Approach," Russian Journal of Money and Finance, Bank of Russia, vol. 80(3), pages 3-33, September.
    20. Oliver Schilke & Sheen S. Levine & Olenka Kacperczyk & Lynne G. Zucker, 2019. "Call for Papers-Special Issue on Experiments in Organizational Theory," Organization Science, INFORMS, vol. 30(1), pages 232-234, February.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.