
Improving Crime Count Forecasts Using Twitter and Taxi Data

Author

Listed:
  • Vomfell, Lara
  • Härdle, Wolfgang Karl
  • Lessmann, Stefan

Abstract

Data from social media has created opportunities to understand how and why people move through their urban environment and how this relates to criminal activity. To aid resource allocation decisions in the scope of predictive policing, the paper proposes an approach to predict weekly crime counts. The novel approach captures the spatial dependency of criminal activity by approximating human dynamics. It integrates point-of-interest data in the form of Foursquare venues with Twitter activity and taxi trip data, and introduces a set of approaches to create features from these data sources. Empirical results demonstrate the explanatory and predictive power of the novel features. Analysis of a six-month period of real-world crime data for the city of New York shows that both temporal and static features are necessary to effectively account for human dynamics and predict crime counts accurately. Furthermore, the results provide new evidence on the underlying mechanisms of crime and have implications for crime analysis and intervention.

Suggested Citation

  • Vomfell, Lara & Härdle, Wolfgang Karl & Lessmann, Stefan, 2018. "Improving Crime Count Forecasts Using Twitter and Taxi Data," IRTG 1792 Discussion Papers 2018-013, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
  • Handle: RePEc:zbw:irtgdp:2018013

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/230724/1/irtg1792dp2018-013.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Badi H. Baltagi & Bernard Fingleton & Alain Pirotte, 2014. "Estimating and Forecasting with a Dynamic Spatial Panel Data Model," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 76(1), pages 112-138, February.
    2. Hinkle, Joshua C. & Yang, Sue-Ming, 2014. "A New Look into Broken Windows: What Shapes Individuals’ Perceptions of Social Disorder?," Journal of Criminal Justice, Elsevier, vol. 42(1), pages 26-35.
    3. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.

    Citations



    Cited by:

    1. Packham, Natalie, 2018. "Optimal contracts under competition when uncertainty from adverse selection and moral hazard are present," IRTG 1792 Discussion Papers 2018-033, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    2. Qingliang Fan & Wei Zhong, 2018. "Nonparametric Additive Instrumental Variable Estimator: A Group Shrinkage Estimation Perspective," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 36(3), pages 388-399, July.
    3. Bram Janssens & Matthias Bogaert & Mathijs Maton, 2023. "Predicting the next Pogačar: a data analytical approach to detect young professional cycling talents," Annals of Operations Research, Springer, vol. 325(1), pages 557-588, June.
    4. Cai, Zongwu & Fang, Ying & Lin, Ming & Su, Jia, 2018. "Inferences for a Partially Varying Coefficient Model With Endogenous Regressors," IRTG 1792 Discussion Papers 2018-047, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    5. Victor Chernozhukov & Wolfgang K. Hardle & Chen Huang & Weining Wang, 2018. "LASSO-Driven Inference in Time and Space," Papers 1806.05081, arXiv.org, revised May 2020.
    6. Wang, Honglin & Yu, Fan & Zhou, Yinggang, 2018. "Property Investment and Rental Rate under Housing Price Uncertainty: A Real Options Approach," IRTG 1792 Discussion Papers 2018-051, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    7. Yan, Ji Gao, 2018. "Complete Convergence and Complete Moment Convergence for Maximal Weighted Sums of Extended Negatively Dependent Random Variables," IRTG 1792 Discussion Papers 2018-040, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    8. Zhong, Wei & Liu, Xi & Ma, Shuangge, 2018. "Variable selection and direction estimation for single-index models via DC-TGDR method," IRTG 1792 Discussion Papers 2018-050, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    9. Guo, Li & Tao, Yubo & Härdle, Wolfgang Karl, 2018. "Understanding Latent Group Structure of Cryptocurrencies Market: A Dynamic Network Perspective," IRTG 1792 Discussion Papers 2018-032, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    10. Kalkbrener, Michael & Packham, Natalie, 2018. "Correlation Under Stress In Normal Variance Mixture Models," IRTG 1792 Discussion Papers 2018-035, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    11. Packham, Natalie & Woebbeking, Fabian, 2018. "A factor-model approach for correlation scenarios and correlation stress-testing," IRTG 1792 Discussion Papers 2018-034, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    12. Chiu, Hsin-Yu & Chiang, Mi-Hsiu & Kuo, Wei-Yu, 2018. "Predicative Ability of Similarity-based Futures Trading Strategies," IRTG 1792 Discussion Papers 2018-045, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    13. Guo, Shaojun & Li, Dong & Li, Muyi, 2018. "Strict Stationarity Testing and GLAD Estimation of Double Autoregressive Models," IRTG 1792 Discussion Papers 2018-049, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    14. Shalak Mendon & Pankaj Dutta & Abhishek Behl & Stefan Lessmann, 2021. "A Hybrid Approach of Machine Learning and Lexicons to Sentiment Analysis: Enhanced Insights from Twitter Data of Natural Disasters," Information Systems Frontiers, Springer, vol. 23(5), pages 1145-1168, September.
    15. Minxuan Lan & Lin Liu & Andres Hernandez & Weiyi Liu & Hanlin Zhou & Zengli Wang, 2019. "The Spillover Effect of Geotagged Tweets as a Measure of Ambient Population for Theft Crime," Sustainability, MDPI, vol. 11(23), pages 1-17, November.
    16. Xiaojia Bao & Qingliang Fan, 2020. "The impact of temperature on gaming productivity: evidence from online games," Empirical Economics, Springer, vol. 58(2), pages 835-867, February.
    17. Packham, Natalie & Kalkbrener, Michael & Overbeck, Ludger, 2018. "Default probabilities and default correlations under stress," IRTG 1792 Discussion Papers 2018-037, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    18. Kuczmaszewska, Anna & Yan, Ji Gao, 2018. "On complete convergence in Marcinkiewicz-Zygmund type SLLN for random variables," IRTG 1792 Discussion Papers 2018-041, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    19. Koziuk, Andzhey & Spokoiny, Vladimir, 2018. "Toolbox: Gaussian comparison on Eucledian balls," IRTG 1792 Discussion Papers 2018-028, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    20. Chen, Haiqiang & Li, Yingxing & Lin, Ming & Zhu, Yanli, 2018. "A Regime Shift Model with Nonparametric Switching Mechanism," IRTG 1792 Discussion Papers 2018-048, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    21. Yatracos, Yannis G., 2018. "Residual's Influence Index (RINFIN), Bad Leverage and Unmasking in High Dimensional L2-Regression," IRTG 1792 Discussion Papers 2018-060, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    22. Zbonakova, Lenka & Li, Xinjue & Härdle, Wolfgang Karl, 2018. "Penalized Adaptive Forecasting with Large Information Sets and Structural Changes," IRTG 1792 Discussion Papers 2018-039, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mansoor, Umer & Jamal, Arshad & Su, Junbiao & Sze, N.N. & Chen, Anthony, 2023. "Investigating the risk factors of motorcycle crash injury severity in Pakistan: Insights and policy recommendations," Transport Policy, Elsevier, vol. 139(C), pages 21-38.
    2. Atems, Bebonchu, 2013. "The spatial dynamics of growth and inequality: Evidence using U.S. county-level data," Economics Letters, Elsevier, vol. 118(1), pages 19-22.
    3. Mitze, Timo & Makkonen, Teemu, 2023. "Can large-scale RDI funding stimulate post-crisis recovery growth? Evidence for Finland during COVID-19," Technological Forecasting and Social Change, Elsevier, vol. 186(PB).
    4. Bissan Ghaddar & Ignacio Gómez-Casares & Julio González-Díaz & Brais González-Rodríguez & Beatriz Pateiro-López & Sofía Rodríguez-Ballesteros, 2023. "Learning for Spatial Branching: An Algorithm Selection Approach," INFORMS Journal on Computing, INFORMS, vol. 35(5), pages 1024-1043, September.
    5. Giovanni S F Bruno & Enrico Marelli & Marcello Signorelli, 2014. "The Rise of NEET and Youth Unemployment in EU Regions after the Crisis," Comparative Economic Studies, Palgrave Macmillan;Association for Comparative Economic Studies, vol. 56(4), pages 592-615, December.
    6. Akash Malhotra, 2018. "A hybrid econometric-machine learning approach for relative importance analysis: Prioritizing food policy," Papers 1806.04517, arXiv.org, revised Aug 2020.
    7. Wongsa-art, Pipat & Kim, Namhyun & Xia, Yingcun & Moscone, Francesco, 2024. "Varying coefficient panel data models and methods under correlated error components: Application to disparities in mental health services in England," Regional Science and Urban Economics, Elsevier, vol. 106(C).
    8. Rodolfo Metulini, 2013. "Spatial gravity models for international trade: a panel analysis among OECD countries," ERSA conference papers ersa13p522, European Regional Science Association.
    9. Álvarez, Inmaculada C. & Barbero, Javier & Zofío, José L., 2016. "A spatial autoregressive panel model to analyze road network spillovers on production," Transportation Research Part A: Policy and Practice, Elsevier, vol. 93(C), pages 83-92.
    10. Nahushananda Chakravarthy H G & Karthik M Seenappa & Sujay Raghavendra Naganna & Dayananda Pruthviraja, 2023. "Machine Learning Models for the Prediction of the Compressive Strength of Self-Compacting Concrete Incorporating Incinerated Bio-Medical Waste Ash," Sustainability, MDPI, vol. 15(18), pages 1-22, September.
    11. Tim Voigt & Martin Kohlhase & Oliver Nelles, 2021. "Incremental DoE and Modeling Methodology with Gaussian Process Regression: An Industrially Applicable Approach to Incorporate Expert Knowledge," Mathematics, MDPI, vol. 9(19), pages 1-26, October.
    12. Wen, Shaoting & Buyukada, Musa & Evrendilek, Fatih & Liu, Jingyong, 2020. "Uncertainty and sensitivity analyses of co-combustion/pyrolysis of textile dyeing sludge and incense sticks: Regression and machine-learning models," Renewable Energy, Elsevier, vol. 151(C), pages 463-474.
    13. Zhu, Haibin & Bai, Lu & He, Lidan & Liu, Zhi, 2023. "Forecasting realized volatility with machine learning: Panel data perspective," Journal of Empirical Finance, Elsevier, vol. 73(C), pages 251-271.
    14. Spiliotis, Evangelos & Makridakis, Spyros & Kaltsounis, Anastasios & Assimakopoulos, Vassilios, 2021. "Product sales probabilistic forecasting: An empirical evaluation using the M5 competition data," International Journal of Production Economics, Elsevier, vol. 240(C).
    15. Zhang, Ning & Li, Zhiying & Zou, Xun & Quiring, Steven M., 2019. "Comparison of three short-term load forecast models in Southern California," Energy, Elsevier, vol. 189(C).
    16. Smyl, Slawek & Hua, N. Grace, 2019. "Machine learning methods for GEFCom2017 probabilistic load forecasting," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1424-1431.
    17. Barzin, Samira & Avner, Paolo & Maruyama Rentschler, Jun Erik & O’Clery, Neave, 2022. "Where Are All the Jobs? A Machine Learning Approach for High Resolution Urban Employment Prediction in Developing Countries," Policy Research Working Paper Series 9979, The World Bank.
    18. Atems, Bebonchu, 2015. "Another look at tax policy and state economic growth: The long-run and short-run of it," Economics Letters, Elsevier, vol. 127(C), pages 64-67.
    19. Eike Emrich & Christian Pierdzioch, 2016. "Volunteering, Match Quality, and Internet Use," Schmollers Jahrbuch : Journal of Applied Social Science Studies / Zeitschrift für Wirtschafts- und Sozialwissenschaften, Duncker & Humblot, Berlin, vol. 136(2), pages 199-226.
    20. Bernard Fingleton, 2014. "Forecasting with dynamic spatial panel data: practical implementation methods," Economics and Business Letters, Oviedo University Press, vol. 3(4), pages 194-207.

    More about this item

    Keywords

    Predictive Policing; Crime Forecasting; Social Media Data; Spatial Econometrics;

    JEL classification:

    • C00 - Mathematical and Quantitative Methods - - General - - - General

