
Analyzing the Quality of Twitter Data Streams

Authors

Listed:
  • Franco Arolfo

    (Instituto Tecnológico de Buenos Aires Lavardén 315)

  • Kevin Cortés Rodriguez

    (Instituto Tecnológico de Buenos Aires Lavardén 315)

  • Alejandro Vaisman

    (Instituto Tecnológico de Buenos Aires Lavardén 315)

Abstract

There is a general belief that the quality of Twitter data streams is low and unpredictable, which makes decisions based on such data unreliable. The work presented here addresses this problem from a Data Quality (DQ) perspective, adapting the traditional methods used in relational databases, which are based on quality dimensions and metrics, to capture the characteristics of Twitter data streams in particular and of Big Data more generally. As a first contribution, this paper redefines the classic DQ dimensions and metrics for the scenario under study. Second, the paper introduces a software tool that captures Twitter data streams in real time, computes their DQ, and displays the results through a wide variety of charts. Third, using these dimensions and this tool, the paper performs a thorough analysis of the DQ of Twitter streams based on four dimensions: Readability, Completeness, Usefulness, and Trustworthiness. These dimensions are studied for several cases: unfiltered data streams, data streams filtered using a collection of keywords, and tweets classified by topic, with the DQ studied for each topic. Further, although it is well known that the number of geolocated tweets is very low, the paper studies the DQ of tweets with respect to the place from which they are posted. Finally, the tool allows changing the weight of each quality dimension in the computation of a tweet's overall data quality, so that weights can be defined to fit different analysis contexts and/or different user profiles. Interestingly, the study reveals that the quality of Twitter streams is higher than would have been expected.
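
To make the idea of user-adjustable dimension weights concrete, here is a minimal sketch of one way an overall quality score could be computed from the four dimensions named in the abstract. The abstract does not give the paper's actual formula, so everything here is an assumption for illustration: the normalized weighted mean, the [0, 1] score range, and the names `DIMENSIONS` and `overall_quality` are all hypothetical.

```python
from typing import Mapping

# The four dimensions named in the abstract. Treating each per-tweet score as a
# value in [0, 1] is an assumption, not something the abstract states.
DIMENSIONS = ("readability", "completeness", "usefulness", "trustworthiness")


def overall_quality(scores: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Return a normalized weighted mean of the per-dimension scores.

    Hypothetical aggregation: the paper's own formula may differ.
    """
    for dim in DIMENSIONS:
        if not 0.0 <= scores[dim] <= 1.0:
            raise ValueError(f"score for {dim} must lie in [0, 1]")
        if weights[dim] < 0.0:
            raise ValueError(f"weight for {dim} must be non-negative")

    total_weight = sum(weights[dim] for dim in DIMENSIONS)
    if total_weight <= 0.0:
        raise ValueError("at least one weight must be positive")

    # Dividing by the total weight keeps the result in [0, 1]
    # regardless of the scale the user chose for the weights.
    weighted_sum = sum(weights[dim] * scores[dim] for dim in DIMENSIONS)
    return weighted_sum / total_weight


# Illustrative values only; these are not results from the paper.
example_scores = {
    "readability": 0.8,
    "completeness": 0.5,
    "usefulness": 0.6,
    "trustworthiness": 0.9,
}
equal_weights = {dim: 1.0 for dim in DIMENSIONS}
trust_heavy_weights = {**equal_weights, "trustworthiness": 3.0}

equal_score = overall_quality(example_scores, equal_weights)
trust_heavy_score = overall_quality(example_scores, trust_heavy_weights)
```

Under these assumptions, raising the weight of Trustworthiness pulls the overall score toward that dimension's value, which is the kind of context-specific or profile-specific tuning the abstract describes.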

Suggested Citation

  • Franco Arolfo & Kevin Cortés Rodriguez & Alejandro Vaisman, 2022. "Analyzing the Quality of Twitter Data Streams," Information Systems Frontiers, Springer, vol. 24(1), pages 349-369, February.
  • Handle: RePEc:spr:infosf:v:24:y:2022:i:1:d:10.1007_s10796-020-10072-x
    DOI: 10.1007/s10796-020-10072-x



