Analyzing the Quality of Twitter Data Streams

Author

Listed:
  • Franco Arolfo

    (Instituto Tecnológico de Buenos Aires Lavardén 315)

  • Kevin Cortés Rodriguez

    (Instituto Tecnológico de Buenos Aires Lavardén 315)

  • Alejandro Vaisman

    (Instituto Tecnológico de Buenos Aires Lavardén 315)

Abstract

It is widely believed that the quality of Twitter data streams is low and unpredictable, which makes decisions based on such data unreliable. The work presented here addresses this problem from a Data Quality (DQ) perspective, adapting the traditional methods used in relational databases, which are based on quality dimensions and metrics, to capture the characteristics of Twitter data streams in particular and of Big Data more generally. As a first contribution, this paper redefines the classic DQ dimensions and metrics for the scenario under study. Second, the paper introduces a software tool that captures Twitter data streams in real time, computes their DQ, and displays the results through a wide variety of graphics. As a third contribution, the paper uses this machinery to perform a thorough analysis of the DQ of Twitter streams, based on four dimensions: Readability, Completeness, Usefulness, and Trustworthiness. These dimensions are studied for several cases: unfiltered data streams, data streams filtered using a collection of keywords, and tweets classified by topic, with the DQ studied for each topic. Further, although it is well known that the number of geolocalized tweets is very low, the paper studies the DQ of tweets with respect to the place from which they are posted. Finally, the tool allows changing the weight of each quality dimension considered in the computation of the overall data quality of a tweet, so that weights can be defined to fit different analysis contexts and/or different user profiles. Interestingly, this study reveals that the quality of Twitter streams is higher than would have been expected.
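The per-tweet weighting mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so this assumes that each of the four dimensions yields a score in [0, 1] and that the overall quality is a normalized weighted average. The names `DIMENSIONS` and `overall_quality` are hypothetical and do not come from the paper or its tool.

```python
from typing import Mapping

# The four dimensions named in the abstract; the lowercase keys are an assumption.
DIMENSIONS = ("readability", "completeness", "usefulness", "trustworthiness")


def overall_quality(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return the weighted average of the per-dimension scores of one tweet.

    Assumption (not from the paper): scores lie in [0, 1], weights are
    non-negative, and the result is normalized by the sum of the weights.
    """
    for dim in DIMENSIONS:
        if not 0.0 <= scores[dim] <= 1.0:
            raise ValueError(f"score for {dim} must lie in [0, 1]")
        if weights[dim] < 0.0:
            raise ValueError(f"weight for {dim} must be non-negative")
    total_weight = sum(weights[dim] for dim in DIMENSIONS)
    if total_weight == 0.0:
        raise ValueError("at least one weight must be positive")
    weighted_sum = sum(weights[dim] * scores[dim] for dim in DIMENSIONS)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up scores for a single tweet, for illustration only.
    tweet_scores = {
        "readability": 0.8,
        "completeness": 0.5,
        "usefulness": 0.6,
        "trustworthiness": 0.9,
    }
    equal_weights = {dim: 1.0 for dim in DIMENSIONS}
    # A profile that cares mostly about trustworthiness.
    trust_profile = dict(equal_weights, trustworthiness=3.0)
    print(round(overall_quality(tweet_scores, equal_weights), 3))
    print(round(overall_quality(tweet_scores, trust_profile), 3))
```

Under these assumptions, raising the weight of one dimension pulls the overall score toward that dimension's score, which is how different weights could reflect different analysis contexts or user profiles.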

Suggested Citation

  • Franco Arolfo & Kevin Cortés Rodriguez & Alejandro Vaisman, 2022. "Analyzing the Quality of Twitter Data Streams," Information Systems Frontiers, Springer, vol. 24(1), pages 349-369, February.
  • Handle: RePEc:spr:infosf:v:24:y:2022:i:1:d:10.1007_s10796-020-10072-x
    DOI: 10.1007/s10796-020-10072-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-020-10072-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Osuji E. & Evans O., 2020. "Tourism Effects of Pandemics: New Insights from Novel Coronavirus," SPOUDAI Journal of Economics and Business, SPOUDAI Journal of Economics and Business, University of Piraeus, vol. 70(3-4), pages 56-65, July-Dece.
    2. Fatuma Namisango & Kyeong Kang & Ghassan Beydoun, 2022. "How the Structures Provided by Social Media Enable Collaborative Outcomes: A Study of Service Co-creation in Nonprofits," Information Systems Frontiers, Springer, vol. 24(2), pages 517-535, April.
    3. Liu, Hongfei & Jayawardhena, Chanaka & Osburg, Victoria-Sophie & Yoganathan, Vignesh & Cartwright, Severina, 2021. "Social sharing of consumption emotion in electronic word of mouth (eWOM): A cross-media perspective," Journal of Business Research, Elsevier, vol. 132(C), pages 208-220.
    4. Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.
    5. Mengyue Wang & Xin Li & Patrick Y. K. Chau, 2021. "Leveraging Image-Processing Techniques for Empirical Research: Feasibility and Reliability in Online Shopping Context," Information Systems Frontiers, Springer, vol. 23(3), pages 607-626, June.
    6. Ghassan Beydoun & Sergiu Dascalu & Dale Dominey-Howes & Andrew Sheehan, 2018. "Disaster Management and Information Systems: Insights to Emerging Challenges," Information Systems Frontiers, Springer, vol. 20(4), pages 649-652, August.
    7. Emily Heaney & Laura Hunter & Angus Clulow & Devin Bowles & Sotiris Vardoulakis, 2021. "Efficacy of Communication Techniques and Health Outcomes of Bushfire Smoke Exposure: A Scoping Review," IJERPH, MDPI, vol. 18(20), pages 1-14, October.
    8. Luvai Motiwalla & Amit V. Deokar & Surendra Sarnikar & Angelika Dimoka, 2019. "Leveraging Data Analytics for Behavioral Research," Information Systems Frontiers, Springer, vol. 21(4), pages 735-742, August.
    9. Prabhsimran Singh & Surleen Kaur & Abdullah M. Baabdullah & Yogesh K. Dwivedi & Sandeep Sharma & Ravinder Singh Sawhney & Ronnie Das, 2023. "Is #SDG13 Trending Online? Insights from Climate Change Discussions on Twitter," Information Systems Frontiers, Springer, vol. 25(1), pages 199-219, February.
    10. Naim Kapucu & Ratna B Dougherty & Yue Ge & Chris Zobel, 2023. "The use of documentary data for network analysis in emergency and crisis management," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 116(1), pages 425-445, March.
    11. Arpan Kumar Kar, 2021. "What Affects Usage Satisfaction in Mobile Payments? Modelling User Generated Content to Develop the “Digital Service Usage Satisfaction Model”," Information Systems Frontiers, Springer, vol. 23(5), pages 1341-1361, September.
    12. Peng Xie, 2022. "The Interplay Between Investor Activity on Virtual Investment Community and the Trading Dynamics: Evidence From the Bitcoin Market," Information Systems Frontiers, Springer, vol. 24(4), pages 1287-1303, August.
    13. Christof Weinhardt & Simon Kloker & Oliver Hinz & Wil M. P. Aalst, 2020. "Citizen Science in Information Systems Research," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 62(4), pages 273-277, August.
    14. Carine Dominguez-Péry & Rana Tassabehji & Lakshmi Narasimha Raju Vuddaraju & Vikhram Kofi Duffour, 2021. "Improving emergency response operations in maritime accidents using social media with big data analytics: a case study of the MV Wakashio disaster," Post-Print hal-04021179, HAL.
    15. Armel Lefebvre & Marco Spruit, 2023. "Laboratory Forensics for Open Science Readiness: an Investigative Approach to Research Data Management," Information Systems Frontiers, Springer, vol. 25(1), pages 381-399, February.
    16. Doruk Şen & Cem Çağrı Dönmez & Umman Mahir Yıldırım, 0. "A Hybrid Bi-level Metaheuristic for Credit Scoring," Information Systems Frontiers, Springer, vol. 0, pages 1-11.
    17. María José Aramburu & Rafael Berlanga & Indira Lanza, 2020. "Social Media Multidimensional Analysis for Intelligent Health Surveillance," IJERPH, MDPI, vol. 17(7), pages 1-17, March.
    18. Shalak Mendon & Pankaj Dutta & Abhishek Behl & Stefan Lessmann, 2021. "A Hybrid Approach of Machine Learning and Lexicons to Sentiment Analysis: Enhanced Insights from Twitter Data of Natural Disasters," Information Systems Frontiers, Springer, vol. 23(5), pages 1145-1168, September.
    19. Arpan Kumar Kar & Sunil Kumar & P. Vigneswara Ilavarasan, 2021. "Modelling the Service Experience Encounters Using User-Generated Content: A Text Mining Approach," Global Journal of Flexible Systems Management, Springer;Global Institute of Flexible Systems Management, vol. 22(4), pages 267-288, December.
    20. Ghassan Beydoun & Babak Abedin & José M. Merigó & Melanie Vera, 2019. "Twenty Years of Information Systems Frontiers," Information Systems Frontiers, Springer, vol. 21(2), pages 485-494, April.
