TB-BCG: Topic-Based BART Counterfeit Generator for Fake News Detection

Author

Andrea Stevens Karnyoto
(State Key Laboratory of Communication Content Cognition, People’s Daily Online, Beijing 100733, China
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Chengjie Sun
(School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Bingquan Liu
(School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Xiaolong Wang
(School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)

Abstract

Fake news is spread intentionally and misleads society into believing unconfirmed information, which makes it difficult to identify fake news from shared content alone. Fake news is not only a current issue; it has been disseminated for centuries. Dealing with it is challenging because it spreads on a massive scale, so automatic fake news detection is urgently needed. We introduce TB-BCG, the Topic-Based BART Counterfeit Generator, to increase detection accuracy using deep learning. The approach plays an essential role in selecting impactful data rows and adding more training data. Our research uses Latent Dirichlet Allocation (the topic-based component), Bidirectional and Auto-Regressive Transformers (BART), and cosine document similarity as its main tools, applied to the Constraint @ AAAI2021-COVID19 Fake News Detection shared-task dataset. The idea is simple yet powerful: select data by topic and sort it by distinctiveness, generate counterfeit training data using BART, and compare each counterfeit-generated text with its source text using cosine similarity. If the similarity between a counterfeit-generated text and its source text is more than 95%, the counterfeit-generated text is added to the dataset. To test the stability of precision and the robustness of the method across training-set sizes, we used 30%, 50%, 80%, and 100% of the total dataset and trained simple Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models. Compared to the baseline, our method improved testing performance for both LSTM and CNN, and the resulting scores differ only slightly.
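The similarity filter described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how texts are vectorized, so plain word counts are an assumption here, and the candidate texts are hard-coded stand-ins for what BART would generate. The function names (`bag_of_words`, `cosine_similarity`, `filter_counterfeits`) are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts (an assumed stand-in for the paper's vectorizer)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def filter_counterfeits(source, candidates, threshold=0.95):
    """Keep only candidates whose similarity to the source exceeds the threshold."""
    return [c for c in candidates if cosine_similarity(source, c) > threshold]


# Hypothetical example: in the paper the candidates would come from BART.
source = "masks reduce the spread of covid in crowded indoor spaces"
candidates = [
    "in crowded indoor spaces masks reduce the spread of covid",
    "vaccines are unrelated to this sentence",
]
kept = filter_counterfeits(source, candidates)
```

With these inputs only the first candidate is kept, because it reuses every word of the source, while the second shares none. The 0.95 default mirrors the "more than 95%" rule in the abstract; a TF-IDF or embedding-based vectorizer could be swapped in without changing the filtering step.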

Suggested Citation

Andrea Stevens Karnyoto & Chengjie Sun & Bingquan Liu & Xiaolong Wang, 2022. "TB-BCG: Topic-Based BART Counterfeit Generator for Fake News Detection," Mathematics, MDPI, vol. 10(4), pages 1-17, February.

Handle: RePEc:gam:jmathe:v:10:y:2022:i:4:p:585-:d:749154

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Wang, Binni & Wang, Pong & Tu, Yiliu, 2021. "Customer satisfaction service match and service quality-based blockchain cloud manufacturing," International Journal of Production Economics, Elsevier, vol. 240(C).
Yucheng Zhang & Zhiling Wang & Lin Xiao & Lijun Wang & Pei Huang, 2023. "Discovering the evolution of online reviews: A bibliometric review," Electronic Markets, Springer;IIM University of St. Gallen, vol. 33(1), pages 1-22, December.
M. Narciso, 2022. "The Unreliability of Online Review Mechanisms," Journal of Consumer Policy, Springer, vol. 45(3), pages 349-368, September.
Rowe, Francisco & Mahony, Michael & Graells-Garrido, Eduardo & Rango, Marzia & Sievers, Niklas, 2021. "Using Twitter to Track Immigration Sentiment During Early Stages of the COVID-19 Pandemic," SocArXiv pc3za_v1, Center for Open Science.
Ian Sutherland & Youngseok Sim & Seul Ki Lee & Jaemun Byun & Kiattipoom Kiatkawsin, 2020. "Topic Modeling of Online Accommodation Reviews via Latent Dirichlet Allocation," Sustainability, MDPI, vol. 12(5), pages 1-15, February.
Jiacong Wu & Yu Wang & Ru Zhang & Jing Cai, 2018. "An Approach to Discovering Product/Service Improvement Priorities: Using Dynamic Importance-Performance Analysis," Sustainability, MDPI, vol. 10(10), pages 1-26, October.
Zuo, Wenming & Bai, Weijing & Zhu, Wenfeng & He, Xinming & Qiu, Xinxin, 2022. "Changes in service quality of sharing accommodation: Evidence from airbnb," Technology in Society, Elsevier, vol. 71(C).
Tahereh Dehdarirad & Kalle Karlsson, 2021. "News media attention in Climate Action: latent topics and open access," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 8109-8128, September.
Shuyue Huang & Lena Jingen Liang & Hwansuk Chris Choi, 2022. "How We Failed in Context: A Text-Mining Approach to Understanding Hotel Service Failures," Sustainability, MDPI, vol. 14(5), pages 1-18, February.
Carmela Iorio & Giuseppe Pandolfo & Antonio D’Ambrosio & Roberta Siciliano, 2020. "Mining big data in tourism," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(5), pages 1655-1669, December.
Mohamed M. Mostafa, 2023. "A one-hundred-year structural topic modeling analysis of the knowledge structure of international management research," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3905-3935, August.
Ian Sutherland & Kiattipoom Kiatkawsin, 2020. "Determinants of Guest Experience in Airbnb: A Topic Modeling Approach Using LDA," Sustainability, MDPI, vol. 12(8), pages 1-16, April.
Liu, Xiao & Li, Ming-Yang, 2024. "Sustainable service product design method: Focus on customer demands and triple bottom line," Journal of Retailing and Consumer Services, Elsevier, vol. 80(C).
Muhammad Mudassar Yamin & Mohib Ullah & Habib Ullah & Basel Katt & Mohammad Hijji & Khan Muhammad, 2022. "Mapping Tools for Open Source Intelligence with Cyber Kill Chain for Adversarial Aware Security," Mathematics, MDPI, vol. 10(12), pages 1-25, June.
Sunyoung Hlee & Hanna Lee & Chulmo Koo, 2018. "Hospitality and Tourism Online Review Research: A Systematic Analysis and Heuristic-Systematic Model," Sustainability, MDPI, vol. 10(4), pages 1-27, April.
Lina Zhou & Jie Tao & Dongsong Zhang, 2023. "Does Fake News in Different Languages Tell the Same Story? An Analysis of Multi-level Thematic and Emotional Characteristics of News about COVID-19," Information Systems Frontiers, Springer, vol. 25(2), pages 493-512, April.
Choi, Hyunhong & Woo, JongRoul, 2022. "Investigating emerging hydrogen technology topics and comparing national level technological focus: Patent analysis using a structural topic model," Applied Energy, Elsevier, vol. 313(C).
Wenzhi Cao & Xingen Yang & Yi Yang, 2023. "A Large-Scale Reviews-Driven Multi-Criteria Product Ranking Approach Based on User Credibility and Division Mechanism," Mathematics, MDPI, vol. 11(13), pages 1-19, July.
Nan Yang & Nikolaos Korfiatis & Dimitris Zissis & Konstantina Spanaki, 2024. "Incorporating topic membership in review rating prediction from unstructured data: a gradient boosting approach," Annals of Operations Research, Springer, vol. 339(1), pages 631-662, August.
Boccali, Filippo & Mariani, Marcello M. & Visani, Franco & Mora-Cruz, Alexandra, 2022. "Innovative value-based price assessment in data-rich environments: Leveraging online review analytics through Data Envelopment Analysis to empower managers and entrepreneurs," Technological Forecasting and Social Change, Elsevier, vol. 182(C).

More about this item

Keywords

fake news detection; Latent Dirichlet Allocation (LDA); Bidirectional and Auto-Regressive Transformers (BART); cosine document similarity; AAAI2021-COVID19 Fake News Detection dataset;

