Authors listed:
- Rohit Raju
(Department of Computer Science, University of Colorado, Boulder, CO, USA; Department of Computer Science & Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India)
- Peeta Basa Pati
(Department of Computer Science & Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India)
- SA Gandheesh
(Department of Computer Science & Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India)
- Gayatri Sanjana Sannala
(Department of Computer Science & Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India)
- KS Suriya
(Department of Computer Science & Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India)
Abstract
Text remains a relevant form of representing information. Text documents are either created on digital-native platforms or obtained by converting other media, such as images and speech. While digital-native text is invariably entered through physical or virtual keyboards, technologies such as OCR and speech recognition are utilised to transform images and speech signals into text. Each of these text-generation mechanisms also introduces errors into the captured text. This project analyses the different kinds of errors that occur in text documents. The work employs two advanced deep neural network-based language models, BART and MarianMT, to rectify the anomalies present in the text. Both models are fine-tuned for error correction through transfer learning on an available dataset. A comparative study is conducted to investigate how effectively each model handles each of the defined error categories. It is observed that while both models reduce the number of erroneous sentences by more than 20%, BART handles spelling errors far better (24.6%) than grammatical errors (8.8%).
Suggested Citation
Rohit Raju & Peeta Basa Pati & SA Gandheesh & Gayatri Sanjana Sannala & KS Suriya, 2024.
"Grammatical versus Spelling Error Correction: An Investigation into the Responsiveness of Transformer-Based Language Models Using BART and MarianMT,"
Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 23(03), pages 1-33, June.
Handle:
RePEc:wsi:jikmxx:v:23:y:2024:i:03:n:s0219649224500370
DOI: 10.1142/S0219649224500370