Measuring the Feasibility of a Question and Answering System for the Sarawak Gazette Using Chatbot Technology

Author

Listed:
  • Yasir Lutfan bin Yusuf
  • Suhaila binti Saee

Abstract

Background: The Sarawak Gazette is a critical repository of information on Sarawak's history. It has received much attention over the last two decades, with prior studies focusing on digitizing the gazette and extracting its ontologies to make it more accessible. However, a question answering system for the Sarawak Gazette, another avenue for improving accessibility, has been overlooked.

Objective: This study created a new system that uses chatbot technology to generate answers to user questions about the gazette.

Methods: The system sends user queries to a context retrieval system, then generates an answer from the retrieved contexts using a Large Language Model. To evaluate the system, a question answering dataset was also created using a Large Language Model, and its quality was assessed by 10 annotators.

Results: The system achieved 55% higher precision and 42% higher recall than the previous state of the art in historical document question answering, while sacrificing only 11% of cosine similarity. Overall, the annotators rated the dataset 2.9 out of 3.

Conclusion: The system could answer the general public's questions about the Sarawak Gazette more directly and in a friendlier manner than traditional information retrieval methods. The methods developed in this study are also applicable to other Malaysian historical texts written in English. All code used in this study has been released on GitHub.
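
The retrieve-then-generate flow described under Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine retriever, the prompt wording, the function names (`retrieve`, `build_prompt`, `answer`), and the `echo_llm` stand-in for a Large Language Model are all assumptions made for illustration, and the sample passages are invented placeholders rather than text from the Sarawak Gazette.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query (assumed retriever)."""
    q = Counter(tokenize(query))
    scored = [(cosine_similarity(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query, contexts):
    """Combine the retrieved contexts and the question into one prompt."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\nAnswer:"
    )


def answer(query, passages, generate, k=2):
    """Retrieve contexts, then pass the prompt to `generate` (any LLM call)."""
    contexts = retrieve(query, passages, k)
    return generate(build_prompt(query, contexts))


def echo_llm(prompt):
    """Hypothetical stand-in for a Large Language Model: returns the prompt."""
    return prompt


if __name__ == "__main__":
    # Invented placeholder passages, not real gazette content.
    sample_passages = [
        "The river port handled pepper and sago exports.",
        "A new school opened in the town last month.",
        "Rainfall was heavy during the monsoon season.",
    ]
    print(answer("What exports did the port handle?", sample_passages, echo_llm, k=1))
```

In a real system, `generate` would wrap a call to whichever Large Language Model is in use, and the retriever would more likely rely on dense embeddings than on word counts; the sketch only shows how the two stages connect.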

Suggested Citation

  • Yasir Lutfan bin Yusuf & Suhaila binti Saee. "Measuring the Feasibility of a Question and Answering System for the Sarawak Gazette Using Chatbot Technology," Acta Informatica Pragensia, Prague University of Economics and Business, vol. 0.
  • Handle: RePEc:prg:jnlaip:v:preprint:id:263
    DOI: 10.18267/j.aip.263

Download full text from publisher

File URL: http://aip.vse.cz/doi/10.18267/j.aip.263.html
Download Restriction: free of charge
