
Text segmentation for Chinese spell checking

Author

Listed:
  • Kin Hong Lee
  • Mau Kit Michael Ng
  • Qin Lu

Abstract

Chinese spell checking is different from its counterparts for Western languages because Chinese words in texts are not separated by spaces. Chinese spell checking in this article refers to how to identify the misuse of characters in text composition. In other words, it is error correction at the word level rather than at the character level. Before Chinese sentences are spell checked, the text is segmented into semantic units. Error detection can then be carried out on the segmented text based on thesaurus and grammar rules. Segmentation is not a trivial process due to ambiguities in the Chinese language and errors in texts. Because it is not practical to define all Chinese words in a dictionary, words not predefined must also be dealt with. The number of word combinations increases exponentially with the length of the sentence. In this article, a Block‐of‐Combinations (BOC) segmentation method based on frequency of word usage is proposed to reduce the word combinations from exponential growth to linear growth. From experiments carried out on Hong Kong newspapers, BOC can correctly solve 10% more ambiguities than the Maximum Match segmentation method. To make the segmentation more suitable for spell checking, user interaction is also suggested.
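
The abstract does not give the details of BOC, so the following is only a minimal sketch of the two general ideas it contrasts: a greedy forward Maximum Match baseline and a frequency-guided segmenter. The frequency-guided version is a standard dynamic-programming (unigram) segmenter, swapped in for illustration; it is not the authors' BOC method. The lexicon, its counts, the unknown-character penalty, and the function names are all hypothetical. A sentence of n characters has 2^(n-1) possible segmentations, but with a bounded maximum word length the dynamic program examines at most n × max_len candidate words.

```python
import math

# Hypothetical toy lexicon: word -> usage count (numbers invented for illustration).
FREQ = {
    "研究": 50, "研究生": 20, "生命": 40, "命": 5, "生": 10,
    "的": 200, "起源": 30, "起": 8, "源": 4,
}
MAX_LEN = max(len(w) for w in FREQ)


def maximum_match(text, lexicon=FREQ, max_len=MAX_LEN):
    """Forward maximum match: greedily take the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted, so words that are
            # not predefined in the lexicon still pass through.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def frequency_segment(text, freq=FREQ, max_len=MAX_LEN):
    """Pick the segmentation with the highest product of word frequencies.

    best[k] holds the best log-score for text[:k]; back[k] holds the start
    index of the last word in that best segmentation.
    """
    total = sum(freq.values())
    unknown = math.log(0.5 / total)  # assumed penalty for an unlisted character
    best = [0.0] + [-math.inf] * len(text)
    back = [0] * (len(text) + 1)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in freq:
                score = math.log(freq[piece] / total)
            elif end - start == 1:
                score = unknown
            else:
                continue  # multi-character strings must be in the lexicon
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Walk the back-pointers from the end of the sentence to recover the words.
    words, end = [], len(text)
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]
```

With this toy lexicon, `maximum_match("研究生命的起源")` greedily takes 研究生 first and is left with a stray 命, while `frequency_segment` returns 研究 / 生命 / 的 / 起源 because that combination has the higher frequency score. The outcome depends entirely on the invented counts and says nothing about the accuracy figures reported in the article.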

Suggested Citation

  • Kin Hong Lee & Mau Kit Michael Ng & Qin Lu, 1999. "Text segmentation for Chinese spell checking," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 50(9), pages 751-759.
  • Handle: RePEc:bla:jamest:v:50:y:1999:i:9:p:751-759
    DOI: 10.1002/(SICI)1097-4571(1999)50:93.0.CO;2-P

    Download full text from publisher

    File URL: https://doi.org/10.1002/(SICI)1097-4571(1999)50:93.0.CO;2-P
    Download Restriction: no