Printed from https://ideas.repec.org/a/igg/jaci00/v13y2022i1p1-21.html

Script-Independent Text Segmentation from Document Images

Authors
  • Parul Sahare

    (Indian Institute of Information Technology, Nagpur, India)

  • Jitendra V. Tembhurne

    (Indian Institute of Information Technology, Nagpur, India)

  • Mayur R. Parate

    (Indian Institute of Information Technology, Nagpur, India)

  • Tausif Diwan

    (Indian Institute of Information Technology, Nagpur, India)

  • Sanjay B. Dhok

    (Visvesvaraya National Institute of Technology, Nagpur, India)

Abstract

Document image analysis is widely used to retrieve information from digitized material, with applications such as optical character recognition (OCR), indexing of digital libraries, and web image processing. Text segmentation is one of its key steps, and it becomes difficult for documents with uneven spacing and characters of varying font sizes. This paper presents script-independent algorithms for text-line segmentation and word segmentation. Text-line segmentation uses the fast marching method as a region-growing process that detects potential text-lines. Word segmentation combines the wavelet transform with connected component (CC) labeling: an energy map computed with the wavelet transform is used to form text-blocks. Both algorithms are evaluated on several databases containing documents in different scripts, with highest text-line and word segmentation accuracies of 98.9% and 99.1%, respectively.
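The abstract names the two building blocks but not their parameters, so the Python sketch below only illustrates generic versions of each under stated assumptions; it is not the authors' implementation. `fast_marching` is a standard first-order fast marching solver for the Eikonal equation |∇T| = 1/F on a 4-connected pixel grid, and keeping the pixels whose arrival time T stays below some cut-off is one way a propagating front can act as a region-growing step. The speed map, seed pixels, and cut-off are hypothetical choices (for instance, speed near 1 on text pixels and 0 on background). For the word stage, `haar_energy` assumes a one-level 2-D Haar transform on 2×2 blocks and sums the squared detail coefficients, and `word_blocks` thresholds that energy map and labels 8-connected components as candidate text-blocks. The Haar wavelet, single decomposition level, threshold, and 8-connectivity are all assumptions; the paper may use a different wavelet, more levels, or extra post-processing.

```python
import heapq
import math
from collections import deque

INF = math.inf


def fast_marching(speed, seeds):
    """First-order fast marching on a 4-connected grid (sketch).

    Returns arrival times T of a front starting at `seeds` (T = 0) with
    |grad T| = 1 / speed. Pixels with speed <= 0 are never entered.
    """
    h, w = len(speed), len(speed[0])
    T = [[INF] * w for _ in range(h)]
    done = [[False] * w for _ in range(h)]
    heap = [(0.0, r, c) for r, c in seeds]
    for _, r, c in heap:
        T[r][c] = 0.0
    heapq.heapify(heap)

    def val(r, c):
        # Off-grid neighbours act like pixels the front has not reached.
        return T[r][c] if 0 <= r < h and 0 <= c < w else INF

    while heap:
        _, r, c = heapq.heappop(heap)
        if done[r][c]:
            continue  # stale heap entry
        done[r][c] = True  # smallest tentative time is now final
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < h and 0 <= nc < w) or done[nr][nc] or speed[nr][nc] <= 0:
                continue
            a = min(val(nr - 1, nc), val(nr + 1, nc))  # vertical neighbour
            b = min(val(nr, nc - 1), val(nr, nc + 1))  # horizontal neighbour
            f = 1.0 / speed[nr][nc]
            if abs(a - b) < f:
                # Both directions contribute: solve (t-a)^2 + (t-b)^2 = f^2.
                t = (a + b + math.sqrt(2 * f * f - (a - b) ** 2)) / 2
            else:
                t = min(a, b) + f  # only the earlier neighbour contributes
            if t < T[nr][nc]:
                T[nr][nc] = t
                heapq.heappush(heap, (t, nr, nc))
    return T


def haar_energy(img):
    """One-level 2-D Haar transform on 2x2 blocks; returns the per-block
    sum of squared detail coefficients at half resolution."""
    h, w = len(img) // 2, len(img[0]) // 2
    energy = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            lh = (a + b - c - d) / 2  # top minus bottom: horizontal edges
            hl = (a - b + c - d) / 2  # left minus right: vertical edges
            hh = (a - b - c + d) / 2  # diagonal detail
            energy[i][j] = lh * lh + hl * hl + hh * hh
    return energy


def label_components(mask):
    """8-connected component labelling by BFS; returns (labels, count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not labels[i][j]:
                count += 1
                labels[i][j] = count
                queue = deque([(i, j)])
                while queue:
                    r, c = queue.popleft()
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            nr, nc = r + dr, c + dc
                            if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not labels[nr][nc]:
                                labels[nr][nc] = count
                                queue.append((nr, nc))
    return labels, count


def word_blocks(img, threshold):
    """Threshold the Haar energy map and label candidate text-blocks."""
    energy = haar_energy(img)
    mask = [[e > threshold for e in row] for row in energy]
    return label_components(mask)
```

As a usage example, `T = fast_marching(speed, [(row, 0)])` followed by keeping the pixels with `T[r][c] <= t_max` (where `t_max` is a hypothetical cut-off) yields one grown region, and `word_blocks(img, 0.5)` on a binary image returns a half-resolution label grid together with the number of candidate blocks found.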

Suggested Citation

  • Parul Sahare & Jitendra V. Tembhurne & Mayur R. Parate & Tausif Diwan & Sanjay B. Dhok, 2022. "Script-Independent Text Segmentation from Document Images," International Journal of Ambient Computing and Intelligence (IJACI), IGI Global, vol. 13(1), pages 1-21, January.
  • Handle: RePEc:igg:jaci00:v:13:y:2022:i:1:p:1-21

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/IJACI.313967
    Download Restriction: no

