Mathematical Data Models and Context-Based Features for Enhancing Historical Degraded Manuscripts Using Neural Network Classification

Authors

  • Pasquale Savino

    (Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via G. Moruzzi, 1, 56124 Pisa, Italy)

  • Anna Tonazzini

    (Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via G. Moruzzi, 1, 56124 Pisa, Italy)

Abstract

A common cause of deterioration in historical manuscripts is ink showing through or bleeding through from the opposite side of the page. Philologists and paleographers benefit significantly when these interferences are reduced before they attempt to decipher the original text, and computer-aided text analysis gains from the same enhancement. In previous work, we proposed the use of neural networks (NNs) in combination with a data model that characterizes the damage when both sides of a page have been digitized. This approach has the distinct advantage of allowing the creation of an artificial training set that teaches the NN to differentiate between clean and damaged pixels. We tested this concept using a shallow NN, which proved effective in classifying texts with varying levels of deterioration. In this study, we adapt the NN design to tackle the remaining classification uncertainties caused by areas of text overlap, inhomogeneity, and peaks of degradation. Specifically, we introduce a new output class for pixels within overlapping text areas and incorporate additional features based on pixel context information, so that adjacent pixels tend to receive the same classification. Our experiments demonstrate that these enhancements significantly improve the classification accuracy. The improvement is evident in the quality of both binarization, which aids text analysis, and virtual restoration, which aims to recover the manuscript’s original appearance. Tests conducted on a public dataset, using standard quality indices, show that the proposed method outperforms both our previous proposals and other notable methods in the literature.
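
To make the classification setup described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the recto and the mirrored verso are already registered pixel by pixel, that the four class names are hypothetical labels (the abstract names only clean pixels, damaged pixels, and a new overlap class), and that a 3x3 neighborhood mean is one plausible stand-in for the "pixel context" features, whose exact definition is not given here.

```python
# Hypothetical class labels; the abstract only states that a separate
# output class is introduced for pixels in overlapping text areas.
BACKGROUND, FOREGROUND, BLEED_THROUGH, OVERLAP = 0, 1, 2, 3


def label_pixel(recto_ink: bool, verso_ink: bool) -> int:
    """Ground-truth label for one pixel of a synthetic recto/verso pair.

    recto_ink / verso_ink say whether the ideal (undamaged) text map of
    each side has ink at this location.
    """
    if recto_ink and verso_ink:
        return OVERLAP          # main text and seeped text coincide
    if recto_ink:
        return FOREGROUND       # genuine text of the side being restored
    if verso_ink:
        return BLEED_THROUGH    # ink visible only from the opposite side
    return BACKGROUND


def local_mean(img, r, c, radius=1):
    """Mean intensity in a (2*radius+1)^2 window, clipped at the borders."""
    rows, cols = len(img), len(img[0])
    vals = [
        img[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    return sum(vals) / len(vals)


def feature_vector(recto, verso, r, c):
    """Per-pixel input: both intensities plus an assumed context feature
    (neighborhood mean) for each side."""
    return [
        recto[r][c],
        verso[r][c],
        local_mean(recto, r, c),
        local_mean(verso, r, c),
    ]
```

In such a setup, a shallow classifier would be trained on `feature_vector` outputs paired with `label_pixel` targets generated from synthetic page pairs. The context features give neighboring pixels similar inputs, which is one way to encourage them to receive the same class.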

Suggested Citation

  • Pasquale Savino & Anna Tonazzini, 2024. "Mathematical Data Models and Context-Based Features for Enhancing Historical Degraded Manuscripts Using Neural Network Classification," Mathematics, MDPI, vol. 12(21), pages 1-13, October.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:21:p:3402-:d:1510653

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/21/3402/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/21/3402/
    Download Restriction: no