Enhancement of Digitized Book Pages

Lazarov, T. and Gluhchev, G. (2008) Enhancement of Digitized Book Pages. In: Proceedings of the Sixth Conference on Informatics and Information Technology. Institute of Informatics, Faculty of Natural Sciences and Mathematics, Ss. Cyril and Methodius University in Skopje, Macedonia, Skopje, Macedonia, pp. 55-59. ISBN 978-9989-668-78-4

Full text: 978-9989-668-78-4_pp55-59.pdf (PDF, 270 kB)
Official URL: http://ciit.finki.ukim.mk

Abstract

When examining text materials, researchers often deal with documents that are old, authentic, rare, or of poor quality. Access to such documents is frequently limited because of their condition and nature. Researchers who study texts need the sources in a convenient format that makes exploring the data easier. Modern technology makes this possible once the materials are digitized, for example scanned or photographed with a camera. Although images are more convenient to work with, they take up a significant amount of disk space; the same text stored in a plain text file would take tens of times less space. Creating text files manually from the source materials is time-consuming and laborious, and automating that process is a challenge for many scientists. Today there are many OCR (Optical Character Recognition) tools that can recognize characters in digitized materials to some extent. However, in many cases the image quality is so poor that it is almost impossible for the tool to recognize the characters. Often the text is hard to read, there are shadows from the scanning process, parts of the page have been burnt, or the paper is somewhat transparent so that the content on the back side is faintly visible. Image quality enhancement is needed to improve the readability of the digitized source data and to ease the OCR process. This work describes several methods and techniques for enhancing images that contain text, so that OCR tools have a clean source to work on. The process consists of two basic phases: (1) contrast enhancement and noise filtering, and (2) separation of the objects from the background, which makes the objects easier to distinguish.
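The abstract names the two phases but not the specific algorithms used in the paper. As a hypothetical illustration only, the sketch below fills phase one with linear contrast stretching followed by a 3x3 median filter, and phase two with Otsu's global threshold. These technique choices and all function names (`stretch_contrast`, `median_filter`, `otsu_threshold`, `binarize`, `enhance`) are assumptions, not the authors' method. Images are plain lists of grayscale rows with values 0 to 255, so no third-party libraries are needed.

```python
# Hypothetical two-phase sketch (assumed techniques, not the paper's method):
#   phase 1: contrast stretching + 3x3 median filter (noise filtering)
#   phase 2: Otsu global threshold to separate text (0) from background (255)
from statistics import median_low

Image = list[list[int]]  # grayscale rows, pixel values 0..255


def stretch_contrast(img: Image) -> Image:
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def median_filter(img: Image) -> Image:
    """3x3 median filter; at the borders only in-image neighbours are used."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(median_low(window))  # median_low keeps an actual pixel value
        out.append(row)
    return out


def otsu_threshold(img: Image) -> int:
    """Return t maximising between-class variance (class 0: p <= t)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img: Image, t: int) -> Image:
    """Pixels at or below t become ink (0); the rest become paper (255)."""
    return [[0 if p <= t else 255 for p in row] for row in img]


def enhance(img: Image) -> Image:
    cleaned = median_filter(stretch_contrast(img))  # phase 1
    return binarize(cleaned, otsu_threshold(cleaned))  # phase 2


if __name__ == "__main__":
    # Low-contrast page: grey paper (160), a dark stroke (100), one dark speck (40).
    page = [[160, 160, 100, 100, 160, 160] for _ in range(5)]
    page[2][5] = 40
    for row in enhance(page):
        print(row)
```

In this toy page the speck is removed by the median filter before thresholding, so only the stroke columns become 0. A global threshold is the simplest choice for phase two; pages with uneven illumination or shadows from scanning would likely need a local (adaptive) threshold instead.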

Item Type: Book Section
Subjects: International Conference on Informatics and Information Technologies > Digitization Trends and technologies
Depositing User: Vangel Ajanovski
Date Deposited: 28 Oct 2016 00:15
Last Modified: 28 Oct 2016 00:15
URI: http://eprints.finki.ukim.mk/id/eprint/11293
