TEXT-IMAGE RESTORATION AND TEXT ALIGNMENT FOR MULTI-ENGINE OPTICAL CHARACTER RECOGNITION SYSTEMS
Title: | TEXT-IMAGE RESTORATION AND TEXT ALIGNMENT FOR MULTI-ENGINE OPTICAL CHARACTER RECOGNITION SYSTEMS. | |
---|---|---|
Name(s): | Kozlovski, Nikolai, Author; Weeks, Arthur, Committee Chair; University of Central Florida, Degree Grantor | |
Type of Resource: | text | |
Date Issued: | 2006 | |
Publisher: | University of Central Florida | |
Language(s): | English | |
Abstract/Description: | Previous research showed that combining the results of three different optical character recognition (OCR) engines (ExperVision® OCR, Scansoft OCR, and Abbyy® OCR) using voting algorithms yields a higher accuracy rate than any of the engines achieves individually. While a voting algorithm has been realized, several aspects of automating the system and improving its accuracy rate needed further research. This thesis focuses on morphological image preprocessing and morphological restoration of the text images that are passed to the OCR engines. The method is similar to the one used in the restoration of partial fingerprints. A series of morphological dilating and eroding filters of various mask shapes and sizes was applied to text of different font sizes and types with various kinds of noise added. These images were then processed by the OCR engines, and based on the results, successful combinations of text, noise, and filters were chosen. The thesis also deals with the problem of text alignment. Each OCR engine has its own way of dealing with noise and corrupted characters; as a result, the output texts of the OCR engines differ in length and in number of words. This, in turn, makes it impossible to use spaces as a delimiter to separate the words for processing by the voting part of the system. Text alignment determines, using various techniques, what is an extra word, what is supposed to be two or more words instead of one, which words are missing in one document compared to the other, and so on. The alignment algorithm is made up of a series of shifts in the two texts to determine which parts are similar and which are not. Since errors made by OCR engines are due to visual misrecognition, in addition to simple character comparison (equal or not), a technique was developed that allows comparison of characters based on how they look. | |
Identifier: | CFE0001060 (IID), ucf:46799 (fedora) | |
Note(s): | 2006-05-01; M.S.E.E.; Engineering and Computer Science, Department of Electrical and Computer Engineering; Masters; This record was generated from author submitted information. | |
Subject(s): | IMAGE RESTORATION; ALIGNMENT; OCR; MULTI-ENGINE; MULTI ENGINE; MULTIENGINE; OPTICAL CHARACTER RECOGNITION; VISUAL CHARACTER COMPARISON | |
Persistent Link to This Record: | http://purl.flvc.org/ucf/fd/CFE0001060 | |
Restrictions on Access: | public | |
Host Institution: | UCF |
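
The abstract's idea of aligning two OCR outputs while comparing characters "based on how they look" can be illustrated with a small sketch. This is not the thesis's shift-based algorithm, whose details the record does not give: it swaps in a standard dynamic-programming global alignment, and the `LOOKALIKES` table, the cost values, and the function names are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: a standard dynamic-programming alignment, NOT the
# shift-based procedure described in the abstract.

# Hypothetical table of visually confusable character pairs (an assumption).
LOOKALIKES = {frozenset(p) for p in ("l1", "I1", "Il", "O0", "o0", "S5", "B8")}

GAP = 1.0  # assumed cost of a character present in only one text


def char_cost(a, b):
    """Return 0 for equal characters, a reduced cost for look-alikes, else 1."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in LOOKALIKES:
        return 0.25  # assumed penalty for a visual misrecognition
    return 1.0


def align(s, t, gap="-"):
    """Globally align s and t; return (aligned_s, aligned_t, total_cost)."""
    n, m = len(s), len(t)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * GAP
    for j in range(1, m + 1):
        cost[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + char_cost(s[i - 1], t[j - 1]),
                cost[i - 1][j] + GAP,  # character of s has no partner in t
                cost[i][j - 1] + GAP,  # character of t has no partner in s
            )
    # Trace back from the bottom-right corner to recover the alignment.
    i, j = n, m
    out_s, out_t = [], []
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + char_cost(s[i - 1], t[j - 1])):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + GAP:
            out_s.append(s[i - 1])
            out_t.append(gap)
            i -= 1
        else:
            out_s.append(gap)
            out_t.append(t[j - 1])
            j -= 1
    return "".join(reversed(out_s)), "".join(reversed(out_t)), cost[n][m]
```

For example, `align("the cat", "thecat")` places the gap opposite the missing space, which is the kind of information a voting stage would need when one engine merges two words, and `align("cl0se", "close")` treats the `0`/`o` mismatch as a cheap visual confusion rather than a full substitution.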