AUTONOMOUS REPAIR OF OPTICAL CHARACTER RECOGNITION DATA THROUGH SIMPLE VOTING AND MULTI-DIMENSIONAL INDEXING TECHNIQUES
- Date Issued: 2005
- Abstract/Description: The three major optical character recognition (OCR) engines in use today (ExperVision, Scansoft OCR, and Abby OCR) are all capable of recognizing text with near-perfect accuracy. The remaining errors, however, have proven very difficult to identify within a single engine. Recent research has shown that the errors of the three engines have very little correlation with one another, and thus the engines, when used in conjunction, may increase the accuracy of the final result. This document discusses the implementation and results of a simple voting system designed to prove this hypothesis and show a statistical improvement in overall accuracy. Additional aspects of implementing an improved OCR scheme, such as aligning the output of multiple engines and recognizing application-specific solutions, are also addressed in this research. Although voting systems are currently in use by many major OCR engine developers, this research focuses on the addition of a collaborative system that can exploit the strengths of multiple engines while also addressing the immediate need for practical industry applications such as litigation and forms processing. Doculex™, a major developer and leader in the document imaging industry, has provided the funding for this research.
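
The voting idea summarized in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: it assumes the three engine outputs have already been aligned to the same length (the abstract notes that alignment is a separate problem), and the function name `vote_characters`, the `fallback_index` parameter, and the sample strings are hypothetical.

```python
from collections import Counter


def vote_characters(aligned_outputs, fallback_index=0):
    """Majority vote per character position over pre-aligned OCR outputs.

    Assumption: every string in aligned_outputs has the same length, so
    position i refers to the same character on the page in each engine.
    """
    if not aligned_outputs:
        return ""
    length = len(aligned_outputs[0])
    if any(len(text) != length for text in aligned_outputs):
        raise ValueError("outputs must be aligned to the same length")

    result = []
    for candidates in zip(*aligned_outputs):
        winner, count = Counter(candidates).most_common(1)[0]
        # Accept the winner only when a strict majority of engines agree;
        # otherwise fall back to one designated engine.
        if count * 2 > len(candidates):
            result.append(winner)
        else:
            result.append(candidates[fallback_index])
    return "".join(result)


# Hypothetical outputs, each with one error at a different position.
engine_a = "The qu1ck brown fox"
engine_b = "The quick brown f0x"
engine_c = "Tne quick brown fox"
print(vote_characters([engine_a, engine_b, engine_c]))
```

Because each sample string is wrong at a different position, the other two engines outvote the error every time, which is the situation the abstract's low-correlation hypothesis relies on.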
Title: | AUTONOMOUS REPAIR OF OPTICAL CHARACTER RECOGNITION DATA THROUGH SIMPLE VOTING AND MULTI-DIMENSIONAL INDEXING TECHNIQUES | |
---|---|---|
Name(s): | Sprague, Christopher (Author); Weeks, Arthur (Committee Chair); University of Central Florida (Degree Grantor) | |
Type of Resource: | text | |
Date Issued: | 2005 | |
Publisher: | University of Central Florida | |
Language(s): | English | |
Identifier: | CFE0000380 (IID), ucf:46337 (fedora) | |
Note(s): | 2005-05-01; M.S.Cp.E.; Engineering and Computer Science, Department of Electrical and Computer Engineering; Masters. This record was generated from author-submitted information. | |
Subject(s): | optical character recognition; OCR; simple voting; multi-dimensional indexing | |
Persistent Link to This Record: | http://purl.flvc.org/ucf/fd/CFE0000380 | |
Restrictions on Access: | public | |
Host Institution: | UCF |