Abstract
A new post-processing system for enhancing OCR-produced text is proposed, which improves automatic data acquisition for large full-text Information Retrieval systems. The idea is to match the output of several OCR devices, thereby detecting possible errors, and to suggest possible corrections based on statistical information and dictionaries. The results of testing the new method on OCR systems for several languages are reported.
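The matching idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `reconcile`, the assumption that the OCR outputs are already aligned token by token, and the ranking rule (dictionary membership first, then vote count) are all hypothetical choices made for illustration.

```python
from collections import Counter


def reconcile(outputs, dictionary):
    """Compare aligned token lists from several OCR engines.

    outputs: list of token lists, one per engine (assumed equal length
             and already aligned; real OCR output would need alignment).
    dictionary: set of known lowercase words.
    Returns a list of (chosen_token, was_disputed) pairs.
    """
    result = []
    for candidates in zip(*outputs):
        votes = Counter(candidates)
        if len(votes) == 1:
            # All engines agree: accept the token as-is.
            result.append((candidates[0], False))
            continue
        # Engines disagree: flag a possible error and suggest the
        # candidate that is in the dictionary, breaking ties by votes.
        best = max(votes, key=lambda w: (w.lower() in dictionary, votes[w]))
        result.append((best, True))
    return result


engine_outputs = [
    ["the", "qu1ck", "brown"],
    ["the", "quick", "brovvn"],
    ["the", "quick", "brown"],
]
words = {"the", "quick", "brown"}
corrected = reconcile(engine_outputs, words)
print(corrected)
```

Positions where the engines disagree are marked as disputed, so a downstream step could route them to manual review rather than trusting the suggestion.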
Original language | American English |
---|---|
Title of host publication | Proc. Workshop on Information Retrieval and OCR at SIGIR-02 |
State | Published - 2002 |