This model converts scanned images of text, or images with embedded text, into electronic text. It accepts scanned images in multiple formats, including JPG, PNG, and many others, and the input can include tables. It produces output in PDF, TSV, plain text, and other formats. The model also supports user-supplied patterns and words, and 107 writing systems (scripts) and languages. It does not process color images or recognize handwriting. Typical uses include recovering electronic text from printouts, archival paper documents, and web pages that contain only images of text.
96.6% Word Accuracy (300 dpi 8-bit gray scale) and 96.3% Word Accuracy (300 dpi bitonal)
The training of the adaptive classifier uses a small amount of data: 20 samples of each of 94 characters from 8 fonts in a single size, in four styles (normal, bold, italic, and bold italic), giving a total of 60,160 training samples. This model achieved 96.34% word accuracy on a standard collection of documents provided by DOE when scanned at 300 dpi bitonal, and 96.62% when scanned at 300 dpi 8-bit gray scale, as measured at the Fourth Annual Test of OCR Accuracy. For faxed versions of the business letters (fine mode fax), the model achieved 95.30% word accuracy. This model handles multiple image formats, handles tables, and performs page segmentation. It supports user-supplied patterns and words, and 107 writing systems (scripts) and languages. It does not process color images or recognize handwriting.
Word accuracy measures how close the output text is to the reference text at the word level, using the Levenshtein distance. If the reference text is a substring of the output text, the output is considered correct.
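The metric above can be illustrated with a minimal Python sketch. The page does not give the exact formula, so the normalization used here (one minus the word-level edit distance divided by the number of reference words, floored at zero) is an assumption, and the function names `levenshtein` and `word_accuracy` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_accuracy(reference, output):
    """Word-level accuracy in [0, 1]; the normalization is an assumed formula."""
    ref_words = reference.split()
    out_words = output.split()
    if not ref_words:
        return 1.0 if not out_words else 0.0
    # Substring rule from the description: if the reference text appears
    # inside the output text, the output counts as fully correct.
    if " ".join(ref_words) in " ".join(out_words):
        return 1.0
    distance = levenshtein(ref_words, out_words)
    return max(0.0, 1.0 - distance / len(ref_words))


print(word_accuracy("the quick brown fox", "the quick brawn fox"))
```

Comparing lists of words rather than strings is what makes the distance word-level: a single misrecognized character costs one whole word.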
This model uses several processing steps. A connected-component analysis stores the outlines of the components, and these outlines are gathered into approximate shapes (‘blobs’). Word recognition then runs as a two-phase process: a first pass attempts to recognize each word, and each satisfactory word is passed to an adaptive classifier as training data.
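To make the connected-components step more concrete, here is a simplified Python sketch. It is not the model's implementation: it labels foreground pixels with a breadth-first search and reports a bounding box per component, whereas the description above says the model stores component outlines. The 8-connectivity choice, the list-of-lists image format, and the function names are all assumptions made for illustration.

```python
from collections import deque


def connected_components(image):
    """Group foreground pixels (value 1) into 8-connected components.

    `image` is a list of rows of 0/1 values (an assumed, simplified format).
    Returns a list of components, each a list of (row, col) pixels.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) for one component."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return (min(ys), min(xs), max(ys), max(xs))


page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
for component in connected_components(page):
    print(bounding_box(component))
```

In a fuller pipeline, components like these would be grouped into blobs and then into text lines and words before recognition.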
The training dataset is small: 20 samples of each of 94 characters from 8 fonts in a single size, in four styles (normal, bold, italic, and bold italic), giving a total of 60,160 training samples.
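The total follows directly from multiplying the four factors, as this short Python check shows. The variable names are illustrative only.

```python
samples_per_character = 20
characters = 94
fonts = 8
styles = ["normal", "bold", "italic", "bold italic"]

# Every character is sampled 20 times in each font and each style.
total_training_samples = samples_per_character * characters * fonts * len(styles)
print(total_training_samples)  # 60160
```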
This model was tested against several datasets, representing different types of documents: original business letters (319K characters), a sample from DOE (1.4M characters), a sample of magazines (666K characters), English newspapers (492K characters), and Spanish newspapers (348K characters). In each of these tests, generally done with two different scanning resolutions, the metrics included the number of errors and the accuracy, at both the word and character levels.
The input(s) to this model must adhere to the following specifications:
This model will output the following: