OCR with OmniPage (Windows)

Synopsis

OmniPage version 18 is an optical character recognition (OCR) tool for the Windows platform and is available in the CLC Student Computing Labs. OCR software identifies printed characters in an image and converts them to editable text. With OmniPage you can scan a printed document or process an existing digital file such as a PDF, JPG, or PNG. This matters because the recognized text can then be formatted so that it is accessible to screen readers used by people with visual impairments.

Note: Similar products exist for the Mac. A Google search should reveal the latest software packages.

Basics of using OmniPage to OCR a document

Here are the basic steps to OCR a document in OmniPage version 18, which is available in the CLC Windows computer labs at Penn State.

Select your document. OmniPage gives you two ways to bring in a document for OCR: directly from a scanner, or from a digital copy stored on your computer.
Start OCR.
Process the document.
Proofread document.
Name and Save.
Format the document after OCR is complete. We recommend Rich Text Format (RTF) for compatibility. Saving the document as RTF lets you move its content into other programs, such as Microsoft Word or a WYSIWYG editor such as Dreamweaver, with a simple cut and paste. You can then format the document for accessibility.

Selecting your document

Select the Open File button on the Start page to OCR an existing digital file. Alternatively, select the Scan Document button to import a hard-copy document from a connected scanner.

Start OCR

Select Process from the menu bar.
Select Perform OCR and then Start to begin processing the imported document. Be sure that Automatic is selected.

Start OCR with Workflow Process

Alternatively, select the Workflow button to begin the OCR process. The options arrow to the right of the Workflow button offers another way to import documents and begin a new scan.

Proofread document

In the OCR Proofreader pane, unidentified characters are highlighted and you are asked to identify them. The Proofreader offers suggested options. You can choose one of them, ignore the character and leave it unidentified, or manually type the correct characters in the highlighted area.
When you have finished proofing the unidentified characters, select Document Ready.
The Start Automatic Processing pane will open.
Select the Finish processing existing pages without adding new ones radio button if there are no other pages to OCR.
Select Start.
OmniPage will then OCR the document.

Figure: screenshot of the Start Automatic Processing pane.

Name, Save and Format

In the Save to File pane, enter a title for the document under File name and choose the location where you want to save it.
In the File Type drop-down menu, be sure to select RTF. We recommend Rich Text Format (RTF) for compatibility. Saving the document as RTF lets you move its content into other programs, such as Microsoft Word or a WYSIWYG editor such as Dreamweaver, with a simple cut and paste. You can then format the document for accessibility.

Note: There is an HTML option in the Text Format drop-down menu. However, the HTML file it creates requires extensive editing of the HTML code before it is accessible.
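
If you do work with the HTML output, a small script can help you find where the editing is needed. The following is a minimal sketch in Python, not part of OmniPage. It uses only the standard library's `html.parser` module. This page does not say which problems OmniPage's HTML export produces, so the three checks here (images without an `alt` attribute, a missing `lang` attribute on `<html>`, and no heading elements) are assumptions chosen for illustration. The names `AccessibilityAudit` and `audit_html` are hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class AccessibilityAudit(HTMLParser):
    """Collects a few common accessibility gaps (illustrative checks only)."""

    def __init__(self):
        super().__init__()
        self.issues = []
        self.has_lang = False
        self.heading_count = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            # Screen readers use the lang attribute to pick a pronunciation.
            self.has_lang = True
        elif tag == "img" and "alt" not in attributes:
            # An image with no alt attribute gives a screen reader nothing to read.
            line, _ = self.getpos()
            self.issues.append(f"line {line}: <img> has no alt attribute")
        elif tag in HEADING_TAGS:
            self.heading_count += 1


def audit_html(markup):
    """Return a list of human-readable issues found in an HTML string."""
    parser = AccessibilityAudit()
    parser.feed(markup)
    parser.close()
    issues = list(parser.issues)
    if not parser.has_lang:
        issues.append("document: <html> has no lang attribute")
    if parser.heading_count == 0:
        issues.append("document: no heading elements (h1-h6) found")
    return issues
```

To use it, read the exported file into a string and pass it to `audit_html`. An empty list means none of these three checks fired. It does not mean the page is fully accessible, so a manual review with a screen reader is still needed.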