Embedding Text Information in Scanned Data

You can use the OCR function to embed text information in the scanned document without having to process the data on your computer.

For details about the optional units required for this function, see "Functions Requiring Optional Configurations" in Getting Started.
This function supports the following file types: [PDF], [High Compression PDF], and [PDF/A].
If [Black & White: Photo] is selected in [Original Type], text is scanned in shades of gray, so the characters and the top and bottom of the page may not be recognized correctly. If OCR accuracy is more important than image quality, select [Black & White: Text] in [Original Type] when scanning the original.

1. Place the originals.

2. Press [Send File Type / Name].

3. Select [PDF] in [File Type].

4. Select [OCR Settings] under the PDF file settings, and then select [OK].

5. Configure settings such as [Add Extrc.Text to File Nm.], [Delete Blank Page], and [Cognitive Language] as required.

6. Press [OK] twice.

7. To send the document by e-mail, specify the destination address and any other required settings.

8. Press the [Start] key.

The OCR function can process up to 40,000 characters of text.
The OCR function can recognize the following languages:
- English, German, French, Italian, Spanish, Dutch, Portuguese, Polish, Swedish, Finnish, Norwegian, Hungarian, Danish, Japanese.
The OCR function cannot be selected if any of the following conditions apply:
- [TIFF/JPEG] or [TIFF] is selected as the file type.
- [Store to HDD] or [Store to HDD + Send] is selected in [Store File].
- [100 dpi] is selected as the resolution.
- [Preview] is selected.
- [WSD] or [DSM] is used as the distribution server destination.
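The conditions listed above can be expressed as a simple check. The following is a minimal sketch for illustration only: the `ScanSettings` class, its field names, and the `ocr_blockers` function are hypothetical and do not correspond to any API provided by the machine. The rules themselves are taken from the list above and from the supported file types ([PDF], [High Compression PDF], [PDF/A]).

```python
from dataclasses import dataclass

# Hypothetical model of the relevant scan settings (not a machine API).
@dataclass
class ScanSettings:
    file_type: str               # e.g. "PDF", "High Compression PDF", "PDF/A", "TIFF"
    store_to_hdd: bool = False   # True for [Store to HDD] or [Store to HDD + Send]
    resolution_dpi: int = 200
    preview: bool = False
    destination: str = "E-mail"  # e.g. "WSD" or "DSM" for the distribution server

# File types that support embedding OCR text, per this page.
OCR_FILE_TYPES = {"PDF", "High Compression PDF", "PDF/A"}

def ocr_blockers(s: ScanSettings) -> list[str]:
    """Return the reasons OCR cannot be selected; an empty list means it can."""
    reasons = []
    if s.file_type not in OCR_FILE_TYPES:
        reasons.append("file type does not support OCR")
    if s.store_to_hdd:
        reasons.append("Store File setting")
    if s.resolution_dpi == 100:
        reasons.append("100 dpi resolution")
    if s.preview:
        reasons.append("Preview selected")
    if s.destination in {"WSD", "DSM"}:
        reasons.append("WSD/DSM destination")
    return reasons
```

For example, `ocr_blockers(ScanSettings(file_type="PDF"))` returns an empty list, while a TIFF scan at 100 dpi returns two reasons.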
If you specify a reproduction ratio that reduces an image scanned at 200 dpi or higher, the effective resolution may fall below 200 dpi. The OCR function can still be applied in such cases, but text recognition accuracy may decrease.
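As a rough worked example, the sketch below assumes that the effective resolution scales linearly with the reproduction ratio (scan resolution × ratio). This is an approximation for illustration, not the machine's documented calculation; the function names are hypothetical, and only the 200 dpi threshold comes from the text above.

```python
def effective_dpi(scan_dpi: float, reproduction_ratio_percent: float) -> float:
    """Approximate resolution after reduction, assuming linear scaling."""
    return scan_dpi * reproduction_ratio_percent / 100

def ocr_accuracy_may_drop(scan_dpi: float, ratio_percent: float,
                          threshold_dpi: float = 200) -> bool:
    """True if the approximate effective resolution falls below the threshold."""
    return effective_dpi(scan_dpi, ratio_percent) < threshold_dpi
```

Under this assumption, a 300 dpi scan reduced to 50% gives about 150 dpi, which is below 200 dpi, so recognition accuracy may suffer; a 400 dpi scan reduced to 71% gives about 284 dpi.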
Depending on character shapes or types, characters may not be recognized correctly.
If a scanned page contains no area that can be recognized as text, a PDF file without embedded text is generated.
If a page contains large blank areas, the top and bottom of the page may not be recognized correctly.
No PDF file is generated if all pages in a document are determined to be blank. If this happens, make sure the originals are placed correctly, and then try again.
A blank page or the top and bottom of a page may not be recognized correctly if the scanned page has smears or dirty spots, or if an image on the back of the page shows through.
The OCR function does not identify typefaces. If the widths of the printed and embedded characters differ, the position of the embedded text may not match that of the printed text on the scanned page.
If you specify [OCR Settings] and scan multiple sets of originals consecutively, scanning may slow down depending on the resolution setting and the sizes of the originals.