Scanning an Original as a PDF File with Embedded Text Data

To enable searching and copying of text in a PDF viewing application, you can embed text data in a PDF created from the scanned data (OCR function).

You can also use this function for a PDF file in the high-compression PDF or PDF/A format.

Important

The optional OCR unit is required to use this function.
For details, see "Functions Requiring Optional Configurations".
You cannot use the OCR function in the following cases:
- [TIFF / JPEG] or [TIFF] is selected as the file type.
- [100 dpi] is selected as the resolution.
- The WSD or DSM destination list is used.
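
The restrictions above can be expressed as a simple pre-check. The following is only an illustrative sketch: `ScanJob`, `ocr_blockers`, and the string labels are hypothetical names chosen to mirror the panel options in this section, and they do not correspond to any actual machine API.

```python
from dataclasses import dataclass

# Hypothetical labels mirroring the panel options named in this section.
OCR_UNSUPPORTED_FILE_TYPES = {"TIFF / JPEG", "TIFF"}
OCR_UNSUPPORTED_DESTINATION_LISTS = {"WSD", "DSM"}


@dataclass
class ScanJob:
    """Illustrative container for the settings that affect OCR availability."""
    file_type: str
    resolution_dpi: int
    destination_list: str  # for example "Address Book", "WSD", or "DSM" (assumed values)
    ocr_unit_installed: bool = True


def ocr_blockers(job: ScanJob) -> list[str]:
    """Return the reasons why OCR cannot be used; an empty list means none apply."""
    reasons = []
    if not job.ocr_unit_installed:
        reasons.append("optional OCR unit is not installed")
    if job.file_type in OCR_UNSUPPORTED_FILE_TYPES:
        reasons.append(f"file type [{job.file_type}] is selected")
    if job.resolution_dpi == 100:
        reasons.append("resolution is [100 dpi]")
    if job.destination_list in OCR_UNSUPPORTED_DESTINATION_LISTS:
        reasons.append(f"{job.destination_list} destination list is used")
    return reasons
```

For example, `ocr_blockers(ScanJob("PDF", 200, "Address Book"))` returns an empty list, while a job that uses [TIFF] at [100 dpi] with the WSD destination list reports three reasons.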

On machines implemented with RICOH Always Current Technology v1.1 or earlier

Press [Scanner] on the Home screen.

Place the original on the scanner.

See "Placing an Original to Scan".

Press [Send Settings] on the scanner screen.

Press [File Type], and then press [Others].

Press [PDF (Single Page)] when creating a PDF with only one page, and press [PDF (Multi-page)] when creating a PDF with multiple pages.

Select the [OCR] check box and specify how to perform OCR in "PDF Detailed Settings".

OCR Cognitive Language: Select the language used in the original to be scanned.
Delete Blank Page: Blank pages are removed from the scanned data when creating a PDF file.
Add Extracted Text to File Name: The text string judged most suitable as a file name is extracted from the first page of the scanned data and appended to the file name automatically. If the first page contains no text, no string is appended to the file name.
Correct Vertical Direction Using Scanned Text Direction: The vertical orientation of the original is determined based on the orientation of characters that are successfully recognized by the OCR process.

Specify the image quality in [Original Type].

To improve the recognition accuracy, select [Black & White: Text].

To send the scanned document to an e-mail address, press [Sender] and then specify the sender.

Specify the destination, and press [Start].

Note

The vertical orientation of a page that is nearly blank may not be determined correctly.
When searching for a string in a text-embedded PDF file, you can find the string more easily by setting the search to ignore the difference between halfwidth and fullwidth forms.
It may take longer to start scanning the next page, depending on the original size or resolution.
The OCR function can process up to 40,000 characters of text per page.
The OCR function can recognize the following languages:
- English, German, French, Italian, Spanish, Dutch, Portuguese, Polish, Swedish, Finnish, Hungarian, Norwegian, Danish, Japanese.
The effective resolution may be less than 200 dpi when an image scanned at 200 dpi or greater resolution is reduced by specifying the reproduction ratio. You can apply the OCR function in such cases, but the text recognition accuracy may deteriorate.
Depending on character shapes or types, characters may not be recognized correctly.
A PDF file without embedded text is generated if the scanned page does not contain a section that can be recognized as characters.
No PDF file is generated if all pages in a document are determined to be blank. If this happens, make sure the originals are placed correctly, and try again.
A blank page, or the top and bottom of a page, may not be recognized correctly if the scanned page has smears or dirty spots, or if an image on the back side of the page shows through.
The OCR function does not identify typefaces. If the widths of the printed and embedded characters differ, the position of the embedded text may not match that of the printed text on the scanned page.
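
The note on effective resolution can be illustrated with a short calculation. This is a sketch only: it assumes that the effective resolution scales linearly with the reproduction ratio, which this section does not state explicitly, and the function names are hypothetical.

```python
def effective_dpi(scan_dpi: float, reproduction_ratio_percent: float) -> float:
    """Estimate the effective resolution after reduction.

    Assumption: effective resolution = scan resolution x reproduction ratio.
    """
    return scan_dpi * reproduction_ratio_percent / 100.0


def may_reduce_ocr_accuracy(scan_dpi: float,
                            reproduction_ratio_percent: float,
                            threshold_dpi: float = 200.0) -> bool:
    """Return True if the estimated effective resolution is below 200 dpi."""
    return effective_dpi(scan_dpi, reproduction_ratio_percent) < threshold_dpi
```

Under this assumption, an original scanned at 300 dpi and reduced to 50% has an effective resolution of about 150 dpi. OCR can still be applied, but the recognition accuracy may deteriorate.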

On machines implemented with RICOH Always Current Technology v1.2 or later

Press [Scanner] on the Home screen.

Place the original on the scanner.

See "Placing an Original to Scan".

Press [Send Settings] on the scanner screen.

Press [File Type], and then press [Others].

Press [PDF], [High Comp. PDF], or [PDF/A].

Press [OCR Settings], and specify how to perform OCR.

OCR Cognitive Language: Select the language used in the original to be scanned.
Delete Blank Page: Blank pages are removed from the scanned data when creating a PDF file.
Add Extracted Text to File Name: The text string judged most suitable as a file name is extracted from the first page of the scanned data and appended to the file name automatically. If the first page contains no text, no string is appended to the file name.
Correct Vertical Direction Using Scanned Text Direction: The vertical orientation of the original is determined based on the orientation of characters that are successfully recognized by the OCR process.

Press [OK].

Specify the image quality in [Original Type].

To improve the recognition accuracy, select [Black & White: Text].

To send the scanned document to an e-mail address, press [Sender] and then specify the sender.

Specify the destination, and press [Start].

Note

The vertical orientation of a page that is nearly blank may not be determined correctly.
When searching for a string in a text-embedded PDF file, you can find the string more easily by setting the search to ignore the difference between halfwidth and fullwidth forms.
It may take longer to start scanning the next page, depending on the original size or resolution.
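
If you search the extracted text with your own script instead of a PDF viewing application, the halfwidth/fullwidth-insensitive search mentioned in the note can be approximated as follows. This is a minimal sketch that uses Unicode NFKC normalization from the Python standard library. The function names are hypothetical, and NFKC also folds some other compatibility characters, so it is broader than a pure width-insensitive comparison.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Map fullwidth Latin characters and halfwidth katakana to their standard forms."""
    return unicodedata.normalize("NFKC", text)


def find_ignoring_width(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack, ignoring halfwidth/fullwidth differences."""
    return fold_width(needle) in fold_width(haystack)
```

With this approach, a search for "ABC123" also matches text that was embedded as fullwidth "ＡＢＣ１２３".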