Creating Encoded PDF Documents
If the original document uses an appropriate Unicode encoding and Unicode fonts, the resulting PDF should also be encoded as a Unicode document. See the documentation for the software package used to create the PDF for details.
Language Tagging
Language tagging can also be done in a tagged PDF in the full version of Adobe Acrobat by following the instructions below.
Set Language for a Document
- In Adobe Acrobat, go to the File menu, then select Properties.
- In the Properties window, click the Advanced tab.
- Set the Language menu to English or some other appropriate language.
Note: If the language is not present, then you can manually enter an ISO-639 language code.
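As a rough illustration of the note above, the following Python sketch checks whether a string has the shape of an ISO 639 code (two or three lowercase letters) before it is typed into the Language field. The function name `looks_like_iso639` is hypothetical, and the check covers only the form of the code, not whether the code is actually assigned to a language.

```python
import re

# ISO 639-1 codes have two letters; ISO 639-2 and ISO 639-3 codes have three.
_ISO639_SHAPE = re.compile(r"[a-z]{2,3}")


def looks_like_iso639(code: str) -> bool:
    """Return True if `code` has the shape of an ISO 639 code (hypothetical helper).

    This checks form only; it does not consult the ISO 639 code tables.
    """
    return _ISO639_SHAPE.fullmatch(code) is not None


print(looks_like_iso639("fr"))      # True
print(looks_like_iso639("haw"))     # True
print(looks_like_iso639("French"))  # False
```

A code that passes this check can still be unassigned, so it is worth confirming it against the published ISO 639 code list.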
Set Language for a Portion of the Document
- In Adobe Acrobat, open the Tags window.
- Go to the View menu, then Show/Hide, then Navigation Panes.
- Check the option for Tags. The Tags icon (green price tag) will appear on the far left.
- Click the Tags icon to open the list of tags.
- Click the arrow next to the initial tag to reveal lower layers of tag structure.
- Find the tag corresponding to the non-English word or phrase.
- Right click the tag and select Properties.
- Select the appropriate Language from the menu.
Note: If the language is not present, then you can manually enter an ISO-639 language code.
Optimize File Size with Appropriate Font Embedding
Generally speaking, a PDF document embeds every font used in the document to ensure that the text is viewable by anyone. However, the file can become very large when symbols are used unless some precautions are taken. In particular:
East Asian Fonts and Technical Symbols
Many East Asian fonts (for Chinese and Japanese) are large because of their larger character sets. These fonts also contain math and other technical symbols. If you need to include these symbols, make sure you use a technical symbol font rather than an East Asian font, so that the embedded font does not carry all the East Asian characters.
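One way to find the symbols that might be better set in a technical symbol font is to list them from the document's text before formatting. The Python sketch below is a minimal illustration, assuming the text is available as a Unicode string. The helper name `math_symbols` is hypothetical, and it only looks at the Unicode "Sm" (math symbol) category, so other technical symbols would need additional categories.

```python
import unicodedata


def math_symbols(text: str) -> list[str]:
    """Return the distinct math symbols in `text`, sorted by code point.

    Hypothetical helper: a character counts as a math symbol here only if
    its Unicode general category is "Sm" (Symbol, math).
    """
    return sorted({ch for ch in text if unicodedata.category(ch) == "Sm"})


sample = "面積 ≤ ∑x"
for ch in math_symbols(sample):
    # Print each symbol with its code point and Unicode name.
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```

The characters reported are candidates for a symbol font. Which font actually covers them has to be checked in the authoring application.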
Specific Scripts
Similarly, if your document is only using one or two specific scripts, it may be better to use fonts which only include characters from the target scripts.
Multiple Scripts
On the other hand, if your document includes many scripts, it may be more efficient to embed a single font containing multiple scripts.
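To judge whether a document uses one or two scripts or many, it can help to tally the letters by script. The Python sketch below is a rough approximation, not a full script detector: it takes the first word of each character's Unicode name (for example LATIN, GREEK, CJK) as a stand-in for the script. The names `rough_script` and `script_counts` are hypothetical.

```python
import unicodedata
from collections import Counter


def rough_script(ch: str) -> str:
    """Approximate a character's script by the first word of its Unicode name."""
    try:
        name = unicodedata.name(ch)
    except ValueError:
        # Some characters have no name in the Unicode database.
        return "UNKNOWN"
    return name.split()[0]


def script_counts(text: str) -> Counter:
    """Count the letters in `text` per rough script label; non-letters are skipped."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            counts[rough_script(ch)] += 1
    return counts


print(script_counts("abc αβ 中").most_common())
# [('LATIN', 3), ('GREEK', 2), ('CJK', 1)]
```

A tally dominated by one or two labels suggests script-specific fonts, while a long list of labels suggests a single multi-script font may be the smaller choice.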