5: 8-Bit Non-Roman Encoding

Encoding on the Internet

The Problem

Although 256 characters can support most Western European languages, they are not enough to handle non-Roman scripts, or even languages written in the Roman alphabet whose characters fall outside the Latin 1 character set. Therefore, other 8-bit encodings were developed for languages outside Western Europe.

Template

To accommodate both English and the other script, many 8-bit encodings are structured as follows:

Characters #0-127 – ASCII
Characters #128-255 – Other script.

**Structure of Non-English Encodings**
Script	Encoding	#0-127	#128-255
Arabic	ISO-8859-6* (rarely used)	ASCII	Arabic
Greek	ISO-8859-7*	ASCII	Greek
Hebrew	ISO-8859-8*	ASCII	Hebrew

*External links to Wikipedia

On the Internet, if you switch your browser's encoding view, in most cases you will still see English text, because each of these encodings includes ASCII.
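The template above can be checked with a short Python sketch. It assumes Python's built-in codec names (`iso8859_6`, `iso8859_7`, `iso8859_8`) for the three encodings in the table; the helper `decode_everywhere` is a hypothetical name used only for illustration. ASCII bytes decode to the same English text in every encoding, while a single byte from the upper half (#128-255) becomes a different letter in each script.

```python
# Codec names are Python's spellings of the encodings in the table above.
ENCODINGS = ["iso8859_6", "iso8859_7", "iso8859_8"]

def decode_everywhere(data):
    """Decode the same bytes under each encoding (hypothetical helper)."""
    return {enc: data.decode(enc) for enc in ENCODINGS}

# Bytes in the range 0-127: plain ASCII, shared by all three encodings.
ascii_results = decode_everywhere(b"Hello")

# One byte in the range 128-255: its meaning depends on the encoding.
upper_results = decode_everywhere(bytes([0xE1]))

for enc in ENCODINGS:
    letter = upper_results[enc]
    print(f"{enc}: {ascii_results[enc]!r} and U+{ord(letter):04X} {letter}")
```

The byte 0xE1 is only one example; any value from the upper half that is assigned in all three encodings would illustrate the same point.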

Behavior of Encoded Fonts

Because non-Roman encodings include ASCII, if you switch to a properly encoded font in a word processor and begin to type, you will see English characters. It is not until you switch your keyboard that the non-Roman letters appear.

Parallel Standards

"Windows" Encodings vs. ISO-8859-x

For many scripts, there is a competing Windows encoding standard and a non-Windows standard, typically one registered at the ISO as an ISO-8859-x set. For instance, Hebrew Web pages can be encoded as either ISO-8859-8 ("Visual Hebrew") or Windows-1255.

**Variant Encodings by Script**
Script	ISO/Other	Windows Encoding
Arabic	ISO-8859-6	Windows-1256
Greek	ISO-8859-7 ("ELOT")	Windows-1253
Hebrew	ISO-8859-8 ("Visual Hebrew")	Windows-1255
Russian/Cyrillic	KOI-8	Windows-1251
Thai	TIS-620	Windows-874
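Why the parallel standards matter can be seen in a minimal Python sketch. It assumes the Python codecs `koi8_r` (the Russian member of the KOI-8 family, chosen here as an assumption) and `cp1251` (Windows-1251); the sample word is illustrative only. The same Russian text produces different bytes under each standard, so reading a file with the wrong one of the pair yields the wrong letters.

```python
text = "Привет"  # sample Russian word, for illustration only

# Encode the same text under the two competing standards.
koi8_bytes = text.encode("koi8_r")
win_bytes = text.encode("cp1251")

print(koi8_bytes.hex(" "))
print(win_bytes.hex(" "))

# Each encoding round-trips its own bytes correctly...
assert koi8_bytes.decode("koi8_r") == text
assert win_bytes.decode("cp1251") == text

# ...but decoding KOI8-R bytes as Windows-1251 gives different letters.
misread = koi8_bytes.decode("cp1251", errors="replace")
print(misread)
```

Both encodings use one byte per letter, so the byte strings have the same length and nothing signals the mismatch except the garbled result.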

Roman Script but Not Latin 1

In addition to the cases above, there are also languages which are written in the Latin alphabet but include characters NOT in the ISO-8859-1 (Latin 1) encoding. These include "Central European" languages like Hungarian (with ő), Polish (ą, ł) and Czech (š, ů), as well as other languages like Turkish (ş, ğ, ı), Welsh (ŵ, ŷ) and, ironically, Latin (with ā, ē).

Like Arabic and Greek, they were placed in different 8-bit encoding systems which included the ASCII characters, but also the accented letters needed for a language.
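One way to see which encoding a given language needs is to try encoding sample letters and note which codecs accept them. The sketch below assumes Python's codec names `latin_1`, `iso8859_2`, `iso8859_9` and `iso8859_14`; the function `encodings_covering` and the sample strings are hypothetical, built from the letters mentioned above.

```python
CANDIDATES = ["latin_1", "iso8859_2", "iso8859_9", "iso8859_14"]

def encodings_covering(text, candidates=CANDIDATES):
    """Return the candidate encodings that can represent every character in text."""
    covering = []
    for enc in candidates:
        try:
            text.encode(enc)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this encoding
        covering.append(enc)
    return covering

# Sample letters taken from the languages listed in the text.
samples = {
    "Hungarian": "ő",
    "Polish": "ął",
    "Czech": "šů",
    "Turkish": "şğı",
    "Welsh": "ŵŷ",
}

for language, letters in samples.items():
    print(language, letters, encodings_covering(letters))
```

None of the samples should be accepted by `latin_1`, which is the gap these additional encodings were designed to fill.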

Central European and Latin 2

The characters used in the neighboring countries of Hungary, Poland, the former Czechoslovakia, the former Yugoslavia and Germany were placed together in a variety of "Central European" encodings, including ISO-8859-2 (aka Latin 2).

**Variant Encodings for Central European**
Script	ISO/Other	Windows Encoding
Central Europe	ISO‑8859‑2 ("Latin 2")	Windows-1250

Covered Latin 2 languages include Bosnian, Czech, Croatian, Hungarian, Polish, Serbian, Slovak, Slovenian and Sorbian.

In that era, computers needed access to parallel fonts which included these characters. Hence Mac users in the 90s would have Times New Roman, Times New Roman CE and even Times New Roman CY with Cyrillic characters. Today, most versions of Times New Roman are based on Unicode and include all the characters needed.

Other Latin Encodings

Beyond Central Europe, encodings were developed for the Baltic languages (Lithuanian, Latvian, Estonian) and Turkish. Some theoretical encodings were developed but never fully implemented once Unicode became viable.

**Variant Latin Encodings**
Script	ISO/Other	Windows Encoding
Baltic	ISO‑8859‑4 ("Latin 4")	Windows-1257
Turkish	ISO-8859-9	Windows-1254
Celtic (never implemented)	ISO-8859-14	N/A
