Encoding on the Internet

3: Expanded 8-Bit Encoding

8-Bit Encoding

To increase the number of characters encoded, vendors doubled the range of ASCII from 128 (2⁷) characters to 256 (2⁸) characters. This became known as "8-bit encoding". The usual structure is:

  • Characters #0-127 are ASCII
  • Characters #128-255 contain accented letters, non-English characters and extra punctuation
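The two-range structure above can be checked with a short sketch. It assumes Python and its built-in codec names "cp1252" (Windows-1252) and "mac_roman" (MacRoman); it confirms that bytes 0-127 decode exactly like ASCII and then shows a few characters from the upper half.

```python
def ascii_compatible(codec: str) -> bool:
    """True if bytes 0-127 decode to exactly the same characters as plain ASCII."""
    low = bytes(range(128))
    return low.decode(codec) == low.decode("ascii")


def upper_half(codec: str, start: int, stop: int) -> str:
    """Decode bytes start..stop-1 (from the 128-255 range) with the given code page.

    Note: Windows-1252 leaves a few slots (e.g. 129) undefined, so decoding the
    whole 128-255 range at once with "cp1252" would raise an error.
    """
    return bytes(range(start, stop)).decode(codec)


print(ascii_compatible("cp1252"))      # True
print(ascii_compatible("mac_roman"))   # True
print(upper_half("cp1252", 192, 198))  # ÀÁÂÃÄÅ
```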

Crucially, each combination of a letter plus a different accent forms a separate character, or code point. For instance, á, â, à, Á, Â and À are assigned six different numbers in 8-bit encoding.
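As an illustration, this Python sketch looks up the single-byte number Windows-1252 (Python codec name "cp1252") assigns to each of the six accented A's. It assumes the letters are typed as precomposed characters, not as a base letter plus a combining accent.

```python
# Six letter+accent combinations, each expected to be its own code point.
letters = ["á", "â", "à", "Á", "Â", "À"]

# Encode each character to one byte and read that byte's decimal value.
codes = {ch: ch.encode("cp1252")[0] for ch in letters}

for ch, n in codes.items():
    print(ch, n)
```

The printed numbers are all different (192 through 194 for the capitals, 224 through 226 for the lowercase letters), so no accent is stored separately from its letter.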

Vendor Differences

Windows-1252 vs. MacRoman

Unfortunately, not all vendors used the same 8-bit encoding. The biggest difference was that older Windows computers used Windows-1252, while pre-OS X Macintosh computers used MacRoman. As a result, not all characters are assigned to the same code points, and not all characters can be found in both encodings.
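To see that some characters exist in only one of the two encodings, the following Python sketch tries to encode a few sample characters with each code page. The helper name encodable is hypothetical, and the codec names "cp1252" and "mac_roman" are Python's built-in names for these encodings.

```python
def encodable(ch: str, codec: str) -> bool:
    """Return True if the character has a slot in the given 8-bit code page."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Ð and × exist in Windows-1252 but not MacRoman; π and ∞ are the reverse.
for ch in ["Ð", "×", "π", "∞"]:
    print(ch, "Windows-1252:", encodable(ch, "cp1252"),
          "MacRoman:", encodable(ch, "mac_roman"))
```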

For instance, character #128 is € (euro) in Windows-1252, but Ä (A-umlaut) in MacRoman. Similarly, the ¥ (yen) character is #165 in Windows-1252, but #180 in MacRoman.
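Both examples can be reproduced with a minimal Python sketch, assuming Python's "cp1252" and "mac_roman" codecs. The helper names char_at and code_of are hypothetical.

```python
def char_at(n: int, codec: str) -> str:
    """Character stored at byte value n in the given 8-bit code page."""
    return bytes([n]).decode(codec)


def code_of(ch: str, codec: str) -> int:
    """Byte value (decimal) assigned to a character in the given code page."""
    return ch.encode(codec)[0]


print(char_at(128, "cp1252"))     # €
print(char_at(128, "mac_roman"))  # Ä
print(code_of("¥", "cp1252"))     # 165
print(code_of("¥", "mac_roman"))  # 180
```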

NOTE: Today both Windows and OS X use Unicode, but differences persist due to issues of compatibility with older documents and software. The older the software, the more likely compatibility problems will occur.
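A typical compatibility problem is an older file saved in one code page and opened as if it were the other. This Python sketch (the word "café" and the helper names misread and convert_to_utf8 are illustrative assumptions) shows the garbled result, then the usual fix: decode with the code page the file was actually written in, then re-encode as Unicode (UTF-8).

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Save text in one code page, then (wrongly) open it with another."""
    # errors="replace" substitutes U+FFFD for bytes the reading code page leaves undefined.
    return text.encode(written_as).decode(read_as, errors="replace")


def convert_to_utf8(data: bytes, source_codec: str) -> bytes:
    """Convert legacy 8-bit bytes to UTF-8 by decoding with the correct code page first."""
    return data.decode(source_codec).encode("utf-8")


# In MacRoman, é is byte 142; Windows-1252 reads byte 142 as Ž.
print(misread("café", "mac_roman", "cp1252"))  # cafŽ

legacy = "café".encode("mac_roman")
print(convert_to_utf8(legacy, "mac_roman").decode("utf-8"))  # café
```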

Windows-1252 vs. MacRoman Chart (Partial)

Char 200-220 in Win vs Mac
Char Num   Win-1252   MacRoman
200        È          »
201        É          …
202        Ê          nbsp
203        Ë          À
204        Ì          Ã
205        Í          Õ
206        Î          Œ
207        Ï          œ
208        Ð          en dash
209        Ñ          em dash
210        Ò          “
211        Ó          ”
212        Ô          ‘
213        Õ          ’
214        Ö          ÷
215        ×          ◊
216        Ø          ÿ
217        Ù          Ÿ
218        Ú          ⁄
219        Û          € (originally ¤)
220        Ü          ‹
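A chart like the one above can be generated rather than typed by hand. This Python sketch assumes the built-in "cp1252" and "mac_roman" codecs; the function name chart is hypothetical. Depending on the Apple mapping table the codec was built from, row 219 may show € or the older ¤.

```python
def chart(start: int, stop: int) -> list[tuple[int, str, str]]:
    """Rows of (number, Windows-1252 char, MacRoman char) for start..stop inclusive."""
    rows = []
    for n in range(start, stop + 1):
        b = bytes([n])
        win = b.decode("cp1252")
        mac = b.decode("mac_roman")
        # Show the invisible non-breaking space by name, as the chart does.
        rows.append((n, win, "nbsp" if mac == "\u00a0" else mac))
    return rows


for n, win, mac in chart(200, 220):
    print(f"{n}  {win}  {mac}")
```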

Full reference charts are available from the sources below.

Links and References

NOTE: Some charts may list the decimal number (base-10) as well as the hexadecimal (base-16) and octal (base-8) numbers. In most cases, you would refer to the decimal number.
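To move between the three number formats a chart may use, a short Python sketch is enough; the helper name bases is hypothetical. It formats a decimal character number in hexadecimal and octal, and converts back with int().

```python
def bases(n: int) -> str:
    """Show one character number in decimal, hexadecimal and octal."""
    return f"dec {n}  hex {n:X}  oct {n:o}"


print(bases(200))       # dec 200  hex C8  oct 310
print(int("C8", 16))    # 200 (hexadecimal back to decimal)
print(int("310", 8))    # 200 (octal back to decimal)
```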

