Encoding on the Internet
3: Expanded 8-Bit Encoding
8-Bit Encoding
To increase the number of characters encoded, vendors doubled the range of ASCII from 128 (2⁷) characters to 256 (2⁸) characters. This became known as "8-bit encoding". The usual structure is:
- Characters #0-127 are ASCII
- Characters #128-255 contain accented letters, non-English characters and extra punctuation
Crucially, each combination of a letter plus a different accent forms a separate character or code point. For instance, á, â, à, Á, Â and À are assigned six different numbers in 8-bit encoding.
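As a minimal sketch of this point, the Python snippet below looks up the number each of the six letters receives in one particular 8-bit encoding. It assumes Windows-1252 (Python's built-in `cp1252` codec) as the example encoding; the variable names are illustrative only.

```python
# Each accented letter occupies exactly one byte in an 8-bit encoding.
letters = "áâàÁÂÀ"

# Map each letter to its single byte value in Windows-1252 (assumed example encoding).
codes = {ch: ch.encode("cp1252")[0] for ch in letters}

# Six letters yield six distinct numbers, all in the upper half (128-255).
distinct_count = len(set(codes.values()))
all_above_ascii = all(value >= 128 for value in codes.values())
```

Here `codes["á"]` is 225 and `codes["Á"]` is 193, so the accented lowercase and uppercase forms are unrelated numbers rather than a base letter plus a modifier.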
Vendor Differences
Windows-1252 vs. MacRoman
Unfortunately, not all vendors used the same 8-bit encoding. The biggest difference was that older Windows computers used Windows-1252 while pre-OS X Macintosh computers used MacRoman encoding. As a result, not all characters are assigned to the same code points, and some characters found in one encoding are missing from the other.
For instance, character #128 is € (euro) in Windows-1252, but Ä (A-umlaut) in MacRoman. Similarly, the ¥ (yen) character is #165 in Windows-1252, but #180 in MacRoman.
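The mismatch can be reproduced with a short Python sketch. It assumes Python's built-in `cp1252` and `mac_roman` codecs as stand-ins for the two vendor encodings; the variable names are illustrative only.

```python
# One byte, two interpretations: the same number decodes to different characters.
raw = bytes([128])
as_windows = raw.decode("cp1252")    # euro sign under Windows-1252
as_mac = raw.decode("mac_roman")     # A-umlaut under MacRoman

# One character, two numbers: the yen sign sits at different positions.
yen_windows = "¥".encode("cp1252")[0]
yen_mac = "¥".encode("mac_roman")[0]
```

A file saved on one platform and opened on the other with the wrong encoding assumed would therefore show Ä wherever the author typed €.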
NOTE: Today both Windows and OS X use Unicode, but differences persist due to issues of compatibility with older documents and software. The older the software, the more likely compatibility problems will occur.
Windows-1252 vs. MacRoman Chart (Partial)
| Char Num | Win-1252 | MacRoman |
|---|---|---|
| 200 | È | » |
| 201 | É | … |
| 202 | Ê | nbsp |
| 203 | Ë | À |
| 204 | Ì | Ã |
| 205 | Í | Õ |
| 206 | Î | Œ |
| 207 | Ï | œ |
| 208 | Ð | – |
| 209 | Ñ | — |
| 210 | Ò | “ |
| 211 | Ó | ” |
| 212 | Ô | ‘ |
| 213 | Õ | ’ |
| 214 | Ö | ÷ |
| 215 | × | ◊ |
| 216 | Ø | ÿ |
| 217 | Ù | Ÿ |
| 218 | Ú | ⁄ |
| 219 | Û | € |
| 220 | Ü | ‹ |
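Rows like these can be regenerated programmatically. The following is a rough sketch, again assuming Python's `cp1252` and `mac_roman` codecs match the encodings charted here; `chart_rows` is a hypothetical helper name, not part of any library.

```python
def chart_rows(start, stop):
    """Return (number, Windows-1252 char, MacRoman char) for each byte value."""
    rows = []
    for number in range(start, stop + 1):
        raw = bytes([number])
        # errors="replace" yields U+FFFD for byte values a codec leaves undefined.
        windows_char = raw.decode("cp1252", errors="replace")
        mac_char = raw.decode("mac_roman", errors="replace")
        rows.append((number, windows_char, mac_char))
    return rows


# Build the table body in the same pipe-delimited layout as the chart.
rows = chart_rows(200, 220)
table = "\n".join(f"| {n} | {win} | {mac} |" for n, win, mac in rows)
```

Calling `print(table)` displays the rows. Character #202 in MacRoman is a non-breaking space, so that cell appears blank rather than showing the label "nbsp".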
Full reference charts are available from the sources below.
Links and References
NOTE: Some charts may list the decimal number (base-10) as well as the hexadecimal (base-16) number and octal (base-8) number. In most cases, you would refer to the decimal number.
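When a chart gives only one notation, converting between the three is straightforward. The sketch below uses Python's built-in `format` and `int` functions; `notations` is a hypothetical helper name chosen for illustration.

```python
def notations(number):
    """Return the decimal, hexadecimal and octal forms of a character number."""
    return (number, format(number, "X"), format(number, "o"))


# Character #200 as it might appear in differently labelled charts.
decimal, hexadecimal, octal = notations(200)

# Going the other way: parse a hexadecimal or octal label back to decimal.
from_hex = int("C8", 16)
from_octal = int("310", 8)
```

For character #200 this gives hexadecimal C8 and octal 310, and both parse back to 200.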
