I’m currently preparing a seminar on Unicode, and I was struck by how far Unicode support, especially for raw Unicode text, has come in the past four years. Some of the warnings I used to present in 2000 or even in 2004 seem almost quaint now.

For instance, when Mac OS X first came out, older applications were not set up to take advantage of the Mac Unicode utilities, such as the U.S. Extended keyboards. I used to have to specify which applications could work with Unicode and which couldn’t. But yesterday I realized that I couldn’t find any old applications on my machine that didn’t work correctly. What a difference that makes.

The same is true on the Windows side. If you get the latest version of most applications, the chances are that Unicode support is there – even for raw text editors.

Similarly, I recall when many HTML editors converted any non-English character to a numeric HTML entity, but now most applications are set to work with real UTF-8 text embedded in HTML tags. This is much easier to edit and crucial for being able to transfer data between the Web and other XML resources.
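
To make the difference concrete, here is a minimal Python sketch of the two approaches. The sample word and the variable names are my own illustrations, not taken from any particular editor; the only library calls used are the standard `str.encode` with the `xmlcharrefreplace` error handler and `html.unescape`.

```python
import html

# Illustrative sample: the Russian word for "hello".
text = "Привет"

# What the older editors effectively did: replace every non-ASCII
# character with a numeric character reference such as &#1055;
as_entities = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# What most applications do now: keep the real characters and
# store the file as UTF-8 bytes.
as_utf8 = text.encode("utf-8")

# Both forms carry the same text; the entity form is just much
# harder for a human to read and edit.
from_entities = html.unescape(as_entities)
from_utf8 = as_utf8.decode("utf-8")

print(as_entities)
print(from_entities == from_utf8 == text)
```

The entity version is pure ASCII, which is why it was a safe fallback for older tools, but it turns a six-letter word into a long string of numbers that nobody wants to proofread.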

Russian, Chinese and Greek data are being treated as just “text” and not as a special case that programmers need to agonize over. There are still plenty of issues to be worked out, but it’s good to appreciate progress when it’s made.
