I had a hard time deciding which sessions to attend at the last Unicode conference, but I did end up at “Unicode at the Front Lines”, which was a series of mini-presentations from scholars working with lesser-known languages and scripts. This is a place where the Unicode rubber really hits the road, and I learned some interesting “life-lessons”.
1. The problem with “reforming” a script is that new readers may not be able to read the older texts. This came up in the context of the Tai Viet script (apparently the reform was so unpopular, they ditched it), but it also occurs in Chinese (Traditional vs. Simplified), Korean (new texts use only Hangul, but older ones included Chinese characters) and even in cases where spelling reform is enacted (as in the Netherlands and Germany).
BTW – I’m not against spelling/script reform, but we do have to admit that there will be some “loss” (enough to keep a few scholars in archaic languages in business).
2. Try not to invent a new letter for new languages. In the earlier part of the 20th century, linguists were fond of inventing quirky new symbols for languages they were documenting. A classic case is Igbo, which has lots of vowels with dots beneath them, as in Ị, ị, Ọ, ọ, Ụ, ụ. There is no objection to the dots per se, but they are unusual compared to what Western alphabets do. Because these characters are outside the norm, Igbo internationalization has to play continual catch-up: even programs that can handle Western European languages may not know what to do with the dots.
If your lesser-known language already includes letters that are common to the major languages, implementation of utilities in your language is much easier. Of course, I think Unicode is better for including dotted letters.
For now though…if you have a choice between “v” or “vh” in your language, the latter is (unfortunately) a little more Unicode ready.
3. H ≠ Η ≠ Н – For the record, the first is English H /h/, the second is Greek capital Eta /ē/ and the last is Cyrillic En /n/. I knew that many capital letters are triple encoded (e.g. Latin A/Greek Alpha/Cyrillic A), but this is the first time I realized that the phonetic values can be so different. Normally this isn’t an issue unless you have linguists from all over Europe trying to use their native script for phonetic spellings. How do you know when you have the right H?
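One way to tell the three H’s apart is to ask Unicode for each character’s code point and name. The snippet below is only a sketch using Python’s standard `unicodedata` module; the `script_hint` helper is my own illustrative name, and it simply takes the first word of the character name, which is a rough heuristic rather than a real script lookup.

```python
import unicodedata

# Three capital letters that look alike but are encoded separately.
lookalikes = ["H", "\u0397", "\u041D"]  # Latin H, Greek Eta, Cyrillic En


def script_hint(ch):
    """Return the first word of the Unicode character name.

    Hypothetical helper: a rough hint at the script, not an
    authoritative script property.
    """
    return unicodedata.name(ch).split()[0]


for ch in lookalikes:
    # Print the code point, the script hint, and the full character name.
    print(f"U+{ord(ch):04X}  {script_hint(ch):9}  {unicodedata.name(ch)}")
```

A check like this could be run over a phonetic transcription to flag letters that come from an unexpected script.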
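4. ŵ ≠ ŵ (it matters when you type the accent). Unicode supports “combining accents” (that is, an accent which can float over any letter) alongside precomposed letters such as ŵ (U+0175). In theory the combination of circumflex + w (to make ŵ) should be the same as w + circumflex (to make ŵ) …but it’s not. Unicode expects the combining circumflex (U+0302) to come after its base letter, so an accent entered first belongs to whatever precedes it. A linguistic archive database has the precomposed letters but can’t “merge” the two string combinations as one letter.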
This wouldn’t be too critical except that sometimes a linguist puts the accent before the w, and sometimes they put the w before the accent. These are the same world-wide linguists who gave us the problem of the three H’s.
A member in the audience did suggest that it was a “training issue”, but who are we kidding…these are FACULTY. Faculty are great scholars, but few are well-trained data entry operators.
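The ŵ mismatch from point 4 is easy to reproduce. This is a minimal sketch using Python’s standard `unicodedata.normalize`; the variable names and the `same_letter` helper are mine, and I am assuming the archive could normalize its strings to NFC before comparing them. I have no idea what the actual database does.

```python
import unicodedata

precomposed = "\u0175"         # LATIN SMALL LETTER W WITH CIRCUMFLEX
base_then_accent = "w\u0302"   # w followed by COMBINING CIRCUMFLEX ACCENT
accent_then_base = "\u0302w"   # accent entered first, then w


def same_letter(a, b):
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A raw comparison treats the two spellings as different strings.
print(precomposed == base_then_accent)             # False

# After normalization, w + accent merges into the precomposed letter.
print(same_letter(precomposed, base_then_accent))  # True

# An accent entered before the w is not equivalent, so it never merges.
print(same_letter(precomposed, accent_then_base))  # False
```

Normalization takes care of the w + accent case, but strings where the accent comes first would still need to be corrected by hand or by a separate cleanup step.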