UTF-8 is not a direct, fixed-width encoding of Unicode code points, but rather a variable-width “compromise character encoding”: files containing only ASCII characters stay the same size as they would in ASCII, while any Unicode code point can still be represented, using between one and four bytes.
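
To make the size trade-off concrete, here is a small Python sketch that counts how many bytes a few characters take in UTF-8. The helper name `utf8_length` and the sample characters are just illustrative choices, not something from the original article.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# Illustrative samples: one character from each UTF-8 length class.
samples = {
    "A": utf8_length("A"),            # U+0041, plain ASCII
    "\u00e9": utf8_length("\u00e9"),  # U+00E9, Latin small e with acute
    "\u20ac": utf8_length("\u20ac"),  # U+20AC, euro sign
    "\U0001F600": utf8_length("\U0001F600"),  # U+1F600, grinning face
}

# An ASCII-only string produces byte-for-byte identical ASCII and UTF-8 output.
ascii_text = "plain ASCII text"
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
```

ASCII characters take one byte each, so an ASCII-only file does not grow, while characters further up the Unicode range take two, three, or four bytes.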

If this sounds a bit confusing, you may want to try this Game Dev article on UTF-8. It’s still under review, but it does step through parts of the conversion from a Unicode code point to a UTF-8 representation.
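
As a rough idea of what that conversion involves, here is a minimal sketch in Python that packs a code point into UTF-8 bytes by hand. It is not taken from the article; the function name `encode_code_point` is made up for illustration, and in real code you would simply call `str.encode("utf-8")`.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative, hand-rolled)."""
    # Reject values outside the Unicode range and the UTF-16 surrogate block,
    # which UTF-8 does not allow.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp!r}")

    if cp < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

For example, `encode_code_point(0x20AC)` (the euro sign) yields the three bytes `E2 82 AC`, which matches what `"\u20ac".encode("utf-8")` returns.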
