Superscripts in HTML
Both HTML and XHTML include the SUP tag for superscripts and the SUB tag for subscripts. Yet the Unicode specification also includes specific slots for individual superscript/subscript characters.
For example, the phrase “two to the fourth power” could be encoded in three ways:
- 2<sup>4</sup> (SUP tag) = 2⁴
- 2&#8308; (numeric entity code) = 2⁴
- 2⁴ (raw Unicode data) = 2⁴
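As a quick check that the entity and the raw character name the same thing, here is a minimal Python sketch using only the standard library. The variable names are illustrative, and the entity is written as &#8308;, the decimal value of U+2074 (SUPERSCRIPT FOUR).

```python
import html

raw = "2\u2074"            # raw Unicode: the digit 2 followed by U+2074
entity = "2&#8308;"        # decimal numeric entity for the same character
hex_entity = "2&#x2074;"   # hexadecimal form of the same entity

# html.unescape resolves numeric character references to characters.
assert html.unescape(entity) == raw
assert html.unescape(hex_entity) == raw

# The code point of the superscript four, in decimal and hexadecimal.
code_point = ord("\u2074")
print(code_point, hex(code_point))  # prints: 8308 0x2074
```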
What’s the difference, and which should you use? If you’re displaying static Web pages, there’s probably very little difference. Although the entity code &#8308; takes up less file space than the SUP tag does, the SUP tag works across most browsers and fonts and can be styled.
The raw data method is the most correct, but also the most prone to cross-platform difficulties. For one thing, you MUST include the UTF-8 encoding meta tag in the page header, or the display will be broken. Another issue is that some browsers (e.g. Mac/Firefox) include extra space around superscript entities or shrink the characters to unreadable sizes. If you’re working with XML, though, you may need to enter superscripts/subscripts as raw data.
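To see why the encoding declaration matters, the following Python sketch mimics software reading UTF-8 bytes under the wrong encoding. Latin-1 is used here only as an assumed stand-in for a fallback encoding; the fallback a given browser actually picks varies.

```python
text = "2\u2074"                    # "two to the fourth" with U+2074
utf8_bytes = text.encode("utf-8")   # what is actually stored in the file

# U+2074 occupies three bytes in UTF-8; the ASCII digit occupies one.
assert utf8_bytes == b"2\xe2\x81\xb4"

# Decoding with the right encoding restores the text.
assert utf8_bytes.decode("utf-8") == text

# Decoding with the wrong one turns the single superscript into three
# unrelated characters, which is the "broken display" case.
garbled = utf8_bytes.decode("latin-1")
assert garbled != text and len(garbled) == 4
```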
XML and Flash
On one project we had to feed data for College Algebra exercises into a Flash quiz application. The XML spec for the application didn’t recognize numeric entity codes or the SUP/SUB tags, so we had to enter the superscripts as Unicode characters.
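If the source text already uses SUP/SUB markup, the conversion to Unicode characters can be scripted. The sketch below is one possible approach, not the method used on that project: `tags_to_unicode` is a hypothetical helper, and it handles digit-only elements and leaves everything else untouched.

```python
import re

# Superscript digits are not contiguous in Unicode: 1, 2 and 3 sit in the
# Latin-1 block (U+00B9, U+00B2, U+00B3); the rest are U+2070 and U+2074-U+2079.
SUPERSCRIPTS = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)
# Subscript digits are contiguous: U+2080-U+2089.
SUBSCRIPTS = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)

def tags_to_unicode(markup: str) -> str:
    """Replace digit-only <sup>/<sub> elements with Unicode characters."""
    markup = re.sub(
        r"<sup>(\d+)</sup>",
        lambda m: m.group(1).translate(SUPERSCRIPTS),
        markup,
        flags=re.IGNORECASE,
    )
    markup = re.sub(
        r"<sub>(\d+)</sub>",
        lambda m: m.group(1).translate(SUBSCRIPTS),
        markup,
        flags=re.IGNORECASE,
    )
    return markup

print(tags_to_unicode("2<sup>4</sup> and x<sub>1</sub>"))
```

Elements whose content is not purely digits, such as x<sup>n</sup>, pass through unchanged, so they can be reviewed by hand.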
The good news is that if you can create a UTF-8 text file and insert the symbols, it will import into Flash (at least in Flash 8). For math, your best bet is usually to use the Windows Character Map utility and insert the symbols into a Notepad text file, or to use the Macintosh Character Palette with a TextEdit text file. The Penn State Unicode and XML page explains how to create UTF-8 encoded XML files.
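A UTF-8 XML file with raw superscripts can also be produced programmatically rather than by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the element names `exercise` and `prompt` and the file name are assumptions, since the actual quiz format is not described here.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical element names; the real quiz format is not known.
root = ET.Element("exercise")
prompt = ET.SubElement(root, "prompt")
prompt.text = "Evaluate 2\u2074"   # raw superscript four, no entity, no tag

path = os.path.join(tempfile.mkdtemp(), "exercise.xml")
# encoding="utf-8" writes the character as raw UTF-8 bytes, and
# xml_declaration=True adds the XML header naming that encoding.
ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

with open(path, "rb") as f:
    data = f.read()

assert "\u2074".encode("utf-8") in data   # stored as raw data
assert b"&#" not in data                  # not as a numeric entity
assert ET.parse(path).getroot().find("prompt").text == "Evaluate 2\u2074"
```

Passing the encoding explicitly matters in this sketch: with ElementTree's default ASCII output, non-ASCII characters are written as numeric character references instead of raw data.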
Reason for Unicode Character Points
Ultimately, Unicode has positions for these characters not to help Flash developers, but because superscripts/subscripts do add content to a text string.
If you’re exchanging raw data files, you may need to know whether a character is superscript or subscript, so it has to be encoded within Unicode. Hence, we have superscript/subscript characters.
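The information carried by a superscript character can be inspected with Python's standard unicodedata module. This short sketch shows what the Unicode database records for U+2074 and what is lost when the superscript is flattened to a plain digit.

```python
import unicodedata

four = "\u2074"  # SUPERSCRIPT FOUR

print(unicodedata.name(four))            # SUPERSCRIPT FOUR
print(unicodedata.decomposition(four))   # <super> 0034
print(unicodedata.digit(four))           # 4

# Compatibility normalization (NFKC) folds the superscript into a plain
# digit, so "two to the fourth" collapses into "24".
flattened = unicodedata.normalize("NFKC", "2" + four)
assert flattened == "24"

# Canonical normalization (NFC) leaves the character untouched.
assert unicodedata.normalize("NFC", "2" + four) == "2" + four
```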