A while ago, I wrote about how hard it is to assign language tags to varieties like Cantonese, which is usually called a dialect even though it’s really a separate language.

SIL is using a new term that I think should become more common: the macrolanguage. A macrolanguage is basically a set of related languages that share a common “identity” even though their speakers often can’t understand each other.

Macrolanguages arise when a language spreads to different regions and diverges, but the cultural or political unity remains. Other macrolanguages include Arabic, Cree, Hmong, Quechua (spoken across the former Incan Empire), and Norwegian. I suspect you could throw in some other candidates like German and Italian (and we’d have more if the Roman Empire had made it to the 21st century).

In any case, the ISO 639-3 language code standard includes a set of macrolanguage mappings, which show how related individual languages map to a shared macrolanguage. For example, either Mandarin Chinese (cmn) or Cantonese (yue) can also be called Chinese (zh or zho).
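To make the mapping idea concrete, here is a minimal Python sketch. It hard-codes a small, hand-picked excerpt of the individual-to-macrolanguage mappings rather than loading the full table SIL publishes, and the names `MACROLANGUAGE_OF`, `ISO639_1_OF`, `macrolanguage_of` and `names_for` are hypothetical, made up for illustration.

```python
from typing import List, Optional

# Hypothetical, hand-picked excerpt of the ISO 639-3 macrolanguage mappings:
# individual language code -> macrolanguage code (not the full table).
MACROLANGUAGE_OF = {
    "cmn": "zho",  # Mandarin Chinese -> Chinese
    "yue": "zho",  # Cantonese -> Chinese
    "arb": "ara",  # Standard Arabic -> Arabic
    "arz": "ara",  # Egyptian Arabic -> Arabic
    "nob": "nor",  # Norwegian Bokmål -> Norwegian
    "nno": "nor",  # Norwegian Nynorsk -> Norwegian
}

# Two-letter ISO 639-1 codes for the macrolanguages above, where one exists.
ISO639_1_OF = {"zho": "zh", "ara": "ar", "nor": "no"}


def macrolanguage_of(code: str) -> Optional[str]:
    """Return the three-letter macrolanguage code for an individual language, or None."""
    return MACROLANGUAGE_OF.get(code)


def names_for(code: str) -> List[str]:
    """List the codes a language can go by: itself, then its macrolanguage (3-letter, then 2-letter)."""
    names = [code]
    macro = macrolanguage_of(code)
    if macro is not None:
        names.append(macro)
        if macro in ISO639_1_OF:
            names.append(ISO639_1_OF[macro])
    return names


print(names_for("cmn"))  # ['cmn', 'zho', 'zh']
print(names_for("yue"))  # ['yue', 'zho', 'zh']
print(names_for("eng"))  # ['eng']
```

Mandarin and Cantonese keep distinct individual codes but collapse to the same zho/zh once you climb to the macrolanguage level, which is exactly the “can also be called Chinese” relationship described above.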

I really hope this term takes hold, because I think it will simplify other discussions about language tags. After all, just this year a language technology guru claimed that English had no “true dialects.” I think he meant that English hasn’t reached macrolanguage status yet.

