Monthly Archives: February 2008

French Teenagers Disagree on Noun Gender

A post from Heidi Harley on Language Log mentioned a study in which French teenage native speakers showed a lot of variation in “assigning” genders to French nouns. That is, for any given word, some teens would think it was masculine and others would think it was feminine – even if a French dictionary assigned it to only one gender.

To be honest, I’m not terribly surprised at this. Because of the ways French word-final sounds have evolved, the phonological distinction between genders is very weak in Modern French. Compare this to Spanish, where most words ending with -o are masculine and most words ending with -a are feminine. The major cues for determining the gender of a word in French might be processes like definite article agreement (le vs. la) or pronoun replacement (il vs. elle)…and they may not be salient enough for speakers to make consistent judgments.

To me this is evidence that grammatical gender assignment is often based on phonology. A common historical change in gender assignment is for a noun to be reassigned to another gender because its ending is more typical of that other gender. A notable example is the Latin word laurus ‘laurel tree’, which was grammatically feminine in Classical Latin but has changed to masculine gender in Italian il lauro (Italian is a descendant of Latin). The French data here is consistent with this idea that phonology is a factor in determining consistent gender. If there is no regular “rule” or phonological cue, you would expect lots of variation.

On an interesting side note, the research also found that adult speakers were much more consistent in their gender assignments than the teens were. Something has happened between generations. This is very speculative, but I wonder if attitudes towards standard grammar or standard grammar education are changing.

I’m thinking of the irregular English past tense. Almost all native speakers acquire a set of irregular past tense forms, but there’s actually a lot of variation. For instance, in the U.S. the “correct” past tense of bring is brought, but variations like brang (similar to ring/rang) and brung (as in You Got To Dance with Them What Brung You by Molly Ivins) also occur. FYI – Neil Diamond used brang in the song Play Me (“Song she sang to me/Song she brang to me.”)

I definitely recall several 3rd grade grammar lessons which required us to memorize “correct” irregular past tense forms (whereas we never had to memorize Question Formation). I suspect 3rd grade French children get to memorize genders of nouns. In fact, I just found a French Guess the Gender game for children, so it’s probably a “tricky grammar point.” So…if the method of grammar instruction changes, you could have the natural variation surfacing again in a population.

I honestly don’t know what grammar instruction is like in modern France, but it would be worthwhile for a researcher to check (without thinking the apocalypse is coming of course).

If nothing else, it would seem like a fascinating historical linguistic phenomenon is in progress.

P.S. On the difficulty of assigning genders by phonology, a French grammar site notes that “If you study these 40 word endings, it is possible to determine the gender of 75% of French nouns with almost 95% precision.” Hmmm!
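To make the idea of ending-based gender cues concrete, here is a minimal sketch in Python. The handful of endings below is my own small illustrative sample of commonly taught cues, not the 40 endings from the grammar site, and the function name is hypothetical. Note that it gets plage wrong (the word is feminine despite ending in -age), which is exactly the kind of exception that keeps the precision under 100%.

```python
# Hypothetical sample of ending -> gender cues (an assumption for
# illustration; NOT the 40 endings listed by the grammar site).
ENDING_CUES = {
    "age": "m",   # le fromage (but: la plage, an exception)
    "ment": "m",  # le gouvernement
    "eau": "m",   # le bateau
    "isme": "m",  # le tourisme
    "tion": "f",  # la nation
    "ette": "f",  # la fourchette
    "ance": "f",  # la chance
}


def guess_gender(noun):
    """Guess 'm' or 'f' from the word ending, or None if no cue applies."""
    word = noun.lower()
    # Try longer endings first so the most specific cue wins.
    for ending in sorted(ENDING_CUES, key=len, reverse=True):
        if word.endswith(ending):
            return ENDING_CUES[ending]
    return None  # no cue: the speaker has to rely on memory


if __name__ == "__main__":
    for noun in ["fromage", "nation", "plage", "chat"]:
        print(noun, guess_gender(noun))
```

A word like chat falls through every cue and returns None, which is the situation where you would expect speakers to vary the most.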

Math and Alternate Representations

Since linguistics invokes mathematical formalism (e.g. phrase trees, feature bundles, rules or tableaux), I am interested in some aspects of how math is taught.

One question that comes up a lot is why is it important for all students to learn algebra or trigonometry if only a small minority will ever use these tools in daily life. The standard answer is that algebra teaches you “mathematical thinking,” but I’m pretty sure most students (especially those who hate math) miss the point.  Actually, I would say that if you want to learn “deductive” skills, you’re better off taking formal logic or rhetoric.

However, there is one aspect of algebra that is important in real life but rarely pointed out, and that’s its ability to provide multiple representations for “the same thing”. For instance, the concept of “1” can be represented as “1”, 4/4 (four-fourths), x^0, |i^2| and my personal favorite – .999999… And believe me, I haven’t even touched the tip of the iceberg. Although these formulations all represent the same quantity, they do not quite have the same meaning.

You normally use “1” in real life, but if you’re working on a weird property issue where a piece of land is divided into quarters, maybe the formulation “4/4” would have meaning. Or maybe you have a formula in which you raise x to a certain power – whatever it is. It’s just that when the power is zero, the result is 1.
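If you want to check these equivalences yourself, here is a small sketch in Python. The variable and function names are my own, and the base 7 is an arbitrary nonzero stand-in for x. The last part shows why .999999… counts as 1: the gap between 1 and n nines after the decimal point is exactly 1/10^n, which shrinks toward zero as n grows.

```python
from fractions import Fraction

# Several representations of the same quantity.
one_as_integer = 1
one_as_fraction = Fraction(4, 4)   # four-fourths
one_as_power = 7 ** 0              # x^0 for a nonzero x (7 is arbitrary)
one_as_modulus = abs(1j ** 2)      # |i^2| = |-1| = 1


def repeating_nines(n):
    """Exact value of 0.99...9 with n nines, as a fraction."""
    return sum(Fraction(9, 10 ** k) for k in range(1, n + 1))


if __name__ == "__main__":
    print(one_as_integer == one_as_fraction == one_as_power == one_as_modulus)
    for n in (1, 3, 6):
        # The gap to 1 is exactly 1/10^n.
        print(n, repeating_nines(n), 1 - repeating_nines(n))
```

The values all compare equal, but each one is written in the form that fits a different context: a count, a division into parts, an exponent rule, a magnitude, or a limit.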

My point isn’t just that the “same” item can have multiple representations but that the different representations can be selected to help you focus on a different aspect. To borrow a concept from Semantics class, the meaning of something is partly fixed by your context – but you have to know EXACTLY what your context is.

The use of multiple representations does extend beyond algebra (and I don’t just mean linguistics either). For instance, there are lots of places around the world which have multiple place names, and sometimes you select one based on what era you are studying.

For instance, modern historians may be studying “Turkey,” but historians of the 14th to early 20th centuries may be studying the heartland of the “Ottoman Empire,” while those who specialize in the Bronze Age probably study “Anatolia” and Roman historians are probably studying “Asia Minor.” It’s roughly the same place, but the different names not only establish the time context, but can also be used to fudge minor details like changing political borders.

You don’t want to start calling modern Turkey “Anatolia”, but the use of the term “Anatolia” is useful for referencing the set of Bronze Age cultures in the region (none of which are now related to the modern Turkish culture in terms of language or religion)…so you don’t usually call ancient Anatolia “Ancient Turkey” either (unless you’re writing a tourist brochure). And no matter what – you never want to confuse Turkey with Turkestan (not cool).

This kind of mathematical thinking isn’t about accepting one “right answer,” but systematically determining what the possible answers are and when to deploy them while understanding that some answers are just plain wrong!

Spanish Sausage: Chori[s]o vs. Chori[θ]o

Since I was giving the British TV food chefs a bit of a hard time for a slight mispronunciation of Spanish, I thought I should point out that Jamie Oliver did use a Spanish ingredient – chorizo sausage – and pronounced it correctly as [čoriθo] with a “th” or /θ/ for the Spanish z.

But wait (I hear from the U.S. students of Spanish I) – shouldn’t chorizo be [čoriso] with z pronounced as [s]? Yes if it comes from a Latin American country like Mexico. In Latin America the letters c and z are pronounced as [s], while back in Spain, they are pronounced as [θ]. Both appear to come from original Old Spanish [ts] c,ç or [dz] z.

Normally, this would be just linguistic trivia, but here the pronunciation difference reflects an actual culinary difference. According to Norman Van Aken, the chorizo of Spain is somewhat like pepperoni with a little paprika kick (I can attest to that) while Mexican chorizo is softer like an Italian sausage and goes well with scrambled eggs. Some equate Mexican chorizo with Spain’s chorizo fresco (or “fresh chorizo”).

So…If you are from the U.S. (especially the West coast), chorizo is probably the Mexican variety and should be [čoriso]. But if you’ve got the harder Spanish variety instead (which may be more likely in Britain), then it really would be [čoriθo].

Truth be told, I doubt any Spanish native speaker makes a distinction because all dialects tend to pronounce c,z in just one way. But since English can make the /s/ vs. /θ/ distinction, I have decided to have two lexical entries for the chorizo sausage family – [čoriso] for the fresh version and [čoriθo] for the cured version. It’s a good thing that I noted that grammar isn’t always logical.

FYI – Jamie Oliver was using the cured version…[čoriθo]

Grammar is not always logical

One class of linguistic questions I see a lot is “Why does John Doe say X when it doesn’t make any logical sense?” This is usually in reference to a new idiom, dialect or language the questioner is encountering. The answer is that while grammar is usually consistent, it doesn’t always follow real world logic. Even worse, there are lots of idiosyncratic quirks that happen “just because”.

A recent one was – if people from Korea speak Korean, why don’t people from Japan speak *Japanian (instead of Japanese)? And why do the Basque speak just Basque and not *Basqu(i)an, *Basquese or even *Basquish? It’s a mystery. You can solve some of it by saying that the -ish/-sh/-ch ending is older and tends to be used with languages closer to Britain (e.g. English, Spanish, Welsh, Irish, French (aka Frankish)). But do you really think the average adult has ever connected language names with the history of the Anglophone world? No – it’s just memorized.

Ironically, non-standard dialects can actually be more logical, yet will still be dismissed as “poor grammar”. After all, if we say her pen but that pen is hers, and our building but that building is ours…shouldn’t we also say that pen is mines (like they do in Baltimore)? Logically…yes. But I think we all know what happens to the poor student who uses mines in an essay – and it’s not an A+ for logic.

This is an unsettling concept because so many writers and speakers do use language to construct effective, logical arguments. Shouldn’t the bones of language (vocabulary and grammar or syntax and morphology) be equally logical? The surprising answer is that while grammar can have a system, it’s not one that is “logical”. To me this is a powerful reason to think grammar is not directly connected with “general cognition”.

I mean, who in their right mind would invent a language with as many irregular past tense verbs as English has? Ugh!