Pragmatics and Statistics

One of my Listservs led me to this interesting article by John Allen Paulos about the distinction between the “literary and scientific cultures”. As part of the discussion, Paulos discusses some cases where knowing a narrative background affects how probability is assessed.

Consider the following two statements. Which is more probable?

  1. Sarah is a bank teller.
  2. Sarah is a bank teller and has a philosophy degree.

The answer is that the first option is more probable because only one condition needs to be met. In order for the second to be true, two conditions are required: being a bank teller and having a degree in philosophy.
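As a rough illustration of why the single condition always wins, here is a small Python sketch. The population and the two rates (2% tellers, 1% philosophy degrees) are invented for the example and do not come from Paulos; the point is only that everyone counted in the second group is also counted in the first.

```python
import random


def conjunction_check(people):
    """Return (P(teller), P(teller and philosophy degree)) for a list of people."""
    n = len(people)
    p_teller = sum(1 for p in people if p["teller"]) / n
    p_both = sum(1 for p in people if p["teller"] and p["philosophy"]) / n
    return p_teller, p_both


# Hypothetical population: the 2% and 1% rates are made up for illustration.
random.seed(0)
people = [
    {"teller": random.random() < 0.02, "philosophy": random.random() < 0.01}
    for _ in range(100_000)
]

p_teller, p_both = conjunction_check(people)
print("P(teller)                =", p_teller)
print("P(teller and philosophy) =", p_both)

# The conjunction can never be more probable than one of its parts.
print("conjunction <= single condition:", p_both <= p_teller)
```

Whatever rates you plug in, the last line cannot come out False, because the second count is taken from a subset of the people in the first count.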

Now consider this version from Paulos in which the teller is given a brief bio:

Linda is single, in her early 30s, outspoken, and exceedingly smart. A philosophy major in college, she has devoted herself to issues such as nuclear non-proliferation. So which of the following is more likely?:

  1. Linda is a bank teller.
  2. Linda is a bank teller and is active in the feminist movement.

The finding is that more people will say that the second option is the more likely one, i.e. that Linda is a bank teller and in the feminist movement, even though it requires the fulfillment of two conditions.

There are several philosophical tacks one can take to the problem, but I think one factor is that the story along with the presentation of the information affects the construction of the model used to evaluate the statements.

Someone reading the first scenario without the narrative probably constructs the intended model where the probability of being a bank teller versus a bank teller with a philosophy degree is evaluated across all adult women. It’s easy to see that fulfilling condition A is more probable than fulfilling conditions A and B.

The second scenario with Linda though probably causes most people to build a model not across all adult women but across all adult women who have a philosophy degree and who were activists in their youth. It’s NOT the same pool of candidates, and there is a legitimate reason to think probability judgments COULD be different. Interestingly, if you presented the two Linda options as

  1. Linda is a bank teller.
  2. Linda is active in the feminist movement.

then the conclusion would likely be that Linda being active in the feminist movement is more likely than her being a bank teller. In other words, readers could be using the narrative to build a stereotyped persona where someone who was politically active in college remains active. In the same vein, most people likely assume that someone with a philosophy degree becomes a teller only as a last resort and that most tellers have a degree in accounting or another related field. This is one possible source of the fallacy.

I would also argue that the presentation of the options causes the pragmatic engine to introduce another logical trap. Because both options allow that Linda is a bank teller, readers may assume that Linda ends up as a bank teller (even though that’s not what the option says). Thus, readers could be interpreting the Linda options as:

  1. Linda is a bank teller who is not active in the feminist movement.
  2. Linda is a bank teller who is active in the feminist movement.

There is a further pragmatic interpretation under which option 1, “Linda is a bank teller”, means that she is not politically active at all. That’s not literally the case: option 1 allows, for instance, that Linda is still active in the anti-nuclear-proliferation movement but not in the feminist movement. In pragmatic land, though, omitting information is interpreted as meaning it doesn’t exist. That’s why people often consider not saying something to be “lying.”
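To make the difference between the literal and the pragmatic reading concrete, here is a minimal Python sketch. The pool of 1,000 women matching Linda’s bio and the two counts are invented purely for illustration; the variable names are mine, not Paulos’s.

```python
# Hypothetical pool: 1,000 women who match Linda's bio. All counts are invented.
pool = 1000
teller_feminist = 8        # bank teller AND active in the feminist movement
teller_not_feminist = 2    # bank teller and NOT active in the feminist movement


def p(count):
    """Probability of drawing someone from a group of the given size."""
    return count / pool


# Literal reading of the two options.
literal_1 = p(teller_feminist + teller_not_feminist)  # "Linda is a bank teller"
literal_2 = p(teller_feminist)                        # "... and is active in the feminist movement"

# Pragmatic reading: option 1 is heard as "a bank teller who is NOT active".
pragmatic_1 = p(teller_not_feminist)
pragmatic_2 = p(teller_feminist)

print(literal_1 >= literal_2)     # True for any counts: a conjunction cannot beat its part
print(pragmatic_2 > pragmatic_1)  # True for these invented counts
```

Under the literal reading option 1 can never lose. Under the pragmatic reading the two options no longer overlap, so option 2 can legitimately come out ahead if most tellers in the narrowed pool are assumed to be activists.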

So to summarize, I think the skewed probability judgments aren’t just a result of people being sucked into a mini soap opera, but of two factors the narrative introduces: 1) narrowing the set of women to those with philosophy degrees, which leads to different stereotypes, and 2) options whose pragmatic reading differs from their literal meaning.

The ability to reasonably construct a pragmatic meaning behind a literal statement is critical for social relations and reducing conversational length. But it can lead to some glitches like the narrative above.

Defining “Authentic Latin”

One of the tenets of folk linguistics that is actually true is that language is imprecise. I was reminded of that by a discussion of what “authentic Latin” means. Depending on who you talk to, “authentic Latin” could mean:

  1. Text by a Classical Latin author of the Roman era (e.g. Cicero, Caesar)
  2. Any text following the rules of Latin grammar
  3. Graffiti found in Pompeii or text from letters found in the Roman fort of Vindolanda
  4. Latin used authentically such as to ask to go to the bathroom during class.

In case you’re wondering, the topic being debated was whether it made sense to speak Latin in a Latin classroom as you would try to use Spanish in a Spanish classroom. As usual, my answer is “Yes and No” because it all depends on what you mean by “Latin.”

Cicero vs. Graffiti

You might think that everyone agrees that Cicero is “authentic”, but in fact there is a debate. Many people know that Latin evolved into the Romance languages (Spanish, French, Italian, Portuguese, Romanian). However, if you reconstruct “Proto-Romance” based on what we now know about these languages, you do NOT get Classical Latin, but something different. We know about Classical Latin only because it continued to be used and taught in post-Roman Europe.

It is clear that the predecessor of the Romance languages isn’t necessarily written Latin, but rather a form of “street Latin” (probably multiple dialects of street Latin). Therefore historical linguists are extremely excited when they see informal scribblings like graffiti, letters or other texts NOT meant for literary posterity. Sometimes, the MORE they diverge from Cicero, the more excited we get. We are seeing change in progress! And if we can date the manuscript, we can also start dating the change!

So for some historical linguists, “authentic Latin” is really street Latin or Vulgar Latin, the kind spoken casually and spontaneously by the Roman populace. It’s not always pretty, but it is real authentic evidence of what the ancestor of Spanish/French/Italian was like.

But what about Cicero? Isn’t his material authentic? Well…it’s authentic educated written Latin, but there is a question of how close to spoken Latin it was. In English, the distance between educated written English and spoken English by an educated speaker is not huge, but in some societies such as Egypt, Greece, and Sri Lanka, the difference can be so significant that the written form is considered a separate language.

Today in the Middle East, educated Arabic speakers literally learn a separate language called Modern Standard Arabic (similar to Quranic Arabic) so they can communicate across national borders. At home though, speakers use their local variety of Colloquial Arabic, and these varieties are so distinct that English speakers have to learn each one individually (much like we have to learn Italian, Portuguese and Spanish as separate languages).

Linguists aren’t sure about the situation in the Roman Empire, but dialogue from some fairly early plays by Plautus (254-184 BC) shows that structures found in the Romance languages were already in place even before Cicero wrote. Granted, it was dialogue from lower class characters, but we can deduce that spoken Latin was well on its way to Proto-Romance before Cicero (106-43 BC) was even born. If that’s the case, what does that mean for Classical Latin? Since Classical Latin materials include letters and debates, it’s clear that it was a very familiar language and that authors were fluent in it. But how did they address servants or merchants? Was it something they had to be schooled in? And what was spoken at midnight when one was tipsy? It’s not clear.

FYI – If you are studying the archaic history of Latin, then the focus IS on archaic forms which may be found in Classical Latin or archaic Latin texts. Classical Latin is very much a beloved friend for many linguists.

Latin vs. Neo-Latin

Another distinction is Latin vs. Neo-Latin. By the time of the Oaths of Strasbourg (842 AD), when spoken Old French (or Gallo-Romance) is first written down, it is clear that local dialects of late Latin were the native language of even upper class speakers and that Classical Latin was used only for written documents or spoken among educated speakers.

But it is also the case that documents continued to be written in Latin following grammars established by earlier generations (with some neologisms and local quirks). However, Latin is pretty much “dead” in that no one learns to speak it as a child but rather learns it formally in school. This is particularly true in regions where the native language was NOT a Romance language.

Today, there are many writers of Neo-Latin (post 1550 AD) who produce texts such as Vicipædia Latina, Winnie Ille Pu (sic) and Asterix Legionarius.

This is authentic Latin also, of a sort, but often different from what the Ciceronian Romans would have written. If you take a look at a page of Latin quotes, you’ll often be able to sort out the Neo-Latin from the “authentic” Classical Latin quotes very easily. The Neo-Latin quotes are much longer (e.g. Abutebaris modo subjunctivo denuo “You’ve been misusing the subjunctive again”) than the original (Veni, vidi, vici) and, if they are written by English speakers, rarely employ the ablative absolute as a Roman would. It really isn’t quite the same.

One issue is that modern speakers are still filling in gaps not in the original Latin. Not only did the Romans not have iPhones and Twitter, they didn’t have Halloween, Saint Patrick’s Day or the number zero either. And when you get down to it, we may know about the latrine, but are we sure we know how to ask where it is? And did they have a Ladies Room or was it unisex? If we know the answers, they probably come from something like the Vindolanda texts, not from traditional Classical Latin sources.

The other issue is that our secular 20th-21st century usage of Latin has a more humorous quality than in earlier generations. It’s rare for anyone, except maybe the Vatican, to write original material in Latin (and the Vatican is pretty much the only organization conducting business in Latin, but in Church Latin). Rather we are translating pre-existing material, and often trivial material such as children’s stories and dialogue from Star Wars. No longer are scientific treatises and formal decrees being written in Latin. Latin has become a beloved aunt who has retired after a long term of service (but could come back to work if needed).

And that’s what makes teaching Latin different from teaching Spanish. Classical Latin as we have it was meant to be a formal language for formal situations. While I’m sure the Romans were able to order a drink, ask for directions and translate “DVD”, we can’t always guess what it might have been because it’s not always recorded. We’re often just guessing what might have happened.

So…should a Latin instructor use Latin for “authentic purposes?” Sure, why not? Mine did, and I do remember some grammar and vocabulary points because of it, and it’s fun! But it is worth understanding that it’s often a guess and may not work if you accidentally time travel back to the Forum.

Still All the Same “Authentic”

Stepping away from Latin, you may be awed and amazed at how many uses of “authentic” there are, but at some level, it’s all the same use. No matter who you are – a classicist, a linguist, or a Latin instructor – authentic means “worthy of trust, reliance or belief”. For a scholar focusing on Roman political history and rhetoric, there is no better source than Cicero. For a linguist wanting to know about Vulgar Latin, graffiti is the way, and for the instructor, using language for a real purpose is crucial.

Back in my graduate level semantics class, we talked about a concept called “context”. If you wanted to know who “I” and “you” were in any given utterance, you had to know who was speaking and who that person was speaking to. This is similar to the multiple uses of “authentic” because the meaning of what is most reliable depends on what you are interested in.

Thus, it is true that meaning is always somewhat relative and contextual. The only way to establish an “absolute” meaning is to establish a context. And then you sound like a lawyer or a pedantic scholar, but that’s what it takes.


Here is a fascinating article on why Winnie Ille Pu is not quite authentic Classical Latin as the Romans would have written it. I do not agree with the final conclusion though.

¿Se habla «cristiano» en España?

As Tudor history buffs will know, Henry VIII’s first queen, Catherine of Aragon, hailed from Spain and therefore spoke fluent Spanish. The Showtime series The Tudors has taken advantage of this to allow some of the characters to communicate in Spanish (with subtitles of course).

In season 4 of course, Queen Catherine is no longer with us, but her daughter Mary (the future Queen Mary I or “Bloody Mary”) is multilingual in at least Spanish and English. In this week’s episode, she welcomes a Spanish courtier in Spanish. He seems caught off guard, but Mary says she speaks “Spanish” because, after all, isn’t she the daughter of Catherine of Aragon? (¡Sí claro!).

Well, the subtitle says “Spanish”, but what Mary actually says in the Spanish dialogue is that she speaks “cristiano” (lit. “Christian”). In that period of history, Catherine and Mary’s dialect of Spanish (probably Castilian) may have been associated with Christianity, in contrast to the Moors, who did originally speak Arabic (although many later switched to Mozarabic, a sister language of Spanish spoken in Islamic Iberia). In the same vein, Catherine of Aragon’s parents, Ferdinand and Isabella, were noted for ousting the last of the Moorish (Islamic) rulers, hence the religious distinction was an important part of Mary’s family history.

Oddly though, I haven’t been able to find any references to Spanish being called “Cristiano”. What there has been is a long running distinction between the term español “Spanish” vs. castellano, literally “Castilian”, to designate the language of Castile in central Spain which then became the basis of standard Spanish. In Iberia though, Spanish has co-existed with other related languages including Catalan, Asturian, Andalusian and others. Hence the persistence of “castellano” in Spain even though people outside of Spain (particularly in Mexico, the Caribbean and Central America) prefer “español”.

This is interesting, but I am still wondering if the dialogue writers really meant “cristiano” or “castellano”. It looks like a little more investigation is in order….

A Person from San Diego is a…?

While watching the news action comedy Anchorman: The Legend of Ron Burgundy set in San Diego, I noticed an interesting English grammar gap. At one point the news team needed to find the adjectival form to describe an inhabitant of San Diego, but which term to use?

One suggested “San Diegoan”, another “San Diegan” among other variations, but ultimately they were unable to reach a consensus. The answer is probably “San Diegan” (the name of a local paper). On Google, there were about 74,600 hits for this term versus 999 for “San Diegoan.” I scored nothing for “San Dieger” or “San Diegoer”. I also did a search for “San Diegite” and actually scored 355 hits, including a writer on a message board who comments:

“Hey, dude, I’m a Sand Diegan. San Diegonian. San Diegite. Person of San Diegoness. Love to see you when you’re in town.”

The lesson here is that there is a lot of confusion, even among the locals. In truth, English has several of these adjectival endings including -((i)a)n (San Franciscan, Baltimorean, Australian), -ite (Denverite) and -er (New Yorker). We also have older -ese (Chinese, Viennese, Vietnamese) and the newer -i (Pakistani, Afghani). There are also the “irregular” forms such as Los Angelino (Los Angeles, from Spanish), American (for United States of America) and Monegasque (Monaco).

But…there are some names without any adjective at all, such as Massachusetts, Las Vegas, Westminster and others. I admit that there are probably metrical properties (syllable count, stress patterns) that interfere with forming an adjective, but it is interesting that the grammar can accommodate words without adjectival forms. In English, an adjectival form for a person from a specific location is “derivational morphology”: nice to have, but not strictly required.

And the Productive Welsh Verb-Noun Ending is….-io

If you’ve seen my CV, you’ll know that I did my research in Celtic linguistics. I just downloaded a new dissertation on The Integration of English-origin Verbs in Welsh from J.R. Stammers.

I’m still processing it, but one question that comes up is what the “default” marker for Welsh verb-nouns (basically an infinitive) is…because Welsh has a lot of options. I would have guessed -u because it is very frequent in native verbal roots, but it may be -io. Stammers (2009) has this list of English verbs with the -io ending on the verb-noun.

Note: Forms with hyphenated endings are more recent borrowings.

  • activate-io
  • babysit-io
  • carfio (carve)
  • download-io
  • enjoio
  • email-io
  • ffonio (phone)
  • ffordio (afford)
  • ffotocopïo (photocopy)
  • insult-io
  • iwsio
  • marcio
  • panic-io
  • sincio (sink)
  • stare-io
  • stopio
  • text-io
  • twrio (tour)
  • whine-io

No Ending

  • fancy (-/i/ (i.e. -y) is a valid verb-noun ending)
  • name-dropping (this uses English gerund ending instead)
  • taking (with gerund)

A few exceptions?

  • canslo (how old?)
  • helpu (may be older)
  • freak-o (definitely a recent borrowing)

Interesting stuff. I do wonder if -u was originally the default ending as in helpu and older Latinate verbs like cymharu ‘compare’. But these days, it appears that -io is the clear winner, at least in this data.
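For anyone who wants to see the tally, here is a quick Python sketch that counts the endings in the forms listed above. The classification is a rough orthographic check of my own (it just looks at the final letters), not Stammers’s method, and the category labels are made up for this example.

```python
import unicodedata
from collections import Counter

# Verb-nouns exactly as listed above (hyphens mark the more recent borrowings).
forms = [
    "activate-io", "babysit-io", "carfio", "download-io", "enjoio",
    "email-io", "ffonio", "ffordio", "ffotocopïo", "insult-io", "iwsio",
    "marcio", "panic-io", "sincio", "stare-io", "stopio", "text-io",
    "twrio", "whine-io",
    "fancy", "name-dropping", "taking",
    "canslo", "helpu", "freak-o",
]


def strip_marks(text):
    """Remove diacritics (e.g. the diaeresis in ffotocopïo) so endings compare cleanly."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ending(form):
    """Classify a form by its final letters; spelling-based only, not a morphological analysis."""
    bare = strip_marks(form.replace("-", ""))
    if bare.endswith("io"):
        return "-io"
    if bare.endswith("ing"):
        return "English -ing"
    for suffix in ("u", "o", "y"):
        if bare.endswith(suffix):
            return "-" + suffix
    return "other"


counts = Counter(ending(form) for form in forms)
for label, n in counts.most_common():
    print(label, n)
```

On this list, 19 of the 25 forms end in -io, against a single -u (helpu) and two plain -o forms (canslo, freak-o).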

More on Onion’s “Rules Grammar Change”

Just out of curiosity, I thought I would transcribe the satiric announcement from the U.S. Grammar Secretary of a mandated change to Anglo-Saxon syntax. Below is the text of the announcement.

Rules grammar change. English traditional replaced to be new syntax. The Onion News it’s. Redland Doyle I’m. The U.S. Grammar Secretary that no more will rules English follow announced today. The changes verb, verb clauses and adjectives placing involved frequent with random shuffling or elimination conjunctions and prepositions of.

Grammar Secretary to according, “Is new structure loosely on obscure 800-year old pre-medieval Anglo Saxon syntax based.” This week, beginning America across all dictionaries, highway signs and other books or objects writing upon revised to fit new syntax will be.

And the approximate “unrevised” syntax translation:

Grammar rules change. Traditional English to be replaced [by] new syntax. It’s the Onion news. I’m Redland Doyle. The U.S. Grammar Secretary announced today that rules will follow English no more (??). The changes involved verb, verb clause[s] and adjective placing with random shuffling or elimination of conjunctions and prepositions.

According to [the] Grammar Secretary, “[The] new structure is loosely based on obscure 800-year old pre-medieval Anglo Saxon syntax.” Beginning this week all dictionaries, highway signs and other books or objects written/(writing) upon (??) will be revised to fit new syntax.

Are there any patterns to be observed? Not many. Yoda was much more consistent in his rearranged English. But hey, just a joke it is. It to kill make doesn’t sense.

Onion Announces Change in English Grammar

Finally the dream of all prescriptivists comes true as the U.S. Grammar Secretary mandates a complete overhaul of English word order, apparently back to the Old English period (but without the messy archaic case and verb endings).

Obviously, it’s a great demonstration of the futility of mandating grammar rules on a long term basis. Personally, I don’t think the grammar secretary went far enough – I would have liked to have seen the restoration of all eight original Indo-European cases.

Late Christmas 2009 Observation from Binghamton

So…this Christmas break I decided to visit a college friend currently living in Binghamton NY. But before I got there, I had to stop off at the local stitching supply store in nearby Endicott NY. As I was standing around looking at blackwork embroidery patterns, I heard something resembling the following sentence:

Has anyone did that pattern yet?

As you can see, the irregular past participle done of Standard English was replaced by the past tense did (also an irregular). I was glad I was behind the speaker because I did a linguistic double take I hadn’t done in years. My friend later confirmed that this was somewhat common, noting that it “drove her crazy.”

A Google search of “have did” turned up a quote from hockey player Marc Savard who claimed “Sweet Caroline might have did it.” Savard was born in Ottawa, which may make this a Great Lakes or U.S.-Canadian border feature.

I will say that there is a tradition in English dialects of conflating the past tense and past participle. First, for most verbs, the past tense and past participle are formed with the same -ed ending. It’s only a subset of irregular verbs like do (did/done), see (saw/seen), eat (ate/eaten), break (broke/broken) that maintain all 3 forms distinctly. Even some irregulars like buy (bought/bought) have merged the past and past participle.

There has also been a tendency to change a formerly irregular past participle to a regular one (e.g. molten > melted). In modern Standard English, speakers generally say “The ice has melted” (not *The ice has molten). The old past participle is now reserved only for hot melted substances that can burn you (e.g. molten lava, molten sugar)…However, most dialects have maintained the did/done distinction.

Interestingly, there are U.S. dialects which have lost the done/did distinction but kept done and lost did (see “books i done read”). Americans will know that this is associated with non-Standard Southern and African-American forms and thus generally satirized. However, looking at this example I am realizing that done has not only replaced did but also have as the auxiliary of a perfective construction. In this case, all bets are off.

In any case the “have did” vs “done” changes show that dialects find multiple solutions to the same question: do we really have to have both a past tense and a past participle form if they are only distinct in a few irregular verbs?

Rhyming vs. Rap

I just saw a great video of Neil Young covering the theme song to “The Fresh Prince of Bel Air.”

It’s fascinating on many levels, including the fact that Neil Young makes the song sound like an epic saga of the American dream as well as the fact that Young is apparently familiar with the TV show. But on a linguistic level, I noticed that while Young’s acoustic folk style preserves the original rhyme scheme of the Fresh Prince classic, it loses the “rap” quality.

The difference is that the rap emphasizes an overall stress pattern in a way the folk version does not. In other words, rap is both rhyme and rhythm. The stress pattern in the rap version is distinct from that in spoken English, which is probably one reason why many report that rap lyrics are harder to understand. It’s hard to systematically define the metrical scheme Will Smith is using (especially since I think it changes), but there is a tendency to put a heavy stress on the final syllable (“I’ll see when I get there / I hope they’re prepared / for the prince of Bel-Air”).

I’ve written about rap as poetry (not always great poetry), but I think Neil Young’s rendition effectively shows the more classically poetic side of rap sometimes lost in the rhythm. I am amazed at the linguistic complexity of good rap, especially since none of the artists have ever gone to a formal rap academy. Then again, there was no bluegrass or jazz academy back in the day either.