
Google Search: What are you trying to autocomplete?

Wired magazine had a good article on offensive Google autocomplete answers to questions like “Are Jews…” or “Blacks/Feminists are….”

There have been lots of good articles on built-in bias in different algorithms, such as evaluations of creditworthiness that perpetuate unconscious bad assumptions already built into traditional financial evaluations.

But…what is the question?

In this case, though, I wonder if the customer isn’t partly to blame. To me, the very question “Are XXX…” sounds like a non-XXX person trying to check on an aspect of XXX culture. A benign scenario could be to clarify a stereotype or just to learn more about the XXX culture. I can even see people in the community wanting to search for answers to questions they may have.

But there are plenty of negative reasons people are looking up information about “those” XXX people out there. The fact that Google is prone to bring up offensive stereotypes in the answers suggests a lot of negative searching is happening under the umbrella “Are XXX …” Even today, I asked the question “Are XXX…” for various groups and got some odd answers like “Are Pennsylvanians rude?” (or are they just weird?)

Google search for “Are Pennsylvanians” with autocomplete answers rude/weird

Can you ask this instead?

I do use Google to look up cultural information about all sorts of XXX cultures, but I confess I haven’t run into this problem. For one thing, I rarely use words like “are/is” in my search terms. Instead I use nouns and adjectives, which are what I’m really interested in. It’s up to you if you want to enter something neutral or more loaded.

In terms of searching, I had thought of “to be” as a semantically empty (although grammatically important) verb that wouldn’t affect my search results one way or another. Apparently that verb is more powerful than I thought.

The Tennis Ball Color is…5GY (Chartreuse)

Tennis ball in standard color
Tennis ball used in 2011 Japan Open. Photo by Christopher Johnson. Licensed by Creative Commons.

Today the most recent color debate is about those bright fluorescent tennis balls – the question being are they yellow or green? The answer probably is …both.

Focal Colors

This issue points to how cultures do divide color space and supports the Berlin and Kay theory of focal color. Although humans can see thousands, if not millions of colors, most languages assign primary names for only a small percentage of them. In English, two of these color words are “yellow” and “green” (along with red, blue, orange, purple, white, black, brown).

Of course these colors are umbrella terms for a range of colors. For instance, we may speak of pine green (dark and a little blue), sea green (pale green), olive green (like the green olive) and so forth. But when “green” stands alone, we may be thinking of the prototypical or focal color – a bright green associated with emeralds, leaves or “Kelly green”. This green is what is normally seen in national flags, corporate logos or many sports team logos.

Similarly, although yellow comes in different shades including mustard (slightly darker), lemon (slightly paler) and saffron (with a touch of orange), the word “yellow” refers to the shade of yellow used in many national flags and logos such as John Deere (tractors). The focal colors are also what is taught to children when they are exposed to color words as can be seen in the sample images below.

What About the Tennis Ball?

An answer that I think many people have guessed is that the color of the fuzzy round object is right on the mental border between a bright yellow and a yellowish pale spring green. If “yellow” or “green” are the only available options, it appears some think yellow and others green. To be clear, the International Tennis Federation (ITF) (and tennis player Roger Federer) calls this color “yellow”, but if you want an ITF approved model, it may be green.

Another factor to consider is that lighting may make a tennis ball appear greener in some photos and yellower in others. Even in the Japan Open photo, the ball appears yellower in direct light and greener in the shadow.

For me, I couldn’t really classify this color as either yellow or green…just fluorescent and perhaps fluorescent chartreuse. Interestingly a lesson artists have to learn about color is that 1) there are a lot of them and 2) most professional color wheels have lots of color divisions on the edge, often including chartreuse (or 5GY in the Munsell system).

From a linguistic point of view, this does show what happens when the color of an object sits on a mental color border, and apparently it’s not pretty.

P.S. Japanese Blue Traffic Lights

Ever heard that the Japanese label the Go traffic light as “blue” (ao) even when it’s the same color as U.S. traffic lights? There are differences between the English and Japanese color systems, but if you look closely, you’ll see that traffic light “green” is actually pushed towards a bluer (cyan) shade. In some photos, the correct color may be “cyan.”

Prince Charles Shakes Gerry Adams’ Hand in County Sligo

The BBC had a fairly amazing story of Prince Charles visiting the place where the IRA killed Lord Mountbatten, but also meeting with Gerry Adams of Sinn Féin. I think a lot of people thought this could never happen.

Back in the 80s-90s the Troubles were still very much active and something a person studying a Celtic language paid attention to. In many ways the issues surrounding Ulster were very difficult and bitter. On the one hand, the Catholic Irish were angry because their country was invaded in the past, but on the other hand, the Protestants were now in the majority and did not want to become “Irish” like the Republic of Ireland.

Violence made it all uglier. Many of the legitimate grievances of the Catholics fell on deaf ears as the IRA set off bombs or assassinated people. It wasn’t until I saw In the Name of the Father (with Daniel Day-Lewis) that I understood how badly the British could misprosecute suspected IRA members. But see also Cal, which shows how seductive, but damaging, the IRA could be.

Eventually a cease fire was declared, but I wondered how long it could last given the mistrust on both sides over many decades. But it has lasted, and this semester when I mentioned Northern Ireland to my students, I got a lot of blank stares (oy!). There is not harmony by any means, but Adams has made Sinn Féin (the IRA’s party) a legitimate political force and the British have been more encouraging of Irish language use in Northern Ireland (although some Protestants unfortunately associate it with the IRA).

Still it’s nice to see that people can choose to at least tolerate each other if it will stop violence. I think people are also recognizing that Northern Ireland/Ulster has a unique culture which blends both Irish and UK culture. I really hope to visit some day.

Video of the Week: Irish Carlsberg Ad

If you liked One Semester of Spanish Love Song, you’ll enjoy the Carlsberg Irish ad.

The Carlsberg Irish ad stars three Irish lads attempting to get a beer somewhere outside Ireland. As payment, the barkeep demands they “do something Irish”, preferably “singing or dancing”. Instead they choose to recite a “poem in Irish”, which turns out to be random phrases they vaguely remember from their years of mandatory Irish language education. At last, they have found a use for all of those phrases….

Why Linguists Should Worry About Book Prices and Digital Access

An issue that may seem a bit esoteric is the pricing of linguistics books on Amazon, but I do think it has a negative impact on efforts to disseminate information among ourselves and to the community. As most linguists know, new hardback books usually cost over $100, and even paperbacks can be expensive, ranging from a relatively cheap $30 to over $50.

In my experience, the general public is interested in certain linguistic topics such as the history of English (or other heritage languages). They may also be interested in certain policy issues such as education and language. If possible, it would be helpful for people to get reliable information at a reasonable price. Unfortunately, really good linguistics books at a reasonable price are very scarce.

Indo-European Books

One topic that the general public is fascinated with is Indo-European, but it’s also an issue that leads to lots of problematic theories and political debates. The Nazi “Aryan race” is the worst case scenario of tying a linguistic theory to racism. Pointing people to a good Indo-European handbook might help them understand the methodologies more and put the information in context. These exist, but are usually over $40.

Right now The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World by J.P. Mallory and D.Q. Adams is selling for just under $60. The Cambridge University Press textbook Indo-European Linguistics: An Introduction (Cambridge Textbooks in Linguistics) by James Clackson is about $45. Another textbook from Blackwell, Indo-European Language and Culture: An Introduction by Benjamin Fortson, is about $60. The cheapest respectable options are the American Heritage Dictionary of Indo-European Roots (under $17) and a few books that focus more on archaeology than language.

Or you could spend $4.99 (free on Kindle) and get Indo-European Origins by William Davey. Reviews are mixed, but I would be concerned with this review that noted that “Googling for an author’s name did not provide any insight at all in regards with his background, so I’m still in some doubt” (I also could not find much on Google). Nevertheless, other people seem to like it, but is it as well researched as other books? Another reviewer feels dubious. But right now, it’s the top link in Amazon. Hmmm.

Lack of Basic References

In a similar vein, as an instructor, I would like my students to read informed sources about different languages or language families, but helping them find basic information is more frustrating than it needs to be in the digital age. A lot of the handbooks I would recommend range from $60 to over $300, and most are print only.

Obviously, no undergraduate would make this investment, and it’s steep even for a graduate student or faculty member. Traditionally students could go to the library for these resources (and I do remind my students to step inside the library), but not all the books may even be in the library. Or they may be on permanent loan to an instructor or desperate graduate student.

At the moment, the quickest source for linguistic facts is Wikipedia, and I’ve been known to look things up myself. Hopefully, some of the editors have been able to fund purchasing of the quality resources I’ve mentioned…but you never know.

How Pricing Affects Awareness

The general assumption of academic publishing is that linguistics books are meant for either libraries or other linguists who will agree to pay an increased price that reflects a small buying pool. But now that new digital options have emerged, it is time to rethink how information is distributed and take advantage of cheaper models of distribution. The Rutgers Optimality Archive (ROA) allows researchers to both access and contribute information for free. The Atlas of North American English by William Labov can be licensed by libraries in a digital format any registered user can download. Mouton also provides some information at

Libraries are starting to realize these resources are necessary, but we need to find ways to encourage other publishers to make their handbooks more readily available in a digital format. I would also like more of an iTunes model where individual chapters could be purchased as needed.

Our Tax Dollars at Work?

As other organizations such as the Association of Research Libraries have pointed out, many American academic projects are at least partially funded by U.S. government agencies. Therefore, our tax dollars are actually paying for results which should be available to the public. This is similar to the idea that content produced by the federal government is public domain. As many instructors will tell you, it is not as if they expect to live off of royalties from their books, given the limitations of distribution.

It is important to remember that publishers do need to be compensated, but the beauty of the iTunes model is that it provides access to more publishers than traditional music media distribution. It also allows customers more choice in what to buy and the chance to preview what they buy. I have become a much more educated music listener thanks to iTunes. It would be great if a similar model could allow people to become more educated citizens.

“Ancient U.S. Weapon”? How old can that be?

The meaning of words can vary by context, and this article about an “Ancient U.S. Weapon” in Syria brought home this point to me.

If you consider that “ancient” often means “the earliest recorded memory” or sometimes “before our civilization as we know it began”, then definitions can vary across disciplines. In historical linguistics, “ancient” is usually no later than the Roman Empire (at least in my estimation), yet the Academy of Ancient Music is playing pieces by Handel (born in 1685), which is about the time period from which known compositions can be firmly reconstructed. Another semi-amusing case is the PBS program In Search of Ancient Ireland – which apparently ends with the Norman invasion in the 1100s (well into the Middle Ages again). I suspect that any era when Wales, Ireland or Scotland was still under the control of a Celtic language government will be “ancient”.

Still, the words “Ancient U.S.” really gave me pause. I know that pre-European history is “Ancient”, although it generally ends between 1492 and 1900 depending on location. But the U.S. itself as “ancient”? That is a new concept for me, especially since the artifacts were weapons from the 1970s used in the Vietnam War. I actually remember when the troops left Saigon, so definitely in my lifetime.

Analyzing Facebook Posts

The MIT Technology Review published an article about a Penn study analyzing Facebook posts to find correlations between words/phrases and your demographic and personality profile. The actual study is available at the PLOS One website.

There are lots of interesting correlations in the posts, including ones that predict age. Below are some keywords which are associated with some age groups.

“Words, phrases, and topics most distinguishing subjects aged 13 to 18, 19 to 22, 23 to 29, and 30 to 65. Ordered from top to bottom: 13 to 18, 19 to 22, 23 to 29, and 30 to 65. Words and phrases are in the center; topics, represented as the 15 most prevalent words, surround. (N = 74,859; correlations adjusted for gender; Bonferroni-corrected p < 0.001).”

Words vs. Age Groups

13-18
  • school
  • tomorrow
  • homework
  • English
  • bored
  • math
  • prom
  • hahaha
  • :D
  • <3
  • (:

19-22
  • semester
  • fuck/fucking
  • apartment
  • studying
  • campus
  • shit
  • roommate

23-29
  • at work
  • enjoying
  • office
  • beer
  • drinks
  • new job
  • company
  • apartment

30-65
  • daughter
  • my son
  • my kids
  • fb friends
  • husband
  • repost
  • blessed
  • children
  • prayer(s)
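
For readers curious how word lists like the ones above could be produced, here is a minimal Python sketch. It is not the study's method: the study reports gender-adjusted, Bonferroni-corrected correlations, while this toy version only compares how often a word appears inside one age group versus outside it. The posts, the `POSTS` name and the `distinguishing_words` function are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy data: (age_group, post_text). Not taken from the study.
POSTS = [
    ("13-18", "so bored at school homework tomorrow"),
    ("13-18", "math homework then prom hahaha"),
    ("19-22", "studying on campus all semester"),
    ("19-22", "my roommate hates this semester"),
    ("23-29", "drinks after work at the office"),
    ("23-29", "new job new office beer tonight"),
    ("30-65", "my daughter and my son are blessed"),
    ("30-65", "prayers for my daughter and husband"),
]


def distinguishing_words(posts, group, top_n=3):
    """Rank words by relative frequency inside `group` minus outside it."""
    inside, outside = Counter(), Counter()
    for age_group, text in posts:
        target = inside if age_group == group else outside
        target.update(text.lower().split())

    # Guard against division by zero if a group has no posts.
    in_total = sum(inside.values()) or 1
    out_total = sum(outside.values()) or 1

    scores = {
        word: inside[word] / in_total - outside[word] / out_total
        for word in inside
    }
    # Highest score first; ties broken alphabetically so results are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


if __name__ == "__main__":
    for group in ("13-18", "19-22", "23-29", "30-65"):
        print(group, distinguishing_words(POSTS, group))
```

Even in this tiny sample, a word like “my” scores well for the 30-65 group simply because it is frequent there, which is one reason a real analysis needs far more data and proper statistical controls.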

On the whole, I would say that the results do have a certain validity. If you’ve ever been on Facebook, I am sure you will have seen some of these words yourself for your age group. And while I don’t doubt the methodology at all, I would hand out the usual caveats for this kind of study.


Who’s in Facebook?

My first caveat is class assumption. The 19-22 word set is dominated by traditional collegiate life with the number one word being “semester” (followed by “fuck”). Other major collegiate-specific words include “campus, studying, classes” and minor words include “papers, exams, assignments, science, professor” and so forth. The list includes recreational words which could be collegiate or not (drunk, hangover), but many students also happen to drink (or talk about drinking) at college.

To me this means that this isn’t just a 19-22 year old sample, but a middle-class 19-22 year old sample. As many researchers such as danah boyd point out, it is important to note that not EVERYONE is on Facebook, and not everyone is in a particular social media environment.

Personality and Community?

The article also discusses correlations between personality type and word use. For instance, people who test as introverts are apparently interested in “computers” and “anime” (vs. “party” and “boys/girls” for extraverts), while those who are “neurotic” tend to use words like “depressed”, “sick of” and “fucking” (vs. “success, basketball, lakers” for the emotionally stable).

Again, I don’t necessarily dispute the results, but I do wonder if the notion of “performance” has been taken into account. What I mean by performance is that people may write in a certain style and on certain topics in order to conform to some social norm such as what is expected of a particular gender.

To take a personal example, my Facebook network includes a lot of co-workers and family. I don’t necessarily share everything with everyone on Facebook. I watch a certain amount of manga, but I choose to not talk about it on Facebook since it’s not usually relevant to my circle. Instead, I tend to talk about the far more socially acceptable topic of pets and babies (I have a corgi and he is soooo cute!). I would be curious if my word cloud skewed towards extrovert or not.

Facebook may truly indicate personality preferences, but it is not the same thing as a personal journal.