Category Archives: Teaching

Playing with Machine Translation

Maybe you’ve been hearing about Artificial Intelligence (AI) in the news in recent months. AI touches a lot of linguistic areas and one of those is machine translation (MT). I had heard of tools like Google Translate, but I admit that I had turned my nose up at MT. How good could it be…really? Would I ever use it?

MT and Accessibility

A lot of people really, including foreign language students. OK – but our university wouldn’t introduce a tool with MT right? Actually yes – some tools designed for students with disabilities (e.g. Anthology Ally and Read and Write) include decent MT options.

For the record, the focus of these tools isn’t translation, but enabling students with certain reading or sensory differences to access documents. They typically include TTS (text to speech), magnification, conversion to audio files and other options.

But one of these options is translation – from English to a student’s preferred language. If you’re an international student in a technical class, this could be very helpful for learning what some of those English technical terms translate into. But, of course the option to translate languages (e.g. Spanish) into English is also there. What to do? How about some research?

Other translation tools

By the way, these aren’t the only MT options. It’s also in Facebook (where it is automatic) and (yikes!) Microsoft Word. It seems to be creeping into all my tools!

Some MT Glitches

I’ve done some experiments and so far results are not bad, but not perfect either. They’re enough to say a human should still audit MT output.

Spanish Gender Glitch

As one experiment, I ran MT on an old blog post on language diversity. Most of the translation was pretty good, but there was an interesting gender glitch.

The post first mentions “the geneticist Sarah Tishkoff,” and based on the first name gives the translation la genetista Sarah Tishkoff with the feminine definite article la “the.f.” The article also mentions linguist Salikoko Mufwene who happens to be a male – but his title was translated as la lingüista Salikoko Mufwene.

How did this happen? The personal name “Salikoko” is neither English nor Spanish and probably not in a translation database. Prof. Mufwene is from the Democratic Republic of Congo, so my guess is that his first name “Salikoko” originates from an African language. However, the ending -ko is found in many Japanese female names. Maybe the program thought Salikoko was a Japanese personal name? In any case, expect some oddities from your edge cases.

Proper Names

For the most part, proper names for people and places shouldn’t be translated. Some MT can detect them, but some did slip through as in the example below.

English Original

My dog’s name is Glyndwr. We live in State College (PA).

Into Welsh

Enw fy nghi yw Glyndwr. Rydym yn byw yng Ngholeg y Wladwriaeth.

As you can see, the name State College has been translated to Coleg y Wladwriaeth and then the Welsh nasal mutation has been added. Sweet!
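
One rough way to audit for this kind of slip is to check whether each proper name from the original still appears verbatim in the MT output. The sketch below is only an illustration: the function name `missing_proper_names` and the hand-built name list are hypothetical, not part of any MT tool. A plain substring check like this will also flag names that legitimately change form (a Welsh place name that takes a mutation, for instance), so it can only point a human reviewer to where to look.

```python
def missing_proper_names(names, translated):
    """Return the proper names that do not appear verbatim in the MT output."""
    lowered = translated.casefold()
    return [name for name in names if name.casefold() not in lowered]


# The Welsh output from the example above.
welsh = "Enw fy nghi yw Glyndwr. Rydym yn byw yng Ngholeg y Wladwriaeth."

# Hypothetical, hand-built list of names that should be left untranslated.
names = ["Glyndwr", "State College"]

flagged = missing_proper_names(names, welsh)
print(flagged)  # names a human reviewer should double-check
```

Here the dog’s name survives, while State College is flagged for review.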

Register Glitch

In addition to the proper name translation, this translator used a verb form new to me (and not the one I was taught in class). Specifically, the first person plural r-ydym for “we are” with the subject pronoun ni dropped (like it can be in Spanish).

Back in the 1980s when I learned Welsh, I was taught that formal written Welsh was pro drop, but that the subject pronoun was required in everyday spoken Welsh – I learned rydyn ni’n as “we [are].” That’s what I’m finding on these sites also:

So this tool is giving me a slightly more formal version than what my instructors might be expecting. Just saying.

Spanish Example: Imperfect Subjunctive

This sort of happened in a Spanish example also. I translated a blog post and the tool correctly outputted an imperfect subjunctive. From a language teaching perspective this could be interesting because this tense is usually taught relatively late.

Multilingual Texts

If you really want to trip up the MT, try a blog about pronunciation like whether to pronounce buoy as “boo-y” (U.S.) or “boy” (U.K.). Technically, this was a monolingual article, but in Spanish it will actually become multilingual with some words needing to remain in English.

There were a lot of glitches, but a fun one was that the Spanish version claimed that boya “buoy” could rhyme with chico or niño (“boy”). Not really.

It also kept translating the related word buoyancy (where the first syllable usually rhymes with “boy”) with flotabilidad. The translation is accurate, but not what was needed right then.

Final Words

Like calculators and other modern tools, I’m sure MT is here to stay, but I think it will be a while until it’s perfected. I could see professional translators who know Spanish using this and then editing results. But some translations are weirder than others.

I think language teachers should experiment with the tools to see what’s happening, and even point them out. Lots of students are finding them useful for learning vocabulary – but you need to beware of what could happen if you don’t review the output.

I will say my biggest concern is that the Internet will stop allowing me to live even my quasi multilingual life. I was wondering if the Spanish form un biólogo “biologist (male)” had a feminine form una bióloga “biologist (female).” However, even going to Google Spain, I kept getting English results or offers to translate the Spanish. ¡Basta!

Teaching Standard English…Jeopardy Style!

Some urban (and rural) school districts have quietly introduced a curriculum that teaches children who don’t natively speak Standard English to “translate” or “code switch” between their native dialect and Standard English. One teacher has turned the grammar class into a Jeopardy style review. You can see that the kids are having fun figuring out arcane grammar rules. Generally speaking, it’s a lot more motivating and effective at encouraging literacy than constantly correcting a child’s grammar.

P.S. As one educator Noma LeMoine explains, this effort has never been about “teaching” Ebonics to students, because “We don’t need to teach African American Vernacular English…They already know it.”

Linguistics for Young Readers?

I was watching one of the Turnitin Writing X Tech 2016 Webinars on Teaching the Writing Brain and I was shocked to see that the presentation included the words morphonemic as well as morphology and phonology. You mean linguistics might be useful for understanding how children need to learn to decode the written word? Shocking!

Spelling and Linguistics

FYI – The word morphonemic was related to the issue of teaching spelling. The presenter Virginia Berninger emphasized that children do need to understand that prefixes and suffixes not only affect the meaning of a word, but can also affect pronunciation (as in the first vowel of nation vs. nation+al). She also mentions another controversial word, phonics, to illustrate that English spelling (“orthography”) is supposed to be phonetically based, and she recommends that children learn the phonological structure of English spelling alongside all of our native spelling system quirks (that is, orthographic awareness).

And (OMG!) you might want to consider word origin (etymology) when teaching spelling. That’s because English borrows a foreign language’s spelling rules when it borrows the words. Linguists definitely know this, but you don’t see this mentioned as a strategy except in spelling bee competitions.

Building a Communication Bridge

For me as a linguist, the idea of teaching children phonics, word structure and matching spelling quirks to pronunciation seems fairly obvious, as is the idea that writing teachers should have some linguistic training. Unfortunately linguists and more traditional “English” teachers have often seen each other as the enemy, and I will admit to mocking bad prescriptive grammatical rules. As a result, I often see many language teachers (even foreign language teachers) discuss teaching “culture” or “ideas” instead of “grammar” (as if we can’t teach both!).

While I sympathize with frustrated linguists, I have to admit we have done a terrible job of explaining how linguistics applies to real world teaching and writing situations until fairly recently. That’s why I’m so happy that a seminar for writing instructors included neurological research supporting basic linguistic analysis. Linguistics could be starting to enter the world of general academic knowledge. Even Grammar Girl sometimes mentions linguistics in a positive light (you go girl).

For linguists, I do think we need to work harder to appreciate the role of traditional prescriptive rules. While it is important to understand the structure of non-Standard English dialects (e.g. AAVE (African American Vernacular English), Southern dialects, etc.), we have to acknowledge that linguists always write standard academic English in their journal articles. As with other educated speakers, linguists have learned to write and spell in a particular fashion that is at least a little bit different from their spoken forms (unless they are speaking like Sheldon Cooper from The Big Bang Theory).

Some traditional grammar instruction is needed, but we also need to help teachers understand the role of linguistics in teaching those who don’t speak Standard English at home or those who have a learning disability related to reading and writing. I hope research like this can help build that bridge.

Why Linguists Should Worry About Book Prices and Digital Access

An issue that may seem to be a bit esoteric is the pricing of linguistics books on Amazon, but I do think it has a negative impact on efforts to disseminate information among ourselves and to the community. As most linguists know, new hardback books are usually over $100 to purchase, but even paperbacks can be expensive, ranging from a relatively cheap $30 to over $50.

In my experience, the general public is interested in certain linguistic topics such as the history of English (or other heritage languages). They may also be interested in certain policy issues such as education and language. If possible, it would be helpful for people to get reliable information at a reasonable price. Unfortunately, really good linguistics books at a reasonable price are very scarce.

Indo-European Books

One topic that the general public is fascinated with is Indo-European, but it’s also an issue that leads to lots of problematic theories and political debates. The Nazi “Aryan race” is the worst case scenario of tying a linguistic theory to racism. Pointing people to a good Indo-European handbook might help people understand the methodologies more and put the information in context. These exist, but are usually over $40.

Right now The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World by J.P. Mallory and D.Q. Adams is selling for just under $60. The Cambridge University Press textbook Indo-European Linguistics: An Introduction (Cambridge Textbooks in Linguistics) by James Clackson is about $45. Another textbook from Blackwell, Indo-European Language and Culture: An Introduction by Benjamin Fortson, is about $60. The cheapest respectable options are the American Heritage Dictionary of Indo-European Roots (under $17) and a few books that focus more on archaeology than language.

Or you could spend $4.99 (free on Kindle) and get Indo-European Origins by William Davey. Reviews are mixed, but I would be concerned with this review that noted that “Googling for an author’s name did not provide any insight at all in regards with his background, so I’m still in some doubt” (I also could not find much on Google). Nevertheless, other people seem to like it, but is it as well researched as other books? Another reviewer feels dubious. But right now, it’s the top link in Amazon. Hmmm.

Lack of Basic References

In a similar vein, as an instructor, I would like my students to read informed sources about different languages or language families, but helping them find basic information is more frustrating than it needs to be in the digital age. A lot of the handbooks I would recommend range from $60 to over $300, and most are print only.

Obviously, no undergraduate would make this investment, and it’s steep even for a graduate student or faculty member. Traditionally students could go to the library for these resources (and I do remind my students to step inside the library), but not all the books may even be in the library. Or they may be on permanent loan to an instructor or desperate graduate student.

At the moment, the quickest source for linguistic facts is Wikipedia, and I’ve been known to look things up myself. Hopefully, some of the editors have been able to fund purchasing of the quality resources I’ve mentioned…but you never know.

How Pricing Affects Awareness

The general assumption of academic publishing is that linguistics books are meant for either libraries or other linguists who will agree to pay an increased price that reflects a small buying pool. But now that new digital options have emerged, it is time to rethink how information is distributed and take advantage of cheaper models of distribution. The Rutgers Optimality Archive (ROA) allows researchers to both access and contribute information for free. The Atlas of North American English by William Labov can be licensed by libraries in a digital format any registered user can download. Mouton also provides some information at http://www.atlas.mouton-content.com/.

Libraries are starting to realize these resources are necessary, but we need to find ways to encourage other publishers to make their handbooks more readily available in a digital format. I would also like more of an iTunes model where individual chapters could be purchased as needed.

Our Tax Dollars at Work?

As other organizations such as the Association of Research Libraries have pointed out, many American academic projects are at least partially funded by U.S. government agencies. Therefore, our tax dollars are actually paying for results which should be available to the public. This is similar to the idea that content produced by the federal government is public domain. As many instructors will tell you, it is not as if they expect to live off of royalties from their books based on the limitations of distribution.

It is important to remember that publishers do need to be compensated, but the beauty of the iTunes model is that it provides access to more publishers than traditional music media distribution. It also allows customers more choice in what to buy and the chance to preview what they buy. I have become a much more educated music listener thanks to iTunes. It would be great if a similar model could allow people to become more educated citizens.

MLA U.S. Language Map

The MLA (Modern Language Association) has an interactive language map of language communities in the U.S. based on the 2000 Census data (with updates from 2005) at:

http://www.mla.org/resources/census_main

In addition to the basics, you can find information on language communities by state, county and even zip code. If you really want to check it out, I recommend viewing data from the Los Angeles area. It’s probably as linguistically diverse as New York.

As a fun class exercise, I just took the basic U.S. map showing concentrations of non-English speakers (bluer = higher percentage of English speakers) then asked students to guess which language communities were being represented. Another fun exercise would be to have people look up the third most widely spoken language in different regions. Overall in the U.S., the third largest is Chinese, but in Pennsylvania it’s German (and Tagalog (Philippines) in California).

P.S. I should note that today the map is hanging when collecting data, but Internet speeds have been slow in general…hopefully it’s a temporary glitch. If the map isn’t working, you can retrieve the raw data by clicking “Tabular View”.

Learning “Classical” Languages – Speaking, Translating or Reading?

I’ve been following an interesting discussion on whether a “conversational” approach should be used for Latin or not.

For modern language courses, the behavioral objectives are fairly obvious. After 2-3 semesters of a language, you want to be able to walk into a cafe or bar, read the menu and order the beverage you want (or figure out how to get the train to Marseilles, or get the latest scoop from ¡Hola! magazine). That is, you want a certain level of listening, reading and speaking proficiency with enough writing thrown in to fill out an application or compose a quick thank you note.

These days a conversational approach is advocated so that students learn to communicate in the target language “on their feet”. Exposure to native language speech input is also recommended whenever possible so that learners can parse audio.

With classical languages like Latin and Greek, the objectives may be different. For instance, Attic Greek (i.e. the language Sophocles spoke) is what you need to read the original Ancient Greek literature. If you’re in Greece, Attic Greek is helpful for reading street signs and monument inscriptions. But if you want to order some ouzo in Athens, you probably need to learn Modern Greek. That is, learning classical languages is usually about being able to read in the target language – not being able to speak it.

Can the conversational approach help here? Interestingly, many of the Latin teachers said they DID advocate the conversational approach. Apparently learning Latin without using it conversationally was a little bit too “abstract” for students. I’ll admit that my Latin teacher burned the supine into my brain with “correctives” like horribile dictu (or “Ugh! Horrible Latin!”). Interestingly, Latin has taken on a life of its own as a living language community. You can even get your news (nuntii) in Latin. Clearly, there’s something to this.

It should be noted that traditional Latin pedagogy focused more on grammar and translation. The idea was that if you understood in detail how the Latin phrase or sentence was built, you would be able to read Latin by “deconstructing” the combination of words and grammatical endings. In practice though, I would say that the result is that many students can recite a lot of paradigms but end up having trouble reading actual texts from Cicero.

But…even with the conversational approach, I wonder if you hit a wall. I’m glad we have “modernized” Latin, but it can’t be the same as what Cicero wrote. It’s a form of Latin spoken mostly by speakers of modern European languages – none of which much resemble Latin anymore. Even modern Italian has very different syntax than Classical Latin.

What I found was that even with “conversation” and “grammar”, I had great trouble parsing Cicero – I could translate the words, but couldn’t string them together so that they meant anything. There’s a certain pragmatic logic in Latin that is lost in literal translation. After all Qui/Cui bono doesn’t literally mean “Who benefits?” but “Good for whom?”

I would say that I didn’t truly understand how to learn Classical languages until I took Middle Welsh. Although we did learn some grammar, the focus wasn’t being able to speak or even translate anything. Instead we just picked up an actual text with a glossary in the back and plowed through. I took notes in the text, but it was so small that I learned to only translate the key vocabulary words I didn’t get. The more “simple” words I could memorize, the faster the reading went. In other words I was learning to read the syntax directly. I had slight indigestion that I would not be able to order a mead in Middle Welsh, but then again, this is not really possible anyway.

Another benefit to the “learn as you read approach” is that you may not be thrown off by minor inconsistencies. Many medieval languages were “flexible” in terms of grammar and spelling – it really is more important that you be able to recognize a potential irregular past tense rather than that you know exactly what it is.

When I thought about it, I realized this is probably the best approach – after all you are trying to read the language, and sometimes you may need to read an undiscovered document which may contain new verb forms as well as previously unattested vocabulary. Sometimes reading ancient texts is a decoding exercise.

In the end, it’s about the reading and neither the speaking nor the translation. There’s just one remaining problem – by the time I had gotten to Middle Welsh, I had Modern Welsh under my belt. If you’re starting from scratch, it really can be an interesting chicken and egg challenge.

Math and Alternate Representations

Since linguistics invokes mathematical formalism (i.e. phrase trees, feature bundles, rules or tableaux, etc.), I am interested in some aspects of how math is taught.

One question that comes up a lot is why is it important for all students to learn algebra or trigonometry if only a small minority will ever use these tools in daily life. The standard answer is that algebra teaches you “mathematical thinking,” but I’m pretty sure most students (especially those who hate math) miss the point.  Actually, I would say that if you want to learn “deductive” skills, you’re better off taking formal logic or rhetoric.

However, there is one aspect of algebra that is important in real life, but rarely pointed out, and that’s its ability to provide multiple representations for “the same thing”. For instance the concept of “1” can be represented as “1”, 4/4 (four-fourths), x⁰, |i²| and my personal favorite – .999999… And believe me, I haven’t even touched the tip of the iceberg. Although these formulations all represent the same quantity, they do not have quite the same meaning.

You normally use “1” in real life, but if you’re working on a weird property issue where a piece of land is divided into quarters, maybe the formulation “4/4” would have meaning. Or maybe you have a formula in which you raise x to a certain power – whatever it is. It’s just that when it’s zero, the result is 1.
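
As a quick sanity check, the different spellings of “1” can be compared in a few lines of Python. This is only a sketch: the sample value chosen for x and the helper name `nines` are arbitrary, illustrative choices, and .999999… is treated as the limit of its partial sums, since no finite string of 9s ever equals 1 exactly.

```python
from fractions import Fraction

# 4/4 as an exact fraction reduces to 1.
four_fourths = Fraction(4, 4)

# Any nonzero x raised to the power 0 is 1 (x = 7 is an arbitrary sample).
x = 7
x_to_zero = x ** 0

# i squared is -1, and its absolute value is 1.
abs_i_squared = abs(1j ** 2)


def nines(n):
    """Exact value of 0.99...9 with n nines after the decimal point."""
    return sum(Fraction(9, 10 ** k) for k in range(1, n + 1))


# The gap between 1 and the partial sum shrinks by a factor of 10 with
# every extra digit, which is why the infinite decimal is taken to equal 1.
gaps = [1 - nines(n) for n in (1, 5, 10)]

print(four_fourths == 1, x_to_zero == 1, abs_i_squared == 1)
print(gaps)
```

The first three comparisons all come out true, while the list of gaps shows how each representation of .999999… with finitely many digits falls short of 1 by exactly one unit in the last decimal place.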

My point isn’t just that the “same” item can have multiple representations but that the different representations can be selected to help you focus on a different aspect. To borrow a concept from Semantics class, the meaning of something is partly fixed by your context – but you have to know EXACTLY what your context is.

The use of multiple representations does extend beyond algebra (and I don’t just mean linguistics either). For instance, there are lots of places around the world which have multiple place names, and sometimes you select one based on what era you are studying.

For instance, modern historians may be studying “Turkey,” but historians of the 14th to early 20th century may be studying the heartland of the “Ottoman Empire,” while those who specialize in the Bronze Age probably study “Anatolia” and Roman historians are probably studying “Asia Minor.” It’s roughly the same place, but the different names not only establish the time context, but can be used to fudge minor details like changing political borders.

You don’t want to start calling modern Turkey “Anatolia”, but the use of the term “Anatolia” is useful for referencing the set of Bronze Age cultures in the region (none of which are now related to the modern Turkish culture in terms of language or religion)…so you don’t usually call ancient Anatolia “Ancient Turkey” either (unless you’re writing a tourist brochure). And no matter what – you never want to confuse Turkey with Turkestan (not cool).

This kind of mathematical thinking isn’t about accepting one “right answer,” but systematically determining what the possible answers are and when to deploy them while understanding that some answers are just plain wrong!

Transcribe with /j/ or /y/?

I’m teaching phonology again, and once again, I am contemplating the issue of which phonetic system to use. It seems like a trivial issue, but it actually gets into some tricky issues.
One of the trickier issues is transcribing the “y” sound of “yes”.

FROM Y TO J

Although linguists generally stick to IPA, there is a close variant called American Transcription. Usually, I just teach both (and accept both for credit), but I do like to stick to one variant in my lecture notes if at all possible.
The “y” sound is /y/ in American, but /j/ in IPA (following German spelling convention). In the past, I used /y/ in order not to confuse my students who are generally familiar with English/French/Spanish – all of which spell this sound as “y”.
American – yes = /yɛs/ boy = /boy/
IPA – yes = /jɛs/ boy = /boj/
This time I have changed my mind and have moved to /j/. One reason is that all other Penn State classes use IPA. Another is that even Wikipedia uses /j/. At this point, I’m starting to look a little “backwards” for sticking with American /y/ instead of the more continental /j/.

NOT A COMPLETE SWITCH

But I haven’t made a complete switch…Following Kenstowicz’s 1994 textbook Phonology in Generative Grammar, I still prefer American for some sounds. Some because they show phonological relations more clearly (per Kenstowicz). Others because, quite frankly, it’s easier to crank them out on a keyboard.
Some examples
* I use American /ñ/ (as in señor) instead of IPA /ɲ/ for the palatal nasal.
This is because a) it’s easy to type /ñ/ (especially on a Mac), b) American students are familiar with the Spanish sound and c) there are too many n’s with tails in IPA (ŋ ɲ ɳ). At small point sizes, I think it’s easier to spot ñ.
* I use American /ṭ,ḍ,ṇ/ instead of IPA /ʈ,ɖ,ɳ/ for retroflexes.
This one is because a) almost all scholars of the languages of India (the prime retroflex languages) use the dot underneath, b) I can generate these on the Mac extended keyboard and c) I still don’t like that IPA retroflex tail visually.
* I use American /ü,ö/ instead of IPA /y,ø/ for front rounded vowels.
Because a) German spelling uses umlauts and b) it signals “front rounded”. It also means I never have to use /y/ in transcriptions – avoiding the whole “What does /y/ mean?” issue.
* I use American /č,ǰ/ instead of IPA /tʃ,dʒ/ for alveolo-palatal affricates
The reason for this one is that even though affricates are supposed to be “two sounds”, they are generally treated as a special kind of stop in most languages. Interestingly Indic scripts all treat these two sounds as one letter, as does Arabic (and English “j” and Italian “c”).
Just to be weird though, I use IPA /ʃ,ʒ/ instead of /š,ž/. This was clearer to many students for some reason, and they are distinct in shape. These IPA symbols are also very common in French linguistics.
I won’t claim that this is a perfect solution – after all IPA is becoming more of a universal standard these days than when I was learning linguistics. If nothing else though, I do like to mention the alternates because both were in use for a long time.
A linguist (even me) has to learn to make adjustments for different linguistic documents.
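
For anyone converting lecture notes between the two systems, the correspondences listed above can be captured in a small lookup table. The following is a rough sketch, not a complete converter: the table covers only the symbols mentioned in this post, the function name `american_to_ipa` is made up for illustration, and it assumes the input is transcription only (no ordinary spelling mixed in).

```python
import unicodedata

# American symbol -> IPA symbol, limited to the pairs discussed in this post.
AMERICAN_TO_IPA = {
    "y": "j",                       # palatal glide
    "ñ": "ɲ",                       # palatal nasal
    "ṭ": "ʈ", "ḍ": "ɖ", "ṇ": "ɳ",   # retroflexes
    "ü": "y", "ö": "ø",             # front rounded vowels
    "č": "tʃ", "ǰ": "dʒ",           # affricates
    "š": "ʃ", "ž": "ʒ",             # fricatives
}

# Normalize the keys to single precomposed characters before building the table.
_TABLE = str.maketrans(
    {unicodedata.normalize("NFC", k): v for k, v in AMERICAN_TO_IPA.items()}
)


def american_to_ipa(text):
    """Convert a transcription from American symbols to IPA symbols."""
    # NFC joins a base letter and a combining mark into one character
    # so that it matches the single-character keys in the table.
    composed = unicodedata.normalize("NFC", text)
    # translate() maps each character once, so the "y" produced from "ü"
    # is not then turned into "j" by the "y" -> "j" rule.
    return composed.translate(_TABLE)


print(american_to_ipa("yɛs"), american_to_ipa("boy"))  # jɛs boj
```

Doing the replacement in a single pass matters here because /y/ means different things in the two systems; a chain of search-and-replace steps could easily convert the same symbol twice.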

Hate Social Computing? Think Role Play!

I am one of those cranky people who see new social technologies like Twitter and MySpace and ask “Why do I want to tell this stuff to strangers?” or “Why should I care what some guy in Denver is doing Friday night?”
But I am intrigued by the FICTIONAL incarnations of these tools where people assume virtual identities based on known historical or fictional figures and then do their blogs, Twitter and MySpace profiles.
Many classes like history and business have latched on to role play, but these technologies take it to a whole new level.
Thus far my favorite examples have been:

Silliness aside, there is a chance here for students to explore the messiness of politics and daily life from previous generations in a way that makes it more real. Thomas Paine had his pamphlets, but I’m pretty sure he’d be a prolific blogger today.

Citations in our Lecture Notes?

The H-Teach List is having a really in-depth discussion on plagiarism. The general consensus is that US plagiarists generally know they are being naughty, but Doug Deal asked an interesting question about most textbooks:

Textbooks have a lot of written materials and lists of “suggested readings,” but as far as I know, they don’t usually have footnotes or endnotes or works cited. They don’t cite specific sources for the information or the analyses they contain. Why not?
The lecture presents an oral version of the same problem. We undoubtedly could cite sources for some of what we say in every lecture, but typically we don’t, or at least we don’t do so scrupulously. Is that okay too?

That is, many of our lecture PowerPoints for the classroom tend to present facts “as-is” and not delve too deeply into where we got our information. To give some people credit, some lectures and textbooks do include citations (and I try to squeeze them in), but it’s rarely a key feature in the information most people see.
By the way, it’s not just the classroom. Most popular non-fiction, news stories and informational Web sites hide or eliminate citations. Why does it happen? Basically to simplify presentation for the audience. In a teaching situation, it may be the case that students don’t care where you got your Welsh data from…just that they have to memorize it.
Another problem many instructors/news providers may encounter is that many younger or less-advanced students usually don’t want to hear “maybe this or that”. I know I didn’t want to hear about it when I had one of these classes as a junior. In the beginning, students usually want to hear about one method/story and be done with it.
For instance, if you asked “how many sounds does English have?” I bet you don’t want to hear the linguist state “It depends on the dialect…” (even though it does).
So…how do we train students to care about citations? We can use the traditional stick method (it’s the one I mostly use). Problem Based Methodology would say that practicing research would teach students the importance of citations. That would probably help.
Here are some things that have taught me to love citations

  1. Sometimes I need to look a data point back up (usually an Old Irish verb form in my case). Citations really narrow the search process down quite a bit.
  2. The stories I hear about people trying to figure out where different ancient authors really got their information make me appreciate citations more. When reading ancient travelogues with crazy third-hand stories, you really do wish they had included a citation somewhere.
  3. And it does help to have other sources to back up your kooky idea. It’s not just YOUR kooky idea, it’s just a minor extension of [citation here].
  4. Finally, I like to look up other people’s data points (usually Fula verb forms) to see if any strategic editing has been done. Are ALL the data points included? Is ALL the text quoted? The answer is maybe not. In fact I’m downright paranoid about using other linguists’ data…I usually prefer to go straight to the original non-linguistic grammar or text.
  5. It’s this last point that made me really understand the importance of looking up the original source and how important an honest citation is. It’s only when you can look at and TRACK several sources that you can begin to filter out unconscious bias.