Specialist Corpora

MICASE. The Michigan Corpus of Academic Spoken English was developed at the English Language Institute (ELI) at the University of Michigan. It contains approximately 1.7 million words of academic speech from across the university. Speakers represented in the corpus include faculty, staff, and students at all levels, both native and non-native speakers. You can search the corpus online or buy a downloadable version.

CANCODE is the Cambridge and Nottingham Corpus of Discourse in English. It is a unique collection of spoken English built up by Cambridge University Press and the University of Nottingham, and it forms part of the Cambridge International Corpus. The recordings that make up CANCODE were collected throughout the islands of Britain and Ireland between 1995 and 2000. It contains a total of 5 million words. More details about CANCODE and its use are available from Cambridge University Press.

The Wolverhampton Business English Corpus was produced by the Computational Linguistics Group at the University of Wolverhampton (UK). The corpus consists of over 10 million words collected from 23 different business-related web sites. It includes product descriptions and company press releases. It is not currently available free of charge.
