Learner Corpora
Language | Name of Corpus | Size | Author/Institution | Info |
Arabic | Arabic Learner Corpus (ALC) | 282,732 words | Abdullah Alfaifi and Eric Atwell | written and spoken materials produced by learners of Arabic in Saudi Arabia |
Chinese | Guangwai-Lancaster Chinese Learner Corpus (CLC) | 1.2 million words | Guangdong University of Foreign Studies and Lancaster University (UK) | spoken and written |
Czech | The Corpus of Czech as a Second Language | 2 million words | Charles University, Prague | (1) Texts written by L2 learners of Czech (ciz) (2) Academic texts written in Czech by international L2 Czech students (kval) (3) Texts written by Czech students with Romani background (rom) |
Czech | The MERLIN Corpus | 437 texts; in progress | Merlin project funded by EU Lifelong Learning Programme | written; texts from standardized tests; CEFR ranked |
Estonian | Estonian Interlanguage Corpus | | Tallinn University | written |
Finnish | Advanced Finnish Learner Corpus | 288,539 tokens | University of Turku | written; restricted access |
French | French Learner Language Oral Corpora | several corpora of various sizes | Universities of Southampton and Newcastle (UK) research project | oral |
German | The MERLIN Corpus | 1,033 texts; in progress | Merlin project funded by EU Lifelong Learning Programme | written; texts from standardized tests; CEFR ranked |
German | Corpus of Learner German (CLEG13) | 319,421 tokens | Lancaster University (UK) | written; L1 English |
Italian | The MERLIN Corpus | 803 texts; in progress | Merlin project funded by EU Lifelong Learning Programme | written; texts from standardized tests; CEFR ranked |
Russian | Russian Learner Corpus (RLC) | | | oral and written; L2 learners of Russian and heritage speakers of Russian |
Russian | Corpus of Russian Student Texts (CoRST) | 3.1 million tokens | | written |
Russian | Russian Learner Translator Corpus (RusLTC) | | | written; parallel learner translator corpus |
Spanish | Corpus Escrito del Español L2 (CEDEL2) | 500,000 words; in progress | University of Madrid and Granada (Spain) | L1 English |
Spanish | Multilingual Corpus | in progress | National Cheng Kung University (Taiwan) | L1 Taiwanese (L2 Spanish, German, Chinese, Japanese) |
English Learner Corpora
Language | Name of Corpus | Size | Author/Institution | Info |
English | Written Corpus of Learner English (WRICLE) | 750,000 words | Universidad Autónoma de Madrid | L1 Spanish; written |
English | The International Corpus of Learner English (ICLE 2) | 3.7 million words | Sylviane Granger, Estelle Dagneaux, Fanny Meunier & Magali Paquot (eds.) | EFL writing; written |
English | JEFLL Corpus | 1 million words | NetAdvance Inc | English writing produced by 10,000 people; written |
English | Thai Learner English Corpus | 1.5 million words | Faculty of Arts, Chulalongkorn University | written |
English | The PELCRA Learner English Corpus (PLEC) | 3 million words | The University of Łódź | written and spoken |
English | The NICT Japanese Learner English (JLE) Corpus | 1.2 million words, 300 hours of recording | National Institute of Information and Communications Technology | written and spoken |
English | European Corpus of Academic Talk | 58,834 words accessible online | University of Extremadura, Birmingham University, University of Limerick, University of Dalarna and VU University Amsterdam | spoken |
English | Russian Error-Annotated Learner English Corpus | 200,000 tokens | Elizaveta Kuzmenko, Andrey Kutuzov | written |