Learner Corpora

| Language | Name of Corpus | Size | Author/Institution | Info |
| --- | --- | --- | --- | --- |
| Arabic | Arabic Learner Corpus (ALC) | 282,732 words | Abdullah Alfaifi and Eric Atwell | a collection of written and spoken materials produced by learners of Arabic in Saudi Arabia |
| Chinese | Guangwai-Lancaster Chinese Learner Corpus (CLC) | 1.2 million words | Guangdong University of Foreign Studies and Lancaster University (UK) | spoken and written |
| Czech | The Corpus of Czech as a Second Language | 2 million words | Charles University, Prague | (1) texts written by L2 learners of Czech (ciz); (2) academic texts written in Czech by international L2 Czech students (kval); (3) texts written by Czech students with Romani background (rom) |
| Czech | The MERLIN Corpus | 437 texts; in progress | Merlin project funded by EU Lifelong Learning Programme | written; texts from standardized tests; CEFR ranked |
| Estonian | Estonian Interlanguage Corpus | | Tallinn University | written |
| Finnish | Advanced Finnish Learner Corpus | 288,539 tokens | University of Turku | written; restricted access |
| French | French Learner Language Oral Corpora | several corpora of various sizes | Universities of Southampton and Newcastle (UK) research project | oral |
| German | The MERLIN Corpus | 1,033 texts; in progress | Merlin project funded by EU Lifelong Learning Programme | written; texts from standardized tests; CEFR ranked |
| German | Corpus of Learner German (CLEG13) | 319,421 tokens | Lancaster University (UK) | written; L1 English |
| Italian | The MERLIN Corpus | 803 texts; in progress | Merlin project funded by EU Lifelong Learning Programme | written; texts from standardized tests; CEFR ranked |
| Russian | Russian Learner Corpus (RLC) | | | oral and written; L2 learners of Russian and heritage speakers of Russian |
| Russian | Corpus of Russian Student Texts (CoRST) | 3.1 million tokens | | written |
| Russian | Russian Learner Translator Corpus (RusLTC) | | | written; parallel learner translator corpus |
| Spanish | Corpus Escrito del Español L2 (CEDEL2) | 500,000; in progress | Universities of Madrid and Granada (Spain) | L1 English |
| Spanish | Multilingual Corpus | in progress | National Cheng Kung University (Taiwan) | L1 Taiwanese (L2 Spanish, German, Chinese, Japanese) |

English Learner Corpora

| Name of Corpus | Size | Author/Institution | Info |
| --- | --- | --- | --- |
| Written Corpus of Learner English (WRICLE) | 750,000 words | Universidad Autonoma de Madrid | L1 Spanish; written |
| The International Corpus of Learner English (ICLE 2) | 3.7 million words | Sylviane Granger, Estelle Dagneaux, Fanny Meunier & Magali Paquot (eds.) | EFL writing; written |
| JEFLL Corpus | 1 million words | NetAdvance Inc | English writing produced by 10,000 people; written |
| Thai Learner English Corpus | 1.5 million words | Faculty of Arts, Chulalongkorn University | written |
| The PELCRA Learner English Corpus (PLEC) | 3 million words | The University of Łódź | written and spoken |
| The NICT Japanese Learner English (JLE) Corpus | 1.2 million words, 300 hours of recording | National Institute of Information and Communications Technology | written and spoken |
| European Corpus of Academic Talk | 58,834 words accessible online | University of Extremadura, Birmingham University, University of Limerick, University of Dalarna and VU University Amsterdam | spoken |
| Russian Error-Annotated Learner English Corpus | 200,000 tokens | Elizaveta Kuzmenko, Andrey Kutuzov | written |