Vol. 5 No. 1 (2018)

“HARTA” de noveles: un corpus de español académico [“HARTA” for novices: a corpus of academic Spanish]

Universidad de León
Published October 25, 2018


corpus linguistics, learner corpus, academic writing, lexical combinations, writing assistant


The aim of this review is to describe how the HARTA-Noveles corpus was compiled and encoded. The corpus was created as part of the research project “Corpus-based study of lexical combinations of academic Spanish for the development of a computational tool for academic writing assistance” (HARTA)[1], directed by Margarita Alonso Ramos (University of La Coruña). It consists of representative samples of essays written by Spanish university students, gathered in order to study academic lexical combinations (CLA)[2], i.e., recurrent segments specific to the academic domain, along with collocations, discourse markers and other multiword expressions. Inspired by the BAWE corpus (British Academic Written English), HARTA-Noveles consists exclusively of final-year undergraduate projects and master's dissertations, selected from the public repositories of several Spanish universities and covering various scientific domains. These texts have been annotated with a specific system adapted from the one used by the Royal Spanish Academy in CORPES[3].

[1] Acronym of the project's name in Spanish: “Estudio de las combinaciones léxicas del español académico basado en corpus para una Herramienta de Ayuda a la Redacción de Textos Académicos” (HARTA).

[2] In Spanish, combinaciones léxicas académicas (CLA).

[3] Corpus del Español del Siglo XXI (CORPES).


