The ESLORA corpus of spoken Spanish: design, development and exploitation

Mario Barcala, Eva Domínguez, Alba Fernández, Raquel Rivas, Maria Paula Santalla, Victoria Vázquez, Rebeca Villapol

Abstract


ESLORA is a corpus of spoken Spanish made up of semi-directed interviews and spontaneous conversations recorded in Galicia between 2007 and 2015. The design and construction of the corpus meet three objectives: to record the use of a variety of Spanish that has so far been scarcely documented, to gain further insight into methods for building spoken corpora, and to develop computational tools for corpus search. The paper presents the main characteristics of ESLORA and the criteria followed in building the corpus. It also briefly describes the tools used to build the corpus and how they work together to meet the project's needs. Finally, it shows that the decisions taken at various stages of compilation are closely related to the wide range of possibilities for retrieving the lexical, grammatical and contextual information that the materials provide.


Keywords


spoken corpus; semi-directed interview; conversation; Galician Spanish; POS-tagging







CHIMERA Romance Corpora and Linguistic Studies

ISSN: 2386-2629