The American English spontaneous speech minicorpus. Architecture and comparability
Keywords: Spontaneous speech, Language into Act Theory, information structure, corpus annotation
This paper presents the American English (AE) minicorpus, a spontaneous speech resource created under the auspices of the C-ORAL-BRASIL project and consisting of texts selected from the Santa Barbara Corpus of Spoken American English. We focus on the sampling strategy that guided the selection of texts, the transcription criteria that were implemented, and the prosodic and informational annotation carried out on the AE minicorpus. The minicorpus was designed to be comparable to the minicorpora of the C-ORAL projects for Italian and Brazilian Portuguese, which were conceived to allow the study of information structure in spontaneous speech in accordance with the principles of the Language into Act Theory. This theory provides a pragmatic framework for the study of spontaneous speech and integrates the IPO approach into its prosodic model. The IPO approach is a perception-based model of intonation that provides an apparatus for describing and classifying the melodic contours observed in spontaneous speech.