Vol. 10 (2023)

CHROME Corpus: Multi-level annotation of an audiovisual corpus

Iolanda Alfano
Università di Salerno
Published July 24, 2023


Keywords: corpus collection, guided tours, linguistic annotation, multimodal annotation
How to Cite
Alfano, I., Cataldo, V., & Schettino, L. (2023). CHROME Corpus: Multi-level annotation of an audiovisual corpus. CHIMERA: Romance Corpora and Linguistic Studies, 10, 135–153. Retrieved from https://revistas.uam.es/chimera/article/view/16013


In this paper we describe the methodology employed to annotate a resource developed within the CHROME (Cultural Heritage Resources Orienting Multimodal Experience) project, which aims at the protection and promotion of cultural heritage. More specifically, the ultimate goal of the project is to model multimodal data (including speech features and gestures) for the design of a virtual agent that serves in museums and can communicate in an intelligible, effective, and natural way. To capture the relationship between linguistic and gestural behaviours, multi-level annotation systems have been developed and implemented to label linguistic and gestural features at different levels of analysis. This article gives a general presentation of the corpus and describes the different levels of linguistic annotation; the final section offers concluding remarks on the applications of the described methodology. The CHROME corpus and the mark-up methodology described in this work are valuable multimodal resources for investigating communicative dynamics and may support both theoretical and practical applications.



