Vol. 8 (2021)
ARTICLES

Risorse e applicazioni computazionali per l’accesso ai beni culturali: il Corpus CHerIDesCo

Gloria Gagliardi
Università di Napoli "L'Orientale"
Massimo Guarino
Università di Napoli "L'Orientale"
Published June 18, 2021

Keywords:

BBCC, Cultural Heritage, NLP – Natural Language Processing
How to cite
Gagliardi, G., & Guarino, M. (2021). Risorse e applicazioni computazionali per l’accesso ai beni culturali: il Corpus CHerIDesCo. CHIMERA: Revista De Corpus De Lenguas Romances Y Estudios Lingüísticos, 8, 25–43. https://doi.org/10.15366/chimera2021.8.002 (Original work published June 17, 2021)

Abstract

The paper presents CHerIDesCo (Cultural Heritage - Italian Description Corpus), a domain-specific linguistic resource designed for training and testing novel NLP tools in the Cultural Heritage field. The corpus was developed by the UNIOR NLP Research Group as part of SMACH, a three-year project funded by the National Operative Program to pursue the Smart Specialization Strategy defined by the EU. The project aims to improve language-based human-computer interaction in the Cultural Heritage domain by developing innovative applications, based on semantic language technologies, for multilingual access to content. In particular, the paper describes the design of the CHerIDesCo corpus, the annotation procedures, and the platforms to which the resource has been uploaded. As pointed out in the conclusion, this linguistic resource can be exploited in several NLP tasks, such as Named-Entity Recognition (NER), Named-Entity Linking (NEL), and Topic Modeling.
