CHIMERA: Revista de Corpus de Lenguas Romances y Estudios Lingüísticos 2023-12-11T16:19:37+01:00 Carlota Nicolás Open Journal Systems <p style="text-align: justify;">CHIMERA is an international, double-blind peer-reviewed scientific journal that publishes <strong>corpus-based studies on Romance languages</strong>.</p> <p style="text-align: justify;">The journal aims to disseminate internationally theoretical and applied research of proven scientific quality, especially innovative approaches to the analysis of Romance languages. CHIMERA also seeks to foster closer ties between the academic communities devoted to Romance languages, both European and American.</p> <p>The journal accepts original manuscripts focused on corpus analysis of both written and spoken language, from a wide range of theoretical perspectives and on any area of linguistics. It also publishes reviews of books related to its subject matter, of corpora (resource development and annotation), and of corpus analysis tools.</p> <p>CHIMERA publishes articles written in Romance languages and in English. There are no submission fees or article publication charges.</p> Una aplicación para explorar la frecuencia léxica a partir de corpus de referencia 2022-09-12T11:35:58+02:00 Mario Casado-Mancebo <p>Lexical frequency, that is, how often a word is used relative to others in a language, is a key factor in reading and text-processing tasks. This has been demonstrated through experimental research with both adults and children. These studies have shown the close relationship between reading comprehension, lexical skill, and lexical decoding. This paper presents an online application for exploring lexical frequency in Spanish texts against a set of reference corpora. 
The current version includes the three corpora of the Real Academia Española (CORDE, CREA, and CORPES XXI), which allows both diachronic and synchronic research. The application lets the user submit texts (or words) for processing and returns a table with a range of information on the frequency of each form in the text. The frequency information includes the rank of the form in the frequency list and its absolute and normalized frequencies. The results can easily be downloaded for use in external tools. The paper analyzes the inflectional system of nouns in Italian, a language where nouns are inflected for gender and number and are organized into different inflectional classes. The DeGNI lexical database (De Martino et al., 2019) was queried to obtain measures of the distribution of genders, gender suffixes, and declensional patterns of Italian nouns. In the second paragraph our method for lexical inquiry is presented together with its core concepts, which are textual Corpus Representativeness, Connotation, Connotation Rate (<em>Quoziente Connotativo</em>, QC) and word Position in the Center-Periphery Vocabulary Model. The third paragraph sketches two possible research lines, the first one regarding the lexicon of a given historical period (Old Italian), the second dealing with the comparison between two different linguistic historical phases (Old Italian vs. 
Contemporary Italian).</p> 2023-03-01T00:00:00+01:00 Copyright 2023 CHIMERA: Revista de Corpus de Lenguas Romances y Estudios Lingüísticos Using keywords in the automatic classification of language of gender violence 2022-09-29T23:59:02+02:00 Héctor Castro Mosqueda Antonio Rico Sulayes <div> <p class="CHIMERAabstractkeywordsGrassettoNonGrassetto"><span lang="EN-US">This paper employs lexical analysis tools, quantitative processing methods, and natural language processing procedures to analyze language samples and identify lexical items that support automatic topic detection. It discusses how keyword extraction, a technique from corpus linguistics, can be used to obtain features that improve automatic classification; in particular, this research is concerned with extracting keywords from a corpus obtained from social networks. The corpus consists of 1,841,385 words and is divided into three sub-corpora, categorized according to the topic of the comments in each. These three topics are violence against women, violence against the LGBT community, and violence in general. The corpus was obtained by scraping comments from YouTube videos that address issues such as street harassment, femicide, feminist movements, drug trafficking, forced disappearances, and equal marriage, among others. Topic detection tasks performed on this social media corpus showed that the keywords yielded 98% accuracy when classifying the collections of comments from 51 videos into one of the three categories mentioned above, and 92% when classifying almost 7,500 comments individually. When keywords were dropped as features and all words were used for classification instead, accuracy fell by an average of 17%. These results support the argument for keyword relevance in automatic topic detection. 
The resources interoperate through the representation of their (meta)data by means of shared ontologies and vocabularies, in accordance with the principles of the Linked Data paradigm. After presenting the architecture of LiLa, which is based on a collection of citation forms of Latin words, the article describes the ontological modelling of each of the lexical and textual resources currently connected to LiLa. Finally, a series of considerations on the prospects for future work on LiLa is reported. This section gathers 1184 inscriptions (14413 tokens) from the island, covering a broad time span (from the first century BCE to the seventh century CE), including several text types (public and private inscriptions, as well as sacred and funerary texts). The results of our examination helped us to determine the salient features of the variety of Latin spoken in Sardinia, which on the one hand foreshadows the Romance outcomes of the Sardinian varieties, and on the other hand enables us to highlight common linguistic features between Sardinia and Africa. The aim of this work is to investigate whether and to what extent the strategies employed in LS show pragmatic patterns attributable to those of the L1 (Italian), to those of the target language (LT; Spanish, German) or, rather, to features tied to linguistic and strategic competence in LS (and thus independent of the L1 and the LT). We consider the articulation of textual structure, the preferences and "dispreferences" shown in introducing and managing discourse topics, together with a degree of fluency based on a number of temporal parameters. Our results indicate that speech in LS has a less elaborate and more fragmented textual structure than native speech: the topical entities dealt with tend to be arranged linearly rather than hierarchically. 
At the same time, the "conversational games", although consistently less developed, are on average completed using a larger number of moves, against a background of generally slow processing, lower overall fluency, and difficulty in managing the interaction. In order to grasp the relationship between linguistic and gestural behaviours, multi-level annotation systems have been developed and implemented for the labelling of linguistic and gesture features on different levels of analysis. This article is dedicated to a general presentation of the corpus and to the description of the different levels of linguistic annotation; the final section then offers concluding remarks on the applications of the described methodology. The CHROME corpus and the mark-up methodology described in this work represent valuable multimodal resources for investigations on communicative dynamics, which may offer valid support for both theoretical and practical applications. To examine underexplored questions of joint utterance construction in Mandarin Chinese, 17 natural two-party face-to-face conversation recordings from 30 Mandarin Chinese speakers are utilized. The findings indicate that anticipatory completion can be observed in Mandarin Chinese in diverse activity contexts. The data exhibit six functions. These functions are identified in relation to their different forms and their position within interactive conversations. The results show that multiple resources, including syntactic, lexical, and prosodic features, are important for accomplishing anticipatory completion and play a key role in understanding its functions. The analysis is centered on textual parameters, encompassing various phenomena related to text segmentation and three dimensions of text organization (the referential-thematic dimension, the logico-argumentative dimension, and the polyphonic-enunciative dimension). 
Results of different case studies based on a self-assembled corpus of biographies generated by ChatGPT-3.5 and published on Wikipedia are presented. The analysis, grounded in the Language into Act Theory framework, explores the information structure of the speech and linguistic parameters influenced by prosody, such as utterance boundaries, information structure, speech disfluency, mean length of prosodic units, and speech rate. The study also employs Kita's model to analyze bodily movements, including gestures and self-adaptors, and their temporal relation with speech. Notable findings reveal that ASD speech is characterized by a monotonous information structure and prosodic contour, featuring slower and longer units with limited variation in rate and information type. On the gestural side, the ASD subject exhibits fewer gestures and more self-adaptors, with some instances of asynchrony between gestures and speech. This pilot study serves as a foundational step for a broader corpus-based project dedicated to exploring the development of pragmatic skills in individuals with ASD. A selection of significant terms has been made, each assigned its prevailing meanings along with a substantial body of relevant phraseology, to give a better understanding of the semantic evolution that these entries have undergone over the centuries. Among the chosen terms, 'cittadinanza' and 'corruzione' deserve particular attention, as they make an important contribution to understanding the social changes and cultural influences that have taken place throughout Italy. The system attests the presence of these terms in legislation, doctrine, and practice over a time span ranging from 1377 to 1966.