Text analysis

Curriculum Learning for large language models in low-resource languages

Large language models (LLMs) are at the core of the current AI revolution and have laid the groundwork for tremendous advances in Natural Language Processing. Building LLMs requires huge amounts of data, which are not available for low-resource languages. As a result, LLMs shine in high-resource languages like English but lag behind in many others, especially those where training resources are scarce, including many regional languages in Europe. The data scarcity problem is usually alleviated by augmenting the training corpora in the target

Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation

Counter Narratives (CNs) are non-negative textual responses to Hate Speech (HS) that aim to defuse online hatred and mitigate its spread across media. Despite the recent increase in HS content posted online, research on automatic CN generation has been relatively scarce and predominantly focused on English. In this paper, we present CONAN-EUS, a new Basque and Spanish dataset for CN generation developed by means of Machine Translation (MT) and professional post-editing.

HiTZ Chair of Artificial Intelligence and Language Technology

The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious programme, and one of its goals is to strengthen leadership in language technology, placing our country at the technological vanguard. To that end, it rests on two pillars: on the one hand, the scientific excellence of HiTZ, the Basque Center for Language Technology at UPV/EHU, and, in the area of teaching, collaboration with the UPV/EHU Faculty of Informatics.
