Large language models (LLMs) are at the core of the current AI revolution and have laid the
groundwork for tremendous advances in Natural Language Processing. Building LLMs requires
huge amounts of data, which are not available for low-resource languages. As a result, LLMs shine in
high-resource languages such as English, but lag behind in many others, especially those where
training resources are scarce, including many regional languages of Europe.
The data scarcity problem is usually alleviated by augmenting the training corpora in the target