Master Thesis

Multilingual Central Repository version 3.0: improving a very large lexical knowledge base
Daniel Parera Perez
Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that attempts to automatically process human languages. Advanced NLP applications usually require large and sophisticated semantic lexicons to model the vocabulary to be processed. Princeton WordNet (WN) [Fellbaum, 1998] is by far the most widely used semantic resource in NLP. In fact, WordNet is used worldwide for anchoring different types of semantic knowledge, including wordnets for languages other than English. EuroWordNet is an architecture of semantic networks for European languages, based on WordNet, in which each language develops its own wordnet. The MEANING project [Rigau et al., 2001] developed the first versions of the Multilingual Central Repository (MCR) [Atserias et al., 2004], following the EuroWordNet architecture, to maintain compatibility between wordnets of different languages and versions. A previous improvement of the MCR was developed within the KNOW2 project [Aitor et al., 2012]. In this work, we develop a new release of the MCR in the framework of the SKaTer and TUNER projects. The structure of this master thesis is as follows. First, Chapter 2 presents the state of the art in Lexical Knowledge Bases used in NLP systems. Second, we describe the upgrading process carried out to improve the knowledge contained in the MCR: Chapter 3 presents the process for increasing the current coverage of the MCR by using different lexical resources, and Chapter 4 introduces a new version of the Basic Level Concepts. Third, Chapter 5 summarizes all the additional modifications and improvements included in the new MCR release. Furthermore, the MCR now integrates the Portuguese WordNet (PULO) developed by Alberto Simões at the University of Minho. Thus, the current version of the MCR integrates, in the same EuroWordNet framework, wordnets for six different languages: English, Spanish, Catalan, Basque, Galician and Portuguese.
Finally, Chapter 6 presents some concluding remarks and outlines directions for future research.
German Rigau Claramunt