Resources and tools
With the main goal of achieving qualitative improvements in translation quality over state-of-the-art MT systems, this project will focus on the following two research lines:
1. Exploitation of new resources provided by the Internet, in two directions:
   - Off-line enrichment of resources oriented to machine translation: for instance, the use of comparable parallel corpora automatically gathered from the Internet, and the collection of specialized lexicons using Wikipedia and its metadata (entities, multiword terms, categories, multilingual links, etc.).
   - On-line gathering of multilingual information to improve translation, especially of unknown words: for instance, by accessing frequently updated sources of multilingual information (Twitter, Wikipedia, news, etc.).
2. Extension of the contextual information used in translation beyond the sentence:
   - Document-level translation (rather than sentence by sentence). This would yield document translations with better discourse coherence, for instance by consistently translating all the terms that co-refer in a document.
   - Exploitation of non-textual meta-information available in documents: for instance, thematic or domain labels, information extracted from web links, or, in the case of text from software applications, the context in which it appears (the translation can vary drastically depending on whether the text appears in a paragraph, a link, a button, or a menu). This research line could improve lexical selection and domain adaptation in current translation systems.
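As a rough illustration of the document-level consistency idea, the sketch below harmonizes term translations across the sentences of one document. It rests on several assumptions that do not come from the project description: the function name `harmonize_terms` is hypothetical, identical source terms stand in for co-referring terms, and a simple majority vote picks the preferred translation. A real system would likely rely on coreference resolution and on the MT model's own scores.

```python
from collections import Counter


def harmonize_terms(sentence_choices):
    """Make term translations consistent across a document.

    sentence_choices: one dict per sentence, mapping a source term to the
    translation chosen independently for that sentence (hypothetical input
    format). Returns a new list in which every source term is mapped to its
    most frequent translation in the whole document.
    """
    # Count, for each source term, how often each translation was chosen.
    votes = {}
    for choices in sentence_choices:
        for term, translation in choices.items():
            votes.setdefault(term, Counter())[translation] += 1

    # Assumption: the majority translation is the preferred one.
    preferred = {
        term: counter.most_common(1)[0][0] for term, counter in votes.items()
    }

    # Rewrite every sentence-level choice with the document-level preference.
    return [
        {term: preferred[term] for term in choices}
        for choices in sentence_choices
    ]


# Illustrative English-to-Spanish data: "bank" is translated inconsistently.
document = [
    {"bank": "banco"},
    {"bank": "orilla"},
    {"bank": "banco", "mouse": "ratón"},
]
harmonized = harmonize_terms(document)
```

After the call, every occurrence of "bank" in `harmonized` carries the same translation, while terms that were already consistent are left unchanged.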
To evaluate the developments from the two research lines above (1 and 2), the project will work with texts from three different application domains: Wikipedia articles, Twitter messages, and software (localization and translation of user manuals). Translation tools have already been applied to these three cases, showing significant benefits. This project aims to provide improvements that have an even larger positive impact on MT in the near and mid-term future.