Resources and tools



With the main goal of achieving qualitative improvements in translation quality over state-of-the-art MT systems, this project will focus on the following two research lines:

  1. Exploitation of new resources provided by the Internet, in two directions:
    • Off-line enrichment of resources oriented to machine translation: for instance, the use of comparable and parallel corpora automatically gathered from the Internet, and the collection of specialized lexicons using Wikipedia and its metadata (entities, multiword terms, categories, multilingual links, etc.).
    • On-line gathering of multilingual information to improve translation, especially of unknown words: for instance, by accessing frequently updated sources of multilingual information (Twitter, Wikipedia, news, etc.).
  2. Extension of the contextual information used in translation beyond the sentence.
    • Document-level translation (rather than sentence by sentence). This would yield document translations with better discourse coherence, for instance by translating consistently all the terms that co-refer in a document.
    • Exploitation of non-textual meta-information available in documents: for instance, thematic or domain labels, information extracted from web links, or, for text from software applications, the context in which it appears (the translation can vary drastically depending on whether the text appears in a paragraph, a link, a button, or a menu). This research line could improve lexical selection and domain adaptation in current translation systems.
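
As a rough illustration of the document-level consistency idea in research line 2, the following minimal sketch assumes that a sentence-level system has already chosen a translation for each source term in each sentence, and then enforces the most frequent choice across the whole document. The function name, the data layout, and the toy English-Spanish entries are hypothetical and are not taken from the project; a real system would work on co-reference chains and model scores rather than simple counts.

```python
from collections import Counter


def enforce_consistent_terms(sentence_choices):
    """Make term translations consistent across a document.

    sentence_choices: one dict per sentence, mapping a source term to the
    translation a (hypothetical) sentence-level system picked for it.
    Returns a new list in which every source term is mapped to its most
    frequent translation in the whole document.
    """
    # Count how often each translation was chosen for each source term.
    counts = {}
    for choices in sentence_choices:
        for term, translation in choices.items():
            counts.setdefault(term, Counter())[translation] += 1

    # Pick the document-wide majority translation for each term.
    preferred = {term: c.most_common(1)[0][0] for term, c in counts.items()}

    # Rewrite each sentence's choices using the preferred translation.
    return [{term: preferred[term] for term in choices}
            for choices in sentence_choices]


# Toy document: "bank" is translated inconsistently in the second sentence.
document = [
    {"bank": "banco"},
    {"bank": "orilla"},
    {"bank": "banco", "rate": "tipo"},
]
consistent = enforce_consistent_terms(document)
```

Majority voting is only one possible policy; it is used here because it is simple to state, not because the project prescribes it.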

To evaluate the developments from the two research lines above, the project will work with texts from three application domains: Wikipedia articles, Twitter messages, and software (localization and translation of user manuals). Translation tools have already been applied to these three cases, showing significant benefits. This project aims to provide improvements that can have an even larger positive impact on MT in the near and medium term.

