Resources and tools



With the main goal of achieving qualitative improvements in translation quality over state-of-the-art machine translation (MT) systems, this project will focus on the following two research lines:

  1. Exploitation of new resources provided by the Internet, in two directions:
    • Off-line enrichment of resources for machine translation: for instance, the use of comparable and parallel corpora automatically gathered from the Internet, and the collection of specialized lexicons from Wikipedia and its metadata (entities, multiword terms, categories, multilingual links, etc.).
    • On-line gathering of multilingual information to improve translation, especially of unknown words: for instance, by accessing frequently updated sources of multilingual information (Twitter, Wikipedia, news, etc.).
  2. Extension of the contextual information used in translation beyond the sentence level:
    • Document-level translation (rather than sentence-by-sentence translation). This should yield translations of whole documents with better discourse coherence, for instance by translating all the terms that co-refer in a document consistently.
    • Exploitation of non-textual meta-information available in documents: for instance, thematic or domain labels, information extracted from web links, or, for text from software applications, the context in which the text appears (its translation can vary drastically depending on whether it appears in a paragraph, a link, a button, or a menu). This research line could improve lexical selection and domain adaptation in current translation systems.
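As an illustration of the consistency goal in the document-level research line, the following Python sketch post-processes sentence-level output so that each source term receives a single translation throughout a document. It is a minimal sketch under stated assumptions, not the project's method: it assumes that a hypothetical sentence-level MT system reports which translation it chose for each source term in each sentence, and it treats identical source strings as co-referring, which is only a crude stand-in for real co-reference resolution. The function name `enforce_consistent_terms` is illustrative.

```python
from collections import Counter


def enforce_consistent_terms(sentence_choices):
    """Make term translations consistent across a document.

    sentence_choices: one dict per sentence, mapping a source term to the
    translation chosen for it in that sentence (assumed to come from a
    hypothetical sentence-level MT system).
    Returns a new list of dicts in which every occurrence of a term uses
    the translation chosen most often in the whole document.
    """
    # Count, for each source term, how often each translation was chosen.
    votes = {}
    for choices in sentence_choices:
        for term, translation in choices.items():
            votes.setdefault(term, Counter())[translation] += 1

    # Pick the majority translation per term; Counter.most_common orders
    # equal counts by first occurrence, so ties go to the earliest choice.
    canonical = {term: counts.most_common(1)[0][0] for term, counts in votes.items()}

    # Rewrite every sentence's choices with the document-wide translation.
    return [{term: canonical[term] for term in choices} for choices in sentence_choices]


if __name__ == "__main__":
    doc = [{"bank": "banco"}, {"bank": "orilla"}, {"bank": "banco"}]
    print(enforce_consistent_terms(doc))
```

Here the second sentence's outlier choice ("orilla") is replaced by the majority choice ("banco"). A majority vote is only the simplest possible policy; a real document-level system would more likely weigh model scores and use actual co-reference chains rather than string identity.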

To evaluate the developments from research lines 1 and 2 above, the project will work with texts from three application domains: Wikipedia articles, Twitter messages, and software (localization and translation of user manuals). Translation tools have already been applied to these three cases with significant benefits. This project aims to provide improvements that will have an even larger positive impact on MT in the near and medium term.

