Resources and tools
With the main goal of achieving qualitative improvements in translation quality over state-of-the-art MT systems, this project will focus on the following two research lines:
1. Exploitation of new resources provided by the Internet, in two directions:
   - Off-line enrichment of resources oriented to machine translation: for instance, the use of comparable or parallel corpora automatically gathered from the Internet, and the collection of specialized lexicons from Wikipedia and its metadata (entities, multiword terms, categories, multilingual links, etc.).
   - On-line gathering of multilingual information to improve translation, especially of unknown words: for instance, by accessing sources of multilingual information that are updated very frequently (Twitter, Wikipedia, news, etc.).
2. Extension of the contextual information used in translation beyond the sentence:
   - Document-level translation (rather than sentence by sentence). This should lead to document translations with better discourse coherence, for instance by translating consistently all the terms that co-refer in a document.
   - Exploitation of non-textual meta-information available in documents: for instance, thematic or domain labels, information extracted from web links, or, in the case of text from software applications, the context in which it appears (a translation can vary drastically depending on whether the text appears in a paragraph, a link, a button, or a menu). This research line could improve lexical selection and domain adaptation in current translation systems.
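As an illustration of the document-level consistency idea, the following minimal sketch picks, for each source term, the translation chosen most often across a document and applies it to every occurrence. It is only a toy under stated assumptions, not the project's method: the function names `harmonize_terms` and `apply_consistent`, the majority-vote rule, and the input format (one pair of source term and chosen translation per occurrence) are all hypothetical.

```python
from collections import Counter


def harmonize_terms(term_choices):
    """Map each source term to its most frequent translation in the document.

    term_choices: list of (source_term, chosen_translation) pairs, one per
    occurrence, as a sentence-by-sentence system might produce them
    (hypothetical input format).
    """
    votes = {}
    for source, target in term_choices:
        votes.setdefault(source, Counter())[target] += 1
    # Majority vote per source term (an assumption made for illustration).
    return {source: counts.most_common(1)[0][0] for source, counts in votes.items()}


def apply_consistent(term_choices):
    """Replace every occurrence's translation with the document-wide choice."""
    preferred = harmonize_terms(term_choices)
    return [(source, preferred[source]) for source, _ in term_choices]


if __name__ == "__main__":
    choices = [("bank", "banco"), ("bank", "orilla"), ("bank", "banco"), ("menu", "menú")]
    print(apply_consistent(choices))
```

A real system would need co-reference resolution to decide which occurrences actually refer to the same thing; the sketch simply assumes identical source terms co-refer.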
To evaluate the developments from both research lines (1 and 2), the project will work with texts from three application domains: Wikipedia articles, Twitter messages, and software (localization and translation of user manuals). Translation tools have already been applied in all three cases, with significant benefits. This project aims to deliver improvements that will have an even larger positive impact on MT in the near and medium term.