Goal


Interest in Machine Translation (MT) is still growing in 2012. Web translation services are now widespread, and end users are already familiar with their advantages and limitations. At the same time, the language technology research community shows great interest in MT. For instance, at the most recent annual conference of the Association for Computational Linguistics (ACL-2011), the major reference in the Natural Language Processing area, more than 15% of the articles were on MT.

With the main goal of achieving qualitative improvements in translation quality over state-of-the-art MT systems, this project will focus on the following two research lines:

  1. Exploitation of new resources provided by the Internet, in two directions:
    • Off-line enrichment of resources oriented to machine translation: for instance, the use of comparable parallel corpora automatically gathered from the Internet, and the collection of specialized lexicons using Wikipedia and its metadata (entities, multiword terms, categories, multilingual links, etc.).
    • On-line gathering of multilingual information to improve translation, especially of unknown words: for instance, by accessing sources of multilingual information that are updated very frequently (Twitter, Wikipedia, news, etc.).
  2. Extension of the contextual information used in translation beyond the sentence.
    • Document-level translation (rather than sentence by sentence). This would lead to whole-document translations with better discourse coherence, for instance by translating consistently all the terms that co-refer in a document.
    • Exploitation of non-textual meta-information available in documents, for instance thematic or domain labels, information extracted from web links, or, in the case of text from software applications, the context in which the text appears (the translation can vary drastically depending on whether the text appears in a paragraph, a link, a button, or a menu). This research line could improve lexical selection and domain adaptation in current translation systems.
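
As a rough illustration of the document-level consistency idea in research line 2, the sketch below harmonizes term translations across the sentences of one document by majority vote. It is only a toy under stated assumptions, not the project's method: the function name `harmonize_terms`, the input layout (one dictionary of source term to chosen translation per sentence), and the English-to-Spanish example words are all hypothetical.

```python
from collections import Counter, defaultdict


def harmonize_terms(sentence_choices):
    """Make term translations consistent across a document.

    sentence_choices: list of dicts, one per sentence, mapping a source
    term to the translation a sentence-level system picked for it
    (hypothetical input format, assumed for this sketch).
    """
    # Count, per source term, how often each translation was chosen.
    votes = defaultdict(Counter)
    for choices in sentence_choices:
        for term, translation in choices.items():
            votes[term][translation] += 1

    # Keep the most frequent translation of each term; ties go to the
    # translation seen first in the document.
    preferred = {
        term: counter.most_common(1)[0][0] for term, counter in votes.items()
    }

    # Rewrite every sentence's choices with the document-wide preference.
    return [
        {term: preferred[term] for term in choices}
        for choices in sentence_choices
    ]


document = [
    {"bank": "banco"},
    {"bank": "orilla", "loan": "préstamo"},
    {"bank": "banco"},
]
print(harmonize_terms(document))
```

In this example the second sentence's choice for "bank" is overridden by the translation used in the rest of the document. A real system would need to decide which mentions actually co-refer before forcing them to agree; the vote here simply assumes that identical source terms do.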

To evaluate the developments in the two research lines above (1 and 2), the project will work with texts from three different application domains: Wikipedia articles, Twitter messages, and software (localization and translation of user manuals). Translation tools have already been applied to these three cases, showing significant benefits. This project aims to provide improvements that will have an even larger positive impact on MT in the near and mid-term future.