Master's Thesis

Exploring metrics for post-editing effort and their ability to detect errors in machine-translated output
Cristina Cumbreño Díez
Nora Aranberri

As more companies integrate machine translation (MT) systems into their translation workflows, it becomes increasingly relevant to measure post-editing (PE) effort accurately. In this paper we explore how different types of errors in the MT output may affect PE effort, and take a closer look at the techniques used to measure it. For our experiment we curated a test suite of 60 EN > ES sentence pairs, controlling certain features (sentence length, error frequency, topic, etc.), and had a group of 7 translators post-edit them using the PET tool, which we used to collect temporal, technical and cognitive effort metrics. The results seem to challenge some previously proposed error difficulty rankings; they also suggest that, once other sentence features are controlled, the type of error to be addressed may not influence effort as much as previously assumed. The low correlations between the metrics for the different effort aspects may indicate that none of them reliably accounts for full PE effort unless used in combination with the others.
Keywords: machine translation, post-editing, post-editing effort, post-editing time, keystrokes, manual scoring, HTER