Master's Thesis

Robust Document Representations for Hyperpartisan and Fake News Detection
Talita Anthonio, MA
Hyperpartisan news is characterized by extremely one-sided content written from a left-wing or right-wing political perspective. This thesis addresses the automatic detection of such news through supervised text classification. We work with data from the recent shared task on hyperpartisan news detection (SemEval-2019 Task 4) and use two classification techniques: Support Vector Machines (SVMs) and Recurrent Neural Networks. We experiment with document representations based on bag-of-words, bag-of-clusters, word embeddings, and contextual character-based embeddings. We also try to improve our classifiers by adding local features, such as POS n-grams, stylistic features, and the sentiment of a text. Our aim is to build classifiers that are robust across tasks related to fake news, across domains, and across text genres. Although local features help to model the task in-domain, this thesis shows that dense document representations work better across domains and tasks. We obtain very competitive results in the hyperpartisan news detection task and state-of-the-art results in an out-of-domain evaluation on fake news.
Rodrigo Agerri and Malvina Nissim
hyperpartisan news detection, fake news, supervised text classification