Master's Thesis
Title:
Robust Document Representations for
Hyperpartisan and Fake News Detection
Author:
Talita Anthonio, MA
Abstract:
Hyperpartisan news is characterized by extremely one-sided content from a left-wing or
right-wing political perspective. This thesis is concerned with automatically detecting
such news through supervised text classification. We work with data from the recent
shared task on hyperpartisan news detection (SemEval-2019 Task 4). We use two
classification techniques: Support Vector Machines (SVMs) and Recurrent Neural
Networks. We experiment with document representations using bag-of-words,
bag-of-clusters, word embeddings and contextual character-based embeddings. We also
try to improve our classifiers by adding local features, such as POS n-grams, stylistic
features and the sentiment of a text. Our aim is to build robust classifiers across tasks
related to fake news, for different domains and text genres. Although local features help
to model the task in-domain, this thesis shows that dense document representations work
better across domains and tasks. We obtain very competitive results in the hyperpartisan
news detection task and state-of-the-art results in an out-of-domain evaluation on fake
news.
Supervisors:
Rodrigo Agerri and Malvina Nissim
Year:
2019
Keywords:
hyperpartisan news detection, fake news, supervised text classification