Master's Thesis

Universal Dependencies for Cape Verdean Creole
Brandyn Emile Evora
The Universal Dependencies Project has been a largely successful attempt to devise an annotation protocol that works cross-linguistically. Following the work done by Ryan McDonald and his team in 2013, a standardized system of annotation was proposed to allow for more uniform multilingual parsing. The system has been widely adopted, and there are currently treebanks for over 80 languages, with more to come. Following the framework laid out by the Stanford Dependencies Treebank for English, as well as the part-of-speech tag set created by Google and detailed in Petrov et al. (2012), linguists worldwide have been able to annotate treebanks that allow for cross-linguistic research and application development. Adhering to the guidelines of the Universal Dependencies Project, I have begun annotating a treebank for Cape Verdean Creole, the oldest creole language still spoken today as well as the most widely spoken Portuguese-based creole. Cape Verdean Creole, or kriolu, is not the official language of the independent archipelago off the northwestern coast of Africa, yet its several varieties are used daily by the citizens of Cape Verde as well as by its diaspora in the United States, Portugal, Angola, France, the Netherlands and many other countries worldwide. With more Cape Verdeans living outside the country than within it, Cape Verdean Creole is the common link between all of these communities and the culture of the motherland. The treebank that I have built so far contains 528 sentences in the Sotavento variant of the southern islands, which were manually tagged for part of speech as well as for their dependency relations. The sentences were obtained from Na Boka Noti, a book of old folk tales written by T.V. da Silva.
Aitziber Atutxa Salazar and Koldo Gojenola