Master's Thesis
Title:
Universal Dependencies for Cape Verdean Creole
Author:
Brandyn Emile Evora
Abstract:
The Universal Dependencies Project has been a largely successful attempt to
devise an annotation protocol that works cross-linguistically. Following the work
done by Ryan McDonald and his team in 2013, a standardized system of annotation
has been proposed to allow for more uniform multilingual parsing. The system has
been widely adopted and there are currently treebanks for over 80 languages with
more to come.
Following the framework laid out by the Stanford Dependencies Treebank for
English as well as the part-of-speech tag set created by Google and detailed in Petrov
et al. (2012), linguists worldwide have been able to annotate treebanks which allow for
cross-linguistic research and application development.
Adhering to the guidelines of the Universal Dependencies Project, I have begun
annotating Cape Verdean Creole, the oldest creole language still spoken today
as well as the most widely spoken Portuguese-based creole.
Cape Verdean Creole, or kriolu, is not the official language of the independent
archipelago found off the northwestern coast of Africa, yet the several varieties of the
creole are used daily by the citizens of Cape Verde as well as by its diaspora found in
the United States, Portugal, Angola, France, the Netherlands and many other
countries worldwide. With more Cape Verdeans living outside the country than within
it, Cape Verdean Creole is the common link between these communities and the
culture of the motherland.
The current treebank that I have built contains 528 sentences of the Sotavento
variant of the southern islands, which were manually tagged for part of speech as
well as for their dependency relations. The sentences were obtained from Na Boka Noti,
a book of old folk tales written by T.V. da Silva.
File:
Supervisors:
Aitziber Atutxa Salazar and Koldo Gojenola
Year:
2019
Assigned: