TACARDI

NLP resources

In addition to the freely available, open-source resources and tools for MT, the groups have developed a long list of resources and tools that will be very useful for the project, some of them as a result of the OpenMT project.

BILINGUAL CORPORA:

en-eu corpora:

Basque-English ParDeepBank (QTLeap project)

QTLeap corpus (QTLeap Project)

QTLeap WDS/NED corpus (QTLeap Project)

3 million words from software manuals (Linux, OpenOffice, Office, Windows, ...) (Elhuyar). The open-source portions (Linux and OpenOffice) can easily be extended to Spanish and Catalan.

2 million words from recently translated masterpieces in classic humanities (Darwin, Hume, Locke...) (EHU)

es-eu corpora:

TweetMT corpus

3 million words from Administration (official documents) (EHU)

12 million words from the translation memories of Elhuyar; domains: telecommunications, environment, finances, science and technology...

1 million words from Administration (IVAP)

300K words from journalism (EITB)

1 million words from popular journalism (Consumer)

4 million words from environment (IHOBE)

en-es corpora:

QTLeap corpus (QTLeap Project)

Europarl-QTLeap WDS/NED corpus (QTLeap Project)

QTLeap WDS/NED corpus (QTLeap Project)

30 million words from the Europarl Corpus (European Parliament speech transcriptions) (UPC)

es-ca corpora:

10 million words from El periodico bilingual edition (UPC)

MONOLINGUAL CORPORA:

es, eu, cat corpora:

TWEET-NORM_2013 corpus (TWEET-NORM_2013 Workshop)

TWEET-LID_2014 corpus (TWEET-LID_2014 Workshop)

LINGUISTIC PROCESSORS

(For more information, see Products on the Ixa Group website)

ixa-pipe-coref-eu: coreference for Basque (http://ixa2.si.ehu.es/ixa-pipes/ and also http://metashare.tilde.com/repository/search/?q=ixa)

ixa-pipe-ned-ukb: Named Entity Disambiguation (http://ixa2.si.ehu.es/ixa-pipes/ and also http://metashare.tilde.com/repository/search/?q=ixa)

ixa-pipe-wsd-ukb: Word sense disambiguation

Interset driver for Basque tagset

ixa-pipe-dep-eu: Dependency analysis for Basque (http://ixa2.si.ehu.es/ixa-pipes/ and also http://metashare.tilde.com/repository/search/?q=ixa)

ixa-pipe-pos: Generic part of speech tagger

ixa-pipe-pos-eu: Part of speech tagger for Basque (http://ixa2.si.ehu.es/ixa-pipes/ and also http://metashare.tilde.com/repository/search/?q=ixa)

ixa-pipe-srl: Semantic role labelling

Lemmatization for Basque (Alegria et al., 1996) (EHU)

Lemmatization for Spanish and English: We use the FreeLing suite of language analyzers (Carreras et al., 2004), which may be downloaded at http://www.lsi.upc.es/~nlp/freeling/ (UPC)

PoS Tagging for Basque: Eustagger (Alegria et al., 2002) (EHU)

PoS Tagging: SVMTool (Giménez and Màrquez, 2004), which may be freely downloaded at http://www.lsi.upc.es/~nlp/SVMTool (UPC)

Shallow Parsing: We use the Phreco software (Carreras et al., 2005) (UPC)

Dependency analysis for Basque using either of two paradigms, rule-based or statistical (Bengoetxea & Gojenola, 2008) (EHU)

Dependency analysis for Spanish and English: FreeLing (www.lsi.upc.edu/~nlp/freeling/)

Clause Splitting: We use the prototype for English developed by Carreras et al. (2005). Spanish and Catalan clause splitters are under development (UPC). For Basque a prototype is ready (EHU) (Alegria et al., 2008).

Monolingual terminology extraction (eu): Erauzterm (Alegria et al., 2004) (Elhuyar)

Bilingual terminology extraction (es-eu): Elexbi (Alegria et al., 2005) (Elhuyar)

Semantic Role Labelling: There is a prototype for English developed by Màrquez et al. (2005) (UPC)

Topic signatures for all WordNet nominal senses (Agirre et al., 2004c) (EHU)

Word Sense Disambiguation: We may use the all-words WSD system for English developed by Villarejo et al. (2004) (UPC) and others for Basque and Spanish (EHU)

MT ENGINES AND TOOLS

MT systems for Basque, using different technologies: http://ixa2.si.ehu.es/openmt-demo

OpenTrad systems (open source), developed by EHU, UPC, Elhuyar and other partners, covering ca, gl, en, es, eu: www.opentrad.org

Asiya, an open-source system for MT evaluation: http://www.lsi.upc.edu/~nlp/Asiya/