NLP resources


In addition to the freely available, open-source resources and tools for MT, the groups have developed a long list of resources and tools that will be very useful for the project, some of them as a result of the OpenMT project.

BILINGUAL CORPORA:

  • en-eu corpora:

  1. Basque-English ParDeepBank (QTLeap project)

  2. QTLeap corpus (QTLeap Project)

  3. QTLeap WDS/NED corpus (QTLeap Project)

  4. 3 million words from software manuals (Linux, OpenOffice, Office, Windows, ...) (Elhuyar). The open-source material (Linux and OpenOffice) can easily be extended to Spanish and Catalan.

  5. 2 million words from recently translated masterpieces of the classic humanities (Darwin, Hume, Locke...) (EHU)

  • es-eu corpora:

  1. TweetMT corpus

  2. 3 million words from Administration (official documents) (EHU)

  3. 12 million words from the translation memories of Elhuyar; domains: telecommunications, environment, finances, science and technology...

  4. 1 million words from Administration (IVAP)

  5. 300K words from journalism (EITB)

  6. 1 million words from popular journalism (Consumer)

  7. 4 million words from environment (IHOBE)

  • en-es corpora:

  1. QTLeap corpus (QTLeap Project)

  2. Europarl-QTLeap WDS/NED corpus (QTLeap Project)

  3. QTLeap WDS/NED corpus (QTLeap Project)

  4. 30 million words from the Europarl Corpus (European Parliament speech transcriptions) (UPC)

  • es-ca corpora:

  1. 10 million words from El periodico bilingual edition (UPC)

MONOLINGUAL CORPORA:

  • es, eu, ca corpora:

  1. TWEET-NORM_2013 corpus (TWEET-NORM_2013 Workshop)

  2. TWEET-LID_2014 corpus (TWEET-LID_2014 Workshop)

LINGUISTIC PROCESSORS

(For more information, see Products on the Ixa Group website.)

  • ixa-pipe-coref-eu: coreference for Basque (http://ixa2.si.ehu.es/ixa-pipes/ and also http://metashare.tilde.com/repository/search/?q=ixa)

  • ixa-pipe-ned-ukb: Named Entity Disambiguation (http://ixa2.si.ehu.es/ixa-pipes/ and also http://metashare.tilde.com/repository/search/?q=ixa)

  • ixa-pipe-wsd-ukb: Word sense disambiguation

  • Interset driver for Basque tagset

  • ixa-pipe-dep-eu: Dependency analysis for Basque (http://ixa2.si.ehu.es/ixa-pipes/ and also http://metashare.tilde.com/repository/search/?q=ixa)

  • ixa-pipe-pos: Generic part of speech tagger

  • ixa-pipe-pos-eu: Part of speech tagger for Basque (http://ixa2.si.ehu.es/ixa-pipes/ and also http://metashare.tilde.com/repository/search/?q=ixa)

  • ixa-pipe-srl: Semantic role labelling

  • Lemmatization for Basque (Alegria et al., 1996) (EHU)

  • Lemmatization for Spanish and English: We use the FreeLing Suite of Language Analyzers (Carreras et al., 2004), which may be downloaded at http://www.lsi.upc.es/~nlp/freeling/ (UPC)

  • PoS Tagging for Basque: Eustagger (Alegria et al., 2002) (EHU)

  • PoS Tagging: SVMTool (Giménez and Màrquez, 2004), which may be freely downloaded at http://www.lsi.upc.es/~nlp/SVMTool (UPC)

  • Shallow Parsing: We use the Phreco software (Carreras et al., 2005) (UPC)

  • Dependency analysis for Basque, using either of two paradigms: rule-based or statistical (Bengoetxea & Gojenola, 2008) (EHU)

  • Dependency analysis for Spanish and English: FreeLing (www.lsi.upc.edu/~nlp/freeling/)

  • Clause Splitting: We use the prototype for English developed by Carreras et al. (2005). Spanish and Catalan clause splitters are under development (UPC). For Basque a prototype is ready (EHU) (Alegria et al., 2008).

  • Monolingual terminology extraction (eu): Erauzterm (Alegria et al., 2004) (Elhuyar)

  • Bilingual terminology extraction (es-eu): Elexbi (Alegria et al., 2005) (Elhuyar)

  • Semantic Role Labelling: There is a prototype for English developed by Màrquez et al. (2005) (UPC)

  • Topic signatures for all WordNet nominal senses (Agirre et al., 2004c) (EHU)

  • Word Sense Disambiguation: We may use the all-words WSD system for English developed by Villarejo et al. (2004) (UPC) and others for Basque and Spanish (EHU)

MT ENGINES AND TOOLS

  • MT systems for Basque, using different technologies: http://ixa2.si.ehu.es/openmt-demo

  • OpenTrad open-source systems for ca, gl, en, es and eu, developed by EHU, UPC, Elhuyar and other partners: www.opentrad.org

  • Asiya open-source system for MT evaluation: http://www.lsi.upc.edu/~nlp/Asiya/