IXA Group for Natural Language Processing

Ixa Group
P.O. Box 649
20080 Donostia (Basque Country)
Contact: acpalloi@si.ehu.es

Euskal Herriko Unibertsitatea

Faculty of Informatics
Department of Computer Languages and Systems


IxA NLP research group

University of the Basque Country

Language resources

Past

Present

Future

Corpus

Raw text

27 Mw (newspapers)

100 Mw (2010)

Word-forms are tagged with POS and lemma.

27 Mw (newspapers)

30 Kw (hand corrected)

100 Mw (2010)

2 Mw (hand corrected)

Syntactically tagged text

30 Kw

1 Mw (2010)

Semantically tagged text

4 Kw (meanings)

1 Mw (2010?) meanings

and semantically analyzed

Multilingual and parallel corpus.

1 Mw (Spanish-Basque)

...

100 Mw (2010)

Lexicon

EDBL lexical database. Lexical support for constructing general applications, including POS and morphological information.

80,000 entries
Enrichment of the lexical database:
- Multiword lexical units
- Verb subcategorization

Improving design.
Enrichment of the lexical database:
- Multiword lexical units
- Verb subcategorization
- Semantics

Machine-readable dictionaries

Machine-readable dictionaries

Machine-readable dictionaries

Morphology

Morphological description

Syntax

Syntax description

Syntax description
- Clause boundaries
- Postpositions
- Verb subcategorization
- Dependencies

Syntax description
- Broad coverage
- Different formalisms (Unification, CG, Dependency grammar)

Semantics

Multilingual lexical-semantic knowledge base: a taxonomy of concepts (such as WordNet)
BasqueWN

Automatic acquisition from corpora in other languages


Enriching & optimising BasqueWN

Enriching & optimising
EuskalWN
- Terms
- Entity names
- (100 K entries)

Tools

Past

Present

Future

Corpus

Tools for processing and creating corpora

Tools for creating and processing corpora

Tools for creating and processing corpora

Automatic lexical acquisition: Terminology

Lexicon

Structured versions of dictionaries
- Euskal Hiztegia (Ibon Sarasola)
- English-Basque (Morris)
- Elhuyar

Lexicographer workbench

Morphology

Morphological analyser/generator

Improving

Lemmatiser/tagger

Improving

Syntax

Surface syntax:
- Syntactic functions
- Chunks

Improving
- Resolution of syntactic ambiguity
- Clause boundaries
- Dependencies
- Verb subcategorization
- Postpositions

Parser
- Broad coverage
- Efficient
- Different formalisms
- CG + dependencies
- Unification
+ Statistical

Semantics

Word-sense disambiguation (WSD)

Improving WSD
- Multilingual knowledge

Improving WSD
Semantic analysis

Integration

Environment for tool integration
- TEI guidelines
- using XML

Integration of new tools
- Morphosyntax
- Syntax

Improving

Applications

Past

Present

Future

Spelling checker/corrector

New versions

Web server

Grammar and style checker

Search engine.
Traditional search engine that integrates lemmatisation and language identification.

Information Retrieval
- Using semantics

Information Extraction, Question answering
- Crosslingual
- Using semantics

Bilingual dictionary integrated with a common word processor for online lookup (Elhuyar)

French-Basque

Synonym dictionary integrated with a common word processor for online lookup (UZEI)

Electronic version of the Basque monolingual dictionary Euskal Hiztegia (Ibon Sarasola)

Advanced dictionary query system


Electronic version of the Diccionario Basico Escolar Cubano

Linguistic tools for children in Cuba's schools


Integration of heterogeneous lexical resources

Second language learning systems

Computational systems based on learner and error corpora


Generation of translation memories

Translation memories (using units smaller than a clause)

Spanish-Basque transfer-based MT system

- Open source

- No lexical disambiguation

- No use of corpora

Improving MT system
- EBMT
- SMT

- Lexical disambiguation

- Verb subcategorization

- English-Basque

Dialogue systems

Main Current Research Projects

Funding from European Community:

  • MEANING: developing multilingual web-scale language technologies (2002-2005)

Funding from Spanish Science and Technology Ministry:

  • Machine translation for the languages of Spain using open-source code (2004-2005, Profit)

  • HERMES: news databases: cross-lingual information retrieval and semantic extraction (2001-2003)

  • Creating a database based on syntactic-semantic trees (2002-2003)

Funding from Basque Government (Industry and Science Departments):

  • Application of Machine Learning based Techniques for IR/IE in Basque (2002-2003)

  • XUXENG: Design and implementation of a prototype grammar checker for Basque (2002-2004)

  • HIZKING21: Language engineering for the 21st century (2002-2005)
