Spanish Full-Form Lexicon
Extremely comprehensive coverage
Simplifies morphological analysis
Monolingual or bilingual format
Overview
CJKI maintains full-form lexicons for Arabic, Japanese, and Spanish whose coverage will soon exceed 150 million entries. The monolingual edition of CJKI’s Spanish Full-Form Lexicon (SFULEX) covers approximately one million entries and includes part-of-speech codes and other grammatical attributes, whereas the bilingual edition contains about 26 million entries.
A full-form lexicon is a comprehensive lexical database that contains all inflected, declined, and conjugated forms of a language. Unlike an ordinary dictionary, which lists only canonical forms (base lexemes) such as eat, a full-form lexicon also includes inflected forms such as eating, eaten, and ate. In English, each lexeme has only a handful of inflected forms, but in languages like Arabic, Japanese, and Spanish a single verb can have hundreds or even thousands. For example, the Spanish verb hablar ‘to speak’ has hundreds of inflected forms, such as hablaré ‘I will speak’ and hablaría ‘I would speak’.
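To make the idea concrete, here is a minimal sketch of how a full-form lexicon reduces morphological analysis to a table lookup. The dictionary `FULL_FORM_LEXICON` and the functions `lemmatize` and `lemmatize_tokens` are hypothetical names chosen for illustration; they are not part of SFULEX, and the handful of entries shown are only the examples mentioned above.

```python
# Toy full-form lexicon: every word form is listed explicitly and mapped
# to its canonical form. Names and layout are assumptions for illustration;
# a real lexicon would hold millions of entries with grammatical codes.
FULL_FORM_LEXICON = {
    "eat": "eat",
    "eating": "eat",
    "eaten": "eat",
    "ate": "eat",
    "hablar": "hablar",
    "hablaré": "hablar",
    "hablaría": "hablar",
}


def lemmatize(word_form):
    """Return the canonical form of a word form, or None if it is unlisted."""
    return FULL_FORM_LEXICON.get(word_form.lower())


def lemmatize_tokens(tokens):
    """Lemmatize each token, leaving unlisted tokens unchanged."""
    return [lemmatize(token) or token for token in tokens]
```

Because every form is stored explicitly, an irregular form such as ate needs no special rule: it is resolved by the same lookup as a regular form.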
SFULEX provides full coverage of all inflected and conjugated forms (word forms) in Spanish. The sample table below shows a subset of the conjugation paradigm for hablar ‘to speak’.
Main Features
Comprehensive coverage: over 26 million entries
Rich set of useful attributes: conjugation patterns and orthographic variants
Detailed part-of-speech and other grammatical codes
Includes all inflected and declined word forms: mapped to their canonical forms
Dozens of data fields: such as part-of-speech codes
Fully bilingual: mapped to multiple English equivalents
Spanish Full-Form Lexicon: Hablar
Source | Target | Pers. | No. | Tense | Pers.2 |
---|---|---|---|---|---|
hablar | speak | 0 | C | 00 | 0 |
hablar | to speak | 0 | C | 01 | 0 |
hablar | speaking | 0 | C | 03 | 0 |
hablar | spoke | 0 | S | 13 | 1 |
hablar | talk | 0 | C | 00 | 0 |
hablar | to talk | 0 | C | 01 | 0 |
hablar | talking | 0 | C | 03 | 0 |
hablar | talked | 0 | S | 13 | 1 |
hablar | discuss | 0 | C | 00 | 0 |
hablar | to discuss | 0 | C | 01 | 0 |
hablar | discussing | 0 | C | 03 | 0 |
hablar | discussed | 0 | S | 13 | 1 |
hablar | talk about | 0 | C | 00 | 0 |
hablar | to talk about | 0 | C | 01 | 0 |
hablar | talking about | 0 | C | 03 | 0 |
hablar | talked about | 0 | S | 13 | 1 |
hablar | phone | 0 | C | 00 | 0 |
hablar | to phone | 0 | C | 01 | 0 |
hablar | phoning | 0 | C | 03 | 0 |
hablar | phoned | 0 | S | 13 | 1 |
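
Rows in this shape are straightforward to load into records. The following is a rough sketch that assumes the data is available as pipe-delimited text laid out like the sample above; the actual SFULEX delivery format is not described here. The field names in `FIELDS` and the functions `parse_rows` and `targets_by_tense` are hypothetical, and the codes are treated as opaque strings because their meaning is not documented in this sample.

```python
from collections import defaultdict

# A few rows copied from the sample table above.
SAMPLE_ROWS = """\
hablar | speak | 0 | C | 00 | 0 |
hablar | to speak | 0 | C | 01 | 0 |
hablar | speaking | 0 | C | 03 | 0 |
hablar | spoke | 0 | S | 13 | 1 |
hablar | talk | 0 | C | 00 | 0 |
hablar | talked | 0 | S | 13 | 1 |
"""

# Hypothetical field names derived from the table header.
FIELDS = ("source", "target", "pers", "no", "tense", "pers2")


def parse_rows(text):
    """Split pipe-delimited rows into dicts keyed by FIELDS."""
    records = []
    for line in text.splitlines():
        # Drop the trailing pipe, then split and trim each cell.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != len(FIELDS):
            continue  # skip blank or malformed lines
        records.append(dict(zip(FIELDS, cells)))
    return records


def targets_by_tense(records):
    """Group English targets by (source, tense code), keeping row order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["source"], record["tense"])].append(record["target"])
    return dict(grouped)
```

Grouping by the tense code gathers the alternative English equivalents for one grammatical slot, for example spoke and talked under code 13.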
Practical Applications
CJKI’s full-form lexicons can bring the following benefits to various NLP applications:
Machine translation: greatly enhanced translation quality
Morphological analysis: significantly simplified algorithms
Pedagogical applications: automatic conjugation systems
Information retrieval applications: support for query processing
Named-entity recognition (NER): dramatically improved
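
As one illustration of query-processing support, the sketch below expands a search term to every word form that shares its canonical form, so that a query for one form can also match documents containing the others. It assumes a simple form-to-lemma mapping; `FORM_TO_LEMMA`, `build_lemma_index`, and `expand_query` are hypothetical names, not a CJKI API, and the entries are limited to the forms mentioned in the overview.

```python
from collections import defaultdict

# Toy form-to-lemma mapping (an assumption for illustration only).
FORM_TO_LEMMA = {
    "hablar": "hablar",
    "hablaré": "hablar",
    "hablaría": "hablar",
}


def build_lemma_index(form_to_lemma):
    """Invert the mapping: canonical form -> set of all its word forms."""
    index = defaultdict(set)
    for form, lemma in form_to_lemma.items():
        index[lemma].add(form)
    return index


def expand_query(term, form_to_lemma, lemma_index):
    """Return all word forms sharing the term's lemma, or just the term."""
    lemma = form_to_lemma.get(term.lower())
    if lemma is None:
        return [term]  # unknown terms pass through unchanged
    return sorted(lemma_index[lemma])
```

A search front end could OR the expanded forms together, which avoids running a stemmer or morphological analyzer at query time.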