Arabic Full-Form Lexicon

Over 120 million entries

Simplifies morphological analysis

Instantly identifies inflected forms

Overview

CJKI is developing an Arabic Full-Form Lexicon (AFULEX) that is expected to exceed 120 million entries and includes part-of-speech codes, detailed grammatical attributes and romanized forms.

A full-form lexicon is a comprehensive lexical database that contains all inflected, declined and conjugated forms of a language. Unlike an ordinary dictionary that lists only the canonical forms (base lexemes), such as eat, a full-form lexicon includes all inflected forms such as eating, eaten and ate. In English, the number of inflected forms is limited to a handful of wordforms, but languages like Arabic (and Japanese and Spanish) can have thousands of inflected forms for each verb.
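To make the contrast with an ordinary dictionary concrete, here is a minimal sketch in Python using the English example above. The table `FULL_FORM_LEXICON`, the function `analyze`, and the grammatical labels are hypothetical names chosen for illustration; they are not AFULEX's actual format or API.

```python
# Hypothetical miniature full-form lexicon for the English verb "eat".
# Every wordform is its own key, so analysis is a single dictionary
# lookup instead of rule-based stemming.
FULL_FORM_LEXICON = {
    "eat": ("eat", "base form"),
    "eats": ("eat", "third-person singular present"),
    "eating": ("eat", "present participle"),
    "ate": ("eat", "past tense"),
    "eaten": ("eat", "past participle"),
}


def analyze(wordform):
    """Return (lemma, description) for a known wordform, else None."""
    return FULL_FORM_LEXICON.get(wordform.lower())


if __name__ == "__main__":
    for form in ("ate", "Eaten", "eated"):
        print(form, "->", analyze(form))
```

An irregular form such as ate resolves to eat without any suffix-stripping rule, and an unlisted string simply returns `None`. The trade-off is size: the table needs one entry per wordform, which is why the count grows so quickly for heavily inflected languages.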

Moreover, some languages have clitics, such as pronominal suffixes and reflexive particles, that further increase the number of forms. For example, the Arabic word كَتَبَ kataba ‘to write’ has over 4000 cliticized forms such as أَكَتَبْتَهَا akatabtaha ‘did you write her?’ and فَلْنَكْتُبْ falnaktub ‘let’s write’.

AFULEX provides full coverage for all inflected, declined and conjugated forms (wordforms) in Arabic. The sample shows a small subset of the wordforms for the verb كَتَبَ kataba ‘to write’ and the noun كَاتِبٌ kaatibun ‘writer’.

Arabic Full-Form Lexicon
POS    Unvoc.    Voc.    Trans.
V    أكتب    أَكْتُبَ    ʾáktuba
V    أكتب    أَكْتُبْ    ʾáktub
V    أكتب    أَكْتُبُ    ʾáktubu
V    أكتب    أُكْتَبَ    ʾúktaba
V    أكتب    أُكْتَبْ    ʾúktab
V    أكتب    أُكْتَبُ    ʾúktabu
V    اكتبا    اُكْتُبَا    ʾúktuba̱
V    اكتب    اُكْتُبْ    ʾúktub
V    اكتبن    اُكْتُبْنَ    ʾuktúbna
V    اكتبوا    اُكْتُبُوا    ʾúktubu̱
V    اكتبي    اُكْتُبِي    ʾúktubi̱
N    كاتب    كَاتِبٌ    kā́tibun
N    كتاب    كِتَابٌ    kitā́bun
N    كتابة    كِتَابَةٌ    kitā́batun
V    كتبا    كَتَبَا    kátaba̱
POS: Part of speech
Unvoc.: Unvocalized
Voc.: Vocalized
Trans.: Transcription
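
The sample also suggests how such a lexicon could be queried. The sketch below, in Python, indexes a few of the verb rows by their unvocalized form, since running Arabic text is usually written without vowel marks. The names `SAMPLE_ROWS`, `build_index` and `lookup` are hypothetical, and the tuple layout is an assumption made for illustration rather than the actual AFULEX data format.

```python
from collections import defaultdict

# A few verb rows from the sample table:
# (POS, unvocalized, vocalized, transcription).
# The tuple layout is an assumption, not the real AFULEX record format.
SAMPLE_ROWS = [
    ("V", "أكتب", "أَكْتُبَ", "ʾáktuba"),
    ("V", "أكتب", "أَكْتُبْ", "ʾáktub"),
    ("V", "أكتب", "أَكْتُبُ", "ʾáktubu"),
    ("V", "أكتب", "أُكْتَبَ", "ʾúktaba"),
    ("V", "أكتب", "أُكْتَبْ", "ʾúktab"),
    ("V", "أكتب", "أُكْتَبُ", "ʾúktabu"),
    ("V", "اكتب", "اُكْتُبْ", "ʾúktub"),
    ("V", "اكتبوا", "اُكْتُبُوا", "ʾúktubu̱"),
]


def build_index(rows):
    """Group rows by unvocalized form, keeping every vocalized reading."""
    index = defaultdict(list)
    for pos, unvoc, voc, trans in rows:
        index[unvoc].append((pos, voc, trans))
    return dict(index)


INDEX = build_index(SAMPLE_ROWS)


def lookup(unvocalized):
    """Return all candidate analyses for an unvocalized wordform."""
    return INDEX.get(unvocalized, [])


if __name__ == "__main__":
    for pos, voc, trans in lookup("أكتب"):
        print(pos, voc, trans)
```

With these rows, the single unvocalized string أكتب maps to six vocalized candidates. A lookup of this kind only enumerates the possibilities; choosing among them in context is left to a downstream component.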

Practical Applications

CJKI’s full-form lexicons can bring the following benefits to various NLP applications:

Machine translation: Greatly enhanced translation quality

Morphological analysis: Significantly simplified algorithms

Pedagogical applications: Automatic conjugation systems

Information retrieval applications: Support for query processing

Named-entity recognition (NER): Dramatically improved

Part-of-speech (POS) analysis and tagging

Related Resources

Japanese Full-Form Lexicon: Includes all inflected, declined and conjugated forms

Spanish Full-Form Lexicon: Includes all inflected, declined and conjugated forms

Arabic Wordlist: General vocabulary, proper nouns and technical terms