Arabic Full-Form Lexicon

Over 200 million entries

Simplifies morphological analysis

Instantly identifies inflected forms

Overview

CJKI is developing an Arabic Full-Form Lexicon (AFULEX) that is expected to exceed 200 million entries covering all inflected, declined and conjugated forms in Arabic, including part-of-speech codes, detailed grammatical attributes and romanized forms.

A full-form lexicon, or full-form dictionary, is a comprehensive lexical database that contains all inflected, declined and conjugated forms of a language. Unlike an ordinary dictionary, which lists only canonical forms (base lexemes) such as eat, a full-form lexicon also includes inflected forms such as eating, eaten and ate. In English, each lexeme has only a handful of inflected wordforms, but in languages such as Arabic, Japanese and Spanish, words can have many inflected forms, and a single verb can have thousands of conjugated forms.
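As a minimal sketch of the difference, a full-form lexicon can be modeled as a direct map from every surface form to its base lexeme, so that analysis becomes a lookup rather than a set of suffix-stripping rules. The names `FULL_FORMS` and `lemma_of` are hypothetical and used only for illustration; the table holds just the English forms mentioned above and does not reflect the actual AFULEX data format.

```python
# Illustrative only: a toy full-form lexicon for the English verb "eat".
# FULL_FORMS and lemma_of are hypothetical names, not part of AFULEX.
FULL_FORMS = {
    "eat": "eat",
    "eating": "eat",
    "eaten": "eat",
    "ate": "eat",  # irregular form: listed explicitly, no rule needed
}


def lemma_of(wordform):
    """Return the base lexeme for a wordform, or None if it is not listed."""
    return FULL_FORMS.get(wordform.lower())


print(lemma_of("ate"))   # eat
print(lemma_of("eats"))  # None, because this toy table does not list it
```

The second call shows why coverage matters: a form that is not listed cannot be identified, which is the motivation for enumerating every form.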

Moreover, some languages have clitics, such as pronominal suffixes and reflexive particles, that further increase the number of forms. For example, the Arabic word كَتَبَ kataba ‘to write’ has over 4,000 cliticized forms, such as أَكَتَبْتَهَا akatabtaha ‘did you write her?’ and فَلْنَكْتُبْ falnaktub ‘let’s write’.

AFULEX provides full coverage for all Arabic wordforms. The sample table below shows a small subset of the wordforms for the Arabic verb كَتَبَ kataba ‘to write’ and the Arabic noun كَاتِبٌ kaatibun ‘writer’.

For a more comprehensive sample that includes verbs, nouns and adjectives, see the full sample.

Arabic Full-Form Lexicon
POS | Unvocalized | Vocalized | Roman
V | أكتب | أَكْتُبَ | ʾáktuba
V | أكتب | أَكْتُبْ | ʾáktub
V | أكتب | أَكْتُبُ | ʾáktubu
V | أكتب | أُكْتَبَ | ʾúktaba
V | أكتب | أُكْتَبْ | ʾúktab
V | أكتب | أُكْتَبُ | ʾúktabu
V | اكتبا | اُكْتُبَا | ʾúktuba̱
V | اكتب | اُكْتُبْ | ʾúktub
V | اكتبن | اُكْتُبْنَ | ʾuktúbna
V | اكتبوا | اُكْتُبُوا | ʾúktubu̱
V | اكتبي | اُكْتُبِي | ʾúktubi̱
V | كاتب | كَاتِبٌ | kā́tibun
V | كتاب | كِتَابٌ | kitā́bun
V | كتابة | كِتَابَةٌ | kitā́batun
V | كتبا | كَتَبَا | kátaba̱
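The sample rows above suggest how such a lexicon might be queried: one unvocalized spelling can correspond to several vocalized forms, so a lookup keyed on the unvocalized string returns every candidate reading. The sketch below is illustrative only. It uses a subset of the sample rows, and the names `ROWS`, `strip_harakat`, `build_index` and `analyze` are hypothetical, not part of any AFULEX interface. It also assumes that removing the vowel marks U+064B through U+0652 from a vocalized form yields its unvocalized spelling.

```python
import unicodedata
from collections import defaultdict

# A subset of the sample rows: (POS, unvocalized, vocalized, romanized).
ROWS = [
    ("V", "أكتب", "أَكْتُبَ", "ʾáktuba"),
    ("V", "أكتب", "أَكْتُبْ", "ʾáktub"),
    ("V", "أكتب", "أَكْتُبُ", "ʾáktubu"),
    ("V", "أكتب", "أُكْتَبَ", "ʾúktaba"),
    ("V", "أكتب", "أُكْتَبْ", "ʾúktab"),
    ("V", "أكتب", "أُكْتَبُ", "ʾúktabu"),
    ("V", "كتاب", "كِتَابٌ", "kitā́bun"),
    ("V", "كتبا", "كَتَبَا", "kátaba̱"),
]

# Arabic vowel marks from fathatan (U+064B) through sukun (U+0652).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_harakat(text):
    """Normalize to NFC and drop vowel marks, leaving the bare spelling."""
    text = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in text if ch not in HARAKAT)


def build_index(rows):
    """Group (POS, vocalized, romanized) readings by unvocalized spelling."""
    index = defaultdict(list)
    for pos, unvocalized, vocalized, roman in rows:
        key = unicodedata.normalize("NFC", unvocalized)
        index[key].append((pos, vocalized, roman))
    return index


INDEX = build_index(ROWS)


def analyze(token):
    """Return all listed readings for a token, vocalized or not."""
    return INDEX.get(strip_harakat(token), [])


for pos, vocalized, roman in analyze("أكتب"):
    print(pos, vocalized, roman)
```

In this sketch the ambiguity of unvocalized text is left to the caller: `analyze` returns every reading, and choosing among them would need context that a lexicon alone does not supply.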

Practical Applications

CJKI’s full-form lexicons can bring the following benefits to various NLP applications:

Machine translation

Greatly enhanced translation quality

Morphological analysis

Significantly simplified algorithms

Pedagogical applications

Automatic conjugation systems

Information retrieval applications

Support for query processing

Named-entity recognition (NER)

Dramatically improved

Part-of-speech (POS) analysis and tagging
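For the information retrieval case, one way a full-form lexicon could support query processing is by expanding a query term to every listed surface form of the same lexeme. The following is a minimal sketch under stated assumptions: the names `FORMS_BY_LEMMA`, `expand_query` and `search` are hypothetical, the forms are unvocalized verb entries taken from the sample table, and grouping them under the key كتب is an illustrative choice rather than the lexicon's actual structure.

```python
# Hypothetical lemma-to-forms table; the forms are unvocalized verb entries
# from the sample table, grouped under كتب as an illustrative assumption.
FORMS_BY_LEMMA = {
    "كتب": ["أكتب", "اكتبا", "اكتب", "اكتبن", "اكتبوا", "اكتبي", "كتبا"],
}

# Reverse map from surface form to lemma, built once from the table above.
LEMMA_BY_FORM = {
    form: lemma
    for lemma, forms in FORMS_BY_LEMMA.items()
    for form in forms
}


def expand_query(term):
    """Return every listed form sharing a lemma with term, plus term itself."""
    lemma = LEMMA_BY_FORM.get(term, term)
    forms = FORMS_BY_LEMMA.get(lemma)
    return set(forms) | {term} if forms else {term}


def search(documents, term):
    """Return documents containing any expanded form as a whitespace token."""
    wanted = expand_query(term)
    return [doc for doc in documents if wanted & set(doc.split())]


docs = ["اكتبوا الدرس", "قرأ الكتاب"]
print(search(docs, "أكتب"))
```

Here a query for one inflected form retrieves a document containing a different form of the same verb, with no stemming rules involved. Whitespace tokenization is a simplification chosen to keep the example short.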

Related Resources

Japanese Full-Form Lexicon

Includes all inflected, declined and conjugated forms

Spanish Full-Form Lexicon

Includes all inflected, declined and conjugated forms

Arabic Wordlist

General vocabulary, proper nouns and technical terms

Reference Documents

AFULEX

White paper

AFULEX

White paper summary