Arabic NLP Lexicons

The complexity of Arabic poses special challenges to developers of natural language processing (NLP) applications, especially in the areas of word segmentation (WS), information retrieval (IR), named entity recognition (NER), and machine translation (MT).

Major linguistic issues in the development of NLP applications such as MT, NER, and text-to-speech (TTS) are exacerbated by the lack of truly comprehensive lexical resources, especially for proper nouns, and by the lack of a standardized orthography. Our Institute has developed various comprehensive lexical resources to enhance the accuracy and reliability of NLP applications.

Below are links to the individual product pages for CJKI’s Arabic NLP lexicons. Each page describes the resource, explains how it is used, and includes data samples.



Arabic Phonetic Database

Phonemic transcriptions for core Arabic vocabulary


Arabic Full-Form Lexicon

Includes all inflected, declined, and conjugated forms


Arabic Plurals

Extensive coverage of regular and irregular (‘broken’) plurals in Arabic


Arabic Dialects Full-Form Lexicon

Full-form lexicon for all major Arabic dialects


Arabic Wordlist

General vocabulary, proper nouns, and technical terms