Arabic NLP Lexicons

The complexity of Arabic poses special challenges to developers of natural language processing (NLP) applications, especially in the areas of word segmentation (WS), information retrieval (IR), named entity recognition (NER), and machine translation (MT).

The major linguistic issues in developing NLP applications such as MT, NER, and text-to-speech (TTS) are exacerbated by the lack of truly comprehensive lexical resources, especially for proper nouns, and by the lack of a standardized orthography. Our institute has developed a range of comprehensive lexical resources designed to enhance the accuracy and reliability of NLP applications.

Below are links to the individual product pages for CJKI’s Arabic NLP lexicons. Each page describes the resource, explains how it is used, and includes data samples.

Resources

Arabic Phonetic Database

Phonemic transcriptions for core Arabic vocabulary

Arabic Plurals

Extensive coverage of regular and irregular (‘broken’) plurals in Arabic

Arabic Full-Form Lexicon

Includes all inflected, declined and conjugated forms

Arabic Wordlist

General vocabulary, proper nouns and technical terms