Arabic NLP Lexicons

The complexity of Arabic poses special challenges to developers of natural language processing (NLP) applications, especially in the areas of word segmentation (WS), information retrieval (IR), named entity extraction (NER), and machine translation (MT).

Major linguistic issues in the development of NLP applications such as MT, NER, and text-to-speech (TTS) are exacerbated by the lack of truly comprehensive lexical resources, especially for proper nouns, and by the lack of a standardized orthography. Our Institute has developed various comprehensive lexical resources to enhance the accuracy and reliability of NLP applications.

Below are links to the individual product pages for CJKI’s Arabic NLP lexicons. These pages describe each resource, explain how it is used, and include data samples.

Resources

APD

Arabic Phonetic Database

Phonemic transcriptions for core Arabic vocabulary

ArabLEX

Arabic Full-Form Lexicon

Includes all inflected, declined, and conjugated forms

DAP

Arabic Plurals

Extensive coverage of regular and irregular (‘broken’) plurals in Arabic

AWL

Arabic Wordlist

General vocabulary, proper nouns, and technical terms