CJKI maintains comprehensive monolingual wordlists for Chinese, Japanese, Korean (CJK) and Arabic covering some 30 million entries, a figure expected to exceed 45 million soon.
Our Spanish Wordlist (SWL) covers about 100,000 canonical forms for general vocabulary and includes part-of-speech codes and semantic classification type codes. This database is suitable for a variety of NLP applications, such as information retrieval (search engines), morphological analysis (tokenizers), and speech technology (text-to-speech synthesis).
The related SFULEX (Spanish Full-Form Lexicon) contains about 1,000,000 entries in the monolingual edition and 26,000,000 entries in the bilingual edition.
Sample coming soon
CJKI’s Comprehensive Wordlists are being used by some of the world’s leading IT companies for a variety of natural language processing applications, including: