Comprehensive Word Lists for Chinese, Japanese, Korean and Arabic

The CJK Dictionary Institute maintains comprehensive monolingual word lists for Simplified Chinese, Traditional Chinese, Japanese, Korean and Arabic, including a full-form Arabic word list.

These large word lists cover general vocabulary, proper nouns and technical terms, and are used by some of the world's leading IT companies for a variety of natural language processing applications, including information retrieval, morphological analysis and word segmentation. In total, the databases contain about 29.2 million entries, broken down as follows:

| Language | General vocabulary | Proper nouns | Technical terms | Total |
|---|---|---|---|---|
| Arabic (canonical) | 113,973 | 95,184 | --- | 209,157 |
| Arabic (full form) | 14,452,336 | 95,184 | --- | 14,547,520 |
| Japanese | 459,980 | 1,017,221 | 1,169,652 | 2,646,853 |
| Korean | 83,835 | 42,280 | 914,772 | 1,040,887 |
| Simplified Chinese | 1,376,979 | 1,730,881 | 2,153,157 | 5,261,017 |
| Traditional Chinese | 1,581,030 | 1,730,881 | 2,153,157 | 5,465,068 |
| Total | 18,068,133 | 4,711,631 | 6,390,738 | 29,170,502 |
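
The breakdown above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the figures are copied from the table, the "---" cells are assumed to mean zero entries, and the names `ROWS`, `TOTAL_ROW`, `check_rows` and `column_sums` are hypothetical.

```python
# Figures copied from the table: (general, proper nouns, technical, total).
# Assumption: "---" (no technical terms listed for Arabic) counts as 0.
ROWS = {
    "Arabic (canonical)": (113_973, 95_184, 0, 209_157),
    "Arabic (full form)": (14_452_336, 95_184, 0, 14_547_520),
    "Japanese": (459_980, 1_017_221, 1_169_652, 2_646_853),
    "Korean": (83_835, 42_280, 914_772, 1_040_887),
    "Simplified Chinese": (1_376_979, 1_730_881, 2_153_157, 5_261_017),
    "Traditional Chinese": (1_581_030, 1_730_881, 2_153_157, 5_465_068),
}
TOTAL_ROW = (18_068_133, 4_711_631, 6_390_738, 29_170_502)


def check_rows(rows):
    """Return, per language, whether the three categories add up to the row total."""
    return {
        name: general + proper + technical == total
        for name, (general, proper, technical, total) in rows.items()
    }


def column_sums(rows):
    """Sum each column (including the per-row totals) across all languages."""
    return tuple(sum(column) for column in zip(*rows.values()))


if __name__ == "__main__":
    print(check_rows(ROWS))
    print(column_sums(ROWS) == TOTAL_ROW)
```

Note that the grand total counts the Arabic proper nouns and the Chinese proper nouns and technical terms once per list, since the same figures appear in both the canonical and full-form rows and in both the Simplified and Traditional rows.
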

Data Samples

Arabic (canonical)
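| Type | POS | Unvocalized Arabic | Vocalized Arabic | Phonemic Transcription |
|---|---|---|---|---|
| G | V | قحط | قَحَّط | qáḥḥaṭ |
| G | V | قطط | قَطَّط | qáṭṭaṭ |
| G | N | قفر | قَفْر | qafr |
| G | N | قمل | قَمْل | qaml |
| G | N | قريحة | قَرِيحَة | qarī́ḥa |
| G | V | قيل | قَيَّل | qáyyal |
| G | N | قرى | قِرَى | qíra̱ |
| G | N | قصور | قُصُور | quṣū́r |
| G | N | قرعة | قُرْعَة | qúrɛa |
| G | N | رذيلة | رَذِيلَة | radhī́la |
| G | V | ربت | رَبَّت | rábbat |
| G | N | رغد | رَغْد | raghd |
| G | N | ركم | رَكَم | rákam |
| G | V | رنم | رَنَّم | ránnam |
| G | V | رثى | رَثَى | rátha̱ |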
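
In the canonical samples, the unvocalized column appears to be the vocalized form with its diacritics removed. The following is a minimal sketch of that relationship, not the Institute's own procedure. It assumes the relevant marks are the Unicode characters U+064B to U+0652 plus U+0670; the names `TASHKIL` and `strip_vocalization` are hypothetical.

```python
# Assumption: the marks that distinguish the two columns are the Arabic
# vowel/diacritic signs U+064B..U+0652, plus U+0670 (superscript alef).
TASHKIL = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_vocalization(word: str) -> str:
    """Drop vocalization marks, leaving only the base letters."""
    return "".join(ch for ch in word if ch not in TASHKIL)


# (unvocalized, vocalized) pairs taken from the sample rows.
SAMPLES = [
    ("قحط", "قَحَّط"),
    ("قفر", "قَفْر"),
    ("قريحة", "قَرِيحَة"),
]

if __name__ == "__main__":
    for unvocalized, vocalized in SAMPLES:
        print(unvocalized, strip_vocalization(vocalized) == unvocalized)
```

Restricting the filter to an explicit range, rather than dropping every combining character, keeps letters such as أ intact even if the text reaches the function in decomposed form.
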
Arabic (full-form)
| Type | POS | Unvocalized Arabic | Vocalized Arabic | Phonemic Transcription |
|---|---|---|---|---|
| G | V | أكتب | أَكْتُبَ | ʾáktuba |
| G | V | أكتب | أَكْتُبْ | ʾáktub |
| G | V | أكتب | أَكْتُبُ | ʾáktubu |
| G | V | أكتب | أُكْتَبَ | ʾúktaba |
| G | V | أكتب | أُكْتَبْ | ʾúktab |
| G | V | أكتب | أُكْتَبُ | ʾúktabu |
| G | V | اكتبا | اُكْتُبَا | ʾúktuba̱ |
| G | V | اكتب | اُكْتُبْ | ʾúktub |
| G | V | اكتبن | اُكْتُبْنَ | ʾuktúbna |
| G | V | اكتبوا | اُكْتُبُوا | ʾúktubu̱ |
| G | V | اكتبي | اُكْتُبِي | ʾúktubi̱ |
| G | V | كاتب | كَاتِبٌ | kā́tibun |
| G | V | كتاب | كِتَابٌ | kitā́bun |
| G | V | كتابة | كِتَابَةٌ | kitā́batun |
| G | V | كتبا | كَتَبَا | kátaba̱ |
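
The full-form samples show that a single unvocalized spelling can correspond to several vocalized forms (six for أكتب). One way such data could be used is as a lookup from an unvocalized token to its candidate readings. The sketch below assumes rows are available as (unvocalized, vocalized, transcription) tuples; `ROWS` and `build_index` are hypothetical names, and the actual delivery format of the word list is not described here.

```python
from collections import defaultdict

# A subset of the sample rows: (unvocalized, vocalized, transcription).
ROWS = [
    ("أكتب", "أَكْتُبَ", "ʾáktuba"),
    ("أكتب", "أَكْتُبْ", "ʾáktub"),
    ("أكتب", "أَكْتُبُ", "ʾáktubu"),
    ("أكتب", "أُكْتَبَ", "ʾúktaba"),
    ("أكتب", "أُكْتَبْ", "ʾúktab"),
    ("أكتب", "أُكْتَبُ", "ʾúktabu"),
    ("اكتب", "اُكْتُبْ", "ʾúktub"),
    ("كتبا", "كَتَبَا", "kátaba̱"),
]


def build_index(rows):
    """Group vocalized forms and transcriptions under their unvocalized spelling."""
    index = defaultdict(list)
    for unvocalized, vocalized, transcription in rows:
        index[unvocalized].append((vocalized, transcription))
    return dict(index)


if __name__ == "__main__":
    index = build_index(ROWS)
    for unvocalized, candidates in index.items():
        print(unvocalized, len(candidates))
```
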
Japanese
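| Type | POS | Headword | Reading |
|---|---|---|---|
| G | NC | | |
| G | VN | | |
| G | NC | | |
| G | NC | | |
| G | NC | | |
| G | NC | | |
| G | V1 | | |
| G | NC | | |
| G | NC | | |
| G | NC | | |
| G | NC | | |
| G | NC | | |
| G | NC | | |
| G | NC | | |
| G | NC | | |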
Korean
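| Type | Headword |
|---|---|
| G | 응축 |
| G | 응전하다 |
| G | 응고되다 |
| G | 응모 |
| G | 응봉 |
| G | 응답하다 |
| G | 응용되다 |
| G | 읍촌 |
| G | 와지끈뚝닥 |
| G | 와해하다 |
| G | 와각거리다 |
| G | 와문 |
| G | 와병하다 |
| G | 와상 |
| G | 와석종신 |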
Simplified Chinese
| Type | POS | Headword | Pinyin |
|---|---|---|---|
| G | N | 天帝 | tian1-di4 |
| G | N | 天道 | tian1-dao4 |
| G | N | 天南星 | tian1-nan2-xing1 |
| G | N | 天年 | tian1-nian2 |
| G | U | 天秤 | tian1-cheng4 |
| G | E | 天不盖,地不载 | tian1-bu4-gai4-di4-bu4-zai4 |
| G | N | 天文 | tian1-wen2 |
| G | N | 天文馆 | tian1-wen2-guan3 |
| G | U | 天文照相望远镜 | tian1-wen2-zhao4-xiang4-wang4-yuan3-jing4 |
| G | N | 天文物理学 | tian1-wen2-wu4-li3-xue2 |
| G | N | 天平动 | tian1-ping2-dong4 |
| G | N | 天宝之乱 | tian1-bao3-zhi1-luan4 |
| G | N | 天方夜谭 | tian1-fang1-ye4-tan2 |
| G | N | 天魔外道 | tian1-mo2-wai4-dao4 |
| G | N | 天命真主 | tian1-ming4-zhen1-zhu3 |
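
Word segmentation, one of the applications mentioned earlier, is a natural use of headword lists like this one. The sketch below uses forward maximum matching, a simple dictionary-based technique chosen here purely for illustration; it is not claimed to be the method used by the Institute or its licensees. `LEXICON` holds a few of the sample headwords, and `segment` is a hypothetical name.

```python
# A tiny lexicon built from a few of the Simplified Chinese sample headwords.
LEXICON = {"天帝", "天道", "天年", "天文", "天文馆"}


def segment(text, lexicon):
    """Forward maximum matching: at each position take the longest lexicon
    entry that matches, falling back to a single character if none does."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        longest = min(max_len, len(text) - i)
        for size in range(longest, 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    print(segment("天文馆天年", LEXICON))
    print(segment("天文学", LEXICON))
```

Because the longest entry wins, 天文馆 is kept whole even though 天文 is also in the lexicon; a character missing from the lexicon, such as 学 in the second call, is emitted on its own.
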
Traditional Chinese
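| Type | POS | Headword | Pinyin |
|---|---|---|---|
| G | N | 標兵 | biao1-bing1 |
| G | N | 標本館 | biao1-ben3-guan3 |
| G | N | 標本商 | biao1-ben3-shang1 |
| G | N | 標名 | biao1-ming2 |
| G | N | 標目 | biao1-mu4 |
| G | V | 標會 | biao1-hui4 |
| G | N | 標價卡 | biao1-jia4-ka3 |
| G | N | 標售制 | biao1-shou4-zhi4 |
| G | N | 標售物 | biao1-shou4-wu4 |
| G | N | 標幟 | biao1-zhi4 |
| G | V | 標籤化 | biao1-qian1-hua4 |
| G | N | 標籤紙 | biao1-qian1-zhi3 |
| G | N | 標驗局 | biao1-yan4-ju2 |
| G | V | 漂航 | piao1-hang2 |
| G | N | 漂染廠 | piao3-ran3-chang3 |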
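
The Pinyin column in the Chinese samples uses tone numbers with hyphen-separated syllables. For display, these could be converted to tone-marked pinyin. The following is a rough sketch using the usual placement convention (mark a or e if present, the o of ou, otherwise the last vowel). The function names are hypothetical, and because the samples do not show how ü is written, that case is not handled beyond a literal "ü".

```python
# Tone-marked variants of each vowel, indexed by tone number minus one.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable):
    """Convert one syllable such as 'biao1' to 'biāo'."""
    if not syllable or not syllable[-1].isdigit():
        return syllable                      # no tone number: leave unchanged
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in (1, 2, 3, 4):
        return base                          # assume other digits mean neutral tone
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        vowels = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not vowels:
            return base
        idx = vowels[-1]                     # otherwise mark the last vowel
    return base[:idx] + TONE_MARKS[base[idx]][tone - 1] + base[idx + 1:]


def mark_pinyin(pinyin):
    """Convert a hyphen-separated reading such as 'biao1-hui4'."""
    return "-".join(mark_syllable(s) for s in pinyin.split("-"))


if __name__ == "__main__":
    for reading in ("biao1-bing1", "biao1-hui4", "biao1-shou4-zhi4"):
        print(reading, mark_pinyin(reading))
```
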