Chinese Lexical Database

Covers over 500,000 entries

Simplified and Traditional Chinese

Optimized for NLP applications

Overview

The CJKI Chinese Lexical Database (CLD) is a comprehensive monolingual lexical database specifically designed for NLP applications. It consists of two modules, Simplified Chinese (SC) and Traditional Chinese (TC), with about 250,000 entries in each module covering general vocabulary, technical terms, and important proper nouns.

A unique feature of CLD is that the readings (pinyin and zhuyin) take into account the differences in pronunciation between the PRC and Taiwan. For example, SC 危险 wēixiǎn ‘dangerous’ is TC 危險 wéixiǎn. Furthermore, the TC module is not merely a code-conversion equivalent of the SC module, but has been carefully proofread to ensure accuracy on both the orthographic and lexemic levels. For example, SC 出租车 chūzūchē ‘taxi’ has the TC lexemic equivalent 計程車 jìchéngchē, rather than the orthographic equivalent 出租車. Developed by CJKI’s team of Chinese specialists over many years, CLD is a significant contribution to the field of Chinese lexicography and information processing.
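
The difference between orthographic and lexemic conversion can be illustrated with a minimal Python sketch. The two mapping tables and both function names below are hypothetical and contain only the words cited above; they do not reflect CLD's actual data format or API.

```python
# Toy data built from the examples in the text; illustration only.
# Character-level (orthographic) SC -> TC mapping.
CHAR_MAP = {"车": "車", "险": "險"}

# Word-level (lexemic) SC -> TC mapping: the word itself differs by region.
LEXEMIC_MAP = {"出租车": "計程車"}


def to_tc_orthographic(sc_word: str) -> str:
    """Convert character by character; unmapped characters stay unchanged."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in sc_word)


def to_tc_lexemic(sc_word: str) -> str:
    """Prefer a word-level equivalent, then fall back to character conversion."""
    return LEXEMIC_MAP.get(sc_word, to_tc_orthographic(sc_word))


print(to_tc_orthographic("出租车"))  # 出租車
print(to_tc_lexemic("出租车"))       # 計程車
print(to_tc_lexemic("危险"))         # 危險
```

Checking the word-level table first is what keeps a pure code conversion from producing 出租車 where a Taiwanese reader would expect 計程車.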

Main Features

Phonological information

Such as pinyin, zhuyin, and IPA

Semantic classification codes

Such as type of proper noun

Grammatical information

Such as POS and adjacency attributes

Morphological information

Such as derivational affixes and binding valency codes

POS   SC   Pinyin
NC   东家之子   dōngjiāzhīzǐ
E   东家效颦   dōngjiāxiàopín
NP   东架松   dōngjiàsōng
NP   东河   dōnghé
NP   东河   dōnghé
NP   东河镇   dōnghézhèn
NP   东河沿   dōnghéyán
NP   东河区   dōnghéqū
NP   东河漕胡同   dōnghécáo hútóng
NP   东河道   dōnghédào
NP   东花   dōnghuā
NP   东花厅胡同   dōnghuātīng hútóng
NP   东花枝胡同   dōnghuāzhī hútóng
NP   东霞   dōngxiá
NP   东会村   dōnghuìcūn
NC   东海   dōnghǎi
NP   东海   dōnghǎi
NP   东海   dōnghǎi
NP   东海县   dōnghǎixiàn
E   东海扬尘   dōnghǎiyángchén
E   东海捞针   dōnghǎilāozhēn
U   东海舰队   dōnghǎijiànduì
E   东海桑田   dōnghǎisāngtián
NP   东海大学   dōnghǎidàxué
NP   东外大街   dōngwàidàjiē
NC   东郭   dōngguō
NP   东郭   dōngguō
E   东郭先生   dōngguōxiānshēng
NC   东郭履   dōngguōlǚ
NP   东革新里   dōnggéxīnlǐ
NC   东岳   dōngyuè
NP   东岳   dōngyuè
NP   东冠英胡同   dōngguānyīng hútóng
NP   东官房胡同   dōngguānfáng hútóng
NC   东干   dōnggān
NP   东管头   dōngguǎntóu
NP   东管头前街   dōngguǎntóuqiánjiē
NP   东莞   dōngguān
NP   东莞市   dōngguānshì
NC   东岸   dōngàn
NP   东岩   dōngyán
NP   东喜   dōngxǐ
NP   东旗   dōngqí
NP   东起   dōngqǐ
NP   东吉   dōngjí
NP   东吉祥胡同   dōngjíxiáng hútóng
NP   东弓匠胡同   dōnggōngjiàng hútóng
NP   东旧帘子胡同   dōngjiùliánzǐ hútóng
NP   东牛角胡同   dōngniújiǎo hútóng
NP   东京   dōngjīng
NP   东京影展   dōngjīngyǐngzhǎn
NP   东京畿道   dōngjīngjīdào
NC   东京股市   dōngjīnggǔshì
NP   东京大学   dōngjīngdàxué
NP   东京都   dōngjīngdū
NP   东京湾   dōngjīngwān
NP   东教场胡同   dōngjiāocháng hútóng
NP   东教胡同   dōngjiāo hútóng
NP   东局村   dōngjúcūn
NP   东玉   dōngyù
NP   东玉河   dōngyùhé
NP   东琴   dōngqín
NP   东琴科   dōngqínkē
NP   东区   dōngqū
NC   东隅   dōngyú
NC   东君   dōngjūn
NP   东慧   dōnghuì
NP   东月   dōngyuè
NP   东健   dōngjiàn
NP   东源   dōngyuán
NP   东源县   dōngyuánxiàn
NP   东湖   dōnghú
NP   东湖渠   dōnghúqú
NP   东湖区   dōnghúqū
NC   东胡   dōnghú
N   东胡史   dōnghúshǐ
NP   东交民巷   dōngjiāomínxiàng
NP   东光   dōngguāng
NP   东光   dōngguāng
NP   东光县   dōngguāngxiàn
NP   东光镇   dōngguāngzhèn
NP   东光胡同   dōngguāng hútóng
NP   东公街   dōnggōngjiē
NP   东公文   dōnggōngwén
NP   东厚   dōnghòu
NP   东口袋胡同   dōngkǒudài hútóng
NC   东向   dōngxiàng
NP   东向   dōngxiàng
NP   东后河沿   dōnghòuhéyán
NP   东幸福街   dōngxìngfújiē
NP   东康   dōngkāng
NP   东江   dōngjiāng
NP   东浩   dōnghào
NP   东港   dōnggǎng
NP   东港区   dōnggǎngqū
NP   东港市   dōnggǎngshì
NC   东皇   dōnghuáng
NP   东皇城根南街   dōnghuángchénggēnnánjiē
NP   东皇城根北街   dōnghuángchénggēnběijiē
NA   东航   dōngháng
NP   东航   dōngháng
NP   东航   dōngháng
U   东行航程   dōngxínghángchéng
NC   东郊   dōngjiāo
NP   东香   dōngxiāng
NP   东香河园   dōngxiānghéyuán
NP   东高地   dōnggāodì
NP   东高房胡同   dōnggāofáng hútóng
NP   东合   dōnghé
NP   东合盛   dōnghéchéng
NP   东克尔   dōngkèěr
NP   东克尔曼   dōngkèěrmàn
NP   东国   dōngguó
NP   东根   dōnggēn
NP   东佐夫   dōngzuǒfū
E   东差西误   dōngchāxīwù
NP   东沙岛   dōngshādǎo
NP   东沙群岛   dōngshāqúndǎo
NP   东塞尔   dōngsāiěr
NP   东才   dōngcái
NC   东作   dōngzuò
NP   东三亲家坟   dōngsānqīnjiāfén
NP   东三环中路   dōngsānhuánzhōnglù
NP   东三环北路   dōngsānhuánběilù
NP   东三巷   dōngsānxiàng
NC   东三省   dōngsānshěng
NP   东三省事宜条约   dōngsānshěngshìyítiáoyuē
NP   东三条   dōngsāntiáo
NP   东三道街   dōngsāndàojiē
NP   东山   dōngshān
NP   东山   dōngshān
NP   东山县   dōngshānxiàn
NP   东山镇   dōngshānzhèn
NP   东山区   dōngshānqū
E   东山高卧   dōngshāngāowò
E   东山再起   dōngshānzàiqǐ
E   东山之志   dōngshānzhīzhì
NC   东山法门   dōngshānfǎmén
NP   东山坡一里   dōngshānpōyīlǐ
NP   东山坡三里   dōngshānpōsānlǐ
NP   东山坡二里   dōngshānpōèrlǐ
NC   东司   dōngsī
NP   东四块玉南街   dōngsìkuàiyùnánjiē
NP   东四块玉北街   dōngsìkuàiyùběijiē
NP   东四头条   dōngsìtóutiáo
NP   东四九条   dōngsìjiǔtiáo
NP   东四西大街   dōngsìxīdàjiē
NP   东四道街   dōngsìdàojiē
NP   东四道口   dōngsìdàokǒu
NP   东四南大街   dōngsìnándàjiē
NP   东四北大街   dōngsìběidàjiē
NP   东子   dōngzǐ
NC   东市   dōngshì
NP   东市   dōngshì
NP   东市场五巷   dōngshìchángwǔxiàng
NP   东市区   dōngshìqū
E   东市朝衣   dōngshìcháoyī
NP   东志远   dōngzhìyuǎn
NC   东指   dōngzhǐ
E   东支西吾   dōngzhīxīwú
NP   东斯   dōngsī
NP   东斯科伊   dōngsīkēyī
E   东施效颦   dōngshīxiàopín
NP   东枝   dōngzhī
NP   东至县   dōngzhìxiàn
NP   东耳   dōngěr
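
Each row in the sample above consists of a POS code, an SC headword, and a pinyin reading. As a minimal sketch, such rows could be split into fields with a regular expression. The `Entry` type and `parse_row` function are hypothetical names, the pattern assumes the headword contains only characters from the basic CJK Unified Ideographs block, and the meanings of the POS codes are not defined in this sample, so the code treats them as opaque labels.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    pos: str     # POS code exactly as it appears in the sample, e.g. "NP"
    sc: str      # Simplified Chinese headword
    pinyin: str  # pinyin reading with tone marks


# Uppercase POS code, then a run of CJK ideographs, then the pinyin.
# Whitespace between fields is optional, so fused rows also parse.
ROW = re.compile(r"^([A-Z]+)\s*([\u4e00-\u9fff]+)\s*(.+)$")


def parse_row(line: str) -> Optional[Entry]:
    """Return an Entry, or None for lines that are not data rows."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    pos, sc, pinyin = match.groups()
    return Entry(pos, sc, pinyin.strip())


rows = ["NC东海dōnghǎi", "NP东河镇dōnghézhèn", "NP 东教胡同 dōngjiāo hútóng"]
entries = [e for e in map(parse_row, rows) if e is not None]
print(entries[1])  # Entry(pos='NP', sc='东河镇', pinyin='dōnghézhèn')
print(Counter(e.pos for e in entries))
```

Headwords outside the assumed character range, or rows in a different layout, would need a broader pattern.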

Practical Applications

Major IT companies use CLD to enhance their Chinese morphological analysis technology. It is especially well suited to natural language processing (NLP) applications such as:

Segmentation and tokenization

Named-entity recognition

Input method editors

Morphological analysis

Information retrieval

Part-of-speech tagging
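
To illustrate how a lexicon supports the first of these applications, the following is a minimal sketch of forward maximum matching, a generic dictionary-based segmentation technique. It is not a description of how CLD licensees actually segment text; the `segment` function and the five-word `LEXICON` (taken from the sample entries) are assumptions made for illustration.

```python
def segment(text: str, lexicon: set[str]) -> list[str]:
    """Greedy left-to-right segmentation that prefers the longest lexicon match."""
    max_len = max(map(len, lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always succeeds.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical mini-lexicon drawn from the sample entries.
LEXICON = {"东京", "东京大学", "东京湾", "东海", "东海大学"}

print(segment("东京大学东京湾", LEXICON))  # ['东京大学', '东京湾']
print(segment("东海在东京", LEXICON))      # ['东海', '在', '东京']
```

With a small lexicon, 东京大学 would be split into 东京 plus two single characters. Broad coverage of proper nouns and technical terms is what lets the longest-match step return the intended word.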

Related Resources

JLD

Japanese Lexical Database

Monolingual general vocabulary for NLP applications

KLD

Korean Lexical Database

Monolingual general vocabulary for NLP applications

CHD

Chinese Hanyu Pinyin Database

Accurate hanyu pinyin data including technical terms and proper nouns