Chinese Lexical Database

Covers over 500,000 entries

Simplified and Traditional Chinese

Optimized for NLP applications

Overview

The CJKI Chinese Lexical Database (CLD) is a comprehensive monolingual lexical database specifically designed for NLP applications. It consists of two modules, Simplified Chinese (SC) and Traditional Chinese (TC), with about 250,000 entries in each module covering general vocabulary, technical terms, and important proper nouns.

A unique feature of CLD is that the readings (pinyin and zhuyin) take into account the differences in pronunciation between the PRC and Taiwan. For example, SC 危险 wēixiǎn ‘dangerous’ is TC 危險 wéixiǎn. Furthermore, the TC module is not merely a code-conversion equivalent of the SC version but has been carefully proofread to ensure accuracy on both the orthographic and lexemic levels. For example, SC 出租车 chūzūchē ‘taxi’ has the TC lexemic equivalent 計程車 jīchéngchē, rather than the orthographic equivalent of the SC form, 出租車.

Developed by CJKI’s team of Chinese specialists over many years, CLD is a significant contribution to the field of Chinese lexicography and information processing.
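
The difference between orthographic and lexemic conversion can be illustrated with a minimal Python sketch. The two lookup tables and both function names are hypothetical and hold only the examples quoted above; they are not CLD's data format or API.

```python
# Hypothetical mini-tables for illustration only; not CLD data structures.
# Character-level (orthographic) SC -> TC mapping.
CHAR_MAP = {"车": "車", "险": "險"}

# Word-level (lexemic) SC -> TC mapping, for words whose usual TC
# equivalent is a different word rather than a respelling.
LEXEME_MAP = {"出租车": "計程車"}


def to_tc_orthographic(sc_word: str) -> str:
    """Convert character by character; unmapped characters pass through."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in sc_word)


def to_tc_lexemic(sc_word: str) -> str:
    """Prefer a word-level equivalent, falling back to character conversion."""
    return LEXEME_MAP.get(sc_word, to_tc_orthographic(sc_word))


if __name__ == "__main__":
    for word in ("危险", "出租车"):
        print(word, to_tc_orthographic(word), to_tc_lexemic(word))
```

A pure code conversion corresponds to `to_tc_orthographic` alone. The word-level table is what lets 出租车 map to 計程車 instead of 出租車.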

Main Features

Phonological information

Such as pinyin, zhuyin, and IPA

Semantic classification codes

Such as type of proper noun

Grammatical information

Such as POS and adjacency attributes

Morphological information

Such as derivational affixes and binding valency codes


POS   SC   Pinyin
NC    东家之子  dōngjiāzhīzǐ
E     东家效颦  dōngjiāxiàopín
NP    东架松  dōngjiàsōng
NP    东河  dōnghé
NP    东河  dōnghé
NP    东河镇  dōnghézhèn
NP    东河沿  dōnghéyán
NP    东河区  dōnghéqū
NP    东河漕胡同  dōnghécáo hútóng
NP    东河道  dōnghédào
NP    东花  dōnghuā
NP    东花厅胡同  dōnghuātīng hútóng
NP    东花枝胡同  dōnghuāzhī hútóng
NP    东霞  dōngxiá
NP    东会村  dōnghuìcūn
NC    东海  dōnghǎi
NP    东海  dōnghǎi
NP    东海  dōnghǎi
NP    东海县  dōnghǎixiàn
E     东海扬尘  dōnghǎiyángchén
E     东海捞针  dōnghǎilāozhēn
U     东海舰队  dōnghǎijiànduì
E     东海桑田  dōnghǎisāngtián
NP    东海大学  dōnghǎidàxué
NP    东外大街  dōngwàidàjiē
NC    东郭  dōngguō
NP    东郭  dōngguō
E     东郭先生  dōngguōxiānshēng
NC    东郭履  dōngguōlǚ
NP    东革新里  dōnggéxīnlǐ
NC    东岳  dōngyuè
NP    东岳  dōngyuè
NP    东冠英胡同  dōngguānyīng hútóng
NP    东官房胡同  dōngguānfáng hútóng
NC    东干  dōnggān
NP    东管头  dōngguǎntóu
NP    东管头前街  dōngguǎntóuqiánjiē
NP    东莞  dōngguān
NP    东莞市  dōngguānshì
NC    东岸  dōngàn
NP    东岩  dōngyán
NP    东喜  dōngxǐ
NP    东旗  dōngqí
NP    东起  dōngqǐ
NP    东吉  dōngjí
NP    东吉祥胡同  dōngjíxiáng hútóng
NP    东弓匠胡同  dōnggōngjiàng hútóng
NP    东旧帘子胡同  dōngjiùliánzǐ hútóng
NP    东牛角胡同  dōngniújiǎo hútóng
NP    东京  dōngjīng
NP    东京影展  dōngjīngyǐngzhǎn
NP    东京畿道  dōngjīngjīdào
NC    东京股市  dōngjīnggǔshì
NP    东京大学  dōngjīngdàxué
NP    东京都  dōngjīngdū
NP    东京湾  dōngjīngwān
NP    东教场胡同  dōngjiāocháng hútóng
NP    东教胡同  dōngjiāo hútóng
NP    东局村  dōngjúcūn
NP    东玉  dōngyù
NP    东玉河  dōngyùhé
NP    东琴  dōngqín
NP    东琴科  dōngqínkē
NP    东区  dōngqū
NC    东隅  dōngyú
NC    东君  dōngjūn
NP    东慧  dōnghuì
NP    东月  dōngyuè
NP    东健  dōngjiàn
NP    东源  dōngyuán
NP    东源县  dōngyuánxiàn
NP    东湖  dōnghú
NP    东湖渠  dōnghúqú
NP    东湖区  dōnghúqū
NC    东胡  dōnghú
N     东胡史  dōnghúshǐ
NP    东交民巷  dōngjiāomínxiàng
NP    东光  dōngguāng
NP    东光  dōngguāng
NP    东光县  dōngguāngxiàn
NP    东光镇  dōngguāngzhèn
NP    东光胡同  dōngguāng hútóng
NP    东公街  dōnggōngjiē
NP    东公文  dōnggōngwén
NP    东厚  dōnghòu
NP    东口袋胡同  dōngkǒudài hútóng
NC    东向  dōngxiàng
NP    东向  dōngxiàng
NP    东后河沿  dōnghòuhéyán
NP    东幸福街  dōngxìngfújiē
NP    东康  dōngkāng
NP    东江  dōngjiāng
NP    东浩  dōnghào
NP    东港  dōnggǎng
NP    东港区  dōnggǎngqū
NP    东港市  dōnggǎngshì
NC    东皇  dōnghuáng
NP    东皇城根南街  dōnghuángchénggēnnánjiē
NP    东皇城根北街  dōnghuángchénggēnběijiē
NA    东航  dōngháng
NP    东航  dōngháng
NP    东航  dōngháng
U     东行航程  dōngxínghángchéng
NC    东郊  dōngjiāo
NP    东香  dōngxiāng
NP    东香河园  dōngxiānghéyuán
NP    东高地  dōnggāodì
NP    东高房胡同  dōnggāofáng hútóng
NP    东合  dōnghé
NP    东合盛  dōnghéchéng
NP    东克尔  dōngkèěr
NP    东克尔曼  dōngkèěrmàn
NP    东国  dōngguó
NP    东根  dōnggēn
NP    东佐夫  dōngzuǒfū
E     东差西误  dōngchāxīwù
NP    东沙岛  dōngshādǎo
NP    东沙群岛  dōngshāqúndǎo
NP    东塞尔  dōngsāiěr
NP    东才  dōngcái
NC    东作  dōngzuò
NP    东三亲家坟  dōngsānqīnjiāfén
NP    东三环中路  dōngsānhuánzhōnglù
NP    东三环北路  dōngsānhuánběilù
NP    东三巷  dōngsānxiàng
NC    东三省  dōngsānshěng
NP    东三省事宜条约  dōngsānshěngshìyítiáoyuē
NP    东三条  dōngsāntiáo
NP    东三道街  dōngsāndàojiē
NP    东山  dōngshān
NP    东山  dōngshān
NP    东山县  dōngshānxiàn
NP    东山镇  dōngshānzhèn
NP    东山区  dōngshānqū
E     东山高卧  dōngshāngāowò
E     东山再起  dōngshānzàiqǐ
E     东山之志  dōngshānzhīzhì
NC    东山法门  dōngshānfǎmén
NP    东山坡一里  dōngshānpōyīlǐ
NP    东山坡三里  dōngshānpōsānlǐ
NP    东山坡二里  dōngshānpōèrlǐ
NC    东司  dōngsī
NP    东四块玉南街  dōngsìkuàiyùnánjiē
NP    东四块玉北街  dōngsìkuàiyùběijiē
NP    东四头条  dōngsìtóutiáo
NP    东四九条  dōngsìjiǔtiáo
NP    东四西大街  dōngsìxīdàjiē
NP    东四道街  dōngsìdàojiē
NP    东四道口  dōngsìdàokǒu
NP    东四南大街  dōngsìnándàjiē
NP    东四北大街  dōngsìběidàjiē
NP    东子  dōngzǐ
NC    东市  dōngshì
NP    东市  dōngshì
NP    东市场五巷  dōngshìchángwǔxiàng
NP    东市区  dōngshìqū
E     东市朝衣  dōngshìcháoyī
NP    东志远  dōngzhìyuǎn
NC    东指  dōngzhǐ
E     东支西吾  dōngzhīxīwú
NP    东斯  dōngsī
NP    东斯科伊  dōngsīkēyī
E     东施效颦  dōngshīxiàopín
NP    东枝  dōngzhī
NP    东至县  dōngzhìxiàn
NP    东耳  dōngěr
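
As a rough illustration of how three-column rows like those in the sample (POS code, SC headword, pinyin) might be loaded for inspection, here is a minimal Python sketch. The tab-separated layout, the `Entry` type, and `parse_rows` are assumptions made for this example; the actual CLD delivery format is not described on this page, and the POS codes are kept exactly as listed without interpretation.

```python
from collections import Counter
from typing import NamedTuple


class Entry(NamedTuple):
    pos: str      # POS code exactly as listed, e.g. "NP"
    sc: str       # Simplified Chinese headword
    pinyin: str   # reading with tone marks


# A few rows copied from the sample; tab separation is an assumption.
SAMPLE = """\
NC\t东海\tdōnghǎi
NP\t东海\tdōnghǎi
NP\t东海县\tdōnghǎixiàn
E\t东海捞针\tdōnghǎilāozhēn
U\t东海舰队\tdōnghǎijiànduì
"""


def parse_rows(text: str) -> list[Entry]:
    """Split each non-empty line into (POS, SC, pinyin)."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            pos, sc, pinyin = line.split("\t")
            entries.append(Entry(pos, sc, pinyin))
    return entries


entries = parse_rows(SAMPLE)

# How many rows carry each POS code.
pos_counts = Counter(e.pos for e in entries)

# Headwords that are listed under more than one POS code.
pos_by_word: dict[str, set[str]] = {}
for e in entries:
    pos_by_word.setdefault(e.sc, set()).add(e.pos)
multi_pos = {w: sorted(p) for w, p in pos_by_word.items() if len(p) > 1}
```

Grouping by headword surfaces cases such as 东海, which appears in the sample under both NC and NP.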

Practical Applications

Major IT companies use CLD to enhance their Chinese morphological analysis technology. It is especially suitable for natural language processing (NLP) applications such as:

Segmentation and tokenization

Named-entity recognition

Input method editors

Morphological analysis

Information retrieval

Part-of-speech tagging
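
To make the first of these applications concrete, the following Python sketch shows forward maximum matching, a generic textbook segmentation technique driven by a word list. It is not presented as the method used by CLD or its licensees. The `LEXICON` set is a toy assumption that combines a few headwords from the sample above with two common words added for the demonstration.

```python
# Toy lexicon: headwords from the sample table plus two common words
# ("大学", "在") added here purely for the demonstration.
LEXICON = {"东京", "东京大学", "东海", "东海大学", "大学", "在"}
MAX_LEN = max(len(w) for w in LEXICON)


def segment(text: str) -> list[str]:
    """Greedy forward maximum matching.

    At each position, take the longest lexicon word that starts there;
    characters that match nothing become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in LEXICON:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    print(segment("东京大学在东京"))
```

Because the longest match wins, 东京大学 is kept as one token rather than split into 东京 and 大学, which is why lexicon coverage of proper nouns matters for this kind of segmenter.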

Related Resources

JLD

Japanese Lexical Database

Monolingual general vocabulary for NLP applications

KLD

Korean Lexical Database

Monolingual general vocabulary for NLP applications

CHD

Chinese Hanyu Pinyin Database

Accurate hanyu pinyin data including technical terms and proper nouns