SC and TC Chinese Pinyin Database

The CJKI Chinese lexical database currently contains over four million Simplified Chinese (SC) and Traditional Chinese (TC) headwords covering general vocabulary, important technical terms, and proper nouns. Each lexeme is accompanied by one or more pinyin readings and various other attributes (see chinword.htm for details).

Notably, the pinyin readings take into account the differences in pronunciation between Taiwan and the People's Republic of China (PRC), as shown in the table below. Even highly educated native speakers of Chinese are often surprised to discover that such differences exist.

Our pinyin readings have been thoroughly proofread for accuracy and explicitly indicate the neutral tone, which conventional dictionaries often ignore. The data can be provided in all the major transcription systems, such as Yale, Wade-Giles, Zhuyin, and IPA, and is especially useful for speech technology applications such as text-to-speech (TTS) software.
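
To illustrate why an explicitly marked neutral tone matters to a downstream consumer such as a TTS front end, here is a minimal sketch that converts tone-marked pinyin syllables to tone-number notation. It is not part of the CJKI data or tooling: the helper name to_numbered is hypothetical, writing the neutral tone as 5 is one common convention chosen here as an assumption, and the input is assumed to be already split into syllables.

```python
import unicodedata

# Combining marks that carry the four full tones in Hanyu Pinyin.
TONE_MARKS = {
    "\u0304": 1,  # macron, first tone
    "\u0301": 2,  # acute, second tone
    "\u030c": 3,  # caron, third tone
    "\u0300": 4,  # grave, fourth tone
}


def to_numbered(syllable: str, neutral: int = 5) -> str:
    """Convert one tone-marked syllable to tone-number form.

    Hypothetical helper: a syllable with no tone mark is treated as
    neutral tone and receives the `neutral` digit (an assumption).
    """
    tone = neutral
    kept = []
    # NFD splits e.g. "ǎ" into "a" followed by a combining caron.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # the diaeresis of "ü" is kept
    return unicodedata.normalize("NFC", "".join(kept)) + str(tone)


print([to_numbered(s) for s in ["wēi", "xiǎn"]])  # ['wei1', 'xian3']
```

Without an explicit neutral-tone indication in the source data, a converter like this cannot tell a genuinely neutral syllable from one whose tone was simply left unmarked.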

Some differences between Taiwan and PRC Pinyin
ID Diff SC Hanzi SC Frequency SC Pinyin TC Hanzi TC Frequency TC Pinyin
I00001 D 临期 0000029000 línqī 臨期 0000028800 línqí
I00002 D 0030900000 0030900000
I00003 D 企业 0163000000 qǐyè 企業 0102000000 qìyè
I00004 D 倬雄 0000000167 zhuōxióng 倬雄 0000000167 zhuóxióng
I00005 D 0006720000 wēi 0006720000 wéi
I00006 D 危险 0022400000 wēixiǎn 危險 0003080000 wéixiǎn
I00007 D 0235000000 0006950000
I00008 D 埒城 0000000411 lièchéng 埒城 0000000411 lèchéng
I00009 D 夕日 0002020000 xīrì 夕日 0002020000 xìrì
I00010 D 大期 0000061500 dàqī 大期 0000061500 dàqí
I00011 D 巍八郎 0000000044 wēibāláng 巍八郎 0000000044 wéibāláng
I00012 D 帆柱 0000030600 fānzhù 帆柱 0000030600 fánzhù
I00013 D 0035500000 wēi 0035500000 wéi
I00014 D 微笑 0018400000 wēixiào 微笑 0018400000 wéixiào
I00015 D 拙夫 0000017200 zhuōfū 拙夫 0000017200 zhuófū
I00016 S 无着 0000265000 wúzhuó 無著 0000265000 wúzhuó
I00017 D 昔日 0004880000 xīrì 昔日 0004880000 xírì
I00018 D 显微镜 0003390000 xiǎnwēijìng 顯微鏡 0000228000 xiǎnwéijìng
I00019 D 期待 0059100000 qīdài 期待 0059100000 qídài
I00020 D 池穴 0000059400 chíxué 池穴 0000059400 chíxuè
I00021 D 理发 0002170000 lǐfà 理髮 0000495000 lǐfǎ
I00022 D 隆巴妮 0000000137 lóngbānī 隆巴妮 0000000137 lóngbāní
I00023 D 麦卡锡 0000058100 màikǎxī 麥卡錫 0000010400 màikǎxí
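
The pairs in the table above do not all differ in the same way: most differ only in tone (wēixiǎn versus wéixiǎn), while a few also differ in their letters (lièchéng versus lèchéng). The following sketch classifies a pair of readings accordingly. It assumes tone-marked pinyin strings like those in the table; strip_tones and classify are hypothetical names used for illustration, not CJKI functions.

```python
import unicodedata

# Combining macron, acute, caron, and grave: the four pinyin tone marks.
TONE_MARKS = "\u0304\u0301\u030c\u0300"


def strip_tones(pinyin: str) -> str:
    """Remove tone marks but keep the letters, including "ü"."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    bare = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", bare)


def classify(sc_pinyin: str, tc_pinyin: str) -> str:
    """Return "same", "tone" (tone-only difference), or "letters"."""
    sc = unicodedata.normalize("NFC", sc_pinyin)
    tc = unicodedata.normalize("NFC", tc_pinyin)
    if sc == tc:
        return "same"
    if strip_tones(sc) == strip_tones(tc):
        return "tone"
    return "letters"


print(classify("wēixiǎn", "wéixiǎn"))    # tone
print(classify("lièchéng", "lèchéng"))   # letters
print(classify("wúzhuó", "wúzhuó"))      # same
```

Normalizing both readings first means that precomposed and decomposed encodings of the same accented vowel compare as equal.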
Comparison of Simplified Chinese Pinyin with Traditional Chinese Pinyin
ID Diff SC Hanzi SC Frequency SC Pinyin TC Hanzi TC Frequency TC Pinyin
G00018 S 咖啡豆 0000779000 kāfēidòu 咖啡豆 0000779000 kāfēidòu
G00019 S 咖啡豆 研磨机 0000001780 kāfēidòuyánmójī 咖啡豆 研磨機 0000001770 kāfēidòuyánmójī
G04348 S 咖啡豆象 0000000405 kāfēidòuxiàng 咖啡豆象 0000000405 kāfēidòuxiàng
G04349 S 咖啡豆酊 0000000028 kāfēidòudīng 咖啡豆酊 0000000028 kāfēidòudīng
G04340 S 咖啡酸 0000026200 kāfēisuān 咖啡酸 0000026200 kāfēisuān
G04342 S 咖啡醇 0000001910 kāfēichún 咖啡醇 0000001910 kāfēichún
G04335 S 咖啡 锈病 0000000208 kāfēixiùbìng 咖啡 鏽病 0000000023 kāfēixiùbìng
G00022 S 咖啡 面包卷 0000000110 kāfēimiànbāojuǎn 咖啡 面包卷 0000000110 kāfēimiànbāojuǎn
G00008 S 咖啡馆 0002260000 kāfēiguǎn 咖啡館 0002320000 kāfēiguǎn
G00024 D 咖喱 0001800000 gālí 咖喱 0001800000 kālǐ
G04358 D 咖喱 牛肉 0000072700 gālíniúròu 咖喱 牛肉 0000072700 kālǐniúròu
G00027 D 咖喱粉 0000087400 gālífěn 咖喱粉 0000087400 kālǐfěn
G04356 D 咖喱酱 0000030000 gālíjiàng 咖喱醬 0000029900 kālǐjiàng
G04357 D 咖喱饭 0000122000 gālífàn 咖喱飯 0000122000 kālǐfàn
G00026 D 咖喱鸡 0000146000 gālíjī 咖喱雞 0000146000 kālǐjī
G04360 S 咖陶导数 0000000000 kātáodǎoshù 咖陶導數 0000000000 kātáodǎoshù
G04359 S 咖马拉 0000000002 kāmǎlā 咖馬拉 0000000002 kāmǎlā
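
A minimal sketch of how rows like those above could be read into records and checked. It rests on several assumptions that the source does not state: fields are whitespace-separated, each frequency is a ten-digit zero-padded number, a reading is a single token, and the Diff flag means D for differing readings and S for identical ones. The names Row, parse_row, and flag_matches are hypothetical.

```python
import re
import unicodedata
from typing import NamedTuple


class Row(NamedTuple):
    entry_id: str
    diff: str
    sc_hanzi: str
    sc_freq: int
    sc_pinyin: str
    tc_hanzi: str
    tc_freq: int
    tc_pinyin: str


FREQ = re.compile(r"\d{10}")  # assumed frequency format


def has_hanzi(token: str) -> bool:
    """True if the token contains a CJK Unified Ideograph (basic block)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in token)


def parse_row(line: str) -> Row:
    tokens = line.split()
    entry_id, diff, rest = tokens[0], tokens[1], tokens[2:]
    # The two frequency fields anchor the row; hanzi fields may contain
    # spaces or be empty. Raises ValueError unless exactly two are found.
    i1, i2 = [i for i, t in enumerate(rest) if FREQ.fullmatch(t)]
    middle = rest[i1 + 1:i2]
    sc_pinyin = middle[0] if middle and not has_hanzi(middle[0]) else ""
    tc_hanzi = " ".join(t for t in middle if has_hanzi(t))
    return Row(entry_id, diff, " ".join(rest[:i1]), int(rest[i1]),
               sc_pinyin, tc_hanzi, int(rest[i2]), "".join(rest[i2 + 1:]))


def flag_matches(row: Row) -> bool:
    """Check the assumed meaning of Diff: D = readings differ, S = same."""
    sc = unicodedata.normalize("NFC", row.sc_pinyin)
    tc = unicodedata.normalize("NFC", row.tc_pinyin)
    return row.diff == ("D" if sc != tc else "S")


row = parse_row("G00024 D 咖喱 0001800000 gālí 咖喱 0001800000 kālǐ")
print(row.sc_pinyin, row.tc_pinyin, flag_matches(row))
```

Rows whose readings are missing cannot be checked this way and would need to be skipped or handled separately.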