In 1996, CJKI launched a project to develop a Chinese-to-Chinese conversion (C2C) system that supports Simplified-to-Traditional Chinese (SC>TC) and Traditional-to-Simplified Chinese (TC>SC) conversion with near-perfect results. Orthographic conversion maps simplified forms to traditional forms at the character and word levels, such as SC 国家 to TC 國家 (‘country’), and vice versa, while lexemic conversion maps vocabulary items at the semantic level, such as SC 出租车 to TC 計程車 (‘taxi’), and vice versa.
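The difference between the two conversion levels can be illustrated with a minimal Python sketch. The tiny dictionaries and the function names here are hypothetical and exist only for illustration; they are not CJKI's tables or API, and a real system would need far larger tables plus word segmentation.

```python
# Hypothetical toy tables built from the two examples in the text.
ORTHOGRAPHIC_CHARS = {"国": "國", "车": "車"}   # character-level form mapping
LEXEMIC_WORDS = {"出租车": "計程車"}             # word-level vocabulary mapping


def orthographic_sc_to_tc(word: str) -> str:
    """Map each simplified character to its traditional form, if known."""
    return "".join(ORTHOGRAPHIC_CHARS.get(ch, ch) for ch in word)


def lexemic_sc_to_tc(word: str) -> str:
    """Prefer a vocabulary-level equivalent; fall back to orthographic mapping."""
    return LEXEMIC_WORDS.get(word, orthographic_sc_to_tc(word))


print(orthographic_sc_to_tc("国家"))    # orthographic: only the character forms change
print(orthographic_sc_to_tc("出租车"))  # orthographic: same word, traditional forms
print(lexemic_sc_to_tc("出租车"))       # lexemic: a different word for 'taxi'
```

In this sketch the lexemic function falls back to the orthographic one when no vocabulary-level equivalent is listed, which is one plausible design and not necessarily how C2C behaves.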

C2C has been a major undertaking that required a considerable investment of funds and human resources. To this end, we have engaged in the following research and development activities:

In-depth investigation of technical and linguistic issues related to C2C

Research on Chinese word segmentation technology

Construction of comprehensive SC-TC mapping tables

To achieve a high level of conversion accuracy, our large-scale mapping tables include, in addition to the mappings themselves, various attributes such as pinyin readings, grammatical information, part-of-speech codes, and semantic classification codes.


Each entry is accompanied by various attributes, such as:

Code level mappings

In the SC-to-TC and TC-to-SC directions

Semantic classification codes

Such as orthographic or lexemic codes

Lexemic mappings

In the SC-to-TC and TC-to-SC directions

Orthographic mappings

In the SC-to-TC and TC-to-SC directions

Phonological information

Such as pinyin and zhuyin

Grammatical information

Such as part-of-speech codes
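As a rough illustration of how one entry with these attributes might be represented, here is a minimal Python sketch. The class name, field names, and the `pos` field are assumptions made for this example and do not reflect the actual schema of the mapping tables; the sample values are taken from the conversion table in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MappingEntry:
    """One SC-TC mapping entry (hypothetical layout, not CJKI's schema)."""
    mapping_type: str          # type code as it appears in the table: "O", "OP", or "L"
    sc: str                    # Simplified Chinese form
    sc_pinyin: str             # reading of the SC form
    tc: str                    # Traditional Chinese form
    tc_pinyin: str             # reading of the TC form
    pos: Optional[str] = None  # part-of-speech code (placeholder, no real codes shown)

    def readings_differ(self) -> bool:
        """True when the SC and TC readings are not identical."""
        return self.sc_pinyin != self.tc_pinyin


country = MappingEntry("O", "国家", "guó jiā", "國家", "guó jiā")
danger = MappingEntry("OP", "危险", "wēi xiǎn", "危險", "wéi xiǎn")
taxi = MappingEntry("L", "出租车", "chū zū chē", "計程車", "jì chéng chē")

for entry in (country, danger, taxi):
    print(entry.mapping_type, entry.sc, entry.tc, entry.readings_differ())
```

Keeping the readings for both sides in the same record makes it easy to flag entries whose pronunciation differs between SC and TC, as with 危险 / 危險 above.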

SC and TC Conversion

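| Type | SC | SC Pinyin | TC | TC Pinyin |
|------|----|-----------|----|-----------|
| O | 鲍林 | bào lín | 鮑林 | bào lín |
| O | 抱拢 | bào lǒng | 抱攏 | bào lǒng |
| O | 报录 | bào lù | 報錄 | bào lù |
| OP | 暴露 | bào lù | 暴露 | pù lù |
| O | 暴乱 | bào luàn | 暴亂 | bào luàn |
| O | 鲍伦 | bào lún | 鮑倫 | bào lún |
| O | 鲍螺 | bào luó | 鮑螺 | bào luó |
| O | 抱锣 | bào luó | 抱鑼 | bào luó |
| OP | 显微镜 | xiǎn wēi jìng | 顕微鏡 | xiǎn wéi jìng |
| O | 国家 | guó jiā | 國家 | guó jiā |
| OP | 企业 | qǐ yè | 企業 | qì yè |
| OP | 危险 | wēi xiǎn | 危險 | wéi xiǎn |
| O | 计算机 | jì suàn jī | 計算機 | jì suàn jī |
| L | 计算机 | jì suàn jī | 電腦 | diàn nǎo |
| O | 电脑 | diàn nǎo | 電腦 | diàn nǎo |
| L | 出租车 | chū zū chē | 計程車 | jì chéng chē |
| O | 计程车 | jì chéng chē | 計程車 | jì chéng chē |
| O | 出租车 | chū zū chē | 出租車 | chū zū chē |
| L | 文件 | wén jiàn | 檔案 | dǎng àn |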
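The following Python sketch shows one way rows like these could drive an SC-to-TC conversion. It rests on several assumptions: the type codes O and L are read as orthographic and lexemic, based on the definitions given earlier; the function names are hypothetical; and the greedy longest-match lookup is a simple stand-in for real word segmentation, not the method C2C actually uses.

```python
# A few rows copied from the table: (type, SC, TC).
ROWS = [
    ("O", "国家", "國家"),
    ("O", "计算机", "計算機"),
    ("L", "计算机", "電腦"),
    ("L", "出租车", "計程車"),
    ("O", "出租车", "出租車"),
]


def build_index(rows, prefer_lexemic):
    """Build an SC -> TC lookup, choosing between O and L rows for the same SC word."""
    index = {}
    for kind, sc, tc in rows:
        is_preferred = (kind == "L") == prefer_lexemic
        if sc not in index or is_preferred:
            index[sc] = tc
    return index


def convert_sc_to_tc(text, index):
    """Greedy longest-match conversion; unknown characters pass through unchanged."""
    longest = max(map(len, index), default=1)
    out = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in index:
                out.append(index[chunk])
                i += size
                break
        else:
            # No table entry starts here: copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


lexemic_index = build_index(ROWS, prefer_lexemic=True)
orthographic_index = build_index(ROWS, prefer_lexemic=False)
print(convert_sc_to_tc("国家的计算机", lexemic_index))
print(convert_sc_to_tc("国家的计算机", orthographic_index))
```

Because the same SC word can carry both an orthographic and a lexemic mapping (as with 计算机 and 出租车), the sketch makes the preference an explicit parameter instead of hard-coding one choice.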

Practical Applications

C2C is ideal for applications such as:

Machine translation into TC

For example, converting SC to TC instead of translating English to TC

SC-to-TC and TC-to-SC conversion

Machine translation into SC

For example, converting TC to SC instead of translating English to SC

