Chinese-English Database of Proper Nouns

汉英专有名词数据库

Overview

We maintain the world's largest Chinese-English bilingual bidirectional databases of proper nouns. These are used by some of the world's major IT companies for a wide variety of applications, such as:

  • machine translation (MT)
  • information retrieval (IR)
  • morphological analysis (MA)
  • electronic dictionaries (ED)
  • input method editors (IME)
  • named entity recognition (NER)

See also our Database of Chinese and non-Chinese Place Names, a subset of the data described on this page. Also, please see our Comprehensive Database of Chinese Name Variants, which contains hundreds of thousands of name variants.

Coverage

Our comprehensive database of proper nouns in Chinese currently has about one million Simplified Chinese and one million Traditional Chinese headwords. This includes:

  • Chinese personal names
  • Non-Chinese personal names
  • Chinese place names
  • Non-Chinese place names
  • Companies and organizations
  • Publications and literary works
  • Names of famous people
  • Miscellaneous proper nouns

Entries carry attributes such as readings in pinyin, zhuyin fuhao, Cantonese, and several romanization systems; semantic classification codes; locale codes; and frequency rankings and statistics, which are described in chinfreq.htm. Many fields, such as frequency and English equivalents, are not shown here.

The data covers both Chinese and non-Chinese names. For information on Japanese names written in Chinese, see SC Japanese Proper Nouns. For information on the CJKI comprehensive database covering general vocabulary, see chinword.htm.

Format and Encoding

The data is available in any desired format, such as plain text files with tab-delimited fields, Excel, Access, or HTML, and in any encoding, such as UTF-8, UCS-2, EUC, GB 2312, or Big Five. The sample below is encoded in UTF-8.
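To make the encoding options concrete, here is a minimal Python sketch that encodes one tab-delimited record in several of the encodings listed above. The record layout (ID, TYPE, SC, TC, PINYIN) is an assumption modeled on the sample data, not the exact delivery format, and `utf-16-le` stands in for UCS-2, which Python does not expose under that name.

```python
# Hypothetical tab-delimited record modeled on the sample data:
# ID, TYPE, SC, TC, PINYIN (ENG and LOCALE omitted for brevity).
record = "N0056538\tPc\t阿尔巴尼亚\t阿爾巴尼亞\ta1-er3-ba1-ni2-ya4"
fields = record.split("\t")
sc_name, tc_name = fields[2], fields[3]

encoded = {
    # UTF-8 and UTF-16 can hold the whole record, SC and TC alike.
    "utf-8": record.encode("utf-8"),
    # Stand-in for UCS-2: every character here takes exactly 2 bytes.
    "utf-16-le": record.encode("utf-16-le"),
    # GB 2312 targets Simplified Chinese, so only the SC field is encoded.
    "gb2312": sc_name.encode("gb2312"),
    # Big Five targets Traditional Chinese, so only the TC field is encoded.
    "big5": tc_name.encode("big5"),
}

for name, data in encoded.items():
    print(name, len(data), "bytes")
```

Encoding the SC and TC fields separately reflects the fact that the legacy character sets each target one script; a Unicode encoding avoids that split.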

Field Description
Field Number  Field Name  Description
1             ID          The prefix "N" stands for "Name".
2             TYPE        Semantic classification code, such as "S" for surname, "G" for given name, and "P" for place name. For details, see cpostype.htm.
3             SC          Name in Simplified Chinese.
4             TC          Name in Traditional Chinese.
5             ENG         Name in English (not shown in sample below).
6             PINYIN      Pinyin with tone numbers, with hyphens separating syllables.
7             LOCALE      Country code for origin of name (not shown in sample below).
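As an illustration of how the seven fields might be consumed, the following Python sketch parses one tab-delimited line into a typed record. It assumes the fields arrive in the order listed in the field description and are separated by tabs; the class name `ProperNoun`, the function `parse_record`, and the LOCALE value in the example line are hypothetical, since the sample on this page omits ENG and LOCALE.

```python
from dataclasses import dataclass


@dataclass
class ProperNoun:
    """One database entry; attribute order follows the field description."""
    entry_id: str   # field 1, ID
    type_code: str  # field 2, TYPE
    sc: str         # field 3, Simplified Chinese
    tc: str         # field 4, Traditional Chinese
    eng: str        # field 5, English
    pinyin: str     # field 6, pinyin with tone numbers
    locale: str     # field 7, country code


def parse_record(line: str) -> ProperNoun:
    """Split a tab-delimited line and check that it has exactly 7 fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return ProperNoun(*fields)


# Illustrative line: the LOCALE value "AL" is a placeholder, not real data.
line = "N0056538\tPc\t阿尔巴尼亚\t阿爾巴尼亞\tAlbania\ta1-er3-ba1-ni2-ya4\tAL\n"
entry = parse_record(line)
print(entry.entry_id, entry.sc, entry.tc, entry.pinyin)
```

Rejecting lines with the wrong field count early makes delimiter or export problems visible before they reach downstream code.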

Data Sample

SC and TC Proper Nouns
ID        TYPE  SC                TC                PINYIN
N0056616  P     阿巴丹            阿巴丹            a1-ba1-dan1
N0056619  P     阿必尚            阿必尚            a1-bi4-shang4
N0056627  P     阿布达比          阿布達比          a1-bu4-da2-bi3
N0056688  P     阿迪斯阿贝巴      阿迪斯阿貝巴      a1-di2-si1-a1-bei4-ba1
N0056538  Pc    阿尔巴尼亚        阿爾巴尼亞        a1-er3-ba1-ni2-ya4
N0056539  Pc    阿尔巴尼亚共和国  阿爾巴尼亞共和國  a1-er3-ba1-ni2-ya4-gong4-he2-guo2
N0056529  P     阿尔及尔          阿爾及爾          a1-er3-ji2-er3
N0056530  Pc    阿尔及利亚        阿爾及利亞        a1-er3-ji2-li4-ya4
N0056533  N     阿尔考特          阿爾考特          a1-er3-kao3-te4
N0056536  P     阿尔斯太          阿爾斯太          a1-er3-si1-tai4
N0056537  P     阿尔泰山          阿爾泰山          a1-er3-tai4-shan1
N0056623  Pc    阿富汗            阿富汗            a1-fu4-han4
N0056624  Pc    阿富汗伊斯兰国    阿富汗伊斯蘭國    a1-fu4-han4-yi1-si1-lan2-guo2
N0056596  Pc    阿根廷共和国      阿根廷共和國      a1-gen1-ting2-gong4-he2-guo2
N0056570  N     阿基米得          阿基米得          a1-ji1-mi3-de2
N0056582  P     阿克苏            阿克蘇            a1-ke4-su1
N0056576  P     阿肯色            阿肯色            a1-ken3-se4
N0056577  P     阿肯色州          阿肯色州          a1-ken3-se4-zhou1
N0056658  N     阿奎那            阿奎那            a1-kui2-na4
N0056676  P     阿拉巴马州        阿拉巴馬州        a1-la1-ba1-ma3-zhou1
N0056682  P     阿拉伯半岛        阿拉伯半島        a1-la1-bo2-ban4-dao3
N0056678  N     阿拉伯劳伦斯      阿拉伯勞倫斯      a1-la1-bo2-lao2-lun2-si1
N0056680  Pc    阿拉伯联合大公国  阿拉伯聯合大公國  a1-la1-bo2-lian2-he2-da4-gong1-guo2
N0056681  Pc    阿拉伯沙漠        阿拉伯沙漠        a1-la1-bo2-sha1-mo4
N0056666  P     阿拉斯加          阿拉斯加          a1-la1-si1-jia1
N0056667  P     阿拉斯加州        阿拉斯加州        a1-la1-si1-jia1-zhou1
N0056645  GP    阿里              阿里              a1-li3
N0056648  P     阿里山山脉        阿里山山脈        a1-li3-shan1-shan1-mai4
N0056652  P     阿留申群岛        阿留申群島        a1-liu2-shen1-qun2-dao3
N0056552  P     阿马达            阿馬達            a1-ma3-da2
N0056657  Pc    阿曼苏丹国        阿曼蘇丹國        a1-man4-su1-dan1-guo2
N0056659  P     阿姆河            阿姆河            a1-mu3-he2
N0056661  P     阿姆斯特丹        阿姆斯特丹        a1-mu3-si1-te4-dan1
N0056613  N     阿那律            阿那律            a1-na4-lv4
N0056551  N     阿难陀            阿難陀            a1-nan2-tuo2
N0056541  P     阿帕拉契山脉      阿帕拉契山脈      a1-pa4-la1-qi4-shan1-mai4
N0056547  Pc    阿萨密省          阿薩密省          a1-sa4-mi4-sheng3
N0056689  N     阿阇世            阿闍世            a1-she2-shi4
N0056544  P     阿苏山            阿蘇山            a1-su1-shan1
N0056564  P     阿瓦              阿瓦              a1-wa3
N0146242  P*                                        ai1
N0146249  P     埃佛勒斯峰        埃佛勒斯峰        ai1-fo2-le4-si1-feng1
N0146243  Pc    埃及              埃及              ai1-ji2
N0146244  Pc    埃及阿拉伯共和国  埃及阿拉伯共和國  ai1-ji2-a1-la1-bo2-gong4-he2-guo2
N0146247  P     埃特纳火山        埃特納火山        ai1-te4-na4-huo3-shan1
N0153643  S                                         ai4
N0030297  P     爱奥尼亚          愛奧尼亞          ai4-ao4-ni2-ya4
N0030288  P     爱达荷州          愛達荷州          ai4-da2-he2-zhou1
N0030394  P     爱德华岛          愛德華島          ai4-de2-hua2-dao3
N0030393  N     爱迪生            愛迪生            ai4-di2-sheng1
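
The PINYIN field in the sample writes each syllable with a trailing tone number and joins syllables with hyphens. A minimal Python sketch of splitting such a reading into (syllable, tone) pairs follows. The function name `split_pinyin` is hypothetical, and accepting tone 5 for the neutral tone is an assumption, since the sample shows only tones 1 to 4. The sample also writes "v" where standard pinyin has "ü" (as in "lv4"), and the sketch keeps that spelling unchanged.

```python
import re
from typing import List, Tuple

# One syllable: lowercase letters followed by a single tone digit.
# Tone 5 (neutral) is an assumption; the sample shows only tones 1-4.
_SYLLABLE = re.compile(r"([a-z]+)([1-5])")


def split_pinyin(reading: str) -> List[Tuple[str, int]]:
    """Split a reading such as 'a1-er3-ba1' into (syllable, tone) pairs."""
    pairs = []
    for syllable in reading.split("-"):
        match = _SYLLABLE.fullmatch(syllable)
        if match is None:
            raise ValueError(f"unexpected syllable: {syllable!r}")
        pairs.append((match.group(1), int(match.group(2))))
    return pairs


print(split_pinyin("a1-er3-ba1-ni2-ya4"))
# [('a', 1), ('er', 3), ('ba', 1), ('ni', 2), ('ya', 4)]
```

Because the number of pairs equals the number of syllables, the result can also serve as a quick consistency check against the character count of the SC and TC fields.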