Multilingual Database of Proper Nouns

CJKE-DPN

多言語固有名詞辞典    多语专有名词词典

日中韓辭典研究所
©2004-2012 The CJK Dictionary Institute, Inc.



Basic Specifications
Overview and Coverage


We maintain the world's largest databases of CJK proper nouns, with approximately three million entries, used by some of the world's major IT companies for a wide variety of applications such as named entity recognition (NER), machine translation (MT), information retrieval (IR) and input method editors. This edition, the Multilingual Database of Proper Nouns (CJKE-DPN), currently contains about 150,000 entries (including variants) covering the most common CJK and Western personal names and surnames. It brings together five languages (CJKE), namely Simplified Chinese (SC), Traditional Chinese (TC), Japanese, Korean and English, in a multidirectional format, and has been expanded to include Arabic (see Table 2) and Spanish (not shown here).

The database includes various data fields, such as readings in pinyin, zhuyin fuhao (註音符號), hiragana and several romanization systems, as well as semantic classification codes, frequency rankings and statistics, and locale codes. Only some of these fields are shown in the samples below.

Editorial Policy


It is important to note that the TC names are not merely code-converted equivalents of the SC names, but are accurate on both the orthographic level (comparable to American 'color' vs. British 'colour') and the lexemic level (comparable to American 'gas' vs. British 'petrol'). For example, New Zealand is 新西兰 Xīnxīlán in SC but 紐西蘭 Niǔxīlán in TC.
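The distinction between orthographic and lexemic differences can be sketched in a few lines of Python. This is only an illustration, not the institute's method: the `S2T` map is a hypothetical, hand-built table covering just the characters in the examples, and the function names are made up for this sketch. A pair is treated as orthographic ("O") when character-by-character conversion of the SC form yields the TC form, and as lexemic ("L") otherwise.

```python
# Hypothetical, hand-built Simplified -> Traditional map covering only the
# characters used in the examples below. A real system would need a full
# conversion table and would have to handle one-to-many mappings.
S2T = {"兰": "蘭", "亚": "亞", "开": "開", "罗": "羅", "门": "門"}


def code_convert(sc: str) -> str:
    """Convert character by character; unmapped characters pass through."""
    return "".join(S2T.get(ch, ch) for ch in sc)


def classify(sc: str, tc: str) -> str:
    """Return 'O' if TC is a plain code conversion of SC, otherwise 'L'."""
    return "O" if code_convert(sc) == tc else "L"


print(classify("巴西利亚", "巴西利亞"))  # O  (Brasilia: same word, different script)
print(classify("新西兰", "紐西蘭"))      # L  (New Zealand: different word)
```

Naive conversion turns 新西兰 into 新西蘭, which is not the form used in Taiwan (紐西蘭), so a purely code-converted database would get this entry wrong.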

This database is kept up to date and reflects recent changes and additions to proper names, such as the late-2005 change of the Chinese name for Seoul (서울) from 汉城 (Hànchéng) to 首尔 (Shǒu'ěr).

A unique feature of this database is that we distinguish between SC and TC readings. Thus the pinyin for SC 期荣 is qīróng, but for TC 期榮 it is qíróng. For details, see Taiwan and PRC Pinyin differences.
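As a rough illustration of how the two readings might be compared, the sketch below splits each reading on hyphens (the tone-number notation used in the SC_PIN and TC_PIN fields) and reports the syllables that differ. The function name `reading_differences` is hypothetical and is not part of the database.

```python
def reading_differences(sc_pin: str, tc_pin: str) -> list[tuple[int, str, str]]:
    """Return (position, SC syllable, TC syllable) for each differing syllable.

    Both readings are expected in hyphen-separated tone-number form,
    e.g. 'qi1-rong2'.
    """
    sc = sc_pin.split("-")
    tc = tc_pin.split("-")
    if len(sc) != len(tc):
        raise ValueError("readings have different syllable counts")
    return [(i, s, t) for i, (s, t) in enumerate(zip(sc, tc)) if s != t]


print(reading_differences("qi1-rong2", "qi2-rong2"))      # [(0, 'qi1', 'qi2')]
print(reading_differences("kai1-luo2", "kai1-luo2"))      # []
```

An empty result means the PRC and Taiwan readings agree for that entry.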



Data Fields

| Field Number | Field Name | Field Description |
| --- | --- | --- |
| 1 | ID | Unique identifying number ("N" prefix for "Name") |
| 2 | TYPE | Semantic classification code, such as "S" for surname, "G" for given name and "P" for place name |
| 3 | ENG | Name in English |
| 4 | JAP | Name in Japanese |
| 5 | SC | Name in Simplified Chinese (hanzi) |
| 6 | TC | Name in Traditional Chinese (hanzi) |
| 7 | KOR | Name in Korean (hangul) |
| 8 | LO | Lexemic or Orthographic (see Editorial Policy above) |
| 9 | YOMI | Japanese reading in hiragana |
| 10 | SC_PIN | SC reading in pinyin plus tone with hyphen separating syllables |
| 11 | TC_PIN | TC reading in pinyin plus tone with hyphen separating syllables |
| 12 | MOE | Korean transcription in former Ministry of Education romanization |
| 13 | ZHUYIN | TC reading in zhuyin fuhao (註音符號) with tone (not shown here) |
| 14 | LOCALE | Country code for origin of name (not shown here) |
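For readers who want to load such records, here is a minimal sketch of a record type for the twelve fields shown in the sample tables (ZHUYIN and LOCALE are omitted because they do not appear there). The class name `NameEntry`, the function `parse_row` and the tab-delimited layout are assumptions made for illustration; the actual delivery format of the database is not specified here.

```python
from dataclasses import dataclass, fields


@dataclass
class NameEntry:
    """One record with the twelve fields shown in the sample tables."""
    id: str
    type: str
    eng: str
    jap: str
    sc: str
    tc: str
    kor: str
    lo: str
    yomi: str
    sc_pin: str
    tc_pin: str
    moe: str


def parse_row(line: str, sep: str = "\t") -> NameEntry:
    """Parse one delimited line; the tab delimiter is an assumption."""
    parts = line.rstrip("\n").split(sep)
    expected = len(fields(NameEntry))
    if len(parts) != expected:
        raise ValueError(f"expected {expected} fields, got {len(parts)}")
    return NameEntry(*(p.strip() for p in parts))


# The Cairo entry from the place-name sample, joined with tabs.
row = "\t".join([
    "N014214", "P", "Cairo", "カイロ", "开罗", "開羅", "카이로",
    "O", "かいろ", "kai1-luo2", "kai1-luo2", "k'a-i-ro",
])
entry = parse_row(row)
print(entry.eng, entry.sc, entry.tc, entry.lo)  # Cairo 开罗 開羅 O
```

Empty fields (for example a missing Korean form) parse to empty strings, so callers should check for them before use.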




Table 1
CJKE Multilingual Database of Place Names
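| ID | TYPE | English | Japanese | SC | TC | Korean | LO | YOMI | SC_PIN | TC_PIN | MOE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N002657 | P | Aruba | アルーバ | 阿鲁巴 | 阿盧巴 | 아루바섬 | L | あるーば | a1-lu3-ba1 | a1-lu2-ba1 | a-ru-pa-so~m |
| N001635 | P | Azerbaijan | アゼルバイジャン | 阿塞拜疆 | 亞塞拜然 | 아제르바이잔 | L | あぜるばいじゃん | a1-sai1-bai4-jiang1 | ya4-se4-bai4-ran2 | a-che-ru~-pa-i-chan |
| N081006 | P | Brasilia | ブラジリア | 巴西利亚 | 巴西利亞 | 브라질리아 | O | ぶらじりあ | ba1-xi1-li4-ya4 | ba1-xi1-li4-ya4 | pu~-ra-chil-ri-a |
| N016658 | P | Caracas | カラカス | 加拉加斯 | 卡拉卡斯 | 카라카스 | L | からかす | jia1-la1-jia1-si1 | ka3-la1-ka3-si1 | k'a-ra-k'a-su~ |
| N014214 | P | Cairo | カイロ | 开罗 | 開羅 | 카이로 | O | かいろ | kai1-luo2 | kai1-luo2 | k'a-i-ro |
| N017653 | P | Canton | 広東 | 广东 | 廣東 | 광둥 | O | かんとん | guang3-dong1 | guang3-dong1 | kwang-tung |
| N058842 | SP | Chad | チャド | 乍得 | 查德 | 차드 | L | ちゃど | zha4-de2 | cha2-de2 | ch'a-tu~ |
| N047517 | GPu | Georgia | ジョージア | 乔治亚 | 喬治亞 | 조지아 | O | じょーじあ | qiao2-zhi4-ya4 | qiao2-zhi4-ya4 | cho-chi-a |
| N023778 | P | Guinea | ギニア | 几内亚 | 幾內亞 | 기니 | O | ぎにあ | ji3-nei4-ya4 | ji3-nei4-ya4 | ki-ni |
| N078960 | SP | Fukuoka | 福岡 | 福冈 | 福岡 | 후쿠오카 | O | ふくおか | fu2-gang1 | fu2-gang1 | hu-k'u-o-k'a |
| N000617 | P | Ireland | アイルランド | 爱尔兰 | 愛爾蘭 | 아일랜드 | O | あいるらんど | ai4-er3-lan2 | ai4-er3-lan2 | a-il-raen-tu~ |
| N068134 | P | New Zealand | ニュージーランド | 新西兰 | 紐西蘭 | 뉴질랜드 | L | にゅーじーらんど | xin1-xi1-lan2 | niu3-xi1-lan2 | nyu-chil-raen-tu~ |
| N36301 | P | Seoul | ソウル | 首尔 | 首爾 | 서울 | O | そうる | shou3-er3 | shou3-er3 | so~-ul |
| N054474 | P | Seoul | ソウル | 汉城 | 漢城 | 서울 | O | そうる | han4-cheng2 | han4-cheng2 | so~-ul |
| N062125 | P | Tel Aviv | テルアビブ | 特拉维夫 | 特拉維夫 | 텔아비브 | O | てるあびぶ | te4-la1-wei2-fu1 | te4-la1-wei2-fu1 | t'el-a-pi-pu~ |
| N004005 | P | Yemen | イエメン | 也门 | 葉門 | 예멘 | L | いえめん | ye3-men2 | ye4-men2 | ye-men |
| N100468 | P | Weishan | 微山 | 微山 | 微山 | 웨이산 | O | びざん | wei1-shan1 | wei2-shan1 | we-i-san |
| N080687 | P | Wuhan | 武漢 | 武汉 | 武漢 | 우한 | O | ぶかん | wu3-han4 | wu3-han4 | u-han |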
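The SC_PIN and TC_PIN columns in Table 1 write tones as digits after each syllable, whereas the running text uses tone marks (Xīnxīlán). The following sketch shows one way to convert between the two, using the usual placement rule: the mark goes on a or e if present, otherwise on the o of ou, otherwise on the last vowel. The function names are hypothetical, and treating `v` as ü is an assumption, since the sample does not show how ü is encoded.

```python
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syl: str) -> str:
    """Convert one syllable such as 'lan2' to 'lán'."""
    if not syl or not syl[-1].isdigit():
        return syl  # no tone digit: return unchanged
    # Assumption: 'v' stands for ü (not attested in the sample data).
    base, tone = syl[:-1].replace("v", "ü"), int(syl[-1])
    if tone not in (1, 2, 3, 4):
        return base  # neutral tone: no mark
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        vowels = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not vowels:
            return base
        idx = vowels[-1]  # otherwise the last vowel carries the mark
    return base[:idx] + TONE_MARKS[base[idx]][tone - 1] + base[idx + 1:]


def to_tone_marks(reading: str) -> str:
    """Convert a hyphen-separated reading such as 'xin1-xi1-lan2'."""
    out = []
    for i, syl in enumerate(reading.split("-")):
        marked = mark_syllable(syl)
        # Pinyin orthography: apostrophe before a non-initial syllable
        # that begins with a, o or e.
        if i > 0 and syl[:1] in ("a", "o", "e"):
            marked = "'" + marked
        out.append(marked)
    return "".join(out)


print(to_tone_marks("xin1-xi1-lan2"))  # xīnxīlán
print(to_tone_marks("niu3-xi1-lan2"))  # niǔxīlán
print(to_tone_marks("shou3-er3"))      # shǒu'ěr
```

The output is lower case; capitalizing proper names is left to the caller.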



Table 2
CJKA Multilingual Database of Place Names
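| English | Japanese | SC | LO | TC | Korean | Arabic |
| --- | --- | --- | --- | --- | --- | --- |
| Aruba | アルーバ | 阿鲁巴 | L | 阿盧巴 | 아루바섬 | أروبا |
| Brasilia | ブラジリア | 巴西利亚 | O | 巴西利亞 | 브라질리아 | برازيليا |
| Caracas | カラカス | 加拉加斯 | L | 卡拉卡斯 | 카라카스 | كراكاس |
| Cairo | カイロ | 开罗 | O | 開羅 | 카이로 | القاهرة |
| Chad | チャド | 乍得 | L | 查德 | 차드 | تشاد |
| Georgia | ジョージア | 乔治亚 | O | 喬治亞 | 조지아 | جورجيا |
| Ireland | アイルランド | 爱尔兰 | O | 愛爾蘭 | 아일랜드 | آيرلندا |
| Seoul | ソウル | 首尔 | O | 首爾 | 서울 | سيول |
| Seoul | ソウル | 汉城 | O | 漢城 | 서울 | سيول |
| Tel Aviv | テルアビブ | 特拉维夫 | O | 特拉維夫 | 텔아비브 | تل أبيب |
| Yemen | イエメン | 也门 | L | 葉門 | 예멘 | اليمن |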





Table 3
CJKE Multilingual Database of Personal Names
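| ID | TYPE | English | Japanese | SC | TC | Korean | LO | YOMI | SC_PIN | TC_PIN | MOE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N000034 | S | Abba | アッバ | 阿巴 | 亞伯 | 아바 | L | あっば | a1-ba1 | ya4-bo2 | a-pa |
| N000035 | S | Abbas | アッバース | 阿巴斯 | 阿巴斯 | 아바스 | O | あっばーす | a1-ba1-si1 | a1-ba1-si1 | a-pa-su~ |
| N002982 | G | Alberto | アルベルト | 阿尔韦托 | 阿爾韋托 | 알베르토 | O | あるべると | a1-er3-wei2-tuo1 | a1-er3-wei2-tuo1 | al-pe-ru~-t'o |
| N0386171 | G | Qirong | 期栄 | 期荣 | 期榮 | 치룽 | O | きえい | qi1-rong2 | qi2-rong2 | ch'i-rung |
| N000871 | F | Akiko | 暁子 | 晓子 | 曉子 | 아키코 | O | あきこ | xiao3-zi3 | xiao3-zi3 | a-k'i-k'o |
| N000872 | F | Akiko | 顕子 | 显子 | 顯子 | 아키코 | O | あきこ | xian3-zi3 | xian3-zi3 | a-k'i-k'o |
| N000873 | F | Akiko | 昭子 | 昭子 | 昭子 | 아키코 | O | あきこ | zhao1-zi3 | zhao1-zi3 | a-k'i-k'o |
| N001161 | FM | Akira |  |  |  | 아키라 | O | あきら | ming2 | ming2 | a-k'i-ra |
| C139707 | G | Deng |  |  |  |  | O | とう | deng1 | deng1 | to~ng |
| N000629 | S | Einstein | アインスタイン | 爱因斯坦 | 愛因斯坦 | 아인슈타인 | O | あいんすたいん | ai4-yin1-si1-tan3 | ai4-yin1-si1-tan3 | a-in-syu-t'a-in |
| N000134 | G | Ernest | アーネスト | 欧内斯特 | 歐尼斯特 | 어니스트 | L | あーねすと | ou1-nei4-si1-te4 | ou1-ni2-si1-te4 | o~-ni-su~-t'u~ |
| N026074 | S | Gregg | グレッグ | 格雷格 | 葛瑞格 | 그레그 | L | ぐれっぐ | ge2-lei2-ge2 | ge3-rui4-ge2 | ku~-re-ku~ |
| N026075 | G | Greg | グレッグ | 格雷格 | 葛瑞格 | 그레그 | L | ぐれっぐ | ge2-lei2-ge2 | ge3-rui4-ge2 | ku~-re-ku~ |
| N014143 | G | Haiyang | 海洋 | 海洋 | 海洋 | 하이양 | O | かいよう | hai3-yang2 | hai3-yang2 | ha-i-yang |
| N014144 | G | Huaiyang | 懐陽 | 怀阳 | 懷陽 | 화이양 | O | かいよう | huai2-yang2 | huai2-yang2 | hwa-i-yang |
| N046125 | G | Jack | ジャック | 杰克 | 傑克 |  | O | じゃっく | jie2-ke4 | jie2-ke4 | chaek |
| N046119 | G | Jackie | ジャッキー | 杰基 | 傑基 | 재키 | O | じゃっきー | jie2-ji1 | jie2-ji1 | chae-k'i |
| N028385 | S | Kennedy | ケネディ | 肯尼迪 | 甘迺迪 | 케네디 | L | けねでぃ | ken3-ni2-di2 | gan1-nai3-di2 | k'e-ne-ti |
| N014142 | P | Kaiyang | 開陽 | 开阳 | 開陽 | 카이양 | O | かいよう | kai1-yang2 | kai1-yang2 | k'a-i-yang |
| N067417 | SP | Nakajima | 中島 | 中岛 | 中島 | 나카지마 | O | なかじま | zhong1-dao3 | zhong1-dao3 | na-k'a-chi-ma |
| N006561 | G | William | ウィリアム | 威廉 | 威廉 | 빌리암 | O | うぃりあむ | wei1-lian2 | wei1-lian2 | pil-ri-am |
| C110425 | S | Zhang |  |  |  |  | O | ちょう | zhang1 | zhang1 | chang |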
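The multidirectional format means that any language column can serve as the lookup key. A minimal sketch of such a lookup over a few rows from Table 3 follows; the dictionary layout and the names `build_index` and `translate` are hypothetical and say nothing about how the database itself is stored or queried.

```python
from collections import defaultdict

# A few rows from Table 3, reduced to the name columns.
ROWS = [
    {"ID": "N000871", "ENG": "Akiko", "JAP": "暁子", "SC": "晓子", "TC": "曉子", "KOR": "아키코"},
    {"ID": "N000872", "ENG": "Akiko", "JAP": "顕子", "SC": "显子", "TC": "顯子", "KOR": "아키코"},
    {"ID": "N028385", "ENG": "Kennedy", "JAP": "ケネディ", "SC": "肯尼迪", "TC": "甘迺迪", "KOR": "케네디"},
]
LANG_FIELDS = ("ENG", "JAP", "SC", "TC", "KOR")


def build_index(rows):
    """Map every (field, surface form) pair to the rows containing it."""
    index = defaultdict(list)
    for row in rows:
        for field in LANG_FIELDS:
            value = row.get(field)
            if value:  # skip empty cells
                index[(field, value)].append(row)
    return index


def translate(index, src, value, dst):
    """Return the distinct dst-language forms for a src-language form."""
    forms = []
    for row in index.get((src, value), []):
        form = row.get(dst)
        if form and form not in forms:
            forms.append(form)
    return forms


index = build_index(ROWS)
print(translate(index, "TC", "甘迺迪", "SC"))   # ['肯尼迪']
print(translate(index, "ENG", "Akiko", "JAP"))  # ['暁子', '顕子']
```

Because one English name can correspond to several written forms (Akiko above), the lookup returns a list rather than a single value.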