Chinese Personal Name Variants
Covers over seven million entries
Chinese and non-Chinese names
Attributes such as name type and pinyin
Overview
Chinese personal names and their variants number in the millions, and identifying them is a difficult task in computational linguistics. To enhance named-entity recognition (NER) and other NLP applications, CJKI maintains a comprehensive dataset of over 1.6 million Chinese seed names (surnames and given names listed separately) and approximately 7.6 million romanized variants of these names.
CJKI’s Database of Chinese Personal Name Variants (CNV) includes Chinese given names and surnames in all the standard romanization systems, along with classification codes, frequency-of-occurrence statistics, and gender codes. There are many systems for romanizing Chinese, including Hanyu Pinyin, Wade-Giles, Yale, and Tongyong Pinyin (Taiwan), as well as various other popular systems and many that have fallen out of use.
CNV provides comprehensive coverage of the major Chinese romanization systems and their variants. We also maintain a Chinese Name Dialectal Variants database of dialectal romanized variants, covering the Cantonese, Hakka, and Hokkien dialects.
Chinese Personal Name Variants
Type | Chinese | Pinyin | Tongyong | Yale | Wade-Giles | Variants | English |
---|---|---|---|---|---|---|---|
G | 百欣 | bǎi xīn | Baisin | Baisyin | Paihsin | Paisin | Baixin |
S | 白 | bái | Bai | Bai | Pai | | Bai |
G | 北强 | běi qiáng | Beiciang | Beichyang | Peich’iang | Peits’iêng, Peichiang, Peitsiêng | Beiqiang |
G | 炳章 | bǐng zhāng | Bingjhang | Bingjang | Pingchang | | Bingzhang |
G | 宝程 | bǎo chéng | Baocheng | Baucheng | Paoch’eng | Paocheng | Baocheng |
G | 爱华 | ài huá | Aihua | Aihwa | Aihua | Ngaihua | Aihua |
G | 伯芝 | bó zhī | Bojhih | Bwojr | Pochih | | Bozhi |
G | 长流 | cháng liú | Changliou | Changlyou | Ch’angliu | | Changliu |
G | 邦达 | bāng dá | Bangda | Bangda | Pangta | | Bangda |
S | 曹 | cáo | Cao | Tsau | Ts’ao | | Cao |
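
In the sample rows above, the English column matches the Pinyin column with tone marks and spaces removed. The following Python sketch reproduces that relationship for illustration only. The function name `pinyin_to_plain` is hypothetical, and whether CNV actually derives the English form this way is an assumption.

```python
import unicodedata


def pinyin_to_plain(pinyin: str) -> str:
    """Strip tone marks and spaces from tone-marked Pinyin, then capitalize.

    Hypothetical helper for illustration; not part of CNV.
    """
    # NFD splits each accented vowel into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", pinyin)
    # Drop combining marks (category "Mn"), which carry the tones.
    # Caveat: this also drops the diaeresis of "ü", which real systems
    # usually handle separately.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return stripped.replace(" ", "").capitalize()


print(pinyin_to_plain("bǎi xīn"))
print(pinyin_to_plain("cáo"))
```

The other columns (Tongyong, Yale, Wade-Giles) follow their own spelling rules and cannot be obtained by stripping diacritics alone, which is one reason a precompiled variants table is useful.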

| Category | Variants | Type |
|---|---|---|
| LANGUAGE | 业经 | Simplified Chinese |
| | 業經 | Traditional Chinese |
| | 業経 | Japanese |
| | 예징 | Hanja reading |
| | 業經 | Korean Hanja |
| | Yejing | MOE (Korean Ministry of Education Romanization) |
| | Yejing | NRS (New Romanization System) |
| | Yejing | KLS (Korean Language Society Romanization) |
| | Yecing | ISO DPRK (Used in North Korea) |
| | Yejing | ISO ROK (Used in South Korea) |
Practical Applications
CNV is used for identifying, processing and normalizing names and their numerous romanized variants and is indispensable in a variety of applications, including:
Improving accuracy of machine translation
Segmentation and morphological analysis
Immigration control systems
Security applications
Query processing by search engines
Named-entity recognition
Database cleansing and normalization
Anti-money laundering (AML)
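
As a rough illustration of how a variants table can support name matching and normalization, the Python sketch below builds a lookup index from a few rows of the sample data and maps a romanized query back to candidate Chinese names. The record layout and the names `SAMPLE_ROWS`, `fold`, `build_index`, and `lookup` are hypothetical; they do not describe CNV's actual file format or any CJKI API.

```python
from collections import defaultdict

# A few rows copied from the sample table; the field names are assumptions.
SAMPLE_ROWS = [
    {"chinese": "北强", "type": "G",
     "romanizations": ["Beiqiang", "Beiciang", "Beichyang", "Peich’iang",
                       "Peits’iêng", "Peichiang", "Peitsiêng"]},
    {"chinese": "曹", "type": "S",
     "romanizations": ["Cao", "Tsau", "Ts’ao"]},
    {"chinese": "白", "type": "S",
     "romanizations": ["Bai", "Pai"]},
]


def fold(name: str) -> str:
    """Normalize a romanized name for matching: drop apostrophes, ignore case."""
    return name.replace("’", "").replace("'", "").casefold()


def build_index(rows):
    """Map each folded romanization to the set of Chinese names it may represent."""
    index = defaultdict(set)
    for row in rows:
        for romanized in row["romanizations"]:
            index[fold(romanized)].add(row["chinese"])
    return index


def lookup(index, query: str):
    """Return candidate Chinese names for a romanized query (sorted list)."""
    return sorted(index.get(fold(query), set()))


index = build_index(SAMPLE_ROWS)
print(lookup(index, "TS'AO"))
print(lookup(index, "Peichiang"))
```

The index stores a set per key because one romanized spelling can correspond to more than one Chinese name; a production system would typically rank such candidates, for example with frequency statistics.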
Related Resources

Chinese-English Personal Names
Chinese-English database of CJK and Western personal names

Chinese-Japanese Personal Names
Chinese-Japanese database of CJK and Western personal names

Japanese Personal Name Variants
Japanese personal names and their romanized variants