Chinese Personal Name Variants

Covers over seven million entries

Chinese and non-Chinese names

Attributes such as name type and pinyin

Overview

Chinese personal names and their variants number in the millions, and identifying them is a difficult task in computational linguistics. To support named-entity recognition (NER) and other NLP applications, CJKI maintains a dataset of over 1.6 million Chinese seed names (surnames and given names listed separately) and approximately 7.6 million romanized variants of those names.

CJKI’s Database of Chinese Personal Name Variants (CNV) includes Chinese given names and surnames in all the standard romanization systems, along with classification codes, frequency-of-occurrence statistics, and gender codes. Chinese can be romanized in many systems, including Hanyu Pinyin, Wade-Giles, Yale, and Tongyong Pinyin (Taiwan), as well as other popular systems and many that have fallen out of use.

CNV provides comprehensive coverage of the major Chinese romanization systems and their variants. We also maintain a Chinese Name Dialectal Variants database of dialectal romanized variants, covering Cantonese, Hakka, and Hokkien.
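The sample rows on this page pair a tone-marked Pinyin reading (bǎi xīn) with a plain form (Baixin). The sketch below shows one way such a plain form could be derived: decompose the string, drop the combining tone marks, join the syllables, and capitalize. Whether CNV builds its English column this way is an assumption, and `strip_tones` is a hypothetical helper name, not part of any CJKI product.

```python
import unicodedata


def strip_tones(pinyin: str) -> str:
    """Turn tone-marked Pinyin such as 'bǎi xīn' into a plain form such as 'Baixin'."""
    # NFD splits each accented vowel into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", pinyin)
    # Drop the combining marks (the tone diacritics).
    # Caveat: this also turns 'ü' into 'u', which a real system may not want.
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Join the syllables and capitalize the first letter only.
    return "".join(bare.split()).capitalize()


print(strip_tones("bǎi xīn"))
print(strip_tones("běi qiáng"))
```

For the readings in the sample data, the result agrees with the last column of the table (for example, `cáo` gives `Cao`).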

Chinese Personal Name Variants

Type	Chinese	Pinyin	Tongyong	Yale	Wade-Giles	Variants	English
G	百欣	bǎi xīn	Baisin	Baisyin	Paihsin	Paisin	Baixin
S	白	bái	Bai	Bai	Pai		Bai
G	北强	běi qiáng	Beiciang	Beichyang	Peich’iang	Peits’iêng, Peichiang, Peitsiêng	Beiqiang
G	炳章	bǐng zhāng	Bingjhang	Bingjang	Pingchang		Bingzhang
G	宝程	bǎo chéng	Baocheng	Baucheng	Paoch’eng	Paocheng	Baocheng
G	爱华	ài huá	Aihua	Aihwa	Aihua	Ngaihua	Aihua
G	伯芝	bó zhī	Bojhih	Bwojr	Pochih		Bozhi
G	长流	cháng liú	Changliou	Changlyou	Ch’angliu		Changliu
G	邦达	bāng dá	Bangda	Bangda	Pangta		Bangda
S	曹	cáo	Cao	Tsau	Ts’ao		Cao
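As an illustration of how rows like these might be consumed, the following sketch parses a few sample rows and builds a reverse index from each romanized form back to the Chinese name. It assumes every row carries all eight tab-separated fields, with an empty Variants field where no extra variants are listed. The names `parse_rows` and `build_index` are hypothetical, and the actual CNV delivery format may differ.

```python
from collections import defaultdict

HEADER = ["Type", "Chinese", "Pinyin", "Tongyong", "Yale",
          "Wade-Giles", "Variants", "English"]

# Three rows copied from the sample table (tab-separated, assumed layout).
SAMPLE = (
    "G\t百欣\tbǎi xīn\tBaisin\tBaisyin\tPaihsin\tPaisin\tBaixin\n"
    "S\t白\tbái\tBai\tBai\tPai\t\tBai\n"
    "G\t北强\tběi qiáng\tBeiciang\tBeichyang\tPeich’iang\t"
    "Peits’iêng, Peichiang, Peitsiêng\tBeiqiang\n"
)


def parse_rows(text):
    """Split tab-separated lines into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != len(HEADER):
            continue  # skip rows that do not match the assumed layout
        rows.append(dict(zip(HEADER, fields)))
    return rows


def build_index(rows):
    """Map every romanized form to the set of (type, Chinese name) pairs."""
    index = defaultdict(set)
    for row in rows:
        forms = [row[col] for col in ("Tongyong", "Yale", "Wade-Giles", "English")]
        # The Variants field holds zero or more comma-separated forms.
        forms += [v.strip() for v in row["Variants"].split(",")]
        for form in forms:
            if form:
                index[form].add((row["Type"], row["Chinese"]))
    return index


rows = parse_rows(SAMPLE)
index = build_index(rows)
print(index["Peichiang"])
```

The index stores sets because a single romanized form can correspond to more than one Chinese name in a full dataset, even though each form in this small sample maps to one.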

Category	Variants	Type
LANGUAGE	业经	Simplified Chinese
	業經	Traditional Chinese
	業経	Japanese
	예징	Hanja reading
	業經	Korean Hanja
	Yejing	MOE (Korean Ministry of Education Romanization)
	Yejing	NRS (New Romanization System)
	Yejing	KLS (Korean Language Society Romanization)
	Yecing	ISO DPRK (Used in North Korea)
	Yejing	ISO ROK (Used in South Korea)

Practical Applications

CNV is used for identifying, processing and normalizing names and their numerous romanized variants and is indispensable in a variety of applications, including:

Improving accuracy of machine translation

Segmentation and morphological analysis

Immigration control systems

Security applications

Query processing by search engines

Named-entity recognition

Database cleansing and normalization

Anti-money laundering (AML)
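For applications such as query processing and database cleansing, incoming names usually need to be reduced to a comparable form before they are looked up. The sketch below is one possible matching key, not a description of how CNV or any CJKI tool works: it folds case and removes diacritics, apostrophes, hyphens, and spaces. The name `match_key` is hypothetical.

```python
import unicodedata

# Characters dropped before comparison: straight and curly apostrophes,
# backtick, hyphen, and space.
IGNORED = set("'’‘`- ")


def match_key(name: str) -> str:
    """Reduce a romanized name to a loose, case-insensitive comparison key."""
    # NFD separates base letters from combining diacritics (e.g. ê -> e + mark).
    decomposed = unicodedata.normalize("NFD", name)
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in IGNORED
    ]
    return "".join(kept).casefold()


# Spellings that differ only in apostrophes, accents, or case share a key.
print(match_key("Peits’iêng") == match_key("peitsieng"))
```

This key favors recall over precision: in Wade-Giles the apostrophe distinguishes aspirated from unaspirated initials, so dropping it merges names that are strictly different. That trade-off suits tolerant search, but stricter matching would keep the apostrophe.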
