
The number of personal names and their variants in the world (e.g. over 100 ways to spell Jun'ichirō) is in the billions. The number of place names is also large, but they have fewer variants. Identifying names and their variants is a difficult computational linguistic task. Named Entity Recognition (NER) is a hot topic in computational linguistics and plays an important role in many IT applications.
To enhance this technology, CJKI maintains comprehensive databases of several million proper nouns, especially of Japanese names and Chinese names. This document describes some issues of Japanese name variation and provides samples of our extensive Japanese name variant resources. For reference, see also The Role of Lexical Resources in CJK NLP Applications and Named Entity Contextual Clues.
Identifying, processing and normalizing names and their numerous variants are useful in a variety of applications, including:
- Anti-money-laundering by financial institutions.
- Security applications such as identifying suspected name variants of terrorists and criminals.
- Query processing by search engines.
- Immigration control systems.
- Improving the accuracy of machine translation.
- Entity and information extraction.
- Segmentation and morphological analysis of CJK languages.
Large databases of name variants play a critical role in such applications. CJKI maintains databases of several million names and name variants in all major and most minor romanization systems for Chinese, Japanese and Korean, including the major Chinese dialects, as well as for Arabic and Spanish.
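As a rough illustration of how a variant database supports such applications, the sketch below builds a small reverse index from spellings to the names they may stand for. The spellings are copied from the tables in this document; the names `VARIANTS`, `fold`, `build_index` and `lookup` are hypothetical and do not describe the structure of CJKI's databases.

```python
import unicodedata

# Illustrative data only: spellings listed in this document for two names.
VARIANTS = {
    "佐藤": ["Sato", "Satō", "Satô", "Satoo", "Satou", "Satoh"],
    "大津": ["Ozu", "Ōzu", "Ôzu", "Ôdu", "Oozu", "Ouzu", "Ohzu",
             "Oodu", "Oudu", "Ohdu", "Odu", "Ōdu"],
}

def fold(name):
    """Apply NFC and case folding so that equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", name).casefold()

def build_index(variants):
    """Map each folded spelling to the set of names it can represent."""
    index = {}
    for canonical, spellings in variants.items():
        for spelling in spellings:
            index.setdefault(fold(spelling), set()).add(canonical)
    return index

INDEX = build_index(VARIANTS)

def lookup(name):
    """Return every known name that `name` may be a variant of."""
    return INDEX.get(fold(name), set())

print(lookup("SATOH"))
print(lookup("Oudu"))
```

The index maps to a set rather than a single value because one romanized spelling can correspond to more than one original name.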
Japanese personal names are extremely numerous. Our database contains about 400,000 unique given names and some 150,000 surnames (see japname.htm for details). If we add to this the numerous romanized variants, we get millions of names.
There are several well-established systems for romanizing Japanese, as well as various popular ones and even hybrid ones where the same word is written in a mixture of different systems. The principal systems and other systems that CJKI databases support are as follows. The examples are for the names 大津 (おおづ) and 山口 (やまぐち).
| System | Example | Description |
|---|---|---|
| Hepburn | Ōzu | The most widely used system, in several variations as shown in Table 2 below. |
| Kunrei | Ôzu | The official Japanese government system that has become an ISO standard (ISO 3602). |
| Nippon | Ôdu | The predecessor of the Kunrei system but still in use. |
| Waapuro | Ouzu | Based on popular input methods. |
| English | Ozu | The most common English spelling based on Hepburn with long vowels omitted. |
| Germanic | Jamagutschi | German based romanization. |
| Romance | Yamagutchi | Romance language based romanization. |
| Variants | Oozu Ohzu Oodu Oudu Ohdu Odu | Miscellaneous variants of each system, such as the different flavors of Hepburn. |
CJKI's name variant databases contain millions of entries that cover all the above systems, their variants, and hybrids. Below are samples of these variants and a brief description of why there is so much variation. There are other systems, like the JSL system devised by Eleanor Jorden, and the ALA-LC system, essentially identical to the Revised Hepburn system, which are not shown in the samples below.
The English-based Hepburn romanization system was devised by the Reverend James Curtis Hepburn and introduced in his Japanese–English dictionary published in 1867. It is the most widely used system and serves as the de facto standard. It is in common use even by the Japanese government in place of Kunrei romanization, the official standard.
Contrary to popular belief, the Hepburn system comes in many flavors. The standard, official Hepburn system is called Revised Hepburn, but some of the other variants shown below are just as popular, if not more so. Note that Revised Hepburn is sometimes confusingly referred to as Modified Hepburn, a less popular system used by some dictionaries and linguists.
| KANJI | YOMI | ENGLISH | Revised Hepburn | Modified Hepburn | Traditional Hepburn | Passport Hepburn | Waapuro Hepburn | Hepburn Variants |
|---|---|---|---|---|---|---|---|---|
| 佐藤 | さとう | Sato | Satō | Satoo | Satō | Satoh, Sato | Satou | Satô |
| 大津 | おおづ | Ozu | Ōzu | Oozu | Ōzu | Ohzu, Ozu | Oozu | Ôzu |
| 井生 | いおう | Io | Iō | Ioo | Iō | Ioh, Io | Iou | Iô |
| 伊大地 | いおおじ | Ioji | Iōji | Iōji | Iōji | Iohji, Ioji | Iooji | Iôji |
| 天満屋 | てんまんや | Tenman'ya, Tenmanya | Tenman'ya, Tenmanya, Tenman-ya | Tenman'ya, Ten̄man̄ya | Tenman'ya | Tenman'ya, Tenmanya, Tenman-ya | Tenmanya | |
| 山陰房 | さんいんぼう | San'inbo, Saninbo | San'inbō, Saninbō, San-inbō | San'inboo, Saninboo, San̄in̄boo | San'imbō, Sanimbō | San'imboh, Sanimboh, San-imboh, San'imbo, Sanimbo, San-imbo | Saninbou | San'inbô, Saninbô, San-inbô, San'imbô, Sanimbô, San-imbô |
| 本間 | ほんま | Honma | Honma | Honma, Hon̄ma | Homma | Homma | Honma | |
| 淳一郎 | じゅんいちろう | Jun'ichiro, Junichiro | Jun'ichirō, Junichirō, Jun-ichirō | Jun'ichiroo, Junichiroo, Jun̄ichiroo | Jun'ichirō, Junichirō | Jun'ichiroh, Junichiroh, Jun-ichiroh, Jun'ichiro, Junichiro, Jun-ichiro | Junichirou | Jun'ichirô, Junichirô, Jun-ichirô |
| 山口 | やまぐち | Yamaguchi | Yamaguchi | Yamaguchi | Yamaguchi | Yamaguchi | Yamaguchi | |
| 愛子 | あいこ | Aiko | Aiko | Aiko | Aiko | Aiko | Aiko | |
Table 3 below shows examples of romanized names in various official and unofficial systems. Only the standard, official version is shown under the column for each of the three principal systems: Hepburn, Kunrei and Nippon. Variants of each of these systems, such as the different flavors of Hepburn, are all collected in the Variants column. All hybrids are shown in the Hybrids column. The Waapuro system, which has many variants, is not shown in a separate column but is included in the Variants column.
As can be seen from Table 2 and Table 3, variation occurs for various reasons:
- The representation of long vowels, especially /o:/ written as ō, o, ô, oo, ou, or oh.
- Moraic /N/ (ん) sometimes written as m, rather than n, before /b/, /p/, and /m/.
- Apostrophes omitted or replaced by hyphens when /N/ is followed by a vowel or /y/.
- Multiple representations for certain consonants: e.g., じゃ is written as ja, zya or jya.
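The first three sources of variation can be sketched as simple rewrite rules applied to a Revised Hepburn spelling. This is a minimal illustration, not CJKI's method: the names `RULES`, `expand` and `variants` are hypothetical, the alternatives for uppercase Ū are assumed by analogy with Ō, and the consonant alternations in the last item are not covered.

```python
import re
import unicodedata
from itertools import product

# Alternatives per character, using spellings that appear in this document.
RULES = {
    "ō": ["ō", "ô", "oo", "ou", "oh", "o"],
    "Ō": ["Ō", "Ô", "Oo", "Ou", "Oh", "O"],
    "ū": ["ū", "û", "uu", "u"],
    "Ū": ["Ū", "Û", "Uu", "U"],   # assumed by analogy with Ō
    "'": ["'", "", "-"],          # apostrophe kept, omitted, or hyphenated
}
# Normalize keys to NFC so they match single precomposed characters.
RULES = {unicodedata.normalize("NFC", k): v for k, v in RULES.items()}

def expand(name):
    """Return all combinations of the per-character alternatives."""
    choices = [RULES.get(ch, [ch]) for ch in name]
    return {"".join(parts) for parts in product(*choices)}

def variants(hepburn):
    """Generate spelling variants of a Revised Hepburn name."""
    name = unicodedata.normalize("NFC", hepburn)
    # Moraic n written as m before b, p, m (e.g. Honma -> Homma).
    bases = {name, re.sub(r"n(?=[bpm])", "m", name)}
    return set().union(*(expand(base) for base in bases))

print(sorted(variants("Satō")))
print(len(variants("San'inbō")))
```

For San'inbō the three rules combine into 2 × 3 × 6 = 36 spellings, which shows how quickly the count grows when several factors occur in one name.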
In the real world, each of the various systems has variants, and names are often written by mixing multiple systems. For example, Juniti consists of Jun, the Modified Hepburn for じゅん, and iti, the Kunrei version of いち. We refer to such combinations as hybrids.
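One way to picture hybrids is to romanize a name segment by segment and let each segment pick its system independently. The sketch below does this for じゅん + いち; the segment table and the names `SEGMENTS`, `combine` and `classify` are assumptions made for illustration, and apostrophes are left out for brevity.

```python
from itertools import product

# Per-segment spellings for じゅん and いち, taken from the example above.
SEGMENTS = [
    {"Hepburn": "Jun", "Kunrei": "Zyun"},
    {"Hepburn": "ichi", "Kunrei": "iti"},
]

def combine(segments):
    """Yield (spelling, systems) for every per-segment choice of system."""
    for systems in product(*(seg.keys() for seg in segments)):
        spelling = "".join(seg[name] for seg, name in zip(segments, systems))
        yield spelling, systems

def classify(systems):
    """A spelling is a hybrid when its segments come from different systems."""
    return "HYBRID" if len(set(systems)) > 1 else systems[0]

for spelling, systems in combine(SEGMENTS):
    print(spelling, classify(systems))
```

With two segments and two systems this produces four spellings, two of which (Juniti and Zyunichi) are hybrids.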
| Kanji | Yomi | English | Hepburn | Kunrei | Nippon | Variants | Hybrids | Germanic | Latin |
|---|---|---|---|---|---|---|---|---|---|
| 佐藤 | さとう | Sato | Satō | Satô | Satô | Satoo, Satou, Satoh | |||
| 青塚 | あおづか | Aozuka | Aozuka | Aozuka | Aoduka | Aozuca | Aoduca | ||
| 愛子 | あいこ | Aiko | Aiko | Aiko | Aiko | Aico | |||
| 生越 | いくごし | Ikugoshi | Ikugoshi | Ikugosi | Ikugosi | Icugosi | Icugoshi | Ikugoschi | Ikugochi |
| 大津 | おおづ | Ozu | Ōzu | Ôzu | Ôdu | Oozu, Ouzu, Ohzu, Oodu, Oudu, Ohdu, Odu | Ōdu | ||
| 井生 | いおう | Io | Iō | Iô | Iô | Ioo, Iou, Ioh | |||
| 伊大地 | いおおじ | Ioji | Iōji | Iôzi | Iôzi | Iōzi, Ioozi, Iouzi, Iohzi, Iozi, Iooji, Iouji, Iohji, Iôji | |||
| 橋本 | はしもと | Hashimoto | Hashimoto | Hasimoto | Hasimoto | | | Haschimoto | Hachimoto |
| 青柳塘 | あおやぎとう | Aoyagito | Aoyagitō | Aoyagitô | Aoyagitô | Aoyagitoo, Aoyagitou, Aoyagitoh | | Aojagito | |
| 天満屋 | てんまんや | Tenman'ya | Tenman'ya | Tenman'ya | Tenman'ya | Temman'ya, Temmanya, Temman-ya, Tenmanya, Tenman-ya | | Tenman'ja, Tenmanja, Tenman-ja | |
| 靑山 | あおやま | Aoyama | Aoyama | Aoyama | Aoyama | | | Aojama | |
| 赤口 | あかぐち | Akaguchi | Akaguchi | Akaguti | Akaguti | Acaguci | Akaguci, Acaguchi, Acaguti | Akagutschi | Akagutchi |
| 山口 | やまぐち | Yamaguchi | Yamaguchi | Yamaguti | Yamaguti | Yamaguci | | Jamagutschi | Yamagutchi |
| 裕子 | ゆうこ | Yuko | Yūko | Yûko | Yûko | Yûco, Yūco, Yuuco, Yuco, Yuuko | | Juko | |
| 相越 | あいこし | Aikoshi | Aikoshi | Aikosi | Aikosi | Aicosi | Aicoshi | Aikoschi | Aikochi |
| 吉田 | よしだ | Yoshida | Yoshida | Yosida | Yosida | | | Joschida | Yochida |
| 正月 | しょうげつ | Shogetsu | Shōgetsu | Syôgetu | Syôgetu | Syōgetu, Syoogetu, Syougetu, Syohgetu, Syogetu, Shoogetsu, Shougetsu, Shohgetsu, Shôgetsu | Shōgetu, Shoogetu, Shougetu, Shohgetu, Shogetu, Shôgetu, Syôgetsu, Syōgetsu, Syoogetsu, Syougetsu, Syohgetsu, Syogetsu | Schogetsu | Chogetsu |
| 山陰房 | さんいんぼう | San'inbo | San'inbō | San'inbô | San'inbô | Saninbô, San-inbô, Saninbō, San-inbō, San'inboo, Saninboo, San-inboo, San'inbou, Saninbou, San-inbou, San'inboh, Saninboh, San-inboh, Saninbo, San-inbo, San'imbō, Sanimbō, San-imbō, San'imboo, Sanimboo, San-imboo, San'imbou, Sanimbou, San-imbou, San'imboh, Sanimboh, San-imboh, San'imbo, Sanimbo, San-imbo, San'imbô, Sanimbô, San-imbô | |||
| 四本松 | しほんまつ | Shihonmatsu | Shihonmatsu | Sihonmatu | Sihonmatu | Shihommatsu | Shihonmatu, Shihommatu, Sihonmatsu, Sihommatsu, Sihommatu | Schihonmatsu | Chihonmatsu |
| 佳子 | よしこ | Yoshiko | Yoshiko | Yosiko | Yosiko | Yosico | Yoshico | Joschiko | Yochiko |
As mentioned above, there are so many Japanese name variants because of such phenomena as the presence or absence of apostrophes and the multiple ways of expressing long vowels and certain consonants. If these factors happen to combine in the same name, the number of permutations explodes. Combined with the many hybrids, the number of variants for a single name can run into the hundreds.
An example of this is the first name of Japan's former prime minister Jun'ichirō Koizumi (in standard Revised Hepburn). The table below shows the 169 variants of Jun'ichirō, classified roughly by rank, many of which are high frequency and in widespread use. Although all of them are legitimate in the sense that they follow the rules of spelling variation for each system, or are hybrids of such variants, some may be rare or nonexistent at a particular time or in a particular corpus. But since such variants can potentially occur at different times in different corpora, they are included in our databases, which aim to provide a full solution to identifying name variants.
| LS_ID | Type | Romanization | Rank |
|---|---|---|---|
| LS038 | VARIANT | Junichiro | A |
| LS001 | ENG | Jun'ichiro | A |
| LS039 | VARIANT | Jun-ichiro | A |
| LS041 | VARIANT | Junichirô | A |
| LS093 | HYBRID | Juniciro | A |
| LS002 | HEPBURN | Jun'ichirō | A |
| LS059 | VARIANT | Jun-ichirō | A |
| LS033 | VARIANT | Junichirou | B |
| LS032 | VARIANT | Jun'ichirou | B |
| LS034 | VARIANT | Jun-ichirou | B |
| LS058 | VARIANT | Junichirō | B |
| LS147 | HYBRID | Jyunichiro | B |
| LS069 | HYBRID | Junitirou | B |
| LS075 | HYBRID | Junitiro | C |
| LS055 | VARIANT | Zyun'itiro | C |
| LS057 | VARIANT | Zyun-itiro | C |
| LS030 | VARIANT | Junichiroo | C |
| LS036 | VARIANT | Junichiroh | C |
| LS141 | HYBRID | Jyunichirou | C |
| LS035 | VARIANT | Jun'ichiroh | C |
| LS037 | VARIANT | Jun-ichiroh | C |
| LS046 | VARIANT | Zyun'itiroo | C |
| LS048 | VARIANT | Zyun-itiroo | C |
| LS146 | HYBRID | Jyun'ichiro | C |
| LS148 | HYBRID | Jyun-ichiro | C |
| LS144 | HYBRID | Jyunichiroh | C |
| LS029 | VARIANT | Jun'ichiroo | C |
| LS031 | VARIANT | Jun-ichiroo | C |
| LS159 | HYBRID | Jyunitirou | C |
| LS050 | VARIANT | Zyunitirou | C |
| LS165 | HYBRID | Jyunitiro | C |
| LS072 | HYBRID | Junitiroh | C |
| LS047 | VARIANT | Zyunitiroo | D |
| LS049 | VARIANT | Zyun'itirou | D |
| LS051 | VARIANT | Zyun-itirou | D |
| LS056 | VARIANT | Zyunitiro | D |
| LS111 | HYBRID | Zyunichiro | D |
| LS009 | LATIN | Junitchiro | D |
| LS092 | HYBRID | Jun'iciro | D |
| LS094 | HYBRID | Jun-iciro | D |
| LS043 | VARIANT | Zyun'itirō | D |
| LS045 | VARIANT | Zyun-itirō | D |
| LS110 | HYBRID | Zyun'ichiro | D |
| LS112 | HYBRID | Zyun-ichiro | D |
| LS143 | HYBRID | Jyun'ichiroh | D |
| LS145 | HYBRID | Jyun-ichiroh | D |
| LS162 | HYBRID | Jyunitiroh | D |
| LS104 | HYBRID | Zyun'ichirou | D |
| LS105 | HYBRID | Zyunichirou | D |
| LS106 | HYBRID | Zyun-ichirou | D |
| LS140 | HYBRID | Jyun'ichirou | D |
| LS142 | HYBRID | Jyun-ichirou | D |
| LS053 | VARIANT | Zyunitiroh | D |
| LS074 | HYBRID | Jun'itiro | D |
| LS076 | HYBRID | Jun-itiro | D |
| LS003 | KUNREI | Zyun'itirô | E |
| LS004 | NIPPON | Zyun'itirô | E |
| LS005 | GERMANIC | Jun'itschiro | E |
| LS006 | GERMANIC | Junitschiro | E |
| LS007 | GERMANIC | Jun-itschiro | E |
| LS008 | LATIN | Jun'itchiro | E |
| LS010 | LATIN | Jun-itchiro | E |
| LS011 | VARIANT | Jyun'icirô | E |
| LS012 | VARIANT | Jyunicirô | E |
| LS013 | VARIANT | Jyun-icirô | E |
| LS014 | VARIANT | Jyun'icirō | E |
| LS015 | VARIANT | Jyunicirō | E |
| LS016 | VARIANT | Jyun-icirō | E |
| LS017 | VARIANT | Jyun'iciroo | E |
| LS018 | VARIANT | Jyuniciroo | E |
| LS019 | VARIANT | Jyun-iciroo | E |
| LS020 | VARIANT | Jyun'icirou | E |
| LS021 | VARIANT | Jyunicirou | E |
| LS022 | VARIANT | Jyun-icirou | E |
| LS023 | VARIANT | Jyun'iciroh | E |
| LS024 | VARIANT | Jyuniciroh | E |
| LS025 | VARIANT | Jyun-iciroh | E |
| LS026 | VARIANT | Jyun'iciro | E |
| LS027 | VARIANT | Jyuniciro | E |
| LS028 | VARIANT | Jyun-iciro | E |
| LS040 | VARIANT | Jun'ichirô | E |
| LS042 | VARIANT | Jun-ichirô | E |
| LS044 | VARIANT | Zyunitirō | E |
| LS052 | VARIANT | Zyun'itiroh | E |
| LS054 | VARIANT | Zyun-itiroh | E |
| LS060 | VARIANT | Zyunitirô | E |
| LS061 | VARIANT | Zyun-itirô | E |
| LS062 | HYBRID | Jun'itirō | E |
| LS063 | HYBRID | Junitirō | E |
| LS064 | HYBRID | Jun-itirō | E |
| LS065 | HYBRID | Jun'itiroo | E |
| LS066 | HYBRID | Junitiroo | E |
| LS067 | HYBRID | Jun-itiroo | E |
| LS068 | HYBRID | Jun'itirou | E |
| LS070 | HYBRID | Jun-itirou | E |
| LS071 | HYBRID | Jun'itiroh | E |
| LS073 | HYBRID | Jun-itiroh | E |
| LS077 | HYBRID | Jun'itirô | E |
| LS078 | HYBRID | Junitirô | E |
| LS079 | HYBRID | Jun-itirô | E |
| LS080 | HYBRID | Jun'icirō | E |
| LS081 | HYBRID | Junicirō | E |
| LS082 | HYBRID | Jun-icirō | E |
| LS083 | HYBRID | Jun'iciroo | E |
| LS084 | HYBRID | Juniciroo | E |
| LS085 | HYBRID | Jun-iciroo | E |
| LS086 | HYBRID | Jun'icirou | E |
| LS087 | HYBRID | Junicirou | E |
| LS088 | HYBRID | Jun-icirou | E |
| LS089 | HYBRID | Jun'iciroh | E |
| LS090 | HYBRID | Juniciroh | E |
| LS091 | HYBRID | Jun-iciroh | E |
| LS095 | HYBRID | Jun'icirô | E |
| LS096 | HYBRID | Junicirô | E |
| LS097 | HYBRID | Jun-icirô | E |
| LS098 | HYBRID | Zyun'ichirō | E |
| LS099 | HYBRID | Zyunichirō | E |
| LS100 | HYBRID | Zyun-ichirō | E |
| LS101 | HYBRID | Zyun'ichiroo | E |
| LS102 | HYBRID | Zyunichiroo | E |
| LS103 | HYBRID | Zyun-ichiroo | E |
| LS107 | HYBRID | Zyun'ichiroh | E |
| LS108 | HYBRID | Zyunichiroh | E |
| LS109 | HYBRID | Zyun-ichiroh | E |
| LS113 | HYBRID | Zyun'ichirô | E |
| LS114 | HYBRID | Zyunichirô | E |
| LS115 | HYBRID | Zyun-ichirô | E |
| LS116 | HYBRID | Zyun'icirō | E |
| LS117 | HYBRID | Zyunicirō | E |
| LS118 | HYBRID | Zyun-icirō | E |
| LS119 | HYBRID | Zyun'iciroo | E |
| LS120 | HYBRID | Zyuniciroo | E |
| LS121 | HYBRID | Zyun-iciroo | E |
| LS122 | HYBRID | Zyun'icirou | E |
| LS123 | HYBRID | Zyunicirou | E |
| LS124 | HYBRID | Zyun-icirou | E |
| LS125 | HYBRID | Zyun'iciroh | E |
| LS126 | HYBRID | Zyuniciroh | E |
| LS127 | HYBRID | Zyun-iciroh | E |
| LS128 | HYBRID | Zyun'iciro | E |
| LS129 | HYBRID | Zyuniciro | E |
| LS130 | HYBRID | Zyun-iciro | E |
| LS131 | HYBRID | Zyun'icirô | E |
| LS132 | HYBRID | Zyunicirô | E |
| LS133 | HYBRID | Zyun-icirô | E |
| LS134 | HYBRID | Jyun'ichirō | E |
| LS135 | HYBRID | Jyunichirō | E |
| LS136 | HYBRID | Jyun-ichirō | E |
| LS137 | HYBRID | Jyun'ichiroo | E |
| LS138 | HYBRID | Jyunichiroo | E |
| LS139 | HYBRID | Jyun-ichiroo | E |
| LS149 | HYBRID | Jyun'ichirô | E |
| LS150 | HYBRID | Jyunichirô | E |
| LS151 | HYBRID | Jyun-ichirô | E |
| LS152 | HYBRID | Jyun'itirō | E |
| LS153 | HYBRID | Jyunitirō | E |
| LS154 | HYBRID | Jyun-itirō | E |
| LS155 | HYBRID | Jyun'itiroo | E |
| LS156 | HYBRID | Jyunitiroo | E |
| LS157 | HYBRID | Jyun-itiroo | E |
| LS158 | HYBRID | Jyun'itirou | E |
| LS160 | HYBRID | Jyun-itirou | E |
| LS161 | HYBRID | Jyun'itiroh | E |
| LS163 | HYBRID | Jyun-itiroh | E |
| LS164 | HYBRID | Jyun'itiro | E |
| LS166 | HYBRID | Jyun-itiro | E |
| LS167 | HYBRID | Jyun'itirô | E |
| LS168 | HYBRID | Jyunitirô | E |
| LS169 | HYBRID | Jyun-itirô | E |
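The structure of the table above can be approximated as a Cartesian product of independent choices. The component lists below are read off the table rows and are an assumption about the pattern, not CJKI's actual generation rules; the names `ONSETS`, `SEPARATORS`, `MIDDLES` and `ENDINGS` are hypothetical.

```python
from itertools import product

ONSETS = ["Jun", "Zyun", "Jyun"]                          # じゅん
SEPARATORS = ["'", "", "-"]                               # after moraic n
MIDDLES = ["ichi", "iti", "ici"]                          # いち
ENDINGS = ["rō", "rô", "roo", "rou", "roh", "ro"]         # ろう

# Every combination of the four independent choices.
core = {"".join(parts)
        for parts in product(ONSETS, SEPARATORS, MIDDLES, ENDINGS)}

# The GERMANIC and LATIN rows appear only with "Jun" and a short final "ro".
extra = {"Jun" + sep + mid + "ro"
         for sep in SEPARATORS
         for mid in ("itschi", "itchi")}

all_spellings = core | extra
print(len(core), len(extra), len(all_spellings))
```

The product gives 3 × 3 × 3 × 6 = 162 spellings, and the Germanic and Latin forms add 6 more for 168 distinct strings. This appears to line up with the 169 rows of the table, in which LS003 and LS004 list the same spelling once as KUNREI and once as NIPPON.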