JAPANESE PHONETIC DATABASE (JPD)

日本語音韻データベース

©2006-2012 The CJK Dictionary Institute, Inc.


Overview and Coverage

The CJK Dictionary Institute (CJKI) is engaged in research on Chinese, Japanese and Korean (CJK) phonetics and phonology and in the development of phonetic/phonological databases such as the Japanese Phonetic Database (JPD). For each entry, the JPD provides accent codes and IPA phonetic transcriptions that accurately indicate how Japanese names and words are pronounced in actual speech.

After years of development work, in December 2007 CJKI announced the completion of the first phase of the project by delivering JPD to The National Language Research Institute, the largest and most respected government organization in Japan that conducts scientific research on the Japanese language. To our knowledge, this is the first database of its kind which provides both IPA and accent codes covering Japanese vocabulary.

Developed by a team of experienced Japanese editors, this database can be used for such applications as:

  1. Speech synthesis systems (text-to-speech generation or TTS).
  2. Pedagogical research to help in the acquisition of Japanese as a foreign language.
  3. Research and development of Japanese speech technology in general.

Attributes


Each entry is accompanied by various attributes, such as:

  1. Rich phonological information, such as phonemic and phonetic transcriptions and accent codes.
  2. Grammatical information such as part-of-speech codes.
  3. Semantic classification codes such as the TYPE of proper noun.

Editorial Policy


The JPD was compiled by experienced editors with in-depth knowledge of Japanese phonology and phonetics to meet the requirements of speech technology, especially the generation of natural speech (TTS). The phonetic transcriptions can be provided in various formats other than IPA, at different levels of precision, and fine-tuned to the needs of specific applications. To properly understand the data samples and field descriptions below, it is necessary to refer to The Challenges of Japanese Speech Technology, which describes the linguistic details.



Description of Fields

1 HEADWORD

Orthographic representation of Japanese word (kana/kanji)

2 POS

Part-of-speech code. Other grammatical/morphological attributes are available (see Japanese Lexical Database).

3 TYPE

A subclassification that identifies the semantic properties of the headword or gives supplementary information such as grammatical attributes, especially for proper nouns. See cpostype.htm for TYPE code definitions.

4 ACCENT

This shows the pitch accent for the standard Tokyo dialect, compiled by experienced native linguists. Since pitch accent for names is not standardized (it can vary with the individual), the concept of "100% accuracy" does not apply to names. Sometimes, multiple accents for the same name may be given.

The number in the Accent field represents the accented mora, that is, the point after which the pitch begins to fall. For example, "1" means that the first mora is accented or high pitched, and that the pitch falls starting from the second mora. Accentless words are represented by "0".
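
To make the accent code concrete, here is a minimal sketch that expands an Accent value into a per-mora high/low pattern. The function names are hypothetical and not part of the JPD. The sketch also assumes the usual description of Tokyo pitch accent, in which the first mora is low unless it carries the accent; that rule is not stated in this document.

```python
# Small kana that attach to the preceding kana to form one mora (e.g. ジュ, キョ).
SMALL_KANA = set("ャュョァィゥェォヮ")


def split_morae(katakana: str) -> list[str]:
    """Split a katakana reading into morae; ッ, ン and ー each count as one mora."""
    morae: list[str] = []
    for ch in katakana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae


def pitch_pattern(katakana: str, accent: int) -> str:
    """Return one 'H' or 'L' per mora for a given Accent field value."""
    n = len(split_morae(katakana))
    if n == 0 or not 0 <= accent <= n:
        raise ValueError("accent must be between 0 and the number of morae")
    if accent == 0:
        # Accentless: no fall inside the word.
        # Assumption: the first mora is low and the rest are high.
        return "L" + "H" * (n - 1)
    if accent == 1:
        # First mora high, pitch falls from the second mora (as described above).
        return "H" + "L" * (n - 1)
    # Assumption: low first mora, high up to the accented mora, low afterwards.
    return "L" + "H" * (accent - 1) + "L" * (n - accent)


if __name__ == "__main__":
    for reading, accent in [("ホンダナ", 1), ("ビヨウイン", 2), ("アブナイ", 0)]:
        print(reading, accent, pitch_pattern(reading, accent))
```

Within the word itself, an accentless word and a word accented on its final mora produce the same pattern under this sketch; the difference only appears on a following particle, which this function does not model.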


5 KANA READING

The reading of the headword in katakana.

6 KANA PHONEMIC

This field disambiguates the kana ambiguities in the Kana Reading field, and serves as a phonemic representation. Though it does contain minimal phonetic information, such as /g/ nasalization and devoicing, it does not accurately indicate allophonic variants, which is the function of the Phonetic field.

['] Follows the accented mora, marking the point from which the pitch begins to fall.
[&] The preceding /g/ is optionally nasalized, becoming [ŋ], a phonetically distinct allophone of /g/, e.g. "アオカゲ&" represents [aokaŋe].
[^] The preceding vowel is devoiced. This is common for /i/ and /u/ between voiceless consonants, such as in "アイノス^ケ", where "ス^" represents [sɯ̥], not [sɯ].
[~] The preceding vowel is elongated. For example, "ウンドウ~" (運動) represents /undoo/ [ɯndoː], whereas "イノウエ" (井上) represents /inoue/ [inoɯe], not /inooe/ [inoːe].
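
The four markers can be read mechanically. The following is a minimal sketch of a parser for Kana Phonemic strings; the `Mora` class and function names are hypothetical illustrations, not a CJKI API. It accepts both the straight apostrophe used in the data sample and typographic quotes as the accent marker.

```python
from dataclasses import dataclass

# Marker character -> attribute set on the preceding mora.
MARKERS = {
    "'": "accented", "\u2018": "accented", "\u2019": "accented",
    "&": "nasalized_g",
    "^": "devoiced",
    "~": "long",
}
# Small kana that attach to the preceding kana to form one mora.
SMALL_KANA = set("ャュョァィゥェォヮ")


@dataclass
class Mora:
    kana: str
    accented: bool = False     # ['] pitch falls after this mora
    nasalized_g: bool = False  # [&] /g/ optionally realized as [ŋ]
    devoiced: bool = False     # [^] vowel is devoiced
    long: bool = False         # [~] marks vowel elongation


def parse_kana_phonemic(text: str) -> list[Mora]:
    """Parse a Kana Phonemic string into morae with marker flags."""
    morae: list[Mora] = []
    for ch in text.strip():
        if ch in MARKERS:
            if not morae:
                raise ValueError("marker with no preceding kana")
            setattr(morae[-1], MARKERS[ch], True)
        elif ch in SMALL_KANA and morae:
            morae[-1].kana += ch
        else:
            morae.append(Mora(ch))
    return morae


def accent_position(morae: list[Mora]) -> int:
    """Return the 1-based index of the mora marked ['], or 0 if none is marked."""
    for index, mora in enumerate(morae, start=1):
        if mora.accented:
            return index
    return 0


if __name__ == "__main__":
    for mora in parse_kana_phonemic("セキ'グ&チ"):
        print(mora)
```

A result of 0 from `accent_position` only means that no accent marker was found in the string; some rows of the data sample carry a nonzero Accent value without a marker in the Kana Phonemic field, so the Accent field remains the authoritative source.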

7 PHONEMIC

This field fully disambiguates the kana ambiguities in the Kana Reading field, and thus serves as a precise phonemic representation. It does not indicate allophonic variants, which is the function of the Phonetic field.

8 PHONETIC

The phonemic transcription in the Phonemic field is insufficient for accurate voice synthesis, whose goal is natural-sounding Japanese. This field, the most important feature of this database, provides what can be roughly considered a broad phonetic transcription in IPA (International Phonetic Alphabet). In practice, it is considerably more accurate than traditional broad transcriptions and reflects allophonic variations fairly close to actual pronunciation. (Narrow transcriptions, which are not necessarily desirable, could be made available.)
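
As an illustration of why the phonemic form alone is not enough, consider the moraic nasal /N/, which the data sample transcribes differently depending on what follows (for example hoNdana as [hondana], hoNbako as [hombako], eNkai as [eŋkai], hoN as [hoɴ]). The sketch below encodes that pattern as a simple lookup. It is a rough approximation under stated assumptions, not the rule set used to compile the JPD: the function names are hypothetical, the cases for /p/, /m/, /t/ and plain /n/ are extrapolated from general place assimilation rather than taken from the sample, and the sample itself lists more than one variant for some words.

```python
# Label used when /N/ surfaces as nasalization of the neighbouring vowel
# (e.g. hoNi, deNwa in the data sample) rather than as a nasal consonant.
NASALIZED_VOWEL = "nasalized vowel"


def moraic_nasal_allophone(following: str) -> str:
    """Return a typical realization of /N/ given the phonemes that follow it."""
    if not following:
        return "ɴ"              # word-final, as in hoN
    if following.startswith("nj"):
        return "ɲ"              # as in koNnjaku
    first = following[0]
    if first in "pbm":
        return "m"              # as in hoNbako (p, m are assumptions)
    if first in "tdzn":
        return "n"              # as in hoNdana, giNza (t, n are assumptions)
    if first in "kg":
        return "ŋ"              # as in eNkai, aNgai
    return NASALIZED_VOWEL      # before vowels, glides and fricatives


def nasal_allophones(phonemic: str) -> list[str]:
    """List the predicted realization of every /N/ in a Phonemic field value."""
    return [
        moraic_nasal_allophone(phonemic[i + 1:])
        for i, ch in enumerate(phonemic)
        if ch == "N"
    ]


if __name__ == "__main__":
    for word in ["hoNdana", "hoNbako", "eNkai", "siNbuN", "hoNi"]:
        print(word, nasal_allophones(word))
```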


9 REMARKS

Comments on phonetic/phonological processes.

10 RANK

A rank based on frequency of occurrence in corpora and lexical sources can be provided (not shown here).



Data Sample

Below is a sample of our Japanese Phonetic Database (JPD). The most important features are the IPA transcription in the Phonetic field, which provides a precise representation of many allophonic changes such as devoicing and nasalization, and the Accent field, which gives the accent pattern code. Though allophonic variations occur mostly unconsciously in native speech, they have a marked effect on the naturalness of synthesized speech. (The Remarks field is currently being translated into English.)


Sample of CJKI's Japanese Phonetic Database (JPD)
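| Headword | POS | TYPE | Accent | Kana Reading | Kana Phonemic | Phonemic | Phonetic | Remarks |
| | NC | - | 3 | カガミ | カガ&ミ | kagami | kaŋami | voiced velar nasal |
| | NC | - | 3 | カガミ | カガミ | kagami | kaɡami | voiced velar stop |
| | NC | - | 3 | カガミ | カガ&ミ | kagami | kaɣami | voiced velar fricative |
| 危ない | AJ | - | 0 | アブナイ | アブナイ | abunai | abɯnai | voiced bilabial plosive |
| 危ない | AJ | - | 0 | アブナイ | アブナイ | abunai | aβɯ̥nai | voiced bilabial fricative |
| 飾り | NC | - | 0 | カザリ | カザリ | kazari | kazaɾi | voiced alveolopalatal fricative |
| ザリガニ | NC | - | 0 | ザリガニ | ザリガ&ニ | zarigani | dzaɾiɡaɲi | alveolopalatal affricate, palatal nasal |
| 新聞 | NC | - | 0 | シンブン | シンブン | siNbuN | ɕimbɯɴ | |
| 自分 | NC | - | 0 | ジブン | ジブン | zibuN | dʑibɯɴ | weakening of voiced alveolopalatal plosive (fricativized) |
| 比較 | VN | - | 0 | ヒカク | ヒ^カク | hikaku | hi̥kakɯ | devoicing of [hi] |
| 比較 | VN | - | 0 | ヒカク | ヒカク | hikaku | hikakɯ | no devoiced vowel |
| 続く | V5 | - | 0 | ツヅク | ツ^ヅク | tuzuku | tsɯ̥zɯkɯ | devoicing of [tsɯ] |
| 続く | V5 | - | 0 | ツヅク | ツヅク | tuzuku | tsɯzɯkɯ | no devoiced vowel |
| | NC | - | 2 | ハジ | ハジ | hazi | haʑi | voiced alveolopalatal fricative |
| | NC | - | 2 | ハジ | ハジ | hazi | hadʑi | voiced alveolopalatal affricate |
| | NC | - | 0 | ハチ | ハチ | hati | hatɕi | |
| | NC | - | 2 | ハチ | ハチ | hati | hatɕi | |
| | NC | - | 1 | | | uʔ | ɯʔ | glottal stop |
| アッ | I | - | 1 | アッ | アッ | aQ | | glottal stop |
| 運動 | VN | - | 0 | ウンドウ | ウンドウ~ | uNdoo | ɯndoː | |
| 病院 | NC | - | 0 | ビョウイン | ビョウ~イン | bjooiN | bʲoːiɴ | voiced palatal |
| 美容院 | NC | - | 2 | ビヨウイン | ビヨ'ウ~イン | bijooiN | bijoːiɴ | |
| 宴会 | NC | - | 0 | エンカイ | エンカイ | eNkai | eŋkai | |
| 蒟蒻 | NC | - | 0 | コンニャク | コンニャク | koNnjaku | koɲɲakɯ | alveolar nasal, voiced palatal nasal |
| | NC | - | 1 | ホン | ホ'ン | hoN | hoɴ | uvular nasalized consonant |
| | NC | - | 1 | ホン | ホ'ン | hoN | hõ | close front nasalized vowel |
| 本案 | NC | - | 1 | ホンアン | ホ'ンアン | hoNaN | hoãɴ | close back nasalized vowel (slight nasalization before /N/) |
| 本意 | NC | - | 1 | ホンイ | ホ'ンイ | hoNi | hõĩ | close front nasalized vowel |
| 本会議 | NC | - | 3 | ホンカイギ | ホンカイギ | hoNkaigi | hoŋkaiŋi | velar nasal |
| 本棚 | NC | - | 1 | ホンダナ | ホ'ンダナ | hoNdana | hondana | alveolar nasal |
| 本箱 | NC | - | 1 | ホンバコ | ホ'ンバコ | hoNbako | hombako | bilabial nasal |
| 電話 | NC | - | 0 | デンワ | デンワ | deNwa | deẽwa | |
| 案外 | D | - | 1 | アンガイ | ア'ンガ&イ | aNgai | aŋŋai | voiced velar nasal |
| 案外 | D | - | 1 | アンガイ | ア'ンガイ | aNgai | aŋɡai | voiced velar stop |
| | NC | - | 2 | タニ | タニ | tani | taɲi | voiced palatal nasal |
| 単位 | NC | - | 1 | タンイ | タ'ンイ | taNi | taĩ | |
| | NC | - | 0 | イカリ | イカリ | ikari | ikaɾi | |
| 怒り | NC | - | 3 | イカリ | イカリ | ikari | ikaɾi | |
| 休み | NC | - | 3 | ヤスミ | ヤスミ | jasumi | jasɯmi | |
| これから | NC | - | 4 | コレカラ | コレカラ | korekara | koɾekaɾa | |
| 無銭飲食 | NC | - | 4 | ムセンインショク | ムセンインショク^ | museNiNsjoku | mɯseĩɕ̃oku̥ | devoicing of [ku] |
| 井口 | NP | S | 1 | イグチ | イ'グ&チ | iguti | iŋɯtɕi | |
| 磯貝 | NP | S | 2 | イソガイ | イソ'ガ&イ | isogai | isoŋai | |
| 伊藤 | NP | S | 0 | イトウ | イトウ~ | itoo | itoː | |
| 稲垣 | NP | S | 2 | イナガキ | イナ'ガ&キ | inagaki | inaŋaki | |
| 稲田 | NP | S | 0 | イナダ | イナダ | inada | inada | |
| 井上 | NP | S | 0 | イノウエ | イノウエ | inoue | inoɯe | "ノウ" is pronounced [oɯ], not as a long vowel |
| 栄次郎 | NP | M | 1 | エイジロウ | エ'イ~ジロウ~ | eiziroo | eːʑiɾoː | voiced alveolopalatal fricative |
| 岸和田 | NP | PS | 0 | キシワダ | キ^シワダ | kisiwada | ki̥ɕiwada | devoicing of [ki] |
| 銀座 | NP | P | 0 | ギンザ | ギンザ | giNza | ɡindza | voiced alveolopalatal affricate |
| 慎一 | NP | M | 0 | シンイチ | シンイチ | siNiti | ɕiĩtɕi | vowel nasalization of /N/ |
| 純子 | NP | F | 1 | ジュンコ | ジュ'ンコ | zjuNko | dʑɯŋko | weakening of voiced alveolopalatal plosive (fricativized) |
| 関口 | NP | S | 2 | セキグチ | セキ'グ&チ | sekiguti | sekiŋɯtɕi | |
| 大地 | NP | M | 1 | ダイチ | ダ'イチ | daiti | daitɕi̥ | devoicing of [tɕi] |
| 東京 | NP | P | 0 | トウキョウ | トウ~キョウ~ | tookjoo | toːkʲoː | devoiced palatal |
| 福生 | NP | P | 0 | フッサ | フッサ | huQsa | ɸɯ̥s̚sa | first [s] is unreleased, devoicing of [ɸɯ] |
| 一本 | NC | - | 1 | イッポン | イッポン | iQpoN | ip̚poɴ | first [p] is unreleased |
| 丸井 | NP | S | 0 | マルイ | マルイ | marui | maɾɯi | |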
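
Because the sample lists several rows for the same word when more than one pronunciation is acceptable, an application will usually want to group rows by word before choosing a variant. The sketch below shows one possible in-memory representation. The `JpdEntry` class, its attribute names and the grouping key are assumptions made for illustration and do not describe the actual JPD delivery format; the split of the POS, TYPE and Accent columns follows one reading of the sample rows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JpdEntry:
    """One row of the data sample (hypothetical record layout)."""
    headword: str
    pos: str
    type_code: str
    accent: int
    kana_reading: str
    kana_phonemic: str
    phonemic: str
    phonetic: str
    remarks: str = ""


# A few rows copied from the data sample above.
SAMPLE = [
    JpdEntry("比較", "VN", "-", 0, "ヒカク", "ヒ^カク", "hikaku", "hi̥kakɯ", "devoicing of [hi]"),
    JpdEntry("比較", "VN", "-", 0, "ヒカク", "ヒカク", "hikaku", "hikakɯ", "no devoiced vowel"),
    JpdEntry("案外", "D", "-", 1, "アンガイ", "ア'ンガ&イ", "aNgai", "aŋŋai", "voiced velar nasal"),
    JpdEntry("案外", "D", "-", 1, "アンガイ", "ア'ンガイ", "aNgai", "aŋɡai", "voiced velar stop"),
    JpdEntry("本棚", "NC", "-", 1, "ホンダナ", "ホ'ンダナ", "hoNdana", "hondana", "alveolar nasal"),
]


def phonetic_variants(entries: list[JpdEntry]) -> dict[tuple[str, str, int], list[str]]:
    """Group Phonetic values by (headword, kana reading, accent), keeping row order."""
    variants: dict[tuple[str, str, int], list[str]] = defaultdict(list)
    for entry in entries:
        key = (entry.headword, entry.kana_reading, entry.accent)
        variants[key].append(entry.phonetic)
    return dict(variants)


if __name__ == "__main__":
    for key, forms in phonetic_variants(SAMPLE).items():
        print(key, forms)
```

Including the accent in the key keeps words that share a reading but differ in accent, such as the two ハチ rows in the sample, in separate groups.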