| Overview and Coverage | The CJK Dictionary Institute (CJKI) is engaged in research on Chinese, Japanese and Korean (CJK) phonetics and phonology, and in the development of a phonetic/phonological database called the Japanese Phonetic Database (JPD). For each entry, the JPD provides accent codes and IPA phonetic transcriptions that accurately indicate how Japanese names and words are pronounced in actual speech. After years of development work, in December 2007 CJKI announced the completion of the first phase of the project by delivering the JPD to The National Language Research Institute, the largest and most respected government organization in Japan that conducts scientific research on the Japanese language. To our knowledge, this is the first database of its kind that provides both IPA and accent codes covering Japanese vocabulary. Developed by a team of experienced Japanese editors, this database can be used for such applications as:
|
|---|---|
| Attributes |
|
| Editorial Policy |
The JPD was compiled by experienced editors with in-depth knowledge of Japanese phonology and phonetics to meet the requirements of speech technology, especially the generation of natural speech (TTS). The phonetic transcriptions can be provided in various formats other than IPA, at different levels of precision, and fine-tuned to the needs of specific applications. To properly understand the data samples and field descriptions below, it is necessary to refer to The Challenges of Japanese Speech Technology, which describes the linguistic details. |
| No. | Field | Description |
|---|---|---|
| 1 | HEADWORD | Orthographic representation of the Japanese word (kana/kanji). |
| 2 | POS | Part-of-speech code (other grammatical/morphological attributes are available; see Japanese Lexical Database). |
| 3 | TYPE | A subclassification that identifies the semantic properties of the headword or gives supplementary information such as grammatical attributes, especially for proper nouns. See cpostype.htm for TYPE code definitions. |
| 4 | ACCENT | The pitch accent in the standard Tokyo dialect, compiled by experienced native linguists. Since pitch accent for names is not standardized (it can vary with the individual), the concept of "100% accuracy" does not apply to names, and multiple accents are sometimes given for the same name. The number in the Accent field identifies the accented mora, that is, the mora after which the pitch begins to fall. For example, "1" means that the first mora is accented (high pitched) and that the pitch falls starting from the second mora. Accentless words are represented by "0". |
| 5 | KANA READING | The reading of the headword in katakana. |
| 6 | KANA PHONEMIC | This field disambiguates the kana ambiguities in the Kana Reading field and serves as a kana-based phonemic representation. Though it contains minimal phonetic information, such as /g/ nasalization and devoicing, it does not accurately indicate allophonic variants, which is the function of the Phonetic field. |
| 7 | PHONEMIC | This field fully disambiguates the kana ambiguities in the Kana Reading field and thus serves as a precise phonemic representation. It does not indicate allophonic variants, which is the function of the Phonetic field. |
| 8 | PHONETIC | The phonemic transcription in the Phonemic field is insufficient for accurate speech synthesis, whose goal is natural-sounding Japanese. This field, the most important feature of the database, provides what can be roughly considered a broad phonetic transcription in IPA (International Phonetic Alphabet). In practice, it is considerably more precise than traditional broad transcriptions and reflects allophonic variation fairly close to actual pronunciation. (Narrow transcriptions, which are not necessarily desirable, could be made available.) |
| 9 | REMARKS | Comments on phonetic/phonological processes. |
| 10 | RANK | A rank based on frequency of occurrence in corpora and lexical sources can be provided (not shown here). |
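The ACCENT code described above can be expanded into a high/low pitch pattern per mora. The following is a minimal sketch, not CJKI's own method. It relies on two assumptions that are not stated in the field description: (1) the common textbook description of Tokyo pitch, in which the first mora is low unless it is the accented one, and (2) a simplified mora split in which small kana such as ャ, ュ and ョ attach to the preceding kana while ン, ッ and ー each count as one mora. The function names `split_morae` and `pitch_pattern` are hypothetical.

```python
# Small katakana that attach to the preceding kana instead of forming a mora.
# (Simplifying assumption; not part of the JPD field definitions.)
SMALL_KANA = set("ャュョァィゥェォヮ")


def split_morae(kana: str) -> list[str]:
    """Split a katakana reading into morae."""
    morae: list[str] = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # e.g. ビ + ョ -> ビョ
        else:
            morae.append(ch)
    return morae


def pitch_pattern(kana: str, accent: int) -> str:
    """Return one 'H' or 'L' per mora for the given ACCENT code.

    accent == 0: accentless, no fall inside the word.
    accent == n: the pitch falls after the n-th mora.
    The low first mora for accent != 1 is an assumed textbook rule.
    """
    count = len(split_morae(kana))
    if not 0 <= accent <= count:
        raise ValueError("accent code exceeds the number of morae")
    pattern = []
    for position in range(1, count + 1):
        if accent == 0:
            high = position > 1
        elif accent == 1:
            high = position == 1
        else:
            high = 1 < position <= accent
        pattern.append("H" if high else "L")
    return "".join(pattern)


if __name__ == "__main__":
    for reading, code in [("ホン", 1), ("ビヨウイン", 2), ("ビョウイン", 0)]:
        print(reading, code, pitch_pattern(reading, code))
```

Under these assumptions, an accentless word and a word accented on its final mora produce the same pattern within the word; the difference only surfaces on a following particle, which this sketch does not model.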
Below is a sample of our Japanese Phonetic Database (JPD). The most important features are the IPA transcription in the Phonetic field, which provides a precise representation of many allophonic changes such as devoicing and nasalization, and the Accent field, which gives the accent pattern code. Though native speakers produce most allophonic variations unconsciously, these variations have a marked effect on the naturalness of synthesized speech. (The Remarks field is currently being translated into English.)
| Head word | POS | TYPE | Accent | Kana Reading | Kana Phonemic | Phonemic | Phonetic | Remarks |
|---|---|---|---|---|---|---|---|---|
| 鏡 | NC | - | 3 | カガミ | カガ&ミ | kagami | kaŋami | voiced velar nasal |
| 鏡 | NC | - | 3 | カガミ | カガミ | kagami | kaɡami | voiced velar stop |
| 鏡 | NC | - | 3 | カガミ | カガ&ミ | kagami | kaɣami | voiced velar fricative |
| 危ない | AJ | - | 0 | アブナイ | アブナイ | abunai | abɯnai | voiced bilabial plosive |
| 危ない | AJ | - | 0 | アブナイ | アブナイ | abunai | aβɯ̥nai | voiced bilabial fricative |
| 飾り | NC | - | 0 | カザリ | カザリ | kazari | kazaɾi | voiced alveolopalatal fricative |
| ザリガニ | NC | - | 0 | ザリガニ | ザリガ&ニ | zarigani | dzaɾiɡaɲi | alveolopalatal affricate, palatal nasal |
| 新聞 | NC | - | 0 | シンブン | シンブン | siNbuN | ɕimbɯɴ | |
| 自分 | NC | - | 0 | ジブン | ジブン | zibuN | dʑibɯɴ | weakening of voiced alveolopalatal plosive (fricativized) |
| 比較 | VN | - | 0 | ヒカク | ヒ^カク | hikaku | hi̥kakɯ | devoicing of [hi] |
| 比較 | VN | - | 0 | ヒカク | ヒカク | hikaku | hikakɯ | no devoiced vowel |
| 続く | V5 | - | 0 | ツヅク | ツ^ヅク | tuzuku | tsɯ̥zɯkɯ | devoicing of [tsɯ] |
| 続く | V5 | - | 0 | ツヅク | ツヅク | tuzuku | tsɯzɯkɯ | no devoiced vowel |
| 恥 | NC | - | 2 | ハジ | ハジ | hazi | haʑi | voiced alveolopalatal fricative |
| 恥 | NC | - | 2 | ハジ | ハジ | hazi | hadʑi | voiced alveolopalatal affricate |
| 蜂 | NC | - | 0 | ハチ | ハチ | hati | hatɕi | |
| 八 | NC | - | 2 | ハチ | ハチ | hati | hatɕi | |
| 鵜 | NC | - | 1 | ウ | ウ | u | ʔɯʔ | glottal stop |
| アッ | I | - | 1 | アッ | アッ | aQ | aʔ | glottal stop |
| 運動 | VN | - | 0 | ウンドウ | ウンドウ~ | uNdoo | ɯndoː | |
| 病院 | NC | - | 0 | ビョウイン | ビョウ~イン | bjooiN | bʲoːiɴ | voiced palatal |
| 美容院 | NC | - | 2 | ビヨウイン | ビヨ'ウ~イン | bijooiN | bijoːiɴ | |
| 宴会 | NC | - | 0 | エンカイ | エンカイ | eNkai | eŋkai | |
| 蒟蒻 | NC | - | 0 | コンニャク | コンニャク | koNnjaku | koɲɲakɯ | alveolar nasal, voiced palatal nasal |
| 本 | NC | - | 1 | ホン | ホ'ン | hoN | hoɴ | uvular nasalized consonant |
| 本 | NC | - | 1 | ホン | ホ'ン | hoN | hõ | close front nasalized vowel |
| 本案 | NC | - | 1 | ホンアン | ホ'ンアン | hoNaN | hoãɴ | close back nasalized vowel (slight nasalization before /N/) |
| 本意 | NC | - | 1 | ホンイ | ホ'ンイ | hoNi | hõĩ | close front nasalized vowel |
| 本会議 | NC | - | 3 | ホンカイギ | ホンカイギ | hoNkaigi | hoŋkaiŋi | velar nasal |
| 本棚 | NC | - | 1 | ホンダナ | ホ'ンダナ | hoNdana | hondana | alveolar nasal |
| 本箱 | NC | - | 1 | ホンバコ | ホ'ンバコ | hoNbako | hombako | bilabial nasal |
| 電話 | NC | - | 0 | デンワ | デンワ | deNwa | deẽwa | |
| 案外 | D | - | 1 | アンガイ | ア'ンガ&イ | aNgai | aŋŋai | voiced velar nasal |
| 案外 | D | - | 1 | アンガイ | ア'ンガイ | aNgai | aŋɡai | voiced velar stop |
| 谷 | NC | - | 2 | タニ | タニ | tani | taɲi | voiced palatal nasal |
| 単位 | NC | - | 1 | タンイ | タ'ンイ | taNi | taĩ | |
| 錨 | NC | - | 0 | イカリ | イカリ | ikari | ikaɾi | |
| 怒り | NC | - | 3 | イカリ | イカリ | ikari | ikaɾi | |
| 休み | NC | - | 3 | ヤスミ | ヤスミ | jasumi | jasɯmi | |
| これから | NC | - | 4 | コレカラ | コレカラ | korekara | koɾekaɾa | |
| 無銭飲食 | NC | - | 4 | ムセンインショク | ムセンインショク^ | museNiNsjoku | mɯseĩɕ̃oku̥ | devoicing of [ku] |
| 井口 | NP | S | 1 | イグチ | イ'グ&チ | iguti | iŋɯtɕi | |
| 磯貝 | NP | S | 2 | イソガイ | イソ'ガ&イ | isogai | isoŋai | |
| 伊藤 | NP | S | 0 | イトウ | イトウ~ | itoo | itoː | |
| 稲垣 | NP | S | 2 | イナガキ | イナ'ガ&キ | inagaki | inaŋaki | |
| 稲田 | NP | S | 0 | イナダ | イナダ | inada | inada | |
| 井上 | NP | S | 0 | イノウエ | イノウエ | inoue | inoɯe | "ノウ" is pronounced [oɯ], not as a long vowel |
| 栄次郎 | NP | M | 1 | エイジロウ | エ'イ~ジロウ~ | eiziroo | eːʑiɾoː | voiced alveolopalatal fricative |
| 岸和田 | NP | PS | 0 | キシワダ | キ^シワダ | kisiwada | ki̥ɕiwada | devoicing of [ki] |
| 銀座 | NP | P | 0 | ギンザ | ギンザ | giNza | ɡindza | voiced alveolopalatal affricate |
| 慎一 | NP | M | 0 | シンイチ | シンイチ | siNiti | ɕiĩtɕi | vowel nasalization of /N/ |
| 純子 | NP | F | 1 | ジュンコ | ジュ'ンコ | zjuNko | dʑɯŋko | weakening of voiced alveolopalatal plosive (fricativized) |
| 関口 | NP | S | 2 | セキグチ | セキ'グ&チ | sekiguti | sekiŋɯtɕi | |
| 大地 | NP | M | 1 | ダイチ | ダ'イチ | daiti | daitɕi̥ | devoicing of [tɕi] |
| 東京 | NP | P | 0 | トウキョウ | トウ~キョウ~ | tookjoo | toːkʲoː | devoiced palatal |
| 福生 | NP | P | 0 | フッサ | フッサ | huQsa | ɸɯ̥s̚sa | first [s] is unreleased, devoicing of [ɸɯ] |
| 一本 | NC | - | 1 | イッポン | イッポン | iQpoN | ip̚poɴ | first [p] is unreleased |
| 丸井 | NP | S | 0 | マルイ | マルイ | marui | maɾɯi | |
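As the sample shows, one headword can appear in several rows, each with a different phonetic variant, and one kana reading can belong to different headwords with different accents. The sketch below illustrates one way to load such rows and group the variants. It assumes the data arrives as pipe-delimited text exactly as laid out in the sample table; the actual delivery format of the JPD is not specified here, and the names `Entry`, `parse_row` and `group_variants` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# A few rows copied from the sample table above.
SAMPLE_ROWS = [
    "| 鏡 | NC | - | 3 | カガミ | カガ&ミ | kagami | kaŋami | voiced velar nasal |",
    "| 鏡 | NC | - | 3 | カガミ | カガミ | kagami | kaɡami | voiced velar stop |",
    "| 鏡 | NC | - | 3 | カガミ | カガ&ミ | kagami | kaɣami | voiced velar fricative |",
    "| 蜂 | NC | - | 0 | ハチ | ハチ | hati | hatɕi | |",
    "| 八 | NC | - | 2 | ハチ | ハチ | hati | hatɕi | |",
]


@dataclass
class Entry:
    headword: str
    pos: str
    type: str
    accent: int
    kana_reading: str
    kana_phonemic: str
    phonemic: str
    phonetic: str
    remarks: str


def parse_row(line: str) -> Entry:
    """Parse one pipe-delimited sample row into an Entry."""
    text = line.strip()
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    cells = [cell.strip() for cell in text.split("|")]
    if not 8 <= len(cells) <= 9:
        raise ValueError(f"expected 8 or 9 cells, got {len(cells)}")
    if len(cells) == 8:
        cells.append("")  # Remarks cell omitted
    head, pos, typ, accent, reading, kana_ph, phonemic, phonetic, remarks = cells
    return Entry(head, pos, typ, int(accent), reading, kana_ph,
                 phonemic, phonetic, remarks)


def group_variants(entries: list[Entry]) -> dict[tuple[str, str], list[str]]:
    """Collect the phonetic variants listed for each (headword, reading) pair."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for entry in entries:
        groups[(entry.headword, entry.kana_reading)].append(entry.phonetic)
    return dict(groups)


if __name__ == "__main__":
    parsed = [parse_row(row) for row in SAMPLE_ROWS]
    for key, variants in group_variants(parsed).items():
        print(key, variants)
```

Keying on the headword together with the reading keeps homophones such as 蜂 and 八 apart while still collecting the alternative pronunciations of a single word such as 鏡.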