Japanese NLP Lexicons

The complexity of Japanese poses special challenges to developers of natural language processing (NLP) applications in areas such as word segmentation, information retrieval, speech technology, named-entity extraction, and machine translation. These challenges are exacerbated by the lack of truly comprehensive lexical resources, especially for proper nouns.

Below are links to the individual product pages for CJKI’s Japanese NLP lexicons. Each page describes the resource, explains how it is used, and includes data samples.

Resources

Japanese Lexical Database

Monolingual general vocabulary for NLP applications

Japanese Orthographic Database

Orthographic variants for core Japanese vocabulary

Japanese Phonetic Database

IPA phonetic and phonemic transcriptions for core Japanese vocabulary

Japanese Full-Form Lexicon

Includes all inflected, declined and conjugated forms
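
To give a rough idea of how an orthographic variant lexicon might be used in an information retrieval pipeline, the sketch below maps variant spellings to a single canonical form before indexing or search. The table layout, the sample entries, and the names VARIANTS, build_index, and normalize are assumptions made for illustration only; they do not reflect the actual format or contents of CJKI's databases.

```python
# Hypothetical variant table: canonical form -> known orthographic variants.
# Entries and layout are illustrative only, not actual CJKI data.
VARIANTS = {
    "取り扱い": ["取扱い", "取扱", "とりあつかい"],
    "引っ越し": ["引越し", "引越", "ひっこし"],
}


def build_index(variants):
    """Build a reverse lookup from every written form to its canonical form."""
    index = {}
    for canonical, forms in variants.items():
        index[canonical] = canonical
        for form in forms:
            index[form] = canonical
    return index


INDEX = build_index(VARIANTS)


def normalize(term):
    """Return the canonical form of a term, or the term itself if unknown."""
    return INDEX.get(term, term)


if __name__ == "__main__":
    for query in ["取扱", "ひっこし", "猫"]:
        print(query, "->", normalize(query))
```

With a lookup like this, a query written in any listed variant could be matched against documents that use a different spelling of the same word. Terms missing from the table pass through unchanged.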