Katakana Lexical Database

Covers about 50,000 entries

Various attributes such as part-of-speech codes

Japanese katakana words, especially foreign loanwords

Overview

CJKI’s Katakana Lexical Database (KAT) contains about 50,000 entries written in katakana, mostly loanwords but also various types of native Japanese words. Entries are accompanied by hiragana readings and part-of-speech codes.

The number of katakana words in Japanese, especially technical terms, has increased dramatically in recent years. KAT is used to enhance the accuracy of NLP applications such as machine translation and information retrieval.
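
As an illustration of the kind of entry described above, the following minimal Python sketch pairs a katakana headword with a hiragana reading and a part-of-speech code. The KatEntry class, its field names, and the "N" code are hypothetical and do not reflect KAT's actual format. The kata_to_hira helper relies only on the fact that katakana letters in Unicode (U+30A1 to U+30F6) sit 0x60 code points above their hiragana counterparts; readings in the database itself may be assigned differently.

```python
from dataclasses import dataclass

# Katakana letters ァ (U+30A1) through ヶ (U+30F6) each have a hiragana
# counterpart exactly 0x60 code points lower in Unicode.
KATA_START, KATA_END = 0x30A1, 0x30F6
OFFSET = 0x60


def kata_to_hira(text: str) -> str:
    """Map katakana letters to hiragana; leave everything else unchanged.

    The long-vowel mark ー (U+30FC) is outside the mapped range, so it is
    kept as-is.
    """
    return "".join(
        chr(ord(ch) - OFFSET) if KATA_START <= ord(ch) <= KATA_END else ch
        for ch in text
    )


@dataclass(frozen=True)
class KatEntry:
    """Hypothetical record layout, not KAT's real schema."""
    katakana: str  # headword as written in katakana
    hiragana: str  # hiragana reading
    pos: str       # part-of-speech code (the code set here is invented)


entry = KatEntry("コンピュータ", kata_to_hira("コンピュータ"), "N")
print(entry)
```

A purely mechanical conversion like this is only a starting point; a curated lexicon can record readings that a character-level mapping cannot infer.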

Practical Applications

KAT is suited for NLP applications such as:

Information retrieval

Morphological analysis

Machine translation
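
To make the morphological analysis and information retrieval use cases concrete, here is a minimal sketch of greedy longest-match segmentation of a katakana compound against a word list. The segment function and the three-word toy lexicon are assumptions for illustration only; they are not KAT data, and production analyzers typically use more sophisticated methods than greedy matching.

```python
def segment(text: str, lexicon: set[str]) -> list[str]:
    """Split text into the longest lexicon words, scanning left to right.

    Characters that start no known word are emitted as single-character
    tokens so the function always consumes the whole input.
    """
    max_len = max(map(len, lexicon), default=0)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No lexicon word starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Toy lexicon standing in for a real katakana word list.
toy_lexicon = {"コンピュータ", "サイエンス", "コン"}
print(segment("コンピュータサイエンス", toy_lexicon))
```

With a lexicon that covers both components, the compound splits into its two words, so a search for either component could match the compound.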

Reference Documents

The Challenges of Intelligent Japanese Searching

Linguistic issues that need to be addressed by advanced information retrieval technologies

Related Resources

JOD

Japanese Orthographic Database

Orthographic variants for core Japanese vocabulary

JLD

Japanese Lexical Database

Monolingual general vocabulary for NLP applications

JETERM

Japanese-English Technical Terms

Japanese-English technical terms covering 20 domains