Katakana Lexical Database
Covers about 50,000 entries
Various attributes such as part-of-speech codes
Japanese katakana words, especially foreign loanwords
Overview
CJKI’s Katakana Lexical Database (KAT) contains about 50,000 entries. Most are loanwords, and the rest are various types of native Japanese words written in katakana. Each entry is accompanied by a hiragana reading and a part-of-speech code.
The number of katakana words in Japanese, especially technical terms, has increased dramatically in recent years. KAT is used to enhance the accuracy of NLP applications such as machine translation and information retrieval.
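As a rough illustration of how a katakana headword relates to a hiragana reading, the sketch below converts katakana to hiragana using the fixed offset between the two Unicode blocks. This is a generic technique, not a description of how KAT's readings were produced. The function name `katakana_to_hiragana` is hypothetical, and leaving the long-vowel mark (ー) unchanged is an assumption made here.

```python
# Generic kana conversion sketch; not KAT's actual method or data format.
KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
KANA_OFFSET = 0x60       # distance between the katakana and hiragana blocks


def katakana_to_hiragana(text: str) -> str:
    """Map each katakana letter to the hiragana letter 0x60 code points below it."""
    converted = []
    for ch in text:
        code = ord(ch)
        if KATAKANA_START <= code <= KATAKANA_END:
            converted.append(chr(code - KANA_OFFSET))
        else:
            # Assumption: the long-vowel mark, middle dot, and any
            # non-kana characters are passed through unchanged.
            converted.append(ch)
    return "".join(converted)


if __name__ == "__main__":
    for word in ["アップ", "アットホーム"]:
        print(word, katakana_to_hiragana(word))
```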
Katakana Lexical Database
Japanese | English |
---|---|
アッサンブラージュ | assemblage |
アットバット | at bat |
アットホーム | at home |
アットマーク | at mark |
アットランダム | at random |
アッパーカット | uppercut |
アッパースイング | uppercut |
アップ | raising |
アップ・ダウン | ups and downs |
アップクォーク | up quark |
Practical Applications
KAT is suited for NLP applications such as:
Information retrieval
Morphological analysis
Machine translation
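A minimal sketch of how a katakana lexicon could support cross-language query expansion in information retrieval is shown below. It uses only a few of the sample rows listed above. The names `SAMPLE_ENTRIES`, `normalize`, and `expand_query` are hypothetical, and stripping the middle dot (・) before lookup is an assumption made for illustration, not a documented KAT feature.

```python
# Illustrative only: a tiny in-memory lexicon built from the sample rows.
# The real database format and access method are not described in this page.
SAMPLE_ENTRIES = {
    "アットホーム": "at home",
    "アットランダム": "at random",
    "アッパーカット": "uppercut",
    "アップ・ダウン": "ups and downs",
    "アップクォーク": "up quark",
}


def normalize(term: str) -> str:
    """Drop the katakana middle dot so dotted and undotted spellings match (assumption)."""
    return term.replace("・", "")


# Index the entries by their normalized katakana form.
INDEX = {normalize(kana): gloss for kana, gloss in SAMPLE_ENTRIES.items()}


def expand_query(term: str) -> list[str]:
    """Return the query term, plus its English gloss when the lexicon has one."""
    gloss = INDEX.get(normalize(term))
    return [term] if gloss is None else [term, gloss]


if __name__ == "__main__":
    print(expand_query("アップダウン"))
    print(expand_query("カタカナ"))
```

A search front end could issue both strings returned by `expand_query`, so that a katakana query also retrieves documents containing the English term.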
Reference Documents
The Challenges of Intelligent Japanese Searching
Linguistic issues that need to be addressed by advanced information retrieval technologies
Cross-Synonym and Cross-Language Searching in Japanese
Academic paper
Related Resources

Japanese Orthographic Database
Orthographic variants for core Japanese vocabulary

Japanese Lexical Database
Monolingual general vocabulary for NLP applications

Japanese-English Technical Terms
Japanese-English technical terms covering 20 domains