Japanese Full-Form Lexicon

Simplifies morphological analysis

Instantly identifies inflected forms

Comprehensive coverage, especially verbs

Overview

CJKI provides a Japanese Full-Form Lexicon (JFULEX) that covers roughly 120 million entries, including canonical forms, inflected forms, and compound words. This lexicon is being used by major IT companies like Amazon and Google to enhance their search technology.

The Japanese language is agglutinative; that is, it builds words by combining basic elements called morphemes, yielding countless inflected forms, compound words, and affixed words. For example, the compound 造船所 zōsenjo ‘shipyard’ consists of the free word 造船 ‘shipbuilding’ (造 ‘make; build’ + 船 ‘ship’) followed by the suffix 所 ‘place’.
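The decomposition of 造船所 described above can be sketched as a longest-match lookup over a toy lexicon. This is only an illustration: the `LEXICON` dictionary and the `segment` function are hypothetical names, the lexicon holds nothing beyond the four items mentioned in the example, and nothing here reflects how JFULEX itself stores or segments compounds.

```python
# Toy lexicon (hypothetical): only the items named in the example above.
LEXICON = {
    "造船": "shipbuilding",
    "造": "make; build",
    "船": "ship",
    "所": "place",
}


def segment(word, lexicon):
    """Split word into the longest lexicon entries, scanning left to right."""
    parts = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shrink it one character at a time.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                parts.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no lexicon entry matches at position {i}")
    return parts


print(segment("造船所", LEXICON))  # ['造船', '所']
```

Because the longest candidate is tried first, 造船 is preferred over its components 造 and 船, which matches the analysis of the compound as free word plus suffix.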

Japanese also has many derived words (morpheme + grammatical suffix). For example, combining 黒 kuro ‘black’ with the suffix い i forms the adjective 黒い kuroi ‘black’. Derivation should not be confused with inflection, which consists of adding word endings to indicate grammatical functions such as tense. For example, the last syllable of the verb 帰る kaeru ‘to return’ is inflected to yield 帰れ kaere, the imperative. Japanese verbs have thousands of inflected forms.
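To give a feel for why the number of inflected forms grows so quickly, the following sketch multiplies out a few optional suffix slots for a single stem. The slot inventory is an assumption made for illustration: its values are borrowed from the sample table for tazuneru in this section, but the grouping into `VOICE`, `ASPECT`, `ENDING`, and `CONDITIONAL` is hypothetical, and not every generated combination is guaranteed to be an attested lexicon entry.

```python
from itertools import product

STEM = "たずね"  # stem of たずねる tazuneru

# Hypothetical slot inventory; an empty string means the slot is left unused.
VOICE = ["", "させ", "させられ"]
ASPECT = ["", "てい", "ており"]
ENDING = ["ました", "やした"]
CONDITIONAL = ["", "ら", "らば"]

# Every combination of one value per slot, attached to the stem in order.
forms = [
    STEM + "".join(combo)
    for combo in product(VOICE, ASPECT, ENDING, CONDITIONAL)
]

print(len(forms))  # 3 * 3 * 2 * 3 = 54
print(forms[0])    # たずねました
```

Four small slots already yield 54 past-tense forms for one verb; adding further slots (other tenses, negation, honorific prefixes) multiplies the total again.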

If proper nouns, technical terms, and verb-following expressions (such as なければならない nakerebanaranai) are included, the total can exceed 120 million.

tazuneruοΌˆγŸγšγ­γ‚‹οΌ‰γ€€POS=V1

| Tense | Stem | Kana | Kanji | Inflected | Roman |
|---|---|---|---|---|---|
| Past | たずね | S + ました | - | たずねました | TAZUNEmashita |
| Past | たずね | S + て い ました | S + て 居 ました | たずねて いました | TAZUNEte imashita |
| Past | たずね | S + て おり ました | S + て 居り ました | たずねて おりました | TAZUNEte orimashita |
| Past | たずね | S + やした | - | たずねやした | TAZUNEyashita |
| Past | たずね | S + て い やした | S + て 居 やした | たずねて いやした | TAZUNEte iyashita |
| Past | たずね | S + て おり やした | S + て 居り やした | たずねて おりやした | TAZUNEte oriyashita |
| Past -tara I | たずね | S + ましたら | - | たずねましたら | TAZUNEmashitara |
| Past -tara I | たずね | お + S + して おり ましたら | 御 + S + 為て 居り ましたら | おたずねして おりましたら | oTAZUNE shite orimashitara |
| Past -tara I | たずね | S + やしたら | - | たずねやしたら | TAZUNEyashitara |
| Past -tara I | たずね | お + S + して おり やしたら | 御 + S + 為て 居り やしたら | おたずねして おりやしたら | oTAZUNE shite oriyashitara |
| Past -tara II | たずね | S + ましたらば | - | たずねましたらば | TAZUNEmashitaraba |
| Past -tara II | たずね | お + S + して おり ましたらば | 御 + S + 為て 居り ましたらば | おたずね して おりましたらば | oTAZUNE shite orimashitaraba |
| Past -tara II | たずね | S + やしたらば | - | たずねやしたらば | TAZUNEyashitaraba |
| Past -tara II | たずね | お + S + して おり やしたらば | 御 + S + 為て 居り やしたらば | おたずね して おりやしたらば | oTAZUNE shite oriyashitaraba |
| Past causative | たずね | S + させ ました | - | たずねさせました | TAZUNEsasemashita |
| Past causative | たずね | S + させ やした | - | たずねさせやした | TAZUNEsaseyashita |
| Past causative honorific | たずね | S + させ られ ました | - | たずねさせられました | TAZUNEsaseraremashita |
| Past causative honorific | たずね | S + させ られ て い ました | S + させ られ て 居 ました | たずねさせられて いました | TAZUNEsaserarete imashita |
| Past causative honorific | たずね | S + させ られ やした | - | たずねさせられやした | TAZUNEsaserareyashita |
| Past causative honorific | たずね | S + させ られ て い やした | S + させ られ て 居 やした | たずねさせられて いやした | TAZUNEsaserarete iyashita |
| Past causative passive | たずね | S + させ られ ました | - | たずねさせられました | TAZUNEsaseraremashita |
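The Kana column in the table above reads as a pattern in which S stands for the stem and + joins the elements. As a rough sketch of that reading, the hypothetical `expand` function below substitutes the stem for S and concatenates the pieces, then checks the result against the Inflected column for a few rows copied from the table. Spaces in the Inflected column are ignored in the comparison, since the pattern notation alone does not say where they belong; that simplification is an assumption of this sketch.

```python
def expand(pattern, stem):
    """Replace the S placeholder with the stem and join all pattern elements."""
    tokens = pattern.replace("+", " ").split()
    return "".join(stem if tok == "S" else tok for tok in tokens)


# (Kana pattern, Inflected form) pairs copied from the table above.
rows = [
    ("S + ました", "たずねました"),
    ("S + て い ました", "たずねて いました"),
    ("お + S + して おり ましたら", "おたずねして おりましたら"),
    ("S + させ られ て い ました", "たずねさせられて いました"),
]

for pattern, inflected in rows:
    # Compare without spaces; the pattern does not encode word spacing.
    assert expand(pattern, "たずね") == inflected.replace(" ", "")

print(expand("S + ましたら", "たずね"))  # たずねましたら
```

The same function applies unchanged to the Kanji column, where the pattern elements are simply written with kanji.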

Practical Applications

CJKI’s full-form lexicons can bring the following benefits to various NLP applications:

Machine translation

Greatly enhanced translation quality

Named-entity recognition (NER)

Dramatically improved entity recognition

Morphological analysis

Significantly simplified algorithms

Information retrieval applications

Support for query processing

Pedagogical applications

Automatic conjugation systems

Part-of-speech (POS) analysis and tagging

Automatic conjugation systems
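The claim that a full-form lexicon simplifies morphological analysis and supports query processing can be illustrated with a minimal sketch: when every inflected form is listed explicitly, analysis reduces to a dictionary lookup, with no stemming rules at all. Everything below is hypothetical, including the `Entry` record, the `FULL_FORMS` mapping, and the `analyze` and `lemmas` functions. The three sample entries are taken from the tazuneru table in this document, but the layout is not JFULEX's actual data format.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    lemma: str  # canonical (dictionary) form
    pos: str    # part-of-speech code, as in the table caption
    tense: str  # label from the table's Tense column


# Hypothetical in-memory lexicon: surface form -> all known analyses.
FULL_FORMS = {
    "たずねました": [Entry("たずねる", "V1", "Past")],
    "たずねさせました": [Entry("たずねる", "V1", "Past causative")],
    # One surface form can carry more than one analysis.
    "たずねさせられました": [
        Entry("たずねる", "V1", "Past causative honorific"),
        Entry("たずねる", "V1", "Past causative passive"),
    ],
}


def analyze(surface):
    """Return every analysis listed for a surface form (empty list if unknown)."""
    return FULL_FORMS.get(surface, [])


def lemmas(tokens):
    """Map each token to its lemma when it is listed, else keep it unchanged."""
    result = []
    for tok in tokens:
        entries = analyze(tok)
        result.append(entries[0].lemma if entries else tok)
    return result


print(len(analyze("たずねさせられました")))  # 2
print(lemmas(["たずねました", "東京"]))        # ['たずねる', '東京']
```

Returning a list of analyses rather than a single one keeps ambiguous forms intact, so a downstream POS tagger or search component can choose between them in context.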

JFULEX Related Resources

ArabLEX

Arabic Full-Form Lexicon

Includes all inflected, declined, and conjugated forms

SFULEX

Spanish Full-Form Lexicon

Includes all inflected, declined, and conjugated forms

JWL

Comprehensive Japanese Wordlist

General vocabulary, proper nouns and technical terms