Chinese Resources

The complexity of Chinese poses special challenges to developers of natural language processing (NLP) applications in areas such as word segmentation, information retrieval, speech technology, named-entity extraction, and machine translation. These challenges are exacerbated by the lack of truly comprehensive lexical resources, especially for proper nouns.

CJKI’s comprehensive Chinese lexical resources currently include millions of entries, covering general vocabulary, technical terminology, proper nouns, and other categories in both Simplified Chinese (SC) and Traditional Chinese (TC).

These resources are used in applications such as machine translation, named-entity recognition, and speech technology. They include a rich set of grammatical, phonological, and semantic attributes, such as pinyin and Zhuyin readings, part-of-speech codes, and frequency-of-occurrence statistics, as well as millions of personal names and their variants.
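To make the attribute list above concrete, here is a minimal sketch of how one such entry might be represented and looked up by either its SC or TC form. The record layout, the field names, the `LexicalEntry` and `build_index` names, and the part-of-speech codes are all assumptions made for illustration; they do not describe CJKI's actual data format. The frequency field is left unset because real counts would have to come from the resource itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical record layout; not CJKI's actual schema."""
    simplified: str                  # Simplified Chinese (SC) form
    traditional: str                 # Traditional Chinese (TC) form
    pinyin: str                      # pinyin reading with tone marks
    zhuyin: str                      # Zhuyin (Bopomofo) reading
    pos: str                         # part-of-speech code (made-up tag set)
    frequency: Optional[int] = None  # occurrence count; unset in this sketch


def build_index(entries: Iterable[LexicalEntry]) -> Dict[str, LexicalEntry]:
    """Map both the SC and the TC form of each entry to the same record."""
    index: Dict[str, LexicalEntry] = {}
    for entry in entries:
        index[entry.simplified] = entry
        index[entry.traditional] = entry
    return index


# Two illustrative entries: a general-vocabulary noun and a proper noun.
SAMPLE_ENTRIES = [
    LexicalEntry("电脑", "電腦", "diànnǎo", "ㄉㄧㄢˋ ㄋㄠˇ", "N"),
    LexicalEntry("北京", "北京", "Běijīng", "ㄅㄟˇ ㄐㄧㄥ", "NP"),
]

index = build_index(SAMPLE_ENTRIES)

# The same record is reachable from either script.
print(index["電腦"].pinyin)
print(index["电脑"].zhuyin)
```

Indexing both forms to one shared record is only one possible design. A real resource would also need to handle one-to-many SC/TC mappings and multiple readings per form, which this sketch ignores.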

