Chinese Morphological Database

Covers 3 million entries

Simplified and Traditional Chinese

Overview

CJKI’s Chinese Morphological Database (CMD) is a comprehensive lexical database of Chinese derivational affixes with adjacency attributes. With about three million Simplified and Traditional Chinese entries, it covers a broad range of fields, including proper nouns, technical terminology, and general vocabulary.

A derivational affix (DA) is a bound morpheme (though some also function as free forms) that is prefixed or suffixed to a base to create a new word. In traditional morphology, DAs have no lexical meaning of their own and contribute only grammatical meaning. CMD also includes “lexical affixes”, which are compound-forming word elements that carry substantial lexical meaning of their own.

An adjacency attribute is a part-of-speech (POS) code that indicates the morphological restrictions on the words or DAs adjacent to a given DA when they are used to form compound words or affixed lexemes. Adjacency attributes help programs identify DAs more reliably, especially in systems that fully support POS tagging.
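To make the idea of an adjacency attribute concrete, here is a minimal Python sketch. It is not CMD's actual record format: the `AffixEntry` fields, the POS codes ("N", "A", "V"), and the restrictions assigned to each affix are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffixEntry:
    """Hypothetical affix record; field names are not taken from CMD."""
    form: str                # the affix itself, e.g. "化"
    position: str            # "prefix" or "suffix"
    adjacent_pos: frozenset  # POS codes the neighbouring base may carry


# Illustrative entries only. The POS restrictions are assumptions,
# not data from the database.
AFFIXES = {
    "化": AffixEntry("化", "suffix", frozenset({"N", "A"})),
    "者": AffixEntry("者", "suffix", frozenset({"V", "N"})),
    "非": AffixEntry("非", "prefix", frozenset({"N", "A"})),
}


def affix_allowed(affix: AffixEntry, base_pos: str) -> bool:
    """Return True if a base with this POS code may sit next to the affix."""
    return base_pos in affix.adjacent_pos
```

Under these assumed restrictions, `affix_allowed(AFFIXES["化"], "N")` is true, while `affix_allowed(AFFIXES["化"], "V")` is false, so a POS-aware program could reject an analysis that attaches 化 to a verb base.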

CMD Sample

Practical Applications

Derivational affixes significantly improve the accuracy with which software can identify the many lexemes that are not registered in a lexicon. This makes them very useful in:

Input method editors (IME)

Natural language processing (NLP) applications

Information retrieval
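
As a rough illustration of how affix data can help identify lexemes missing from a lexicon, the following Python sketch strips a known affix from an unlisted word and accepts the analysis only if the remaining base is listed with a permitted POS. The tiny lexicon, the POS codes, the affix restrictions, and the `analyze` function are all hypothetical; this is one simple approach under those assumptions, not the method used by CMD or by any particular IME or NLP system.

```python
# Hypothetical toy lexicon: word -> POS code (codes are illustrative).
LEXICON = {"现代": "A", "工业": "N", "阅读": "V", "官方": "N"}

# Hypothetical affix tables: affix -> POS codes allowed on the adjacent base.
SUFFIXES = {"化": {"N", "A"}, "者": {"V", "N"}}
PREFIXES = {"非": {"N", "A"}}


def analyze(word):
    """Return (status, base, affix) for a word, or None if no analysis fits."""
    if word in LEXICON:
        return ("listed", word, None)
    # Try each suffix: the remaining base must be listed with an allowed POS.
    for suffix, allowed in SUFFIXES.items():
        base = word[:-len(suffix)]
        if word.endswith(suffix) and LEXICON.get(base) in allowed:
            return ("derived", base, suffix)
    # Try each prefix in the same way.
    for prefix, allowed in PREFIXES.items():
        base = word[len(prefix):]
        if word.startswith(prefix) and LEXICON.get(base) in allowed:
            return ("derived", base, prefix)
    return None


print(analyze("现代化"))  # unlisted word analysed as base 现代 + suffix 化
print(analyze("阅读化"))  # rejected: 化 is not allowed after a "V" base here
```

A production system would also need to handle stacked affixes, ambiguous segmentations, and bases with more than one POS, all of which this sketch ignores.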