DeepLEX: Lexical Resources for Deep Learning
The CJK Dictionary Institute is actively developing very large-scale lexical resources, referred to as DeepLEX Resources, to support Deep Learning applications in such diverse fields as named entity recognition (NER), cybersecurity, neural machine translation (NMT), and speech technology.
Selected Resources
Our DeepLEX Resources include tens of millions of CJK named entities specifically designed to support NLP applications such as NER and speech technology. They are used by the world’s largest IT companies in NLP and AI applications, including speech technology, machine translation, and natural language generation.

Chinese Personal Name Variants
7.6 million Chinese personal names and their romanized variants

Japanese Orthographic Database
Orthographic variants for core Japanese vocabulary, covering 126,000 entries

Japanese Personal Name Variants
3.5 million Japanese personal names and their romanized variants

Japanese-Multilingual Place Names and POIs
Multilingual database of 3.1 million Japanese and Western place names and POIs

Arabic Full-Form Lexicon
530 million entries, including all inflected, declined, and conjugated forms

Database of Arabic Names
6.5 million Arabic personal names and their romanized variants
Use Cases
Our DeepLEX Resources can benefit the development of Deep Learning systems and technology platforms in the following ways: