Japanese Information Processing
Current State of JMWEL: a Comprehensive Japanese MWE Lexicon and its Applications
A paper co-authored by Masahito Takahashi, Toshifumi Tanabe, Kosho Shudo, and Jack Halpern on JMWEL, a comprehensive lexicon of Japanese multiword expressions (MWEs) with a rich set of grammatical attributes fine-tuned for phrase-based NLP applications such as machine translation and information retrieval. Presented at EUROPHRAS 2019: Computational and Corpus-based Phraseology, in Malaga, Spain, in September 2019.
Very Large-scale Lexical Resources to Enhance Chinese and Japanese Machine Translation
This paper, presented at the TAUS Executive Forum Tokyo 2017, examines the linguistic issues related to orthographic variation and shows how Very Large-scale Lexical Resources (VLSLR) can significantly enhance the accuracy of NLP tools, with a focus on machine translation (MT), named entity recognition (NER), and named entity translation (NET). See also the slide show.
Some Linguistic Issues in the Machine Transliteration of Chinese, Japanese, and Arabic Names
This keynote address, given at the 6th Named Entities Workshop (NEWS) in Berlin in August 2016, focuses on the special characteristics of Chinese, Japanese, and Arabic scripts that affect machine transliteration, on the role played by lexical resources such as personal name dictionaries, and on how these resources can be used to enhance the accuracy of name transliteration systems. See also the slide show.
Pedagogical Lexicography Applied to Chinese and Japanese Learner’s Dictionaries
Introduces The CJKI Chinese Learner’s Dictionary, designed to satisfy the needs of learners and to overcome the shortcomings of existing Chinese dictionaries. Presented at ASIALEX 2011. See also the slide show.
The Role of Lexical Resources in CJK Natural Language Processing
A linguistic description of the principal challenges to be overcome by developers of CJK NLP applications. This paper was presented at workshops of COLING/ACL 2006 in Sydney, as well as at other conferences.
The Role of Phonetics and Phonetic Databases in Japanese Speech Technology
Presented at the 11th Oriental COCOSDA Workshop held in Kyoto in 2008, this paper summarizes the complex allophonic variations that need to be considered in developing Japanese speech technology applications, and introduces the 130,000-entry Japanese Phonetic Database (JPD) developed by CJKI.
The Challenges of Japanese Speech Technology
A linguistic description of the principal challenges to be overcome by developers of Japanese speech technology and the role of phonological databases.
Lexicon-based Orthographic Disambiguation in CJK Intelligent Information Retrieval
Presented at COLING 2002 (Taipei), this paper analyzes the linguistic issues of CJK orthographic variation, including Japanese, and discusses why lexical databases should play a central role in NLP.
The Challenges of Intelligent Japanese Searching
This paper analyzes in detail the linguistic issues related to orthographic variation in Japanese, and discusses advanced information retrieval technologies such as cross-script and cross-orthographic searching for use in intelligent IR.
Orthographic Variation in Japanese
The highly irregular orthography and morphological complexity of Japanese pose formidable challenges to software developers. This report focuses on orthographic variation and analyzes the linguistic issues in developing Japanese linguistic tools.
The Complexities of Japanese Homophones
Explains the subtle distinctions between the numerous homophones in Japanese, and shows why homophone processing deserves special attention in Japanese information retrieval.
Cross-Synonym and Cross-Language Searching in Japanese
Describes the linguistic issues to be addressed by advanced Japanese information retrieval technologies, focusing on cross-language and cross-synonym searching.
Morphological Attributes in Japanese
Describes the derivational affixes and binding valency recorded in our Japanese lexical database, which are particularly useful for disambiguating Japanese lexemes in applications such as search engine query processing.