Jack Halpern



Database of Foreign Names in Arabic

Because of the important role personal names play in such natural language applications as named entity extraction and machine translation, The CJK Dictionary Institute is continuously expanding and revising its proper noun resources, which provide systematic coverage of Arabic orthographic variants and common orthographic errors.

Our institute, in an international collaborative effort that includes Arabic name specialists, has developed new techniques for the collection, validation and attestation of non-Arab names written in Arabic, and is now building a comprehensive Database of Foreign Names in Arabic, referred to as DAFNA.

The sample below shows orthographic variants and spelling errors of a common American given name (John) and a common American surname (Davis). The original American name data was obtained from the U.S. Census Bureau. A larger sample is also available.

Data sample

Arabic Variants of John (Male First Name) and Davis (Surname)
ENGLISH ARABIC WEB FREQ (English+Arabic) WEB FREQ (Arabic only)
John جوون 0036500 0044500
John جون 0032700 0947000
John جان 0031300 2160000
John جوهان 0000224 0007090
John جوهن 0000173 0001180
John دجون 0000029 0001680
John جهون 0000009 0000328
Davis ديفيس 0000613 0012300
Davis دافيس 0000249 0001680
Davis ديفز 0000228 0002300
Davis ديفس 0000157 0002020
Davis دايفس 0000040 0000652
Davis دفيس 0000034 0000490
Davis دفيز 0000005 0000098
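
As an illustration of how rows in this layout might be consumed, the following is a minimal Python sketch, not DAFNA's actual delivery format or API. It assumes each record is a whitespace-separated line with the four columns shown above, and the helper names (parse_rows, build_lookup, variants_by_frequency) are hypothetical. It parses a few rows copied from the sample, maps each Arabic variant back to its English name, and ranks the variants of each name by the English+Arabic web frequency.

```python
from collections import defaultdict

# A few rows copied from the sample above: English name, Arabic variant,
# web frequency (English+Arabic), web frequency (Arabic only).
SAMPLE = """\
John جوون 0036500 0044500
John جون 0032700 0947000
John جان 0031300 2160000
Davis ديفيس 0000613 0012300
Davis دافيس 0000249 0001680
Davis ديفز 0000228 0002300
"""


def parse_rows(text):
    """Split each non-empty line into (english, arabic, freq_both, freq_arabic)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        english, arabic, freq_both, freq_arabic = line.split()
        # int() accepts the zero-padded counts used in the sample.
        rows.append((english, arabic, int(freq_both), int(freq_arabic)))
    return rows


def build_lookup(rows):
    """Map each Arabic variant to its English name.

    Assumption for this sketch: every Arabic string belongs to one name.
    """
    return {arabic: english for english, arabic, _, _ in rows}


def variants_by_frequency(rows):
    """Group variants by English name, most frequent (English+Arabic) first."""
    variants = defaultdict(list)
    for english, arabic, freq_both, _ in rows:
        variants[english].append((arabic, freq_both))
    for name in variants:
        variants[name].sort(key=lambda pair: pair[1], reverse=True)
    return dict(variants)


rows = parse_rows(SAMPLE)
lookup = build_lookup(rows)
ranked = variants_by_frequency(rows)
```

Here lookup resolves a variant spelling found in Arabic text to its English name, and ranked lists each name's variants from most to least frequent. In a fuller data set a single Arabic string could plausibly correspond to more than one foreign name, in which case the lookup would need to hold a list of candidates rather than a single value.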