Jack Halpern


   Data Licensing

Database of Arabic Place Names


  • Covers the entire world
  • Carefully proofread


Though Arabic has become a world language of critical importance, lexical resources, especially for proper nouns, are either scarce or exist only on a small scale. The CJK Dictionary Institute is engaged in the development and continuous expansion of comprehensive lexical databases for CJK languages and Arabic. This document describes our Database of Arabic Place Names.

Carefully proofread and validated

Although a handful of machine translation packages and data providers offer Arabic place names, their coverage is poor, their data contains many machine-generated errors, and they do not cover variants. Our project may well be the first attempt to build a database of Arabic place names that covers the entire world, is accurate and validated, and is based on state-of-the-art techniques in computational lexicography.

Our Arabic place names are carefully proofread to ensure strict adherence to the complex rules of hamza orthography, which are often ignored outside publications of the highest editorial standards. As a result of this strict editorial policy, we can provide not only the linguistically correct standard form in Modern Standard Arabic (MSA) but also all common non-standard and incorrect forms, each carefully flagged to distinguish it from the standard.

Data sample

Covers both the Arab and non-Arab world, including variants

Arabic | English | Variant | Error
أبو ظبي | Abu Dhabi | ابو ظبي | أبو ظبى, ابو ظبى
الإسكندرية | Alexandria | الاسكندرية | الإسكندريه
الجزائر | Algiers | | الجزاير
برازيليا | Brasilia | برازيلية | برازيليه
القاهرة | Cairo | | القاهره
الشرق الاقصى | Far East | | الشرق الاقصي
ألمانيا | Germany | المانيا |
الجيزة | Giza | | الجيزه
حيفا | Haifa | | حيفة
جدة | Jeddah | جدّة | جده
القدس | Jerusalem | |
المنامة | Manama | | المنامه
مكة | Mecca | مكه |
نابلس | Nablus | |
نانجينغ | Nanjing | |
بالو ألتو | Palo Alto | بالو التو, بالو آلتو |
الرياض | Riyadh | الرّياض |
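
The flagged layout of the sample (one standard form, plus variants and errors marked as such) lends itself to a simple lookup from any attested spelling back to the standard form. The following is a minimal sketch only, using three rows from the sample; the names `PlaceName`, `build_index`, and `lookup` are hypothetical and do not describe the actual format or API of the licensed database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceName:
    """One sample row: standard form, English name, flagged alternatives."""
    standard: str
    english: str
    variants: tuple = ()   # common non-standard spellings
    errors: tuple = ()     # spellings flagged as incorrect


# Three rows copied from the data sample (hypothetical in-memory layout).
SAMPLE = [
    PlaceName("أبو ظبي", "Abu Dhabi", ("ابو ظبي",), ("أبو ظبى", "ابو ظبى")),
    PlaceName("الإسكندرية", "Alexandria", ("الاسكندرية",), ("الإسكندريه",)),
    PlaceName("القاهرة", "Cairo", (), ("القاهره",)),
]


def build_index(entries):
    """Map every spelling to its entry and a status flag."""
    index = {}
    for entry in entries:
        index[entry.standard] = (entry, "standard")
        for form in entry.variants:
            index[form] = (entry, "variant")
        for form in entry.errors:
            index[form] = (entry, "error")
    return index


def lookup(index, form):
    """Return (standard form, English name, status), or None if unknown."""
    hit = index.get(form)
    if hit is None:
        return None
    entry, status = hit
    return entry.standard, entry.english, status


INDEX = build_index(SAMPLE)
```

Under these assumptions, looking up the misspelling listed in the Error column for Cairo returns the standard form, the English name, and the flag "error", so an application can either correct the spelling or accept it as input while still displaying the standard form.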
Variants of Alexandria

Arabic | Frequency
الاسكندرية | 2,930,000
الإسكندرية | 690,000
الاسكندريه | 89,200
الإسكندريّة | 954
الإسكندريه | 897
الاسكندريّة | 245
الاسكندريا | 80
الإسْكَنْدَريَّة | 24
الاسكندريّه | 12
الإسكندريا | 7
A larger sample can be found here.
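
Several of the Alexandria variants listed above differ only in optional vowel marks and shadda. As an illustration, the sketch below removes Unicode nonspacing marks and sums the frequencies of spellings that then coincide. It is an assumption-laden example, not the method used to build the database: the function names are hypothetical, and it assumes hamza-bearing letters are stored as precomposed characters (so that stripping marks does not remove the hamza).

```python
import unicodedata
from collections import Counter

# Frequencies copied from the "Variants of Alexandria" table.
ALEXANDRIA_FREQ = {
    "الاسكندرية": 2930000,
    "الإسكندرية": 690000,
    "الاسكندريه": 89200,
    "الإسكندريّة": 954,
    "الإسكندريه": 897,
    "الاسكندريّة": 245,
    "الاسكندريا": 80,
    "الإسْكَنْدَريَّة": 24,
    "الاسكندريّه": 12,
    "الإسكندريا": 7,
}


def strip_marks(text):
    """Drop nonspacing marks (category Mn), e.g. fatha, sukun, shadda."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def merge_vocalized(freqs):
    """Sum frequencies of spellings that are identical once marks are removed."""
    merged = Counter()
    for form, count in freqs.items():
        merged[strip_marks(form)] += count
    return merged


if __name__ == "__main__":
    for form, count in merge_vocalized(ALEXANDRIA_FREQ).most_common():
        print(form, count)
```

Differences in hamza, or in final ta marbuta versus ha, are deliberately left untouched here, since those are exactly the distinctions the database flags as variants or errors.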