Jack Halpern


   Data Licensing

Database of Arab Names in Arabic


  • Every Arabic name is normalized and vocalized.


The complexity of the Arabic script gives rise to many spelling variants and spelling errors, which cause problems in Arabic information processing. To deal with this issue, the CJKI has created the Database of Arab Names in Arabic (DANA), a one-of-a-kind resource that covers several hundred thousand Arabic script variants and common spelling mistakes, as illustrated in the data samples below.

A key feature of DANA is that every Arabic name is normalized and vocalized to produce a database of error-free, fully sanitized Arabic canonical forms. The vocalization is performed by a team of editors with the aid of tools and interfaces designed to achieve maximum efficiency. The canonical forms serve as the basis both for the accurate romanized variants in our Database of Arab Names (DAN), which contains over 6.5 million romanized variants of Arab names, and for the Arabic orthographic variants in DANA.

Arabic names may be spelled with or without a hamza over the alif, with or without a shadda, with or without a madda over the alif, and so on. Besides such variants, there are also common errors, such as yaa' being replaced by alif maqsuura and taa' marbuuta being replaced by haa'.
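To make these variant types concrete, the following is a minimal sketch of how spellings that differ only in hamza, madda, shadda, spacing, or the yaa'/alif maqsuura and taa' marbuuta/haa' confusions could be folded into a single comparison key. The folding table and the `match_key` helper are assumptions made for illustration; they are not the CJKI's normalization rules and are no substitute for the edited canonical forms in DANA.

```python
import unicodedata

# Hypothetical folding table for illustration; not the CJKI's actual rules.
FOLD = str.maketrans({
    "\u0623": "\u0627",  # alif with hamza above -> bare alif
    "\u0625": "\u0627",  # alif with hamza below -> bare alif
    "\u0622": "\u0627",  # alif with madda       -> bare alif
    "\u0649": "\u064A",  # alif maqsuura         -> yaa'
    "\u0629": "\u0647",  # taa' marbuuta         -> haa'
})


def match_key(name: str) -> str:
    """Return a loose comparison key for an Arabic-script name (hypothetical helper)."""
    # Compose first so that alif + combining hamza becomes a single code point.
    text = unicodedata.normalize("NFC", name)
    text = text.translate(FOLD)
    # Drop combining marks (shadda, short vowels) and whitespace.
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Mn" and not ch.isspace()
    )


# Two of the Abu-'Ali spellings from the data samples compare equal under this key.
print(match_key("أبو علىّ") == match_key("ابوعلي"))
```

Folding of this kind only covers substitutions and optional marks. A misspelling that drops a letter, such as عبدلله for عبدالله, does not reduce to the same key and still needs an explicit variant list.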

You can see the breadth of coverage in DANA by trying out our ANTE demo.

Below are examples of Arabic variants for two male surnames. A larger sample is also available.

Data samples

Variants of عبد الله 'Abdallah (Male Surname)

Arabic Variant    Frequency
عبدالله    77248500
عبد الله    35427490
عبدلله    00536060
عبد اللّه    00000239
عبداللّه    00000123
عبداللاه    00000115
عبد اللاه    00000109
عبد لله    00000081
عبدألله    00000033
عبدللّه    00000030
عبد ألله    00000010

Variants of أبو علي Abu-'Ali (Male Surname)

Arabic Variant    Frequency
أبو علي    02210880
ابو علي    00000985
أبوعلي    00000963
ابوعلي    00000495
ابو على    00000408
ابوعلى    00000379
أبو على    00000164
أبو علىّ    00000035
أبوعليّ    00000030
أبوعلى    00000017
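
As a rough illustration of how rows like these might be consumed, the sketch below parses the first three 'Abdallah rows and computes each variant's share of the total. It assumes that the last whitespace-separated token on each row is a zero-padded occurrence count; the source does not state the unit of the frequency column, and `parse_rows` is a hypothetical helper, not part of any DANA tooling.

```python
# First three rows of the 'Abdallah sample, copied from the table above.
SAMPLE = """\
عبدالله 77248500
عبد الله 35427490
عبدلله 00536060
"""


def parse_rows(text: str) -> list[tuple[str, int]]:
    """Split each row into (variant, frequency); hypothetical helper."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # A variant may itself contain a space, so split on the LAST space only.
        variant, _, freq = line.rpartition(" ")
        rows.append((variant.strip(), int(freq)))  # int() drops the zero padding
    return rows


rows = parse_rows(SAMPLE)
total = sum(freq for _, freq in rows)
for variant, freq in rows:
    print(f"{variant}\t{freq}\t{freq / total:.4f}")
```

Splitting from the right keeps two-word variants such as عبد الله intact, which a plain `split()` would break apart.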