PATTS

Palestinian Arabic Text-to-Speech system
نظام نص-إلى-كلام باللهجة الفلسطينية
by The CJK Dictionary Institute

The CJK Dictionary Institute (CJKI) is pleased to announce Palestinian Arabic Text-to-Speech (PATTS), the first application that accurately generates the pronunciation and prosodic features of Palestinian Arabic (PA).

Introduction

Arabic text-to-speech (TTS) systems focus almost exclusively on Modern Standard Arabic (MSA) and offer no support for regional dialects. Some claim to provide Palestinian Arabic TTS, but they fail to capture the distinctive speech sounds of the dialect: in practice they generate speech that resembles MSA, even though PA has phonemic features that differ significantly from those of MSA.

The Challenges of PA Phonology

PA differs from MSA in significant ways and poses various challenges. Merely using an MSA TTS system does not work. Below are some of the major issues.

PA has six short vowels (most of them phonemic) plus two non-phonemic epenthetic, or helping, vowels, whereas MSA has only three short vowels. PA also has neutralized vowels and consonants, as well as distinctive prosodic features. The Arabic script is inadequate for representing PA phonemes, let alone the actual speech sounds (phones). For example, the allophones [a] and [ɑ], as in أَبُ (ʾabu) and أَحرَ (ʾɑḥra), cannot be distinguished in the script, since both are written with fatha.

Our institute developed a phonemo-phonetic transcription system called DARS, which precisely represents PA phonemes, together with algorithms that convert DARS to IPA. The IPA output represents the actual phonetic realizations, including prosodic features such as stress and vowel neutralization. The most challenging part was developing a converter from PA written in raw, unvocalized Arabic script to DARS. Once the text is in DARS, it is converted to IPA, and finally the actual sounds are generated. Here are the conversion steps for a sample word, يابان (Japan):

يابان → يَابَان → yaa_bAA/n → ya̱bɑ̄́n → jaˈbɑːn → audio

The table below displays some of the grapheme-phoneme-phone correspondences. For a detailed table, you can download a draft of an unpublished paper on PA phonology by Jack Halpern at https://www.cjk.org/wp-content/uploads/PA_phonology.pdf.


Short Vowels

| Graphemic (Arabic) | Phonemic DARS (DARS+) | Phonetic (IPA) | Example |
|---|---|---|---|
| بَ fatha | a (a) | 1. a~æ 2. a~ʌ | ʾabu |
| صَ fatha | ɑ (A) | ɑ | ʾɑḥra |
| بِ kasra | i (i) | 1. ɪ~e 2. i 3. ɵ | 1. ʾíḥna 2. bíddi 3. btíḍḥak |
| بِ kasra | e (e) | e~i | wāled |
| بُ damma | o (o) | ʊ | bēto |
| بُ damma | u (u) | 1. ʊ 2. u | ʾujrɑ |
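
The short-vowel correspondences show why the script underdetermines pronunciation: each short-vowel diacritic can stand for more than one DARS phoneme. The following minimal Python sketch expresses that one-to-many lookup. The names `SHORT_VOWEL_CANDIDATES`, `candidates`, and `is_ambiguous` are hypothetical and are not part of PATTS; the data is limited to the correspondences listed for fatha, kasra, and damma.

```python
# Short-vowel diacritics and the DARS phonemes they can stand for,
# as listed in the Short Vowels table. All names here are illustrative.
FATHA, DAMMA, KASRA = "\u064e", "\u064f", "\u0650"

SHORT_VOWEL_CANDIDATES = {
    FATHA: ["a", "\u0251"],  # a or ɑ, e.g. ʾabu vs. ʾɑḥra
    KASRA: ["i", "e"],       # e.g. bíddi vs. wāled
    DAMMA: ["o", "u"],       # e.g. bēto vs. ʾujrɑ
}


def candidates(diacritic: str) -> list[str]:
    """Return the DARS phonemes a diacritic may represent (empty if unknown)."""
    return SHORT_VOWEL_CANDIDATES.get(diacritic, [])


def is_ambiguous(diacritic: str) -> bool:
    """True when the diacritic alone cannot decide the phoneme."""
    return len(candidates(diacritic)) > 1


if __name__ == "__main__":
    for name, mark in [("fatha", FATHA), ("kasra", KASRA), ("damma", DAMMA)]:
        print(name, candidates(mark), is_ambiguous(mark))
```

Every diacritic in this lookup is ambiguous, so a converter cannot rely on the vocalized script alone and needs lexical or contextual knowledge to choose the phoneme.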

Other Vowels

| Graphemic (Arabic) | Phonemic DARS (DARS+) | Phonetic (IPA) | Example |
|---|---|---|---|
| - | ᵉ (e^) | ɘ~ĕ | tends to [ĕ], as in lᵉktā́b |
| - | ᵒ (o^) | ɘ~ŏ | tends to [ŏ], as in shugᵒl |
| - | ʸ (y^) | ʲ | stressed possible? |
| بَا | ā (aa) | | lᵉktā́b |
| صَا | ɑ̄ (AA) | 1. ɑː 2. ɑ | |
| بِي | ī (ii) | 1. iː 2. i | 1. jīb 2. jib-li |
| بُو | ū (uu) | 1. uː 2. u | shūf |
| بِي, بَيْ | ē (ee) | se:~i: | wēn |
| بَوْ, بُو | ō (oo) | | ʾo̱da |
| - | a̱ (aa_) | a | yā_bā́ni |
| - | ɑ̱ (AA_) | ɑ | <...> |
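
The DARS+ column is an ASCII notation for the DARS symbols, so converting one to the other can be sketched as a longest-match substitution. The Python sketch below uses only the vowel pairs listed in the two tables. Treating `/` as a stress marker on the preceding vowel is an inference from the single example yaa_bAA/n → ya̱bɑ̄́n, and the function name `dars_plus_to_dars` is hypothetical; this is not the institute's converter.

```python
# Combining marks used by DARS vowels (written as escapes to avoid ambiguity).
MACRON = "\u0304"        # long vowel, e.g. aa -> ā
MACRON_BELOW = "\u0331"  # neutralized long vowel, e.g. aa_ -> a̱
ACUTE = "\u0301"         # stress (assumed meaning of "/" in DARS+)
ALPHA = "\u0251"         # ɑ

# DARS+ (ASCII) -> DARS, restricted to the pairs shown in the tables.
DARS_PLUS_TO_DARS = {
    "aa_": "a" + MACRON_BELOW,
    "AA_": ALPHA + MACRON_BELOW,
    "aa": "a" + MACRON,
    "AA": ALPHA + MACRON,
    "ii": "i" + MACRON,
    "uu": "u" + MACRON,
    "ee": "e" + MACRON,
    "oo": "o" + MACRON,
    "e^": "\u1d49",  # ᵉ epenthetic vowel
    "o^": "\u1d52",  # ᵒ epenthetic vowel
    "y^": "\u02b8",  # ʸ
    "A": ALPHA,
    "/": ACUTE,      # assumption: stress on the preceding vowel
}

# Try longer keys first so that "aa_" wins over "aa".
_KEYS = sorted(DARS_PLUS_TO_DARS, key=len, reverse=True)


def dars_plus_to_dars(text: str) -> str:
    """Convert a DARS+ string to DARS; unknown characters pass through."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(DARS_PLUS_TO_DARS[key])
                i += len(key)
                break
        else:  # no key matched at position i
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(dars_plus_to_dars("yaa_bAA/n"))  # ya̱bɑ̄́n
```

The output uses decomposed sequences (base letter plus combining marks); a caller that needs precomposed characters could normalize the result afterwards.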

Sample Conversions

Converting Arabic script into IPA is an extremely difficult linguistic task. The phonemo-phonetic DARS transcription is what makes accurate synthesized audio possible. Below are a verb paradigm and a short full-text sample. To listen to the samples, visit https://www.cjk.org/wp-content/uploads/patts.html or scan the QR code below.

Verb Paradigm Sample

| Arabic (Graphemic) | DARS (Phonemic) | IPA (Phonetic) |
|---|---|---|
| كَتَبْتْ | katáb-t | kaˈtabt |
| كَتَبْتْ | katáb-ᵉt | kaˈtabɘt |
| كَتَبْتِي | katáb-ti | kaˈtabtɪ |
| كَتَبْ | kátab | ˈkatab |
| كَتْبَتْ | kátb-at | ˈkatbat |
| كَتَبْنَا | katáb-na | kaˈtabna |
| كَتَبْتُو | katáb-tu | kaˈtabtʊ |
| كَتَبُو | kátab-u | ˈkatabʊ |

Full-Text Sample

Arabic (Graphemic)

مرحبا جميعاً. إسمي جاك هالبرن أنا خواجا بيدرس عامية من آخر معمر الله -من اليابان- إللي هي أرض الشمس المشرقة. صرلي عايش في اليابان أكثر من 40 سنه. قطعت كل هاي المسافة لأنه متحمس أتعلم العامية إللي هي لغتي رقم 12.

DARS (Phonemic)

márḥaba jamī́3an. ʾísmi Jack Halpern. ʾána khawā́ja búdros 3ammíyye min ʾā́kher m3ámmar ʾallɑ́h - - min il-ya̱bɑ̄́n. ʾílli híyye, ʾɑ́rḍ issh-sháms ilmúshriq̈a. ṣárli 3ā́yesh fil ya̱bɑ̄́n ʾáktar min ʾarb3ī́n sáne. qatá3t kull hay il-masā́fe laʾínno mitḥámmes ʾat3állam il-3ammíyye ʾílli híyye lúghati̱ rɑ́q̈ɑm ṭnɑ́3sh.

IPA (Phonetic)

mˈarħaba ʒamˈiːʕan. ʔˈɪsmɪ Jack Halpern. ʔˈana xawˈaːʒa bˈʊdrʊs ʕammˈɪjje mɪn ʔˈaːxer mʕˈammar ʔallˈɑh - - mɪn ɪl-jabˈɑːn. ʔˈɪllɪ hˈɪjje, ʔˈɑrdˤ ɪsʃ-ʃˈams ɪlmˈʊʃrɪqa. sˤˈarlɪ ʕˈaːjeʃ fɪl jabˈɑːn ʔˈaktar mɪn ʔarbʕˈiːn sˈane. ʔatˈaʕt kʊll haj ɪl-masˈaːfe laʔˈɪnnʊ mɪtħˈammes ʔatʕˈallam ɪl-ʕammˈɪjje ʔˈɪllɪ hˈɪjje lˈʊɣati rˈɑqɑm tˤnˈɑʕʃ.
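
Comparing the DARS and IPA lines word by word suggests a set of symbol correspondences, which the Python sketch below collects into a simplified, context-free converter. The mapping is inferred from this sample only. It places the stress mark immediately before the stressed vowel, as the IPA line above does, and it ignores the context-dependent allophones that the full system handles. The function names are hypothetical, and the code points assumed for ʾ are marked in the comments.

```python
import unicodedata

ACUTE, MACRON, MACRON_BELOW = "\u0301", "\u0304", "\u0331"
DOT_BELOW, DIAERESIS = "\u0323", "\u0308"
STRESS, LONG = "\u02c8", "\u02d0"  # IPA ˈ and ː

# Plain short vowels: DARS base letter -> IPA (inferred from the sample).
VOWELS = {"a": "a", "\u0251": "\u0251", "i": "ɪ", "u": "ʊ", "o": "ʊ", "e": "e"}
DIGRAPHS = {"kh": "x", "sh": "ʃ", "gh": "ɣ"}
# ʾ is assumed to be U+02BE (U+02BC is accepted as well).
CONSONANTS = {"3": "ʕ", "\u02be": "ʔ", "\u02bc": "ʔ", "j": "ʒ", "y": "j", "q": "ʔ"}
MARKED = {
    ("h", DOT_BELOW): "ħ", ("s", DOT_BELOW): "sˤ",
    ("d", DOT_BELOW): "dˤ", ("t", DOT_BELOW): "tˤ",
    ("q", DIAERESIS): "q",
}


def clusters(text):
    """Split text into (base character, set of combining marks) pairs."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and out:
            out[-1][1].add(ch)
        else:
            out.append((ch, set()))
    return out


def dars_to_ipa(text):
    """Simplified DARS -> IPA conversion; unknown symbols pass through."""
    cl = clusters(text)
    ipa = []
    i = 0
    while i < len(cl):
        base, marks = cl[i]
        nxt = cl[i + 1] if i + 1 < len(cl) else None
        if base in VOWELS:
            if MACRON in marks:            # long vowel keeps its quality
                vowel = base + LONG
            elif MACRON_BELOW in marks:    # neutralized long vowel: short, tense
                vowel = base
            else:
                vowel = VOWELS[base]
            ipa.append((STRESS if ACUTE in marks else "") + vowel)
        elif marks:                        # consonant with a diacritic
            ipa.append(next((MARKED[(base, m)] for m in marks
                             if (base, m) in MARKED), base))
        elif nxt is not None and not nxt[1] and base + nxt[0] in DIGRAPHS:
            ipa.append(DIGRAPHS[base + nxt[0]])
            i += 1                         # consume the second letter
        else:
            ipa.append(CONSONANTS.get(base, base))
        i += 1
    return "".join(ipa)


if __name__ == "__main__":
    print(dars_to_ipa("m\u00e1r\u1e25aba"))     # mˈarħaba
    print(dars_to_ipa("khawa\u0304\u0301ja"))   # xawˈaːʒa
```

Normalizing to NFD first means the converter accepts both precomposed and decomposed input, and collecting the marks into a set makes it independent of the order in which macron and acute were typed.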
