Parallel Annotated Synthetic Corpora
Perfect multilingual alignment
Accurate and idiomatic translations
Rich set of annotation tags
Overview
The goal of the Parallel Annotated Synthetic Corpora (PASC) project is to create large-scale synthetic corpora for natural language processing (NLP) applications, including neural machine translation (NMT) and generative AI. Synthetic data mimics natural language and is especially useful for training machine learning models when real data is scarce or expensive to obtain. Synthetic corpora show great promise for improving machine translation quality.
The PASC project aims to create synthetic corpora using supervised generation techniques. Unlike augmented corpora, which expand existing corpora, PASC constructs synthetic corpora from scratch using predefined sentence templates, ensuring strict adherence to linguistic rules. This meticulous approach yields precise translations, accurate alignment, grammatical annotation, accurate phonemic transcriptions, and more.
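As a rough illustration of template-based generation, the sketch below fills bracket-annotated sentence templates with a name record. The `Name` class, the `TEMPLATES` table, and the `generate` function are hypothetical names introduced here; the template strings are modeled on the English sample sentences shown further down this page and are not PASC's actual templates or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A personal name split into its parts (hypothetical record type)."""
    given: str
    surname: str


# Hypothetical templates keyed by the row-ID suffix. Square brackets mark
# the named-entity spans, as in the sample tables on this page.
TEMPLATES = {
    "01": "My full name is [{given} {surname}].",
    "02": "[{given}] is my given name and [{surname}] is my surname.",
    "05": "[{given} {surname}] is my full name.",
}


def generate(set_id: str, name: Name) -> list[tuple[str, str]]:
    """Return (row ID, sentence) pairs, one per template, in suffix order."""
    return [
        (f"{set_id}-{suffix}",
         template.format(given=name.given, surname=name.surname))
        for suffix, template in sorted(TEMPLATES.items())
    ]


rows = generate("0002", Name(given="Michael", surname="Owen"))
for row_id, sentence in rows:
    print(f"{row_id} | {sentence}")
```

Because every sentence comes from a fixed template, the bracketed spans land in known positions, which is what makes exact alignment across languages possible in principle. A real pipeline would also need per-language templates and rules for name order, inflection, and transcription.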
PASC comprises very large databases, with tens to hundreds of millions of entries per domain. It currently focuses on named entities (especially personal names, place names, and points of interest) for CJK languages and Arabic; technical terms are to follow. Its distinctive features include full alignment, translation accuracy, accurate transcriptions, multilingual formats, full annotation, and consistency.
Parallel Annotated Synthetic Corpora
ID | ENGLISH | JAPANESE |
---|---|---|
0002-01 | My full name is [Michael Owen]. | 私の姓名は[オーウェン・マイケル]です。 |
0002-02 | [Michael] is my given name and [Owen] is my surname. | [マイケル]は私の名前で、[オーウェン]は私の苗字です。 |
0002-03 | I’m called [Michael Owen]. | [オーウェン・マイケル]と言います。 |
0002-04 | Both [Michael] and [Owen] are personal names. | [オーウェン]と[マイケル]は両方とも人名です。 |
0002-05 | [Michael Owen] is my full name. | [オーウェン・マイケル]とは私のフルネームです。 |
0002-06 | [Michael Owen] is what’s written on my ID. | 旅券に記載されている姓名は[オーウェン・マイケル]です。 |
0002-07 | I’ve never heard of anyone called [Michael Owen]. | [オーウェン・マイケル]と言う人のことを聞いたことがない。 |
0002-08 | I go by the name [Michael Owen]. | [オーウェン・マイケル]と言う名前で呼ばれています。 |
0002-09 | Do you know of anyone who goes by the name of [Michael Owen]? | [オーウェン・マイケル]という人を知っていますか。 |
ID | JAPANESE | ENGLISH |
---|---|---|
0030-01 | 私の姓名は[森隆大]です。 | My full name is [Takahiro Mori]. |
0030-02 | [隆大]は私の名前で、[森]は私の苗字です。 | [Takahiro] is my given name and [Mori] is my surname. |
0030-03 | [森隆大]と言います。 | I’m called [Takahiro Mori]. |
0030-04 | [森]と[隆大]は両方とも人名です。 | Both [Takahiro] and [Mori] are personal names. |
0030-05 | [森隆大]とは私のフルネームです。 | [Takahiro Mori] is my full name. |
0030-06 | 旅券に記載されている姓名は[森隆大]です。 | [Takahiro Mori] is what’s written on my ID. |
0030-07 | [森隆大]と言う人のことを聞いたことがない。 | I’ve never heard of anyone called [Takahiro Mori]. |
0030-08 | [森隆大]と言う名前で呼ばれています。 | I go by the name [Takahiro Mori]. |
0030-09 | [森隆大]という人を知っていますか。 | Do you know of anyone who goes by the name of [Takahiro Mori]? |
ID | CHINESE | ENGLISH |
---|---|---|
0040-01 | 我的姓名是[张小东]。 | My full name is [Xiaodong Zhang]. |
0040-02 | [小东]是我的名字，[张]是我的姓。 | [Xiaodong] is my given name and [Zhang] is my surname. |
0040-03 | 我叫[张小东]。 | I’m called [Xiaodong Zhang]. |
0040-04 | [小东]和[张]都是人名。 | Both [Xiaodong] and [Zhang] are personal names. |
0040-05 | [张小东]是我的姓名。 | [Xiaodong Zhang] is my full name. |
0040-06 | 我的身份证上的姓名是[张小东]。 | [Xiaodong Zhang] is what’s written on my ID. |
0040-07 | 我从未听过叫[张小东]的人。 | I’ve never heard of anyone called [Xiaodong Zhang]. |
0040-08 | 我叫[张小东]。 | I go by the name [Xiaodong Zhang]. |
0040-09 | 你知道叫[张小东]的人吗？ | Do you know of anyone who goes by the name of [Xiaodong Zhang]? |
ID | KOREAN | ENGLISH |
---|---|---|
0050-01 | 저의 성명은 [김지영]입니다. | My full name is [Jiyeong Gim]. |
0050-02 | [지영]은 저의 이름이고, [김]은 저의 성입니다. | [Jiyeong] is my given name and [Gim] is my surname. |
0050-03 | 저는 [김지영]이라고 합니다. | I’m called [Jiyeong Gim]. |
0050-04 | [지영]과 [김]은 모두 다 인명입니다. | Both [Jiyeong] and [Gim] are personal names. |
0050-05 | [김지영]은 저의 성명입니다. | [Jiyeong Gim] is my full name. |
0050-06 | 저의 신분증의 이름은 [김지영]입니다. | [Jiyeong Gim] is what’s written on my ID. |
0050-07 | [김지영]이라는 이름을 들어본 적이 없습니다. | I’ve never heard of anyone called [Jiyeong Gim]. |
0050-08 | 저는 [김지영]이라고 합니다. | I go by the name [Jiyeong Gim]. |
0050-09 | [김지영]이라는 분을 아시나요? | Do you know of anyone who goes by the name of [Jiyeong Gim]? |
ID | ARABIC | ENGLISH |
---|---|---|
0060-01 | اسمي الكامل هو [محمد العبدي] | My full name is [Mohammed Al-Abadi]. |
0060-02 | [محمد] هو اسمي الاول، و [العبدي] هو اسمي العائلي | [Mohammed] is my first name and [Al-Abadi] is my family name. |
0060-03 | أنا أدعى [محمد العبدي] | I’m called [Mohammed Al-Abadi]. |
0060-04 | [محمد] و[العبدي] كلاهما أسماء شخصية | Both [Mohammed] and [Al-Abadi] are personal names. |
0060-05 | [محمد العبدي] هو اسمي الكامل | [Mohammed Al-Abadi] is my full name. |
0060-06 | الإسم المدرج في بطاقة هويتي هو [محمد العبدي] | The name listed on my ID card is [Mohammed Al-Abadi]. |
0060-07 | لم أسمع عن أحد يدعى [محمد العبدي] | I haven’t heard of anyone called [Mohammed Al-Abadi]. |
0060-08 | أنا ألقب ب [محمد العبدي] | I go by the name [Mohammed Al-Abadi]. |
0060-09 | هل تعرف شخصا يلقب بـ[محمد العبدي]؟ | Do you know of anyone who goes by the name [Mohammed Al-Abadi]? |
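
The bracket annotation in the sample rows lends itself to a simple consistency check: each side of a row should carry the same number of bracketed entity spans. The sketch below assumes the pipe-delimited layout shown in the tables above; `entities`, `parse_row`, and `same_entity_count` are hypothetical helper names, and the sample row is retyped here purely for illustration. It is not part of any published PASC tooling.

```python
import re

# Matches one bracketed span such as "[Xiaodong Zhang]" and captures its text.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def entities(sentence: str) -> list[str]:
    """Return the text of every bracketed span, in order of appearance."""
    return BRACKETED.findall(sentence)


def parse_row(line: str) -> tuple[str, str, str]:
    """Split 'ID | source | target |' into its three trimmed cells."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 3:
        raise ValueError(f"expected 3 cells, got {len(cells)}: {line!r}")
    return cells[0], cells[1], cells[2]


def same_entity_count(line: str) -> bool:
    """True when source and target contain the same number of bracketed spans."""
    _, source, target = parse_row(line)
    return len(entities(source)) == len(entities(target))


# Illustrative row in the same shape as the samples above.
row = ("0040-02 | [小东]是我的名字，[张]是我的姓。 | "
       "[Xiaodong] is my given name and [Zhang] is my surname. |")
row_id, source, target = parse_row(row)
print(row_id, entities(source), entities(target), same_entity_count(row))
```

A count check like this only catches missing or extra spans. It does not confirm that the spans correspond to each other, since name order can legitimately differ between languages.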
Practical Applications
PASC can enhance the quality of language models and NLP algorithms for various applications, such as:
Neural Machine Translation
Automatic Speech Recognition
Text-to-Speech
Reference Documents
Related Resources

Chinese Personal Name Variants
Over 7 million Chinese and non-Chinese names and romanized variants

Database of Arabic Names
6.5 million Arabic personal names and their romanized variants

Japanese Personal Name Variants
Japanese personal names and their romanized variants