Language review of Yiddish for my NLP class.
There are about 1.5 million speakers of Yiddish worldwide (source: Ethnologue).
It is used mainly by Ashkenazi Jews (Jews of Central and Eastern European origin; “Ashkenaz” is the medieval Hebrew name for Germany), who now live mainly in Eastern Europe, Israel, the United States and Russia.
Yiddish belongs to the Indo-European language family: it is a High German-based vernacular fused with elements taken from Hebrew, Aramaic, and Slavic languages.
Yiddish is written in the Hebrew alphabet, in a vocalized form (vowels are written explicitly). A notable exception is email, where people often write Yiddish in Latin characters.
Yiddish, like other Germanic languages, generally follows V2 word order: the finite verb is the second constituent of any clause. However, verb-initial order may be used to signal a causal relationship between consecutive sentences.
There are three genders of nouns: masculine, feminine, and neuter.
The most common plural suffixes are -s for nouns ending in a vowel and -n for nouns ending in a consonant. Hebrew nouns usually keep their original Hebrew plural forms, while some Germanic or Semitic nouns also change their stem vowel in the plural.
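The suffix rule above is simple enough to state as code. The following is only a minimal sketch under assumptions not in the original notes: nouns are given in Latin-script (YIVO-style) romanization, and “ends in a vowel” just means the last letter is a, e, i, o or u. The function name naive_plural is hypothetical, and Hebrew-component plurals and stem-vowel changes are deliberately not modeled.

```python
# Hypothetical sketch of the -s / -n plural heuristic.
# Assumptions: romanized (Latin-script) input; "vowel" = a, e, i, o, u.
# NOT handled: Hebrew plurals and stem-vowel changes.

VOWELS = set("aeiou")


def naive_plural(noun: str) -> str:
    """Guess a plural: add -s after a final vowel, -n after a consonant."""
    if not noun:
        raise ValueError("empty noun")
    # Rule from the notes: -s for a final vowel, -n for a final consonant.
    suffix = "s" if noun[-1].lower() in VOWELS else "n"
    return noun + suffix


if __name__ == "__main__":
    for word in ["zeyde", "tish"]:
        print(word, "->", naive_plural(word))
```

For zeyde “grandfather” the rule gives zeydes. Any noun outside the two regular patterns will simply get the wrong suffix, which is the kind of exception a real morphological analyzer would have to list.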
There are three nominal cases: nominative, accusative and dative.
The most common possessive marker is -s, attached to the dative form of a noun phrase.
Verb stems are directly inflected for person and number only in the present tense.
Yiddish distinguishes a familiar singular ‘you’ (du) from a form used both as the formal singular and as the unmarked plural (ir).
Yiddish            English                      Yiddish       English
ikh zing           “I sing”                     mir zingen    “we sing”
du zingst          “you sing” (sg./familiar)    ir zingt      “you sing” (pl./polite)
er, zi, es zingt   “he, she, it sings”          zey zingen    “they sing”
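Because the present tense is formed by adding person/number endings directly to the stem, the paradigm of zing (ikh zing, du zingst, er zingt, mir zingen, ir zingt, zey zingen) can be sketched as a lookup of endings. This is a hypothetical illustration: it assumes the stem takes exactly the same endings as zing, and verbs with a different plural ending (for example plain -n instead of -en) or irregular verbs are not handled.

```python
# Hypothetical sketch: present-tense endings following the zing pattern.
# Assumption: the stem behaves like "zing"; irregular verbs and stems
# with a different plural ending are NOT handled.

PRONOUNS_AND_ENDINGS = [
    ("ikh", ""),          # I
    ("du", "st"),         # you (sg./familiar)
    ("er, zi, es", "t"),  # he, she, it
    ("mir", "en"),        # we
    ("ir", "t"),          # you (pl./polite)
    ("zey", "en"),        # they
]


def conjugate_present(stem: str) -> dict[str, str]:
    """Map each subject pronoun to stem + ending."""
    return {pronoun: stem + ending for pronoun, ending in PRONOUNS_AND_ENDINGS}


if __name__ == "__main__":
    for pronoun, form in conjugate_present("zing").items():
        print(pronoun, form)
```

Keeping the endings in a table rather than in branching code makes it easy to add a second paradigm later without touching the function.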
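The past tense is formed with the auxiliary hobn “to have” plus the past participle. Because Yiddish has no simple past form, the same construction translates both “I sang” and “I have sung”: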
ikh hob gezungen
“I sang”, “I have sung”
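The construction can be sketched as subject + present form of hobn + participle. This is a hypothetical helper, not an established tool: it assumes the verb takes hobn as its auxiliary (verbs with another auxiliary are out of scope), and the participle is passed in by the caller because many participles, like gezungen, are irregular.

```python
# Hypothetical sketch of the periphrastic past: subject + hobn + participle.
# Assumption: the verb uses hobn as its auxiliary; the participle is
# supplied by the caller since it cannot be derived reliably from the stem.

HOBN_PRESENT = {
    "ikh": "hob",
    "du": "host",
    "er": "hot",
    "zi": "hot",
    "es": "hot",
    "mir": "hobn",
    "ir": "hot",
    "zey": "hobn",
}


def past_tense(subject: str, participle: str) -> str:
    """Build a past-tense clause such as 'ikh hob gezungen'."""
    auxiliary = HOBN_PRESENT[subject]  # KeyError for unknown pronouns
    return f"{subject} {auxiliary} {participle}"


if __name__ == "__main__":
    print(past_tense("ikh", "gezungen"))
    print(past_tense("er", "gebentsht"))
```

Note that the auxiliary, not the participle, is the finite verb, so under V2 it is hot that stays in second position in a sentence like der zeyde hot gebentsht.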
One interesting phenomenon is that, for historical reasons, Yiddish is a “fusion language”, bearing features from several source languages.
Der zeyde hot gebentsht khanike likht.
“The grandfather blessed the Chanukkah candles”
In this example, the basic grammar is Germanic, as are the function words der and hot, the past participle markers ge- and -t, and the word likht. Zeyde is Slavic, khanike is Semitic, and bentsh is from the Romance component. Sentences like this are very common in Yiddish.
Difficulties for NLP in Yiddish
- Yiddish written in the Hebrew alphabet is difficult for OCR software, because the combining points (diacritic marks) are hard to tell apart from the letters themselves.
- Lack of language data: Yiddish is a low-resource language.
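The combining points cause trouble beyond OCR: the same pointed letter can be stored either as a single precomposed Unicode character (for example U+FB1D, yod with hiriq) or as a base letter plus a combining point, so two identical-looking words may not compare equal. A common preprocessing step, sketched below with Python's standard unicodedata module, is to normalize and optionally strip the points. The function name and the strip option are assumptions for illustration, not a standard Yiddish tool. Stripping is lossy: it merges pairs such as אַ (a) and אָ (o), so it suits fuzzy search better than transcription.

```python
import unicodedata


def normalize_yiddish(text: str, strip_points: bool = False) -> str:
    """Normalize Hebrew-script Yiddish; optionally drop combining points."""
    # NFD splits precomposed presentation forms such as U+FB1D
    # (yod with hiriq) into a base letter plus a combining point.
    decomposed = unicodedata.normalize("NFD", text)
    if strip_points:
        # Hebrew points are nonspacing marks (general category "Mn").
        decomposed = "".join(
            ch for ch in decomposed if unicodedata.category(ch) != "Mn"
        )
    # NFC does not rebuild the composition-excluded presentation forms,
    # so both spellings of a pointed letter end up as the same code points.
    return unicodedata.normalize("NFC", decomposed)


if __name__ == "__main__":
    word = "\u05D9\u05D9\u05B4\u05D3\u05D9\u05E9"  # "Yiddish" with a hiriq
    print(normalize_yiddish(word, strip_points=True))
```

Normalizing without stripping keeps all the vowel information while still making string comparison and tokenization consistent, so it is the safer default.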
I chose this language because I happened to have watched Fiddler on the Roof in Yiddish in New York a few days before my presentation.