The lexicon of Romani consists of several layers that can be grouped into a pre-European and a European part. The so-called Indian "words of origin" and the "earlier" and "later" loans from Persian, Armenian, and Byzantine Greek make up the pre-European lexicon. These "inherited words" (Boretzky 1992) comprise about 700 roots from Indian, likely no more than 100 roots from Persian and other Iranian languages, at least 20 from Armenian and up to 250 from Greek. This total of more than 1000 lexemes, however, is not present in its entirety in any single variety. "Recent" loans, adopted at a later point in time, stem from a range of different European contact languages. Among these, loans from Southern Slavic form the last general layer shared by the Romani varieties spoken in Europe today.

kham < inc. gharma 'sun'
veš < ira. veša 'wood'
khoni < arm. khoni 'suet'
drom < grc. drómos 'road'
praxo < sla. prax 'dust'
lumja < ron. lume 'world'
kolopa < hun. kalap 'hat'
berga < deu. Berg 'mountain'

As already mentioned, the notion of a common lexicon in this table is valid as far as the Slavic praxo, with all further lexemes being variety-specific: lumja, a loan from Romanian, pertains to the lexical inventory of Kalderaš Romani; kolopa, stemming from Hungarian, is used in Lovara Romani, and berga, which is a loan from German, is used in Sinti Romani.

The Lexicon as a Migration Route

The pre-European loan strata of Romani have made it possible to reconstruct the migratory route followed by Romani speakers. After their emigration from the northwest of the Indian subcontinent, the first influential language contact took place in what was at the time Sassanid Persia. As a consequence, there are elements in the Romani lexicon that can be traced back to Middle Persian Pahlevi. It is impossible to determine how long this first contact lasted. In fact, it is unclear whether Romani speakers actually settled in the region for a period of time or whether they were engaged in a slow process of transition. Since Romani does not have any Arabic loans at all, it can be assumed that Romani speakers left the Persian region before the hybridisation of the Iranian and Arabic cultures took place. Most likely, they moved on via Armenia into the Byzantine sphere of influence, where they eventually stayed for an extended period. This assumption is supported by loans from Armenian on the one hand and by a strong influence of Byzantine Greek, going well beyond mere lexical loans, on the other. This heavy influence on Romani is also reflected in the cardinal numbers listed below, which include several Greek loans alongside Indic words:

jekh < inc. ekka- 'one'
duj < inc. d(u)vā 'two'
trin < inc. trīṇi 'three'
štar < inc. catvāra 'four'
pandž < inc. pañca- 'five'
šov < inc. śaś/śat 'six'
efta < grc. εφτά 'seven'
oxto < grc. οχτώ 'eight'
enja < grc. εννιά 'nine'
deš < inc. daśa- 'ten'
biš < inc. viṁśati 'twenty'
tr(ij)anda < grc. τριάντα 'thirty'
saranda < grc. σαράντα 'forty'
šel < inc. śata 'hundred'
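
The mix of Greek and Indic numerals in the list above can be tallied with a short sketch. The pairs are copied from the list; the variable names and the use of the tags "inc" and "grc" as plain strings are illustrative assumptions, not part of the source.

```python
from collections import Counter

# (Romani numeral, source-language tag) pairs copied from the list above.
# "inc" = Indic, "grc" = Greek, following the abbreviations used in the text.
NUMERALS = [
    ("jekh", "inc"), ("duj", "inc"), ("trin", "inc"), ("štar", "inc"),
    ("pandž", "inc"), ("šov", "inc"), ("efta", "grc"), ("oxto", "grc"),
    ("enja", "grc"), ("deš", "inc"), ("biš", "inc"), ("tr(ij)anda", "grc"),
    ("saranda", "grc"), ("šel", "inc"),
]

# Count how many of the listed numerals come from each source language.
by_source = Counter(tag for _, tag in NUMERALS)

# Collect the Greek loans in list order.
greek = [word for word, tag in NUMERALS if tag == "grc"]

print(by_source)
print(greek)
```

Of the fourteen numerals listed, five are Greek loans: 'seven', 'eight', 'nine', 'thirty' and 'forty'.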

The fact that there are no Turkish loans among Romani speakers who immigrated into Europe via the Balkans indicates that they must have emigrated from Asia Minor before the region became Turkish, that is, before the hybridisation of the Arabic-Iranian-Islamic and Byzantine-Greek cultures under Ottoman political dominance. The ancestors of the Roma living in Europe today did not take part in this process. The varieties spoken by Roma who remained in the Balkans, and who were later influenced directly or indirectly by Ottoman-Islamic culture, do of course also have Ottoman-Turkish loans. However, these loans should be seen as belonging to the European part of the lexicon, along with all other loans from the Slavic languages onwards, which dominate numerically in all Romani varieties.

The layers of European loans in Romani varieties provide evidence about the subsequent migrations of individual groups in Europe. For example, lexemes of German origin in the Finnish Kale variety are a sign of contact with German and probably also show that the ancestors of the Kale lived in German-speaking regions for a period. Romanian elements in many Romani varieties (which are now distributed around the world), such as Kalderaš, Gurbet and Čurara Romani, reflect the common history of these groups in serfdom and slavery in the Moldavian and Vlach regions; hence the collective term "Vlax Romani" for these varieties. In the case of the Kalderaš group, elements derived from Russian in the varieties now spoken in Sweden, France, the Americas and other regions suggest that their migration route passed through Russia.

Basic Vocabulary

The large majority of the lexicon of individual Romani varieties is made up of European loans. Moreover, each word of the respective contact language is a potential Romani lexeme that can be integrated if need be. Despite this, the majority of the so-called basic vocabulary (words for existentially important entities, states, and processes) of each individual Romani variety consists of words of Indian origin, so that Romani can still be characterised as a New Indo-Aryan language on lexical grounds.[1] For instance, for 178 of the 207 elements contained in the Swadesh lists,[2] the corresponding Romani terms can be traced back to Indian. Another sixteen lexemes are "early" loans with pre-European roots, while only thirteen derive from European languages, five of them being of Slavic origin. In other words, 86% of all terms for basic meanings listed by Swadesh come from Indian and 94% from the pre-European vocabulary; once the five Slavic loans of the last general layer are included, 96% come from the common vocabulary.
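
The percentages can be re-derived from the counts given in this paragraph. The following is a minimal sketch; the dictionary keys and the helper `share` are illustrative names, and treating the five Slavic loans as part of the common layer is an assumption based on the earlier statement that Southern Slavic forms the last general layer.

```python
# Counts as stated in the text; the layer labels are illustrative names.
SWADESH_TOTAL = 207
counts = {
    "indic": 178,               # terms traced back to Indian
    "pre_european_loans": 16,   # "early" loans with pre-European roots
    "european_loans": 13,       # loans from European languages
}
SLAVIC_AMONG_EUROPEAN = 5       # Slavic loans among the thirteen European ones


def share(n, total=SWADESH_TOTAL):
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


# The three layers should account for every item on the list.
assert sum(counts.values()) == SWADESH_TOTAL

pre_european = counts["indic"] + counts["pre_european_loans"]

indic_share = share(counts["indic"])
pre_european_share = share(pre_european)
# Assumption: the common vocabulary extends as far as the Slavic layer.
with_slavic_share = share(pre_european + SLAVIC_AMONG_EUROPEAN)

print(indic_share, pre_european_share, with_slavic_share)
```

On these counts, the Indic share rounds to 86%, the pre-European share to about 94%, and the share including the five Slavic loans to about 96%.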

This confirms both the kinship of Romani to the New Indo-Aryan languages and the lexical homogeneity of its basic vocabulary. The basic vocabulary covers existentially important domains, that is, areas of life and the environment that are close to the human being. For example, the following Romani terms (personal designations) can be traced back to Indian:

[+ romani]   neutral                    [– romani]
rom          manuš        murš          gadžo        'man'
romni        manušni      džuvli        gadži        'woman'
čhavo        [manušoro]   [muršoro]     raklo        'boy'
čhaj         [manušori]   [džuvlori]    rakli        'girl'

Particularly striking is the differentiation according to ethnic criteria, marked in the table by the feature [± romani]. Among the ethnically neutral terms, the pair murš / džuvli focuses on gender difference, while manuš / manušni emphasises the human aspect. The neutral terms for 'boy' / 'girl' given in brackets are more or less common diminutive forms of the corresponding terms for 'man' / 'woman'.

Designations for human beings, essentially of Indo-Aryan origin, also function as kinship terms. Accordingly, terms describing direct relatives of the same generation – rom / romni 'husband' / 'wife' and phral / phen 'brother' / 'sister' – as well as those designating relatives of the immediately preceding and following generations – čhavo / čhaj 'son' / 'daughter' and dad / daj 'father' / 'mother' – have Indian origins.

In contrast, terms for the grandparent generation are loans from Greek – papus / mami 'grandfather' / 'grandmother'. Terms designating indirect relatives of the preceding generation (i.e. siblings of parents) also belong to the early loans and most likely stem from Persian – kak / bibi 'uncle' / 'aunt'. All of the other kinship terms are either variety-specific loans from European contact languages or paraphrases. The following table summarizes the kinship system and the lexical layers of the respective terms from an individual's point of view:

           +2                +1         0                 –1       –2
direct     'grandson'        čhavo      phral             dad      papu(s)
           'granddaughter'   čhaj       phen              daj      mami
indirect                     'nephew'   'male cousin'     kak(o)
                             'niece'    'female cousin'   bibi

The human body is another basic domain where a great majority of terms are of Indic origin (body parts, functions, movements, physical and mental states). Numbers (see above); the environment (landscape, weather, plants, animals); shelter, tools and basic foods; and professions and social functions also belong to the basic vocabulary. The following examples from the domain of time show a preponderance of Indo-Aryan lexemes:

ivend < inc. hemanta 'winter'
nilaj < inc. nidāgha 'summer'
dives < inc. divasa 'day'
rat < inc. rātrī 'night'
berš < inc. varṣa 'year'
masek < inc. māsa 'month'
kurko < grc. kuriaké 'week, Sunday'
ciros < grc. kairós 'time'

Like almost all other basic areas, this domain also contains some pre-European loans, many of them from Byzantine Greek, which supports the idea that Romani speakers must have spent a considerable period in Asia Minor in intensive language contact with the majority populations. The resulting influence of Greek on Romani goes well beyond the lexical domain and is an important factor in morphology and syntax.

1. A similar situation holds for English. Even though only about one third of the English vocabulary is "Western Germanic" by origin, English is classified as a Western Germanic language because its basic vocabulary largely belongs to this third. In the same way, nobody questions the autonomy of the Japanese language, even though about 50 percent of its vocabulary stems from Chinese and a number of other words are English loans. As in many other comparable cases, the great majority of the basic vocabulary of Japanese is "originary".

2. The Swadesh lists contain 207 basic meanings used in lexicostatistics, that is, in the quantitative analysis of linguistic relationships.