* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download A newly discovered founder population: the
Mitochondrial DNA wikipedia , lookup
Genetic code wikipedia , lookup
Polymorphism (biology) wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Genealogical DNA test wikipedia , lookup
Koinophilia wikipedia , lookup
Frameshift mutation wikipedia , lookup
Point mutation wikipedia , lookup
Behavioural genetics wikipedia , lookup
Designer baby wikipedia , lookup
Heritability of IQ wikipedia , lookup
History of genetic engineering wikipedia , lookup
Genetic studies on Bulgarians wikipedia , lookup
Genetic testing wikipedia , lookup
Genetic engineering wikipedia , lookup
Genetic drift wikipedia , lookup
Public health genomics wikipedia , lookup
Medical genetics wikipedia , lookup
Genome (book) wikipedia , lookup
Genetics and archaeogenetics of South Asia wikipedia , lookup
Population genetics wikipedia , lookup
Challenges A newly discovered founder population: the Roma/Gypsies Luba Kalaydjieva,1* Bharti Morar,1 Raphaelle Chaix,2 and Hua Tang3 Summary The Gypsies (a misnomer, derived from an early legend about Egyptian origins) defy the conventional definition of a population: they have no nation-state, speak different languages, belong to many religions and comprise a mosaic of socially and culturally divergent groups separated by strict rules of endogamy. Referred to as ‘‘the invisible minority’’, the Gypsies have for centuries been ignored by Western medicine, and their genetic heritage has only recently attracted attention. Common origins from a small group of ancestors characterise the 8– 10 million European Gypsies as an unusual trans-national founder population, whose exodus from India played the role of a profound demographic bottleneck. Social and economic pressures within Europe led to gradual fragmentation, generating multiple genetically differentiated subisolates. The string of population bottlenecks and founder effects have shaped a unique genetic profile, whose potential for genetic research can be met only by study designs that acknowledge cultural tradition and self-identity. BioEssays 27:1084–1094, 2005. ! 2005 Wiley Periodicals, Inc. Introduction Isolated founder populations have been a precious resource for Mendelian genetics, contributing knowledge on novel disorders, genes and mutations, as well as new experimental and statistical approaches designed to exploit the special characteristics of such populations.(1–4) As the term implies, founder populations derive from a small number of ancestors, with subsequent isolation (geographic, cultural or religious etc.) leading to limited immigration and demographic growth mainly from within. Reduced genetic diversity and founder 1 Western Australian Institute for Medical Research and Centre for Medical Research, The University of Western Australia, Perth, Australia. 2 Equipe de Génétique des Populations, Unité d’Eco-Athropologie, Musée de l’Homme, Paris, France. 3 Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA Funding agencies: The Wellcome Trust, the Australian Research Council and the National Health and Medical Research Council. *Correspondence to: Luba Kalaydjieva, Laboratory of Molecular Genetics, WAIMR, B Block, QEII Medical Centre, Nedlands, WA6009, Perth, Australia. E-mail: [email protected] DOI 10.1002/bies.20287 Published online in Wiley InterScience (www.interscience.wiley.com). 1084 BioEssays 27.10 effect, resulting in a more homogeneous basis of inherited disorders and predispositions, make it possible for genetic studies to treat the whole population as one large family, where individuals affected by a specific condition are likely to share the same ancestral disease-causing DNA variant(s). Unlike other founder populations, whose history, genealogy and genetic epidemiology have been extensively documented,(5–8) the geographically dispersed and socially marginalised Gypsies have been ignored by European medicine for hundreds of years, and their unique genetic heritage is only now becoming a focus of interest for geneticists and medical practitioners. Private (meaning confined to this population) disease-causing mutations and evidence of founder effect were published for the first time in 1996 in two independent studies—of the novel Hereditary Motor and Sensory Neuropathy type Lom (HMSNL)(9) and of Limb-Girdle Muscular Dystrophy 2C.(10) Further medical genetic research, adding to the list of private mutations and aiming to understand their molecular epidemiology, has revealed a peculiar combination of genetic homogeneity and mutation sharing by affected subjects across Europe(10 –12) and, at the same time, an internal mosaic of striking differences in the prevalence of genetic disorders and mutations between neighbouring Gypsy communities in the same country. This epidemiological pattern suggests a complex population structure, whose understanding is essential for further genetic research, as well as for patient care and public health interventions. The issue has not been addressed in earlier population genetic investigations (summarised in ref. 13), relying on the random sampling of ‘‘pure-blood Gypsies’’, classified by country of residence, with disregard for social structure and self-identity. Recent population studies, initiated and designed to answer the questions raised by medical genetics, have made a big step forward and characterised the Gypsies as one of the most interesting founder populations and a valuable part of the European genetic landscape. The gypsies through the eyes of social scientists The potential of founder populations to contribute to understanding the genetic basis of human disease is largely determined by historical demography: size and diversity of the founding population, subsequent demographic growth, extent of genetic isolation, and internal genetic differentiation BioEssays 27:1084–1094, ! 2005 Wiley Periodicals, Inc. Challenges and substructure.(14– 18) The Gypsies have no written chronicles, church records and historical research of their own, and the puzzle of their history has been assembled from the accounts of outsiders and from inferences of linguistic and cultural anthropology studies. This is what the social sciences tell us. The current Gypsy population of Europe, around 10 million people,(19) is equivalent to that of a medium-sized European country. Dispersed throughout the continent, the Gypsies are known under different names—Roma, Sinti, Manouches, Romanichals, and Kalo, etc. Strictly speaking, ‘‘Roma’’, which is increasingly used as a generic term, refers to Eastern European Gypsies and will be used in this text when discussing the Gypsies of Bulgaria. The 200-year-old linguistic theory of the Indian origins of the Gypsies(20) is widely accepted by social scientists, with different opinions dating the exodus to 1000–1500 years ago, and the arrival in the Byzantine Empire to the 9th –11th century AD. Settling in the Balkans has taken place in the 13th – 14th century, and the early diaspora into the rest of Europe completed by the end of the 15th century.(20,21) The Balkans have traditionally hosted the largest proportion of European Gypsies, referred to as Balkan and Vlax Roma, depending on their history of migrations and related dialects of Romanes (classified according to the borrowings from other languages). The Balkan Roma, speakers of ‘‘Balkan’’ or stratum I dialects of Romanes, descend from the early Gypsy settlers in the lands south of the Danube River, within the limits of the Ottoman Empire.(20,21) The ancestors of the Vlax Roma continued the journey north of the Danube, into the Wallachian Principalities (present-day Romania), where Gypsy slavery was instituted in the early 14th century.(22) Slavery has not only played an important role in the social and biological history of the Vlax Roma, but is also related to important later migrations. During the Austrian–Turkish wars in the late 17th –early 18th century, temporary occupation of Wallachia allowed large numbers to escape into the ethnically tolerant Ottoman Empire:(21) these are speakers of ‘‘old Vlax’’ (stratum II) dialects of Romanes. By the end of the 19th –beginning of the 20th century, abolition of slavery led to mass migrations of Vlax Roma, speaking ‘‘new Vlax’’ (stratum III) dialects, to all parts of Europe. The latest migration wave, making the Roma a visible presence in the big cities of Western Europe, was triggered by the social and economic changes in Eastern Europe of the 1990s and the wars in former Yugoslavia. Adding to the complexity of historical migrations is the traditional organisation of Gypsy society, which has long fascinated cultural anthropologists and has been meticulously described in many studies. The primary unit of this social organisation is the Gypsy group, whose close resemblance to the professional jatis of India has been interpreted as further support for the Indian origins of the Gypsies.(20,21) Identity is defined by the group ethnonym (usually reflecting a traditional trade), customs and traditions, organs of self-rule, language, history of migrations and religion. Persisting in the midst of the nation states of Europe, Gypsy groups are separated from each other by strict rules of endogamy, yet recognise no political boundaries and may spread across many countries: members of the same group in different countries are eligible marriage partners, while intermarriage between neighbouring communities in the same small town is often proscribed. Group structure and diversity are particularly well preserved in the Balkans. In the early 20th century, a British ethnologist serving as the British consul in the Black Sea port city of Varna, described 19 different Gypsy ‘‘tribes’’ in the North-East of Bulgaria alone.(23) Drawing a convoluted picture of historical migrations and a mosaic social structure, linguists and cultural anthropologists also favour an ethnically and linguistically diverse founding population. Hypotheses on the origins of the proto-Gypsies range from the lowest strata of the Indian caste system, to a mixed society of warriors and camp followers fighting off the early Islamic incursions in the north of India.(20,21,24) Some scenarios assume a single founding event—the exodus from India, while others allow for a slow continuous trickle of small nomadic bands.(20,21) A geneticist’s summary of these data would describe the Gypsies as a conglomerate of Asian populations of largely unknown history and current size, that may fit the observed pattern of Mendelian disorders but does not satisfy the criteria for a large founder population of serious potential for research into complex disorders. The picture emerging from recent population genetic studies Population genetic studies of the Gypsies in the last few years(25 – 29) have been facilitated by new technologies, growing knowledge of variation in the human genome and rapidly accumulating data on global genetic diversity (please see box for a brief description of methodology). The design of these studies, based on the social tradition of the Gypsy people and made possible by a close interaction with cultural anthropologists, has aimed at understanding the common biological history of the Gypsy population of Europe, as well as the relationship between cultural and genetic identity of a diversity of Roma groups (Fig. 1). A young founder population of common Indian descent Unambiguous proof of the Indian ancestry of the Gypsies comes from three genetic marker systems: Y chromosome haplogroup H-M82, mtDNA haplogroup M(25) (Fig. 2), and the pathogenic 1267delG mutation in CHRNE(12) (causing autosomal recessive congenital myasthenia), found on the same ancestral chromosomal background in Gypsy, Indian and Pakistani subjects.(29) While confirming the centuries-old BioEssays 27.10 1085 Challenges Box: Brief description of methodology Genetic variation in the Gypsy population has been studied in nearly 2000 subjects, representing different parts of Europe and divergent Gypsy groups. The analyses included polymorphisms on the paternally inherited Y chromosome, the maternally transmitted mitochondrial (mt) DNA and biparentally inherited autosomal markers (neutral variants and diseasecausing mutations). Slowly evolving or unique event polymorphisms, which define Y chromosome and mtDNA haplogroups and reflect ancient human history, were used in comparison to global diversity data(51– 75) to trace the continental origins of Gypsy lineages. Markers of higher mutability, such as Y chromosomal and autosomal microsatellites (with haplotypes defined as the combination of marker alleles carried on the same chromosome), the Y chromosomal minisatellite MSY1,(76) and sequence variation in the mitochondrial control region were used to assess current diversity and infer recent population history. The population genetics analysis tool Arlequin(77) was used to examine genetic diversity (using parameters such as the observed number of haplotypes, number of polymorphic sites, and gene diversity), test demographic models based on mtDNA sequence variation, and assess genetic distances between populations using the number of pairwise differences as the molecular distance. The dating of historical events was based on coalescence time analysis of Y chromosome and autosomal haplotypes.(27,29,78– 80) Historical demography parameters were inferred from genetic data using the coalescence-based approach incorporated in the Batwing software.(79) Population structure was examined as described.(81) linguistic theory of the Indian origins is no great triumph for modern genetic research, the major, unexpected and mostsignificant result of these studies is the strong evidence of the common descent of all Gypsies regardless of declared group identity, country of residence and rules of endogamy. A staggering 47.3% of men carry H-M82 Y chromosomes, mtDNA haplogroup M accounts for almost 30% of Gypsy subjects, and congenital myasthenia is one of the most common Mendelian disorders of this population with !4% average carrier rates of the 1267delG mutation. The Y chromosome H-M82 and the mtDNA M haplogroups are characterised by limited internal haplotype and sequence diversity, which can easily be explained with the accumulation of mutations within the Gypsy population subsequent to its founding, rather than variation among the founders.(25) Within the H-M82 haplogroup, an identical 8-microsatellite Y chromo- 1086 BioEssays 27.10 some haplotype is shared by nearly 30% of Gypsy men, an astonishing degree of preservation of a highly differentiated lineage, previously described only in Jewish priests.(30) Analysis of the highly mutable minisatellite MSY1 in these Y chromosomes provides further evidence of limited variation, with diversity values lower than in Finns, Basques and Cook Islanders.(26) Similarly, nearly all mtDNA haplogroup M sequences fall into a single sub-haplogroup, M5.(25) These closely related lineages, which form minute tight clusters within the overall diversity of Asia, signal genetic homogeneity among the founders—possibly a small group of related individuals separating from a single, ethnically defined population. Based on the accumulated diversity, the most-recent common ancestor of Gypsy men carrying H-M82 Y chromosomes is estimated to have lived around 39 generation ago. Estimates in the same range, 32–36 generations ago,(29) are obtained for the mean coalescence times of the microsatellite haplotypes surrounding the two most-common and widespread founder mutations, 1267delG (congenital myasthenia) and R148X (Hereditary Motor and Sensory Neuropathy Lom(31)), used as an indicator of the time when these two mutations were introduced into the overall Gypsy population. Assuming a generation time of 30 years, these dates translate to 960–1170 years ago, supporting the exodus from India as the single recent founding event. The Gypsy group was born in Europe During the long journey from India to Europe, the Gypsies added new words to their language and newcomers to their society (Fig. 2). There is no documented history of their early social organisation; however, old European chronicles already mention small groups of 100–300 people with horses and dogs, headed by ‘‘barons, kings or princes’’ and provide some indirect evidence of tribal endogamy and hostilities between groups.(19 –21) The splits leading to group formation could have occurred at any time since the founding, while the description of group organisation as a ‘‘fluid mosaic’’(19) implies multiple divisions and mergers and changing rules of endogamy. Genetic dating of the population fissions is again based on microsatellite variation in Y chromosomes and in the autosomal regions surrounding four private disease-causing mutations.(26,27,29) All marker systems suggest that the earliest splits occurred 20–24 generations ago, i.e. from the late 13th century onwards, when Gypsy settlement in Europe has been documented beyond doubt.(20,21) The genetic data point to a gradual process of consecutive separations, which has continued until as recently as 6–8 generations ago. The size of the founding population of individual groups is estimated at several hundreds, with less than 100 for the effective male population size of some groups.(27) These figures may be inflated, as they are based on the overall current diversity, including lineages that are likely to have been Challenges Figure 1. Roma groups from Bulgaria. The different groups have been selected to represent various criteria of group identity: group ethnonym, history of migrations respectively dialects of Romanes; tradition of nomadism; religion. Ethnonym abbreviations: Kalaidjii—two different groups residing in the north (kan) and south (kas) of the country, Lom (lom), Kalderash (kal), Darakchii (dar), Rudari (rud), Burgudjii (bur), Feredjelii (fer), Turgovtzi (tur), Muzikanti (mus), Koshnitchari or Basketmakers (bas)—two different groups using different basketmaking technologies; Kovachi (bla). Pairs of groups living in close proximity in the same small town include kan/lom; fer/tur, kas/bas. The Rudari (rud) and Kalderas (kal) represent groups which are geographically spread out, with members sampled from different geographic locations. The Kalaidjii South (kas) belong linguistically to the Vlax category, however their genetic features are much closer to Balkan groups.(29) In addition, genetic studies include Gypsy subjects from Romania, Serbia, Hungary, Lithuania and Spain.(25,29) introduced by admixture subsequent to the founding. Such small sizes are compatible with the early historical descriptions of Gypsy groups, as well as with the data provided by the Ottoman tax registries, for example listing a total of !5,700 Gypsies in the territory of present-day Bulgaria in the early 16th century.(21) The demographic growth of Gypsy groups has been slow, with growth rate estimates based on Y chromosome variation in the range of 1.0017–1.0084, substantially lower than the 1.016–1.027 proposed for other European populations.(32) Given the common origins and lack of pre-existing differentiation of the early Gypsies, the formation of small strictly demarcated jati-like groups is surprising in itself. Even more surprising is the fact that the restitution of the ancient Indian tradition has taken place several centuries after the exodus, in the midst of European culture and social organisation. Social anthropologists attribute a major role in this process to the economy of ‘‘symbiotic’’ nomadism.(21) The Gypsies have always made a living by providing services to the surrounding population. Such dependence on the macrosociety would have favoured disintegration as a result of the economic pressure resulting from limited needs for specific services. The low capacity of small regions and communities to sustain blacksmiths, basket makers, and musicians etc. could lead to a continual budding off of branches initially maintaining contact with the old clan and eventually evolving into new, geographically dispersed endogamous groups. The hostility of the surrounding society has also exerted a profound influence on this process. (1) Slavery in Romania involved the separation of groups of people based on their owner and trade,(22) (2) murderous persecution of the Gypsies in Western Europe in the early centuries after their arrival has made the small mobile group a means of survival,(19) and (3) even the ethnically diverse and tolerant Ottoman Empire has encouraged the BioEssays 27.10 1087 Challenges Figure 2. The origins of Gypsy Y chromosome and mtDNA haplogroups. Linguistic analysis has charted the Gypsy migration from India to Europe through the Hindu Kush, along Persia and the southern shoreline of the Caspian Sea, through the southern Caucasus (Armenia) and westwards to Anatolia and Byzantium. Genetic evidence of the geographical origins of paternal and maternal lineages in the Gypsies is based on the study of unique event polymorphisms of the Y chromosome (left panel) and mtDNA (right panel), compared to published data on other populations.(51–62) Y chromosome haplogroup H-M82 and mtDNA haplogroup M can be unambiguously assigned to Asia and the Indian subcontinent. The wide geographical distribution of the remaining lineages precludes the tracing of their origins in the Gypsy population. The internal diversity of the latter lineages points to multiple admixture events during the journey or after the arrival of the Gypsies in Europe. separations by banning communication between Muslim and Christian Gypsies and imposing higher taxes on Christians.(21) The evolution of Gypsy groups into genetic subisolates Intermarriage between groups is generally proscribed or subject to strict rules, which can be complicated, hierarchical and asymmetrical for the two sexes. Failure to comply leads to the expulsion of the couple or the assimilation of the member of the higher-ranking into the lower-ranking group.(21) Interestingly, the ‘‘blood purity’’ laws are applied much more rigorously between Roma groups than to the surrounding population, and gadje outsiders, especially women, are often readily assimilated. In some cases, mixed marriages can result in the formation of a new Gypsy group, for example the Zhutane Roma in Bulgaria (literally Jewish Roma) born in the common deportation camps during World War II.(21) The strongest genetic indication of severely limited intergroup migrations is provided by a study of the chromosomal haplotypes surrounding the R148X founder mutation in three Vlax Roma groups, where R148X occurs at very high frequencies, with carrier rates ranging between 10 and 16%.(29) While the common origin of the mutation is clearly documented by a shared small conserved region around the mutated site, extended haplotypes spanning larger genetic distances have evolved independently, generating unique, 1088 BioEssays 27.10 group-specific haplotype profiles.(29) The striking lack of haplotype sharing is evidence of strict inter-group endogamy throughout the history of Roma groups. At the same time, genetic data support the declared tolerance towards outsiders, which appears to have always been part of Gypsy culture. Non-Indian Y chromosome and mtDNA lineages contribute to the genetic profile of the Gypsies (Fig. 2), with high internal diversity suggesting multiple admixture events.(25) A higher proportion of admixed lineages and a stronger signal of demographic growth are observed in the Balkan than in the Vlax Roma (Fig. 3). The majority of Balkan groups included in the genetic studies have settled centuries ago, in contrast to a long history of nomadism persisting among the Vlax Roma until their forced sedentarisation in the 1950s. Nomadic groups are known to be more conservative and to adhere strictly to social and cultural traditions.(20,21) The social reasons behind the different demographic regimes and proportion of admixed lineages may be related to internal differences in the tolerance to intermarriage and/or to the historically closer relationships with the surrounding populations. The harsh conditions of nomadic life may also have played a role, affecting life expectancy, population growth and genetic diversity. Small population size, strong drift effects, limited intergroup gene flow and differential admixture have led to rapid genetic differentiation between Gypsy groups. The extent of Challenges Figure 3. Paternal and maternal lineages in Roma groups from Bulgaria. Proportions of Indian Y chromosome H-M82, mtDNA haplogroup M5 and of other, admixed lineages in the Vlax Roma with a recent history of nomadism (left panel) and in the sedentary Balkan groups from Bulgaria. Ethnonym abbreviations are as shown in Figure 1. Preservation of the ancestral Y chromosome lineage is particularly strong in the Vlax compared to the Balkan Roma (two-tailed permutation test P ¼ 0.017). mtDNA lineages show greater overall diversity in the entire Gypsy population,(25) with the difference in the proportions of ancestral versus admixed lineages between Vlax and Balkan Roma not reaching statistical significance (P ¼ 0.071). A rank-test comparison of the mean growth rate values, estimated using Batwing, shows a marginally higher demographic growth in Balkan than in Vlax groups (two-tailed test P ¼ 0.043). divergence becomes evident in comparison to global populations (Fig. 4). The most striking feature of the multidimensional scaling (MDS) plot, based on mtDNA sequences, is the dispersal and large genetic distances separating Gypsy groups. A similar and even more pronounced pattern is presented by the tribal groups of India, in sharp contrast to other (especially European) populations, which cluster closely together. Significant internal substructure in the Gypsy population is also supported by other genetic systems, namely Y chromosomes, neutral autosomal polymorphisms, and disease mutations and their linked haplotypes (refs(25,29) and Table 1). On a condensed time scale, the history of the Gypsies (and, to a large extent, the social phenomena behind it) reproduces the more widely known history of the Jews, with the exodus, diaspora and fragmentation into small geographically dispersed communities. Possibly smaller historical population size, stronger drift effects and persisting group separation, maintained by social tradition and marriage customs, account for the marked internal differentiation of the Roma, making the group the basic unit of genetic research, as it is of social organisation. Genetic correlates of group identity Analysis of the correspondence between genetic affinities and the cultural anthropology classification of Roma groups shows that most criteria defining group identity do not reflect genetic similarities. Neither ethnonyms nor religion shared between Roma groups predict common genetic profiles and, contrary to other Europeans,(34) genetic and geographical distances show no correlation.(25) Higher order classification into metagroups, divided by particularly rigid rules of endogamy, separates groups which are genetically very close.(26) The only cultural anthropological criterion reflecting genetic relatedness is the history of migrations, hence dialects of Romanes.(25,29) The Balkan and Vlax Roma, and the Gypsies of Western Europe have been separated since the early migrations of the 14th century. Within each migrational category, the earliest internal splits have taken place simultaneously with the migrational separations, but the gradual and prolonged process of group formation has maintained intermarriage within each category for generations, with differential admixture from surrounding European populations contributing to the distinctive genetic profiles of migrational categories.(25,29) The impact on genetic disease A list of the private Gypsy mutations, known to be associated with single-gene disorders, is presented in Table 2. The list is not exhaustive and reflects the interests of individual researchers rather than comprehensive studies of the genetic disease heritage of the population. The major lesson learned from Mendelian genetics so far is that the identification of a BioEssays 27.10 1089 Challenges Figure 4. Genetic distances between Gypsy groups and other Eurasian populations. Fst between Gypsy groups and other Eurasian populations(63–75) were computed for the mtDNA hypervariable segment 1, using the number of differences between sequences as the molecular distance. Multidimensional scaling analysis was performed to represent the distance matrix in 2 dimensions. Stress ¼ 14.9%. Colour codes: Gypsy black, European red, Middle Eastern green, Central Asian light blue, Pakistani blue, and Indian orange. Abbreviations: Gypsy populations include the Roma groups from Bulgaria (shown in Fig. 1), as well as Gypsies from Spain (spR); Lithuania (liR), Serbia (seR), Romania (roR) and Hungary (huR). Indian tribal populations:(67) Koragas (kor), Bettakurumba (bet), Mullukurunan (muln), Mullukurumba (mulm), Paniya (pan),Apatani (apa), Adi (adi), Nishi (nis), Soligas (sol), Andh (anh), Naga (nog), Kuruchian (kru), Jenukurumba (jen), Thoti (tho), Pardhi (pad), Yerava (yer), Kattunaiken (kat). small number of affected Gypsy families sharing the same private mutation usually signals a widespread problem, involving large numbers of patients in many countries. Currently available frequency data(29,37) show that an average of 1 in 8 subjects in the general Gypsy population is a carrier of one of the five mutations tested. Within individual Gypsy groups, carrier rates for specific mutations often exceed 5% and can be as high as 16%. Mendelian disorders are thus a considerable health burden, making community-based carrier testing programs a highly beneficial public health initiative. Such programs would be facilitated by the allelic homogeneity, which allows simple testing procedures and high detection rates. 1090 BioEssays 27.10 The Gypsies have already made a contribution to Mendelian genetics through four novel private conditions: Hereditary Motor and Sensory Neuropathy types Lom and Russe—two severe forms of autosomal recessive Charcot-Marie-Tooth disease;(9,31,38) Congenital Cataracts Facial Dysmorphism Neuropathy syndrome, a developmental disorder affecting the basal transcription machinery;(35) and a new form of immune deficiency, characterised by absence of CD8þ cells.(45) Furthermore, private mutations in the Gypsies, causing known disorders of wide ethnic distribution, facilitate genotype– phenotype correlations in large genetically homogeneous groups of affected individuals carrying the same molecular defect.(11,12,50) Similar to other founder populations, gene Challenges Table 1. Genetic differentiation of Roma groups in Bulgaria Genetic cluster BLA N ¼ 54 DAR N ¼ 46 KAL N ¼ 33 KAN N ¼ 69 KAS N ¼ 39 LOM N ¼ 71 MUS N ¼ 51 RUD N ¼ 51 37 7 1 2 2 0 3 2 2 27 2 5 6 0 4 0 1 1 21 4 2 1 1 3 10 12 8 21 6 6 4 2 6 1 6 6 8 4 7 1 7 3 7 8 2 32 8 4 6 6 5 1 4 0 29 0 1 0 1 1 1 2 0 45 C.1 C.2 C.3 C.4 C.5 C.6 C.7 C.8 Analysis of 28 unlinked autosomal microsatellites selected from the World Diversity Panel(49) among the top 100 markers providing the best differentiation between European and Asian populations (the set overlaps partly with the markers used in a similar analysis of Jewish populations(33)). Stratification was assessed using the software tool Structure (vs 2.0)(81) under a non-admixture model, assuming 8 underlying sub-populations. The observed genetic clusters follow the boundaries of self-declared Roma group identity, with most clusters formed mainly (in some cases exclusively) by individuals from a specific group. Significant genetic differentiation is also revealed by analysis of the pairwise differences between mtDNA sequences and Y chromosome haplotypes in different Roma groups(25), where Fst ¼ 0.05, P ¼ 0 is obtained for mtDNA sequences and Fst ¼ 0.14 P ¼ 0 for Y chromosome haplotypes. mapping is based on the assumption of a single founder mutation inherited from a common ancestor on a chromosomal segment identical by descent (IBD). The unique advantage of the Gypsies as a founder population is related to the internal structure and the genetic differentiation between Gypsy groups. Independent haplotype diversification makes the informed sampling of affected individuals from different Gypsy groups a particularly efficient approach, where the critical gene interval is defined as the minimum region shared IBD between groups.(29) Incorporating population history in gene-cloning strategies has prompted the novel ‘‘Not-Quite Identical By Descent’’ (NQIBD) approach,(35) which could be applicable to other young isolates as well. In this approach, the identification of the mutated gene was greatly facilitated by tracing the origin of the mutation to a recent event occurring on a common haplotype background, which allowed the detection Table 2. . Private mutations associated with Mendelian disorders in the Gypsies Disorder With demonstrated founder effect Reported in a small number of families, no extensive mutation frequency data yet OMIM # Inheritance Locus Gene Mutation Ref. HMSN-Lom1 601455 AR 8q24 NDRG1 R148X Congenital cataracts facial dysmorphism neuropathy1 Limb-girdle muscular dystrophy 2C1 Congenital myasthenia1 Galactokinase deficiency1 HMSN-Russe Gitelman syndrome Neuronal ceroid lipofuscinosis 604168 AR 18qter CTDP1 IVS6 þ 389C > T 35 253700 AR 13q12 SGCG C283Y 10 254210 230200 605285 263800 606725 AR AR AR AR AR 17p13 17q24 10q23 16q13 15q21 CHRNE GK1 1267delG P28T SLC12A3 CLN6 Congenital glaucoma 231300 AR 2p21 CYP1B1 IVS9 þ 1G > T Unidentified, shared haplotype E387K Glanzmann thrombasthenia Ataxia-teleangiectasia Von Willebrand disease CD8þ T-cell deficiency Congenital nemaline myopathy Polycystic kidney disease Hyperparathyroidism-jaw tumour syndrome 273800 208900 277480 186910 102610 173900 145000 AR AR AR AR AR AD AD 17q21 11q22 12p13 2p12 1q42 4q21 1q31.2 ITGA2B ATM VWF CD8A ACTA1 PKD2 HRPT2 9,31 13 36,37 38 39 40 IVS15-1G > A 9010_9037del28 Q1311X G90S R39X R306X Exon 8 del2 (TG or GT) 41 42 43 44 45 46 47 48 1 Used in the genetic analyses of Gypsy population history (ref. 29 and 37). AR autosomal recessive, AD autosomal dominant. BioEssays 27.10 1091 Challenges Table 3. Probability of allele sharing between Roma groups Allele frequency in a given group <0.02 0.02–0.065 0.065–0.13 0.13–0.25 Probability of allele occurring in n other groups 0 0.629 0.087 0.000 0.000 1 0.185 0.105 0.000 0.000 2 0.074 0.140 0.019 0.000 3 0.074 0.210 0.000 0.000 4 0.037 0.140 0.019 0.017 5 0.000 0.122 0.173 0.017 6 0.000 0.087 0.365 0.070 7 0.000 0.105 0.423 0.894 The estimates are based on the allele frequency distributions of 28 unlinked autosomal markers (marker set as used in the genetic differentiation analysis, Table 1). For each allele, we recorded two characteristics: the number of Gypsy groups where the allele was observed, and the maximum frequency of the allele (pmax). We then grouped all alleles into four categories based on pmax. For each category, we computed the proportion of alleles, which are observed in 0,1,2, etc. other Gypsy groups. The thresholds for the four categories were chosen to have approximately equal numbers of alleles in each bin. The distribution of these neutral, low-frequency microsatellite alleles resembled that observed for rare recessive mutations, causing Mendelian disorders in the Roma (Table 1 in ref. 29). of the mutation by comparing the sequence of a small part of the critical gene region between the two chromosomes of a single carrier parent. The potential of the Gypsies to help research into common, genetically complex disorders is still in need of solid proof. The specific advantage in this case is again related to the combination of a primary founder effect !40 generations ago for the entire Gypsy population, with secondary founder effects 6–24 generations ago giving rise to divergent subisolates. The strong primary founder effect raises expectations of a relatively homogeneous genetic basis of complex disorders across subisolates and, similar to the general approach used in Mendelian disorders, their unique, group-specific, haplotype profiles could provide a powerful tool for positional cloning,. Based on the current data, the best approach to the identification of susceptibility genes would involve initial crude mapping in extended multiplex families from a single Gypsy group, with replication, fine mapping and association studies based on the informed cross-sampling of genetically related groups. At the same time, presumed high frequencies of diverse susceptibility alleles in all populations warrant caution and, in combination with the contribution of differential admixture to the genetic divergence of Gypsy groups, question the concept of the Gypsies as a single isolated founder population suitable for the study of genetically complex disorders. The internal stratification of the Gypsy population warrants caution in association studies, with an informed selection of cases and controls based on knowledge of the cultural anthropology and genetic affinities of the individual Gypsy groups involved. Table 3 shows the estimated probabilities of sharing the same susceptibility allele between Gypsy groups, given a frequency of that allele in an individual group. These estimates, based on neutral autosomal polymorphisms, are in agreement with the observed distributions of private mutations causing Mendelian disorders, where the highest frequency attained by a given mutation in a particular Gypsy group correlated with the overall 1092 BioEssays 27.10 number of groups where the mutation was detected.(29) The real situation is likely to differ between complex disorders and a true assessment can only be based on empirical studies, some (e.g. an investigation of bipolar affective disorder) currently in progress. Conclusion Many challenges have already been met by recent research into the genetics of the Gypsies. Breaking the divide between cultural and genetic anthropology has been crucial to unravelling biological history and inferring the parameters of historical demography. Involvement in these studies has been beneficial for all sides—researchers have come to appreciate this unusual population and its multiple social and medical problems, and affected families and communities have seen improvements in health care. Yet the big challenge still lies ahead. The story that we have told is the genetic image of xenophobia, the fate of an Asian people fractured and dispersed among Europeans and responding to hostility with a labyrinth of walls of endogamy. It may well be applicable to the numerous Asian, Middle Eastern and African immigrant communities now forming in the midst of prosperous united Europe. Without true integration and access to education and health care, genetic studies of the Gypsies will stall and what has already been achieved will come to be regarded as a curious and brief episode outside of mainstream genetics. Acknowledgments We wish to thank the numerous Gypsy families and communities for participation in these studies, Dr. Ivailo Tournev and other colleagues from Bulgaria and other countries for sample collection and for providing data on other populations, and Elena Marushiakova and Vesselin Popov for expert advice on the cultural anthropology of the Roma and guidance throughout these studies. Challenges REFERENCES 1. Peltonen L, Jalanko A, Varilo T. 1999. Molecular genetics of the Finnish disease heritage. Hum Mol Genet 8:1913–1923. 2. Ostrer H. 2001. A genetic profile of contemporary Jewish populations. Nat Rev Genet 2:891–898. 3. Arcos-Burgos M, Muenke M. 2002. Genetics of population isolates. Clin Genet 61:233–247. 4. Sheffield VC, Stone EM, Carmi R. 1998. Use of isolated inbred human populations for identification of disease genes. Trends Genet 14:391–396. 5. Gulcher J, Stefansson K. 1998. Population genomics: laying the groundwork for genetic disease modeling and targeting. Clin Chem Lab Med 36:523–527. 6. Agarwala R, Biesecker LG, Tomlin JF, Schaffer AA. 1999. Towards a complete North American Anabaptist genealogy: A systematic approach to merging partially overlapping genealogy resources. Am J Med Genet 86:156–161. 7. Scriver CR. 2001. Human genetics: lessons from Quebec populations. Annu Rev Genomics Hum Genet 2:69–101. 8. Norio R. 2003. Finnish Disease Heritage II: population prehistory and genetic roots of Finns. Hum Genet 112:457–469. 9. Kalaydjieva L, Hallmayer J, Chandler D, Savov A, Nikolova A, et al. 1996. Gene mapping in Gypsies identifies a novel demyelinating neuropathy on chromosome 8q24. Nat Genet 14:214–217. 10. Piccolo F, Jeanpierre M, Leturcq F, Dode C, Azibi K, et al. 1996. A founder mutation in the gamma-sarcoglycan gene of gypsies possibly predating their migration out of India. Hum Mol Genet 5:2019–2022. 11. Chandler D, Angelicheva D, Heather L, Gooding R, Gresham D, et al. 2000. Hereditary motor and sensory neuropathy—Lom (HMSNL): refined genetic mapping in Romani (Gypsy) families from several European countries. Neuromusc Disord 10:584–591. 12. Abicht A, Stucka R, Karcagi V, Herczegfalvi A, Horvath R, et al. 1999. A common mutation (epsilon1267delG) in congenital myasthenic patients of Gypsy ethnic origin. Neurology 53:1564–1569. 13. Kalaydjieva L, Gresham D, Calafell F. 2001. Genetic studies of the Roma (Gypsies): a review. BMC Med Genet 2:5. 14. Kruglyak L. 1999. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 22:139–144. 15. Wright AF, Carothers AD, Pirastu M. 1999. Population choice in mapping genes for complex diseases. Nat Genet 23:397–404. 16. Peltonen L, Palotie A, Lange K. 2000. Use of population isolates for mapping complex traits. Nat Rev Genet 1:182–190. 17. Shifman S, Darvasi A. 2001. The value of isolated populations. Nat Genet 28:309–310. 18. Heutink P, Oostra BA. 2002. Gene finding in genetically isolated populations. Hum Mol Genet 11:2507–2515. 19. Liegeois J-P. 1994. Roma, Gypsies, Travellers. Strasbourg: Council of Europe Press. 20. Fraser A. 1992. The Gypsies. Oxford: Blackwell Publishers. 21. Marushiakova E, Popov V. 1997. Gypsies (Roma) in Bulgaria. Frankfurt am Main: Peter Lang. 22. Hancock I. 1987. The Pariah Syndrome. Ann Arbor: Karoma Publishers Inc. 23. ‘‘Petulengro’’. 1915–1916. Report on the Gypsy tribes of north-east Bulgaria. J Gypsy Lore Soc 9:1–109. 24. Hancock I. 2000. The emergence of Romani as a koine outside of India. In: Acton T, editor. Scholarship and Gypsy struggle: commitment in Romani studies. Hatfield: University of Hertfordshire Press. p 1–13. 25. Gresham D, Morar B, Underhill PA, Passarino G, Lin AA, et al. 2001. Origins and divergence of the Roma (gypsies). Am J Hum Genet 69:1314–1331. 26. Kalaydjieva L, Calafell F, Jobling MA, Angelicheva D, de Knijff P, et al. 2001. Patterns of inter- and intra-group genetic diversity in the Vlax Roma as revealed by Y chromosome and mitochondrial DNA lineages. Eur J Hum Genet 9:97–104. 27. Chaix R, Austerlitz F, Morar B, Kalaydjieva L, Heyer E. 2004. Vlax Roma history: what do coalescent-based methods tell us? Eur J Hum Genet 12:285–292. 28. Zhivotovsky L, Underhill PA, Cinnioglu C, Kayser M, Morar B, et al. 2004. On the effective mutation rate at Y-chromosome STRs with 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. application to human population divergence time. Am J Hum Genet 74:50–61. Morar B, Gresham D, Angelicheva D, Tournev I, Gooding R, et al. 2004. Mutation history of the Roma/Gypsies. Am J Hum Genet 75:596–609. Thomas MG, Skorecki K, Ben-Ami H, Parfitt T, Bradman N, et al. 1998. Origins of Old Testament priests. Nature 394:138–140. Kalaydjieva L, Gresham D, Gooding R, Heather L, Baas F, et al. 2000. N-myc downstream-regulated gene 1 is mutated in hereditary motor and sensory neuropathy-Lom. Am J Hum Genet 7:47–58. Slatkin M, Bertorelle G. 2001. The use of intraalelic variability for testing neutrality and estimating population growth rate. Genetics 158:865–874. Rosenberg NA, Woolf E, Pritchard JK, Schapp T, Gefel D, et al. 2001. Distinctive genetic signatures in the Libyan Jews. Proc Natl Acad Sci USA 98:858–863. Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D, et al. 2000. Y-chromosome diversity in Europe is clinal and influenced primarily by geography, rather than by language. Am J Hum Genet 67:1526–1543. Varon R, Gooding R, Steglich C, Marns L, Tang H, et al. 2003. Partial deficiency of the C-terminal-domain phosphatase of RNA polymerase II is associated with congenital cataracts facial dysmorphism neuropathy syndrome. Nat Genet 35:185–189. Kalaydjieva L, Perez-Lezaun A, Angelicheva D, Onengut S, Dye D, et al. 1999. A founder mutation in the GK1 gene is responsible for galactokinase deficiency in Roma (Gypsies). Am J Hum Genet 65:1299–1307. Hunter M, Heyer E, Austerlitz F, Angelicheva D, Nedkova V, et al. 2002. The P28T mutation in the GALK1 gene accounts for galactokinase deficiency in Roma (Gypsy) patients across Europe. Pediatr Res 51: 602–606. Rogers T, Chandler D, Angelicheva D, Thomas PK, Youl B, et al. 2000. A novel locus for autosomal recessive peripheral neuropathy in the EGR2 region on 10q23. Am J Hum Genet 67:664–671. Coto E, Rodriguez J, Jeck N, Alvarez V, Stone R, et al. 2004. A new mutation (intron 9 þ 1G > T) in the SLC12A3 gene is linked to Gitelman syndrome in Gypsies. Kidney Int 65:25–29. Sharp JD, Wheeler RB, Parker KA, Gardiner RM, Williams RE, et al. 2003. Spectrum of CLN6 mutations in Variant late infantile neuronal ceroid lipofuscinosis. Hum Mut 22:35–42. Plasilova M, Stoilov I, Srafarazi M, Kadasi L, Ferakava E, et al. 1999. Identification of a single ancestral CYP1B1 mutation in Slovak Gypsies (Roms) affected with primary congenital glaucoma. J Med Genet 36: 290–294. de la Salle C, Schwartz A, Baas M-J, Lanza F, Cazenave J-P. 1995. Detection by PCR and HphI restriction analysis of a splice site mutation at the 50 end of intron 15 of the platele GPIIb (IIb integrin) gene responsible for Glanzmann’s thrombasthenia type I in Gypsies originating from the Strasbourg area. Thromb Haemost 74:990–991. Mitui M, Campbell C, Coutinho G, Sun X, Lai C-H, et al. 2003. Independent mutational events are rare in the ATM gene: haplotype pre-screening enhances mutation detection rate. Hum Mut 22:43–50. Casanha P, Martinez F, Haya S, Lorenzo JI, Espinos C, et al. 2000. Q1311X: a novel missense mutation of putative ancient origin in the von Willebrand factor gene. Br J Haematol 111:552–555. Calle-Martin O, Hernandez M, Ordi J, Casamitjana N, Arostegui JI, et al. 2001. Familial CD8 deficiency due to a mutation in the CD8a gene. J Clin Invest 108:117–123. Romero NB, Barois A, Leroy J-P, Durling H, Fardeaux M, et al. 2003. Homozygous nonsense mutation of the ACTA1 gene—clinical phenotype and muscle pathology. Neuromuscul Disord 13:619. Veldhuisen B, Saris JJ, de Haij S, Hayashi T, Reynolds DM, et al. 1997. A spectrum of mutations in the second gene for autosomal dominant polycystic kidney disease (PKD2). Am J Hum Genet 61:547–555. Cavaco BM, Guerra L, Bradley KJ, Carvalho D, Harding B, et al. 2004. Hyperparathyroidism-jaw tumor syndrome in Roma families from Portugal is due to a founder mutation of the HRPT2 gene. J Clin EndocrinMetab 89:1747–1752. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, et al. 2002. Genetic structure of human populations. Science 298:2381–2385. Merlini L, Kaplan J-C, Navarro C, Barois A, Bonneau D, et al. 2000. Homogeneous phenotype of the gypsy limb-girdle MD with the gammasarcoglycan C283Y mutation. Neurology 54:1075–1079. BioEssays 27.10 1093 Challenges 51. Y chromosome Consortium. 2002. A nomenclature system for the tree of human Y chromosomal binary haplogroups. Genome Res 12:339– 348. 52. Wallace DC, Lott MT. 2004. ‘‘MITOMAP: A Human Mitochondrial Genome Database’’ http://www.mitomap.org/. 53. Basu A, Mukherjee N, Roy S, Sengupta S, Banerjee S, et al. 2003. Ethnic India: a genomic view, with special reference to peopling and structure. Genome Res 13:2277–2290. 54. Cinnioglu C, King R, Kivisild T, Kalfoglu E, Atasoy S, et al. 2004. Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet 114: 127–148. 55. Cordaux R, Aunger R, Bentley G, Nasidze I, Sirajuddin SM, et al. 2004. Independent origins of Indian caste and tribal paternal lineages. Curr Biol 14:231–235. 56. Kivisild T, Bamshad MJ, Kaldma K, Metspalu M, Metspalu E, et al. 1999. Deep common ancestry of Indian and western-Eurasian mitochondrial DNA lineages. Curr Biol 9:1331–1334. 57. Kivisild T, Rootsi S, Metspalu M, Mastana S, Kaldma K, et al. 2003. The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. Am J Hum Genet 72:313–332. 58. Nasidze I, Ling EY, Quinque D, Dupanloup I, Cordaux R, et al. 2004. Mitochondrial DNA and Y-chromosome variation in the Caucasus. Ann Hum Genet 68:205–221. 59. Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, et al. 2000. The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science 290:1155–1159. 60. Torroni A, Bandelt HJ, D’Urbano L, Lahermo P, Moral P, et al. 1998. mtDNA analysis reveals a major late Paleolithic population expansion from southwestern to northeastern Europe. Am J Hum Genet 62:1137– 1152. 61. Torroni A, Richards M, Macaulay V, Forster P, Villems R, et al. 2000. mtDNA haplogroups and frequency patterns in Europe. Am J Hum Genet 66:1173–1177. 62. Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, et al. 2001. The Eurasian heartland: a continental perspective on Ychromosome diversity. Proc Natl Acad Sci USA 98:10244–10249. 63. Belledi M, Poloni ES, Casalotti R, Conterio F, Mikerezi I, et al. 2000. Maternal and paternal lineages in Albania and the genetic structure of Indo-European populations. Eur J Hum Genet 8:480–486. 64. Bertranpetit J, Sala J, Calafell F, Underhill PA, Moral P, et al. 1995. Human mitochondrial DNA variation and the origin of Basques. Ann Hum Genet 59:63–81. 65. Calafell F, Underhill P, Tolun A, Angelicheva D, Kalaydjieva L. 1996. From Asia to Europe: mitochondrial DNA sequence variability in Bulgarians and Turks. Ann Hum Genet 60:35–49. 66. Comas D, Calafell F, Bendukidze N, Fananas L, Bertranpetit J. 2000. Georgian and Kurd mtDNA sequence analysis shows a lack of 1094 BioEssays 27.10 67. 68. 69. 70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80. 81. correlation between languages and female genetic lineages. Am J Phys Anthropol 112:115–116. Cordaux R, Saha N, Bentley GR, Aunger R, Sirajuddin SM, et al. 2003. Mitochondrial DNA analysis reveals diverse histories of tribal populations from India. Eur J Hum Genet 11:253–264. Corte-Real HB, Macaulay VA, Richards MB, Hariti G, Issad MS, et al. 1996. Genetic diversity in the Iberian Peninsula determined from mitochondrial sequence analysis. Ann Hum Genet 60:331–350. Hofmann S, Jaksch M, Bezold R, Mertens S, Aholt S, et al. 1997. Population genetics and disease susceptibility: characterization of central European haplogroups by mtDNA gene mutations, correlation with D loop variants and association with disease. Hum Mol Genet 6:1835–1846. Horai S, Hayasaka K. 1990. Intraspecific nucleotide sequence differences in the major noncoding region of human mitochondrial DNA. Am J Hum Genet 46:828–842. Mountain JL, Hebert JM, Bhattacharyya S, Underhill PA, Ottolenghi C, et al. 1995. Demographic history of India and mtDNA-sequence diversity. Am J Hum Genet 56:979–992. Nasidze I, Stoneking M. 2001. Mitochondrial DNA variation and language replacements in the Caucasus. Proc R Soc Lond B Biol Sci 268:1197– 1206. Quintana-Murci L, Chaix R, Wells RS, Behar DM, Sayar H, et al. 2004. Where west meets east: the complex mtDNA landscape of the southwest and Central Asian corridor. Am J Hum Genet 74:827– 845. Richards M, Macaulay V, Hickey E, Vega E, Sykes B, et al. 2000. Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251–1276. Sajantila A, Lahermo P, Anttinen T, Lukka M, Sistonen P, et al. 1995. Genes and languages in Europe: an analysis of mitochondrial lineages. Genome Res 5:42–52. Jobling MA, Bouzekri N, Taylor PG. 1998. Hypervariable digital DNA codes for human paternal lineages: MVR-PCR at the Y-specific minisatellite, MSY1 (DYF155S1). Hum Mol Genet 7:643–653. Schneider S, Roessli D, Excoffier L. 2000. Arlequin ver. 2.0: A software for population genetic data analysis. Genetics and Biometry Laboratory, University of Geneva, Switzerland. Tang H, Siegmund DO, Shen P, Oefner PJ, Feldman MW. 2002. Frequentist estimation of coalescence times from nucleotide sequence data using a tree-based partition. Genetics 161:447–459. Wilson IJ, Balding DJ. 1998. Genealogical inference from microsatellite data. Genetics 150:499–510. Rannala B, Slatkin M. 1998. Likelihood analysis of disequilibrium mapping, and related problems. Am J Hum Genet 62:459–473. Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945–959.