Supplementary Material:
CYP2C8 evolution
It is not surprising that several historical recombination events must have occurred in the ancestry
of the current haplotypes. However, the haplotype frequency distributions argue that
none of the genomic regions has a particularly high rate of recombination and almost all
crossover products now seen represent “ancient” crossovers that occurred before modern
humans expanded out of Africa.
We can identify 17 haplotypes that occur commonly enough in multiple populations that
we can be confident of their existence. Undoubtedly many of the individually very rare
haplotypes, lumped into the residual class, are also true. However, the occasional
occurrence of missing data allows for the possibility of incorrect inference by the
statistical programs used, as witnessed by the fact that different programs (HAPLO, PHASE,
fastPHASE) commonly differ somewhat on the occurrences/frequencies of very rare
haplotypes but virtually never differ on those found at 5% or greater frequency in at least
one population. Clearly, almost all the evolutionary information is present in the more
common of the 17 haplotypes and we consider the amino acid variants in that context.
The five known uncommon amino acid variants that we have studied occur
frequently enough that we are confident of their haplotypes.
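The 5% criterion mentioned above can be written down as a simple filter. The following is a minimal sketch only: the function name, the data layout, and the frequencies are hypothetical and are not taken from the study.

```python
def common_haplotypes(freqs, threshold=0.05):
    """Return haplotypes that reach `threshold` in at least one population.

    freqs maps a population name to a dict of haplotype -> frequency.
    """
    common = set()
    for pop_freqs in freqs.values():
        for hap, freq in pop_freqs.items():
            if freq >= threshold:
                common.add(hap)
    return sorted(common)


# Hypothetical frequencies, for illustration only (not data from the study).
example = {
    "pop1": {"A": 0.60, "B": 0.03, "C": 0.37},
    "pop2": {"A": 0.90, "B": 0.08, "C": 0.02},
    "pop3": {"A": 0.97, "D": 0.03},
}

print(common_haplotypes(example))  # ['A', 'B', 'C']
```

In this toy input, haplotype D never reaches 5% in any population, so it would fall into the residual class.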
In order to better understand the evolutionary relationships of these haplotypes we have
identified those groups of adjacent SNPs within the haplotype among which we see no
definite evidence of recombination. For these groups accumulation of mutations from the
ancestral sequence is a sufficient (parsimonious) explanation of all existing haplotypes
within a group (or molecular subregion). This follows the process we have described for
ADH7 (Han et al., 2007) and CYP2E1 (Lee et al., 2008). Three such groups exist for the
10 SNPs we have studied (Figure S1). The first group comprises the two SNPs at
CYP2C9 and that pair is separated by the longest intermarker distance from the 8 SNPs at
CYP2C8. Those CYP2C8 SNPs divide into two groups: one consists of three SNPs
(numbers 3, 4, and 5 in Table 1) and the other consists of five SNPs (numbers 6-10 in
Table 1). The diagrams on the left of Figure S1 show how mutations accrued within each
group with the ancestral sequence numbered “1” within each group. The trees on the
right of Figure S1 show how the 17 haplotypes in Figure 2 of the paper are composed of
the different subhaplotypes of the three groups. Unfortunately, there is no unambiguous
way to temporally order all of the mutation and historical crossover events, though many
reasonable inferences can be made.
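One common way to look for "no definite evidence of recombination" among adjacent SNPs is the four-gamete test: if all four allele combinations occur at a pair of biallelic sites, mutation alone (without recurrent mutation) cannot explain the pair. The sketch below is an assumption about how such grouping could be done, not a description of the procedure actually used here; the function names and the toy haplotypes are hypothetical.

```python
def four_gametes(haplotypes, i, j):
    """True if all four allele combinations occur at biallelic sites i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def compatible_blocks(haplotypes):
    """Greedily split adjacent sites into runs in which no pair shows four gametes."""
    n_sites = len(haplotypes[0])
    blocks, current = [], [0]
    for site in range(1, n_sites):
        # Start a new block as soon as the next site conflicts with any
        # site already in the current block.
        if any(four_gametes(haplotypes, prev, site) for prev in current):
            blocks.append(current)
            current = [site]
        else:
            current.append(site)
    blocks.append(current)
    return blocks


# Toy haplotypes over four biallelic sites (0 = ancestral, 1 = derived).
# These are invented for illustration and are not the study's haplotypes.
toy = ["0000", "1000", "1100", "0010", "1010", "0011"]

print(compatible_blocks(toy))  # [[0, 1], [2, 3]]
```

Within each block returned, every observed subhaplotype can be explained by accumulating mutations on the ancestral sequence, which is the property used in the text to define a group.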
Most of the presumably recombinant haplotypes appear to be products of ancient crossovers that
became common, rather than being common because of frequent ongoing recombination. The
implication is that, since humans expanded out of Africa, each extant copy of each of the
17 haplotypes has evolved by descent independently of every other
distinct haplotype that already existed in Africa. Understanding this recent independent
evolution provides a potential guide to the search for additional biomedically relevant
variants: such a variant may not have been identified because the haplotype on which it
occurs may not have been sufficiently resequenced. What occurs on the background of
one haplotype is very unlikely to occur on any other. We note especially haplotypes D
(the ancestral haplotype) and M (a triply derived haplotype) that are common in African
populations and rare to virtually absent in non-African populations. Those haplotypes
have evolved independently from each other and from any common non-African
haplotype for at least 100,000 years. Because they represent some of the oldest
haplotypes, they have had the most time to accumulate additional functional mutations.
Dataset comparison
Figure S2 gives the CYP2C8 haplotype frequencies around the world for the four SNPs
in common in our study and the study of Rodriguez-Antona et al. (2008). As can be seen,
most of the data collapse into a single haplotype common around the world. The
haplotype B in their study is a subset of that common haplotype.
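The collapse described above amounts to projecting each full haplotype onto the shared SNPs and summing the frequencies of haplotypes that become identical. A minimal sketch follows, assuming haplotypes are stored as strings of 0/1 alleles; the site indices and frequencies are hypothetical, since the text does not state which four SNPs are shared.

```python
from collections import defaultdict


def collapse(freqs, keep):
    """Sum haplotype frequencies after keeping only the site indices in `keep`."""
    out = defaultdict(float)
    for hap, freq in freqs.items():
        reduced = "".join(hap[i] for i in keep)
        out[reduced] += freq
    return dict(out)


# Hypothetical five-site haplotypes and frequencies, for illustration only.
full = {"00000": 0.5, "10000": 0.3, "00010": 0.2}

# Keep sites 1, 2 and 3 (an arbitrary choice for the example).
reduced = collapse(full, keep=[1, 2, 3])
print(reduced)
```

Here the first two haplotypes differ only at a site that is dropped, so they merge into a single reduced haplotype, which is how distinct full haplotypes can end up as one common haplotype in a comparison based on fewer SNPs.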
Lee MY, Mukherjee N, Pakstis AJ, Khaliq S, Mohyuddin A, Mehdi SQ, et al. Global
patterns of variation in allele and haplotype frequencies and linkage disequilibrium across
the CYP2E1 gene. Pharmacogenomics J 2008; 8:349-356.
Han Y, Gu S, Oota H, Osier M, Pakstis AJ, Speed WC, et al. Evidence of positive
selection on a Class I ADH locus. Am J Hum Genet 2007; 80:441-456.
Supplementary Figure S1. The most parsimonious mutational histories are presented on
the left with #1 corresponding to the ancestral sequence of each group of SNPs. On the
right the combinations of specific haplotypes in each group, ordered left to right, are
equated to the full 10-SNP lettered haplotypes in Table 2 and Figure 2 of the main text.
Supplementary Figure S2.