Populations

Large populations
• Terns

Small populations
• Dryopteris fragrans, a rare cliff fern

Dynamic populations
• Homo sapiens

Complex populations

Markers
• Isozymes
• AFLPs

Illumina BeadStation genotyping for SNPs
• High-throughput genotyping
• Genotyping of a cross
• Low cost per genotype (5-20 cents), but the need to assay a large number of genotypes (either 384, 768, or 1586) makes the total cost large (thousands of dollars)

What do populations have to do with genetic markers?
• Populations influence levels of diversity
• Conversely, polymorphic genetic markers can be used to infer many population processes
• The emphasis in FRST432 is the latter

Quantifying genetic variation
• Gene frequency
• Genetic diversity
• Hardy-Weinberg

Estimation of gene frequency
• For co-dominant loci, simply count the numbers ("gene counting method")
• The gene counting method is also the "maximum likelihood estimate"

Estimation of allele frequency
MN blood group:
genotype   MM    MN    NN
number     392   707   320
Total: 1419 individuals (the number of gene copies sampled is twice this, 2838)
Frequency of M = $p_M = (2 \times 392 + 707) / (2 \times 1419) = 0.525$
Frequency of N is $1 - p_M$

Estimation of gene frequency
• Estimation based upon gene counting: $p_A = (2N_{AA} + N_{Aa}) / (2N_{AA} + 2N_{Aa} + 2N_{aa})$
• More theoretical relationship: $p_A = f_{AA} + 0.5 f_{Aa}$, where the f's are genotype frequencies
• $Var(p_A) = Var(p_a) = p_A (1 - p_A) / (2N)$, the binomial sampling variance
• Construct a confidence interval
• Compare among populations

Hardy-Weinberg
• Predict genotypic frequencies from gene frequencies
• $F(AA) = p^2$, $F(Aa) = 2pq$, $F(aa) = q^2$
• Expansion of $(p + q)^2$
• HW is the basis for almost all models
• Inbreeding is also detected as an excess of homozygotes

Historical context
• Is the population going to be driven to a particular frequency for an allele simply because it is inherited in a Mendelian fashion?
• Is the recessive phenotype driven to occur in 25% of the population?
• Hardy and Weinberg proved this was false

HW "Equilibrium"
• Equilibrium = nothing changes across generations
• Genotypes are transient, broken up each generation
• Reconstituted randomly into zygotes
• Reached in just one generation

Assumptions of HW
• No directive forces: no mutation, migration, or selection
• No dispersive forces: infinite population size, random mating

Predictions of HW
• Allele frequencies are unchanged over time
• After one generation, genotypic frequencies are unchanged over time
• Allele frequencies, not genotypic frequencies, are sufficient parameters for models

One prediction of the H-W rule
The fundamental measure of genetic variation: expected heterozygosity
• At one locus, the gene frequency for the i-th allele is $p_i$
• The expected Hardy-Weinberg frequency of the homozygous genotype is $p_i^2$
• Over all possible alleles i, i = 1, ..., n, the probability that the locus is homozygous for any allele is
$$J = \sum_{i=1}^{n} p_i^2$$
• Expected heterozygosity = 1 - expected homozygosity, often referred to as gene diversity:
$$H = 1 - J = 1 - \sum_{i=1}^{n} p_i^2$$

Heterozygosity at 20 variable allozymes out of 71 loci sampled in a population of European people

Gene locus   Enzyme encoded                              Heterozygosity H
Aph          Alkaline phosphatase (placental)            0.53
Acph         Acid phosphatase                            0.52
Gpt          Glutamate-pyruvate transaminase             0.50
Adh-3        Alcohol dehydrogenase-3                     0.48
Peps         Pepsinogen                                  0.47
Pgm-2        Phosphoglucomutase-2                        0.38
Pept-A       Peptidase-A                                 0.37
Pgm-1        Phosphoglucomutase-1                        0.36
Me           Malic enzyme                                0.30
Ace          Acetylcholinesterase                        0.23
Adn          Adenosine deaminase                         0.11
Gput         Galactose-1-phosphate uridyl transferase    0.11
Adk          Adenylate kinase                            0.09
Amy          Amylase (pancreatic)                        0.09
Adh-2        Alcohol dehydrogenase-2                     0.07
6Pgdh        6-Phosphogluconate dehydrogenase            0.05
Hk           Hexokinase (white-cell)                     0.05
Got          Glutamate-oxaloacetate transaminase         0.03
Pept-C       Peptidase-C                                 0.02
Pept-D       Peptidase-D                                 0.02
(51 loci)    Invariant (monomorphic)                     0.00

After H. Harris and D. A. Hopkinson, J. Human Genetics

Comparison of isozyme variation across kingdoms

Variation of diversity among species
• Explaining levels of diversity is a prime activity of population genetics
• Plants have the most diverse array of life histories: short-lived species and self-fertilizers have the least variation, long-lived outcrossers have the most
• Vertebrates have the narrowest array of life histories, hence the lowest variation of diversity among species
• Just explaining the mean level of diversity is challenging
• It is the outcome of a complex interplay of mutation, selection, and chance (drift)

Q. What does heterozygosity measure?
A. The tendency for a population to have "intermediate" gene frequencies

Other measures of genetic variation

Polymorphism
• Ford (1940): "the occurrence together in the same habitat of two or more discontinuous forms in such proportions that the rarest of them cannot be maintained by recurrent mutation"
• Probably not a good definition in 2006

Polymorphism
• Cavalli-Sforza and Bodmer (1971): "the occurrence in the same population of two or more alleles at one locus, each with appreciable frequency"
• But what is "appreciable frequency"?

Other measures of diversity
Proportion of polymorphic loci: P
• Needs a practical definition of "appreciable frequency"
• An arbitrary upper limit is set on the frequency of the most common allele: normally 0.95, sometimes 0.99 (used when the sample is adequate, N > 100)

Numbers of alleles
• Number of alleles, n: allele diversity or allele richness; strongly influenced by sample size
• Effective number of alleles, $n_e = 1 / (1 - H)$: the number of equally frequent alleles that gives the observed H

P vs. H over taxa

Measures of nucleotide diversity
Proportion of sites that differ = S/N
• S = number of segregating sites
• N = number of nucleotide sites
• Depends on the number of sequences aligned: the more sequences, the higher S
• Like the proportion of polymorphic loci
Nucleotide diversity
• Heterozygosity averaged over aligned sites
• If there are K sequences, make all possible pairwise comparisons (there are K(K-1)/2 comparisons)
• Analogous to H as estimated from gene frequencies

Estimation of gene frequency
• Gene counting: Freq(A) = Freq(AA) + 0.5 Freq(Aa)
• $Var(p) = p(1 - p) / (2N)$, the binomial sampling variance
• Construct a confidence interval
• Dominance: need Hardy-Weinberg

Estimation of gene frequency
Dominance: assume Hardy-Weinberg
$$f_{aa} = q^2, \qquad \hat{q} = \sqrt{f_{aa}}, \qquad Var(\hat{q}) = (1 - q^2) / (4N)$$

Kermode bear example
• A total of 87 bears were sampled for hair on Gribbell, Princess Royal and Roderick Islands
• 66 were black, 21 were white
• Frequency of the recessive phenotype = 21/(66+21) = 0.241
• The estimate of the gene frequency of the white gene is the square root of this: sqrt(0.241) = 0.49
• Variance is $(1 - 0.49^2) / (4 \times 87) = 0.00218$
• SE is the square root of this: sqrt(0.00218) = 0.047

We also have nucleotide data for the gene underlying Kermode coat color
• AA and AG = black, GG = white
• 42 AA, 24 AG, 21 GG
• Gene frequency of G (white) = (24 + 2 x 21) / (2 x 87) = 0.38
• SE = sqrt(q(1-q)/2N) = 0.037
• Using just coat color, with white recessive: q = 0.49, SE = 0.047 (from previous slide)
• q is higher (0.49 vs. 0.38); why?

Expected frequency of white bears
• Using the co-dominant Mc1r data, the expected number of GG = $87 \times (0.38)^2 = 12.5$
• The observed number is 21 (>> 12.5)
• Can be caused by:
  • Assortative mating, which creates an excess of the white genotype (GG) over HW expectations
  • Variation of gene frequency among islands

Microsatellite loci show no excess homozygosity!
• Assortative mating at the coat color locus
• Excess homozygosity only at Mc1r

Null alleles or inbreeding?
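The Kermode bear arithmetic above can be re-derived with a few lines of Python. This is only a sketch of the two estimators on the slides (square root of the recessive phenotype frequency under assumed Hardy-Weinberg proportions, and gene counting for the co-dominant Mc1r genotypes); the function names are illustrative and not from the course material.

```python
import math

def recessive_freq(n_dominant, n_recessive):
    """Estimate q from phenotype counts, assuming Hardy-Weinberg (f_aa = q^2)."""
    n = n_dominant + n_recessive
    q = math.sqrt(n_recessive / n)
    se = math.sqrt((1 - q * q) / (4 * n))   # Var(q) = (1 - q^2) / (4N)
    return q, se

def codominant_freq(n_AA, n_Aa, n_aa):
    """Gene counting for the 'a' allele, with its binomial standard error."""
    n = n_AA + n_Aa + n_aa
    q = (n_Aa + 2 * n_aa) / (2 * n)
    se = math.sqrt(q * (1 - q) / (2 * n))   # Var(q) = q(1 - q) / (2N)
    return q, se

# Coat color only: 66 black, 21 white bears
q_pheno, se_pheno = recessive_freq(66, 21)

# Mc1r genotypes: 42 AA, 24 AG, 21 GG (G is the white allele)
q_geno, se_geno = codominant_freq(42, 24, 21)

# Expected number of white (GG) bears under Hardy-Weinberg
expected_GG = 87 * q_geno ** 2

print(f"phenotype-based q = {q_pheno:.2f} (SE {se_pheno:.3f})")
print(f"genotype-based  q = {q_geno:.2f} (SE {se_geno:.3f})")
print(f"expected GG = {expected_GG:.1f}, observed GG = 21")
```

The gap between the two estimates of q is the same signal as the excess of GG bears: the phenotype-based estimate assumes Hardy-Weinberg proportions, and that assumption is exactly what fails at this locus.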
Fis values (excess homozygosity above HW expectations) for Yellow Warbler microsatellites

Locus     Fis value
Caµ 28    0.30
Dpµ 01    0.01
Dpµ 03    0.05
Dpµ 15    0.12
Dpµ 16    0.00
Maµ 23    0.02

Another exercise in HW: null alleles increase apparent homozygote frequency
• Sum all true homozygotes plus all heterozygous nulls (e.g., sum the diagonal plus the last row and column of the expansion of gene frequencies, except for the lower right corner, with the null allele listed last at frequency $p_n$):
$$
\begin{array}{cccc}
p_1^2 & p_1 p_2 & \cdots & p_1 p_n \\
p_1 p_2 & p_2^2 & \cdots & p_2 p_n \\
\vdots & \vdots & \ddots & \vdots \\
p_1 p_n & p_2 p_n & \cdots & p_n^2
\end{array}
$$
• This equals the expected homozygosity $J_e$ plus $2 p_n (1 - p_n)$, which is roughly twice the null frequency when nulls are rare:
$$\sum_{i=1}^{n} p_i^2 + 2 \sum_{i=1}^{n-1} p_i p_n = J_e + 2 p_n (1 - p_n)$$

Populations: defining and identifying
Two major paradigms for defining populations
• Ecological paradigm: a group of individuals of the same species that co-occur in space and time and have an opportunity to interact with each other.
• Evolutionary paradigm: a group of individuals of the same species living in close enough proximity that any member of the group can potentially mate with any other member.

Cocoa from 32 abandoned estates in Trinidad
• 88 Imperial College Selection (ICS) clones conserved in the International Cocoa Genebank, Trinidad, assayed for 35 microsatellite loci
• Unweighted pair group method used to construct a dendrogram of relatedness between individuals
• The different colored groups can be identified by eye, or identified with the computer program "STRUCTURE" (as was done here).

Yellow perch
The yellow perch (Perca flavescens) is found in the United States and Canada, and looks similar to the European perch but is paler. It is in the same family as the walleye, but in a different family from the white perch. The yellow perch plays a significant role in the survival and success of the double-crested cormorant and other birds, predatory fish, commercial fishermen, and sport fishermen in the Great Lakes region. This fish must be properly managed in order to prevent the trophic structure and economy of the Great Lakes region from collapsing.
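Returning briefly to the null-allele exercise earlier in this section: the identity between the sum over the expansion of gene frequencies and the closed form J_e + 2 p_n (1 - p_n) can be checked numerically. The sketch below assumes the null allele is listed last, as in the expansion; the allele frequencies are made up for illustration and the function names are hypothetical.

```python
def apparent_homozygosity(freqs):
    """Sum the diagonal plus the last row and column of the genotype table.

    freqs: allele frequencies summing to 1, with the null allele last.
    Cells on the diagonal are true homozygotes; off-diagonal cells in the
    last row or column are heterozygous nulls, which are scored as homozygotes.
    """
    n = len(freqs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j or i == n - 1 or j == n - 1:
                total += freqs[i] * freqs[j]
    return total

def closed_form(freqs):
    """Expected homozygosity J_e plus 2 * p_n * (1 - p_n)."""
    j_e = sum(p * p for p in freqs)
    p_null = freqs[-1]
    return j_e + 2 * p_null * (1 - p_null)

# Illustrative frequencies: two visible alleles and a null at 0.2
example = [0.5, 0.3, 0.2]
by_table = apparent_homozygosity(example)
by_formula = closed_form(example)
print(by_table, by_formula)
```

Both routes give the same value, which shows why a locus with a common null allele can display a large Fis even when the population mates at random.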
mtDNA control region haplotype frequency patterning for Yellow Perch spawning site groups across North America

Relationships among mtDNA haplotypes of Yellow Perch

Allele distribution for six representative Yellow Perch microsatellite loci among selected regions. Rings represent loci; colors within a ring represent alleles.

Bayesian assignment of Yellow Perch genetic structure, using STRUCTURE. Vertical bars represent individuals; colors within a bar represent probability of assignment to a cluster. 8 microsatellite loci, 25 collection sites, N = 495 fish, K = 10

Inference of population structure using multi-locus genotype data
STRUCTURE V2.1
• Pritchard, J.K., and Wen, W. (2004)
• Pritchard, Stephens, and Donnelly (2000)
• Falush, Stephens, and Pritchard (2003)

Main objective of "structure"
• Assign individuals to populations on the basis of their genotypes, while simultaneously estimating population allele frequencies
• Infer the number of populations "K" in the process

Other objectives
• Begin with a set of predefined populations and classify individuals of unknown origin
• Identify the extent of admixture of individuals
• Infer the origin of particular loci in the sampled individuals

Structure is a Bayesian model-based method of clustering
• Many assumptions about parameters and distributions

Four basic models
1. Model without admixture: each individual is assumed to originate in one (only one) of K populations
2. Model with admixture: each individual is assumed to have inherited some proportion of its ancestry from each of K populations
3. Linkage model: "chunks" of chromosomes are derived as intact units from one or another of the K populations, and all allele copies on the same "chunk" derive from the same population
4. F model: the populations all diverged from a common ancestral population at the same time, but the model allows that the populations may have experienced different amounts of drift since the divergence event

Assumptions
• The main modeling assumptions are Hardy-Weinberg equilibrium (HW) within populations and complete linkage equilibrium between loci within populations
• The model accounts for the presence of Hardy-Weinberg disequilibrium or linkage disequilibrium (LD) by introducing population structure, and attempts to find population groupings that (as far as possible) are not in disequilibrium

Hardy-Weinberg
• Gives the relationship between gene frequencies and genotypic frequencies, assuming random mating
• $F(AA) = p^2$, $F(Aa) = 2pq$, $F(aa) = q^2$
• The extent of a randomly mating population is predicted by STRUCTURE using HW predictions

Pairwise comparison of LD along chromosomes: high LD is red, low LD is green

Bayesian procedure employed by STRUCTURE
• Step 1: estimate the allele frequencies for each population, assuming that the population of origin of each individual is known.
• Step 2: estimate the population of origin of each individual, assuming that the population allele frequencies are known.
• Iterate many times using a "Markov chain Monte Carlo" procedure

Good and bad things about "structure"
• When populations are real, it is the most efficient way to estimate the number of populations K and the membership of individuals in populations
• When populations are more continuous (for example, a continuous cline), it can impose incorrect structure on the data and create an arbitrary number of artificial groups.
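The two alternating steps described above can be illustrated with a toy Gibbs sampler for the model without admixture. This is a teaching sketch, not the STRUCTURE program: the flat Dirichlet(1, ..., 1) prior on allele frequencies, the uniform prior on population of origin, the function name, and the tiny data set are all assumptions made here for illustration. Genotype likelihoods are products of allele frequencies, which is where the Hardy-Weinberg and linkage equilibrium assumptions enter.

```python
import math
import random

def gibbs_no_admixture(genotypes, n_alleles, K, n_iter=200, seed=0):
    """Toy sampler alternating between allele frequencies and assignments.

    genotypes[i][l] is a pair (a1, a2) of allele indices for individual i
    at locus l; n_alleles[l] is the number of alleles at locus l.
    Returns the population label (0..K-1) of each individual after n_iter sweeps.
    """
    rng = random.Random(seed)
    n, n_loci = len(genotypes), len(genotypes[0])
    z = [rng.randrange(K) for _ in range(n)]      # random starting assignment
    for _ in range(n_iter):
        # Step 1: draw allele frequencies given the current assignments.
        freqs = []
        for k in range(K):
            per_locus = []
            for l in range(n_loci):
                counts = [1.0] * n_alleles[l]     # assumed Dirichlet(1,...,1) prior
                for i in range(n):
                    if z[i] == k:
                        for a in genotypes[i][l]:
                            counts[a] += 1
                # A Dirichlet draw is a set of normalised gamma draws.
                draws = [rng.gammavariate(c, 1.0) for c in counts]
                total = sum(draws)
                per_locus.append([d / total for d in draws])
            freqs.append(per_locus)
        # Step 2: draw each individual's population given the frequencies.
        for i in range(n):
            logp = []
            for k in range(K):
                lp = 0.0
                for l in range(n_loci):
                    for a in genotypes[i][l]:
                        # HW within populations, independence across loci
                        lp += math.log(max(freqs[k][l][a], 1e-300))
                logp.append(lp)
            top = max(logp)
            weights = [math.exp(x - top) for x in logp]
            z[i] = rng.choices(range(K), weights=weights)[0]
    return z

# Made-up data: 6 individuals, 3 biallelic loci, two contrasting groups.
toy = [
    [(0, 0), (0, 0), (0, 1)],
    [(0, 0), (0, 1), (0, 0)],
    [(0, 0), (0, 0), (0, 0)],
    [(1, 1), (1, 1), (1, 0)],
    [(1, 1), (1, 1), (1, 1)],
    [(1, 0), (1, 1), (1, 1)],
]
labels = gibbs_no_admixture(toy, n_alleles=[2, 2, 2], K=2)
print(labels)
```

Because this is a single chain reported at its final sweep, the labels are one draw from the sampler rather than a posterior summary, and which cluster is called 0 or 1 is arbitrary. A real analysis would discard a burn-in period and average assignment probabilities over many sweeps.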
Human variation and differentiation
• Hundreds of microsatellites now available
• ALU markers
• Can evolutionary history be reconstructed?
• Are there distinct "races"?
• Are certain populations less diverse?

K is set to 3
• We place individuals in three groups, without prior knowledge of group membership
• The more loci, the better the identification of groups

Noah Rosenberg et al., Science, 2002
• Human Genome Diversity Panel
• 55 indigenous populations from 5 continents: Africa, Americas, Asia, Europe, Oceania; a total of 1,056 people
• 377 microsatellite markers assayed

Structure within structure

Jun Li et al., Science, 2008
• Human Genome Diversity Panel, 938 individuals from 51 populations, 5 continents
• 650,000 SNP markers

Bayesian prior for population assignment

Ursus americanus ssp. kermodei

Purpose of the Kermode bear study, conducted in conjunction with Western Forest Products
• Determine if white bear populations are genetically unique for other types of genetic variation
• Identify the gene, or genes, that cause the white coat color difference
• Infer the role of natural selection vs. genetic drift from patterns of genetic variation for this gene
• Predict effects of forest practices using this information

Populations sampled for Kermode bear hairs

Barbed-wire hair trap with Kermode hair

From 1685 hair samples to 766 microsatellite profiles to 216 unique genotypes (22 Kermode)

Kermode-containing populations (yellow): perhaps 10% less genetic variation, but other island populations show 10% less variation too

Genetic divergence (below diagonal), gene flow (above diagonal)

Relationship of populations based upon pairwise genetic divergence (previous table); gene frequencies of the white phase given in parentheses
(0.00) (0.08) (0.02) (0.05) (.013) (0.05) (0.56) (0.33) (0.04) (0.21) (0.00) (0.10)
• Kermode populations are not closely related to each other; some suggestion of complex interrelations
• Population labels: E-Pr, H (Hawkesbury), W-H, P (Pooley Is), R (Roderick Is), T (Terrace/Nass)