Making sense of genetic variation

• Is there an association between DNA sequence variation and the disease phenotype?
• What do the sequences tell us about human history?
• How has natural selection shaped diversity in the gene?

[Figure: DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Panels: Native Americans, Finns, African Americans. Nickerson et al. 1998, Nature Genetics 19, 233-240]

Population and Quantitative Genetics

Population genetics describes variation within and between species. There are two major areas of interest:
• Describe the degree of genetic variation within and between individuals and/or populations
• Infer the evolutionary mechanisms responsible for the origin and maintenance of genetic variation

Mutation is the source of the variation that stochastic and deterministic factors can act upon.

The aims of population genetics

• To understand the link between genetic variation and phenotypic variation
  – Is variation at this gene associated with disease susceptibility?
  – Which loci contribute to the variation in hair colour?
• To investigate the evolutionary history of a species
  – How long have these populations been separate?
  – Which genes have experienced recent adaptive evolution?
• To learn about fundamental biological processes
  – How does the recombination rate vary along the genome?
  – What determines the mutation rate?
Genetic variation is of several types

(1) Visible, discrete variation
• Biston betularia: melanic and non-melanic moths in Great Britain
• Fruit color variation between different cultivars (populations) of chile peppers
• Color variation between snail shells (Amphidromus floresianus)

(2) Quantitative variation
Variation is of degree rather than kind.

(3) Chromosomal or cytogenetic variation
Genetic variation in chromosomal structure or constitution between individuals.

Diversity of chromosomal structure
• Insertions
• Deletions
• Inversions
• Translocations
Example: classic third chromosome inversion polymorphisms in Drosophila pseudoobscura

Diversity in chromosome number
• Variation in entire chromosome sets (polyploidization)
• Variation in numbers of single chromosomes
Fairly common in plant species.
Example: Tragopogon species in Washington, Oregon and Idaho. Three diploid species co-exist with tetraploid individuals that result from interspecific hybridization.

Transposable elements
• Movement of genomic elements throughout the genome
• May or may not be site specific

(4) Protein variation
• Allozymes are protein products of allelic variants of genes.
• Allozymes can sometimes be discriminated by electrophoresis (e.g. electromorphs).
• These electromorphs differ in charge (+ or -) and can be observed by starch gel electrophoresis.
• Isozymes are like allozymes, but the variants potentially come from more than one gene.
Example: variation at the GPI locus in snail species
Other example: Drosophila melanogaster Adh Fast (F) and Slow (S) electromorphs

The use of allozyme data in population genetics was revolutionary in two ways:
• It showed that populations contain a significant amount of genetic variation
• The techniques used to detect this diversity could be applied to virtually any organism

Another method of detecting protein variation: immunological variation (e.g., blood group detection, the ABO allele system)
Codominant markers

(5) DNA sequence variation
All genetic variation stems from variation in DNA sequence.

Several techniques:

• Restriction fragment length polymorphisms (RFLP)
Variation in DNA sequences due to mutations in restriction sites.
Normal EcoRI site: GAATTC
Mutant EcoRI site: GAGTTC
[Figure: EcoRI restriction map with fragments of 300 bp and 500 bp, and gel lanes A and B for Individual A and Individual B]

• Microsatellite sequences
Simple sequence repeats of di-, tri- or tetranucleotides.
Randomly scattered throughout the genome (ubiquitous).
Very polymorphic because replication slippage results in high mutation rates.
For example: AGGTCGGT(CTG)nGGTATCGG, n = 1 to >100
[Figure: microsatellite gel of a willow population]

An example: structuring of human populations
• Questions
  – Is there significant natural structuring to genetic variation in humans?
  – Does this structuring coincide with geographical boundaries?
• Data
  – 377 autosomal microsatellite loci in 1056 individuals from 52 populations. Rosenberg et al. (2002)
• Model
  – K 'hidden' populations in linkage and Hardy-Weinberg equilibrium
• Estimation
  – Estimate population allele frequencies
  – Most likely value of K
  – Posterior probability for each individual
[Figure: inferred population structure by region: Africa, Europe, Middle East, Asia, Oceania, America. Science 298: 2381]

• Direct DNA sequencing
The ultimate assay for genetic variation is direct sequence information of the DNA. The first systematic assay of variation using direct sequencing was of the Drosophila melanogaster Adh gene.
Single nucleotide polymorphisms (SNPs)

Levels of Genetic Variation
One can always view genetic variation at different levels, from molecular to phenotypic variation:
Adh gene DNA sequence variation (exon 4 A to T change)
→ ADH protein sequence variation (threonine to lysine change in the amino acid sequence of the protein)
→ Adh allozyme variation in electrophoretic mobility (Fast [F] and Slow [S] polymorphism)

MUTATIONS: The Ultimate Source of Genetic Variation
• Genetic variation ultimately traces back to mutations
• Some mutations are large-scale
• Others occur at the smaller, DNA scale

Three types of mutations at the DNA level:
• Insertions
• Deletions
• Substitutions

Insertions: AGGTCGT → AGGGTCGTATCGT
Large insertions (>100 bp) can be caused by mobile transposable element sequences. These include SINEs, LINEs, Alu elements, transposons and retrotransposons.

Deletions: AGGTCGTGCTCGT → AGGTCGT
Caused largely by unequal crossover or excision of inserted transposable elements.

Substitutions:
• Transitions: nucleotide changes between similar nucleotide types
  – purine to purine: A ↔ G
  – pyrimidine to pyrimidine: C ↔ T
• Transversions: nucleotide changes between different nucleotide types
  – purine to pyrimidine: A, G ↔ C, T

• Mutations according to functional impact (coding region mutations)
  – synonymous or silent mutation: ...GGG... → ...GGC... (Gly → Gly)
  – nonsynonymous or replacement mutation: ...GGG... → ...GAG... (Gly → Glu)

QUANTIFYING THE VARIATION

Variation at single loci
For a sufficient description of the genetic constitution of a population at a single locus, we need to specify two things:
• which genotypes are present
• how many individuals of each genotype there are in the population

Example: an Australian aborigine population typed for the MN blood group. This is protein variation in red blood cell antigens, detected using immunological techniques.
Here's the data:

Blood group   Number of individuals   Genotype frequency   Notation
MM                      22                  0.030          P
MN                     216                  0.296          H
NN                     492                  0.674          Q
Total                  730                  1.000          P + H + Q = 1

Genotype frequencies describe a population at a single point in time (the present).
• They do NOT describe the population as a breeding group, since whole genotypes are not transmitted between generations.
• Alleles (also sometimes referred to incorrectly by molecular biologists as genes) are what is transmitted across generations.
• We can describe the genetic constitution of a population by specifying the allele frequencies.

Blood group   Number of individuals   Number of M alleles   Number of N alleles
MM                      22                     44                     0
MN                     216                    216                   216
NN                     492                      0                   984
Total                  730                    260                  1200

Computing allele frequencies:
• frequency of the M allele: f(M) = p = 260/1460 = 0.178
• frequency of the N allele: f(N) = q = 1200/1460 = 0.822

You can also compute the allele frequencies directly from the genotype frequencies:
p = P + H/2
q = Q + H/2
Note that p + q = 1.

Variation at many loci (multilocus measures)
There are two ways to readily quantify variation when looking at more than one locus:
• P = proportion of polymorphic loci: the fraction of loci that are polymorphic
• H = mean heterozygosity: the heterozygosity of the population averaged over all loci

Suppose we have data for 5 allozyme loci from Zea mays.

          Genotype
Locus     FF    FS    SS    Total    f(F)     f(FS)
Adh       32    16     2     50      0.80     0.32
Mdh        1     9    40     50      0.11     0.18
Pgi-A     50     0     0     50      1.00*    0.00
Pgm        8    24    18     50      0.40     0.48
Ldh        0     1    49     50      0.01*    0.02

The frequencies of FS heterozygotes are also known as the "observed heterozygosities".

• Proportion of polymorphic loci
The loci that show variation are Adh, Mdh and Pgm. Both Pgi-A and Ldh are monomorphic (frequency of the least common allele < 5%).
P = 3/5 = 0.60

• Mean heterozygosity
Take the average of the frequencies of the heterozygous genotypes across all loci.
H = [f(FS)_Adh + f(FS)_Mdh + f(FS)_Pgi + f(FS)_Pgm + f(FS)_Ldh]/5 = (0.32 + 0.18 + 0.00 + 0.48 + 0.02)/5 = 0.20

From Hartl and Clark (chapter 2)

Measures of DNA polymorphisms
• Measures of gene diversity (proportion of polymorphic loci, heterozygosity) are difficult to interpret for DNA sequence data.
• DNA-level variation can be quite extensive.
• Some measures of variation are particularly suited to DNA sequence data:

• Number of alleles in a sample, n_a
Count the number of different alleles ("haplotypes") in your sample.

• Number of segregating sites in the sample
A segregating site is a nucleotide site that is polymorphic in your sample. K ("big K") is the total number of segregating sites. Note that K depends on the length of the sequence, L: the longer the sequence, the larger K can be. We can normalize K by dividing it by the sequence length:
S = K/L
S is the normalized estimate of the number of segregating sites.

Both n_a and S are used to develop more sophisticated measures of DNA variation used in molecular population genetics.

The raw material of population genetics
• Population genetics is the study of naturally occurring genetic variation
• Genetic variation comes in all shapes and sizes
  – Re-sequencing v. SNPs v. microsatellites
  – Recombining v. partially linked v. unlinked markers
  – Single gene v. multiple loci v. whole genome
  – Single population v. multiple populations v. multiple species
• Which data type you collect (and analyse) depends on the questions you want to ask
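
Worked example: allele frequencies, P and H

The single-locus and multilocus calculations in these notes (MN blood group, Zea mays allozymes) can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides. The function names allele_frequencies and multilocus_summary are hypothetical, and the 5% cutoff for calling a locus polymorphic is taken from the monomorphic criterion stated in the notes.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (p, q) for a two-allele locus from counts of the three genotypes.

    Each homozygote carries two copies of its allele, each heterozygote one of each.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    p = (2 * n_aa + n_ab) / total_alleles
    q = (2 * n_bb + n_ab) / total_alleles
    return p, q


def multilocus_summary(genotype_counts, threshold=0.05):
    """Return (P, H) for a dict mapping locus -> (n_FF, n_FS, n_SS).

    P: proportion of loci whose least common allele has frequency >= threshold
       (the 5% cutoff is the criterion used in the notes).
    H: mean observed heterozygosity, i.e. the mean frequency of FS genotypes.
    """
    polymorphic = 0
    het_sum = 0.0
    for n_ff, n_fs, n_ss in genotype_counts.values():
        p, q = allele_frequencies(n_ff, n_fs, n_ss)
        if min(p, q) >= threshold:
            polymorphic += 1
        het_sum += n_fs / (n_ff + n_fs + n_ss)
    n_loci = len(genotype_counts)
    return polymorphic / n_loci, het_sum / n_loci


# MN blood group counts from the notes: MM = 22, MN = 216, NN = 492
p, q = allele_frequencies(22, 216, 492)
print(round(p, 3), round(q, 3))  # 0.178 0.822

# Zea mays allozyme counts from the notes (FF, FS, SS)
zea_mays = {
    "Adh": (32, 16, 2),
    "Mdh": (1, 9, 40),
    "Pgi-A": (50, 0, 0),
    "Pgm": (8, 24, 18),
    "Ldh": (0, 1, 49),
}
P, H = multilocus_summary(zea_mays)
print(round(P, 2), round(H, 2))  # 0.6 0.2
```

Counting alleles directly (rather than using p = P + H/2) keeps the arithmetic in integers until the final division; both routes give the same p and q.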
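
Worked example: n_a, K, S and substitution types

The DNA-level measures defined in these notes (number of alleles n_a, segregating sites K, normalized S = K/L) and the transition/transversion distinction can be illustrated on a small alignment. This is a sketch under stated assumptions: the four aligned sequences below are invented for illustration and are not data from the slides, and the function names are hypothetical.

```python
PURINES = {"A", "G"}  # C and T are the pyrimidines


def segregating_sites(seqs):
    """Return the 0-based positions that are polymorphic in a set of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def substitution_type(a, b):
    """Classify a nucleotide difference as a transition or a transversion."""
    if a == b:
        raise ValueError("nucleotides are identical")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) -> transition
    return "transition" if (a in PURINES) == (b in PURINES) else "transversion"


# Hypothetical sample of four aligned sequences (illustration only)
sample = [
    "AGGTCGTACT",
    "AGGTCGTACT",
    "AGATCGTACT",
    "AGATCGTCCT",
]

sites = segregating_sites(sample)
K = len(sites)          # total number of segregating sites
L = len(sample[0])      # sequence length
S = K / L               # normalized number of segregating sites
n_a = len(set(sample))  # number of distinct alleles (haplotypes)

print(sites, K, S, n_a)  # [2, 7] 2 0.2 3

for i in sites:
    alleles = sorted({s[i] for s in sample})
    if len(alleles) == 2:
        print(i, alleles, substitution_type(*alleles))
# 2 ['A', 'G'] transition
# 7 ['A', 'C'] transversion
```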