Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Linkage and Linkage Disequilibrium AB = 25% A B Ab = 25% aB = 25% a b ab = 25% A B a b AB = 50% Ab = 0% aB = 0% ab = 50% f(Ab) = f(A) x f(b) B b A a f(Ab) ≠ f(A) x f(b) B b A a A locus and B locus are in Linkage Disequilibrium D = f(AB) x f(ab) - f(Ab) x f(aB) Maximum with no recombination D = 0 with free recombination (linkage equilibrium) Linkage equilibrium: Are alleles at separate loci paired at random? D = x11 − p1q1 D = x22 − p2q2 A locus and B locus are in Linkage Disequilibrium D = f(AB) x f(ab) - f(Ab) x f(aB) Maximum with no recombination D -> 0 with free recombination (linkage equilibrium) When allele frequencies are intermediate: f(A) = f(a) = f(B) = f(b) = 0.5, and maximal LD occurs so that no recombinants are present: f(AB) = f(ab) = 0.5, so D = 0.5 x 0.5 – 0.0 x 0.0 = 0.25 When allele frequencies are skewed: f(A) = 0.9, f(a) = 0.1; f(B) = 0.9, f(b) = 0.1 and maximal LD occurs so that no recombinants are present, D is less than 0.25: f(AB) = 0.9, and f(ab) = 0.1, so D = 0.9 x 0.1 – 0.0 x 0.0 = 0.09 LD as a two-locus Hardy Weinberg problem Linkage disequilibrium (LD) decays with distance and time A B a b r= Rate of recombination AB = (1-r)/2 Ab = r/2 aB = r/2 ab = (1-r)/2 Empirical demonstration of the Decay of LD over time r D p1 p2 q1q2 Epistasis QTL for flower traits in Mimulus (monkey flowers) M. lewisii F2’s F1 M. cardinalis Different pollinators Genetic map of monkey flower http://www.genetics.org/cgi/content/full/159/4/1701/F1 Quantitative trait locus (QTL)mapping: Screen for marker-trait associations in F2s or RILs Parentals F1 M, Q M, Q M, Q M, Q M, Q F2 Inbreed to make Recombinant inbred lines (RILs) Scan genome for association Between molecular marker and phenotype Association between Molecular marker (M) and QTL(Q) Small m, q Large M, Q QTL here Marker here http://isotope.bti.cornell.edu/img/intro/qtl_fig_2.gif QTL Mapping: detecting an association between a genetic marker (M) and a gene affecting a quantitative trait (Q). QTL mapping works because there is linkage disequilibrium (LD) between the marker (M) and the QTL (Q): mm marker genotypes are correlated with small size MM marker genotypes are correlated with large size Most traits in organisms Show continuous variation How do we find the genes That affect these “quantitative” traits Scan the genome for Nucleotide sites that Co-vary with the phenotype Genome wide association studies: GWAS Mutation “causing” variation in height Tall Tall Tall Tall Short Short Short Short A A A A G G G G Adjacent SNPs are linked Distant sites show no genotype-phenotype association Problem: how do we find the causal SNPs? Needle in a haystack What is better: More recombination, more markers? Parentals F1 M, Q M, Q M, Q M, Q M, Q F2 Inbreed to make Recombinant inbred lines (RILs) Scan genome for association Between molecular marker and phenotype Association between Molecular marker (M) and QTL(Q) Small m, q Large M, Q How does DNA evolve? Human 1 Chimp 1 51 51 101 101 151 151 ATGCCCCAACTAAATACTACCGTATGGCCCACCATAATTACCCCCATACT ||||||||||||||||| ||||||| ||||||||||||||||||||||| atgccccaactaaataccgccgtatgacccaccataattacccccatact . . . . . CCTTACACTATTCCTCATCACCCAACTAAAAATATTAAACACAAACTACC ||| |||||||| ||| |||||||||||||||||||||| |||| |||| cctgacactatttctcgtcacccaactaaaaatattaaattcaaattacc . . . . . ACCTACCTCCCTCACCAAAGCCCATAAAAATAAAAAATTATAACAAACCC | ||||| ||||||||||| ||||||||||||||||| || || |||||| atctacccccctcaccaaaacccataaaaataaaaaactacaataaaccc . . . . . TGAGAACCAAAATGAACGAAAATCTGTTCGCTTCATTCATTGCCCCCACA ||||||||||||||||||||||||| |||||||||||| |||||||||| tgagaaccaaaatgaacgaaaatctattcgcttcattcgctgcccccaca 201 ATCC 204 |||| 201 atcc 204 50 50 100 100 150 150 200 200 Measuring DNA Evolution • Align sequences between species • Determine length of sequences, L • Count number of differences • Divergence = proportion of differences • D = p-distance = (number of differences) / (length of sequence) • Rate of divergence = (sequence divergence) / (age of common ancestor) = D / time • Rate of substitution = D / 2 x time time Example: 5 differences in 100 D = 0.05, t = 6 million years Divergence = 0.05/6x106 Divergence = 8.3 x 10-9 Jukes Cantor One parameter model = rate of substitution PA(t) = ¼ + ¾ e-4t = probability that A remains A at time t PNN = ¼ + ¾ e-8t = probability that two sequences have the same nucleotide at N D = proportion of different nucleotides = 1 - PNN Dhat = 3/4(1-e-8t) K = - ¾ ln (1-4/3p) where p = proportion of nucleotide differences (# diffs./total bp) Kimura two-parameter model = rate of transition substitution b b b b b = rate of transversion substitution PAA(t) = ¼ + ¾ e-4bt + ½ e-2(+b)t = probability that A remains A at time t K = ½ ln(1/[1- 2P-Q]) + ¼ ln(1/[1-2Q]) where P = proportion of transitional differences Q = proportion of transversional differences Comparison of models • • • • P-distance Jukes Cantor Kimura 2-parameter Tamura-Nei • Etc… Molecular clocks Approximately constant Divergence of proteins K = •f0 Rate of substitution = Mutation rate x proportion of neutral mutations “Saturation” due to multiple Hits in DNA evolution Anatomy of a phylogenetic tree Terminal (external) nodes Taxa = Taxon1 OTUs = Operational taxonomic units External branch Internal branch Taxon2 Taxon3 Taxon4 Taxon5 Taxon6 Polytomy Non-dichotomous splitting Internal nodes Root Relative rate test • KAC = KBC • KOC is shared • Tajima test • (m1-m2)2 / (m1+m2) • Chi square, df=1 Species O m1 m2 Species A Species B Species C DNA test of neutrality • • • Neutral prediction: amino acid (nonsynonymous) substitution rate (dN) should be lower than silent (synonymous) substitution rate (dS) True for most genes – • – • Follows from functional constraint argument Different for Major Histocompatibility Complec (MHC) loci – • Antigen binding sites: dN/dS > 1 “positive” selection Antigen recognition sequence shows dN > dS Rest of molecule shows dN > dS, as expected Amino acid mutations are favored in antigen recognition region Promotes diversity, better recognition of foreign peptides http://depts.washington.edu/rhwlab/dq/3structure.html Rest of molecule: dN/dS < 1 Negative (purifying) selection Maximum likelihood • Likelihood of observing the data set – Assuming a given tree – Assuming a given model of DNA evolution • L = P(data|tree) • Consider 4-taxon cases within a tree OTU1 HTU1 OTU1 – For each site, Identify nucleotides at each of the four taxa – Assume all 16 pairs of nucleotides at internal nodes – Likelihood of observed 4 terminal nucleotides = sum of 16 independent probabilities – Repeat likelihoods for each position in alignment – Likelihood of tree = product of individual likelihoods OTU1 HTU1 OTU1 • L = PLi for i = 1 to n positions in alignment (or sum of log likelihoods) • Calculate likelihood for other trees; choose tree with maximum likelihood