Download Introduction to Genetics - Bruce Walsh's Home Page
Introduction to Genetics

Topics
• Darwin and Mendel
• Probability
• Mendelian genetics
  – Mendel's experiments
  – Mendel's laws
• Introduction to Population Genetics
• Introduction to Quantitative Genetics

Darwin & Mendel
• Darwin (1859), Origin of Species
  – Instant classic, major immediate impact
  – Problem: model of inheritance
• Darwin assumed blending inheritance
• Offspring = average of both parents: zo = (zm + zf)/2
• Fleeming Jenkin (1867) pointed out the problem
  – Var(zo) = Var[(zm + zf)/2] = (1/4)[Var(zm) + Var(zf)] = (1/2) Var(parents), taking the two parents as uncorrelated with equal variance
  – Hence, under blending inheritance, half the variation is removed each generation, and this must somehow be replenished by mutation.

Mendel
• Mendel (1865), Experiments in Plant Hybridization
• No impact, paper essentially ignored
  – Ironically, Darwin had an apparently unread copy in his library
  – Why ignored? Perhaps too mathematical for 19th-century biologists
• Rediscovered in 1900 (by three independent groups)
• Mendel’s key idea: genes are discrete particles passed on intact from parent to offspring

Probability & Genetics
Since genes are passed on at random, an understanding of probability is critical to understanding genetics.
Let A denote an event of interest (getting a head on the flip of a coin, rolling a 5 on a die, getting a QQ genotype).
Let Pr(A) denote the probability that event A occurs.
• Pr(A) falls between 0 and 1
• The sum of the probabilities for all (non-overlapping) events is one -- probabilities sum to one
• Pr(not A) = 1 - Pr(A)

Example
Consider the offspring in a cross of two Qq parents. What is the probability that an offspring is anything EXCEPT qq?
Pr(not qq) = 1 - Pr(qq) = 1 - 1/4 = 3/4

The AND Rule
Suppose the events A and B are independent -- knowing that A has occurred does not change the probability that B occurs.
Then Pr(A AND B) = Pr(A)*Pr(B)
“AND Rule” -- if you see “AND”, multiply probabilities
Pr(A AND B AND C) = Pr(A)*Pr(B)*Pr(C)

The OR Rule
Suppose the events A and B are mutually exclusive (non-overlapping).
For example, A = roll an even number on a die and B = roll a six are NOT mutually exclusive, but if B = roll a 5 they are.
Pr(A OR B) = Pr(A) + Pr(B)
“OR Rule” -- if you see “OR”, add probabilities

Genetics examples
Again consider offspring from a Qq x Qq cross.
Pr(not qq) = Pr(QQ or Qq) = Pr(QQ) + Pr(Qq) = 3/4
Pr(QQ) = Pr(Q from father AND Q from mother) = Pr(Q from father)*Pr(Q from mother) = (1/2)*(1/2) = 1/4
Pr(Qq) = Pr([f = Q AND m = q] OR [f = q AND m = Q])
       = Pr(f = Q AND m = q) + Pr(f = q AND m = Q)
       = Pr(f = Q)*Pr(m = q) + Pr(f = q)*Pr(m = Q)
       = (1/2)*(1/2) + (1/2)*(1/2) = 1/2

Conditional probability
Let Pr(A | B) denote the probability of A given that we observe event B.
Pr(A | B) = Pr(A and B) / Pr(B) = Pr(A,B)/Pr(B)
Pr(A,B) is called the joint probability of A & B.
Example: Suppose QQ and Qq give purple offspring, while qq gives green offspring. What is the probability that a purple offspring from a Qq x Qq cross is QQ?
Pr(QQ | F1 purple) = Pr(QQ and purple)/Pr(purple) = (1/4)/(3/4) = 1/3

Mendel’s experiments with the Garden Pea
7 traits examined.
Mendel crossed a pure-breeding yellow pea line with a pure-breeding green line.
Let P1 denote the pure-breeding yellow line (parental line 1) and P2 the pure-breeding green line (parental line 2).
The F1, or first filial, generation is the cross of P1 x P2 (yellow x green). All resulting F1 were yellow.
The F2, or second filial, generation is a cross of two F1’s. In the F2, 1/4 are green and 3/4 are yellow.
This outbreak of variation blows the theory of blending inheritance right out of the water.
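The Qq x Qq calculations (AND rule, OR rule, conditional probability) and the 3/4 : 1/4 split can be checked by enumerating the four equally likely allele pairings. This is a minimal Python sketch; the function name `cross` and the genotype-as-string encoding are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction
from itertools import product


def cross(father, mother):
    """Return {genotype: probability} for a one-locus cross.

    Each parent passes on one of its two alleles with probability 1/2.
    AND rule: each (father allele, mother allele) pairing has probability 1/4.
    OR rule: pairings that give the same genotype are added together.
    """
    probs = {}
    for f_allele, m_allele in product(father, mother):
        # Sort so that "qQ" and "Qq" count as the same genotype.
        genotype = "".join(sorted(f_allele + m_allele))
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


offspring = cross("Qq", "Qq")

# Pr(not qq) = 1 - Pr(qq); QQ and Qq are the purple genotypes.
pr_purple = 1 - offspring["qq"]

# Conditional probability: Pr(QQ | purple) = Pr(QQ and purple) / Pr(purple).
pr_QQ_given_purple = offspring["QQ"] / pr_purple

print(offspring)
print(pr_purple, pr_QQ_given_purple)
```

Exact fractions are used instead of floats so the results can be compared directly with the hand calculations.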
Mendel also observed that the P1, F1 and F2 yellow lines behaved differently when crossed to pure green:
P1 yellow x P2 (pure green) --> all yellow
F1 yellow x P2 (pure green) --> 1/2 yellow, 1/2 green
F2 yellow x P2 (pure green) --> 2/3 yellow, 1/3 green

Mendel’s explanation
Genes are discrete particles, with each parent passing one copy to its offspring.
Let an allele be a particular copy of a gene. In diploids, each parent carries two alleles for every gene.
Pure yellow parents have two Y (or yellow) alleles. We can thus write their genotype as YY.
Likewise, pure green parents have two g (or green) alleles. Their genotype is thus gg.
Since there are lots of genes, we refer to a particular gene by a given name, say the pea-color gene (or locus).
Each parent contributes one of its two alleles (at random) to its offspring.
Hence, a YY parent always contributes a Y, while a gg parent always contributes a g.
An individual carrying only one type of allele (e.g. YY or gg) is said to be a homozygote.
In the F1, YY x gg --> all individuals are Yg.
An individual carrying two types of alleles is said to be a heterozygote.
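The three yellow x green results (all yellow, 1/2 yellow, 2/3 yellow) follow from this explanation once the genotypes of the yellow parents are written down. The Python sketch below assumes Y is dominant and that a Yg x Yg cross yields 1/4 YY, 1/2 Yg, 1/4 gg, so that F2 yellows are a 1/3 YY : 2/3 Yg mixture; the helper name `pr_transmit_Y` is hypothetical.

```python
from fractions import Fraction


def pr_transmit_Y(genotype):
    """Probability that a parent with this genotype passes on a Y allele."""
    return Fraction(genotype.count("Y"), 2)


# The gg tester always contributes g, so an offspring is yellow exactly
# when the yellow parent transmits Y (assuming Y is dominant to g).
p1_yellow = pr_transmit_Y("YY")   # P1 yellow x green
f1_yellow = pr_transmit_Y("Yg")   # F1 yellow x green

# F2 yellows: condition the F2 genotype frequencies on being yellow.
f2_all = {"YY": Fraction(1, 4), "Yg": Fraction(1, 2)}
pr_yellow_f2 = sum(f2_all.values())                      # 3/4
f2_mix = {g: w / pr_yellow_f2 for g, w in f2_all.items()}  # 1/3 YY, 2/3 Yg

# Average the transmission probability over the mixture of F2 yellow parents.
f2_yellow = sum(w * pr_transmit_Y(g) for g, w in f2_mix.items())

print(p1_yellow, f1_yellow, f2_yellow)
```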
The phenotype of an individual is the trait value we observe.
For this particular gene, the map from genotype to phenotype is as follows:
YY --> yellow
Yg --> yellow
gg --> green
Since the Yg heterozygote has the same phenotypic value as the YY homozygote, we say (equivalently) that Y is dominant to g, or g is recessive to Y.

Explaining the crosses
F1 x F1 --> Yg x Yg
Pr(YY) = Pr(Y from dad)*Pr(Y from mom) = (1/2)*(1/2) = 1/4
Pr(gg) = Pr(g from dad)*Pr(g from mom) = (1/2)*(1/2) = 1/4
Pr(Yg) = Pr(Y from dad)*Pr(g from mom) + Pr(g from dad)*Pr(Y from mom) = 1/2
Equivalently, Pr(Yg) = 1 - Pr(YY) - Pr(gg) = 1/2
Hence, Pr(yellow phenotype) = Pr(YY) + Pr(Yg) = 3/4
Pr(green phenotype) = Pr(gg) = 1/4

Dealing with two (or more) genes
For his 7 traits, Mendel observed independent assortment: the genotype at one locus is independent of the genotype at the second.
RR, Rr = round seeds; rr = wrinkled seeds
Pure round, green (RRgg) x pure wrinkled, yellow (rrYY)
F1 --> RrYg = round, yellow
What about the F2?
Let R- denote RR and Rr. R- are round. Note that in the F2, Pr(R-) = Pr(Rr) + Pr(RR) = 1/2 + 1/4 = 3/4.
Likewise, Y- are YY or Yg, and are yellow.

Phenotype          Genotype   Frequency
Yellow, round      Y-R-       (3/4)*(3/4) = 9/16
Yellow, wrinkled   Y-rr       (3/4)*(1/4) = 3/16
Green, round       ggR-       (1/4)*(3/4) = 3/16
Green, wrinkled    ggrr       (1/4)*(1/4) = 1/16

Or a 9:3:3:1 ratio.

Probabilities for more complex genotypes
Cross AaBBCcDD x aaBbCcDd
What is Pr(aaBBCCDD)? Under independent assortment,
= Pr(aa)*Pr(BB)*Pr(CC)*Pr(DD)
= (1/2*1)*(1*1/2)*(1/2*1/2)*(1*1/2) = 1/2^5 = 1/32
What is Pr(AaBbCc)?
= Pr(Aa)*Pr(Bb)*Pr(Cc) = (1/2)*(1/2)*(1/2) = 1/8

Mendel was wrong: Linkage
Bateson and Punnett looked at
flower color: P (purple) dominant over p (red)
pollen shape: L (long) dominant over l (round)

Phenotype      Genotype   Observed   Expected
Purple long    P-L-       284        215
Purple round   P-ll       21         71
Red long       ppL-       21         71
Red round      ppll       55         24

Excess of PL, pl gametes over Pl, pL
Departure from independent assortment

Linkage
If genes are located on different chromosomes they (with very few exceptions) show independent assortment.
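Under independent assortment, a multi-locus probability is just the product of single-locus probabilities, which is easy to verify by enumeration. The Python sketch below reproduces the AaBBCcDD x aaBbCcDd results and the 9:3:3:1 ratio; the function names and the list-of-genotype-strings encoding are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def locus_probs(parent1, parent2):
    """Offspring genotype probabilities at one locus, e.g. ("Aa", "aa")."""
    probs = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # "aA" and "Aa" are the same genotype
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


def multilocus_prob(parent1, parent2, target):
    """Pr(target genotype), assuming independent assortment across loci."""
    result = Fraction(1)
    for g1, g2, t in zip(parent1, parent2, target):
        key = "".join(sorted(t))
        result *= locus_probs(g1, g2).get(key, Fraction(0))
    return result


mom = ["Aa", "BB", "Cc", "DD"]
dad = ["aa", "Bb", "Cc", "Dd"]
pr_aaBBCCDD = multilocus_prob(mom, dad, ["aa", "BB", "CC", "DD"])
pr_AaBbCc = multilocus_prob(mom[:3], dad[:3], ["Aa", "Bb", "Cc"])

# Dihybrid F2 (RrYg x RrYg): dominant phenotype = 1 - Pr(recessive homozygote).
round_ = 1 - locus_probs("Rr", "Rr")["rr"]
yellow = 1 - locus_probs("Yg", "Yg")["gg"]
ratio = [yellow * round_, yellow * (1 - round_),
         (1 - yellow) * round_, (1 - yellow) * (1 - round_)]

print(pr_aaBBCCDD, pr_AaBbCc, [int(r * 16) for r in ratio])
```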
Indeed, peas have only 7 pairs of chromosomes, so was Mendel lucky in choosing seven traits at random that happen to all be on different chromosomes? Problem: compute this probability.
However, genes on the same chromosome, especially if they are close to each other, tend to be passed on to offspring in the same configuration as on the parental chromosomes.

Consider the Bateson-Punnett pea data
Let PL / pl denote that in the parent, one chromosome carries the P and L alleles (at the flower color and pollen shape loci, respectively), while the other chromosome carries the p and l alleles.
Unless there is a recombination event, one of the two parental chromosome types (PL or pl) is passed on to the offspring. These are called the parental gametes.
However, if a recombination event occurs, a PL/pl parent can generate Pl and pL recombinant chromosomes to pass on to its offspring.
Let c denote the recombination frequency -- the probability that a randomly chosen gamete from the parent is of the recombinant type (i.e., it is not a parental gamete).
For a PL/pl parent, the gamete frequencies are

Gamete type   Frequency   Expectation under independent assortment
PL            (1-c)/2     1/4
pl            (1-c)/2     1/4
pL            c/2         1/4
Pl            c/2         1/4

Parental gametes (PL, pl) are in excess, as (1-c)/2 > 1/4 for c < 1/2.
Recombinant gametes (pL, Pl) are in deficiency, as c/2 < 1/4 for c < 1/2.

Expected genotype frequencies under linkage
Suppose we cross PL/pl x PL/pl parents. What are the expected frequencies in their offspring?
Pr(PPLL) = Pr(PL|father)*Pr(PL|mother) = [(1-c)/2]*[(1-c)/2] = (1-c)^2/4
Likewise, Pr(ppll) = (1-c)^2/4
Recall from the previous data that freq(ppll) = 55/381 = 0.144
Hence, (1-c)^2/4 = 0.144, or c = 0.24

A (slightly) more complicated case
Again, assume the parents are both PL/pl.
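For PL/pl parents, the gamete table and the estimate c = 0.24 from the ppll class can be reproduced numerically. This is a minimal Python sketch under the assumptions already stated (both parents PL/pl, ppll offspring arise only from two pl gametes); the function names are illustrative, not from the slides.

```python
import math


def gamete_freqs_PL_pl(c):
    """Gamete frequencies from a PL/pl parent with recombination frequency c."""
    parental = (1 - c) / 2      # PL and pl are the parental types
    recombinant = c / 2         # Pl and pL require a recombination event
    return {"PL": parental, "pl": parental, "Pl": recombinant, "pL": recombinant}


def estimate_c_from_ppll(freq_ppll):
    """Solve (1 - c)^2 / 4 = freq(ppll) for c."""
    return 1 - 2 * math.sqrt(freq_ppll)


observed_ppll = 55 / 381                     # red, round class in the data
c_hat = estimate_c_from_ppll(observed_ppll)
gametes = gamete_freqs_PL_pl(c_hat)

# Pr(ppll) = Pr(pl | father) * Pr(pl | mother); this recovers the observed value.
pr_ppll = gametes["pl"] * gametes["pl"]

print(round(c_hat, 2), round(pr_ppll, 3))
```

Setting c = 0.5 in `gamete_freqs_PL_pl` returns 1/4 for every gamete, the independent-assortment expectation.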
Compute Pr(PpLl)
Two situations, as PpLl could be PL/pl or Pl/pL:
Pr(PL/pl) = Pr(PL|dad)*Pr(pl|mom) + Pr(PL|mom)*Pr(pl|dad)
          = [(1-c)/2]*[(1-c)/2] + [(1-c)/2]*[(1-c)/2] = (1-c)^2/2
Pr(Pl/pL) = Pr(Pl|dad)*Pr(pL|mom) + Pr(Pl|mom)*Pr(pL|dad)
          = (c/2)*(c/2) + (c/2)*(c/2) = c^2/2
Thus, Pr(PpLl) = (1-c)^2/2 + c^2/2

Generally, to compute the expected genotype probabilities, we need to consider the frequencies of gametes produced by both parents.
Suppose dad = Pl/pL, mom = PL/pl
Pr(PPLL) = Pr(PL|dad)*Pr(PL|mom) = [c/2]*[(1-c)/2]
Notation: when the parent is PL/pl, we say that alleles P and L are in coupling. When the parent is Pl/pL, we say that P and L are in repulsion.

Allele and Genotype Frequencies
Given genotype frequencies, we can always compute allele frequencies, e.g.,
$$ p_i = \mathrm{freq}(A_i) = \mathrm{freq}(A_i A_i) + \frac{1}{2} \sum_{j \neq i} \mathrm{freq}(A_i A_j) $$
The converse is not true: given allele frequencies, we cannot uniquely determine the genotype frequencies.
For n alleles, there are n(n+1)/2 genotypes.
If we are willing to assume random mating,
$$ \mathrm{freq}(A_i A_j) = \begin{cases} p_i^2 & \text{for } i = j \\ 2 p_i p_j & \text{for } i \neq j \end{cases} $$
These are the Hardy-Weinberg proportions.

Hardy-Weinberg
• Prediction of genotype frequencies from allele frequencies
• Allele frequencies remain unchanged over generations, provided:
  – Infinite population size (no genetic drift)
  – No mutation
  – No selection
  – No migration
• Under HW conditions, a single generation of random mating gives genotype frequencies in Hardy-Weinberg proportions, and they remain forever in these proportions

Gametes and Gamete Frequencies
When we consider two (or more) loci, we follow gametes.
Under random mating, gametes combine at random, e.g.
$$ \mathrm{freq}(AABB) = \mathrm{freq}(AB \mid \text{father}) \, \mathrm{freq}(AB \mid \text{mother}) $$
$$ \mathrm{freq}(AaBB) = \mathrm{freq}(AB \mid \text{father}) \, \mathrm{freq}(aB \mid \text{mother}) + \mathrm{freq}(aB \mid \text{father}) \, \mathrm{freq}(AB \mid \text{mother}) $$

Major complication: even under HW conditions, gamete frequencies can change over time.
[Figure: cross of AB/AB x ab/ab parents, giving AB/ab F1 offspring]
In the F1, 50% AB gametes, 50% ab gametes.
If A and B are unlinked, the F2 gamete frequencies are
AB 25%   ab 25%   Ab 25%   aB 25%
Thus, even under HW conditions, gamete frequencies change.

Linkage disequilibrium
Random mating and recombination eventually change gamete frequencies so that they are in linkage equilibrium (LE). Once in LE, gamete frequencies do not change (unless acted on by other forces).
At LE, alleles in gametes are independent of each other:
$$ \mathrm{freq}(AB) = \mathrm{freq}(A) \, \mathrm{freq}(B) $$
$$ \mathrm{freq}(ABC) = \mathrm{freq}(A) \, \mathrm{freq}(B) \, \mathrm{freq}(C) $$
When linkage disequilibrium (LD) is present, alleles are no longer independent -- knowing that one allele is in the gamete provides information on alleles at other loci:
$$ \mathrm{freq}(AB) \neq \mathrm{freq}(A) \, \mathrm{freq}(B) $$
The disequilibrium between alleles A and B is given by
$$ D_{AB} = \mathrm{freq}(AB) - \mathrm{freq}(A) \, \mathrm{freq}(B) $$

The Decay of Linkage Disequilibrium
The frequency of the AB gamete is given by
$$ \mathrm{freq}(AB) = \mathrm{freq}(A) \, \mathrm{freq}(B) + D_{AB} $$
where freq(A) freq(B) is the LE value and D_AB is the departure from LE.
If the recombination frequency between the A and B loci is c, the disequilibrium in generation t is
$$ D(t) = D(0) (1 - c)^t $$
where D(0) is the initial LD value.
Note that D(t) -> zero, although the approach can be slow when c is very small.

Quantitative Genetics
The analysis of traits whose variation is determined by both a number of genes and environmental factors.
Phenotype is highly uninformative as to underlying genotype.

Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation in the trait
• May be a single gene strongly influenced by environmental factors
• May be the result of a number of genes of equal (or differing) effect
• Most likely, a combination of both multiple genes and environmental factors.
• Example: Blood pressure, cholesterol levels
  – Known genetic and environmental risk factors

Phenotypic distribution of a trait
Consider a specific locus influencing the trait. For this locus, mean phenotype = 0.15, while overall mean phenotype = 0.

Goals of Quantitative Genetics
• Partition total trait variation into genetic (nature) vs. environmental (nurture) components
• Predict resemblance between relatives
  – If a sib has a disease/trait, what are your odds?
• Find the underlying loci contributing to genetic variation
  – QTL -- quantitative trait loci
• Deduce molecular basis for genetic trait variation
• Prediction of selection response
• Prediction of the effects of selfing & assortative mating

Dichotomous (binary) traits
Presence/absence traits (such as a disease) can (and usually do) have a complex genetic basis.
Consider a disease susceptibility (DS) locus underlying a disease, with alleles D and d, where allele D significantly increases your disease risk.
In particular, Pr(disease | DD) = 0.5, so that the penetrance of genotype DD is 50%.
Suppose Pr(disease | Dd) = 0.2 and Pr(disease | dd) = 0.05.
dd individuals can (rarely) display the disease, largely because of exposure to adverse environmental conditions.
dd individuals can give rise to phenocopies 5% of the time, showing the disease but not as a result of carrying the risk allele.
If freq(d) = 0.9, what is Pr(DD | show disease)?
Assuming Hardy-Weinberg genotype frequencies,
freq(disease) = 0.1^2*0.5 + 2*0.1*0.9*0.2 + 0.9^2*0.05 = 0.0815
From Bayes’ theorem,
Pr(DD | disease) = Pr(disease | DD)*Pr(DD)/Pr(disease) = 0.1^2*0.5 / 0.0815 = 0.06 (6%)
Pr(Dd | disease) = 0.442, Pr(dd | disease) = 0.497
Thus about 50% of the diseased individuals are phenocopies.
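The Bayes' theorem calculation above can be reproduced in a few lines. This Python sketch assumes Hardy-Weinberg genotype frequencies at the DS locus (as the hand calculation implicitly does); the function name `posterior_genotypes` is a hypothetical helper, not from the slides.

```python
def posterior_genotypes(freq_d, penetrance):
    """Return (Pr(disease), {genotype: Pr(genotype | disease)}).

    freq_d: frequency of the d allele.
    penetrance: {genotype: Pr(disease | genotype)}.
    Assumes Hardy-Weinberg proportions for the genotype (prior) frequencies.
    """
    freq_D = 1 - freq_d
    prior = {
        "DD": freq_D ** 2,
        "Dd": 2 * freq_D * freq_d,
        "dd": freq_d ** 2,
    }
    # Total probability of disease, summed over genotypes.
    pr_disease = sum(prior[g] * penetrance[g] for g in prior)
    # Bayes' theorem for each genotype.
    posterior = {g: prior[g] * penetrance[g] / pr_disease for g in prior}
    return pr_disease, posterior


pr_disease, posterior = posterior_genotypes(
    freq_d=0.9,
    penetrance={"DD": 0.5, "Dd": 0.2, "dd": 0.05},
)

print(round(pr_disease, 4))
print({g: round(p, 3) for g, p in posterior.items()})
```

The posterior for dd is the phenocopy fraction among diseased individuals, which is why it comes out near one half even though dd has the lowest penetrance: dd is by far the most common genotype.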