Introduction to Basic and Quantitative Genetics

Darwin & Mendel
• Darwin (1859), Origin of Species
– Instant classic, major immediate impact
– Problem: model of inheritance
• Darwin assumed blending inheritance
• Offspring = average of both parents: zo = (zm + zf)/2
• Fleeming Jenkin (1867) pointed out the problem
– Var(zo) = Var[(zm + zf)/2] = (1/4)[Var(zm) + Var(zf)] = (1/2) Var(parents), assuming the two parents are uncorrelated and have equal variance
– Hence, under blending inheritance, half the variation is removed each generation, and this must somehow be replenished by mutation.

Mendel
• Mendel (1865), Experiments in Plant Hybridization
• No impact; the paper was essentially ignored
– Ironically, Darwin had an apparently unread copy in his library
– Why ignored? Perhaps too mathematical for 19th-century biologists
• Rediscovered in 1900 (by three independent groups)
• Mendel’s key idea: genes are discrete particles passed on intact from parent to offspring

Mendel’s experiments with the Garden Pea
7 traits examined.
Mendel crossed a pure-breeding yellow pea line with a pure-breeding green line.
Let P1 denote the pure-breeding yellow line (parental line 1) and P2 the pure-breeding green line (parental line 2).
The F1, or first filial, generation is the cross of P1 x P2 (yellow x green). All resulting F1 were yellow.
The F2, or second filial, generation is a cross of two F1’s. In the F2, 1/4 are green and 3/4 are yellow.
This outbreak of variation blows the theory of blending inheritance right out of the water.

Mendel also observed that the P1, F1 and F2 yellow lines behaved differently when crossed to pure green:
P1 yellow x P2 (pure green) --> all yellow
F1 yellow x P2 (pure green) --> 1/2 yellow, 1/2 green
F2 yellow x P2 (pure green) --> 2/3 yellow, 1/3 green

Mendel’s explanation
Genes are discrete particles, with each parent passing one copy to its offspring.
Let an allele be a particular copy of a gene.
In diploids, each parent carries two alleles for every gene.
Pure yellow parents have two Y (or yellow) alleles, so we can write their genotype as YY.
Likewise, pure green parents have two g (or green) alleles, so their genotype is gg.
Since there are lots of genes, we refer to a particular gene by a given name, say the pea-color gene (or locus).
Each parent contributes one of its two alleles (at random) to its offspring.
Hence, a YY parent always contributes a Y, while a gg parent always contributes a g.
An individual carrying only one type of allele (e.g. YY or gg) is said to be a homozygote.
In the F1, YY x gg --> all individuals are Yg.
An individual carrying two types of alleles is said to be a heterozygote.
The phenotype of an individual is the trait value we observe.
For this particular gene, the map from genotype to phenotype is as follows:
YY --> yellow
Yg --> yellow
gg --> green
Since the Yg heterozygote has the same phenotypic value as the YY homozygote, we say (equivalently) that Y is dominant to g, or g is recessive to Y.

Explaining the crosses
F1 x F1 -> Yg x Yg
Prob(YY) = yellow(dad)*yellow(mom) = (1/2)*(1/2) = 1/4
Prob(gg) = green(dad)*green(mom) = (1/2)*(1/2) = 1/4
Prob(Yg) = yellow(dad)*green(mom) + green(dad)*yellow(mom) = 1/2
Equivalently, Prob(Yg) = 1 - Pr(YY) - Pr(gg) = 1/2
Hence, Prob(yellow phenotype) = Pr(YY) + Pr(Yg) = 3/4
Prob(green phenotype) = Pr(gg) = 1/4

Dealing with two (or more) genes
For his 7 traits, Mendel observed independent assortment: the genotype at one locus is independent of the genotype at the second.
RR, Rr - round seeds; rr - wrinkled seeds
Pure round, green (RRgg) x pure wrinkled, yellow (rrYY)
F1 --> RrYg = round, yellow
What about the F2?
Let R- denote RR and Rr; R- are round.
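The single-locus cross probabilities above can be enumerated directly. The following is a minimal Python sketch, assuming each parent transmits either of its two alleles with probability 1/2; the function names `cross` and `prob_dominant` are illustrative and not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Offspring genotype probabilities at one locus.

    Each parent is a two-character string of alleles (e.g. "Yg").
    Every allele is assumed to be transmitted with probability 1/2.
    """
    probs = Counter()
    for a, b in product(parent1, parent2):
        # "gY" and "Yg" are the same genotype, so sort the two alleles.
        genotype = "".join(sorted((a, b)))
        probs[genotype] += Fraction(1, 4)
    return dict(probs)


def prob_dominant(genotype_probs, dominant_allele):
    """Probability of showing the dominant phenotype (at least one copy)."""
    return sum(p for g, p in genotype_probs.items() if dominant_allele in g)


if __name__ == "__main__":
    f2 = cross("Yg", "Yg")          # F1 x F1
    print(f2)                       # genotype probabilities for YY, Yg, gg
    print(prob_dominant(f2, "Y"))   # probability of a yellow phenotype
    print(cross("Yg", "gg"))        # F1 yellow x pure green
```

Exact fractions are used so the results can be compared with the hand calculation without rounding.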
Note that in the F2, Pr(R-) = 1/2 + 1/4 = 3/4.
Likewise, Y- are YY or Yg, and are yellow.

Phenotype           Genotype    Frequency
Yellow, round       Y-R-        (3/4)*(3/4) = 9/16
Yellow, wrinkled    Y-rr        (3/4)*(1/4) = 3/16
Green, round        ggR-        (1/4)*(3/4) = 3/16
Green, wrinkled     ggrr        (1/4)*(1/4) = 1/16

Or a 9:3:3:1 ratio.

Probabilities for more complex genotypes
Cross AaBBCcDD x aaBbCcDd
What is Pr(aaBBCCDD)? Under independent assortment,
= Pr(aa)*Pr(BB)*Pr(CC)*Pr(DD)
= (1/2*1)*(1*1/2)*(1/2*1/2)*(1*1/2) = 1/2^5 = 1/32
What is Pr(AaBbCc)?
= Pr(Aa)*Pr(Bb)*Pr(Cc) = (1/2)*(1/2)*(1/2) = 1/8

Mendel was wrong: Linkage
Bateson and Punnett looked at
flower color: P (purple) dominant over p (red)
pollen shape: L (long) dominant over l (round)

Phenotype       Genotype    Observed    Expected
Purple long     P-L-        284         215
Purple round    P-ll        21          71
Red long        ppL-        21          71
Red round       ppll        55          24

Excess of PL, pl gametes over Pl, pL.
Departure from independent assortment.

Linkage
If genes are located on different chromosomes they (with very few exceptions) show independent assortment.
Indeed, peas have only 7 pairs of chromosomes, so was Mendel lucky in choosing seven traits at random that happen to all be on different chromosomes? Problem: compute this probability.
However, genes on the same chromosome, especially if they are close to each other, tend to be passed on to offspring in the same configuration as on the parental chromosomes.

Consider the Bateson-Punnett pea data.
Let PL / pl denote that in the parent, one chromosome carries the P and L alleles (at the flower color and pollen shape loci, respectively), while the other chromosome carries the p and l alleles.
Unless there is a recombination event, one of the two parental chromosome types (PL or pl) is passed on to the offspring. These are called the parental gametes.
However, if a recombination event occurs, a PL/pl parent can generate Pl and pL recombinant chromosomes to pass on to its offspring.
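The departure from independent assortment in the Bateson-Punnett table can be checked numerically. The following is a minimal Python sketch, assuming the 9:3:3:1 expectation is applied to the 381 observed plants; the names `OBSERVED`, `RATIOS` and `expected_counts` are illustrative and not part of the lecture.

```python
from fractions import Fraction

# Observed counts from the Bateson-Punnett table (phenotype class -> count).
OBSERVED = {"P-L-": 284, "P-ll": 21, "ppL-": 21, "ppll": 55}

# Phenotype frequencies expected under independent assortment (9:3:3:1).
RATIOS = {
    "P-L-": Fraction(3, 4) * Fraction(3, 4),
    "P-ll": Fraction(3, 4) * Fraction(1, 4),
    "ppL-": Fraction(1, 4) * Fraction(3, 4),
    "ppll": Fraction(1, 4) * Fraction(1, 4),
}


def expected_counts(observed, ratios):
    """Scale the expected frequencies by the total number of observations."""
    total = sum(observed.values())
    return {ph: float(ratios[ph] * total) for ph in observed}


if __name__ == "__main__":
    expected = expected_counts(OBSERVED, RATIOS)
    for ph, obs in OBSERVED.items():
        diff = obs - expected[ph]
        print(f"{ph}: observed {obs:3d}, expected {expected[ph]:6.1f}, "
              f"difference {diff:+6.1f}")
```

The two parental classes (P-L- and ppll) come out above expectation and the two recombinant classes below it, which is the pattern the table summarizes.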
Let c denote the recombination frequency --- the probability that a randomly-chosen gamete from the parent is of the recombinant type (i.e., it is not a parental gamete).
For a PL/pl parent, the gamete frequencies are

Gamete type    Frequency    Expectation under independent assortment
PL             (1-c)/2      1/4
pl             (1-c)/2      1/4
pL             c/2          1/4
Pl             c/2          1/4

Parental gametes are in excess, as (1-c)/2 > 1/4 for c < 1/2.
Recombinant gametes are in deficiency, as c/2 < 1/4 for c < 1/2.

Expected genotype frequencies under linkage
Suppose we cross PL/pl x PL/pl parents. What are the expected frequencies in their offspring?
Pr(PPLL) = Pr(PL|father)*Pr(PL|mother) = [(1-c)/2]*[(1-c)/2] = (1-c)^2/4
Likewise, Pr(ppll) = (1-c)^2/4
Recall from the previous data that freq(ppll) = 55/381 = 0.144.
Hence, (1-c)^2/4 = 0.144, or c = 1 - 2*sqrt(0.144) = 0.24.

A (slightly) more complicated case
Again, assume the parents are both PL/pl. Compute Pr(PpLl).
Two situations, as PpLl could be PL/pl or Pl/pL:
Pr(PL/pl) = Pr(PL|dad)*Pr(pl|mom) + Pr(PL|mom)*Pr(pl|dad)
= [(1-c)/2]*[(1-c)/2] + [(1-c)/2]*[(1-c)/2] = (1-c)^2/2
Pr(Pl/pL) = Pr(Pl|dad)*Pr(pL|mom) + Pr(Pl|mom)*Pr(pL|dad)
= (c/2)*(c/2) + (c/2)*(c/2) = c^2/2
Thus, Pr(PpLl) = (1-c)^2/2 + c^2/2

Generally, to compute the expected genotype probabilities, one needs to consider the frequencies of gametes produced by both parents.
Suppose dad = Pl/pL, mom = PL/pl:
Pr(PPLL) = Pr(PL|dad)*Pr(PL|mom) = [c/2]*[(1-c)/2]

Notation: when the parent is PL/pl, we say that alleles P and L are in coupling.
When the parent is Pl/pL, we say that P and L are in repulsion.

Molecular Markers
You and your neighbor differ at roughly 22,000,000 nucleotides (base pairs) out of the roughly 3 billion bp that make up the human genome.
Hence, LOTS of molecular variation to exploit.
SNP -- single nucleotide polymorphism. A particular position on the DNA (say base 123,321 on chromosome 1) that has two different nucleotides (say G or A) segregating.
STR -- simple tandem arrays (also called short tandem repeats). An STR locus consists of a number of short repeats, with alleles defined by the number of repeats.
For example, you might have 6 and 4 copies of the repeat on your two chromosome 7s.

SNPs vs STRs
SNPs
Cons: Less polymorphic (at most 2 alleles)
Pros: Low mutation rates, alleles very stable
Excellent for looking at historical long-term associations (association mapping)
STRs
Cons: High mutation rate
Pros: Very highly polymorphic
Excellent for linkage studies within an extended pedigree (QTL mapping in families or pedigrees)

Quantitative Genetics
The analysis of traits whose variation is determined by both a number of genes and environmental factors.
Phenotype is highly uninformative as to underlying genotype.

Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation in the trait
• May be a single gene strongly influenced by environmental factors
• May be the result of a number of genes of equal (or differing) effect
• Most likely, a combination of both multiple genes and environmental factors
• Example: Blood pressure, cholesterol levels
– Known genetic and environmental risk factors
• Molecular traits can also be quantitative traits
– mRNA level on a microarray analysis
– Protein spot volume on a 2-D gel

Consider a specific locus influencing the trait.
[Figure: phenotypic distribution of a trait.]
For this locus, mean phenotype = 0.15, while overall mean phenotype = 0.

Basic model of Quantitative Genetics
Basic model: P = G + E
P = phenotypic value -- we will occasionally also use z for this value
G = genotypic value
E = environmental value
G = average phenotypic value for that genotype if we are able to replicate it over the universe of environmental values, G = E[P]
G x E interaction --- G values are different across environments.
The basic model now becomes P = G + E + GE.

Contribution of a locus to a trait
Three equivalent ways of writing the genotypic values:

Genotype    Q1Q1     Q2Q1          Q2Q2
            C        C + a(1+k)    C + 2a
            C        C + a + d     C + 2a
            C - a    C + d         C + a

$2a = G(Q_2Q_2) - G(Q_1Q_1)$
$d = ak = G(Q_1Q_2) - [G(Q_2Q_2) + G(Q_1Q_1)]/2$
d measures dominance, with d = 0 if the heterozygote is exactly intermediate to the two homozygotes.
k = d/a is a scaled measure of the dominance.

Example: Apolipoprotein E & Alzheimer’s

Genotype                ee      Ee      EE
Average age of onset    68.4    75.5    84.3

2a = G(EE) - G(ee) = 84.3 - 68.4 --> a = 7.95
ak = d = G(Ee) - [G(EE) + G(ee)]/2 = 75.5 - 76.35 = -0.85
k = d/a = -0.85/7.95 = -0.11
Only a small amount of dominance.

Example: Booroola (B) gene

Genotype               bb      Bb      BB
Average litter size    1.48    2.17    2.66

2a = G(BB) - G(bb) = 2.66 - 1.48 --> a = 0.59
ak = d = G(Bb) - [G(BB) + G(bb)]/2 = 2.17 - 2.07 = 0.10
k = d/a = 0.17

Fisher’s (1918) Decomposition of G
One of Fisher’s key insights was that the genotypic value consists of a fraction that can be passed from parent to offspring and a fraction that cannot.
Consider the genotypic value $G_{ij}$ resulting from an $A_iA_j$ individual:
$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
• Mean value: $\mu_G = \sum G_{ij} \cdot \mathrm{freq}(Q_iQ_j)$
• Average contribution to the genotypic value for allele i: since parents pass along single alleles to their offspring, $\alpha_i$ (the average effect of allele i) represents these contributions.
• The genotypic value predicted from the individual allelic effects is thus $\hat{G}_{ij} = \mu_G + \alpha_i + \alpha_j$
• Dominance deviations --- the difference (for genotype $A_iA_j$) between the actual genotypic value and the genotypic value predicted from the two single alleles: $G_{ij} - \hat{G}_{ij} = \delta_{ij}$

Fisher’s decomposition is a Regression
$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
Here $\mu_G + \alpha_i + \alpha_j$ is the predicted value and $\delta_{ij}$ is the residual error.
A notational change clearly shows this is a regression:
$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N + \delta_{ij}$
• Intercept: $\mu_G + 2\alpha_1$
• Independent (predictor) variable: N = # of Q2 alleles
• Regression slope: $\alpha_2 - \alpha_1$
• Regression residual: $\delta_{ij}$
$2\alpha_1 + (\alpha_2 - \alpha_1)N = 2\alpha_1$ for N = 0, e.g., Q1Q1
$2\alpha_1 + (\alpha_2 - \alpha_1)N = \alpha_1 + \alpha_2$ for N = 1, e.g., Q1Q2
$2\alpha_1 + (\alpha_2 - \alpha_1)N = 2\alpha_2$ for N = 2, e.g., Q2Q2

[Figure: regression of genotypic value G (points G11, G21, G22) on N = 0, 1, 2, with slope = $\alpha_2 - \alpha_1$. Panels: allele Q1 common ($\alpha_2 > \alpha_1$); allele Q2 common ($\alpha_1 > \alpha_2$); both Q1 and Q2 frequent ($\alpha_1 = \alpha_2 = 0$).]

Consider a diallelic locus, where p1 = freq(Q1)

Genotype           Q1Q1    Q2Q1      Q2Q2
Genotypic value    0       a(1+k)    2a

Mean: $\mu_G = 2 p_2 a (1 + p_1 k)$
Allelic effects:
$\alpha_2 = p_1 a [1 + k(p_1 - p_2)]$
$\alpha_1 = -p_2 a [1 + k(p_1 - p_2)]$
Dominance deviations: $\delta_{ij} = G_{ij} - \mu_G - \alpha_i - \alpha_j$

Average effects and Additive Genetic Values
The $\alpha$ values are the average effects of an allele.
A key concept is the Additive Genetic Value (A) of an individual:
$A(G_{ij}) = \alpha_i + \alpha_j$ for a single locus
$A = \sum_{k=1}^{n} (\alpha_i^{(k)} + \alpha_j^{(k)})$ over n loci, where $\alpha_i^{(k)}$ is the average effect of allele i at locus k

Why all the fuss over A?
Suppose the father has A = 10 and the mother has A = -2 for (say) blood pressure.
Expected blood pressure in their offspring is (10 - 2)/2 = 4 units above the population mean.
Offspring A = average of parental A’s.
KEY: parents only pass single alleles to their offspring. Hence, they only pass along the A part of their genotypic value G.

Genetic Variances
$G_{ij} = \mu_g + (\alpha_i + \alpha_j) + \delta_{ij}$
$\sigma^2(G) = \sigma^2(\mu_g + (\alpha_i + \alpha_j) + \delta_{ij}) = \sigma^2(\alpha_i + \alpha_j) + \sigma^2(\delta_{ij})$, as $\mathrm{Cov}(\alpha, \delta) = 0$
Over n loci:
$\sigma^2(G) = \sum_{k=1}^{n} \sigma^2(\alpha_i^{(k)} + \alpha_j^{(k)}) + \sum_{k=1}^{n} \sigma^2(\delta_{ij}^{(k)})$
$\sigma^2_G = \sigma^2_A + \sigma^2_D$
$\sigma^2_A$ = Additive Genetic Variance (or simply additive variance)
$\sigma^2_D$ = Dominance Genetic Variance (or simply dominance variance)

Key concepts (so far)
• $\alpha_i$ = average effect of allele i
– Property of a single allele in a particular population (depends on genetic background)
• A = Additive Genetic Value
– A = sum (over all loci) of average effects
– Fraction of G that parents pass along to their offspring
– Property of an individual in a particular population
• Var(A) = additive genetic variance
– Variance in additive genetic values
– Property of a population
• Can estimate A or Var(A) without knowing any of the underlying genetical detail (forthcoming)

Additive variance
$\sigma^2_A = 2E[\alpha^2] = 2\sum_{i=1}^{m} \alpha_i^2 p_i$
Since $E[\alpha] = 0$, $\mathrm{Var}(\alpha) = E[(\alpha - \mu_\alpha)^2] = E[\alpha^2]$
One locus, 2 alleles, with genotypic values Q1Q1 = 0, Q1Q2 = a(1+k), Q2Q2 = 2a:
$\sigma^2_A = 2 p_1 p_2 a^2 [1 + k(p_1 - p_2)]^2$
Dominance affects the additive variance: when dominance is present (k not equal to 0), $\sigma^2_A$ is an asymmetric function of allele frequencies.

Dominance variance
$\sigma^2_D = E[\delta^2] = \sum_{i=1}^{m}\sum_{j=1}^{m} \delta_{ij}^2 p_i p_j$
One locus, 2 alleles:
$\sigma^2_D = (2 p_1 p_2 a k)^2$
Equals zero if k = 0.
This is a symmetric function of allele frequencies.

[Figure: additive variance VA against allele frequency p with no dominance (k = 0), and VA and VD against allele frequency p with complete dominance (k = 1).]

Epistasis
$G_{ijkl} = \mu_G + (\alpha_i + \alpha_j + \alpha_k + \alpha_l) + (\delta_{ij} + \delta_{kl}) + (\alpha\alpha_{ik} + \alpha\alpha_{il} + \alpha\alpha_{jk} + \alpha\alpha_{jl}) + (\alpha\delta_{ikl} + \alpha\delta_{jkl} + \alpha\delta_{kij} + \alpha\delta_{lij}) + (\delta\delta_{ijkl})$
$= \mu_G + A + D + AA + AD + DD$
• A: additive genetic value
• D: dominance value -- interaction between the two alleles at a locus
• AA: additive x additive interactions -- interactions between a single allele at one locus and a single allele at another
• AD: additive x dominance interactions -- interactions between an allele at one locus and the genotype at another, e.g. allele $A_i$ and genotype $B_kB_j$
• DD: dominance x dominance interaction --- the interaction between the dominance deviation at one locus and the dominance deviation at another
These components are defined to be uncorrelated (or orthogonal), so that
$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD}$

Resemblance Between Relatives

Heritability
• Central concept in quantitative genetics
• Proportion of variation due to additive genetic values (breeding values)
– $h^2 = V_A/V_P$
– Phenotypes (and hence $V_P$) can be directly measured
– Breeding values (and hence $V_A$) must be estimated
• Estimates of $V_A$ require known collections of relatives

Ancestral relatives, e.g., parent and offspring
Collateral relatives, e.g., sibs
[Figure: family diagrams for full-sib and half-sib designs, with families 1, ..., n each containing offspring 1, ..., k.]

Key observations
• The amount of phenotypic resemblance among relatives for the trait provides an indication of the amount of genetic variation for the trait.
• If trait variation has a significant genetic basis, the closer the relatives, the more similar their appearance.

Genetic Covariance between relatives
Genetic covariances arise because two related individuals are more likely to share alleles than are two unrelated individuals.
Sharing alleles means having alleles that are identical by descent (IBD): both copies can be traced back to a single copy in a recent common ancestor.
[Figure: father and mother passing alleles to two offspring, illustrating no alleles IBD, one allele IBD, and both alleles IBD.]

Parent-offspring genetic covariance
Cov(Gp, Go) --- parents and offspring share EXACTLY one allele IBD.
Denote this common allele by A1:
$G_p = A_p + D_p = \alpha_1 + \alpha_x + D_{1x}$
$G_o = A_o + D_o = \alpha_1 + \alpha_y + D_{1y}$
Here $\alpha_1$ is the IBD allele, and $\alpha_x$, $\alpha_y$ are the non-IBD alleles.
$\mathrm{Cov}(G_o, G_p) = \mathrm{Cov}(\alpha_1 + \alpha_x + D_{1x},\ \alpha_1 + \alpha_y + D_{1y})$
$= \mathrm{Cov}(\alpha_1, \alpha_1) + \mathrm{Cov}(\alpha_1, \alpha_y) + \mathrm{Cov}(\alpha_1, D_{1y})$
$+ \mathrm{Cov}(\alpha_x, \alpha_1) + \mathrm{Cov}(\alpha_x, \alpha_y) + \mathrm{Cov}(\alpha_x, D_{1y})$
$+ \mathrm{Cov}(D_{1x}, \alpha_1) + \mathrm{Cov}(D_{1x}, \alpha_y) + \mathrm{Cov}(D_{1x}, D_{1y})$
All covariance terms other than $\mathrm{Cov}(\alpha_1, \alpha_1)$ are zero:
• By construction, $\alpha$ and D are uncorrelated
• By construction, $\alpha$ from non-IBD alleles are uncorrelated
• By construction, D values are uncorrelated unless both alleles are IBD
$\mathrm{Cov}(\alpha_x, \alpha_y) = 0$ if $x \neq y$, i.e., not IBD
$\mathrm{Cov}(\alpha_x, \alpha_y) = \mathrm{Var}(A)/2$ if $x = y$, i.e., IBD
$\mathrm{Var}(A) = \mathrm{Var}(\alpha_1 + \alpha_2) = 2\mathrm{Var}(\alpha_1)$, so that $\mathrm{Var}(\alpha_1) = \mathrm{Cov}(\alpha_1, \alpha_1) = \mathrm{Var}(A)/2$
Hence, relatives sharing one allele IBD have a genetic covariance of Var(A)/2.
The resulting parent-offspring genetic covariance becomes Cov(Gp, Go) = Var(A)/2.

Half-sibs
Each sib gets exactly one allele from the common father, and different alleles from the different mothers.
• The half-sibs share one allele IBD: occurs with probability 1/2
• The half-sibs share no alleles IBD: occurs with probability 1/2
Hence, the genetic covariance of half-sibs is just (1/2)Var(A)/2 = Var(A)/4.

Full-sibs
Each sib gets exactly one allele from each parent.
Paternal allele: IBD [Prob = 1/2], not IBD [Prob = 1/2]
Maternal allele: IBD [Prob = 1/2], not IBD [Prob = 1/2]
-> Prob(both alleles IBD) = 1/2*1/2 = 1/4
-> Prob(zero alleles IBD) = 1/2*1/2 = 1/4
-> Prob(exactly one allele IBD) = 1 - Prob(0 IBD) - Prob(2 IBD) = 1/2

Resulting Genetic Covariance between full-sibs

IBD alleles    Probability    Contribution
0              1/4            0
1              1/2            Var(A)/2
2              1/4            Var(A) + Var(D)

Cov(Full-sibs) = (1/2)*Var(A)/2 + (1/4)*[Var(A) + Var(D)] = Var(A)/2 + Var(D)/4

Genetic Covariances for General Relatives
Let r = (1/2)Prob(1 allele IBD) + Prob(2 alleles IBD)
Let u = Prob(both alleles IBD)
General genetic covariance between relatives:
Cov(G) = rVar(A) + uVar(D)
When epistasis is present, additional terms appear:
r^2 Var(AA) + ru Var(AD) + u^2 Var(DD) + r^3 Var(AAA) + ...

Components of the Environmental Variance
E = Ec + Es
E = total environmental value
Ec = common environmental value experienced by all members of a family, e.g., shared maternal effects
Es = specific environmental value, any unique environmental effects experienced by the individual
The environmental variance can thus be written in terms of variance components as
$V_E = V_{Ec} + V_{Es}$
One can decompose the environmental variance further, if desired. For example, plant breeders have terms for the location variance, the year variance, and the location x year variance.

Shared Environmental Effects contribute to the phenotypic covariances of relatives
Cov(P1, P2) = Cov(G1 + E1, G2 + E2) = Cov(G1, G2) + Cov(E1, E2), assuming genotypic and environmental values are uncorrelated
Shared environmental values are expected when sibs share the same mom, so that Cov(Full-sibs) and Cov(Maternal half-sibs) contain not only a genetic covariance but an environmental covariance as well, $V_{Ec}$:
Cov(Full-sibs) = Var(A)/2 + Var(D)/4 + $V_{Ec}$
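
The one-locus variance formulas and the general covariance Cov(G) = rVar(A) + uVar(D) can be combined in a few lines. The following is a minimal Python sketch, assuming a single diallelic locus with genotypic values 0, a(1+k), 2a and no epistasis; the function names, the `RELATIVES` table and the example parameter values are illustrative and not part of the lecture.

```python
def one_locus_variances(p1, a, k):
    """Additive and dominance variance for a diallelic locus.

    p1 is freq(Q1); genotypic values are Q1Q1 = 0, Q1Q2 = a(1+k), Q2Q2 = 2a.
    """
    p2 = 1.0 - p1
    var_a = 2 * p1 * p2 * a**2 * (1 + k * (p1 - p2)) ** 2
    var_d = (2 * p1 * p2 * a * k) ** 2
    return var_a, var_d


def genetic_covariance(prob_one_ibd, prob_two_ibd, var_a, var_d):
    """Cov(G) = r*Var(A) + u*Var(D), ignoring epistatic terms."""
    r = 0.5 * prob_one_ibd + prob_two_ibd
    u = prob_two_ibd
    return r * var_a + u * var_d


# (Prob(1 allele IBD), Prob(2 alleles IBD)) for the relatives discussed above.
RELATIVES = {
    "parent-offspring": (1.0, 0.0),
    "half-sibs": (0.5, 0.0),
    "full-sibs": (0.5, 0.25),
}

if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    var_a, var_d = one_locus_variances(p1=0.3, a=1.0, k=0.5)
    print(f"Var(A) = {var_a:.4f}, Var(D) = {var_d:.4f}")
    for name, (p_one, p_two) in RELATIVES.items():
        cov = genetic_covariance(p_one, p_two, var_a, var_d)
        print(f"Cov({name}) = {cov:.4f}")
```

Any shared-environment term such as VEc would be added to the phenotypic covariance separately; this sketch covers only the genetic part.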