EQUATIONS USED IN 40-300 POPULATION GENETICS
LECTURE NOTES
POPULATION GENETICS AND EVOLUTION
Molecular Biology And Genetics 40-300

QUANTIFYING GENETIC VARIATION

When we have allele frequency data for multiple loci we can quantify the variation as follows:

Within populations, we can estimate

average heterozygosity:
$$H = 1 - \sum_i p_i^2$$
where $p_i$ = frequency of the ith allele

proportion of polymorphic loci:
$$P = \frac{\text{number of polymorphic loci}}{\text{total number of loci}}$$

Between populations we can estimate Nei's genetic distance, D, as follows:

The probability that a randomly chosen allele from EACH of TWO populations will be identical, relative to the probability that two randomly chosen alleles from the SAME population will be identical, is the normalized identity, I:
$$I = \frac{\sum_x p_{xi}\, p_{xj}}{\left[\left(\sum_x p_{xi}^2\right)\left(\sum_x p_{xj}^2\right)\right]^{1/2}} = \frac{J_{XY}}{\left[J_{XX}\, J_{YY}\right]^{1/2}}$$
where:
$p_{xi}$ = frequency of allele x in population i
$p_{xj}$ = frequency of allele x in population j

When surveying multiple loci, average each J across loci and use
$$I = \frac{\bar{J}_{XY}}{\left[\bar{J}_{XX}\, \bar{J}_{YY}\right]^{1/2}}$$

Nei's genetic distance is then calculated as:
$$D = -\ln I$$

When we have information about the sequence of the alleles, we can quantify the difference between the alleles, as well as differences in their frequencies.

The average number of nucleotide substitutions between a pair of alleles X and Y, based on restriction site data, is:
$$d_{XY} = \frac{-\ln S}{j}, \qquad S = \frac{N_{XY}}{(N_X + N_Y)/2}$$
where:
$N_X$ = number of restriction sites in allele X
$N_Y$ = number of restriction sites in allele Y
$N_{XY}$ = number of sites shared by X and Y
$j$ = number of nucleotides in a restriction site

The average number of nucleotide substitutions between a pair of alleles X and Y, based on DNA sequence data, is:
$$\delta_{XY} = -\frac{3}{4} \ln\left(1 - \frac{4}{3}\, d_{XY}\right)$$
where:
$d_{XY}$ = proportion of nucleotides that differ between alleles X and Y.
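The heterozygosity, identity, and distance formulas above can be illustrated with a minimal Python sketch. The function names and the input layout (one list of allele frequencies per locus, with alleles listed in the same order in both populations) are assumptions made for this example, not part of the original notes.

```python
import math

def heterozygosity(freqs):
    """H = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)

def nei_identity_and_distance(pop_i, pop_j):
    """Return (I, D) for two populations.

    pop_i and pop_j are lists of loci; each locus is a list of allele
    frequencies, with alleles in the same order in both populations
    (an assumed input layout for this sketch).
    """
    n_loci = len(pop_i)
    # Average each J across loci before taking the ratio.
    j_xy = sum(sum(a * b for a, b in zip(li, lj))
               for li, lj in zip(pop_i, pop_j)) / n_loci
    j_xx = sum(sum(a * a for a in li) for li in pop_i) / n_loci
    j_yy = sum(sum(b * b for b in lj) for lj in pop_j) / n_loci
    identity = j_xy / math.sqrt(j_xx * j_yy)
    return identity, -math.log(identity)
```

Two populations with identical allele frequencies give I = 1 and D = 0; the distance grows as the frequencies diverge.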
The average number of substitutions per nucleotide site in population i is:
$$v_i = \frac{2}{n(n-1)} \sum_{X<Y} n_X\, n_Y\, \delta_{XY}$$
where:
$n$ = number of alleles assayed
$n_X$ = number of alleles of type X
$n_Y$ = number of alleles of type Y
$\delta_{XY}$ = sequence divergence between alleles X and Y

The average number of substitutions per nucleotide site between populations i and j is:
$$v_{ij}' = \sum_X \sum_Y p_{iX}\, p_{jY}\, \delta_{XY}$$
where:
$p_{iX}$ = proportion of allele type X in population i
$p_{jY}$ = proportion of allele type Y in population j
$\delta_{XY}$ = sequence divergence between alleles X and Y

The diversity between populations, corrected for diversity within populations, is:
$$v_{ij} = v_{ij}' - \frac{v_i + v_j}{2}$$

THE HARDY WEINBERG THEOREM

To calculate allele frequencies from raw data for one locus use:
$$f(A) = f(AA) + \tfrac{1}{2} f(Aa)$$
$$f(a) = f(aa) + \tfrac{1}{2} f(Aa)$$

Under random mating, the GENOTYPE frequencies in the next generation will be:
$$f(AA) = p^2 \qquad f(Aa) = 2pq \qquad f(aa) = q^2$$

Under inbreeding, the GENOTYPE frequencies in the next generation will be:
$$f(AA) = p^2 + pqF \qquad f(Aa) = 2pq(1-F) \qquad f(aa) = q^2 + pqF$$

We can estimate F, the inbreeding coefficient, as:
$$F = 1 - \frac{f(Aa)}{2pq}$$

To calculate GAMETE frequencies from raw data for two loci, with a recombination frequency, r, between the loci, we use:
$$g_{AB} = f(AABB) + \tfrac{1}{2} f(AABb) + \tfrac{1}{2} f(AaBB) + \tfrac{1}{2} f(AB/ab)(1-r) + \tfrac{1}{2} f(Ab/aB)\, r$$
$$g_{Ab} = f(AAbb) + \tfrac{1}{2} f(AABb) + \tfrac{1}{2} f(Aabb) + \tfrac{1}{2} f(AB/ab)\, r + \tfrac{1}{2} f(Ab/aB)(1-r)$$
$$g_{aB} = f(aaBB) + \tfrac{1}{2} f(AaBB) + \tfrac{1}{2} f(aaBb) + \tfrac{1}{2} f(AB/ab)\, r + \tfrac{1}{2} f(Ab/aB)(1-r)$$
$$g_{ab} = f(aabb) + \tfrac{1}{2} f(Aabb) + \tfrac{1}{2} f(aaBb) + \tfrac{1}{2} f(AB/ab)(1-r) + \tfrac{1}{2} f(Ab/aB)\, r$$

Using this information we can calculate D, the coefficient of Linkage Disequilibrium:
$$D = (g_{AB} \times g_{ab}) - (g_{Ab} \times g_{aB})$$
We call AB and ab COUPLING gametes.
We call Ab and aB REPULSION gametes.

The frequency of the nine genotypes after one generation of random mating will be:

        AA                    Aa                                      aa
BB      $g_{AB}^2$            $2 g_{AB} g_{aB}$                       $g_{aB}^2$
Bb      $2 g_{AB} g_{Ab}$     $2 g_{AB} g_{ab} + 2 g_{Ab} g_{aB}$     $2 g_{aB} g_{ab}$
bb      $g_{Ab}^2$            $2 g_{Ab} g_{ab}$                       $g_{ab}^2$

To calculate the new gamete frequencies we use:
$$g_{AB}' = g_{AB}^2 + (0.5 \times 2 g_{AB} g_{Ab}) + (0.5 \times 2 g_{AB} g_{aB}) + (0.5 \times 2 g_{AB} g_{ab})(1-r) + (0.5 \times 2 g_{Ab} g_{aB})\, r$$
$$= g_{AB}^2 + g_{AB} g_{Ab} + g_{AB} g_{aB} + g_{AB} g_{ab} - r\, g_{AB} g_{ab} + r\, g_{Ab} g_{aB}$$
$$= g_{AB}(g_{AB} + g_{Ab} + g_{aB} + g_{ab}) - r\,[(g_{AB} g_{ab}) - (g_{Ab} g_{aB})]$$
$$= g_{AB} - rD$$

Thus, the frequency of the four gamete types in the next generation is:
$$g_{AB}' = g_{AB} - rD \qquad g_{Ab}' = g_{Ab} + rD \qquad g_{aB}' = g_{aB} + rD \qquad g_{ab}' = g_{ab} - rD$$

$D_{max}$ = $[f(A) \times f(b)]$ or $[f(a) \times f(B)]$, whichever is smaller
$D_{min}$ = $[-f(A) \times f(B)]$ or $[-f(a) \times f(b)]$, whichever is larger

We can estimate the value of D after t generations as:
$$D_t = (1-r)^t D_0$$

GENETIC DRIFT AND GENE FLOW

Genetic drift causes gene frequencies to change by chance, due to sampling error between generations when populations are not infinite. Populations tend to become fixed for a single allele over time, leading to a decrease in variation within populations. Different populations may randomly become fixed for different alleles, leading to an increase in variation between populations.

We can calculate the probability of getting i alleles of type A in a sample of N individuals (2N alleles) when f(A) = p and f(a) = q as follows:
$$\text{Probability} = \frac{(2N)!}{i!\,(2N-i)!}\; p^i\, q^{2N-i}$$
where ! is factorial: eg. 5! = 5 x 4 x 3 x 2 x 1

Effective population size

Many factors can cause effective population size ($N_e$) to differ from actual population size. When this occurs, populations can be much more susceptible to drift than we may think based on the actual size.

1. Unequal sex ratio.
$$N_e = \frac{4 N_m N_f}{N_m + N_f}$$
where $N_m$ and $N_f$ are the numbers of breeding males and females.

2. Fluctuating population size - $N_e$ is the HARMONIC mean of population size over time:
$$\frac{1}{N_e} = \frac{1}{t} \sum_{i=1}^{t} \frac{1}{N_i}$$
where:
$N_i$ = population size in generation i
$t$ = number of generations

Gene flow refers to the movement of genes between demes - it can happen at the level of individuals or at the level of gametes (eg. pollen). We can estimate the gene frequency within a deme as a result of gene flow from other demes. We must assume that the sample of genes entering the population has the same allele frequency as the average allele frequency for all of the demes.
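The drift sampling probability and the two effective-size formulas above can be checked numerically with a short Python sketch. The function names are hypothetical, chosen for this illustration only.

```python
import math

def drift_probability(i, n_individuals, p):
    """Binomial probability of drawing i copies of allele A among the
    2N alleles of N diploid individuals when f(A) = p."""
    two_n = 2 * n_individuals
    return math.comb(two_n, i) * p**i * (1 - p)**(two_n - i)

def ne_unequal_sex_ratio(n_males, n_females):
    """Ne = 4 * Nm * Nf / (Nm + Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)

def ne_fluctuating(sizes):
    """Harmonic mean of the population sizes across generations."""
    return len(sizes) / sum(1 / n for n in sizes)
```

For instance, a population of 10 breeding males and 90 breeding females has an effective size of 36, not 100, and a single bottleneck generation pulls the harmonic mean well below the arithmetic mean.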
After one generation of gene flow:
$$p_t = p_{t-1}(1-m) + \bar{p}\, m$$
where:
$m$ = proportion of the population that is migrants
$\bar{p}$ = mean allele frequency across all demes (the source of migrants)

From any starting point $p_0$, we can estimate the value of p within a deme after t generations of gene flow (m) as:
$$p_t = \bar{p} + (p_0 - \bar{p})(1-m)^t$$

Because gene flow spreads alleles among populations, it tends to increase variation within populations and decrease variation among populations. At EQUILIBRIUM, it can be shown that $F_{ST}$ is a function of $N_e$ and m as follows:
$$\hat{F}_{ST} = \frac{1}{4 N_e m + 1}$$

We can use OBSERVED values of $F_{ST}$ to calculate the parameter $N_e m$ from the above equation. This estimate can be thought of as the combination of gene flow and drift that would result in the observed value of $F_{ST}$ at equilibrium. When $N_e m = 1$, subpopulations are exchanging one migrant per generation, on average. Values below 1 are considered to be an indication of restricted gene flow. Values above 1 indicate substantial gene flow.

FIXATION INDICES

Species that are widely distributed are often subdivided into DEMES. Individuals may be more likely to mate within their deme rather than with individuals from other demes because of distance or physical barriers. Thus, populations will tend to drift apart. We can measure the divergence of populations as a result of the increase in homozygosity due to drift using the fixation index, F. It is similar to the inbreeding coefficient, which measures the increase in homozygosity due to non-random mating.
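The gene-flow recursion, its closed form, and the equilibrium relation between F_ST and N_e m given earlier in this section can be sketched in Python as follows. The function names are assumptions for this example; the rearrangement N_e m = (1/F_ST - 1)/4 follows directly from the equilibrium equation.

```python
def gene_flow_step(p_prev, p_bar, m):
    """One generation of gene flow: p_t = p_{t-1}(1 - m) + p_bar * m."""
    return p_prev * (1 - m) + p_bar * m

def gene_flow_closed_form(p0, p_bar, m, t):
    """p_t = p_bar + (p0 - p_bar)(1 - m)^t."""
    return p_bar + (p0 - p_bar) * (1 - m) ** t

def nem_from_fst(fst):
    """Solve F_ST = 1 / (4 Ne m + 1) for the product Ne * m."""
    return (1 / fst - 1) / 4
```

Iterating the one-generation step t times gives the same value as the closed form, and an observed F_ST of 0.2 corresponds to N_e m = 1, the one-migrant-per-generation threshold mentioned above.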
$F_{IS} = 1 - (H_I/H_S)$   homozygosity of individuals relative to the expectation for the subpopulation
$F_{ST} = 1 - (H_S/H_T)$   homozygosity of demes relative to the expectation of the total population
$F_{IT} = 1 - (H_I/H_T)$   homozygosity of individuals relative to the total population

The relationship among the 3 values is:
$$(1 - F_{IT}) = (1 - F_{IS}) \times (1 - F_{ST})$$

$$H_I = \frac{\sum_i f(Aa)_i}{N} \qquad H_S = \frac{\sum_i 2 p_i q_i}{N} \qquad H_T = 2\bar{p}\bar{q}$$
$$\bar{p} = \frac{\sum_i p_i}{N} \qquad \bar{q} = \frac{\sum_i q_i}{N}$$
where:
$N$ = number of populations
$f(Aa)_i$ = observed frequency of heterozygotes in population i
$p_i$ = frequency of allele A in population i
$q_i$ = frequency of allele a in population i

We can also estimate fixation indices from restriction site and sequence data using $v_i$ and $v_{ij}$. This value tells us what proportion of the total nucleotide diversity in a group of demes is due to differences among demes.

$v_w$ = unweighted mean of all $v_i$ values
$v_b$ = unweighted mean of all $v_{ij}$ values
$$F_{ST} = \frac{v_b}{v_w + v_b}$$

MUTATION

When there are two alleles in a population and the mutation rate of A to a is µ and the mutation rate of a to A is v, we can calculate the frequency of the A allele at any time in the future (t generations) from some starting point, $p_0$, as:
$$p_t = \frac{v}{\mu + v} + \left(p_0 - \frac{v}{\mu + v}\right)(1 - \mu - v)^t$$

The equilibrium frequency of the A allele will be:
$$\hat{p} = \frac{v}{v + \mu}$$

Infinite alleles model

At the molecular level, it is not unreasonable to assume that each new mutation results in a new allele. Thus, at equilibrium between mutation and drift it can be shown that:
$$\hat{F} = \frac{1}{4 N_e \mu + 1}$$
where:
$N_e$ = effective population size
$\mu$ = the mutation rate to new alleles

$\hat{F}$ is a measure of homozygosity within a population as a result of the loss of alleles due to drift. Homozygosity can also be measured from allele frequency data as:
$$F = \sum_i p_i^2$$
where $p_i$ = frequency of the ith allele

We can define $n_e$ as the effective number of alleles, or the number of EQUALLY FREQUENT alleles it would take to provide a particular value of homozygosity.
We can use the 2 equations above to estimate $n_e$, or to estimate the parameter $N_e\mu$, as follows:
$$n_e = \frac{1}{\sum_i p_i^2} = 4 N_e \mu + 1$$

NATURAL SELECTION

One-locus Models

Genotype    Relative Fitness ($W_i$)    Contribution to the next generation
AA          $W_{AA}$                    $f(AA) \times W_{AA}$
Aa          $W_{Aa}$                    $f(Aa) \times W_{Aa}$
aa          $W_{aa}$                    $f(aa) \times W_{aa}$

Mean fitness:
$$\bar{W} = [f(AA) \times W_{AA}] + [f(Aa) \times W_{Aa}] + [f(aa) \times W_{aa}]$$

$$\text{New } f(A) = p' = \frac{f(AA)\,W_{AA} + \tfrac{1}{2} f(Aa)\,W_{Aa}}{\bar{W}}$$
$$\text{New } f(a) = q' = \frac{f(aa)\,W_{aa} + \tfrac{1}{2} f(Aa)\,W_{Aa}}{\bar{W}}$$

Fitness of each allele:
$$W_A = \frac{f(AA)\,W_{AA} + \tfrac{1}{2} f(Aa)\,W_{Aa}}{p} \qquad W_a = \frac{f(aa)\,W_{aa} + \tfrac{1}{2} f(Aa)\,W_{Aa}}{q}$$

The CHANGE in allele frequency in the next generation can be calculated as:
$$\Delta p = \frac{pq\,(W_A - W_a)}{\bar{W}}$$

Selection on alleles with varying effects on fitness.
NOTE: $W_i = 1 - s$, where s = selection coefficient

Genotype    $W_i$
AA          1
Aa          1 - hs
aa          1 - s

when h = 0, then A is dominant to a wrt fitness
when 0 < h < 1, then A and a are codominant
when h = 1, then a is dominant to A wrt fitness

At equilibrium between mutation and selection against an allele, a:
$\hat{q} = \sqrt{\mu/s}$   when a is completely recessive (h = 0)
$\hat{q} = \mu/hs$   when a is partially dominant (h > 0)
$\hat{q} = \mu/Fs$   when a is recessive and the population is inbred (F > 0)

When the heterozygote has the highest fitness, we say that it is overdominant:

Genotype    $W_i$
AA          1 - t
Aa          1
aa          1 - s

At equilibrium $\hat{q} = t/(s + t)$. This is a STABLE equilibrium.

When the heterozygote has the lowest fitness, we say that it is underdominant:

Genotype    $W_i$
AA          1 + t
Aa          1
aa          1 + s

At equilibrium $\hat{q} = t/(s + t)$. This is an UNSTABLE equilibrium.

Two-locus Models

Fitness interactions between loci can be ADDITIVE, MULTIPLICATIVE or EPISTATIC.

ADDITIVE: When an individual obtains an increment of fitness for each locus affecting a particular trait, overall fitness for the trait is calculated as THE SUM of the fitness obtained from each locus.

                   AA ($W_{AA}$)         Aa ($W_{Aa}$)         aa ($W_{aa}$)
BB ($W_{BB}$)      $W_{AA} + W_{BB}$     $W_{Aa} + W_{BB}$     $W_{aa} + W_{BB}$
Bb ($W_{Bb}$)      $W_{AA} + W_{Bb}$     $W_{Aa} + W_{Bb}$     $W_{aa} + W_{Bb}$
bb ($W_{bb}$)      $W_{AA} + W_{bb}$     $W_{Aa} + W_{bb}$     $W_{aa} + W_{bb}$

MULTIPLICATIVE: When an individual obtains fitness for each locus affecting a particular trait independently of the other loci, fitness for the overall trait is calculated as THE PRODUCT of the fitness obtained from each locus.
                   AA ($W_{AA}$)              Aa ($W_{Aa}$)              aa ($W_{aa}$)
BB ($W_{BB}$)      $W_{AA} \times W_{BB}$     $W_{Aa} \times W_{BB}$     $W_{aa} \times W_{BB}$
Bb ($W_{Bb}$)      $W_{AA} \times W_{Bb}$     $W_{Aa} \times W_{Bb}$     $W_{aa} \times W_{Bb}$
bb ($W_{bb}$)      $W_{AA} \times W_{bb}$     $W_{Aa} \times W_{bb}$     $W_{aa} \times W_{bb}$

EPISTATIC: When the fitness of a genotype at one locus depends on the genotype at a second locus, the fitness interaction between the loci is said to be EPISTATIC.

We can calculate the change in GAMETE frequencies and ALLELE frequencies as a result of selection acting on the 2 loci.

First, define $W_{XY}$, the average fitness of gamete XY, where $W_{XY/X'Y'}$ is the fitness of the genotype formed by gametes XY and X'Y':
$$W_{AB} = g_{AB} W_{AB/AB} + g_{Ab} W_{AB/Ab} + g_{aB} W_{AB/aB} + g_{ab} W_{AB/ab}$$
$$W_{Ab} = g_{AB} W_{Ab/AB} + g_{Ab} W_{Ab/Ab} + g_{aB} W_{Ab/aB} + g_{ab} W_{Ab/ab}$$
$$W_{aB} = g_{AB} W_{aB/AB} + g_{Ab} W_{aB/Ab} + g_{aB} W_{aB/aB} + g_{ab} W_{aB/ab}$$
$$W_{ab} = g_{AB} W_{ab/AB} + g_{Ab} W_{ab/Ab} + g_{aB} W_{ab/aB} + g_{ab} W_{ab/ab}$$

The average fitness of the entire population is:
$$\bar{W} = \sum f(A_i A_j B_i B_j) \times W_{A_i A_j B_i B_j}$$
where i and j are the alleles at each locus. With 2 alleles at each locus, there will be NINE terms in the equation, one for each of the 2-locus genotypes.

To calculate GAMETE frequencies after one generation of selection we use:
$$g_{AB}' = \frac{g_{AB} W_{AB} - rD\,W_{AaBb}}{\bar{W}} \qquad g_{Ab}' = \frac{g_{Ab} W_{Ab} + rD\,W_{AaBb}}{\bar{W}}$$
$$g_{aB}' = \frac{g_{aB} W_{aB} + rD\,W_{AaBb}}{\bar{W}} \qquad g_{ab}' = \frac{g_{ab} W_{ab} - rD\,W_{AaBb}}{\bar{W}}$$
where $W_{AaBb}$ is the fitness of the double heterozygote.

MOLECULAR EVOLUTION

Rates of amino acid substitution per unit time

$\lambda$ = rate of amino acid substitution per unit time
$D$ = observed proportion of amino acid differences
$$D_t = 1 - \exp(-2\lambda t)$$
$k$ = expected number of amino acid substitutions per site
$$k = 2\lambda t = -\ln(1 - D)$$

We can calculate k from D using $k = -\ln(1-D)$. Then, we can calculate t if we know $\lambda$ OR we can calculate $\lambda$ if we know t.

Rates of nucleotide substitution per unit time

$\lambda$ = rate of nucleotide substitution per unit time
$D$ = observed proportion of nucleotide differences
$$D_t = \frac{3}{4}\left[1 - \exp\left(-\frac{4}{3}\, 2\lambda t\right)\right]$$
$k$ = expected number of nucleotide substitutions per site
$$k = 2\lambda t = -\frac{3}{4} \ln\left(1 - \frac{4}{3} D\right)$$

We can calculate k from D using $k = -\frac{3}{4}\ln(1 - \frac{4}{3}D)$. Then, we can calculate t if we know $\lambda$ OR we can calculate $\lambda$ if we know t.

OBSERVATIONS THAT LED TO THE NEUTRAL THEORY

1. Proteins evolve faster and are more polymorphic than would be expected if substitutions are the result of fixation of beneficial mutations by natural selection.

2. Proteins evolve at a constant rate through time. We would not expect beneficial mutations to occur at regular intervals.

3. Different proteins and different parts of proteins evolve at different rates.

THE NEUTRAL THEORY OF MOLECULAR EVOLUTION

Kimura (1968) and King and Jukes (1969) proposed the NEUTRAL THEORY to explain these observations. They suggested that most (but NOT all) evolutionary changes in macromolecules were due to the random fixation of selectively equivalent (neutral) variants by genetic drift. Prior to this, it was believed that all variation must be under the influence of natural selection.

1. The neutral theory explains the high levels of variation. Drift decreases heterozygosity at the rate of $1/(2N_e)$ per generation. But mutation adds new variation. The average time to FIXATION of a neutral allele is $4N_e$ generations. The average time to LOSS of a neutral allele is $2(N_e/N)\ln(2N)$ generations. Table 1 shows that a neutral mutation on its way to fixation will create a polymorphism for a long period of time.

2. The neutral theory explains why substitution rates are constant. The steady-state rate at which neutral mutations are fixed equals v, the neutral mutation rate: $2Nv$ new neutral mutations arise each generation and each has a probability of $1/(2N)$ of being fixed, so the rate is $\frac{1}{2N} \times 2N \times v = v$.

3. The neutral theory explains the variable rates of substitution as variation in the probability that a new mutation is NEUTRAL. If the probability is high, then the substitution rate will be high. If the probability is low, then the substitution rate will be low. So, absolute mutation rates may be similar in different molecules but neutral mutation rates may vary.

HOW WELL DOES THE NEUTRAL THEORY EXPLAIN OBSERVATIONS?

Levels of heterozygosity are lower than expected under the neutral theory. This has led to a modification of the theory to incorporate nearly neutral mutations.
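The substitution formulas in the MOLECULAR EVOLUTION section above can be turned into a small Python sketch that converts an observed proportion of differences D into k and then into a divergence time. The function names are hypothetical, and any rate passed to `divergence_time` is an illustrative input rather than a calibrated value.

```python
import math

def k_amino_acid(d):
    """k = -ln(1 - D) for an observed proportion D of amino acid differences."""
    return -math.log(1 - d)

def k_nucleotide(d):
    """k = -(3/4) ln(1 - (4/3) D) for an observed proportion D of
    nucleotide differences (requires D < 0.75)."""
    return -0.75 * math.log(1 - 4 * d / 3)

def divergence_time(k, rate):
    """t = k / (2 * lambda), where rate is the substitution rate
    per site per unit time (lambda)."""
    return k / (2 * rate)
```

Because both corrections account for multiple substitutions at the same site, k is always at least as large as the observed D, and the gap widens as D grows.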
The interaction between drift and natural selection will determine the fate of a new mutation. The relevant quantity is $4N_e s$:

If $4N_e s > 10$, then selection is the primary force acting on the allele
If $4N_e s < 0.1$, then drift is the primary force acting on the allele
If $0.1 < 4N_e s < 10$, then both forces will act on the allele

MOLECULAR CLOCKS

Rates of substitution seem to be fairly constant for many genes and proteins. We can use this relationship to estimate times of divergence for species with poor fossil data. It is important that the “clock” be “calibrated” for the organisms and the genes under consideration. Estimates of rates of DIVERGENCE PER UNIT TIME are estimates of $2\lambda$. Use observed estimates of D to calculate k. Then $t = k/(2\lambda)$.

QUANTITATIVE GENETICS

Many phenotypic traits are nearly continuous in their expression.
If the trait can take on any value it is said to be CONTINUOUS (eg. height, weight).
If the trait can only take on whole integer values it is said to be MERISTIC (eg. litter size, bristle or spot number, appendage number).

When considering a population, we can measure the value of the trait and describe the distribution of the phenotypes by its MEAN and VARIANCE.

$$\text{mean} = \bar{x} = \frac{\sum n_i x_i}{n}$$
The actual value of the mean = $\mu$

$$\text{variance} = s^2 = \frac{\sum n_i (x_i - \bar{x})^2}{n-1} = \frac{\sum n_i x_i^2 - n\bar{x}^2}{n-1}$$
The actual value of the variance is $\sigma^2$

The standard deviation = $\sqrt{s^2} = s$

We can express the relationship between phenotype and genotype as follows:
$$P = \mu + G + E$$
where:
G = the deviation from the mean due to genotype
E = the deviation from the mean due to the environment

If we assume no genotype x environment interaction then, averaged over the population:
$$\bar{G} = 0 \quad \text{and} \quad \bar{E} = 0$$

We want to be able to determine the contribution of a particular allele to the phenotype. Phenotypes cannot be passed on, but we can determine how an allele contributes to the phenotype “on average”.

AVERAGE EXCESS OF AN ALLELE, a

This can be defined as the average contribution of an allele to the phenotype beyond the mean phenotype.
$$a_A = \frac{[f(AA) \times G_{AA}] + [\tfrac{1}{2} f(Aa) \times G_{Aa}]}{p} \qquad a_a = \frac{[f(aa) \times G_{aa}] + [\tfrac{1}{2} f(Aa) \times G_{Aa}]}{q}$$

If there are more than 2 alleles, all the heterozygotes must be considered:
$$a_A = \frac{[f(AA) \times G_{AA}] + [\tfrac{1}{2} f(AB) \times G_{AB}] + [\tfrac{1}{2} f(AC) \times G_{AC}]}{p}$$

To estimate $G_i$ (the genotypic deviation of the ith genotype) use:
$$G_i = P_i - \mu$$
where:
$\mu$ is the population mean phenotype
$P_i$ is the mean phenotype of genotype i.

We can also calculate the BREEDING VALUE (BV) of each genotype. We need to calculate $\alpha$, the average effect of an allele, which equals a in a random mating population. In an inbred population, $\alpha = a/(1 + F)$.
$$BV_{AA} = \alpha_A + \alpha_A \qquad BV_{Aa} = \alpha_A + \alpha_a \qquad BV_{aa} = \alpha_a + \alpha_a$$

$BV_i$ and $G_i$ do not have the same value because $G_i$ includes a dominance contribution to the phenotype whereas $BV_i$ includes only the additive contribution of an allele to the phenotype.

VARIANCE COMPONENTS

Unless all individuals have exactly the same phenotype, a population will have phenotypic variance. This variance can be divided into contributions from the genotype and the environment (as was the mean).
$$\sigma_P^2 = \sigma_G^2 + \sigma_E^2$$
where:
$\sigma_P^2$ is the total phenotypic variance
$\sigma_G^2$ is the variance due to genetic variation
$\sigma_E^2$ is the variance due to environmental variation

$\sigma_G^2$ can be further subdivided into an additive genetic component ($\sigma_A^2$) and a dominance genetic component ($\sigma_D^2$) as follows:
$$\sigma_G^2 = \sigma_A^2 + \sigma_D^2$$

Total phenotypic variance is calculated in the usual way:
$$\sigma_P^2 = \frac{\sum n_i (P_i - \mu)^2}{n-1}$$

Genetic and additive variance components can be estimated as follows:
$$\sigma_G^2 = \sum f_i G_i^2 \qquad \sigma_A^2 = \sum f_i BV_i^2$$

The variance components tell us how much of the total phenotypic variance is due to genetic factors and how much is due to environmental factors. The ONLY variation that is important for evolution is ADDITIVE GENETIC VARIANCE.

$h^2$, the heritability, is defined as the proportion of total phenotypic variance that is due to genetic variance.
Heritability in the broad sense is $h_B^2 = \sigma_G^2 / \sigma_P^2$
Heritability in the narrow sense is $h_N^2 = \sigma_A^2 / \sigma_P^2$
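As a worked illustration of the definitions above, the following Python sketch computes genotypic deviations, average excesses, breeding values, and the genetic variance components for one locus with two alleles. It assumes random mating (Hardy-Weinberg genotype frequencies), so the average effect equals the average excess; the function name and the example phenotype values are hypothetical.

```python
def variance_components(p, phenotypes):
    """One locus, two alleles, random mating assumed.

    p          : frequency of allele A (0 < p < 1)
    phenotypes : mean phenotypes of genotypes (AA, Aa, aa)
    """
    q = 1.0 - p
    f_AA, f_Aa, f_aa = p * p, 2 * p * q, q * q
    P_AA, P_Aa, P_aa = phenotypes

    # Population mean and genotypic deviations G_i = P_i - mu
    mu = f_AA * P_AA + f_Aa * P_Aa + f_aa * P_aa
    G_AA, G_Aa, G_aa = P_AA - mu, P_Aa - mu, P_aa - mu

    # Average excess of each allele (equals the average effect when F = 0)
    a_A = (f_AA * G_AA + 0.5 * f_Aa * G_Aa) / p
    a_a = (f_aa * G_aa + 0.5 * f_Aa * G_Aa) / q

    # Breeding values: sum of the average effects of the two alleles
    bv_AA, bv_Aa, bv_aa = 2 * a_A, a_A + a_a, 2 * a_a

    var_G = f_AA * G_AA**2 + f_Aa * G_Aa**2 + f_aa * G_aa**2
    var_A = f_AA * bv_AA**2 + f_Aa * bv_Aa**2 + f_aa * bv_aa**2
    return {"mu": mu, "var_G": var_G, "var_A": var_A, "var_D": var_G - var_A}
```

With p = 0.5 and mean phenotypes (10, 10, 6), where A is completely dominant, the genetic variance is 3, of which 2 is additive and 1 is dominance variance. With an intermediate heterozygote, (10, 8, 6), all of the genetic variance is additive.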
If $h_N^2 = 1$, then ALL phenotypic variation for a particular trait is due to additive genetic variation and a population will respond quickly to selection on the trait.
If $h_N^2 = 0$, then NONE of the phenotypic variation is due to additive genetic variation and a population will NOT respond to selection on the trait.

When considering complex phenotypic traits that are controlled by many loci, POLYGENIC TRAITS, it is very difficult to calculate the individual variance components for each locus. Thus, we use parent-offspring regression to estimate heritability. The higher the heritability, the higher the correlation between parents and their offspring.

We can calculate the COVARIANCE between two variables x and y as:
$$\sigma_{XY} = \frac{\sum n_i (x_i - \bar{x})(y_i - \bar{y})}{n-1} = \frac{\sum n_i x_i y_i - n\bar{x}\bar{y}}{n-1}$$

The equation for a line is $y = c + bx$

It can be shown that the SLOPE of a line is $b = \sigma_{XY}/\sigma_x^2$ and that $h^2 = b$ when we regress the phenotype of the offspring on the phenotype of the MIDPARENT. If we regress the phenotype of the offspring on the phenotype of ONE parent, then $h^2 = 2b$ because each parent only contributes 1/2 of the offspring’s genes.

In an artificial selection experiment,
$$R = h^2 S$$
Response to selection = $R = (\mu' - \mu)$
Selection differential = $S = (\mu_S - \mu)$
where:
$\mu$ = the population mean before selection
$\mu'$ = the mean after selection
$\mu_S$ = the mean of the selected parents.

ADAPTATION

Definitions:
Adaptedness is a measure of a genotype’s (phenotype’s) capacity for survival and reproduction relative to that of other genotypes.
An adaptation is a phenotypic variant that results in the highest fitness among a specified set of variants in a given environment.
Adaptation/adaptedness is RELATIVE, not optimal.

Life can be divided into different levels of organization. We can ask which level of organization benefits from adaptations. The question may seem trivial. However, there can be conflict among the levels. What benefits a gene may not benefit the individual. What benefits an individual may not benefit the population, or vice versa.
Genic Selection
An allele that is favored by selection at the level of the gene may not be favored at the level of the individual.
Examples:
Segregation distorter in Drosophila melanogaster.
Biased gene conversion.
Transposable elements.

Individual Selection
An allele is favored because it increases the survival and/or reproduction of an individual. This is what we normally think of when we talk about natural selection.

Group Selection
An allele is favored because it increases the “survival” of the population. Many traits favored in individuals do this, in which case there is no need to consider group selection. However, can a trait which benefits the group at the expense of the individuals who carry it ever increase in frequency via natural selection?
Example: Altruistic behavior. A behavior is ALTRUISTIC if it increases survival/reproduction in the recipient and decreases survival/reproduction of the altruist.

Many examples of apparently altruistic behavior have been observed in nature: for example, warning calls. Theoretically, altruistic behavior could increase in frequency if it increases the probability that a group (population) will avoid extinction, or if it increases the probability that a group will expand to form other groups (see Figure 4). However, individuals reproduce at a much faster rate than do groups, and it seems unlikely that group selection could ever override individual selection. There is one case where group selection does seem to be important - when the “group” is composed of relatives.

Kin Selection
If you increase the fitness of your relatives, they can pass on the same genes that you do.
Inclusive fitness: Your total fitness is a function of your own fitness, plus the fitness you get when you increase the survival/reproduction of relatives.
$$w_i = a_i + \sum_j r_{ij}\, b_{ij}$$
where:
$w_i$ = inclusive fitness of individual i
$a_i$ = direct effect of the altruistic trait on the individual fitness of i
$b_{ij}$ = the effect of the altruistic trait on the fitness of another individual, j
$r_{ij}$ = coefficient of relatedness between i and j (the fraction of j’s gametes that are identical by descent to alleles carried by i).

$$r_{XY} = \frac{F_{XY}}{1 + F_X}$$
where:
$F_{XY}$ = inbreeding coefficient of hypothetical offspring of X and Y
$F_X$ = inbreeding coefficient of X

How to calculate F from pedigrees:
$$F = \sum \left(\tfrac{1}{2}\right)^{i} (1 + F_A)$$
where i is the number of individuals in the path through the common ancestor, $F_A$ is the inbreeding coefficient of A, the common ancestor of the individuals in the path, and the sum is taken over all such paths.

Even if $a_i < 0$ (the altruistic behavior works against the individual directly), the altruistic trait can increase if the individual obtains a large increment towards fitness from helping relatives. (see example from Ridley concerning scrub jay helpers in Table 12.2)

So, to come back to the question: What is the unit of selection?

One viewpoint is that the units that show adaptation are the units that show heritability - the phenotypic traits and the individuals that possess them. Mutations that influence the phenotype of a unit (cell, tissue, organ, limb) must be transmitted to offspring of that unit - that is how natural selection increases the frequency of the trait.

Another viewpoint is that the unit of selection is the gene itself. It is the only entity which is potentially “immortal”. Phenotypes are not passed on - they are genotype by environment interactions. Even the very same genotype may not produce an identical phenotype at another time in another environment. Genotypes are not passed on - genetic combinations are reshuffled each generation due to meiosis and recombination. Even though ecological processes such as predation, competition etc. act on the individuals and ultimately cause the change in allele frequencies, it is the alleles that are passed on.
Thus, it can be argued that adaptations exist because they increase the reproduction of the genes that encode them, relative to the genes that encode alternate forms of the trait. Those entities which propagate genes efficiently will show adaptation.

What sorts of genetic changes can cause ADAPTATION?

1. Changes affecting the BIOCHEMISTRY of an organism. Enzymes can change to affect their affinity for different substrates, their temperature optima, their kinetics etc. There is no doubt that biochemical evolution is important. However, we seldom see major new biochemical pathways evolving - the basic pathways are the same among all living things. One example of a new pathway evolving is the divergence of C3 vs. C4 photosynthetic pathways in plants.

2. Changes affecting the evolution of NEW CELL TYPES. From a histological perspective, we can recognize a relatively small number of basic cell types, regardless of the organisms from which they come (muscle, blood cell, nerve cell, epidermal cell, etc.). There is not enough variation in cell types to account for the amount of morphological evolution that has occurred.

3. Changes in DEVELOPMENTAL PATTERNING. Most morphological variation that we observe is due to changes in the developmental patterning of cellular mechanisms, not due to changes in the mechanisms themselves.

Development
An organism develops from a zygote due to the proliferation and differentiation of various cell lines at particular times and at distinct rates. Morphology can change if there are changes in:
SPATIAL organization of cell types.
TEMPORAL patterns of differentiation.

Passage from: “The origin of animal body plans” (March-April 1997) American Scientist 85:126-137 by D. Erwin, J. Valentine and D. Jablonski

[Developmental regulation proceeds through the sequential activation of a series of regulatory switches that in turn activate networks of other genes.
In general, regulatory genes produce proteins that bind to and influence the activity of other genes. The protein products of these genes then activate still other genes and the cascade continues. Regulatory genes that are active early in development help set up the body axes by determining which end of the embryo becomes the head, and which end the tail, which part is the back and which is the belly. These early expressing genes also set up the basic tissue types. Genes that are active later in the cascade help block out distinctive morphological regions within the body - say the head from the abdomen. Later still in the cascade, genes mediate the growth of appendages like limbs, until the most refined morphological details have been achieved. Many different classes of regulatory genes share a common DNA sequence which is known as the homeobox which predates the origin of animals.]

Information about HOX genes from: Homeotic genes and the evolution of arthropods and chordates. (10 August 1995) Nature 376:479-485 by S.B. Carroll.

HOX genes demarcate relative positions in animals rather than specify any particular structures. They regulate the expression of large numbers of target genes. In Drosophila, KNOWN HOX targets include genes encoding other transcriptional regulatory proteins, secreted signaling proteins, and structural proteins. There are between 85 and 170 genes that are known to be regulated by the product of one particular HOX gene, Ultrabithorax, alone. A mutation in one HOX gene can affect regulation of many other genes and thus have profound effects on morphology.

Arthropods differ in the number, type, and organization of body appendages (antennae, claws, mouth parts, legs), all of which evolved from ancestral arthropod limbs. Changes in HOX gene expression can explain why some crustaceans have limbs on their abdominal segments, and others do not. It is possible to evolve new regulatory interactions which determine WHEN and WHERE a limb will develop.
In vertebrates, HOX genes influence vertebral morphology and patterns of limb and central nervous system development.

“The creative potential of regulatory evolution lies in the hierarchical and combinatorial nature of the regulatory networks that guide the organization of body plans and the morphogenesis of body parts.”

Understanding how adaptation of complex morphological traits occurs is possible if we recognize two basic properties of development:

1. DEVELOPMENT IS EPIGENETIC - it depends on prior developmental events and cannot be understood entirely in terms of primary gene action. Mutations that act early in development have larger effects than mutations that act late in development.

2. DEVELOPMENT IS INTEGRATED - development of complex structures such as limbs involves changes in many cell types and all of these changes must be timed correctly. The integrated control of these changes through regulatory genes such as HOX genes makes complex morphological changes possible. It was previously thought that there had to be changes in the separate genes controlling all the parts of a trait. This would make the evolution of complex morphological traits highly unlikely.

ADAPTIVE EXPLANATION

Can natural selection explain all known adaptations? Traits with a simple genetic basis are no problem - colour in moths, hemoglobin in high altitude ducks. But what about complex traits like eyes, wings, organ systems?

Darwin was convinced that such traits must have evolved “gradually” via many small changes. Population genetics theory is concordant with this view: mutations of small effect are more likely to be beneficial than mutations of large effect.

The critical requirement is to show that a complex trait COULD have evolved via small changes. It doesn’t matter if we know exactly what all of those small changes were.

Two classes of adaptations may cause a problem for natural selection.

1. Complex traits with many integrated parts that must all change simultaneously.
eg. giraffe’s long neck, the eye.
As knowledge of development increases, this problem is easy to overcome. We now know that much morphological evolution occurs by changes in REGULATORY genes which alter EXPRESSION of many loci simultaneously. The genes controlling the STRUCTURE of the parts do not have to change.

2. Traits for which the rudimentary stages would seem to be disadvantageous or functionless.
eg. wings
Adaptations do not generally come from nothing, but from modification of a structure that already exists. Thus, an important concept concerning the evolution of complex traits when the early stages would seem not to be useful is PREADAPTATION.

Preadaptation refers to the evolution of a trait for one purpose but the later use of the trait for another purpose. We often observe a large change in the function of the trait with little change in the structure. After the trait is used for the new purpose, natural selection can act on variants that influence the new purpose.
eg. lobe fins evolving into tetrapod limbs.
eg. the evolution of wings prior to the evolution of flight in birds.
Some people refer to traits that have changed functions as EXAPTATIONS.

THE STUDY OF ADAPTATION

1. Identify types of genetic variants that a trait may have.
2. Develop hypotheses or models of the function of the trait.
3. Test predictions of the hypotheses.
A. Determine if the actual form of the trait matches the hypothesis. If not, the hypothesis is incorrect.
B. Perform experiments to determine if the hypothesis is correct. This requires that variant forms of the trait are available or can be manufactured.
Example: neck teeth and Chaoborus predation on Daphnia
C. The COMPARATIVE METHOD. The hypothesis about the adaptive value of a trait predicts that some species in a particular environment should have a particular form of the trait that differs from that observed in species in different environments.
Example: the production of neck teeth evolved independently in 2 groups of Daphnia that coexist with Chaoborus.

WHY ARE ADAPTATIONS IMPERFECT?

1. TIME LAGS - the environment changes, so the population must respond. But evolution via natural selection takes time.
e.g. tropical fruits with hard outer casings evolved for dispersal by now-extinct mammals.

2. GENETIC CONSTRAINTS - Heterozygous advantage is an example of a genetic constraint. A sexual diploid population cannot be “true-breeding” for heterozygotes, so the population must tolerate the existence of less-fit homozygotes. A population could “get around” this constraint via gene duplication.

3. DEVELOPMENTAL CONSTRAINTS
Definition: A developmental constraint is a bias on the production of variant phenotypes or a limitation on phenotypic variability caused by the structure, character, composition, or dynamics of the developmental system.
Causes: PLEIOTROPY - genes affect more than one trait, so selection cannot operate on the traits independently.
e.g. small salamanders with 4 toes.
Populations can sometimes evolve altered developmental pathways to decrease the constraint.

CANALIZING SELECTION
A new mutation might provide an advantage with respect to one trait, but it also causes some disruption of development. Selection will favour alleles at MODIFIER loci that decrease this disruption. In time, the developmental pathway can be restored even when the mutation is fixed, by fixation of alleles at modifier loci.
e.g. resistance to insecticides
e.g. abnormal abdomen in Drosophila mercatorum.

DEVELOPMENTAL CONSTRAINT HAS BEEN PROPOSED AS AN ALTERNATIVE TO NATURAL SELECTION AS AN EXPLANATION FOR THE FORM OF SOME TRAITS. In other words, if the trait COULD evolve, it might be favoured by selection, but developmental constraints prevent the trait from occurring in the first place.
e.g. spotted mammals tend to have ringed tails.

How could we distinguish between these 2 alternatives?

1. Adaptive prediction - If you can predict the form a trait will take under particular conditions, you could argue that natural selection is responsible for its form.
2. Direct measure of selection - Where possible, we can measure selection on the trait relative to other forms of the trait. This is not always practical.
3. Heritability - If the trait is highly constrained, we do not expect there to be any additive genetic variation at the loci which control it. We can do artificial selection on the trait, and if heritability is not 0, then the trait is probably not constrained.
4. Cross-species evidence - Do the missing forms of the trait occur in other species? Can we create the missing forms via artificial selection? If we can obtain them, then developmental constraint is NOT a good explanation for why they do not exist in nature.
Allometry - it could be argued that you can’t get a particular phenotype because allometric relationships prevent it. However, if you can alter allometric relationships via artificial selection, it is possible that other forms of the trait could occur in nature.
e.g. eye stalks in flies.

4. HISTORICAL CONSTRAINT - A population could evolve to a high peak on its adaptive topography, but the environment may change so that another location on the topography now provides a higher peak. Even so, the population will remain stuck on the current peak.
e.g. the recurrent laryngeal nerve in the neck of mammals
e.g. different means to obtain the same trait - neck teeth in different groups of Daphnia.
Adaptation must be understood in a historical context. What was present in the past provides the raw material for subsequent change.

5. TRADE-OFFS - If a trait is used for many functions, it may not be possible to optimize it for every one.
e.g. the vertebrate mouth is used for breathing and ingesting food.

In theory, it is simple to define adaptations as traits that have evolved via natural selection.
In practice, it may be difficult to determine if the current form of the trait did indeed evolve via natural selection.

SPECIES CONCEPTS AND SPECIATION

First we must distinguish between two types of evolutionary change:
ANAGENESIS is evolution, or a change in the gene pool, within a species. This is what we have been talking about up until now.
CLADOGENESIS is branching evolution and refers to the development of 2 species from a single ancestral species.

WHAT IS A SPECIES?

In practice, we recognize species by their morphological differences. So we can define a PHENETIC SPECIES as a group of organisms that look similar to one another, but are distinct from other such groups. The criteria usually include a large number of morphological characters. Unfortunately, different sets of characters may not always define the same “groups”. In addition, this definition of species has no relationship to evolution. We can define species this way even if evolution does not occur.

The most commonly used definition currently, at least among zoologists, is the BIOLOGICAL SPECIES, which can be defined as a group of interbreeding individuals that is reproductively isolated from all other such groups. This definition was proposed by Ernst Mayr. It is satisfying from an evolutionary perspective because it incorporates the idea of shared gene pools, so that species and speciation can be studied in the framework of population genetics; a gene pool = a species.

The phenetic and biological species concepts often describe the same groups of individuals. This is not surprising - we use morphological characters to identify individuals that belong to the same gene pool. The phenetic similarity is a direct consequence of the heritability of the traits encoded by the gene pool. Thus, as far as proponents of the biological species concept are concerned, phenetic similarity only matters insofar as it is an indicator of interbreeding.
An example of a situation where the 2 concepts disagree is SIBLING SPECIES. In this case we have 2 species whose individuals are morphologically indistinguishable (at least to humans). However, genetics, behaviour and/or reproductive biology show that there are really 2 groups that do not interbreed.

NOTE: The “glue” that holds species together under this concept is GENE FLOW.

Over the years there have been many attempts to modify the definition of species. The reason there has been so much effort in this area is that no definition is “perfect”. For example, how can you use the criterion of interbreeding with respect to an asexual or parthenogenetic organism? Below is a short description of some of the other species concepts that have been proposed.

THE RECOGNITION SPECIES CONCEPT (H. Paterson)
A species is a group of individuals sharing the same Specific Mate Recognition System (SMRS). In general, this definition should define the same groups as the Biological Species concept. However, instead of framing things in terms of who individuals do NOT breed with, this concept focuses on who individuals DO breed with.

THE ECOLOGICAL SPECIES CONCEPT
A species is a group of organisms exploiting a single niche. It has been argued that ecological niches in nature occupy discrete zones with gaps between them. The “glue” holding the species together in this case is natural selection. Interbreeding between species would not be favoured because it creates hybrids that are not adapted to either niche. This concept differs from the Biological Species concept in its focus on natural selection as the cohesive force, rather than gene flow.

THE COHESION SPECIES CONCEPT (A. Templeton)
A species is the most inclusive group of individuals having the potential for phenotypic cohesion through intrinsic cohesion mechanisms.
Mechanisms of cohesion include: gene flow, stabilizing selection, developmental constraints, reproductive isolation.

There is, as yet, no perfect definition of species that everyone can agree on. This is, perhaps, not surprising, as the factors that lead to the development of 2 species from 1 operate over very long time scales. Should we be surprised to catch a species “in the act” of diverging into 2 species, making it hard to decide if there is 1 or 2?

ORIGIN OF NEW SPECIES

Speciation is caused by the evolution of genetic barriers to interbreeding.
1. Start with a single species composed of a set of interbreeding individuals.
2. A new variant (or variants) spreads throughout part of the species range. Bearers of this variant mate only or preferentially with other bearers of the variant.
3. Once the mating preference results in exclusive breeding within each of the two groups (with and without the variant), two species exist.
4. The two species will continue to diverge at other loci.

REPRODUCTIVE ISOLATING MECHANISMS

Mechanisms that prevent interbreeding are generally divided into 2 groups:
Pre-zygotic isolating mechanisms
Post-zygotic isolating mechanisms

How much genetic differentiation must there be for speciation to occur?
* Nothing by itself is critical for speciation.
* Speciation can be caused by changes at a few loci, or by changes at many loci. The changes can relate to any part of the genome controlling any feature of the organisms: morphology, behaviour, karyotype, allozymes, habitat preferences, etc.
e.g. 2 species can be morphologically similar but genetically divergent - Daphnia
e.g. 2 species can be morphologically different but genetically similar - humans and chimps

MECHANISMS OF SPECIATION

A speciation mechanism is ANYTHING that restricts gene flow among populations and thus leads to reproductive isolation. Mechanisms of speciation have been classified in various ways (SEE HANDOUT).
Mayr has classified speciation mechanisms according to the level at which they occur (individuals vs populations) and, in the case of populations, according to their geographical relationship. Templeton has classified speciation mechanisms in a population genetic framework.

GEOGRAPHIC SPECIATION (Allopatric, Parapatric, Sympatric)

Allopatric speciation - Reproductive isolation (RI) evolves while 2 groups are separated by some geographical barrier. The genetic changes can be caused by drift or natural selection, but they do not occur to cause speciation per se. Speciation is a BY-PRODUCT of divergence in the absence of gene flow. When isolated populations come into secondary contact, there can be one of two outcomes: the populations interbreed and remerge into a single species OR they remain separate as a result of RI that has evolved during the separation.

It has been argued that the occurrence of post-mating isolation between the groups in secondary contact can cause selection to favour the evolution of pre-mating isolation to REINFORCE the RI that has already evolved. In other words, natural selection can directly complete the speciation process by favouring genotypes that mate within their own group. This is called speciation via REINFORCEMENT. This is theoretically possible, BUT it requires strong linkage disequilibrium between the loci causing the pre- and post-mating isolation. This will occur initially, when the groups come back into contact. However, interbreeding will break down the linkage disequilibrium - usually faster than selection can increase the frequency of alleles for pre-mating isolation. In the meantime, selection is also acting to reduce the frequency of alleles that cause the post-mating isolation. As this occurs, the barrier to interbreeding will decrease and there will no longer be a selective advantage to pre-mating isolation.

Parapatric speciation - RI evolves in a continuous population which spans an environmental gradient.
If different alleles are favoured in different environments, then a cline in allele frequencies will develop. Clines are common in nature and when they are gradual, they seldom lead to speciation. However, when there are abrupt changes in the environment, the cline can be very steep, leading to what is called a STEP CLINE. Heterozygotes tend to be disadvantageous (post-mating isolation) and their occurrence in the transition zone can lead to the evolution of pre-mating isolation via reinforcement. If the transition zone is stable and long-lived, it may provide the conditions necessary for reinforcement to occur - stability of the heterozygote disadvantage providing sustained selection in favour of pre-mating isolation.

Hybrid zones - A hybrid zone is an area of contact between two noticeably different forms at which hybridization takes place. When the hybrid zone forms on either side of an abrupt environmental transition, it is considered to be PRIMARY. When populations come back into secondary contact, they may hybridize at the contact zone. Such zones of contact are considered to be SECONDARY hybrid zones. In practice, it is difficult to determine whether a hybrid zone is primary or secondary. The relative frequency of the two types has important implications for the relative importance of allopatric versus parapatric speciation.

Sympatric speciation - RI evolves within the range of the ancestral group, often as a result of spatial environmental heterogeneity. This is a controversial idea because it REQUIRES the operation of reinforcement. The conditions for the occurrence of sympatric speciation are similar to the conditions required for the establishment of a multiple niche polymorphism. There must be some sort of HABITAT SELECTION such that individuals who have high fitness in one environment tend to choose that environment and thus tend to mate with other individuals that have high fitness in that same environment.
This habitat selection can provide the reduction in gene flow between habitats that would allow the development of RI.
e.g. the evolution of host races in the fruit fly Rhagoletis.

POPULATION GENETIC MODES OF SPECIATION

Templeton divides mechanisms of speciation into two main groups: TRANSILIENCE and DIVERGENCE.

Speciation via transilience can occur when some event other than natural selection creates a change in the genetic composition of a species. Natural selection acting to maintain the “status quo” is overcome by this event, and then natural selection acts to stabilize the new state.

Speciation via divergence occurs when RI evolves gradually as a consequence of the operation of natural selection under different conditions.

DIVERGENCE MODES

Adaptive (similar to allopatric)
Some extrinsic barrier to gene flow develops. Isolated populations diverge due to adaptation to different environments. RI is a secondary consequence of the adaptation. Rates of divergence are dependent on population structure. Large panmictic populations that occupy similar environments would be slow to diverge from one another even in the absence of gene flow.

Clinal (similar to parapatric)
Natural selection occurs along an environmental gradient with isolation by distance. This is most likely to occur if selection is creating a cline of allele frequencies at a major locus with many modifier loci. When different alleles at the major locus are favoured at opposite ends of the cline, the accumulation of differences at the modifier loci, which enhance the phenotypic expression of the trait under selection, can indirectly result in post-mating isolation.

Habitat (similar to sympatric)
There is no isolation by distance, and there is gene flow (or the potential for gene flow) among groups of individuals which prefer, and have different fitnesses in, different habitats.
In order for this to lead to a speciation event, there must be a genetic basis for habitat selection which can then lead to assortative mating within groups. In general, the loci controlling the habitat selection need to be in linkage disequilibrium with the loci controlling fitness differences in the various habitats.

TRANSILIENCE MODES

Genetic
A rapid change in the genetic composition of a population due to a founder effect. This is most likely to occur when a large, panmictic population gives rise to a small peripheral population. The founder event can create linkage disequilibrium among loci, which can affect the trajectory of natural selection in the new population. If the population stays small, then inbreeding and homozygosity will increase (relative to the ancestral population). The resulting change in the genetic architecture of the new population, through the development of new co-adapted gene complexes, can lead to the secondary development of RI between the new population and the ancestral population. One way to think of this is as speciation via PEAK SHIFTS.
e.g. Hawaiian Drosophila

Chromosomal
Chromosomal rearrangements such as inversions, translocations, and fusions can cause a high degree of hybrid sterility. Generally, we expect the new variant to be eliminated by natural selection. However, if it becomes fixed in one population, that population will be reproductively isolated from other populations. One way to overcome the selective barrier is extreme inbreeding in small populations, which could rapidly fix the new chromosomal variant. This seems unlikely in many species, but species with small isolated populations, such as rodents, some species of lizards and some species of plants, appear to have speciated via this mechanism.

Hybrid maintenance
When 2 species hybridize, the F1 hybrids may be viable/fertile, but there may be F2 breakdown. In some cases a mechanism has evolved to maintain the viable F1 state.
In plants, a common mechanism to do this is polyploidy. In animals, the development of parthenogenesis can lead to new “species”.

Hybrid recombination
When 2 species hybridize, the F1 hybrids may be viable/fertile, but there may be F2 breakdown. In plants, the F1 generation may be viable but essentially sterile. However, this F1 state can be maintained indefinitely via vegetative reproduction. During this time the plants will continue to produce pollen and seeds, and eventually recombinants may occur that are fertile, but that differ from either parent. If these recombinants are interfertile amongst themselves, they can establish a new species which is isolated from either parent via post-mating barriers.

DIFFERENT MODES OF SPECIATION ARE MORE LIKELY IN SPECIES WITH CERTAIN ATTRIBUTES

Allopatric speciation (or speciation via adaptive divergence) is considered to be the most common form of speciation. There are no theoretical difficulties with this concept, and virtually any species can speciate if it is divided into isolated demes. This mode of speciation requires fairly long periods of time - it has been estimated that Drosophila species take between 1.5 and 3.5 million years to evolve allopatrically.

Speciation via genetic transilience could occur more rapidly than adaptive divergence and seems more likely in situations where islands have been colonized from large mainland populations, and in organisms with low vagility.

Parapatric (or clinal) speciation is also more likely in low-vagility organisms such as plants, terrestrial snails, fossorial rodents, flightless insects, lizards, etc.

Sympatric speciation seems to be more likely in organisms that use various “hosts”, such as parasites and phytophagous insects.

Speciation is not adaptive in itself but it has profound consequences for adaptive evolution. Populations can become more “fine-tuned” to their environment in the absence of gene flow from populations in other environments.
A group of organisms can occupy a much larger “ecological space” if it is divided into reproductively isolated groups each specializing on one region of that space. A species cannot be best at everything (recall the difficulty with maintaining polymorphism via environmental heterogeneity) but reproductively isolated groups can “be best” at one particular thing.

Rates of speciation have varied immensely over evolutionary time. The fossil record indicates that there have been long periods of stasis interrupted by periods of rapid diversification. Often, bursts of speciation involve ADAPTIVE RADIATIONS, which are the evolutionary divergence of members of one lineage into different adaptive zones.

Some generalizations about adaptive radiations:
* They often occur at the edge of species ranges where a new genetic combination might be favoured in a different environment than that usually occupied by the species.
* A lack of direct competitors or predators will facilitate the process by allowing a species to invade an environment to which it is not well adapted. This is difficult if well-adapted competitors or predators are already there.
* Adaptive radiations often happen when something opens up new niches for colonization.
- archipelagos which are uninhabited when they first form are subsequently colonized (e.g. the invasion of the Galapagos Islands by a single finch species that radiated to fill multiple niches occupied by non-finches on the mainland, the radiation of picture-wing Drosophila and honeycreepers in the Hawaiian islands).
- profound changes in climate can open up vast areas of novel habitat (e.g. the drying of Africa turned much of the forest into savannah, allowing the adaptive radiation of ungulates).
- mass extinctions in one lineage can open up new niches for other lineages (e.g. the extinction of large carnivorous dinosaurs provided the opportunity for the radiation of large carnivorous mammals and birds - only the canids and felids survive today).
* Adaptive radiations often happen after the evolution of a KEY INNOVATION. The evolution of a new morphological feature in one lineage often opens up opportunities for that group to invade niches which it could not previously occupy (e.g. the evolution of flight in birds and in bats, and the evolution of modified jaw structure in cichlid fishes).

PHYLOGENY RECONSTRUCTION

Evolutionary biologists have 2 major tasks:
1. To determine the ecological and genetic mechanisms of evolutionary change.
2. To determine the actual history of evolutionary change.

To date, we have been mainly concerned with MICROEVOLUTION - changes in gene frequencies below the species level. If such change proceeds beyond a certain point, we recognize that one species could become 2. However, we have not considered the evolution of higher taxonomic groups, or MACROEVOLUTION.

The term that we use for evolutionary change within a lineage is ANAGENESIS. The term that we use for evolutionary change leading to the branching of one lineage into two lineages is CLADOGENESIS.

To understand macroevolution, we need to understand PHYLOGENY and SYSTEMATICS.
Phylogeny is the pattern of branching showing the evolutionary relationships among species.
Systematics refers to the organization of organisms into hierarchical groups.
Ultimately, evolutionary biologists would like the system of classification to reflect phylogeny. Classification based on overall morphological similarity often does correspond with phylogeny, but it need not if organisms have evolved in parallel or convergently.

Why Worry About Phylogeny?
If we have accurate phylogenies, we can ask questions about rates of evolution, patterns of evolution, and adaptation via the comparative method. An accurate phylogeny allows us to identify INDEPENDENT evolutionary events.

Phylogenetic terms
Monophyletic group - a group of taxa descended from a single ancestral taxon.
Polyphyletic group - a group of taxa descended from two or more distinct ancestral taxa.
Paraphyletic group - a group of taxa derived from a single ancestral taxon, but one which does not contain all of the descendants of the most recent common ancestor.
Plesiomorphy - a character showing the ancestral condition.
Apomorphy - a character derived from and differing from an ancestral condition.
Homologous - used of structures, traits or properties having common ancestry but not necessarily retaining similarity of structure or function.
Analogous - pertaining to similarity of structure or function due to convergence rather than to common ancestry.
When constructing phylogenies, we must strive to choose characters that are homologous among the taxa under consideration, rather than analogous.

PHENETICS (Numerical Taxonomy)

The phenetic approach involves grouping taxa on the basis of overall similarity for a large number of characters. Any sort of characters can be used: morphological, molecular, behavioural. The tree of relationships constructed using this method is called a PHENOGRAM or DENDROGRAM.

Assumption: Evolution occurs at an approximately constant rate so that higher similarity reflects closer genetic relationship.
Advantage: It is not necessary to know whether the forms of each character are ancestral or derived.
Disadvantage: It is not possible to reconstruct the original character states of the taxa from the phenograms. It is not possible to learn how the traits are changing along the branches of the phenogram - all of the data are reduced to single numbers characterizing the similarity (or distance) between pairs of taxa.

Once character state data are gathered from a group of taxa, it is necessary to convert them to a measure of distance or similarity between all pairs of taxa (a distance matrix). There are various ways to do this for different types of data.
Examples for molecular data follow:

Nei’s Genetic Distance

It is common to gather allele frequency data for a large number of loci and then convert these data to a measure of genetic distance.

$$I = \frac{\sum_i x_i y_i}{\left(\sum_i x_i^2 \, \sum_i y_i^2\right)^{1/2}} = \frac{J_{xy}}{(J_x J_y)^{1/2}}$$

where $x_i$ is the frequency of the ith allele in population x
where $y_i$ is the frequency of the ith allele in population y

I is the normalized identity - the probability of choosing the same allele from each of population x and y, relative to the probability of choosing the same allele twice from either population x or y.

$$D = -\ln I$$

D is the genetic distance between populations x and y.
To calculate I for many loci, use $I = \bar{J}_{xy} / (\bar{J}_x \bar{J}_y)^{1/2}$, where $\bar{J}$ is the mean of J across loci.

Sequence Divergence

It is now common to gather data about DNA sequences directly. The raw data must be converted to a measure of nucleotide divergence between pairs of taxa.

Restriction site data
It is possible to map the location of restriction sites in DNA fragments. The raw data will consist of a table indicating whether a particular restriction site is present (1) or absent (0) in a particular DNA fragment.

Example:
DNA fragment 1: 1111110000111111001011101
DNA fragment 2: 1111111111001110111111111

Sequence divergence between fragments 1 and 2 can be estimated as:

$$d_{XY} = \frac{-\ln S}{r}$$

where r = the number of nucleotides in the enzyme recognition site (usually 4 or 6)

$$S = \frac{2 m_{xy}}{m_x + m_y}$$

where $m_{xy}$ = the number of sites shared by sequences X and Y
where $m_x$ = the number of sites in sequence X
where $m_y$ = the number of sites in sequence Y

In the example above: $m_{12} = 14$, $m_1 = 17$, $m_2 = 22$, so S = (2 x 14)/(17 + 22) = 0.7179.
If enzymes recognizing hexanucleotide sites were used, then r = 6 and $d_{12}$ = 0.055, or 5.5% sequence divergence.

DNA sequence data
It is now much more common to obtain direct sequence data. In this case, $d_{xy}$ is the proportion of observed nucleotide differences between a pair of sequences.
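The two calculations above (Nei's D from allele frequencies, and d from restriction-site presence/absence) can be checked with a short Python sketch. The function names and the input layout (one list of allele frequencies per locus, and "1"/"0" strings for restriction sites) are assumptions chosen for illustration; they are not part of the course material.

```python
import math

def nei_distance(freqs_x, freqs_y):
    """Nei's D for two populations.

    freqs_x, freqs_y: one list of allele frequencies per locus, with
    alleles in the same order in both populations (assumed layout).
    """
    n_loci = len(freqs_x)
    # J values are averaged across loci before forming the ratio
    j_x = sum(sum(p * p for p in locus) for locus in freqs_x) / n_loci
    j_y = sum(sum(q * q for q in locus) for locus in freqs_y) / n_loci
    j_xy = sum(
        sum(p * q for p, q in zip(lx, ly))
        for lx, ly in zip(freqs_x, freqs_y)
    ) / n_loci
    identity = j_xy / math.sqrt(j_x * j_y)  # normalized identity I
    return -math.log(identity)              # D = -ln I

def restriction_site_divergence(sites_x, sites_y, r):
    """d = -ln(S) / r with S = 2*m_xy / (m_x + m_y).

    sites_x, sites_y: strings of '1' (site present) and '0' (absent).
    r: number of nucleotides in the enzyme recognition site.
    """
    m_x = sites_x.count("1")
    m_y = sites_y.count("1")
    m_xy = sum(1 for a, b in zip(sites_x, sites_y) if a == "1" and b == "1")
    s = 2 * m_xy / (m_x + m_y)
    return -math.log(s) / r

# Worked example from the notes: m12 = 14, m1 = 17, m2 = 22, r = 6
fragment_1 = "1111110000111111001011101"
fragment_2 = "1111111111001110111111111"
d_12 = restriction_site_divergence(fragment_1, fragment_2, 6)
print(round(d_12, 3))  # 0.055
```

Applied to the two fragments in the example, the sketch reproduces the 5.5% divergence given above. Two populations with identical allele frequencies give I = 1 and therefore D = 0.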
The observed proportion of differences underestimates the actual number of changes that have occurred, because there may have been multiple changes at some sites, such as A>G>C. One correction used to account for such multiple substitutions is the Jukes-Cantor correction:

$$\delta_{xy} = -\frac{3}{4} \ln\left(1 - \frac{4}{3} d_{xy}\right)$$

where $\delta_{xy}$ is the corrected estimate of sequence divergence and $d_{xy}$ is the observed proportion of nucleotide differences.

Example:
DNA Sequence 1: AGGCT GAGAG AGATA CCCCG GATAG CAGAT ACGAT ACGAT
DNA Sequence 2: AGGCC GAGAG AGATG CCCCG GGTAG CAAAT ATGAT ACGAT
                    *           *        *      *    *

$d_{xy}$ = 5/40 = 0.125
$\delta_{xy}$ = -3/4 ln(1 - [4/3]0.125) = 0.137, or 13.7% sequence divergence.

Once we have constructed a distance matrix for PAIRS of taxa, we require a method to group taxa based on their similarity (distance). A method that was once commonly used (but has since fallen out of favour) is UPGMA (Unweighted Pair-Group Method using Arithmetic means) (see example on handout). Many other methods have been developed, each designed to minimize the distortion of the original distance matrix on the final phenogram. In other words, the branch lengths on the phenogram should correspond to the actual genetic distances in the original data matrix.

Examples of other phenetic clustering methods:
Neighbor-Joining (Saitou and Nei. Molecular Biology and Evolution 4:406-425, 1987)
Fitch-Margoliash (Fitch and Margoliash. Science 155:279-284, 1967)

CLADISTICS

The cladistic approach involves grouping taxa on the basis of shared derived characters (apomorphies). In other words, organisms that share apomorphies are more closely related to one another than they are to taxa that do not possess the apomorphic character state. In order to use this approach, it is necessary to classify character states as derived or ancestral. The tree of relationships constructed using this approach is called a CLADOGRAM.

Assumption: Derived character states only evolve once.
Advantage: It is possible to reconstruct the pattern of character state changes from the cladogram.
Disadvantage: It may be difficult to determine which character states are ancestral and which are derived.

One way to polarize character states is to use an OUTGROUP. An outgroup taxon is chosen based on its close phylogenetic relationship to the group of taxa for which you are attempting to construct a phylogeny (the INGROUP). The choice of outgroup is very important; it is absolutely essential that the outgroup NOT be more closely related to any of the ingroup taxa than they are to one another. Character states are said to be ancestral if they are shared by the outgroup and any of the ingroup taxa. Character states that are unique to some subset of the ingroup taxa are said to be derived. These character states define relationships among the ingroup taxa.

We attempt to draw a cladogram that is consistent with the pattern of character state change for all of the characters in our study. In other words, we try to find a tree that requires each derived character state to evolve only once on the cladogram. Unfortunately, this is rarely possible. A character that changes between the ancestral and a particular derived state more than once on the cladogram is said to be HOMOPLASIOUS. Thus, we have the dilemma of trying to decide which of the many possible trees we could draw is the “best” tree. There are several criteria that are commonly used to make this decision.

Parsimony
We draw all the possible trees suggested by our data set, and then ask: how many total character state changes are required by each tree? The “best” tree is considered to be the tree that requires the fewest character state changes, or steps, to explain the data. This criterion is based on the idea that character state change is rare, so that the shortest tree is most likely to represent the true tree. This method is good if homoplasy is not dispersed among many characters.
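To make the step-counting idea concrete, here is a minimal Python sketch that scores candidate trees by the number of character state changes they require. It uses Fitch's algorithm, which is one standard way to count the minimum number of changes for unordered characters on a rooted binary tree; the notes do not name a particular algorithm, so this choice, the nested-tuple tree encoding, and the toy data for taxa A-D are all assumptions made for illustration.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for ONE character on a tree.

    tree:   nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D"))
    states: dict mapping taxon name -> character state
    """
    def visit(node):
        if isinstance(node, str):            # leaf: its observed state
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        shared = left_set & right_set
        if shared:                           # no change needed at this node
            return shared, left_steps + right_steps
        # disjoint state sets: one change must occur below this node
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]

def tree_length(tree, characters):
    """Total steps over all characters (the parsimony score of the tree)."""
    return sum(fitch_steps(tree, ch) for ch in characters)

# Hypothetical data: three binary characters scored in four taxa
characters = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 0},
]
candidate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for tree in candidate_trees:
    print(tree, tree_length(tree, characters))  # lengths: 4, 5, 6
best = min(candidate_trees, key=lambda t: tree_length(t, characters))
```

In this toy data set the tree grouping A with B requires 4 steps and is the most parsimonious; the second character is homoplasious on that tree because it must change twice.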
Compatibility
We draw all the possible trees suggested by our data set, and then ask: which tree is congruent (requires only a single character state change) with the highest number of characters? The “best” tree has the fewest homoplasious characters. This method is good if there are a few characters that seem to be evolutionarily labile (easily change between the ancestral and derived state).

Ideally, there is only one “best” tree. However, we often find that there are a large number of equally parsimonious or equally compatible trees. How do we decide which one is best? The analysis of methods to determine which trees are best is a very active area of evolutionary research. New methods are being proposed all the time. However, one example that has been widely used in the past (and still is) is BOOTSTRAPPING.

Bootstrapping
Bootstrapping involves constructing a large number of replicate data sets from the original data set by randomly choosing characters from the original data and then replacing them before choosing again. Suppose we have surveyed 100 characters. We construct a replicate data set by randomly choosing a character from among the 100 to include in our new data set. Then we “put it back” and choose again. We repeat this process until we have randomly chosen 100 characters for our new data set. This data set may not include some of the 100 original characters, but it may also include some of them many times. We repeat this process until we have constructed a large number (100 or more) of replicate data sets, each consisting of 100 characters. Then we construct trees from each of the data sets and keep the “best” trees according to our optimality criteria. A CONSENSUS tree is then constructed from this group of trees: it is the tree in which each monophyletic group of taxa occurs most frequently. The frequency with which each monophyletic group occurs among the set of trees is called the bootstrap value.
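The resampling procedure just described can be sketched in Python as follows. Only the sampling-with-replacement step and the tallying of group frequencies are shown; the tree-building step is passed in as `build_clades`, a hypothetical caller-supplied function that returns the set of monophyletic groups (as frozensets of taxon names) found in the best tree for a data set. The `toy_builder` below is a placeholder for illustration only, not a real tree-search method.

```python
import random

def bootstrap_replicate(characters, rng):
    """Draw len(characters) characters at random WITH replacement."""
    n = len(characters)
    return [characters[rng.randrange(n)] for _ in range(n)]

def bootstrap_support(characters, build_clades, n_replicates=100, seed=0):
    """Fraction of replicate trees in which each monophyletic group occurs.

    build_clades: hypothetical function(characters) -> set of frozensets,
                  standing in for parsimony/compatibility tree building.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_replicates):
        replicate = bootstrap_replicate(characters, rng)
        for clade in build_clades(replicate):
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: count / n_replicates for clade, count in counts.items()}

def toy_builder(chars):
    """Placeholder: each character is 1 if it supports grouping A with B,
    0 if it supports grouping A with C; the majority wins."""
    if 2 * sum(chars) > len(chars):
        return {frozenset({"A", "B"})}
    return {frozenset({"A", "C"})}

# Hypothetical data: 60 of 100 characters favour the (A, B) group
data = [1] * 60 + [0] * 40
support = bootstrap_support(data, toy_builder, n_replicates=200, seed=1)
for clade, value in support.items():
    print(sorted(clade), value)
```

Each replicate has the same number of characters as the original data set, so some characters are sampled several times and others not at all, which is what allows the support values to reflect how evenly the evidence for a group is spread across characters.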
The higher the value, the more confident we are in the proposed grouping.

MAXIMUM LIKELIHOOD
Maximum likelihood methods of phylogenetic inference evaluate a hypothesis about evolutionary history in terms of the probability that a proposed model of the evolutionary process and the hypothesized history would give rise to the observed data. In the case of phylogeny reconstruction, the data are observed nucleotide or protein sequences, and the unknowns are the branching order and branch lengths of a phylogenetic tree. We must specify a model that accounts for the conversion of one sequence into another. In some cases, parameters of the model (for example, patterns of substitution) can be estimated from the data. The maximum likelihood approach evaluates the probability that the model we have chosen would have generated the observed sequences. Phylogenies are inferred by finding those trees that yield the highest likelihood values.

PROBLEMS WITH PHYLOGENY RECONSTRUCTION
There are two major reasons why we may not find the true phylogeny with these methods.

1. Variation in evolutionary rates. Phenetics is particularly affected by changes in evolutionary rates. As a result of rapid evolutionary change within one lineage, it may appear to be quite divergent from even its closest relatives. On the other hand, its close relatives may appear to be similar to more distant taxa that have been evolving very slowly. Since phenetics groups taxa based on overall similarity, the fast-evolving lineage will be placed in the wrong position on the phenogram.

2. Homoplasy. When characters change state more than once during evolutionary history, they confuse our perception of phylogenetic relationships. There are three types of homoplasy we need to be concerned with:

a. Convergence - the evolution of a derived state from two different starting points. The descendants of two different lineages resemble each other more than did the ancestors.
This often occurs when a common problem is “solved” with a similar solution in two unrelated lineages, e.g., the wings of birds and bats, the sucking mouth parts of mosquitoes and true bugs, the tracheae of chelicerates and insects.

b. Parallelism - the evolution of a derived state by a similar pathway in two lineages that share a common ancestor, e.g., the parallel radiation of marsupial and placental mammals.

c. Evolutionary reversal - the loss of a derived state back to the ancestral state, e.g., the redevelopment of wings in a lineage of wingless insects (the ancestral state is winged).

These processes can cause problems for phenetics because they cause distantly related taxa to be more similar to one another than they are to their closest relatives. With a cladistic approach, we group taxa that share a derived character state. If that state has evolved independently several times, we will incorrectly group together all of the taxa that possess it when, in fact, they are not each other’s closest relatives. These sorts of phenomena are the reason that we do not get one “best” tree in a cladistic analysis (there are several equally parsimonious or compatible trees, depending on which characters are considered to be homoplasious).
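The first problem listed above, variation in evolutionary rates, can be illustrated with a small worked example. The sketch below is hypothetical and not taken from the notes: the taxa A, B, and C, the branch lengths, and the names are assumptions chosen for illustration. The true tree is taken to be ((A,B),C), with lineage A evolving ten times faster than the others, and the distance between two taxa is taken to be the total amount of change along the branches separating them.

```python
from itertools import combinations

# Hypothetical true tree ((A, B), C). Each value is the amount of change
# that accumulated along that branch; lineage A has evolved rapidly.
TERMINAL = {"A": 10.0, "B": 1.0, "C": 1.0}
INTERNAL = 1.0  # branch separating the ancestor of A and B from C's lineage


def distance(x, y):
    """Sum of the branch lengths on the path connecting taxa x and y."""
    d = TERMINAL[x] + TERMINAL[y]
    if {x, y} != {"A", "B"}:  # any path to C also crosses the internal branch
        d += INTERNAL
    return d


distances = {pair: distance(*pair) for pair in combinations("ABC", 2)}
most_similar = min(distances, key=distances.get)

print(distances)     # {('A', 'B'): 11.0, ('A', 'C'): 12.0, ('B', 'C'): 3.0}
print(most_similar)  # ('B', 'C')
```

A method that groups by overall similarity would join B with C first, because their distance (3) is the smallest, even though A and B are each other's closest relatives on the assumed tree.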