* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Interference Do crossovers interefere with one another? Or, if a
Essential gene wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Population genetics wikipedia , lookup
No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup
Gene desert wikipedia , lookup
Hardy–Weinberg principle wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Public health genomics wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Genomic imprinting wikipedia , lookup
History of genetic engineering wikipedia , lookup
Gene expression programming wikipedia , lookup
Minimal genome wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Genome evolution wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Ridge (biology) wikipedia , lookup
Genome (book) wikipedia , lookup
Designer baby wikipedia , lookup
Microevolution wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Gene expression profiling wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Interference Do crossovers interefere with one another? Or, if a crossover occurs in one region, does this somehow inhibit or interfere with the probability that a crossover will occur in a nearby region? We can address this question using three-point crosses, and comparing the number of double crossovers observed to that expected in the absence of interference. That is, if we assume that the occurrence of crossovers is independent of one another. In our first example three-point testcross above, we constructed the genetic map below, and we observed a total of 10 double recombinants out of a total of 510 progeny examined. C 11.8cM A 21.5cM B We can predict the proportion of double recombinants expected in the absence of intereference. That is, we assume whether a recombination event occurs between C and A is independent of whether one occurs in the region between A and B. this is simply obtained as the product of the two probabilites. So probability of double recombinant in the absence of interference is: Prob(double) = 0.118 x 0.215 = 0.025 Since we examined 510 progeny, the expected number is 510 x 0.025 = 12.75 . Recall that we observed 10 double recombinants. In this case there isn't much difference between the observed and expected. Interference is equals 1 - (observed double rec / expected double rec) I = 1 - 10 / 12.75 = 0.22, or about 22 percent. (is this statistically significant we'd need to carry out an appropriate statistical test and we'd need a reasonably large sample size). In our second example of a three-point cross we observed 20 double recombinants out of 1000 progeny and deduced the map below. R T S 28 cM 20 cM Expected proportion of double recombinants = 0.2 x 0.28 = 0.056 Expected number = 1000 x 0.056 = 56 I = 1- 20 / 56 = 0.64; Which happens to be about, 64%. Here, there may well be interference, and interference usually occurs at some level in most organisms and is particularly detectable for gene regions that are nearby . Goodness of fit test for linkage One issue we haven't yet addressed is a statistical test for linkage. If ratios for a testcross depart radically from a 1:1:1:1 ratio, we can perhaps be confident that linkage might occur, however, what if the ratios are off, but not by very much. We can test this using a chi-square test similar to our previous ones. In this case, however, we will do what often called a chi-square test for independence (that is we ask whether the genes segregate independently). The test is used in other branches of biology as well and may be called a test of independence, or a contigency test. So, imagine we had the following data Crossed AABB X aabb F1 all AaBb Female Testcross: AaBb Male aabb x AaBb Aabb aaBb 55 40 45 But he got: aabb 60 There certainly seem to be more parentals than recombinants, but is this just due to random variation or is there truly evidence for linkage here? To do the chi-squared test for independence, we need first to generate the expected numbers of progeny of each type assume there is no linkage and then compare this to the observed data. First organize the data in a square table as follows: Observed table Aa aa totals Bb 55 45 100 bb 40 60 100 totals 95 105 200 Using this observed table, we can estimate the proportion of each genotype we expect to see assuming independent assortment. so for AaBb we expect 95/200 x 100/200 = 0.2375 or to convert this to the expected Number of AaBb progeny 200 x 0.2375 = 47.5 for Aabb we expect 95/200 x 100/200 = 0.2375 or 47.5 progeny for aaBb we expect 105/200 x 100/200 = 0.2625 (x 200) give 52.5 progeny for aabb we expect 105/200 x 100/200 = 0.2625 (x 200) gives 52.5 progeny Observed table Bb 55 45 100 bb 40 60 100 totals 95 105 200 Expected table Bb Aa 47.5 aa 52.5 totals 100 bb 47.5 52.5 100 totals 95 105 200 Aa aa totals so now we have observed and expect so calculated the chi-square statistic as usual. Chisq = sum{(obs-exp)2 /exp} chisq = (55-47.5) 2 /47.5 + (40-47.5) 2 /47.5 + (45-52.5) 2 /52.5 +(60-52.5) 2 /52.5 = 4.51 How many degrees of freedom? Well here we have departure from before. We have 4 observed classes, so we lose 1 degree of freedom leaving 3. However, we used the data to obtain the expecteds. In fact, we estimated two independent probabilities from the data, the probability of being Aa and the probability of being Bb, so we lose 1 degree of freedom for each. So df = 4-1-2 = 1 The easy way to get degrees of freedom is just (nrows-1) x (ncolumns -1) = (2-1) x (2-1) =1 since χ2 1, 0.05 = 3.841 and our value is greater than this we reject that null hypothesis that the genes are unlinked (ie assort independtly) therefore we have evidence for linkage. Now estimate the recombination frequency as: r = 85/200 = 0.425 or 42.5 % Centromere mapping with tetrads Note that we've considered the mapping of genes, and DNA-based markers which could be a segment of a gene, and non-coding region of DNA, or even a single position on a DNA sequence (SNP). Using Neuropora tetrads, it is also possible to map the position of genes relative to the centromere. So let's return to our spore colour schematic in the asci of Neurospora. Here we've generated a Dd heterozygote, and are following meiosis. D gives dark spores, while d gives light coloured spores. Now consider the location of the spore colour gene, D/d relative to the centromere. Centromere D If there is no crossing over between the centromere and the locus D, then there are two possible patterns of spore formation This is because of the way in which spores are formed in a linear meitoic pattern and because the particular alleles remain attached to the partental centromeres in the absence of recombination 1st meiotic division Diploid meiocyte 2nd meiotic division mitosis Mature ascus with 8 ascospores And this reversed So here are the two asci type without recombination between D and centromere. d d d d D D D D If there is recombination between locus and centromere, this yields four possibilities So now let's imagine you do this cross and score a large number of asci. D D D D d d d d 280 d d d d D D D D 280 d d D D d d D D 10 D D d d D D d d 10 d d D D D D d d 10 D D d d d d D D 10 So, now we've counted the number of asci where there has been no recombination between the D locus and the centromere, as well as those where there has been recombination. Note that here we are counting entire products of meiosis, not simply counting each individual genotype. So let's consider one of the asci that had recombination. Recall that recombination will involve a pair of chromatids, leaving the other pair as parentals or non recombinants. Thus for each ascus that indicated recombination, only half of the progeny (or spores) are the result of recombination. So the recombination proportion r = (number recombinant asci/ 2)/total number of asci r = (40/2)/ 600 = 0.033 or 3.3cM Thus the map would appear as Centromere 3.3cm D Lod Scores and Human pegrees and linkage. One issue in human genetics is we can't force mating. People tend to be resistant to this. So, we must use the matings/pedigrees available to us. Furthermore, human families are typically small. As a result, a single family alone will typically not provide sufficient power to determine whether genes might, or might not be linked. This issue can be of considerable significance in human medical genetics, where we might wish to find a disease causing genes and the approach to use is typically to find linked genetic markers, to allow us to close in on the gene. To resolve this, one can combine information across a number of pedigrees, and an approach to doing this involves the use of Lod (log of the odds ratio) scores. Lod scores are also used in other branches of genetics, and they bear considerable resemblance to maximum likelihood ratios (a statistical method for estimation). The approach of using Lod scores, involves determining the probability of observing a particular set of progeny for a single family first assuming the two genes are not linked (i.e. assume independent assortment). It is then possible to construct the probability of observing the progeny assuming a range of different linkage values (or the value of, r, estimated from the family). The ratio of the probability with linkage, to that assuming no linkage is calculated (the so-called odds ratios) and then simply take the log10 of the number. The r value corresponding to the greatest Lod score, provides the best estimate of the r. The Lod scores can simply be added up across a number of pedigrees. If the Lod score exceeds 3 (this means probability of observing the progeny assuming linkage is 1000 times greater than that if the genes were unlinked) it is typically assumed that the genes are linked. An example perhaps illustrates this best. Imagine you are interested in a disease causing gene, d, as opposed to the normal allele D. And you also have a molecular marker with two alleles G1 and G2 where you can detect both alleles using RFLPs or some other molecular method. You also know the parental versus recombinant types because of information in the pedigree. You explore a number of family pedigrees for one where the there is a double heterozygote cross DdG1G2 x ddG2G2 (this is essentially a test cross) So you then observe that six progeny are produced. 2 are DG1 parental 2 are dG2 parental 1 is dG1 recomb 1 is DG2 recomb (we'd estimate r = 2/6 = 0.33) Now we can write out the probability of observing these progeny assuming various r values. First beginning with r = 0.5, or indept assort. Under independent assortment we expect to see 0.25 of each progeny type. So prob of observing these six progeny is .25x.25x.25x.25x.25x.25 = 0.000244 if r = .2 Then you expect to see 0.8/2 = 0.4 of each parental and 0.2/2 of each recomb. So probability is 0.4*.4*.4*.4*.1*.1 = 0.000256 if r=.333 0.3334 x 0.1672 = 0.000343 etc. for other probs. You can then do this for other r-values. Then construct the ratio, and then the Log10 of the ratio Tabulate below Probabilty Ratio Lod score .5 .4 .33 .3 .2 0.000244 1 0.000324 1.327104 0.12290496 0.000343 1.404453 0.14750706 0.000338 1.382976 0.14081464 0.000256 1.048576 0.02059991 0 So here we can see, not surprizingly that the highest LOD score is where r = 0.33 which is the estimate of we'd expect , r = 2recomb/6 progeny = 0.333. The LOD value for r = 0.33 falls well below 3.0 and so to ask whether these genes are truly linked we would need to examine more pedigrees segregating for these genes, and then simply add the Lod values together. If they exceed three, then we can be reasonably convinced the genes are linked, and we can estimate the recombination frequency from the data. This then might provide us with information we need to begin to find the gene (at the DNA level) for the disease allele D. LOD score LOD score 0.2 0.15 0.1 0.05 0 0.1 0.2 0.3 recom bination (r) 0.4 0.5