Gene mapping: Linkage and association methods
• Disease gene mapping is one of the main purposes of genotyping
• Two major approaches: linkage and association analyses

Linkage analysis
• Tries to localize genes affecting specific phenotypes
• Searches for cosegregation of disease and marker alleles

Basics of Linkage Analysis
1. Idea of Linkage Analysis
2. Types of Linkage Analysis
3. Parametric Linkage Analysis
4. Conclusions

Linkage Analysis
• One of the two main approaches in gene mapping.
• Uses pedigree data.

Genetic linkage and linkage analysis
• Two loci are linked if they lie close to each other on the same chromosome.
• The task of linkage analysis is to find markers that are linked to the hypothetical disease locus.
• Complex diseases are in focus; usually one needs to search for one gene at a time.
• Requires mathematical modelling of meiosis.

Meiosis and crossover
• The number of crossover sites is thought to follow a Poisson distribution.
• Their locations are generally random and independent of each other.

The simple idea
[Figure: disease locus (DIS) and marker, separated by the recombination fraction θ]
• Always: 0 ≤ θ ≤ 0.5
• Task: find the θ that maximises L(θ | data)
• Obtain a measure for the degree of evidence in favour of linkage (LOD score)

Markers and inheritance
[Figure: marker genotypes of father, mother and child]
• Polymorphic loci whose locations are known
• Most often SNPs or microsatellites
• Inherited within the chromosomes

Markers and information
• Two individuals share the same allele label → they share the allele IBS (identical by state)
• Two individuals share an allele with the same (grand)parental origin → they share the allele IBD (identical by descent)
• IBS sharing can easily be deduced from genotypes.
• IBD sharing requires more information. One can try to deduce IBD sharing based on family structure and inheritance.

Markers and information
Parents: 1,2 and 2,3. Children: 1,2 and 1,3.
• The children share allele 1 IBS.
• They also share it IBD.

Markers and information
Parents: 1,2 and 1,3. Children: 1,2 and 1,3.
• The children share allele 1 IBS.
• They do not share alleles IBD.

Markers and information
Parents: 1,1 and 2,3. Children: 1,2 and 1,3.
• The children share allele 1 IBS.
• They either share or do not share it IBD.

Building blocks of linkage analysis
• Marker maps
• Pedigree structures
• Genotypes
• Phenotypes
[Figure: marker maps and genotypes for Chr. 1, Chr. 2, ..., Chr. 22, with pedigree structures and phenotypes]

Building blocks of linkage analysis
• Information about the disease model (in parametric analysis)
– 0.99 (aa), probability of a homozygote being affected
– 0.8 (Aa), probability of a heterozygote being affected
– 0.001 (AA), probability of a non-carrier being affected (phenocopy rate)
– Assumed disease allele frequency
• Marker allele frequencies
• Information about environmental variables

Types of linkage analysis
• Parametric vs. non-parametric
• Dichotomous vs. continuous phenotypes
• Elston-Stewart vs. Lander-Green vs. heuristic
• Two-point vs. multipoint
• Genome scan vs. candidate gene
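The three sibling-pair examples under "Markers and information" can be checked mechanically. The following is a minimal Python sketch, not part of the original slides. It assumes that the first genotype pair in each example belongs to the parents and the second to the children, and it treats each parent's two alleles as two distinct chromosome slots. The function names `ibs_shared`, `transmissions` and `possible_ibd` are hypothetical.

```python
from itertools import product

def ibs_shared(g1, g2):
    """Number of alleles two genotypes share identical by state (0, 1 or 2)."""
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)
            shared += 1
    return shared

def transmissions(child, father, mother):
    """All (father_slot, mother_slot) pairs that could produce the child's genotype."""
    return [(i, j) for i, j in product(range(2), range(2))
            if sorted((father[i], mother[j])) == sorted(child)]

def possible_ibd(child1, child2, father, mother):
    """Set of IBD-sharing counts that are consistent with the observed genotypes."""
    counts = set()
    for (f1, m1), (f2, m2) in product(transmissions(child1, father, mother),
                                      transmissions(child2, father, mother)):
        # An allele is shared IBD when both children received the same parental slot.
        counts.add(int(f1 == f2) + int(m1 == m2))
    return counts

children = ((1, 2), (1, 3))
print(ibs_shared(*children))                      # IBS sharing needs genotypes only
print(possible_ibd(*children, (1, 2), (2, 3)))    # parents 1,2 and 2,3
print(possible_ibd(*children, (1, 2), (1, 3)))    # parents 1,2 and 1,3
print(possible_ibd(*children, (1, 1), (2, 3)))    # parents 1,1 and 2,3
```

IBS sharing is computed from the two genotypes alone, whereas the IBD result depends on the parental genotypes. With a homozygous parent the set contains more than one value, which is the "either share or do not share" case.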
Maximum likelihood estimation
• A common approach in statistical estimation
• Define hypotheses
• Generate likelihood function
• Estimate
• Test hypotheses
• Draw statistical conclusions

Hypotheses in linkage analysis
H0: θ = 0.5
– the disease locus is not linked to the marker(s)
HA: θ < 0.5
– the disease locus is linked to the marker(s)

Likelihood function for a single nuclear family
$L_j = \sum_{g_F} P(g_F)\, P(y_F \mid g_F) \sum_{g_M} P(g_M)\, P(y_M \mid g_M) \prod_i \sum_{g_{O_i}} P(g_{O_i} \mid g_F, g_M)\, P(y_{O_i} \mid g_{O_i})$
• The parameter θ is incorporated in the transmission term $P(g_{O_i} \mid g_F, g_M)$
• P(g) = genotype probabilities, P(y | g) = phenotype probabilities
• F = father, M = mother, O_i = offspring i

Several independent families
• The likelihood functions of multiple independent families are combined:
• $L = \prod_j L_j$ or $\log L = \sum_j \log L_j$

Testing of hypotheses
• Compute the values of the likelihood function under the null and alternative hypotheses.
• Their relationship is expressed by the LOD score (essentially derived from the likelihood ratio test statistic):
$\mathrm{LOD}(\theta') = \log_{10} \frac{L(\theta = \theta')}{L(\theta = 0.5)} = \log_{10} L(\theta = \theta') - \log_{10} L(\theta = 0.5)$

On significance levels
• The significance level gives the probability that a null hypothesis is rejected even though it is true; a result is significant when its p-value falls below that level.
• A LOD-score threshold of 3 corresponds to a single-test p-value of approximately 0.0001
• Often, the significant areas pointed out are quite large, from 10 to 40 cM (millions of base pairs)
[Figure: LOD score plotted against recombination fraction; axis ticks: LOD score 0.0, 0.5, 0.56; recombination fraction 0.0, 0.14, 0.5]
• LOD > 3 is taken as evidence of linkage.

Conclusions
• Linkage analysis is a pedigree-based approach to gene mapping.
• Parametric vs. nonparametric methods.
• Hypothesis-driven vs. explorative analysis.
• Meta-analysis (integration of several studies into "one big study") is becoming increasingly popular.

Fine mapping and association analysis
• After successful linkage analysis, what to do?
• How to refine the linked area: where exactly is the disease susceptibility locus?

Outline of the rest of the lecture:
• Allelic association
• χ2-test
• LD mapping

Allelic association
• An example: a leukaemia study, where a number of affected and healthy control persons have been contacted for DNA samples
• A candidate gene has been suggested: GSTM1, which functions in the metabolism of benzene
• GSTM1 has two different alleles, 1 and 2, where
– a person is "positive" for allele 1 if the genotype is 1 1 or 1 2
– a person is "null" if the genotype is 2 2
• The numbers of leukaemic and control individuals who are positive or null with respect to allele 1 are compared by a χ2-test to find out whether there is a statistically significant difference

Allelic association
Results:
[Tables: observed frequencies; expected frequencies]

Test statistic
• The observed frequencies are compared to the expected frequencies (null hypothesis H0: carrier status and disease occurrence are independent of each other)
• Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$, where
– o_i is the observed frequency for class i, e_i the expected frequency for class i
– k is the number of classes

Allelic association
• Now, χ2 = 111.39.
• Degrees of freedom for the test: df = (r-1)(s-1), where r = number of rows, s = number of columns. Here, df = (2-1)*(2-1) = 1
• The χ2 value is then compared to the critical values of the χ2 distribution under the null hypothesis (for the given df)

χ2-distribution: critical values for chosen significance levels

df\p    0.10    0.05    0.025   0.01    0.005
1       2.71    3.84    5.02    6.63    7.88
2       4.61    5.99    7.38    9.21    10.60
3       6.25    7.81    9.35    11.34   12.84
4       7.78    9.49    11.14   13.28   14.86
5       9.24    11.07   12.83   15.09   16.75
6       10.64   12.59   14.45   16.81   18.55
7       12.02   14.07   16.01   18.48   20.28
8       13.36   15.51   17.53   20.09   21.96
9       14.68   16.92   19.02   21.67   23.59
10      15.99   18.31   20.48   23.21   25.19
11      17.28   19.68   21.92   24.73   26.76

When the observed value of the test statistic is greater than the critical value (for the chosen significance level) given in the table, the null hypothesis can be rejected.

Allelic association
• The value we obtained, χ2 = 111.39, exceeds all critical values with df = 1 given in the table.
• We conclude that H0 can be rejected and thus there is a statistically significant difference between the affected and the healthy with respect to GSTM1 genotypes. The relative frequencies of 'null' and 'positive' genotypes show the same.
• It seems that different GSTM1 genotypes, by changing the benzene metabolism, considerably affect the probability of getting leukaemia.

• Note: compared to linkage analysis, which is based on the observed inheritance patterns in pedigrees, association analysis studies the correlation between allele presence and a disease at the population level.
• We find an allele or a haplotype overrepresented in affected individuals
→ BUT the statistical correlation does not imply a causal relationship!
→ Quite often, the associating allele or haplotype is not the cause of the disease itself, but is merely correlated with the presence of the actual susceptibility gene in the same chromosome. It is then said to be in linkage disequilibrium with the disease gene.
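Returning to the χ2-test used in the GSTM1 example: the steps (expected frequencies from the row and column totals, the test statistic, df = (r-1)(s-1), comparison with a critical value) can be sketched in a few lines of Python. This is only an illustrative sketch. The counts are hypothetical placeholders, not the leukaemia study's data, and the function name `chi_square_test` is an assumption.

```python
def chi_square_test(table):
    """Pearson chi-square test of independence for an r x s table of counts.

    Returns (statistic, degrees_of_freedom, expected_frequencies).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under H0 (independence), e_ij = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((o - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Hypothetical counts (NOT the study's data): rows = cases, controls;
# columns = "positive", "null".
observed = [[30, 70],
            [60, 40]]
statistic, df, expected = chi_square_test(observed)

CRITICAL_0_05_DF1 = 3.84  # from the table of critical values, df = 1, p = 0.05
print(f"chi2 = {statistic:.2f}, df = {df}")
print("reject H0" if statistic > CRITICAL_0_05_DF1 else "do not reject H0")
```

For these placeholder counts the expected frequencies are 45 and 55 in each row and the statistic is about 18.18, which exceeds the df = 1 critical value at p = 0.05.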
[Figure, panels A to C along a time axis: original mutation in one chromosome in the founder population; current generation; an affected pedigree]

LD mapping
• The marker itself is NOT the reason for the disease, but it is located near the disease susceptibility gene, and there is correlation between the presence of a certain marker allele and the disease gene allele (LD)
• The correlation, i.e. LD, is based on the founder effect: the disease allele arose a long time ago on a certain ancestral chromosome, and the majority of disease alleles existing at present descend from that original mutation

LD-mapping: Utilizing the founder effect
[Figure: example data with columns for disease locus, disease status, SNP1, S2, ...; rows of marker alleles with missing values marked "?"]

Many approaches, several programs
– "old-fashioned" allele association with some simple test (problem: multiple testing)
– TDT; modelling of the LD process: Bayesian, EM algorithm, integrated linkage & LD

Limitations: LD is a random process
• The amount of LD is under continuous but slow change, affected by the natural forces of
– genetic drift
– population structure
– natural selection
– new mutations
– founder effect
• Even if two pairs of loci are at exactly the same distance from each other, their amount of LD may vary a lot.
→ This limits the accuracy of LD mapping, though it is much more accurate in pinpointing the location of a disease gene than linkage analysis.
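To give a feel for the "continuous but slow change" of LD after a founder mutation, the sketch below uses the textbook expectation that recombination reduces LD by a factor (1 - θ) per generation, i.e. D_t = D_0 (1 - θ)^t. This formula is not taken from the slides. It is a deterministic average that ignores drift, selection, population structure and new mutations, which are the forces that make real LD values scatter around it. The function name `expected_ld` and the example values are assumptions.

```python
def expected_ld(d0, theta, generations):
    """Expected LD after a number of generations, assuming D_t = D_0 * (1 - theta)**t.

    d0          -- initial LD (e.g. 1.0 for complete LD at the founder mutation)
    theta       -- recombination fraction between marker and disease locus
    generations -- number of generations since the founder mutation
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d0 * (1.0 - theta) ** generations

# Hypothetical example: 100 generations since the founder mutation.
for theta in (0.001, 0.01, 0.1):
    remaining = expected_ld(1.0, theta, 100)
    print(f"theta = {theta}: expected fraction of initial LD = {remaining:.3f}")
```

Under these assumptions a marker at θ = 0.001 retains about 90% of the initial LD after 100 generations, and a marker at θ = 0.01 about 37%. Strong LD is therefore expected only over short distances, which is consistent with LD mapping giving finer localisation than linkage.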