Linkage analysis, linkage disequilibrium analysis, and joint analysis

Linkage and linkage disequilibrium analyses
LD analysis = simplified and approximate linkage analysis on an extremely large pedigree (i.e., a population) of unknown structure
[Figure: time axis showing the scopes of linkage analysis and LD analysis]

Possible tests of linkage and/or LD
[Diagram: four states, (no linkage, no LD), (linkage, no LD), (no linkage, LD) and (linkage, LD), connected by the possible tests]
• linkage in absence of LD
• LD in absence of linkage
• LD given linkage
• linkage given LD

Evidence of linkage can provide genotype and phase information to LD analysis (and vice versa)
[Figure: pedigree with D/+ individuals and marker alleles 1, 2, 3, showing two possible phases, I or II. Phase probability without linkage: Pr(I) = Pr(II). Phase probability with linkage: Pr(I) ≠ Pr(II).]

Why joint analysis of linkage and LD?
• to use as much information as possible from a dataset to map the position of a locus
• If a significant linkage signal has been obtained in a dataset, then the locus to be mapped obviously plays a substantial role in the etiology of the studied trait in that dataset. Therefore, the same (rather than a different) dataset should be used for fine mapping, e.g. by LD analysis.
• joint analysis is often more than the sum of its parts
• Differences in typical ascertainment protocol: While ascertaining affected singletons or trios with an affected offspring (as is typical in LD analysis) does not normally enrich for an underlying genetic etiology of the trait, ascertainment on the basis of multiple affected individuals per family (as is typical in linkage analysis) often does.

Why not dispense with linkage analysis altogether and go straight for genome-wide association analysis?
This is what proponents of the haplotype mapping (HapMap) project essentially suggest. There are at least 2 big problems, however.

Genetic heterogeneity
• locus homogeneity, allelic homogeneity
• locus homogeneity, allelic heterogeneity
• locus heterogeneity, allelic homogeneity (at each locus)
• locus heterogeneity, allelic heterogeneity (at each locus)
[Figure: the four scenarios drawn along a time axis]

Association analysis is much more susceptible to allelic heterogeneity than linkage analysis
• LD analysis? not okay
• linkage analysis? okay if the different D alleles (D/D/D) are alleles of the same locus
[Figure: pedigrees of D/+ individuals carrying different D alleles]

Retinitis Pigmentosa
[Figure: retinitis pigmentosa as a Mendelian disease. Contributing loci: RP1, RP2, RP3, RP7, RP9, RP11, RP12, RP13, RP14, RP15, CHM, Peripherin-RDS, ROM1, Rhodopsin, CNCG, LCA1, PDEA, PDEB, ABCR. Allele classes: sex-linked dominant, sex-linked recessive, autosomal dominant, autosomal recessive. Further inputs: other genetic factors, environmental and cultural factors. Outcomes: quality of life, quantity of life.]

Sample size requirements to detect a RP gene by affected sib-pair analysis
[Figure: number of sib-pairs required for 95% power at lod score 3 (one axis for autosomal dominant RP, ADRP, and one for autosomal recessive RP, ARRP) against the proportion of families with disease alleles in a given gene (0 to 0.3)]

Sample size requirements to detect a rhodopsin allele as a risk factor for RP by TDT analysis
[Figure: number of triads required for 95% power and p-value 0.0001 (0 to 500000) against the relative frequency of the given rhodopsin risk allele (0 to 0.25); curves labelled 0.2, 0.3, 0.4 and 0.5]

Allelic heterogeneity: examples of COL1A1&2 genes
From: Weiss KM (1993) Genetic variation and human disease: principles and evolutionary approaches. Cambridge University Press, Cambridge

Genome-wide number of tests is much larger for association analysis than linkage analysis
→ more stringent test criterion is required

Linkage analysis and LD analysis
• linkage analysis: examining recombination status of individual meioses (R?)
• LD analysis: examining history of recombination status of multiple meioses (R in ≥1 meiosis?)
R: recombination

Haplotype conservation: example of hereditary hemochromatosis
Thomas et al (1998) A haplotype and linkage disequilibrium analysis of the hereditary hemochromatosis gene region. Hum Genet 102:517

Decay of LD by recombination
[Figure: degree of LD (0 to max) against genetic distance (M, from -0.2 to 0.2), with one curve each after 1, 5, 10, 20, 50 and 100 generations]

Definition of linkage disequilibrium (LD) and allelic association
Either term refers to the situation where alleles at different loci do not occur independently of each other on haplotypes, irrespective of the underlying cause of the non-independence.
Let $a_{ij}$ denote allele $j$ at locus $i$. The two alleles $a_{11}$ and $a_{21}$ at loci 1 and 2 are in LD/allelic association if and only if
$P(a_{11} a_{21}) \neq P(a_{11})\,P(a_{21}) \iff P(a_{21} \mid a_{11}) \neq P(a_{21})$.

Forces creating and destroying LD
Terwilliger, Weiss (1998) Linkage disequilibrium analysis of complex disease: fantasy or reality?
Current Opinion in Biotechnology 9:578

Sources of LD
• “founder effect” (the allele of the trait locus, along with the surrounding haplotype, in individuals with the trait is shared IBD from a common ancestor)
• drift (random fluctuation of haplotype frequencies from generation to generation)
• admixture (migration between populations with different allele frequencies at the loci of interest)
• interaction between alleles at different loci (epistasis)
• poor matching of case and control samples (difference in allele frequencies is unrelated to the alleles at the trait locus; “comparison of apples and oranges”)
Presence/amount of LD is a function of the genetic distance between the loci.

[Figure: haplotypes over time]
• start: haplotypes A1, B1, C1
• a mutation occurs: haplotypes A1, B1, B2, C1 (complete disequilibrium)
• recombination occurs: haplotypes A1, A2, B1, B2, C1 (incomplete disequilibrium)
• time passes, more recombination occurs: equilibrium
At equilibrium the haplotype frequencies are the product of the allele frequencies: p(A1) = p(A) p(1)

“Founder effect”
Initially, when the first copy of the trait allele is introduced into the population, the allele is present on a particular haplotype. As the allele is passed on through generations, alleles at neighboring marker loci are cotransmitted in a hitchhiking effect. Recombination occasionally breaks the haplotype, reducing the length of the conserved haplotype and the amount of LD.

Haplotype sharing due to a “founder effect”
The apparently unrelated individuals in the sample of individuals with the trait received the same disease allele from a common ancestor; these individuals are therefore very distant relatives in reality.
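The statement that recombination reduces the amount of LD over the generations can be made concrete with a small sketch. It assumes the textbook expectation for a large, randomly mating population with no new mutation, in which LD between two loci shrinks by a factor (1 - θ) per generation, θ being the recombination fraction. The function names are hypothetical and chosen only for this illustration.

```python
import math


def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """Expected LD after `generations` of random mating.

    Assumes D_t = D_0 * (1 - theta)**t, i.e. a large random-mating
    population without drift, selection or new mutation.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - theta) ** generations


def generations_to_fraction(theta: float, fraction: float) -> float:
    """Generations until LD has dropped to `fraction` of its initial value."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    return math.log(fraction) / math.log(1.0 - theta)


if __name__ == "__main__":
    # Fraction of the initial LD that remains at several recombination fractions.
    for theta in (0.001, 0.01, 0.1, 0.5):
        remaining = [ld_after_generations(1.0, theta, t) for t in (1, 10, 100)]
        print(theta, [round(r, 4) for r in remaining])
```

Under these assumptions, LD with a marker at θ = 0.01 takes roughly 69 generations to halve, whereas LD with an unlinked marker (θ = 0.5) halves every generation.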
[Figure: example genealogy drawn along a time axis]

Principle behind LD mapping based on admixture
Assume that 2 populations, both genetically homogeneous but genetically very different from each other, colonize a previously uninhabited island. Assume that the alleles at different loci in each population are in linkage equilibrium, and that a rare “Mendelian” trait, with causative allele(s) “D”, is only present in one of the two populations.

If one sampled case and control individuals from the joint population (in the initial generation, before mating between the two colonizing populations has taken place), one would be able to detect LD between the trait and many markers, irrespective of genetic distance between the loci. This is because all cases would have been ascertained from the population harboring the trait, and the marker allele frequencies between cases and controls would differ for any marker with different allele frequencies in the two colonizing populations. (This is equivalent to getting “false positives” due to poorly matching case and control groups.)

Assume that subsequently there is random mating in the joint population. The initial LD will decay rapidly due to recombination for all markers but those tightly linked to the trait locus. If one sampled cases and controls after several generations of random mating, one would therefore detect LD only with markers near the trait locus, demonstrating the potential usefulness of admixture-based LD mapping.

Be aware that LD between a pair of loci will only result if the founding populations have different allele frequencies at both loci.
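The last point can be checked numerically. The sketch below assumes two founding populations that are each in linkage equilibrium and are pooled in proportions m and 1 - m; it computes the LD in the pooled population directly from the haplotype and allele frequencies. Algebraically the result equals m(1 - m)(pA1 - pA2)(pB1 - pB2), so it vanishes whenever either locus has the same allele frequency in both populations. The function and parameter names are hypothetical.

```python
def admixture_ld(m: float, p_a1: float, p_b1: float,
                 p_a2: float, p_b2: float) -> float:
    """LD between alleles A and B right after pooling two populations.

    m          : proportion of the pooled population from population 1
    p_a1, p_b1 : frequencies of alleles A and B in population 1
    p_a2, p_b2 : frequencies of alleles A and B in population 2
    Assumes linkage equilibrium within each founding population.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    # Haplotype AB frequency: within-population products, then pooled.
    hap_ab = m * p_a1 * p_b1 + (1.0 - m) * p_a2 * p_b2
    # Allele frequencies in the pooled population.
    p_a = m * p_a1 + (1.0 - m) * p_a2
    p_b = m * p_b1 + (1.0 - m) * p_b2
    return hap_ab - p_a * p_b


if __name__ == "__main__":
    # Trait allele present only in population 1; marker frequencies differ.
    print(admixture_ld(0.5, 0.1, 0.8, 0.0, 0.2))
    # Same trait setup, but the marker has equal frequencies in both populations.
    print(admixture_ld(0.5, 0.1, 0.5, 0.0, 0.5))
```

In the first call the pooled population shows LD even though neither founding population does; in the second call the marker frequencies match and no LD arises, regardless of the trait allele frequencies.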
Ideal population for LD mapping based on “founder effect”
• very small, homogeneous founder population
• rapid subsequent population growth
• for detection of LD: few generations since population was founded
• for fine mapping: many generations since population was founded
• panmixia
• no admixture
• homogeneous environment
• large enough population to have a sufficient number of individuals with trait of interest
• availability of genealogical records, high medical standards, favorable public and private attitudes towards genetic research

Ideal population for LD mapping based on drift
• small population size
• no population growth
• many generations since population was founded
• panmixia
• no admixture
• homogeneous environment
• large enough population to have a sufficient number of individuals with trait of interest
• availability of genealogical records, high medical standards, favorable public and private attitudes towards genetic research

Ideal population for LD mapping based on admixture
• admixing populations are each homogeneous and genetically very different from each other
• for detection of LD: few generations since population was founded
• for fine mapping: many generations since population was founded
• panmixia in admixed population
• no admixture after initial mixing of populations
• homogeneous environment
• large enough population to have a sufficient number of individuals with trait of interest
• availability of genealogical records, high medical standards, favorable public and private attitudes towards genetic research

Measures of strength of LD

                                 alleles of locus B (marker)
                                 1         2         total
alleles of locus A (trait)   D   p_D1      p_D2      p_D
                             +   p_+1      p_+2      p_+
                         total   p_1       p_2       1

$\delta = p_{D1} - p_D\,p_1 = p_{D1}\,p_{+2} - p_{D2}\,p_{+1}$
$\delta \in [\delta_{\min}, \delta_{\max}]$, with $\delta_{\min} = \max(-p_D\,p_1,\; -p_+\,p_2)$ and $\delta_{\max} = \min(p_D\,p_2,\; p_+\,p_1)$
$\delta' = \delta / \delta_{\min}$ if $\delta < 0$; $\delta' = \delta / \delta_{\max}$ if $\delta > 0$; $\delta' \in [0, 1]$

Testing for presence of LD

                          marker alleles
                          1        2        total
cases (“affected”)        n11      n12      n1·
controls (“unaffected”)   n21      n22      n2·
total                     n·1      n·2      n

Do the marker alleles occur in equal proportions among the cases and controls?
If not, and there is a significant difference in allele frequencies, the marker locus is probably at a short genetic distance from the trait locus of interest.
null hypothesis (H0): proportions are equal
alternative hypothesis (H1): proportions are not equal (2-sided alternative)

1) Fisher’s “exact” test computes exact p-values based on the hypergeometric distribution; computationally intensive
2) chi-squared test uses a continuous distribution ($\chi^2$) to (approximately) represent categorical data; is therefore only appropriate when all cell counts are large, say, > 5; not computationally intensive

Chi-squared test
With $n_{i\cdot}$ and $n_{\cdot j}$ denoting the row and column totals and $n$ the grand total:
$$X^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(\mathrm{obs}_{ij} - \mathrm{exp}_{ij})^2}{\mathrm{exp}_{ij}} = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - n_{i\cdot}\,n_{\cdot j}/n)^2}{n_{i\cdot}\,n_{\cdot j}/n} \sim \chi^2_{(I-1)(J-1)} \text{ under } H_0.$$
For a 2 × 2 table,
$$X^2 = \frac{n\,(n_{11} n_{22} - n_{12} n_{21})^2}{n_{1\cdot}\,n_{2\cdot}\,n_{\cdot 1}\,n_{\cdot 2}} \sim \chi^2_1 \text{ under } H_0.$$
Often, a “continuity correction” is applied on a 2 × 2 table:
$$X^2 = \frac{n\,(|n_{11} n_{22} - n_{12} n_{21}| - n/2)^2}{n_{1\cdot}\,n_{2\cdot}\,n_{\cdot 1}\,n_{\cdot 2}} \sim \chi^2_1 \text{ under } H_0.$$

Chi-squared test: multi-allelic marker case

                          marker alleles
                          1      2      3      ...    m      total
cases (“affected”)        n11    n12    n13    ...    n1m    n1·
controls (“unaffected”)   n21    n22    n23    ...    n2m    n2·
total                     n·1    n·2    n·3    ...    n·m    n

Either perform a separate test for each allele individually (by collapsing all other alleles): m tests on 2 × 2 tables, requiring correction for multiple testing (e.g. Bonferroni correction), or perform one chi-squared test on the whole table (with m-1 degrees of freedom).
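As a worked illustration of the 2 × 2 chi-squared test, the following sketch computes the statistic with and without the continuity correction and converts it to a p-value using the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable. The function name and the example counts are hypothetical, and clamping the corrected difference at zero is an added safeguard rather than part of the formula on the slide.

```python
import math


def chi2_2x2(n11: int, n12: int, n21: int, n22: int,
             continuity: bool = False) -> tuple[float, float]:
    """Chi-squared statistic and p-value (1 df) for a 2 x 2 allele count table.

    Rows are cases and controls, columns are marker alleles 1 and 2.
    """
    row1, row2 = n11 + n12, n21 + n22   # totals for cases, controls
    col1, col2 = n11 + n21, n12 + n22   # totals for allele 1, allele 2
    n = row1 + row2
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    diff = abs(n11 * n22 - n12 * n21)
    if continuity:
        # Continuity correction; clamped at 0 (assumption, see lead-in).
        diff = max(diff - n / 2, 0.0)
    x2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df: P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts: cases 60/40, controls 40/60.
    print(chi2_2x2(60, 40, 40, 60))
    print(chi2_2x2(60, 40, 40, 60, continuity=True))
```

For a multi-allelic marker, the same function could be applied to each allele after collapsing all other alleles into one column, with the resulting m p-values then corrected for multiple testing.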
Measured genotype analysis: A fixed effects model in which genotype-specific means are estimated

Quantitative Trait Linkage Analysis: Variance Component Approach
Modeling the Phenotype:
$$p = \mu + \sum_i \beta_i x_i + \sum_j q_j + a + e$$
• $\mu$: baseline mean
• $\beta_i$: regression coefficients
• $x_i$: scaled covariates
• $q_j$: QTL effects
• $a$: residual genetic effects
• $e$: random environmental effects

Genotypes as covariates
If effect of QTL is modeled as additive:

Genotype   Cov
AA         -1
Aa          0
aa          1

To allow for non-additive models:

Genotype   Cov1 (Add)   Cov2 (Dom)
AA         -1           0
Aa          0           1
aa          1           0

FXII levels by FXII 46C/T genotype

Genotype      CC       CT      TT
FXII levels   128.88   92.23   55.58
$p < 1 \times 10^{-7}$

Prothrombin levels by G20210A genotype
[Figure: prothrombin activity levels (%), axis from 50 to 190, for genotypes G/G, G/A and A/A; $p < 1 \times 10^{-7}$]

Disequilibrium is unpredictable.
A QTL may be in equilibrium with the other polymorphisms surrounding it. Disequilibrium need not be present.
[Figure: LD within F7 gene]
[Figure: POMC: Pattern of LD]

Caution! Negative results in an association study have implications only for the marker you have tested, not necessarily for the entire candidate gene.
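The genotype coding used in the measured genotype approach can be illustrated with the FXII genotype means reported in these slides. The sketch below assumes the additive and dominance covariates from the coding tables, so that the genotype-specific means are baseline - a for AA, baseline + d for Aa and baseline + a for aa. Mapping CC, CT and TT onto AA, Aa and aa is an assumption made only for this illustration, and the function names are hypothetical.

```python
# Covariate coding from the "Genotypes as covariates" tables.
ADDITIVE = {"AA": -1, "Aa": 0, "aa": 1}
DOMINANCE = {"AA": 0, "Aa": 1, "aa": 0}


def effects_from_means(mean_AA: float, mean_Aa: float,
                       mean_aa: float) -> tuple[float, float, float]:
    """Baseline, additive effect and dominance deviation from genotype means."""
    baseline = (mean_AA + mean_aa) / 2   # midpoint of the two homozygotes
    additive = (mean_aa - mean_AA) / 2   # change per unit of the additive covariate
    dominance = mean_Aa - baseline       # heterozygote deviation from the midpoint
    return baseline, additive, dominance


def predicted_mean(genotype: str, baseline: float, additive: float,
                   dominance: float = 0.0) -> float:
    """Genotype-specific mean implied by the coded covariates."""
    return (baseline
            + additive * ADDITIVE[genotype]
            + dominance * DOMINANCE[genotype])


if __name__ == "__main__":
    # FXII means from the slide, with CC -> AA, CT -> Aa, TT -> aa (assumed mapping).
    baseline, additive, dominance = effects_from_means(128.88, 92.23, 55.58)
    print(baseline, additive, dominance)
    for genotype in ("AA", "Aa", "aa"):
        print(genotype, predicted_mean(genotype, baseline, additive, dominance))
```

With these three means the heterozygote lies at the midpoint of the two homozygotes, so the dominance deviation is zero up to rounding and the purely additive coding already reproduces all three genotype means.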