Human Genetics, part I
Liisa Kauppi (Keeney lab)

Mapping Mendelian and complex diseases
- Linkage mapping in pedigrees
- Association mapping in populations

Genes and Environment
“Natural” mutants only

Heritability: first-degree relatives of a patient are at greater risk
For type I diabetes, λ = 15 (6% risk in relatives / 0.4% in the general population)

Twin studies:
• Adopted (separated in infancy)
• Fraternal vs. identical twins
• Biological vs. non-biological siblings

Genes and Human Disease
[Figure: traits arranged between genetic and environmental causes; axis labels EASY and HARD]
MENDELIAN: high penetrance, single gene
COMPLEX/MULTIFACTORIAL DISEASE: polygenic, reduced penetrance
Pure environment
Traits shown: cystic fibrosis, blood type, asthma, osteoporosis, schizophrenia, “common disease”, height, body weight, language, infectious disease, snakebite

Polymorphic markers are needed for disease mapping
Microsatellites
• Tandem arrays of simple repeats, for example (CA)n, n = 15…27
• MULTIALLELIC
Single nucleotide polymorphisms (SNPs), e.g. A/G
• Abundant, perhaps 1 every 300 bp
• RFLPs
• Mostly non-coding
• BI-ALLELIC

Genotype frequencies: Hardy-Weinberg equation
B allele has frequency p, b allele has frequency q, p + q = 1

          p (B)        q (b)
p (B)     $p^2$ (BB)   $pq$ (Bb)
q (b)     $pq$ (Bb)    $q^2$ (bb)

$p^2$ (BB) + $2pq$ (Bb) + $q^2$ (bb) = 1

Hardy-Weinberg equilibrium
How are recessive traits maintained in a population?
HWE of allele frequencies: $p^2 + 2pq + q^2 = 1$
Hypothetical example: in Sardinia, 1 in 5 individuals has straight hair. This trait is determined by a single gene and is recessive.
S allele = curly hair, s allele = straight hair
Frequency of s/s homozygotes is 0.2
Frequency of s allele is $\sqrt{0.2} \approx 0.45$
Frequency of S allele is 1 - 0.45 = 0.55

Gametes for next generation:

            S (0.55)                  s (0.45)
S (0.55)    $0.55^2 \approx 0.3$      0.55 x 0.45 ≈ 0.25
s (0.45)    0.55 x 0.45 ≈ 0.25        $0.45^2 \approx 0.2$

Frequencies of genotypes and alleles remain unchanged from one generation to the next.

HWE allows calculation of carrier frequencies for recessive traits (with caution)
Example: cystic fibrosis, alleles CF and cf
Incidence 1/2000 births
$p^2 + 2pq + q^2 = 1$
Frequency of cf/cf homozygotes is 0.0005
Frequency of cf allele is $\sqrt{0.0005} \approx 0.022$
Frequency of CF allele is 1 - 0.022 = 0.978
Frequency of CF/cf heterozygotes is 2 x 0.978 x 0.022 = 0.043

So what if genotypes at a locus are not in HWE?
$p^2 + 2pq + q^2 = 1$
Suggests that assumptions are not met
Example: heterozygote deficit could arise from recent admixture

                         B freq   b freq   n      BB + Bb + bb (frequencies)   BB + Bb + bb (counts)
Population 1             0.9      0.1      1000   0.81 + 0.18 + 0.01           810 + 180 + 10
Population 2             0.1      0.9      1000   0.01 + 0.18 + 0.81           10 + 180 + 810
Combined, HWE expected   0.5      0.5      2000   0.25 + 0.5 + 0.25            500 + 1000 + 500
Combined, observed                         2000                                820 + 360 + 820

Departure from HWE (heterozygote excess): the prion protein gene and human disease
• PRNP gene linked to prion diseases, e.g. CJD, kuru
• A common polymorphism, M129V, influences the course of these diseases: the MV heterozygous genotype is protective
• Kuru acquired from ritual cannibalism was reported (1950s) in the Fore people of Papua New Guinea, where it caused up to 1% annual mortality
• Departure from Hardy-Weinberg equilibrium for the M129V polymorphism is seen in Fore women over 50 (23/30 heterozygotes, P = 0.01)

Linkage studies: recombination in a family
How often are 2 loci separated by meiotic recombination?
[Pedigree figure, generations I to III: 2 loci on the same chromosome; informative and uninformative meioses; generation III offspring scored NR, NR, R, NR, NR, R]
Recombination fraction is 2/6 = 0.33

Recognizing recombinants
Does the disease segregate with this marker?
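Returning to the Hardy-Weinberg arithmetic earlier in this section (the hypothetical straight-hair example, the cystic fibrosis carrier frequency, and the admixture example), the calculations can be reproduced with a short Python sketch. The function names are illustrative and not part of the lecture; the code assumes a single biallelic locus and that HWE holds.

```python
import math

def hwe_from_recessive_incidence(incidence):
    """Allele and genotype frequencies implied by HWE when the
    frequency of recessive homozygotes (q^2) is known."""
    q = math.sqrt(incidence)   # recessive allele frequency
    p = 1.0 - q                # dominant allele frequency
    return {"p": p, "q": q, "carriers": 2 * p * q, "affected": q * q}

def hwe_expected_counts(n_BB, n_Bb, n_bb):
    """Genotype counts expected under HWE, given observed counts."""
    n = n_BB + n_Bb + n_bb
    p = (2 * n_BB + n_Bb) / (2 * n)   # frequency of allele B
    q = 1.0 - p
    return (p * p * n, 2 * p * q * n, q * q * n)

# Hypothetical straight-hair example: s/s frequency 0.2
hair = hwe_from_recessive_incidence(0.2)

# Cystic fibrosis: incidence 1/2000 births
cf = hwe_from_recessive_incidence(1 / 2000)

# Admixture example: pooled observed counts 820 + 360 + 820
expected = hwe_expected_counts(820, 360, 820)

print(f"s allele frequency: {hair['q']:.3f}")
print(f"cf allele frequency: {cf['q']:.4f}, carriers: {cf['carriers']:.4f}")
print("HWE expected counts for pooled sample:", expected)
```

Without intermediate rounding the carrier frequency comes out near 0.044 rather than the 0.043 obtained from the rounded allele frequencies. For the pooled sample, the expected 1000 heterozygotes against 360 observed is the heterozygote deficit described for admixture.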
[Pedigree figure, generations I to III: generation II genotypes 2-1 and 3-4; generation III genotypes 3-1, 3-2, 4-1, 4-1, 4-2, 3-2; five offspring scored NR, one scored R]
Recombination fraction is 1/6 = 0.167

Recognizing recombinants
Often samples are missing
[Pedigree figure: generation I not typed, so phase in generation II is unknown and each offspring is either NR or R]
Recombination fraction is 1/6 = 0.167 or 5/6 = 0.833

Recognizing recombinants
Tracing additional family members can help
[Pedigree figure: the same family with additional relatives typed (genotypes 5-6, 1-5 and 1-6)]
But are these identical by descent?

Which marker is the disease locus closest to?
Lod scores
Logarithm of odds (lod) score Z:

$$Z = \log_{10} \frac{\text{likelihood of loci being linked}}{\text{likelihood of loci not being linked}}$$

For the example pedigree with 1/6 recombinants:

$$Z = \log_{10} \frac{(1 - 0.167)^5 \times 0.167}{(0.5)^6} = 0.632$$

• Lod scores between -2 and +3 are inconclusive
• Below -2: exclusion
• Above +3: linkage
• Requires a precise genetic model

Which marker is the disease locus closest to?
Multi-point lod scores
[Figure: multi-point lod scores across chr 3p12-14, Waardenburg syndrome type 2. After Hughes et al. (1994) Nature Genet 7, 509-512]

Multifactorial diseases (no simple Mendelian inheritance pattern)
Sib-pair analysis
Parents: 2-1 and 3-4. First sib: 3-2.

Second sib   Number of shared parental alleles   Probability
3-2          2                                   1/4
3-1          1                                   1/2 (3-1 or 4-2)
4-2          1
4-1          0                                   1/4

Affected sib-pairs
Which loci do the affected sibs share more often than expected by chance?
[Figure: families with parents 2-1 and 3-4; affected sib pair 3-2 and 3-1 shares 1 parental allele, affected sib pair 3-2 and 3-2 shares 2]

Detecting linkage in pedigrees can be complicated… and you need lots of meioses!

Association mapping in a population
Cases vs. controls
HLA-DR4 allele (UK):
• General population: 36%
• Rheumatoid arthritis patients: 78%
Seek correlation between genotype and phenotype
Allele B is associated with disease D if people who have D also have B more often than predicted from B’s frequency
To test every polymorphism is too expensive

Linkage disequilibrium (LD) measures association between two alleles
Mutation creates new variants
[Figure: a new allele arising on one chromosome among existing variants]
Initially, the new allele is in LD with nearby alleles: LD value = 1
Recombination reshuffles existing variation
[Figure: a crossover between the two sites]
LD diminishes
If enough crossovers take place, the loci are in “free association”
Commonly used LD measures: D’ and $r^2$

Haplotypes are sets of markers inherited as a “package”
Meiotic recombination creates novel haplotypes
Markers form haplotype blocks in the population

LD is a measure of allelic association in a population
2 SNP loci on the same chromosome, C/G and A/T
[Figure: observed allele combinations at the two SNPs]
Fewer than 4 combinations -> LD
Conversely: all 4 combinations -> low or no LD
But also: population history, drift, selection…

Disease haplotypes shorten from one generation to the next

Recombination hotspots are key in shaping haplotype blocks
Perhaps at least 90% of crossovers take place at highly localized hotspots
[Figure: HLA class II, recombination activity and haplotype blocks. Kauppi et al. (2004) Nat Rev Genet 5, 413-424]

How do you extract haplotypes from genotype data?
[Figure: blood DNA typed as A/T at one SNP and C/G at the other could carry haplotypes A-C and T-G, or A-G and T-C?]
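The phase ambiguity in the question above can be made concrete with a small Python sketch that lists every pair of haplotypes compatible with an unphased genotype. The function name and the input format (one allele pair per SNP) are assumptions made for illustration only.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an
    unphased genotype, given as one (allele, allele) tuple per SNP."""
    pairs = set()
    # Try every assignment of the two alleles at each SNP to the two chromosomes.
    for flips in product([False, True], repeat=len(genotype)):
        hap1 = "".join(b if flip else a for (a, b), flip in zip(genotype, flips))
        hap2 = "".join(a if flip else b for (a, b), flip in zip(genotype, flips))
        # Sorting removes the distinction between "hap1/hap2" and "hap2/hap1".
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

# Double heterozygote: A/T at the first SNP, C/G at the second.
double_het = compatible_haplotype_pairs([("A", "T"), ("C", "G")])

# Homozygous at the first SNP: phase is unambiguous.
single_het = compatible_haplotype_pairs([("A", "A"), ("C", "G")])

print(double_het)
print(single_het)
```

A double heterozygote yields two possible haplotype pairs, whereas a genotype that is heterozygous at only one site yields one. The number of possibilities doubles with each additional heterozygous SNP, which is why extra information is needed to resolve phase.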
Other family members
Other individuals in population
[Figure: reference genotypes A A, C C and T T, G G]

Data just released: A haplotype map of the human genome, Nature 437, 1299-1320

HapMap project
Examines haplotypes in four populations
DNA samples: 270 people in total
• Yoruba (Nigeria): 30 parent-child trios
• Whites with North and West European ancestry (USA): 30 trios
• Japan: 45 unrelated individuals
• China: 45 unrelated individuals
Identify “haplotype tag SNPs” to minimize genotyping effort
>3,500,000 SNPs typed in total

Limited within-block diversity
Example: an 8.5-kb block on chr 2, 36 SNPs typed
In principle, could give rise to $2^{36}$ different haplotypes
Only seven different haplotypes found among 120 European chromosomes

Recombination hotspots are widespread and account for LD structure
[Figure: 7q21. The International HapMap Consortium]

Pairwise tagging

SNP   Alleles   Haplotypes observed
1     A/T       A  A  T  T
2     G/A       G  G  A  A
3     G/C       G  C  G  C
4     T/C       T  C  C  C
5     G/C       G  C  G  C
6     A/C       A  C  C  C

High $r^2$: SNPs 1 and 2; SNPs 3 and 5; SNPs 4 and 6
After Carlson et al. (2004) AJHG 74:106
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6

The Common-Disease Common-Variant Hypothesis
• Says that disease-predisposing variants
– will exist at relatively high frequency (i.e. >1%) in the population.
– are ancient alleles occurring on specific haplotypes.
– are detectable in a case-control study using tagging SNPs.
• Alternative hypothesis says
– disease-predisposing alleles are sporadic new mutations, perhaps around the same genes, on different haplotypes.
– families with a history of the same disease owe their condition to different mutation events.

Does same phenotype mean same genotype?
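The pairwise tagging scheme shown earlier can be sketched in Python. This is a simplified greedy binning, not necessarily the exact procedure of Carlson et al.: the four haplotypes are read off the tagging figure, and giving them equal frequencies is an assumption, since the slide states none. The threshold of 0.8 and the function names are likewise illustrative.

```python
# Four haplotypes across SNPs 1-6, as read from the tagging figure.
# Equal haplotype frequencies are ASSUMED; the slide gives none.
HAPLOTYPES = ["AGGTGA", "AGCCCC", "TAGCGC", "TACCCC"]

def r_squared(haps, i, j):
    """Pairwise r^2 between SNP columns i and j (0-based)."""
    n = len(haps)
    a, b = haps[0][i], haps[0][j]      # arbitrary reference alleles
    p_a = sum(h[i] == a for h in haps) / n
    p_b = sum(h[j] == b for h in haps) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haps) / n
    d = p_ab - p_a * p_b               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0

def greedy_bins(haps, threshold=0.8):
    """Group SNPs so every member has r^2 >= threshold with the
    first (tag) SNP of its bin. SNP numbers are 1-based."""
    unassigned = list(range(len(haps[0])))
    bins = []
    while unassigned:
        tag = unassigned[0]
        members = [s for s in unassigned
                   if s == tag or r_squared(haps, tag, s) >= threshold]
        bins.append([s + 1 for s in members])
        unassigned = [s for s in unassigned if s not in members]
    return bins

bins = greedy_bins(HAPLOTYPES)
print(bins)
```

Under these assumptions the six SNPs fall into three bins, so three tag SNPs suffice. Any member of a bin can serve as its tag, which is why the slide can list SNP 6 where this sketch would pick SNP 4.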
Coding SNPs, nonsynonymous or synonymous
“Regulatory” SNPs

Common Gene Variation in Complex Disease
• Case-control studies comparing the frequencies of common gene variants can identify susceptibility and protective alleles
• Some have multiple identified genes (*)

Phenotype                 Gene     Variant
IDDM*                     HLA      DR3,4
Alzheimer dementia        APOE     E4
Deep venous thrombosis    F5       Leiden
Colorectal cancer         APC      3920A
NIDDM                     PPARγ    12A

Other types of variation may also have a role in complex disease
• common copy number polymorphisms
• large-scale rearrangements, deletions and insertions
• microsatellite expansions, small insertions/deletions, etc.
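As a rough illustration of the case-control comparison, the HLA-DR4 figures quoted earlier (36% in the general UK population, 78% in rheumatoid arthritis patients) can be turned into an odds ratio. This sketch treats the general population as the control group, which is an assumption, and because the slides give no sample sizes it makes no statement about statistical significance. The function name is illustrative.

```python
def odds_ratio(freq_cases, freq_controls):
    """Odds ratio for carrying a variant, from its frequency in
    cases and in controls (both as proportions between 0 and 1)."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls

# HLA-DR4: 78% of rheumatoid arthritis patients vs. 36% of the
# general population (used here as a stand-in for controls).
hla_dr4_or = odds_ratio(0.78, 0.36)
print(f"HLA-DR4 odds ratio: {hla_dr4_or:.2f}")
```

An odds ratio above 1 points to a susceptibility allele and one below 1 to a protective allele. A real study would also need the case and control counts to attach a confidence interval or test statistic.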