Download Lecture 3 Human Genetics

Note that the genetic map is different for men and women Recombination frequency is higher in meiosis in women How do we extend the map to identifying disease genes? • The CEPH families were instrumental in constructing the map • But our goal is to map human diseases •You rarely get large multi-generational highly informative families • How do we get to a lod score of 3 with small families? That’s the awesome power of logarithms Science (2006) 312:279-282 Recall: The small family (5 kids) and Mom (informative) was either: Dd 12 dd 22 D d D d 1 2 OR 2 Dd 12 dd 12 Dd 22 Dd 22 1 dd 12 What if there were no crossovers? L = [(0.9)5 + (0.1)5 ]/ 2 (0.5)5 Z = log10 L = 0.97 (For one crossover, Z = 0.021, the crossover penalty) Add the lod scores from different families This is the same as multiplying probabilities What is the probability of two coin flips and getting two heads? 0.5 x 0.5 = 0.25 (product rule in statistics) If the same markers are in two different families, then they are independent 4 or 5 small families, and a small number of crossovers, should suffice Works extremely well for DNA markers, more problematic for diseases If the disease (phenotype) is caused mutations in either of two genes, then mixing lod scores will confound the analysis (called heterogeneity) Autosomal Recessive Use IBD Mapping Look for homozygous region in affected individuals, Not homozygous in unaffected individuals IDB preserves the haplotype Similar principle as Linkage Disequilibrium Except it is with individuals, not populations Disequilibrium Mapping A way to map genes using populations Instead of using pedigress, use all of the patients We are interested in haplotypes Haplotype 1 2 alleles 2 alleles 2 alleles C T A C A A C G CTTCC[1396bp]GAAGCTCAGAAAGG GAAAGGAAAAGAAGATTT G GATAATATAAAAAATAT[2502bp]TTGGGAATTTACA AATAC Haplotype 2 CTTCC[1396bp]GAAGCTCAGAAAGG GAAAGGAAAAGAAGATTT GATAATATAAAAAATAT[2502bp]TTGGGAATTTACA AATAC Haplotype 3 GAAAGGAAAAGAAGATTT CTTCC[1396bp]GAAGCTCAGAAAGG GATAATATAAAAAATAT[2502bp]TTGGGAATTTACA AATAC Consider five loci each with two alleles A B C D E A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 How Many Haploytpes? Individual = A1 B2 C1 D2 E1 A2 B2 C1 D1 E2 Two haplotypes In theory there are 25 (32) possibilities IF the combinations are independent In practice, far fewer (5-10 in sub-Mb distances) WHY? Some SNPs are “old” Example A1 and A2, D1 and D2 If they are in Hardy Weinberg equilibrium,then 4 haplotypes A1 A1 A2 A2 A B C D E A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 D1 D2 D1 D2 A new SNP arises (B2), but in just one haplotype A1 B2 D1 A1 A1 A2 A2 B1 B1 B1 B1 D1 D2 D1 D2 New Haplotype Even later, two new SNPs arise (C2 and E2) A1 B2 C1 D1 E1 A1 B1 C1 D1 E1 A1 B1 C1 D2 E1 A2 B1 C1 D1 E1 A2 B1 C1 D2 E1 A1 B1 C2 D1 E1 A2 B1 C1 D1 E2 So we end up with a total of 7 haplotypes for 5 SNPs There is a possibility of recombination between SNPs However, this is very slow and improbable, especially for short distances Now consider that a disease mutation arises between C and D Just like the SNPs, it is likely to have arisen once And it is in only one of the common 7 haplotypes Therefore the SNP alleles in that haplotype are correlated with the mutation This is the principle of DISEQUILIBRIUM MAPPING It depends on: 1. Age of the mutation 2. Age of the SNPs in the haplotype 3. Age of the population 4. Frequency of recombination (distance between) SNPs Disequilibrium mapping is particularly useful when: There is a relatively new disease mutation Relatively isolated (and hopefully new) population (Finland) Population Allele Frequencies 1 2 A B 0.3 0.7 0.2 0.8 C 0.8 0.2 * D E 0.7 0.3 0.2 0.8 If equilibrium, patients should have same allele frequencies If disequilibrium, patients should have increased frequencies near the disease gene The degree of deviation should be maximal near the disease gene Simple Case: Autosomal dominant disease arises between C and D of a particular genotype A1 B2 C1 A Few Generations Later: * D2 E1 Allele Frequencies Population Patients 0.3 0.8 0.3 0.3 0.2 1 1 1 1 1 0.6 0.9 1 1 0.9 0.4 0.8 1 1 Over time: (Patients only) Later 0.7 Deviation from Population Frequency Distance along Chromosome Disease Gene So this is it………………. How do we find the gene and the mutation? We need to make the correlation with the genetic map (for example distance in cM) to the physical map (DNA) Most important to have the physical map annotated All methods give a map location (Maximum Likelihood) Distance along Chromosome Disease Gene Point your browser to genome.ucsc.edu Identify the genes in the interval Look for best candidate Expression data (is the gene expressed in affected tissue?) Is expression of the gene affected in patients? Ultimately we must search for mutations DNA sequencing is best (SSCP is usually done first) Does the mutation make sense For example, recessive= loss of function SNP chips Lots of possibilities Great Dane x Mexican Chihuahua F1 Big (Great Danes) 3 Big : 1 Small? Not Likely………….The sum total of many gene….multigenic Many human disorders, conditions and predispositions are multigenic Twin studies where identical twins are raised together or raised apart Look at complex behaviors and ask if they are genetic or environment Answer: For almost every single behavior…..it’s a little of both “Heritability” or the fraction of the condition that is genetic But how many genes? Association studies…..use SNP chips and the awesome power of Computational Biology

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Lecture 3 Human Genetics