Presentation #2 - UCLA Human Genetics
HG236B, April 30, 2010 (Lusis)

Mouse Genetics: Gene Mapping
1. Mendelian traits
2. Quantitative trait locus mapping
3. Fine mapping
4. Association analysis

Genome Scan
[Figure: genetic markers distributed across chromosomes 1-19, X and Y]

Breeding Strategies for Mapping Genes in Mice: Backcross
Parental Strain #1 (1A/1A, 2A/2A) x Parental Strain #2 (1B/1B, 2B/2B)
F1 Heterozygote (1A/1B, 2A/2B) x Parental Strain #2 (1B/1B, 2B/2B)
Backcross Progeny:
  parental:     1A/1B, 2A/2B    and    1B/1B, 2B/2B
  recombinant:  1A/1B, 2B/2B    and    1B/1B, 2A/2B

Linkage Analysis in a Backcross: An Example

Class         Locus 1   Locus 2   Expected in absence of linkage   Observed
Parental      A/A       B/B       25                               30
Parental      a/A       b/B       25                               32
Recombinant   A/A       b/B       25                               20
Recombinant   a/A       B/B       25                               18
Total                             100                              100

Estimated distance: 38/100 = 38 cM
But, are these data significant?

Chi Squared Test
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
O_i = observed value in ith group
E_i = expected value in ith group
number of groups = n
degrees of freedom = n - 1

$\chi^2 = \frac{(obs_r - exp_r)^2}{exp_r} + \frac{(obs_p - exp_p)^2}{exp_p} = \frac{(38 - 50)^2}{50} + \frac{(62 - 50)^2}{50} = \frac{144}{50} + \frac{144}{50} = 5.76$

Number of degrees of freedom = one less than the number of potential outcome classes = 1
With 1 degree of freedom, 5.76 lies between 3.84 (p = 0.05) and 6.64 (p = 0.01) in the table below, so the deviation from the no-linkage expectation is significant at the 0.05 level.

Chi-Square Distribution
Rows: degrees of freedom. Columns: probability. (Table labels: Nonsignificant for the higher probabilities, Significant for the lower ones.)

df    0.95    0.90   0.80   0.70   0.50   0.30    0.20    0.10    0.05    0.01    0.001
1     0.004   0.02   0.06   0.15   0.46   1.07    1.64    2.71    3.84    6.64    10.83
2     0.10    0.21   0.45   0.71   1.39   2.41    3.22    4.60    5.99    9.21    13.82
3     0.35    0.58   1.01   1.42   2.37   3.66    4.64    6.25    7.82    11.34   16.27
4     0.71    1.06   1.65   2.20   3.36   4.88    5.99    7.78    9.49    13.28   18.47
5     1.14    1.61   2.34   3.00   4.35   6.06    7.29    9.24    11.07   15.09   20.52
6     1.63    2.20   3.07   3.83   5.35   7.23    8.56    10.64   12.59   16.81   22.46
7     2.17    2.83   3.82   4.67   6.35   8.38    9.80    12.02   14.07   18.48   24.32
8     2.73    3.49   4.59   5.53   7.34   9.52    11.03   13.36   15.51   20.09   26.12
9     3.32    4.17   5.38   6.39   8.34   10.66   12.24   14.68   16.92   21.67   27.88
10    3.94    4.86   6.18   7.27   9.34   11.78   13.44   15.99   18.31   23.21   29.59

Ikeda, et al.
Nature, 30:401 (2002)

Inbred strains of mice differ in traits relevant to common diseases in humans

Naturally Occurring Mouse Models for Common Human Diseases

Disorder                     Strain
Alcoholism/drug addiction    C57BL/6
Arthritis                    MRL
Asthma                       A
Atherosclerosis              C57BL/6, DBA
Autoimmune disease           NZB, NZW
Cleft palate                 A
Deafness                     LP
Dental disease               C57BL/6, BALB/c
Diabetes, Type 1             NOD
Diabetes, Type 2             C57BL/6
Epilepsy                     EL, SWR
Hemolytic anemia             NZB
Hepatitis                    BALB/c
Hodgkin's disease            SJL
Hypertension                 MA/My
Obesity                      Many strains
Osteoporosis                 DBA

[Figure: Daily Average Food Intake Adjusted by Weight (g/30gBW)]

Mapping Genes for a Complex Trait in a Cross between Two Strains of Mice
[Figure: Strain A x Strain B, giving F1 Hybrids, then F2 Intercross Mice]

Hepatic fibrosis in 7 inbred strains and A x BALB/c F2 mice
Screen inbred strains for the trait of interest to identify those that differ the most. Construct an F2 cross using the 2 extreme strains (A and BALB/c) to generate a large number of mice to map loci responsible for trait differences in the parental strains.
[Figure: 382 A x BALB/c F2 mice, 6 wk time point; GENOTYPE and PHENOTYPE]

A backcross between two strains typed for a trait
The backcross mice were typed at a marker on Chr 1 and another on Chr 2.

Linear regression model
[Figure: y plotted against x, with fitted function f(x)]
y = βx + e
y = observed value (ex: weight = 2.2, 2, 4, ...)
x = value of the predictive variable (ex: SNP genotype = AA, GG, AA). x is observed.
β = slope, the expected change in y for one unit change in x
e = unobserved random variable, which adds noise to the observed y (contributes to variation in y). Sometimes referred to as "error", although it is not necessarily error.

Mapping using linear regression
[Figure: simple case, phenotype y plotted against genotype xi (A/A, A/G, G/G)]
y = βxi + e
y = observed phenotype for each individual, ex: weight
xi = genotype at a given marker
β = slope, gives the change in y for each unit change in x.
β = SNP effect size
e = remaining variation in phenotype y, not explained by xi

– With linear regression, a likelihood ratio is used, derived from the goodness of fit of the model with genetic effects included vs. the model without:
• H1: y = µ + β1(additive) + β2(dominant) + e (full model) vs.
• H0: y = µ + e (reduced model)
• The likelihood ratio LR = n log_e(RSS_reduced / RSS_full)
– RSS = residual sum of squares; n = number of observations
• LOD score = LR/4.61

Distribution of body weight and body weight QTL in B6 x BTBR ob/ob F2 cross
Stoehr et al. Diabetes 59:245 (2004)

Identifying QTL
• Interval mapping:
– "Simple interval mapping" incorporates marker map position and adjacency
– "Composite interval mapping" additionally incorporates background markers and is designed for detecting multiple QTL.
• A number of QTL mapping programs have been developed. (List at: http://www.mapmanager.org/qtsoftware.html)
– MAPMAKER/QTL
– Map Manager QTL
– QTL Cartographer
– R/qtl

"…we can effectively destroy any association between the trait values and the analysis points linked to the QTL by randomly shuffling the trait values, i.e., by reassigning each trait value to a new individual while retaining the individual's genetic map."

The standard error for an empirical p-value is the square root of p(1 − p)/N, where p is the empirical p-value and N is the number of permuted data sets. Thus, for example, 800 permuted data sets are sufficient to establish a standard error of 0.005 for an empirical p-value of 0.02, assuring us that it is well below the 0.05 significance level.

QTL analysis is highly reproducible

Estimating QTL effect size in crosses
• Total trait (y) variance is the sum of genetic and environmental components, determined in the F2 mice by: Variance = s^2 = Σ(x − mean)^2 / (n − 1)
• Environmental variance is estimated from parental strain data as: (s_a^2 + s_b^2) / 2
• Overall genetic variance is: Total − Environmental variance. (Heritability is this genetic variance expressed as a fraction of the total variance.)
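The LOD score, permutation standard error, and variance formulas above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, the divisor 4.61 is the constant given on the slide (approximately 2 ln 10), and all input values other than the 800-permutation example are made-up illustration numbers.

```python
import math
import statistics


def lod_score(rss_reduced, rss_full, n):
    """LOD = LR / 4.61, with LR = n * ln(RSS_reduced / RSS_full)."""
    lr = n * math.log(rss_reduced / rss_full)
    return lr / 4.61


def empirical_p_se(p, n_perm):
    """Standard error of an empirical p-value: sqrt(p * (1 - p) / N)."""
    return math.sqrt(p * (1.0 - p) / n_perm)


def genetic_variance(f2, parent_a, parent_b):
    """Genetic variance = total (F2) variance - environmental variance.

    Environmental variance is the mean of the two parental-strain
    sample variances, as on the slide.
    """
    total = statistics.variance(f2)  # sample variance, divisor n - 1
    environmental = (statistics.variance(parent_a)
                     + statistics.variance(parent_b)) / 2
    return total - environmental


# Slide example: 800 permuted data sets, empirical p-value 0.02.
print(round(empirical_p_se(0.02, 800), 3))  # 0.005

# Hypothetical residual sums of squares for 100 mice.
print(lod_score(rss_reduced=120.0, rss_full=100.0, n=100))
```

When the full model fits no better than the reduced model, the two residual sums of squares are equal and the LOD score is 0; the better the genetic model fits, the larger the ratio and the LOD score.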
• QTL effect size is the % of total variance explained by a given QTL. The effect size of most QTL is under 10%.
Flint NRG 2005

The resolution of QTL is generally poor, and thus identification of the causative gene is a bottleneck
• QTL mapping began in the early 1990s
• By 2005, approximately 2,050 mouse and 700 rat QTLs had been reported
• Only 20 causative genes identified
• At this rate (20 genes/15 years) it will take 1500 years to identify causative genes for already identified QTL
• What approaches might help?
Flint et al. 2005

Strategies to dissect quantitative trait loci for gene identification
[Figure: Parent 1 x Parent 2, leading to congenic strains, recombinant inbred lines, chromosome substitution strains, recombinant inbred congenic lines, and advanced intercross lines]

Construction of Congenic Strains
[Figure: Strain A x Strain B; N1: 25% Strain B; N2: 12.5% Strain B; repeated backcrossing for ~10 generations gives the congenic strain A.B-Locus]

Fine mapping using "subcongenic strains"
[Figure: subcongenic strains carrying BALB or MRL segments, each scored High or Low for plasma cholesterol, defining the critical region]

Strategies to dissect quantitative trait loci for gene identification

Construction of Recombinant Inbred (RI) Strains

Chromosome 15 genotypes in recombinant inbred (RI) strains derived from C57BL/6 (filled) and DBA/2 (open)
[Figure: genotypes by RI strain number]

The "Collaborative Cross" - multi-line RI strains
Design: 8 progenitor strains, x 1000 lines
[Figure: representative genotype distribution]
Benefits: more genetic variation; finer mapping; cumulative data; single genotyping

Strategies to dissect quantitative trait loci for gene identification
Demant P.
(2003)

Strategies to dissect quantitative trait loci for gene identification
Science, 2004

Strategies to dissect quantitative trait loci for gene identification

Genome-wide genetic association of complex traits in heterogeneous stock mice
William Valdar, Leah C Solberg, Dominique Gauguier, Stephanie Burnett, Paul Klenerman, William O Cookson, Martin S Taylor, J Nicholas P Rawlins, Richard Mott & Jonathan Flint
Nat. Genet 38, 879 (2006)
[Figure: boxes above peaks are 95% confidence intervals and corresponding bootstrap probabilities]

History of laboratory inbred strains of mice

Laboratory inbred strains are mosaics derived from several widely divergent subspecies
from Frazer KA, Eskin E, Kang HM et al. Nature. Aug 2007
http://mouse.cs.ucla.edu/

Linear Model
[Figure: two panels of phenotype (axis 200-400) by SNP allele]
Alleles G/C: y = μ + βx + ε. Associated: β ≠ 0
Alleles T/C: y = μ + βx + ε. Not associated: β = 0

Association studies
[Figure: -log10(P value) plotted along the chromosome, p = 0.001 threshold marked]

Complex genetic relatedness of lab strains
[Figure: body weight (axis 10.0-35.0) across strains]
Eun Yong Kang, Chris Jones, E. Eskin

Efficient Mixed Model Association (EMMA) reduces inflated p-values
[Figure panels: body weight, t-test; body weight, EMMA; saccharin preference, t-test; saccharin preference, EMMA]
Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, Eskin E. Genetics. 2008; 178:1709-23.

Linear Mixed Model
Fixed effects + Random effects (aka variance components)
Let's break it up....
Random effects are not 'random', they are Random Variables!
A Random Variable (RV) is a variable (an event) that takes on values with a given probability.
Examples:
a) Roll a die, let U denote the number observed: p(U=1) = 1/6, p(U=2) = 1/6...
b) Roll two dice, let U denote the sum of the two numbers observed: p(U=1) = 0, p(U=2) = (1/6) x (1/6), etc...
c) Let U ~ identity by descent at a locus, between sibs: p(U=0) = 1/4, p(U=1) = 1/2, p(U=2) = 1/4.
The values of U occur with a given probability; they are not fixed, hence U is a random variable.

y = βX + Zu + e
Variance components:
var(u) = σ_g^2 K
var(e) = σ_e^2
σ_g^2 K is the n x n variance-covariance matrix. It describes the covariance structure among strains, i.e. the additive genetic variance σ_g^2 is proportional to the kinship K.
The kinship itself is not random, it's a constant.
By including K, we allow part of the genetic variance to be explained by K.

Inbred/recombinant inbred population for high resolution mapping: whole genome association
• ~40 inbred strains, >135,000 SNPs. Classical inbred strains provide mapping resolution.
• ~70 recombinant inbred strains, >135,000 SNPs. Recombinant inbred strains provide statistical power.
• Data collection: phenomic, transcriptomic, proteomic, metabolomic

Whole genome association
[Figure: plasma high density lipoproteins (n = 8 males/group)]

Chromosome 1 locus for HDL
40 Mb (~300 genes)

Chromosome 1 locus for HDL: Validation of ApoA2 gene involvement
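The variance-component structure of the linear mixed model above can be made concrete with a small numerical sketch. This is an illustration only, not EMMA itself: it assumes one phenotype per strain (so Z is the identity matrix) and independent residuals (var(e) = σ_e^2 I, the usual convention), under which the phenotypic covariance is V = σ_g^2 K + σ_e^2 I. The function name, the kinship matrix, and the variance values are all hypothetical.

```python
def phenotypic_covariance(kinship, sigma2_g, sigma2_e):
    """Return V = sigma2_g * K + sigma2_e * I, assuming Z is the identity."""
    n = len(kinship)
    return [
        [sigma2_g * kinship[i][j] + (sigma2_e if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical kinship among three strains (symmetric, 1.0 on the diagonal).
K = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.25],
    [0.0, 0.25, 1.0],
]

V = phenotypic_covariance(K, sigma2_g=2.0, sigma2_e=1.0)
for row in V:
    print(row)
```

Strains that share ancestry (nonzero kinship) get a nonzero phenotypic covariance, while unrelated strains get zero; this is the sense in which K lets part of the genetic variance be explained by relatedness rather than by the SNP being tested.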