Reverse genetics:
Quantitative Trait Locus (QTL) mapping
Association mapping

Integrating Mendelian and Quantitative Genetics using molecular techniques
[Figure: a Mendelian trait (marker alleles A1 and A2 scored in individuals 1-10, genotypes 12, 11, 22, 22, 11, 22, 12, 11, 22, 12) compared with a quantitative trait (distribution of Height). Courtesy of Glenn Howe]

Identifying Genes Underlying Phenotypes
Linkage and quantitative trait locus (QTL) analysis
- Need a pedigree with segregating traits
- Linkage map with a moderate number of markers
- Very large regions of chromosomes are represented by each marker

Quantitative Trait Locus Mapping
[Figure: Parent 1 x Parent 2 cross with marker alleles A/a, B/b, C/c, followed by F1 x F1 progeny; axes: HEIGHT vs. GENOTYPE (BB, Bb, bb)]

"Genetic architecture" of quantitative traits
QTL studies can reveal the following facets of the genetic architecture of a quantitative trait:
- Number of genes underlying the trait
- The strength of effect of each gene
- Additive vs. dominant effects
- Potential interactions among genes
- Ultimately, the "QTN" or the actual genes involved

Quantitative Trait Locus Analysis
Step 1: Make a controlled cross to create a large family (or a collection of families)
- Parents should differ for phenotypes of interest
- Segregation of trait in the progeny
Step 2: Create a genetic map
- Large number of markers genotyped for all progeny
Step 3: Measure phenotypes
- Need phenotypes with moderate to high heritability
Step 4: Detect associations between markers and phenotype using a model
Step 5: Identify underlying molecular mechanisms

Step 1: Construct Pedigree
- Cross two individuals with contrasting characteristics
- Create a population with segregating traits
- Ideally: inbred parents are crossed to produce F1s, which are intercrossed to produce F2s
- Recombinant Inbred Lines are created by repeated inbreeding of the F2 descendants
- Allows precise phenotyping and isolation of allelic effects
Grisel 2000 Alcohol Research & Health 24:169

Step 2: Construct Genetic Map
- Based on nonrandom association of alleles at different loci in the pedigree
- Calculate pairwise likelihood of linkage
- Gives an overview of the structure of the entire genome
- Most efficient with anonymous markers: AFLP
- Codominant markers are much more informative: SSR

Step 3: Determine Phenotypes of Offspring
- Phenotype must be segregating in the pedigree
- Must differentiate genotype and environment effects. How?
- Works best with phenotypes with high heritability
- Heritability: proportion of total phenotypic variance due to genetic effects. Why is this important?
[Figure: panels for heritability 0.1, 0.5, and 0.9]
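One way to see why heritability matters for Step 3 is a small simulation. The sketch below is only illustrative: it assumes phenotype = genetic value + independent environmental noise, fixes total phenotypic variance at 1, and uses hypothetical function names (`squared_correlation`, `simulate_genotype_phenotype_r2`). Under those assumptions the squared correlation between genetic value and phenotype should land close to the heritability, so at 0.1 the measured phenotype says little about the underlying genotype.

```python
import random


def squared_correlation(xs, ys):
    """Return the squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


def simulate_genotype_phenotype_r2(h2, n=20000, seed=1):
    """Simulate phenotype = genetic value + environmental noise.

    Assumption: total phenotypic variance is 1, so the genetic variance
    is h2 and the environmental variance is 1 - h2 (illustrative values).
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(0.0, h2 ** 0.5) for _ in range(n)]
    phenotype = [g + rng.gauss(0.0, (1.0 - h2) ** 0.5) for g in genetic]
    return squared_correlation(genetic, phenotype)


# Heritability values shown in the Step 3 figure panels.
for h2 in (0.1, 0.5, 0.9):
    print(h2, round(simulate_genotype_phenotype_r2(h2), 2))
```

The simulated values vary with the seed and sample size; the point is only that the fraction of phenotypic variation traceable to genotype tracks the heritability.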
Step 4: Detect Associations between Markers and Phenotypes
- Single-marker associations are simplest
  - Simple ANOVA, correcting for multiple comparisons
  - Log likelihood ratio: LOD (log10 of odds)
- If the QTL is between two markers, the situation is more complex
  - Recombination between QTL and markers (genotype doesn't predict phenotype)
  - 'Ghost' QTL due to adjacent QTL
  - Use interval mapping or composite interval mapping
  - Simultaneously consider pairs of loci across the genome

Step 5: Identify underlying molecular mechanisms
[Figure: chromosome showing a Genetic Marker, the QTL, the QTG (Quantitative Trait Gene), and the QTN (Quantitative Trait Nucleotide). Adapted from Richard Mott, Wellcome Trust Centre for Human Genetics]

QTL mapping: model for a single marker locus
Cross: A Q / a q  x  a q / a q
- Marker locus A and quantitative trait locus Q recombine at rate r
- Qq genotype has mean $\mu_{Qq}$; qq genotype has mean $\mu_{qq}$
- Offspring Aa has mean $\mu_{Aa} = \mu_{Qq}(1-r) + \mu_{qq}\,r$
- Offspring aa has mean $\mu_{aa} = \mu_{Qq}\,r + \mu_{qq}(1-r)$
- QTL effect $= \mu_{Qq} - \mu_{qq} = (\mu_{Aa} - \mu_{aa})/(1-2r)$
- Recombination rate is confounded with QTL effect

QTL mapping: model for flanking marker loci
Cross: A Q B / a q b  x  a q b / a q b, with recombination rates r1 (A to Q) and r2 (Q to B)
- In the simplest case, two markers A and B flank the QTL
- Enough degrees of freedom to estimate the QTL effect separately from recombination
- "Interval mapping": estimate QTL effect in a sliding window along the marker map
- Many approaches developed...

QTL map in Douglas fir (bud opening date)
Figure 2. Seven QTL for terminal bud flush were detected in the growth initiation experiment. QTL were found on six linkage groups (2, 3, 4, 5, 12, and 14) and were detected in five of the six treatment combinations.
Jermstad et al. (2003) Genetics, Vol. 165, 1489-1506

QTL Vary by Year, Site, and Population
Loblolly pine QTL measured in different years at the same site, in different sites, and with a different genetic background
[Figure: QTL for % latewood and wood-specific gravity; stippled: not repeated across years. Brown et al]

Drawbacks of QTL mapping
- Results are often difficult to reproduce, and vary by year, pedigree and location
- Multiple experiments are needed to confirm results, but experiments are large undertakings (population size, genotyping, phenotyping)
- Even if a QTL is localized to a few cM, this could correspond to 1000s of kb of DNA, containing many genes
- As controlled crosses are used, only a fraction of natural variation is surveyed
- Biased towards detecting large-effect QTL, as small-effect QTL are not statistically significant

Association Genetics
- Methods for associating phenotypes with SNPs
- Effects of population structure
- Candidate gene approaches

QTL mapping vs. association genetics

Indirect vs. direct association

Two approaches to association studies
Population-based
- Cases (affected individuals) and unrelated population controls (unaffected individuals) collected from "one" population
- Effects of population structure can be incorporated
Family-based
- Parent-child trios with the TDT design are the most common
- Robust to effects of population structure

Case-control association test
- The simplest method
- Compare SNP frequencies of affected vs. unaffected
- Chi-square test with one degree of freedom

              Genotype "AA"   Genotype "Aa"   Total
affected      a               b               a+b
unaffected    c               d               c+d
Total         a+c             b+d             N

$\chi^2_1 = \dfrac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)}$, where $N = a+b+c+d$

Case-Control Example: Diabetes
- Knowler et al. (1988) collected data on 4920 individuals from Pima and Papago Native American populations in the Southwestern United States
- High rate of Type II diabetes in these populations
- Found significant associations with an Immunoglobulin G marker (Gm)
- Does this indicate underlying mechanisms of disease?
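The case-control statistic defined above can be sketched as a small function. This is a minimal illustration, not a full analysis pipeline: the function names are hypothetical, the p-value uses the standard identity for a chi-square with 1 degree of freedom, and the counts in the usage lines are the Gm haplotype counts from the diabetes example in this section.

```python
import math


def case_control_chi_square(a, b, c, d):
    """Chi-square statistic (1 df) for a 2x2 table laid out as

               col 1   col 2
        row 1    a       b
        row 2    c       d
    """
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return numerator / denominator


def chi_square_1df_p_value(statistic):
    """Upper-tail p-value for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Gm haplotype present: 8 diabetic, 29 not; absent: 92 diabetic, 71 not.
statistic = case_control_chi_square(8, 29, 92, 71)
print(round(statistic, 2))  # 14.62
print(chi_square_1df_p_value(statistic))
```

A significant statistic here only flags an association; as the stratified analysis in the example shows, it does not by itself establish a causal mechanism.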
Case-control test for association (case = diabetic, control = not diabetic)

                 Type 2 Diabetes
Gm Haplotype     present   absent   Total
present          8         29       37
absent           92        71       163
Total            100       100      200

Question: Is the Gm haplotype associated with risk of Type 2 diabetes?
(1) Test for an association:
$\chi^2_1 = \dfrac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)} = \dfrac{[(8 \times 71) - (29 \times 92)]^2 (200)}{(100)(100)(37)(163)} = 14.62$
(2) Chi-square is significant. Therefore presence of the Gm haplotype seems to confer reduced occurrence of diabetes.
(Note the test is exactly analogous to calculating $r^2$ between two loci.)

Case-control test for association (continued)
Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes?
The real story: stratify by American Indian heritage
(0 = little or no Indian heritage; 8 = complete Indian heritage)

Index of Indian heritage   Gm Haplotype   Percent with diabetes
0                          Present        17.8
0                          Absent         19.9
4                          Present        28.3
4                          Absent         28.8
8                          Present        35.9
8                          Absent         39.3

Conclusion: The Gm haplotype is NOT a risk factor for Type 2 diabetes, but is a marker of American Indian heritage.

Family-Based Association: The Transmission Disequilibrium Test (TDT)
- Still an association test (like a case-control), but we study parents and offspring and we condition on the parental genotypes; this reduces effects of population stratification
- Given the genotypes of the parents, is there an allele that is transmitted more frequently to affected individuals?
- Only look at affected offspring with at least one heterozygous parent, and consider only families with affected progeny
- Under the null hypothesis (H0) of no linkage, what proportion of alleles do we expect the heterozygous parent to transmit?
[Diagram: parents AB x AA; is the affected child AB or AA?]
To do TDT:
(1) Count the number of kids inheriting A or B across many families (trios) with affected kids
(2) Statistically test whether this observed number is different from 50:50
(3) If NOT 50:50, then affected kids may be inheriting one allele preferentially over the other

Transmission Disequilibrium Test (TDT)
(with known parental genotypes and 2 alleles at the locus)
For each heterozygous parent in each family, we determine which allele is transmitted to the affected offspring and which is not.
[Diagram: parents AB x AA with affected child AB, number = b; parents AB x AA with affected child AA, number = c]
H0: The two alleles are transmitted equally (no linkage and no association)
Ha: One of the alleles is preferentially transmitted (linkage and association)
Test statistic is $\dfrac{(b - c)^2}{b + c}$, distributed as $\chi^2$ with 1 df

Transmission Disequilibrium Test (TDT): Example
For each heterozygous parent in each family, see which allele is transmitted to the affected offspring and which is not.
[Diagram: parents 12 x 11 with affected child 12, 10 families; parents 12 x 11 with affected child 11, 15 families]
TDT test: b = ____ , c = ____ , $(b - c)^2/(b + c)$ = ____ , p-value = ____

Methods for genetic association in natural populations
• Standard general linear models (GLMs), usually with p values computed by permutation. $y = \mu + m_i + e_{ij}$, where $y$ is the trait value, $\mu$ is a general mean, $m_i$ is the genotype of the i-th SNP and $e_{ij}$ is the residual.
• Structured Association (Pritchard et al. 2000; Thornsberry 2001) and PCA Association (Price et al. 2006). Controls for population structure by incorporating a Q matrix. This matrix is an n × p population structure incidence matrix where n is the number of individuals assayed and p is the number of populations defined.
• Mixed Linear Models (MLMs; Yu et al. 2006). They incorporate a Q matrix (fixed effect) but also a pairwise relatedness matrix (K matrix, a random effect), which accounts for within-population structure.
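Returning to the TDT described earlier in this section, its statistic $(b - c)^2/(b + c)$ is simple enough to sketch directly. The function names below are hypothetical, and the counts in the usage lines are an assumption: they are one reading of the family counts in the TDT example (10 families transmitting one allele, 15 the other). Because the statistic is symmetric in b and c, it does not matter which count is labelled b.

```python
import math


def tdt_statistic(b, c):
    """TDT statistic (b - c)^2 / (b + c), compared to chi-square with 1 df.

    b and c are the numbers of heterozygous parents transmitting each of
    the two alleles to an affected offspring.
    """
    if b + c == 0:
        raise ValueError("need at least one informative transmission")
    return (b - c) ** 2 / (b + c)


def chi_square_1df_p_value(statistic):
    """Upper-tail p-value for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Assumed reading of the example: 10 and 15 informative transmissions.
statistic = tdt_statistic(10, 15)
print(statistic)
print(round(chi_square_1df_p_value(statistic), 3))
```

Under the null hypothesis the two counts should be close to 50:50, so a small statistic (large p-value) gives no evidence of preferential transmission.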
Genetic association method depends upon population structure
SA = structured association; GC = genomic control; GLM = general linear model; TDT = transmission disequilibrium; MLM = mixed linear model
[Figure: recommended methods (SA, GC, GLM, MLM, TDT) arranged by degree of population structure and familial relatedness, including the case of unknown population structure. Based on Yu & Buckler (2006) Current Opinion in Biotechnology]

Pinus taeda L.: continuous range, no clear population genetic structure
Pinus pinaster Ait.: 22 populations, fragmented range, significant population structure

Pinus pinaster geographic range
[Map: sampled populations in France, Spain, Portugal, Tunisia, and Morocco]
ADEPT project
TREESNIPS project (also P. sylvestris, Picea abies and oaks)

Genetic association with wood property traits in loblolly pine
Phenotypic traits
• Earlywood specific gravity (ewsg)
• Latewood specific gravity (lwsg)
• Percent latewood (lw)
• Earlywood microfibril angle (ewmfa)
• Lignin & cellulose content (lgn-cel)
• Synthetic PCAs for different wood-age types
[Figure: cell wall layers (primary wall; secondary wall layers S1, S2, S3) and microfibril angle]
González-Martínez et al. 2007 Genetics

Significant genetic association of cad gene with earlywood specific gravity and 4cl with % latewood
[Figure: gene maps of 4cl and cad (base-pair scale)]

K vs. Q matrix
[Figure: traits measured]

Power considerations: structured populations
[Figure: power vs. % variation explained by QTN. Zhao et al. (2007) PLoS Genetics (small association population of ~100 accessions)]

Candidate Gene Associations vs. Whole Genome Scans
- If LD is high and haplotype blocks are conserved, the entire genome can be efficiently scanned for associations with phenotypes
  - Simplest for case-control studies (e.g., disease, gender)
- If LD is low, candidate genes are usually identified a priori, and a limited number are scanned for associations
  - Biased by existing knowledge
- Use "Candidate Regions" from high LD populations, assess candidate genes in low LD populations
[Figure: linkage group I with marker positions (0.0 to 262.9) and QTL for ABOVE:BELOW and COARSE ROOT traits, narrowing from QTL to Candidate Region to Candidate Gene Identification]

The "Candidate gene" approach
- Candidate genes are selected by knowledge of how they influence similar traits in other organisms.
- There is increasing evidence that some genes can control similar phenotypic traits even in distantly related species.
- Easy to apply: let's see if this primer set works on this particular species!

Candidate gene definitions
- Biological candidates: genes of known biological action involved with the development or physiology of the trait
  - They may be structural genes or genes in a regulatory or biochemical pathway affecting trait expression
- Positional candidates: genes that lie within the QTL region that affects the trait

Traditional candidate genes and traits
- MHC related genes for studying disease and parasite resistance, and mate choice
- Heat shock proteins (HSP) for temperature and stress tolerance
- Growth hormone and its receptors for growth, size
- Candidate genes are also available for many ecologically relevant traits incl. morphology, color, foraging, learning and memory, social interactions, alternative mating strategies

Success story: Melanocortin-1 receptor gene
- Coat colour variation in mice (Robbins et al. 1993)
- Hair and skin color in humans (Valverde et al. 1995)
- Feather coloration in chickens (Takeuchi et al. 1996)
- Coat colour in pigs (Kijas et al. 1998)
- Feather coloration in several bird species (Theron et al. 2001; Mundy et al. 2004)
- Coat colour in several mammals such as horse, red fox and pocket mice (Mundy et al. 2004)
- Skin color in lizards (Rosenblum et al. 2004)
- Coat color of Kermode Bear (Ritland et al. 2001)

Melanocortin-1 receptor gene (MC1R) [Figure: Mundy 2005]
MC1R in pocket mouse [Figure: Nachman et al. 2003]
MC1R in pocket mouse: habitat differences [Figure: Nachman et al. 2003]
MC1R in lesser snow goose [Figure: Mundy et al. 2004]
MC1R in Arctic skua [Figure: Mundy et al. 2004]