Genome Evolution. Amos Tanay 2009
Genome evolution, Lecture 9: Quantitative traits

Every meaningful evolutionary trait is ultimately quantitative
Continuous traits: weight, height, milk yield, growth rate.
Categorical traits: number of offspring, petals, ears on a stalk of corn.
Threshold traits: disease (the underlying liability toward the trait).
(Pictured: F. Galton)
Ultimately, fitness is a quantitative trait, so what is special about it?
Historically, research on genetics and directed selection was distinct from evolutionary theory.
Currently, a quantitative approach to molecular evolution and population genetics is a major frontier in evolutionary research.

The basic observation: heritability
$$\mathrm{Var}(x) = E(x^2) - E(x)^2, \qquad \mathrm{Cov}(x,y) = E(xy) - E(x)E(y)$$
A linear fit $y \approx a + bx$ (here $x$ is the trait value of a parent and $y$ that of its offspring) tries to minimize the mean square deviation:
$$SS = E\left[(y - (a + bx))^2\right]$$
$$\frac{\partial SS}{\partial a} = -2E(y - a - bx) = -2\left(E(y) - a - bE(x)\right) = 0$$
$$\frac{\partial SS}{\partial b} = -2E\left(x(y - a - bx)\right) = -2\left[E(xy) - aE(x) - bE(x^2)\right] = 0$$
Substituting $a = E(y) - bE(x)$ from the first condition into the second:
$$\left[E(xy) - E(x)E(y)\right] - b\left[E(x^2) - E(x)^2\right] = 0 \quad\Rightarrow\quad b = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}$$
The correlation coefficient is
$$r = \frac{\mathrm{Cov}(x,y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}}$$
Heritability is defined by $b = \tfrac{1}{2}h^2$ (the factor of one half appears because only one parent is considered).
This is the "narrow sense" heritability.

Artificial selection
[Figure: %Oil over the course of the selection experiment]
Over 100 years of an ongoing selection experiment: from 4.6% to 20.4% oil.
What kinds of evolutionary dynamics allow for such a rapid increase in the trait?

Artificial selection
Selection can work by exploiting existing polymorphic sites or by fixing new mutations.
SNP data suggest that at least 50 genes were involved in the corn selection.
Theory suggests that fixation of all strong effects should occur rapidly, within about 20 generations. Later one should see fixation of alleles with smaller effect or of new mutations.
Reminder, Theorem (Kimura): $t = (2/s)\ln(2N)$
One strong candidate for introducing mutations is repetitive elements.
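As a rough illustration of the fixation-time reminder above, the following sketch evaluates $t = (2/s)\ln(2N)$ for a few selection coefficients. The function name `fixation_time` and the values of `s` are assumptions chosen for illustration only; they are not taken from the corn data.

```python
import math


def fixation_time(s: float, n: int) -> float:
    """Approximate number of generations to fixation, t = (2/s) * ln(2N).

    s: selection coefficient (hypothetical values below).
    n: population size.
    """
    if s <= 0 or n <= 0:
        raise ValueError("s and n must be positive")
    return (2.0 / s) * math.log(2 * n)


# Illustrative values only: a small population and three assumed
# selection coefficients, from strong to weak.
for s in (0.5, 0.1, 0.01):
    print(f"s = {s}: t is about {fixation_time(s, 60):.1f} generations")
```

Because $t$ scales as $1/s$, alleles of strong effect are expected to fix first, with weaker alleles following much later.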
The corn population is of tiny size (60).
Selection is enhanced due to the threshold effect.

Limits to artificial selection
After some (variable) number of generations, artificial selection stops increasing the trait.
One reason for that can be the exhaustion of polymorphism.
This is frequently not the case, since reversing the selection is frequently shown to have an effect, meaning that polymorphism is still present.
Another reason for converging trait values is selection on other traits (fertility!).
By combining many alleles affecting the trait, artificial selection can reach trait values that are practically never observed in the original population.
Not all traits can be artificially selected: in 1960, Maynard-Smith and Sondhi showed they could not select for an asymmetric body plan in flies by choosing flies with an excess of dorsal bristles on the left side.
This suggests that some traits are strongly stabilized.
Artificial selection can proceed non-linearly, starting and stopping.
A main possible reason for that is that recombination of strongly linked alleles takes time.
(Pictured: J. Maynard-Smith)

Truncation selection
[Figure: trait distributions with the population mean M, the mean of the selected parents M_S, and the offspring mean M']
Differential: $S = M_S - M$
Response: $R = M' - M$
The selection differential is generally larger than the selection response.
This is because some of the selected individuals have a high trait value due to non-genetic effects.
Another reason is that the genotypes of the selected individuals are modified in their offspring by segregation and recombination.
We redefine (realized) heritability as the ratio of the selection response to the selection differential:
$$R = h^2 S$$
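The relation $R = h^2 S$ can be checked with a small simulation. This is a minimal sketch under loudly stated assumptions that are not in the lecture: a purely additive model in which the phenotype is a breeding value plus independent normal noise, total phenotypic variance 1, random mating among the selected parents, and an offspring mean equal to the mean breeding value of those parents. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def realized_heritability(h2=0.4, n=20000, top_fraction=0.2, seed=0):
    """Simulate one round of truncation selection; return (S, R, R / S)."""
    rng = random.Random(seed)
    sd_genetic = h2 ** 0.5        # additive (breeding value) standard deviation
    sd_env = (1.0 - h2) ** 0.5    # environmental standard deviation
    breeding = [rng.gauss(0.0, sd_genetic) for _ in range(n)]
    pheno = [g + rng.gauss(0.0, sd_env) for g in breeding]

    # Truncation selection: keep the individuals with the highest phenotypes.
    order = sorted(range(n), key=pheno.__getitem__, reverse=True)
    chosen = order[: int(n * top_fraction)]

    m = statistics.fmean(pheno)                              # M
    m_s = statistics.fmean([pheno[i] for i in chosen])       # M_S
    # Assumption: offspring inherit the mean breeding value of the selected
    # parents, and the environmental deviations average to zero.
    m_prime = statistics.fmean([breeding[i] for i in chosen])  # M'

    s_diff = m_s - m          # selection differential S
    response = m_prime - m    # selection response R
    return s_diff, response, response / s_diff


s_diff, response, h2_hat = realized_heritability()
print(f"S = {s_diff:.3f}, R = {response:.3f}, realized h2 = {h2_hat:.3f}")
```

In this setting the response is smaller than the differential because part of the selected parents' advantage is environmental and is not passed on.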
Back to genetics: two loci
Assume an additive selected trait. Each cell gives the trait value with the genotype frequency in parentheses (frequencies correspond to p(A) = p(B) = 1/2):

        BB          Bb          bb
AA      4 (1/16)    3 (2/16)    2 (1/16)
Aa      3 (2/16)    2 (4/16)    1 (2/16)
aa      2 (1/16)    1 (2/16)    0 (1/16)

Generally: $M = 2(p(A) + p(B))$
Selecting classes 0 and 1: $M_S = 0.8$
After selection: $p(A) = p(B) = 0.2$
Yielding: $M' = 0.8$ and $h^2 = 1$

Assume a dominant selected trait:

        BB          Bb          bb
AA      4 (1/16)    4 (2/16)    2 (1/16)
Aa      4 (2/16)    4 (4/16)    2 (2/16)
aa      2 (1/16)    2 (2/16)    0 (1/16)

Selecting classes 0 and 2 (trait value at most 2): $M_S = 12/7$
After selection: $p(A) = p(B) = 2/7$
Yielding: $M' = 96/49$ and $h^2 = 17/21 \approx 0.81$

Continuous traits
We now assume each genotype has a distribution of trait values.
The variability may be a consequence of environmental factors or of other loci.
[Figure: overlapping normal trait distributions for genotypes AA, AA' and A'A', with the population mean M, the threshold T, the selected area B with mean M_S, and the density height Z at the threshold; $(M_S - M)/\sigma^2 = Z/B$]
$$m_{AA} = m + a, \qquad m_{AA'} = m + d, \qquad m_{A'A'} = m - a$$
$$m = \frac{m_{AA} + m_{A'A'}}{2}, \qquad a = \frac{m_{AA} - m_{A'A'}}{2}, \qquad d = m_{AA'} - m$$
Here m is the mean (midpoint), a the additivity and d the dominance.
$$M = p^2(m + a) + 2pq(m + d) + q^2(m - a) = m + (p - q)a + 2pqd$$
$$\mathrm{Cov}(\text{pheno}, \text{number of A alleles}) = 2pm + 2p^2 a + 2pqd - \left[m + (p - q)a + 2pqd\right](2p) = 2pqa + 2pq(q - p)d$$
$$\mathrm{Var}(\text{number of A alleles}) = 4p^2 + 2pq - (2p)^2 = 2pq$$
$$b = \frac{2pqa + 2pq(q - p)d}{2pq} = a + (q - p)d$$

Continuous traits
Selecting on a threshold T over the mean of the population.
Threshold relative to the AA normal distribution: $T - a$
Threshold relative to the AA' normal distribution: $T - d$
Threshold relative to the A'A' normal distribution: $T + a$
The "fitness" of each genotype equals the area of its distribution beyond the threshold.
Assuming small differences, the differences between these areas are approximately rectangular:
$$w_{11} - w_{12} = Z\left((T - d) - (T - a)\right) = Z(a - d)$$
$$w_{12} - w_{22} = Z\left((T + a) - (T - d)\right) = Z(a + d)$$
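The two-locus numbers given earlier (additive and dominant cases) can be reproduced with exact fractions. This is a sketch under the assumptions the slide appears to make: Hardy-Weinberg proportions, independent loci, and offspring genotypes re-formed by random mating from the post-selection allele frequencies. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def genotype_freqs(p):
    """Hardy-Weinberg frequencies keyed by the number of capital alleles."""
    q = 1 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}


def population(p_a, p_b, trait):
    """Rows of (count_A, count_B, trait value, frequency), loci independent."""
    fa, fb = genotype_freqs(p_a), genotype_freqs(p_b)
    return [(na, nb, trait(na, nb), fa[na] * fb[nb]) for na, nb in product(fa, fb)]


def mean_trait(p_a, p_b, trait):
    return sum(row[2] * row[3] for row in population(p_a, p_b, trait))


def select(p_a, p_b, trait, keep):
    """Return (M, M_S, M', h2) when only trait values passing `keep` breed."""
    rows = [row for row in population(p_a, p_b, trait) if keep(row[2])]
    total = sum(row[3] for row in rows)
    m_s = sum(row[2] * row[3] for row in rows) / total
    # Allele frequencies among the selected parents.
    new_pa = sum(Fraction(row[0], 2) * row[3] for row in rows) / total
    new_pb = sum(Fraction(row[1], 2) * row[3] for row in rows) / total
    m = mean_trait(p_a, p_b, trait)
    m_next = mean_trait(new_pa, new_pb, trait)
    return m, m_s, m_next, (m_next - m) / (m_s - m)


def additive(na, nb):
    return na + nb


def dominant(na, nb):
    # 2 units for each locus carrying at least one capital allele.
    return 2 * (na > 0) + 2 * (nb > 0)


half = Fraction(1, 2)
print("additive:", select(half, half, additive, lambda v: v <= 1))
print("dominant:", select(half, half, dominant, lambda v: v <= 2))
```

In the dominant case some of the selected parents' deviation is not transmitted, because heterozygotes hide the recessive allele, so the realized heritability falls below 1.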
Allele frequency change
[Figure: as before, genotype distributions AA, AA' and A'A' with M, T, B, M_S and Z; $(M_S - M)/\sigma^2 = Z/B$, with $m_{AA} = m + a$, $m_{AA'} = m + d$, $m_{A'A'} = m - a$]
$$w_{11} - w_{12} = Z(a - d), \qquad w_{12} - w_{22} = Z(a + d)$$
We showed before (lecture 3):
$$\Delta p = \frac{pq\left[p(w_{11} - w_{12}) + q(w_{12} - w_{22})\right]}{\bar{w}}$$
Average fitness is the area B:
$$\Delta p = \frac{pq\left[pZ(a - d) + qZ(a + d)\right]}{B}$$
$$\Delta p = (Z/B)\; pq\; \left[a + (q - p)d\right]$$
The three factors are the selection intensity $Z/B$, the allele frequency term $pq$, and the phenotype to genotype regression $a + (q - p)d$.

Mean phenotype
$$M' = (p + \Delta p)^2(m + a) + 2(p + \Delta p)(q - \Delta p)(m + d) + (q - \Delta p)^2(m - a)$$
Dropping terms of order $\Delta p^2$:
$$M' \approx p^2(m + a) + 2pq(m + d) + q^2(m - a) + 2\left[a + (q - p)d\right]\Delta p = M + 2\left[a + (q - p)d\right]\Delta p$$
$$M' - M = 2\left[a + (q - p)d\right]\Delta p = (Z/B)\, 2pq\left[a + (q - p)d\right]^2 = (M_S - M)\, \frac{2pq\left[a + (q - p)d\right]^2}{\sigma^2}$$
$$R = S\, \frac{2pq\left[a + (q - p)d\right]^2}{\sigma^2}, \qquad h^2 = \frac{2pq\left[a + (q - p)d\right]^2}{\sigma^2}$$

Phenotype variation
Phenotypes in a natural environment can be modeled as a combination of genotype and environmental effects:
$$P = G + E$$
More carefully, the genotype effect on phenotype may be a function of the environment, and the additive form may be wrong.
For example, gene expression of stress related genes depends on the genotype differently for different stresses.
Understanding QTL evolution.
Mapping phenotypes to QTL.

Genetic analysis of genome-wide variation in human gene expression (Morley et al. 2004)
14 CEPH families (of ~8 members each).
3554 variably expressed genes (in lymphoblastoid cells).
2756 SNPs (just a few!).
Alternatively: 94 unrelated CEPH grandfathers.
Testing linkage of expression and SNPs in the large family trees yields linkage for ~1000 phenotypes.
The test on families uses the genealogical structure (SIBPAL - http://darwin.cwru.edu/).
The alternative test on unrelated individuals uses a simple correlation between expression and each individual's genotype coded as 0, 1 or 2.
Difficulties: multiple testing vs. low resolution.
Reporting on loci that are linked with many QTLs.

Genetic analysis of radiation-induced changes in human gene expression (Smirnov 2009)
15 CEPH families (of ~8 members each).
Low resolution (~3000 SNPs) and high resolution (HapMap SNPs).
3280 responding genes, at different time points during irradiation.
Follow-up molecular biology experiments.
[Figure panels: Variability in B-cell response to irradiation; Mapped eQTL]

Genetic Dissection of Transcriptional Regulation in Budding Yeast (Brem et al 2002)
Crossing two budding yeast strains.
Fully genotyping, testing expression (later in different conditions).
Hundreds of variably expressed genes.
Using the compact yeast genome helps in deciding linkage.
Using the well-characterized biology of yeast helps explain linkage.

Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification (Lee et al 2008)
Building association to groups of genes instead of single genes (Litvin et al 2009).
©2009 by National Academy of Sciences

Schadt EE et al.
2005 (and many publications following it)
R: expression, L: locus genotype, C: phenotype.
Looking for gene expression traits that explain QTLs, that is, traits that stand between genetic loci and some disease trait of interest.
Applied to obesity linkage (in mice).
Further developments use more data (not just expression), or gene subnetworks.
The ultimate goal is to build a model explaining phenotype by genotype through molecular phenotypes.
[Figure notes: Positive correlation suggests linked eQTLs. Correlation between genetic distance and correlation suggests an LD effect. Possible modes of causality or interaction.]

Direct quantitative trait locus mapping of mammalian metabolic phenotypes in diabetic and normoglycemic rat models (Dumas et al. 2007)
Crossing two rat strains: diabetic and normal.
2000 microsatellite and SNP markers.
Using NMR to perform metabolic profiling, looking for linkage explaining metabolic abnormalities.
Figure caption: (a) The horizontal axis shows the frequency from the NMR spectrum expressed as chemical shift from right to left (δ, ppm). The vertical axis indicates genetic locations (cM) on chromosomes 1 to X. The lod scores between each genotype and each metabolite are color coded. Significant linkages between genomic locations and regions of the plasma NMR profile are present in the aliphatic region (0.5 to 4.5 ppm) and the aromatic region (>5.5 ppm). Resonances corresponding to the anesthetics and their degradation products were withdrawn as described in Methods. (b,c) Genome-wide linkage mapping across the full metabonomic spectrum for marker D14Wox10 (b) and linkage data across the genome for the metabolite 7.86 (c).
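The simplest association test mentioned in this lecture, correlating a trait such as expression level with a genotype coded as 0, 1 or 2 copies of an allele, can be sketched as follows. The data are invented toy values for illustration and do not come from any of the cited studies; the function names are hypothetical. The regression slope computed here is the same $b = \mathrm{Cov}(x,y)/\mathrm{Var}(x)$ used in the heritability derivation.

```python
import math


def covariance_terms(xs, ys):
    """Return the sums of squares and cross-products about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Correlation r = Cov(x, y) / sqrt(Var(x) Var(y))."""
    sxy, sxx, syy = covariance_terms(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(xs, ys):
    """Slope b = Cov(x, y) / Var(x): the average effect of one allele copy."""
    sxy, sxx, _ = covariance_terms(xs, ys)
    return sxy / sxx


# Toy data (assumed): allele counts at one SNP and an expression level
# for eight unrelated individuals.
genotypes = [0, 0, 1, 1, 1, 2, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 2.0, 3.1, 2.9, 3.0]

print(f"r = {pearson_r(genotypes, expression):.3f}")
print(f"b = {regression_slope(genotypes, expression):.3f}")
```

In a genome-wide scan this test is repeated for every SNP and every expression trait, which is where the multiple testing difficulty arises.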