Quantitative Genetics in the Age of Genomics

Classical Quantitative Genetics
• Quantitative genetics deals with the observed variation in a trait both within and between populations
• Basic model (Fisher 1918): The phenotype (z) is the sum of (unseen) genetic (g) and environmental (e) values
• z = g + e
• The genetic value needs to be further decomposed into an additive part (A) passed from parent to offspring, separate from dominance (D) and epistatic (I) effects that are only fully passed along in clones
• g = A + D + I
• Var(g)/Var(z) is a quantitative measure of nature vs. nurture
  – The fraction of all trait variation due to genetic differences

Fisher’s great insight: Phenotypic covariances between relatives can estimate the variances of g, e, etc.
• For example, in the simplest settings,
  – Cov(parent, offspring) = Var(A)/2
  – Cov(full sibs) = Var(A)/2 + Var(D)/4
  – Cov(clones) = Var(g) = Var(A) + Var(D) + Var(I)
• Random-effects model
  – Interest is in estimating variances
• Thus, in classical quantitative genetics, a few statistical descriptors describe the underlying complex genetics
  – This leaves an uneasy feeling among most of my molecular colleagues.
  – Does the age of genomics sound the death knell of Quantitative Genetics?

Approximate costs of genome projects
• Arabidopsis Genome Project ... $500 million
• Drosophila Genome Project ... $1 billion
• Human Genome Project ... $10 billion
• Working knowledge of multivariate statistics ... Priceless

Model systems
[Photograph not recoverable from the source]
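As a rough illustration of the covariance relations on the "Fisher's great insight" slide above, the sketch below inverts the two simplest ones to recover Var(A) and Var(D) from observed covariances. It assumes the simplest setting described there (these covariances contain no epistatic or shared-environment terms). The function name and all input numbers are hypothetical and are not taken from the slides.

```python
def variance_components(cov_parent_offspring, cov_full_sibs):
    """Invert Cov(parent, offspring) = Var(A)/2 and
    Cov(full sibs) = Var(A)/2 + Var(D)/4 for Var(A) and Var(D)."""
    var_a = 2.0 * cov_parent_offspring
    var_d = 4.0 * (cov_full_sibs - cov_parent_offspring)
    return var_a, var_d


# Hypothetical covariances and phenotypic variance, for illustration only.
var_a, var_d = variance_components(cov_parent_offspring=0.25,
                                   cov_full_sibs=0.3125)
var_z = 1.0

# Var(g)/Var(z), taking Var(I) = 0 as an added simplifying assumption.
genetic_fraction = (var_a + var_d) / var_z
print(var_a, var_d, genetic_fraction)
```

In practice the covariances are themselves estimates with sampling error, so the recovered components inherit that uncertainty; the sketch only shows the algebra.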
Euchloe guaymasensis

Neoclassical Quantitative Genetics
• Use information from both an individual’s phenotype (z) and marker genotype (m)
• z = u + Gm + g + e
  – Gm is the genotypic value associated with the scored genotype (m)
  – Obvious extensions: include Gm x e and Gm x g terms
• Mixed model: can treat the Gm as fixed effects, and g and e as random effects
• My molecular colleagues hope that Gm accounts for most of the variance in the trait
  – If true, then Var(g)/Var(z) is trivial

Limitations on Gm
• The importance of particular genotypes may be quite fleeting
  – It can easily change as populations evolve and as the biotic and abiotic environments change
  – If epistasis and/or genotype-environment interactions are significant, any particular genotype may be a good, but not exceptional, predictor of phenotype
• Quantitative genetics provides the machinery necessary for managing all this uncertainty in the face of some knowledge of important genotypes
  – e.g., proper accounting of correlations between relatives in the unmeasured genetic values (g)

The importance of even rather imperfect marker information
• Suppose an F1 is segregating favorable alleles at n loci, and we inbreed to fixation before selecting among pure lines
  – Pr(fixation of the favorable allele) = 1/2 at each locus
• What is the required number of lines for Pr(at least one line fixed for all n favorable alleles) = 0.9?
• For n = 10: 2,360 lines
• For n = 20: 2,400,000 lines
• Suppose marker information increases the probability of fixation by 50% (to 0.75)
• Required number of lines for Pr(at least one line fixed for all n favorable alleles) = 0.9
• For n = 10: 40 lines (60-fold reduction)
• For n = 20: 725 lines (3,300-fold reduction)

How do we obtain Gm?
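Before turning to that question, the line counts quoted on the preceding slide can be reproduced with a short calculation. The sketch below assumes that loci fix independently and that lines are independent, so a line carries all n favorable alleles with probability p^n, and N lines include at least one such line with probability 1 - (1 - p^n)^N. Solving for N gives N = ln(1 - target) / ln(1 - p^n). The function name `lines_required` is hypothetical.

```python
import math


def lines_required(p_fix, n_loci, target=0.9):
    """Lines needed so that Pr(at least one line fixed for all n_loci
    favorable alleles) reaches `target`, assuming independent loci/lines."""
    p_all = p_fix ** n_loci  # Pr(one line is fixed for every favorable allele)
    # Solve 1 - (1 - p_all)**N = target for N; log1p keeps precision
    # when p_all is tiny.
    return math.log(1.0 - target) / math.log1p(-p_all)


for p in (0.5, 0.75):
    for n in (10, 20):
        print(f"p = {p}, n = {n}: about {math.ceil(lines_required(p, n)):,} lines")
```

Under these assumptions the results agree with the rounded figures on the slide (roughly 2,360 and 2.4 million lines at p = 0.5, and 40 and 725 lines at p = 0.75).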
• Ideally, we screen a number of candidate loci
• QTL (quantitative trait locus) mapping
• Uses molecular markers to follow which chromosome segments are common between individuals
• This allows construction of a likelihood function, e.g.,

$$
\ell(\mathbf{z} \mid \boldsymbol{\mu}, \sigma_A^2, \sigma_{A^*}^2, \sigma_e^2)
= \frac{1}{\sqrt{(2\pi)^n \, |\mathbf{V}|}}
\exp\left[ -\frac{1}{2} (\mathbf{z} - \boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{z} - \boldsymbol{\mu}) \right]
$$

where

$$
\mathbf{V} = \mathbf{R}\,\sigma_A^2 + \mathbf{A}\,\sigma_{A^*}^2 + \mathbf{I}\,\sigma_e^2
$$

  – R σ_A^2: estimated QTL effect, with R estimated from marker information
  – A σ_{A*}^2: background genetic effects, with A known from pedigree relationships

$$
R_{ij} = \begin{cases} 1 & \text{for } i = j \\ \widehat{R}_{ij} & \text{for } i \neq j \end{cases}
\qquad
A_{ij} = \begin{cases} 1 & \text{for } i = j \\ 2\Theta_{ij} & \text{for } i \neq j \end{cases}
$$

A typical QTL map from a likelihood analysis
[Figure labels: estimated QTL location, support interval, significance threshold]

Genomics and candidate loci
• A typical QTL confidence interval spans 20-50 cM
• The big question: how do we find suitable candidates?
• The hope is that a genomic sequence will suggest candidates

Genomics tools to probe for candidates
• Dense marker maps
• Complete genome sequence
  – Expression data (microarrays)
  – Proteomics
  – Metabolomics

The accelerating pace of genomics
• Faster and cheaper sequencing
• Rapid screening of thousands of loci via DNA chips
• “Phylogenetic bootstrapping” from model systems to distant relatives

Prediction of Candidate Genes
• Try homologous candidates from other species
• Examine all Open Reading Frames (ORFs) within a QTL confidence interval
  – Expression array analysis of these ORFs
  – Lack of tissue-specific expression does not exclude a gene
• Proteomics
  – Specific protein motifs may provide functional clues
• Cracking the regulatory code (in silico genetics)
• Analysis of networks and pathways

Searching for Natural Variation
• This may be the area where genomics has the largest payoff
• Source (natural and/or weakly domesticated) populations contain more variation than the current highly domesticated lines
• Key is to first detect and localize important variants, then introgress them into elite lines

Impact of other biotechnologies
• Cloning,
  other reproductive technologies
  – Maintain elite lines as cell cultures?
  – Embryo transplantation into elite maternal lines?
• Transgenics
  – Important tool in both breeding and evolutionary biology
• Complications:
  – Silencing of multiple copies in some species
  – Strong position effects
  – Currently restricted to major genes
    • Major genes can have deleterious effects on other characters
    • Importance of quantitative genetics for selecting for background polygenic modifiers

Useful Tools for Quantitative Genetic Analysis
• Four subfields of Quantitative Genetics
  – Plant breeding
  – Animal breeding (forest genetics)
  – Evolutionary Genetics
  – Human Genetics
• Restricted communication between fields
• Important tools often unknown outside a field

Tools from Plant Breeding
• Special features dealt with by plant breeders
  – Diversity of mating systems (esp. selfing)
  – Sessile individuals
• Issues
  – Creation and selection of inbred lines
  – Hybridization between lines
  – Genotype x Environment interactions
  – Competition
• Plant breeding tools useful in other fields
  – Field-plot designs
  – G x E analysis models: AMMI and biplots
    • These designs are also excellent candidates for the analysis of microarray expression data
  – Covariance between inbred relatives
  – Line cross analysis

Animal Breeding
• Special features
  – Complex pedigrees
  – Large half-sib (more rarely full-sib) families
  – Long life spans
  – Overlapping generations
    • Tree breeders face many of these same issues
• Animal breeding tools useful in other fields
  – BLUP (best linear unbiased predictors) for genotypic values
  – REML (restricted maximum likelihood) for variance components
    • BLUP/REML allow for arbitrary pedigrees and very complex models
  – Maternal effects designs
    • Endosperm work of Shaw and Waser
  – Selection response in structured populations

Evolutionary Genetics
• Issues
  – Estimating the nature and amount of selection
  – Population-genetic models of evolution
• Tools
  – Estimation of the nature of natural selection on
    any specified character
    • Lande-Arnold fitness estimation; cubic splines
  – Using DNA sequences to detect selection on a locus
    • Example: teosinte-branched 1
  – Coalescent theory
    • The genealogy of DNA sequences within a random sample
  – Analysis of finite-locus and non-Gaussian models of selection response
    • Barton and Turelli; Bürger

Human Genetics
• Issues
  – Very small family sizes
  – Lack of controlled mating designs
• Tools of potential use
  – Sib-pair approaches for QTL mapping
    • QTL mapping in populations
  – Transmission-disequilibrium test (TDT)
    • Accounts for population structure
  – Linkage-disequilibrium mapping
    • Uses historical recombinations to fine-map genes
  – Random-effects models for QTL mapping
    • BLUP/REML-type analysis over arbitrary pedigrees

A Bayesian Future?
• The 1970s saw the start of a shift in QG from method-of-moments approaches (i.e., estimators based on sample means and variances) to likelihood approaches that use the entire distribution of the data
  – Initial objections to having to specify a likelihood function,
    • L(u | data)
  – As these methods became computationally feasible, they started to supplant their method-of-moments counterparts.
• Similarly, Bayesian approaches have become much more computationally feasible recently because of both advances in computational power and a greater appreciation of the power of resampling methods (MCMC and Gibbs samplers)

Posterior(u | data) = C * Likelihood(u | data) * prior(u)
[Figure: prior and posterior densities; horizontal axis from 0 to 400, vertical axis from 0 to 0.02]

Why Bayesian?
• Marginal posteriors
  – The effects of the uncertainty in estimating nuisance parameters (those not of interest) are fully accounted for.
• Exact for small sample sizes
• Powerful iterative sampling methods (MCMC, Gibbs) allow Bayesian analysis to work on problems with a very large number of parameters and relatively few actual data points (vectors)

Conclusions
• Genomics will increase, not decrease, the importance of quantitative genetics
• The machinery of classical quantitative genetics is easily modified (indeed, it is actually preadapted) to account for massive advances in genomics and other fields of biotechnology
• Useful and powerful tools have been developed to address specific issues in the various subfields of quantitative genetics
• Bayesian analysis will continue to increase in importance
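
The relation Posterior(u | data) = C * Likelihood(u | data) * prior(u) from the Bayesian slides can be illustrated numerically. The sketch below is not the analysis behind the original figure: it assumes a normal likelihood with known standard deviation, a normal prior on u, and made-up data, and it evaluates the posterior on a grid instead of using MCMC or a Gibbs sampler. All names and numbers are hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def grid_posterior(data, grid, prior_mean, prior_sd, data_sd):
    """Posterior density of u on an evenly spaced grid:
    posterior = C * likelihood * prior, with C fixed by normalization."""
    step = grid[1] - grid[0]
    log_post = [
        sum(normal_logpdf(z, u, data_sd) for z in data)
        + normal_logpdf(u, prior_mean, prior_sd)
        for u in grid
    ]
    top = max(log_post)  # subtract the maximum to avoid underflow
    weights = [math.exp(lp - top) for lp in log_post]
    c = 1.0 / (sum(weights) * step)  # normalizing constant for the grid
    return [c * w for w in weights]


# Hypothetical data and prior, chosen only for illustration.
data = [210.0, 190.0, 205.0, 195.0]
grid = [float(i) for i in range(401)]  # u from 0 to 400 in steps of 1
posterior = grid_posterior(data, grid, prior_mean=150.0, prior_sd=50.0, data_sd=20.0)
posterior_mean = sum(u * p for u, p in zip(grid, posterior))  # grid step is 1
print(round(posterior_mean, 2))
```

With these inputs the posterior mean sits close to the sample mean of 200 and is pulled only slightly toward the prior mean of 150, because four observations with standard deviation 20 carry far more information than a prior with standard deviation 50. A grid works here because there is a single parameter; the sampling methods named on the slides are what make problems with many parameters tractable.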