Introduction to Genetic Epidemiology
HGEN619, 2006
Hermine H. Maes

Genetic Epidemiology
Establishing and quantifying the role of genes and environment in variation in disease and complex traits, i.e., answering questions about the importance of nature and nurture for individual differences
Finding those genes and environmental factors

Genes & Environment
How much of the variation in a trait is accounted for by genetic factors?
Do shared environmental factors contribute significantly to the trait variation?
The first of these questions addresses heritability, defined as the proportion of the total variance explained by genetic factors.

Nature-nurture question
Sir Francis Galton: comparing the similarity of identical and fraternal twins yields information about the relative importance of heredity versus environment for individual differences
Gregor Mendel: classical experiments demonstrated that the inheritance of model traits in carefully bred material agreed with a simple theory of particulate inheritance
Ronald Fisher: first coherent account of how the 'correlations between relatives' could be explained 'on the supposition of Mendelian inheritance'

People and Ideas
Darwin (1858, 1871): natural selection, sexual selection, evolution
Galton: correlation, family resemblance, twins, ancestral heredity
Mendel (1865): particulate inheritance; genes single in gamete, double in zygote; segregation ratios
Spearman (1904): common factor analysis
Fisher (1918): correlation and Mendel, maximum likelihood, ANOVA (partition of variance)
Wright (1921): path analysis
Thurstone (1930s): multiple factor analysis
Mather (1949) and Jinks (1971): biometrical genetics, model fitting (plants)
Joreskog (1960): covariance structure analysis, LISREL
Jinks & Fulker (1970): model fitting applied to humans
Morton (1974): population genetics; path analysis and family resemblance
Elston etc. (19..): segregation, linkage
Rao, Rice, Reich, Cloninger (1970s): assortment, cultural inheritance
Martin & Eaves (1977): genetic analysis of covariance structure
Neale (1990): Mx
Watson & Crick (1953): molecular genetics

Biometrical Model
Genotypic values are expressed as deviations from the midpoint m between the two homozygotes: aa = -d, Aa = h, AA = +d.

To make the simple two-allele model concrete, let us imagine that we are talking about genes that influence adult stature. Let us assume that the normal range of height for males is from 4 feet 10 inches to 6 feet 8 inches; that is, about 22 inches. And let us assume that each somatic chromosome has one gene of roughly equivalent effect. Then, roughly speaking, we are thinking in terms of loci for which the homozygotes contribute ±1/2 inch (from the midpoint), depending on whether they are AA, the increasing homozygote, or aa, the decreasing homozygote. In reality, although some loci may contribute greater effects than this, others will almost certainly contribute less; thus we are talking about the kind of model in which any particular polygene is having an effect that would be difficult to detect by the methods of classical genetics.
(From the Biometrical Genetics chapter in Methodology for Genetic Studies of Twins and Families)

Polygenic Traits
1 gene: 3 genotypes, 3 phenotypes
2 genes: 9 genotypes, 5 phenotypes
3 genes: 27 genotypes, 7 phenotypes
4 genes: 81 genotypes, 9 phenotypes
[Figure: distribution of genotypes across phenotype classes for 1 to 4 genes]

[Figure: Stature in adolescent twins, women. Histogram of stature (axis 145.0 to 190.0); Mean = 169.1, Std. Dev = 6.40, N = 1785]

Individual differences
Physical attributes (height, eye color)
Disease susceptibility (asthma, anxiety)
Behavior (intelligence, personality)
Life outcomes (income, children)

Polygenic Model
Polygenic model: variation for a trait is caused by a large number of individual genes, each inherited in strict conformity with Mendel's laws
Multifactorial model: many genes and many environmental factors, each also of small and equal effect
The combined effects of many small factors produce a normal (Gaussian) distribution of trait values, according to the central limit theorem.

Central Limit Theorem
The normal distribution is to be expected whenever variation is produced by the addition of a large number of effects, none of which predominates.
This holds quite often.

Quantitative traits: continuous or categorical?
Body Mass Index vs “obesity”
Blood pressure vs “hypertensive”
Bone Mineral Density vs “fracture”
Bronchial reactivity vs “asthma”
Neuroticism vs “anxious/depressed”
Reading ability vs “dyslexic”
Aggressive behavior vs “delinquent”

Multifactorial Threshold Model of Disease
Single threshold: individuals whose disease liability exceeds the threshold are affected; the rest are unaffected
Multiple thresholds: disease liability is divided into normal, mild, moderate and severe categories

Genetically Complex Diseases
Imprecise phenotype
Phenocopies / sporadic cases
Low penetrance
Locus heterogeneity / polygenic effects

[Figure: Complex Trait Model. A marker is connected to Gene1 through linkage and linkage disequilibrium (linkage and association); Gene1, Gene2, Gene3, the polygenic background, individual environment and common environment all influence the disease phenotype (mode of inheritance).]

Causes of Variation
Pre-1990: estimation of 'anonymous' genetic and environmental components of phenotypic variation (genetic epidemiologic studies)
Post-1990: identification of QTLs, quantitative trait loci contributing to genetic variation of complex (quantitative) traits (linkage and association studies)

Stages of Genetic Mapping
Are there genes influencing this trait? (genetic epidemiological studies)
Where are those genes? (linkage analysis)
What are those genes? (association analysis)

Partitioning Variation
Phenotypic variance (VP) is partitioned into genetic (VG) and environmental (VE) components: VP = VG + VE
Assumptions: additivity and independence of genetic and environmental effects
Heritability (h²): proportion of variance due to genetic influences, h² = VG / VP
Heritability is a property of a group (not an individual), and is thus specific to a group in place and time

Sources of Variance
Genetic factors: additive (A), dominance (D)
Environmental factors: common / shared (C), specific / unique (E)
Measurement error is confounded with E

Genetic Factors
Additive genetic factors (A): the sum of the effects of all individual loci
Non-additive genetic factors: the result of interactions between alleles at the same locus (dominance, D) or between alleles at different loci (epistasis)

Environmental Factors
Shared [common or between-family] environmental factors (C): aspects of the environment that are shared by members of the same family or by people who live together, and that contribute to similarity between relatives
Non-shared [specific, unique or within-family] environmental factors (E): factors unique to an individual, which contribute to variation within families but not to covariation between family members

Estimating Components
Estimate phenotypic variance components from data on the covariances of related individuals
Different types of relative pairs share different amounts of phenotypic variance
Biometrical genetics theory specifies these amounts in terms of genetic and environmental variances
Three major types of study: family, adoption and twin studies

Designs to disentangle G + E
Resemblance between relatives is caused by:
shared genes (G = A + D)
environment common to family members (C)
Differences between relatives are caused by:
non-shared genes
unique environment (E)

Informative Designs
Family studies: G + C confounded
MZ twins alone: G + C confounded
MZ twins reared apart: rare, atypical, selective placement?
Adoption studies: increasingly rare, atypical, selective placement?
MZ and DZ twins reared together
Extended twin design

Classical Twin Study
MZ and DZ twins reared together
MZ twins are genetically identical
DZ twins share on average half of their segregating genes

Equal Environments Assumption
MZ and DZ twins share relevant environmental influences to the same extent

Zygosity
MZ twins are identical at marker loci, except for rare mutations
Determining the zygosity of MZ and DZ twins using ABI Profiler™ genotyping (9 STR markers + sex)
[Figure: example marker profiles for MZ and DZ pairs]

MZ & DZ Correlations
rMZ > rDZ: evidence of genetic influence (G, heritability)
C increases both rMZ and rDZ
The relative magnitude of the MZ and DZ correlations indicates the contribution of additive genetic (G) and shared environmental (C) factors
1 - rMZ indicates the importance of specific environmental (E) factors

[Figure: Twin Correlations, example plots of MZ and DZ correlations under different contributions of A, C and E]

Example
If
VP = VA + VC + VE = 2.0
CovMZ = VA + VC = 1.6
CovDZ = 1/2 VA + VC = 1.2
then, by algebra, VA = 2(CovMZ - CovDZ) = 0.8, VC = CovMZ - VA = 0.8, VE = VP - CovMZ = 0.4.
But it isn't always so simple. Consider VP = 1.0, CovMZ = 0.6, CovDZ = 0.65;
then VA = -0.1, VC = 0.7, VE = 0.4, and VA is a nonsensical negative variance component.

Observed Statistics
The trait variance and the MZ and DZ covariances are the unique observed statistics
Estimate the contributions of additive genes (A), shared (C) and specific (E) environmental factors, according to the genetic model
A useful tool for generating the expected variances and covariances under a model is path analysis

Path Analysis
Allows us to represent diagrammatically linear models for the relationships between variables
Makes it easy to derive expectations for the variances and covariances of variables in terms of the parameters of the proposed linear model
Permits translation into matrix formulation

Variance Components
P = eE + aA + cC + dD
where E is the unique environment, C the shared environment, A the additive genetic and D the dominance genetic factor, and e, c, a and d are the paths from each factor to the phenotype P

ACE Model
[Figure: path diagram for MZ and DZ twins. Each twin's phenotype (PT1, PT2) receives paths e, c and a from its own E, C and A factors; the twins' C factors correlate 1, and their A factors correlate 1.0 in MZ and 0.5 in DZ pairs]

Model Fitting
Evaluate significance of variance components (effect size and sample size)
Evaluate goodness-of-fit of the model: closeness of observed and expected values
Compare fit under alternative models
Obtain maximum likelihood estimates

Mx
Structural equation modeling package
Software: www.vcu.edu/mx
Manual: Neale et al. 2006
Free

Structural equation modeling
Both continuous and categorical variables
Systematic approach to hypothesis testing
Tests of significance
Can be extended to:
more complex questions
multiple variables
other relatives

SEM: more complex questions I
Are the same genes acting in males and females? (sex limitation)
Role of age on (a) mean, (b) variance, (c) variance components
Are G and E equally important across age and country cohorts? (heterogeneity)
Are G and E the same in other strata (e.g., married/unmarried)? (G x E interaction)

SEM: more complex questions II
Do the same genes account for variation in multiple phenotypes? (multivariate analysis)
Do the same genes account for variation in phenotypes measured at different ages? (longitudinal analysis)
Do specific genes account for variation/covariation in phenotypes? (linkage/association)

Linkage & Association Analysis

Stages of Genetic Mapping
Are there genes influencing this trait? (epidemiological studies)
Where are those genes? (linkage analysis)
What are those genes? (association analysis)

Linkage Analysis
Sharing between relatives
Identifies large regions
Include several candidates
Complex disease: scans on sets of small families are popular
No strong assumptions about disease alleles
Low power
Limited resolution

[Figure: Linkage Scan]

Stages of Genetic Mapping
Are there genes influencing this trait? (epidemiological studies)
Where are those genes? (linkage analysis)
What are those genes? (association analysis)

Association Analysis
Sharing between unrelated individuals
Trait alleles originate in a common ancestor
High resolution
Recombination since the common ancestor
Large number of independent tests
Powerful if assumptions are met
Same disease haplotype shared by many patients
Sensitive to population structure

[Figure: Association Scan]

Proof of Concept: Genes/Regions
Breast cancer: DLC-1, Chr 8q, Chr 13q
Lung cancer: CD44, Chr 22q
Melanoma: B-RAF
Type 2 diabetes: PPAR, PPP1R3A
HDL-C plasma level: CETP, LPL
Osteoarthritis: AGC1
Schizophrenia: DDC, FOXA2, Chr 1q
First (unequivocal) positional cloning of a complex disease QTL!

[Figure: Number of genes identified from QTL by year. From R Korstanje & B Paigen, "From QTL to gene: the harvest begins", Nature Genetics 31, 235–236 (2002)]
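The counts on the Polygenic Traits slide (3^n genotypes and 2n + 1 phenotype classes for n genes) can be checked by enumerating all genotypes. The sketch below makes an assumption that the slide does not state: locus effects are equal and purely additive, with no dominance (h = 0 in the Biometrical Model). Under that assumption an individual's phenotype class is simply its number of increasing alleles. The function name `polygenic_classes` is hypothetical.

```python
from collections import Counter
from itertools import product


def polygenic_classes(n_loci):
    """Count genotypes and genotypes-per-phenotype-class for n biallelic loci.

    Assumption (illustrative only): equal, purely additive locus effects with
    no dominance, so the phenotype class is the total number of increasing
    alleles, ranging from 0 to 2n.
    """
    # Genotype at each locus coded as its number of increasing alleles: aa=0, Aa=1, AA=2
    genotypes = list(product((0, 1, 2), repeat=n_loci))
    # Number of distinct genotypes that fall into each phenotype class
    per_class = Counter(sum(g) for g in genotypes)
    return len(genotypes), [per_class[k] for k in range(2 * n_loci + 1)]


if __name__ == "__main__":
    for n in range(1, 5):
        n_genotypes, counts = polygenic_classes(n)
        print(f"{n} gene(s): {n_genotypes} genotypes, "
              f"{len(counts)} phenotype classes, genotypes per class {counts}")
```

For four genes the genotypes per class are already bell-shaped (1, 4, 10, 16, 19, 16, 10, 4, 1). This illustrates how many small additive effects approach the normal distribution described under Central Limit Theorem. With dominance (h ≠ 0), genotypes no longer collapse onto 2n + 1 evenly spaced classes.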
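The Multifactorial Threshold Model of Disease can be made concrete with a short sketch. It assumes, as is conventional for liability models but not stated on the slide, that liability follows a standard normal distribution. With a single threshold, prevalence is the area above the threshold. With multiple thresholds, each category's proportion is the area between consecutive thresholds. The function names and the threshold values used below are hypothetical illustrations, not estimates from data.

```python
from statistics import NormalDist

# Assumed liability distribution: standard normal N(0, 1)
LIABILITY = NormalDist(mu=0.0, sigma=1.0)


def threshold_for_prevalence(prevalence):
    """Single threshold: liability value above which a fraction `prevalence` is affected."""
    return LIABILITY.inv_cdf(1.0 - prevalence)


def category_proportions(thresholds):
    """Multiple thresholds: proportion of individuals between consecutive thresholds."""
    cuts = [float("-inf")] + sorted(thresholds) + [float("inf")]
    return [LIABILITY.cdf(hi) - LIABILITY.cdf(lo) for lo, hi in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    # Single threshold: a hypothetical 5% prevalence puts the threshold near 1.645
    print(f"threshold = {threshold_for_prevalence(0.05):.3f}")
    # Multiple thresholds (hypothetical values) giving normal, mild, moderate, severe
    labels = ["normal", "mild", "moderate", "severe"]
    for label, p in zip(labels, category_proportions([0.5, 1.0, 1.5])):
        print(f"{label}: {p:.3f}")
```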
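The algebra in the twin Example can be written as a small function. This is only the direct solution of the three expectations (VP, CovMZ, CovDZ). It is not the maximum likelihood model fitting that Mx performs. The function name `ace_from_covariances` is hypothetical. It reproduces both cases from the Example, including the negative VA estimate, and reports h² = VA / VP, since under the ACE model VG = VA.

```python
def ace_from_covariances(var_p, cov_mz, cov_dz):
    """Solve the classical twin ACE expectations for VA, VC and VE.

    Expectations:
        VP    = VA + VC + VE
        CovMZ = VA + VC
        CovDZ = 1/2 VA + VC
    """
    va = 2.0 * (cov_mz - cov_dz)  # CovMZ - CovDZ = 1/2 VA
    vc = cov_mz - va              # CovMZ = VA + VC
    ve = var_p - cov_mz           # VP - CovMZ = VE
    return va, vc, ve


if __name__ == "__main__":
    for var_p, cov_mz, cov_dz in [(2.0, 1.6, 1.2), (1.0, 0.6, 0.65)]:
        va, vc, ve = ace_from_covariances(var_p, cov_mz, cov_dz)
        # Flag the impossible solution that arises when CovDZ > CovMZ
        flag = "  <- negative variance component" if min(va, vc, ve) < 0 else ""
        print(f"VA = {va:.2f}, VC = {vc:.2f}, VE = {ve:.2f}, "
              f"h2 = {va / var_p:.2f}{flag}")
```

Nothing in this direct solution constrains the components to be non-negative. Observed statistics such as CovDZ > CovMZ can therefore produce impossible estimates. This is one reason to prefer the constrained estimation and model comparison listed under Model Fitting.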