Introduction to Genetic Epidemiology
HRM 728 - 2015
Course Coordinator: Dr. Sonia Anand
Course Dataset Assistant: Binod

Course Outline
• 14 classes
• Mid-Term Assignment: 16-October-2015
• Help Session/Analytical Questions using PLINK – Nov 20, 2015
• Final Exam – Dec 4, 2015
• Final Assignment-Independent Study Presentation - Dec 11, 2015

Student Evaluation
• Class Attendance/Participation: 15%
• Mid-Term Assignment: 25% (5-page, single-spaced scholarly summary; topic preapproved by Dr. Anand)
• Final Exam: 25%
• Independent Study: 35%, including class presentation

Seminar 1
• Key Concepts in Genetic Epidemiology
– What does genetic epidemiology mean to you?

Epidemiology, Biology, Statistics

~50 years
1865 Mendel discovers laws of genetics
1900 Rediscovery of Mendel’s genetics
1944 DNA identified as hereditary material
1953 DNA structure
1960’s Genetic code
1975-79 First human genes isolated
1977 Advent of DNA sequencing
1986 DNA sequencing automated
1990 Human genome project officially begins
1995 First whole genome
1999 First human chromosome
2003 ‘Finished’ human genome sequence

The Human genome project
The Human genome project promised to revolutionise medicine and explain every base of our DNA.
Large MEDICAL GENETICS focus
• Identify variation in the genome that is disease causing
• Determine how individual genes play a role in health and disease

The 2 Human genome projects
PUBLIC - Watson/Collins
• Human Genome Project
• Officially launched in 1990
• Worldwide effort - both academic and government institutions
• Assemble the genome using maps
• 1996 Bermuda accord
PRIVATE - Craig Venter
• 1998 Celera Genomics
• Aim to sequence the human genome in 3 years
• ‘Shotgun’ approach - no use of maps for assembly
• Data release NOT to follow Bermuda principles

The Human genome project
It cost 3 billion dollars and took 13 years to complete (1990 to 2003), 2 less than initially predicted.
• Currently 3.2 Gb
• Approx 200 Mb still in progress
– Heterochromatin
– Repetitive
• Most recent human genome uploaded February 2009

How Are Traits Transmitted from Parents to Offspring?
• Gregor Mendel’s experiments showed that genes are passed from parents to offspring
– Each parent carries two genes that control a trait
– Each parent contributes one copy from each pair
– Pairs of genes separate from each other during the formation of egg and sperm (meiosis)
– When egg and sperm fuse during fertilization, genes from mother and father become a new gene pair
• Genes are contained on chromosomes
– Chromosomes are found in the nucleus of cells in humans and other higher organisms
– Meiosis separates chromosome pairs during formation of egg and sperm

Concept of Heritability
• Proportion of a trait’s total variance that is attributable to genetic factors in a particular population
• Trait: quantitative or continuous trait, e.g. height
• “Attributable to” “caused by”
• If everyone in the population were homozygous, or everyone in the population had the same environmental exposure, that factor would not play a role in the “variance” in the trait (e.g. with no genetic variation, heritability = zero)

Hardy-Weinberg Law of Population Genetics
• Assume random mating in a population
• In a two allele system, homozygosity and heterozygosity balance out
• Allele and genotype frequencies will remain the same if:
– Organisms reproduce
– Allele frequencies are the same in both sexes
– Loci segregate independently
– Mating is random with respect to genotype

Hardy-Weinberg Law of Population Genetics
• Frequency of alleles in population: $p + q = 1$
• $p^2 + 2pq + q^2 = 1$
• $p$ = frequency of the dominant allele, $q$ = frequency of the recessive allele

GENETIC EPIDEMIOLOGY
Flow of research
• Disease characteristics: Descriptive epidemiology
• Familial clustering: Family aggregation studies
• Genetic or environmental: Twin/adoption/half-sibling/migrant studies
• Mode of inheritance: Segregation analysis
• Disease susceptibility loci: Linkage analysis
• Disease susceptibility markers: Association studies

Why do we care about variations?
• They underlie phenotypic differences
• They cause inherited diseases
• They allow tracking of ancestral human history

Human Genome
• ~30,000 genes
• 3 billion base pairs in the human genome
• 15 million SNPs in human genome
• Human Diversity = 0.5%
• Far less than other animals like the chimp (because humans are younger)
• Patterns of Linkage Disequilibrium (LD) are informative about population histories
October 2004

SNPs
• SNPs are more common variants (> 5%)
• Most mutations will disappear but some will achieve higher frequencies due either to random genetic drift or to selective pressure
• Base substitution through a non-repaired error that occurs during DNA replication
• Low mutation rate: 10^-8 substitutions per base pair per generation
• Majority of SNPs are inherited - not de novo mutations

SNP persistence influenced by 2 forces
• 1) Random Genetic Drift – random sampling of different alleles with each generation (because only a small fraction of gametes pass onto the next generation); eventually FIXATION occurs when an allele reaches 100% or 0%
• 2) Natural Selection – affects the probability that a SNP is passed to the next generation
– ↑ speed of fixation if it confers a fitness advantage (positive selection)
– ↓ new deleterious variants from gene pool (negative selection)
– or results in balancing selection

Linkage Disequilibrium
• Chromosomes are mosaics
• Patterns of LD are informative about population histories and depend on:
– Recombination rate
– Mutation rate
– Population size
– Natural selection
Conrad, Nature Genetics 2006

Progress in Genetics
• 1866 Gregor Mendel suggested traits were inherited
• 1869 Friedrich Miescher isolated DNA
• 1953 Double Helix Structure of DNA – Watson, Crick, Rosalind Franklin
• 1975 Sanger Sequencing – “1st Generation”
• 2003 Human Genome “Crack the Code”
• International HapMap Project
• Automated Sequencing
• 1000 Genomes

2nd generation sequencing
Genome wide annotation of functional elements made easy!

Background into 1000 genomes
• International collaboration
• Sequence whole genome of approximately 2000 individuals from ~20 populations
• Central goal is to describe most of the genetic variation that occurs at a population frequency greater than 1%
• Help scientists:
– Identify genetic variation with high resolution
– Improve imputation
– Find novel genotype-phenotype associations
– Find causal variants
– More accurately study evolutionary process & racial differences
The 1000 Genomes Project Consortium (2012). An integrated map of genetic variation from 1,092 human genomes. Nature. DOI: 10.1038/nature11632

Population-specific genetic variation at high resolution
• Observe and identify population-specific genetic variation
• Novel SNPs are rare and more likely to be observed in one ethnic group
• Need good coverage in multiple populations
• Identification of such variants can help develop new population-specific arrays, minimizing the ascertainment bias that currently exists because most arrays are derived from Europeans

Imputation to GWAS
• Provide resource to aid imputation of missing genotypes in association studies
• From the pilot study, authors found that each signal was in LD with 56 variants on average, and 19% of the time a coding variant was among them
• Shows that 1000 genomes can be used to find variants that could be functional corresponding to GWAS hits

Identification of causal variants
• Precise causative genes are difficult to identify as GWAS focus on LD / genomic regions
• Deep sequencing studies can help find novel or rare functional variants
• Re-sequencing studies support this approach in uncovering rarer variants with larger effects and functional causes of disease (Nejentsev 2009)

From the Pilot phase
Describes genomes from 1,092 individuals representing 14 populations across Europe, Africa, Asia, and the Americas

1000 Genomes
[Figure: the fraction of variants identified across the project that are found in only one population (white line), are restricted to a single ancestry-based group (solid colour), are found in all groups (solid black line) and all populations (dotted black line)]

1000 Genomes
• Most common variants were almost always present in all 14 populations
• Degree of rare variants differed greatly

From Genetics to Genomics
Genetics
• Disease
• Single Gene Disorders
• Mutations/One Gene
• High Disease Risk
• Environment Role +/-
• “Genetic Services”
Genomics
• Information
• All Diseases
• Variation/Multi Genes
• Low Disease Risk
• Environment Role ++
• Gene-Environment Interactions

Common Complex Diseases
• Condition such as CVD is common
• Includes closely related but not identical manifestations – angina, unstable angina, MI
• Multiple genes have small effects - RR of 1.2 to 1.5 – affect multiple “risk factors” or intermediate phenotypes
• Causative genotype may be the more common genotype (unlike monogenic disorders)

What are we trying to study?
"It's a classic scientific paradox: we know a genotype and we know a phenotype, but there's a black box in between"

Genetic Association Studies
[Figure labels: SNP Variation, Gene Expression, Protein Synthesis, Post Translational Changes, Protein Expression, Other Risk Factors, Disease]

Genetic Association Studies
[Figure labels: SNP Variation, Gene Expression, Protein Synthesis, Post Translational Changes, Protein Expression, Other Risk Factors, Environmental Exposure, Disease]

Indirect and Direct Allelic Association
[Figure labels: disease variant D (*), markers M1, M2, ..., Mn]
• Direct Association: measure disease relevance (*) directly, ignoring correlated markers nearby
• Indirect Association & LD: assess trait effects on D via correlated markers (Mi) rather than susceptibility/etiologic variants
Semantic distinction between
• Linkage Disequilibrium: correlation between (any) markers in population
• Allelic Association: correlation between marker allele and trait
Wacholder, 2002 (www)

Population Stratification
Marchini, 2004 (www)

Models of gene–environment interactions
Hunter, 2005 (www)

Sample size requirement for gene-environment interaction studies
Hunter, 2005 (www)

An example of a gene-environment interaction
In Alzheimer disease, the risk of cognitive decline as measured by the TICS test is particularly high in APOE4 carriers who have untreated hypertension (APOE4+/HT+).
Hunter, 2005 (www)

Ascertainment Bias
• Case-control studies are particularly prone to ascertainment bias because, unlike in a population-based study, cases and controls can be enriched for the factors on which investigators would like to focus (in the case of diabetes, hyperglycaemia)
• In the case of TCF7L2 (rs7903146), the T-allele could appear to be associated with lower BMI in control samples. Although the T-allele causes hyperglycaemia, the controls are selected to be normoglycaemic, leading to accumulation of T-allele carriers with higher physical activity levels or lower BMI

Future Directions: Beyond DNA & RNA
“Omic” approach: technology; number estimated in humans
• Genomics: single nucleotide polymorphisms (SNPs); ~10,000,000
• Transcriptomics: microarrays of gene transcripts (RNA); ~20,000
• Proteomics: protein arrays of specific protein products; ~100,000
• Metabolomics: metabolic profiles; 1000 – 10,000 metabolites
*adapted from Ginsburg G, et al. J Am Coll Cardiol. 2005;46:1615-1627.

Height and Risk of Coronary Artery Disease
A paper by Gertler et al. from 1951 reported that individuals who suffered a myocardial infarction before the age of 40 were on average 5 cm (2.9%) shorter than a healthy control population
Gertler MM, Garn SM, White PD. The Journal of the American Medical Association 1951

Short stature is associated with coronary heart disease: a systematic review of the literature and a meta-analysis.
Paajanen TA, Oksala NKJ, Kuukasjärvi P, Karhunen PJ. European Heart Journal 2010

Methods
• Selection of studies for review:
– Systematic reviews, meta-analyses, randomized clinical trials, clinical trials, and cohort or case-control studies with at least 200 subjects
– Height dichotomized into short and tall groups
– Outcome defined as diagnosis of angina pectoris, ischaemic heart disease (IHD) or heart disease without MI, acute MI, or history of MI, coronary artery occlusion equal to or more than 50%, revascularization or percutaneous transluminal coronary angioplasty (PTCA), as well as all-cause mortality, CVD mortality, or CHD mortality
• Meta-analysis:
– I-squared test for heterogeneity of data
– ORs and RRs from all studies converted to RRs for shorter group

Results
• Average cut-off for shorter group was 160.5 cm and cut-off for taller group was 173.9 cm, with different ranges for men and women
• Combined RR for shorter group to experience CHD was 1.46 (95% CI 1.37–1.55)
• Combined RR for all-cause mortality for short men was 1.37 (1.29–1.46) and for short women 1.55 (1.41–1.70)
• Combined RR for all types of cardiovascular (CVD) deaths among men and women was 1.55 (95% CI 1.37–1.74)
• Overall, short stature represents a ~1.5 times increased risk of CHD morbidity and mortality compared with tall stature

New Approach to crack the question
Using a genetic approach to explore the association between height and CAD risk helps remove some of the lifestyle and environmental confounders present in epidemiological studies
• Background: 180 single-nucleotide polymorphisms (SNPs) were found to be significantly associated with height (GIANT study in Europeans, n=183,727)
• Aims:
– Assess combined effect of 180 height-associated SNPs on CAD risk
– Assess effect of these SNPs on CAD risk factors (e.g. blood pressure, LDL, etc.)
– Identify any biological pathways mediating this association
Nelson NEJM 2015

Study Population
• Summary association statistics extracted from 3 meta-analyses of GWAS case-control studies of CAD:
– Coronary Artery Disease Genomewide Replication and Meta-Analysis (CARDIoGRAM) Consortium: 21977 cases, 62289 controls; all 180 SNP variants
– Coronary Artery Disease (C4D) Consortium: 17766 cases, 17115 controls; all 180 SNP variants
– Metabochip Combined CARDIoGRAM+C4D Consortium for cohorts not included in previous meta-analyses: 25323 cases, 48979 controls; 112 SNP variants
Nelson NEJM 2015

Advantages of genetic approach in this study over traditional epidemiologic approach:
- Genetic determinants of height are not confounded by lifestyle (e.g. nutrition) or environmental (e.g. socioeconomic status) factors
- Allows tracing of genetic pathways to identify potential mechanisms driving association
Limitations:
- Lifestyle and environmental choices/events can be a direct consequence of height

Height-Associated Variants and CAD Methods
• Using:
– β1 = effect size of association between variant and height (GIANT study)
– β2 = effect size of association between variant and CAD (CARDIoGRAM, C4D, and Metabochip studies)
• To calculate:
– β3 = effect size of association between height and CAD mediated through variant
• β3 is the odds ratio for CAD per 1-standard deviation (SD) increase in genetically determined height

Height-Associated Variants and CAD Methods
• Associations of individual SNPs with height (β1) and with CAD (β2) are very small
• Thus, β3 values for individual SNPs are centered around 1.0 and generally non-significant
• To determine the complete association between height and CAD, we combined β3 values from all SNPs using inverse-variance-weighted random-effects meta-analysis

Height-Associated Variants and CAD - Results
• Combined association between height-associated SNPs and CAD was significant (OR=0.88, 95% CI = 0.82 to 0.95, p<0.001)
• 13.5% increase in CAD risk per 1-SD decrease in height
• Most individual β3 values were centered around 1.0 and non-significant, but a few values were significant (p<0.05)
– 3 out of 180 SNPs remained significant after Bonferroni correction

Genetic Risk Score Analysis - Methods
• Subgroup of CAD cohorts had genomewide individual-level genotype data available (8240 cases, 10009 controls)
• Weighted analysis of genetic risk scores to evaluate effect of increasing number of height-associated variants on CAD risk
• Genetic risk score:
– For each SNP, the sum of posterior probabilities for the height-increasing allele (a value from 0 to 2) multiplied by the effect size of the allele on height
– Values totalled across all SNPs for each individual
– Individuals ranked and divided into quartiles
– Logistic regression on quartiles to estimate combined odds ratio for CAD

Genetic Risk Score Analysis - Results
• Increased number of height-raising alleles associated with reduced risk of CAD
• Odds ratios for each quartile:
– Quartile 2 vs Quartile 1 = 0.90 (95% CI = 0.83 to 0.98, p=0.02)
– Quartile 3 vs Quartile 1 = 0.88 (95% CI = 0.81 to 0.96, p=0.003)
– Quartile 4 vs Quartile 1 = 0.74 (95% CI = 0.68 to 0.80, p<0.001)
• Quartile 4 includes individuals with the highest number of height-raising alleles, Quartile 3 has individuals with the second most, etc.

What if SNPs for Height are also associated with CAD risk factors?
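The genetic risk score steps listed in the methods above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names are hypothetical, the dosage is assumed to be P(one copy) + 2 × P(two copies) of the height-increasing allele, and an unadjusted odds ratio from case/control counts stands in for the logistic regression on quartiles.

```python
from collections import Counter


def expected_dosage(p_one_copy, p_two_copies):
    """Expected count (0 to 2) of height-increasing alleles at one SNP.

    Assumption: the "sum of posterior probabilities" is read as
    P(one copy) + 2 * P(two copies).
    """
    return p_one_copy + 2.0 * p_two_copies


def weighted_grs(dosages, effect_sizes):
    """Weighted score: dosage times effect size on height, totalled over SNPs."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is needed per SNP")
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


def assign_quartiles(scores):
    """Rank individuals by score and split them into quartiles (1 = lowest)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    quartiles = [0] * n
    for rank, idx in enumerate(order):
        quartiles[idx] = 1 + (4 * rank) // n
    return quartiles


def odds_ratios_vs_q1(quartiles, is_case):
    """Unadjusted odds ratio for CAD in quartiles 2 to 4 versus quartile 1.

    Simplification: counts only, no covariates. Every quartile must
    contain at least one case and one control.
    """
    cases, controls = Counter(), Counter()
    for q, case in zip(quartiles, is_case):
        if case:
            cases[q] += 1
        else:
            controls[q] += 1
    reference_odds = cases[1] / controls[1]
    return {q: (cases[q] / controls[q]) / reference_odds for q in (2, 3, 4)}
```

With real data, each individual's score would come from `weighted_grs` over all height-associated SNPs, and an odds ratio below 1 for the top quartile would correspond to the pattern reported in the results slide.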
Height-Associated Variants and CAD Risk Factors
• Obtained estimates of effect sizes for 180 height variants on CAD risk factors based on meta-analyses of genomewide association studies:
– Systolic blood pressure (n=69899)
– Diastolic blood pressure (n=69909)
– Mean arterial pressure (n=29182)
– Pulse pressure (n=74079)
– LDL cholesterol level (n=95454)
– HDL cholesterol level (n=99900)
– Triglyceride level (n=96598)
– Type 2 diabetes (34840 cases, 114981 controls)
– Glucose (n=96496)
– Log-transformed plasma insulin (n=85573)
– Smoking quantity (n=41150)
• β3 values calculated for association of height with CAD risk factors (similar to how they were calculated for overall CAD risk)

Height-Associated Variants and CAD Risk Factors
• β3 values represent change in measurement unit of variable per 1-standard deviation (SD) change in height
• Only LDL cholesterol level (β3 = -0.06, 95% CI = -0.09 to -0.04, p<0.001) and triglyceride level (β3 = -0.05, 95% CI = -0.08 to -0.03, p<0.001) had significant associations with height-associated SNPs
• 19% of association between genetically determined height and CAD explained by effect of height on LDL cholesterol
• 12% of association between genetically determined height and CAD explained by effect of height on triglyceride level

Conclusions
• Association between genetically determined decrease in height (sum of 180 height-associated SNPs) and increased risk of CAD (13.5% increase in CAD risk per 1-SD decrease in height)
– 2.3 % of this association explained by effect of height on LDL levels (inverse relationship)
– 1.9% of this association explained by effect of height on triglyceride levels (inverse relationship)
• Genetically determined height was associated with CAD risk in men but not in women, in contrast with findings from epidemiological studies suggesting an association in both sexes
• Height-associated SNPs were not significantly associated with BMI, suggesting a pathway independent of obesity
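As a recap of the β1, β2 and β3 quantities used throughout the height and CAD analysis, the sketch below shows one way the per-variant estimates could be formed and pooled. Several things here are assumptions rather than statements from the slides: β3 is taken as the ratio β2/β1 with β2 on the log-odds scale, its standard error ignores uncertainty in β1, the random-effects pooling uses the DerSimonian-Laird estimator, and all function names are hypothetical.

```python
import math


def ratio_estimate(beta_height, beta_cad, se_cad):
    """Per-variant beta3 and its standard error.

    Assumption: beta3 = beta2 / beta1, where beta_cad (beta2) is a log odds
    ratio and beta_height (beta1) is in SD units of height. The standard
    error is a first-order approximation that treats beta1 as known.
    """
    beta3 = beta_cad / beta_height
    se3 = se_cad / abs(beta_height)
    return beta3, se3


def random_effects_meta(estimates, std_errors):
    """Inverse-variance-weighted random-effects pooling (DerSimonian-Laird)."""
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum_w
    # Cochran's Q measures heterogeneity between per-variant estimates.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, se_pooled


def percent_increase_per_sd_decrease(or_per_sd_increase):
    """Convert an OR per 1-SD increase into % higher risk per 1-SD decrease."""
    return (1.0 / or_per_sd_increase - 1.0) * 100.0
```

Exponentiating the pooled value gives an odds ratio per 1-SD increase in genetically determined height. Applying `percent_increase_per_sd_decrease` to the rounded OR of 0.88 gives roughly 13.6%, in line with the 13.5% quoted in the slides, which presumably comes from the unrounded estimate.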