CONTEMPORARY RESEARCH IN HUMAN GENOMICS
Genetics, Ethics and the Law
May 29-31, 2009
Josyf Mychaleckyj, D.Phil.
Center for Public Health Genomics
University of Virginia

Slide 2
Today we’ll review…
• Genome Wide Association Studies (GWAS)
• Copy Number Variants (CNVs)
• Medical Resequencing
• Direct-to-Consumer Services (DTC)
Joe Mychaleckyj

Slide 3
Genome Wide Association Studies (GWAS)

Slide 4
Single Nucleotide Polymorphisms: SNPs (‘SNiPs’)
[Diagram: aligned DNA sequences from Chromosome #1 and Chromosome #2 that differ at a single position]
C, T are the 2 different alleles for this SNP
Mutation = Rare variant
Polymorphism = Frequent (> 1% prevalence)

Slide 5
Each person carries pairs of chromosomes with a separate allele at the SNP position on each chromosome
3 Possible SNP Genotypes    Frequency
AA  Homozygote              f(AA)
AG  Heterozygote            f(AG)
GG  Homozygote              f(GG)
f(AA) + f(AG) + f(GG) = 1

Slide 6
Case Control Association study
Cases = Clinical Disease, e.g. Blue Allele frequency: 0.48 (48%)
Controls = Disease Free, Blue Allele frequency: 0.41 (41%)

Quantitative Trait Locus (QTL) Association Study

Slide 8
Genome Wide Association Study
• SNPs are the most common type of human genome variant by number (10-15 Million)
• Stable, easy to assay, accurately genotyped
• Able to multiplex 1000’s of SNPs into same assay
Illumina 1M-Duo
Affymetrix Human 6.0: 906,000 SNPs, 946,000 probes for CNV

Slide 9
GWAS
• SNPs present in genes (affect proteins) but since coding sequence is ~2% of genome, the vast majority of human SNPs are outside exons or introns
• Genotype a dense map of SNPs across all chromosomes of the human genome
• Studies with 500,000 SNPs are becoming routine and 1 Million SNP panels are available
• Do not have to test all 10M SNPs because of SNP-SNP correlations (linkage disequilibrium)

Slide 10
GWAS approach
Does not assume a knowledge of genes or biology
Hardy J, Singleton A. N Engl J Med.
2009 Apr 23;360(17):175

Slide 11
Genome wide Association Analysis of Coronary Artery Disease, NEJM 2007

Slide 12
But Common Diseases are Complex
[Diagram, left panel: Clinical Monogenic Disease. HFE C282Y, rs1800562; protein context VPPGEEQRYT[C/Y]QVEHPGLD; DNA context GGGGAAGAGCAGAGATATACGT[A/G]CCAGGTGGAGCACCCAGGCCTG; P( Hemochromatosis+ | CC homozygote) ~ 60-100%]
[Diagram, right panel: Clinical Complex Disease. Genes 1-5, each contributing an odds ratio (OR), together with Environments 1-3]

Slide 13
Monogenic vs Complex Disease
Monogenic                                  Complex
1 or small # of genes                      Many
Often etiologic (severe phenotype)         Susceptibility / molecular pathology ?
Highly penetrant                           Modest penetrance
High Odds Ratio                            Modest/Low Odds Ratio
Strong selection => Low frequency/Rare     Weak/No selection => High frequency/Common
Coding Sequence                            Non-coding/regulation (?)

Slide 14
What are GWAS Studies Finding
• Typically detected variants are common (allele freq >10%)
• Low genotype risk, odds ratio (1.1-1.5)
• Small sibling relative risk
• Causal variants have not been mapped; function is unknown and major signals occur in non-coding regions
• Penetrance model not well known

Slide 15
Example: Crohn Disease
• First susceptibility gene NOD2 for Crohn Disease
• SNP: rs17221417, GRR (het) = 1.29, GRR (homo) = 1.92
• Allele frequency 0.287
• Sibling Risk Ratio = 1.02
Familial risk in NOD2 has been estimated at 1.19-1.49 but varies with population
Lewis J Med Genet 2007, Economou Am J Gastroenterol 2004

Slide 16
>200 GWAS studies published as of December 2008
Hindorff, PNAS 2009

Slide 17
Nature Genetics 41, 666 - 676 (2009), published online: 10 May 2009
Genome-wide association study identifies eight loci associated with blood pressure

Slide 18
The GWAS conundrum: Little variance/risk is explained by GWAS alleles
• Obesity
  – FTO and MC4R <2% of variance
• Lipids
  – 30 gene loci, proportion of variance explained in each trait:
  – 9.3% for HDL cholesterol
  – 7.7% for LDL cholesterol
  – 7.4% for triglycerides
• Diabetes
  – 18 replicated loci: combined sibling relative risk ~1.07

Slide 19
Example: Height
• Highly heritable (heritability ~0.8)
• Combined sample of ~63,000
• 54 validated variants in multiple genes
• Each locus explains ~0.3% - 0.5% of the phenotypic variance
• Total variance explained < 5% overall

Slide 20
What are we missing?
• Population differences
• Alleles with small effect sizes
• Copy number variants
• Rare variants
• Epigenetic effects

Slide 21
• Genotype and phenotype datasets made available as rapidly as possible to a wide range of scientific investigators
• Grantees are expected to develop a sharing plan consistent with the GWAS policy.
• Plan should include data submission to the NIH GWAS data repository (dbGaP).
(http://grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html)
Pezzolesi et al Diabetes 2009

Slide 22
http://www.ncbi.nlm.nih.gov/gap

Slide 24
NIH GWAS Data Sharing Issues
• Sharing of individual genotype & phenotype data with any approved researcher worldwide (*Public access to genetic summary statistics)
• Review by a central NIH data use committee (DUC) not constituted by the study
• Informed consent templates for new GWAS
• ‘Retrofitting’ existing cohorts to conform to NIH Policy
  – adequacy of consents
  – Data sharing clauses
  – Use of data for research purposes not intended or foreseen
• Ancestry, ethnic origins – harm to community
http://grants.nih.gov/grants/gwas/

Slide 25
Example Results for one SNP
[Figure: allele frequency axis from 0.0 to 1.0 with markers for Reference Sample, Mixture and Personal Genome; annotation: More Likely to be in mixture]
Summation over all SNPs, can infer with very high confidence whether the Person (or a close relative) is more likely to be in the Mixture versus a Reference Sample
PLoS Genetics Aug 2008

Slide 26
Copy Number Variants (CNVs)

Slide 27
Copy Number Variants
• Submicroscopic structural genome rearrangements (cf cytogenetics, FISH)
  – ~ 10 – 10,000 base pairs in length
  – Insertions, deletions, duplications (2+ copies), inversions
• Copy number variant or polymorphism
  – polymorphism = more common CNV (> 1% frequency = CNP)
• Common feature of the genome
• Frequency >1% => polymorphism (CNPs)
• Assay using genome wide SNP or CNV arrays
  – Electronic FISH study

Slide 28
Copy number variants (CNVs)
The Copy Number Variation (CNV) Project http://www.sanger.ac.uk/humgen/cnv/

Slide 29
~11kb deletion on chromosome 8 revealed by ultra-high resolution CGH. Blue lines: individuals with two copies. Red line: individual with zero copies.
Points are SNPs or probes from GWAS Array
The Copy Number Variation (CNV) Project http://www.sanger.ac.uk/humgen/cnv/

Slide 30
Location and frequency of CNVs in the genome
Nature. 2006 Nov 23;444(7118):444-54

Slide 31
Medical Resequencing: Next Generation Sequencing (NGS)

Slide 32
Public Reference Human Genome Sequence (2001, 2004) is Haploid and Chimeric
DNA Library 1, Individual 1
DNA Library 2, Individual 2
DNA Library 3, Individual 3

Slide 33
Next Generation Sequencing (NGS) enables Diploid Sequencing of an individual
Positions of variants, SNPs, CNVs etc
Hundreds of Millions of small random sequence ‘reads’

Slide 34
Mapping of Individual Variants (SNPs, CNVs)
N = 1 individual
[Diagram: Reference Genome sequence with Shotgun Reads aligned beneath it]

Slide 35
Mapping of Individual Variants
• Random reads from diploid genome sequencing
  – Align random shotgun reads from single individual diploid library & look for high quality mismatches
  – Find heterozygous positions
• Medical Sequencing (to determine disease risk profile)
  – Incorporation of sequence and variants in the Medical Record

Slide 36
ABBA00000000

Slide 37
‘Project Jim’
1.3 percent of Watson’s genome
did not match the existing reference genome.
> 600,000 novel SNPs
< 68,000 insertions and deletions compared to the reference sequence, 3bp - 7kbases
Bio-IT World June 2007

Slide 38
NGS of Diploid Genomes
5 Completely Sequenced as of (May 2009):
• J. Craig Venter
• James Watson
• Yoruban (West Africa, HGVS)
• Chinese (YH)
• Korean (SJK May 2009)
Levy et al, PLoS Biology, 2007

Slide 39
Scientific American 2006

Slide 40

Slide 41
2008: Announcement of the $5,000 Genome

Slide 42
Direct-to-Consumer Services

Slide 43
Company      Launch   Platform     List Cost                  Counselor
deCODEme     Nov-07   Illumina     $985                       Referrals
23andMe      Nov-07   Illumina     $399                       No
Navigenics   Apr-08   Affymetrix   $2500 + $250 annual sub    On staff
SeqWright    Jan-08   Affymetrix   $998                       No
Bio-IT World November 2008

Slide 44

Slide 45
Rival genetic tests leave buyers confused
Firms that offer to predict your risk of disease give worryingly varied results
Nic Fleming (September 7, 2008)

Slide 46
Different Companies produce differing assessments of risk
• Different genetic variants reviewed and included – threshold for inclusion
• Level of expertise in companies to review literature
• Different statistical models for risk prediction – no ‘right’ answer
• How frequently updated – new findings in literature
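
Worked illustration (not part of the original slides): the quantities on Slides 5, 6 and 15 can be tied together with a few lines of Python. This is only a sketch under stated assumptions. It assumes Hardy-Weinberg equilibrium to turn an allele frequency into genotype frequencies, and it assumes that the Slide 15 allele frequency (0.287) refers to the risk allele and that the genotype relative risks (GRR) are measured against the non-carrier homozygote. The function names are hypothetical, and this is one simple risk model among many, which is the point made on Slide 46.

```python
def hwe_genotype_freqs(p):
    """Genotype frequencies (non-carrier, heterozygote, risk homozygote)
    for risk-allele frequency p. Assumes Hardy-Weinberg equilibrium,
    which the slides do not state."""
    q = 1.0 - p
    return q * q, 2.0 * p * q, p * p


def allelic_odds_ratio(p_cases, p_controls):
    """Odds of the allele in cases divided by the odds in controls."""
    return (p_cases / (1.0 - p_cases)) / (p_controls / (1.0 - p_controls))


def risk_vs_population(p, grr_het, grr_hom):
    """Risk of each genotype relative to the population-average risk.
    Assumes the GRRs are relative to the non-carrier homozygote."""
    f0, f1, f2 = hwe_genotype_freqs(p)
    mean_risk = f0 * 1.0 + f1 * grr_het + f2 * grr_hom
    return 1.0 / mean_risk, grr_het / mean_risk, grr_hom / mean_risk


if __name__ == "__main__":
    # Slide 5: the three genotype frequencies sum to 1.
    print("genotype freqs:", hwe_genotype_freqs(0.287))
    # Slide 6: allele frequency 0.48 in cases versus 0.41 in controls.
    print("allelic OR:", allelic_odds_ratio(0.48, 0.41))
    # Slide 15: NOD2 SNP rs17221417, GRR (het) 1.29, GRR (homo) 1.92.
    print("risk vs population:", risk_vs_population(0.287, 1.29, 1.92))
```

With the Slide 6 frequencies the allelic odds ratio comes out at roughly 1.33, which falls in the modest range described on Slide 14. A different baseline or a different way of combining SNPs would give different per-genotype numbers from the same published inputs.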