CONTEMPORARY RESEARCH IN
HUMAN GENOMICS
Genetics, Ethics and the Law
May 29-31, 2009
Josyf Mychaleckyj, D.Phil.
Center for Public Health Genomics
University of Virginia
Slide 2
Today we’ll review…
• Genome Wide Association Studies (GWAS)
• Copy Number Variants (CNVs)
• Medical Resequencing
• Direct-to-Consumer Services (DTC)
Joe Mychaleckyj
Slide 3
Genome Wide Association Studies (GWAS)
Slide 4
Single Nucleotide Polymorphisms: SNPs (‘SNiPs’)
[Figure: aligned DNA sequences from Chromosome #1 and Chromosome #2, showing the SNP position]
C, T are the 2 different alleles for this SNP
Mutation = Rare variant
Polymorphism = Frequent (> 1% prevalence)
Slide 5
Each person carries pairs of chromosomes with a separate allele at the SNP position on each chromosome
3 Possible SNP Genotypes
Genotype   Type           Frequency
AA         Homozygote     f(AA)
AG         Heterozygote   f(AG)
GG         Homozygote     f(GG)
f(AA) + f(AG) + f(GG) = 1
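As a minimal sketch of the identity above, the following Python snippet turns genotype counts into genotype frequencies and derives the frequency of allele A. The counts are hypothetical and chosen only for illustration; they do not come from the talk.

```python
def genotype_frequencies(n_aa, n_ag, n_gg):
    """Return (f_AA, f_AG, f_GG) from genotype counts."""
    total = n_aa + n_ag + n_gg
    if total == 0:
        raise ValueError("need at least one genotyped individual")
    return n_aa / total, n_ag / total, n_gg / total


def allele_frequency_a(f_aa, f_ag):
    """Frequency of allele A: each AA carries two A alleles, each AG carries one."""
    return f_aa + 0.5 * f_ag


# Hypothetical counts for 1,000 individuals (not from the talk)
f_aa, f_ag, f_gg = genotype_frequencies(360, 480, 160)
p_a = allele_frequency_a(f_aa, f_ag)
print(f_aa, f_ag, f_gg, p_a)
```

The three genotype frequencies sum to 1 by construction, and the allele frequency follows directly from them.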
Slide 6
Case Control Association study
Cases = Clinical Disease, e.g. Blue Allele frequency: 0.48 (48%)
Controls = Disease Free, Blue Allele frequency: 0.41 (41%)
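One common way to summarise a case-control difference like this is the allelic odds ratio. The sketch below applies it to the example frequencies (0.48 in cases, 0.41 in controls); the function name is illustrative, and a real analysis would also need sample sizes to test significance.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Allele frequencies from the example: 0.48 in cases, 0.41 in controls
odds_ratio = allelic_odds_ratio(0.48, 0.41)
print(round(odds_ratio, 2))  # 1.33
```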
Quantitative Trait Locus (QTL) Association Study
Slide 8
Genome Wide Association Study
• SNPs are the most common type of human genome variant by number (10-15 Million)
• Stable, easy to assay, accurately genotyped
• Able to multiplex 1000’s of SNPs into the same assay
Illumina 1M-Duo
Affymetrix Human 6.0: 906,000 SNPs, 946,000 probes for CNV
Slide 9
GWAS
• SNPs are present in genes (affect proteins), but since coding sequence is ~2% of the genome, the vast majority of human SNPs lie outside coding exons (in introns or between genes)
• Genotype a dense map of SNPs across all chromosomes of the human genome
• Studies with 500,000 SNPs are becoming routine and 1 Million SNP panels are available
• Do not have to test all 10M SNPs because of SNP-SNP correlations (linkage disequilibrium)
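The SNP-SNP correlation mentioned above is usually quantified as r², computed from haplotype and allele frequencies. The following is a minimal sketch with hypothetical frequencies (not from the talk): an r² near 1 means one SNP is a good proxy for the other, which is why a subset of SNPs can stand in for the rest.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical haplotype frequencies (not from the talk)
independent = ld_r_squared(p_ab=0.12, p_a=0.3, p_b=0.4)  # alleles occur together at chance level
perfect = ld_r_squared(p_ab=0.3, p_a=0.3, p_b=0.3)       # allele A always travels with allele B
print(independent, perfect)
```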
Slide 10
GWAS approach
Does not assume a knowledge of genes or biology
Hardy J, Singleton A. N Engl J Med. 2009 Apr 23;360(17):175
Slide 11
Genome wide Association Analysis of Coronary Artery Disease, NEJM 2007
Slide 12
But Common Diseases are Complex
Clinical Monogenic Disease
P(Hemochromatosis+ | CC homozygote) ~ 60-100%
HFE C282Y, rs1800562
Protein: VPPGEEQRYT[C/Y]QVEHPGLD
DNA: GGGGAAGAGCAGAGATATACGT[A/G]CCAGGTGGAGCACCCAGGCCTG
Clinical Complex Disease
[Figure: Gene 1, Gene 2, Gene 3, Gene 4 and Gene 5, linked by "OR", together with Environment 1, Environment 2 and Environment 3]
Slide 13
Monogenic vs Complex Disease
Monogenic                                   Complex
1 or small # of genes                       Many
Often etiologic (severe phenotype)          Susceptibility / molecular pathology ?
Highly penetrant                            Modest penetrance
High Odds Ratio                             Modest/Low Odds Ratio
Strong selection => Low frequency/Rare      Weak/No selection => High frequency/Common
Coding Sequence                             Non-coding/regulation (?)
Slide 14
What are GWAS Studies Finding?
• Typically detected variants are common (allele freq >10%)
• Low genotype risk, odds ratio (1.1-1.5)
• Small sibling relative risk
• Causal variants have not been mapped: function is unknown and major signals occur in non-coding regions
• Penetrance model not well known
Slide 15
Example: Crohn Disease
• First susceptibility gene NOD2 for Crohn Disease
• SNP: rs17221417
• GRR (het) = 1.29, GRR (homo) = 1.92
• Allele frequency 0.287
• Sibling Risk Ratio = 1.02
• Familial risk in NOD2 has been estimated at 1.19-1.49 but varies with population
Lewis, J Med Genet 2007; Economou, Am J Gastroenterol 2004
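The small sibling risk ratio can be reproduced from the genotype relative risks and allele frequency quoted for rs17221417. The sketch below assumes Hardy-Weinberg proportions and the standard single-locus variance-components formula, λs = 1 + (V_A/2 + V_D/4)/K²; whether the talk derived its figure exactly this way is an assumption.

```python
def sibling_risk_ratio(p, grr_het, grr_hom):
    """Locus-specific sibling relative risk under Hardy-Weinberg proportions.

    Uses the variance-components form
        lambda_s = 1 + (V_A / 2 + V_D / 4) / K^2
    with risks expressed relative to the non-carrier genotype.
    """
    q = 1.0 - p
    r0, r1, r2 = 1.0, grr_het, grr_hom
    # Mean relative risk in the population
    k = q * q * r0 + 2 * p * q * r1 + p * p * r2
    # Additive and dominance variance of the relative risk
    v_a = 2 * p * q * (p * (r2 - r1) + q * (r1 - r0)) ** 2
    v_d = (p * q) ** 2 * (r2 - 2 * r1 + r0) ** 2
    return 1.0 + (v_a / 2 + v_d / 4) / (k * k)


# Values quoted for rs17221417: allele frequency 0.287, GRR 1.29 (het) and 1.92 (homo)
lambda_s = sibling_risk_ratio(p=0.287, grr_het=1.29, grr_hom=1.92)
print(round(lambda_s, 2))  # 1.02
```

Even a heterozygote risk of 1.29 at a common allele moves the sibling risk only from 1.00 to about 1.02, which illustrates why individual GWAS hits explain so little familial clustering.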
Slide 16
>200 GWAS studies published as of December 2008
Hindorff, PNAS 2009
Slide 17
Nature Genetics 41, 666 - 676 (2009) Published online: 10 May 2009
Genome-wide association study identifies eight loci associated with blood pressure
Slide 18
The GWAS conundrum: Little variance/risk is explained by GWAS alleles
• Obesity
– FTO and MC4R <2% of variance
• Lipids
– 30 gene loci, proportion of variance explained in each trait:
– 9.3% for HDL cholesterol
– 7.7% for LDL cholesterol
– 7.4% for triglycerides
• Diabetes
– 18 replicated loci: combined sibling relative risk ~1.07
Slide 19
Example: Height
• Highly heritable (heritability ~0.8)
• Combined sample of ~63,000
• 54 validated variants in multiple genes
• Each locus explains ~0.3% - 0.5% of the phenotypic variance
• Total variance explained < 5% overall
Slide 20
What are we missing?
• Population differences
• Alleles with small effect sizes
• Copy number variants
• Rare variants
• Epigenetic effects
Slide 21
• Genotype and phenotype datasets made available as rapidly as possible to a wide range of scientific investigators
• Grantees are expected to develop a sharing plan consistent with the GWAS policy.
• Plan should include data submission to the NIH GWAS data repository (dbGaP).
(http://grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html)
Pezzolesi et al, Diabetes 2009
Slide 22
http://www.ncbi.nlm.nih.gov/gap
Slide 24
NIH GWAS Data Sharing Issues
• Sharing of individual genotype & phenotype data with any approved researcher worldwide
(*Public access to genetic summary statistics)
• Review by a central NIH data use committee (DUC) not constituted by the study
• Informed consent templates for new GWAS
• ‘Retrofitting’ existing cohorts to conform to NIH Policy – adequacy of consents
– Data sharing clauses
– Use of data for research purposes not intended or foreseen
• Ancestry, ethnic origins – harm to community
http://grants.nih.gov/grants/gwas/
Slide 25
Example Results for one SNP
[Figure: allele frequency axis (0.0 to 1.0) marking the Reference Sample, the Mixture and the Personal Genome for one SNP, annotated "More Likely to be in mixture"]
Summing over all SNPs, one can infer with very high confidence whether the person (or a close relative) is more likely to be in the Mixture than in a Reference Sample
PLoS Genetics Aug 2008
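The per-SNP comparison and the summation over SNPs can be sketched as follows. This is a simplified illustration, assuming a per-SNP distance difference |Y − Reference| − |Y − Mixture| combined into a t-like statistic; the function name and the toy data are hypothetical, and the cited paper should be consulted for the actual method.

```python
import math


def mixture_membership_statistic(person, reference, mixture):
    """Summed evidence that a person is closer to the mixture than to the reference.

    person: per-SNP allele frequency of the individual (0.0, 0.5 or 1.0)
    reference, mixture: per-SNP allele frequencies in each pool
    Returns a t-like statistic; positive values favour the mixture.
    """
    d = [abs(y - r) - abs(y - m) for y, r, m in zip(person, reference, mixture)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two SNPs")
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    if var == 0:
        raise ValueError("per-SNP distances have zero variance")
    return mean / math.sqrt(var / n)


# Hypothetical toy data for five SNPs (not from the talk)
person = [1.0, 0.5, 0.0, 1.0, 0.5]
reference = [0.50, 0.40, 0.30, 0.60, 0.20]
mixture = [0.60, 0.42, 0.25, 0.70, 0.26]
t = mixture_membership_statistic(person, reference, mixture)
print(t > 0)  # the mixture frequencies sit slightly closer to this person's genotypes
```

With hundreds of thousands of SNPs, tiny per-SNP shifts like these accumulate into a very large statistic, which is the privacy concern behind restricting access to summary data.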
Slide 26
Copy Number Variants (CNVs)
Slide 27
Copy Number Variants
• Submicroscopic structural genome rearrangements (cf cytogenetics, FISH)
– ~ 10 – 10,000 base pairs in length
– Insertions, deletions, duplications (2+ copies), inversions
• Copy number variant or polymorphism
– polymorphism = more common CNV (> 1% frequency = CNP)
• Common feature of the genome
• Frequency >1% => polymorphism (CNPs)
• Assay using genome wide SNP or CNV arrays
– Electronic FISH study
Slide 28
Copy number variants (CNVs)
The Copy Number Variation (CNV) Project
http://www.sanger.ac.uk/humgen/cnv/
Slide 29
~11kb deletion on chromosome 8 revealed by ultra-high resolution CGH. Blue lines: individuals with two copies. Red line: individual with zero copies.
Points are SNPs or probes from GWAS Array
The Copy Number Variation (CNV) Project
http://www.sanger.ac.uk/humgen/cnv/
Slide 30
Location and frequency of CNVs in the genome
Nature. 2006 Nov 23;444(7118):444-54
Slide 31
Medical Resequencing: Next Generation Sequencing (NGS)
Slide 32
Public Reference Human Genome Sequence (2001, 2004) is Haploid and Chimeric
DNA Library 1, Individual 1
DNA Library 2, Individual 2
DNA Library 3, Individual 3
Slide 33
Next Generation Sequencing (NGS) enables Diploid Sequencing of an individual
Positions of variants, SNPs, CNVs etc
Hundreds of Millions of small random sequence ‘reads’
Slide 34
Mapping of Individual Variants (SNPs, CNVs)
N = 1 individual
[Figure: Shotgun Reads aligned against the Reference Genome sequence]
Slide 35
Mapping of Individual Variants
• Random reads from diploid genome sequencing
– Align random shotgun reads from single individual diploid library & look for high quality mismatches
– Find heterozygous positions
• Medical Sequencing (to determine disease risk profile)
– Incorporation of sequence and variants in the Medical Record
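The idea of piling up aligned reads and looking for positions where two alleles are both well supported can be sketched as below. This is a toy illustration only: the reference, reads, thresholds and function name are hypothetical, reads are assumed to be already aligned without gaps, and base quality scores (the "high quality" part) are ignored.

```python
from collections import Counter


def call_heterozygous_sites(reference, reads, min_depth=4, min_fraction=0.25):
    """Return {position: (allele1, allele2)} for positions that look heterozygous.

    reference: reference sequence as a string
    reads: list of (start, sequence) pairs, already aligned without gaps
    """
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    het_sites = {}
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth or len(counts) < 2:
            continue
        (a1, n1), (a2, n2) = counts.most_common(2)
        # Both alleles must be well supported to call a heterozygote
        if n2 / depth >= min_fraction:
            het_sites[pos] = (a1, a2)
    return het_sites


# Hypothetical reference and reads (not from the talk)
reference = "TCAGTGAG"
reads = [(0, "TCAGT"), (1, "CAATG"), (2, "AGTGA"),
         (2, "AATGA"), (3, "GTGAG"), (3, "ATGAG")]
calls = call_heterozygous_sites(reference, reads)
print(calls)
```

In this toy input, half of the reads covering position 3 carry the reference G and half carry an A, so that position is reported as a G/A heterozygote.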
Slide 36
ABBA00000000
Slide 37
‘Project Jim’
1.3 percent of Watson’s genome did not match the existing reference genome.
> 600,000 novel SNPs
< 68,000 insertions and deletions compared to the reference sequence, 3bp - 7kbases
Bio-IT World June 2007
Slide 38
NGS of Diploid Genomes
5 completely sequenced as of May 2009:
J. Craig Venter
James Watson
Yoruban (West Africa, HGVS)
Chinese (YH)
Korean (SJK May 2009)
Levy et al, PLoS Biology, 2007
Slide 39
Scientific American 2006
Slide 40
Slide 41
2008: Announcement of the $5,000 Genome
Slide 42
Direct-to-Consumer Services
Slide 43
Company      Launch   Platform     List Cost                  Counselor
deCODEme     Nov-07   Illumina     $985                       Referrals
23andMe      Nov-07   Illumina     $399                       No
Navigenics   Apr-08   Affymetrix   $2500 + $250 annual sub    On staff
SeqWright    Jan-08   Affymetrix   $998                       No
Bio-IT World November 2008
Slide 44
Slide 45
Rival genetic tests leave buyers confused
Firms that offer to predict your risk of disease give worryingly varied results
Nic Fleming
(September 7, 2008)
Slide 46
Different Companies produce differing assessments of risk
• Different genetic variants reviewed and included – threshold for inclusion
• Level of expertise in companies to review literature
• Different statistical models for risk prediction – no ‘right’ answer
• How frequently updated – new findings in literature
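To illustrate how the choice of included variants alone can change a reported risk, the sketch below combines per-variant odds ratios multiplicatively on the odds scale. The baseline risk, the odds ratios and the two "companies" are entirely hypothetical, and the multiplicative model is only one of several possible models; it is not a description of any real service's method.

```python
from math import prod


def lifetime_risk(baseline_risk, odds_ratios):
    """Combine per-variant odds ratios multiplicatively on the odds scale."""
    odds = baseline_risk / (1.0 - baseline_risk)
    odds *= prod(odds_ratios)
    return odds / (1.0 + odds)


# Hypothetical baseline risk and per-variant odds ratios for one customer
baseline = 0.10
company_a = lifetime_risk(baseline, [1.3, 1.2])            # two variants included
company_b = lifetime_risk(baseline, [1.3, 1.2, 0.8, 0.9])  # two further variants included
print(round(company_a, 3), round(company_b, 3))
```

The same customer receives two different estimates purely because the second panel includes two additional, protective variants.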