Jianfeng Xu, MD, DrPH: GWA - UCLA School of Public Health

GWA: promising but challenging
Jianfeng Xu, M.D., Dr.PH
Professor of Public Health and Cancer Biology
Director, Program for Genetic and Molecular Epidemiology of Cancer
Associate Director, Center for Human Genomics
Wake Forest University School of Medicine
Outline
 The need for genome-wide association studies
 The reality of genome-wide association studies
 Important issues in genome-wide association studies

Genome coverage

Strategies for pre-association analysis

Strategies for association analysis

Sample size and false positives (Type I and II errors)

Confirmation in independent study populations

Increase the magnitude of effects of a specific gene
The need for GWA
 Current understanding of disease etiology is limited

Therefore, candidate-gene and pathway approaches are insufficient
 Current understanding of functional variants is limited

Therefore, focusing on nonsynonymous changes is not sufficient
 Results from linkage studies are often inconsistent, and the linked regions are broad

Therefore, the utility of identified linkage regions is limited
 GWA studies offer an effective and objective approach

Better chance to identify disease-associated variants

Improve understanding of disease etiology

Improve ability to test gene-gene interactions and predict disease risk
GWA is promising
 Many diseases and traits are influenced by genetic factors

i.e., sequence variants in the genome contribute to disease risk
 Over 6 million SNPs are known in the genome

i.e., some SNPs will be directly or indirectly associated with causal variants
 The cost of SNP genotyping has fallen

i.e., it is affordable to genotype a large number of SNPs in the genome
 Large numbers of cases and controls are available

i.e., there is statistical power to detect variants with modest effects
 When the above conditions are met…

…associated SNPs will have different frequencies between cases and controls
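One common way to test for such a frequency difference is the allelic chi-square test on a 2 × 2 table of allele counts. The sketch below is illustrative: it assumes genotype counts ordered as (major homozygote, heterozygote, minor homozygote) and that each person contributes two independent alleles (HWE). The function name and the counts in the usage line are hypothetical.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square (1 df) comparing minor-allele counts in cases vs controls.

    Each argument is a tuple of genotype counts (AA, Aa, aa), where 'a' is the
    minor allele. Every person contributes two alleles.
    """
    def minor_major(counts):
        n_AA, n_Aa, n_aa = counts
        return n_Aa + 2 * n_aa, 2 * n_AA + n_Aa

    a, b = minor_major(case_genotypes)     # case minor / major allele counts
    c, d = minor_major(control_genotypes)  # control minor / major allele counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Minor-allele frequency 0.36 in cases vs 0.30 in controls (made-up counts)
chi2, p = allelic_chi_square((400, 480, 120), (490, 420, 90))
print(f"chi2 = {chi2:.2f}, p = {p:.1e}")
```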
GWA is challenging
 Many diseases and traits are influenced by genetic factors

But probably through multiple risk variants, each with a modest effect

They confer stronger risk when they interact

Truly associated SNPs are not necessarily highly significant
 Too many SNPs are evaluated

False positives due to multiple testing
 Single studies tend to be underpowered

False negatives
 Considerable heterogeneity among studies

Phenotypic and genetic heterogeneity

False positives due to population stratification
Reality of GWA
AMD, IBD, T1D, etc.
Parkinson’s, nicotine dependence, T2D, etc.
Prostate cancer, breast cancer, and other ongoing studies
Heart diseases, lung diseases, psychiatric diseases, inflammatory
diseases, cancers, and many other studies that are in planning stages
Important issues in genome-wide association studies
 Genome coverage
 Strategies for pre-association analysis
 Strategies for association analysis
 Sample size and false positives (Type I and II errors)
 Confirmation in independent study populations
 Increase the magnitude of effects of a specific gene
Genome coverage
 Two major platforms for GWA

Illumina: HumanHap300, HumanHap550, and HumanHap1M

Affymetrix: GeneChip 100K, 500K, and 1M
 Genome-wide coverage

The percentage of known SNPs in the genome that are in linkage disequilibrium (LD) with the genotyped SNPs

Calculated based on HapMap

Calculated based on ENCODE
Genome coverage
 Genome-wide coverage

Genome coverage of common SNPs (MAF ≥ 0.05)

Genome coverage of common and rare SNPs

Genome coverage using multi-markers
Pe’er, 2006
[Figures (Pe’er, 2006): genome coverage of common SNPs (MAF ≥ 0.05); genome coverage of common and rare SNPs; genome coverage using multi-markers]
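The coverage percentage defined above can be computed once pairwise r² values between known and genotyped SNPs are available from a reference panel such as HapMap. In this sketch a known SNP counts as covered if it is genotyped itself or has r² ≥ 0.8 with at least one genotyped SNP; the 0.8 cutoff, the input format, and the function name are assumptions, and multi-marker tagging is not modeled.

```python
def genome_coverage(known_snps, genotyped, r2, threshold=0.8):
    """Fraction of known SNPs captured (pairwise r^2 >= threshold) by genotyped SNPs.

    known_snps: SNP ids from a reference panel (e.g. HapMap or an ENCODE region)
    genotyped:  set of SNP ids on the array
    r2:         dict mapping (snp_a, snp_b) -> pairwise r^2, stored in either order
    """
    def pair_r2(a, b):
        return r2.get((a, b), r2.get((b, a), 0.0))

    known = list(known_snps)
    covered = sum(
        1 for snp in known
        if snp in genotyped or any(pair_r2(snp, t) >= threshold for t in genotyped)
    )
    return covered / len(known)


# Hypothetical toy panel: s2 is tagged by s1; s3 and s4 are not
r2 = {("s1", "s2"): 0.9, ("s3", "s1"): 0.5}
print(genome_coverage(["s1", "s2", "s3", "s4"], {"s1"}, r2))  # 0.5
```

Restricting `known_snps` to SNPs with MAF ≥ 0.05 gives coverage of common SNPs; passing all known SNPs gives coverage of common and rare SNPs.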
Strategies for pre-association analysis
 Quality control

Filter SNPs by genotype call rates

Filter SNPs by minor allele frequencies

Filter SNPs by testing for Hardy-Weinberg equilibrium
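The three filters can be applied SNP by SNP, as sketched below. The thresholds (call rate ≥ 95%, MAF ≥ 1%, HWE p ≥ 10⁻⁶) are illustrative assumptions rather than values from this talk. The HWE check uses the simple 1-df chi-square test; an exact test is often preferred for low-frequency SNPs, and in practice HWE is usually assessed in controls only.

```python
import math


def hwe_p_value(n_AA, n_Aa, n_aa):
    """1-df chi-square test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(calls, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Apply call-rate, MAF, and HWE filters to one SNP.

    calls: per-sample genotypes coded 0/1/2 (copies of allele a), None if missing.
    Thresholds are illustrative defaults, not recommendations.
    """
    called = [g for g in calls if g is not None]
    if not called or len(called) / len(calls) < min_call_rate:
        return False
    counts = [called.count(k) for k in (0, 1, 2)]  # AA, Aa, aa
    freq_a = (counts[1] + 2 * counts[2]) / (2 * len(called))
    if min(freq_a, 1 - freq_a) < min_maf:
        return False
    return hwe_p_value(*counts) >= min_hwe_p


print(passes_qc([0] * 49 + [1] * 42 + [2] * 9))  # in HWE, fully called: True
print(passes_qc([0] * 50 + [2] * 50))            # no heterozygotes: False
```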
Strategies for pre-association analysis
 Quality control
 Quantile-quantile plot (Q-Q plot)

Evaluate whether association test statistics show an upward bias (systematic inflation)
[Figure: Q-Q plots for all SNPs, after filtering by call rate, and after adjusting for stratification (Clayton, 2006)]
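A Q-Q plot compares the sorted observed -log10(p) values with those expected under the null hypothesis, where p-values are uniform on (0, 1). Points rising above the diagonal across the whole range suggest upward bias (for example from stratification or genotyping artifacts), whereas true associations should depart only in the extreme tail. The sketch below computes the plotted coordinates; the (i - 0.5)/n plotting position is one common convention and is an assumption here.

```python
import math


def qq_points(p_values):
    """(expected, observed) pairs of -log10(p), both sorted ascending, for a Q-Q plot."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, the i-th smallest of n uniform p-values sits near (i - 0.5) / n.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Points above the line y = x indicate inflated test statistics.
for e, o in qq_points([0.8, 0.3, 0.04, 0.5, 0.001]):
    print(f"{e:.2f}\t{o:.2f}")
```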
Strategies for pre-association analysis
 Quality control
 Quantile-quantile plot (Q-Q plot)
 Population stratification

Genomic control


Correct for stratification by dividing the association statistic at each SNP by one uniform, genome-wide inflation factor
Susceptible to over- or under-adjustment, because the same factor is applied to every SNP
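Genomic control estimates the inflation factor λ as the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455), then divides every statistic by λ. The sketch below assumes 1-df tests and follows the common convention of not letting λ fall below 1; the function name is illustrative.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda, adjusted statistics) for 1-df association chi-squares."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # convention: deflate inflated statistics, never inflate
    return lam, [x / lam for x in chi2_stats]


lam, adjusted = genomic_control([0.2, 0.9, 1.5, 4.0, 12.0])
print(f"lambda = {lam:.2f}")
```

Because one λ is applied everywhere, SNPs whose allele frequencies differ strongly between subpopulations remain under-corrected while the rest are over-corrected, which is the weakness noted above.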
Strategies for pre-association analysis
 Quality control
 Quantile-quantile plot (Q-Q plot)
 Population stratification

Genomic control

Structure (STRUCTURE)

Used to assign samples to discrete subpopulation clusters and then aggregate evidence of association within each cluster

Estimate each individual's proportion of ancestry and treat it as a covariate

Computationally intensive when the number of ancestry-informative markers (AIMs) is large
Strategies for pre-association analysis
 Quality control
 Quantile-quantile plot (Q-Q plot)
 Population stratification

Genomic control

Structure (STRUCTURE)

Principal component analysis (EIGENSTRAT)

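Identify the top eigenvectors (axes of genetic variation reflecting ancestry or geographic region)

Adjust genotypes and phenotypes by the amount attributable to ancestry along each eigenvector

Compute association statistics using the adjusted genotypes and phenotypes

No need for a preselected panel of AIMs; the genome-wide SNP data are used directly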
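The three EIGENSTRAT steps can be sketched with NumPy. This is a simplified illustration of the approach described by Price et al. (2006), not the EIGENSTRAT software: each SNP is normalized, the top k eigenvectors of the sample covariance matrix serve as ancestry axes, genotypes and phenotype are residualized on those axes, and a 1-df trend statistic is computed as (N - k - 1)·r². The normalization details, the choice of k, and the function name are assumptions to check against the original method.

```python
import numpy as np


def eigenstrat_chi2(genotypes, phenotype, k=1):
    """Sketch of EIGENSTRAT-style stratification correction.

    genotypes: (n_samples, n_snps) array of allele counts 0/1/2
    phenotype: length-n_samples array, 1 = case, 0 = control
    k:         number of top eigenvectors (ancestry axes) to adjust for
    Returns one trend chi-square (1 df) per SNP after adjustment.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n, m = g.shape

    # 1. Normalize each SNP: center it and scale by sqrt(p(1 - p)).
    mu = g.mean(axis=0)
    p = (1 + g.sum(axis=0)) / (2 + 2 * n)  # smoothed allele frequency
    x = (g - mu) / np.sqrt(p * (1 - p))

    # 2. Top-k eigenvectors of the n x n sample covariance matrix.
    _, eigvecs = np.linalg.eigh(x @ x.T / m)  # eigenvalues in ascending order
    axes = eigvecs[:, ::-1][:, :k]             # (n, k), orthonormal columns

    def residualize(v):
        # Remove the component of v that lies along the ancestry axes.
        return v - axes @ (axes.T @ v)

    # 3. Adjust genotypes and phenotype, then compute (N - k - 1) * r^2 per SNP.
    g_adj = residualize(g - mu)
    y_adj = residualize(y - y.mean())
    cov = g_adj.T @ y_adj
    r2 = cov ** 2 / ((g_adj ** 2).sum(axis=0) * (y_adj ** 2).sum())
    return (n - k - 1) * r2


# Two simulated populations that differ in allele frequency and in case fraction,
# with no true association: the unadjusted (k=0) statistics are inflated.
rng = np.random.default_rng(0)
g = np.vstack([rng.binomial(2, 0.2, size=(200, 500)),
               rng.binomial(2, 0.6, size=(200, 500))])
y = np.array([1] * 160 + [0] * 40 + [1] * 40 + [0] * 160)
print(eigenstrat_chi2(g, y, k=0).mean(), eigenstrat_chi2(g, y, k=1).mean())
```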
Strategies for association analysis
 Single SNP analysis using pre-specified genetic models

2 × 3 genotype table (2 df)

Additive model (1 df), with a test for departure from additivity

All possible genetic models (e.g., dominant, recessive, additive)
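For a single SNP, the 2 × 3 genotype test (2 df) and the Cochran-Armitage trend test for an additive model (1 df) can both be computed from genotype counts. The sketch assumes counts ordered (AA, Aa, aa) and additive scores 0/1/2; other scores such as (0, 1, 1) or (0, 0, 1) give dominant or recessive versions. The difference between the two statistics can be read as a 1-df test for departure from additivity. The function name and counts are illustrative.

```python
import math


def genotype_tests(cases, controls, scores=(0, 1, 2)):
    """2 x 3 genotype test (2 df) and Cochran-Armitage trend test (1 df).

    cases, controls: genotype counts (AA, Aa, aa); scores encode the genetic model.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]

    # Pearson chi-square on the 2 x 3 table
    chi2_2df = 0.0
    for row, row_total in ((cases, R), (controls, S)):
        for observed, col_total in zip(row, col):
            expected = row_total * col_total / N
            chi2_2df += (observed - expected) ** 2 / expected

    # Cochran-Armitage trend test with the given scores
    swr = sum(w * r for w, r in zip(scores, cases))
    swn = sum(w * c for w, c in zip(scores, col))
    sw2n = sum(w * w * c for w, c in zip(scores, col))
    chi2_trend = N * (N * swr - R * swn) ** 2 / (R * S * (N * sw2n - swn ** 2))

    return {
        "chi2_2df": chi2_2df,
        "p_2df": math.exp(-chi2_2df / 2),                 # chi-square upper tail, 2 df
        "chi2_trend": chi2_trend,
        "p_trend": math.erfc(math.sqrt(chi2_trend / 2)),  # upper tail, 1 df
    }


result = genotype_tests((400, 480, 120), (490, 420, 90))
departure = result["chi2_2df"] - result["chi2_trend"]  # departure from additivity
print(result, departure)
```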
Strategies for association analysis
 Single SNP analysis using pre-specified genetic models
 Haplotype analysis

Two-marker and three-marker sliding windows

Multi-marker

Within haplotype block

Between two recombination hot spots
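Sliding-window haplotype analysis moves a window of two or three adjacent markers along the chromosome and compares haplotype frequencies between cases and controls in each window. The sketch assumes phased haplotypes are already available as 0/1 strings; real genotype data would first need haplotype phasing (for example with an EM algorithm). The function names and data are hypothetical.

```python
from collections import Counter


def sliding_windows(n_markers, width):
    """(start, stop) index pairs for consecutive windows of `width` markers."""
    return [(i, i + width) for i in range(n_markers - width + 1)]


def haplotype_freqs(haplotypes, start, stop):
    """Frequencies of the sub-haplotypes spanning markers start..stop-1.

    haplotypes: phased haplotypes as 0/1 strings, one per chromosome (assumed known).
    """
    counts = Counter(h[start:stop] for h in haplotypes)
    total = sum(counts.values())
    return {hap: c / total for hap, c in counts.items()}


# Hypothetical phased haplotypes over four markers
cases = ["0110", "0100", "1110", "0110"]
controls = ["1010", "0100", "1011", "1110"]
for start, stop in sliding_windows(4, 2):
    print((start, stop), haplotype_freqs(cases, start, stop),
          haplotype_freqs(controls, start, stop))
```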
Strategies for association analysis
 Single SNP analysis using pre-specified genetic models
 Haplotype analysis
 Gene-gene and gene-environment interactions

Interaction with main effect


Logistic regression
Interaction without main effect: data mining

Classification and regression trees (CART)

Multifactor Dimensionality Reduction (MDR)
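The core step of MDR pools multilocus genotype cells into a single high-risk/low-risk attribute. The sketch below shows only that step for one pair of SNPs, labeling a cell high-risk when its case:control ratio exceeds the overall ratio; full MDR also searches over SNP combinations and uses cross-validation, both omitted here. The toy data are constructed so that neither SNP has a main effect, yet the pair predicts status perfectly.

```python
from collections import Counter
from itertools import product


def mdr_cells(snp1, snp2, status):
    """Label each two-locus genotype cell 'high' or 'low' risk (core MDR step).

    snp1, snp2: per-person genotypes coded 0/1/2; status: 1 = case, 0 = control.
    """
    cases = Counter((a, b) for a, b, s in zip(snp1, snp2, status) if s == 1)
    controls = Counter((a, b) for a, b, s in zip(snp1, snp2, status) if s == 0)
    threshold = sum(cases.values()) / sum(controls.values())  # overall case:control ratio
    labels = {}
    for cell in product(range(3), repeat=2):
        n_case, n_ctrl = cases[cell], controls[cell]
        if n_case + n_ctrl == 0:
            continue  # empty cell: left unlabeled
        high = n_ctrl == 0 or n_case / n_ctrl > threshold
        labels[cell] = "high" if high else "low"
    return labels


def mdr_accuracy(snp1, snp2, status, labels):
    """Classification accuracy of the pooled high/low-risk attribute."""
    correct = 0
    for a, b, s in zip(snp1, snp2, status):
        predicted_case = labels.get((a, b)) == "high"
        correct += predicted_case == (s == 1)
    return correct / len(status)


# Interaction without main effects: each SNP alone splits cases and controls evenly.
snp1 = [0, 0, 1, 1, 0, 0, 1, 1]
snp2 = [0, 1, 0, 1, 0, 1, 0, 1]
status = [0, 1, 1, 0, 0, 1, 1, 0]
labels = mdr_cells(snp1, snp2, status)
print(labels, mdr_accuracy(snp1, snp2, status, labels))
```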
Sample size and false positives
 Estimate sample size
 Sample size
 OR
 MAF
 Type I error
 Power
 Quanto
 Effective sample size
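The inputs listed above (OR, MAF, Type I error, power) determine the required sample size. Quanto performs this calculation in full; the sketch below is only a rough normal approximation for a 1-df allelic test with equal numbers of cases and controls, assuming HWE and a per-allele odds ratio. It is not Quanto's algorithm, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def cases_needed(freq_controls, odds_ratio, alpha=1e-7, power=0.8):
    """Approximate number of cases (with an equal number of controls) for a 1-df allelic test.

    freq_controls: risk-allele frequency in controls
    Uses a normal approximation for comparing two allele frequencies and assumes
    HWE, so each person contributes two independent alleles.
    """
    p0 = freq_controls
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # risk-allele frequency in cases
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    alleles_per_group = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2 / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per person


# OR 1.2 for an allele with frequency 0.30 in controls
print(cases_needed(0.3, 1.2, alpha=0.05), cases_needed(0.3, 1.2, alpha=1e-7))
```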
Sample size and false positives
 Estimate sample size
 False positives: too many dependent tests

Adjust for number of tests

Bonferroni correction
 Nominal significance level = study-wide significance / number of tests
 Nominal significance level = 0.05/500,000 = 10⁻⁷

Effective number of tests
 Take LD into account

Permutation procedure
 Permute case-control status
 Mimic the actual analyses
 Obtain empirical distribution of maximum test statistic under null hypothesis
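The permutation procedure can be sketched as follows: shuffle the case-control labels, rerun the same test at every SNP, and record the maximum statistic. A SNP's study-wide p-value is then estimated as (1 + number of permutations whose maximum reaches the SNP's observed statistic) / (1 + number of permutations). Because only labels are shuffled, LD among SNPs is preserved. The trend statistic (N·r²), the +1 convention, and the function names are assumptions of this sketch.

```python
import random


def trend_chi2(genotypes, status):
    """Trend chi-square computed as N * r^2 between genotype (0/1/2) and status (0/1)."""
    n = len(status)
    mg, ms = sum(genotypes) / n, sum(status) / n
    sxy = sum((g - mg) * (s - ms) for g, s in zip(genotypes, status))
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((s - ms) ** 2 for s in status)
    return 0.0 if sxx == 0 or syy == 0 else n * sxy * sxy / (sxx * syy)


def max_stat_permutation(snps, status, n_perm=1000, seed=1):
    """Study-wide p-values from the permutation distribution of the maximum statistic.

    snps:   list of per-SNP genotype lists (same sample order as status)
    status: case-control labels, 1 = case, 0 = control
    """
    rng = random.Random(seed)
    observed = [trend_chi2(g, status) for g in snps]
    exceed = [0] * len(snps)
    labels = list(status)
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute case-control status; genotypes (and LD) untouched
        max_stat = max(trend_chi2(g, labels) for g in snps)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    return [(c + 1) / (n_perm + 1) for c in exceed]


status = [1] * 20 + [0] * 20
snps = [[2] * 20 + [0] * 20, [0, 1, 2, 1] * 10, [0, 2] * 20]
print(max_stat_permutation(snps, status, n_perm=200))
```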
Sample size and false positives
 Estimate sample size
 False positives: too many dependent tests

Adjust for number of tests

False discovery rate (FDR)

Expected proportion of false discoveries among all discoveries

Offers more power than Bonferroni

FDR control holds under weak dependence among the tests
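A standard way to control the FDR is the Benjamini-Hochberg step-up procedure: sort the m p-values, find the largest rank k with p(k) ≤ k·q/m, and reject the k smallest. The p-values in the example are made up. At q = 0.05, BH rejects three of the five tests, whereas Bonferroni at 0.05/5 = 0.01 would reject only two.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank  # largest rank meeting its threshold
    return sorted(order[:k_max])


p = [0.01, 0.045, 0.02, 0.001, 0.5]
print(benjamini_hochberg(p))  # [0, 2, 3]
```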
Sample size and false positives
 Estimate sample size
 False positives: too many dependent tests

Adjust for number of tests

False discovery rate (FDR)

Bayesian approach

Takes prior probability into account, e.g., the false-positive report probability (FPRP)
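Wacholder et al. (2004) define the FPRP as the probability that a significant finding is a false positive, given the significance level α, the power 1 - β, and the prior probability π that the association is real: FPRP = α(1 - π) / [α(1 - π) + (1 - β)π]. The prior used below (20 risk SNPs among 500,000) is an illustrative assumption.

```python
def fprp(alpha, power, prior):
    """False-positive report probability (Wacholder et al., 2004)."""
    false_pos = alpha * (1 - prior)  # null SNPs declared significant
    true_pos = power * prior         # truly associated SNPs detected
    return false_pos / (false_pos + true_pos)


prior = 20 / 500_000  # illustrative: 20 risk SNPs among 500,000 tested
print(f"{fprp(0.01, 0.8, prior):.4f}")  # 0.9968
print(f"{fprp(1e-7, 0.8, prior):.4f}")  # 0.0031
```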
Confirmation in independent study populations
 The approaches above can limit, but not eliminate, false positives
 Confirmation is needed to distinguish true from false positives

Replication: examine the results from the 2nd stage only

Joint analysis: combine data from the 1st and 2nd stages

Multiple stages
[Figure: replication vs. joint analysis (Skol, 2006)]
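The two strategies can be contrasted for a single SNP. Replication tests the stage 2 z-score on its own; joint analysis, as described by Skol et al. (2006), combines the stage 1 and stage 2 z-scores weighted by the square root of each stage's share of the samples. The sketch assumes both z-scores measure the effect in the same direction and that `frac_stage1` is the fraction of all samples genotyped in stage 1. A joint p-value must still be judged against a genome-wide threshold, because stage 1 data were used to select the SNP.

```python
import math
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided normal p-value."""
    return 2 * NormalDist().cdf(-abs(z))


def replication_and_joint(z1, z2, frac_stage1):
    """Return (replication p, joint p) for one SNP carried forward from stage 1."""
    z_joint = math.sqrt(frac_stage1) * z1 + math.sqrt(1 - frac_stage1) * z2
    return two_sided_p(z2), two_sided_p(z_joint)


rep_p, joint_p = replication_and_joint(z1=4.0, z2=3.0, frac_stage1=0.5)
print(f"replication p = {rep_p:.1e}, joint p = {joint_p:.1e}")
```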
Multiple stages
                                    1st stage   2nd stage   3rd stage
# of risk SNPs                             20          16          13
# of SNPs tested                      500,000       5,016          63
# of true sig. SNPs (80% power)            16          13          10
# of total sig. SNPs (α = 0.01)         5,016          63          10
% of true sig. SNPs                     0.32%         21%        100%
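The multi-stage counts follow from simple arithmetic: at each stage, 80% of the remaining risk SNPs and 1% of the remaining null SNPs pass α = 0.01, and only significant SNPs move on. The sketch below reproduces that arithmetic under the table's assumptions (equal power and α at every stage, counts rounded to whole SNPs between stages); the function name is hypothetical.

```python
def staged_screen(n_snps, n_risk, alpha=0.01, power=0.8, stages=3):
    """Expected counts when SNPs significant at `alpha` are carried to the next stage.

    Assumes every stage has the same alpha and power, and rounds counts to whole
    SNPs between stages. Python's round() rounds halves to even, so an expected
    0.5 false positive in the last stage becomes 0.
    """
    risk, null = n_risk, n_snps - n_risk
    rows = []
    for stage in range(1, stages + 1):
        true_sig = round(power * risk)   # risk SNPs detected
        false_sig = round(alpha * null)  # null SNPs passing by chance
        total_sig = true_sig + false_sig
        pct_true = 100 * true_sig / total_sig
        rows.append((stage, risk, risk + null, true_sig, total_sig, pct_true))
        risk, null = true_sig, false_sig  # only significant SNPs move on
    return rows


for stage, risk, tested, true_sig, total_sig, pct in staged_screen(500_000, 20):
    print(f"stage {stage}: {risk} risk SNPs, {tested:,} tested, "
          f"{true_sig} true / {total_sig:,} significant ({pct:.2f}% true)")
```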
Increase the magnitude of effects of a specific gene
 Increase their effects by focusing on a subset of study subjects

Cases with a uniform phenotype, e.g. aggressive or early onset
[Figure: Study aggressive cases. MAF of rs1447295 (0.00 to 0.20) for controls, low-grade, and high-grade groups in Iceland, Sweden, Chicago, CGEMS, and JHU]
Increase the magnitude of effects of a specific gene
 Increase their effects by focusing on a subset of study subjects

Cases with a uniform phenotype, e.g. aggressive or early onset

Cases with family history
[Figure: Study cases with family history (Antoniou and Easton, 2003)]
Increase the magnitude of effects of a specific gene
 Increase their effects by focusing on a subset of study subjects

Cases with a uniform phenotype, e.g. aggressive or early onset

Cases with family history

Controls that are disease free
[Figure: Disease free controls. MAF of rs1447295 (0.00 to 0.20) for controls, low-grade, and high-grade groups in Iceland, Sweden, Chicago, CGEMS, and JHU]
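One reason disease-free controls help: if some "controls" actually have undiagnosed disease, the control allele frequency is pulled toward the case frequency and the observed odds ratio shrinks toward 1. The sketch below shows this dilution. The allele frequencies and contamination fractions are illustrative assumptions, not values read from the figure, and the function name is hypothetical.

```python
def observed_odds_ratio(freq_cases, freq_free_controls, contamination):
    """Allelic odds ratio when a fraction of 'controls' actually have latent disease.

    freq_cases:          risk-allele frequency in cases
    freq_free_controls:  risk-allele frequency in truly disease-free controls
    contamination:       fraction of the control group with undiagnosed disease,
                         assumed to carry the allele at the case frequency
    """
    freq_controls = (1 - contamination) * freq_free_controls + contamination * freq_cases

    def odds(f):
        return f / (1 - f)

    return odds(freq_cases) / odds(freq_controls)


for c in (0.0, 0.1, 0.3):
    print(c, round(observed_odds_ratio(0.36, 0.30, c), 3))
```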
Increase the magnitude of effects of a specific gene
 Increase their effects by focusing on a subset of study subjects

Cases with a uniform phenotype, e.g. aggressive or early onset

Cases with family history

Controls that are disease free
 Increase their effects by studying a homogeneous population

Lower levels of genetic heterogeneity
Summary
 GWA studies are promising but difficult
 There are many important issues in GWA
 The impact of these issues can be minimized by a well-designed study