Genome-wide association studies
BNFO 602
Roshan
Application of SNPs: association with disease
• Experimental design to detect cancer-associated SNPs:
– Pick random humans with and without cancer (say breast cancer)
– Perform SNP genotyping
– Look for associated SNPs
– Also called a genome-wide association study
Case-control example
• Study of 100 people:
– Case: 50 subjects with cancer
– Control: 50 subjects without cancer
• Count the number of alleles and form a contingency table (each subject carries two alleles, so each row sums to 100)

           #Allele1   #Allele2
Case          10         90
Control        2         98
Odds ratio
• Odds of allele 1 in cancer = a/b = e
• Odds of allele 1 in healthy = c/d = f
• Odds ratio of allele 1 in cancer vs healthy = e/f = ad/(bc)

           #Allele1   #Allele2
Cancer        a          b
Healthy       c          d
Example
• Odds of allele 1 in case = 15/35
• Odds of allele 1 in control = 2/48
• Odds ratio of allele 1 in case vs control = (15/35)/(2/48) = 10.3

           #Allele1   #Allele2
Case          15         35
Control        2         48
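The odds-ratio arithmetic above can be sketched in a few lines of Python. The function name `odds_ratio` and its argument order (a, b for cases; c, d for controls) are illustrative choices, not part of the original slides.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of allele 1 in cases vs controls.

    Table layout (hypothetical argument order):
        case:    a = #allele1, b = #allele2
        control: c = #allele1, d = #allele2
    """
    if b == 0 or c == 0 or d == 0:
        raise ValueError("odds ratio is undefined when b, c, or d is zero")
    case_odds = a / b      # odds of allele 1 among cases
    control_odds = c / d   # odds of allele 1 among controls
    return case_odds / control_odds


print(round(odds_ratio(15, 35, 2, 48), 1))  # 10.3
```

An odds ratio well above 1, as here, suggests allele 1 is more common among cases; whether the difference is statistically significant is a separate question.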
Statistical test of association
(P-values)
• P-value = probability, under the null hypothesis, of data at least as extreme as the observed data
• Example:
– Suppose we are given a series of coin tosses
– We suspect that a biased coin produced the tosses
– We can ask the following question: if the coin were fair, how probable would tosses at least this extreme be?
– If this probability is very small, then a fair coin is an unlikely explanation for the observed tosses
– In this example the null hypothesis is the fair coin and the alternative hypothesis is the biased coin
Binomial distribution
• Bernoulli random variable:
– Two outcomes: success or failure
– Example: coin toss
• Binomial random variable:
– Number of successes in a series of independent Bernoulli trials
• Example:
– Probability of heads = 0.5
– Given four coin tosses, what is the probability of three heads?
– Possible outcomes: HHHT, HHTH, HTHH, THHH
– Each outcome has probability = 0.5^4
– Total probability = 4 * 0.5^4 = 0.25
Binomial distribution
• Bernoulli trial: probability of success = p, probability of failure = 1-p
• Given n independent Bernoulli trials, what is the probability of k successes?

$$P(k \text{ successes}) = \binom{n}{k} p^k (1-p)^{n-k}$$
• Binomial applet:
http://www.stat.tamu.edu/~west/applets/binomialdemo.html
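As a minimal sketch, the binomial probability of k successes in n trials, C(n, k) p^k (1-p)^(n-k), can be computed with Python's standard library. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    # comb(n, k) counts the orderings; p**k * (1-p)**(n-k) is the
    # probability of any single ordering with k successes.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Three heads in four tosses of a fair coin: 4 * 0.5^4
print(binomial_pmf(3, 4, 0.5))  # 0.25
```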

Hypothesis testing under Binomial hypothesis
• Null hypothesis: fair coin (probability of heads = probability of tails = 0.5)
• Data: HHHHTHTHHHHHHHTHTHTH (15 heads in 20 tosses)
• P-value under null hypothesis = probability that #heads >= 15 in 20 tosses
• This probability is 0.021
• Since it is below 0.05 we can reject the null hypothesis
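The p-value for the coin-toss data can be reproduced by summing binomial probabilities over the upper tail. This is a sketch of a one-sided test; the helper name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(k_min, n, p):
    """P(#successes >= k_min) in n independent Bernoulli(p) trials."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


tosses = "HHHHTHTHHHHHHHTHTHTH"
n = len(tosses)            # number of tosses
heads = tosses.count("H")  # observed number of heads

# One-sided p-value under the fair-coin null hypothesis (p = 0.5)
p_value = binomial_upper_tail(heads, n, 0.5)
print(heads, n, round(p_value, 3))  # 15 20 0.021
```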
Null hypothesis for case-control contingency table
• We have two random variables:
– X: disease status
– A: allele type
• Null hypothesis: the two variables are independent of each other (unrelated)
• Under independence
– P(X=case and A=1) = P(X=case)P(A=1)
• Expected number of cases with allele 1 is
– P(X=case)P(A=1)N
– where N is the total number of observations
• P(X=case) = (a+b)/N
• P(A=1) = (a+c)/N
• What is the expected number of controls with allele 2?
• Do the probabilities sum to 1?

           #allele1   #allele2
case          a          b
control       c          d
Chi-square statistic
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

O_i = observed frequency for the ith outcome
E_i = expected frequency for the ith outcome
n = total number of outcomes
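When the null hypothesis fully specifies the expected frequencies, the probability distribution of this statistic is approximately the chi-square distribution with n-1 degrees of freedom. For a 2×2 contingency table whose expected counts are computed from the row and column totals, the degrees of freedom are (2-1)(2-1) = 1.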
Proof can be found at
http://ocw.mit.edu/NR/rdonlyres/Mathematics/18-443Fall2003/4226DF27-A1D0-4BB8-939A-B2A4167B5480/0/lec23.pdf
Chi-square
• Using chi-square we can test how well the observed values fit the expected values computed under the independence hypothesis
• We could also test the data under a multinomial or multivariate normal distribution with probabilities given by the independence assumption. This would require the cumulative distribution functions of the multinomial and multivariate normal, which are hard to compute.
• Chi-square p-values are easier to compute
Case control
E1: expected cases with allele 1
E2: expected cases with allele 2
E3: expected controls with allele 1
E4: expected controls with allele 2
N=a+b+c+d
E1 = ((a+b)/N)((a+c)/N) N = (a+b)(a+c)/N
E2 = (a+b)(b+d)/N
E3 = (c+d)(a+c)/N
E4 = (c+d)(b+d)/N
Now compute the chi-square statistic

           #allele1   #allele2
case          a          b
control       c          d
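The expected counts E1 to E4 under independence follow directly from the row and column totals. The sketch below assumes the same a, b, c, d layout as the table; the function name `expected_counts` is illustrative.

```python
def expected_counts(a, b, c, d):
    """Expected cell counts (E1, E2, E3, E4) under independence.

    Each expected count is (row total) * (column total) / N.
    """
    n = a + b + c + d
    e1 = (a + b) * (a + c) / n  # cases with allele 1
    e2 = (a + b) * (b + d) / n  # cases with allele 2
    e3 = (c + d) * (a + c) / n  # controls with allele 1
    e4 = (c + d) * (b + d) / n  # controls with allele 2
    return e1, e2, e3, e4


print(expected_counts(15, 35, 2, 48))  # (8.5, 41.5, 8.5, 41.5)
```

The four expected counts always sum to N, which is a quick sanity check on the arithmetic.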
Chi-square statistic
• Compute expected values and chi-square statistic
• Compute chi-square p-value by referring to the chi-square distribution

           #Allele1   #Allele2
Case          15         35
Control        2         48
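A minimal sketch of the full calculation for this table follows. It assumes 1 degree of freedom, the usual choice for a 2×2 table with expected counts taken from the margins, and uses the identity that the upper tail of a 1-df chi-square at x equals erfc(sqrt(x/2)). It applies no continuity correction and assumes all expected counts are positive. The function name `chi_square_2x2` is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic and p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / N
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Assumption: 1 degree of freedom, so the statistic is distributed as the
    # square of a standard normal and P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


stat, p_value = chi_square_2x2(15, 35, 2, 48)
print(round(stat, 2), p_value < 0.001)  # 11.98 True
```

For tables with more rows or columns the erfc shortcut no longer applies, and a general chi-square survival function from a statistics library would be needed instead.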