Population Genetics Lab 2
BINOMIAL PROBABILITY
&
HARDY-WEINBERG EQUILIBRIUM
Last Week: Sample Point Method
Example: Use the Sample Point Method to find the probability of
getting exactly two heads in three tosses of a balanced coin.
1. The sample space of this experiment is:
Outcome  Toss 1  Toss 2  Toss 3  Shorthand  Probability
1        Head    Head    Head    HHH        1/8
2        Head    Head    Tail    HHT        1/8
3        Head    Tail    Head    HTH        1/8
4        Tail    Head    Head    THH        1/8
5        Tail    Tail    Head    TTH        1/8
6        Tail    Head    Tail    THT        1/8
7        Head    Tail    Tail    HTT        1/8
8        Tail    Tail    Tail    TTT        1/8
2. Assuming that the coin is fair, each of these 8 outcomes has a probability of 1/8.
3. The probability of getting two heads is the sum of the probabilities of outcomes
2, 3, and 4 (HHT, HTH, and THH), or 1/8 + 1/8 + 1/8 = 3/8 = 0.375.
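The sample-point method above can be checked by brute-force enumeration. The following is a minimal Python sketch, assuming a fair coin; the function name `prob_exact_heads` is illustrative and not part of the lab material.

```python
from fractions import Fraction
from itertools import product


def prob_exact_heads(n_tosses, n_heads):
    """Probability of exactly n_heads in n_tosses of a fair coin, by enumeration."""
    # Every sequence of H/T is one equally likely sample point.
    sample_space = list(product("HT", repeat=n_tosses))
    favorable = [pt for pt in sample_space if pt.count("H") == n_heads]
    return Fraction(len(favorable), len(sample_space))


print(prob_exact_heads(3, 2))  # 3/8
```

Enumeration like this is only practical for a small number of tosses, because the sample space doubles with every additional toss.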
Sample-point method:
Example: Find the probability of getting exactly 10 heads in 30 tosses of
a balanced coin.
Total # of sample points = 2^30 = 1,073,741,824
Need a way of accounting for all the possibilities
Example: In drawing 3 M&Ms from an unlimited M&M bowl that is
always 60% red and 40% green, what is the P(2 green)?
$P(2G) = P(GGR) + P(GRG) + P(RGG)$
$P(2G) = (0.4)(0.4)(0.6) + (0.4)(0.6)(0.4) + (0.6)(0.4)(0.4)$
$P(2G) = 3(0.4)(0.4)(0.6) = 3(0.4)^2(0.6)$
If one green M&M is just as good as another…
$P(2G) = \binom{3}{2}(0.4)^2(0.6)$
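As a check on the M&M example, the sketch below sums the probabilities of every draw order containing exactly two greens and compares the result with the shortcut that multiplies one ordering by the number of orderings. The names `P_COLOR` and `prob_green_by_enumeration` are illustrative assumptions, not part of the lab material.

```python
from itertools import product
from math import comb

P_COLOR = {"G": 0.4, "R": 0.6}  # bowl composition from the example


def prob_green_by_enumeration(n_draws, n_green):
    """Sum the probabilities of all draw orders with exactly n_green greens."""
    total = 0.0
    for draws in product("GR", repeat=n_draws):
        if draws.count("G") == n_green:
            p = 1.0
            for color in draws:
                p *= P_COLOR[color]
            total += p
    return total


enumerated = prob_green_by_enumeration(3, 2)
shortcut = comb(3, 2) * 0.4**2 * 0.6  # number of orderings times one ordering
print(round(enumerated, 3), round(shortcut, 3))  # 0.288 0.288
```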
Binomial Probability Distribution
$P(Y = y) = \binom{n}{y} s^y f^{n-y}$
Where, n = Total # of trials.
y = Total # of successes.
s = probability of getting success in a single trial.
f = probability of getting failure in a single trial (f = 1-s).
$\binom{n}{y} = C^n_y = \frac{n!}{y!(n-y)!}$
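The binomial formula above translates directly into a few lines of Python. This is a minimal sketch using the standard-library `math.comb` for the binomial coefficient; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(y, n, s):
    """P(Y = y) for n independent trials with per-trial success probability s."""
    f = 1.0 - s  # probability of failure in a single trial
    return comb(n, y) * s**y * f**(n - y)


# The probabilities over y = 0..n should sum to 1.
total = sum(binomial_pmf(y, 3, 0.4) for y in range(4))
print(round(binomial_pmf(2, 3, 0.4), 3), round(total, 6))  # 0.288 1.0
```

The first printed value reproduces the M&M result (two greens in three draws with s = 0.4).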
Assumptions of Binomial Distribution
1. Trials are independent, finite in number, and conducted under the same conditions.
2. There are only two types of outcome (e.g., success and failure).
3. Outcomes are mutually exclusive and independent.
4. Probability of getting a success in a single trial remains constant throughout all the trials.
5. Probability of getting a failure in a single trial remains constant throughout all the trials.
6. The number of successes is a non-negative integer between 0 and n.
Properties of Binomial Distribution
Mean or expected # of successes in n trials, E(y) = ns
Variance of y, V(y) = nsf
Standard deviation of y, $\sigma(y) = (nsf)^{1/2}$
Example: Find the probability of getting exactly 10 heads in 30
tosses of a balanced coin.
Solution:
$P(y) = \frac{n!}{y!(n-y)!} \times s^y \times f^{n-y}$
We know,
n = 30
y = 10
s = 0.5
f = 0.5
$P(10) = \frac{30!}{10!(30-10)!} \times 0.5^{10} \times 0.5^{30-10}$
$= 30045015 \times 0.000976563 \times 9.53674 \times 10^{-7}$
$= 0.027982$
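The arithmetic in this solution can be reproduced with a short Python sketch; the variable names are illustrative.

```python
from math import comb

n, y, s = 30, 10, 0.5
coefficient = comb(n, y)  # number of ways to place 10 heads among 30 tosses
probability = coefficient * s**y * (1 - s)**(n - y)

print(coefficient)            # 30045015
print(round(probability, 6))  # 0.027982
```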
Example: Find the expected # of heads in 30
tosses of a balanced coin. Also calculate variance.
Solution:
E(Y) = ns = 30*0.5 = 15
V(Y) = nsf = 30*0.5*0.5 = 7.5
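The mean and variance formulas can be cross-checked against the definition of an expected value. The sketch below is one way to do that in Python; the variable names are illustrative.

```python
from math import comb, sqrt

n, s = 30, 0.5
f = 1 - s

mean = n * s          # E(Y) = ns
variance = n * s * f  # V(Y) = nsf
sd = sqrt(variance)   # sigma(Y) = (nsf)^(1/2)

# Cross-check against the definition E(Y) = sum over y of y * P(Y = y).
pmf = [comb(n, y) * s**y * f**(n - y) for y in range(n + 1)]
mean_from_pmf = sum(y * p for y, p in enumerate(pmf))

print(mean, variance, round(sd, 4), round(mean_from_pmf, 6))
```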
Problem 1 (10 minutes)(2 points)
An allozyme locus has three alleles, A1, A2, and A3, with
frequencies 0.847, 0.133, and 0.020, respectively. If we sample
30 diploid individuals, what is the probability of:
• Not finding any copies of A2?
• Finding at least one copy of A2?
• GRADUATE STUDENTS ONLY: Finding fewer than 2 copies
of A2?
Example: How many diploid individuals should be sampled to detect at
least one copy of allele A2 from Problem 1 with probability of at least
0.95?
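Solution:
The working takes the frequency of A2 to be 0.033, so the probability that a sampled allele is not A2 is 0.967. With n sampled alleles, P(at least one A2) = 1 − P(no A2):
$1 - \frac{n!}{0!\,n!}(0.033)^0 (0.967)^n \ge 0.95$
$\frac{n!}{0!\,n!}(0.033)^0 (0.967)^n \le 0.05$
$\Rightarrow 0.967^n \le 0.05$
$\Rightarrow n \ln 0.967 \le \ln 0.05$
$\Rightarrow n \ge \frac{\ln 0.05}{\ln 0.967} = \frac{-2.9957}{-0.0336} = 89.2735$
(Dividing by ln 0.967, which is negative, reverses the inequality.)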
Thus, to detect at least one copy of allele A2 with probability of 0.95,
one would need to sample at least 90 alleles (i.e., at least 45 diploid
individuals).
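The sample-size calculation above can be sketched in Python as follows, using the allele frequency of 0.033 that appears in the working. The function name `min_alleles_to_detect` is a hypothetical helper introduced for illustration.

```python
from math import ceil, log


def min_alleles_to_detect(allele_freq, target_prob):
    """Smallest n with 1 - (1 - allele_freq)**n >= target_prob."""
    f = 1.0 - allele_freq  # probability a sampled allele is not the target
    # n * ln(f) <= ln(1 - target_prob); dividing by ln(f) < 0 flips the inequality.
    return ceil(log(1.0 - target_prob) / log(f))


n_alleles = min_alleles_to_detect(0.033, 0.95)
n_diploid = ceil(n_alleles / 2)  # each diploid individual carries two alleles
print(n_alleles, n_diploid)  # 90 45
```

Rounding up is required because 89 alleles would leave the detection probability just under 0.95.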
Problem 2 (15 minutes)(2 points)
The frequency of red-green color blindness is 0.07 for
men and 0.005 for women. You are designing a survey to
determine the effect of color blindness on educational success.
How many males and females would you have to sample to ensure
that the probability of including at least one color-blind individual of
each sex would be 0.90 or greater?
Estimation of allele frequency for Co-dominant locus
$p = \frac{N_{11} + \frac{1}{2}N_{12}}{N}$
$q = \frac{N_{22} + \frac{1}{2}N_{12}}{N}$
Where, p = Frequency of allele A1
q = Frequency of Allele A2
N11 = # of individuals with genotype A1A1
N12 = # of individuals with genotype A1A2
N22 = # of individuals with genotype A2A2
N = total # of diploid individuals =N11+N12+N22
Estimation of Standard Error
$SE_p = \sqrt{\frac{p(1-p)}{2N}}$
$SE_q = \sqrt{\frac{q(1-q)}{2N}}$
Where, p = Frequency of allele A1
q = Frequency of Allele A2
SEp = Standard error for frequency of allele A1
SEq = Standard error for frequency of allele A2
N = total # of diploid individuals =N11+N12+N22
Standard Deviation v. Standard Error
$SD = \sqrt{Var}$: Measure of data dispersion
$SE = \sqrt{\frac{Var}{n}}$: Measure of mean dispersion
We expect ~68% of the data to fall within 1
standard deviation of the mean.
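To make the distinction concrete, the sketch below computes both quantities for a small data set using Python's standard `statistics` module. The five measurements are hypothetical values invented purely for illustration, and the sample variance is assumed as the variance estimate.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical measurements, invented purely for illustration.
data = [4, 8, 6, 5, 7]

var = variance(data)        # sample variance
sd = sqrt(var)              # spread of individual observations
se = sqrt(var / len(data))  # spread expected for the sample mean

print(mean(data), var, round(sd, 4), round(se, 4))
```

The standard error shrinks as the sample size grows, while the standard deviation does not.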
Example: What are the allele frequencies of alleles A1 and A2,
if the following genotypes have been observed in a sample of
50 diploid individuals?
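Genotype  Count
A1A1      17
A1A2      23
A2A2      10
Solution:
N11 = 17, N12 = 23, and N22 = 10
$p = \frac{N_{11} + \frac{1}{2}N_{12}}{N} = \frac{17 + 11.5}{50} = 0.57$
$SE_p = \sqrt{\frac{p(1-p)}{2N}} = \sqrt{\frac{0.57(1-0.57)}{100}} = \sqrt{\frac{0.57 \times 0.43}{100}} = \sqrt{0.002451} = 0.0495$
q = 1 – p = 0.43
$SE_q = \sqrt{\frac{q(1-q)}{2N}} = \sqrt{\frac{0.43(1-0.43)}{100}} = \sqrt{\frac{0.43 \times 0.57}{100}} = \sqrt{0.002451} = 0.0495$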
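The worked example can be reproduced with a small Python function. This is a sketch under the two-allele co-dominant model used above; the name `allele_freq_with_se` is an illustrative choice.

```python
from math import sqrt


def allele_freq_with_se(n11, n12, n22):
    """Return (p, q, SE) for a two-allele co-dominant locus."""
    n = n11 + n12 + n22               # diploid individuals sampled
    p = (n11 + 0.5 * n12) / n         # each heterozygote carries one A1 copy
    q = (n22 + 0.5 * n12) / n
    se = sqrt(p * (1 - p) / (2 * n))  # 2N alleles sampled; identical for p and q
    return p, q, se


p, q, se = allele_freq_with_se(17, 23, 10)
print(round(p, 2), round(q, 2), round(se, 4))  # 0.57 0.43 0.0495
```

With only two alleles, p(1 − p) equals q(1 − q), which is why both standard errors come out the same.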
Problem 3 (10 minutes) (2 pts)
Estimate the allele frequencies (include their
respective standard errors) for alleles A1, A2, and
A3 if the following genotypes have been observed
in a sample of 200 individuals:
Genotype  Count
A1A1      19
A2A2      17
A3A3      14
A1A2      52
A1A3      57
A2A3      41
$p_i = \frac{N_{ii} + \frac{1}{2}\sum_{j=1,\, j \ne i}^{n} N_{ij}}{N}$
$SE_{p_i} = \sqrt{\frac{p_i(1-p_i)}{2N}}$
Problem 4 (Time 10 min.)(2 pts)
Tay-Sachs disease is an autosomal recessive genetic
disorder causing the death of nerve cells in the brain due
to the steady accumulation of gangliosides. Extensive
genotyping has determined that approximately 1 in 30 of
the 5 million Ashkenazi Jews within the United States is a
carrier.
a) Assuming HWE and Mendelian inheritance of the
disease, what is the frequency of the recessive allele
in this population?
b) What is the SE of this estimate? (Assume 1,000
people were sampled)
c) How many affected children would you expect to be
born in this population?
d) What are the assumptions of these estimates?
Hypothesis Testing
Hypothesis: Tentative statement for a scientific
problem that can be tested by further investigation.
1. Null Hypothesis (H0): There is no significant difference between
observed and expected values.
2. Alternate Hypothesis (H1): There is a significant
difference between observed and expected values.
Example:
H0: Fertilized and unfertilized crops have equal yields
H1: Fertilized and unfertilized crops do not have equal yields
Remember: In the final conclusion after the
experiment, we either
"Reject H0 in favor of H1"
or
"Fail to reject H0".
Type I error: Rejecting a null hypothesis when it
is actually true (false positive).
Level of significance (LOS) (α): Maximum probability
allowed for committing a type I error.
At 5% LOS (α = 0.05), we accept that if the null hypothesis were
true and we repeated the experiment many times, we would falsely
reject it 5% of the time.
P-value:
Probability, assuming the null hypothesis is true, of obtaining
a result at least as extreme as the one observed
If the P-value is smaller than a particular
value of α, then the result is significant at
that level of significance
Testing departure from HWE
In a randomly mating population, allele and
genotype frequencies remain constant from
generation to generation.
H0: There is no significant difference between observed and
expected genotype frequencies (i.e. Population is in HWE)
H1: There is a significant difference between observed and
expected genotype frequencies (i.e. Population is not in
HWE)
HWE Assumptions
1. Random mating
2. No selection
a. Equal numbers of offspring per parent
b. All progeny equally fit
3. No mutation
4. Single, very large population
5. No migration
χ² test
$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
Reject H0 if $\chi^2 > \chi^2_{df,\alpha}$
Where,
$O_i$ = Observed count of genotype i
$E_i$ = Expected count of genotype i
k = Number of genotypes
df = k − # parameters estimated − 1
Example: A population of Mountain Laurel at Cooper’s
Rock State Forest has the following observed genotype
counts:
Genotype  Observed number
A1A1      5000
A1A2      3000
A2A2      2000
Is this population in Hardy-Weinberg equilibrium?
$p = \frac{N_{11} + \frac{1}{2}N_{12}}{N} = \frac{5000 + 1500}{10000} = 0.65$
$q = 1 - p = 1 - 0.65 = 0.35$
Genotype  Expected frequency under HWE  Expected number under HWE
A1A1      p^2 = 0.65^2 = 0.4225         0.4225 × 10000 = 4225
A1A2      2pq = 0.455                   0.455 × 10000 = 4550
A2A2      q^2 = 0.1225                  0.1225 × 10000 = 1225
Genotype  Obs. # (O)  Exp. # (E)  (O-E)   (O-E)^2  (O-E)^2/E
A1A1      5000        4225        775     600625   142.1598
A1A2      3000        4550        -1550   2402500  528.022
A2A2      2000        1225        775     600625   490.3061
χ² = 1160.488
We estimated 1 parameter (p) from the data (3 genotypes).
We do not count q as an estimated parameter because it is dependent
on p (i.e. q = 1 − p)
Therefore, df = 3 − 1 − 1 = 1
The critical value (table value) of χ² at 1 df and
at α = 0.05 is approx. 3.84.
Conclusion: Because the calculated value of χ²
(1160.49) is greater than the critical value (3.84),
we reject the null hypothesis and accept the
alternative (Not in HWE).
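The whole test for this example can be sketched in a few lines of Python. The function name `hwe_chi_square` and the constant name are illustrative; the critical value is simply the table value quoted above rather than something computed by the code.

```python
def hwe_chi_square(n11, n12, n22):
    """Chi-square statistic for departure from HWE at a two-allele locus."""
    n = n11 + n12 + n22
    p = (n11 + 0.5 * n12) / n  # allele frequency estimated from the data
    q = 1.0 - p
    observed = [n11, n12, n22]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_1DF_ALPHA_05 = 3.84  # table value quoted in the example

chi2 = hwe_chi_square(5000, 3000, 2000)
print(round(chi2, 3), chi2 > CRITICAL_1DF_ALPHA_05)  # 1160.488 True
```

A result of `True` corresponds to rejecting the null hypothesis of Hardy-Weinberg equilibrium at α = 0.05.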
Problem 5 (Time 10 min) (2 pts)
Based on the observed genotype counts in problem
3, test whether the population that had been sampled
is in HWE. What are some possible explanations for
the observed results?
Genotype  Count
A1A1      19
A2A2      17
A3A3      14
A1A2      52
A1A3      57
A2A3      41