Module III – Hypothesis Testing
The sampling distribution of the sample mean is the distribution of all possible sample means ($\bar{x}$) of a given sample size from a population.
The larger the sample size, the smaller the sampling error tends to be in estimating a population mean, $\mu$, by a sample mean, $\bar{x}$.
For samples of size $n$, the mean of the variable $\bar{x}$ is denoted by $\mu_{\bar{x}}$, and $\mu_{\bar{x}} = \mu$ for each sample size.
The population standard deviation is denoted by $\sigma$.
For samples of size $n$, the standard deviation of the variable $\bar{x}$ is denoted by $\sigma_{\bar{x}}$, and
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
for each sample size.
Note:
• If the population variable is normally distributed, then $\bar{x}$ is normally distributed regardless of the sample size.
• If the sample size is large, then $\bar{x}$ is approximately normally distributed regardless of the distribution of the population variable (the central limit theorem).
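The formulas $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ can be checked empirically. The sketch below is only an illustration, not part of the original notes: it assumes a hypothetical right-skewed population (an exponential distribution with mean 10, so $\mu = \sigma = 10$), draws many samples of size $n$ with Python's standard `random` module, and compares the mean and standard deviation of the resulting sample means with the formulas. The function name `simulate_sample_means` is illustrative.

```python
import random
import statistics

def simulate_sample_means(n, reps=5_000, mean=10.0, seed=1):
    """Return the means of `reps` random samples of size n drawn from an
    exponential population with the given mean (so mu = sigma = mean)."""
    rng = random.Random(seed)
    rate = 1.0 / mean  # random.expovariate expects the rate, 1/mean
    return [statistics.fmean(rng.expovariate(rate) for _ in range(n))
            for _ in range(reps)]

mu = sigma = 10.0  # hypothetical population: exponential with mean 10
for n in (4, 25, 100):
    means = simulate_sample_means(n, mean=mu)
    print(f"n = {n:3d}: mean of x-bar = {statistics.fmean(means):6.3f} (mu = {mu}), "
          f"sd of x-bar = {statistics.pstdev(means):.3f} "
          f"(sigma/sqrt(n) = {sigma / n ** 0.5:.3f})")
```

As $n$ grows, the simulated standard deviation of $\bar{x}$ shrinks in proportion to $1/\sqrt{n}$, which is the sense in which larger samples give smaller sampling error.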
Inferential Statistics
68.26-95.44-99.74 Rule:
If the population variable is normally distributed, the 68.26-95.44-99.74 Rule states that 95.44% of all possible observations lie within 2 standard deviations to either side of the mean. If we apply this rule to the variable $\bar{x}$ (which is then also normally distributed), 95.44% of all samples of size $n$ have a mean within
$$2\sigma_{\bar{x}} = \frac{2\sigma}{\sqrt{n}}$$
of $\mu$. Equivalently, 95.44% of all samples of size $n$ have the property that the interval
$$\left[\bar{x} - 2\sigma_{\bar{x}},\ \bar{x} + 2\sigma_{\bar{x}}\right]$$
contains $\mu$. This interval is called a 95.44% confidence interval for $\mu$, and 95.44% is its confidence level: the percentage of all samples of size $n$ whose interval contains $\mu$. For any one particular sample, the interval either contains $\mu$ or it does not.
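The meaning of the confidence level can be illustrated by simulation. This minimal sketch assumes a hypothetical normal population with $\mu = 50$ and $\sigma = 6$ and samples of size $n = 9$ (values chosen only for illustration); it builds the interval $\bar{x} \pm 2\sigma/\sqrt{n}$ for many samples and reports the fraction of intervals that contain $\mu$, which should be close to 95.44%. The function names are illustrative.

```python
import math
import random
import statistics

def two_sigma_interval(xbar, sigma, n):
    """Return the interval [xbar - 2*sigma/sqrt(n), xbar + 2*sigma/sqrt(n)]."""
    half_width = 2 * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def coverage(mu=50.0, sigma=6.0, n=9, reps=20_000, seed=2):
    """Fraction of simulated samples whose two-sigma interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        low, high = two_sigma_interval(xbar, sigma, n)
        hits += low <= mu <= high  # True counts as 1
    return hits / reps

print(two_sigma_interval(52.0, 6.0, 9))  # (48.0, 56.0)
print(f"coverage: {coverage():.4f}")
```

Each individual interval either captures $\mu$ or misses it; only the long-run fraction over many samples is about 95.44%.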
Hypothesis Tests
Terminology:
A hypothesis is a statement that something is true.
Null hypothesis: a hypothesis to be tested.
Notation: $H_0: \mu = \mu_0$
Alternative hypothesis: a hypothesis to be considered as an alternative to the null hypothesis.
Notation:
$H_a: \mu \neq \mu_0$ (two-tailed test)
$H_a: \mu < \mu_0$ (left-tailed test)
$H_a: \mu > \mu_0$ (right-tailed test)
Basic logic behind carrying out the hypothesis test for a normally distributed population variable:
• If a sample mean $\bar{x}$ is approximately equal to the hypothesized population mean $\mu_0$, we are inclined not to reject $H_0$.
• If a sample mean $\bar{x}$ differs too much from $\mu_0$, we are inclined to reject $H_0$ and conclude that the alternative hypothesis is true.
• Using the 95.44% part of the 68.26-95.44-99.74 Rule: if a sample mean $\bar{x}$ is more than two standard deviations ($2\sigma_{\bar{x}} = 2\sigma/\sqrt{n}$) from $\mu_0$, we reject the null hypothesis $H_0: \mu = \mu_0$ and conclude the alternative hypothesis $H_a: \mu \neq \mu_0$. If $H_0$ is true, this happens for only $1 - 0.9544 = 0.0456$ (4.56%) of all samples of size $n$.
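The two-standard-deviation rule above can be written as a small decision function. This is a sketch rather than the notes' own procedure: it assumes $\sigma$ is known and measures the distance of $\bar{x}$ from $\mu_0$ in units of $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The function name `two_sigma_test` and the example numbers are hypothetical.

```python
import math

def two_sigma_test(xbar, mu0, sigma, n):
    """Two-tailed test of H0: mu = mu0 vs Ha: mu != mu0 using the 2-standard-deviation rule.

    Returns (z, decision), where z = (xbar - mu0) / (sigma / sqrt(n)) is the number
    of standard deviations of x-bar separating the sample mean from mu0.
    """
    sigma_xbar = sigma / math.sqrt(n)  # standard deviation of x-bar
    z = (xbar - mu0) / sigma_xbar
    decision = "reject H0" if abs(z) > 2 else "do not reject H0"
    return z, decision

# Hypothetical data: mu0 = 50, sigma = 6, n = 36, so sigma_xbar = 1
print(two_sigma_test(52.5, 50, 6, 36))  # (2.5, 'reject H0')
print(two_sigma_test(51.2, 50, 6, 36))  # about 1.2 standard deviations: do not reject
```

Using `abs(z)` makes the rule two-tailed; a left-tailed or right-tailed version would compare the signed `z` with a single cutoff instead.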
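Properties of Chi-square ($\chi^2$) curves
• The total area under a $\chi^2$-curve equals 1.
• A $\chi^2$-curve starts at 0 on the horizontal axis and extends indefinitely to the right, approaching the horizontal axis asymptotically.
• A $\chi^2$-curve is right-skewed.
• Each $\chi^2$-curve is identified by its number of degrees of freedom (df), and how df is computed depends on the procedure (for example, df $= n - 1$, where $n$ is the sample size, in inferences about a population standard deviation). As the number of degrees of freedom becomes larger, $\chi^2$-curves look increasingly like normal curves.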
[Figure: $\chi^2$-curves for df = 5, df = 10, and df = 19]
A variable is said to have a chi-square distribution if its distribution has the shape of a chi-square curve.
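The first and last properties can be checked numerically. This sketch uses the standard chi-square density $f(x) = x^{df/2-1}e^{-x/2} / \left(2^{df/2}\,\Gamma(df/2)\right)$ for $x > 0$ (a formula not given in these notes), integrates it with Simpson's rule over $[0, 200]$ as a stand-in for $[0, \infty)$, and reports the skewness $\sqrt{8/df}$, which shrinks toward 0 (the skewness of a normal curve) as df grows. The df values 5, 10 and 19 are those listed with the figure; the function names are illustrative.

```python
import math

def chi2_pdf(x, df):
    """Chi-square density with df degrees of freedom (taken as 0 for x <= 0,
    which is exact at x = 0 when df >= 3)."""
    if x <= 0:
        return 0.0
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

def area_under_curve(df, upper=200.0, steps=20_000):
    """Approximate the area under the chi-square curve on [0, upper] with Simpson's rule."""
    h = upper / steps  # steps must be even for Simpson's rule
    total = chi2_pdf(0.0, df) + chi2_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return total * h / 3

for df in (5, 10, 19):
    skewness = math.sqrt(8 / df)  # skewness of a chi-square distribution
    print(f"df = {df:2d}: area = {area_under_curve(df):.4f}, skewness = {skewness:.3f}")
```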
Chi-square goodness of fit test
This procedure can be used to perform a hypothesis test about the distribution of a
qualitative variable or a discrete quantitative variable that has only finitely many possible
values.
Example:
A violent crime is classified as murder, forcible rape, robbery, or aggravated assault.
Distribution of violent crimes in the United States in 1995

Type of violent crime    Relative frequency
Murder                   0.012
Forcible rape            0.054
Robbery                  0.323
Aggravated assault       0.611
Total                    1.000
Sample results for 500 randomly selected violent-crime reports from last year

Type of violent crime    Frequency
Murder                   9
Forcible rape            26
Robbery                  144
Aggravated assault       321
Total                    500
Population: last year's reported violent crimes
Variable: type of violent crime
Possible values of the variable: murder, forcible rape, robbery, and aggravated assault
Null hypothesis to be tested:
$H_0$: Last year's violent-crime distribution is the same as the 1995 distribution.
Alternative hypothesis:
$H_a$: Last year's violent-crime distribution is different from the 1995 distribution.
Expected frequencies if last year’s violent-crime distribution is the same as the 1995
distribution:
Expected frequency E = np, where n is the sample size and p is the relative frequency
from the distribution of violent crimes in 1995.
Type of violent crime    Relative frequency (p)    Expected frequency (E = np)
Murder                   0.012                     500(0.012) = 6
Forcible rape            0.054                     500(0.054) = 27
Robbery                  0.323                     500(0.323) = 161.5
Aggravated assault       0.611                     500(0.611) = 305.5
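The expected-frequency column is a direct application of $E = np$. A minimal Python sketch of that arithmetic (the variable names are illustrative, not from the notes):

```python
import math

# 1995 relative frequencies (p) from the table above
p_1995 = {
    "Murder": 0.012,
    "Forcible rape": 0.054,
    "Robbery": 0.323,
    "Aggravated assault": 0.611,
}
n = 500  # sample size: violent-crime reports sampled from last year

# Expected frequency E = n * p for each category
expected = {crime: n * p for crime, p in p_1995.items()}

assert math.isclose(sum(p_1995.values()), 1.0)  # relative frequencies sum to 1
for crime, e in expected.items():
    print(f"{crime:<20} E = {e:g}")
```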
Question: Do the frequencies observed last year match the expected frequencies?
To answer this question, we perform the following steps:
• Check that the expected frequencies satisfy the following assumptions:
1. All expected frequencies are 1 or greater. (Yes: the smallest is 6.)
2. At most 20% of the expected frequencies are less than 5. (Yes: none of the expected frequencies are less than 5.)
• Decide on the significance level, $\alpha$. We will perform the test at the 5% significance level, that is, $\alpha = 0.05$.
(Type I error: rejecting the null hypothesis when it is in fact true. The probability of making a Type I error is called the significance level, $\alpha$, of a hypothesis test.)
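• Compute the test statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E},$$
the sum of the chi-square subtotals, which measures how well the observed frequencies fit the expected frequencies (larger values mean a worse fit).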
Type of violent crime   Observed   Expected   Difference   Squared difference   Chi-square subtotal
                        O          E          O - E        $(O - E)^2$          $(O - E)^2/E$
Murder                  9          6          3            9                    1.5
Forcible rape           26         27         -1           1                    0.037
Robbery                 144        161.5      -17.5        306.25               1.896
Aggravated assault      321        305.5      15.5         240.25               0.786
Total                   500        500        0                                 4.219
From the table, $\chi^2 = \sum (O - E)^2/E = 4.219$.
• Find the critical value $\chi^2_{\alpha}$ with df $= k - 1$, where $k$ is the number of possible values of the variable "type of violent crime". In our example $k = 4$, so df $= 4 - 1 = 3$, and $\chi^2_{0.05} = 7.815$ from the table provided.
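[Figure: $\chi^2$-curve with df = 3; the region to the left of 7.815 is labeled "Do not reject $H_0$" and the region to the right of 7.815 is labeled "Reject $H_0$".]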
• Finally, reject the null hypothesis if the value of the test statistic falls in the rejection region; otherwise, do not reject it. In our example, $\chi^2 = 4.219$, which is less than 7.815 and so falls in the do-not-reject region.
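The steps above can be collected into one function. This is a minimal sketch in plain Python, assuming the critical value 7.815 is read from a $\chi^2$ table as in the notes (the Python standard library has no $\chi^2$ quantile function); the function name `chi_square_gof` is illustrative.

```python
def chi_square_gof(observed, probs, critical_value):
    """Chi-square goodness-of-fit test.

    observed: observed frequencies O for each of the k possible values
    probs: relative frequencies p under H0 (same order as observed)
    critical_value: chi-square critical value for df = k - 1 at the chosen alpha
    Returns (statistic, reject_H0).
    """
    n = sum(observed)
    expected = [n * p for p in probs]  # E = np

    # Assumptions: all E >= 1 and at most 20% of the E's are below 5
    if min(expected) < 1 or sum(e < 5 for e in expected) > 0.2 * len(expected):
        raise ValueError("expected frequencies violate the test's assumptions")

    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, statistic > critical_value

observed = [9, 26, 144, 321]  # murder, forcible rape, robbery, aggravated assault
probs_1995 = [0.012, 0.054, 0.323, 0.611]
stat, reject = chi_square_gof(observed, probs_1995, critical_value=7.815)  # alpha = 0.05, df = 3
print(f"chi-square = {stat:.3f}, reject H0: {reject}")  # chi-square = 4.220, reject H0: False
```

Summing the unrounded subtotals gives 4.2197, which rounds to 4.220; the table's 4.219 is the sum of the already-rounded subtotals. Either way the statistic is well below 7.815.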
Interpretation: At the 5% significance level, the data do not provide sufficient
evidence to conclude that last year’s violent-crime distribution differs from the 1995
distribution.
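To see what $\alpha = 0.05$ means in this example, one can simulate many samples of 500 crimes drawn from the 1995 distribution itself, so that $H_0$ is true by construction, and count how often the test wrongly rejects $H_0$. The sketch below is an illustration under that assumption (it uses Python's `random.choices`; the seed, the number of repetitions and the function names are arbitrary choices). The rejection rate should come out close to 0.05, though not exactly, because the $\chi^2$-curve only approximates the test statistic's true distribution.

```python
import random

CATEGORIES = ["Murder", "Forcible rape", "Robbery", "Aggravated assault"]
PROBS_1995 = [0.012, 0.054, 0.323, 0.611]
CRITICAL_VALUE = 7.815  # chi-square critical value for alpha = 0.05, df = 3

def chi_square_statistic(observed, expected):
    """Sum of the chi-square subtotals (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def type_i_error_rate(n=500, reps=2_000, seed=3):
    """Fraction of samples drawn under H0 for which the test rejects H0."""
    rng = random.Random(seed)
    expected = [n * p for p in PROBS_1995]
    rejections = 0
    for _ in range(reps):
        sample = rng.choices(CATEGORIES, weights=PROBS_1995, k=n)  # H0 is true here
        observed = [sample.count(c) for c in CATEGORIES]
        if chi_square_statistic(observed, expected) > CRITICAL_VALUE:
            rejections += 1
    return rejections / reps

print(f"estimated Type I error rate: {type_i_error_rate():.3f}")
```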
Reference: Neil Weiss, Elementary Statistics, 5th/6th edition.