Ch9 - OCCC.edu

Chapter 9: Hypothesis Testing and Statistical Inference
-now we want to extend our statements about a population parameter; we use our knowledge of sampling and sampling distributions to decide whether to reject, or fail to reject, a claim that a population has certain characteristics.
Ex: We may think that the average income level in a certain community is around $45,000, but if we take an appropriate random sample and find that the income is much lower, we now have a basis to call this claim false (with a certain level of confidence; recall confidence intervals).
A. Developing and Defining the Null and Alternative Hypothesis
1. Null Hypothesis – Ho – a tentative assumption about a population parameter. This is what you are testing. It can only be rejected or fail to be rejected at a certain level of confidence; it can never be accepted.
2. Alternative Hypothesis – Ha – opposite conclusion of the null hypothesis.
3. Types of Hypotheses and How to Test Them
a. Testing a Research Hypothesis – this is when you try to prove something based on experimental evidence. What you are trying to prove is generally your Ha. Generally your null hypothesis is the status quo, and you want to show that under your experimental settings the status quo is not achieved. You are trying to reject the null to support the research being done.
Ex. Suppose you look at performance and Gatorade. You want to show that the use of
Gatorade helps with performance.
Ho: Person performs the same using Gatorade
Ha: Person performs better than the status quo while using the sports drink
b. Testing the validity of a claim – in this case you want to test a claim being made. We assume that the claim is true unless there is sample evidence that supports the alternative. Ho can be either a one- or a two-sided test.
Ho: can have u ≥ uo, u ≤ uo, or u = uo
Ha: the opposite of the claim above.
c. Decision Situation – this is when you are trying to determine a course of action. You would generally make some decision based on whether Ho or Ha was supported with a degree of confidence.
4. Testing Conditions in General → a and b are one-sided while c is two-sided
a.
Ho: u ≥ uo
Ha: u < uo
b.
Ho: u ≤ uo
Ha: u > uo
c.
Ho: u = uo
Ha: u ≠ uo
Note: uo is just some proposed value that you are testing. It is the population parameter
that you think the data takes on.
-A two-sided test is concerned with whether the value differs from some proposed value, while a one-sided test is concerned with whether the testing value is larger or smaller than some proposed value.
5. Type I and Type II errors
Table 1:
                     | Accept Ho          | Reject Ho
Ho is actually true  | Correct conclusion | Type I error
Ha is true           | Type II error      | Correct conclusion
-when we designate α we are actually designating the amount of Type I error that is going to be accepted. So when alpha is equal to 0.05, we accept that 5% of the time we will reject Ho when it is actually true.
-many experiments actually don’t try and control for type II error and for this reason we
do not accept Ho, but rather we fail to reject it.
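The meaning of α can be checked by simulation: draw many samples from a population where Ho really is true and count how often the test wrongly rejects. A minimal sketch in Python with made-up values (true mean 0, σ = 1, n = 30); the cutoff 1.96 is the two-sided critical value for α = 0.05:

```python
import math
import random
from statistics import mean

random.seed(1)  # fixed seed so runs are reproducible

MU0, SIGMA, N = 0.0, 1.0, 30   # Ho is true: the population mean really is MU0
Z_CRIT = 1.96                  # two-sided critical value for alpha = 0.05
TRIALS = 2000

rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU0) / (SIGMA / math.sqrt(N))
    if abs(z) > Z_CRIT:        # Type I error: rejecting a true Ho
        rejections += 1

type1_rate = rejections / TRIALS
print(f"observed Type I error rate: {type1_rate:.3f}")  # should be near 0.05
```

The observed rejection rate hovers around 0.05, which is exactly what designating α = 0.05 promises.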
A. Testing Hypotheses using the Critical Value Method and Test Statistic – Mean with σ Known
1. Formulating the Hypothesis Test
a. Test Statistic – this is the value that you compute from the sample. It is used to decide whether to reject or fail to reject the null hypothesis.
Example: one example is the z-stat for testing a sample mean. In this case your test statistic takes on the value of: z = ( x̄ - μ ) / ( σ / √n )
b. Critical Value – this is the value that you are testing against. It is the value of z that
you compare your test statistics to.
Graph 1: One-Sided Test – the rejection region for a one-sided test lies in the upper tail of the standard normal curve, to the right of the critical value zα. You need to note the level of confidence and α to find the appropriate critical value; if your test statistic lies to the right of zα, you reject Ho.
Graph 2: Two-Sided Test – the rejection regions lie in both tails of the curve, beyond -zα/2 and zα/2. You need to note the level of confidence and α to find the appropriate critical values; if your test statistic lies to the right or the left of these, you reject Ho.
c. Example: Suppose you are told the mean score in a particular dept on a test is 70 with σ = 2. If you take a sample of 20 students and find that the average is 75, do you have enough evidence at the 95% confidence level to conclude that the mean differs from 70?
Step 1: set up the hypotheses
Ho: u = 70
Ha: u ≠70
-so this is a two sided test
Step 2: Find critical values → so we go to the normal table and find the z's that give us α = 0.05, or where 2.5% of the probability lies in each tail. These values are 1.96 and -1.96 → so if our test stat is below or above these we reject Ho in favor of Ha at the 95% level of confidence.
Step 3: Find the test stat → z = ( x̄ - μ ) / ( σ / √n ) = ( 75 - 70 ) / ( 2 / √20 ) = 11.18
Step 4: Compare test stat to critical value and analyze
Since we have 11.18 >> 1.96 we can conclude at the 95% level of confidence that the
mean score in the dept is most likely not 70. It is highly unlikely to find such a large
difference. Most likely the population parameter (test score) is higher, but recall that 5%
of the time we would expect outliers, so it might be that the sample we obtained was in
fact an anomaly and the avg is in fact 70.
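The four steps above are just arithmetic, so they are easy to check in Python; a sketch using only the numbers from the example:

```python
import math

# Given in the example
xbar, mu0, sigma, n = 75, 70, 2, 20
z_crit = 1.96  # two-sided critical value at the 95% confidence level

# Step 3: test statistic z = (xbar - mu0) / (sigma / sqrt(n))
z = (xbar - mu0) / (sigma / math.sqrt(n))
print(f"z = {z:.2f}")  # 11.18

# Step 4: compare the test stat to the critical value
reject = abs(z) > z_crit
print("reject Ho" if reject else "fail to reject Ho")  # reject Ho
```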
2. Testing Hypothesis using the p-Value Method
-in this case we use the p-value instead of the critical value. It is almost identical to the method above, but instead of comparing a test statistic to some critical value, we compare the tail probability of the test statistic to a cutoff.
p-value – the probability that the test statistic would take on a value at least as extreme as the one actually observed, assuming that the null hypothesis is true. The cutoff we compare the single-tail probability against is simply α for a one-sided test and α/2 for a two-sided test.
Ex: Using the same example as above, the cutoff is α/2 since the test is two-sided: 0.05/2 = 0.025.
We then find the tail probability of our test statistic and compare it to this cutoff. Our test statistic is still z = ( x̄ - μ ) / ( σ / √n ) = ( 75 - 70 ) / ( 2 / √20 ) = 11.18 → so the p-value of this z (taken from the standard normal table) is 1 - 0.9998 = 0.0002.
Note: since the normal table only goes up to 3.49 we use this value, noting that for 11.18 it would be even smaller.
So 0.0002 < 0.025, so we reject Ho again and find that the test avg is most likely different from 70. The p-value method and the test statistic method should give the same results; if you don't get the same results, a mistake has been made.
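With software we don't need the table at all: Python's standard library has `statistics.NormalDist`, whose `cdf` gives the exact standard-normal tail probability. A sketch with the same numbers:

```python
import math
from statistics import NormalDist

xbar, mu0, sigma, n = 75, 70, 2, 20
alpha = 0.05

z = (xbar - mu0) / (sigma / math.sqrt(n))  # 11.18, as before

# Two-sided p-value: probability of a |z| at least this extreme under Ho
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"p-value = {p_value:.6f}")

# Comparing p_value to alpha is equivalent to comparing the one-tail
# probability to alpha/2, so the decision matches the table method.
print("reject Ho" if p_value < alpha else "fail to reject Ho")  # reject Ho
```

As the note above predicts, the exact p-value for z = 11.18 is far smaller than the 0.0002 read off the table at z = 3.49.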
-when you consider a two-sided test, notice that it is almost exactly like constructing a confidence interval, finding a sample statistic, and checking whether the sample statistic lies in that interval. If we construct a 95% confidence interval, then 95% of the time our sample data should lie within that interval. If it doesn't, we can most likely assume that the proposed population parameter value is in fact incorrect.
3. Statistical significance – if we get a p-value that is smaller than the α, then we say that
the data are significant at the level of α.
Ex: so in our case above we find that the results have statistical significance at the 0.05 level. In fact, since the p-value obtained is even smaller than 0.05, we find it to be even more significant.
4. Problems with Inference and Confidence Intervals
a. Data may not be reliable – recall that before we talked about different sampling techniques as well as how NOT to sample. If you produce or sample data that is unreliable, any tests or conclusions drawn from those tests are also invalid.
b. The conditions proposed must be met: Recall that to use a z-table we should have data taken from a SRS and it should follow a normal distribution. If we don't have these conditions we cannot use this type of test. Many times we must use different types of distributions to perform hypothesis testing. The procedure we have used is still valid, but we would use another table instead of the z-table.
c. One test, especially if the results are used to make an expensive decision, is generally not enough to ensure that the decision is correct. The test should be repeated to ensure reliability, and the level of significance should be strict enough that the results are convincing. Instead of testing something at the 90% level, or the 0.10 level of significance, testing at the 99% level might be more appropriate.
d. The sample size should be duly noted. The smaller the sample, the larger the variation we should expect, and if the population is very large then something that affects the population overall should definitely show up in the sample. For this reason, practical significance may or may not show up as statistical significance, depending on the size of the sample and the nature of the population.
5. Power – the power of a test is 1 minus the probability of a Type II error for a given alternative hypothesis.
B. Inference about a Population Mean- σ is unknown
Recall from before that we mentioned the t-test statistic and t-distribution. These are used when the sample size is small (we will use n < 30) and σ is unknown. If we are given this information we use the following:
1. Sampling where σ is unknown
-in this case we cannot assume that our sampling distribution follows a completely normal distribution. Now we must use the t-distribution (sometimes called Student's t-distribution) to make probability statements.
a. t-distribution – is a family of distributions that is based on degrees of freedom. Each
distribution with its degrees of freedom has its own features. As the degrees of freedom
(d.o.f) go up the distributions get closer and closer to the standard normal. This is b/c as
the d.o.f. go up the variability is reduced.
-we read the chart the same way as the standard normal, with the only difference coming from the fact that we now have d.o.f., which is (n - 1)
t-test stat = ( x̄ - μ ) / ( s / √n )
the denominator s / √n is known as s_x̄
-the only difference from before is that we don't use our population standard deviation σ because it is not known. If one thinks about it, if we don't know the mean, it would be impossible to know the standard deviation, since the mean is required to get this value.
b. Confidence Intervals
i. Two-Tailed: x̄ ± tα/2 s_x̄ = x̄ ± tα/2 s / √n
recall that s² = Σ ( xi - x̄ )² / ( n - 1 ); this is just the sample variance. To get s we simply take the square root of s².
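Python's `statistics` module uses this same n - 1 divisor for `variance` and `stdev`, so it can serve as a quick check; a sketch with a made-up sample:

```python
from statistics import mean, variance, stdev

data = [1, 2, 3, 4, 5]  # made-up sample
xbar = mean(data)       # 3

# s^2 = sum((xi - xbar)^2) / (n - 1), the sample variance by hand
s2 = sum((x - xbar) ** 2 for x in data) / (len(data) - 1)
print(s2, variance(data))  # both 2.5: the library uses the n - 1 divisor too
print(stdev(data))         # s = sqrt(2.5), about 1.5811
```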
Two-Sided Test: the t-values -tα/2 and tα/2 show the range in which we expect to find the mean. You need to note the level of confidence and α to find the appropriate t-values for your CI.
***so the only time you use the t-distribution is in the case of a small sample and when the population variance is unknown. The text implies that we MUST know the population is normal. This is actually incorrect. We even have the CLT (central limit theorem) in the text, which verifies this. The CLT tells us that as long as our sample size is 'large' our sampling distribution will be normal. Our rule for large is n > 30. If we have a normal population our sampling distribution is guaranteed to be normal no matter what the sample size. So the times we will use a t-stat are if:
1) we use s in place of σ
2) the sample is < 30
**one thing that can be used to assess the validity of this claim is to go to your t-table. As n increases (ie look at n = 40, 50, 80, 1000, and beyond) you will note that the values from the t-table approach what we get from our Z-table. For a 95% CI with our Z, recall that we get ± 1.96. If we look at the t-table we can see that for n = 1000 our t-value is 1.962, very close to our Z-value. So in these cases there is no benefit to using the t-stat, and we will simply use our z-table.
ii. One-Tailed: We can also have a one-tailed confidence interval. In this case we are simply finding an upper or lower bound for our values. It takes the form of:
x̄ ± tα s_x̄ = x̄ ± tα s / √n
Note: In this case we don't divide our α by 2.
One-Sided Test: tα is now one-sided, so we only have one t-value for the CI; it could also be a lower tail (ie to the left of the mean). Just as before, you need to note the level of confidence and α to find the appropriate t-value.
2. Hypothesis testing using the t-stat
-the method for both the test statistic and p-value approaches is exactly the same as before, so simply go back to the previous notes and recall that method. We will do an example below to illustrate the method again. We will use the test statistic method predominantly, since with the table we cannot calculate exact p-values for all possible sample t-values.
3. Examples:
a. Construct a 99% confidence interval for a mean when you are given the following and
the population standard deviation is unknown:
x̄ = 18
s = 3
n = 25
All of the conditions hold for using the t-stat. We don’t know the population standard
deviation and our sample size is less than 30. So our first thing to do is to find our tα/2.
We do this by going to the t-table at the back of the book and looking up the CI for 99%
with dof = n – 1 = 25 – 1 = 24.
This value is 2.787
Now we simply apply the formula as before.
x̄ ± tα/2 s / √n → 18 ± 2.787 × (3 / √25) → 18 ± 2.787 × (0.6) → 16.328 to 19.672
So we are 99% sure the true population mean lies between 16.33 and 19.67.
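The same arithmetic in Python; the value 2.787 is read from the t-table, since the standard library has no t-distribution (scipy's `stats.t.ppf` could supply it, but that is outside the stdlib):

```python
import math

xbar, s, n = 18, 3, 25
t_crit = 2.787  # t(alpha/2) for a 99% CI with dof = 24, read from the t-table

se = s / math.sqrt(n)    # standard error s_xbar = 0.6
margin = t_crit * se
lower, upper = xbar - margin, xbar + margin
print(f"99% CI: {lower:.3f} to {upper:.3f}")  # 16.328 to 19.672
```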
b. Using the same data, construct a 95% lower-tailed confidence interval. This is the same thing as finding a lower bound for the mean.
So our first thing to do is to find our tα. We do this by going to the t-table at the back of
the book and looking up the CI for 95% with dof = n – 1 = 25 – 1 = 24. This value is t=
2.064
x̄ ± tα s_x̄ → note that we want a lower bound, so we only subtract from the mean:
x̄ - tα s / √n → 18 - 2.064 × (0.6) = 16.76
So we are 95% sure that the true population mean must be greater than 16.76.
c. Suppose you are told the mean score in a particular dept on a test is 70 with s = 2. If you take a sample of 25 students and find that the average is 75, do you have enough evidence at the 95% confidence level to conclude that the mean differs from 70?
**note that we don't have σ and n < 30, so use t-stats
Step 1: set up the hypotheses
Ho: u = 70
Ha: u ≠70
-so this is a two sided test
Step 2: Find critical values → so we go to the t-table and find the t's that give us α = 0.05, or where 2.5% of the probability lies in each tail. These values are 2.064 and -2.064 → so if our test stat is below or above these we reject Ho in favor of Ha at the 95% level of confidence.
Step 3: Find the test stat → t = ( x̄ - μ ) / ( s / √n ) = ( 75 - 70 ) / ( 2 / √25 ) = 12.50
Step 4: Compare test stat to critical value and analyze
Since we have 12.50 >> 2.064 we can conclude at the 95% level of confidence that the
mean score in the dept is most likely not 70. It is highly unlikely to find such a large
difference. Most likely the population parameter (test score) is higher, but recall that 5%
of the time we would expect outliers, so it might be that the sample we obtained was in
fact an anomaly and the avg is in fact 70.
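These steps can be sketched in Python as well (the critical value 2.064 again comes from the t-table at dof = 24, since the stdlib has no t-distribution):

```python
import math

xbar, mu0, s, n = 75, 70, 2, 25
t_crit = 2.064  # two-sided critical value, 95% confidence, dof = n - 1 = 24

t = (xbar - mu0) / (s / math.sqrt(n))
print(f"t = {t:.2f}")  # 12.50
print("reject Ho" if abs(t) > t_crit else "fail to reject Ho")  # reject Ho
```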
Graphically: -2.064 and 2.064 are your critical values for the two-sided test; if the test stat lies to the right or the left of these you reject Ho. Here 12.5 is so far to the right of 2.064 that we reject the null.
P-value method: If we used the p-value method we cannot find the exact p-value for 12.5, but we can approximate it. If we go to the d.o.f. row (which is 24 in this case) we go as far to the right as we can without going past our test statistic value (12.5 in this case). If we do that we get all the way to the right side of the t-table and find a t-stat of 3.745. Go to the t-table and go to dof = 24 to verify the last value in that row is 3.745. If you then go up the column you see that this corresponds to a confidence level of 99.9%, so the p-value for 12.5 is smaller than 0.001.
Since 0.001 < 0.05 (we are simply comparing our p-value to α), we reject Ho and conclude there is enough evidence to suggest that the mean is not 70. Both methods must once again give you the same results.
4. Notes about t-stat
a. robustness – this is the term used to describe how well a test stands up to violations of its assumptions. In this case we assume that the data are symmetric. A t-stat (and Z for that matter) doesn't actually work well if this is violated, so if you have data that are heavily skewed or have outliers, then using a Z or t-stat might not be a good idea.
b. For very small sample sizes a t-stat is also not very good. A general rule is that if n is not greater than 15, then it may be better to use some other stat or test.
c. Samples should come from a SRS. Recall from chapter 11 (sampling distributions) that we assumed all samples are random when taken. This is a very important assumption and cannot be violated for the t-stat to be a good/valid test.
C. Hypothesis Testing: Proportions
1. Hypothesis Test
Recall from before that we had essentially 3 types of hypotheses. We have the same general setup here; we want to test whether the sample proportion is consistent with some proposed value p0.
a.
Ho: p ≥ p0
Ha: p < p0
**to get the other one-tailed test simply reverse the inequalities
b.
Ho: p = p0
Ha: p ≠ p0
Question: So when might this be used?
Answer: Polling data is a really good example, as are disease rates, error rates, or anything else measured as a proportion.
c. Example: Suppose that we are given that the pass rate of a certain class is 60%. If we look at the data from one specific class where 20 of 30 people passed, test whether the pass rate in this class differs significantly from 60%. Use α = 0.05.
The first thing we do is set up our null and alternative hypotheses.
i) Hypotheses:
Ho: p = 0.60
Ha: p ≠ 0.60
ii) Analysis of Critical Region: Then we find our critical value for our test. Since our
alpha is 0.05 and we have a Z-stat that is two-tailed our critical values are +/- 1.96. So
we will reject Ho only if our test statistics lies farther out than either of these two values.
iii) Now we find the Test Statistic:
Z = ( p̂ - p0 ) / σp0 , where σp0 = √( p0 (1 - p0) / n )
p̂ = 20/30 = 0.667
σp0 = √( 0.60 × 0.40 / 30 ) = 0.0894
So we find that Z = ( 0.667 - 0.60 ) / 0.0894 = 0.745
iv) Conclusion: Since our test statistic = 0.745 < 1.96, we fail to reject Ho and conclude there is not enough evidence to suggest that the pass rate in the class is any different from 0.60, or 60%.
Graphically: the value 0.745 does not lie in either rejection region beyond -Zα/2 = -1.96 or Zα/2 = 1.96. So the sample proportion of 0.667 is not far enough away from 0.60 for us to think that this class is any different from the rest in terms of passing rate.
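A sketch of this proportion test in Python; `NormalDist` from the standard library supplies a two-sided p-value as a cross-check on the critical-value decision:

```python
import math
from statistics import NormalDist

p0 = 0.60           # claimed pass rate under Ho
passed, n = 20, 30
alpha = 0.05
z_crit = 1.96       # two-sided critical value for alpha = 0.05

p_hat = passed / n                    # sample proportion, about 0.667
se = math.sqrt(p0 * (1 - p0) / n)     # sigma_p0, computed under Ho
z = (p_hat - p0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
print("reject Ho" if abs(z) > z_crit else "fail to reject Ho")  # fail to reject Ho
```

Both routes agree: the test statistic falls inside ±1.96 and the p-value exceeds α, so the class's pass rate is not distinguishable from 60%.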