Hypothesis Testing
(Statistical Significance Testing)
Two Points to Emphasize:
1. Hypothesis testing ALWAYS involves
a null hypothesis (H0) whether one is
explicitly stated or not.
2. The significance level (i.e., α-level) is
chosen BEFORE the sample statistic
is calculated.
We have data from one of the General Social Surveys
(GSS), a random sample of 2,013 individuals.
Therefore, we know that the Central Limit Theorem can
be applied. We are social psychologists interested in
people’s image of themselves (i.e., self concept).
Each participant in the survey was given a card with the
following scale on it:
1—2—3—4—5—6—7
Respondents were asked to rate their own personal
appearance. On this scale, 1 meant “way below
average,” 7 meant “way above average,” and 4 meant
“average.” Based on theories of self-concept, we
hypothesize that in general people consider themselves
“above average” in personal appearance.
In this example, the expectation that “People in general
rate their personal appearance as being above
average” is the alternate hypothesis (H1).
Note that this is a statement about the universe
(“People in general…”), not the sample. Thus, the
symbolic representation of the alternate hypothesis is
expressed as:
$H_1: \mu_Y > 4.00$
Why “greater than 4.00”? “Greater than” because our
alternate hypothesis states that the rating is above
average; “4.00” because, on this scale, 4.00 is
average.
Our null hypothesis (H0), whether we state it or not, is
“People in general rate their personal appearance as
average.” This is the NEGATION of the alternate
hypothesis. (To state that “People in general rate their
appearance as below average” is to specify another
alternate hypothesis, not a null hypothesis.)
Symbolically, the null hypothesis is:
$H_0: \mu_Y = 4.00$
[Figure: the 1-to-7 rating scale, with values below 4 labeled “Below Average,” 4 labeled “Average,” and values above 4 labeled “Above Average.”]
No special statistical test is needed to test this null
hypothesis since, with 2,013 cases in a random sample,
we can assume that the Central Limit Theorem
applies. Thus we can use our knowledge of the normal
curve in testing this hypothesis.
Let’s set the significance or α-level for our hypothesis
test at α = 0.05.
Since we can assume that the Central Limit Theorem
applies in this case, we know that the sampling
distribution of all possible (sample) mean self-appearance
ratings is normally distributed. In other
words, we can use Appendix 1, pp. 540-542.
With Appendix 1, we can identify the critical value. We
are making a test with an α-level of 0.05, meaning that
we want only a 5 percent chance of wrongly rejecting the
null hypothesis (H0).
Alpha = 0.05 means 5 percent of the total area under
the normal curve. Since our alternate hypothesis (H1) is a
directional one, pointing to scale values ABOVE the
average score of 4.00, we are only dealing with the
RIGHT HALF of the sampling distribution.
We are looking for sample mean self-appearance ratings
that are so extreme that they cannot be explained by chance
alone, the (un)luck of the draw. We are looking for those
theoretically possible sample means that could occur by
chance only 5 percent or less than 5 percent of the time.
This means that we identify a region of rejection in the
right tail of the sampling distribution that contains 5
percent of the area under the normal curve. (As a memory
aid, think of alpha as “area”: α = 0.05 means 5 percent of
the area in the tail.)
Searching Appendix 1, we find in Column C the areas in
the tails of 0.0505 in Row 1.64 and 0.0495 in Row 1.65.
Interpolating (splitting the difference in this case), we
determine the critical value of z to be + 1.645.
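The table lookup can be cross-checked numerically. What follows is only a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the course itself uses Appendix 1, and the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05  # significance level, chosen before looking at the data

# One-tailed (right-tail) test: the critical z leaves alpha of the area
# under the standard normal curve to its right.
z_critical = NormalDist().inv_cdf(1 - alpha)

print(round(z_critical, 3))  # 1.645
```

The inverse cumulative distribution function does in one step what the interpolation between rows 1.64 and 1.65 does by hand.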
Next, we need to calculate the mean self-assessed
appearance score in our sample data, locate it on the
sampling distribution of mean self-appearance scores,
convert it to a z-value, and compare this z-value to the
critical value + 1.645 that we have just found.
Summing the 2,013 self-appearance scores from our
sample respondents and dividing that sum by 2,013
produced a sample mean of 4.90. This is 0.90 scale units
above the average self-appearance rating of 4.00, but is
this sufficiently greater than the average rating score to
conclude that a general trend exists (in the universe)? If
not, we must settle on chance—the luck of the draw—as
the explanation for getting 4.90 in a random sample when
the actual (unknown) rating in the population is probably
close to 4.00.
Under the null hypothesis, the mean of the sampling
distribution ($\mu_Y$) is PRESUMED (initially) to be 4.00
(because this would be the population mean if people in
general rate their personal appearance as “average”).
Our sample mean, 4.90, is 0.90 steps to the right of the
presumed mean of the sampling distribution. If we travel
down the curve toward the right to location + 0.90,
where are we in z-values on the x-axis below?
Our conversion factor is the estimated value of the
standard error (here, technically called the standard
error of the mean):
$$\hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{N}}$$
In the present example, N = 2,013. We need to
calculate the standard deviation for the sample. Its
value turns out to be 1.153. Therefore, we estimate the
standard error to be:
$$\hat{\sigma}_{\bar{Y}} = \frac{1.153}{\sqrt{2{,}013}} = \frac{1.153}{44.866} = 0.026$$
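The same arithmetic can be reproduced in a few lines. This is a sketch only, using the sample size and standard deviation quoted above; the variable names are our own.

```python
import math

n = 2013    # sample size
s = 1.153   # sample standard deviation of the appearance ratings

# Estimated standard error of the mean: s divided by the square root of N.
root_n = math.sqrt(n)
standard_error = s / root_n

print(round(root_n, 3), round(standard_error, 3))
```

Rounded to three decimals, the two printed values should agree with the 44.866 and 0.026 used in the text.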
To decide whether to reject or not reject the null
hypothesis, we need to determine whether a difference
of + 0.90 (4.90 – 4.00) sends us to a z-location at or
beyond zCV = + 1.645. To find out, we divide this
difference by the value of the standard error. The
equation is:
$$z = \frac{\bar{Y} - \mu_Y}{\hat{\sigma}_{\bar{Y}}}$$
Here, 0.90 divided by 0.026 produces z = + 34.62. This
z-location is WAY BEYOND the critical value of z
(i.e., + 1.645). In other words, we are WELL INSIDE the
region of rejection. This means that, if the null hypothesis
were true, a sample mean personal appearance rating as high
as 4.90 would occur by chance LESS THAN 5 percent of the time.
Thus, we REJECT the null hypothesis and conclude
that we CAN infer from our sample data a general
tendency for people to rate their personal appearance as
better than average.
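The whole one-tailed test can be sketched in code. This is an illustration under the figures given in the example, not a general-purpose routine, and the names are our own. Because the sketch does not round the standard error to 0.026 before dividing, it gives a z of about 35.0 rather than 34.62; the decision is the same either way.

```python
import math
from statistics import NormalDist

sample_mean = 4.90   # mean self-appearance rating in the sample
null_mean = 4.00     # population mean presumed under H0
s, n = 1.153, 2013   # sample standard deviation and sample size
alpha = 0.05         # chosen before computing the sample statistic

standard_error = s / math.sqrt(n)
z = (sample_mean - null_mean) / standard_error

# Right-tail test because H1 says the mean is GREATER than 4.00.
z_critical = NormalDist().inv_cdf(1 - alpha)
reject_null = z >= z_critical

print(round(z, 2), round(z_critical, 3), reject_null)
```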
Because we chose an α-level of 0.05, we accepted a 5 percent
risk of WRONGLY REJECTING a true null hypothesis (i.e., of
ruling out chance as the explanation for our sample statistic
in favor of a “true” general tendency).
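A small simulation can make the meaning of this 5 percent concrete. The sketch below rests on assumptions that are ours, not the survey's: ratings are drawn from a normal population whose mean really is 4.00 (so the null hypothesis is true), each simulated survey has a hypothetical 100 respondents, and 2,000 surveys are simulated. The share of surveys that land in the region of rejection should come out near 0.05.

```python
import math
import random
from statistics import mean, stdev

random.seed(0)      # fixed seed so the run is repeatable

NULL_MEAN = 4.00    # population mean when the null hypothesis is true
POP_SD = 1.153      # assumed population spread (borrowed from the sample)
N = 100             # hypothetical respondents per simulated survey
TRIALS = 2000       # number of simulated surveys
Z_CRITICAL = 1.645  # one-tailed critical value for alpha = 0.05

false_rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(NULL_MEAN, POP_SD) for _ in range(N)]
    standard_error = stdev(sample) / math.sqrt(N)
    z = (mean(sample) - NULL_MEAN) / standard_error
    if z >= Z_CRITICAL:
        false_rejections += 1   # rejected H0 although H0 is true

type_i_rate = false_rejections / TRIALS
print(type_i_rate)
```

Every rejection counted here is a Type I error, since the simulated population mean is exactly 4.00.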
There are two types of errors that can arise in hypothesis
(significance) testing. We have dealt only with Type I (α)
errors. There is also something called a Type II error.
Furthermore, there are two different types of tests of
null hypotheses. We have made what is called a one-tailed
test, meaning that we located the region of
rejection entirely within one tail of the sampling
distribution (the right). This was dictated by our
alternate hypothesis which directed us to the portion of
the sampling distribution where all theoretically possible
sample means were greater than the (presumed
universe) mean (hence the “>” sign in the symbolic
expression of our alternate hypothesis).
Another way of saying this is that directional alternate
hypotheses ALWAYS dictate making one-tailed
significance tests of null hypotheses.
What is a directional alternate hypothesis? It is a
hypothesis about the parameter in some universe
(population) expressed in language of “greater than” or
“less than” (symbolically, with “>” and “<” signs).
To state a hypothesis this way, we need to know a lot
about the subject we are studying. If we don’t, then we
are better off formulating a non-directional (alternate)
hypothesis.
In our self-assessed appearance example, this would
be: “In general, people DO NOT rate themselves as
average in appearance.” Notice that this does not state
precisely how people rate themselves, only that they
resist defining themselves as average.
The null hypothesis would be the same as before:
“People in general rate themselves as average in their
personal appearance.” Symbolically, the non-directional
hypothesis would be represented as:
$H_1: \mu_Y \neq 4.00$
and the null hypothesis as:
$H_0: \mu_Y = 4.00$
It is the way the ALTERNATE HYPOTHESIS is worded
(NOT the null hypothesis) that determines whether a
directional or a non-directional test is called for.
A NON-DIRECTIONAL alternate hypothesis dictates
that we perform a two-tailed significance test.
Whereas in the one-tailed test the region of rejection is
located entirely in one tail (either the left or the right) of
the sampling distribution, in the two-tailed test the
region defined by the α-level must be SPLIT EQUALLY
between the two tails.
For instance, if α = 0.05 and the region of rejection
were thus 5 percent of the area under the sampling
distribution curve, then we would need to locate 2.5
percent of the region in the left tail AND 2.5 percent in
the right tail. This is precisely what we did in the case
of estimation in determining lower and upper
confidence limits.
To identify TWO critical values (one establishing the left
region of rejection and the other establishing the right
region of rejection), we need to find the value of z
beyond which 2.5 percent of the area remains in each
tail. Whenever the Central Limit Theorem holds, we can
use Appendix 1, pp. 540-542.
In Appendix 1, we find area = .0250 (2.5 percent) in
Column C. This identifies the value of z as
1.96 (Column A).
Therefore, the z-value where the left tail begins is
- 1.96, and the z-value where the right tail begins is
+ 1.96.
A value for a test statistic that is LESS THAN - 1.96 or
one that is GREATER THAN + 1.96 would be so
unlikely to occur by chance at the 0.05 level that we
would decide to REJECT a null hypothesis.
Conversely, any test statistic whose z-value is
BETWEEN - 1.96 and + 1.96 would result in our
deciding NOT to reject the null hypothesis.
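The two-tailed decision rule can be written out as a short sketch. It assumes Python's standard-library `statistics.NormalDist`; the helper `reject_two_tailed` and the two test values passed to it are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

alpha = 0.05

# Split alpha equally: alpha / 2 of the area goes in each tail.
z_upper = NormalDist().inv_cdf(1 - alpha / 2)
z_lower = -z_upper


def reject_two_tailed(z):
    """Return True when z falls in either region of rejection."""
    return z < z_lower or z > z_upper


print(round(z_lower, 2), round(z_upper, 2))                # -1.96 1.96
print(reject_two_tailed(1.50), reject_two_tailed(-2.30))   # False True
```

A z of 1.50 lies between the two critical values, so the null hypothesis is not rejected; a z of -2.30 lies beyond the left critical value, so it is.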
With small random samples, we cannot assume that
the Central Limit Theorem applies. Fortunately,
someone has worked out a series of sampling
distributions for such situations. These are the so-called
“Student’s t” distributions (Appendix 2, p. 543).
The specific sampling distribution of t is a function of the
number of degrees of freedom, here simply sample
size less one (i.e., N - 1).
In other words, with N values, N - 1 are free to take any
value, but the final value (the Nth value) is FIXED: it must
be whatever makes the N values add up to the required sum.
If we have three numbers constrained to sum to 10, any
two can take any value, but the third number must have
the value to make the sum of the three equal to 10. For
example, if X1 = 6 and X2 = 11, then X3 must equal - 7.
6 + 11 + (-7) = 10
Two of the values are “free,” but the third is not. Just
remember N - 1 degrees of freedom in a sample of size
N.
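The three-number example can be checked with a few lines of code. This is a sketch with illustrative names; the total of 10 and the free values 6 and 11 come from the example above.

```python
required_total = 10
free_values = [6, 11]   # N - 1 = 2 values chosen freely

# The last value has no freedom: it is whatever makes the total come out.
last_value = required_total - sum(free_values)
degrees_of_freedom = len(free_values)

print(last_value, degrees_of_freedom)  # -7 2
```

Changing either free value changes the last value automatically, which is why only N - 1 values count as free.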
As you can see in Appendix 2, once sample size (N)
exceeds 121 (i.e., df > 120) the Student t distribution
becomes approximately normal in shape.
Using the same data set that you used in your first SAS
exercise comprised of 63 randomly-selected cities in
the U.S., the question is: Was there a decrease in the
population of central cities in the U.S. between 1960
and 1970? The alternate hypothesis is that there was:
H1: average population change < 0.0
$H_1: \mu_Y < 0.0$
Therefore, the null hypothesis is:
$H_0: \mu_Y = 0.0$
Since the percentage change in population is expected
to be a negative number (percentage less than 0.0),
this one-tailed test demands a critical value and region
of rejection in the left tail.
The sample mean is - 1.26 percent, meaning that there
was a 1.26 percent decline in population in the sample.
The hypothesis, however, is about the universe, that is,
that the trend probably holds for ALL U.S. cities. The
sample standard deviation was 6.32 percent. Let's set
alpha at 0.05. Sample size is 63. Therefore, there are
62 degrees of freedom.
What is the critical value of t for this problem?
We find the .05 column for the one-tailed significance
level (the second column) and look for row df = 62.
Since there is no such row in the table, we find this
critical value by (1) interpolating and then (2) adding a
negative sign.
Sixty-two degrees of freedom is 2/60 of the way between the
table entries 1.671 (df = 60) and 1.658 (df = 120). Multiplying
(2/60) times 0.013 (the difference between 1.671 and 1.658)
produces 0.0004.
Decrementing 1.671 by 0.0004 yields 1.6706, or 1.671
rounded off. Since our region of rejection lies entirely in
the left tail, we supply the negative sign. Thus the
critical value is:
$t_{0.05} = -1.671$
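The interpolation step can be sketched as a small function. The function name and signature are our own. The two table entries are the ones quoted in the text, and we take them to be the rows for df = 60 and df = 120, which is what the 2/60 fraction implies.

```python
def interpolate(df, df_low, t_low, df_high, t_high):
    """Linearly interpolate a critical t between two table rows."""
    fraction = (df - df_low) / (df_high - df_low)
    return t_low + fraction * (t_high - t_low)


# One-tailed 0.05 entries for df = 60 and df = 120.
t_abs = interpolate(62, 60, 1.671, 120, 1.658)

# Left-tail test, so the critical value carries a negative sign.
t_critical = -t_abs

print(round(t_abs, 4), round(t_critical, 3))  # 1.6706 -1.671
```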
We begin by estimating the standard error in the same
way:
$$\hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{N}} = \frac{6.32}{\sqrt{63}} = \frac{6.32}{7.937} = 0.796$$
Using this “exchange rate,” we can now convert
percentage differences in population change into t-units.
The algorithm is:
$$t = \frac{\bar{Y} - \mu_Y}{\hat{\sigma}_{\bar{Y}}} = \frac{(-1.26) - (0.0)}{0.796} = \frac{-1.26}{0.796} = -1.582$$
Thus, location on the curve - 1.26 percent translates to
a location of - 1.58 in t-units on the underlying x-axis.
The question is: How likely are we to land there if the
true difference in population of central cities between
1960 and 1970 was 0.0? The answer is: It is likely that
we could land there, i.e., find a sample difference of
- 1.26 percent when there actually was no overall
difference among all cities. That is, t = - 1.58 DOES
NOT lie in the region of rejection because - 1.58 is not as
far into the left tail as t0.05 = - 1.671. Thus we CANNOT
reject the null hypothesis. Our sample data do not provide
sufficient evidence to infer a loss of population in general
among central cities in the U.S. between 1960 and 1970.
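The city-population test can be reproduced as a sketch. It uses the sample figures from the example and takes the critical value of -1.671 from the t table rather than computing it; the variable names are illustrative.

```python
import math

sample_mean = -1.26   # mean percentage change in the sample
null_mean = 0.0       # population mean presumed under H0
s, n = 6.32, 63       # sample standard deviation and sample size
t_critical = -1.671   # one-tailed, alpha = 0.05, df = 62 (from the table)

standard_error = s / math.sqrt(n)
t = (sample_mean - null_mean) / standard_error

# Left-tail test: reject only if t is at or below the critical value.
reject_null = t <= t_critical

print(round(standard_error, 3), round(t, 3), reject_null)
```

Since t falls short of the critical value, `reject_null` is `False`, matching the conclusion that the null hypothesis cannot be rejected.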
Testing Single-Mean Hypotheses
A random sample of 29 college students studied an average of 2.5
hours per day with a standard deviation of 0.75 hours. Using
Student's t distribution (Appendix 2, p. 543), test the null
hypothesis (H0) that students in general do not study at all.
Assume that α = 0.05 and perform a two-tailed test.
1. Symbolically, what is the null hypothesis (H0)? __________
2. Symbolically, what is the alternate hypothesis (H1)? __________
3. What is the value of the standard error? __________
4. What is the value of t? __________
5. How many degrees of freedom in this problem? __________
6. What is the critical value of t0.05? __________
7. Do you reject or not reject the null hypothesis? __________
Testing Single-Mean Hypotheses Answers
1. Symbolically, what is the null hypothesis (H0)? $\mu = 0$
2. Symbolically, what is the alternate hypothesis (H1)? $\mu \neq 0$
3. What is the value of the standard error? 0.139
4. What is the value of t? 17.951
5. How many degrees of freedom in this problem? 28 (N - 1 = 29 - 1)
6. What is the critical value of t0.05? ± 2.048
7. Do you reject or not reject the null hypothesis? reject
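The numerical answers can be verified with a short sketch. It uses the figures from the exercise, applies the N - 1 rule for degrees of freedom (29 - 1 = 28), and takes the two-tailed critical value of 2.048 from the t table rather than computing it. The variable names are our own.

```python
import math

sample_mean = 2.5      # average hours studied per day
null_mean = 0.0        # H0: students in general do not study at all
s, n = 0.75, 29        # sample standard deviation and sample size
t_critical = 2.048     # two-tailed, alpha = 0.05, df = 28 (from the table)

degrees_of_freedom = n - 1
standard_error = s / math.sqrt(n)
t = (sample_mean - null_mean) / standard_error

# Two-tailed test: reject if t lies beyond either critical value.
reject_null = abs(t) > t_critical

print(degrees_of_freedom, round(standard_error, 3), round(t, 3), reject_null)
```

The t value of 17.951 comes from dividing by the unrounded standard error; dividing by the rounded 0.139 would give a slightly different figure.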