Testing a Hypothesis about means

The contents in this chapter are from Chapter 12
to Chapter 14 of the textbook.



Testing a single mean
Testing two related means
Testing two independent means
Testing a single mean


This chapter uses the gssft.sav data, which includes data for full-time workers only.
The variables are:

Hrsl: number of hours worked last week
Agecat: age category
Rincome: respondent's income
Example




The left plot is a histogram of the number of hours worked in the previous week for 437 college graduates.
The peak at 40 hours is higher than you would expect for a normal distribution.
There is also a tail toward larger values of hours worked.
It appears that people are more likely to work a long week than a short week.
Example basic statistics

Statistics: Number of hours worked last week

N (Valid)                     437
N (Missing)                     2
Mean                        47.00
Median                      45.00
Mode                           40
Std. Deviation             10.207
Variance                  104.193
Skewness                    1.240
Std. Error of Skewness       .117
Minimum                        15
Maximum                        89


The sample mean (47) is not equal to the sample median (45). The distribution is right-skewed, which is consistent with the skewness of 1.24.
The distribution is not normal.
How would you determine whether 47 is an unlikely sample mean if the population mean is 40?
Testing a single mean
The variance is unknown. The hypotheses are
$$H_0: \mu = \mu_0, \qquad H_1: \mu \neq \mu_0$$
The statistic
$$t = \frac{\bar{X} - \mu_0}{s}\sqrt{n}$$
The rejection region
$$t > t_{n-1}(\alpha/2) \quad \text{or} \quad t < -t_{n-1}(\alpha/2)$$
The critical value of t can be found in many textbooks or in SPSS.
Testing a single mean


The standard error of the mean is $10.2 / \sqrt{437} \approx 0.49$.
The t-statistic is
$$t = \frac{47 - 40}{10.207}\sqrt{437} \approx 14.3$$
The 95% confidence interval of the difference between the population mean and the test value is
$$6.04 < \mu - 40 < 7.96$$
One-Sample Test (Test Value = 40)

Number of hours worked last week
  t                                            14.326
  df                                              436
  Sig. (2-tailed)                                .000
  Mean Difference                               6.995
  95% Confidence Interval of the Difference
    Lower                                        6.04
    Upper                                        7.96
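The calculation on this slide can be checked with a few lines of Python. This is only a sketch that works from the rounded summary statistics shown in the slides (mean 47, standard deviation 10.207, n = 437); the function name `one_sample_t` is made up for illustration and is not part of SPSS.

```python
import math

def one_sample_t(sample_mean, sample_sd, n, mu0):
    """Return (standard error, t statistic, degrees of freedom)."""
    se = sample_sd / math.sqrt(n)     # standard error of the mean
    t = (sample_mean - mu0) / se      # same as sqrt(n) * (mean - mu0) / s
    return se, t, n - 1

# Rounded summary statistics taken from the slides.
se, t, df = one_sample_t(sample_mean=47.0, sample_sd=10.207, n=437, mu0=40.0)
print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}")
```

The t value obtained this way is slightly larger than the 14.326 reported by SPSS, because SPSS uses the unrounded mean difference of 6.995 rather than 7.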
The t-distribution





The statistic used on the previous page follows a t-distribution with n-1 degrees of freedom.
This is a two-tailed test.
The p-value is the probability that a sample t value is greater than 14.3 or less than -14.3.
The p-value in this example is less than 0.0005.
We can conclude that it is quite unlikely that college graduates work a 40-hour week on average.
Normal approximation

The number of degrees of freedom in this test is 437-1=436. With this many degrees of freedom, the t distribution is very close to the normal distribution, so the critical values or the confidence interval can be determined from the standard normal distribution.
The 95% confidence interval is given by
$$\left(\bar{x} - 1.96\frac{s}{\sqrt{n}},\ \bar{x} + 1.96\frac{s}{\sqrt{n}}\right) = \left(47 - 1.96\frac{10.207}{\sqrt{437}},\ 47 + 1.96\frac{10.207}{\sqrt{437}}\right) = (46.0430,\ 47.9570)$$
Descriptives: Number of hours worked last week

                                        Statistic   Std. Error
Mean                                        47.00         .488
95% Confidence Interval for Mean
  Lower Bound                               46.04
  Upper Bound                               47.96
5% Trimmed Mean                             46.23
Median                                      45.00
Variance                                  104.193
Std. Deviation                             10.207
Minimum                                        15
Maximum                                        89
Range                                          74
Interquartile Range                            10
Skewness                                    1.240         .117
Kurtosis                                    2.356         .233
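The normal-approximation interval can be reproduced with the Python standard library. This is a sketch under the assumption that the rounded summary values from the slides are used; `normal_ci` is an illustrative name, and `NormalDist().inv_cdf` supplies the 1.96 critical value instead of a table.

```python
import math
from statistics import NormalDist

def normal_ci(sample_mean, sample_sd, n, level=0.95):
    """Confidence interval for the mean using the normal critical value."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    half_width = z * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

low, high = normal_ci(47.0, 10.207, 437)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

The bounds agree with the 46.04 and 47.96 in the Descriptives table, which illustrates how little the t and normal critical values differ at 436 degrees of freedom.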
Hypothesis Testing

Two kinds of errors


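Type I error: rejecting the null hypothesis when it is true (taking the true for false)
Type II error: failing to reject the null hypothesis when it is false (taking the false for true)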
Hypothesis Testing

Two kinds of errors

The p-value is the probability of getting a test
statistic equal to or more extreme than the
sample result, given that the null hypothesis is
true.
If the p-value $\geq \alpha$, you do not reject $H_0$.
If the p-value $< \alpha$, you reject $H_0$.
If the p-value is low, then $H_0$ must go.
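The decision rule can be written out as a short Python sketch. It assumes a normal approximation for the test statistic, which is reasonable for the working-hours example (436 degrees of freedom) but not for small samples; the function names `two_sided_p` and `decide` are hypothetical.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-tailed p-value for a statistic that is approximately N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

p = two_sided_p(14.3)    # working-hours example, t is about 14.3
print(decide(p))
```

For a statistic as extreme as 14.3 the computed p-value is effectively zero, which is why SPSS prints it as .000.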
Testing a Hypothesis about Two related means




We use the endoph.sav data set provided by the author.
Dale et al. (1987) investigated the possible role of β-endorphins in the collapse of runners.
β-endorphins are morphine-like substances manufactured in the body.
They measured plasma β-endorphin concentrations for 11 runners before and after they participated in a half-marathon run.
The question of interest was whether average β-endorphin levels changed during a run.
Testing a Hypothesis about Two related means
Case Summaries (a)

Case       before     after      diff
1            4.30     29.60     25.30
2            4.60     25.10     20.50
3            5.20     15.50     10.30
4            5.20     29.60     24.40
5            6.60     24.10     17.50
6            7.20     37.80     30.60
7            8.40     20.20     11.80
8            9.00     21.90     12.90
9           10.40     14.20      3.80
10          14.00     34.60     20.60
11          17.80     46.20     28.40
Total N        11        11        11

a. Limited to first 100 cases.
Testing a Hypothesis about Two related means

The paired-samples t test is recommended for this problem. It is equivalent to a one-sample t test on the differences.

One-Sample Statistics

          N      Mean   Std. Deviation   Std. Error Mean
diff     11   18.7364          8.32974           2.51151

One-Sample Test (Test Value = 0)

diff
  t                                             7.460
  df                                               10
  Sig. (2-tailed)                                .000
  Mean Difference                            18.73636
  95% Confidence Interval of the Difference
    Lower                                     13.1404
    Upper                                     24.3324
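The same numbers can be recomputed from the eleven before and after values in the Case Summaries table. This is a sketch using only the Python standard library; the critical value 2.228 for 10 degrees of freedom is entered by hand from a t table, since the standard library has no t distribution.

```python
import math
import statistics

before = [4.30, 4.60, 5.20, 5.20, 6.60, 7.20, 8.40, 9.00, 10.40, 14.00, 17.80]
after = [29.60, 25.10, 15.50, 29.60, 24.10, 37.80, 20.20, 21.90, 14.20, 34.60, 46.20]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)       # sample SD, n - 1 in the denominator
se_diff = sd_diff / math.sqrt(n)
t_stat = mean_diff / se_diff            # H0: mean difference = 0

T_CRIT = 2.228   # assumption: t table value for df = 10, two-tailed 5%
ci = (mean_diff - T_CRIT * se_diff, mean_diff + T_CRIT * se_diff)

print(f"mean = {mean_diff:.4f}, sd = {sd_diff:.4f}, t = {t_stat:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the test is run on the differences, a one-sample test of the differences against 0 and a paired-samples test on the two columns give the same t, apart from the sign, which depends on the order of subtraction.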
Testing a Hypothesis about Two related means



The average difference is 18.74, which is large compared with the standard deviation of 8.3.
The 95% confidence interval for the average difference is (13.14, 24.33). It does not include the value 0, so you can reject the hypothesis that the average difference is 0.
An equivalent way of testing the hypothesis is the t test. The p-value is less than 0.0005, so we should reject the hypothesis.
Testing a Hypothesis about Two related means
Paired Samples Statistics

                     Mean    N   Std. Deviation   Std. Error Mean
Pair 1   before    8.4273   11          4.24832           1.28092
         after    27.1636   11          9.67794           2.91801

Paired Samples Correlations

                             N   Correlation   Sig.
Pair 1   before & after     11          .515   .105

Paired Samples Test

Pair 1   before - after
  Paired Differences
    Mean                                       -18.73636
    Std. Deviation                               8.32974
    Std. Error Mean                              2.51151
    95% Confidence Interval of the Difference
      Lower                                    -24.33236
      Upper                                    -13.14037
  t                                               -7.460
  df                                                  10
  Sig. (2-tailed)                                   .000
Testing a Hypothesis about Two related means









diff Stem-and-Leaf Plot

Frequency    Stem & Leaf
  1.00          0 . 3
  4.00          1 . 0127
  5.00          2 . 00458
  1.00          3 . 0

Stem width:  10.00
Each leaf:   1 case(s)

Each difference is shown using only its first two digits; the decimal part is dropped (for example, 3.80 appears as leaf 3 and 12.90 as leaf 2).
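A stem-and-leaf display like this one can be built in a few lines. The sketch below assumes non-negative values and a stem width of 10, and it drops the decimal part of each value; `stem_and_leaf` is an illustrative helper, not an SPSS routine.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Stem = tens digit, leaf = units digit; the decimal part is dropped."""
    leaves = defaultdict(list)
    for v in sorted(values):
        whole = int(v)                       # truncate, do not round
        leaves[whole // 10].append(str(whole % 10))
    return [f"{stem} . {''.join(leaves[stem])}" for stem in sorted(leaves)]

diffs = [25.30, 20.50, 10.30, 24.40, 17.50, 30.60,
         11.80, 12.90, 3.80, 20.60, 28.40]
for line in stem_and_leaf(diffs):
    print(line)
```

Applied to the eleven differences, truncation gives the same leaves as the SPSS display, whereas rounding would turn 3.80 into leaf 4.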
Testing a Hypothesis about Two related means



All the differences are positive. That is, the after values are always greater than the before values.
The stem-and-leaf plot doesn't suggest any obvious departures from normality.
A normal probability plot, or Q-Q plot, can help us assess the normality of the data.
Normal Probability Plot



For each data point, the Q-Q plot shows the
observed value and the value that is expected if
the data are a sample from a normal distribution.
The points should cluster around a straight line if
the data are from a normal distribution.
The normal Q-Q plot of the difference variable is more or less linear, so the assumption of normality appears to be reasonable.
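The points of a normal Q-Q plot can be computed directly, which makes the idea of "expected normal value" concrete. This sketch assumes Blom's plotting positions, (i - 3/8) / (n + 1/4); other conventions exist and give slightly different points. The helper names are hypothetical, and the correlation of the points is used here only as a rough numeric summary of linearity.

```python
import math
from statistics import NormalDist

def qq_points(values):
    """Pair each sorted observation with its expected standard normal quantile."""
    xs = sorted(values)
    n = len(xs)
    # Assumption: Blom plotting positions.
    probs = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    zs = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(zs, xs))

def correlation(pairs):
    """Pearson correlation of the (expected, observed) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

diffs = [25.30, 20.50, 10.30, 24.40, 17.50, 30.60,
         11.80, 12.90, 3.80, 20.60, 28.40]
points = qq_points(diffs)
r = correlation(points)
print(f"correlation of Q-Q points: {r:.3f}")
```

A correlation close to 1 means the points lie near a straight line, in agreement with the visual impression from the plot.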
Normal Probability Plot
[Figure: normal Q-Q plot of the difference variable]
Testing Two Independent Means



This section uses the gss.sav data set.
Consider the number of hours of television viewing per day reported by internet users and non-users.
It is clear that neither distribution is normal.
Testing Two Independent Means

There are some problems in the data.

Some people report watching television for 24 hours a day, which is impossible.
"Watching TV" is not a very well-defined term. If you have the TV on while you are doing homework, are you studying or watching TV?
The observations in the two groups are independent, so this is a comparison of two independent means.
Testing Two Independent Means
Descriptives: Hours per day watching TV, by "Use Internet?"

                                       No                        Yes
                                Statistic   Std. Error    Statistic   Std. Error
Mean                                 3.52         .128         2.42         .106
95% Confidence Interval for Mean
  Lower Bound                        3.26                      2.22
  Upper Bound                        3.77                      2.63
5% Trimmed Mean                      3.22                      2.18
Median                               3.00                      2.00
Variance                            7.801                     4.604
Std. Deviation                      2.793                     2.146
Minimum                                 0                         0
Maximum                                24                        20
Range                                  24                        20
Interquartile Range                     2                         2
Skewness                            2.164         .112        3.066         .120
Kurtosis                            7.946         .224       16.086         .240
Testing Two Independent Means


The two sample means are 2.42 hours of TV viewing per day for internet users and 3.52 hours for those who don't use the internet, a difference of about 1.1 hours.
The 5% trimmed means, which are calculated by removing the top and bottom 5% of the values, are about 0.3 hours (non-users) and 0.2 hours (users) less than the arithmetic means. The trimmed means are more meaningful in this case study because they are less affected by the extreme values.
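A trimmed mean is easy to compute by hand. The sketch below uses a simple version that drops a whole number of observations from each end; SPSS's own calculation may treat fractional counts differently, so small discrepancies are possible. The `hours` list is made-up data for illustration only, not values from gss.sav.

```python
def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the lowest and highest `proportion` of the values."""
    xs = sorted(values)
    k = int(len(xs) * proportion)        # observations removed from each end
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

# Hypothetical TV-hours data with one implausible report of 24 hours.
hours = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 8, 24]
plain = sum(hours) / len(hours)
trimmed = trimmed_mean(hours)
print(f"mean = {plain:.2f}, 5% trimmed mean = {trimmed:.2f}")
```

With 20 values, one observation is dropped from each end, so the single report of 24 hours no longer pulls the average upward.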
Testing Two Independent Means

For testing the hypothesis
$$H_0: \mu_1 = \mu_2, \qquad H_1: \mu_1 \neq \mu_2$$

There are several cases:
$\sigma_1 = \sigma_2$, both known
$\sigma_1$ and $\sigma_2$ known, but $\sigma_1 \neq \sigma_2$
$\sigma_1 = \sigma_2$, both unknown
$\sigma_1 \neq \sigma_2$, both unknown
Testing Two Independent Means
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$$
where
$\bar{X}_1$ = mean of the sample taken from population 1
$\mu_1$ = mean of population 1
$\sigma_1^2$ = variance of population 1
$n_1$ = size of the sample taken from population 1
$\bar{X}_2$ = mean of the sample taken from population 2
$\mu_2$ = mean of population 2
$\sigma_2^2$ = variance of population 2
$n_2$ = size of the sample taken from population 2
Testing Two Independent Means

In most cases the variances are unknown.
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim t_{n_1+n_2-2}$$
where
$$s_p^2 = \text{pooled variance} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$
$\bar{X}_1$ = mean of the sample taken from population 1
$s_1^2$ = variance of the sample taken from population 1
$n_1$ = size of the sample taken from population 1
$\bar{X}_2$ = mean of the sample taken from population 2
$s_2^2$ = variance of the sample taken from population 2
$n_2$ = size of the sample taken from population 2
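The pooled-variance statistic can be sketched in Python from summary statistics alone. The inputs below are the rounded group statistics that appear later in these slides for the TV data with reports above 12 hours removed (non-users: N = 469, mean 3.40, SD 2.491; users: N = 411, mean 2.35, SD 1.866); `pooled_t` is an illustrative name.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic for H0: mu1 = mu2, assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df, se

t, df, se = pooled_t(3.40, 2.491, 469, 2.35, 1.866, 411)
print(f"t = {t:.2f}, df = {df}, SE = {se:.3f}")
```

The result is close to, but not identical with, the SPSS value for that data set, because the means entered here are rounded to two decimals.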
Testing Two Independent Means

Output from t test for TV watching hours

Independent Samples Test: Hours per day watching TV

                                              Equal variances   Equal variances
                                              assumed           not assumed
Levene's Test for Equality of Variances
  F                                                  20.261
  Sig.                                                 .000
t-test for Equality of Means
  t                                                   6.455             6.569
  df                                                    884           870.228
  Sig. (2-tailed)                                      .000              .000
  Mean Difference                                     1.092             1.092
  Std. Error Difference                                .169              .166
  95% Confidence Interval of the Difference
    Lower                                              .760              .766
    Upper                                             1.424             1.418
Testing Two Independent Means

In the output, there are two different versions of the t test.

One assumes that the variances in the two populations are equal; the other does not.
Both tests lead us to reject the hypothesis; the observed significance level is less than 0.0005.
Both are two-tailed tests.
Testing the equality of two variances is covered in the next section.
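The version that does not assume equal variances is commonly computed with Welch's statistic and the Welch-Satterthwaite degrees of freedom. The sketch below assumes that this is the formula behind the "equal variances not assumed" row, and it uses the rounded group statistics given later in these slides for the TV data with reports above 12 hours removed; `welch_t` is an illustrative name.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic without the equal-variance assumption."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2          # squared standard errors
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, df, se

t, df, se = welch_t(3.40, 2.491, 469, 2.35, 1.866, 411)
print(f"t = {t:.2f}, df = {df:.1f}, SE = {se:.3f}")
```

The fractional degrees of freedom explain why SPSS prints values such as 870.228 in the second column of its output.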
Testing Two Independent Means

The 95% confidence interval for the true difference is

[0.77, 1.42] when equal variances are not assumed,
[0.76, 1.42] when equal variances are assumed.
Neither interval covers the value 0, so we should reject the hypothesis.
F test for equality of Two Variances
$$F = \frac{s_1^2}{s_2^2} \sim F_{n_1-1,\,n_2-1}$$
where
$s_1^2$ = variance of sample 1
$s_2^2$ = variance of sample 2
$n_1$ = size of the sample taken from population 1
$n_2$ = size of the sample taken from population 2
$n_1 - 1$ = degrees of freedom from sample 1
$n_2 - 1$ = degrees of freedom from sample 2
F test for equality of Two Variances
$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_1: \sigma_1^2 \neq \sigma_2^2$$
Reject $H_0$ if $F < F_{n_1-1,\,n_2-1}(1 - \alpha/2)$ or $F > F_{n_1-1,\,n_2-1}(\alpha/2)$
F test for equality of Two Variances

From the results below we have
$$F = \frac{2.491^2}{1.866^2} = 1.7821$$

With 468 and 410 degrees of freedom the critical value is close to 1.00, so the observed F is well above it and we reject the hypothesis that the two populations have the same variance.

Group Statistics: Hours per day watching TV

Use Internet?     N    Mean   Std. Deviation   Std. Error Mean
No              469    3.40            2.491              .115
Yes             411    2.35            1.866              .092
Levene’s test for equality of variances




SPSS reports Levene's test (1960), which tests whether k samples have equal variances.
Equal variances across samples is called homogeneity of variance.
Levene's test is less sensitive to departures from normality than some other tests.
The SPSS output leads us to reject the hypothesis of equal variances.
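To make the idea behind Levene's test concrete, here is a minimal sketch of the mean-based version of the statistic: a one-way analysis of variance applied to the absolute deviations of each observation from its group mean. The function name and the two small groups are made up for illustration, and whether SPSS uses exactly this variant is an assumption here.

```python
def levene_statistic(*groups):
    """Mean-based Levene statistic W for k groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical data: the second group is twice as spread out as the first.
w = levene_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"W = {w:.4f}")
```

Under the hypothesis of equal variances, W is compared with an F distribution with k - 1 and N - k degrees of freedom; large values point to unequal variances.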
Effect of Outliers

Some people reported watching TV for a very long time, including 24 hours a day.
Here the observations where the person watched TV for more than 12 hours a day are removed.
Independent Samples Test: Hours per day watching TV

                                              Equal variances   Equal variances
                                              assumed           not assumed
Levene's Test for Equality of Variances
  F                                                  25.449
  Sig.                                                 .000
t-test for Equality of Means
  t                                                   7.013             7.145
  df                                                    878           857.737
  Sig. (2-tailed)                                      .000              .000
  Mean Difference                                     1.053             1.053
  Std. Error Difference                                .150              .147
  95% Confidence Interval of the Difference
    Lower                                              .758              .763
    Upper                                             1.347             1.342
Effect of Outliers

The average difference between the two groups decreased from 1.09 to 1.05.
The conclusions do not change.
Introducing More Variables


Let us consider more variables that may be related to TV watching time.
Consider age, education, and working hours.
Group Statistics

                                        Use                      Std.        Std. Error
                                        Internet?    N    Mean   Deviation   Mean
Age of respondent                       No         734   51.75     18.857    .696
                                        Yes        653   40.79     13.212    .517
Highest year of school completed        No         733   12.05      2.702    .100
                                        Yes        652   14.55      2.523    .099
Number of hours worked last week        No         356   40.80     13.960    .740
                                        Yes        532   43.74     13.481    .584
Number of hours spouse worked last week No         171   40.98     11.990    .917
                                        Yes        238   43.38     12.498    .810
Introducing More Variables


We reject the hypothesis that in the population the two groups have the same average age, education, and hours worked.
Internet users are significantly younger, better educated, and work more hours per week.
The difference in the spouse's hours worked is not significant at the 5% level (p = .052).
Independent Samples Test

Columns: Levene's Test for Equality of Variances (F, Sig.); t-test for Equality of Means (t, df, Sig. (2-tailed), Mean Difference, Std. Error Difference, 95% Confidence Interval of the Difference: Lower, Upper)

                                       F     Sig.        t         df   Sig.   Mean Diff.   Std. Err.    Lower    Upper
Age of respondent
  Equal variances assumed        131.217     .000   12.388       1385   .000       10.957        .885    9.222   12.692
  Equal variances not assumed                       12.637   1314.977   .000       10.957        .867    9.256   12.658
Highest year of school completed
  Equal variances assumed          7.327     .007  -17.752       1383   .000       -2.503        .141   -2.779   -2.226
  Equal variances not assumed                      -17.823   1379.733   .000       -2.503        .140   -2.778   -2.227
Number of hours worked last week
  Equal variances assumed           .441     .507   -3.136        886   .002       -2.936        .936   -4.774   -1.099
  Equal variances not assumed                       -3.114    742.904   .002       -2.936        .943   -4.787   -1.085
Number of hours spouse worked last week
  Equal variances assumed          1.050     .306   -1.948        407   .052       -2.400       1.232   -4.822     .022
  Equal variances not assumed                       -1.961    375.077   .051       -2.400       1.224   -4.806     .006