```Using Statistics To Make
Inferences 3
Summary
Review the normal distribution
Z test
Z test for the sample mean
t test for the sample mean
3.1
Wednesday, 24 May 2017, 8:13 AM
Goals
To perform and interpret a Z test.
To perform and interpret tests on the sample mean.
To produce a confidence interval for the population mean.
To know when to employ Z and when to employ t.
Practical
Perform a t test.
Perform a two sample t test, in preparation for next week.
3.2
Normal Distribution
[Figure: normal density curves for Series 1 (μ = 0, σ = 1), Series 2 (μ = 0, σ = ½) and Series 3 (μ = 1, σ = 1), plotted for x from -6 to 6.]
Tables present results for the standard normal distribution (μ = 0, σ = 1).
3.3
Use of Tables
Prob(1 ≤ z ≤ ∞) = Prob(-∞ ≤ z ≤ -1) = 0.16
Prob(1.96 ≤ z ≤ ∞) = Prob(-∞ ≤ z ≤ -1.96) = 0.025
Prob(2.58 ≤ z ≤ ∞) = Prob(-∞ ≤ z ≤ -2.58) = 0.005
68% of the observations lie within 1 standard deviation of the mean.
95% of the observations lie within 1.96 standard deviations of the mean.
99% of the observations lie within 2.58 standard deviations of the mean.
3.4
Use of Tables
Prob(1 ≤ z ≤ ∞) = Prob(-∞ ≤ z ≤ -1) = 0.16
Prob(1.96 ≤ z ≤ ∞) = Prob(-∞ ≤ z ≤ -1.96) = 0.025
Prob(2.58 ≤ z ≤ ∞) = Prob(-∞ ≤ z ≤ -2.58) = 0.005

Z       0.00   -0.01  -0.02  -0.03  -0.04  -0.05  -0.06  -0.07  -0.08  -0.09
-1.0    0.159  0.156  0.154  0.152  0.149  0.147  0.145  0.142  0.140  0.138
-1.9    0.029  0.028  0.027  0.027  0.026  0.026  0.025  0.024  0.024  0.023
-2.5    0.006  0.006  0.006  0.006  0.006  0.005  0.005  0.005  0.005  0.005

Each entry is Prob(-∞ ≤ z ≤ row + column). For example, row -1.9 and column -0.06 give Prob(-∞ ≤ z ≤ -1.96) = 0.025.
3.5
Testing Hypothesis
Null hypothesis, H0: assumes that there is no real effect present.
Alternate hypothesis, H1: assumes that there is some effect.
3.6
Z Test
For a value x taken from a population with mean μ and standard deviation σ, the Z-score is
$z = \frac{x - \mu}{\sigma}$
3.7
The Central Limit Theorem
When repeated samples of size n are taken from the same population:
1. The distribution of the sample means is centred around the true population mean.
2. The spread of the distribution of the sample means is smaller than that of the original observations.
3. The distribution of the sample means approximates a Normal curve.
3.8
Central Limit Theorem
If the standard deviation of the individual observations is σ, then the standard error of the sample mean is $\sigma/\sqrt{n}$.
For a sample mean $\bar{x}$, with mean μ and standard deviation $\sigma/\sqrt{n}$, the Z-score is
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
For a single observation the previous equation (see 3.7) is obtained ($n = 1$ and $\bar{x} = x$).
3.9
Example 1
Mean score 100, standard deviation 16.
What is the probability that a score is higher than 108?
$z = \frac{x - \mu}{\sigma} = \frac{108 - 100}{16} = \frac{8}{16} = 0.5$
Note: we employ symmetry, Prob(z ≥ 0.5) = Prob(z ≤ -0.5) = 0.309.
Prob(x ≥ 108) = Prob(z ≥ 0.5) = 0.309

Z       0.00   -0.01  -0.02  -0.03  -0.04  -0.05  -0.06  -0.07  -0.08  -0.09
-0.5    0.309  0.305  0.302  0.298  0.295  0.291  0.288  0.284  0.281  0.278
3.10
Example 2
Mean score 100, standard deviation 16.
The sample mean of 25 individuals is found to be 110.
The null hypothesis (no real effect present) is that μ = 100. We wish to test whether the mean significantly exceeds this value.
3.11
Solution 2
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{110 - 100}{16/\sqrt{25}} = \frac{10}{3.2} = 3.125$
Prob($\bar{x}$ ≥ 110) = Prob(z ≥ 3.125) = 0.0009, which is beyond our basic table:

Z       0.00   -0.01  -0.02  -0.03  -0.04  -0.05  -0.06  -0.07  -0.08  -0.09
-3.00   .001   .001   .001   .001   .001   .001   .001   .001   .001   .001

Since the p-value is less than 0.001, the result is highly significant and the null hypothesis is rejected.
The sample average is significantly higher.
3.12
Estimating The Population Mean
A confidence interval is constructed around the estimate of a population parameter.
Confidence interval for the population mean: $\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$
x̄   Sample mean
n   Sample size
σ   Population standard deviation (known)
z   Tabulated value of the z-score that achieves a significance level of 2α in a two tail test (α in each tail)
Don't forget to multiply or divide
This test is not available in SPSS.
3.13
Estimating The Population Mean
Confidence interval for the population mean: $\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$
x̄   Sample mean
n   Sample size
σ   Population standard deviation (known)
z   Tabulated value of the z-score that achieves a significance level of 2α in a two tail test (α in each tail)
We can be 100(1 - 2α)% certain that the population mean lies in the interval
$\left[\bar{x} - z\frac{\sigma}{\sqrt{n}},\ \bar{x} + z\frac{\sigma}{\sqrt{n}}\right]$
3.14
Normal Values
Conf. level   Prob. α (one tail)   Zα
90%           0.05                 1.645
95%           0.025                1.960
99%           0.005                2.576

The notation commonly used to denote Z values for a confidence interval is Zα, where 100(1 - 2α) is the desired confidence level in percent.

Z       0.00   -0.01  -0.02  -0.03  -0.04  -0.05  -0.06  -0.07  -0.08  -0.09
-1.6    0.055  0.054  0.053  0.052  0.051  0.049  0.048  0.047  0.046  0.046
-1.9    0.029  0.028  0.027  0.027  0.026  0.026  0.025  0.024  0.024  0.023
-2.5    0.006  0.006  0.006  0.006  0.006  0.005  0.005  0.005  0.005  0.005
3.15
Example 3
Standard deviation 16.
The mean of a sample of 25 individuals is found to be 110.
A 95% confidence interval for the population mean is required.
$\bar{x} = 110$, $n = 25$, $\sigma = 16$, $z = 1.96$
3.16
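Solution 3
$\bar{x} = 110$, $n = 25$, $\sigma = 16$, $z = 1.96$
$\bar{x} \pm z\frac{\sigma}{\sqrt{n}} = 110 \pm 1.96 \times \frac{16}{\sqrt{25}} = 110 \pm 6.272 = [103.728, 116.272]$
We can be 95% sure that the population mean lies in the interval [103.7, 116.3].
3.17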
Is there a snag?
3.18
One Sample t-Test
The basic test statistic is
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
Note: now s, not σ.
x̄   Sample mean
n   Sample size
s   Sample standard deviation
t   Calculated t statistic
3.19
Interpreting t-values
The test has ν = n - 1 degrees of freedom (ν is the Greek letter nu).
Here m is the hypothesised value of the mean, and $t_\nu(\alpha)$ is the critical value from tables.
If $|t_{calc}| < t_\nu(\alpha)$, then we cannot reject the null hypothesis that μ = m.
If $|t_{calc}| > t_\nu(\alpha)$, the null hypothesis is rejected. The true mean μ differs significantly from m at the 2α level.
3.20
If The Population Standard Deviation Is Not Available?
t values
$\bar{x} \pm t_\nu(\alpha)\frac{s}{\sqrt{n}}$, with ν = n - 1 degrees of freedom (ν is the Greek letter nu)
x̄    Sample mean
n    Sample size
ν    Degrees of freedom, n - 1 in this case
s    Sample standard deviation
2α   Proportion of occasions that the true mean lies outside the range (α in each tail)
tν   Critical value of t from tables
Note: in this module, typically, the sample variance is required. Divide by n - 1.
Don't forget to multiply or divide before you subtract.
3.21
If The Population Standard Deviation Is Not Available?
t values
$\left[\bar{x} - t_{n-1}(\alpha)\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1}(\alpha)\frac{s}{\sqrt{n}}\right]$
x̄    Sample mean
n    Sample size
ν    Degrees of freedom, n - 1 in this case
s    Sample standard deviation
2α   Proportion of occasions that the true mean lies outside the range (α in each tail)
tν   Critical value of t from tables
3.22
Two Tail t
To obtain confidence limits a two tail probability is employed, since it refers to the proportion of values of the population mean both above and below the sample mean.
3.23
Example 4
An experiment results in the following estimates:
$n = 20$, $\bar{x} = 71.4$, $s = 7.344$
Obtain a 90% confidence interval for the population mean.
3.24
Example 4
Given $\bar{x} = 71.4$, $n = 20$, $s = 7.344$, $t_{19}(0.05) = 1.729$

ν     p=0.05   p=0.025   p=0.005   p=0.0025   p=0.0010
19    1.729    2.093     2.861     3.174      3.579

$\bar{x} \pm t_\nu(\alpha)\frac{s}{\sqrt{n}} = 71.4 \pm 1.729 \times \frac{7.344}{\sqrt{20}} = [68.561, 74.239]$
We can be 90% (α = 0.05) sure that the population mean lies in the interval [68.6, 74.2].
3.25
Example 5
The claimed mean is 75 seconds. The times taken by 20 volunteers are:
72  64  69  82  76
70  58  64  81  75
71  76  60  78  64
65  73  69  84  77
H0: there is no effect, so μ = 75
H1: μ ≠ 75 (two tail test)
3.26
Solution 5
72  64  69  82  76
70  58  64  81  75
71  76  60  78  64
65  73  69  84  77
n = 20
Σx = 72 + 64 + … + 84 + 77 = 1428
Σx² = 72² + 64² + … + 84² + 77² = 102984
n = 20   Σx = 1428   Σx² = 102984
3.27
Solution 5
n = 20   Σx = 1428   Σx² = 102984
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1428}{20} = 71.40$
3.28
Solution 5
n = 20   Σx = 1428   Σx² = 102984
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1428}{20} = 71.40$
$\mathrm{var}(x) = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right] = \frac{1}{20-1}\left[102984 - \frac{1428^2}{20}\right] = 53.9368$
s = 7.34
Note: in this module, typically, the sample variance is required. Divide by n - 1.
To practise, use mean-var.xls.
3.29
Solution 5
$n = 20$, $\bar{x} = 71.40$, $s = 7.34$
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{71.40 - 75}{7.34/\sqrt{20}} = -2.193$

ν     p=0.05   p=0.025   p=0.005   p=0.0025   p=0.0010
19    1.729    2.093     2.861     3.174      3.579

$t_{19}(0.005) = 2.861$, $t_{19}(0.025) = 2.093$
These critical values bracket |t| = 2.193, in an attempt to “estimate” p.
3.30
Conclusion 5
$t_{19}(0.005) = 2.861$, $|t| = 2.193$, $t_{19}(0.025) = 2.093$
Since 2.093 < 2.193 < 2.861, we have 0.01 < p-value < 0.05 (note 2α, since the test is two tail).
There is sufficient evidence to reject H0 at the 5% level. The experiment is not consistent with a mean of 75.
In fact the 95% confidence interval is [68.0, 74.8], which, as expected, excludes 75.
3.31
The precise p-value may be found from software.
SPSS 5
Analyze > Compare Means > One Sample t Test
Note the insertion of the test value (75).
3.32
SPSS 5
Basic descriptive statistics for a manual test
One-Sample Statistics
      N    Mean    Std. Deviation   Std. Error Mean
V1    20   71.40   7.344            1.642
3.33
SPSS 5
As predicted, 0.01 < p-value < 0.05.
One-Sample Test
Test Value = 75
                                                        95% Confidence Interval
                                                        of the Difference
      t        df   Sig. (2-tailed)   Mean Difference   Lower    Upper
V1    -2.192   19   .041              -3.600            -7.04    -.16
The confidence interval is 75 - 7.04 to 75 - 0.16, that is [67.96, 74.84].
3.34
Graph?
Graph > Legacy Dialogs > Error Bar
3.35
Graph?
Graph > Legacy Dialogs > Error Bar
[Figure: SPSS error bar chart for V1; the error bars show the 95.0% CI of the mean, on a vertical axis running from 68 to 74.]
3.36
Example 6
Experimental data:
0.235  0.323  0.248  0.252  0.241  0.284
0.312  0.284  0.298  0.264  0.306  0.320
Test whether these data are consistent with a population mean of 0.250.
H0 is that μ = 0.250.
3.37
Solution 6
$\bar{x} = 0.2806$, $s = 0.0318$, $n = 12$
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{0.2806 - 0.250}{0.0318/\sqrt{12}} = 3.333$

ν     p=0.05   p=0.025   p=0.005   p=0.0025   p=0.0010
11    1.796    2.201     3.106     3.497      4.025

$t_{11}(0.005) = 3.106$, $t_{11}(0.0025) = 3.497$
In an attempt to “estimate” p.
3.38
Conclusion 6
$t_{11}(0.005) = 3.106$, $t = 3.333$, $t_{11}(0.0025) = 3.497$
Since 3.106 < 3.333 < 3.497, we have 0.005 < p-value < 0.01.
There is sufficient evidence to reject H0 at the 1% level.
The experimental mean would not appear to be consistent with 0.250.
3.39
SPSS 6
As predicted, p-value < 0.01.
One-Sample Test
Test Value = 0.250
                                                        95% Confidence Interval
                                                        of the Difference
      t       df   Sig. (2-tailed)   Mean Difference   Lower    Upper
V1    3.333   11   .007              .030583           .01039   .05078
The confidence interval is 0.250 + 0.010 to 0.250 + 0.051, that is [0.26, 0.30].
3.40
Read Howitt and Cramer, pages 40-50.
Read Davis and Smith, pages 133-134, 139-143, 200-205, 237-264.
3.41
Practical 3
This material is available from the module web page:
http://www.staff.ncl.ac.uk/mike.cox
3.42
Practical 3
The material for the practical is available:
Instructions for the practical: Practical 3
Material for the practical: Practical 3
3.43
Whoops!
From testimony by Michael Gove, British Secretary of State for Education, before their Education Committee:
"Q98 Chair: [I]f 'good' requires pupil performance to exceed the national average, and if all schools must be good, how is this mathematically possible?
"Michael Gove: By getting better all the time.
"Q99 Chair: So it is possible, is it?
"Michael Gove: It is possible to get better all the time.
"Q100 Chair: Were you better at literacy than numeracy, Secretary of State?
"Michael Gove: I cannot remember."
3.44
Oral Evidence, British House of Commons, January 31, 2012, p. 28
Whoops!
3.45
Whoops!
3.46
```
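The "Use of Tables" slides read one-tail areas of the standard normal distribution from a printed table. The sketch below computes the same areas directly, assuming Python 3.8 or later so that `statistics.NormalDist` is available. `upper_tail` and `STANDARD_NORMAL` are hypothetical names introduced for illustration; the module itself works from printed tables.

```python
from statistics import NormalDist

# The tables describe the standard normal distribution (mu = 0, sigma = 1).
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def upper_tail(z: float) -> float:
    """Prob(z <= Z <= infinity) for a standard normal variable Z (hypothetical helper)."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


for z in (1.0, 1.96, 2.58):
    upper = upper_tail(z)
    lower = STANDARD_NORMAL.cdf(-z)  # Prob(-infinity <= Z <= -z); equals upper by symmetry
    central = 1.0 - upper - lower    # proportion within z standard deviations of the mean
    print(f"z = {z:.2f}: each tail {upper:.3f}, within +/-z {central:.1%}")
```

The printed tail areas come out at roughly 0.159, 0.025 and 0.005. These agree with the table entries for z = -1.00, -1.96 and -2.58.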
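Example 1 standardises a single score with the Z-score formula and then looks up the tail area. The sketch below repeats that calculation for a mean of 100, a standard deviation of 16 and a score of 108. It again assumes Python 3.8 or later, and `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardise x: the number of standard deviations it lies above the mean."""
    return (x - mu) / sigma


mu, sigma = 100.0, 16.0                # population mean and standard deviation (Example 1)
z = z_score(108.0, mu, sigma)          # (108 - 100) / 16 = 0.5
p_higher = 1.0 - NormalDist().cdf(z)   # Prob(x >= 108) = Prob(z >= 0.5)
print(f"z = {z}, Prob(x >= 108) = {p_higher:.3f}")
```

Passing `NormalDist(mu=100, sigma=16)` would give the same probability without standardising explicitly. The two-step form simply mirrors the slide.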
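The Central Limit Theorem slides make three claims about repeated samples of size n. A short simulation can illustrate them. This is a sketch under stated assumptions, none of which come from the module:
- The population is uniform on [0, 1], chosen only because it is clearly not normal.
- n = 25 and there are 10,000 repeats.
- The random seed is arbitrary.

```python
import math
import random
import statistics

rng = random.Random(2017)   # arbitrary fixed seed so the run is repeatable
n = 25                      # sample size (assumed)
repeats = 10_000            # number of repeated samples (assumed)

# Assumed population: uniform on [0, 1], mean 0.5, standard deviation 1/sqrt(12).
pop_mean = 0.5
pop_sd = 1.0 / math.sqrt(12.0)

sample_means = [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(repeats)]

centre = statistics.fmean(sample_means)    # claim 1: centred on the population mean
spread = statistics.stdev(sample_means)    # claim 2: smaller spread than pop_sd
standard_error = pop_sd / math.sqrt(n)     # sigma / sqrt(n)
# Claim 3: if the means are roughly normal, about 95% lie within 1.96 standard errors.
coverage = sum(abs(m - pop_mean) <= 1.96 * standard_error for m in sample_means) / repeats

print(f"centre {centre:.4f}  (population mean {pop_mean})")
print(f"spread {spread:.4f}  (sigma/sqrt(n) = {standard_error:.4f}, sigma = {pop_sd:.4f})")
print(f"share within 1.96 standard errors: {coverage:.3f}")
```

The observed spread of the sample means should sit close to $\sigma/\sqrt{n}$ and well below σ. This is the standard error used on the following slides.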
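Solution 2 (slide 3.12) divides σ by √n before standardising the sample mean. Below is a minimal sketch of that one-tail Z test. `z_test_mean` is a hypothetical helper, and, as in the example, it assumes σ is known.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n):
    """One-tail Z test for a sample mean with known sigma (hypothetical helper).

    Returns (z, p), where p = Prob(Z >= z) under H0: mu = mu0.
    """
    standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n), from the Central Limit Theorem
    z = (xbar - mu0) / standard_error
    p = 1.0 - NormalDist().cdf(z)
    return z, p


z, p = z_test_mean(xbar=110.0, mu0=100.0, sigma=16.0, n=25)
print(f"z = {z:.3f}, one-tail p-value = {p:.4f}")
if p < 0.001:
    print("Highly significant: reject H0")
```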
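The Normal Values table lists $Z_\alpha$ for the usual confidence levels, with α the one-tail probability. Assuming Python 3.8 or later, `NormalDist().inv_cdf` inverts the cumulative distribution, so the tabulated values can be regenerated. `z_alpha` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_alpha(alpha: float) -> float:
    """Z value with one-tail probability alpha above it (hypothetical helper)."""
    return NormalDist().inv_cdf(1.0 - alpha)


for alpha in (0.05, 0.025, 0.005):
    level = 100 * (1 - 2 * alpha)   # confidence level 100(1 - 2*alpha)%
    print(f"{level:.0f}%  alpha = {alpha:<6}  Z_alpha = {z_alpha(alpha):.3f}")
```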
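Solution 3 (slide 3.17) builds the interval $\bar{x} \pm z\,\sigma/\sqrt{n}$. The sketch below computes the half-width first, then adds and subtracts it. It takes z from the inverse normal CDF rather than from the printed table, which is the only departure from the slide. `z_confidence_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the population mean when sigma is known (hypothetical helper)."""
    alpha = (1.0 - confidence) / 2.0          # one-tail probability
    z = NormalDist().inv_cdf(1.0 - alpha)     # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)     # multiply and divide first
    return xbar - half_width, xbar + half_width


low, high = z_confidence_interval(xbar=110.0, sigma=16.0, n=25)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```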
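Example 4 replaces σ by s, and z by a t critical value with ν = n - 1 = 19. The sketch below hard-codes the $t_{19}$ critical values from the slide's table, so it needs no statistics library. It reproduces both the 90% interval and the 95% interval quoted in Conclusion 5. `T19` and `t_confidence_interval` are hypothetical names.

```python
import math

# One-tail critical values t_19(alpha), copied from the t table on the Example 4 slide.
T19 = {0.05: 1.729, 0.025: 2.093, 0.005: 2.861, 0.0025: 3.174, 0.001: 3.579}


def t_confidence_interval(xbar, s, n, t_crit):
    """xbar +/- t * s / sqrt(n): interval when sigma is unknown (hypothetical helper)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# 90% interval: alpha = 0.05 in each tail.
low90, high90 = t_confidence_interval(71.4, 7.344, 20, T19[0.05])
# 95% interval: alpha = 0.025 in each tail.
low95, high95 = t_confidence_interval(71.4, 7.344, 20, T19[0.025])
print(f"90% CI: [{low90:.3f}, {high90:.3f}]")
print(f"95% CI: [{low95:.2f}, {high95:.2f}]")
```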
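Solution 5 obtains the mean and the sample variance from Σx and Σx² alone. Below is a minimal sketch of that shortcut, dividing by n - 1 as the module requires. It cross-checks the result against `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

# Times (seconds) for the 20 volunteers in Example 5.
times = [72, 64, 69, 82, 76,
         70, 58, 64, 81, 75,
         71, 76, 60, 78, 64,
         65, 73, 69, 84, 77]

n = len(times)
sum_x = sum(times)                   # Σx
sum_x2 = sum(x * x for x in times)   # Σx²

mean = sum_x / n
# Sample variance: divide by n - 1, not n.
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(variance)

print(f"n = {n}, Σx = {sum_x}, Σx² = {sum_x2}")
print(f"mean = {mean:.2f}, variance = {variance:.4f}, s = {s:.3f}")

# Cross-check against the standard library.
assert math.isclose(variance, statistics.variance(times))
```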
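Solution 5 and Conclusion 5 bracket the p-value by finding where |t| falls among the tabulated critical values. The sketch below automates that lookup. `one_sample_t`, `bracket_two_tail_p` and `T19_TABLE` are hypothetical names, and the critical values are copied from the slide's ν = 19 row.

```python
import math

# One-tail critical values for nu = 19, copied from the t table on the Solution 5 slide.
T19_TABLE = [(0.05, 1.729), (0.025, 2.093), (0.005, 2.861), (0.0025, 3.174), (0.001, 3.579)]


def one_sample_t(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), with nu = n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n))


def bracket_two_tail_p(t, table):
    """Bracket the two-tail p-value by locating |t| between tabulated critical values.

    Returns (low, high) meaning low < p < high; each one-tail column alpha
    corresponds to a two-tail probability of 2 * alpha.
    """
    t_abs = abs(t)
    upper = 1.0  # a two-tail p-value cannot exceed 1
    for alpha, critical in table:
        if t_abs < critical:
            return 2 * alpha, upper
        upper = 2 * alpha
    return 0.0, upper  # |t| lies beyond the last column of the table


t = one_sample_t(xbar=71.40, mu0=75.0, s=7.34, n=20)
low, high = bracket_two_tail_p(t, T19_TABLE)
print(f"t = {t:.3f}, so {low:g} < p-value < {high:g} (two tail)")
```

Because the test is two tail, the bracket is 0.01 < p < 0.05 rather than the one-tail column headings 0.005 and 0.025.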
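The SPSS slides note that the precise p-value comes from software. Purely as an illustration, and not how SPSS computes it, the two-tail p-value can be obtained with only the standard library: integrate the Student's t density numerically with Simpson's rule. `t_pdf` and `two_tail_p` are hypothetical helpers. The results should land close to the SPSS values of .041 (Example 5) and .007 (Example 6).

```python
import math


def t_pdf(x: float, nu: int) -> float:
    """Density of Student's t distribution with nu degrees of freedom."""
    coeff = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)


def two_tail_p(t: float, nu: int, steps: int = 2000) -> float:
    """Two-tail p-value, 1 - 2 * (area from 0 to |t|), via Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)  # Simpson weights 4, 2, 4, ...
    area_0_to_t = total * h / 3
    return 1.0 - 2.0 * area_0_to_t


print(f"Example 5: p = {two_tail_p(-2.192, 19):.3f}")
print(f"Example 6: p = {two_tail_p(3.333, 11):.3f}")
```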
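Example 6 starts from raw data, so the whole one-sample t test can be run end to end. The sketch assumes the 95% critical value $t_{11}(0.025) = 2.201$ from the slide's table. The resulting interval should match the SPSS output once 0.250 is added back to the limits of the difference.

```python
import math
import statistics

data = [0.235, 0.323, 0.248, 0.252, 0.241, 0.284,
        0.312, 0.284, 0.298, 0.264, 0.306, 0.320]
mu0 = 0.250          # H0: mu = 0.250
t_crit_95 = 2.201    # t_11(0.025), copied from the slide's t table

n = len(data)
xbar = statistics.fmean(data)
s = statistics.stdev(data)   # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)
t = (xbar - mu0) / se

# 95% confidence interval for the population mean.
low, high = xbar - t_crit_95 * se, xbar + t_crit_95 * se

print(f"xbar = {xbar:.4f}, s = {s:.4f}, n = {n}, t = {t:.3f}")
print(f"95% CI for the mean: [{low:.4f}, {high:.4f}]")
```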