
Revision of basic statistics


Hypothesis testing
- Principles
- Testing a proportion
- Testing a mean
- Testing the difference between two means
Estimation
Principles of hypothesis testing




Null vs alternative hypothesis
- The null is assumed true until proven otherwise
If the evidence is inconsistent with the null, reject it in favour of the alternative. E.g.
- H0: a coin is fair vs H1: a coin is biased towards heads
- Evidence (data): 20 heads in 25 tosses
Evidence seems unlikely if H0 were true, hence reject H0
The probability of evidence at least this extreme is actually about 0.2%. We usually reject if the probability is < 5% (the significance level of the test).
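The 0.2% figure can be checked with a short sketch. It assumes the probability meant is that of 20 or more heads in 25 tosses of a fair coin; the function name `binomial_upper_tail` is illustrative, not from the slides.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability of 20 or more heads in 25 tosses if the coin is fair (H0 true)
prob = binomial_upper_tail(20, 25)
print(f"P(X >= 20) = {prob:.4f}")  # roughly 0.002, i.e. about 0.2%

# Compare with the 5% significance level
print("reject H0" if prob < 0.05 else "do not reject H0")
```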
Testing a proportion

H0: 10% of people are left handed
H0: p = 0.1
H1: the proportion is not 10%
H1: p ≠ 0.1
The sample proportion $\hat{p}$ is a random variable and should be somewhere near to the true value.
Its probability distribution is $\hat{p} \sim N(p,\ p(1-p)/n)$ under H0

Hence the test statistic is

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$

This is, in general,

$$z = \frac{\text{sample statistic} - \text{hypothesised value}}{\text{standard error of the sample statistic}}$$
Using data to calculate the test statistic

If 7 out of a group of 50 are left handed, the test statistic is

$$z = \frac{0.14 - 0.10}{\sqrt{\dfrac{0.1(1 - 0.1)}{50}}} = 0.94$$


This is less than z* = 1.96, the critical value which cuts off 5% in
the two tails of the Normal distribution.
Hence we cannot reject H0.
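The left-handedness calculation can be reproduced in a few lines. This is a minimal sketch; the function name `proportion_z` is an illustrative choice, and the critical value 1.96 is taken from the text rather than computed.

```python
from math import sqrt

def proportion_z(successes, n, p0):
    """z statistic for a sample proportion against the hypothesised value p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    return (p_hat - p0) / standard_error

z = proportion_z(7, 50, 0.10)
z_crit = 1.96  # two-tail 5% critical value, as quoted in the text

print(round(z, 2))  # 0.94
print("reject H0" if abs(z) > z_crit else "cannot reject H0")
```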
Testing a mean

A firm selling franchises claims that the average weekly income of a
franchise is at least £2000. A sample of 40 such franchises finds an
average weekly income of £1770 with s.d. £450. Is the claim
justified?
H0: μ = 2000 vs H1: μ < 2000
Significance level for test: 1% (we want to avoid a false accusation)
Critical value: z* = 2.33

$$\bar{x} \sim N\left(\mu,\ \frac{\sigma^2}{n}\right) \quad\text{so}\quad \bar{x} \sim N\left(2000,\ \frac{450^2}{40}\right)$$

$$z = \frac{1770 - 2000}{\sqrt{\dfrac{450^2}{40}}} = -3.23$$

Since z < -z* we reject H0.
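The franchise test can be sketched as follows. The function name `mean_z` is illustrative, and the sample standard deviation is used in place of the unknown population value, which is reasonable for a sample of 40.

```python
from math import sqrt

def mean_z(x_bar, mu0, sd, n):
    """z statistic for a sample mean against the hypothesised mean mu0."""
    return (x_bar - mu0) / sqrt(sd**2 / n)

z = mean_z(1770, 2000, 450, 40)
z_crit = 2.33  # one-tail 1% critical value, as quoted in the text

print(round(z, 2))  # -3.23
# Left-tail test: reject when z falls below -z*
print("reject H0" if z < -z_crit else "cannot reject H0")
```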
The Prob-value approach




Instead of comparing the test statistic to the critical value, we could
compare the prob-value to the significance level (1% in this case)
The prob-value is the area in the tail of the distribution beyond the
value of the test statistic.
In this case (z = -3.23) the prob-value is about 0.0006 (0.06%, found from the standard Normal distribution)
Since 0.06% < 1% we reject H0
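The tail area can also be computed directly instead of being read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative. The area it returns for z = -3.23 is far below the 1% significance level.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

z = -3.23     # test statistic from the franchise example
alpha = 0.01  # significance level

# Left-tail test: the prob-value is the area below z
p_value = std_normal.cdf(z)
print(f"prob-value = {p_value:.4%}")
print("reject H0" if p_value < alpha else "cannot reject H0")

# The critical value is the point that leaves alpha in the left tail
print(round(std_normal.inv_cdf(alpha), 2))  # -2.33
```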
Left hand tail of the Normal distribution
[Figure: left-hand tail of the standard Normal distribution, marking the 1% of the area that lies beyond the critical value z = -2.33 and the smaller tail area beyond the test statistic z = -3.23]
How to reject the null hypothesis





Method 1
- Test statistic > critical value (in absolute value)
- 3.23 > 2.33
Method 2 (prob-value)
- Prob-value < significance level
- 0.06% < 1%
Note the different direction of the inequality! Both reject the null.
If in doubt, draw the diagram!
Watch out for:
- Choice of significance level (5% or 1%)
- One vs two tail test. If we had a two tail test, the prob-value would be doubled to about 0.12% (and compared to 1%).
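The two methods always lead to the same decision, which a small sketch can confirm. The helper `decide` is hypothetical. It assumes a Normal test statistic and, for a one-tail test, that the statistic lies on the side named by the alternative hypothesis.

```python
from statistics import NormalDist

def decide(z, alpha, two_tailed=False):
    """Return (reject by method 1, reject by method 2, prob-value)."""
    std_normal = NormalDist()
    tail = 1 - std_normal.cdf(abs(z))  # area beyond |z| in one tail
    p_value = 2 * tail if two_tailed else tail
    # Critical value leaves alpha (or alpha/2 per tail) beyond it
    upper = 1 - alpha / 2 if two_tailed else 1 - alpha
    z_crit = std_normal.inv_cdf(upper)
    method_1 = abs(z) > z_crit  # test statistic vs critical value
    method_2 = p_value < alpha  # prob-value vs significance level
    return method_1, method_2, p_value

one_tail = decide(-3.23, 0.01)
two_tail = decide(-3.23, 0.01, two_tailed=True)
print(one_tail[:2], two_tail[:2])  # both methods agree in each case
```

Doubling the tail area for the two-tail version is what changes the prob-value; the comparison against the significance level is otherwise the same.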
Testing the difference of two means


A sample of 40 students five years ago found an average
expenditure on text books per annum of £87 (at today's prices) with
s.d. £21. A current survey of 50 students found average
expenditure of £77 with s.d. £30. Has expenditure declined?
H0: μ1 - μ2 = 0 vs H1: μ1 - μ2 > 0

Random variable:

$$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

Significance level: 5%. Critical value z* = 1.64.

Test statistic:

$$z = \frac{(87 - 77) - 0}{\sqrt{\dfrac{21^2}{40} + \dfrac{30^2}{50}}} = 1.86$$

Decision: z > z* hence reject H0.

Or, prob-value associated with 1.86 is 3.14% < 5% hence reject.
The t distribution

When testing a mean with small samples, we use the t distribution
instead of the Normal.
(But note that regression coefficients follow the t distribution
whatever the sample size.)
A sample of 12 National Lottery outlets finds an average sale of 800
tickets per week, with s.d. 140. Does this suggest the original target
of 700 has been exceeded?
H0: μ = 700; H1: μ > 700
Significance level: 5%. Critical value t* = 1.796 (d.f. = 11)

Test statistic:

$$t = \frac{800 - 700}{\sqrt{\dfrac{140^2}{12}}} = 2.47$$
2.47 > 1.796 hence reject H0.
Alternatively, prob-value associated with 2.47 is 1.6%.
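The lottery example follows the same pattern with a t critical value. In this sketch the function name `one_sample_t` is illustrative. The critical value 1.796 is taken from the text rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt

def one_sample_t(x_bar, mu0, sd, n):
    """Return the t statistic and its degrees of freedom for a sample mean."""
    t_stat = (x_bar - mu0) / sqrt(sd**2 / n)
    return t_stat, n - 1

t_stat, df = one_sample_t(800, 700, 140, 12)
t_crit = 1.796  # one-tail 5% critical value for d.f. = 11, from tables

print(round(t_stat, 2), df)  # 2.47 11
print("reject H0" if t_stat > t_crit else "cannot reject H0")
```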
Estimation





An alternative approach to hypothesis testing
The sample mean or proportion is a point estimate
Around this we build a confidence interval
For the Normal distribution, the 95% CI is given by
- Point estimate ± 1.96 standard errors
For the franchising example above, we have

$$\bar{x} \pm 1.96\sqrt{\frac{s^2}{n}} = 1770 \pm 1.96\sqrt{\frac{450^2}{40}} = [1630.5,\ 1909.5]$$


The interval has a width of about 280 (about 140 either side of the point estimate), expressing our uncertainty.
For the t distribution, the interval is given by
- Point estimate ± t* standard errors
where t* is obtained from tables, using the appropriate degrees
of freedom (d.f. = n – 1 for the mean).
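The franchise interval can be reproduced with a short sketch. The function name `confidence_interval` and its `multiplier` parameter are illustrative; passing a t* value from tables in place of 1.96 gives the t-based interval.

```python
from math import sqrt

def confidence_interval(point_estimate, sd, n, multiplier=1.96):
    """Return (lower, upper) = point estimate -/+ multiplier * standard error."""
    half_width = multiplier * sqrt(sd**2 / n)
    return point_estimate - half_width, point_estimate + half_width

low, high = confidence_interval(1770, 450, 40)
print(round(low, 1), round(high, 1))  # 1630.5 1909.5
print(round(high - low))              # 279, the full width of the interval
```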