How do we interpret Confidence Intervals (Merit)?
A 95% Confidence Interval DOES NOT mean that there is a 95%
probability that the population mean lies in the interval.
The popn mean is fixed, so there is no probability associated with it.
The probability is to do with the interval from the sample, since
different samples will give different sample means.
So… we DO SAY that there is a 95% probability that this interval
contains the population mean.
OR we CAN SAY that if this process were repeated a large number
of times, 95% of such intervals would contain the population
mean.
REMEMBER: the probability is associated with the interval, which
is based on sample statistics, NOT the population mean, which is
fixed.
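The "repeated a large number of times" statement can be checked by simulation. This is only a minimal sketch: it assumes a normal population with made-up values (mean 50, std dev 10, samples of size 30) and a known std dev so that the z-interval applies. The function name `coverage_rate` is illustrative, not part of the notes.

```python
import math
import random


def coverage_rate(mu=50.0, sigma=10.0, n=30, z=1.96, trials=5000, seed=1):
    """Fraction of simulated 95% intervals that contain the fixed popn mean."""
    rng = random.Random(seed)                # seeded so the run is repeatable
    half_width = z * sigma / math.sqrt(n)    # same width for every sample
    hits = 0
    for _ in range(trials):
        # each new sample gives a different sample mean, so a different interval
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width < mu < sample_mean + half_width:
            hits += 1
    return hits / trials


print(coverage_rate())  # should come out close to 0.95
```

The popn mean never changes inside the loop; only the interval moves from sample to sample, which is the point of the REMEMBER note.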
[Figure: popn mean μ]
Sample Proportion
There is a similar relationship between a population proportion
(π) and the sample proportion (p)
We use a proportion when we are interested in what
fraction/decimal/percentage of a population matches the criteria we
are interested in.
e.g. The proportion of voters that support National
e.g. The proportion of sheep that have a particular defect (!)
If a sample of size n has x successes then the proportion is:
\[ p = \frac{x}{n} \]
Then the expected value (mean) and std dev of the sample
proportions are:
\[ E(p) = \pi \]
\[ \text{std dev} = \sqrt{\frac{\pi(1-\pi)}{n}} \quad \text{(std error of the proportion)} \]
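A Confidence Interval for the proportion is calculated by:
\[ p - z\sqrt{\frac{\pi(1-\pi)}{n}} < \pi < p + z\sqrt{\frac{\pi(1-\pi)}{n}} \]
Lower limit < π < Upper limit
Since π is unknown, the sample proportion p is used in its place
inside the square root when calculating the limits.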
The z-value is worked out from the level of confidence.
90% = 1.645, 95% = 1.96, 99% = 2.576
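These z-values can be reproduced from the level of confidence. A minimal sketch using Python's standard-library `statistics.NormalDist` in place of the calculator; the helper name `z_value` is an assumption for illustration.

```python
from statistics import NormalDist


def z_value(confidence):
    """z for a two-sided interval at the given confidence level (e.g. 0.95)."""
    # area to the left of the upper endpoint: the middle area plus one tail
    area_below_endpoint = (1 + confidence) / 2
    return NormalDist(mu=0, sigma=1).inv_cdf(area_below_endpoint)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_value(level), 3))
```

For a 95% interval the area below the upper endpoint is 0.975, because the remaining 5% is split equally between the two tails.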
Confidence Interval for Proportion
The same process is used for confidence intervals for proportions
as for means, except the parameter is the population proportion π
and its standard deviation (std error) is \( \sqrt{\pi(1-\pi)/n} \).
Example 1. A recent poll of 1000 voters showed that 545
would vote for National in the coming election.
a) Calculate a 95% confidence interval for the proportion of voters
who would vote for National.
\[ p = \frac{545}{1000} = 0.545, \quad x = 545, \quad n = 1000, \quad z = 1.96 \]
\[ 0.545 - 1.96\sqrt{\frac{0.545(0.455)}{1000}} < \pi < 0.545 + 1.96\sqrt{\frac{0.545(0.455)}{1000}} \]
\[ 0.51413 < \pi < 0.57586 \]
Use GC for CI: Clevel = 0.95, x = 545, n = 1000
b) Explain what this confidence interval means:
There is a 95% probability that this interval contains the
true proportion of voters who would vote National.
Example 2. A supermarket retailer conducted a survey which
included a question on preference of brands of chocolate
confectionery. They surveyed 150 customers and found that
40% preferred Cadbury chocolate confectionery.
Calculate a 99% confidence interval for the proportion of
customers that prefer Cadbury chocolate confectionery.
\[ p = 0.4, \quad x = 60 \ (40\% \text{ of } 150), \quad n = 150, \quad z = 2.576 \]
\[ 0.4 - 2.576\sqrt{\frac{0.4(0.6)}{150}} < \pi < 0.4 + 2.576\sqrt{\frac{0.4(0.6)}{150}} \]
\[ 0.29696 < \pi < 0.50304 \]
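The two proportion examples can be checked numerically. A minimal sketch, assuming the sample proportion p is used in the std error (as in the working above); `proportion_ci` is an illustrative name, not a calculator function.

```python
import math


def proportion_ci(x, n, z):
    """Confidence interval for a population proportion from x successes in n."""
    p = x / n
    std_error = math.sqrt(p * (1 - p) / n)   # p stands in for the unknown pi
    return p - z * std_error, p + z * std_error


# Example 1: 545 of 1000 voters, 95% confidence
lower1, upper1 = proportion_ci(545, 1000, 1.96)
print(f"{lower1:.4f} < pi < {upper1:.4f}")

# Example 2: 40% of 150 customers (x = 60), 99% confidence
lower2, upper2 = proportion_ci(60, 150, 2.576)
print(f"{lower2:.4f} < pi < {upper2:.4f}")
```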
Difference of two means
Sometimes we want to investigate the difference between the
means of two populations (e.g. to see if there is any difference
between the two). This often occurs when you want to trial
something new and compare it with a control group.
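If the two populations are independent then:
Expected value of the difference:
\[ E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2 \]
Variance (in formulae sheet):
\[ \mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]
so
\[ \mathrm{Sd}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]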
E.g. Type 1 lightbulbs have a mean life of 1600 hours and a std
dev of 120 hours. Type 2 lightbulbs, which are cheaper,
have a mean life of 1350 hours and a std dev of 80 hours.
A manufacturer takes a sample of 100 Type 1 and 121 Type
2 lightbulbs.
a) What is the expected difference between the mean lifetimes of
the lightbulbs from the Type 1 sample and the Type 2 sample?
1600 - 1350 = 250 hours
b) What is the std dev of the difference between the mean
lifetimes from the Type 1 sample and the Type 2 sample?
\[ SD = \sqrt{\frac{120^2}{100} + \frac{80^2}{121}} = 14.032 \]
Confidence Interval for Diff of 2 Means
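The same process is used for the confidence interval for the
difference of two means. The parameters are:
Mean: \( \mu_1 - \mu_2 \)
std dev: \( \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \)
Confidence interval:
\[ (\bar{x}_1 - \bar{x}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
Lower limit < μ1 - μ2 < Upper limit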
Example: A manufacturer is looking at making changes to the
way a good is manufactured. The manager decides to check
the amount of time it takes to produce the good in the old
way, then compare this with the new way. He takes a sample
of 30 goods from the old way and finds the mean production
time per good is 12.3 mins with a standard deviation of
2.1 mins. He then takes a sample of 40 goods from the new
way and finds the mean production time is 11.8 mins with a
standard deviation of 2.3 mins.
a) Calculate a 99% confidence interval for the difference of the two
means.
\[ (12.3 - 11.8) - 2.576\sqrt{\frac{2.1^2}{30} + \frac{2.3^2}{40}} < \mu_1 - \mu_2 < (12.3 - 11.8) + 2.576\sqrt{\frac{2.1^2}{30} + \frac{2.3^2}{40}} \]
\[ -0.861 < \mu_1 - \mu_2 < 1.861 \]
b) Comment on whether the new method is faster. Justify.
Since 0 is in the interval there is insufficient evidence to suggest that
there is any difference between the times for the two methods.
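The manufacturing example, including the "is 0 in the interval?" check, can be sketched as follows. The function name is illustrative, and the sample std devs are used in place of the population values, as in the working.

```python
import math


def difference_of_means_ci(mean1, sd1, n1, mean2, sd2, n2, z):
    """Confidence interval for mu1 - mu2 from two independent samples."""
    std_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    difference = mean1 - mean2
    return difference - z * std_error, difference + z * std_error


# old way: n = 30, mean 12.3, sd 2.1; new way: n = 40, mean 11.8, sd 2.3
lower, upper = difference_of_means_ci(12.3, 2.1, 30, 11.8, 2.3, 40, 2.576)
contains_zero = lower < 0 < upper
print(f"{lower:.3f} < mu1 - mu2 < {upper:.3f}, contains 0: {contains_zero}")
```

Because the interval runs from a negative value to a positive one, "no difference" is still a plausible value for μ1 - μ2.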
Sample Size (Merit) and Margin of error
Often our estimate of the population mean or proportion is required
to be of a certain level of accuracy.
Smaller samples have many benefits (easier to gather data, less
costly etc) BUT the CI will be wider for smaller samples compared
to larger samples (because the std error σ/√n will be bigger).
Our best estimate is the midpoint of the interval (ie our sample
mean or proportion). Therefore, our “maximum error” or “margin
of error” is the amount from the midpoint to the endpoint. These
are given below:
Therefore, if we require our degree of accuracy for μ or π to be
less than a certain value, e, then:
For the mean:
\[ z\frac{\sigma}{\sqrt{n}} < e \]
So.. if you want to halve the width of the interval, you need to
make the sample 4 times as big, because of the √ relationship.
For the proportion:
\[ z\sqrt{\frac{\pi(1-\pi)}{n}} < e \]
For the difference of two means:
\[ z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < e \]
z = 1.645 for 90% CI, 1.96 for 95% CI, 2.576 for 99% CI
Recall: to find z use InvN with area = area less than endpoint, std
dev = 1, mean = 0
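The "4 times as big" rule for the mean can be seen numerically. A minimal sketch with made-up values (std dev 15, sample sizes 100 and 400, 95% confidence); `margin_of_error` is an illustrative name.

```python
import math


def margin_of_error(z, sigma, n):
    """Distance from the midpoint of the interval to either endpoint."""
    return z * sigma / math.sqrt(n)


e_small_sample = margin_of_error(1.96, 15, 100)
e_large_sample = margin_of_error(1.96, 15, 400)   # sample 4 times as big
ratio = e_large_sample / e_small_sample
print(e_small_sample, e_large_sample, ratio)
```

The ratio comes out at one half because n sits under a square root: multiplying n by 4 divides the margin of error by √4 = 2.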
Sample Size – finding it! (Merit)
You need to solve the margin of error equation given above to find
the required sample size to get an estimate within a certain level of
accuracy:
E.g. For the Mean:
A researcher wants to estimate the mean amount households donate
to a particular aid agency to within 5% of the true amount with
90% confidence. From previous surveys they know the standard
deviation of the amount donated is $15. Calculate the minimum
sample size needed to meet the condition.
\[ z\frac{\sigma}{\sqrt{n}} < e \]
90% gives z = 1.645
Solving \( 1.645 \times \frac{15}{\sqrt{n}} = 0.05 \) with the GC gives n = 243542.25
so the minimum sample size is 243543
E.g. For the Proportion:
SKY TV are wanting to estimate the proportion of households that
have SKY TV. How large a sample would need to be taken to be 95%
confident that the sample proportion is within 2% of the true
percentage?
\[ z\sqrt{\frac{\pi(1-\pi)}{n}} < e \]
USE GC: (if they don't give you a sample proportion then use
p = ½, as this is the worst possible scenario: this is when
p(1-p) is largest).
Solving \( 1.96\sqrt{\frac{0.5(1-0.5)}{n}} = 0.02 \) with the GC gives n = 2401
so the minimum sample size is 2401 (or 2402)
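Both sample-size equations can also be rearranged for n and evaluated directly instead of using the GC solver. A minimal sketch; the function names are illustrative, and rounding to 6 d.p. before rounding up is an assumption made here to stop floating-point noise from adding an extra unit when n is a whole number.

```python
import math


def n_for_mean(z, sigma, e):
    """n that makes z * sigma / sqrt(n) equal to e (not yet rounded)."""
    return (z * sigma / e) ** 2


def n_for_proportion(z, e, p=0.5):
    """n that makes z * sqrt(p(1-p)/n) equal to e; p = 0.5 is the worst case."""
    return z ** 2 * p * (1 - p) / e ** 2


def round_up(n_raw):
    """Smallest whole sample size that is at least n_raw."""
    return math.ceil(round(n_raw, 6))


n_mean = n_for_mean(1.645, 15, 0.05)
n_prop = n_for_proportion(1.96, 0.02)
print(n_mean, round_up(n_mean))
print(n_prop, round_up(n_prop))
```

Always round up, never to the nearest whole number, since a smaller sample would not meet the required accuracy.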
Sample Size – finding it! (merit)
E.g. For the difference of two means:
A manufacturing analyst wants to see if there is a
difference between the performance of 2 brands of
long life batteries.
He intends to use the same device to test how long, on
average, both brands work for. He intends to test the
same number of each type. He wants to know how many
of each type he should test in order to ensure that the
margin of error for the difference of the two means is
less than 5% at the 90% confidence level. Previous data
shows that brand 1 has a sd of 0.9 hrs and brand 2 of
0.7 hrs.
\[ 1.645\sqrt{\frac{0.9^2}{n} + \frac{0.7^2}{n}} = 0.05 \]
solve for n gives n = 1407.133
so the minimum sample size is 1408 of each brand
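With the same n for both brands the equation rearranges to n = z²(σ1² + σ2²)/e². A minimal sketch of that rearrangement; the function name is illustrative only.

```python
import math


def n_for_difference_equal_samples(z, sd1, sd2, e):
    """Common sample size n that makes z * sqrt(sd1^2/n + sd2^2/n) equal e."""
    return z ** 2 * (sd1 ** 2 + sd2 ** 2) / e ** 2


n_raw = n_for_difference_equal_samples(1.645, 0.9, 0.7, 0.05)
n_each = math.ceil(n_raw)   # round up so the margin of error stays below e
print(round(n_raw, 3), n_each)
```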
Justify or refute claims about a population
parameter (merit)
This means to use the results of your confidence interval to justify or
refute a statement.
CI for the Mean:
e.g. A customer claims that the weight of a can of baked beans
is different to the 375g claimed on the label. A random sample
of 30 cans has a mean of 369g and a standard deviation of 11g.
Is her claim justified? Use a 95% CI to justify your answer.
95% CI is 365.06 < μ < 372.94
Then make a comment like:
Since 375 lies outside this interval there is
sufficient evidence to suggest that the
mean weight of a can of baked beans is
different to 375g at the 95% conf. level.
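The baked beans check can be sketched in code. This assumes a z-interval with the sample std dev standing in for σ, as in the notes; the function and variable names are illustrative.

```python
import math


def mean_ci(sample_mean, sd, n, z):
    """Confidence interval for a population mean using a z-value."""
    half_width = z * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


lower, upper = mean_ci(369, 11, 30, 1.96)
claimed_weight = 375
claim_inside = lower < claimed_weight < upper
print(f"{lower:.2f} < mu < {upper:.2f}, 375 inside: {claim_inside}")
```

The code only says whether 375 is inside the interval; the written answer still needs the comment in context.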
CI for the proportion:
Similar process if given information regarding Proportions. Use
confidence interval for the proportion and then justify / refute.
CI for difference of two means:
If looking at difference of two means (mentioned earlier):
If the interval contains 0 then there is insufficient evidence to
suggest that there is a difference between the two population
means at the given confidence level (e.g. 95%).
If the interval does not contain 0 then there is sufficient evidence
to suggest that there is likely to be a difference between the two
population means at the given confidence level. (you might “suggest”
that one seems higher than the other given that its sample mean was
higher).
Always answer in CONTEXT