Confidence Interval
Distribution of the total and the sample mean
Sample Statistics & Data Display
We can calculate statistics from a sample.
These reflect what is happening in the population as a whole: the statistics in the sample estimate the parameters in the population.
Notation
 | Mean | Standard deviation | Variance
Population (parameters) | µ | σ | σ²
Sample (statistics) | x̄ | s | s²
Example
25 people in a lift. They have a mean weight of 65 kg and an SD of 7 kg.
Find the mean and SD of the load.
$E(T) = n\mu = 25 \times 65 = 1625$ kg
$SD(T) = \sqrt{n}\,\sigma = \sqrt{25} \times 7 = 5 \times 7 = 35$ kg
If we repeat an experiment n times, then the total T is the sum of n independent random variables, each with mean µ and SD σ:
$E(T) = n\mu$
$VAR(T) = n\sigma^2$
$SD(T) = \sqrt{n}\,\sigma$
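A minimal Python sketch of the three formulas for the total T, checked against the lift example (n = 25, µ = 65 kg, σ = 7 kg). The function name `total_stats` is only illustrative.

```python
import math

def total_stats(n, mu, sigma):
    """Mean, variance and SD of T = X1 + ... + Xn for n independent copies of X."""
    mean = n * mu               # E(T) = n * mu
    var = n * sigma ** 2        # VAR(T) = n * sigma^2
    sd = math.sqrt(n) * sigma   # SD(T) = sqrt(n) * sigma
    return mean, var, sd

# Lift example: 25 people, mean 65 kg, SD 7 kg
mean, var, sd = total_stats(25, 65, 7)
print(mean, var, sd)  # 1625 1225 35.0
```

Note that the variances add, not the SDs: the SD grows with √n, not with n.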
A fruit and vegetable market accepts deliveries of crates
of apples. Each crate has a weight that is normally
distributed with a mean of 21kg and a standard
deviation of 0.4 kg. The crates are delivered in groups
of 18 on pallets that weigh exactly 30kg.
a) Calculate the mean total weight of a pallet with 18
crates of apples
b) Calculate the standard deviation of the total weight of a
pallet with 18 crates of apples.
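One way to check parts a) and b) is sketched below. It assumes, as the question implies, that the 18 crate weights are independent and that the pallet weight of exactly 30 kg is a constant.

```python
import math

n_crates = 18
crate_mean, crate_sd = 21.0, 0.4   # kg, per crate
pallet = 30.0                      # kg, fixed weight with no variation (assumed constant)

# The fixed pallet weight shifts the mean but adds nothing to the spread.
total_mean = n_crates * crate_mean + pallet
total_sd = math.sqrt(n_crates) * crate_sd

print(round(total_mean, 1), round(total_sd, 3))  # 408.0 1.697
```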
Central Limit Theorem
Consider a sample of size n from a population X with mean µ and SD σ. For the sample mean X̄:
Mean: $E(\bar{X}) = \mu$
Standard deviation: $SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
Variance: $VAR(\bar{X}) = \frac{\sigma^2}{n}$
$\frac{\sigma}{\sqrt{n}}$ is sometimes called the standard error of the sample mean $\bar{x}$.
If n is large (n > 30), then the distribution of the sample means will be approximately normal.
The Central Limit Theorem tells us that the sample means can be expected to average out to the population mean, with a certain amount of spread about it. This spread is the standard error, i.e. the standard deviation of the sample mean.
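The Central Limit Theorem can be checked by simulation. This sketch draws many samples of size n = 36 from a skewed population (an exponential distribution with µ = 1 and σ = 1, chosen only for illustration) and compares the mean and SD of the sample means with µ and σ/√n.

```python
import random
import statistics

random.seed(1)

# Hypothetical skewed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma, n = 1.0, 1.0, 36
num_samples = 20000

# Each entry is the mean of one sample of size n.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

print(statistics.fmean(sample_means))   # close to mu = 1
print(statistics.stdev(sample_means))   # close to sigma / sqrt(n) = 1/6, about 0.167
```

A histogram of `sample_means` would look close to a bell curve even though the population itself is far from normal.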
Example
A sample of size 20 is taken from a box of beans. The mean length of the beans in the box is 19 cm with an SD of 2.5 cm.
a) What would the expected value of the sample mean be?
b) What would the variance of the sample mean be?
c) What would the standard error of the sample mean be?
a) $E(\bar{X}) = \mu = 19$ cm
b) $VAR(\bar{X}) = \frac{\sigma^2}{n} = \frac{(2.5)^2}{20} = 0.3125$ cm²
c) The standard error is the standard deviation of the sample mean:
$SD(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{20}} = 0.559$ cm
We need to know the difference between the mean, variance and standard deviation of the population, of the total of n values, and of the sample mean.
Summary Table from p. 182
Random variable | Mean | Variance | Standard deviation
Population: X | µ | σ² | σ
Total of n values: T | nµ | nσ² | √n σ
Sample mean: X̄ | µ | σ²/n | σ/√n
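A small sketch that fills in the three rows of the table for given µ, σ and n, applied to the bean example (µ = 19 cm, σ = 2.5 cm, n = 20). The helper name `summary_rows` is only illustrative.

```python
import math

def summary_rows(mu, sigma, n):
    """(mean, variance, SD) for the population X, the total T and the sample mean."""
    return {
        "population": (mu, sigma ** 2, sigma),
        "total": (n * mu, n * sigma ** 2, math.sqrt(n) * sigma),
        "sample mean": (mu, sigma ** 2 / n, sigma / math.sqrt(n)),
    }

rows = summary_rows(19, 2.5, 20)   # bean lengths in cm
for name, (mean, var, sd) in rows.items():
    print(f"{name}: mean={mean}, variance={var:.4f}, SD={sd:.3f}")
```

The "sample mean" row reproduces the bean answers: mean 19 cm, variance 0.3125, standard error 0.559 cm.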
Probabilities for the total
When we deal with the sum of several variables, we use:
$E(T) = n\mu$
$VAR(T) = n\sigma^2$
$SD(T) = \sqrt{n}\,\sigma$
to find the probability that the sum will be within a given range.
When the distribution of the sum is normal, it is shaped like the bell curve, and the probability is the area under the curve between the lower and upper bounds.
Example
A sample of 16 items is taken from a population X with mean µ = 34 and SD σ = 4. Calculate the probability that the total T of the 16 items is below 530.
n = 16, µ = 34, σ = 4
$E(T) = n\mu = 16 \times 34 = 544$
$SD(T) = \sqrt{n}\,\sigma = \sqrt{16} \times 4 = 16$
Calculator (normal CD): Lower: -1E99, Upper: 530, σ = 16, µ = 544
$P(T < 530) = 0.19079 = 0.1908$ (4dp)
Example
A lift is licensed to carry a maximum of 25
passengers. It is overloaded when the
total passenger load exceeds 1700 kg.
The weights of single passengers chosen at
random have a mean of 65kg and a
standard deviation of 7kg. Calculate the
probability that the lift is overloaded,
assuming the lift is carrying 25
passengers.
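Python's standard-library `statistics.NormalDist` can stand in for the calculator's normal CD function (the calculator's lower bound of -1E99 plays the role of minus infinity). This sketch covers both total examples, assuming the individual values are independent and the total is normally distributed; `total_dist` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def total_dist(n, mu, sigma):
    """Normal model for the total T of n independent values: mean n*mu, SD sqrt(n)*sigma."""
    return NormalDist(mu=n * mu, sigma=math.sqrt(n) * sigma)

# Total of 16 items, mu = 34, sigma = 4: P(T < 530)
p_below = total_dist(16, 34, 4).cdf(530)
print(round(p_below, 4))   # 0.1908

# Lift with 25 passengers, mu = 65 kg, sigma = 7 kg: P(T > 1700)
p_overload = 1 - total_dist(25, 65, 7).cdf(1700)
print(round(p_overload, 4))
```

For "greater than" questions, subtract the left-tail area from 1, just as the calculator would use a lower bound of 1700 and an upper bound of 1E99.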
Probabilities for the sample mean
Sometimes we need to know the probability of where the sample mean is likely to be in relation to the population mean. The sample mean has a smaller spread than individual values, because its standard deviation σ/√n is smaller than σ.
For this we use:
$VAR(\bar{X}) = \frac{\sigma^2}{n}$
$SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
Probability for samples
A sample of size 36 is taken from a normally distributed population with a mean of 40 and a standard deviation of 12. Calculate the probability that the sample mean is
a) Less than 41
b) Between 37 and 42
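A sketch of both parts using `statistics.NormalDist` in place of the calculator: the sample mean is modelled as normal with mean 40 and SD 12/√36 = 2.

```python
import math
from statistics import NormalDist

mu, sigma, n = 40, 12, 36
xbar = NormalDist(mu, sigma / math.sqrt(n))   # sample mean: mean 40, SD 2

p_a = xbar.cdf(41)                      # a) P(sample mean < 41)
p_b = xbar.cdf(42) - xbar.cdf(37)       # b) P(37 < sample mean < 42)
print(round(p_a, 4), round(p_b, 4))     # 0.6915 0.7745
```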
Confidence Intervals
Remember, we calculate statistics from a sample to estimate the parameters of the population.
Each sample mean will be slightly different from every other sample mean, so it is better to give an interval that we are confident contains the population mean. How confident we are is our level of confidence.
The spread of the values that the sample means take gives an idea of how accurate the estimate is; the interval built from this spread is called the confidence interval.
The spread on either side of the mean, the standard deviation of the sample mean, is called the standard error.
Using the calculator to find confidence intervals
Construct a 95% confidence interval, given n = 25, x̄ = 28.3, σ = 4.38.
The central 95% of the area is split into 0.475 on each side of the mean (0.5 is the whole area to the left of the mean). The confidence interval lies between these boundaries.
The calculator measures area only from the far left, so for the calculator:
Area = 0.5 + 0.475 = 0.975
We can use the calculator (inverse normal) to find z, the number of SDs.
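The inverse normal step can be sketched with `statistics.NormalDist().inv_cdf`, which, like the calculator, takes the area measured from the far left of a standard normal curve (µ = 0, σ = 1).

```python
from statistics import NormalDist

confidence = 0.95
area_left = 0.5 + confidence / 2             # 0.5 + 0.475 = 0.975
z = NormalDist().inv_cdf(area_left)          # standard normal: mu = 0, sigma = 1
print(round(z, 2))                           # 1.96
```

Changing `confidence` to 0.90 gives an area of 0.95 and z of about 1.6449.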
Calculating the Sample Size
Sometimes we want a certain level of confidence that the mean of a sample we are going to take will lie within given boundaries.
The margin of error e is the distance between either end point of the interval and the sample mean:
$e = z \times \frac{\sigma}{\sqrt{n}}$
E.g. for 30 m < µ < 34 m, the confidence interval is 32 ± 2 m, so the margin of error is 2 m.
A certain make of scientific calculator is
known to have a voltage rating with a
standard deviation of 0.05 V. The mean
voltage of 40 of these calculators is
3.02 V.
a. Construct a 90% confidence interval for
the average voltage.
b. Explain the meaning of this confidence
interval
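Construct a 95% confidence interval, given n = 25, x̄ = 28.3, σ = 4.38.
For 95% confidence, the area to the left of the upper boundary is 0.5 + 0.475 = 0.975, so from the calculator z = 1.96.
$\bar{x} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z\frac{\sigma}{\sqrt{n}}$
$28.3 - \frac{(1.96)(4.38)}{\sqrt{25}} < \mu < 28.3 + \frac{(1.96)(4.38)}{\sqrt{25}}$
$28.3 - 1.717 < \mu < 28.3 + 1.717$
$26.58 < \mu < 30.02$
The confidence interval lies between these values.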
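The same working as a short function, a sketch only: `mean_ci` is a hypothetical name, and it uses the exact z from `NormalDist().inv_cdf` rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

def mean_ci(xbar, sigma, n, confidence=0.95):
    """z-interval for mu: xbar ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = mean_ci(28.3, 4.38, 25)
print(round(low, 2), round(high, 2))   # 26.58 30.02
```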
Using the Calculator to check your answer
Eg #1 Ex14.1
Construct a 95% confidence interval, given n = 25, x̄ = 28.3, σ = 4.38.
In Stats mode: F4 (INTR), F1 (Z), F1 (1-S: sample with one mean), F2 (Var), enter the values, EXE.
Result: 26.58 < µ < 30.02
WB Eg 11
The time taken for an individual to walk to work is to be estimated. On 15 occasions the times in minutes were: 18, 17, 15, 20, 16, 14, 19, 13, 17, 16, 14, 15, 20, 18, 19.
a) Find the sample mean and SD.
b) Assuming a normal distribution and that the sample is sufficiently large, calculate a 95% confidence interval for the mean time to walk to work.
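Use the calculator to answer a): x̄ = 16.73 (2dp), σ = 2.17 (2dp)
For 95%: area = 0.475 + 0.5 = 0.975, so z = 1.96
$\bar{x} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z\frac{\sigma}{\sqrt{n}}$
$16.73 - \frac{(1.96)(2.17)}{\sqrt{15}} < \mu < 16.73 + \frac{(1.96)(2.17)}{\sqrt{15}}$
$16.73 - 1.10 < \mu < 16.73 + 1.10$
$15.6 < \mu < 17.8$ minutes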
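A sketch of the walking-time example from the raw data. It uses `statistics.pstdev` (divides by n), which reproduces the 2.17 above; `statistics.stdev` (divides by n - 1) would give about 2.25 and a slightly wider interval.

```python
import math
import statistics
from statistics import NormalDist

times = [18, 17, 15, 20, 16, 14, 19, 13, 17, 16, 14, 15, 20, 18, 19]
n = len(times)
xbar = statistics.fmean(times)      # 16.733...
sigma = statistics.pstdev(times)    # 2.1746..., divides by n (matches 2.17 above)

z = NormalDist().inv_cdf(0.975)
margin = z * sigma / math.sqrt(n)
print(round(xbar, 2), round(sigma, 2))                    # 16.73 2.17
print(round(xbar - margin, 1), round(xbar + margin, 1))   # 15.6 17.8
```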
Interpreting Confidence Intervals
x̄ = 16.73 (2dp), σ = 2.17 (2dp), 15.6 < µ < 17.8 minutes
We are 95% confident that the interval 15.6 to 17.8 minutes contains the true mean: 95% of intervals constructed in this way would contain the true mean.
Ex P75 4.01
Confidence Intervals for Proportions
Another parameter of the population is the population proportion, p or π. This is the probability of success over a large number of trials, which should be similar to the proportion of successes in the population as a whole.
The best estimate of the proportion of successes in the population is the sample proportion:
$\hat{p} = \frac{x}{n}$, where x = number of successes and n = number of trials.
$E(X) = np$, where E(X) is the expected value of X.
X, the random variable for the number of successes in the sample, has approximately a normal distribution (when n is large).
Example
A random sample of 80 households showed that 30% owned PCs.
Construct a 95% confidence interval for p, the proportion of households that own a PC.
$0.3 - 1.96\sqrt{\frac{(0.3)(0.7)}{80}} < p < 0.3 + 1.96\sqrt{\frac{(0.3)(0.7)}{80}}$, which gives 0.1996 < p < 0.4004.
We are 95% confident that the interval 19.96% to 40.04% contains the true proportion of households that own PCs.
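A minimal sketch of the normal-approximation interval for a proportion; `proportion_ci` is a hypothetical name. It reproduces the household example and also the blood-pressure drug example that follows (150 successes out of 210).

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation interval: p_hat ± z * sqrt(p_hat * q_hat / n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(0.30, 80)
print(round(low, 4), round(high, 4))     # 0.1996 0.4004

low2, high2 = proportion_ci(150 / 210, 210)
print(round(low2, 3), round(high2, 3))   # 0.653 0.775
```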
In a sample of 210 people with high blood pressure, a particular drug is found to be effective for 150 of them. Construct a 95% confidence interval for p, the proportion of all patients with high blood pressure for whom this drug is effective.
$\hat{p} = \frac{150}{210} = 0.71429$, $\hat{q} = \frac{60}{210} = 0.28571$, z = 1.96
$\hat{p} - z\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z\sqrt{\frac{\hat{p}\hat{q}}{n}}$
$0.71429 - 1.96\sqrt{\frac{(0.71429)(0.28571)}{210}} < p < 0.71429 + 1.96\sqrt{\frac{(0.71429)(0.28571)}{210}}$
$0.653 < p < 0.775$
$65.3\% < p < 77.5\%$
The main purpose of a recent survey was to estimate
the proportion of all adult NZers who are opposed to
tipping for service in restaurants. The survey used a
random sample of 663 adult New Zealanders, of whom
292 indicated that they are opposed to tipping for
service.
a) State clearly the parameter of interest in this survey. (A)
b) Calculate a 90% confidence interval for the proportion of all adult NZers who oppose tipping. (A)
c) Analyse the effect of increasing the number of adults surveyed on the width of this confidence interval. (E)
d) Suppose 50 independent random samples of adult NZers are taken and a 90% confidence interval is constructed from the results of each sample. Analyse the phrase “90% confidence” by making reference to these 50 confidence intervals. (E)
We would expect about 90% of the 50 confidence intervals, i.e. about 45 of them, to contain the true population proportion.
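A sketch for parts b) and c), assuming for c) that the sample proportion stays about the same as more adults are surveyed. `ci_width` is a hypothetical helper; the loop shows that quadrupling n halves the width, because the width is proportional to 1/√n.

```python
import math
from statistics import NormalDist

z90 = NormalDist().inv_cdf(0.95)   # 90% confidence: 0.5 + 0.45 = 0.95

def ci_width(p_hat, n, z):
    """Full width of the interval p_hat ± z * sqrt(p_hat * q_hat / n)."""
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)

# b) 292 of 663 adults opposed to tipping
p_hat = 292 / 663
margin = z90 * math.sqrt(p_hat * (1 - p_hat) / 663)
print(round(p_hat - margin, 3), round(p_hat + margin, 3))   # 0.409 0.472

# c) Same p_hat, larger samples: n, 4n, 16n
for n in (663, 2652, 10608):
    print(n, round(ci_width(p_hat, n, z90), 4))
```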
1)
Motel occupancy rates for July 1997 from a random sample
of 35 motels gave the following statistics:
Sample size: 35
Sample mean: 0.572
Sample standard deviation: 0.065
Calculate a 95% confidence interval for the mean occupancy
rate for July 1997 for the population sampled. (A)
2)
What would be the effect of increasing the level of
confidence on the width of this confidence interval? (M)
3)
The mean occupancy rate for the same population for July
1996 is 0.585. It is claimed that the mean occupancy rate for
July 1997 is the same as the mean occupancy rate for July
1996. Using the confidence interval calculated in 1) at the
95% level of confidence, demonstrate whether the random
sample gives us evidence against this claim. (M)
4)
Calculate the number of motels needed to be sampled if the
mean occupancy rate for July 1997 was to have been
estimated to within 0.015 of its true value at the 95% level of
confidence. (M)
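A sketch of parts 1), 3) and 4). It uses the sample SD in place of σ, which is an assumption that relies on n = 35 being reasonably large, and rounds the required sample size up.

```python
import math
from statistics import NormalDist

n, xbar, s = 35, 0.572, 0.065
z = NormalDist().inv_cdf(0.975)

# 1) 95% interval for the mean occupancy rate (s used in place of sigma)
margin = z * s / math.sqrt(n)
low, high = xbar - margin, xbar + margin
print(round(low, 4), round(high, 4))   # 0.5505 0.5935

# 3) Is the July 1996 mean inside the interval?
print(low < 0.585 < high)              # True

# 4) Sample size for a margin of error of 0.015: n = (z * s / e)^2, rounded up
n_needed = math.ceil((z * s / 0.015) ** 2)
print(n_needed)                         # 73
```

Since 0.585 lies inside the interval, this sample gives no evidence against the claim.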
Confidence interval for the difference between two means
If two populations are similar, then we would expect the difference between their two means to be about zero.
If the populations are different, then we would expect their means to be different.
So if the confidence interval for the difference between the means does not contain 0, we have evidence that the two population means are different.
We use $\bar{x}_1 - \bar{x}_2$ to estimate $\mu_1 - \mu_2$.
Notation | Mean | SD | Sample size | Sample mean
Population 1 | µ1 | σ1 | n1 | x̄1
Population 2 | µ2 | σ2 | n2 | x̄2
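$E(D) = E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$
$VAR(D) = VAR(\bar{X}_1 - \bar{X}_2) = VAR(\bar{X}_1) + VAR(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
$SD(D) = \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Confidence Interval (on formula sheet):
$(\bar{x}_1 - \bar{x}_2) \pm z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$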
Example
A random sample of 30 objects is taken from a normally distributed population with an SD of 6; another sample of 50 objects is taken from a population with an SD of 8. The mean of the first sample is 115, and that of the second is 108.
1) Construct a 96% confidence interval for µ1 - µ2.
2) Explain whether it's likely that the two groups have the same mean.
$3.77 < \mu_1 - \mu_2 < 10.23$ is the 96% confidence interval for the difference between the two means.
The interval does not contain 0, so it is not likely that the two means are equal. We can say this with 96% confidence.
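A sketch of the difference-of-means interval from the formula sheet; `diff_means_ci` is a hypothetical name. For 96% confidence the calculator area is 0.5 + 0.48 = 0.98.

```python
import math
from statistics import NormalDist

def diff_means_ci(x1, sd1, n1, x2, sd2, n2, confidence=0.95):
    """Interval for mu1 - mu2: (x1 - x2) ± z * sqrt(sd1^2/n1 + sd2^2/n2)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

low, high = diff_means_ci(115, 6, 30, 108, 8, 50, confidence=0.96)
print(round(low, 2), round(high, 2))   # 3.77 10.23
print(low <= 0 <= high)                # False: 0 is not in the interval
```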
Students are told to measure the area of the classroom; they provide estimates which are approximately normally distributed with SD = 0.15 m². 31 students who measured one classroom obtained a mean of 29.76 m², while 26 students who measured another classroom obtained a mean of 31.23 m². What is the 95% confidence interval for the amount by which the area of the second classroom exceeds that of the first?
$1.392 < \mu_2 - \mu_1 < 1.548$
This is the 95% confidence interval for the amount by which the area of the second classroom exceeds that of the first.
We are 95% confident that the area of the second classroom exceeds that of the first, as zero is not in the confidence interval.
Interpretation
If the confidence interval includes zero, then we cannot say that there is a difference between the two population means.
If zero is not included, then we are confident that there is a difference between the two population means.
We need to assume that the samples are large enough, that they are selected independently, and that the populations they are selected from are normally distributed.
a < μ2 - μ1 < b
• If both a and b are positive, it is reasonable to assume that μ2 is larger than μ1 by between a and b units. It's unlikely that the two means are the same.
• If both a and b are negative, it is reasonable to assume that μ2 is smaller than μ1 by between -b and -a units. It's unlikely that the two means are the same.
• If a and b have opposite signs, μ2 could be anywhere from smaller than μ1 by -a units to larger than μ1 by b units. This includes the possibility that the two means are equal.
True or false
A 99% confidence interval for the difference
between two means is calculated from sample
data. -3.5< μ2– μ1 <9.4.
a. There is a 99% probability that the means are
equal because the interval includes 0.
b. 99% of intervals calculated in the same way
will include the difference of the two means.
Below is a random sample of times for both male and female competitors to complete the annual Mountain Biking Race.
 | Sample size | Mean | Standard deviation
Male | 30 | 57 min | 10 min
Female | 30 | 65 min | 14 min
a) Calculate a 95% confidence interval for the difference between the mean time for males to complete the race and the mean time for females to complete the race.
b) In last year’s race, a similar 95% confidence interval for the difference between µm and µf was calculated and found to be -6.25 < µm - µf < 1.36. Based on this confidence interval, demonstrate whether there is a significant difference between the mean race times for males and females.
0 is in last year's 95% interval (-6.25 < µm - µf < 1.36), so last year's data give no evidence of a significant difference between the mean race times for males and females.
The summary statistics for the lengths of the snapper surveyed in each region are shown in the table below.
Region | Sample size | Sample mean | Sample standard deviation
Reserve | 897 | 360.18 | 94.48
Non-reserve | 47 | 257.09 | 59.35
a) Calculate a 95% confidence interval for the difference between the mean length of snapper in the reserve and the non-reserve regions.
b) It is claimed that the ‘average snapper’ in the reserve is at least 130 mm longer than the ‘average snapper’ in the non-reserve region. Use the 95% confidence interval from a) to analyse the validity of this claim.
The 95% confidence interval for the difference (reserve minus non-reserve) is 85.03 < µR - µN < 121.15. Since 130 mm lies above this interval, at the 95% level of confidence the data do not support the claim that the average snapper in the reserve is at least 130 mm longer.
Interpretation of Confidence Intervals
A company produces two different models of batteries, ‘power’ and ‘super’. 95 people who have used both ‘power’ and ‘super’ batteries were interviewed to find out which of the two models they prefer to use in their torches. Of the 95 people, 63 said that they prefer to use the ‘power’ model in their torches.
a) Find a 95% confidence interval for the proportion of all people who have used both ‘power’ and ‘super’ batteries and prefer to use the ‘power’ model of battery in their torches.
0.568 < π < 0.758
b) Write a clear description that gives the meaning of this confidence interval.
We are 95% confident that the interval 0.568 < π < 0.758 contains the true proportion of people (who have used both models) who prefer the ‘power’ model: 95% of intervals constructed in this way would contain the true proportion.
Calculating Sample Size
If we are given a particular level of confidence, we can calculate the sample size (n) that gives the required margin of error (e):
$e = z \times \frac{\sigma}{\sqrt{n}}$
95% confidence interval, σ = 4, margin of error e = 2. How big is the sample size?
First we need to find z, the number of SDs: for 95%, the calculator area is 0.5 + 0.475 = 0.975, so z = 1.96.
$2 = 1.96 \times \frac{4}{\sqrt{n}}$
$\sqrt{n} = \frac{1.96 \times 4}{2} = 3.92$
$n = 15.366$, so n = 16 (round up)
A random variable is known to have a standard deviation of 14. What sample size would be required to be 90% confident that an estimate of the mean was within 2 units of its true value?
For 90%, the calculator area is 0.5 + 0.45 = 0.95, so z = 1.6448.
$e = z \times \frac{\sigma}{\sqrt{n}}$
$2 = 1.6448 \times \frac{14}{\sqrt{n}}$
$\sqrt{n} = \frac{1.6448 \times 14}{2} = 11.5136$
$n = 132.56$, so n = 133
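A sketch of the sample-size rearrangement $n = (z\sigma/e)^2$, always rounding up so the margin of error is no larger than required; `sample_size_mean` is a hypothetical name.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, e, confidence):
    """Smallest n with z * sigma / sqrt(n) <= e, i.e. n = (z * sigma / e)^2 rounded up."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / e) ** 2)

print(sample_size_mean(4, 2, 0.95))    # 16
print(sample_size_mean(14, 2, 0.90))   # 133
```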
Calculate sample size for proportion
A pilot survey from a few tax returns has
shown that approximately 12% of all
taxpayers are in ‘high-income’ category. If
the Inland Revenue Department wishes to
estimate this percentage to within 1%, with
96% confidence, how many tax returns
should it sample?
Calculating sample size for proportion
A market research company wishes to estimate the percentage of people in a certain age bracket who read a current-affairs magazine. The degree of confidence required for this estimate is 90%. What sample size should be taken to estimate the percentage to within 4%?
e = 4% = 0.04
p is unstated, so use p = 0.5 and q = 0.5
Confidence 90% = 0.9; for the calculator the area is 0.5 + 0.45 = 0.95, so z = 1.6448 (≈ 1.645)
$e = z\sqrt{\frac{pq}{n}}$
It is easier to rearrange the formula first:
$e^2 = \frac{z^2 pq}{n}$
$n = \frac{pqz^2}{e^2}$
$n = \frac{(0.5)(0.5)(1.645)^2}{(0.04)^2} = 422.8$
n = 423, so the minimum sample size is 423.
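A sketch of $n = pqz^2/e^2$ with rounding up; `sample_size_proportion` is a hypothetical name, and it defaults to p = 0.5 when no estimate of p is available. The second call applies it to the tax-return question above (p = 0.12, within 1%, 96% confidence).

```python
import math
from statistics import NormalDist

def sample_size_proportion(e, confidence, p=0.5):
    """n = p*q*z^2 / e^2 rounded up; p = 0.5 when no estimate is available."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(p * (1 - p) * z ** 2 / e ** 2)

print(sample_size_proportion(0.04, 0.90))           # 423 (market research example)
print(sample_size_proportion(0.01, 0.96, p=0.12))   # 4455 (tax returns example)
```

Using p = 0.5 gives the largest value of pq, so it is the safe choice when p is unknown.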
Calculating Sample Size
What size of sample should be taken from a population of packets of butter, when the standard deviation of the weights of packets is 4 g, if the mean weight is to be estimated to within 0.5 g with 95% accuracy?
1) Use inverse normal (σ = 1, µ = 0) to find the z value.
2) $n = \left(\frac{\sigma z}{e}\right)^2$
Sample size for population proportion
Radio Sport wishes to conduct an opinion poll on whether the captain of the New Zealand netball team should be replaced. The degree of confidence required for this poll is 95%. What size sample should be used to obtain the percentage to within 5% accuracy?
1) Use inverse normal (σ = 1, µ = 0) to find the z value.
2) $n = \frac{pqz^2}{e^2}$
Sample size for proportion and sample mean
• An opinion poll with a level of confidence of 95% and an estimated value of p of 0.5 has a margin of error of 4%. How many people would have taken part in the poll?
• A sample of containers of car parts has a mean weight of 40 kg and a standard deviation of 5 kg. How many containers would need to have been in the sample to ensure, at the 95% level of confidence, that the sample mean was within 0.5 kg of the population mean?
Confidence Interval Revision
• Sample mean x̄ estimates µ
• Sample proportion p̂ estimates p
• Difference of sample means x̄1 - x̄2 estimates µ1 - µ2
• Margin of error is half the width of the confidence interval
• Sample size for sample mean: $n = \left(\frac{\sigma z}{e}\right)^2$
• Sample size for sample proportion: $n = \frac{pqz^2}{e^2}$
• Sample size for difference of means when the two σ and the two n are the same: $n = \frac{2\sigma^2 z^2}{e^2}$
Meaning of confidence interval
• Mean (99%): 99% of such intervals include the population mean.
• Proportion (99%): 99% of such intervals include the population proportion.
• Difference of means (99%): 99% of such intervals include the difference of the two population means.
• Confidence interval for difference of means: if 0 is included in the confidence interval, no difference between the two means is suggested. If 0 is not included, a difference between the two means is suggested.
Confidence Interval Revision
• Mean
A sample of 120 wire cables is tested. The mean
breaking strain was found to be 5.4 tonnes with a
standard deviation of 1.3 tonnes. Calculate a 95%
confidence interval for the mean breaking strain for this type of
wire cable.
• Proportion
A sample opinion poll of 200 students is taken and 130
students are found to support the idea of extending
opening hours of the library. Calculate a 99% confidence
interval for the proportion of all students in the school in
favour of extending the library hours.
• Difference between two means
A sample of 150 Longlife batteries showed a mean
capability of 140 photos and a standard deviation of 12
photos. A sample of 200 Lastshot batteries showed a
mean capability of 120 photos and a standard deviation of 8
photos. Find a 95% confidence interval for the difference in
the mean lifetime between the two brands of batteries.
Sample size (use solver)
• The owner of a camera shop knows that 65% of
the customers return to his store. How large a
sample would the shop owner have to take to be
95% confident that the sample proportion is
within 5% of the true value?
• What size of sample should be taken from a
population of packets of butter, when the
standard deviation of the weights of packets is 4
g, if the mean weight is to be estimated to within
0.5 g with 95% accuracy.
We need to know the difference between T = X1 + X2 and Y = 2X.
T is the sum of two independent random variables, which can take different values:
$T = X_1 + X_2$
$E(T) = E(X_1) + E(X_2) = \mu + \mu = 2\mu$
$VAR(T) = VAR(X_1) + VAR(X_2) = \sigma^2 + \sigma^2 = 2\sigma^2$
$SD(T) = \sqrt{2}\,\sigma$
Y represents the outcome of X multiplied by 2, i.e. 2 identical values:
$Y = 2X$
$E(Y) = E(2X) = 2E(X) = 2\mu$
$VAR(Y) = VAR(2X) = 2^2 VAR(X) = 4\sigma^2$
$SD(Y) = 2\sigma$
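The difference between adding two independent values and doubling one value shows up clearly in a simulation. This sketch assumes a hypothetical X that is normal with µ = 10 and σ = 3, so VAR(T) should be near 2σ² = 18 and VAR(Y) near 4σ² = 36, while both means are near 2µ = 20.

```python
import random
import statistics

random.seed(2)
mu, sigma, trials = 10.0, 3.0, 50000   # hypothetical X ~ N(10, 3^2)

# T = X1 + X2: two independent draws of X added together
t_values = [random.gauss(mu, sigma) + random.gauss(mu, sigma) for _ in range(trials)]
# Y = 2X: one draw of X doubled
y_values = [2 * random.gauss(mu, sigma) for _ in range(trials)]

var_t = statistics.variance(t_values)
var_y = statistics.variance(y_values)
print(var_t)   # close to 2 * sigma^2 = 18
print(var_y)   # close to 4 * sigma^2 = 36
```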
Normal Distribution
68% of the data is within 1 standard
deviation either side of the mean
Data is likely to be in this region
95% of the data is within 2 standard
deviations either side of the mean
Data is very likely to be in this region
99.7% of the data is within 3 standard
deviations either side of the mean
Data is almost certain to be in this region
T is the sum of n independent random variables which might take different values:
$T = X_1 + X_2 + X_3 + \dots + X_n$
$E(T) = n\mu$
$VAR(T) = n\sigma^2$
$SD(T) = \sqrt{n}\,\sigma$
T is the outcome of the same variable multiplied by n:
$T = nX$
$E(T) = n\mu$
$VAR(T) = n^2\sigma^2$
$SD(T) = n\sigma$
25 people in a lift. They have a mean weight of 65 kg and an SD of 7 kg. Find the mean and SD of the load.
The apples in the baskets have a mean weight of 1.2 g each and an SD of 0.3 g each. Find the mean and SD of a basket of 20 apples.
The mean petrol usage for a car is 7 litres per day, with a standard deviation of 0.3 litres. The cost of petrol is $1.96 per litre. What are the mean and SD of the cost of petrol per day?
1 kg of apples costs $1.20. A basket of apples produced from ABC factory has a mean weight of 2.5 kg and an SD of 3 kg. What is the cost of one basket of apples?