Chapter 8 Estimation
ESTIMATING $\mu$ WHEN $\sigma$ IS KNOWN
Note:
• Because of time and money constraints, difficulty
in finding population members, and so forth,
we usually do not have access to all
measurements of an entire population.
Therefore, we rely on information from a
sample.
• We will learn techniques for estimating the
population mean using sample data.
Assumptions about the random
variable x
• 1. We have a simple random sample of size n drawn from a
population of x values
• 2. The value of $\sigma$, the population standard deviation of x, is
known
• 3. If the x distribution is normal, then our methods work for
any sample size n
• 4. If x has an unknown distribution, then we require a
sample size $n \geq 30$. However, if the x distribution is
distinctly skewed and definitely not mound-shaped, a
sample size of 50 or even 100 or higher may be necessary.
Point estimate
• An estimate of a population parameter given
by a single number is called a point estimate.
• $\bar{x}$ is the point estimate for $\mu$
Note:
• Even with a large random sample, the value of
$\bar{x}$ is usually not exactly equal to the population mean $\mu$.
• We then need to calculate the margin of error
Margin of Error
• When using $\bar{x}$ as a point estimate for $\mu$, the
margin of error is the magnitude of $\bar{x} - \mu$,
or $|\bar{x} - \mu|$.
Note:
• When $\mu$ is unknown, we cannot say exactly how
close $\bar{x}$ is to $\mu$. Therefore, the exact margin of
error is unknown when the population
parameter is unknown.
• Instead, we use a confidence interval, which
attaches a probability to the estimate and so
tells us how reliable the estimate is.
Confidence Interval
• For a confidence level c, the critical value $z_c$ is
the number such that the area under the
standard normal curve between
$-z_c$ and $z_c$ equals c
• $P(-z_c < z < z_c) = c$
Review!
• Find the z value such that 99% of the area
under the standard normal curve lies between
-z and z
• Another way to say this:
• $P(-z_{0.99} < z < z_{0.99}) = 0.99$
Group work
• Find the z value such that 95% of the area
under the standard normal curve lies between
-z and z
• $P(-z_{0.95} < z < z_{0.95}) = 0.95$
Levels of confidence and their
corresponding critical values
Level of Confidence c    Critical Value $z_c$
0.70, or 70%             1.04
0.75, or 75%             1.15
0.80, or 80%             1.28
0.85, or 85%             1.44
0.90, or 90%             1.645
0.95, or 95%             1.96
0.98, or 98%             2.33
0.99, or 99%             2.58
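The critical values in this table can be reproduced numerically. Below is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`. The helper name `z_critical` is made up for illustration and does not come from the textbook.

```python
from statistics import NormalDist

def z_critical(c):
    """Return z_c such that P(-z_c < z < z_c) = c for a standard normal z."""
    # A central area of c leaves (1 - c)/2 in each tail, so z_c is the
    # quantile at cumulative probability (1 + c)/2.
    return NormalDist().inv_cdf((1 + c) / 2)

# Print the critical value for each confidence level listed in the table.
for c in (0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 0.98, 0.99):
    print(f"c = {c:.2f}   z_c = {z_critical(c):.3f}")
```

The printed values carry three decimals, so they should agree with the table after rounding to the precision the table uses.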
Let's put everything together
• From the central limit theorem, we know $\bar{x}$ is
approximately normal with mean $\mu_{\bar{x}} = \mu$ when n is
large.
• Also from the central limit theorem, $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
• Combined with $P(-z_c < z < z_c) = c$
• We get:
• $P\left(-z_c \frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < z_c \frac{\sigma}{\sqrt{n}}\right) = c$
E is known as the maximal margin of error
• $E = z_c \frac{\sigma}{\sqrt{n}}$
• E is also known as the error tolerance. It is the
bound on the margin of error
Let's do more math manipulation
• We get:
• $P(-E < \bar{x} - \mu < E) = c$
• Followed by
• $P(\bar{x} - E < \mu < \bar{x} + E) = c$
• We call this a c confidence interval for $\mu$
Confidence Interval for $\mu$
• $\bar{x} - E$ to $\bar{x} + E$
• It is an interval computed from sample data in
such a way that c is the probability of
generating an interval containing the actual
value of $\mu$. In other words, c is the proportion
of confidence intervals, based on random
samples of size n, that actually contain $\mu$.
Example:
• Julia enjoys jogging. She has been jogging over a
period of several years, during which time her
physical condition has remained constantly good.
Usually, she jogs 2 miles per day. The standard
deviation of her times is $\sigma = 1.80$ minutes.
During the past year, Julia has recorded her times
to run 2 miles. She has a random sample of 90 of
these times. For these 90 times, the mean was
$\bar{x} = 15.60$ minutes. Let $\mu$ be the mean jogging
time for the entire distribution of Julia's 2-mile
running times. Find a 0.95 confidence interval for
$\mu$.
Answer
• $E = z_c \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{1.80}{\sqrt{90}}$
• E is approximately 0.37
• $\bar{x} - E < \mu < \bar{x} + E$
• $15.60 - 0.37 < \mu < 15.60 + 0.37$
• $15.23 < \mu < 15.97$
• We can conclude with 95% confidence that the interval from 15.23
minutes to 15.97 minutes is one that contains the population mean
$\mu$ of jogging times for Julia
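The same calculation can be written as a short script. This is a sketch only: the function name `z_interval` is hypothetical, and the critical value 1.96 is passed in by hand, as it would be after reading the table of critical values.

```python
import math

def z_interval(x_bar, sigma, n, z_c):
    """Confidence interval for mu when sigma is known.

    Returns (lower endpoint, upper endpoint, maximal margin of error E).
    """
    E = z_c * sigma / math.sqrt(n)  # E = z_c * sigma / sqrt(n)
    return x_bar - E, x_bar + E, E

# Julia's jogging times: x_bar = 15.60, sigma = 1.80, n = 90, z_0.95 = 1.96
low, high, E = z_interval(15.60, 1.80, 90, 1.96)
print(f"E = {E:.2f}")
print(f"{low:.2f} < mu < {high:.2f}")
```

The group-work problems that follow can be checked the same way by changing the four arguments.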
Group work
• Mr. Liu enjoys talking to people and getting to
know them. Usually he talks to 5 people per
day. The standard deviation of talk time is
$\sigma = 3.20$ minutes. During the past year, he
has recorded his time to talk to 5 people. He
has a random sample of n = 150. For those 150
times, the mean was $\bar{x} = 12.46$ minutes. Let
$\mu$ be the mean talking time for the entire
distribution. Find a 0.95 confidence interval
for $\mu$.
Group Work
• Walter meets Julia at the track. He prefers to
jog 3 miles. He knows that $\sigma = 3.67$ minutes.
For a random sample of 90 times,
the mean time was $\bar{x} = 23.45$
minutes. Let $\mu$ be the mean jogging time for
the entire distribution of Walter's 3-mile
running times. Find a 0.99 confidence interval
for $\mu$.
How to find the sample size n for
estimating $\mu$ when $\sigma$ is known
• Assume $\bar{x}$ is approximately normal
• $n = \left(\frac{z_c \sigma}{E}\right)^2$
• E = specified maximal error of estimate
• $\sigma$ = population standard deviation
• $z_c$ = critical value from the normal distribution for the
desired confidence level c
• If n is not a whole number, increase n to the next higher
whole number. Note that n is the minimal sample size for a
specified confidence level and maximal error of estimate E.
Example:
• A wildlife study is designed to find the mean
weight of salmon caught by an Alaskan fishing
company. A preliminary study of a random
sample of 50 salmon showed $s \approx 2.15$ pounds.
How large a sample should
be taken to be 99% confident that the sample
mean $\bar{x}$ is within 0.20 pound of the true mean
weight $\mu$?
Answer
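• Since the preliminary sample of 50 fish is large enough
(50 > 30), s is a good approximation for $\sigma$
• $n = \left(\frac{z_c \sigma}{E}\right)^2$
• $n = \left(\frac{2.58 \cdot 2.15}{0.20}\right)^2 \approx 769.2$
• So the sample should contain at least 770 fish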
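The sample-size rule, including the "round up to the next whole number" step, can be sketched as follows. The function name `sample_size_for_mean` is an assumption made for illustration; the critical value is supplied by hand from the table.

```python
import math

def sample_size_for_mean(z_c, sigma, E):
    """Minimal sample size n so that z_c * sigma / sqrt(n) is at most E."""
    n = (z_c * sigma / E) ** 2
    # If n is not a whole number, move up to the next whole number.
    return math.ceil(n)

# Salmon study: z_0.99 = 2.58, sigma approximated by s = 2.15, E = 0.20
print(sample_size_for_mean(2.58, 2.15, 0.20))
```

Rounding down instead would give a sample slightly too small to guarantee the requested error bound, which is why the rule always rounds up.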
Group Work
• A study is designed to show the mean number
of boyfriends/girlfriends a person has in his/her
lifetime. A preliminary study with n = 60 showed that
$s \approx 5.16$ people. How large a sample should be
taken to be 99% confident that the sample
mean $\bar{x}$ is within 0.50 people of the true mean
$\mu$?
Homework Practice
• Pg 338 #1-20 eoo (check answers in the back)
ESTIMATING $\mu$ WHEN $\sigma$ IS
UNKNOWN
Well… here is the situation
• We have just learned how to estimate $\mu$ when
$\sigma$ is known. But much of the time, when $\mu$ is
unknown, $\sigma$ is also unknown.
• In such cases, we use the sample standard
deviation s to approximate $\sigma$.
• When we use s to approximate $\sigma$, the
sampling distribution for $\bar{x}$ follows a new
distribution called a Student's t distribution
Note:
• What we are about to learn is the most
common way to estimate $\mu$.
• The situation from the last section, where $\sigma$
is known, almost never happens in practice!
Student's t distribution
• Assume that x has a normal distribution with
mean $\mu$. For samples of size n with sample mean
$\bar{x}$ and sample standard deviation s, the t variable
• $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
• has a Student's t distribution with degrees of
freedom d.f. = n - 1
What are degrees of freedom?
• d.f. = n - 1
• Degrees of freedom is the number of variables
free to change when a statistic or parameter is
fixed.
• Example: if a student needs a 90 average based
on three tests, and the first two scores are 82 and
95, then the last score is fixed. It must be
3(90) - 82 - 95 = 93; in other words, only the
first two scores were "free to vary"
Properties of a Student's t distribution
• 1) The distribution is symmetric about the mean 0
• 2) The distribution depends on the degrees of
freedom, d.f. (d.f. = n - 1 for $\mu$ confidence intervals)
• 3) The distribution is bell-shaped, but has thicker
tails than the standard normal distribution
• 4) As the degrees of freedom increase, the t
distribution approaches the standard normal
distribution.
Now you have to be careful
• When you look up the critical values for
confidence intervals, you don't want to use
the wrong one: $z_c$ goes with known $\sigma$,
$t_c$ goes with unknown $\sigma$.
Confidence Interval
• $P(-t_c < t < t_c) = c$
Note!!!!
• If the degrees of freedom d.f. you need are
not in the table, use the closest d.f. in the
table that is smaller. This procedure results in
a critical value that is more conservative in the
sense that it is larger. The resulting
confidence interval will be longer and have a
probability that is slightly higher than c.
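The "closest smaller d.f." rule can be sketched as a small lookup. Everything here is illustrative: `T_TABLE_099` is a three-row excerpt for c = 0.99 built only from critical values that appear in this chapter, and `t_from_table` is a hypothetical helper, not part of any library. A real t table has many more rows.

```python
# Tiny excerpt of a t table for confidence level c = 0.99.
# Keys are degrees of freedom, values are critical values t_c.
T_TABLE_099 = {4: 4.604, 6: 3.707, 30: 2.750}

def t_from_table(df, table):
    """Look up t_c, falling back to the closest smaller d.f. in the table."""
    usable = [d for d in table if d <= df]
    if not usable:
        raise ValueError("the table has no row with d.f. at or below the one requested")
    # The largest available d.f. that does not exceed df gives a critical
    # value that is at least as large as the exact one (more conservative).
    return table[max(usable)]

print(t_from_table(6, T_TABLE_099))   # d.f. = 6 is in the excerpt
print(t_from_table(31, T_TABLE_099))  # d.f. = 31 is not, so d.f. = 30 is used
```

Falling back to a smaller d.f. never shortens the interval, which is why the rule is described as conservative.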
Example: Activity time!
• Using the t-chart
• Go to Table 6 of Appendix II, pg. A24
• Find the critical value $t_c$ for a 0.99 confidence level for a t
distribution with sample size n = 5
• Procedure:
– First, we find the column with c heading 0.990
– Then we compute the number of degrees of freedom: d.f. = n - 1 = 5 - 1 =
4
– Last, we read down the column under the heading c = 0.990 until we
reach the row headed by 4.
– The answer should be: 4.604
Group Activity
• A) Find the critical value $t_c$ for a 0.95 confidence
level for a t distribution with sample size n = 13
• B) Find the critical value $t_c$ for a 0.99 confidence
level for a t distribution with sample size n = 32
• C) Find the critical value $t_c$ for a 0.90 confidence
level for a t distribution with sample size n = 7
Answer
• A) 2.179 (d.f. = 12)
• B) 2.750 (d.f. = 31 is needed; this is the value for the closest smaller d.f., 30)
• C) 1.943 (d.f. = 6)
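Table values like these can also be checked without a printed table. The sketch below uses only the Python standard library: it integrates the Student's t density with Simpson's rule and then finds the critical value by bisection. This is plain numerical integration chosen for illustration, not the table-lookup method used in class, and the function names are made up. It is intended for the small to moderate d.f. found in a typical table.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=1000):
    """P(T <= t) for t >= 0: Simpson's rule on [0, t], plus 0.5 by symmetry."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(c, df):
    """Return t_c such that P(-t_c < t < t_c) = c."""
    target = (1 + c) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t_c
        hi *= 2
    for _ in range(60):            # bisection on the bracket [lo, hi]
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for c, df in ((0.99, 4), (0.95, 12), (0.99, 30), (0.90, 6)):
    print(f"c = {c}, d.f. = {df}: t_c = {t_critical(c, df):.3f}")
```

The four cases printed are the activity example and the three group-activity parts, with part B evaluated at d.f. = 30.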
Maximal margin of error, E
• $E = t_c \frac{s}{\sqrt{n}}$
Confidence Interval
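• $P\left(-t_c \frac{s}{\sqrt{n}} < \bar{x} - \mu < t_c \frac{s}{\sqrt{n}}\right) = c$
• $P(\bar{x} - E < \mu < \bar{x} + E) = c$
• See the last section for the proof; the steps are the same
with $t_c$ and s in place of $z_c$ and $\sigma$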
Summary:
• Confidence interval for $\mu$ when $\sigma$ is unknown
• $\bar{x} - E < \mu < \bar{x} + E$
• where $\bar{x}$ = sample mean of a simple random
sample
• $E = t_c \frac{s}{\sqrt{n}}$
• c = confidence level (0 < c < 1)
• $t_c$ = critical value for confidence level c and
degrees of freedom d.f. = n - 1
Example:
• Suppose an archaeologist discovers only 7 fossil
skeletons from a previously unknown species of
miniature horse. Reconstructions of the skeletons of
these 7 miniature horses show the shoulder heights (in
cm) to be:
• 45.3 47.1 44.2 46.8 46.5 45.5 47.6
• A) Find the sample mean and the sample standard deviation
• B) Find a 99% confidence interval for $\mu$
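Answer
• A) $\bar{x} \approx 46.14$, $s \approx 1.19$
• B) d.f. = n - 1 = 7 - 1 = 6
• $t_{0.99} = 3.707$
• $E = t_c \frac{s}{\sqrt{n}} = 3.707 \cdot \frac{1.19}{\sqrt{7}} \approx 1.67$
• $\bar{x} - E < \mu < \bar{x} + E$
• $46.14 - 1.67 < \mu < 46.14 + 1.67$
• $44.47 < \mu < 47.81$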
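Here is a sketch of the same computation starting from the raw shoulder heights. It assumes Python's standard-library `statistics.mean` and `statistics.stdev` (the latter divides by n - 1, so it is the sample standard deviation). The function name `t_interval` is hypothetical, and the critical value 3.707 is passed in by hand as read from the t table.

```python
import math
from statistics import mean, stdev

def t_interval(data, t_c):
    """Sample mean, sample s.d., and margin of error E for mu (sigma unknown)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)               # sample standard deviation, divides by n - 1
    E = t_c * s / math.sqrt(n)    # E = t_c * s / sqrt(n)
    return x_bar, s, E

heights = [45.3, 47.1, 44.2, 46.8, 46.5, 45.5, 47.6]  # shoulder heights in cm
x_bar, s, E = t_interval(heights, 3.707)              # d.f. = 6, c = 0.99
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, E = {E:.2f}")
print(f"{x_bar - E:.2f} < mu < {x_bar + E:.2f}")
```

Because the script carries unrounded values of the mean and standard deviation, an endpoint can differ from the hand calculation by 0.01.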
Group Work
• A company has a new process for
manufacturing large artificial sapphires. In a
trial run, 37 sapphires are produced. The
mean weight for these 37 gems is 6.75 carats
and the sample standard deviation is 0.33
carat. Let $\mu$ be the mean weight for the
distribution of all sapphires produced by the
new process. Find a 95% confidence interval
for $\mu$ and interpret it.
Answer
• $6.64 < \mu < 6.86$
• The company can be 95% confident that the
interval from 6.64 to 6.86 carats is an interval that
contains the population mean weight of
sapphires produced by the new process.
Group Work
• See's Candies uses a new process to create
their chocolate candies. In a trial run, the
weights per box are (in lbs): 12.1 10.9 15.2
11.3 12.5 11.8
• Find a 90% confidence interval for the
population mean weight per box.
Homework Practice
• Pg 349 #1-14 eoe
ESTIMATING p IN THE BINOMIAL
DISTRIBUTION
Note:
• Remember that a binomial distribution is completely
determined by the number of trials n and the
probability p of success on a single trial.
• For most experiments, the number of trials is
chosen in advance, so the distribution is
completely determined by p.
The point estimates for p and q are
• $\hat{p} = \frac{r}{n}$
• $\hat{q} = 1 - \hat{p}$
• where n = number of trials and r = number of
successes.
Margin of Error for Binomial
Distribution
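• The margin of error is $|\hat{p} - p|$
• Its maximal value for confidence level c is approximately
• $E \approx z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}$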
Summary on how to find a confidence
interval for a proportion p
• p is the population probability of success and q = 1 - p is the population probability of failure. Let r be a random
variable that represents the number of successes out of the n binomial trials
• The point estimates for p and q are $\hat{p} = \frac{r}{n}$ and $\hat{q} = 1 - \hat{p}$
• The number of trials n should be sufficiently large so that both $np > 5$ and $nq > 5$
• Confidence interval for p
• $\hat{p} - E < p < \hat{p} + E$
• $E \approx z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}$
• c = confidence level (0 < c < 1)
• $z_c$ = critical value for confidence level c based on the standard normal distribution
Example:
• Suppose that 800 students are selected at random from a student body of
20000 and that they are each given a shot to prevent a certain type of flu.
These 800 students are then exposed to the flu, and 600 of them do not
get the flu. Let p represent the probability that the shot will be successful
for any single student selected at random from the entire population of
20000.
• A) What is the number of trials n? What is the value of r?
• B) What are the point estimates for p and q?
• C) Would it seem that the number of trials is large enough to justify a
normal approximation to the binomial?
• D) Find a 99% confidence interval for p
Answer
• A) n = 800, r = 600
• B) $\hat{p} = \frac{600}{800} = 0.75$
• $\hat{q} = 0.25$
• C) $np \approx 800(0.75) = 600 > 5$ and $nq \approx 800(0.25) = 200 > 5$, so a normal
approximation is justified
• D) $E \approx z_{0.99} \sqrt{\frac{\hat{p}\hat{q}}{n}} \approx 2.58 \sqrt{\frac{(0.75)(0.25)}{800}} \approx 0.0395$
• The 99% confidence interval is then
• $\hat{p} - E < p < \hat{p} + E$
• $0.75 - 0.0395 < p < 0.75 + 0.0395$
• $0.71 < p < 0.79$
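The steps of this answer (point estimates, the check that the sample is large enough, then the interval) can be sketched in a few lines. The function name `proportion_interval` is an assumption for illustration, and the critical value 2.58 is typed in from the table of critical values.

```python
import math

def proportion_interval(r, n, z_c):
    """Confidence interval for a binomial proportion p from r successes in n trials."""
    p_hat = r / n
    q_hat = 1 - p_hat
    # The normal approximation needs both n*p and n*q to exceed 5;
    # the point estimates stand in for the unknown p and q.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    E = z_c * math.sqrt(p_hat * q_hat / n)
    return p_hat - E, p_hat + E, E

# Flu shot example: r = 600 successes out of n = 800 trials, z_0.99 = 2.58
low, high, E = proportion_interval(600, 800, 2.58)
print(f"E = {E:.4f}")
print(f"{low:.2f} < p < {high:.2f}")
```

Raising an error when the size check fails is a design choice of this sketch; the slides simply say the normal approximation is not justified in that case.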
Group Work
• A random sample of 190 books purchased at a
local bookstore showed that 71 of the books
were science fiction. Let p represent the
proportion of books sold by this store that are
science fiction.
• A) What is a point estimate for p?
• B) Find a 90% confidence interval for p
• C) Interpret the confidence interval
• D) Can a normal approximation be justified?
Group Work
• A random sample of 260 hand sanitizers
showed that 102 of them kill
the bacteria. Let p represent the proportion of
hand sanitizers that kill the bacteria.
• A) What are the point estimates?
• B) Find a 95% confidence interval for p
• C) Can a normal approximation be justified?
• D) Interpret the confidence interval
General interpretation of poll results
• 1) When a poll states the results of a survey, the
proportion reported to respond in the designated
manner is $\hat{p}$, the sample estimate of the
population proportion
• 2) The margin of error is the maximal error E of a
95% confidence interval for p
• 3) A 95% confidence interval for the population
proportion p is
• poll report $\hat{p}$ - margin of error E < p <
poll report $\hat{p}$ + margin of error E
Example:
• A) What confidence level corresponds to the
phrase "chances are 19 of 20 that if…"
• B) What margin of error corresponds to the phrase
"results by no more than 2.6 percentage
points in either direction"?
Answer
• A) 19/20 = 0.95, or 95%
• B) 2.6%, or E = 0.026
How to find the sample size n for
estimating a proportion p
• $n = p(1 - p)\left(\frac{z_c}{E}\right)^2$ if you have a preliminary estimate for p
• $n = \frac{1}{4}\left(\frac{z_c}{E}\right)^2$ if you do not have a preliminary estimate for p
• If n is not a whole number, increase n to the next higher
whole number.
• Also, if necessary, increase the sample size n to ensure that
both np > 5 and nq > 5. Note that n is the minimal sample
size for a specified confidence level and maximal error of
estimate.
Example
• A company is in the business of selling wholesale popcorn to grocery
stores. The company buys directly from farmers. A buyer for the company
is examining a large amount of corn from a certain farmer. Before the
purchase is made, the buyer wants to estimate p, the probability that a
kernel will pop.
• Suppose a random sample of n kernels is taken and r of these kernels pop.
The buyer wants to be 95% sure that the point estimate $\hat{p} = \frac{r}{n}$
for p will be in error either way by less than 0.01
• A) If no preliminary study is made to estimate p, how large a sample
should the buyer use?
• B) A preliminary study showed that p was approximately 0.86. If the buyer uses
the results of the preliminary study, how large a sample should be used?
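Answer
• $z_{0.95} = 1.96$
• A) $n = \frac{1}{4}\left(\frac{z_c}{E}\right)^2 = \frac{1}{4}\left(\frac{1.96}{0.01}\right)^2 = 0.25(38416) = 9604$
• B) $n = p(1 - p)\left(\frac{z_c}{E}\right)^2 = (0.86)(0.14)\left(\frac{1.96}{0.01}\right)^2 \approx 4625.29$,
so use n = 4626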
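Both cases of the sample-size formula can be sketched together. One caution motivates the design: with ordinary floating-point arithmetic, a result that should be exactly 9604 can come out a hair above it and then round up to 9605. This sketch therefore uses the standard-library `fractions.Fraction` and takes its inputs as decimal strings. The function name and the string-based interface are assumptions made for illustration.

```python
import math
from fractions import Fraction

def sample_size_for_proportion(z_c, E, p_prelim=None):
    """Minimal n for estimating p to within E; inputs are decimal strings."""
    z_c, E = Fraction(z_c), Fraction(E)
    if p_prelim is None:
        # No preliminary estimate: p(1 - p) is replaced by its largest value, 1/4.
        pq = Fraction(1, 4)
    else:
        p = Fraction(p_prelim)
        pq = p * (1 - p)
    n = pq * (z_c / E) ** 2
    return math.ceil(n)  # next higher whole number when n is not whole

# Popcorn example: z_0.95 = 1.96, E = 0.01
print(sample_size_for_proportion("1.96", "0.01"))          # part A, no preliminary estimate
print(sample_size_for_proportion("1.96", "0.01", "0.86"))  # part B, preliminary p = 0.86
```

Using the preliminary estimate cuts the required sample roughly in half here, because p(1 - p) = 0.1204 is well below the worst case of 1/4.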
Group Work
• Blah blah blah blah… 99% sure that the point
estimate $\hat{p} = \frac{r}{n}$ for p will be in error either way
by less than 0.48
• A) If no preliminary study is made to estimate p,
how large a sample should we use?
• B) A preliminary study showed that p was approximately
0.42. If we use the results of the preliminary
study, how large a sample should be used?
Homework practice
• Pg 361 #1-21 odd
ESTIMATING $\mu_1 - \mu_2$ AND $p_1 - p_2$
Note:
• How can we tell if two populations are
different?
• One way is to compare the difference in
population means or the difference in
population proportions
• Two samples are independent if sample data
drawn from one population are completely
unrelated to the selection of sample data from
the other population.
• Two samples are dependent if each data value
in one sample can be paired with a
corresponding data value in the other sample.
Dependent Samples
• Dependent samples and data pairs occur very
naturally in situations in which the same object
or item is measured twice.
Independent Samples
• Independent samples occur very naturally
when we draw two random samples, one from
the first population and one from the second
population. Since both samples are random
samples, there is no pairing of measurements
between the two populations.
1st situation: confidence intervals for
$\mu_1 - \mu_2$ when $\sigma_1$ and $\sigma_2$ are known
• Let $\sigma_1$ and $\sigma_2$ be the population standard deviations of populations 1 and 2. Obtain two
independent random samples from populations 1 and 2, where
• $\bar{x}_1$ and $\bar{x}_2$ are sample means from populations 1 and 2
• $n_1$ and $n_2$ are sample sizes from populations 1 and 2
• If you can assume that both population distributions 1 and 2 are normal, any sample sizes $n_1$ and
$n_2$ will work. If you cannot assume this, then use sample sizes greater than or equal to 30:
$n_1 \geq 30$ and $n_2 \geq 30$
• Confidence interval for $\mu_1 - \mu_2$
• $(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$
• where $E = z_c \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
• c = confidence level (0 < c < 1)
• $z_c$ = critical value for confidence level c
Example
• Suppose you are a biologist studying fishing data from Yellowstone
streams before and after a major fire. Fishing reports include the
number of trout caught per day per fisherman. A random sample of
$n_1 = 167$ reports from the period before the fire showed that the average catch
was $\bar{x}_1 = 5.2$ trout per day. Assume the standard deviation of daily catch
per fisherman was $\sigma_1 = 1.9$. Another random sample of $n_2 = 125$ fishing reports 5
years after the fire showed that the average catch per day was $\bar{x}_2 = 6.8$
trout. Assume the s.d. during this period was $\sigma_2 = 2.3$
• A) What is the population for each sample? Are the samples independent or
dependent?
• B) Compute a 95% confidence interval for $\mu_1 - \mu_2$
• C) Interpret the result
Answer
• A) Population 1 is the daily catch per fisherman before the fire; population 2 is the daily catch
per fisherman 5 years after it. The samples are independent because they come from 2 different
periods and the reports are not paired.
• B) Since $n_1 = 167$, $\bar{x}_1 = 5.2$, $\sigma_1 = 1.9$, $n_2 = 125$, $\bar{x}_2 = 6.8$, $\sigma_2 = 2.3$, $z_{0.95} = 1.96$
• $E = z_c \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 1.96 \sqrt{\frac{1.9^2}{167} + \frac{2.3^2}{125}} \approx 0.4956 \approx 0.50$
• Therefore the 95% CI is
• $(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$
• $(5.2 - 6.8) - 0.50 < \mu_1 - \mu_2 < (5.2 - 6.8) + 0.50$
• $-2.10 < \mu_1 - \mu_2 < -1.10$
• C) Since the whole interval is negative, we are 95% confident that $\mu_1 < \mu_2$; that is, we are
95% sure that the average catch before the fire was less than the average catch after the
fire.
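Part B can be checked with a short script. This is a sketch under the stated assumptions ($\sigma_1$ and $\sigma_2$ known, independent samples); the function name `two_mean_z_interval` is hypothetical.

```python
import math

def two_mean_z_interval(x1, sigma1, n1, x2, sigma2, n2, z_c):
    """Confidence interval for mu1 - mu2 when sigma1 and sigma2 are known."""
    E = z_c * math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    diff = x1 - x2
    return diff - E, diff + E, E

# Trout catch before the fire (sample 1) and 5 years after it (sample 2)
low, high, E = two_mean_z_interval(5.2, 1.9, 167, 6.8, 2.3, 125, 1.96)
print(f"E = {E:.2f}")
print(f"{low:.2f} < mu1 - mu2 < {high:.2f}")

# An interval entirely below zero suggests mu1 < mu2 at this confidence level.
print("interval entirely negative:", high < 0)
```

The last line mirrors the interpretation in part C: the sign of the whole interval, not just of the point estimate, is what supports the conclusion.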
Situation 2: (Most common)
confidence intervals for $\mu_1 - \mu_2$ when
$\sigma_1$ and $\sigma_2$ are unknown
Obtain two independent random samples from populations 1 and 2, where
• $\bar{x}_1$ and $\bar{x}_2$ are sample means from populations 1 and 2
• $s_1$ and $s_2$ are sample standard deviations from populations 1 and 2
• $n_1$ and $n_2$ are sample sizes from populations 1 and 2
• If you can assume that both population distributions 1 and 2 are normal or at least mound-shaped and symmetric,
then any sample sizes $n_1$ and $n_2$ will work. If not, use sample sizes greater than or equal to 30:
$n_1 \geq 30$ and $n_2 \geq 30$
• Confidence interval for $\mu_1 - \mu_2$
• $(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$
• where $E = t_c \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
• c = confidence level (0 < c < 1)
• $t_c$ = critical value for confidence level c
• d.f. = degrees of freedom, the smaller of $n_1 - 1$ and $n_2 - 1$
• Example: if you have $n_1 - 1 = 25$ and $n_2 - 1 = 15$, you use 15 as the d.f.
Example
• Suppose that a random sample of 29 college students was randomly divided into two groups. The
first group of $n_1 = 15$ people was given ½ liter of red wine before going to sleep. The second group
of $n_2 = 14$ people was given no alcohol before going to sleep. Everyone in both groups went to
sleep at 11 P.M. The average brain wave activity was determined for each individual in the groups.
The results follow:
• Group 1:
16.0 19.6 19.9 20.9 20.3 20.1 16.4 20.6 20.1 22.3 18.8 19.1 17.4 21.1 22.1
• Group 2:
8.2 5.4 6.8 6.5 4.7 5.9 2.9 7.6 10.2 6.4 8.8 5.4 8.3 5.1
• A) Do you think the samples are independent or dependent? Explain
• B) What assumptions are we making about the data?
• C) Compute a 90% confidence interval for $\mu_1 - \mu_2$
Answer
• A) Since a random sample of 29 students was randomly
divided into two groups, it is reasonable to say that the samples are independent.
• B) We are assuming the populations of $x_1$ and $x_2$ are approximately
normally distributed.
• C) $\bar{x}_1 = 19.65$, $s_1 = 1.86$
• $\bar{x}_2 = 6.59$, $s_2 = 1.91$
• d.f. = the smaller of 15 - 1 and 14 - 1, which is 13, so $t_{0.90} = 1.771$
• $E = t_c \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 1.771 \sqrt{\frac{1.86^2}{15} + \frac{1.91^2}{14}} \approx 1.24$
• $(19.65 - 6.59) - 1.24 < \mu_1 - \mu_2 < (19.65 - 6.59) + 1.24$
• $11.82 < \mu_1 - \mu_2 < 14.30$
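Part C can be sketched from the summary statistics. The function name `two_mean_t_interval` is made up for illustration; the critical value 1.771 is supplied by hand, read from a t table at d.f. = 13 (the smaller of $n_1 - 1$ and $n_2 - 1$) for c = 0.90.

```python
import math

def two_mean_t_interval(x1, s1, n1, x2, s2, n2, t_c):
    """Confidence interval for mu1 - mu2 when sigma1 and sigma2 are unknown.

    t_c should come from a t table with d.f. = min(n1 - 1, n2 - 1).
    """
    E = t_c * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1 - x2
    return diff - E, diff + E, E

# Degrees of freedom used for the table lookup
n1, n2 = 15, 14
print("d.f. =", min(n1 - 1, n2 - 1))

# Brain wave example: group 1 (wine) versus group 2 (no alcohol)
low, high, E = two_mean_t_interval(19.65, 1.86, n1, 6.59, 1.91, n2, 1.771)
print(f"E = {E:.2f}")
print(f"{low:.2f} < mu1 - mu2 < {high:.2f}")
```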
3rd situation: confidence intervals for
$\mu_1 - \mu_2$ when $\sigma_1$ and $\sigma_2$ are unknown
but we believe that $\sigma_1 = \sigma_2$
• $\bar{x}_1$ and $\bar{x}_2$ are sample means from populations 1 and 2
• $s_1$ and $s_2$ are sample standard deviations from populations 1 and 2
• $n_1$ and $n_2$ are sample sizes from populations 1 and 2
• If you can assume that both population distributions 1 and 2 are normal or at least mound-shaped
and symmetric, then any sample sizes $n_1$ and $n_2$ will work. If not, use sample sizes greater than or
equal to 30: $n_1 \geq 30$ and $n_2 \geq 30$
• Confidence interval for $\mu_1 - \mu_2$ when $\sigma_1 = \sigma_2$
• $(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$
• where $E = t_c \, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ and $s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ is the pooled standard deviation
• c = confidence level (0 < c < 1)
• $t_c$ = critical value for confidence level c
• d.f. = degrees of freedom, d.f. = $n_1 + n_2 - 2$
Example (when you find the sample s.d.'s, they
seem very close to each other)
• Height of Asians in Asia (in feet):
• 5.14 5.75 5.29 5.86 5.92 6.12 5.77 5.81
5.80 5.78
• Height of Asians in US (in feet):
• 5.16 5.72 5.30 5.84 5.95 6.00 5.79 5.80 5.85
5.81
• A) Create an 85% confidence interval for the difference
in population mean height for Asians in Asia and in the US.
How to find a confidence interval for
$p_1 - p_2$
• Binomial Experiment 1
• $n_1$ = number of trials
• $r_1$ = number of successes
out of $n_1$ trials
• $\hat{p}_1 = \frac{r_1}{n_1}$
• $p_1$ = population probability
of success
• Binomial Experiment 2
• $n_2$ = number of trials
• $r_2$ = number of successes
out of $n_2$ trials
• $\hat{p}_2 = \frac{r_2}{n_2}$
• $p_2$ = population
probability of success
The number of trials should be sufficiently large so that all four of the following are true:
$n_1 \hat{p}_1 > 5$; $n_1 \hat{q}_1 > 5$; $n_2 \hat{p}_2 > 5$; $n_2 \hat{q}_2 > 5$
Confidence interval for $p_1 - p_2$
$(\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E$
$E = z_c \hat{\sigma} = z_c \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$
Example:
• Suppose two groups of subjects were randomly chosen for
a sleep study. In group I, before going to sleep, the subjects
spent 1 hour watching a comedy movie. In this group there
were a total of 175 dreams recorded, of which 49 were
dreams with feelings of anxiety, fear, or aggression. In group
II, the subjects just went to sleep. There were a total of 180
dreams recorded, of which 63 were dreams with feelings of
anxiety, fear, or aggression.
• A) Why could groups I and II be considered independent
binomial distributions? Are the samples large enough?
• B) Compute a 95% confidence interval for $p_1 - p_2$
Answer
• A) Yes, because the groups were chosen randomly and they don't
overlap. Also, the samples are large enough because when we
do the calculations, all four values are over 5 (do the work:
49, 126, 63, and 117)
• B) $E = z_c \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = 1.96 \sqrt{\frac{(0.28)(0.72)}{175} + \frac{(0.35)(0.65)}{180}}$
• $E \approx 0.096$
• $(\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E$
• $(0.28 - 0.35) - 0.096 < p_1 - p_2 < (0.28 - 0.35) + 0.096$
• $-0.166 < p_1 - p_2 < 0.026$
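Both parts of this answer can be sketched in one function: the size check from part A and the interval from part B. The name `two_proportion_interval` is hypothetical, and the critical value 1.96 is entered by hand.

```python
import math

def two_proportion_interval(r1, n1, r2, n2, z_c):
    """Confidence interval for p1 - p2 from two independent binomial samples."""
    p1, p2 = r1 / n1, r2 / n2
    q1, q2 = 1 - p1, 1 - p2
    # All four products must exceed 5 for the normal approximation.
    if min(n1 * p1, n1 * q1, n2 * p2, n2 * q2) <= 5:
        raise ValueError("samples too small for the normal approximation")
    E = z_c * math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    diff = p1 - p2
    return diff - E, diff + E, E

# Dream study: 49 of 175 dreams (group I) and 63 of 180 dreams (group II)
low, high, E = two_proportion_interval(49, 175, 63, 180, 1.96)
print(f"E = {E:.3f}")
print(f"{low:.3f} < p1 - p2 < {high:.3f}")

# The interval contains 0, so at the 95% level it does not single out
# either proportion as the larger one.
print("interval contains 0:", low < 0 < high)
```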
Homework Practice
• Pg 377 #1-23 eoo