Download Document

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Degrees of freedom (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

German tank problem wikipedia , lookup

Student's t-test wikipedia , lookup

Misuse of statistics wikipedia , lookup

Transcript
Ch. 8: Estimation
Section
1
2
3
4
Y. Butterworth
Title
Introduction to the Chapter
Estimating Mu when Sigma is Known
Estimating Mu when Sigma is Unknown
Estimating p in the Binomial Distribution
Estimating Mu1–Mu2 and p1–p2
Ch. 8 Brase’s 9th Edition
Notes Pages
2
2–5
6–8
9 – 11
12 – 16
1
Overview of Chapter 8 Concepts
The best point estimate of the population mean is the same mean, because:
1)
It is more consistent, meaning that it has less variation, than any other
estimator for the population
2)
It is an unbiased estimator of the population mean, meaning that it
symmetrically estimates over and under the population mean.
However, a better estimate than a point estimate exists. It is a confidence interval,
which is likely to contain the true population mean, based upon Normal theory.
Here is how and what we will be doing in this chapter:

We will use the standard normal distribution to define the confidence
interval with, alpha (symbol: α) This is the probability that the true value
of the mean is outside the interval created.

We will say that the level of confidence/degree of confidence that the
interval actually contains the mean is 1 – α. What this means is that if we
were to create 100 confidence intervals based upon 100 different samples,
then we should expect 1 –α of those intervals to contain the true
population mens.
§8.1 Estimating Mu when Sigma is Known
We know that sample means are normally distributed and thus there is a small probability
that they fall in the tails of the normal distribution. We will define the probability that a
value falls within the tail of the normal distribution as α. Due to the symmetry of the
normal distribution there is only 1/2 α probability that a value will fall in either tail. We’ll
write that as α/2.
1–α %
Confidence
-zα/2 0 zα/2
The values that are associated with the 1–α % Confidence Region are called critical
values and are denoted by -Zα/2 and Zα/2. We are going to practice finding critical values
based the z-table. We will also use the table in your book to find the critical values, and I
will also show you how to use a t-table to find critical values for the normal distribution.
Note: Your book uses “c” in place of α, but that is not typical notation and I will use α in its place.
Y. Butterworth
Ch. 8 Brase’s 9th Edition
2
Example:
Find the critical value corresponding to 90% level of confidence.
Step 1: Find α (remember that 1–α% is the level of confidence)
Step 2:
Split α into two tails and draw a picture
Step 3: Find the z-score
P(Z < zα/2) = α/2
Note: This is the same as finding normal values given a pre-defined probability. On your calculator this is
done with the invnormal(α/2
There are 2 alternate ways of finding the z-score associated with a 1–α% level of
confidence.
Alternate #1: Table of Critical Values (Table 5 on A23 in Appendix II or p. 332)
Step 1: Look up the level of confidence desired
Note: The draw back is that there are only values for 70-95 by increments of 5 and 98 &
99% levels of confidence. Another draw back is that this table is not always available.
Alternate #2:
Step 1:
The t-table on A24 of Appendix II
In the left column find the ∞ (the degrees of freedom where t is
approximately a standard normal)
Step 2:
Along the top in the One-Tail find α/2 or in the Two-Tail find α
Step 3:
Pinpoint the value that corresponds to step 1 & step 2 in the
“body” of the t-table.
Note: The draw back is that there are only values for 50%, 75%, 80-95% by increments of 5, 98%, 99% &
99.9%. Another draw back, depending on the table, is that the values may differ from those in a z-table
(although they are typically more accurate than a z-table).
Y. Butterworth
Ch. 8 Brase’s 9th Edition
3
Now, you might be wondering why we find critical values in the way that we do, and
how that relates to the normal distribution, so I will include it in my notes, although I
may not take the time in the class to go over the derivation as it is mostly Algebra.
We are discussing the distribution of the sample means and therefore when we are
discussing the standard normal we will be talking about a standard normal where
μ = population mean and σ = σ / √n
∴ z = x-bar – μ
σ
/ √n
P(-zα/2 < Z < zα/2) = 1 – α and knowing that
P(-zα/2 < x-bar – μ < zα/2) = 1 – α
σ
/ √n
z = x-bar – μ
σ
/ √n
and using a little Algebra to solve a
compound inequality – for μ
P(x-bar – zα/2 • (σ / √n) < μ < x-bar + zα/2 • (σ / √n)) = 1 – α
Thus, you see that the mean is between the values of x-bar minus zα/2 • (σ / √n) and x-bar
plus zα/2 • (σ / √n) with probability 1–α. We call the zα/2 • (σ / √n) the margin of error and
denote it with a “E”.
Finding Confidence Intervals when σ is known
1) You must know that the value comes from an approximately normal distribution
unless n ≥ 30
2) Standard deviation of the population must be known
3) Find the critical values
4) Compute the margin of error. E = zα/2 • (σ / √n)
5) Compute the interval x-bar ± E
Example:
If a random sample of size n = 20 from a normal population with
variance 225 has mean x-bar of 64.3. Construct a 95% confidence
interval for the population mean.
Example:
The axial loads of 175 cans have a sample mean of 267.1 lbs and
population standard deviation of 22.1 lbs. Find a 99% confidence
interval for the population mean.
Y. Butterworth
Ch. 8 Brase’s 9th Edition
4
Sometimes we might need to conduct a study in order to get sample data by which to
make inferences about the population. If this is the case, we need to make sure that we
have a large enough sample in order to make the desired inferences. In order to decide
how large of a sample to compute, we need only to rewrite the margin of error in order to
find the desired sample size to make inferences about the population mean.
E = zα/2 • (σ / √n) so this means that E√n = zα/2 • σ which means that √n = zα/2 • σ
E
2
and this means that n =
zα/2 • σ
E
[ ]
What do we need to decide on sample size
1)
Decide on the confidence level
2)
Decide on the margin of error
3)
Know σ
Note: If σ is unknown one way we can get around the problem is to estimate σ using the range rule of
thumb based upon previous studies. We will discuss another theoretical method in the next section
however.
2
4)
Find n based upon
n=
[ ]
zα/2 • σ
E
Example:
Suppose that the weights of all fox terriers dogs are normally
distributed with population standard deviation 2.5 kg. How large a
sample should be taken in order to be 95% confident that the
sample mean doesn’t differ from the population mean by more
than 0.5kg?
Example:
You want to find an estimate for the population height of men in
the US. You want to be 90% confident that the true population
mean is within 2 inches of the true population mean. How large of
sample must you draw to have this level of confidence and this
margin of error?
Y. Butterworth
Ch. 8 Brase’s 9th Edition
5
§8.2 Estimating Mu when Sigma is Unknown
When the standard deviation of the population is unknown the sampling distribution of
the sample means does not follow the normal distribution but another related distribution
called Student’s t-distribution. This distribution was discovered by a brewer for Guiness
Brewing Company, but he was not allowed to publish his findings under his true name
and although it was really Gosset’s distribution, it has come to be known in the field of
science as Student’s t-distribution and thus it shall always be known as that!
Facts about Student’s t-distribution
1)
When s is used to approximate σ, the distribution of x-bar’s follows this
distribution
2)
Student’s t-distribution is symmetric about its mean (zero) and approximately
bell-shaped
3)
Student’s t-distribution is wider, flatter with thicker tails than the normal
distribution
4)
As the number of samples increases the distribution becomes more and more like
the normal distribution (approximately 1000 is fairly normal)
5)
The degrees of freedom, d.f. is n – 1 (in advanced texts this is referred to as ν, the Greek
letter nu)
6)
Still must be known that the data comes from an approximately normal
distribution and if this is unknown then must be shown that the variables are
approximately normally distributed (see the last chapter) or n ≥ 30
First, let’s practice using the table and then learn how to find the values on your
calculator. I’ll need to upload a nice little program to everyone’s calculator before we
can use our calculators to do that, however.
Finding critical values for a confidence interval when the population standard deviation is
unknown is the equivalent of finding critical values for the normal distribution, we just
need an extra piece of information – the degrees of freedom (n – 1).
Example:
Using the t-table in Appendix II on A24 find the critical values for
a sample size of 23 for which we want a 90% confidence interval
for the population mean.
Step 1: Find α
Y. Butterworth
Step 2:
Compute the degrees of freedom –
Step 3:
Look in the top row for area in two-tails to find α
Step 4:
Look along the left column to find the degrees of freedom
Step 5:
Find the critical value in the body of the table
Ch. 8 Brase’s 9th Edition
n–1
6
Example:
Using the t-table in Appendix II on A24 find the critical values for
a sample of size 48 for which we want a 85% confidence interval
for the population mean.
Note: There is no df = 47 in the table. Convention says that when there is no corresponding degrees of
freedom that we will use the next lower degrees of freedom available.
Now, let’s put it all together and find confidence intervals for the population mean based
upon Student’s t-distribution.
Finding Confidence Intervals when σ is unknown
1) You must know that the value comes from an approximately normal distribution
unless n ≥ 30
2) Standard deviation of the sample must be computed/computable
3) Find the critical values, tn–1,α
4) Compute the margin of error. E = tn–1,α • (s / √n)
5) Compute the interval x-bar ± E
Example:
We have a sample of size 37 with a sample mean of 20 and sample
standard deviation of 2. Find a 90% confidence interval for the
true population mean.
Example:
The data below represents the ages of people when they were 1st
diagnosed with cancer. Show that a CI is relevant and give a 95%
CI for the true population mean.
Stem (x10)
Leaf(x1)
0
2
1
57
2
779
3
5569
4
01
5
0
Step 1:
Y. Butterworth
Calculate the sample mean & std. dev. & median
Ch. 8 Brase’s 9th Edition
7
Step 2:
Y. Butterworth
Investigate normality
a)
View stem-and-leaf
b)
Normal probability plot
c)
Mean vs Median
d)
Pearson’s Skewness
e)
Outliers?
Step 3:
Step 4:
Find the critical value: tn–1,α
Calculate the margin of error, E
Step 5:
Compute the interval
Ch. 8 Brase’s 9th Edition
8
§8.3 Estimating p in the Binomial Distribution
Notation
p = population proportion
p^ = x
n
q^ = 1 – p^
the sample proportion (read: p-hat)
x = # of successes (recall binomial)
n = # of trials
the complement of the sample proportion
α = probability that r.v. is in the tails of the dist.
1– α = Level of Confidence (be careful of how you interpret this!)
zα/2
the critical value for our confidence interval
Given this notation, then we can create (1–α)% confidence interval for the true
population mean in the following way:
If the following assumptions are met:
1)
Fixed number of trials
{
Binomial
Assumptions
2)
Trials are independent
3)
Two categories for the outcomes
4)
Probabilities remain constant for each trial
Normality  5)
Assumption
np≥5 & nq≥5
(1– α)% Confidence Interval for Population Proportion
With margin of error:
The (1 – α)% CI for p is:
E = zα/2
√
p^ ± E
This interval can be written in the form:
Y. Butterworth
p^ q^
n
which creates an interval for which one has
a (1–α) level of confidence, that it contains
the true population proportion.
^ – E, p^ + E)
p^ – E < p < p^ + E or (p
Ch. 8 Brase’s 9th Edition
9
Now we will do a quick example so you can get the hang of the computation and then we
will do an example with the data that you created in our last lab on random number
generation and finding sample proportions.
Example:
A sample survey at ta supermarket showed that 204 of 300
shoppers use Cent’s-Off coupons regularly. Find a 99%
confidence interval for the true population proportion of shoppers
that use Cent’s-Off coupons.
Step 1:Determine the values of:
n = ________
&
x = _______
Step 2:Calculate p^ & q^
p^ = ________
&
q^ = ________
Step 3:Find α [(1 – Level of confidence)]
α = _________
Step 4:Find zα/2 by looking up in
large t, in t-table using two-tailed test for α
use critical value for 99% listed on z-table
look up (Level of Confidence) + α/2 in the
body of positive z-value table (right-tail)
look up α/2 in the body of the
negative z-table (left-tail)
1)
2)
3)
4)
Step 5:Calculate the margin of error
E = ___________
Step 6:Give the confidence interval in interval notation
Here are a couple more examples that we can work together or you can practice on your
own.
Example:
In a random sample of 250 T.V. viewers in a certain area, 190 had
seen a certain controversial program. Find a 90% CI for the true
population proportion.
Example:
A study is being made to estimate the proportion of voters in a
sizeable community who favor the construction of a nuclear power
plant. It is found that only 140 of the 400 voters selected at
random favor the project, find a 95% CI for the proportion of all
voters in this community who favor the project.
Y. Butterworth
Ch. 8 Brase’s 9th Edition
10
Our knowledge of confidence intervals can also be used in a different way. We can use it
to calculate a sample size to poll in order to achieve a set level of confidence and margin
of error. This formula is found by solving the margin of error for n.
n =
[zα/2]2 p^ q^
E2
You may wonder why you would want to do this! Here is a real life example of how I
used it in my consulting work. A large city wanted to conduct a survey to find out
customer satisfaction with their garbage service. They want me to come up with a plan
for conducting the survey, so I asked them a few questions pertaining to margin of error
and set a confidence level at a fairly hefty size of 90% (survey’s seldom achieve that
great of level of confidence due to sampling errors). With my calculation I indicated to
them that they should poll around a thousand people (mind you this was in an area of
around 100,000).
Example:
If your client wants a 90% confidence level and a margin of error
equal to 0.2, calculate the sample size needed to ascertain that 90%
confidence is achieved if a previous poll showed that p-hat was
0.47.
*Note: When a p-hat is not known, we use p-hat=q-hat=0.5.
Y. Butterworth
Ch. 8 Brase’s 9th Edition
11
§8.4 Estimating Mu1–Mu2 and p1–p2
We’ve got the idea of how to compute confidence intervals by now, so let’s discuss how
we would use intervals for differences. First, we need to make sure that we have 2
independent samples. Independent samples means that we can’t make pairing from one
data point to another. Examples of dependent data are time series data (same object is
measured twice) or data gathered for related experimental units (such as data gathered for
twins).
Differences are interested in the differences between proportions or means.
Proportions:
√
E = zα/2
p^1 q^1
n1
Interval:
(p^1 – p^2) ± E
+
p^2 q^2
n2
Means with σ known
E = z α/2
√
Interval:
(x1 – x2) ± E
σ12
n1
+
σ22
n2
Means with σ unknown
E = tnsmall-1, α/2
Interval:
√
s12 +
n1
s2 2
n2
(x1 – x2) ± E
Note: If σ1 = σ2, but still unknown, then there is another calculation. See exercise #27 on p. 385 if you are
interested.
Interpretations of Intervals
We can make inferences based upon the confidence intervals created that we can refer
back to after we have discussed hypothesis testing in Chapter 9. Your book begins to
make the inferences very informally at this time. I want you to know that all inferences
made based upon confidence intervals for means are 100% valid and can be used to
conduct a hypothesis test. I also want to warn you, that although we can make inferences
about proportions using confidence intervals they are not as valid as inferences made
using hypothesis testing as it is introduced in Chapter 9. It is generally safe to make a
Y. Butterworth
Ch. 8 Brase’s 9th Edition
12
preliminary statement based upon a confidence interval for a proportion but an actual
inference should only be made and stated as a conclusion to a hypothesis test based on
the traditional or p-value method introduced in Chapter 9. The reason for this is that the
distributions are not the same for the test statistic and the confidence interval and
therefore sometimes conclusions can be invalid.
If the interval for μ1 – μ2 is all positive values then it can be inferred, with 1-α%
confidence that μ1 > μ2
If the interval for μ1 – μ2 is all negative values then it can be inferred, with 1-α%
confidence that μ1 < μ2
If the interval for μ1 – μ2 has both positive and negative values then it can be inferred,
with 1-α% confidence that μ1 is not different from μ2
Your book makes the same generalizations for proportions. See my caution above!
Without further explanation, let’s do some examples. Let’s call your text into play.
Example 1: #8 p. 378 from Brase/Brase’s 9th edition, Understandable Statistics
Inorganic phosphorous is a naturally occurring element in all plants and
animals, with concentrations increasing progressively up the food chain.
Geochemical surveys take soil samples to determine phosphorous content
(in ppm). A high phosphorous content may or may not indicate an ancient
burial site, food storage site, or even a garbage dump. Independent
random samples from two regions gave the following phosphorous
measurements (in ppm). Assume the distribution of phosphorous is moundshaped and symmetric for these two regions.
Region 1:
855
1550 1230 875
1080 2330 1850 1860 2340
1080 910
1130 1450 1260 1010
Region 2:
Y. Butterworth
540
640
810
1180
790
1160
1230
1050
1770
1020
960
1650
860
890
a)
Which should be used, a z-interval or a t-interval? Why?
b)
What are the degrees of freedom?
c)
Find the critical value for an 80% CI using your calculator.
d)
Write out the margin of error calculations for a confidence
level of 80% for the difference between the mean in R1 & R2.
Ch. 8 Brase’s 9th Edition
13
e)
Using your TI-83/84 find an 80% CI for μ1 – μ2
Note: We can use the data to compute the CI without even finding the mean and standard deviation with
the TI 83/84 calculators
f)
Explain what the confidence interval means in context of this
problem. Use information about the interval containing all
positive, all negative or both positive and negative.
g)
At the 80% level of confidence, is one region more interesting than
the other from a geochemical perspective? Use the information
stated in f).
Example 2: #14 p. 381 in Brase/Brase’s 9th edition, Understandable Statistics
Most married couples have two or three personality preferences in
common (see the reference to the Myers-Briggs Test in #13). Myers used a
random sample of 375 married couples and found that 132 has three
preferences in common. Another random sample of 571 couples showed
that 217 had two personality preferences in common. Let p1 be the
population proportions of all married couples with 3 preferences in
common and p2 be the population proportion of all married couples with 2
preferences in common.
a)
Find p-hat1
b)
Find p-hat2
c)
Compute the margin of error for the difference between population 1 & 2
at the 90% confidence level.
Y. Butterworth
Ch. 8 Brase’s 9th Edition
14
d)
Use your TI-83/84 to calculate the 90% CI for the difference between
population proportions for pop 1 & pop 2.
e)
What do the values contained within the interval seem to indicate about
the difference in the couples with 3 preferences in common and the
couples with 2 preferences in common?
Example 3: #18 p. 381 in Brase/Brase’s 9th edition, Understandable Statistics
“Parental Sensitivity to Infant Cues: Similarities and Differences Between
Mothers and Fathers,” by MV Graham (Journal of Pediatric Nursing, Vol. 8, No.
6), reports a study of parental empathy for sensitivity cues and baby
temperament (higher mean scores means more empathy). Let x1 be a random
variable that represent the score of a mother on an empathy test (in regards
to her baby). Let x2 be the empathy score of a father. A random sample of
32 mothers gave a sample mean of 69.44. Another random sample of 32
fathers gave a mean of 59. Assume the population standard deviation is
11.69 for mothers and 11.60 for fathers.
a)
Which should be used, a z-interval or a t-interval? Why?
b)
What is the correct critical value for a 99% CI?
c)
Compute the margin of error for the difference between mother’s mean
empathy score and father’s mean empathy score.
d)
Use your calculator to compute the 99% CI for the difference in mother’s
and father’s mean empathy scores.
Y. Butterworth
Ch. 8 Brase’s 9th Edition
15
e)
What does the confidence interval tell about the relationship between
empathy scores for mothers and fathers at the 99% confidence level?
f)
Do you think we could draw the same conclusions as in e) for the 95%
confidence level? Why or why not?
Y. Butterworth
Ch. 8 Brase’s 9th Edition
16