Download stat11t_Chapter7S

Document related concepts
no text concepts found
Transcript
Lecture Slides
Elementary Statistics
Eleventh Edition
and the Triola Statistics Series
by Mario F. Triola
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 1
Chapter 7
Estimates and Sample Sizes
7-1 Review and Preview
7-2 Estimating a Population Proportion
7-3 Estimating a Population Mean: σ Known
7-4 Estimating a Population Mean: σ Not Known
7-5 Estimating a Population Variance
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 2
Section 7-1
Review and Preview
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 3
Preview
This chapter presents the beginning of
inferential statistics.
 The two major activities of inferential statistics are:
(1) to use sample data to estimate values of a
population parameters
(2) to test hypotheses or claims made about
population parameters – later chapters.
 We introduce methods for estimating values of
these important population parameters:
proportions, means, and variances.
 We also present methods for determining sample
sizes necessary to estimate those parameters.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 4
Section 7-2
Estimating a Population
Proportion
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 5
Key Concept
In this section we present methods for using a sample proportion (like
presidential approval proportion in a poll of 1200) to estimate the value of a
population proportion (entire US population).
•
ˆ
The sample proportion p is the best point estimate (one number) of the unknown
population proportion p.
A point estimate is a single value (or point) used to approximate a population
parameter.
•
We can use a sample proportion distribution to construct a confidence interval of
a population proportion.
phat is used as the point estimate of p because it is unbiased and it is
the most consistent of the estimators that could be used.
Unbiased – distribution of sample proportions tends to center about
the value of p, not over/under estimate it.
Consistent – standard deviation of sample proportions tends to be
smaller than the standard deviation of any other unbiased estimators.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 6
Example:
In the Chapter Problem we noted that in a Pew
Research Center poll, 70% of 1501 randomly selected
adults in the United States believe in global warming,
so the sample proportion is
p̂ = 0.70.
Find the best point estimate of the proportion of all
adults in the United States who believe in global
warming.
Because the sample proportion is the best point
estimate of the population proportion, we conclude
that the best point estimate of p is 0.70.
When using the sample results to estimate the
percentage of all adults in the United States who
believe in global warming, the best estimate is 70%.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 7
Sample proportion is a good estimate of population proportion, but
just how good is it???
A confidence interval (CI) is a range (or an interval) of values used
to estimate the true value of a population parameter .
For example: presidential approval rating is 51% with 3% margin of
error => CI = [48%,54%].
It is based on known or approximated distribution of sample
proportion – it is approximately normal via normal
approximation of the binomial.
A confidence level is the probability 1 –  that the confidence
interval actually does contain the population parameter, assuming
that the estimation process of creating a confidence interval is
repeated a large number of times.
(The confidence level is also called degree of confidence, or the
confidence coefficient.)
Most common choices for confidence level are
are
90%,
95%,
or 99%
( = 10%), ( = 5%),
( = 1%)
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 8
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 9
Interpreting a Confidence Interval
We must be careful to interpret confidence intervals correctly.
There is a correct interpretation and many different and creative incorrect interpretations
of the confidence interval.
For example:
0.677 < p < 0.723. (70% with margin of error 2.3%)
“We are 95% confident that the interval from 0.677 to 0.723 actually contains the true
value of the population proportion p.”
This means that if we were to select many different samples of size 1501 and construct
the corresponding confidence intervals, 95% of them would actually contain the value of
the population proportion p.
Note that in this correct interpretation, the level of 95% refers to the success rate of the
process being used to estimate the proportion.
Incorrect interpretations:
1) There is 95% chance that the true value of p falls between 0.677 and 0.723.
Any specific population has a fixed and constant value p, and a confidence interval
constructed from a sample either contains p or not – there is no probability involved.
2) 95% of sample proportions fall between 0.677 and 0.723.
No – we are trying to estimate the true population proportion, not sample proportions.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 10
Interpreting a Confidence Interval
19
 0.95
20
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 11
Critical Values
We use normal approximation of the binomial
distribution of sample proportion
a standard z score is used to distinguish between
sample statistics that are likely to occur and those
that are unlikely to occur.
Such a z score is called a critical value.
Critical values are based on the following observations:
1)
Under certain conditions, the sampling distribution of sample proportions can
be approximated by a normal distribution.
That is binomial approximation of normal distribution from Ch 6.
2)
A z score associated with a sample proportion has a probability of /2 of falling
in the right tail.
2)
The z score separating the right-tail region is commonly denoted by z/2 and is
referred to as a critical value because it is on the borderline separating z scores
from sample proportions that are likely to occur from those that are unlikely to
occur.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 12
Finding z2 for a 95%
Confidence Level
 = 5%
 2 = 2.5% = 0.025
TI 2nd VARS
invNormal(0.975)
Excel =NORMSINV(0.975)
z2
-z2
StatDisk
Analysis->Prob Distibutions->Normal
> alpha = 0.05
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
Critical Values
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 13
When data from a simple random sample are used to estimate a
population proportion p, the margin of error, denoted by E, is
the maximum likely difference (with probability 1 – , such as
0.95) between the observed proportion p̂ and the true value of
the population proportion p.
The margin of error E is also called the maximum error of the
estimate and can be found by multiplying the critical value and
the standard deviation of the sample proportions:
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 14
Margin of Error for Proportions
How to calculate mean and standard deviation of the set of population proportion s?
Suppose the sample size is n and the actual unknown population proportion is p.
For proportion s, we have n independen t trials for each person/ele ment of a sample.
Each trial is success with probabilit y p and failure with probabilit y q  1 - p.
 We deal with binomial distributi on. Its mean is np and standard deviation is npq .
E  z 2
ˆˆ
pq
n
For large n, the binomial can be approximat ed by normal distributi on.
However, here we are interested in the proportion rather tha n just number of successes.
When we divide all elements by the total number of elements n, the mean and standard
deviation are divided by the same n 
npq
np
pq
p
 pˆ 

d
n
n
n
Furhermore , if each element in an approximat ely normal distributi on is divided by the same
constant, the result wil l still be approximat ely normal with above parameters 
pˆ  p
z
 Normal (0,1) 
pq
n
For 1   CI take z / 2
 pˆ 
 There is 1   probabilit y that  z / 2  z  z / 2 
 z / 2 
pˆ  p
 z / 2   z / 2
pq
n
pq
 pˆ  p  z / 2
n
p̂
pq
n
pq
  E  pˆ  p  E   pˆ  E   p  E  pˆ 
n
pˆ Copyright
 E  p©2010,
pˆ 2007,
E 2004 Pearson Education, Inc. All Rights Reserved.
Let E  z / 2
p̂
7.1 - 15
Procedure for Constructing
a Confidence Interval for p
1.
Verify that the required assumptions are satisfied.
1)The sample is a simple random sample.
2) The conditions for the binomial distribution are satisfied.
3) The normal distribution can be used to approximate the distribution of
sample proportions because np  5, and nq  5 are both satisfied.
2. Find the critical value z/2 that corresponds to the desired
confidence level (table or technology).
3.
Evaluate the margin of error
ˆˆ n
E  z 2 pq
4. Using the value of the calculated margin of error, E and the value
of the sample proportion, p, find the values of p – E and p + E.
Substitute those values in the general format for the confidence
interval:
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 16
Example:
In the Chapter Problem we noted that a Pew Research
Center poll of 1501 randomly selected U.S. adults
showed that 70% of the respondents believe in global
warming. The sample results are n = 1501, and pˆ  0.70
a. Find the margin of error E that corresponds to a
95% confidence level.
b. Find the 95% confidence interval estimate of the
population proportion p.
c. Based on the results, can we safely conclude that
the majority of adults believe in global warming?
d. Assuming that you are a newspaper reporter, write
a brief statement that accurately describes the
results and includes all of the relevant information.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 17
Requirement check:
1) simple random sample;
2) binomial distribution: fixed number of trials = 1501, trials are
independent, two categories of outcomes (believes or does not);
probability remains constant.
3) Number of successes and failures 1501*0.7, 1501*0.3 are both >5.
a) Use the formula to find the margin of error.
ˆˆ
pq
E  z 2
 1.96
n
E  0.023183
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
 0.70  0.30 
1501
7.1 - 18
b) The 95% confidence interval:
pˆ  E  p  pˆ  E
TI Stat->Tests->1-PropZInt
0.70  0.023183  p  0.70  0.023183
0.677  p  0.723
STATDISK
Analysis->Confidence Intervals->One Sample Proportion
Margin of error, E = 0.0231784
> pbar = 0.7
> n = 1501
> alpha = 0.05
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
>
> E = z_alpha * sqrt( pbar*(1-pbar)/n )
>E
[1] 0.02318288
> CI = c(pbar - E, pbar + E)
> CI
[1] 0.6768171 0.7231829
95% Confidence Interval:
0.6770214 < p < 0.7233783
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 19
Excel
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 20
c)
Based on the confidence interval obtained in part (b), it does
appear that the proportion of adults who believe in global warming
is greater than 0.5 (or 50%), so we can safely conclude that the
majority of adults believe in global warming.
Because the limits of 0.677 and 0.723 are likely to contain the true
population proportion, it appears that the population proportion is
a value greater than 0.5.
d)
Here is one statement that summarizes the results: 70% of United
States adults believe that the earth is getting warmer. That
percentage is based on a Pew Research Center poll of 1501
randomly selected adults in the United States.
In theory, in 95% of such polls, the percentage should differ by no
more than 2.3 percentage points in either direction from the
percentage that would be found by interviewing all adults in the
United States.
Caution:
Never follow the common misconception that poll results are unreliable if the sample
size is a small percentage of the population size.
The population size is usually not a factor in determining the reliability of a poll.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 21
Sample Size
Suppose we want to collect sample data in order to
estimate some population proportion with given accuracy.
The question is how many sample items must be obtained?
Note that sample size
does NOT depend on the
population size N !!!
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 22
TI 2nd VARS -> invNormal(0.975)
> pbar = 0.73; qbar = 1-pbar;
> alpha = 0.05
> E = 0.03
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
> n = z_alpha^2 * pbar * qbar / E^2
>n
[1] 841.2795
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
> pbar = 0.5; qbar = 1-pbar;
> alpha = 0.05
> E = 0.03
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
> n = z_alpha^2 * pbar * qbar / E^2
>n
[1] 1067.072
7.1 - 23
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 24
My example Obtaining point estimate and margin of
error from CI
A journal article says that of 1200 people surveyed the
99% confidence interval for presidential approval is
(37%,42%)
> CI_Lower = 0.37
> CI_Upper = 0.42
> pbar = ( CI_Lower + CI_Upper ) /2
> pbar
[1] 0.395
> E = (CI_Upper - CI_Lower) /2
>E
[1] 0.025
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 25
Section 7-3
Estimating a Population
Mean:  Known
In addition to knowing the values of the sample data
or statistics, in this section we also assume, we
know the value of the population standard deviation,
.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 26
Point Estimate of the
Population Mean
The sample mean x is the best point estimate
of the population mean µ.
1. For all populations, the sample mean x is an unbiased
estimator of the population mean , meaning that the
distribution of sample means tends to center about
the value of the population mean . (review Ch 6)
2. For many populations, the distribution of sample
means x tends to be more consistent (with less
variation) than the distributions of other sample
statistics.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 27
Confidence Interval for
Estimating a Population Mean
(with  Known)
 = population mean
 = population standard deviation
x = sample mean
n = number of sample values
E = margin of error
z/2 = z score separating an area of a/2 in the
right tail of the standard normal
distribution
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 28
Confidence Interval for Estimating a Population
Mean (with  Known)
1. The sample is a simple random sample.
2. The value of the population standard deviation  is known.
3. Either or both of these conditions is satisfied:
The population is normally distributed OR n > 30 – so that it is
approximately normally distributed by CLT.
In either case, the most important part is that:
If we collect simple random samples of size n, the sample means x
are either exactly or approximately Normal(,  )
z
x

n
 Normal (0,1) 
n
For 1   CI take z / 2
 There is 1   probabilit y that  z / 2  z  z / 2 
 z / 2 
x

 z / 2   z / 2

n
 x    z / 2

n
n

  E  x    E   x  E     E  x 
n
x  E ©2010,
 x2007,
 E 2004 Pearson Education, Inc. All Rights Reserved.
Copyright
Let E  z / 2
7.1 - 29
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 30
> xbar = 172.55
> n = 40
> sigma = 26
> alpha = 0.05
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
>
> E = z_alpha * sigma/sqrt(n)
>E
[1] 8.057335
> CI = c(xbar - E, xbar + E)
> CI
[1] 164.4927 180.6073
TI Stat->Tests->Zinterval
STATDISK Analysis->Confidence Intervals
->Mean One Sample
Margin of error, E = 8.057328
95% Confident the population mean is
within the range:
164.4927 < mean <180.6073
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 31
Finding a Sample Size for
Estimating a Population Mean
When collecting a simple random sample that will be used to
estimate a population mean  , how many sample values (n) must
be obtained to get a certain margin or error?
E  z / 2
 z / 2 
n

n
 E 

2
If the computed sample size n is not a whole number, round
the value of n up to the next larger whole number.
If we want more accurate estimate, we need to decrease the
margin of error E
=> sample size n must be substantially increased.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 32
Example:
Assume that we want to estimate the mean IQ score for the population of statistics
students.
How many statistics students must be randomly selected for IQ tests if we want 95%
confidence that the sample mean is within 3 IQ points of the population mean?
Note that IQ tests are typically designed so that mean is around 100 and standard
deviation is 15
 = 0.05
 /2 = 0.025
z / 2 = 1.96
E = 3
 = 15
n =
1.96 • 15
2 = 96.04 = 97
3
With a simple random sample of only 97 statistics
students, we will be 95% confident that the sample
mean is within 3 IQ points of the true population
mean .
> E = 3; sigma = 15;
> alpha = 0.05
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
> n = ( z_alpha*sigma/E )^2
>n
[1] 96.03647
TI 2nd VARS -> invNormal(0.975)
Excel =NORMSINV(0.975)
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 33
Section 7-4
Estimating a Population
Mean:  Not Known –
most typical situation
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 34
Sample Mean
As before, the sample mean x is the best point estimate of the population mean.
But now we do not assume that we know population standard deviation , still we can
estimate it with sample standard deviation s.
x
 NOT Normal, but t  distributi on

s
n
n
For 1   CI take t / 2  from t  table or calculator
z
x
 Normal (0,1)  t 
 There is 1   probabilit y that  t / 2  t  t / 2 
 t / 2
x
s

 t / 2  E  t / 2
and x  E    x  E
s
n
n
William Gossett,
Guinness Brewery
or TI84
The number of degrees of freedom for a
collection of sample data is the number
of sample values that can vary after
certain restrictions have been imposed
on all data values.
The degree of freedom is often
abbreviated df.
For our case the restriction is that the
sample mean is already found:
sum(x)/n = xbar – fixed number.
If say you are averaging 25 test scores
and already found that the sample mean
is 77, then the sum is fixed at 77*25
=1925.
Then, we can freely assign only 24
scores, the last one is fixed to have sum
= 1925.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
=> degrees of freedom df = n –7.1
1
- 35
Procedure for Constructing a
Confidence Interval for µ
(With σ Unknown)
1. Verify that the requirements are satisfied.
2. Using n – 1 degrees of freedom, refer to Table A-3 or use
technology to find the critical value t2 that corresponds to
the desired confidence level.
3. Evaluate the margin of error E = t2 • s / n .
4. Find the values of x  E and x  E. Substitute those
values in the general format for the confidence interval:
x E    x E
5. Round the resulting confidence interval limits.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 36
TI 2nd VARS
invT(0.975,6)
STATDISK
Analysis->Probability Distributions->Student t-distrib
t Value: -2.446909
Prob Dens: 0.0339541
> # Example 1 How to find critical value t_alpha2
> alpha = 0.05
>n=7
> df = n-1
> t_alpha2 = qt(1-alpha/2,df)
> t_alpha2
[1] 2.446912
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
Cumulative Probs
Left: 0.025000
Right: 0.975000
2 Tailed: 0.050000
Central: 0.950000
6 Degrees Freedom
7.1 - 37
Example:
A common claim is that garlic lowers cholesterol levels.
In a test of the effectiveness of garlic, 49 subjects were treated
with doses of raw garlic, and their cholesterol levels were
measured before and after the treatment.
The changes in their levels of LDL cholesterol (in mg/dL) have a
mean of 0.4 and a standard deviation of 21.0.
Use the sample statistics of n = 49, x = 0.4 and s = 21.0 to
construct a 95% confidence interval estimate of the mean net
change in LDL cholesterol after the garlic treatment.
What does the confidence interval suggest about the
effectiveness of garlic in reducing LDL cholesterol?
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 38
Requirements are satisfied:
simple random sample
n = 49 (i.e., n > 30 – good enough even for arbitrary
non-normal distribution).
95% implies alpha = 0.05.
With n = 49, the df = 49 – 1 = 48
TI 2nd VARS
invT(0.975,48)
Closest df in table is 50, two tails, so t/2 = 2.009
Using t/2 = 2.009, s = 21.0 and n = 49 the
margin of error is:
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 39
x  0.4, E  6.027
TI
Construct the confidence
interval:
x E    x E
0.4  6.027    0.4  6.027
5.6    6.4
STAT->TESTS->TInterval
> xbar = 0.4
> n = 49
> s = 21.0
> alpha = 0.05
> df = n-1
> t_alpha2 = qt(1-alpha/2,df)
> t_alpha2
[1] 2.010635
>
> E = t_alpha2 * s/sqrt(n)
>E
[1] 6.031904
> CI = c(xbar - E, xbar + E)
> CI
[1] -5.631904 6.431904
We are 95% confident that the limits of –5.6 and
6.4 actually do contain the value of , the mean of
the changes in LDL cholesterol for the
population.
Because the confidence interval limits contain
the value of 0, it is very possible that the mean of
the changes in LDL cholesterol is equal to 0,
suggesting that the garlic treatment did not affect
the LDL cholesterol levels.
It does not appear that the garlic treatment is
effective in lowering LDL cholesterol.
STATDISK Analysis->Confidence Intervals->Mean One Sample
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
Margin of error, E = 6.031898
95% Confident the population mean is within the range:
7.1
-5.631898 < mean <6.431898
- 40
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 41
Properties of the Student t Distribution
1. The Student t distribution is different for
different sample sizes (see the picture, for the
cases n = 3 and n = 12).
2. The Student t distribution has the same general
symmetric bell shape as the standard normal
distribution but it reflects the greater variability
(with wider distributions) that is expected with
small samples.
3. The Student t distribution has a mean of t = 0
(just as the standard normal distribution has a
mean of z = 0).
4. The standard deviation of the Student t
distribution varies with the sample size and is
greater than 1 (unlike the standard normal
distribution, which has a  = 1).
5. As the sample size n gets larger, the Student t
distribution gets closer to the normal
distribution.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 42
Choosing the Appropriate Distribution
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 43
Example:
1) n  11, x  24,   14  and population has normal distributi on
normal and σ is known  use z / 2
2) n  11, x  24, s  14  and population has normal distributi on
normal and σ is NOT known  use t / 2
3) n  11, x  24,   14  and population has skewed distributi on
NOT normal and n  30  can NOT use this section te chniques
4) n  40, x  24,   14  and population has skewed distributi on
n  30  can use CLT, and σ is known  use z / 2
5) n  40, x  24, s  14  and population has skewed distributi on
n  30 and s is NOT known  use t
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
 /2
7.1 - 44
Example, CI for mean women weight.
On TI calculator:
Store values in a list L1.
Stat->Tests-> TInterval
> DataFrame = read.csv("FHEALTH.csv")
> attach(DataFrame)
>
> x = WT # weight
>x
[1] 114.8 149.3 107.8 …..
> n = length(x)
>n
# check how big is the sample
[1] 40
> qqnorm(x); qqline(x); # check normality assumption
>
> xbar = mean(x)
> xbar
[1] 146.22
> s = sd(x)
# sigma is NOT given, use s - sample standard deviation
>s
[1] 37.62104
> alpha = 0.01
> df = n-1
> t_alpha2 = qt(1-alpha/2,df)
> t_alpha2
[1] 2.707913
STATDISK
>
1st use Datasets to open Female Health Exam
> E = t_alpha2 * s/sqrt(n) # error margin
2nd Data-> Descriptive Statistics on column 3 of weights
>E
3rd Analysis->Confidence Intervals->Mean One Sample
[1] 16.10777
Margin of error, E = 16.10776
> CI = c(xbar - E, xbar + E) # CI for mean mu of the population
> CI
99% Confident the population mean is within the range:
[1] 130.1122 162.3278
130.1122 < mean <162.3278
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 45
Excel computation for pulse in FHEALTH.XLS
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 46
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 47
Finding the Point Estimate and E from a CI
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 48
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 49
skip
Section 7-5
Estimating a Population
Variance
s
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 50
Chi-Square Distribution
In a normally distributed population with variance  2
assume that we randomly select independent samples
of size n and, for each sample, compute the sample
variance s2 (which is the square of the sample standard
deviation s).
The sample statistic 2 (pronounced chi-square) has a
sampling distribution called the chi-square distribution.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 51
Chi-Square Distribution
 =
2
(n – 1) s2
2
where
n = sample size
s 2 = sample variance
 2 = population variance
degrees of freedom = n – 1
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 52
Properties of the Distribution
of the Chi-Square Statistic
1. The chi-square distribution is not symmetric, unlike
the normal and Student t distributions.
As the number of degrees of freedom increases, the
distribution becomes more symmetric.
Chi-Square Distribution
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
Chi-Square Distribution for
df = 10 and df = 20
7.1 - 53
Properties of the Distribution
of the Chi-Square Statistic – cont.
2. The values of chi-square can be zero or positive, but they
cannot be negative.
3. The chi-square distribution is different for each number of
degrees of freedom, which is df = n – 1.
As the number of degrees of freedom increases, the chisquare distribution approaches a normal distribution.
In Table A-4, each critical value of 2 corresponds to an
area given in the top row of the table, and that area
represents the cumulative area located to the right of the
critical value.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 54
Example
A simple random sample of ten voltage levels
is obtained.
Construction of a confidence interval for the
population standard deviation  requires the
left and right critical values of 2
corresponding to a confidence level of 95%
and a sample size of n = 10.
Find the critical value of 2 separating an area
of 0.025 in the left tail, and find the critical
value of 2 separating an area of 0.025 in the
right tail.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 55
Example
Critical Values of the Chi-Square Distribution
Excel
=CHIINV(0.025,9)
=CHIINV(0.975,9)
> alpha = 0.05
> n = 10
> df = n-1
> df
[1] 9
> chi2L = qchisq(alpha/2,df)
> chi2L
[1] 2.700389
> chi2R = qchisq(1-alpha/2,df)
> chi2R
[1] 19.02277
STATDISK
Analysis->Prob Distr->Chi Squared
Chi Sq Value: 2.700392
Prob Dens: 0.0318663
Cumulative Probs
Left: 0.025000
Right: 0.975000
9 Degrees Freedom
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 56
Estimators of 
2
The sample variance s2 is the best
point estimate of the population
variance 2.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 57
Estimators of 
The sample standard deviation s is a
commonly used point estimate of 
(even though it is a biased estimate).
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 58
Confidence Interval for Estimating a
Population Standard Deviation or
Variance
Requirements:
1. The sample is a simple random sample.
2. The population must have normally
distributed values (even if the sample is
large).
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 59
Confidence Interval for Estimating a
Population Standard Deviation or
Variance
If we obtain simple random samples of size n from a normally distribute d population
with varia nce  2 , there is probabilit y (1   ) that the statistic :
(n  1) s 2
2
 
2
L
will fall between critical values  L2 and  R2 , i.e.
(n  1) s 2

2
 
2
R
1

2
L

2
(n  1) s
2

Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
1

2
R

(n  1) s 2

2
R
 
2
(n  1) s 2
 L2
7.1 - 60
Confidence Interval for Estimating a
Population Standard Deviation or
Variance
Confidence Interval for the Population
Standard Deviation 
n  1s

2
R
2
 
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
n  1s

2
2
L
7.1 - 61
Example:
The proper operation of typical home appliances requires voltage
levels that do not vary much.
Listed below are ten voltage levels (in volts) recorded in the
author’s home on ten different days.
These ten values have a standard deviation of s = 0.15 volt.
Use the sample data to construct a 95% confidence interval
estimate of the standard deviation of all voltage levels.
123.3 123.5 123.7 123.4 123.6 123.5 123.5 123.4 123.6 123.8
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 62
Example:
Requirements are satisfied: simple random
sample and normality
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 63
> s = 0.15
> CIvar = c( (n-1)*s^2/chi2R, (n-1)*s^2/chi2L )
> CIvar
[1] 0.01064514
0.07498918
> CIsd = sqrt(CI)
> CIsd
[1] 0.1031753
0.2738415
Example:
n = 10 so df = 10 – 1 = 9
Use table A-4 to find:
  2.700
2
L
and
  19.023
2
R
Construct the confidence interval: n = 10, s = 0.15
n  1s

2

2
R
10  10.15 
2
2
95% Confidence Interval for the st dev:
0.1031752 < SD < 0.2738414
 L2
2
19.023
n  1s


STATDISK
Analysis->Confidence Intervals->St-Dev One Sample
10  10.15 


2

2
2.700
95% Confidence Interval for the variance:
0.0106451 < VAR < 0.0749891
TI does not have it.
There are some special programs you have to
upload, see p 377.
Excel use DDXL p 377.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 64
Example:
Evaluation the preceding expression yields:
0.010645    0.075000
2
Finding the square root of each, yields this 95%
confidence interval estimate of the population
standard deviation:
0.10 volt    0.27 volt.
Based on this result, we have 95% confidence
that the limits of 0.10 volt and 0.27 volt contain
the true value of .
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 65
Determining Sample Sizes
The procedures for finding the sample size
necessary to estimate 2 are much more
complex than the procedures given earlier for
means and proportions.
Instead of using very complicated procedures,
we will use Table 7-2.
STATDISK also provides sample sizes.
With STATDISK, select Analysis, Sample Size
Determination, and then Estimate St Dev.
Minitab, Excel, and the TI-83/84 Plus calculator
do not provide such sample sizes.
7.1 - 66
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
Determining Sample Sizes
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 67
Example:
We want to estimate the standard deviation  of
all voltage levels in a home.
We want to be 95% confident that our estimate
is within 20% of the true value of .
How large should the sample be?
Assume that the population is normally
distributed.
From Table 7-2, we can see that 95% confidence
and an error of 20% for  correspond to a
sample of size 48.
=> We should obtain a simple random sample of
48 voltage levels form the population of voltage
levels.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
7.1 - 68