STAT 111 Introductory Statistics
Lecture 10: Confidence Intervals and
Hypothesis Tests
June 8, 2004
Today’s Topics
• Confidence intervals revisited
• Margin of error for confidence intervals
• Introduction to hypothesis testing
Confidence Intervals Revisited
• A level C confidence interval for some population
parameter θ is an interval [L, U] computed from
sample data by a method that has probability C of
producing an interval containing the true value of
the parameter
• In other words,
P(L ≤ θ ≤ U) = C,
where C is typically 90%, 95%, or 99%.
Confidence Intervals
• The general form of a confidence interval is
given by
estimate ± margin of error
• The estimate is our guess for the value of the
unknown population parameter θ.
• The margin of error shows how accurate we
believe our guess is, based on the variability of
the estimate.
Confidence Interval for a Population
Mean
• Suppose we choose a simple random sample of
size n from a population with unknown mean µ
and known standard deviation σ. Then a level C
confidence interval for µ is
xz
• z* satisfies
*

n
– P(-z* ≤ Z ≤ z*) = C
– P(Z < -z*) = P(Z > z*) = (1 – C)/2
Confidence Interval for a Parameter
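• $P(-z^* \le Z \le z^*) = C$
• $P(Z < -z^*) = P(Z > z^*) = \frac{1 - C}{2}$
[Figure: standard normal density of Z, with central area C between −z* and z* and tail area (1 − C)/2 beyond each of −z* and z*]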
Confidence Interval for a Population
Mean
• Recall the Central Limit Theorem.
• Suppose we have any population whose
distribution has mean µ and standard deviation σ.
If we draw a large enough SRS from this
population, then
$\bar{X}$ is approximately $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
• This is true regardless of what the actual
population distribution is.
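The Central Limit Theorem can be checked empirically. The sketch below is a hedged illustration, not part of the original slides: it assumes a strongly skewed Exponential(1) population (µ = σ = 1, chosen only for this example) and samples of size n = 50. The helper name `sample_means` is hypothetical. The simulated sample means should center near µ with a spread near σ/√n.

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Draw `reps` simple random samples of size n from an Exponential(1)
    population (mean 1, standard deviation 1) and return their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

n = 50
means = sample_means(n, reps=10_000)
print(round(statistics.fmean(means), 3))   # close to mu = 1
print(round(statistics.stdev(means), 3))   # close to sigma / sqrt(n), about 0.141
```

A histogram of `means` would also look roughly bell-shaped, even though the exponential population itself is far from normal.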
Confidence Interval for a Population
Mean
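• Hence, if the population follows a normal
distribution, or the sample size is sufficiently
large, we have
$P\left(-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right) = P(-z^* \le Z \le z^*) = C$
• Multiplying the inequalities through by σ/√n and solving for µ leads to
$P\left(\bar{X} - z^* \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z^* \frac{\sigma}{\sqrt{n}}\right) = C$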
Confidence Intervals for a Population
Mean
• For any confidence interval, there are two
possibilities:
– The interval contains the true value of the parameter
(in this case, µ).
– Our SRS was one of the few samples for which µ is
not contained in the interval.
• It is incorrect to say that there is probability C
that the unknown population parameter (µ) lies
within our particular confidence interval.
Confidence Interval for a Population
Mean
• It means that if we repeatedly sample from the
population, then the true population mean µ will
be covered by the constructed confidence
intervals (100C)% of the time.
• Remember! It is incorrect to say that the
probability that the true population mean µ lies
within the confidence interval is C.
• JAVA Applet for demonstrating confidence
intervals
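The repeated-sampling interpretation can be checked by simulation. This is a hedged sketch, not part of the original slides: it assumes a normal population with hypothetical values µ = 10, σ = 2 and n = 25, builds the interval x̄ ± 1.96·σ/√n for many simulated samples, and reports the fraction of intervals that cover µ. The function name `coverage` is our own.

```python
import math
import random

def coverage(mu, sigma, n, z_star, reps=10_000, seed=1):
    """Fraction of z-intervals xbar +/- z_star*sigma/sqrt(n) that contain
    the true mean mu, over `reps` simulated SRSs from Normal(mu, sigma)."""
    rng = random.Random(seed)
    m = z_star * sigma / math.sqrt(n)       # margin of error
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - m <= mu <= xbar + m:      # did this interval cover mu?
            hits += 1
    return hits / reps

print(coverage(mu=10.0, sigma=2.0, n=25, z_star=1.96))  # roughly 0.95
```

Each individual interval either contains µ or it does not; only the long-run proportion of covering intervals is close to C.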
Confidence Interval for a Population
Mean
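• Lower confidence limit: $\bar{x} - z^* \frac{\sigma}{\sqrt{n}}$
• Upper confidence limit: $\bar{x} + z^* \frac{\sigma}{\sqrt{n}}$
• Width on each side (margin of error): $m = z^* \frac{\sigma}{\sqrt{n}}$
• Width of the CI: $2 z^* \frac{\sigma}{\sqrt{n}}$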
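The limits above translate directly into a small function. This is a minimal sketch assuming σ is known and z* has already been chosen for the desired level C; the name `z_interval` and the input values in the usage line are hypothetical.

```python
import math

def z_interval(xbar, sigma, n, z_star):
    """Return (lower, upper) limits of the interval xbar +/- z_star*sigma/sqrt(n),
    assuming the population standard deviation sigma is known."""
    m = z_star * sigma / math.sqrt(n)   # margin of error
    return xbar - m, xbar + m

# Hypothetical inputs: xbar = 50, sigma = 10, n = 100, 95% level (z* = 1.96)
lower, upper = z_interval(xbar=50.0, sigma=10.0, n=100, z_star=1.96)
print(lower, upper, upper - lower)   # the width equals 2 * margin of error
```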
Commonly Used Confidence Levels
Confidence level C   1 − C   (1 − C)/2   z* = z_{(1−C)/2}
99%                  0.01    0.005       2.576
98%                  0.02    0.01        2.33
95%                  0.05    0.025       1.96
90%                  0.10    0.05        1.645
80%                  0.20    0.10        1.28
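The critical values in the table can be reproduced in software, which also covers levels the table omits. A hedged sketch using Python's standard-library `statistics.NormalDist` (the helper name `z_star` is ours): z* is the standard normal quantile with upper-tail area (1 − C)/2.

```python
from statistics import NormalDist

def z_star(C):
    """Critical value z* with P(-z* <= Z <= z*) = C for Z ~ N(0, 1),
    i.e. the point with upper-tail area (1 - C)/2."""
    return NormalDist().inv_cdf(1 - (1 - C) / 2)

for C in (0.99, 0.98, 0.95, 0.90, 0.80):
    print(f"{C:.0%}  z* = {z_star(C):.3f}")
```

Printed to three decimals, these agree with the table up to rounding in the last digit.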
Example 1
• The number and types of television programs
and commercials targeted at children are affected
by the amount of time children watch TV.
• A survey was conducted among 100 American
children, in which they were asked to record the
number of hours they watched TV per week.
• The sample mean is 27.191.
• The known population standard deviation is 8.
• Construct a 95% confidence interval for the
average weekly viewing time.
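One way to work this out, taking z* = 1.96 for 95% confidence from the table of common levels:

```latex
\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}
  = 27.191 \pm 1.96 \cdot \frac{8}{\sqrt{100}}
  = 27.191 \pm 1.568
  = (25.623,\ 28.759)
```

So the interval runs from about 25.6 to 28.8 hours per week.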
Example 2
• A study of preferred height for an experimental
keyboard with large forearm-wrist support was
conducted. 31 trained typists were selected, and
the preferred keyboard height was determined for
each of them.
• The resulting sample average height was 80 cm.
• Assume the preferred height is normally
distributed with σ = 2 cm.
• Calculate a 90% confidence interval for µ, the
true average preferred height for the population.
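A worked version, taking z* = 1.645 for 90% confidence:

```latex
\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}
  = 80 \pm 1.645 \cdot \frac{2}{\sqrt{31}}
  \approx 80 \pm 1.645 \cdot 0.3592
  \approx 80 \pm 0.591
  = (79.409,\ 80.591)
```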
Example 3
• Suppose we desire a confidence interval for the
true average stray-load loss µ (in watts) for a
certain type of induction motor when the line
current is held at 10 amps for a speed of 1500
rpm. Assume that stray-load loss is normally
distributed with σ = 3.0.
• If a sample of size 100 produces a mean stray-load loss of 58.3, compute a 99% confidence
interval for µ.
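A worked version, taking z* ≈ 2.576 for 99% confidence (using 2.575 gives the same interval to two decimals):

```latex
\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}
  = 58.3 \pm 2.576 \cdot \frac{3.0}{\sqrt{100}}
  \approx 58.3 \pm 0.773
  \approx (57.53,\ 59.07)
```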
Example 4
• The yield point of a particular type of mild-steel reinforcing bar is known to be normally
distributed with σ = 100.
• The composition of the bar has been slightly
modified without affecting either the normality or
the value of σ.
• If a sample of 25 modified bars results in a
sample average yield point of 8439 lb, compute a
92% confidence interval for the true average
yield point of the modified bar.
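The 92% level is not among the commonly tabulated ones, so z* must come from a full standard normal table or software: (1 − C)/2 = 0.04, and the upper 0.04 point is z* ≈ 1.75. A worked version under that value:

```latex
\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}
  = 8439 \pm 1.75 \cdot \frac{100}{\sqrt{25}}
  = 8439 \pm 35
  = (8404,\ 8474)
```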
Confidence Intervals (cont.)
• Confidence intervals for other parameters in a
population can also be constructed.
• In particular, confidence intervals can be
constructed for the standard deviation (or variance) of
a population whose distribution has known mean
µ.
• They can also be constructed for a population proportion p,
the proportion of the population in which some event occurs. (More on this one later on.)
Margin of Error of a Confidence
Interval
• The margin of error m is
$m = z^* \frac{\sigma}{\sqrt{n}}$
• The margin of error measures the precision of our
estimate, but it accounts only for random sampling error.
• The size of the margin of error depends on
– Confidence level
– Sample size
– Population standard deviation
Confidence Interval
• The length (width) of a confidence interval is
$\text{width} = 2 z^* \frac{\sigma}{\sqrt{n}}$
• Since the width is twice the margin of error, the width
of a confidence interval increases whenever the margin of error increases.
• The width of a confidence interval increases if
– Confidence level increases
– Sample size decreases
– Population standard deviation increases
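A quick numerical check of the three directions listed above. This is a hedged sketch: the helper `ci_width` is hypothetical, z* comes from Python's `statistics.NormalDist`, and the baseline values C = 0.95, n = 100, σ = 10 are arbitrary.

```python
import math
from statistics import NormalDist

def ci_width(C, n, sigma):
    """Width 2*z*·sigma/sqrt(n) of a level-C z-interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - C) / 2)
    return 2 * z * sigma / math.sqrt(n)

base = ci_width(C=0.95, n=100, sigma=10)
print(ci_width(0.99, 100, 10) > base)   # higher confidence level -> wider
print(ci_width(0.95, 25, 10) > base)    # smaller sample size -> wider
print(ci_width(0.95, 100, 20) > base)   # larger sigma -> wider
print(math.isclose(ci_width(0.95, 400, 10), base / 2))  # 4x the sample -> half the width
```

The last line shows why precision is expensive: n enters through √n, so halving the width requires four times as many observations.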
Choosing the Sample Size
• Fixing the confidence level, a confidence interval
for a population mean will have a specified
margin of error m when the sample size is
$n = \left(\frac{z^* \sigma}{m}\right)^2$
• Since n must be a whole number, round this value up to the next integer.
• With this sample size, we can estimate the mean to within m
units at the chosen confidence level.
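The sample-size rule, including the round-up step, can be sketched as follows. The function name `sample_size` and the inputs in the usage line are hypothetical; z* is computed with Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def sample_size(C, sigma, m):
    """Smallest n with z*·sigma/sqrt(n) <= m, i.e. n >= (z*·sigma/m)**2,
    rounded up because a sample size must be a whole number."""
    z = NormalDist().inv_cdf(1 - (1 - C) / 2)
    return math.ceil((z * sigma / m) ** 2)

# Hypothetical inputs: 99% confidence, sigma = 10, desired margin m = 2
print(sample_size(0.99, sigma=10, m=2))   # 166
```

Rounding down instead would give a margin of error slightly larger than m, which is why the ceiling is used.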
Example 1
• To estimate the amount of lumber that can be
harvested in a tract of land, the mean diameter of
trees in the tract must be estimated to within one
inch with 99% confidence. What sample size
should be taken? (Assume diameters are normally
distributed with σ = 6 inches.)
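A worked version, taking z* ≈ 2.576 for 99% confidence, σ = 6 and m = 1:

```latex
n = \left(\frac{z^* \sigma}{m}\right)^2
  = \left(\frac{2.576 \cdot 6}{1}\right)^2
  = 15.456^2
  \approx 238.9
  \quad\Rightarrow\quad n = 239
```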
Example 2
• Suppose that the standard deviation of the salaries
of a population of individuals is 30K, how many
individuals do we need to sample so that the 90%
CI has a margin of error no more than 5K?
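A worked version, taking z* = 1.645 for 90% confidence and measuring σ = 30 and m = 5 in thousands:

```latex
n = \left(\frac{z^* \sigma}{m}\right)^2
  = \left(\frac{1.645 \cdot 30}{5}\right)^2
  = 9.87^2
  \approx 97.4
  \quad\Rightarrow\quad n = 98
```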
Example 3
• Monitoring of a computer time-sharing system
has suggested that response time to a particular
command is normally distributed with σ = 25 ms.
• A new operating system is installed, and we wish
to estimate the true average response time µ for
the new environment.
• Assuming that response times are still normally
distributed with σ = 25, what sample size is
necessary to ensure that the resulting 95%
confidence interval has a width of at most 10?
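Since the width is twice the margin of error, a width of at most 10 means m = 5. A worked version with z* = 1.96 and σ = 25:

```latex
n = \left(\frac{z^* \sigma}{m}\right)^2
  = \left(\frac{1.96 \cdot 25}{5}\right)^2
  = 9.8^2
  = 96.04
  \quad\Rightarrow\quad n = 97
```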
Cautions on CI for Population Mean
• The data must be an SRS from the population.
• Formula is incorrect for more complex probability
sampling designs.
• Formula requires carefully produced data.
• Confidence interval is not resistant to outliers.
• When sample size is small, examine data for
skewness and other signs of non-normality.
• Formula requires standard deviation of population
to be known, which is not realistic in practice.
Introduction: Hypothesis Testing
• Confidence intervals are one of the two most
common types of formal statistical inference.
• We prefer confidence intervals when our goal is
to estimate a population parameter.
• The second common type of inference is used when
we want to assess the evidence provided by the
data in favor of some claim (hypothesis) about
the population.
Hypothesis Testing
• Examples of claims to which hypothesis testing
can be applied:
– Are less than 10% of all circuit boards produced by a
particular manufacturer defective?
– Is the true average inside diameter of a certain type of
pipe 0.75 cm?
– Does one type of twine have a higher average
breaking strength than a second type of twine?
– For a pharmaceutical company, is a new drug
effective for a certain disease?
Hypothesis Testing
• The hypothesis is a statement about the
parameters in a population or model.
• The results of a test are expressed in terms of a
probability that measures how well the data and
the hypothesis agree.
• In hypothesis testing, we need to set up two
hypotheses:
– The null hypothesis H0
– The alternative hypothesis Ha (sometimes denoted H1)
Hypothesis Testing
• The null hypothesis is the claim which is initially
favored or believed to be true.
• The null hypothesis is also the claim that we will
try to find evidence against.
• Usually the null hypothesis is a statement of “no
effect” or “no difference.”
• The test of significance is designed to assess the
strength of the evidence against the null
hypothesis.
Hypothesis Testing
• The alternative hypothesis is the claim that we
hope or suspect is true instead of H0.
• We often begin with the alternative hypothesis
Ha and then set up H0 as the statement that the
hoped-for effect is not present.
• Stating Ha is often a difficult task.
• Hypotheses in general refer to some population
or model and not to any particular outcome.
Hypothesis Testing
• The alternative hypothesis Ha can be either one-sided or two-sided.
• One-sided alternative hypotheses:
– μ > 0
– p < 0.5
– σ < 2
• Two-sided alternative hypotheses:
– μ ≠ 0
– p ≠ 0.5
– σ ≠ 2
Example
• Experiments on learning in animals sometimes measure
how long it takes a mouse to find its way through a maze.
The mean time is 18 seconds for one particular maze. A
researcher thinks that a loud noise will cause the mice to
complete the maze faster. She measures how long each of
10 mice takes with a noise as stimulus.
• Let μ be the mean time of mice to find their way through
a particular maze when noise is presented as a stimulus.
– H0: μ = 18
– Ha: μ < 18
One-sided Ha
Example
• Does more than half of the American population have
faith in the economy? 100,000 Americans are sampled.
• Let p be the population proportion of people who have
faith in the economy.
– H0: p ≤ 0.5
– Ha: p > 0.5
One-sided Ha
Example
• The Census Bureau reports that households spend an
average of 31% of their total spending on housing. A
homebuilders association in Cleveland wonders if the
national finding applies in their area. They interview a
sample of 40 households in the Cleveland metropolitan
area to learn what percent of their spending goes toward
housing.
• Let μ be the mean proportion of total spending that
households in Cleveland devote to housing.
– H0: μ = 0.31
– Ha: μ ≠ 0.31
Two-sided Ha
Example
• Does one type of twine have a higher average breaking
strength than a second type of twine?
• Let μ1 be the average breaking strength of the first type
of twine, and let μ2 be the average breaking strength of
the second type.
– H0: μ1 = μ2
– Ha: μ1 ≠ μ2
Two-sided Ha
Hypothesis Testing
• The alternative hypothesis in general should
express the hopes or suspicions we bring to the
data.
• We should not, however, look first at the data and
then frame Ha to fit what the data show.
• Use a two-sided alternative unless you have a
specific direction firmly in mind beforehand.
• In some circles, it is argued that the two-sided
alternative should always be used in testing.