• Study Resource
• Explore

Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
```Chapter 13
Introduction to Inference
Essential Statistics
Chapter 13
1
What We’ll Learn?
<Part I>
Confidence Interval
Confidence level
Margin of error
Critical value Z*
<Part II>
Null hypothesis H0
Alternative hypothesis Ha
Z test statistic
P Value
Significance level
Essential Statistics
Chapter 13
2
Statistical Inference
 Provides
methods for drawing
sample data
– Confidence Intervals
What is the population mean?
– Gives an estimated range of value which
is likely to include an unknown population
parameter such as population mean µ
Essential Statistics
Chapter 13
3
Simple Conditions
◙ SRS from the population of interest
◙ Variable has a Normal distribution
N(m, s) in the population
◙ Although the value of m is unknown, the
value of the population standard deviation
s is known
Essential Statistics
Chapter 13
4
Confidence Interval
A level C confidence interval has two parts
1. An confidence interval usually has the form:
estimate ± margin of error
 σ
xz
n
1.
The confidence level C, which is the
probability that the interval will capture the
true parameter value in repeated samples;
that is, C is the success rate for the method.
Essential Statistics
Chapter 13
5
Video references

<short, brief define Confidence interval>


Essential Statistics
Chapter 13
6
Case Study
NAEP Quantitative Scores
(National Assessment of Educational Progress)
Rivera-Batiz, F. L., “Quantitative literacy and the likelihood of
employment among young adults,” Journal of Human
Resources, 27 (1992), pp. 313-328.
What is the average score for all young
Essential Statistics
Chapter 13
7
Case Study
NAEP Quantitative Scores
The NAEP survey includes a short test of
quantitative skills, covering mainly basic
arithmetic and the ability to apply it to realistic
problems. Scores on the test range from 0 to
500, with higher scores indicating greater
numerical abilities. It is known that NAEP
scores have standard deviation s = 60.
Essential Statistics
Chapter 13
8
Case Study
NAEP Quantitative Scores
In a recent year, 840 men 21 to 25 years of
age were in the NAEP sample. Their mean
quantitative score was 272.
On the basis of this sample, estimate the
mean score m in the population of all 9.5
million young men of these ages.
Essential Statistics
Chapter 13
9
Case Study
NAEP Quantitative Scores
1.
2.
3.
To estimate the unknown population mean m,
use the sample mean x = 272.
The law of large numbers suggests that x
will be close to m, but there will be some error in
the estimate.
 distribution of x has the Normal
The sampling
distribution with mean m and 
standard
deviation

s
60

 2.1
n
840

Essential Statistics
Chapter 13
10
Case Study
NAEP Quantitative Scores
Essential Statistics
Chapter 13
11
Case Study
NAEP Quantitative Scores
4.
The 68-95-99.7 rule
indicates that
x and m are within
two standard
deviations (4.2) of
95% of all samples.
x  4 .2 = 2 7 2  4 .2 = 2 6 7 .8
x + 4 .2 = 2 7 2 + 4 .2 = 2 7 6 .2
Essential Statistics
Chapter 13
12
Case Study
NAEP Quantitative Scores
So, if we estimate that m lies within 4.2 of
we’ll be right about 95% of the time.
Essential Statistics
Chapter 13
x,
13
Confidence Interval
Mean of a Normal Population
Take an SRS of size n from a Normal population
with unknown mean m and known standard
deviation s. A level C confidence interval for m is:
xz

σ
n
z* is called the critical value, and z* and –z* mark off the
Central area C under a standard normal curve (next slide);
values of z* for many choices of C can be found at the
bottom of Table C in the back of the textbook, and the
most common values are on the next slide.
Essential Statistics
Chapter 13
14
Confidence Interval
Mean of a Normal Population
Confidence Level
C
Critical Value
z*
90%
1.645
95%
1.960
99%
2.576
Essential Statistics
Chapter 13
15
Case Study
NAEP Quantitative Scores
Using the 68-95-99.7 rule gave an approximate 95%
confidence interval. A more precise 95% confidence
interval can be found using the appropriate value of z*
(1.960) with the previous formula.
x  (1 .9 6 0 )(2 . 1 ) = 2 7 2  4 .1 1 6 = 2 6 7 .8 8 4
x  (1 .9 6 0 )(2 . 1 ) = 2 7 2  4 .1 1 6 = 2 7 6 .1 1 6
We are 95% confident that the average NAEP
quantitative score for all adult males is between
267.884 and 276.116.
Essential Statistics
Chapter 13
16
Express Confidence Interval
The confidence level describes the uncertainty
associated with a sampling method
 A 95% confidence level means that we would
expect 95% of the interval estimates to include
the population parameter

 To
express a confidence interval, you need
three pieces of information.
◙ Confidence level
◙ Statistic
◙ Margin of error
Essential Statistics
Chapter 13
17
Construct A Confidence Interval
Identify a sample statistic. Locate the statistic
(e.g, mean, standard deviation)
 Select a confidence level, choose 90%, 95%, or
99% confidence levels or others
 Find the margin of error: Margin of error = Critical

value x Standard deviation of statistic
xz


σ
n
Confidence interval = sample statistic + margin
of error
Essential Statistics
Chapter 13
18
Exercise 1
Suppose we want to estimate the average weight of
an adult male in Dekalb County, GA. We draw a
random sample of 1,000 men from a population of
1,000,000 men and weigh them. We find that the
average weighs in sample is 180 pounds, and the
standard deviation is 30 pounds. What is the 95%
confidence interval.
 (A) 180 + 1.86
(B) 180 + 3.0
(C) 180 + 5.88
(D) 180 + 30

Essential Statistics
Chapter 13
19
Hypothesis Test & P Value
Hypothesis, H0, Ha
 Significance
 Sample
 P-value
 Conclusion

From sample mean to draw an conclusion about
population mean
Essential Statistics
Chapter 13
20
Stating Hypotheses
Null Hypothesis, H0

The statement being tested in a statistical test
is called the null hypothesis.

Usually the null hypothesis is a statement of
“no effect” or “no difference”, or it is a
statement of equality.

When performing a hypothesis test, we assume
that the null hypothesis is true until we have
sufficient evidence against it.
Essential Statistics
Chapter 13
21
Stating Hypotheses
Alternative Hypothesis, Ha



The statement we are trying to find evidence for
is called the alternative hypothesis.
Usually the alternative hypothesis is a
statement of “there is an effect” or “there is a
difference”, or it is a statement of inequality.
The alternative hypothesis is what we are trying
to prove.
Essential Statistics
Chapter 13
22
Case Study I
Sweetening Colas
Diet colas use artificial sweeteners to avoid
sugar. These sweeteners gradually lose their
sweetness over time. Trained testers sip the
cola and assign a “sweetness score” of 1 to 10.
The cola is then retested after some time and the
two scores are compared to determine the
difference in sweetness after storage. Bigger
differences indicate bigger loss of sweetness.
Essential Statistics
Chapter 13
23
Case Study I
Sweetening Colas
Suppose we know that for any cola, the sweetness loss
scores vary from taster to taster according to a Normal
distribution with standard deviation s = 1.
The mean m for all tasters measures loss of sweetness.
The sweetness losses for a new cola, as measured by
10 trained testers, yields an average sweetness loss of
x = 1.02. Do the data provide sufficient evidence that
the new cola lost sweetness in storage?
Essential Statistics
Chapter 13
24
Case Study I
Sweetening Colas



If the claim that m = 0 is true (no loss of sweetness, on
average), the sampling distribution of x from 10 tasters
is Normal with m = 0 and standard deviation
σ
1

 0 .3 1 6
n
10
The data yielded x= 1.02, which is more than three
standard deviations from
m = 0. This is strong evidence
that the new cola lost sweetness in storage.
If the data yielded x= 0.3, which is less than one
standard deviations from m = 0, there would be no
 that the new cola lost sweetness in storage.
evidence
Essential Statistics
Chapter 13
25
Case Study I
Sweetening Colas
Essential Statistics
Chapter 13
26
The Hypotheses for Means
Null:
H0: m = m0
One
sided alternatives
Ha: m >m0
Ha: m <m0
Two sided alternative
Ha: m m0
Essential Statistics
Chapter 13
27
Case Study I
Sweetening Colas
The null hypothesis is no average sweetness loss
occurs, while the alternative hypothesis (that which we
want to show is likely to be true) is that an average
sweetness loss does occur.
H0: m = 0
Ha: m > 0
This is considered a one-sided test because we are
interested only in determining if the cola lost sweetness
(gaining sweetness is of no consequence in this study).
Essential Statistics
Chapter 13
28
Case Study II
Studying Job Satisfaction
Does the job satisfaction of assembly workers
differ when their work is machine-paced rather
than self-paced? A matched pairs study was
performed on a sample of workers, and each
worker’s satisfaction was assessed after
working in each setting. The response variable
is the difference in satisfaction scores, selfpaced minus machine-paced.
Essential Statistics
Chapter 13
29
Case Study II
Studying Job Satisfaction
The null hypothesis is no average difference in scores in
the population of assembly workers, while the
alternative hypothesis (that which we want to show is
likely to be true) is there is an average difference in
scores in the population of assembly workers.
H0: m = 0
Ha: m ≠ 0
This is considered a two-sided test because we are
interested determining if a difference exists (the
direction of the difference is not of interest in this study).
Essential Statistics
Chapter 13
30
Test Statistic
Testing the Mean of a Normal Population
Take an SRS of size n from a Normal population
with unknown mean m and known standard
deviation s. The test statistic for hypotheses
about the mean (H0: m = m0) of a Normal
distribution is the standardized version of x:
x  μ0
z
σ
n

Essential Statistics
Chapter 13
31
Case Study I
Sweetening Colas
If the null hypothesis of no average sweetness loss is
true, the test statistic would be:
x  μ0
1.02  0
z

 3.23
σ
1
10
n
Because the sample result is more than 3 standard
deviations above the hypothesized mean 0, it gives
strong evidence that the mean sweetness loss is not 0,
but positive.
Essential Statistics
Chapter 13
32
P-value
■ The p-value is the probability or area marked by the
test statistic z (from sample) in the normal distribution
curve when assuming that the null hypothesis is true.
■ The smaller the p-value, the more significant is
the difference between the null hypothesis and
the sample results.
■ The smaller the P-value, the stronger the evidence the
data provide against the null hypothesis.
Essential Statistics
Chapter 13
33
P-value for Testing Means

Ha: m> m0
when Ha contains “greater than” symbol,
Perform a Right- tailed test


P-value is the probability of getting a value as large or
larger than the observed test statistic (z) value.
Ha: m< m0 when Ha contains “less than” symbol,
Perform a Left-tailed test


P-value is the probability of getting a value as small or
smaller than the observed test statistic (z) value.
Ha: mm0
when Ha contains “not equal to” symbol,
Perform a Two-tailed test

P-value is two times the probability of getting a value as
large or larger than the absolute value of the observed test
statistic (z) value.
Essential Statistics
Chapter 13
34
Essential Statistics
Chapter 13
35
Case Study I
Sweetening Colas
For test statistic z = 3.23 and alternative hypothesis
Ha: m > 0, the P-value would be:
P-value = P(Z ≥ 3.23) = 1 – 0.9994 = 0.0006
If H0 is true, there is only a 0.0006 (0.06%) chance that
we would see results at least as extreme as those in the
sample; thus, since we saw results that are unlikely if H0
is true, we therefore have evidence against H0 and in
favor of Ha.
Essential Statistics
Chapter 13
36
Case Study I
Sweetening Colas
Essential Statistics
Chapter 13
37
Case Study II
Studying Job Satisfaction
Suppose job satisfaction scores follow a Normal
distribution with standard deviation s = 60. Data from
18 workers gave a sample mean score of 17. If the null
hypothesis of no average difference in job satisfaction is
true, the test statistic would be:
x  μ0
17  0
z

 1.20
σ
60
n
18
Essential Statistics
Chapter 13
38
Case Study II
Studying Job Satisfaction
For test statistic z = 1.20 and alternative hypothesis
Ha: m ≠ 0, the P-value would be:
P-value = P(Z < -1.20 or Z > 1.20)
= 2 P(Z < -1.20) = 2 P(Z > 1.20)
= (2)(0.1151) = 0.2302
If H0 is true, there is a 0.2302 (23.02%) chance that we
would see results at least as extreme as those in the
sample; thus, since we saw results that are likely if H0 is
true, we therefore do not have good evidence against H0
and in favor of Ha.
Essential Statistics
Chapter 13
39
Case Study II
Studying Job Satisfaction
Essential Statistics
Chapter 13
40
Statistical Significance α




If the P-value is as small as or smaller than the
significance level a (i.e., P-value ≤ a), then we say that
the data give results that are statistically significant at
level a.
“Rejects the null hypothesis" when the p-value is less
than the significance level α
If we choose a = 0.05, we are requiring that the data
give evidence against H0 so strong that it would occur
no more than 5% of the time when H0 is true.
If we choose a = 0.01, we are insisting on stronger
evidence against H0, evidence so strong that it would
occur only 1% of the time when H0 is true.
Essential Statistics
Chapter 13
41
Tests Procedure for a Population Mean
The five steps in carrying out a significance test:
1. State the null and alternative hypotheses.
2. Set the level of significance, usually ɑ = 0.05
3. Take a sample from population and provide
the Z test statistic.
4. Locate the P-value from Table A.
5. Using P-value compare with ɑ to reject null
hypothesis if P-value < ɑ
Essential Statistics
Chapter 13
42
Case Study I
Sweetening Colas
1.
Hypotheses:
2.
Test Statistic:
H 0: m = 0
H a: m > 0
z 
x  μ0
σ

1.02  0
1
n
3.
4.
 3.23
10
P-value: P-value = P(Z > 3.23) = 1 – 0.9994 = 0.0006
Conclusion:
Since the P-value is smaller than a = 0.01, there is very strong
evidence that the new cola loses sweetness on average during
storage at room temperature.
Essential Statistics
Chapter 13
43
Case Study II
Studying Job Satisfaction
1.
Hypotheses:
2.
Test Statistic:
H 0: m = 0
H a: m ≠ 0
z 
x  μ0
σ

17  0
60
n
3.
4.
 1.20
18
P-value: P-value = 2P(Z > 1.20) = (2)(1 – 0.8849) = 0.2302
Conclusion:
Since the P-value is larger than a = 0.10, there is not sufficient
evidence that mean job satisfaction of assembly workers differs
when their work is machine-paced rather than self-paced.
Essential Statistics
Chapter 13
44
Case Study II
Studying Job Satisfaction
A 90% confidence interval for m is:
xz
 σ
n
 17  1.645
60
 17  23.26
18
  6.26 to 40.26
Since m0 = 0 is in this confidence interval, it is plausible that
the true value of m is 0; thus, there is not sufficient evidence
(at a = 0.10) that the mean job satisfaction of assembly
workers differs when their work is machine-paced rather
than self-paced.
Essential Statistics
Chapter 13
45
Interesting Video