Transcript
```Chapter 12
12.1
[Diagram: a Sample is drawn from the Population; the sample Statistic is used to draw an Inference about the population Parameter.]
• Identify the parameter to be estimated or tested.
• Specify the parameter’s estimator and its sampling
distribution.
• Derive the interval estimator and test statistic.
12.2
We will develop techniques to estimate and test three
population parameters:
Population Mean $\mu$
Population Variance $\sigma^2$
Population Proportion $p$
12.3
Inference With Variance Unknown…
Previously, we looked at estimating and testing the
population mean when the population standard deviation ($\sigma$)
was known or given:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
But how often do we know the actual population variance?
Instead, we use the Student t-statistic, given by:
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
12.4
Inference With Variance Unknown…
When $\sigma$ is unknown, we use its point estimator $s$,
and the z-statistic is replaced by the t-statistic, where the
number of “degrees of freedom”, $\nu$, is $n - 1$.
12.5
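Testing $\mu$ when $\sigma$ is unknown…
When the population standard deviation is unknown and the
population is normal, the test statistic for testing hypotheses
about $\mu$ is:
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
which is Student t distributed with $\nu = n - 1$ degrees of
freedom. The confidence interval estimator of $\mu$ is given by:
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}, \quad \nu = n - 1$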
12.6
Example 12.1…
Will new workers achieve 90% of the level of experienced
workers within one week of being hired and trained?
Experienced workers can process 500 packages/hour, thus if
our conjecture is correct, we expect new workers to be able
to process .90(500) = 450 packages per hour.
Given the data, is this the case?
12.7
Example 12.1…
IDENTIFY
Our objective is to describe the population of the numbers of
packages processed in 1 hour by new workers, that is we
want to know whether the new workers’ productivity is more
than 90% of that of experienced workers. Thus we have:
H1: $\mu > 450$
Therefore we set our usual null hypothesis to:
H0: $\mu = 450$
12.8
Example 12.1…
COMPUTE
Our test statistic is:
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
With n=50 data points, we have n–1=49 degrees of freedom.
Our hypothesis under question is:
H1: $\mu > 450$
Our rejection region becomes:
$t > t_{\alpha,\nu} = t_{\alpha,49}$
Thus we will reject the null hypothesis in favor of the
alternative if our calculated test statistic falls in this region.
12.9
Example 12.1…
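COMPUTE
From the data, we calculate $\bar{x} = 460.38$, $s = 38.83$ and thus:
$t = \frac{460.38 - 450}{38.83 / \sqrt{50}} = 1.89$
Since $t = 1.89$ exceeds the critical value $t_{\alpha,49}$,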
we reject H0 in favor of H1, that is, there is sufficient
evidence to conclude that the new workers are producing at
more than 90% of the average of experienced workers.
12.10
Example 12.1…
COMPUTE
Alternatively, we can use t-Test: Mean from
Tools > Data Analysis Plus in Excel…
[Excel output omitted; the slide marks the rejection region.]
12.11
Example 12.1…
COMPUTE
p-value
In addition to looking at the computed t-statistic and the
critical value of t (one tail), we could look at the p-value
(0.0323) and see that it is “small” (~3%), so again, we reject
the null hypothesis in favor of the alternative…
12.12
Example 12.2…
IDENTIFY
Can we estimate the return on investment for companies that
won quality awards?
We are given a random sample of n = 83 such
companies. We want to construct a 95% confidence interval
for the mean return, i.e. what is $\mu$?
12.13
Example 12.2…
COMPUTE
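From the data, we calculate the sample mean $\bar{x}$ and the
sample standard deviation $s$.
For the term $t_{\alpha/2}$, we have $1 - \alpha = .95$ and $\nu = n - 1 = 82$,
and so the interval estimate is:
$\bar{x} \pm t_{.025,82} \frac{s}{\sqrt{n}}$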
12.14
Example 12.2…
INTERPRET
We are 95% confident that the population mean, $\mu$, i.e. the
mean return of all publicly traded companies that win quality
awards, lies between 13.20% and 16.84%.
Tools > Data Analysis Plus > t-Estimate: Mean
is an alternative to the manual calculation…
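A quick consistency check, sketched from the reported limits alone
(the center and margin below are back-calculated from 13.20% and 16.84%,
not quoted from the data file):
$\bar{x} = \frac{13.20 + 16.84}{2} = 15.02$
$t_{.025,82} \frac{s}{\sqrt{83}} = \frac{16.84 - 13.20}{2} = 1.82$
so the interval has the form $15.02 \pm 1.82$.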
12.15
Check Requisite Conditions…
The Student t distribution is robust, which means that if the
population is nonnormal, the results of the t-test and
confidence interval estimate are still valid provided that the
population is “not extremely nonnormal”.
To check this requirement, draw a histogram of the data and
see how “bell shaped” the resulting figure is. If a histogram
is extremely skewed (say in the case of an exponential
distribution), that could be considered “extremely
nonnormal” and hence t-statistics would not be valid in
this case.
12.16
Estimating Totals of Finite Populations…
Large populations are defined as “populations that are at
least 20 times the sample size”.
We can use the confidence interval estimator of a mean to
produce a confidence interval estimator of the population
total:
$N \left[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \right]$
where N is the size of the finite population.
12.17
Estimating Totals of Finite Populations…
For example, a sample of 500 households (in a city of 1
million households) yields a 95% confidence interval
estimate that the mean household spending on Halloween candy
lies between \$20 and \$30.
We can estimate the total amount spent in the city by
multiplying these lower and upper confidence limits by the
total population:
LCL = 1,000,000 × \$20 = \$20 million
UCL = 1,000,000 × \$30 = \$30 million
Thus we estimate that the total amount spent on Halloween candy
in the city lies between \$20 million and \$30 million.
12.18
Identifying Factors…
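Factors that identify the t-test and estimator of $\mu$:
1. Problem objective: describe a population
2. Data type: interval
3. Type of descriptive measurement: central location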
12.19
If we are interested in drawing inferences about a
population’s variability, the parameter we need to
investigate is the population variance: $\sigma^2$
The sample variance ($s^2$) is an unbiased, consistent and
efficient point estimator for $\sigma^2$. Moreover,
the statistic, $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$, has a chi-squared distribution,
with n–1 degrees of freedom.
12.20
Testing & Estimating Population Variance
The test statistic used to test hypotheses about $\sigma^2$ is:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
(which is chi-squared with $\nu = n - 1$ degrees of freedom).
12.21
Testing & Estimating Population Variance
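Combining this statistic:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
With the probability statement:
$P\left( \chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2} \right) = 1 - \alpha$
Yields the confidence interval estimator for $\sigma^2$:
lower confidence limit: $LCL = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$
upper confidence limit: $UCL = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$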
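The step from the probability statement to the limits can be sketched as
follows (a standard rearrangement, assuming $\chi^2_{A}$ denotes the value with
area A to its right, as in the chi-squared table):
$P\left( \chi^2_{1-\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{\alpha/2} \right) = 1 - \alpha$
Taking reciprocals reverses both inequalities, and multiplying by $(n-1)s^2$ gives:
$P\left( \frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \right) = 1 - \alpha$
This is why the larger table value produces the lower limit.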
12.22
Example 12.3…
IDENTIFY
Consider a container filling machine. Management wants a
machine to fill 1 liter (1,000 cc’s) so that the variance of the
fills is less than 1 cc². A random sample of n=25 1-liter fills
was taken. Does the machine perform as it should at the 5%
significance level?
We want to show that the variance is less than 1 cc²:
H1: $\sigma^2 < 1$
(so our null hypothesis becomes: H0: $\sigma^2 = 1$). We will use
this test statistic:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
12.23
Example 12.3…
COMPUTE
Since our alternative hypothesis is phrased as:
H1: $\sigma^2 < 1$
We will reject H0 in favor of H1 if our test statistic falls into
this rejection region:
$\chi^2 < \chi^2_{1-\alpha,\nu} = \chi^2_{.95,24} = 13.85$
We compute the sample variance to be: $s^2 = .8088$
And thus our test statistic takes on this value:
$\chi^2 = \frac{(25-1)(.8088)}{1} = 19.41$
12.24
Example 12.3…
INTERPRET
Since $\chi^2 = 19.41$ is not less than $\chi^2_{.95,24} = 13.85$,
there is not enough evidence to infer that the claim is true.
Excel output can also be used for this test…
12.25
Example 12.4…
As we saw, we cannot reject the null hypothesis in favor of
the alternative. That is, there is not enough evidence to infer
that the claim is true.
Note: the result does not say that the variance is greater than
1, rather it merely states that we are unable to show that the
variance is less than 1.
We could estimate (at 99% confidence say) the variance of
the fills…
12.26
Example 12.4…
COMPUTE
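In order to create a confidence interval estimate of the
variance, we need these formulae:
lower confidence limit: $LCL = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$
upper confidence limit: $UCL = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
We know $(n-1)s^2 = 19.41$ from our previous calculation, and
we have from Table 5 in Appendix B:
$\chi^2_{\alpha/2,n-1} = \chi^2_{.005,24} = 45.5585$
$\chi^2_{1-\alpha/2,n-1} = \chi^2_{.995,24} = 9.88623$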
12.27
Example 12.4…
COMPUTE
Thus the 99% confidence interval estimate is:
$LCL = \frac{19.41}{45.5585} = .426$
$UCL = \frac{19.41}{9.88623} = 1.963$
That is, the variance of fills lies between .426 and 1.963 cc².
12.28
Identifying Factors…
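Factors that identify the chi-squared test and estimator of $\sigma^2$:
1. Problem objective: describe a population
2. Data type: interval
3. Type of descriptive measurement: variability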
12.29
Inference: Population Proportion…
When data are nominal, we count the number of occurrences
of each value and calculate proportions. Thus, the parameter
of interest in describing a population of nominal data is the
population proportion p.
This parameter was based on the binomial experiment.
Recall the use of this statistic:
$\hat{p} = \frac{x}{n}$
where p-hat ($\hat{p}$) is the sample proportion: x successes in a
sample size of n items.
12.30
Inference: Population Proportion…
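When np and n(1–p) are both greater than 5, the sampling
distribution of $\hat{p}$ is approximately normal with
mean: $E(\hat{p}) = p$
standard deviation: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
Hence:
$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$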
12.31
Inference: Population Proportion…
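Test statistic for p:
$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
The confidence interval estimator for p is given by:
$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
(both of which require that np>5 and n(1–p)>5)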
12.32
Example 12.5…
IDENTIFY
At an exit poll, voters are asked by a certain network if they
voted Democrat (code=1) or Republican (code=2). Based on
their small sample, can the network conclude that the
Republican candidate will win the vote?
That is:
H1: p > .50
And hence our null hypothesis becomes:
H0: p = .50
12.33
Example 12.5…
COMPUTE
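Since our research hypothesis is:
H1: p > .50
our rejection region becomes:
$z > z_{\alpha}$
Looking at the data, we count 407 (of 765) votes for code=2.
Hence, we calculate our test statistic as follows…
$\hat{p} = \frac{407}{765} = .532$
$z = \frac{.532 - .50}{\sqrt{.50(1-.50)/765}} = 1.77$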
12.34
Example 12.5…
INTERPRET
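Since $z = 1.77$ exceeds the critical value $z_{\alpha}$,
we reject H0 in favor of H1; that is, there is enough
evidence to believe that the Republican candidate will win the vote.
Likewise, in the Excel output, compare the test statistic with
the critical value, or look at the p-value.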
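As a rough check of this example (a sketch using the rounded value z = 1.77
and a standard normal table; these figures are computed here, not read off
the Excel output), the normal approximation requirement is met:
$np = 765(.50) = 382.5 > 5$ and $n(1-p) = 765(.50) = 382.5 > 5$
and the one-tail p-value is approximately:
$P(Z > 1.77) = 1 - .9616 = .0384$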
12.35
Selecting the Sample Size…
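The confidence interval estimator for a population
proportion is:
$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Thus the (half) width of the interval is:
$W = z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Solving for n, we have:
$n = \left( \frac{z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})}}{W} \right)^2$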
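The algebra behind this step, written out as a short sketch (with W the
half-width of the interval for the proportion):
$W = z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
$\sqrt{n} = \frac{z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})}}{W}$
$n = \left( \frac{z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})}}{W} \right)^2$
Because n must be a whole number of observations, the result is rounded up.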
12.36
Selecting the Sample Size…
For example, we want to know how many customers to
survey in order to estimate the proportion of customers who
prefer our brand to within .03 (with 95% confidence).
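I.e. our confidence interval after surveying will be $\hat{p} \pm .03$,
that means W = .03.
Substituting into the equation (with $z_{.025} = 1.96$)…
$n = \left( \frac{1.96 \sqrt{\hat{p}(1-\hat{p})}}{.03} \right)^2$
Uh oh. Since we haven’t taken a sample yet, we don’t have
this sample proportion $\hat{p}$…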
12.37
Selecting the Sample Size…
Two methods – in each case we choose a value for $\hat{p}$, then
solve the equation for n.
Method 1 : no knowledge of even a rough value of $\hat{p}$. This
is a ‘worst case scenario’ so we substitute $\hat{p} = .50$.
Method 2 : we have some idea about the value of $\hat{p}$. This is
a better scenario and we substitute in our estimated value.
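Why .50 is the worst case can be sketched as follows (a standard argument,
added here for illustration): the required n grows with the product
$\hat{p}(1-\hat{p})$, and
$\frac{d}{d\hat{p}} \left[ \hat{p}(1-\hat{p}) \right] = 1 - 2\hat{p} = 0 \Rightarrow \hat{p} = .50$
so the product peaks at $(.50)(.50) = .25$. Any other value gives a smaller
product, e.g. $(.20)(.80) = .16$, and therefore a smaller n.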
12.38
Selecting the Sample Size…
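Method 1 : no knowledge of value of $\hat{p}$, use 50%:
$n = \left( \frac{1.96 \sqrt{(.50)(.50)}}{.03} \right)^2 = 1{,}067.1$, rounded up to 1,068
Method 2 : some idea about a possible $\hat{p}$ value, say 20%:
$n = \left( \frac{1.96 \sqrt{(.20)(.80)}}{.03} \right)^2 = 682.95$, rounded up to 683
Thus, we can sample fewer people if we already have a
reasonable estimate of the population proportion before
starting.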
12.39
Estimating Totals for Large Populations…
In much the same way as we saw earlier, when a population
is large and finite we can estimate the total number of
successes in the population by taking the product of the size
of the population (N) and the confidence interval estimator:
$N \left[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right]$
The Nielsen Ratings (used to measure TV audiences) use
this technique. Results from a small sample audience (say
2,000 viewers) are extrapolated to the total number of TV sets
(say 100 million)…
12.40
Nielsen Ratings Example…
COMPUTE
Problem: describe the population of television shows
watched by viewers across the country (population), by
examining the results from 2,000 viewers (sample).
The confidence interval estimate of the proportion of viewers
watching the “Tonight Show” runs from .099 to .127.
We take these values and multiply them by N=100 million to
estimate that between 9.9 million and 12.7 million viewers
are watching the “Tonight Show”.
12.41
Identifying Factors…
Factors that identify the z-test and interval estimator of p:
1. Problem objective: describe a population
2. Data type: nominal
12.42
Flowchart of Techniques…
Describe a Population
Data Type?
  Nominal: z test & estimator of $p$
  Interval: Type of descriptive measurement?
    Central Location: t test & estimator of $\mu$
    Variability: $\chi^2$ test & estimator of $\sigma^2$