Transcript
```In this chapter we consider questions
about population means similar to those
we have studied about proportions in the
last couple of chapters.
We would now like to construct confidence intervals
and conduct hypothesis tests for population means
(similar to what we did for population proportions in
the last couple of chapters).
We will mimic the procedures used for proportions,
but we will suddenly find ourselves unable to use the
z-curve anymore.
Recall that for the distribution of sample means (i.e. the
distribution of x̄) we had:
σ(x̄) = σ / √n
where σ is the population’s standard deviation.
However, we typically will not know the value of σ.
Thus we use s (the sample’s standard deviation) instead to get
the standard error of the distribution of x̄:
σ(x̄) ≈ SE(x̄) = s / √n
When we use this “distribution” of x̄, with standard
error instead of standard deviation, and calculate “z-scores”,
the distribution shape is no longer the z-curve
(even if n is more than 30).
Rather, we get a curve similar to the z-curve. It is
centered at 0, unimodal and symmetric, but it is shorter
at the center and has heavier tails than the z-curve.
We end up on one of the t-curves.
Each one of these curves is defined by a parameter called
the “degrees of freedom” or df. For every positive
number, there is a t-curve for that many degrees of
freedom.
For problems involving only one population mean, we
will use the t-curve with df = n – 1 where n is the sample
size.
We can use these t distributions for problems involving
means if the population is known to be normal, the
sample size is large (C.L.T.), or if the sample data has a
fairly linear normal plot (this can be created in the TI).
Put the data in L1.
Open “STAT PLOT”, choose Plot1, turn it on and select
the last plot option.
Then graph it to see the plot.
If the plot is approximately linear, then the sample is
consistent with having come from a normal population.
We have a properly selected random sample of size n
that has mean x̄ and standard deviation s.
The population is known to be normal, the sample size is
large (C.L.T.), or the sample data produces a fairly linear
normal plot.
μ is in the interval:
x̄ ± t* (s / √n)
where t* is the value from the t-curve with df = n – 1
corresponding to the level of confidence
(this is the equivalent of the z* values used in the
confidence intervals for a population proportion)
This formula is problematic in practice, because there are
different values of t* for each different t-curve (too many
to memorize). So we will, in practice, construct these
intervals using technology.
Go into the STAT menu, over to TESTS, and select
TInterval…
If you have the actual sample data in L1, choose “Data”;
if you have the summary statistics for the sample
(x̄ and s), then choose “Stats” and enter the values.
A father is concerned that his teenage son is watching too
much television each day, since his son watches an average of
2 hours per day. His son says that his TV habits are no
different than those of his friends. Since this father has taken a
stats class, he knows that he can actually test to see whether or
not his son is watching more TV than his peers. The father
collects a random sample of television watching times from
boys at his son’s high school and gets the following data:
1.9 2.3 2.2 1.9 1.6 2.6 1.4 2.0 2.0 2.2
Construct and interpret a 97% confidence interval based on
this data. Be sure to justify the method used.
In a random sample of 50 of a new brand of battery, the
average lifespan was 952 hours with a standard deviation
of 18 hours. Construct and interpret a 98% confidence
interval based on this sample.
As we did for proportions, suppose we want to control the
width of the confidence interval (likely make it narrower)
while at the same time having a fairly high level of confidence.
The margin of error for these confidence intervals is:
ME = t* (s / √n)
Solving this for n gives:
n = (t* s / ME)²
Now, this formula has 2 problems. We cannot know with
which t-curve we are working without knowing the sample
size (because df = n – 1) and we do not have a value for s until
we have taken a sample. So, to determine n we need to know
t* and s, but to know t* and s we need to have a sample.
We “fix” this by replacing the t* values with the z* values from
our previous confidence intervals (since z* is slightly smaller
than t*, this gives a slightly smaller value of n than the t-curve
would, but the difference is small for large samples) and we use a
value for s that comes from previous studies about the variable of
interest.
Recall, z* = 1.645 for 90% confidence, 1.96 for 95%
confidence, and 2.33 for 98% confidence.
We then have:
n ≥ (z* s / ME)²
Suppose we wish to construct a 98% confidence interval
for average body temperature of people testing positive
for a new strain of influenza within 0.14. Suppose also
that previous studies support that the standard deviation
of human body temperature is 0.45. How many
subjects must be tested?
Hypotheses
H0: μ = #
Ha: one of (a) μ > # (upper tail test)
(b) μ < # (lower tail test)
(c) μ ≠ # (two-tail test)
Test Statistic
t = (x̄ − #) / (s / √n)
where “#” is the hypothesized value of μ
P-value
Depends on the alternative hypothesis:
(a) tcdf(t, ∞, df) (upper tail test)
(b) tcdf(−∞, t, df) (lower tail test)
(c) 2 * tcdf(|t|, ∞, df) (two-tailed test)
where df = n – 1
Validity/Assumptions
• We have a properly collected, random sample
• Sample size is not more than 10% of the population
One of:
• population known to be normal
• large sample size (C.L.T.): n ≥ 30 or
• approximately linear normal plot of sample data
Find the P-value for each of the following, assuming
samples are from a normal population.
(a) H0: μ = 100, Ha: μ < 100, t = -1.48, n = 15
(b) H0: μ = 58, Ha: μ ≠ 58, t = -2.64, n = 10
The posted speed limit on a certain residential road is
30mph. The residents believe that drivers are speeding
on this road on average. They observe 20 randomly
selected drivers on this road and find the mean speed to
be 31.8mph with a standard deviation of 4.2mph. Is the
residents’ belief accurate?
We can use the T-Test in the TI calculator to do this as well.
Go into the STAT menu, choose TESTS, then choose T-Test…
If you have the actual sample data in L1, choose “Data”; if you
have the summary statistics for the sample (x̄ and s),
then choose “Stats”.
Example 5 would look like this:
A certain university advertises that the GPA of all of its
Business students is 3.6. An employee at another institution
believes the value is actually lower than this. He randomly
gathers the GPA of 6 Business students from this university
(results below). Test his claim at the α = 0.05 level.
3.4 3.6 3.3 3.6 3.5 3.5
We can replace the test statistic and P-value with a
confidence interval for μ calculated from the sample using
the TInterval function in the calculator.
If the hypothesized value is not in the interval, then we
reject H0.
If the hypothesized value is in the interval, then we fail to
reject H0.
All other “pieces” of the hypothesis test are the same.
The posted speed limit on a certain residential road is
30mph. The residents believe that drivers are speeding
on this road on average. They observe 20 randomly
selected drivers on this road and find the mean speed to
be 31.8mph with a standard deviation of 4.2mph. Is the
residents’ belief accurate? Test the relevant hypotheses
using a 90% confidence interval.
```
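The transcript does every calculation on a TI calculator (TInterval, T-Test, tcdf). For readers without one, the same three steps (interval, test, sample size) can be sketched in Python using only the standard library. This is a rough illustration, not the course's method: the helper names `t_pdf`, `t_cdf`, `t_star`, `t_interval`, `t_test` and `sample_size` are made up for this sketch, the area under the t-curve is approximated numerically with Simpson's rule, and `t_star` assumes the critical value is below 50.

```python
import math
import statistics


def t_pdf(x, df):
    """Height of the t-curve with df degrees of freedom at x."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """Area under the t-curve to the left of x (Simpson's rule plus symmetry)."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_star(confidence, df):
    """Critical value t*: the central area under the t-curve is `confidence`."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0  # assumption: t* lies below 50
    for _ in range(50):  # bisection on the left-tail area
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s, n, confidence):
    """Confidence interval xbar +/- t* (s / sqrt(n)) with df = n - 1."""
    me = t_star(confidence, n - 1) * s / math.sqrt(n)
    return xbar - me, xbar + me


def t_test(xbar, s, n, mu0, tail):
    """Return (t, P-value); tail is 'upper', 'lower' or 'two'."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if tail == "upper":
        p = 1 - t_cdf(t, df)
    elif tail == "lower":
        p = t_cdf(t, df)
    else:
        p = 2 * (1 - t_cdf(abs(t), df))
    return t, p


def sample_size(z_star, s, me):
    """Smallest whole number n with n >= (z* s / ME)^2."""
    return math.ceil((z_star * s / me) ** 2)


# TV-watching data from the transcript, 97% confidence interval
tv = [1.9, 2.3, 2.2, 1.9, 1.6, 2.6, 1.4, 2.0, 2.0, 2.2]
tv_ci = t_interval(statistics.mean(tv), statistics.stdev(tv), len(tv), 0.97)

# Speed-limit example: H0: mu = 30 against Ha: mu > 30
speed_t, speed_p = t_test(31.8, 4.2, 20, 30, "upper")

# Body-temperature example: 98% confidence, ME = 0.14, s = 0.45
n_needed = sample_size(2.33, 0.45, 0.14)

print(tv_ci, speed_t, speed_p, n_needed)
```

For the body-temperature example the arithmetic is (2.33 × 0.45 / 0.14)² ≈ 56.09, which rounds up to 57 subjects. If SciPy is installed, `scipy.stats.t.cdf` and `scipy.stats.t.ppf` could replace the hand-rolled `t_cdf` and `t_star`.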