Confidence Interval
A confidence interval (or interval
estimate) is a range (or an interval)
of values used to estimate the true
value of a population parameter. A
confidence interval is sometimes
abbreviated as CI.
Confidence level
A confidence level is the probability 1 – α (often
expressed as the equivalent percentage value)
that the confidence interval actually does contain
the population parameter, assuming that the
estimation process is repeated a large number of
times. (The value α is later called the significance
level.)
The most common choices are 90% (α = 10%), 95% (α = 5%),
and 99% (α = 1%).
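The "repeated a large number of times" reading of the confidence level can be illustrated with a small simulation. This is only a sketch: the population mean, standard deviation, sample size, number of trials and seed are arbitrary choices, and the interval is the t-based interval for a mean described later in these notes, using the tabulated value 2.262 for 95% confidence with df = 9.

```python
import random
import statistics

def coverage(trials=4000, n=10, mu=50.0, sigma=8.0, t_crit=2.262, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean mu.

    t_crit = 2.262 is the tabulated two-tailed 95% value for df = n - 1 = 9.
    mu, sigma, n, trials and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        s = statistics.stdev(sample)        # sample standard deviation
        margin = t_crit * s / n ** 0.5      # E = t * s / sqrt(n)
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials

# The printed fraction should land close to the confidence level 0.95.
print(coverage())
```

Each simulated interval either contains the true mean or it does not; the confidence level describes the long-run fraction of intervals that do.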
Student t Distribution

This section discusses methods for estimating a
population mean when the population standard
deviation is not known. With the standard
deviation unknown, we use the Student t
distribution, assuming that (I) the data come from
a normal distribution, or (II) the sample size is at
least 30.
Student t Distribution
If the distribution of a population is essentially
normal, then the distribution of
$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$
is a Student t Distribution for all samples of
size n. It is often referred to as a t distribution
and is used to find critical values denoted by
tα .
[Figure 7-5: Student t distributions for n = 3 and n = 12]
Degrees of freedom
The number of degrees of freedom for a
collection of sample data is the number of
sample values that can vary after certain
restrictions have been imposed on all data
values. The number of degrees of freedom is often
abbreviated df. In this section,
degrees of freedom = n – 1.
Critical Values
The value separating the right-tail region is
commonly denoted by tα and is referred
to as a critical value because it is on the
borderline separating values from a
specified distribution that are likely to
occur from those that are unlikely to
occur.
Important Properties of the
Student t Distribution
1. The Student t distribution is different for different sample sizes
(see the previous slide, for the cases n = 3 and n = 12).
2. The Student t distribution has the same general symmetric bell
shape as the standard normal distribution but it reflects the
greater variability (with wider distributions) that is expected
with small samples.
3. The Student t distribution has a mean of t = 0 (just as the
standard normal distribution has a mean of z = 0).
4. The standard deviation of the Student t distribution varies with
the sample size and is greater than 1 (unlike the standard
normal distribution, which has a σ = 1).
5. As the sample size n gets larger, the Student t distribution gets
closer to the normal distribution.
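Properties 4 and 5 can be checked numerically. The sketch below relies on a standard fact that is not stated in these notes: for df > 2, the standard deviation of a Student t distribution is sqrt(df / (df – 2)). For df ≤ 2 (for example n = 3) the standard deviation is not finite, so those cases are excluded.

```python
import math

def t_std_dev(df):
    """Standard deviation of a Student t distribution with df > 2."""
    if df <= 2:
        raise ValueError("the standard deviation is finite only for df > 2")
    return math.sqrt(df / (df - 2))

# The value stays above 1 and shrinks toward 1 as the sample size grows.
for n in (4, 12, 30, 100, 1000):
    df = n - 1
    print(f"n = {n:4d}, df = {df:4d}, sd = {t_std_dev(df):.4f}")
```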

100(1 – α)% Confidence Interval for a
Population Mean (σ Not Known)

$$ \bar{x} - E < \mu < \bar{x} + E $$

where

$$ E = t_{\alpha/2} \frac{s}{\sqrt{n}} $$

and tα/2 can be found in a t-distribution table
with df = n – 1.
Margin of Error E for a population
mean (With σ Not Known)
$$ E = t_{\alpha/2} \frac{s}{\sqrt{n}} $$
where tα/2 has n – 1 degrees of freedom.


Procedure for Constructing a
Confidence Interval for a Population
Mean (With σ Unknown)
1. Verify that the requirements are satisfied.
2. Using n – 1 degrees of freedom, find the critical value tα/2
that corresponds to the desired confidence level.
3. Evaluate the margin of error
$$ E = t_{\alpha/2} \frac{s}{\sqrt{n}} $$
4. Substitute those values in the general format for the
confidence interval:
$$ \bar{x} - E < \mu < \bar{x} + E $$
5. Round the resulting confidence interval limits.
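The five steps above can be sketched as a short function. The function name, the `decimals` parameter and the sample data are illustrative assumptions, not part of the original notes; the critical value is passed in by hand, as if read from a t table with df = n – 1.

```python
import math
import statistics

def mean_confidence_interval(data, t_crit, decimals=1):
    """Confidence interval for a population mean when sigma is unknown.

    t_crit is the critical value t_{alpha/2} read from a t table with
    df = len(data) - 1 (step 2 of the procedure).
    """
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)                   # sample standard deviation
    margin = t_crit * s / math.sqrt(n)           # step 3: E = t * s / sqrt(n)
    lower, upper = xbar - margin, xbar + margin  # step 4
    return round(lower, decimals), round(upper, decimals)  # step 5

# Hypothetical sample of n = 10 measurements (not from the source).
sample = [12.1, 9.8, 11.4, 10.6, 13.0, 9.5, 10.9, 12.3, 11.7, 10.2]
# Tabulated t_{alpha/2} for 95% confidence and df = 9.
print(mean_confidence_interval(sample, t_crit=2.262))
```

Step 1 (verifying the requirements) is left to the reader, since it depends on how the data were collected.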
Example:
A common claim is that garlic lowers cholesterol
levels. In a test of the effectiveness of garlic, 49
subjects were treated with doses of raw garlic, and
their cholesterol levels were measured before and
after the treatment. The changes in their levels of
cholesterol (in mg/dL) have a mean of 0.4 and a
standard deviation of 21.0. Use the sample statistics of
n = 49, x̄ = 0.4 and s = 21.0 to construct a 95%
confidence interval estimate of the mean net change in
LDL cholesterol after the garlic treatment. What does
the confidence interval suggest about the
effectiveness of garlic in reducing cholesterol?
Example:
Requirements are satisfied: independent
sample data with n = 49 (i.e., n > 30).
95% implies α = 0.05.
With n = 49, df = 49 – 1 = 48.
The closest df in the table is 50; with an area of 0.05 in two tails, tα/2 = 2.009.
Using tα/2 = 2.009, s = 21.0 and n = 49, the
margin of error is:

$$ E = t_{\alpha/2} \frac{s}{\sqrt{n}} = 2.009 \cdot \frac{21.0}{\sqrt{49}} = 6.027 $$
Example:
Construct the confidence
interval:

$$ \bar{x} - E < \mu < \bar{x} + E $$
$$ 0.4 - 6.027 < \mu < 0.4 + 6.027 $$
$$ -5.6 < \mu < 6.4 $$
Because the confidence interval limits contain the value
of 0, it is very possible that the mean of the changes in
cholesterol is equal to 0, suggesting that the garlic
treatment did not affect the cholesterol levels. It does
not appear that the garlic treatment is effective in
lowering cholesterol.
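The example reads the critical value from the table row for df = 50 because df = 48 is not listed. As a rough check on how much that substitution matters, the sketch below computes the critical value numerically from the t density, using Simpson's rule and bisection. The helper names, step count and bisection bounds are assumptions made for illustration, not a method from these notes.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0: Simpson's rule on [0, t] plus 0.5 by symmetry."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Two-tailed critical value t_{alpha/2}, located by bisection."""
    lo, hi = 0.0, 100.0
    target = 1 - alpha / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Summary statistics from the garlic example.
n, xbar, s = 49, 0.4, 21.0
for df in (48, 50):
    t_crit = t_critical(0.05, df)
    margin = t_crit * s / math.sqrt(n)
    print(f"df = {df}: t = {t_crit:.4f}, E = {margin:.3f}, "
          f"interval = ({xbar - margin:.1f}, {xbar + margin:.1f})")
```

The two critical values differ only in the third decimal place, so the rounded limits and the conclusion about garlic should be unaffected by using the df = 50 row.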
Inference on two groups
A confidence interval for the difference
between two groups uses sample data from
two independent samples. The same data can be
used to test hypotheses about two population
means μ1 and μ2, or simply to give confidence
interval estimates of the difference μ1 – μ2
between the two population means.
Example:
A headline in USA Today proclaimed that “Men,
women are equal talkers.” That headline
referred to a study of the numbers of words
that samples of men and women spoke in a
day. Construct a 95% confidence interval
estimate of the difference between the mean
number of words spoken by men and the mean
number of words spoken by women.
Student t Distribution
If the distribution of each group is essentially
normal, then the distribution of

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

approximately follows a Student t distribution.
Confidence Interval Estimate of
μ1 – μ2: Independent Samples

$$ (\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E $$

where

$$ E = t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

and df = the smaller of n1 – 1 and n2 – 1.
This interval applies when the two population standard
deviations are not known and not assumed to be equal,
and the samples are independent.
Find the margin of error E, using tα/2 = 1.967.
Construct the confidence interval using E = 1590.8,
x̄1 = 15,668.5 and x̄2 = 16,215.0:
–2137.3 < (μ1 – μ2) < 1044.3
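The two-sample interval can be sketched in the same way. The sample sizes and standard deviations of the word-count study are not given in these notes, so the summary statistics below are hypothetical, and the function name is an assumption. The critical value is supplied by hand for df = the smaller of n1 – 1 and n2 – 1.

```python
import math

def two_sample_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2 from independent samples,
    with population standard deviations unknown and not assumed equal.

    t_crit is t_{alpha/2} for df = min(n1 - 1, n2 - 1), read from a t table.
    """
    margin = t_crit * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Hypothetical summary statistics (not the word-count study's values).
n1, n2 = 9, 16
df = min(n1 - 1, n2 - 1)   # conservative df = 8, tabulated 95% value 2.306
low, high = two_sample_interval(10.0, 3.0, n1, 12.0, 4.0, n2, t_crit=2.306)
print(f"df = {df}: {low:.2f} < mu1 - mu2 < {high:.2f}")
```

As in the one-sample case, an interval that contains 0 does not support a difference between the two population means.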
Standard Error
The standard error of estimate, denoted by
SE is a measure of the deviation (or
standard deviation) between the parameter
θ of interest and the estimate that is
obtained from the observed sample.
Confidence Interval
The confidence interval, given a confidence
level (1– α), is an interval which includes the
parameter θ of interest. It is constructed
from the estimate θ̂ and the standard error
SE:

$$ \hat{\theta} - t_{\alpha/2}\, SE < \theta < \hat{\theta} + t_{\alpha/2}\, SE $$
Standard Error of Estimate
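$$ SE_0 = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2} \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum (x - \bar{x})^2} \right)} $$

and

$$ SE_1 = \sqrt{\frac{\sum (y - \hat{y})^2 / (n - 2)}{\sum (x - \bar{x})^2}} $$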
Confidence Interval for parameters

$$ \hat{b}_0 - t_{\alpha/2}\, SE_0 < b_0 < \hat{b}_0 + t_{\alpha/2}\, SE_0 $$
$$ \hat{b}_1 - t_{\alpha/2}\, SE_1 < b_1 < \hat{b}_1 + t_{\alpha/2}\, SE_1 $$

b0 and b1 represent the true values of the
coefficients, and tα/2 has n – 2 degrees of
freedom.
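The coefficient intervals can be sketched for a fitted line y = b0 + b1·x. This example assumes the usual simple linear regression setting: least-squares estimates, a residual variance equal to the sum of squared residuals divided by n – 2, and the standard textbook formulas for the standard errors of the intercept and slope. The data and the function name are hypothetical.

```python
import math

def coefficient_intervals(x, y, t_crit):
    """Least-squares estimates, standard errors and confidence intervals
    for the intercept b0 and slope b1 of y = b0 + b1 * x.

    t_crit is t_{alpha/2} for df = len(x) - 2, read from a t table.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    # Residual variance: sum of squared residuals divided by n - 2.
    resid_var = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se0 = math.sqrt(resid_var * (1 / n + xbar ** 2 / sxx))
    se1 = math.sqrt(resid_var / sxx)
    return {
        "b0": (b0, se0, (b0 - t_crit * se0, b0 + t_crit * se0)),
        "b1": (b1, se1, (b1 - t_crit * se1, b1 + t_crit * se1)),
    }

# Hypothetical data with n = 5, so df = 3 and the tabulated 95% value is 3.182.
result = coefficient_intervals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
for name, (est, se, (low, high)) in result.items():
    print(f"{name}: estimate = {est:.3f}, SE = {se:.3f}, "
          f"interval = ({low:.3f}, {high:.3f})")
```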