Confidence Intervals
May 2, 2011
What is a confidence interval?
- Being 95% certain that the average height of a male student at SUNY Cortland, in inches, is 71 ± 3; that is, µ ∈ (68, 74) with confidence level .95.
- This is the central interval of X̄ values that has probability .95.
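What is a confidence interval?
Let X̄ be the random variable for the sample mean with a sample size of n, and let µ be the population mean. Then we can say
$$P(\mu - \mathrm{EBM} < \bar X < \mu + \mathrm{EBM}) = 1 - \alpha,$$
or, equivalently, $P(\bar X - \mathrm{EBM} < \mu < \bar X + \mathrm{EBM}) = 1 - \alpha$, so that for an observed sample mean x̄ the confidence interval is (x̄ − EBM, x̄ + EBM), where
- α is the error probability,
- 1 − α is the confidence level CL,
- EBM is the error bound for the mean.
It is easy to sketch a general graph of this situation: a bell-shaped curve for X̄ centered at µ, with area 1 − α between µ − EBM and µ + EBM and area α/2 in each tail.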
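To see what the confidence level means in practice, here is a small Python simulation, offered as a sketch under stated assumptions: a hypothetical normal population with µ = 71 and σ = 3 (illustrative numbers, not actual SUNY Cortland data), samples of size n = 25, and σ treated as known so that EBM = z_{α/2}·σ/√n. The function name `coverage` is our own. It repeats the sampling many times and reports the fraction of intervals (x̄ − EBM, x̄ + EBM) that contain µ; that fraction should land near 1 − α.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu: float, sigma: float, n: int, alpha: float,
             reps: int = 4000, seed: int = 1) -> float:
    """Fraction of simulated intervals (x̄ - EBM, x̄ + EBM) that contain mu.

    Assumes a normal population with KNOWN sigma, so EBM = z_{alpha/2} * sigma / sqrt(n).
    """
    rng = random.Random(seed)                   # seeded for reproducibility
    z = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}: area alpha/2 to its right
    ebm = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))  # one sample mean
        if xbar - ebm < mu < xbar + ebm:
            hits += 1
    return hits / reps

# Hypothetical population: mu = 71 inches, sigma = 3 inches, samples of size 25.
print(coverage(mu=71, sigma=3, n=25, alpha=0.05))
```

With α = .05 the printed fraction should be close to .95. The 95% describes the procedure, not any single interval.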
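Constructing a Confidence Interval
- The tolerated error α, and thus the confidence level CL = 1 − α, is determined before obtaining a sample. But how do we find the error bound for the mean, the EBM?
- The CLT tells us that the standardized scores $Z = (\bar X - \mu)/\sigma_{\bar X}$ should approximately follow the standard normal distribution.
- So we can make the approximate probability statement
$$P(\mu - \mathrm{EBM} < \bar X < \mu + \mathrm{EBM}) = P\left(|Z| < \frac{\mathrm{EBM}}{\sigma_{\bar X}}\right) \approx 1 - \alpha \quad \text{when} \quad \mathrm{EBM} = z_{\alpha/2}\,\sigma_{\bar X},$$
where $\sigma_{\bar X} = \sigma/\sqrt{n}$ and $z_{\alpha/2}$ is the z value with probability α/2 in the right tail.
- But we usually don't know σ either (and µ is exactly what we are estimating), so once σ is replaced by the sample standard deviation s, a better approximate distribution for our standardized scores is what is called the Student-t distribution.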
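The z-based error bound from the last bullets can be computed with Python's standard library. This is a minimal sketch assuming σ is known; `z_critical` and `ebm_known_sigma` are hypothetical helper names, and `statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$.

```python
import math
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """z_{alpha/2}: the standard normal value with area alpha/2 to its right."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def ebm_known_sigma(sigma: float, n: int, alpha: float) -> float:
    """Error bound for the mean when sigma is known: z_{alpha/2} * sigma / sqrt(n)."""
    return z_critical(alpha) * sigma / math.sqrt(n)

# Check that P(|Z| < z_{alpha/2}) = 1 - alpha for alpha = .05.
z = z_critical(0.05)
print(round(z, 3))                                             # 1.96
print(round(NormalDist().cdf(z) - NormalDist().cdf(-z), 3))    # 0.95
```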
The Student-t Distribution
- The Student-t distribution looks similar to the normal distribution, but has a bit more probability in the tails.
- The random variable for the distribution is T, and a value of the random variable is called a t-score,
$$t = \frac{\bar x - \mu}{s/\sqrt{n}},$$
which is similar to a z-score and measures the number of (estimated) standard deviations of the sampling distribution that a sample mean x̄ lies from the theoretical mean µ.
- We do not need to assume that we know σ, but we do assume that the original population is approximately normal.
- The Student-t distribution has one parameter, called the degrees of freedom, df = n − 1, where n is the sample size.
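The t-score is plain arithmetic. In the sketch below, `t_score` and `degrees_of_freedom` are illustrative helper names, and the hypothesized mean µ = 4.0 in the usage line is an assumption chosen only to have something to plug in; the other numbers are the conference statistics used later in these notes.

```python
import math

def t_score(xbar: float, mu: float, s: float, n: int) -> float:
    """t = (x̄ - mu) / (s / sqrt(n)): how many estimated standard errors x̄ lies from mu."""
    return (xbar - mu) / (s / math.sqrt(n))

def degrees_of_freedom(n: int) -> int:
    """The single parameter of the Student-t used here: df = n - 1."""
    return n - 1

# x̄ = 3.94, s = 1.28, n = 84; mu = 4.0 is a hypothetical value for illustration.
print(round(t_score(3.94, 4.0, 1.28, 84), 2), degrees_of_freedom(84))  # -0.43 83
```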
Degrees of Freedom
- There is a different Student-t distribution for each possible value of df. The notation for the Student-t distribution is $T \sim t_{df}$.
- As the degrees of freedom diverge to ∞, the Student-t distribution approaches the standard normal distribution. Loosely speaking, with df = n − 1, we have
$$\lim_{n\to\infty} t_{n-1} = N(0, 1).$$
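Python's standard library has no Student-t distribution, so the following sketch builds one from the density (using `math.lgamma`) with Simpson's-rule integration and bisection. It is an illustrative numerical approach, not necessarily how a calculator or a statistics package does it, and all function names are our own. Printing $t_{.025}$ for increasing df shows the critical values shrinking toward the normal value $z_{.025} \approx 1.96$.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of the Student-t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x): symmetry about 0 plus Simpson's rule on [0, |x|]; steps must be even."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha: float, df: int) -> float:
    """t_{alpha/2}: the t value with area alpha/2 to its right, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Critical values for a 95% interval shrink toward z_{.025} as df grows.
for df in (1, 5, 30, 83, 1000):
    print(f"df = {df:5d}: t = {t_critical(0.05, df):.3f}")
print(f"normal    : z = {NormalDist().inv_cdf(0.975):.3f}")
```

If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, df)` gives the same critical values directly.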
A Bit of History About the Student-t
In the golden-olden days of yore, we used tables to calculate probabilities with the Student-t, and we needed a different table for each value of df. Thanks to the proliferation of calculators and computers, we can now use any Student-t distribution we want.
Still, the table perspective helps us to understand the process. We can sketch a generic t distribution and the corresponding probabilities that will allow us to find the EBM.
Confidence Intervals With a Table of Student-t Values
Suppose we are given some data and we want to construct a confidence interval for the population mean µ.
1. Calculate x̄ and s.
2. Look up $t_{\alpha/2}$ in a table with the appropriate df = n − 1, where $t_{\alpha/2}$ is the t value with probability α/2 in the tail of $t_{df}$ to its right.
3. Calculate the error bound
$$\mathrm{EBM} = t_{\alpha/2}\,\frac{s}{\sqrt{n}}.$$
4. The confidence interval is (x̄ − EBM, x̄ + EBM).
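The four steps translate directly into code. In this sketch the table lookup in step 2 stays manual: the caller passes in $t_{\alpha/2}$. The helper name `t_interval_from_table` is hypothetical, and the usage line borrows the data and the 90% level from the "Given Data" example later in these notes, with $t_{.05} = 2.015$ for df = 5 taken from a t-table.

```python
import math
from statistics import mean, stdev

def t_interval_from_table(data: list[float], t_crit: float) -> tuple[float, float]:
    """Steps 1-4; the caller supplies t_crit = t_{alpha/2} for df = len(data) - 1."""
    n = len(data)
    xbar = mean(data)                # step 1: sample mean x̄
    s = stdev(data)                  # step 1: sample standard deviation (divides by n - 1)
    # step 2 (the table lookup) is done by hand and passed in as t_crit
    ebm = t_crit * s / math.sqrt(n)  # step 3: error bound for the mean
    return xbar - ebm, xbar + ebm    # step 4: (x̄ - EBM, x̄ + EBM)

lower, upper = t_interval_from_table([3.1, 3.3, 3.2, 3.4, 3.6, 3.3], t_crit=2.015)
print(round(lower, 3), round(upper, 3))  # 3.175 3.458
```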
Qualitative Analysis
Consider the equation
$$\mathrm{EBM} = t_{\alpha/2}\,\frac{s}{\sqrt{n}}.$$
From this it follows that
- a larger s means a wider interval for the same CL (confidence level);
- a larger sample size n means a narrower interval for the same CL (confidence level);
- higher confidence means a smaller α, which means a larger $t_{\alpha/2}$ and hence a wider interval.
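The three qualitative effects can be checked numerically. To keep this sketch short it uses the normal value $z_{\alpha/2}$ as a stand-in for $t_{\alpha/2}$ (an assumption; with t the effect of a larger n is slightly stronger, since $t_{\alpha/2}$ also shrinks as df grows), and the numbers s = 1, n = 25 are purely illustrative. The function name `ebm` is hypothetical.

```python
import math
from statistics import NormalDist

def ebm(s: float, n: int, cl: float) -> float:
    """Approximate EBM = z_{alpha/2} * s / sqrt(n) with alpha = 1 - cl (z stands in for t)."""
    alpha = 1 - cl
    return NormalDist().inv_cdf(1 - alpha / 2) * s / math.sqrt(n)

base = ebm(s=1.0, n=25, cl=0.95)
print(ebm(s=2.0, n=25, cl=0.95) / base)   # larger s: ratio 2, a wider interval
print(ebm(s=1.0, n=100, cl=0.95) / base)  # 4 times the sample size: ratio 0.5, narrower
print(ebm(s=1.0, n=25, cl=0.99) > base)   # higher confidence: wider interval
```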
Calculating a Confidence Interval Given Statistics
From a stack of IEEE Spectrum magazines, announcements for 84 upcoming engineering conferences were randomly picked. The average length of the conferences was 3.94 days, with a standard deviation of 1.28 days. Assume the underlying population is normal.
- Define the random variables X and X̄ in words.
Solution: X = the length of a randomly chosen conference. X̄ = the average length of a random sample of 84 conferences.
- Which distribution should you use to study the length of a randomly chosen conference? Explain your choice.
Solution: We are told to assume that the underlying distribution, i.e., the distribution of X, is normal. Our best estimates for the mean and standard deviation of X are x̄ = 3.94 and s = 1.28, so we assume X is approximately distributed as N(3.94, 1.28).
- Which distribution should you use to study the average length of another sample of randomly chosen conferences? Explain your choice.
Solution: By the CLT, the distribution of the sample average X̄ is approximately normal with mean $\mu_{\bar X}$ equal to the population mean $\mu_X$ and standard deviation $\sigma_{\bar X} = \sigma_X/\sqrt{n}$. In our case this means that X̄ is approximately distributed as N(3.94, .140), since 1.28/√84 ≈ .140. However, since $\mu_X$ and $\sigma_X$ are unknown and we have substituted the estimates x̄ = 3.94 and s = 1.28, we are better off assuming that the t-scores follow the $t_{83}$ distribution.
- Construct a 95% confidence interval for the population average length of engineering conferences.
Solution: We will use a TI-83 or TI-84 to construct a T interval and complete the following:
  - State the confidence interval.
  - Sketch the graph.
  - Calculate the error bound.
TI-83+ or TI-84 Confidence Interval:
From Statistics:
- Use the function 8:TInterval in STAT TESTS.
- Once you are in TESTS, press 8:TInterval and arrow to Stats. Press ENTER.
- Arrow down and enter the sample mean. Press ENTER.
- Arrow down and enter the sample standard deviation. Press ENTER.
- Arrow down and enter the sample size. Press ENTER.
- Enter the C-level.
- Arrow down to Calculate and press ENTER.
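For readers without a TI calculator, here is a Python analogue of TInterval with the Stats option. It is only a sketch of the arithmetic $\bar x \pm t_{\alpha/2}\,s/\sqrt{n}$, not the calculator's internal routine; `t_interval_stats` is a hypothetical name, and $t_{.025} \approx 1.989$ for df = 83 is a table value passed in by hand.

```python
import math

def t_interval_stats(xbar: float, s: float, n: int, t_crit: float) -> tuple[float, float, float]:
    """Return (lower, upper, EBM) for x̄ ± t_crit * s / sqrt(n)."""
    ebm = t_crit * s / math.sqrt(n)
    return xbar - ebm, xbar + ebm, ebm

# Conference example: x̄ = 3.94, s = 1.28, n = 84, 95% level.
# t_{.025} for df = 83 is about 1.989 (table value, supplied by hand).
lower, upper, ebm = t_interval_stats(3.94, 1.28, 84, t_crit=1.989)
print(round(lower, 2), round(upper, 2), round(ebm, 2))  # 3.66 4.22 0.28
```

The output agrees with the interval (3.66, 4.22) and EBM ≈ .28 reported for this example.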
Using the Student-t Distribution
In this example
- n = 84 is the sample size.
- x̄ = 3.94 is the sample mean.
- s = 1.28 is the sample standard deviation.
- α = .05 is the probability that the confidence interval does not contain the true mean. Equivalently, α is the probability that the random variable X̄ will take on a value outside the interval µ ± EBM.
- CL = 1 − .05 = .95 is the Confidence Level, i.e., the probability that an interval constructed this way contains the true mean.
- (3.66, 4.22) = (3.94 − .28, 3.94 + .28) = (3.94 − EBM, 3.94 + EBM) is the Confidence Interval.
- $\mathrm{EBM} = t_{\alpha/2}\cdot s/\sqrt{n} = .28$ is the Error Bound for the Mean.
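The error bound can also be read off any symmetric interval: x̄ is the midpoint and EBM is half the width. This tiny sketch (the function name `center_and_ebm` is hypothetical) applies that to the calculator's interval (3.66, 4.22).

```python
def center_and_ebm(lower: float, upper: float) -> tuple[float, float]:
    """For a symmetric interval (x̄ - EBM, x̄ + EBM): x̄ is the midpoint, EBM the half-width."""
    return (lower + upper) / 2, (upper - lower) / 2

xbar, ebm = center_and_ebm(3.66, 4.22)
print(round(xbar, 2), round(ebm, 2))  # 3.94 0.28
```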
Calculating a Confidence Interval Given Data
e.g. Suppose the data 3.1; 3.3; 3.2; 3.4; 3.6; 3.3 were drawn at
random from some population. Find a 90% confidence interval for
the population mean µ.
TI-83+ or TI-84 Confidence Interval:
From Data:
- Use the function 8:TInterval in STAT TESTS.
- Once you are in TESTS, press 8:TInterval and arrow to Data. Press ENTER.
- Arrow down and, for List, enter the name of the list where you put the data.
- For Freq, enter 1 if each data value in the list occurs once, or enter the name of the list holding the frequencies associated with the data values in the first list.
- Enter the C-level.
- Arrow down to Calculate and press ENTER.
Solution: The 90% confidence interval for the population mean µ is (3.175, 3.458), with x̄ = 3.317 and EBM = .142.
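A Python analogue of TInterval with the Data option, including the Freq list, is sketched below. Expanding each value by its frequency is our assumption about what Freq means (each value counted f times), the function name `t_interval_data` is hypothetical, and $t_{.05} = 2.015$ for df = 5 (the 90% level) is supplied by hand from a t-table.

```python
import math
from statistics import mean, stdev

def t_interval_data(values, t_crit, freq=None):
    """Return (lower, upper, x̄, EBM) for x̄ ± t_crit * s / sqrt(n).

    freq (optional) lists how many times each value occurs; None means once each,
    like entering 1 for Freq on the calculator.
    """
    if freq is None:
        freq = [1] * len(values)
    data = [v for v, f in zip(values, freq) for _ in range(f)]  # expand by frequency
    n = len(data)
    xbar, s = mean(data), stdev(data)
    ebm = t_crit * s / math.sqrt(n)
    return xbar - ebm, xbar + ebm, xbar, ebm

# t_{.05} = 2.015 for df = 5 (90% level), taken from a t-table.
raw = [3.1, 3.3, 3.2, 3.4, 3.6, 3.3]
print([round(v, 3) for v in t_interval_data(raw, t_crit=2.015)])  # [3.175, 3.458, 3.317, 0.142]

# The same data entered as values plus a frequency list (3.3 occurs twice).
print([round(v, 3) for v in t_interval_data([3.1, 3.2, 3.3, 3.4, 3.6], 2.015, freq=[1, 1, 2, 1, 1])])
```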
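Modifications for the Sample Proportion
Let P′ be the random variable that measures a sample proportion p′. Then the Error Bound for the Proportion is
$$\mathrm{EBP} = z_{\alpha/2}\sqrt{\frac{p'(1 - p')}{n}}.$$
e.g. If n = 1000 and p′ = .52, then
$$\mathrm{EBP} = 1.96\sqrt{\frac{.52(1 - .52)}{1000}} \approx .03.$$
The “variance” p′(1 − p′) comes from the variance of the binomial distribution: if X counts the successes in n trials with success probability p, then Var(X) = np(1 − p), so the sample proportion P′ = X/n has variance p(1 − p)/n, which we estimate by p′(1 − p′)/n. The proportion random variable is the random variable that gives the proportion of successes in the sample, i.e., the average of n success (1) or failure (0) indicators.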
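The EBP formula as a short Python sketch; the helper name `ebp` is hypothetical, and $z_{\alpha/2}$ again comes from `statistics.NormalDist`. Without rounding $z_{.025}$ to 1.96, the example gives about .031, which the slide rounds to .03.

```python
import math
from statistics import NormalDist

def ebp(p_hat: float, n: int, cl: float = 0.95) -> float:
    """Error bound for a proportion: z_{alpha/2} * sqrt(p'(1 - p') / n)."""
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

print(round(ebp(0.52, 1000), 3))  # 0.031
```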
Margin of Error In Pictures
Figure: Source: http://en.wikipedia.org/wiki/Margin_of_error
An Example Confidence Interval
Let P′ be the random variable that measures the proportion of a sample that says they will vote for Barack Obama. The Gallup poll from October 27 – November 2 found p′ = .52. To the best of my knowledge, this sample proportion was based on a sample of size n = 1000, and the margin of error is ± 3% at 95% confidence.
This means that (.49, .55) is a 95% confidence interval for the true proportion p; equivalently,
$$P(p - .03 < P' < p + .03) \approx .95.$$
The nation voted and Obama earned 52% of the vote. Should we be surprised?
Learning the Terms by Example
In the last example
- n = 1000 is the sample size.
- p′ = .52 is the sample proportion.
- EBP = .03 (3 percentage points) is the Error Bound for the Proportion.
- (.52 − .03, .52 + .03) = (.49, .55) is the Confidence Interval.
- α = .05 is the probability that the confidence interval does not contain the true proportion. Equivalently, α is the probability that the random variable P′ will take on a value outside the interval p ± EBP.
- CL = 1 − .05 = .95 is the Confidence Level, i.e., the probability that an interval constructed this way contains the true proportion.
TI-83+ and TI-84 Proportion Confidence Interval
- Press STAT and arrow over to TESTS.
- Arrow down to A:1-PropZInt. Press ENTER.
- Arrow down and enter the number of successes for x. Press ENTER.
- Arrow down and enter the sample size for n. Press ENTER.
- Arrow down and enter the C-Level. Press ENTER.
- Arrow down to Calculate. Press ENTER.
From the previous example we get the interval (.489, .551) if we assume p′ = .52 means x = 520.
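The same computation as 1-PropZInt, sketched in Python. The function name `one_prop_z_interval` is hypothetical, and this reproduces the textbook formula $p' \pm z_{\alpha/2}\sqrt{p'(1-p')/n}$ rather than the calculator's own code.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x: int, n: int, cl: float) -> tuple[float, float]:
    """Analogue of 1-PropZInt: p' ± z_{alpha/2} * sqrt(p'(1 - p') / n), with p' = x / n."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    ebp = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - ebp, p_hat + ebp

lo, hi = one_prop_z_interval(520, 1000, 0.95)
print(round(lo, 3), round(hi, 3))  # 0.489 0.551
```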