Chapter 11: Confidence Intervals for Univariate Data
Math 22
Introductory Statistics
Introduction to Estimation
Point Estimate – the value of a sample statistic used to estimate a population parameter.
Interval Estimate – an interval bounded by two values calculated from the sample data, used to estimate a population parameter.

Level of Confidence – the probability, evaluated before the sample is drawn, that the sample to be selected yields an interval that includes the parameter being estimated.
Confidence Interval – an interval estimate with a specified level of confidence.
Assumption – a condition that must hold for a statistical procedure to be valid.
Confidence Interval
A confidence interval for a population parameter is an interval of plausible values for the unknown parameter.
The interval is computed in such a way that we have a high degree of confidence that it contains the true value of the parameter.

Confidence Level
The confidence, stated as a percent, is the confidence level.
In practice, estimates of unknown parameters are given in the form:
estimate ± margin of error

Developing a Confidence Interval
Three determinations must be made to develop a confidence interval:
A good point estimator of the parameter.
The sampling distribution (or approximate sampling distribution) of the point estimator.
The desired confidence level, usually stated as a percentage.
Standard Error of a Statistic
The standard deviation of the statistic's sampling distribution, with all unknown population parameters replaced by their estimates from the sample.
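Interpreting Confidence Intervals
Q: What does a 99% C.I. really mean?
A: If we took 100 different samples and computed a 99% C.I. from each one, we would expect about 99 of those intervals to contain the true parameter and about 1 not to. The 99% describes the long-run success rate of the procedure; any single computed interval either contains the parameter or it does not.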
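The long-run interpretation can be checked by simulation. The sketch below is a minimal Python illustration, assuming a normal population with known $\sigma$ so that the z-interval for the mean applies; the function name coverage_rate and the values mu = 50, sigma = 10, n = 25 and the seed are arbitrary choices for illustration. It builds many 99% intervals and reports the fraction that contain $\mu$.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(mu=50.0, sigma=10.0, n=25, conf=0.99, trials=1000, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    me = z * sigma / n ** 0.5                      # margin of error
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - me <= mu <= xbar + me:           # did this interval capture mu?
            hits += 1
    return hits / trials

print(coverage_rate())   # a value near 0.99
```

Changing conf (for example to 0.90) should move the observed coverage toward the new confidence level.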
Validity and Precision of Confidence Intervals
Validity – measured by the confidence level, which is the probability that the interval will contain the true value of the parameter.
Precision – measured by the length of the interval; a shorter interval is more precise.

Confidence Interval for the Population Proportion
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$\hat{p}$ – sample proportion
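The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming the data are summarized as a count of successes out of n trials; the function name proportion_ci is hypothetical, and $z_{\alpha/2}$ is obtained from the standard library's NormalDist.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf=0.95):
    """p-hat +/- z_{alpha/2} * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    me = z * sqrt(p_hat * (1 - p_hat) / n)          # margin of error
    return p_hat - me, p_hat + me

print(proportion_ci(40, 100))   # roughly (0.304, 0.496)
```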
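Reducing the Margin of Error
Two ways to reduce the margin of error:
Decrease $z_{\alpha/2}$ (Problem: this lowers the confidence level, which reduces validity.)
Increase n (No problem for validity, although it requires collecting a larger sample.)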
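To see the trade-off numerically, the sketch below uses the proportion margin of error; the function name margin_of_error and the choice $\hat{p} = 0.5$ are illustrative assumptions. Because n sits under a square root, quadrupling n halves the margin of error, while lowering the confidence level shrinks it at the cost of validity.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, conf):
    """z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Compare the effect of a smaller z (lower confidence) with a larger n.
for conf in (0.99, 0.95, 0.90):
    for n in (100, 400):
        print(f"conf={conf:.2f} n={n}: ME={margin_of_error(0.5, n, conf):.4f}")
```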
Calculating Sample Size for Proportions
$$\text{Margin of Error (ME)} = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\Longrightarrow\quad n = \left(\frac{z_{\alpha/2}}{ME}\right)^{2}\hat{p}(1-\hat{p})$$
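Solving for n rarely gives a whole number, so the result is rounded up to guarantee the desired margin of error. The sketch below assumes a default planning value of $\hat{p} = 0.5$ when no prior estimate is available; this is a common conservative choice because $\hat{p}(1-\hat{p})$ is largest at 0.5, not something stated on the slide. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(me, conf=0.95, p_hat=0.5):
    """Smallest n with z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n) <= me."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z / me) ** 2 * p_hat * (1 - p_hat))   # always round up

print(sample_size_proportion(0.03))   # 1068
```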
Estimation of the Mean When the Standard Deviation is Known
When the population standard deviation $\sigma$ is known, a $(1-\alpha)100\%$ confidence interval for $\mu$ based on $\bar{x}$ is given by the limits:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
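A minimal Python sketch of these limits, assuming the raw data are available as a list and $\sigma$ is supplied by the user; the function name z_interval and the example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(data, sigma, conf=0.95):
    """x-bar +/- z_{alpha/2} * sigma / sqrt(n), for a known population sigma."""
    n = len(data)
    xbar = fmean(data)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    me = z * sigma / sqrt(n)                        # margin of error
    return xbar - me, xbar + me

print(z_interval([10, 12, 14, 16, 18, 20, 22, 24, 26], sigma=6))
```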
Estimation of the Mean When the Standard Deviation is Unknown
We must make sure that the sampled population is normally distributed; a normal plot of the data is used to check this.

Student-t Distribution
Many times we do not know what $\sigma$ is. In these cases, we use the sample standard deviation s in its place. The standard error of the sample mean is then
$$\frac{s}{\sqrt{n}}$$
Characteristics of the Student-t Distribution
Bell shaped and symmetric, just like the normal distribution. The t-distribution “looks” like the normal distribution but is not normal; it has heavier tails.
The t-distribution is a family of distributions, each member uniquely identified by its degrees of freedom (df), which is simply n − 1, where n is the sample size.
As the sample size increases, the t-distribution becomes indistinguishable from the standard normal curve.
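The convergence can be seen by comparing the critical value $t_{0.025}$ with $z_{0.025}$ as df grows. Python's standard library has no t-distribution, so this sketch computes the quantile numerically (Simpson's rule for the CDF plus bisection); the helper names are hypothetical, and in practice one would read the value from a t table or use a statistics package.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df, hi=100.0):
    """Value t with P(T <= t) = p, for 0.5 < p <= 0.99, found by bisection."""
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (2, 5, 10, 30, 100, 1000):
    print(f"df={df:>4}: t={t_quantile(0.975, df):.3f}   z={z:.3f}")
```

The t critical values shrink toward z as df increases, which is the numerical form of the statement above.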
The t-Interval
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2}$ has n − 1 degrees of freedom.
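A minimal Python sketch of the t-interval. To stay within the standard library, it assumes the critical value $t_{\alpha/2}$ with n − 1 df is looked up from a t table and passed in as t_crit; the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

def t_interval(data, t_crit):
    """x-bar +/- t_crit * s / sqrt(n); t_crit is t_{alpha/2} with n - 1 df."""
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)                     # sample standard deviation (divisor n - 1)
    me = t_crit * s / sqrt(n)           # margin of error
    return xbar - me, xbar + me

# Hypothetical data: n = 10, so df = 9; 2.262 is the tabled t_{0.025} for df = 9.
sample = [12.1, 11.4, 13.0, 12.6, 11.9, 12.8, 12.2, 11.7, 12.5, 12.3]
print(t_interval(sample, 2.262))
```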
Using the t-Interval
For small sample sizes:
If the sample size is less than 30, construct a normal plot of your data. If the data appear to come from a normal distribution, use the t-distribution. If the data do not appear to be normal, use a nonparametric technique that will be introduced later.
Using the t-Interval
For large sample sizes:
If the sample size is 30 or more, use the t-distribution, citing the Central Limit Theorem as justification: for large n the sampling distribution of $\bar{x}$ is approximately normal, which satisfies the required assumption of normality.
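Sample Size for Inference Concerning the Mean
$$\text{Margin of Error (ME)} = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad \sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{ME} \quad\Longrightarrow\quad n = \left(\frac{z_{\alpha/2}\,\sigma}{ME}\right)^{2}$$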
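A minimal Python sketch of the sample-size formula, assuming a planning value for $\sigma$ is available; the function name and example numbers are hypothetical. As with proportions, the result is rounded up so the achieved margin of error is no larger than requested.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, me, conf=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= me."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / me) ** 2)   # always round up

print(sample_size_mean(sigma=15, me=3))   # 97
```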
Confidence Interval for the Median
Large Sample Confidence Interval for the Median:
Sample size must be 20 or more.
We can construct a confidence interval for the population median $\theta$ based on a confidence interval for a proportion p.
We can then produce a confidence interval for p with a sample proportion of .50 (this represents the definition of the median: 50% of the values lie below it and 50% above it).
Large Sample Confidence Interval for the Median
Basic steps for constructing a large sample confidence interval for the median:
Construct a normal plot to see whether the data are normal.
If the normality assumption is violated, construct a $(1-\alpha)100\%$ confidence interval for p based on a sample proportion of .50.
Multiply the lower and upper bounds of this C.I. by n, the sample size. Round the lower bound up and the upper bound down.
Sort the data and identify the data values in the positions found in the previous step.
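The last three steps can be sketched in Python as follows. This is a minimal illustration that assumes the normal-plot check has already been done and n is at least 20; the function name is hypothetical, and positions are counted from 1 in the sorted data.

```python
from math import ceil, floor, sqrt
from statistics import NormalDist

def median_ci_large(data, conf=0.95):
    """Large-sample C.I. for the median via a C.I. for p with p-hat = .50."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    me = z * sqrt(0.5 * 0.5 / n)            # margin of error for p-hat = .50
    lower_pos = ceil((0.5 - me) * n)         # round the lower bound up
    upper_pos = floor((0.5 + me) * n)        # round the upper bound down
    ordered = sorted(data)
    return ordered[lower_pos - 1], ordered[upper_pos - 1]   # 1-based positions

print(median_ci_large(list(range(25, 0, -1))))   # (8, 17)
```

For n = 25 at 95% confidence, the bounds for p are about .304 and .696, which become positions 7.6 and 17.4 and round to 8 and 17.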
Small Sample Confidence Interval for the Median
Sample size must be less than 20.
The method we will explore is based strictly on the binomial distribution.
Basic steps for constructing a small sample confidence interval for the median:
Create a table containing the cumulative binomial probabilities for 0 to n successes, with p = .50.
Identify the position for the lower bound with a cumulative probability as near $\alpha/2$ as possible.
Identify the position for the upper bound with a cumulative probability as near $1-\alpha/2$ as possible.
Sort the data and identify the data values corresponding to the positions located in the last two steps.
Report the actual confidence level: sum the tail probabilities associated with the positions chosen for the C.I. bounds, and subtract that sum from 1.
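The steps above can be sketched in Python. The slides do not pin down exactly how a cumulative probability maps to a data position, so this sketch assumes one common convention: if P(X ≤ x) is the tabled value nearest $\alpha/2$, the lower bound is the value at sorted position x + 1, and likewise for the upper bound with $1-\alpha/2$. Under that convention the lower tail is P(X ≤ x_lo) and the upper tail is 1 − P(X ≤ x_hi). Function names and data are hypothetical.

```python
from math import comb

def binomial_cdf_table(n):
    """Cumulative probabilities P(X <= x), x = 0..n, for X ~ Binomial(n, 0.5)."""
    table, running = [], 0.0
    for x in range(n + 1):
        running += comb(n, x) / 2 ** n
        table.append(running)
    return table

def median_ci_small(data, alpha=0.05):
    """Return (lower value, upper value, actual confidence level)."""
    n = len(data)
    cdf = binomial_cdf_table(n)
    candidates = range(n)   # keeps both sorted positions (x + 1) within 1..n
    # Assumed convention: table entry x maps to sorted position x + 1.
    x_lo = min(candidates, key=lambda x: abs(cdf[x] - alpha / 2))
    x_hi = min(candidates, key=lambda x: abs(cdf[x] - (1 - alpha / 2)))
    ordered = sorted(data)
    lower_tail = cdf[x_lo]          # P(median falls below the lower bound)
    upper_tail = 1 - cdf[x_hi]      # P(median falls above the upper bound)
    return ordered[x_lo], ordered[x_hi], 1 - (lower_tail + upper_tail)

data = [3.1, 4.7, 2.2, 5.9, 4.1, 3.8, 6.4, 2.9, 4.4, 3.5]
print(median_ci_small(data))   # (2.9, 5.9, 0.978515625)
```

For n = 10 this selects sorted positions 2 and 9, with an actual confidence level of about 97.9% rather than exactly 95%, which is why the slides ask for the actual level to be reported.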