Chapter 5 Slides

Estimation
• Goal: Use sample data to make predictions
regarding unknown population parameters
• Point Estimate - Single value that is best
guess of true parameter based on sample
• Interval Estimate - Range of values that we
can be confident contains the true parameter
Point Estimate
• Point Estimator - Statistic computed from a
sample that predicts the value of the
unknown parameter
• Unbiased Estimator - A statistic that has a
sampling distribution with mean equal to the
true parameter
• Efficient Estimator - A statistic that has a
sampling distribution with smaller standard
error than other competing statistics
Point Estimators
• Sample mean is the most common unbiased
estimator for the population mean μ:
$$\hat{\mu} = \bar{Y} = \frac{\sum Y_i}{n}$$
• Sample standard deviation is the most common
estimator for σ (s² is unbiased for σ²):
$$\hat{\sigma} = s = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n-1}}$$
• Sample proportion of individuals with a (nominal)
characteristic is the estimator for the population proportion
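The three point estimators above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the data values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def sample_mean(values):
    """Point estimate of the population mean: sum(Y_i) / n."""
    return sum(values) / len(values)

def sample_sd(values):
    """Point estimate of sigma: sqrt(sum((Y_i - Ybar)^2) / (n - 1))."""
    n = len(values)
    ybar = sample_mean(values)
    return math.sqrt(sum((y - ybar) ** 2 for y in values) / (n - 1))

def sample_proportion(flags):
    """Fraction of sampled individuals that have the characteristic."""
    return sum(1 for f in flags if f) / len(flags)

# Hypothetical data, for illustration only.
measurements = [4.0, 7.0, 5.0, 9.0, 10.0]
has_trait = [True, False, True, True]

print(sample_mean(measurements))     # 7.0
print(sample_sd(measurements) ** 2)  # sample variance, approximately 6.5
print(sample_proportion(has_trait))  # 0.75
```

Dividing by n - 1 rather than n is what makes the sample variance unbiased for the population variance.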
Confidence Interval for the Mean
• Confidence Interval - Range of values
computed from sample information that we
can be confident contains the true parameter
• Confidence Coefficient - The probability
that an interval computed from a sample
contains the true unknown parameter
(.90,.95,.99 are typical values)
• Central Limit Theorem - The sampling
distribution of the sample mean is
approximately normal in large samples
Confidence Interval for the Mean
• In large samples, the sample mean is
approximately normal with mean μ and
standard error
$$\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$$
• Thus, we have the following probability
statement:
$$P(\mu - 1.96\,\sigma_{\bar{Y}} \le \bar{Y} \le \mu + 1.96\,\sigma_{\bar{Y}}) \approx .95$$
• That is, with probability about .95 the sample mean lies
within 1.96 standard errors of the (unknown) population mean
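The step from this probability statement to an interval for the population mean is an algebraic rearrangement. The derivation below is a sketch of that step, added for clarity; it uses only the symbols already defined on this slide.

```latex
\begin{aligned}
\mu - 1.96\,\sigma_{\bar{Y}} \le \bar{Y} \le \mu + 1.96\,\sigma_{\bar{Y}}
&\iff -1.96\,\sigma_{\bar{Y}} \le \bar{Y} - \mu \le 1.96\,\sigma_{\bar{Y}} \\
&\iff \bar{Y} - 1.96\,\sigma_{\bar{Y}} \le \mu \le \bar{Y} + 1.96\,\sigma_{\bar{Y}}
\end{aligned}
```

Both events are the same event, so the random interval on the last line contains the population mean with probability about .95.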
Confidence Interval for the Mean
• Problem: The standard error is unknown (σ
is also a parameter). It is estimated by
replacing σ with its estimate from the
sample data:
$$\hat{\sigma}_{\bar{Y}} = \frac{s}{\sqrt{n}}$$
95% Confidence Interval for μ:
$$\bar{Y} \pm 1.96\,\hat{\sigma}_{\bar{Y}} \equiv \bar{Y} \pm 1.96\,\frac{s}{\sqrt{n}}$$
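A minimal Python sketch of the large-sample 95% interval follows. The function name and the data (the integers 1 through 36) are hypothetical and serve only to show the calculation. The interval assumes a sample large enough for the normal approximation.

```python
import math

def mean_ci_95(values):
    """Large-sample 95% CI for the mean: Ybar +/- 1.96 * s / sqrt(n)."""
    n = len(values)
    ybar = sum(values) / n
    # Sample standard deviation with the n - 1 denominator.
    s = math.sqrt(sum((y - ybar) ** 2 for y in values) / (n - 1))
    se_hat = s / math.sqrt(n)  # estimated standard error
    return ybar - 1.96 * se_hat, ybar + 1.96 * se_hat

# Hypothetical sample of n = 36 values, for illustration only.
sample = list(range(1, 37))
lower, upper = mean_ci_95(sample)
print(round(lower, 2), round(upper, 2))
```

For this sample the mean is 18.5 and the sample variance is 111, so the half-width is 1.96 times the square root of 111, divided by 6.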
Confidence Interval for the Mean
• Most reported confidence intervals are 95%
• Increasing the confidence coefficient
increases the width of the interval
• Rule for (1-α)100% confidence interval:
$$\bar{Y} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

(1-α)100%    α      α/2     z_{α/2}
90%          .10    .050    1.645
95%          .05    .025    1.96
99%          .01    .005    2.58
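The tabled critical values can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}, the upper alpha/2 point of the standard normal."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The 99% value computes to about 2.576, which the table rounds to 2.58.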
Properties of the CI for a Mean
• Confidence level refers to the fraction of
time that CI’s would contain the true
parameter if many random samples were
taken from the same population
• The width of a CI increases as the
confidence level increases
• The width of a CI decreases as the sample
size increases
• A CI provides a credible set of possible
values of μ with a small risk of error
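The first property, that the confidence level is a long-run fraction of intervals containing the true parameter, can be checked by simulation. The sketch below is an illustration under assumed settings that are not from the slides: a normal population with mean 50 and standard deviation 10, samples of size 50, and 2000 repetitions.

```python
import math
import random

def coverage(mu=50.0, sigma=10.0, n=50, reps=2000, z=1.96, seed=1):
    """Fraction of simulated large-sample CIs that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = sum(sample) / n
        s = math.sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))
        half_width = z * s / math.sqrt(n)
        if ybar - half_width <= mu <= ybar + half_width:
            hits += 1
    return hits / reps

observed = coverage()
print(observed)  # expected to be close to 0.95
```

Any single interval either contains μ or does not; the 95% describes the procedure across repeated samples.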
Confidence Interval for a Proportion
• Population Proportion - Fraction of a
population that has a particular
characteristic (falling in a category)
• Sample Proportion - Fraction of a sample
that has a particular characteristic (falling in
a category)
• Sampling distribution of sample proportion
(large samples) is approximately normal
Confidence Interval for a Proportion
• Parameter: π (a value between 0 and 1, not
3.14...)
• Sample - n items sampled, X is the number that
possess the characteristic (fall in the category)
• Sample Proportion:
$$\hat{\pi} = \frac{X}{n}$$
– Mean of sampling distribution: π
– Standard error (actual and estimated):
$$\sigma_{\hat{\pi}} = \sqrt{\frac{\pi(1-\pi)}{n}} \qquad \hat{\sigma}_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$
Confidence Interval for a Proportion
• Criteria for large samples
– 0.30 < π < 0.70 ⇒ n > 30
– Otherwise, X > 10, n-X > 10
• Large Sample (1-α)100% CI for π:
$$\hat{\pi} \pm z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$
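The large-sample interval and the sample-size criteria can be sketched in Python as follows. The function names and the counts (40 successes in 100 items) are hypothetical. Because the population proportion is unknown in practice, the criteria check below substitutes the sample proportion for it, which is an assumption of this sketch rather than something the slide states.

```python
import math
from statistics import NormalDist

def large_sample_ok(x, n):
    """Check the slide's criteria, using the sample proportion as a stand-in."""
    p_hat = x / n
    if 0.30 < p_hat < 0.70:
        return n > 30
    return x > 10 and n - x > 10

def proportion_ci(x, n, confidence=0.95):
    """Large-sample CI: p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    se_hat = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se_hat, p_hat + z * se_hat

# Hypothetical counts, for illustration only.
low, high = proportion_ci(40, 100)
print(large_sample_ok(40, 100), round(low, 3), round(high, 3))
```

With 40 successes in 100 items the estimated standard error is about 0.049, giving an interval of roughly 0.304 to 0.496.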
Choosing the Sample Size
• Bound on error (aka Margin of error) - For a
given confidence level (1-α), we can be this
confident that the difference between the sample
estimate and the population parameter is less
than z_{α/2} standard errors in absolute value
• Researchers choose sample sizes such that the
bound on error is small enough to provide
worthwhile inferences
Choosing the Sample Size
• Step 1 - Determine Parameter of interest (Mean
or Proportion)
• Step 2 - Select an upper bound for the margin of
error (B) and a confidence level (1-α)
Proportions (can be safe and set π=0.5):
$$n = \frac{z_{\alpha/2}^{2}\,\pi(1-\pi)}{B^{2}}$$
Means (need an estimate of σ):
$$n = \frac{z_{\alpha/2}^{2}\,\sigma^{2}}{B^{2}}$$
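The two sample-size formulas translate directly into code. In the sketch below the function names and the example inputs (a margin of 0.03 for a proportion; a margin of 2 with an assumed standard deviation of 10 for a mean) are hypothetical. Rounding up to the next whole number is a convention added here, since a fractional sample size is not possible and rounding down would exceed the bound.

```python
import math
from statistics import NormalDist

def _z(confidence):
    """Critical value z_{alpha/2} for the given confidence level."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

def n_for_proportion(bound, confidence=0.95, p=0.5):
    """n = z^2 * p * (1 - p) / B^2; p = 0.5 is the conservative choice."""
    return math.ceil(_z(confidence) ** 2 * p * (1.0 - p) / bound ** 2)

def n_for_mean(bound, sigma, confidence=0.95):
    """n = z^2 * sigma^2 / B^2; sigma must be estimated or assumed."""
    return math.ceil(_z(confidence) ** 2 * sigma ** 2 / bound ** 2)

print(n_for_proportion(0.03))    # 1068
print(n_for_mean(2.0, 10.0))     # 97
```

Setting p = 0.5 maximizes p(1 - p), so the resulting n is large enough whatever the true proportion turns out to be.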
Confidence Interval for Median
• Population Median - 50th percentile (half
the population falls above the median and half
below). Not equal to the mean if the underlying
distribution is not symmetric
• Procedure
– Sample n items
– Order them from smallest to largest
– Compute the following interval:
$$\frac{n+1}{2} \pm \sqrt{n}$$
– Choose the data values with the ranks
corresponding to the lower and upper bounds
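The procedure above can be sketched in Python. The slide does not say how to handle non-integer ranks, so this sketch assumes rounding outward (lower rank down, upper rank up) and clamping to the range 1 to n. The function name and data are hypothetical.

```python
import math

def median_ci(values):
    """CI for the median from the order statistics at ranks (n+1)/2 +/- sqrt(n)."""
    ordered = sorted(values)
    n = len(ordered)
    center = (n + 1) / 2
    half = math.sqrt(n)
    # Assumption: round ranks outward and keep them within 1..n.
    lower_rank = max(1, math.floor(center - half))
    upper_rank = min(n, math.ceil(center + half))
    # Ranks are 1-based; Python lists are 0-based.
    return ordered[lower_rank - 1], ordered[upper_rank - 1]

# Hypothetical data: the integers 1..25, so each value equals its rank.
print(median_ci(list(range(1, 26))))  # (8, 18)
```

With n = 25 the ranks are 13 ± 5, so the interval runs from the 8th to the 18th ordered value.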