Estimation
Chapter 7
“Farewell! Thou art too dear for my possessing
And like enough thou know’st thy estimate.”
William Shakespeare, Sonnet 87
MGMT 242
Topics and Goals for Chapter 7
• Unbiased point estimators for the population mean:
– sample mean
– sample median
– sample trimmed mean
• Interval estimate of a population mean
• Confidence interval for a proportion
• Sample size and confidence intervals
• What to do when the population variance is unknown
• Confidence intervals with the Student’s t-distribution
• Assumptions for the Student’s t-distribution
Unbiased Estimators of the Population Mean
• An estimator of the mean, µhat (µ with a caret over it),
is unbiased if E(µhat) = µ, that is, if the long-run
average of µhat equals the population mean.
• The sample mean, xbar, is an unbiased estimator of µ;
• The sample median, xm, is an unbiased estimator of µ
when the population distribution is symmetric;
• The “trimmed sample mean” (which leaves out the top 10%
and bottom 10% of values) is likewise an unbiased estimator
of µ for a symmetric population.
• Any weighted average of the sample values whose weights
sum to 1 is an unbiased estimator of µ; the middle of the
range is also unbiased for a symmetric population
(see Problem 7.8, where the middle of the range is
used to estimate µ).
Efficient Estimators of the Population Mean
• Among unbiased estimators of the population mean, the
most efficient is the one whose estimates have the
smallest standard deviation.
• Example: Problems 7.1, 7.2, 7.8 (Electronic Reserve).
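The ideas of bias and efficiency can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it draws repeated samples from a normal population with hypothetical values µ = 100, σ = 15 and N = 20 (none of these come from the course problems), and `trimmed_mean` and `simulate` are helper names invented here. For each estimator it reports the long-run average (to judge bias) and the standard deviation of the estimates (to judge efficiency).

```python
import random
from statistics import mean, median, stdev


def trimmed_mean(values, fraction=0.10):
    """Mean after dropping the lowest and highest `fraction` of the values."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    return mean(ordered[k:len(ordered) - k])


def simulate(num_samples=2000, n=20, mu=100.0, sigma=15.0, seed=1):
    """Return {estimator name: (average estimate, SD of estimates)}.

    mu, sigma, n and the seed are hypothetical illustration values.
    """
    rng = random.Random(seed)
    estimators = {"mean": mean, "median": median, "trimmed mean": trimmed_mean}
    estimates = {name: [] for name in estimators}
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        for name, estimator in estimators.items():
            estimates[name].append(estimator(sample))
    return {name: (mean(vals), stdev(vals)) for name, vals in estimates.items()}


results = simulate()
for name, (average, spread) in results.items():
    print(f"{name:13s} average = {average:7.2f}   SD = {spread:5.2f}")
```

For a normal population all three averages should land close to µ, while the sample mean should show the smallest spread.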
Confidence Interval for Population Mean-I
• Suppose we measure a sample mean; how close is
this value to the population mean?
• If we know the standard deviation for the population,
this is a straightforward problem; if we don’t, it’s
more complicated. For now we’ll suppose we know
the value of σ, the population standard deviation.
• The sampling distribution of the sample mean, for
sample size N, will approach a normal distribution with
mean µ and standard deviation of the mean, σ_mean =
σ/√N, as N gets large (practically, for N greater than
about 10 to 20).
Confidence Interval for Population Mean-II
• There is a 95% probability that the measured value of
the sample mean, xbar, lies within the range
µ - 1.96 σ_mean to µ + 1.96 σ_mean (See Board demo)
• This corresponds to the inequality
µ - 1.96 σ_mean ≤ xbar ≤ µ + 1.96 σ_mean
• With a little manipulation the inequality above can be
changed to the one for the confidence interval (CI)
xbar - 1.96 σ_mean ≤ µ ≤ xbar + 1.96 σ_mean
where σ_mean = σ/√N.
• Interpretation: 95% of the trials (in the long run) will
give values of xbar within these limits of µ, so 95% of the
intervals built this way contain µ. (Concepts example)
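The long-run interpretation can be checked with a small simulation. This is a sketch under assumptions, not the “Concepts” demo itself: µ = 5600 is the salary mean used in the demo, but σ = 1000, N = 25, the number of trials and the seed are hypothetical values chosen here, and `coverage` is an invented helper name.

```python
import random
from math import sqrt
from statistics import mean


def coverage(mu=5600.0, sigma=1000.0, n=25, trials=2000, z=1.96, seed=242):
    """Fraction of simulated 95% intervals (known sigma) that contain mu."""
    rng = random.Random(seed)
    sigma_mean = sigma / sqrt(n)  # standard deviation of the sample mean
    hits = 0
    for _ in range(trials):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        # The interval xbar +/- z * sigma_mean either covers mu or it does not.
        if xbar - z * sigma_mean <= mu <= xbar + z * sigma_mean:
            hits += 1
    return hits / trials


print(f"Fraction of intervals containing mu: {coverage():.3f}")
```

The printed fraction should be close to 0.95, and it moves closer as the number of trials grows.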
Confidence Interval for Population Mean-III
• General Case: Confidence Interval (CI) for level
(1-α)*100 % (e.g. α = 0.05 corresponds to 95% level)
• Then the (1-α) CI is given by
xbar - z(1-α/2) σ_mean ≤ µ ≤ xbar + z(1-α/2) σ_mean,
where σ_mean = σ/√N and z(1-α/2) is the z-score for the
(1-α/2) centile (see board diagram):
Confidence Level    z(1-α/2)
90%                 1.645
95%                 1.960
99%                 2.575
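The z-scores in the table can be recomputed from the standard normal distribution. A minimal sketch using Python's standard library (`z_critical` is a helper name chosen here); note that the 99% value computes to about 2.576, which some printed tables round to 2.575.

```python
from statistics import NormalDist


def z_critical(level):
    """z-score for the (1 - alpha/2) centile, where alpha = 1 - level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  z = {z_critical(level):.3f}")
```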
Interpretation of Confidence Interval
The diagram to the left is from the “Concepts” StatPlus add-in.
µ, the “true” mean salary, equals $5600;
the 95% CI for given σ and N runs from
[Figure: Confidence Intervals for Sample Means; vertical axis in dollars, $4,600 to $6,600]
Confidence Interval for Proportion-I
• General Case: Confidence Interval (CI) for level
(1-α)*100 % (e.g. α = 0.05 corresponds to 95% level)
• Then the (1-α) CI is given by
p - z(1-α/2) σ_p ≤ π ≤ p + z(1-α/2) σ_p,
where z(1-α/2) is the z-score for the (1-α/2) centile, π is the
population proportion (proportion of yeses in a yes/no
questionnaire, proportion testing positive in a medical test,
etc.), p = x/N is the sample estimate of π (x is the number
of successes in a sample of size N) and σ_p, the standard
deviation of the proportion, is estimated by
σ_p = {p(1-p)/N}^(1/2)
Confidence Interval for Proportion-II
• The (1-α) CI for 1-α = 0.95, a 95% CI, is given by
p - 1.96 σ_p ≤ π ≤ p + 1.96 σ_p,
σ_p = {p(1-p)/N}^(1/2)
• Example (Ex. 7.20, text): 84 out of 125 individuals are aware
of a certain product; a 95% CI for this proportion is given by
p = 84/125 = 0.672;
1.96 σ_p = 1.96 x [p(1-p)/N]^(1/2) = 0.082,
so the 95% CI is given by 0.672 - 0.082 to 0.672 + 0.082 or
(0.590, 0.754)
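The arithmetic in this example can be reproduced in a few lines. A minimal sketch, assuming the normal-approximation interval described on the slide; `proportion_ci` is a helper name chosen here, not something from the text.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Normal-approximation CI for a proportion.

    Returns (p, half_width, (low, high)).
    """
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(p * (1 - p) / n)  # z times the estimated sigma_p
    return p, half_width, (p - half_width, p + half_width)


p, half_width, (low, high) = proportion_ci(84, 125)
print(f"p = {p:.3f}, half-width = {half_width:.3f}, CI = ({low:.3f}, {high:.3f})")
# prints: p = 0.672, half-width = 0.082, CI = (0.590, 0.754)
```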
Sample Size Required for Given CI width
• We know that the CI gets narrower as the sample size,
N, increases. Suppose we require (at a certain
confidence level) a specific width, E, for the CI.
Then E = 2 z(1-α/2) σ_mean and, since σ_mean = σ/√N,
we get N = (2 z(1-α/2) σ/E)^2
• Example: Exercise 7.25, text: want the 95% CI for insurance
claims to be $50 wide (= E), with σ estimated at $400;
then N = (2 x 1.96 x 400/50)^2 = 983.4, rounded up to 984.
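The sample-size formula translates directly into code. A minimal sketch (`sample_size_for_width` is a helper name chosen here); the result is rounded up because a fractional observation is not possible and rounding down would give an interval slightly wider than E.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_width(sigma, width, level=0.95):
    """Smallest N for which the CI for the mean (known sigma) is at most `width` wide."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((2 * z * sigma / width) ** 2)


print(sample_size_for_width(sigma=400, width=50))  # prints 984
```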
Student’s t-Distribution for Unknown σ
• Sample Standard Deviation, s, used to estimate
population standard deviation, σ, if σ unknown
– s = {Σ(xi - xbar)^2/(N-1)}^(1/2) for a sample of size N with mean
xbar
• Uncertainty in the standard deviation (estimated from the
sample) is clearly bigger, the smaller the sample size;
• Have to account for this uncertainty by use of a new
statistic, the “Student’s-t” variable.
• t = (x - µ)/s, for individual value of sample, or
• t = (xbar - µ)/SEM, with SEM = s/√N, for sample
mean.
Student’s t-Distribution--Continued I
• The Student’s t-distribution gives the probability of the
t-statistic (see previous slide) occurring by chance
• The distribution will clearly depend on sample size, N
• The larger the sample size, the more nearly the sample
standard deviation, s, should approach the value for
population standard deviation, σ
• The effective sample size is the “degrees of freedom”
(abbreviated as “df”); df = N-1
• Probability of a large |t|, for small N, is higher than for the
same value of z, because the t-distribution has heavier tails
(see “Concepts” illustration).
Student’s-t Distribution
[Figure: “t Distribution” and “Standard Normal” density curves; horizontal axis from -4.00 to 4.00]
• The graph at left compares a Student’s-t distribution
(solid blue line) with the standard normal bell curve
(dotted red line) for df=2 (N=3). Note that the probability
density of the Student’s-t is greater than that for the
z-curve, for statistic values greater than 2, or less than -2.
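The comparison in the graph can be checked numerically. A minimal sketch, assuming the standard density formula for Student's t; `t_pdf` is a helper name chosen here and is not part of the course materials. It prints both densities at several values so that the centre and the tails can be compared directly.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


standard_normal = NormalDist()
# Print both densities side by side for df = 2 (N = 3).
for value in (0.0, 1.0, 2.0, 3.0):
    print(f"x = {value:.1f}   t(df=2): {t_pdf(value, 2):.4f}   "
          f"normal: {standard_normal.pdf(value):.4f}")
```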
Student’s-t Distribution--Example
• Ex. 7.34, Text. Comparison shopping at 14 New York
area department stores to get refrigerator prices yields
the following results:
$341,347,319,331,326,298,335,351,316,307,335,320,329,346
Find the 95% Confidence Interval (CI) for the price:
(From Excel) xbar = $328.64
SEM = s/√N = 15.49/√14 = $4.14
df = 14 - 1 = 13
t13 = 2.160 (from Table 4, text, or Excel)
95% CI: 328.64 - 2.160 x 4.14 to 328.64 + 2.160 x 4.14
or
$319.70 to $337.58
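The same calculation can be reproduced outside Excel. A minimal sketch using Python's standard library; because the standard library has no t-distribution quantile function, the critical value 2.160 for df = 13 is taken from the slide rather than computed, and the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Refrigerator prices from Ex. 7.34 (14 stores).
prices = [341, 347, 319, 331, 326, 298, 335, 351, 316, 307, 335, 320, 329, 346]

# Critical t value for a 95% CI with df = 13, as quoted on the slide.
T_CRIT_DF13 = 2.160

n = len(prices)
xbar = mean(prices)
s = stdev(prices)   # sample standard deviation (divides by N - 1)
sem = s / sqrt(n)   # standard error of the mean
low = xbar - T_CRIT_DF13 * sem
high = xbar + T_CRIT_DF13 * sem

print(f"xbar = {xbar:.2f}, s = {s:.2f}, SEM = {sem:.2f}, df = {n - 1}")
print(f"95% CI: {low:.2f} to {high:.2f}")
```

The printed values should agree with the slide to within rounding.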