Chapter 4: Sampling and
Statistical Inference
Part 2: Estimation
Types of Estimates


Point estimate – a single number used
to estimate a population parameter
Interval estimate – a range of values
within which a population parameter
is believed to lie
Common Point Estimates
Theoretical Issues


Unbiased estimator – one for which the
expected value equals the population
parameter it is intended to estimate
The sample variance is an unbiased
estimator for the population variance
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
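As a rough illustration of the two divisors above, the following sketch uses Python's standard `statistics` module; the data values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample variance s^2: sum of squared deviations divided by n - 1.
sample_var = statistics.variance(data)

# Population variance sigma^2: same sum divided by N
# (treats the data as the entire population).
population_var = statistics.pvariance(data)

print("s^2     =", sample_var)
print("sigma^2 =", population_var)
```

The sample variance is always the larger of the two for the same data, because it divides the same sum by a smaller number.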
Confidence Intervals


Confidence interval (CI) – an interval
estimate that specifies the likelihood that
the interval contains the true population
parameter
Level of confidence ($1 - \alpha$) – the probability
that the CI contains the true population
parameter, usually expressed as a percentage
(90%, 95%, 99% are most common).
Confidence Intervals for the
Mean - Rationale
Confidence Interval for the
Mean – $\sigma$ Known
A $100(1 - \alpha)\%$ CI is: $\bar{x} \pm z_{\alpha/2}\,(\sigma/\sqrt{n})$
$z_{\alpha/2}$ may be found from Table A.1 or using the
Excel function NORMSINV(1 - $\alpha$/2)
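The interval above can be sketched in Python with the standard library, where `NormalDist().inv_cdf` plays the role of NORMSINV. The function name `mean_ci_sigma_known` and the input values are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_sigma_known(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) for the mean when sigma is known."""
    alpha = 1 - confidence
    # Standard normal value with alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs: sample mean 50, known sigma 10, n = 25.
lower, upper = mean_ci_sigma_known(xbar=50.0, sigma=10.0, n=25)
print(round(lower, 2), round(upper, 2))
```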
Confidence Interval for the
Mean, $\sigma$ Unknown
A $100(1 - \alpha)\%$ CI is: $\bar{x} \pm t_{\alpha/2,\,n-1}\,(s/\sqrt{n})$
$t_{\alpha/2,\,n-1}$ is the value from a t-distribution with
n - 1 degrees of freedom, from Table A.3 or
the Excel function TINV($\alpha$, n - 1)
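A minimal sketch of the t-based interval follows. Python's standard library has no t-distribution quantile function, so this example assumes the critical value is looked up in a t table and passed in; the function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci_sigma_unknown(sample, t_crit):
    """Return (lower, upper) for the mean using the sample standard deviation.

    t_crit is the tabulated t value for alpha/2 and n - 1 degrees of freedom.
    """
    n = len(sample)
    xbar = mean(sample)
    half_width = t_crit * stdev(sample) / sqrt(n)  # stdev divides by n - 1
    return xbar - half_width, xbar + half_width

# Hypothetical sample of n = 10 observations.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
# Tabulated t value for alpha/2 = 0.025 and 9 degrees of freedom.
T_CRIT_95_DF9 = 2.262

lower, upper = mean_ci_sigma_unknown(sample, T_CRIT_95_DF9)
print(round(lower, 3), round(upper, 3))
```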
Relationship Between Normal
Distribution and t-distribution
The t-distribution yields wider confidence
intervals than the normal distribution, most
noticeably for small sample sizes; the two
converge as n grows.
PHStat Tool: Confidence
Intervals for the Mean

PHStat menu > Confidence Intervals >
Estimate for the mean, sigma known…,
or Estimate for the mean, sigma
unknown…
PHStat Tool: Confidence
Intervals for the Mean - Dialog
Enter the confidence level
Choose specification of
sample statistics
Check Finite Population
Correction box if
appropriate
PHStat Tool: Confidence
Intervals for the Mean - Results
Confidence Intervals for
Proportions

Sample proportion: $\hat{p} = x/n$

x = number in sample having desired
characteristic
n = sample size
The sampling distribution of $\hat{p}$ has mean
$\pi$ (the population proportion) and variance $\pi(1 - \pi)/n$
When $n\pi$ and $n(1 - \pi)$ are at least 5,
the sampling distribution of $\hat{p}$ approaches
a normal distribution
Confidence Intervals for
Proportions
A $100(1 - \alpha)\%$ CI is: $\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
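The proportion interval can be sketched the same way as the interval for the mean. The function name `proportion_ci` and the counts used are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Return (lower, upper) for a population proportion (normal approximation)."""
    p_hat = x / n                                   # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)  # z times the standard error
    return p_hat - half_width, p_hat + half_width

# Hypothetical survey: 40 of 100 respondents have the characteristic.
lower, upper = proportion_ci(x=40, n=100)
print(round(lower, 3), round(upper, 3))
```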
PHStat tool is available under Confidence
Intervals option
Confidence Intervals and
Sample Size

CI for the mean, $\sigma$ known

Sample size needed for half-width of at
most E is $n \ge (z_{\alpha/2})^2 \sigma^2 / E^2$
CI for a proportion

Sample size needed for half-width of at
most E is $n \ge (z_{\alpha/2})^2\, \pi(1 - \pi) / E^2$
Use $\hat{p}$ as an estimate of $\pi$ or 0.5 for the
most conservative estimate
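Both sample-size formulas can be sketched as below. The function names are hypothetical, and the result is rounded up because n must be a whole number that satisfies the inequality.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, E, confidence=0.95):
    """Smallest n giving half-width at most E for a mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * sigma**2 / E**2)

def sample_size_for_proportion(E, p=0.5, confidence=0.95):
    """Smallest n giving half-width at most E; p = 0.5 is the conservative choice."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / E**2)

# Hypothetical targets: sigma = 10 with E = 2, and a proportion with E = 0.05.
n_mean = sample_size_for_mean(sigma=10.0, E=2.0)
n_prop = sample_size_for_proportion(E=0.05)
print(n_mean, n_prop)
```

Using p = 0.5 maximizes p(1 - p), so any other estimate of the proportion gives a smaller required sample.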
PHStat Tool: Sample Size
Determination

PHStat menu > Sample Size >
Determination for the Mean or
Determination for the Proportion
Enter s, E, and
confidence level
Check Finite
Population Correction
box if appropriate
Confidence Intervals for
Population Total
A $100(1 - \alpha)\%$ CI is: $N\bar{x} \pm t_{\alpha/2,\,n-1}\, N \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
PHStat tool is available under Confidence
Intervals option
Confidence Intervals for
Differences Between Means
                     Population 1   Population 2
Mean                 $\mu_1$        $\mu_2$
Standard deviation   $\sigma_1$     $\sigma_2$
Point estimate       $\bar{x}_1$    $\bar{x}_2$
Sample size          $n_1$          $n_2$
Point estimate for the difference in means,
$\mu_1 - \mu_2$, is given by $\bar{x}_1 - \bar{x}_2$
Independent Samples With
Unequal Variances
A $100(1 - \alpha)\%$ CI is: $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,df^*} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
$df^* = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
Fractional values of $df^*$ are rounded down
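The degrees-of-freedom formula is the most error-prone part of this interval, so a small sketch may help. The function names and summary statistics are hypothetical, and the t value is assumed to be read from a table once the degrees of freedom are known.

```python
from math import floor, sqrt

def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom df*, with fractional values rounded down."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return floor(df)

def unequal_var_ci(xbar1, xbar2, s1, n1, s2, n2, t_crit):
    """Return (lower, upper) for mu1 - mu2 without assuming equal variances."""
    half_width = t_crit * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width

# Hypothetical summary statistics for two independent samples.
df_star = welch_df(s1=4.0, n1=10, s2=3.0, n2=15)
# 2.131 is the tabulated t value for alpha/2 = 0.025 and 15 degrees of freedom.
lower, upper = unequal_var_ci(20.0, 17.0, 4.0, 10, 3.0, 15, t_crit=2.131)
print(df_star, round(lower, 3), round(upper, 3))
```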
Independent Samples With
Equal Variances
A $100(1 - \alpha)\%$ CI is: $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1 + n_2 - 2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$
where $s_p$ is a common “pooled” standard deviation. Must
assume the variances of the two populations are equal.
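The pooled calculation can be sketched as follows, again assuming the t value is taken from a table. The function names and summary statistics are hypothetical.

```python
from math import sqrt

def pooled_std(s1, n1, s2, n2):
    """Pooled standard deviation s_p for two samples with equal variances."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def equal_var_ci(xbar1, xbar2, s1, n1, s2, n2, t_crit):
    """Return (lower, upper) for mu1 - mu2 assuming equal population variances."""
    sp = pooled_std(s1, n1, s2, n2)
    half_width = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width

# Hypothetical summary statistics for two independent samples.
sp = pooled_std(s1=4.0, n1=10, s2=3.0, n2=15)
# 2.069 is the tabulated t value for alpha/2 = 0.025 and 23 degrees of freedom.
lower, upper = equal_var_ci(20.0, 17.0, 4.0, 10, 3.0, 15, t_crit=2.069)
print(round(sp, 4), round(lower, 3), round(upper, 3))
```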
Paired Samples
A $100(1 - \alpha)\%$ CI is: $\bar{D} \pm t_{\alpha/2,\,n-1}\; s_D/\sqrt{n}$
$D_i$ = difference for each pair of observations
$\bar{D}$ = average of differences
$s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$
PHStat tool available in the
Confidence Intervals menu
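For paired samples the calculation reduces to a one-sample t interval on the differences, as this sketch shows. The before/after measurements and the function name are hypothetical, and the t value is assumed to come from a table.

```python
from math import sqrt
from statistics import mean, stdev

def paired_ci(first, second, t_crit):
    """Return (lower, upper) for the mean difference of paired observations."""
    diffs = [a - b for a, b in zip(first, second)]  # D_i for each pair
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)                              # divides by n - 1
    half_width = t_crit * s_d / sqrt(n)
    return d_bar - half_width, d_bar + half_width

# Hypothetical paired measurements on the same five subjects.
before = [10, 12, 9, 11, 13]
after = [8, 11, 9, 9, 10]
# 2.776 is the tabulated t value for alpha/2 = 0.025 and 4 degrees of freedom.
lower, upper = paired_ci(before, after, t_crit=2.776)
print(round(lower, 3), round(upper, 3))
```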
Differences Between
Proportions
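A $100(1 - \alpha)\%$ CI is:
$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
Applies when $n_i \hat{p}_i$ and $n_i(1 - \hat{p}_i)$ are greater than 5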
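A short sketch of the interval for a difference in proportions follows. The function name and the counts are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_diff_ci(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - half_width, diff + half_width

# Hypothetical counts: 60 of 100 in group 1, 50 of 100 in group 2.
lower, upper = proportion_diff_ci(x1=60, n1=100, x2=50, n2=100)
print(round(lower, 4), round(upper, 4))
```

In this hypothetical case the interval contains zero, so the data would not rule out equal proportions at the 95% level.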
Sampling Distribution of s


The sample standard deviation, s, is a point
estimate for the population standard
deviation, $\sigma$
For samples from a normal population, the statistic
$(n - 1)s^2/\sigma^2$ has a chi-square ($\chi^2$) distribution with n - 1 df

See Table A.4
CHIDIST(x, deg_freedom) returns probability to
the right of x
CHIINV(probability, deg_freedom) returns the
value of x for a specified right-tail probability
Confidence Intervals for the
Variance
A $100(1 - \alpha)\%$ CI is: $\left[ \frac{(n - 1) s^2}{\chi^2_{n-1,\,\alpha/2}},\ \frac{(n - 1) s^2}{\chi^2_{n-1,\,1-\alpha/2}} \right]$
PHStat Tool: Confidence
Intervals for Variance - Dialog

PHStat menu > Confidence Intervals >
Estimate for the Population Variance
Enter sample size,
standard deviation,
and confidence level
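The variance interval takes the same three inputs as the dialog. In the sketch below the chi-square critical values are assumed to be read from a table (or from CHIINV), since Python's standard library has no chi-square quantile function; the function name and inputs are hypothetical.

```python
def variance_ci(s, n, chi2_right_half_alpha, chi2_right_one_minus_half_alpha):
    """Return (lower, upper) for the population variance.

    Both critical values are indexed by right-tail probability, so the
    alpha/2 value is the larger one and produces the lower limit.
    """
    scaled = (n - 1) * s**2
    return scaled / chi2_right_half_alpha, scaled / chi2_right_one_minus_half_alpha

# Hypothetical: n = 10, s = 1.5, 95% confidence.
# Tabulated chi-square values for 9 df: 19.023 (right tail 0.025)
# and 2.700 (right tail 0.975).
lower, upper = variance_ci(s=1.5, n=10,
                           chi2_right_half_alpha=19.023,
                           chi2_right_one_minus_half_alpha=2.700)
print(round(lower, 3), round(upper, 3))
```

The interval is not symmetric around s² because the chi-square distribution is skewed. Taking square roots of both limits gives an interval for the standard deviation.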
PHStat Tool: Confidence
Intervals for Variance - Results
Time Series Data

Confidence intervals only make sense
for stationary time series data
Probability Intervals


A $100(1 - \alpha)\%$ probability interval for a
random variable X is an interval [A, B]
such that $P(A \le X \le B) = 1 - \alpha$
Do not confuse a confidence interval
with a probability interval; confidence
intervals are probability intervals for
sampling distributions, not for the
distribution of the random variable.
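To make the distinction concrete, the sketch below computes a 95% probability interval for a normally distributed X and, for contrast, the half-width of a 95% confidence interval for its mean. The distribution parameters and sample size are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical random variable: X ~ Normal(mu = 100, sigma = 15).
X = NormalDist(mu=100, sigma=15)
alpha = 0.05

# Probability interval [A, B] for X itself: P(A <= X <= B) = 1 - alpha.
A = X.inv_cdf(alpha / 2)
B = X.inv_cdf(1 - alpha / 2)
coverage = X.cdf(B) - X.cdf(A)

# Confidence interval half-width for the mean with n = 25 observations:
# it is based on the sampling distribution of the mean, sigma / sqrt(n).
n = 25
z = NormalDist().inv_cdf(1 - alpha / 2)
ci_half_width = z * X.stdev / sqrt(n)

print(round(A, 2), round(B, 2), round(coverage, 4))
print(round(ci_half_width, 2))
```

The probability interval describes where individual values of X fall and does not shrink as n grows, while the confidence interval for the mean narrows with larger samples.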