STAB22 Statistics I
Lecture 20
Sample Proportion

Bernoulli population: Each population subject belongs to one of two categories
E.g. male/female, employed/unemployed
Population proportion (parameter) p
E.g. proportion of male students
Random sample of size n:
X = # of sample subjects in category of interest
X follows Binomial with parameters n, p
Sample proportion: $\hat{p} = X/n$
Sample estimate of population proportion p
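
A minimal sketch of the definition above: the sample proportion is the count X divided by n. The function name `sample_proportion` and the toy data are illustrative assumptions, not part of the lecture.

```python
def sample_proportion(sample, category):
    """Return (X, p_hat): the count of `category` in `sample`, and X / n."""
    n = len(sample)
    x = sum(1 for subject in sample if subject == category)
    return x, x / n

# Hypothetical random sample of n = 10 students
sample = ["male", "female", "female", "male", "female",
          "female", "male", "female", "female", "female"]
x, p_hat = sample_proportion(sample, "male")
print(x, p_hat)  # 3 0.3
```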
Sampling Distribution of Sample Proportion

For population w/ proportion of success p, if
Samples are random and independent
If sampling without replacement, n must be less than 10% of population
Sample size n is large enough, so that $np \ge 10$ and $n(1-p) \ge 10$
then the sampling distribution of p̂ is approximately Normal, with mean p and variance p(1 − p)/n:
$E(\hat{p}) = p$
$\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n} \;\Rightarrow\; SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$
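
The mean and SD formulas can be checked empirically. The sketch below simulates many random samples and compares the mean and SD of the resulting p̂ values with p and sqrt(p(1 − p)/n). The values p = 0.3, n = 100, the number of repetitions, and the helper name `simulate_p_hats` are assumptions chosen for illustration.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, reps, seed=0):
    """Draw `reps` random samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        # X ~ Binomial(n, p): count successes among n independent subjects
        x = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(x / n)
    return p_hats

p, n = 0.3, 100   # illustrative values; np = 30 and n(1 - p) = 70
p_hats = simulate_p_hats(p, n, reps=10_000)

theory_sd = math.sqrt(p * (1 - p) / n)
print("mean of p-hats:", round(statistics.mean(p_hats), 4), "theory:", p)
print("SD of p-hats:  ", round(statistics.pstdev(p_hats), 4),
      "theory:", round(theory_sd, 4))
```

With this many repetitions, both simulated values should land close to the theoretical ones.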
Example


University students will vote on a proposal, for which
p = 60% are in favor. You take a small poll, asking
n = 25 students for their opinion.
What is the probability that the poll gives you the wrong
result (i.e. rejects the proposal)?
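
One possible way to work this example, sketched in Python rather than taken from the lecture: treat "rejects the proposal" as p̂ < 0.5 and use the Normal model for p̂. The exact Binomial calculation is included only as a cross-check and is an addition of this sketch.

```python
import math
from statistics import NormalDist

p, n = 0.60, 25

# Conditions: np = 15 and n(1 - p) = 10, so both are at least 10.
sd = math.sqrt(p * (1 - p) / n)   # SD(p-hat), about 0.098

# Normal approximation: P(p-hat < 0.5)
normal_prob = NormalDist(mu=p, sigma=sd).cdf(0.5)

# Exact Binomial cross-check: fewer than half in favor means X <= 12
exact_prob = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(13))

print(round(normal_prob, 3), round(exact_prob, 3))
```

Both probabilities come out near 0.15, so a poll this small would point the wrong way roughly one time in seven.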
Confidence Interval

Sampling distribution model of p̂ is centered at p, with standard deviation $SD(\hat{p}) = \sqrt{p(1-p)/n}$
Don’t know p → can’t find true st. dev.
Approximate with standard error $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$
By the 68-95-99.7% Rule, we know
~ 68% of samples have p̂’s within 1 SE of p
~ 95% of samples have p̂’s within 2 SEs of p
~ 99.7% of samples have p̂’s within 3 SEs of p
Look at this from p̂’s point of view
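
The 68-95-99.7% Rule is a rounding of standard Normal probabilities. A short check with Python's standard library (the variable names are arbitrary):

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, SD 1

# P(-k < Z < k) for k = 1, 2, 3 standard deviations
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(f"within {k} SD: {prob:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```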
Confidence Interval

Consider 95% level: There’s a 95% chance
that p is no more than 2 SEs away from p̂
[Figure: sampling distribution of p̂, centered at p and spanning p ± 2 SD(p̂), with the interval p̂ ± 2 SE(p̂) drawn around an observed p̂]
Conversely, if we reach out 2 SEs in either
direction of p̂, we can be 95% “confident” this
interval contains the true proportion p
This is called a 95% confidence interval
What Does “95% Confidence”
Really Mean?

Confidence Interval (CI) uses sample statistic to estimate parameter
Since samples vary, the statistics used, and thus the CI’s, also vary
Our CI’s sometimes capture the true parameter, while other times they don’t
What Does “95% Confidence”
Really Mean?


Each CI will either contain population
parameter or not (but we can’t know)
Our confidence is in the process of
constructing the interval, not in any one
interval itself

Thus, we expect 95% of all 95%-CI’s to contain
the parameter that they are estimating
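
The "confidence is in the process" idea can be illustrated by simulation: build many intervals p̂ ± 2 SE(p̂) from independent samples and count how often they contain the true p. The values p = 0.6, n = 400, the number of repetitions, and the function names are assumptions for this sketch.

```python
import math
import random

def two_se_interval(x, n):
    """Interval p-hat +/- 2 SE(p-hat), i.e. an approximate 95% CI."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 2 * se, p_hat + 2 * se

def coverage(p, n, reps, seed=1):
    """Fraction of simulated intervals that contain the true proportion p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)  # one random sample
        low, high = two_se_interval(x, n)
        hits += low <= p <= high   # True counts as 1
    return hits / reps

rate = coverage(p=0.6, n=400, reps=2000)
print(f"{rate:.3f} of the intervals captured p")
```

Each individual interval either captures p or misses it. Only the long-run fraction is close to 95%.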
Margin of Error




With 95% CI, we are 95% confident the interval
$\hat{p} \pm 2\,SE(\hat{p})$ contains the true p
The extent of the interval on either side of p̂
is called the margin of error (ME)
In general, CI’s have the form
sample estimate ± ME
The more confident we want to be, the larger
our ME needs to be
Certainty vs Precision

To be more confident, must become less precise



I.e. we need more values in our confidence
interval to be more certain
Most common confidence levels are 90%, 95%,
and 99% (but any percentage can be used)
For every CI there is a trade-off between
certainty & precision


In most cases, can be both sufficiently certain &
sufficiently precise to make useful statements
Moreover, increasing sample size (n) always
improves precision
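
A small numeric sketch of the trade-off, using the 68-95-99.7% Rule for the confidence levels. The sample proportion p̂ = 0.5, the sample sizes, and the function name `margin_of_error` are illustrative assumptions.

```python
import math

def margin_of_error(p_hat, n, num_se):
    """ME when reaching out `num_se` standard errors from p-hat."""
    return num_se * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.5   # illustrative sample proportion

# More confidence (more SEs) -> larger ME, for fixed n = 100
for level, num_se in (("68%", 1), ("95%", 2), ("99.7%", 3)):
    print(level, round(margin_of_error(p_hat, 100, num_se), 3))

# Larger n -> smaller ME, at roughly 95% confidence (2 SEs)
for n in (100, 400, 1600):
    print(n, round(margin_of_error(p_hat, n, 2), 3))
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.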
Critical Values

The ‘2’ in $\hat{p} \pm 2\,SE(\hat{p})$ (our 95% confidence
interval) came from the 68-95-99.7% rule



Using Normal probability table, a more exact
value for our 95% confidence interval is 1.96
Call 1.96 the critical value and denote it z*
We can find corresponding critical value for
any confidence level using probability tables
Critical Values

E.g. for a 90% confidence interval, the critical
value is z*=1.645
$P(-z^* < Z < z^*) = P(-1.645 < Z < 1.645) = 0.90$
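
Instead of a printed table, the same lookup can be done in software. A sketch using Python's standard library (the function name `critical_value` is an assumption):

```python
from statistics import NormalDist

def critical_value(confidence):
    """z* such that P(-z* < Z < z*) = confidence for a standard Normal Z."""
    # The two tails share 1 - confidence equally, so z* is the
    # (1 + confidence) / 2 quantile of the standard Normal.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for c in (0.90, 0.95):
    print(c, round(critical_value(c), 3))
# 0.9 1.645
# 0.95 1.96
```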
Example

Find 99% confidence Normal critical value
I.e. z* such that $P(-z^* < Z < z^*) = 0.99$
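
One way to work this out (a sketch, not the lecture's own solution): the middle 99% leaves 0.5% in each tail, so z* is the 99.5th percentile of the standard Normal, which a Normal table gives as about 2.576.

```latex
P(-z^* < Z < z^*) = 0.99
\;\Longrightarrow\; P(Z < z^*) = 0.99 + \tfrac{0.01}{2} = 0.995
\;\Longrightarrow\; z^* \approx 2.576
```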
Assumptions and Conditions

Before creating CI for proportion, check:
Independence Assumption



Randomization Condition: data must be sampled at
random or generated from properly randomized
experiment
10% Condition: sample size (n) must be < 10% of the
population size
Sample size assumption: n is large enough to
use CLT (Normal approximation)

Success/Failure Condition: Must expect at least 10
“successes” & 10 “failures” in sample
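
The checklist above can be sketched as a small helper. Everything here is hypothetical: the function name, the poll counts, and the campus size of 12,000. Randomization cannot be checked from the numbers, so it is passed in as a judgment about the study design, and the Success/Failure Condition is checked with the observed counts.

```python
def check_conditions(successes, n, population_size, is_random=True):
    """Return pass/fail flags for the conditions behind a proportion CI."""
    failures = n - successes
    return {
        "randomization": is_random,                  # judged from study design
        "ten_percent": n < 0.10 * population_size,   # n < 10% of population
        "success_failure": successes >= 10 and failures >= 10,
    }

# Hypothetical poll: 120 of 300 sampled students in favor, campus of 12,000
checks = check_conditions(successes=120, n=300, population_size=12_000)
print(checks)
print("OK to build CI:", all(checks.values()))
```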
Proportion Confidence Interval

If conditions are met, build CI for proportion
at C% confidence level as:
$\hat{p} \pm z^* \times SE(\hat{p})$

where:
$SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$
critical value z* from Normal distr., depending on particular confidence level C
Example

University students will vote on proposal.
You ask random sample of 400 students,
out of which 250 are in favor.

Build 95% CI for proportion in favor
Example (cont’d)
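
The transcript does not include the worked solution. One possible solution, sketched in Python under the assumption that the conditions hold (250 successes and 150 failures are both at least 10), applies p̂ ± z* × SE(p̂) directly. The function name `proportion_ci` is illustrative.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Return (p_hat, ME, low, high) for the CI p-hat +/- z* x SE(p-hat)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)              # standard error
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    me = z_star * se                                     # margin of error
    return p_hat, me, p_hat - me, p_hat + me

p_hat, me, low, high = proportion_ci(250, 400)
print(f"p-hat = {p_hat:.3f}, ME = {me:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
# p-hat = 0.625, ME = 0.047, 95% CI = (0.578, 0.672)
```

So we are 95% confident that between about 57.8% and 67.2% of students favor the proposal.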