STAB22 Statistics I
Lecture 20

Sample Proportion
- Bernoulli population: each population subject belongs to one of two categories
  - E.g. male/female, employed/unemployed
- Population proportion (parameter): $p$
  - E.g. proportion of male students
- Random sample of size $n$:
  - $X$ = # of sample subjects in category of interest
  - $X$ follows Binomial with parameters $n$, $p$
- Sample proportion: $\hat{p} = X/n$
  - Sample estimate of population proportion $p$

Sampling Distribution of Sample Proportion
- For a population w/ proportion of success $p$, if
  - Samples are random and independent
  - Sample size $n$ is large enough, so that $np \ge 10$ and $n(1-p) \ge 10$
  - If sampling without replacement, $n$ must be less than 10% of population
- then the sampling distribution of $\hat{p}$ is approximately Normal, with mean $p$ and variance $p(1-p)/n$:
  $E(\hat{p}) = p$
  $\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}$
  $SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$

Example
- University students will vote on a proposal, for which $p = 60\%$ are in favor. You make a small poll, asking $n = 25$ students for their opinion.
- What is the probability that the poll gives you wrong results (i.e. rejects the proposal)?

Confidence Interval
- Sampling distribution model of $\hat{p}$ is centered at $p$, with standard deviation
  $SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$
- We don't know $p$, so we can't find the true st. dev. Approximate it with the standard error
  $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
- By the 68-95-99.7% Rule, we know
  - ~68% of samples have $\hat{p}$'s within 1 SE of $p$
  - ~95% of samples have $\hat{p}$'s within 2 SEs of $p$
  - ~99.7% of samples have $\hat{p}$'s within 3 SEs of $p$
- Look at this from $\hat{p}$'s point of view

Confidence Interval
- Consider the 95% level: there's a 95% chance that $p$ is no more than 2 SEs away from $\hat{p}$
- [Figure: sampling distribution centered at $p$, marking $p \pm 2\,SD(\hat{p})$, an observed $\hat{p}$, and the interval $\hat{p} \pm 2\,SE(\hat{p})$]
- Conversely, if we reach out 2 SEs in either direction of $\hat{p}$, we can be 95% “confident” this interval contains the true proportion $p$
- This is called a 95% confidence interval

What Does “95% Confidence” Really Mean?
- A Confidence Interval (CI) uses a sample statistic to estimate a parameter
- Since samples vary, the statistics used, and thus the CI's, also vary
- Our CI's sometimes capture the true parameter, while other times they don't

What Does “95% Confidence” Really Mean?
- Each CI will either contain the population parameter or not (but we can't know which)
- Our confidence is in the process of constructing the interval, not in any one interval itself
- Thus, we expect 95% of all 95%-CI's to contain the parameter that they are estimating

Margin of Error
- With a 95% CI, we are 95% confident the interval $\hat{p} \pm 2\,SE(\hat{p})$ contains the true $p$
- The extent of the interval on either side of $\hat{p}$ is called the margin of error (ME)
- In general, CI's have the form: sample estimate ± ME
- The more confident we want to be, the larger our ME needs to be

Certainty vs Precision
- To be more confident, we must become less precise
  - I.e. we need more values in our confidence interval to be more certain
- Most common confidence levels are 90%, 95%, and 99% (but any percentage can be used)
- For every CI there is a trade-off between certainty & precision
  - In most cases, we can be both sufficiently certain & sufficiently precise to make useful statements
  - Moreover, increasing the sample size ($n$) always improves precision

Critical Values
- The ‘2’ in $\hat{p} \pm 2\,SE(\hat{p})$ (our 95% confidence interval) came from the 68-95-99.7% rule
- Using the Normal probability table, a more exact value for our 95% confidence interval is 1.96
  - Call 1.96 the critical value and denote it $z^*$
- We can find the corresponding critical value for any confidence level using probability tables

Critical Values
- E.g. for a 90% confidence interval, the critical value is $z^* = 1.645$:
  $P(-z^* < Z < z^*) = P(-1.645 < Z < 1.645) = 0.90$

Example
- Find the 99% confidence Normal critical value, i.e.
  $z^*$ such that $P(-z^* < Z < z^*) = 0.99$

Assumptions and Conditions
- Before creating a CI for a proportion, check:
- Independence Assumption
  - Randomization Condition: data must be sampled at random or generated from a properly randomized experiment
  - 10% Condition: sample size ($n$) must be < 10% of the population size
- Sample Size Assumption: $n$ is large enough to use the CLT (Normal approximation)
  - Success/Failure Condition: must expect at least 10 “successes” & 10 “failures” in the sample

Proportion Confidence Interval
- If the conditions are met, build a CI for the proportion at the C% confidence level as:
  $\hat{p} \pm z^* \, SE(\hat{p})$
- where:
  - $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
  - critical value $z^*$ comes from the Normal distr., depending on the particular confidence level C

Example
- University students will vote on a proposal. You ask a random sample of 400 students, out of which 250 are in favor.
- Build a 95% CI for the proportion in favor

Example (cont’d)
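
The three examples in this lecture can be checked numerically. The following is a minimal Python sketch, not part of the original slides, using only the standard library's `statistics.NormalDist`. The function names (`critical_value`, `proportion_ci`, `prob_p_hat_below`) are hypothetical helpers introduced here for illustration. For the poll example, "rejects the proposal" is read as $\hat{p} < 0.5$, which is an assumption about what the slide intends.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(confidence):
    """Return z* such that P(-z* < Z < z*) = confidence for standard Normal Z."""
    # The central area is `confidence`, so the area to the left of z*
    # is 0.5 + confidence / 2.
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def proportion_ci(successes, n, confidence=0.95):
    """Return (low, high) for the interval p_hat +/- z* SE(p_hat)."""
    failures = n - successes
    # Success/Failure Condition from the slides: at least 10 of each.
    if successes < 10 or failures < 10:
        raise ValueError("Success/Failure Condition not met")
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)        # standard error SE(p_hat)
    me = critical_value(confidence) * se      # margin of error
    return p_hat - me, p_hat + me


def prob_p_hat_below(threshold, p, n):
    """Normal-approximation probability that p_hat falls below `threshold`."""
    sd = sqrt(p * (1 - p) / n)                # SD(p_hat) under the true p
    return NormalDist(mu=p, sigma=sd).cdf(threshold)


# Critical values for the common confidence levels.
for level in (0.90, 0.95, 0.99):
    print(level, round(critical_value(level), 3))

# Poll example: p = 0.60, n = 25; assumed reading: "wrong result" means p_hat < 0.5.
prob_poll_below_half = prob_p_hat_below(0.5, p=0.60, n=25)
print("P(p_hat < 0.5) =", round(prob_poll_below_half, 3))

# CI example: 250 of 400 students in favor, 95% confidence.
low, high = proportion_ci(250, 400, confidence=0.95)
print("95% CI:", round(low, 3), "to", round(high, 3))
```

Under these assumptions the 99% critical value comes out near 2.576, the poll gives a wrong result with probability of roughly 0.15, and the 95% CI for the proportion in favor is roughly 0.578 to 0.672. Note that the poll probability uses the Normal approximation with $np = 15$ and $n(1-p) = 10$, which only just meets the Success/Failure Condition, so an exact Binomial calculation would differ somewhat.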