PRE-CLASS Questions:
Recall from Week 3 our discussion on sampling and how we calculated a conservative margin of
error for surveys by 1/sqrt(n). We discussed the Corbett survey which reported 52% of PA
registered voters supported Governor Corbett's decision to sue the NCAA. The survey had a
3.8% margin of error based on a sample size of 675. Using this information, consider the
following questions:
1. What is the population parameter of interest for this study? That is, what does this 52%
estimate? The population parameter of interest is “the proportion of all PA registered
voters who support Corbett suing NCAA” and is estimated by this 52%.
2. How "confident" do you think the group reporting these results is in its interval estimate
being correct (i.e. 52% plus/minus 3.8%)? That is, does the interval capture the true value of
interest in question 1? Seek class responses, then comment that the common level of
confidence used is 95%. That is, when you see such reported information, you can assume,
unless stated otherwise, that the margin of error was calculated based on a 95% level of
confidence. This means essentially that if the survey were repeated many times under the
same conditions, about 95% of the confidence intervals constructed would capture the
true proportion. (Refer back to last week's discussion on sampling distributions, where we
looked at how many of the calculated intervals captured the population parameter.)
3. Do you think a wider interval would offer more or less confidence to the polling agency in
their interval being correct? Explain in common sense terms why a wider interval would
invoke a higher level of confidence in the interval capturing the unknown parameter.
4. If the polling agency wanted a margin of error of 3% instead of 3.8%, do you think they would
have needed a larger or smaller sample size to achieve this new margin of error?
Again, applying common sense, do you believe asking more people would increase the error
or decrease the error in your estimate? Decrease error!
5. Referring back to question 4, what size sample do you think they would have needed to
achieve a 3% margin of error?
ME = 1/sqrt(n), so n = 1/ME^2 = 1/(0.03)^2 = 1/0.0009 = 1111.11, which rounds up to 1112.
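The arithmetic in question 5 can be checked with a short script. This is only a sketch of the conservative rule ME = 1/sqrt(n) used above; the function names are made up for illustration.

```python
import math

def conservative_margin_of_error(n):
    """Conservative margin of error for a survey proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def sample_size_for_margin(me):
    """Smallest whole sample size n with 1/sqrt(n) <= me, i.e. n >= 1/me^2."""
    return math.ceil(1 / me ** 2)

# Corbett survey: n = 675 should give roughly the reported 3.8% margin.
print(round(conservative_margin_of_error(675), 3))

# Sample size needed for a 3% margin of error.
print(sample_size_for_margin(0.03))
```

The sample size is rounded up rather than to the nearest whole number, because rounding down would leave the margin of error slightly above the target.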
6. The Corbett survey was based on a random sample. Considering the survey you completed for
our class two weeks ago, what population do you think this survey represents? Is our survey data
based on any probability sampling methods? If not, what effect does that have on what
population you feel is represented by our class survey?
Based on a volunteer sample. For some questions, such as the amount spent on a first date or
whether kissing w/o asking is considered a form of sexual assault, one could find support for our sample
being representative of the population of UP undergraduates. But what about the Chi
Omega picture? Do you think our class would be representative of the UP undergraduate
population? When you answer this question, consider whether you feel some student population
subgroup might be more offended than others, then ask whether this subgroup is being
represented in our class.
One-Proportion and One-Mean Confidence Intervals
For our class purposes, we are going to base our discussion on confidence intervals for two types
of data: Categorical and Quantitative. If data is Categorical, then our interest is in getting a
confidence interval for the population proportion. If data is Quantitative, then our interest is in
getting a confidence interval for the population mean.
A confidence interval is defined pretty much as it sounds. It is an interval
(i.e. a range of values) that one has some degree of confidence will capture (i.e. include)
the population value one is trying to estimate. Thus the
primary use of a confidence interval is to estimate some unknown population parameter, e.g.
a population proportion (p) or a population mean (μ).
Recalling notation from sampling distributions we have the following:
Sample Proportion ($\hat{p}$) is a point estimate that estimates the Population Proportion ($p$)
Sample Mean ($\bar{X}$) is a point estimate that estimates the Population Mean ($\mu$)
The basic format of a confidence interval is:
Sample statistic ± Margin of Error
As we discussed above, however, this margin of error can be affected by such things as sample
size (larger the sample the smaller the error) or by how confident one is (the more confident the
wider the interval). Because of this, we may realize that a margin of error must incorporate these
two features in its calculation. Recall from our sampling distribution discussion for sample
proportion and sample mean that we understand that due to sampling we will have some degree
of variation in our sample results, i.e. the sample proportions and sample means would vary from
sample to sample. So we formulated a standard error. We then looked at how the Empirical
Rule would be applied to these standard errors to create intervals that would contain the
proportion or mean. If you remember, for 95% we used a multiplier of “2 times the Standard
Error”. We apply that practice here to offer a basic construct for the margin of error:
Margin of Error = Multiplier × Standard Error
As stated previously, the most commonly used and accepted level of confidence is 95%. This
means that if a study was repeated say 100 times – under the same conditions (e.g. sample size,
sampling method, population sampled) – and 100 confidence intervals were constructed, then
roughly 95 of these 100 intervals would include the population parameter of interest. Other, less-used levels of confidence are 90%, 98%, and 99%.
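The "roughly 95 out of 100" idea can be checked with a small simulation. The sketch below assumes a hypothetical true proportion of 0.52 and a sample size of 675 (numbers borrowed from the Corbett survey purely for illustration; in practice the true proportion is unknown). It uses the usual standard error of a sample proportion, sqrt(p̂(1-p̂)/n), with a multiplier of 1.96.

```python
import math
import random

def coverage_rate(p_true, n, z=1.96, repeats=1000, seed=1):
    """Fraction of simulated confidence intervals that contain p_true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repeats):
        # Draw one random sample of size n and count the "successes".
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        # Does sample statistic +/- margin of error capture the true value?
        if p_hat - z * se <= p_true <= p_hat + z * se:
            hits += 1
    return hits / repeats

rate = coverage_rate(p_true=0.52, n=675)
print(rate)  # expected to land near 0.95
```

Any single interval either captures the true proportion or it does not; the 95% describes how often the procedure succeeds over many repetitions.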
The Multiplier:
Recall from our sampling distribution discussion that sample proportions and
sample means will follow an approximately normal distribution under certain conditions. For
sample proportions this means having at least 10 successes (i.e. $n\hat{p} \ge 10$) and at least 10
failures (i.e. $n(1-\hat{p}) \ge 10$). Using this “theory”, we can get our multipliers from the standard
normal distribution (Z) table.
Confidence Level    Z-Multiplier
90%                 1.65
95%                 1.96 ~ 2.00
98%                 2.33
99%                 2.58
Motivation behind these multipliers:
Looking at 95%, the multiplier is found by determining which z-scores leave a probability of 0.95
in the middle, with 0.025 in the lower (left) tail and 0.025 in the upper (right) tail. From the z-table we
find that this takes place for z-values of -1.96 and +1.96.
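The same lookup can be done without a printed z-table. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_multiplier` is made up for illustration.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """z-value that leaves (1 - confidence)/2 in each tail of the standard normal."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: {z_multiplier(level):.3f}")
```

The printed values carry three decimals; the table in these notes shows the same multipliers rounded to two.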
One Proportion Confidence Interval
$$\hat{p} \pm Z^{*} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $\hat{p}$ is the sample proportion and $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error. (NOTE: since we do NOT know the
population proportion, p, we substitute in the sample proportion, $\hat{p}$.)
Examples:
Assuming our class survey represents a random sample of UP undergraduates, estimate with
95% confidence, the proportion of UP undergraduates who:
1. Knew that kissing someone without asking technically falls under the definition of sexual
assault.
2. Felt the reaction to the Chi Omega photograph was proper.
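For Question 1: the sample proportion is 73/177 = 0.41. Note there are 73 successes and 104 failures, so both counts are at least 10.
$$0.41 \pm 1.96 \sqrt{\frac{0.41(1-0.41)}{177}} = 0.41 \pm 1.96 \times 0.037$$
Rounding the multiplier 1.96 to 2 gives 0.41 ± 0.074 = 0.336 to 0.484, or from 33.6% to 48.4%.
INTERPRETATION: We are 95% confident that the proportion of UP undergraduates who
know kissing w/o asking falls under sexual assault is from 33.6% to 48.4%.
The software output uses the unrounded sample proportion and multiplier, so its limits differ slightly: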
Event = Yes
Variable    X    N    Sample p   95% CI
KnowKiss    73   177  0.412429   (0.339908, 0.484951)
For Question 2: the sample proportion is 48/156 = 0.31. Note there are 48 successes and 108 failures, so both counts are at least 10.
$$0.31 \pm 1.96 \sqrt{\frac{0.31(1-0.31)}{156}} = 0.31 \pm 1.96 \times 0.037$$
Rounding the multiplier 1.96 to 2 gives 0.31 ± 0.074 = 0.236 to 0.384, or from 23.6% to 38.4%.
INTERPRETATION: We are 95% confident that the proportion of UP undergraduates who felt
the reaction to the Chi Omega photo was proper is from 23.6% to 38.4%.
Event = Proper
Variable    X    N    Sample p   95% CI
ChiOmega    48   156  0.307692   (0.235266, 0.380118)
Using the normal approximation.
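Both proportion intervals can be reproduced with a few lines of code. The sketch below applies the one-proportion formula with the unrounded sample proportion and a multiplier of 1.96, which is why it lines up with the software output rather than with the hand calculations that round the multiplier to 2. The function name and the check for at least 10 successes and 10 failures are illustrative choices, not part of the software used in class.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a population proportion."""
    failures = n - successes
    if successes < 10 or failures < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error using p_hat
    margin = z * se                          # margin of error
    return p_hat - margin, p_hat + margin

know_kiss = proportion_ci(73, 177)
chi_omega = proportion_ci(48, 156)
print("KnowKiss: %.4f to %.4f" % know_kiss)
print("ChiOmega: %.4f to %.4f" % chi_omega)
```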
One Mean Intervals
For 1-mean the confidence interval will involve a new concept: Degrees of Freedom, or df. We
will use this df in conjunction with Table A2 to find the multiplier. This is needed because we only
have information on the sample and therefore do not know the population standard deviation (σ),
so we have to estimate it with the sample standard deviation (s). The formula for a 1-mean
confidence interval is:
$$\bar{x} \pm t^{*} \frac{s}{\sqrt{n}}$$
Therefore we apply similar techniques, but now we are interested in estimating the population
mean, μ, by using the sample mean, and the multiplier is a t-value. Until now we assumed
that our random variable came from a normal distribution with a known population standard
deviation, σ. However, typically we do not know this parameter and therefore must estimate it.
This is done by using the standard deviation of the sample, which is expressed as "s". Because
this estimate adds uncertainty, the multiplier can no longer be taken from the standard normal distribution.
Instead, the t-values come from a t-distribution, which is similar to the standard normal distribution
from which the z-values came. The similarities are that the distribution is symmetrical and
centered on 0. The difference is that when using a t-table we need to consider a new feature:
degrees of freedom (df). The degrees of freedom are based on the sample size, n; for a 1-mean interval, df = n - 1.
Examples:
Assuming our class survey represents a random sample of UP undergraduates, estimate with
95% confidence,
1. The mean amount of money they feel should be spent on the first date.
2. The mean loan amount they expect to face upon graduation.
For Question 1: I took a random sample of 30 just to better illustrate using the t-table. The
sample mean is 55 dollars with standard deviation of 85.9 dollars. With n of 30 the degrees of
freedom (DF) = 30 – 1 = 29.
xt*
s
n
= 55  2.045 *
85.9
= 55 ± 2.045*15.7 = 55 ± 32.1 or from 22.9 to 77.1 dollars.
30
INTERPRETATION: We are 95% confident that the mean amount of money UP undergraduates
feel should be spent on a first date is from $22.9 to $77.1
One-Sample T: Sample_FirstDate
Variable           N    Mean   StDev   SE Mean   95% CI
Sample_FirstDate   30   55.0   85.9    15.7      (22.9, 87.1)
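For Question 2: The sample mean is 36,352 dollars with standard deviation of 54,056 dollars.
With n of 172 the degrees of freedom (DF) = 172 - 1 = 171. The t-table has no row for df = 171,
so use the row for df = 100, which gives a multiplier of 1.984. Software uses df = 171 exactly,
so its interval is slightly narrower than the hand calculation.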
$$\bar{x} \pm t^{*} \frac{s}{\sqrt{n}} = 36352 \pm 1.984 \times \frac{54056}{\sqrt{172}}$$
= 36352 ± 1.984*4122 = 36352 ± 8178 or from 28174 to 44530 dollars.
INTERPRETATION: We are 95% confident that the mean loan amount UP undergraduates
expect to face upon graduation is from $28,174 to $44,530.
One-Sample T: Loan
Variable   N     Mean    StDev   SE Mean   95% CI
Loan       172   36352   54056   4122      (28216, 44488)
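The two hand calculations for the mean can be reproduced as follows. This is a minimal sketch: the function name is made up, and the t multipliers (2.045 for df = 29, 1.984 for the df = 100 table row) are typed in from the t-table rather than computed, since Python's standard library has no t-distribution.

```python
import math

def mean_ci(sample_mean, sample_sd, n, t_star):
    """Confidence interval for a population mean: x-bar +/- t* * s/sqrt(n)."""
    se = sample_sd / math.sqrt(n)  # standard error of the mean
    margin = t_star * se           # margin of error
    return sample_mean - margin, sample_mean + margin

# t_star values are read from the t-table, not computed here.
first_date = mean_ci(55, 85.9, 30, t_star=2.045)     # df = 29
loan = mean_ci(36352, 54056, 172, t_star=1.984)      # df = 100 row used for df = 171

print("First date: %.1f to %.1f dollars" % first_date)
print("Loan: %.0f to %.0f dollars" % loan)
```

The loan interval from this sketch is a little wider than the software's because the software uses the exact multiplier for df = 171 instead of the table's df = 100 row.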