Chapter 6
Handout
Stata Exercise 1: Sampling and Confidence Intervals
This exercise relies on the bsample command
bsample -- Sampling with replacement: draws random samples
with replacement from the data in memory
• Type this into a Do-file.
clear
set obs 999
gen x = invnorm(uniform())
br x
hist x
ci x
• When you are done typing, hit CTRL-D
▪ Suppose we have a hypothesis and, given that hypothesis, a sample outcome looks extreme. This is evidence that the hypothesis is not true.
gen sample =.
bsample 20, weight(sample)
br x sample if sample==1
hist x if sample==1
ci x if sample==1
• When you are done typing (in a Do-file), hit CTRL-D
▪ FALSE: "There's a 95% chance that the true mean falls within $\bar{x} \pm 2(\sigma_x/\sqrt{n})$."
  – This is false because the true population mean just is. It doesn't have a sampling distribution because it is not the result of a sample. So either the interval contains the true population mean (which is not a random variable) or it doesn't. If it does contain it, then the probability that the interval contains the true population mean is Pr=1. If it doesn't, then the probability is Pr=0.
▪ TRUE: "There's a 95% chance that this interval, given by $\bar{x} \pm 2(\sigma_x/\sqrt{n})$, is one of the ones that contain the true mean."
forvalues i=1/20 {
gen sample`i'=.
bsample 20, weight(sample`i')
ci x if sample`i'==1
}
• When you are done typing (in a Do-file), hit CTRL-D
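The TRUE statement can be checked by repetition. The following is a minimal Python sketch, not part of the original Stata exercise: it rebuilds a 999-value Normal population, draws many samples of 20 with replacement (as bsample does), and counts how often the interval around each sample mean contains the population mean. The seed, the number of trials, and the use of the known population standard deviation with a z critical value are assumptions made for illustration (Stata's ci instead uses the sample standard deviation and a t critical value).

```python
import random
from statistics import NormalDist, fmean, pstdev

random.seed(6)  # arbitrary seed so the run is repeatable

# Population of 999 standard-Normal values, like `gen x = invnorm(uniform())`.
population = [random.gauss(0, 1) for _ in range(999)]
mu = fmean(population)       # true population mean
sigma = pstdev(population)   # true population standard deviation

n = 20
z_star = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
margin = z_star * sigma / n ** 0.5


def interval_covers_mu():
    """Draw n values with replacement and check whether the CI contains mu."""
    sample = random.choices(population, k=n)
    x_bar = fmean(sample)
    return x_bar - margin <= mu <= x_bar + margin


trials = 2000  # assumed number of repeated samples
coverage = sum(interval_covers_mu() for _ in range(trials)) / trials
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share should land near 0.95: the population mean never moves, and it is the interval that changes from sample to sample.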
Stata Exercise 2: Confidence Intervals for different Confidence Levels
ci x if sample1==1, level(90)
ci x if sample1==1
ci x if sample1==1, level(99)
Stata Exercise 3: Confidence Intervals for different sample sizes
gen sample21=.
gen sample22=.
bsample 20, weight(sample21)
bsample 100, weight(sample22)
hist x if sample21==1
hist x if sample22==1
ci x if sample21==1
ci x if sample22==1
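Exercises 2 and 3 vary the confidence level and the sample size. As a rough cross-check outside Stata, this Python sketch computes the z-based margin of error for each combination. It assumes a known population standard deviation of 1 (x was generated as standard Normal); the function name margin_of_error is made up for this illustration, and Stata's ci output will differ somewhat because it uses the sample standard deviation and t critical values.

```python
from statistics import NormalDist


def margin_of_error(level, sigma, n):
    """Half-width of a z confidence interval: z* times sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_star * sigma / n ** 0.5


sigma = 1.0  # assumed population standard deviation
for n in (20, 100):
    for level in (0.90, 0.95, 0.99):
        m = margin_of_error(level, sigma, n)
        print(f"n={n:3d}  level={level:.0%}  margin=+/-{m:.3f}")
```

A higher confidence level widens the interval, while a sample five times larger narrows it by a factor of the square root of 5.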
Stata Exercise 4: Confidence Intervals for different population standard deviations
Remember that if you have a variable $x$ with variance $\sigma_x^2$, and you multiply it by $b$, the variance of $bx$ is $b^2\sigma_x^2$:
$\sigma_{bx}^2 = b^2 \sigma_x^2$
The standard deviation of $bx$ is $|b|\sigma_x$ (simply $b\sigma_x$ when $b$ is positive):
$\sigma_{bx} = |b|\,\sigma_x$
gen y = invnorm(uniform())
gen w = y*.5
sum y w
ci y if sample21==1
ci w if sample21==1
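The scaling rule can be confirmed numerically. This short Python sketch mirrors the two gen lines (a standard Normal y and w = 0.5·y) and compares the sample standard deviations and variances; the seed is arbitrary.

```python
import random
from statistics import stdev, variance

random.seed(4)  # arbitrary seed so the run is repeatable

y = [random.gauss(0, 1) for _ in range(999)]  # like gen y = invnorm(uniform())
b = 0.5
w = [b * value for value in y]                # like gen w = y*.5

sd_ratio = stdev(w) / stdev(y)         # should equal |b|
var_ratio = variance(w) / variance(y)  # should equal b squared
print(f"sd ratio = {sd_ratio:.3f}, variance ratio = {var_ratio:.3f}")
```

Because the standard deviation is halved, a confidence interval for w built from the same observations is half as wide as the one for y.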
Excel Exercise 1
1. Tools | Data Analysis ... | Random Number Generator
o Fill out the dialog box with the
information on the right. Hit OK.
2. On cell K1, type =SUM(A1:J1)
o Double-click on the handle to
extend that formula all the way
down to cell K160
3. Select the cells from K1 to K160 and press CTRL-C to copy the numbers.
4. Open Stata. Type ed to open the Data Editor, then paste the numbers (CTRL-V).
5. Draw a histogram of the data, overlaying a normal density plot and a kernel density
plot:
hist v, norm kden
6. Draw a Normal quantile plot, using qnorm v.
o If the dots of the plot lie close to a straight line, there is evidence the data is
Normally distributed.
7. We know that the underlying data is uniformly distributed, because we generated the
distribution. We would have gotten a similar uniform distribution from 10 tosses of
an icosahedron (20-sided die).
o What is the shape of the distribution of the sum of 10 observations of a uniform random variable?
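The Excel steps can be imitated in a few lines of Python. This sketch assumes each of the ten columns holds a fair 20-sided die roll (whole numbers 1 to 20); the dialog-box settings are not reproduced in the text, so that is an assumption. It builds 160 row sums and compares their mean and spread with the theoretical values for a discrete uniform variable.

```python
import random
from statistics import fmean, stdev

random.seed(20)  # arbitrary seed so the run is repeatable

FACES, ROLLS, ROWS = 20, 10, 160  # assumed die, columns A-J, rows 1-160

# Each row sum plays the role of =SUM(A1:J1).
sums = [sum(random.randint(1, FACES) for _ in range(ROLLS)) for _ in range(ROWS)]

# One fair die with faces 1..k has mean (k+1)/2 and variance (k^2 - 1)/12.
theoretical_mean = ROLLS * (FACES + 1) / 2
theoretical_sd = (ROLLS * (FACES ** 2 - 1) / 12) ** 0.5

print(f"simulated mean {fmean(sums):.1f} vs theoretical {theoretical_mean:.1f}")
print(f"simulated sd   {stdev(sums):.1f} vs theoretical {theoretical_sd:.1f}")
```

Under this assumption the theoretical mean is 105 and the theoretical standard deviation is about 18.2, slightly below the round figure of 20 used in Stata Exercise 5. Although each roll is uniform, a histogram of the sums is bell-shaped, which is the central limit theorem at work.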
Stata Exercise 5
• Using the results from the previous exercise
  o ed
  o rename var1 icosahedron
  o tabstat i, s(mean sd)
    ▪ The theoretical mean of the sum of the outcomes of rolling ten icosahedrons is 105 and the theoretical standard deviation is about 20.
• To calculate the standardized value (z-score)
  o gen stdicosah = (icosahedron-105)/20
• To make this look like IQ scores (which have mean 100 and stdev 15),
  o gen iq = stdicosah*15+100
  o tabstat ic std iq, s(mean sd)
    ▪ The result is "your" IQ score.
• Let's compare the means of fake-IQ scores of men vs. women.
  o Calculate the means (meanW and meanM).
  o Calculate the difference between means (meanW - meanM)
  o Then calculate whether this difference is significantly different from zero.
• Those with fake-IQ scores
  o above 110, hold up two hands (group A)
  o 90 to 110, hold up one hand (group B)
  o below 90, keep hands down. (group C)
    ▪ Are there any visible characteristics that differentiate each group?
• Let's compare the means of fake-IQ scores
  o of group A vs group B
  o of group B vs group C
  o of group A vs group C
    ▪ Is the difference statistically significant?
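The standardizing and comparing steps of this exercise can be sketched in Python as follows. The helper names to_fake_iq and compare_means, the simulated dice sums, and the arbitrary split into two groups of 80 are all hypothetical stand-ins for the classroom data. The comparison uses a large-sample z approximation for the difference between two means, whereas a Stata command such as ttest would use the t distribution.

```python
import random
from statistics import NormalDist, fmean, variance


def to_fake_iq(total, mean=105, sd=20):
    """Standardize a dice sum, then rescale it to mean 100 and stdev 15."""
    z = (total - mean) / sd
    return z * 15 + 100


def compare_means(a, b):
    """Return (difference, z, two-sided P) for the means of two groups."""
    diff = fmean(a) - fmean(b)
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


random.seed(5)  # arbitrary seed so the run is repeatable
scores = [to_fake_iq(sum(random.randint(1, 20) for _ in range(10)))
          for _ in range(160)]
women, men = scores[:80], scores[80:]  # hypothetical group labels

diff, z, p = compare_means(women, men)
print(f"meanW - meanM = {diff:.2f}, z = {z:.2f}, P = {p:.3f}")
```

Because the labels here are assigned at random, a large P-value is the expected outcome. Groups A, B and C are defined by the scores themselves, so their means should differ sharply.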
A short summary of Statistical Inference
• Get a sample
  o Other people's samples (out of the same population) may be different
• Calculate descriptive statistics (such as the sample mean).
  o Law of large numbers
    ▪ The larger a sample gets, the closer the (sample) statistic gets to the parameter.
  o Central limit theorem
    ▪ Each sample will have different statistics (say, a different mean). There will be a distribution of these statistics for many different samples.
    ▪ If the many samples are decently large, this distribution (of the sample mean) will be approximately Normal and centered around the true population mean, and the spread of this distribution is smaller if the samples are larger.
      • Formally, $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$. The average of the sample means is the true mean, and the standard deviation of the sample mean is the true population standard deviation, divided by the square root of the sample size.
• Calculate the standard deviation of the sample mean: $\sigma/\sqrt{n}$
  o Typically, we don't know the true population standard deviation, $\sigma$. So we estimate it with the sample standard deviation, $s_x$.
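The claim that the sample mean has standard deviation σ/√n can be checked by simulation. In this Python sketch the population values (mean 100, standard deviation 15), the sample sizes, and the number of repetitions are hypothetical choices made for illustration.

```python
import random
from statistics import fmean, stdev

random.seed(11)  # arbitrary seed so the run is repeatable
mu, sigma = 100, 15  # hypothetical population mean and standard deviation


def sd_of_sample_means(n, reps=4000):
    """Draw `reps` samples of size n and return the spread of their means."""
    means = [fmean([random.gauss(mu, sigma) for _ in range(n)])
             for _ in range(reps)]
    return stdev(means)


results = {n: sd_of_sample_means(n) for n in (4, 25, 100)}
for n, simulated in results.items():
    print(f"n={n:3d}  simulated sd of x-bar={simulated:.2f}  "
          f"sigma/sqrt(n)={sigma / n ** 0.5:.2f}")
```

The simulated spread should track σ/√n closely, shrinking as the sample size grows.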
First way of drawing conclusions out of sample data: The Confidence Interval
• Pick a confidence level $(1 - \alpha)$ that you are comfortable with.
  o $(1 - \alpha) = 95\%$ is a popular confidence level.
• Find the critical value $z^*$ associated with that confidence level
  o Get this from Table A.
    ▪ If you want a $(1 - \alpha)$ confidence interval, find the $\alpha/2$ value on (the left side of) Table A. Then find the associated z-critical value.
    ▪ For a 95% confidence interval, find 0.05/2 = 0.025. The associated z-score is 1.96.
• Calculate the margin of error: $z^*(\sigma/\sqrt{n})$
Confidence level    Significance level    $\alpha/2$      Critical value
$(1 - \alpha)$      $\alpha$              on Table A      $z^*$
90%                 0.10                  0.050           1.645
95%                 0.05                  0.025           1.960
99%                 0.01                  0.005           2.575

• Calculate the Confidence Interval, for the particular confidence level, $1 - \alpha$:
  o This CI is wide enough that the intervals built from $100(1 - \alpha)\%$ of samples will contain the true mean.
$\bar{x} \pm z_{\alpha}^{*}(\sigma/\sqrt{n})$
or
$\left(\bar{x} - z_{\alpha}^{*}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha}^{*}\frac{\sigma}{\sqrt{n}}\right)$
For example, the CI might be 3 ± 0.45, which is (2.55, 3.45).
• Does your hypothesized value fall within the Confidence Interval?
  o If so, "fail to reject the null hypothesis"
  o If not, "reject the null hypothesis"
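The confidence-interval steps can be sketched in Python using the standard library's Normal distribution in place of Table A. The function name z_confidence_interval and the numbers (sample mean 3.3, σ = 0.5, n = 25, hypothesized mean 3.0) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, level=0.95):
    """Return (low, high) for x_bar +/- z* times sigma / sqrt(n)."""
    alpha = 1 - level
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin


# Critical values for the three usual confidence levels.
for level in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%} confidence: z* = {z_star:.3f}")

# Hypothetical data: x_bar = 3.3, sigma = 0.5, n = 25, H0: mu = 3.0.
mu0 = 3.0
low, high = z_confidence_interval(3.3, 0.5, 25)
decision = "fail to reject" if low <= mu0 <= high else "reject"
print(f"95% CI = ({low:.3f}, {high:.3f}); {decision} the null hypothesis")
```

The exact 99% critical value is closer to 2.576; the 2.575 in the table is the usual figure obtained by interpolating between printed table entries.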
Second way of drawing conclusions out of sample data: The z-score
• What's your null hypothesis, $H_0$?
  o For example, that the true population average GPA ($\mu$) is $\mu_0 = 3.0$
  o Or that the true population mean ($\mu$) is some given value, $\mu_0$.
  o So we say $H_0: \mu = \mu_0$
• Calculate the sample mean, $\bar{x}$
• Calculate the standard deviation of the sample mean: $\sigma/\sqrt{n}$
• Calculate the z-score: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
    ▪ $\bar{x}$ is $z$ standard deviations away from $\mu_0$.
• Pick a significance level ($\alpha$) that you are comfortable with.
  o $\alpha = 5\%$ is a popular significance level.
  o Find the critical value $z^*$ associated with that significance level
    ▪ "If the statistic is less than $z^*$ standard deviations away from the hypothesized value, we will think it's a fluke."
• Is the calculated z-score bigger (in absolute value) than your critical value?
  o If so, "reject the null hypothesis"
    ▪ $\bar{x}$ was too many standard deviations away from $\mu_0$ to be a fluke.
  o If not, "fail to reject the null hypothesis"
    ▪ $\bar{x}$ wasn't far enough from $\mu_0$: it was probably a random fluke.

Confidence level    Significance level    $\alpha/2$      Critical value
$(1 - \alpha)$      $\alpha$              on Table A      $z^*$
90%                 0.10                  0.050           1.645
95%                 0.05                  0.025           1.960
99%                 0.01                  0.005           2.575
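A minimal Python sketch of the z-score approach, using the GPA null hypothesis $\mu_0 = 3.0$ from the text. The sample means, σ = 0.5 and n = 25 are hypothetical, and z_test is a name invented for this illustration.

```python
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Return (z, z_star, reject) for a two-sided z test of H0: mu = mu0."""
    z = (x_bar - mu0) / (sigma / n ** 0.5)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_star, abs(z) > z_star


# Two hypothetical samples of 25 students with sigma = 0.5.
for x_bar in (3.3, 3.1):
    z, z_star, reject = z_test(x_bar, mu0=3.0, sigma=0.5, n=25)
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"x_bar={x_bar}: z={z:.2f}, z*={z_star:.2f}, {verdict}")
```

With a standard deviation of the sample mean of 0.1, a sample mean of 3.3 sits three standard deviations from 3.0 and is rejected, while 3.1 sits only one standard deviation away and is not.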
Third way of drawing conclusions out of sample data: The p-value
• Calculate the z-score, $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
• Find the associated "standard normal probability" in Table A. This is $P/2$
  o For example, the associated "standard normal probability" for the z-score of 2.10 is $P/2 = 0.0179$.
• Find $P$
  o If the z-score is 2.10, $P/2 = 0.0179$ and $P = 0.0358$.
    ▪ "Suppose the true mean is $\mu_0$. Sample means will vary. Only P% of all samples from this population have sample means more extreme than $\bar{x}$."
• Pick a significance level ($\alpha$) that you are comfortable with.
  o $\alpha = 5\%$ is a popular significance level.
    ▪ "If $\bar{x}$ is not extreme enough (more than $\alpha$% of all samples from this population have sample means more extreme than $\bar{x}$), we will accept this sample's result as a fluke and not really different from $\mu_0$."
    ▪ "We are willing to reject a true null hypothesis $\alpha$% of the time."
• Is the calculated P-value smaller than your chosen significance level?
  o If so, "reject the null hypothesis"
    ▪ "This result would have happened too infrequently if the true mean were $\mu_0$."
  o If not, "fail to reject the null hypothesis"
    ▪ "This result is not infrequent enough."

Statistic                  Test                            Reject $H_0: \mu = \mu_0$    Fail to reject $H_0$
P-value                    Significance level, $\alpha$    P-value < $\alpha$           P-value > $\alpha$
z-score                    Critical value, $z^*$           |z-score| > $z^*$            |z-score| < $z^*$
t-statistic                Critical value, $t^*$           |t-stat| > $t^*$             |t-stat| < $t^*$
Confidence Interval,       Does $\mu_0$ lie within         If $\mu_0$ not within CI     If $\mu_0$ within CI
with Confidence Level      the Confidence Interval?
$(1 - \alpha)$
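The P-value lookup and the agreement between the three approaches can be illustrated with a short Python sketch. The function names and the sample values (σ = 0.5, n = 25, sample means 3.3 and 3.1 against $\mu_0 = 3.0$) are hypothetical; the standard library's NormalDist stands in for Table A.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P = 2 * (area in one tail beyond |z|) under the standard Normal curve."""
    return 2 * NormalDist().cdf(-abs(z))


def three_way_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Return the reject decision by P-value, by z-score, and by CI."""
    se = sigma / n ** 0.5
    z = (x_bar - mu0) / se
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    by_p_value = two_sided_p_value(z) < alpha
    by_z_score = abs(z) > z_star
    by_interval = not (x_bar - z_star * se <= mu0 <= x_bar + z_star * se)
    return by_p_value, by_z_score, by_interval


print(round(NormalDist().cdf(-2.10), 4))  # one-tail area P/2, about 0.0179
print(round(two_sided_p_value(2.10), 4))  # two-sided P for z = 2.10
print(three_way_test(3.3, 3.0, 0.5, 25))  # all three say reject
print(three_way_test(3.1, 3.0, 0.5, 25))  # all three say fail to reject
```

Computed without intermediate rounding, P for z = 2.10 is about 0.0357; doubling the rounded table entry 0.0179 gives the 0.0358 quoted in the text. For a two-sided test at the same α, the three decisions always agree because they are the same comparison written three ways.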