
Chapter 6 Handout

Stata Exercise 1: Sampling and Confidence Intervals

This exercise relies on the bsample command.
bsample -- Sampling with replacement: draws random samples with replacement from the data in memory.

• Type this into a Do-file.

    * create 999 draws from a standard Normal distribution
    clear
    set obs 999
    gen x = invnorm(uniform())
    br x
    hist x
    ci x

• When you are done typing, hit CTRL-D.
• Suppose we have a hypothesis and, given that hypothesis, a sample outcome looks extreme. This is evidence that the hypothesis is not true.

    * sample will hold how many times each observation was drawn
    gen sample =.
    bsample 20, weight(sample)
    br x sample if sample==1
    hist x if sample==1
    ci x if sample==1

• When you are done typing (in a Do-file), hit CTRL-D.
• FALSE: “There’s a 95% chance that the true mean falls within $\bar{x} \pm 2\,(\sigma_x/\sqrt{n})$.”
  This is false because the true population mean just is. It doesn’t have a sampling distribution because it is not the result of a sample. So either the interval contains the true population mean (which is not a random variable) or it doesn’t. If it does contain it, then the probability that the interval contains the true population mean is Pr = 1. If it doesn’t, then the probability is Pr = 0.
• TRUE: “There’s a 95% chance that this interval, given by $\bar{x} \pm 2\,(\sigma_x/\sqrt{n})$, is one of the ones that contain the true mean.”

    * draw 20 samples of size 20 and compute a confidence interval for each
    forvalues i=1/20 {
        gen sample`i'=.
        bsample 20, weight(sample`i')
        ci x if sample`i'==1
    }

• When you are done typing (in a Do-file), hit CTRL-D.

Stata Exercise 2: Confidence Intervals for different Confidence Levels

    ci x if sample1==1, level(90)
    ci x if sample1==1
    ci x if sample1==1, level(99)

Stata Exercise 3: Confidence Intervals for different sample sizes

    gen sample21=.
    gen sample22=.
    bsample 20, weight(sample21)
    bsample 100, weight(sample22)
    hist x if sample21==1
    hist x if sample22==1
    ci x if sample21==1
    ci x if sample22==1

Stata Exercise 4: Confidence Intervals for different population standard deviations

Remember that if you have a variable $x$ with variance $\sigma_x^2$, and you multiply it by $b$, the variance of $bx$ is $b^2\sigma_x^2$.
    $\sigma_{bx}^2 = b^2 \sigma_x^2$

The standard deviation of $bx$ is $b\sigma_x$ (for positive $b$):

    $\sigma_{bx} = b\,\sigma_x$

    * w has half the standard deviation of y
    gen y = invnorm(uniform())
    gen w = y*.5
    sum y w
    ci y if sample21==1
    ci w if sample21==1

Excel Exercise 1

1. Tools | Data Analysis … | Random Number Generator
   o Fill out the dialog box with the information on the right. Hit OK.
2. In cell K1, type =SUM(A1:J1)
   o Double-click on the handle to extend that formula all the way down to cell K160.
3. Select the cells from K1 to K160 and press CTRL-C to copy the numbers.
4. Open Stata. Type ed to open the Data Editor, and paste the numbers there.
5. Draw a histogram of the data, overlaying a normal density plot and a kernel density plot: hist v, norm kden
6. Draw a Normal quantile plot, using qnorm v.
   o If the dots of the plot lie close to a straight line, there is evidence the data is Normally distributed.
7. We know that the underlying data is uniformly distributed, because we generated the distribution. We would have gotten a similar uniform distribution from 10 tosses of an icosahedron (20-sided die).
   o What is the shape of the distribution of the sum of 10 observations of a uniform random variable?

Stata Exercise 5

• Using the results from the previous exercise:
   o ed
   o rename var1 icosahedron
   o tabstat i, s(mean sd)
• The theoretical mean of the sum of the outcomes of rolling ten icosahedrons is 105 and the theoretical standard deviation is about 20. To calculate the standardized value (z-score):
   o gen stdicosah = (icosahedron-105)/20
• To make this look like IQ scores (which have mean 100 and standard deviation 15):
   o gen iq = stdicosah*15+100
   o tabstat ic std iq, s(mean sd)
• The result is “your” IQ score. Let’s compare the means of fake-IQ scores of men vs. women.
   o Calculate the means (meanW and meanM).
   o Calculate the difference between means (meanW – meanM).
   o Then calculate whether this difference is significantly different from zero.
• Those with fake-IQ scores
   o above 110, hold up two hands (group A)
   o 90 to 110, hold up one hand (group B)
   o below 90, keep hands down (group C)
• Are there any visible characteristics that differentiate each group? Let’s compare the means of fake-IQ scores
   o of group A vs group B
   o of group B vs group C
   o of group A vs group C
• Is the difference statistically significant?

A short summary of Statistical Inference

• Get a sample.
   o Other people’s samples (out of the same population) may be different.
• Calculate descriptive statistics (such as the sample mean).
   o Law of large numbers
      ▪ The larger a sample gets, the closer the (sample) statistic gets to the parameter.
   o Central limit theorem
      ▪ Each sample will have different statistics (say, a different mean). There will be a distribution of these statistics for many different samples.
      ▪ If the many samples are decently large, this distribution (of the sample mean) will be approximately Normal and centered around the true population mean, and the spread of this distribution is smaller if the samples are larger.
      ▪ Formally, $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$. The average of your random sample’s mean is the true mean, and the standard deviation of the sample’s mean is the true population standard deviation, divided by the square root of the sample size.
• Calculate the standard deviation of the sample mean: $\sigma/\sqrt{n}$
   o Typically, we don’t know the true population standard deviation, $\sigma$. So we estimate it with the sample standard deviation, $s_x$.

First way of drawing conclusions out of sample data: The Confidence Interval

• Pick a confidence level $(1-\alpha)$ that you are comfortable with.
   o $(1-\alpha) = 95\%$ is a popular confidence level.
• Find the critical value $z^*$ associated with that confidence level.
   o Get this from Table A.
      ▪ If you want a $(1-\alpha)$ confidence interval, find the $\alpha/2$ value on (the left side of) Table A. Then find the associated z-critical value.
      ▪ For a 95% confidence interval, find 0.05/2 = 0.025. The associated z-score is 1.96.
• Calculate the margin of error: $z^*(\sigma/\sqrt{n})$

    Confidence level $(1-\alpha)$      90%      95%      99%
    Significance level $\alpha$        0.10     0.05     0.01
    $\alpha/2$ on Table A              0.050    0.025    0.005
    Critical value $z^*$               1.645    1.960    2.575

• Calculate the Confidence Interval, for the particular confidence level, $1-\alpha$:
   o This CI is wide enough that a share $(1-\alpha)$ of all samples will produce an interval that contains the true mean.

    $\bar{x} \pm z^*(\sigma/\sqrt{n})$   or   $\left(\bar{x} - z^*\frac{\sigma}{\sqrt{n}},\ \bar{x} + z^*\frac{\sigma}{\sqrt{n}}\right)$

   For example, the CI might be 3 ± 0.45, which is (2.55, 3.45).
• Does your hypothesized value fall within the Confidence Interval?
   o If so, “fail to reject the null hypothesis”
   o If not, “reject the null hypothesis”

Second way of drawing conclusions out of sample data: The z-score

• What’s your null hypothesis, $H_0$?
   o For example, that the true population average GPA ($\mu$) is $\mu_0 = 3.0$
   o Or that the true population mean ($\mu$) is some given value, $\mu_0$.
   o So we say $H_0: \mu = \mu_0$
• Calculate the sample mean, $\bar{x}$
• Calculate the standard deviation of the sample mean: $\sigma/\sqrt{n}$
• Calculate the z-score:

    $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

   o $\bar{x}$ is $z$ standard deviations away from $\mu_0$.
• Pick a significance level ($\alpha$) that you are comfortable with.
   o $\alpha = 5\%$ is a popular significance level.
   o Find the critical value $z^*$ associated with that significance level.
      ▪ “If the statistic is less than $z^*$ standard deviations away from the hypothesized value, we will think it’s a fluke.”
• Is the calculated z-score bigger (in absolute value) than your critical value?
   o If so, “reject the null hypothesis”
      ▪ $\bar{x}$ was too many standard deviations away from $\mu_0$ to be a fluke.
   o If not, “fail to reject the null hypothesis”
      ▪ $\bar{x}$ wasn’t far enough from $\mu_0$: it was probably a random fluke.

    Confidence level $(1-\alpha)$      90%      95%      99%
    Significance level $\alpha$        0.10     0.05     0.01
    $\alpha/2$ on Table A              0.050    0.025    0.005
    Critical value $z^*$               1.645    1.960    2.575

Third way of drawing conclusions out of sample data: The p-value

• Calculate the z-score, $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
• Find the associated “standard normal probability” (the area in the tail beyond the z-score) in Table A. This is $P/2$.
   o For example, the associated “standard normal probability” for the z-score of 2.10 is $P/2 = 0.0179$.
• Find $P$.
   o If the z-score is 2.10, $P/2 = 0.0179$ and $P = 0.0358$.
   o “Suppose the true mean is $\mu_0$. Sample means will vary. Only a share $P$ of all samples from this population have sample means more extreme than $\bar{x}$.”
• Pick a significance level ($\alpha$) that you are comfortable with.
   o $\alpha = 5\%$ is a popular significance level.
   o “If $\bar{x}$ is not extreme enough (more than a share $\alpha$ of all samples from this population have sample means more extreme than $\bar{x}$), we will accept this sample’s result as a fluke and not really different from $\mu_0$.”
   o “We are willing to reject a true null hypothesis a fraction $\alpha$ of the time.”
• Is the calculated P-value smaller than your chosen significance level?
   o If so, “reject the null hypothesis”
      ▪ “This result would have happened too infrequently if the true mean were $\mu_0$.”
   o If not, “fail to reject the null hypothesis”
      ▪ “This result is not infrequent enough.”

    Statistic                                            Test                                                  Reject $H_0: \mu = \mu_0$      Fail to reject $H_0$
    P-value                                              Significance level, $\alpha$                          P-value < $\alpha$             P-value > $\alpha$
    z-score (or t-statistic)                             Critical value, $z^*$ (or $t^*$)                      $|z| > z^*$ ($|t| > t^*$)      $|z| < z^*$ ($|t| < t^*$)
    Confidence Interval, with confidence level $(1-\alpha)$   Does $\mu_0$ lie within the Confidence Interval?   If $\mu_0$ not within CI       If $\mu_0$ within CI
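
The three approaches summarized above can be lined up on a single example. The Do-file fragment below is only a sketch: the sample mean, population standard deviation, and sample size are made-up numbers chosen for illustration (they are not from any exercise data), and the scalar names are hypothetical. It uses Stata's built-in normal() and invnormal() functions in place of the Table A lookup, and assumes the population standard deviation is known.

```stata
* Hypothetical inputs, chosen only for illustration
* sample mean
scalar xbar  = 3.21
* hypothesized mean under H0
scalar mu0   = 3.0
* population standard deviation, assumed known
scalar sigma = 0.5
* sample size
scalar n     = 25
* significance level
scalar alpha = 0.05

* standard deviation of the sample mean
scalar se    = sigma/sqrt(n)
* critical value z* (replaces the Table A lookup), about 1.96
scalar zstar = invnormal(1 - alpha/2)
* z-score of the sample mean under H0
scalar z     = (xbar - mu0)/se
* two-sided P-value: twice the tail area beyond |z|
scalar p     = 2*(1 - normal(abs(z)))
* confidence interval limits
scalar lo    = xbar - zstar*se
scalar hi    = xbar + zstar*se

* First way: is mu0 inside the confidence interval?
display "CI: (" lo ", " hi ")"
* Second way: is |z| bigger than z*?
display "z = " z "   z* = " zstar
* Third way: is P smaller than alpha?
display "P = " p
```

With these inputs the z-score is about 2.10 and the two-sided P-value is about 0.036, in line with the Table A example above. The 95% interval runs from roughly 3.01 to 3.41 and excludes 3.0, so all three approaches lead to rejecting $H_0$ at the 5% level.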
