Advanced Quantitative Techniques
Lab 6
October 13th 2016

Agenda today
– DSSC presentation: available data & how they can help you. Use their resources now
– Final project ideas activity
– Quick recap / midterm review
– Intro to regression in STATA (tbc next week)

Final project peer brainstorming
– Divide into groups of 4
– Turn to the person next to you & describe your ideas thus far in 2 minutes. (5 mins)
– Regroup and describe your partner's idea to the 4-person group (15 mins)
– 1 person from each group reports back on sticking points
– Ideas? Data? Tools?

Midterm review, cont.
A. Big concepts: get them right
B. Problems: plug in carefully
C. Explaining findings: don't forget. Use precise language.

Logistics:
• Blue book, hand-written
• Worth 20% of semester grade
• Done in class

Midterm review
A: Big concepts: get them right, e.g.
• Error types: Type I ($\alpha$) and Type II ($\beta$)
• Confidence: 95% confident that this interval contains the true value. If we repeated the experiment 100 times, about 95 of the resulting intervals would contain the true value.
• Hypotheses
• Power
• Dependent & independent variables
• If something doesn't make sense, look up other explanations. (I like graphics and cartoons.)

B: Manipulating equations
Look at homework problems + in-class examples
• Standard normal distribution, CI, t-test, z statistic
– Knowing mean + SD, what % of observations fall below X value?
• Calculate the input that you're missing (the sample value, the population mean, or the SD). Plug into $z = (x - \text{mean}) / \text{SD}$. Look up the value on the z table. Remember to subtract from 1 if the one-tailed area you need is on the other side of the table value. Use the normal to approximate the binomial if needed.
– Calculate a mean, build a CI around it: $\text{mean} \pm t_{crit} \times SE$. Usually you'll have to calculate SE from SD.

C: Explaining findings: don't forget
• Make sure to write a concluding sentence. Hint: look back at the question. What puzzle are you trying to unravel?
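The two calculation patterns from part B of the midterm review (share of observations below a value, and a confidence interval around a mean) can be sketched in Python as a way to check hand calculations. This is a minimal sketch under stated assumptions: the function names and the numbers are made up for illustration, and the interval uses a z critical value from the standard normal distribution where the slides use a t critical value, which is only a reasonable stand-in for large samples.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def share_below(x, mean, sd):
    """Fraction of a normal population that falls below x (hypothetical helper)."""
    z = (x - mean) / sd            # z score: distance from the mean in SD units
    return NormalDist().cdf(z)     # area to the left of z, same as a z-table lookup


def z_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Confidence interval: mean +/- critical value * standard error.

    Assumption: uses a z critical value instead of a t critical value.
    """
    se = sd / sqrt(n)                                        # SE from SD
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed cutoff
    return sample_mean - z_crit * se, sample_mean + z_crit * se


# Made-up numbers, purely illustrative
below = share_below(x=115, mean=100, sd=15)
low, high = z_confidence_interval(sample_mean=100, sd=15, n=36)

print(f"share below 115: {below:.4f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

With these made-up inputs the z score is 1, so about 84% of observations fall below 115, and the interval is roughly 100 plus or minus 4.9. On the exam the same steps are done by hand with the z or t table.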
Make a formula sheet in your own words, e.g.
• Standard error = SD / square root of sample size: $SE = s / \sqrt{n}$
• t statistic = (sample mean [sample] – pop mean [pop estimate]) / standard error [sample]: $t = (\bar{x} - \mu) / SE$
• Confidence interval: mean plus or minus the z (or t) stat multiplied by the standard error: $\bar{x} \pm z^{*} \cdot s / \sqrt{n}$
• Quick ref for the important z scores (two-tailed): 90% → 1.645, 95% → 1.96, 99% → 2.576

Coyotes & poison
• Hypothesis (H1): <28% of coyotes will survive the winter.
• Null hypothesis (H0): ≥28% of coyotes will survive the winter.
• We want to see where last year's actual survival (51/214 = 24%) falls relative to the overall survival percentage (~ population p / mean).
• s or σ: $\sqrt{p(1-p)} = \sqrt{0.28 \times (1 - 0.28)} = \sqrt{0.2016} = 0.45$
• s.e.: $s / \sqrt{n} = 0.45 / \sqrt{214} = 0.031$
• Test statistic: $z = (0.238 - 0.28) / 0.031 \approx -1.35$
• One-tailed p = 0.0885, so the result is significant at the 90% confidence level but not at the 95% level.

Intro to regression in STATA (tbc)
• Open the 311 data

Command: scatterplot
• Relation between 311 calls & vacancy rate? Variables: calls_per_thousand & vacant
• generate vacant_rate = vacant / HSE_UNIT * 100
• twoway (scatter calls_per_thousand vacant_rate) (lfit calls_per_thousand vacant_rate)

Command: correlate (corr)
• corr calls_per_thousand vacant_rate

Linear Regression
• Describes a relationship between an explained variable (y) and an explanatory variable (x). You "regress y on x."
• Attempts to explain this relationship with a straight-line fit.
• Simple linear regression has one input (x) and one output (y).
• The ideal formula to approximate the regression: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error term.

What are 'residuals' (error terms)?
• Residuals (or error terms) are the difference between an observed value of the response variable and the value predicted by the regression line.
• Residual = observed y – predicted y
• Residuals represent the 'leftover' or 'unexplained' variation in the response variable after fitting the regression line.

Command: regress (reg)
• reg calls_per_thousand vacant_rate

Interpreting the Output
1. Slope:
• The coefficient of the independent variable ($\beta_1$) is the slope of the regression line.
• Slope is the change in the dependent variable for every one-unit increase in the independent variable.
2. Y-Intercept:
• The constant ($\beta_0$): the predicted value of the dependent variable when the independent variable is 0.

Interpreting the Output
3. The p-value and CI of the Coefficients:
• The p-value corresponds to the coefficient of the independent variable.
• If the p-value is less than alpha, you can conclude there is a statistically significant relationship between the independent variable and the dependent variable.
• Or, you can examine whether zero is in the confidence interval of the coefficient. If zero is in the interval, then the coefficient is not statistically different from zero at 95% confidence.

How to Read Stata Output?
• SS – Sum of Squares associated with three sources of variance: Model, Residual, and Total.
• MS – Mean of Squares, the SS divided by the respective degrees of freedom. MS represents the model, error, and sample (total) variance respectively.
• F-statistic – the MS Model divided by the MS Residual; the numbers in parentheses are the respective df.
• Prob>F – the p-value associated with the F-statistic. It tests the null hypothesis that all the model coefficients other than the constant are 0.
• R-squared – the proportion of variance in y explained by the independent variables.
• Adjusted R-squared – a version of R-squared in which the addition of extraneous variables to the model is penalized. It is never greater than R-squared and increases only if the addition of one more explanatory variable improves the model more than would be expected by chance.
• Root MSE – the square root of the MS Residual. This is the estimated standard deviation of the residuals.
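To see where the numbers in a regression table come from, the quantities described above can be computed by hand for a simple one-variable regression. The sketch below is written in Python rather than Stata so that each step of the arithmetic is visible. The five data points are made up for illustration (they are not the 311 data), and all variable names are hypothetical.

```python
from math import sqrt

# Made-up illustrative data (NOT the 311 data set)
x = [1, 2, 3, 4, 5]   # explanatory (independent) variable
y = [2, 4, 5, 4, 5]   # explained (dependent) variable
n = len(x)
k = 1                 # number of explanatory variables

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y - predicted y
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Sums of squares: Total = Model + Residual
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_residual = sum(r ** 2 for r in residuals)
ss_model = ss_total - ss_residual

# Mean squares: SS divided by the respective degrees of freedom
ms_model = ss_model / k
ms_residual = ss_residual / (n - k - 1)

f_stat = ms_model / ms_residual
r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
root_mse = sqrt(ms_residual)

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"SS model={ss_model:.3f} residual={ss_residual:.3f} total={ss_total:.3f}")
print(f"F({k}, {n - k - 1})={f_stat:.3f}")
print(f"R-squared={r_squared:.3f} adj R-squared={adj_r_squared:.3f}")
print(f"Root MSE={root_mse:.3f}")
```

For these made-up points the slope is 0.6, the intercept is 2.2, R-squared is 0.6, and F(1, 3) is 4.5. The p-values (Prob>F and the coefficient p-value) are left out because they need the F and t distributions, which Stata looks up for you.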