STATISTICS 200 Lecture #25
Textbook: 14.1 and 14.3
Tuesday, November 15, 2016

Objectives:
• Formulate null and alternative hypotheses involving regression coefficients.
• Calculate T-statistics; determine correct degrees of freedom.

Refresher
• Chapter 3 covered regression equations and the relationship between two quantitative variables.

Example:
Q. What is your height in inches?
Q. What is your father's height in inches?
Which variable should be the explanatory variable?
A. Father's height
B. Student's height

[Scatterplot with fitted line: the response variable is on the y-axis and the explanatory variable is on the x-axis; the output shows the regression equation and R-squared.]

Regression terminology
• The linear regression equation looks like this:
    y-hat = b0 + b1 x
  where y-hat is the estimated (mean) response, b0 is the y-intercept, b1 is the slope, and x is the explanatory variable.
• Recall that "y-hat" is the estimated mean of the response (y). It can also refer to a predicted value of y.

In our example:
    y-hat = 30.34 + 0.53 × (dadheight)
• Intercept: if the dad was 0 inches tall, the predicted student height is 30.34 inches. CAREFUL! A father's height of 0 inches is far outside the range of the data, so the intercept should not be read literally.
• Slope: for every one inch increase in dadheight, the predicted height increases by 0.53 inches.

Refresher
• An important point to remember is that if the slope (b1) is different from zero, then the two quantitative variables are linearly related.

Parameters for Regression
• When regression was discussed before, we only talked about it in terms of the sample statistics: b0 (sample intercept) and b1 (sample slope).
• However, there are also parameters for the slope and intercept in a linear regression equation.
• These parameters represent the intercept and slope that would be found if the whole population for both variables was used to create a regression equation.
• β0 is the population intercept. It is estimated by the sample intercept (b0).
• β1 is the population slope. It is estimated by the sample slope (b1).
• E(y) is the population mean response (i.e. the expected value of y for all individuals in the population who have the particular value of x):
    E(y) = β0 + β1 x
• Note that y-hat is an estimate of E(y).
• An individual response is y = β0 + β1 x + epsilon. The value epsilon is called the error or the deviation. It has a mean of zero. (And we assume it is normally distributed.)
• If two variables have a linear relationship, then β1 (the population slope) would be different from 0.

Hypothesis Testing About the Slope
• Statistical significance of a linear relationship can be evaluated by testing whether the population slope is 0 or not.
• This test is done in a similar way to tests with proportions and means.
• First, the null and alternative hypotheses need to be determined.

Null and Alternative Hypotheses
• Null hypothesis H0: β1 = 0
  This would mean that our two variables, x and y, are not linearly related.
• Alternative hypothesis Ha: β1 ≠ 0
  The variables x and y are linearly related.
• The alternative hypothesis can be 1-sided as well (β1 > 0 or β1 < 0), but most software uses the 2-sided alternative hypothesis (β1 ≠ 0).
• We will only use the 2-sided alternative hypothesis.

The Test Statistic
• For the hypothesis tests for slope, the t-statistic is used.
• The t-statistic is calculated in the same way as before:
    t = (sample statistic - null value) / standard error = (b1 - 0) / s.e.(b1)
• When we are using the t-test for the test of the slope, the degrees of freedom are equal to the sample size minus two: df = n - 2.
• The calculations for the sample slope and its standard error are complicated.
• Luckily, Minitab can do this for us:

Coefficients
Term        Coef     SE Coef   T-Value   P-Value   VIF
Constant    30.34    5.08      5.98      0.000
dadheight   0.5280   0.0732    7.21      0.000     1.00

  In the dadheight row: Coef is b1, SE Coef is s.e.(b1), T-Value is the t-stat (0.5280 / 0.0732 ≈ 7.21), and P-Value is the p-value.

Example: Age and Reading Distance
• A sample was taken in which subjects were asked their age, and then they were measured to see how far away they could read a road sign.
• Age was treated as the explanatory variable, and reading distance was the response variable.
• There are n = 30 observations.
• The sample slope was -3.0068, which means that for each additional year of age, the estimated reading distance decreased by about 3 feet.
• The standard error for the slope was 0.4243.
• The t-statistic is calculated like this:
    t = (b1 - 0) / s.e.(b1) = -3.0068 / 0.4243 ≈ -7.09, with df = n - 2 = 28
• The Minitab output would look like this: [Minitab coefficients table not reproduced.]
• The correct conclusion is that, since the p-value is < 0.05, the null hypothesis should be rejected.
• This would mean the slope is significantly different from 0.
• So age and reading distance are linearly related.

Confidence Intervals for Slope
• Just like with means and proportions, confidence intervals can be made for slopes.
• These intervals are used to estimate the true value for the population slope.
• Just like with hypothesis testing, the value for degrees of freedom is two fewer than the sample size: df = n - 2.

Example: Age and Reading Distance
• The 95% confidence interval for the slope from the reading distance example is
    b1 ± t* × s.e.(b1) = -3.0068 ± 2.05 × 0.4243, or about -3.88 to -2.14
  where t* ≈ 2.05 is the t multiplier for 95% confidence with df = 28.
• The correct interpretation for this confidence interval is that we are 95% confident that the true population slope for the linear relationship between reading distance and age is between -3.88 and -2.14.
• Does this agree with our conclusion from the hypothesis test? YES! The interval does not contain 0.

Correlation
• Remember, correlation (r) is a measure of direction and strength for a linear relationship.
• As a note, if you find a significant hypothesis test for the population slope (so β1 ≠ 0), then the correlation will also be significantly different from zero.
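
The reading distance calculations can be checked with a short script. This is only a sketch, not part of the lecture: the function name `slope_inference` is made up for illustration, and the multiplier t* = 2.048 is an assumed value read from a t-table for 95% confidence with df = 28, rather than something computed by the code.

```python
def slope_inference(b1, se_b1, n, t_star):
    """Return df, t-statistic and confidence interval for a regression slope.

    b1     : sample slope
    se_b1  : standard error of the slope, s.e.(b1)
    n      : sample size
    t_star : t multiplier for the desired confidence level (from a t-table)
    """
    df = n - 2                      # degrees of freedom for a slope test
    t_stat = (b1 - 0) / se_b1       # null value for the slope is 0
    margin = t_star * se_b1         # margin of error for the interval
    return df, t_stat, (b1 - margin, b1 + margin)


# Numbers from the age and reading distance example.
# t_star = 2.048 is an assumption taken from a t-table (95%, df = 28).
df, t_stat, ci = slope_inference(b1=-3.0068, se_b1=0.4243, n=30, t_star=2.048)

print(f"df = {df}")
print(f"t  = {t_stat:.2f}")
print(f"95% CI for the slope: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval excludes 0, which is the same conclusion reached by rejecting H0: β1 = 0.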

Sample Size and Significance
• An important concept to keep in mind is that, all else being equal, the larger the sample size, the more likely it is that significance would be found for a hypothesis test.
• n increases → p-value decreases → significance increases

If you understand today's lecture…
14.1, 14.2, 14.4, 14.5, 14.7, 14.9, 14.21, 14.22, 14.24, 14.25, 14.27, 14.28
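
To see why a larger n tends to give a smaller p-value in the slope test, the sketch below holds the slope and the scatter fixed and changes only n. It relies on the standard formula for the standard error of a slope in simple linear regression, s.e.(b1) = s / (s_x × sqrt(n - 1)), which the lecture does not state. All numbers and the function name `t_for_slope` are hypothetical.

```python
import math


def t_for_slope(b1, s, s_x, n):
    """t-statistic for H0: beta1 = 0 in simple linear regression.

    b1  : sample slope
    s   : standard deviation of the residuals
    s_x : sample standard deviation of the explanatory variable
    n   : sample size
    Uses s.e.(b1) = s / (s_x * sqrt(n - 1)), a standard formula that is
    assumed here and not taken from the lecture.
    """
    se_b1 = s / (s_x * math.sqrt(n - 1))
    return b1 / se_b1


# Hypothetical values: same slope and same scatter, different sample sizes.
for n in (10, 26, 101):
    t = t_for_slope(b1=0.5, s=2.0, s_x=3.0, n=n)
    print(f"n = {n:3d}  df = {n - 2:3d}  t = {t:.2f}")
```

With everything else held fixed, the t-statistic grows in proportion to sqrt(n - 1), and a t-statistic farther from 0 corresponds to a smaller p-value.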