
Chapter 10: Inference for Regression

I. Simple Linear Regression (IPS section 10.1 pages 658-677)

A. Statistical Model for Linear Regression – Simple linear regression studies the relationship between a response variable y and a single explanatory variable x. We expect that different values of x will produce different mean responses. The statistical model for simple linear regression states that the observed response $y_i$ when the explanatory variable takes the value $x_i$ is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n.$$

The $\varepsilon_i$ are assumed to be independent and normally distributed with mean 0 and standard deviation $\sigma$. The parameters of the model are $\beta_0$, $\beta_1$, and $\sigma$.

B. Population Regression Line – The equation of the line is

$$\mu_y = \beta_0 + \beta_1 x$$

with intercept $\beta_0$ and slope $\beta_1$. This is the population regression line. It describes how the mean response changes with x. The observed y's will vary about these means. The model assumes that this variation, measured by the standard deviation $\sigma$, is the same for all values of x.

C. The intercept and slope $\beta_0$ and $\beta_1$ are estimated by the intercept and slope of the least-squares regression line, $b_0$ and $b_1$. The parameter $\sigma$ is estimated by

$$s = \sqrt{\frac{\sum e_i^2}{n - 2}}$$

where the $e_i$ are the residuals

$$e_i = \text{observed response} - \text{predicted response} = y_i - \hat{y}_i.$$

D. Confidence Intervals and Significance Tests for Regression Slope and Intercept – A 100(1-α)% confidence interval for the intercept $\beta_0$ is

$$b_0 \pm t_{\alpha/2}\, SE_{b_0}.$$

A 100(1-α)% confidence interval for the slope $\beta_1$ is

$$b_1 \pm t_{\alpha/2}\, SE_{b_1}.$$

Here $t_{\alpha/2}$ is the upper α/2 critical value of the t(n - 2) distribution. To test the hypothesis $H_0: \beta_1 = 0$, compute the t statistic

$$t = \frac{b_1}{SE_{b_1}}.$$

The degrees of freedom are n - 2. In terms of a random variable T having the t(n - 2) distribution, the P-value for a test of $H_0$ against

$H_a: \beta_1 > 0$ is $P(T \ge t)$
$H_a: \beta_1 < 0$ is $P(T \le t)$
$H_a: \beta_1 \ne 0$ is $2P(T \ge |t|)$

NOTE: Testing whether the true slope $\beta_1$ is 0 amounts to testing whether x is useful in predicting y.
If you reject $H_0$, you would conclude that x is useful in predicting y (that is, their linear relationship is statistically significant). If you fail to reject $H_0$, you would conclude that the data give no evidence that x is useful in predicting y (their linear relationship is NOT significant).

E. Confidence Interval for a Mean Response – A 100(1-α)% confidence interval for the mean response $\mu_y$ when x takes the value x* is

$$\hat{\mu}_y \pm t_{\alpha/2}\, SE_{\hat{\mu}}.$$

Confidence intervals are for the mean, or average, response.

F. Prediction Interval for a Future Observation – A 100(1-α)% prediction interval for a future observation on the response variable y from the subpopulation corresponding to x* is

$$\hat{y} \pm t_{\alpha/2}\, SE_{\hat{y}}.$$

Prediction intervals are for individual responses; they are used to predict a future observation.

II. More Detail about Simple Linear Regression (IPS section 10.2 pages 678-691)

A. Analysis of Variance for Regression – The usual computer output for regression includes additional calculations called analysis of variance. Analysis of variance, often abbreviated ANOVA, is essential for multiple regression and for comparing several means. Analysis of variance summarizes information about the sources of variation in the data. It is based on the DATA = FIT + RESIDUAL framework.

B. Sums of Squares, Degrees of Freedom, and Mean Squares – Sums of squares represent variation present in the responses. They are calculated by summing squared deviations. Analysis of variance partitions the total variation between two sources. The sums of squares are related by the formula SST = SSM + SSE. That is, the total variation is partitioned into two parts, one due to the model and one due to the deviations from the model. Degrees of freedom are associated with each sum of squares. They are related in the same way: DFT = DFM + DFE. To calculate mean squares, use the formula

$$MS = \frac{\text{sum of squares}}{\text{degrees of freedom}}.$$

C. Analysis of Variance F-Test – In the simple linear regression model, the hypotheses

$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \ne 0$$

are tested by the F statistic

$$F = \frac{MSM}{MSE}.$$

The P-value is the probability that a random variable having the F(1, n - 2) distribution is greater than or equal to the calculated value of the F statistic.

D. ANOVA Table – The ANOVA calculations are displayed in an analysis of variance table, often abbreviated ANOVA table. Here is the format of the table for simple linear regression:

Source   Degrees of Freedom   Sum of Squares                   Mean Square   F
Model    1                    $\sum (\hat{y}_i - \bar{y})^2$   SSM/DFM       MSM/MSE
Error    n - 2                $\sum (y_i - \hat{y}_i)^2$       SSE/DFE
Total    n - 1                $\sum (y_i - \bar{y})^2$         SST/DFT

The following information is added for completeness. You are not responsible for these formulas.

E. Standard Errors for Estimated Regression Coefficients – The standard error of the slope $b_1$ of the least-squares regression line is

$$SE_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}.$$

The standard error of the intercept $b_0$ is

$$SE_{b_0} = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}}.$$

F. Standard Errors for $\hat{\mu}$ and $\hat{y}$ – The standard error of $\hat{\mu}$ is

$$SE_{\hat{\mu}} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}.$$

The standard error for predicting an individual response $\hat{y}$ is

$$SE_{\hat{y}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}.$$

G. Test for a Zero Population Correlation – To test the hypothesis $H_0: \rho = 0$, compute the t statistic

$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$$

where n is the sample size and r is the sample correlation. In terms of a random variable T having the t(n - 2) distribution, the P-value for a test of $H_0$ against

$H_a: \rho > 0$ is $P(T \ge t)$
$H_a: \rho < 0$ is $P(T \le t)$
$H_a: \rho \ne 0$ is $2P(T \ge |t|)$

Moore, David and McCabe, George. 2002. Introduction to the Practice of Statistics. W. H. Freeman and Company, New York. 656-707.
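The formulas in this outline can be checked numerically. The following is a minimal sketch in plain Python, not code from IPS: the function names (`fit_simple_regression`, `se_mean_response`, `se_prediction`, `interval`) and the five data points are made up for illustration. The critical value 3.182 is the upper 0.025 point of the t distribution with 3 degrees of freedom, read from a standard t table. P-values are not computed, since they would need the t and F distribution functions from a table or a statistics library.

```python
import math


def fit_simple_regression(x, y):
    """Least-squares fit plus the inference quantities listed in the outline."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares slope and intercept (estimates of beta_1 and beta_0).
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    # ANOVA sums of squares: SST = SSM + SSE.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssm = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

    # s estimates sigma; the residuals have n - 2 degrees of freedom.
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)

    # Sample correlation and the t statistic for H0: rho = 0.
    r = sxy / math.sqrt(sxx * sst)
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

    return {
        "n": n, "x_bar": x_bar, "sxx": sxx,
        "b0": b0, "b1": b1, "s": s,
        "se_b0": se_b0, "se_b1": se_b1,
        "t": b1 / se_b1,                    # t statistic for H0: beta_1 = 0
        "sst": sst, "ssm": ssm, "sse": sse,
        "F": (ssm / 1) / (sse / (n - 2)),   # MSM / MSE
        "r": r, "t_corr": t_corr,
    }


def se_mean_response(fit, x_star):
    """Standard error of the estimated mean response at x = x_star."""
    return fit["s"] * math.sqrt(
        1 / fit["n"] + (x_star - fit["x_bar"]) ** 2 / fit["sxx"]
    )


def se_prediction(fit, x_star):
    """Standard error for predicting one future observation at x = x_star."""
    return fit["s"] * math.sqrt(
        1 + 1 / fit["n"] + (x_star - fit["x_bar"]) ** 2 / fit["sxx"]
    )


def interval(center, se, t_crit):
    """Return (lower, upper) for center +/- t_crit * se."""
    return center - t_crit * se, center + t_crit * se


# Hypothetical data, chosen only to keep the arithmetic simple.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = fit_simple_regression(x, y)

T_CRIT = 3.182  # upper 0.025 critical value of t(3), from a t table

print("b0, b1:", fit["b0"], fit["b1"])
print("95% CI for slope:", interval(fit["b1"], fit["se_b1"], T_CRIT))
y_hat = fit["b0"] + fit["b1"] * 3
print("95% CI for mean response at x* = 3:",
      interval(y_hat, se_mean_response(fit, 3), T_CRIT))
print("95% prediction interval at x* = 3:",
      interval(y_hat, se_prediction(fit, 3), T_CRIT))
print("t, t^2, F:", fit["t"], fit["t"] ** 2, fit["F"])
```

Running the sketch illustrates two identities that follow from the formulas above: in simple linear regression the ANOVA F statistic equals the square of the t statistic for the slope, and the t statistic for testing zero correlation equals the t statistic for testing zero slope. The prediction interval is always wider than the confidence interval for the mean response at the same x*, because of the extra 1 under the square root.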
