11.1 Variation in the Slope from Sample to Sample

Objectives
To learn that the regression equation $\hat{y} = b_0 + b_1 x$ must sometimes be thought of as an estimate of a true underlying linear model, $\mu_y = \beta_0 + \beta_1 x$.
To understand that the slope of a regression line fitted from sample data will vary from sample to sample.
To learn that a wider spread in the values of x and a smaller spread in the values of y (for each x) decrease the variability of the slope $b_1$.
To learn that the variability in the values of y from a sample is measured by the differences between y and its predicted value $\hat{y}$, and so is equivalent to the variability in the residuals.

Linear Models
In Chapter 3 we learned the following regression equation, which describes a linear relationship:
$$\hat{y} = b_0 + b_1 x$$
If this equation is based on a sample from the entire population, then $b_0$ and $b_1$ are simply estimates of the true population parameters $\beta_0$ and $\beta_1$. The standard model for linear regression is:
response = prediction from true regression line + random deviation
$$y = (\beta_0 + \beta_1 x) + \varepsilon = \mu_y + \varepsilon$$
where
$\beta_0$ = intercept of the true regression line
$\beta_1$ = slope of the true regression line
$\varepsilon$ = random error (random deviation)
$\mu_y$ = mean of the values of y for the given value of the explanatory variable x (lies on the true regression line)

The values of y in a sample can be thought of as:
observed response = fitted response + residual
$$y = \hat{y} + \text{residual} = (b_0 + b_1 x) + \text{residual}$$

Activity 11.1a: How Fast Do Kids Grow?

Variability in x and y
"Above" a single fixed value of x there are different values of y. The distribution of these values is called the conditional distribution of y given x. Each conditional distribution has a mean, $\mu_y$, and a measure of variability, $\sigma$. The mean of each conditional distribution lies on the true regression line, and the variability is assumed to be the same for every value of x.
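To see the slope vary from sample to sample, the sketch below repeatedly draws samples from a true model $y = \beta_0 + \beta_1 x + \varepsilon$ at fixed x values and fits a least squares line to each. The true line ($\beta_0 = 10$, $\beta_1 = 2$), the error standard deviation ($\sigma = 3$), the x values, the seed, and the helper names `least_squares` and `sample_slopes` are all hypothetical choices for illustration, not values from the text. The simulated standard deviation of $b_1$ is compared with $\sigma / \sqrt{\sum (x_i - \bar{x})^2}$, which is the standard error of the slope discussed in this section with the true $\sigma$ in place of its estimate $s_e$.

```python
import random
import statistics

# Hypothetical true model (NOT from the text): y = BETA0 + BETA1 * x + error, error ~ N(0, SIGMA)
BETA0, BETA1, SIGMA = 10.0, 2.0, 3.0


def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sample_slopes(xs, n_samples, rng):
    """Draw n_samples samples of y at the fixed x values; return the fitted slope of each."""
    slopes = []
    for _ in range(n_samples):
        # Each y is a point on the true line plus a random deviation
        ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
        slopes.append(least_squares(xs, ys)[1])
    return slopes


rng = random.Random(11)                # fixed seed so the run is repeatable
xs = list(range(1, 11))                # hypothetical fixed x values 1, ..., 10
slopes = sample_slopes(xs, 2000, rng)

x_bar = statistics.fmean(xs)
theory_sd = SIGMA / sum((x - x_bar) ** 2 for x in xs) ** 0.5

print("first five slopes:", [round(b, 2) for b in slopes[:5]])
print("mean of b1:", round(statistics.fmean(slopes), 3))
print("SD of b1 (simulated):", round(statistics.stdev(slopes), 3))
print("sigma / sqrt(Sxx):", round(theory_sd, 3))
```

Each sample gives a different $b_1$, but the slopes center on the true $\beta_1$, and their spread is close to the formula value.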
This implies that $\sigma$ measures the variability of all values of y about the true regression line. This fact is used to estimate $\sigma$ from your data by $s_e$, essentially the standard deviation of the residuals:
$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}}$$

[Figure: conditional distribution of y given x, with mean $\mu_y$ and standard deviation $\sigma$]

The Standard Error of the Slope
The different possible values of $b_1$ form the sampling distribution of the slope. The standard error (standard deviation) of this sampling distribution is estimated by:
$$s_{b_1} = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{\sqrt{\sum (y_i - \hat{y}_i)^2 / (n-2)}}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{\text{standard deviation of the residuals}}{\sqrt{n-1} \times \text{standard deviation of the x-values}}$$
As the formula shows, the variability in the slope of the regression line from sample to sample depends on the following:
The sample size, n. Larger sample sizes result in less variability in the slope.
The spread in the values of x. A wider spread in x results in regression lines with less variability in their slope.
The variability of y at each fixed value of x ($\sigma$). More variability in each conditional distribution of y means more variability in the slope $b_1$.

Example: According to Leonardo da Vinci, a person's arm span and height are about equal. The table below gives height and arm span measurements for a sample of 15 high-school students.

Arm Span (cm)   Height (cm)
168             170.5
172             170
101             107
161             159
166             166
174             175
153.5           158
95              95.5
129             132.5
169             165
175             179
154             149
142             143
156.5           158
164             161

a) What is the theoretical regression line, $\mu_y = \beta_0 + \beta_1 x$, that Leonardo is proposing for this situation? Use arm span as the explanatory variable.
b) Find the least squares regression line, $\hat{y} = b_0 + b_1 x$, for these data. Interpret the slope. Compare the estimated slope and intercept with the theoretical slope and intercept in part a. Are they close?
c) Calculate the random deviation, $\varepsilon$, for each student, using the theoretical regression line. Plot these random deviations against the arm spans and comment on the pattern.
d) For each student, calculate the residual from the estimated regression line. Plot the residuals against the arm spans and comment on the pattern. Is the pattern similar to that for the random deviations?
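The calculations in parts b to d, together with $s_e$ and $s_{b_1}$, can be sketched in Python using the 15 rows of the arm span table. Two assumptions are labeled: Leonardo's claim is read as the theoretical line $\mu_y = 0 + 1 \cdot x$ (height equals arm span), and plotting is left out, so the deviations and residuals are only computed, not graphed. The standard error is computed both from $\sum (x_i - \bar{x})^2$ and from $\sqrt{n-1}$ times the standard deviation of the x-values, to show that the two forms of the formula agree.

```python
import statistics

# Arm span (x, cm) and height (y, cm) for the 15 students in the table
arm_span = [168, 172, 101, 161, 166, 174, 153.5, 95, 129, 169, 175, 154, 142, 156.5, 164]
height = [170.5, 170, 107, 159, 166, 175, 158, 95.5, 132.5, 165, 179, 149, 143, 158, 161]

n = len(arm_span)
x_bar = statistics.fmean(arm_span)
y_bar = statistics.fmean(height)
sxx = sum((x - x_bar) ** 2 for x in arm_span)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(arm_span, height))

# b) least squares line: y-hat = b0 + b1 * x
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# c) random deviations from the theoretical line, ASSUMING Leonardo's line is mu_y = 0 + 1 * x
deviations = [y - x for x, y in zip(arm_span, height)]

# d) residuals from the fitted (estimated) line
residuals = [y - (b0 + b1 * x) for x, y in zip(arm_span, height)]

# Standard deviation of the residuals: s_e = sqrt(SSE / (n - 2))
sse = sum(r ** 2 for r in residuals)
s_e = (sse / (n - 2)) ** 0.5

# Standard error of the slope, computed with both forms of the formula
s_b1 = s_e / sxx ** 0.5
s_b1_alt = s_e / ((n - 1) ** 0.5 * statistics.stdev(arm_span))

print(f"y-hat = {b0:.3f} + {b1:.4f} x")
print(f"s_e = {s_e:.3f}, s_b1 = {s_b1:.4f} (second form: {s_b1_alt:.4f})")
for x, d, r in zip(arm_span, deviations, residuals):
    print(f"arm span {x:6.1f}: deviation {d:5.1f}, residual {r:6.2f}")
```

Note that the residuals always sum to zero by construction of the least squares line, while the random deviations from the theoretical line need not.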
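The three factors listed for the variability of the slope can be checked directly from the formula. This sketch evaluates $\sigma / \sqrt{\sum (x_i - \bar{x})^2}$ (the standard error with the true $\sigma$ rather than the estimate $s_e$) for a hypothetical baseline design, x = 1, ..., 10 with $\sigma = 2$, and then changes one factor at a time. The design values and the helper name `slope_sd` are illustrative assumptions, not taken from the text.

```python
def slope_sd(xs, sigma):
    """Standard deviation of the sampling distribution of b1: sigma / sqrt(sum((x - x_bar)^2))."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma / sxx ** 0.5


base_x = list(range(1, 11))                              # hypothetical design: x = 1, ..., 10
base = slope_sd(base_x, sigma=2.0)
bigger_n = slope_sd(base_x * 2, sigma=2.0)               # same x values, twice as many points
wider_x = slope_sd([2 * x for x in base_x], sigma=2.0)   # x values spread twice as far apart
noisier_y = slope_sd(base_x, sigma=4.0)                  # conditional SD of y doubled

for label, value in [("base", base), ("n doubled", bigger_n),
                     ("x spread doubled", wider_x), ("sigma doubled", noisier_y)]:
    print(f"{label:>17}: {value:.4f}  (ratio to base {value / base:.3f})")
```

Doubling n divides the standard deviation of $b_1$ by $\sqrt{2}$, doubling the spread of x halves it, and doubling $\sigma$ doubles it.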