Inference for Regression
AP Exam Review

Slope as a Sample Statistic

Say you have data on the cost and square footage of a random sample of houses from Gainesville, FL, and you compute a least-squares regression line of cost vs. square footage. Or maybe you have conducted an experiment to see the effect of different levels of insecticide on the number of insects on a plant, and you plot number of insects vs. level of insecticide.

If you had a different sample of houses, or ran the experiment again, you would end up with a different regression line. That means the slope and y-intercept are statistics computed from your data, and that means they have sampling distributions.

Let's look at an example of that, which I stole from Statistics in Action: Understanding a World of Data. First, notice the symbols in the model μy = β0 + β1x. That's different from your formula sheet, which has ŷ = b0 + b1x. The difference, as we often see, is that the Greek letters represent population parameters and the English letters represent sample statistics. The hat over the y means the same thing as the hat over the p in proportions: it is our estimate based on the sample.

Mean height = 35 + 2·age

The plot at the right shows a sample of one child at each age based on our model. The line we came up with shows the mean height of boys at each age, but individual boys vary around it. Our model for that variation is a normal distribution with mean 0 and standard deviation 2.1.

This is one possible sample of children. Its regression line has slope 1.61. We'll record this and put it on a plot. The dot in the plot at the right represents that particular sample of five boys, and its location is the slope of that sample's regression line.

Here is another sample of five boys. The slope of this regression line is 1.56. Now we'll add that to our dotplot.

Now let's look at the slopes of 100 lines from 100 samples. What does the red dot represent? Describe this distribution (shape, center, spread). Notice the mean is very close to 2, which is β1.
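If you want to run the 100-sample simulation yourself, here is a minimal Python sketch of it. It assumes one boy at each of five ages, 2 through 6 (the slides don't give the ages, so these are hypothetical), uses the model mean height = 35 + 2·age with normal variation of standard deviation 2.1, and fits a least-squares line to each sample. The names AGES, least_squares_slope, and simulate_slopes are illustrative only.

```python
import random
import statistics

# Hypothetical setup (assumption): one boy at each of five ages.
AGES = [2, 3, 4, 5, 6]
BETA0, BETA1, SIGMA = 35, 2, 2.1  # population model: mean height = 35 + 2*age, sd 2.1


def least_squares_slope(xs, ys):
    """Return the least-squares slope b1 = Sxy / Sxx."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(num_samples=100, seed=1):
    """Draw num_samples samples of boys and record each sample's slope."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(num_samples):
        # Each height = true mean at that age + Normal(0, SIGMA) variation
        heights = [BETA0 + BETA1 * a + rng.gauss(0, SIGMA) for a in AGES]
        slopes.append(least_squares_slope(AGES, heights))
    return slopes


slopes = simulate_slopes()
print(f"mean of slopes: {statistics.fmean(slopes):.2f}")
print(f"sd of slopes:   {statistics.stdev(slopes):.2f}")
```

The mean of the recorded slopes should land near β1 = 2. With these assumed ages, theory predicts a standard deviation near 2.1/√10 ≈ 0.66, so don't expect it to match the slides' value of about 0.5 exactly; the spread depends on which ages are in the sample.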
In this case the standard deviation is about 0.5. So, what affects that standard deviation? The standard deviation of the sample slope is σb1 = σ / √(Σ(x − x̄)²), where σ is the standard deviation of the y-values about the true line. Its estimate from a sample is called s, and it is the standard deviation of the residuals. A larger σ makes the slope vary more from sample to sample; more data points, or x-values that are more spread out, make it vary less.

So, the sampling distribution of the sample slope is approximately normal. Its mean is β1, and its standard deviation is given by that messy formula, which you don't need to know; it's on the formula sheet if you need it. But you should know what affects the sampling distribution of the slope.

But, just like when working with means, we don't know the actual standard deviation of all the residuals, so we estimate it with s, the standard deviation of the residuals from our sample. So, just like with means, we use t instead of z to compensate. The difference is that now we use n − 2 degrees of freedom.

Now that we know that, we can do inference for the slope of a regression line. It uses the same logic as other inference and follows the same procedure. We state our hypotheses, check conditions... Wait. What conditions do we check?

Think about the heights-of-boys simulation. How did we set it up?
• We had an underlying linear relationship.
• We had a random sample of boys at each fixed x-value.
• The y-values were normally distributed at each fixed x-value.
• The y-values had the same standard deviation at each fixed x-value.

That's the set of mathematical conditions that makes the t procedure for slope work. Here's how we check the conditions.

Multiple Choice Questions
• Take a minute to read and answer.
• When I say, discuss with your neighbor.
• Then we'll discuss them as a class.

Answer: A
Answer: E
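The arithmetic behind the t procedure for slope can be sketched in a few lines of Python. This is a minimal sketch, assuming a test of H0: β1 = 0 (or any other hypothesized value passed in). It estimates σ with s, the standard deviation of the residuals with n − 2 in the denominator, computes the standard error s / √(Σ(x − x̄)²), and reports n − 2 degrees of freedom. The function name slope_t_test and the five heights in the example are hypothetical, not data from the slides.

```python
import math
import statistics


def slope_t_test(xs, ys, beta1_null=0.0):
    """Return (b1, SE of b1, t, df) for H0: beta1 = beta1_null."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points for n - 2 degrees of freedom")
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # s: standard deviation of the residuals, dividing by n - 2
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # Standard error of the slope: s / sqrt(sum of (x - x_bar)^2)
    se_b1 = s / math.sqrt(sxx)
    t = (b1 - beta1_null) / se_b1
    return b1, se_b1, t, n - 2


# Hypothetical sample: one boy at each of five ages
ages = [2, 3, 4, 5, 6]
heights = [39, 42, 43, 46, 47]
b1, se_b1, t, df = slope_t_test(ages, heights)
print(f"b1 = {b1:.2f}, SE = {se_b1:.2f}, t = {t:.2f}, df = {df}")
```

For these made-up heights, b1 = 2, the standard error is 0.2, and t = 10 with 3 degrees of freedom; the P-value would then come from a t table or software (for example, SciPy's scipy.stats.t.sf). The sketch only does the arithmetic; it does not check the conditions.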
