Inference for Regression
AP Exam Review
Slope as a Sample Statistic
Say you have data on cost and square footage of a random sample of houses
from Gainesville, FL, and you generate a least-squares regression line on
cost v. square footage.
Or maybe you have conducted an experiment to see the effect
of levels of pesticide on the numbers of insects on a plant, and
you plot number of insects v. level of pesticide.
If you had a different sample of houses, or ran the experiment
again, you would end up with a different regression line.
That means the slope and y-intercept are statistics based on
your data. And that means they have sampling distributions.
Let’s look at an example of that, which I stole
from Statistics in Action: Understanding a World
of Data.
First, notice the symbols μy, β0 and β1. That’s different
from your formula sheet, which has ŷ = b0 + b1x.
The difference, as we often see, is the Greek letters
represent population parameters and the English letters
represent sample statistics. And the hat over the y means
the same thing as a hat over the p in proportions. It is
our estimate based on the sample.
Mean height = 35 + 2 · age
The plot at the right shows a sample of one child at each age based on our
model.
The line we came up with shows the mean height of boys at each age, but there
is variation. The model for our variation is a normal distribution with mean 0
and sd 2.1.
This is one possible sample of children. The regression line has slope 1.61.
We’ll record this and put it on a plot.
The dot in the plot at the right represents that particular sample of five
boys, and its location is the slope of the regression line.
Here is another sample of five boys. The slope of this regression line is 1.56.
Now we’ll add that to our dotplot.
Now let’s look at the slopes of 100 lines from 100 samples.
What does the red dot represent?
Describe this distribution (shape, center, spread).
Notice the mean is very close to 2 (which is β1). In this case the standard
deviation is about 0.5.
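If you want to rerun this kind of simulation yourself, here is a minimal sketch using only the Python standard library. It uses the model from the slides (mean height = 35 + 2 · age, normal variation with mean 0 and sd 2.1), but the ages 8 through 12, the seed, and the function names ls_slope and simulate_slopes are assumptions made for illustration; the slides do not give them.

```python
import random
import statistics


def ls_slope(xs, ys):
    """Least-squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    xbar = statistics.fmean(xs)
    ybar = statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(n_samples, ages, rng, intercept=35.0, slope=2.0, sd=2.1):
    """Draw n_samples samples (one boy per age) and return each sample's slope."""
    slopes = []
    for _ in range(n_samples):
        # Height = point on the true line + normal variation with mean 0
        heights = [intercept + slope * a + rng.gauss(0, sd) for a in ages]
        slopes.append(ls_slope(ages, heights))
    return slopes


AGES = [8, 9, 10, 11, 12]  # assumed ages; the slides do not list them
rng = random.Random(1)     # fixed seed so the run is repeatable
slopes = simulate_slopes(100, AGES, rng)

print(f"mean of slopes: {statistics.fmean(slopes):.2f}")
print(f"sd of slopes:   {statistics.stdev(slopes):.2f}")
```

With these assumed ages the theoretical standard deviation of the slope is 2.1/√10 ≈ 0.66, so your dotplot will be a bit more spread out than the one on the slide. Change AGES and watch the spread of the slopes change.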
So, what affects that
standard deviation?
In the formula for that standard deviation, s refers to the standard deviation of the residuals.
So, the sampling distribution of the sample slope is approximately normal. The
mean is β1, and the standard deviation is given by that messy formula, which
you don’t need to know. And it’s on the formula sheet if you need it.
But you should know what affects the sampling distribution of the slope.
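For reference, here is one standard way to write that formula. The notation is assumed here rather than taken from the slide: σ is the standard deviation of the y-values about the true line, and s is its estimate from the sample residuals.

```latex
\sigma_{b_1} = \frac{\sigma}{\sqrt{\sum (x_i - \bar{x})^2}},
\qquad
s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}},
\qquad
s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}
```

Reading it off: more scatter about the line (larger σ) makes the slope vary more from sample to sample, while more spread in the x-values or a larger sample makes it vary less.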
But, just like working with means, we don’t know the actual standard deviation
of all the residuals, so we estimate it based on the standard deviation of the
residuals from our sample. So, just like with means, we use t instead of z to
compensate for that. The difference is that now we use n – 2 degrees of
freedom.
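Here is a small sketch of that calculation in Python, standard library only. The data are made up for illustration and do not come from the slides, and slope_inference is a hypothetical helper name. It computes the sample slope, the standard deviation of the residuals s, the standard error of the slope, and the t statistic with n – 2 degrees of freedom for a null hypothesis that the true slope is 0.

```python
import math
import statistics


def slope_inference(xs, ys, null_slope=0.0):
    """Return (b1, s, se_b1, t, df) for a least-squares line of ys on xs."""
    n = len(xs)
    xbar = statistics.fmean(xs)
    ybar = statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar

    # Residual = observed y minus predicted y
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

    df = n - 2  # two parameters (intercept and slope) were estimated
    s = math.sqrt(sum(e ** 2 for e in residuals) / df)
    se_b1 = s / math.sqrt(sxx)
    t = (b1 - null_slope) / se_b1
    return b1, s, se_b1, t, df


# Made-up data for illustration only (not from the slides)
ages = [8, 9, 10, 11, 12]
heights = [51.0, 52.5, 56.0, 56.5, 60.0]

b1, s, se_b1, t, df = slope_inference(ages, heights)
print(f"slope b1 = {b1:.3f}, s = {s:.3f}, SE of slope = {se_b1:.3f}")
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The P-value then comes from the t distribution with those degrees of freedom, using a table or your calculator.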
Now that we know that, we can do inference for slope of a regression line. It
uses the same logic as other inference, and follows the same procedures. We
state our hypotheses, check conditions – Wait. What conditions do we check?
Think about the heights of boys simulation. How did we set it up?
• We had an underlying linear relationship.
• We had a random sample of boys from each fixed x-value.
• The y-values were normally distributed at each fixed x-value.
• The y-values had the same standard deviation at each fixed x-value.
That’s the set of mathematical conditions that makes the t procedure work for slope.
Here’s how we check the conditions.
Multiple Choice Questions
• Take a minute to read and answer.
• When I say, discuss with your neighbor.
• Then we’ll discuss them as a class.
Answer: A
Answer: E