STATISTICS 200
Lecture #25
Textbook: 14.1 and 14.3
Tuesday, November 15, 2016
Objectives:
• Formulate null and alternative hypotheses involving
regression coefficients.
• Calculate T-statistics; determine correct degrees of freedom.
Refresher
• Chapter 3 covered regression equations
and the relationship between two
quantitative variables.
Example:
Q. What is your height in
inches?
Q. What is your father’s
height in inches?
Which variable should
be the explanatory
variable?
A. Father’s height
B. Student’s height
• The response goes on the y-axis.
• The explanatory variable goes on the x-axis.
[Scatterplot annotated with the regression equation and R-squared]
Regression terminology
• The linear regression equation looks like this:
y-hat = b0 + b1 × x
where y-hat is the estimated (mean) response, b0 is the
y-intercept, b1 is the slope, and x is the explanatory variable.
• Recall that “y-hat” is the estimated mean of the
response (y). It can also refer to a predicted value of y.
In our example:
y-hat = 30.34 + 0.53 × (dadheight)
• If the dad was 0 inches tall, the predicted student height
is 30.34 inches. CAREFUL! A height of 0 inches is far outside
the observed data, so the intercept should not be read as a
realistic prediction.
• For every one inch increase in dadheight, the predicted
height increases by 0.53 inches.
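As a quick check of the example equation, the sketch below plugs a few father heights into y-hat = 30.34 + 0.53 × (dadheight). The function name `predict_student_height` and the sample heights are illustrative assumptions, not part of the lecture.

```python
# Coefficients taken from the lecture's example equation.
B0 = 30.34  # sample y-intercept (inches)
B1 = 0.53   # sample slope (inches of student height per inch of dad height)


def predict_student_height(dad_height):
    """Return y-hat, the predicted student height in inches."""
    return B0 + B1 * dad_height


# Hypothetical father heights, chosen only for illustration.
for dad in (66, 67, 72):
    print(dad, round(predict_student_height(dad), 2))

# Raising dadheight by one inch raises the prediction by the slope.
difference = predict_student_height(67) - predict_student_height(66)
```

The value of `difference` equals the slope, which is the "for every one inch increase" interpretation in code form.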
Refresher
• An important point to remember is that if the
slope (b1) is different from zero, then the two
quantitative variables are linearly related.
Parameters for Regression
• When regression was discussed before, we only talked
about it in terms of the sample statistics: b0 (sample
intercept) and b1 (sample slope).
• However, there are also parameters for the slope and
intercept in a linear regression equation.
• These parameters represent the intercept and slope that
would be found if the whole population for both variables
was used to create a regression equation.
Parameters for Regression
• β0 is the population intercept. It is estimated by the
sample intercept (b0).
• β1 is the population slope. It is estimated by the
sample slope (b1).
Parameters for Regression
• E(y) is the population mean response (i.e., the expected
value of y for all individuals in the population who have
the particular value of x).
• Note that y-hat is an estimate of E(y).
• For an individual, y = β0 + β1 × x + ε. The value ε (epsilon)
is called the error or the deviation. It has a mean of zero.
(And we assume it is normally distributed.)
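To make the difference between parameters and sample statistics concrete, here is a minimal simulation sketch. It assumes the usual simple linear model y = β0 + β1 × x + ε with normally distributed errors; the parameter values, the range of x, and the sample size are made up for illustration and do not come from the lecture.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population parameters (assumed values, not from the lecture).
BETA0 = 30.0   # population intercept
BETA1 = 0.5    # population slope
SIGMA = 2.0    # standard deviation of the error term epsilon

# Draw a sample: y = beta0 + beta1 * x + epsilon, epsilon ~ Normal(0, SIGMA).
n = 1000
xs = [random.uniform(60, 76) for _ in range(n)]
ys = [BETA0 + BETA1 * x + random.gauss(0, SIGMA) for x in xs]

# Least-squares estimates b1 (sample slope) and b0 (sample intercept).
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(f"b0 = {b0:.2f} (estimates beta0 = {BETA0})")
print(f"b1 = {b1:.3f} (estimates beta1 = {BETA1})")
```

The sample values b0 and b1 land near, but not exactly on, the population values, which is why a hypothesis test about β1 is needed.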
Parameters for Regression
• If two variables have a linear relationship, then β1
(the population slope) would be different from 0.
Hypothesis Testing About the Slope
• Statistical significance of a linear relationship can be
evaluated by testing whether the population slope is 0
or not.
• This test is done in a similar way to tests with
proportions and means.
• First, the null and alternative hypotheses need
to be determined.
Null and Alternative Hypotheses
• Null hypothesis
H0: β1 = 0
• This would mean that our two variables, x and y,
are not linearly related
• Alternative hypothesis
Ha: β1 ≠ 0
• The variables x and y are linearly related
Null and Alternative Hypotheses
• The alternative hypothesis can be 1-sided as well
(β1 > 0 or β1 < 0), but most software uses the 2-sided
alternative hypothesis (β1 ≠ 0)
• We will only use the 2-sided alternative hypothesis
The Test Statistic
• For the hypothesis tests for slope, the t-statistic is
used.
• The t-statistic is calculated in the same way as before:
t = (sample statistic - null value) / standard error = (b1 - 0) / s.e.(b1)
The Test Statistic
• When we are using the t-test for the test of the slope, the
degrees of freedom are equal to the sample size minus
two.
• df = n – 2
The Test Statistic
• The calculations for the sample slope and its
standard error are complicated.
• Luckily, Minitab can do this for us:
Coefficients
Term        Coef    SE Coef  T-Value  P-Value  VIF
Constant    30.34   5.08     5.98     0.000
dadheight   0.5280  0.0732   7.21     0.000    1.00
In the dadheight row, Coef is b1, SE Coef is s.e.(b1),
T-Value is the t-stat, and P-Value is the p-value.
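The T-Value column can be reproduced by hand from the Coef and SE Coef columns. The sketch below does that for both rows; the dictionary layout and variable names are illustrative assumptions, and the numbers are the ones shown in the Minitab output.

```python
# Coefficient and standard error values copied from the Minitab output.
rows = {
    "Constant": (30.34, 5.08),
    "dadheight": (0.5280, 0.0732),
}

NULL_VALUE = 0  # H0: the population coefficient equals 0

t_values = {}
for term, (coef, se_coef) in rows.items():
    # t = (sample statistic - null value) / standard error
    t_values[term] = (coef - NULL_VALUE) / se_coef
    print(f"{term}: t = {t_values[term]:.2f}")
```

Because the printed coefficients are rounded, the result for the Constant row can differ from Minitab's T-Value in the last digit.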
Example: Age and Reading Distance
• A sample was taken in which subjects were asked their
age, and then they were measured to see how far away
they could read a road sign.
• Age was treated as the explanatory variable, and reading
distance was the response variable.
• There are n = 30 observations
Example: Age and Reading Distance
• The sample slope was –3.0068, which means that for
each additional year of age, the estimated reading
distance decreased by about 3 feet.
• The standard error for the slope was 0.4243.
Example: Age and Reading Distance
• The t-statistic is calculated like this:
t = (b1 - 0) / s.e.(b1) = (–3.0068 - 0) / 0.4243 ≈ –7.09
• The Minitab output would look like this:
Example: Age and Reading Distance
• The correct conclusion is that, since the p-value is
< 0.05, the null hypothesis should be rejected.
• This would mean the slope is significantly
different from 0.
• So age and reading distance are linearly related.
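The test for the reading distance example can be checked without statistical software. The sketch below computes the t-statistic and degrees of freedom from the lecture's numbers, then approximates the 2-sided p-value by numerically integrating the t density. The numerical integration is a stand-in for what Minitab reports, and the helper names `t_pdf` and `two_sided_p_value` are assumptions for illustration.

```python
import math

b1 = -3.0068      # sample slope from the lecture
se_b1 = 0.4243    # standard error of the slope from the lecture
n = 30            # number of observations

df = n - 2                    # degrees of freedom for the slope test
t_stat = (b1 - 0) / se_b1     # null value is 0 under H0: beta1 = 0


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) with Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 <= T <= |t|)
    return max(0.0, 2 * (0.5 - area))


p_value = two_sided_p_value(t_stat, df)
print(f"df = {df}, t = {t_stat:.2f}, p-value = {p_value:.2e}")
```

A p-value below 0.05 leads to rejecting H0: β1 = 0, which matches the conclusion on the slide.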
Confidence Intervals for Slope
• Just like with means and proportions, confidence intervals
can be made for slopes.
• These intervals are used to estimate the true
value for the population slope.
Confidence Intervals for Slope
• Just like with hypothesis testing, the value for degrees of
freedom is two fewer than the sample size.
• df = n – 2
Example: Age and Reading Distance
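• The 95% confidence interval for the slope from
the reading distance example is
b1 ± t* × s.e.(b1) = –3.0068 ± 2.05 × 0.4243, or about –3.88 to –2.14
(t* ≈ 2.05 is the t multiplier for df = 30 – 2 = 28).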
Example: Age and Reading Distance
• The correct interpretation for this confidence
interval is that we are 95% confident that the true
population slope for the linear
relationship between reading distance and age is
between -3.88 and -2.14.
• Does this agree with our conclusion from the
hypothesis test? YES! The interval does not contain 0.
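The interval for the reading distance example can be reproduced with a few lines of arithmetic. This is a sketch under one assumption: the t multiplier 2.048 for 95% confidence with df = 28 is taken from a standard t-table, since the lecture does not print it.

```python
b1 = -3.0068     # sample slope from the lecture
se_b1 = 0.4243   # standard error of the slope from the lecture
n = 30
df = n - 2

# t multiplier for 95% confidence with df = 28, taken from a t-table
# (an assumed lookup; the lecture does not print this value).
T_STAR = 2.048

margin = T_STAR * se_b1
lower = b1 - margin
upper = b1 + margin
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")

# The interval excludes 0 exactly when the 2-sided test rejects at the 0.05 level.
contains_zero = lower <= 0 <= upper
```

Since `contains_zero` is false here, the interval and the hypothesis test point to the same conclusion.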
Correlation
• Remember, correlation (r) is a measure of
direction and strength for a linear
relationship
• As a note, if you find a significant hypothesis test
for the population slope (so β1 ≠ 0), then the
correlation will also be significantly different from
zero.
Sample Size and Significance
• An important concept to keep in mind is that the
larger the sample size, the more likely it is that
significance would be found for a hypothesis
test
• n increases → p-value decreases → significance increases
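One way to see the effect of sample size is to hold the strength of the relationship fixed and vary n. The sketch below uses the identity that the slope t-statistic can be written as t = r × sqrt(n – 2) / sqrt(1 – r²); the correlation r = 0.3 and the sample sizes are hypothetical values chosen for illustration, not data from the lecture.

```python
import math

R = 0.3  # hypothetical sample correlation, held fixed for illustration


def slope_t_statistic(r, n):
    """t-statistic for H0: beta1 = 0, written in terms of r and n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


sample_sizes = [10, 30, 100, 1000]
t_stats = [slope_t_statistic(R, n) for n in sample_sizes]
for n, t in zip(sample_sizes, t_stats):
    print(f"n = {n:4d}, df = {n - 2:3d}, t = {t:.2f}")
```

With the same r, the t-statistic grows as n grows, so the p-value shrinks and the same weak relationship eventually becomes statistically significant.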
If you understand today’s lecture…
14.1, 14.2, 14.4, 14.5, 14.7, 14.9, 14.21,
14.22, 14.24, 14.25, 14.27, 14.28