STATISTICS 200
Lecture #25
Textbook: 14.1 and 14.3
Tuesday, November 15, 2016
Objectives:
• Formulate null and alternative hypotheses involving
regression coefficients.
• Calculate T-statistics; determine correct degrees of freedom.
Refresher
• Chapter 3 covered regression equations and the relationship between two
quantitative variables.
Example:
Q. What is your height in inches?
Q. What is your father’s height in inches?
Which variable should be the explanatory variable?
A. Father’s height
B. Student’s height
• The response goes on the y-axis.
• The explanatory variable goes on the x-axis.
[Scatterplot annotated with the regression equation and R-squared]
Regression terminology
• The linear regression equation looks like this:
y-hat = b0 + b1 x
where y-hat is the estimated (mean) response, b0 is the y-intercept, b1 is the
slope, and x is the explanatory variable.
• Recall that “y-hat” is the estimated mean of the
response (y). It can also refer to a predicted value of y.
In our example:
y-hat = 30.34 + 0.53 × (dadheight)
• If the dad was 0 inches tall, the predicted student height
is 30.34 inches. CAREFUL!
• For every one inch increase in dadheight, the predicted height
increases by 0.53 inches.
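The interpretation of the intercept and slope above can be checked with a few lines of Python. This is only a sketch: the function name and the 70-inch father's height are illustrative choices, not part of the lecture.

```python
# Sample regression equation from the slide: y-hat = 30.34 + 0.53 * dadheight
B0 = 30.34  # sample y-intercept, in inches
B1 = 0.53   # sample slope: inches of predicted height per inch of dadheight


def predict_student_height(dad_height_in):
    """Return the predicted (estimated mean) student height in inches."""
    return B0 + B1 * dad_height_in


# 70 inches is a hypothetical father's height chosen for illustration.
print(round(predict_student_height(70), 2))

# A one-inch increase in dadheight changes the prediction by the slope.
print(round(predict_student_height(71) - predict_student_height(70), 2))

# At dadheight = 0 the prediction equals the intercept, but no father in the
# data is anywhere near 0 inches tall, hence the "CAREFUL!" on the slide.
print(predict_student_height(0))
```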
Refresher
• An important point to remember is that if the
slope (b1) is different from zero, then the two
quantitative variables are linearly related.
Parameters for Regression
• When regression was discussed before, we only talked
about it in terms of the sample statistics: b0 (sample
intercept) and b1 (sample slope).
• However, there are also parameters for the slope and
intercept in a linear regression equation.
• These parameters represent the intercept and slope that
would be found if the whole population for both variables
was used to create a regression equation.
Parameters for Regression
• β0 is the population intercept. It is estimated by the
sample intercept (b0).
• β1 is the population slope. It is estimated by the
sample slope (b1).
Parameters for Regression
• E(y) is the population mean response (i.e., the expected
value of y for all individuals in the population who have
the particular value of x).
• Note that y-hat is an estimate of E(y).
• The value epsilon is called the error or the deviation.
It has a mean of zero. (And we assume it is normally
distributed.)
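The equations these bullets describe do not appear in the transcript. As a sketch, assuming the slide used the standard simple linear regression model, the population and sample versions can be written side by side:

```latex
\begin{align*}
E(y)    &= \beta_0 + \beta_1 x                 && \text{population mean response at } x \\
y       &= \beta_0 + \beta_1 x + \varepsilon   && \text{an individual response; } \varepsilon \text{ is normal with mean } 0 \\
\hat{y} &= b_0 + b_1 x                         && \text{sample estimate of } E(y)
\end{align*}
```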
Parameters for Regression
• If two variables have a linear relationship, then β1
(the population slope) would be different from 0.
Hypothesis Testing About the Slope
• Statistical significance of a linear relationship can be
evaluated by testing whether the population slope is 0
or not.
• This test is done in a similar way to tests with
proportions and means.
• First, the null and alternative hypotheses need
to be determined.
Null and Alternative Hypotheses
• Null hypothesis
H0: β1 = 0
• This would mean that our two variables, x and y,
are not linearly related
• Alternative hypothesis
Ha: β1 ≠ 0
• The variables x and y are linearly related
Null and Alternative Hypotheses
• The alternative hypothesis can be 1-sided as well
(β1 > 0 or β1 < 0), but most software uses the 2-sided
alternative hypothesis (β1 ≠ 0)
• We will only use the 2-sided alternative hypothesis
The Test Statistic
• For the hypothesis tests for slope, the t-statistic is
used.
• The t-statistic is calculated in the same way as before:
t = (sample statistic – null value) / standard error = (b1 – 0) / s.e.(b1)
The Test Statistic
• When we are using the t-test for the test of the slope, the
degrees of freedom are equal to the sample size minus
two.
• df = n – 2
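For readers who want to see where the slope, its standard error, the t-statistic, and the degrees of freedom come from, here is a minimal sketch using the usual least-squares formulas. The function name and the seven data pairs are hypothetical and are not the class data.

```python
import math


def slope_t_statistic(x, y):
    """Return (b1, se_b1, t, df) for testing H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx               # sample slope
    b0 = y_bar - b1 * x_bar      # sample intercept

    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                   # two estimated coefficients: b0 and b1
    s = math.sqrt(sse / df)      # residual standard deviation
    se_b1 = s / math.sqrt(sxx)   # standard error of the slope

    t = (b1 - 0) / se_b1         # null value for the slope is 0
    return b1, se_b1, t, df


# Hypothetical (father, student) heights in inches, for illustration only.
fathers = [65, 67, 68, 70, 71, 72, 74]
students = [66, 66, 69, 69, 71, 70, 73]
b1, se_b1, t, df = slope_t_statistic(fathers, students)
print(f"b1 = {b1:.4f}, s.e.(b1) = {se_b1:.4f}, t = {t:.2f}, df = {df}")
```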
The Test Statistic
• The calculations for the sample slope and its
standard error are complicated
• Luckily, Minitab can do this for us:
Coefficients
Term        Coef     SE Coef   T-Value   P-Value   VIF
Constant    30.34    5.08      5.98      0.000
dadheight   0.5280   0.0732    7.21      0.000     1.00
In the dadheight row: Coef is b1, SE Coef is s.e.(b1), T-Value is the t-stat,
and P-Value is the p-value.
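As a quick check on the Minitab output, each T-Value should equal Coef divided by SE Coef. The sketch below recomputes both from the rounded values shown; the helper name `t_value` is illustrative. Because the inputs are rounded, the Constant row comes out near 5.97 rather than Minitab's 5.98.

```python
# Rounded values copied from the Minitab coefficients table.
coefficients = {
    "Constant": {"coef": 30.34, "se": 5.08},
    "dadheight": {"coef": 0.5280, "se": 0.0732},
}


def t_value(coef, se, null_value=0.0):
    """t = (sample statistic - null value) / standard error."""
    return (coef - null_value) / se


t_values = {
    term: t_value(row["coef"], row["se"]) for term, row in coefficients.items()
}

for term, t in t_values.items():
    print(f"{term}: t = {t:.2f}")
```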
Example: Age and Reading Distance
• A sample was taken in which subjects were asked their
age, and then they were measured to see how far away
they could read a road sign.
• Age was treated as the explanatory variable, and reading
distance was the response variable.
• There are n = 30 observations
Example: Age and Reading Distance
• The sample slope was –3.0068, which means that for
each additional year of age, the estimated reading
distance decreased by about 3 feet.
• The standard error for the slope was 0.4243.
Example: Age and Reading Distance
• The t-statistic is calculated like this:
t = (b1 – 0) / s.e.(b1) = –3.0068 / 0.4243 ≈ –7.09
• The Minitab output would look like this:
Example: Age and Reading Distance
• The correct conclusion is that, since the p-value is
< 0.05, the null hypothesis should be rejected.
• This would mean the slope is significantly
different from 0.
• So age and reading distance are linearly related.
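The test for the age and reading distance example can be reproduced from the slope, its standard error, and n = 30. This sketch uses only the Python standard library: it approximates the two-sided p-value by numerically integrating the t density with Simpson's rule, which is an illustrative substitute for what Minitab does internally. The function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|); steps must be even for Simpson's rule."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and |t|
    return max(0.0, 1 - central_area)


b1 = -3.0068      # sample slope from the example
se_b1 = 0.4243    # standard error of the slope
n = 30            # number of observations
df = n - 2

t_stat = (b1 - 0) / se_b1
p_value = two_sided_p_value(t_stat, df)
reject_h0 = p_value < 0.05

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```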
Confidence Intervals for Slope
• Just like with means and proportions, confidence intervals
can be made for slopes.
• These intervals are used to estimate the true
value for the population slope.
Confidence Intervals for Slope
• Just like with hypothesis testing, the value for degrees of
freedom is two fewer than the sample size.
• df = n – 2
Example: Age and Reading Distance
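• The 95% confidence interval for the slope from
the reading distance example is
b1 ± t* × s.e.(b1) = –3.0068 ± 2.05 × 0.4243, or about –3.88 to –2.14
(t* ≈ 2.05 for df = 30 – 2 = 28)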
Example: Age and Reading Distance
• The correct interpretation for this confidence
interval is that we are 95% confident that the true
population slope for the linear
relationship between reading distance and age is
between –3.88 and –2.14.
• Does this agree with our conclusion from the
hypothesis test? YES! The interval does not contain 0.
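The interval can be recomputed from the numbers in the example. In this sketch the multiplier 2.048 is the t-table value for 95% confidence with df = 28; it is supplied here as an assumption because the transcript does not show the multiplier used on the slide.

```python
b1 = -3.0068     # sample slope from the example
se_b1 = 0.4243   # standard error of the slope
n = 30           # number of observations
df = n - 2

# Assumed t* multiplier for 95% confidence with df = 28 (from a t-table).
T_STAR_95_DF28 = 2.048

margin = T_STAR_95_DF28 * se_b1
lower = b1 - margin
upper = b1 + margin

# If 0 is outside the interval, the interval agrees with rejecting H0: beta1 = 0.
contains_zero = lower <= 0 <= upper

print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
print(f"Interval contains 0: {contains_zero}")
```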
Correlation
• Remember, correlation (r) is a measure of
direction and strength for a linear
relationship
• As a note, if you find a significant hypothesis test
for the population slope (so β1 ≠ 0), then the
correlation will also be significantly different from
zero.
Sample Size and Significance
• An important concept to keep in mind is that the
larger the sample size, the more likely it is that
significance would be found for a hypothesis
test
• n increases → p-value decreases → significance increases
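One way to see the effect of sample size is to hold everything else fixed and let n grow. The sketch below assumes the estimated slope, the residual standard deviation s, and the spread of x stay the same as n changes, which is an idealization. Under that assumption s.e.(b1) = s / (sd_x × sqrt(n – 1)), so the t-statistic grows with sqrt(n – 1). All numeric values are hypothetical.

```python
import math


def t_for_sample_size(n, b1, s, sd_x):
    """t-statistic for the slope at sample size n.

    Uses s.e.(b1) = s / sqrt(sum((x - x_bar)^2)), where the sum of squares
    equals (n - 1) * sd_x**2.
    """
    se_b1 = s / (sd_x * math.sqrt(n - 1))
    return b1 / se_b1


# Hypothetical slope, residual standard deviation, and standard deviation of x.
for n in (10, 30, 100, 1000):
    t = t_for_sample_size(n, b1=0.5, s=3.0, sd_x=2.5)
    print(f"n = {n:4d}, df = {n - 2:4d}, t = {t:.2f}")
```

A larger |t| corresponds to a smaller p-value, which matches the chain on the slide.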
If you understand today’s lecture…
14.1, 14.2, 14.4, 14.5, 14.7, 14.9, 14.21,
14.22, 14.24, 14.25, 14.27, 14.28