Download Chapter 12

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Linear least squares (mathematics) wikipedia , lookup

History of statistics wikipedia , lookup

Foundations of statistics wikipedia , lookup

Confidence interval wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Degrees of freedom (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Regression toward the mean wikipedia , lookup

Transcript
Chapter 12
More About Regression 
Let’s look at the Warm-Up first to remind ourselves
what we did with regression!
Remember FODS!
Section 12.1
Inference for Linear Regression
Least-Squares Regression fits a straight
line of the form 𝑦 = π‘Ž + 𝑏π‘₯ to data to
predict a response variable y from the
explanatory variable x.
ο‚— Inference in this setting uses the sample
regression line to estimate or test a
claim about the population (true)
regression line.
ο‚—
ο‚—
Confidence intervals and significance tests
about the slope of the population
regression line are based on the sampling
distribution of b, the slope of the sample
regression line.
Conditions - LINER
ο‚—
ο‚—
ο‚—
ο‚—
ο‚—
Linear – the actual relationship between x and y is
linear. For any fixed value of x, the mean response πœ‡π‘₯ falls
on the population (true) regression line πœ‡π‘¦ = 𝛼 + 𝛽π‘₯ –
graph scatterplot
Independent – individual observations are independent
Normal – for any fixed value of x, the response y varies
according to a Normal distribution – you will be graphing
the Normal probability plot of the residuals here.
Equal Variance – the standard deviation of y (call it 𝜎)
is the same for all values of x – graph residual plot
Random – the data are produced from a well-designed
random sample or a randomized experiment.
How it works…
The slope b and intercept a of the leastsquares regression line estimate the slope
𝛽 and the intercept 𝛼 of the population
(true) regression line.
ο‚— To estimate 𝜎, use the standard deviation
of s of the residuals.
ο‚—
Confidence Intervals
Confidence intervals and significance tests
for the slope 𝛽 of the population regression
line are based on a t distribution with n-2
degrees of freedom.
ο‚— n-2 because we have two lists – we need to
allow another degree of freedom for the
extra variable…
ο‚— The t interval for the slope 𝛽 has the
form 𝑏 ± 𝑑 βˆ— 𝑆𝐸𝑏
ο‚— The standard error of the slope is 𝑆𝐸𝑏 =
𝑠
ο‚—
𝑠π‘₯ π‘›βˆ’1
Let’s look at SE…
ο‚—
𝑆𝐸𝑏 =
𝑠
𝑠π‘₯ π‘›βˆ’1
π‘Ÿπ‘’π‘ π‘–π‘‘π‘’π‘Žπ‘™π‘  2
π‘›βˆ’2
ο‚—
𝑠=
ο‚—
On the AP formula sheet:
ο‚— 𝑠𝑏
=
(π‘¦βˆ’π‘¦)2
π‘›βˆ’2
(π‘₯βˆ’π‘₯)2
=
(π‘¦βˆ’π‘¦)2
π‘›βˆ’2
Hypothesis Tests
ο‚—
ο‚—
To test the null hypothesis, carry out a t
test for the slope.
Use 𝒕 =
π’ƒβˆ’πœ·πŸŽ
𝑺𝑬𝒃
(𝐻0 : 𝛽 = β„Žπ‘¦π‘π‘œπ‘‘β„Žπ‘’π‘ π‘–π‘§π‘’π‘‘ π‘£π‘Žπ‘™π‘’π‘’)
ο‚— The most common null hypothesis is
𝐻0 : 𝛽 = 0, which says that there is no
linear relationship between x and y in the
population.
ο‚—
Let’s do a confidence interval!
We examined data from a study that investigated why some
people don’t gain weight even when they overeat. Researchers
deliberately overfed a random sample of 16 healthy young adults
for 8 weeks. They measured fat gain and change in energy use
from activity other than deliberate exercise (non-exercise
activity, NEA) – fidgeting, daily living, etc – for each subject. Here
are the results:
NEA Change (cal)
-94
-57
-29
135
143
151
245
355
Fat Gain (kg)
4.2
3.0
3.7
2.7
3.2
3.6
2.4
1.3
NEA Change (cal)
392
473
486
535
571
580
620
690
Fat Gain (kg)
3.8
1.7
1.6
2.2
1.0
0.4
2.3
1.1
Construct and interpret a 90% confidence interval
for the slope of the population regression line.
Check conditions first! Type information into calculator!
ο‚— Linear – look at scatterplot and draw it to prove that
you have checked this condition.
ο‚— Independent –
ο‚—
Normal – look at Normal probability plot of residuals
and draw it to prove you checked this condition. (find
the LinReg first and then do NPP with RESID – 2nd list)
Keep checking conditions…
ο‚—
Equal Variance – we want the standard deviation (the
average distance from the mean – or 0) to be the same
for all points – draw the residual plot to prove that you
have looked at it.
ο‚—
Random –
Do:
A Significance Test…
Infants who cry easily may be more easily stimulated than others. This may
be a sign of higher IQ. Researchers explored the relationship between crying
infants 4 to 10 days old and their later IQ scores. The researchers flicked
the infants with a rubber band and recorded the crying. They measured its
intensity by the number of peaks in the most active 20 seconds. The table
below contains data from a random sample of 38 infants.
a) Here is a scatterplot of the data with the least-squares
regression line added. Describe what this graph tells you
about the relationship between these two variables.
b) Using the min-tab output, what is the equation
of the least-squares regression line?
c) Interpret slope and y-intercept of
the regression line in context
d) Do these data provide convincing evidence that there is
a positive linear relationship between crying counts and IQ
scores in the population of infants?
Homework
ο‚—
Pg 759 (6, 8, 13-15, 18-26)