Activity 2 - 7
Body Fat Percentage
5-Minute Check on Activity 2-6
1. Match the equation of the line with its name
Point–Slope Form: y – y1 = m(x – x1)
Slope–Intercept Form: y = mx + b
Standard Form: Ax + By = C
2. Find the point-slope form of a line containing (2, 5) and (4, 9).
Slope = m = Δy / Δx = (y2 – y1) / (x2 – x1) = (9 – 5) / (4 – 2) = 2
y – 5 = 2(x – 2)
3. Find the slope-intercept form of the line in number 2 above.
Using the point (2, 5):
y – 5 = 2(x – 2)
y – 5 = 2x – 4      (distribute slope)
y = 2x + 1          (add/sub y-point)
Using the point (4, 9):
y – 9 = 2(x – 4)
y – 9 = 2x – 8      (distribute slope)
y = 2x + 1          (add/sub y-point)
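As a quick check on questions 2 and 3, the slope and y-intercept can be computed directly from the two points. This is a minimal Python sketch; the function name `line_through` is made up for illustration and is not part of the activity.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the non-vertical line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # slope: change in y over change in x
    b = y1 - m * x1            # from y1 = m*x1 + b
    return m, b


m, b = line_through((2, 5), (4, 9))
print(f"y = {m:g}x + {b:g}")
```

Either point can be used to find the intercept, which is why both derivations above end at the same equation.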
Objectives
• Construct scatterplots from sets of data pairs
• Recognize when patterns of points in a scatterplot
have a linear form
• Recognize when the pattern in the scatterplot shows
that the two variables are positively or negatively
related
• Identify individual data points, called outliers, that
fall outside the general pattern of the other data
• Estimate and draw a line of best fit through a set of
points in a scatterplot
Objectives cont
• Determine residuals between the actual value and
the predicted value for each point in the data set
• Use a graphing calculator to determine a line of best
fit by the least-squares method
• Measure the strength of the correlation (association)
by a correlation coefficient
• Recognize that a strong correlation does not
necessarily imply a linear or cause and effect
relationship
Vocabulary
• Scatterplot – a graph of individual (x, y) points
• Outlier – a data point outside the general pattern of
points in the scatterplot
• Residuals – a statistical term for the error: actual
value – predicted value
• Least Squares Regression Line – a line that minimizes
the sum of the squares of all the residuals
• Linear Correlation Coefficient – r, measures how
strongly two variables follow a linear pattern
• Lurking Variable – better called an extraneous
variable; one that is not measured or accounted for in
the experiment
Activity - Background
Your body fat percentage is simply the percentage of fat
your body contains. If you weigh 150 pounds and have
10% body fat, your body consists of 15 pounds of fat and
135 pounds of lean body mass (bone, muscle, organs,
tissue, blood, etc.). A certain amount of fat is essential to
bodily functions. Fat regulates body temperature,
cushions and insulates organs and tissues, and is the
main form of the body’s energy reserve. The American
Council on Exercise has established the following
categories for males and females based on body fat %.
Classification    Female (% fat)    Male (% fat)
Essential Fat     10 – 12 %         2 – 4 %
Athletes          14 – 20 %         6 – 13 %
Fitness           21 – 24 %         14 – 17 %
Acceptable        25 – 31 %         18 – 25 %
Obese             ≥ 32 %            ≥ 26 %
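The arithmetic in the background example (150 pounds at 10% body fat) can be sketched in Python. The function name `fat_and_lean` is an illustrative assumption, not something from the activity.

```python
def fat_and_lean(weight_lb, body_fat_pct):
    """Split a body weight into pounds of fat and pounds of lean mass."""
    fat = weight_lb * body_fat_pct / 100
    return fat, weight_lb - fat


fat, lean = fat_and_lean(150, 10)
print(f"{fat:g} lb of fat, {lean:g} lb of lean body mass")
```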
Activity
A group of researchers is searching for alternative
methods to measure body fat percentage. They first
investigate if there is an association between body fat %
and a person’s weight. The body fat percentage of 19
male subjects is accurately determined using the
hydrostatic weighing method. Then each subject is
weighed using a traditional scale. The results are below:
W, weight (lbs):  175  181  200  159  195  192  205  173  187  188
Y, Body Fat %:     16   21   25    6   22   30   32   21   25   19

W, weight (lbs):  240  175  168  246  160  215  155  146  219
Y, Body Fat %:     15   22    9   38   14   27   12   10   30
Drawing Scatter Plots by Hand
• Plot the explanatory variable on the x-axis. If there is
no explanatory-response distinction, either variable
can go on the horizontal axis.
• Label both axes
• Scale both axes (but not necessarily the same scale
on both axes). Intervals must be uniform.
• Make your plot large enough so that the details can
be seen easily.
• If you have a grid, adopt a scale so that your plot
uses the entire grid
Activity cont
Plot the data points as
ordered pairs of the form
(w, y)
[Blank grid for the scatterplot: y-axis (body fat %) from 5 to 40 in steps of 5; W-axis (weight, lbs) from 100 to 250 in steps of 25]
Activity Questions
Does there appear to be a linear relationship?
Yes, except for one point
What is the general trend of the graph?
Positive slope
Identify any outliers (points that fall way outside the
general trend or pattern of the data)
(240,15)
Activity Questions cont
Use a straight edge to draw a line connecting the points
(175, 16) and (200, 25). Use this line to represent the
trend.
Determine the slope of this line
m = (25 – 16) / (200 – 175) = 9 / 25 = 0.36
Determine the equation of the line
y – 25 = 0.36(x – 200)    (point-slope form)
y = 0.36x – 47            (slope-intercept form)
Predict the body fat % of a 192 pound male
y = 0.36x – 47
y = 0.36(192) – 47 = 69.12 – 47 = 22.12
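The hand-drawn trend line and its prediction can be reproduced in a few lines of Python. This is only a sketch of the steps on this slide; the function name `hand_drawn_line` is an illustrative assumption.

```python
def hand_drawn_line(x):
    """Trend line through (175, 16) and (200, 25), written in point-slope form."""
    m = (25 - 16) / (200 - 175)  # slope: 9 / 25 = 0.36
    return 25 + m * (x - 200)    # y = y1 + m(x - x1), using the point (200, 25)


# Predicted body fat % for a 192 pound male
print(round(hand_drawn_line(192), 2))
```

Evaluating the same function at x = 0 gives the y-intercept of the slope-intercept form.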
TI-83 Instructions for Scatter Plots
• Enter explanatory variable in L1
• Enter response variable in L2
• Press 2nd y= for StatPlot, select 1: Plot1
• Turn plot1 on by highlighting ON and enter
• Highlight the scatter plot icon and enter
• Press ZOOM and select 9: ZoomStat
Interpreting Scatterplots
Scatter plots should be described by
– Direction
positive association (positive slope left to right)
negative association (negative slope left to right)
– Form
linear – straight line,
curved – quadratic, cubic, exponential, etc.
– Strength of the form
weak
moderate (either weak or strong)
strong
– Outliers (any points not conforming to the form)
– Clusters (any sub-groups not conforming to the form)
Example 1
[Figure: five scatterplots, each with Explanatory on the horizontal axis and Response on the vertical axis, titled: Strong Negative Linear Association; Strong Positive Linear Association; No Relation; Strong Negative Quadratic Association; Weak Negative Linear Association]
Interpreting our Scatterplot
Direction: positive association
Form: linear
Strength of the form: relatively strong
Outliers: (240, 15)
Clusters: none
[Scatterplot of the data: y (body fat %) from 5 to 40 against W (weight, lbs) from 100 to 250]
Residuals Part One
• A positive residual means that the observed (actual)
value, y, lies above the line (predicted value, y-hat);
the predicted value is smaller
• A negative residual means that the observed (actual)
value, y, lies below the line (predicted value, y-hat);
the predicted value is larger
• Order is not optional: residual = actual value – predicted value
Activity - Residuals
Determine the residual from the 192 lb prediction
Predicted = 22.12
Actual = 30
residual = 30 – 22.12 = 7.88
What does it tell us about the predicted value?
Predicted value was below the actual (positive residual)
Determine the residual for a body weight of 168 lb
Predicted = 0.36(168) – 47 = 13.48
Actual = 9
residual = 9 – 13.48 = -4.48
What does it tell us about the predicted value?
Predicted value was above the actual (negative residual)
Activity – Residuals cont
Let’s use our calculator to help figure out all the
residuals for our data. Remember how we typed in the data:
“x-data” is entered in L1
“y-data” is entered in L2
Model: L3 = 0.36(L1) – 47
Residuals: L4 = L2 – L3
Scatterplot: plot the residuals in L4 against L1
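The same list arithmetic can be sketched in Python, with one list standing in for each calculator list. The model is the hand-drawn line y = 0.36x – 47 from earlier in the activity, and each residual is actual minus predicted; the variable names are illustrative assumptions.

```python
# Data from the activity: weight in pounds and body fat %
weights = [175, 181, 200, 159, 195, 192, 205, 173, 187, 188,
           240, 175, 168, 246, 160, 215, 155, 146, 219]   # plays the role of L1
body_fat = [16, 21, 25, 6, 22, 30, 32, 21, 25, 19,
            15, 22, 9, 38, 14, 27, 12, 10, 30]            # plays the role of L2

# Predicted values from the hand-drawn line (plays the role of L3)
predicted = [0.36 * w - 47 for w in weights]

# Residual = actual - predicted, in that order (plays the role of L4)
residuals = [y - p for y, p in zip(body_fat, predicted)]

for w, y, res in zip(weights, body_fat, residuals):
    print(f"{w} lb: actual {y}, residual {res:.2f}")
```

The entries for 192 lb and 168 lb should match the two residuals worked out by hand on the previous slide.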
Line of Best Fit
The established method for finding the line of best fit is
called the Least Squares Regression Model. It
minimizes the sum of the squares of the residual values.
It uses calculus, so is beyond our course, but our
calculator can do all the work for us.
Diagnostics must be turned on (see last page)
Use LinReg(ax+b) L1, L2 (from STAT, CALC)
Write down
a = 0.22357 (the slope)
b = –21.3767 (the y-intercept)
r = 0.7199 (correlation coefficient)
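For readers curious about what the calculator is doing, here is a minimal Python sketch of the standard least-squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The function name `least_squares` and the variable names are illustrative assumptions; this is not the calculator's internal code.

```python
weights = [175, 181, 200, 159, 195, 192, 205, 173, 187, 188,
           240, 175, 168, 246, 160, 215, 155, 146, 219]
body_fat = [16, 21, 25, 6, 22, 30, 32, 21, 25, 19,
            15, 22, 9, 38, 14, 27, 12, 10, 30]


def least_squares(xs, ys):
    """Return (slope a, intercept b, correlation r) for the line y = ax + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx                  # slope
    b = y_bar - a * x_bar          # y-intercept
    r = sxy / (sxx * syy) ** 0.5   # linear correlation coefficient
    return a, b, r


a, b, r = least_squares(weights, body_fat)
print(f"a = {a:.5f}, b = {b:.4f}, r = {r:.4f}")
```

The printed values should agree with the LinReg(ax+b) output written down above, up to rounding.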
Least Squares Regression Line
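[Figure: scatterplot with the regression line; the vertical segments from the points to the line are labeled "residual"]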
• The blue line minimizes the sum of the
squares of the residuals (dark vertical lines)
Regression Line
Let’s plot the regression line, our first line, and the data
(using our scatterplot).
Assign Y1 = 0.36X – 47 (Original Line)
Assign Y2 = 0.224X – 21.38 (Regression Line)
Hit GRAPH
Use the regression line to predict the body fat % for a
225 lb male
Y2 = (0.224)(225) – 21.38 = 50.4 – 21.38 = 29.02
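One way to compare the two lines numerically is to add up the squared residuals for each. The sketch below does this in Python for the original hand-drawn line and the rounded regression line; the function names are illustrative assumptions.

```python
weights = [175, 181, 200, 159, 195, 192, 205, 173, 187, 188,
           240, 175, 168, 246, 160, 215, 155, 146, 219]
body_fat = [16, 21, 25, 6, 22, 30, 32, 21, 25, 19,
            15, 22, 9, 38, 14, 27, 12, 10, 30]


def original_line(x):
    return 0.36 * x - 47       # Y1, the hand-drawn line


def regression_line(x):
    return 0.224 * x - 21.38   # Y2, the rounded regression line


def sum_squared_residuals(line):
    """Sum of (actual - predicted)^2 over all 19 data points."""
    return sum((y - line(x)) ** 2 for x, y in zip(weights, body_fat))


sse_original = sum_squared_residuals(original_line)
sse_regression = sum_squared_residuals(regression_line)
print(f"original line:   {sse_original:.1f}")
print(f"regression line: {sse_regression:.1f}")
print(f"prediction at 225 lb: {regression_line(225):.2f}")
```

The regression line should give the smaller total, which is what "least squares" means.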
Important Properties of r
Our r-value was 0.71987 or r ≈ 0.72 (not as strong as we
thought)
• Correlation makes no distinction between explanatory
and response variables
• r does not change when we change the units of
measurement of x, y or both
• Positive r indicates positive association between the
variables and negative r indicates negative
association
• The correlation r is always a number between -1 and 1
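The properties listed above can be checked on the activity data. In this Python sketch the `correlation` function is written out by hand, and the pounds-to-kilograms factor (0.45359237) is an addition that is not part of the activity.

```python
weights = [175, 181, 200, 159, 195, 192, 205, 173, 187, 188,
           240, 175, 168, 246, 160, 215, 155, 146, 219]
body_fat = [16, 21, 25, 6, 22, 30, 32, 21, 25, 19,
            15, 22, 9, 38, 14, 27, 12, 10, 30]


def correlation(xs, ys):
    """Linear correlation coefficient r of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


r = correlation(weights, body_fat)

# Swapping explanatory and response variables leaves r unchanged
r_swapped = correlation(body_fat, weights)

# Changing units (pounds to kilograms) leaves r unchanged
weights_kg = [w * 0.45359237 for w in weights]
r_kg = correlation(weights_kg, body_fat)

print(f"r = {r:.4f}, swapped = {r_swapped:.4f}, in kg = {r_kg:.4f}")
```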
Example 2
Match the r values to the scatterplots to the left
[Figure: six scatterplots labeled A through F]
1) r = -0.99   F
2) r = -0.7    E
3) r = -0.3    D
4) r = 0       A
5) r = 0.5     B
6) r = 0.9     C
Residuals Part Two
• The sum of the least-squares residuals is
always zero
• Residual plots help assess how well the line
describes the data
• A good fit has
– no discernible pattern to the residuals
– residuals that are relatively small in size
• A poor fit violates one of the above
– Discernible pattern: curved (or linear) residual plot
– Increasing / decreasing spread in the residual plot
(horn effect)
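The first property above (least-squares residuals sum to zero) can be checked on the weight data. This Python sketch fits the line with the standard formulas and adds up the residuals; the function name `fit_line` is an illustrative assumption.

```python
weights = [175, 181, 200, 159, 195, 192, 205, 173, 187, 188,
           240, 175, 168, 246, 160, 215, 155, 146, 219]
body_fat = [16, 21, 25, 6, 22, 30, 32, 21, 25, 19,
            15, 22, 9, 38, 14, 27, 12, 10, 30]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = ax + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    return a, y_bar - a * x_bar


a, b = fit_line(weights, body_fat)
residuals = [y - (a * x + b) for x, y in zip(weights, body_fat)]

# Zero up to floating-point rounding error
print(f"sum of residuals = {sum(residuals):.10f}")
```

With the rounded coefficients, or with the hand-drawn line, the sum is generally not zero, so the property applies only to the exact least-squares line.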
Residuals Part Two Cont
A) Unstructured scatter of residuals indicates that the linear model is a good fit
B) Curved pattern of residuals indicates that the linear model may not be a good fit
C) Increasing (or decreasing) spread of the residuals indicates that the linear model is not a good fit (accuracy!)
Activity - Revisited
A group of researchers is searching for alternative
methods to measure body fat percentage. They next
check for an association between a person’s waist size
and body fat %. The results are below:
W, waist (in):   32  36  38  33  39  40  41  35  38  33
Y, Body Fat %:   16  21  25   6  22  30  32  21  25  19

W, waist (in):   40  36  32  44  33  41  34  34  44
Y, Body Fat %:   15  22   9  38  14  27  12  10  30
Plot the data and describe it using “DFSOC”
Interpreting our Scatterplot
Direction: positive association
Form: linear
Strength of the form: moderately strong
Outliers: maybe (40, 15) and (33, 6)
Clusters: none
[Scatterplot of the data: BF% from 5 to 40 against Waist (in) from 25 to 50]
Activity -- Revisited Questions
Use the LinReg feature of the calculator to determine the
regression line
y = 1.844 W – 47.499
Determine the correlation coefficient
r = 0.8415
Which is a more reliable predictor of body fat %, waist
size or weight?
Waist has a larger |r| value so we would conclude
that waist is better
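The comparison between the two predictors can be sketched in Python by fitting both data sets with the same least-squares formulas. The function name `least_squares` and the variable names are illustrative assumptions.

```python
body_fat = [16, 21, 25, 6, 22, 30, 32, 21, 25, 19,
            15, 22, 9, 38, 14, 27, 12, 10, 30]
weights = [175, 181, 200, 159, 195, 192, 205, 173, 187, 188,
           240, 175, 168, 246, 160, 215, 155, 146, 219]   # pounds
waists = [32, 36, 38, 33, 39, 40, 41, 35, 38, 33,
          40, 36, 32, 44, 33, 41, 34, 34, 44]             # inches


def least_squares(xs, ys):
    """Return (slope, intercept, r) of the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / (sxx * syy) ** 0.5


a_waist, b_waist, r_waist = least_squares(waists, body_fat)
_, _, r_weight = least_squares(weights, body_fat)

print(f"waist:  y = {a_waist:.3f} W + ({b_waist:.3f}), r = {r_waist:.4f}")
print(f"weight: r = {r_weight:.4f}")
print("stronger linear predictor:",
      "waist" if abs(r_waist) > abs(r_weight) else "weight")
```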
Cause-and-Effect Relationships
• A strong correlation between two variables does not
mean that a cause-and-effect relationship exists
• For example, there is a strong correlation between
the number of drownings in a month and the number
of cases of Rocky Mountain spotted fever
• Both are tied to the seasonal warming of summer
and have no direct effect on each other
• Cause and effect can only be determined by a
well-designed experiment and never by observation
Summary and Homework
• Summary
– Scatterplots are graphs of individual data points and are useful
for seeing relationships visually
– An outlier is a data point far outside the general pattern of points
in a scatterplot
– The line of best fit is the line that lies in the middle of the linear
pattern of the data points
– The correlation coefficient, r, measures how strong the linear
relationship between the variables is
– A residual is the vertical distance between a data point and
the predicted point on the best-fit line (actual – predicted)
– The regression line is considered the best-fit line for paired data
– Least-squares regression minimizes the sum of the squares of
the residuals
• Homework
– Pg 244-48; 1-5