Activity 2-7: Body Fat Percentage

5-Minute Check on Activity 2-6
1. Match the equation of the line with its name.
Names: Point–Slope Form, Slope–Intercept Form, Standard Form
Equations: Ax + By = C, y = mx + b, y – y1 = m(x – x1)
Answers:
– Point–Slope Form: y – y1 = m(x – x1)
– Slope–Intercept Form: y = mx + b
– Standard Form: Ax + By = C
2. Find the point-slope form of a line containing (2, 5) and (4, 9).
Slope = m = Δy/Δx = (y2 – y1)/(x2 – x1) = (9 – 5)/(4 – 2) = 4/2 = 2
y – 5 = 2(x – 2)
3. Find the slope-intercept form of the line in number 2 above. Either point gives the same result.
Using (2, 5): y – 5 = 2(x – 2), so y – 5 = 2x – 4 (distribute slope), so y = 2x + 1 (add/sub y-point)
Using (4, 9): y – 9 = 2(x – 4), so y – 9 = 2x – 8 (distribute slope), so y = 2x + 1 (add/sub y-point)

Objectives
• Construct scatterplots from sets of data pairs
• Recognize when patterns of points in a scatterplot have a linear form
• Recognize when the pattern in the scatterplot shows that the two variables are positively or negatively related
• Identify individual data points, called outliers, that fall outside the general pattern of the other data
• Estimate and draw a line of best fit through a set of points in a scatterplot

Objectives cont
• Determine residuals between the actual value and the predicted value for each point in the data set
• Use a graphing calculator to determine a line of best fit by the least-squares method
• Measure the strength of the correlation (association) by a correlation coefficient
• Recognize that a strong correlation does not necessarily imply a linear or cause-and-effect relationship

Vocabulary
• Scatterplot – a graph of individual (x, y) points
• Outlier – a data point outside the general pattern of points in the scatterplot
• Residuals – a statistical term for the error: actual value – predicted value
• Least Squares Regression Line – a line that minimizes the sum of the squares of all the residuals
• Linear Correlation Coefficient – r, measures how strongly two variables follow a linear pattern
• Lurking Variable – better called an extraneous variable; one that is not measured or accounted for in the experiment
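The two-point calculation from the 5-minute check can be sketched in a few lines of Python. This is only an illustration, not part of the original activity: the helper name line_through is made up here, and it assumes integer coordinates so that exact fractions can be used.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through two points.

    Hypothetical helper for illustration; assumes integer coordinates.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = Fraction(y2 - y1, x2 - x1)   # slope = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1                  # solve y1 = m*x1 + b for b
    return m, b

# The points from question 2 of the 5-minute check
m, b = line_through((2, 5), (4, 9))
print(f"point-slope:     y - 5 = {m}(x - 2)")
print(f"slope-intercept: y = {m}x + {b}")
```

Using exact fractions avoids rounding, so a slope such as 9/25 stays exact until it is converted to a decimal.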

Activity - Background
Your body fat percentage is simply the percentage of fat your body contains. If you weigh 150 pounds and have 10% body fat, your body consists of 15 pounds of fat and 135 pounds of lean body mass (bone, muscle, organs, tissue, blood, etc.). A certain amount of fat is essential to bodily functions. Fat regulates body temperature, cushions and insulates organs and tissues, and is the main form of the body’s energy reserve. The American Council on Exercise has established the following categories for males and females based on body fat %.

Classification | Female (% fat) | Male (% fat)
Essential Fat | 10 – 12 % | 2 – 4 %
Athletes | 14 – 20 % | 6 – 13 %
Fitness | 21 – 24 % | 14 – 17 %
Acceptable | 25 – 31 % | 18 – 25 %
Obese | ≥ 32 % | ≥ 26 %

Activity
A group of researchers is searching for alternative methods to measure body fat percentage. They first investigate whether there is an association between body fat % and a person’s weight. The body fat percentage of 19 male subjects is accurately determined using the hydrostatic weighing method. Then each subject is weighed using a traditional scale. The results are below:

W, weight (lbs): 175 181 200 159 195 192 205 173 187 188 240 175 168 246 160 215 155 146 219
Y, body fat %: 16 21 25 6 22 30 32 21 25 19 15 22 9 38 14 27 12 10 30

Drawing Scatter Plots by Hand
• Plot the explanatory variable on the x-axis. If there is no explanatory-response distinction, either variable can go on the horizontal axis.
• Label both axes.
• Scale both axes (but not necessarily the same scale on both axes). Intervals must be uniform.
• Make your plot large enough so that the details can be seen easily.
• If you have a grid, adopt a scale so that your plot uses the entire grid.

Activity cont
Plot the weight and body fat % data points as ordered pairs of the form (w, y).
[Scatterplot grid: body fat % (y) from 0 to 40 on the vertical axis, weight W in lbs from 100 to 250 on the horizontal axis]

Activity Questions
Does there appear to be a linear relationship? Yes, except for one point.
What is the general trend of the graph? Positive slope.
Identify any outliers (points that fall way outside the general trend or pattern of the data). (240, 15)

Activity Questions cont
Use a straight edge to draw a line connecting the points (175, 16) and (200, 25). Use this line to represent the trend.
Determine the slope of this line:
m = (25 – 16)/(200 – 175) = 9/25 = 0.36
Determine the equation of the line:
y – 25 = 0.36(x – 200) (point-slope form)
y = 0.36x – 47 (slope-intercept form)
Predict the body fat % of a 192 pound male:
y = 0.36x – 47
y = 0.36(192) – 47 = 69.12 – 47 = 22.12

TI-83 Instructions for Scatter Plots
• Enter explanatory variable in L1
• Enter response variable in L2
• Press 2nd Y= for StatPlot, select 1: Plot1
• Turn Plot1 on by highlighting ON and pressing ENTER
• Highlight the scatter plot icon and press ENTER
• Press ZOOM and select 9: ZoomStat

Interpreting Scatterplots
Scatterplots should be described by
– Direction: positive association (positive slope left to right) or negative association (negative slope left to right)
– Form: linear (straight line) or curved (quadratic, cubic, exponential, etc.)
– Strength of the form: weak, moderate (moderately weak or moderately strong), or strong
– Outliers (any points not conforming to the form)
– Clusters (any sub-groups not conforming to the form)

Example 1
[Figure: five scatterplots of Response versus Explanatory, titled Strong Negative Linear Association, Strong Positive Linear Association, No Relation, Strong Negative Quadratic Association, and Weak Negative Linear Association]

Interpreting our Scatterplot
[Scatterplot: body fat % (y) versus weight W in lbs]
– Direction: positive association
– Form: linear
– Strength of the form: relatively strong
– Outliers: (240, 15)
– Clusters: none

Residuals Part One
• Positive residuals mean that the observed point (actual value, y) lies above the line (predicted value, y-hat); the predicted value is smaller.
• Negative residuals mean that the observed point (actual value, y) lies below the line (predicted value, y-hat); the predicted value is larger.
• Order is not optional! The residual is always actual value – predicted value.

Activity - Residuals
Determine the residual from the 192 lb prediction.
Predicted = 22.12, Actual = 30
residual = 30 – 22.12 = 7.88
What does it tell us about the predicted value? The predicted value was below the actual (positive residual).
Determine the residual for a body weight of 168 lb.
Predicted = 0.36(168) – 47 = 13.48, Actual = 9
residual = 9 – 13.48 = –4.48
What does it tell us about the predicted value? The predicted value was above the actual (negative residual).

Activity – Residuals cont
Let’s use our calculator to help figure out all the residuals for our data. Remember how we enter the data:
“x-data” is entered in L1
“y-data” is entered in L2
Model: L3 = 0.36(L1) – 47
Residuals: L4 = L2 – L3
Scatterplot: plot L4 (the residuals) against L1

Line of Best Fit
The established method for finding the line of best fit is called the Least Squares Regression Model. It minimizes the sum of the squares of the residual values. It uses calculus, so it is beyond our course, but our calculator can do all the work for us.
Diagnostics must be turned on (see last page).
Use LinReg(ax+b) L1, L2 (from STAT, CALC).
Write down
a = 0.22357 (the slope)
b = –21.3767 (the y-intercept)
r = 0.7199 (correlation coefficient)

Least Squares Regression Line
[Figure: regression line through a scatterplot, with the residuals drawn as vertical segments]
• The blue line minimizes the sum of the squares of the residuals (dark vertical lines)

Regression Line
Let’s plot the regression line, our first line, and the data (using our scatterplot).
• Assign Y1 = 0.36X – 47 (Original Line)
• Assign Y2 = 0.224X – 21.38 (Regression Line)
• Hit GRAPH
Use the regression line to predict the body fat % for a 225 lb male:
Y2 = (0.224)(225) – 21.38 = 50.4 – 21.38 = 29.02

Important Properties of r
Our r-value was 0.71987, or r ≈ 0.72 (not as strong as we thought).
• Correlation makes no distinction between explanatory and response variables
• r does not change when we change the units of measurement of x, y, or both
• Positive r indicates positive association between the variables and negative r indicates negative association
• The correlation r is always a number between –1 and 1

Example 2
Match the r values to the scatterplots.
[Figure: six scatterplots labeled A through F]
1) r = –0.99: F
2) r = –0.7: E
3) r = –0.3: D
4) r = 0: A
5) r = 0.5: B
6) r = 0.9: C

Residuals Part Two
• The sum of the least-squares residuals is always zero
• Residual plots help assess how well the line describes the data
• A good fit has
– no discernible pattern to the residuals
– residuals that are relatively small in size
• A poor fit violates one of the above
– Discernible patterns: curved (or linear) residual plot
– Increasing / decreasing spread in the residual plot (horn effect)

Residuals Part Two Cont
[Figure: three residual plots labeled A, B, and C]
A) Unstructured scatter of residuals indicates that the linear model is a good fit
B) Curved pattern of residuals indicates that the linear model may not be a good fit
C) Increasing (or decreasing) spread of the residuals indicates that the linear model is not a good fit (accuracy!)

Activity - Revisited
A group of researchers is searching for alternative methods to measure body fat percentage. They then check a person’s waist and body fat %.
The results are below:

W, waist (in): 32 36 38 33 39 40 41 35 38 33 40 36 32 44 33 41 34 34 44
Y, body fat %: 16 21 25 6 22 30 32 21 25 19 15 22 9 38 14 27 12 10 30

Plot the data and describe it using “DFSOC” (Direction, Form, Strength, Outliers, Clusters).

Interpreting our Scatterplot
[Scatterplot: body fat % (BF%, 0 to 40) versus waist in inches (25 to 50)]
– Direction: positive association
– Form: linear
– Strength of the form: moderately strong
– Outliers: maybe (40, 15) and (33, 6)
– Clusters: none

Activity - Revisited Questions
Use the LinReg feature of the calculator to determine the regression line.
y = 1.844W – 47.499
Determine the correlation coefficient.
r = 0.8415
Which is a more reliable predictor of body fat %, waist size or weight?
Waist has the larger |r| value (0.84 versus 0.72), so we would conclude that waist is the better predictor.

Cause-and-Effect Relationships
• A strong correlation between two variables does not mean that a cause-and-effect relationship exists
• For example, there is a strong correlation between the number of drownings in a month and the number of cases of Rocky Mountain spotted fever
• Both are tied to the seasonal warming of summer and have no direct effect on each other
• Cause and effect can only be determined by a well-designed experiment and never by observation

Summary and Homework
• Summary
– Scatterplots are graphs of individual data points and are useful for seeing relationships visually
– An outlier is a data point far outside the general pattern of points in a scatterplot
– The line of best fit is the line that lies in the middle of the linear pattern of the data points
– The correlation coefficient, r, measures how strong the linear relationship between the variables is
– Residuals are the vertical distances between the data points and the predicted points on the best-fit line
– The regression line is considered the best-fit line for paired data
– Least-squares regression minimizes the sum of the squares of the residuals
• Homework
– Pg 244-48; 1-5
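
For readers without a TI-83, the LinReg(ax+b) results in this activity can be cross-checked with a short Python sketch. It is not the calculator's own algorithm, only the textbook least-squares formulas (slope = Sxy/Sxx, intercept = ȳ – slope·x̄, r = Sxy/√(Sxx·Syy)) applied to the data tables from the slides. The function names least_squares and residuals are illustrative choices, not something the activity defines.

```python
import math

# Data copied from the activity tables (19 male subjects)
weight = [175, 181, 200, 159, 195, 192, 205, 173, 187, 188,
          240, 175, 168, 246, 160, 215, 155, 146, 219]
waist = [32, 36, 38, 33, 39, 40, 41, 35, 38, 33,
         40, 36, 32, 44, 33, 41, 34, 34, 44]
body_fat = [16, 21, 25, 6, 22, 30, 32, 21, 25, 19,
            15, 22, 9, 38, 14, 27, 12, 10, 30]

def least_squares(xs, ys):
    """Return slope a, intercept b, and correlation r for y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx                    # slope
    b = y_bar - a * x_bar            # y-intercept
    r = sxy / math.sqrt(sxx * syy)   # linear correlation coefficient
    return a, b, r

def residuals(xs, ys, a, b):
    """Residual = actual value - predicted value, in that order."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]

for name, xs in (("weight", weight), ("waist", waist)):
    a, b, r = least_squares(xs, body_fat)
    print(f"{name}: y = {a:.3f}x + ({b:.3f}), r = {r:.4f}")
```

Running it should give values close to the ones written down from the calculator: a slope near 0.2236 with r near 0.72 for weight, and a slope near 1.844 with r near 0.84 for waist. Passing the fitted a and b to residuals gives the list that the calculator stored in L4, and its sum should be zero up to rounding.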