Download Simple Linear Regression Deterministic Model

Simple Linear Regression 14 Slope-intercept equation for a line y  mx  b y 8 10 12 Regression equation—an equation that describes the average relationship between a response (dependent) and an explanatory (independent) variable. (4,6) 6 slope  y  6  3  3 (2,3) 4 y 3   1.5 x 2 2 x  4  2  2 2 4 6 8 10 x 1 120 Deterministic Model 2 96 A model that defines an exact relationship between variables. Fahrenheit 48 72 Example: y  1.5 x 9 F  C  32 5 0 24 There is no allowance for error in the prediction of y for a given x. -10 0 10 20 30 40 50 Celsius 3 4 1 Probabilistic Model y  1.5 x   15 A model that accounts for random error. 10 Includes both a deterministic component and a random error component. 5 y y  1.5 x  random error 0 This model hypothesizes a probabilistic relationship between y and x. 0 2 4 6 8 10 x 5 Probabilistic Model—General Form 6 First-Order (Straight Line) Probabilistic Model y   0  1 x   y = Deterministic component + Random component where y = Dependent variable x = Independent variable  0 = population y-intercept of the line—the point at which the line intersects or cuts through the y-axis 1 = population slope of the line—the amount of increase (or decrease) in the deterministic component of y for every 1-unit increase (or decrease) in x.  = random error component where y is the “variable of interest”. Assume that the mean value of the random error is zero  the mean value of y, E(y), equals the deterministic component of the model 7 8 2 First-Order (Straight Line) Probabilistic Model Model Development  0 and 1 are population parameters. They will only  0 and 1 will generally be unknown. The process of developing a model, estimating model parameters, and using the model can be summarized in these 5-steps: 1. Hypothesize the deterministic component of the model that relates the mean, E(y) to the independent variable x E  y    0  1 x be known if the population of all (x, y) measurements are available.  0 and 1, along with a specific value of the independent variable x determine the mean value of the dependent variable y. 2. Use sample data to estimate unknown model parameters find estimates: ˆ or b , ˆ or b 0 0 1 1 9 Model Development (continued) 10 Example: Reaction time versus drug percentage 3. Specify the probability distribution of the random error term and estimate the SD of this distribution  ~ N  0,    will revisit this later 4. Statistically evaluate the usefulness of the model 5. Use model for prediction, estimation or other purposes 11 Subject Amount of Drug (%) x Reaction Time (seconds) y 1 1 1 2 2 1 3 4 3 4 2 2 5 5 4 12 3 -0.2 Reaction Time (sec.) 0.7 1.5 2.3 3.2 4.0 Example: Reaction time versus drug percentage -1.0 -1.0 -0.2 Reaction Time (sec.) 0.7 1.5 2.3 3.2 4.0 Example: Reaction time versus drug percentage 0 1 2 3 4 5 0 1 Drug (%) 2 3 4 5 Drug (%) 13 14 Example: Reaction time versus drug percentage Least Squares Line Errors of prediction---vertical differences between the observed and the predicted values of y Also called regression line, or the least squares prediction equation x y y  1  x  y  y   y  y  1 1 0 (1-0) = 1 1 2 1 1 (1-1) = 0 0 3 2 2 (2-2) = 0 0 4 2 3 (2-3) = -1 1 5 4 4 (4-4) = 0 0 2 Method to find this line is called the method of least squares For our example, we have a sample of n = 5 pairs of (x, y) values. The fitted line that we will calculate is written as ŷ  b0  b1 x ŷ is an estimator of the mean value of y, E  y  ; Sum of Sum of squared errors = 0 errors (SSE) = 2 b0 and b1 are estimators of  0 and 1 15 16 4 Least Squares Line (continued) Define the sum of squares of the deviations of the y values about their predicted values for all n data points as: n n SSE    yi  ˆy     yi   b0  b1 xi   2 i 1 Formulas for the Least Squares Estimates SD y SS Slope: b1  xy or b1  r SS xx SD x sxx = SSxx    xi  x  sxy = SS xy    xi  x  yi  y  2   xi yi  i 1 We want to find b0 and b1 to make the SSE a minimum---termed least squares estimates   x   y  i i n y-intercept: b0  y  b1 x  ŷ  b0  b1 x is called the least squares line   xi2  y i n  b1 2  x  n x i n n = sample size 17 LS Calculations for Drug/Reaction Example 1 2 1 4 2 3 2 9 6 4 2 16 8 5 4 25 20 x i  15 SSxy  37  y i  10 x 2 i  55 1510  37  30  7 5 x y i i 4.0 xi yi 1 b1  7  0.7 10 10 15   .7  5 5  2   .7  3 b0   2  2 .1   . 1  37 SS xx  55  15 5 2 ŷ  .1  .7 x Reaction Time (sec.) 0.7 1.5 2.3 3.2 x 1 LS Line for Drug/Reaction Example -0.2 yi 1 18  55  45  10 -1.0 xi 2 i 2 i 0 19 1 2 3 Drug (%) 4 5 20 5 LS Calculations for Drug/Reaction Example Least Squares Line—Interpretation of ŷ  .1  .7 x  y  ˆy  Estimated intercept is negative that the estimated mean reaction time is equal to -0.1 seconds when the amount of drug is 0%. x y ŷ  .1  .7 x  y  ˆy  1 1 .6 (1-.6) = .4 .16 2 1 1.3 (1-1.3) = -.3 .09 3 2 2.0 (2-2.0) = 0 .00 4 2 2.7 (2-2.7) = -.7 .49 5 4 3.4 (4-3.4) = .6 Sum of errors = 0 2 What does this mean since negative reaction times are not possible? .36 Sum of squared errors (SSE) = 1.10 Model parameters should be interpreted only within the sampled range of the independent variable. The LS line has a sum of errors = 0, but SSE = 1.1 < 2.0 for visual model 21 22 Least Squares Line—Interpretation of ŷ  .1  .7 x Coefficient of Determination The slope of 0.7 implies that for every unit increase of x, the mean value of y is estimated to increase by 0.7 units. A measure of the contribution of x in predicting y 4.0 Assuming that x provides no information for the prediction of y, the best prediction for the value of y is y SS yy    yi  y  2 -1.0 -0.2 For every 1% increase in the amount of drug in the bloodstream, the mean reaction time is estimated to increase by 0.7 seconds over the sampled range of drug amounts from 1% to 5%. ˆy  y Reaction Time (sec.) 0.7 1.5 2.3 3.2 In the context of the problem: 0 23 1 2 3 Drug (%) 4 5 24 6 4.0 Coefficient of Determination (continued) Reaction Time (sec.) 0.7 1.5 2.3 3.2 SSE    yi  ˆyi  Coefficient of Determination (continued) SS yy    yi  y  2 2 --total sample variation around mean i SSE    yi  ˆyi  --unexplained sample variability after 2 fitting SS yy  SSE --explained sample variability attributable to linear relationship -0.2 SS yy  SSE explained   proportion of total sample total SSyy -1.0 variability explained by the linear relationship 0 1 2 3 4 5 Drug (%) 25 26 Coefficient of Determination (continued) r2  SS yy  SSE SSE  1 SS yy SS yy } Unexplained variability In simple linear regression r 2 is computed as the square of the correlation coefficient, r 0  r2  1 Interpretation— r 2  .75 means that the sum of squared deviations of the y values about their predicted values has been reduced by 75% by the use ŷ, instead of y , to predict y of the least squares equation. 27 7

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Simple Linear Regression Deterministic Model