Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Simple Linear Regression 14 Slope-intercept equation for a line y mx b y 8 10 12 Regression equation—an equation that describes the average relationship between a response (dependent) and an explanatory (independent) variable. (4,6) 6 slope y 6 3 3 (2,3) 4 y 3 1.5 x 2 2 x 4 2 2 2 4 6 8 10 x 1 120 Deterministic Model 2 96 A model that defines an exact relationship between variables. Fahrenheit 48 72 Example: y 1.5 x 9 F C 32 5 0 24 There is no allowance for error in the prediction of y for a given x. -10 0 10 20 30 40 50 Celsius 3 4 1 Probabilistic Model y 1.5 x 15 A model that accounts for random error. 10 Includes both a deterministic component and a random error component. 5 y y 1.5 x random error 0 This model hypothesizes a probabilistic relationship between y and x. 0 2 4 6 8 10 x 5 Probabilistic Model—General Form 6 First-Order (Straight Line) Probabilistic Model y 0 1 x y = Deterministic component + Random component where y = Dependent variable x = Independent variable 0 = population y-intercept of the line—the point at which the line intersects or cuts through the y-axis 1 = population slope of the line—the amount of increase (or decrease) in the deterministic component of y for every 1-unit increase (or decrease) in x. = random error component where y is the “variable of interest”. Assume that the mean value of the random error is zero the mean value of y, E(y), equals the deterministic component of the model 7 8 2 First-Order (Straight Line) Probabilistic Model Model Development 0 and 1 are population parameters. They will only 0 and 1 will generally be unknown. The process of developing a model, estimating model parameters, and using the model can be summarized in these 5-steps: 1. Hypothesize the deterministic component of the model that relates the mean, E(y) to the independent variable x E y 0 1 x be known if the population of all (x, y) measurements are available. 0 and 1, along with a specific value of the independent variable x determine the mean value of the dependent variable y. 2. Use sample data to estimate unknown model parameters find estimates: ˆ or b , ˆ or b 0 0 1 1 9 Model Development (continued) 10 Example: Reaction time versus drug percentage 3. Specify the probability distribution of the random error term and estimate the SD of this distribution ~ N 0, will revisit this later 4. Statistically evaluate the usefulness of the model 5. Use model for prediction, estimation or other purposes 11 Subject Amount of Drug (%) x Reaction Time (seconds) y 1 1 1 2 2 1 3 4 3 4 2 2 5 5 4 12 3 -0.2 Reaction Time (sec.) 0.7 1.5 2.3 3.2 4.0 Example: Reaction time versus drug percentage -1.0 -1.0 -0.2 Reaction Time (sec.) 0.7 1.5 2.3 3.2 4.0 Example: Reaction time versus drug percentage 0 1 2 3 4 5 0 1 Drug (%) 2 3 4 5 Drug (%) 13 14 Example: Reaction time versus drug percentage Least Squares Line Errors of prediction---vertical differences between the observed and the predicted values of y Also called regression line, or the least squares prediction equation x y y 1 x y y y y 1 1 0 (1-0) = 1 1 2 1 1 (1-1) = 0 0 3 2 2 (2-2) = 0 0 4 2 3 (2-3) = -1 1 5 4 4 (4-4) = 0 0 2 Method to find this line is called the method of least squares For our example, we have a sample of n = 5 pairs of (x, y) values. The fitted line that we will calculate is written as ŷ b0 b1 x ŷ is an estimator of the mean value of y, E y ; Sum of Sum of squared errors = 0 errors (SSE) = 2 b0 and b1 are estimators of 0 and 1 15 16 4 Least Squares Line (continued) Define the sum of squares of the deviations of the y values about their predicted values for all n data points as: n n SSE yi ˆy yi b0 b1 xi 2 i 1 Formulas for the Least Squares Estimates SD y SS Slope: b1 xy or b1 r SS xx SD x sxx = SSxx xi x sxy = SS xy xi x yi y 2 xi yi i 1 We want to find b0 and b1 to make the SSE a minimum---termed least squares estimates x y i i n y-intercept: b0 y b1 x ŷ b0 b1 x is called the least squares line xi2 y i n b1 2 x n x i n n = sample size 17 LS Calculations for Drug/Reaction Example 1 2 1 4 2 3 2 9 6 4 2 16 8 5 4 25 20 x i 15 SSxy 37 y i 10 x 2 i 55 1510 37 30 7 5 x y i i 4.0 xi yi 1 b1 7 0.7 10 10 15 .7 5 5 2 .7 3 b0 2 2 .1 . 1 37 SS xx 55 15 5 2 ŷ .1 .7 x Reaction Time (sec.) 0.7 1.5 2.3 3.2 x 1 LS Line for Drug/Reaction Example -0.2 yi 1 18 55 45 10 -1.0 xi 2 i 2 i 0 19 1 2 3 Drug (%) 4 5 20 5 LS Calculations for Drug/Reaction Example Least Squares Line—Interpretation of ŷ .1 .7 x y ˆy Estimated intercept is negative that the estimated mean reaction time is equal to -0.1 seconds when the amount of drug is 0%. x y ŷ .1 .7 x y ˆy 1 1 .6 (1-.6) = .4 .16 2 1 1.3 (1-1.3) = -.3 .09 3 2 2.0 (2-2.0) = 0 .00 4 2 2.7 (2-2.7) = -.7 .49 5 4 3.4 (4-3.4) = .6 Sum of errors = 0 2 What does this mean since negative reaction times are not possible? .36 Sum of squared errors (SSE) = 1.10 Model parameters should be interpreted only within the sampled range of the independent variable. The LS line has a sum of errors = 0, but SSE = 1.1 < 2.0 for visual model 21 22 Least Squares Line—Interpretation of ŷ .1 .7 x Coefficient of Determination The slope of 0.7 implies that for every unit increase of x, the mean value of y is estimated to increase by 0.7 units. A measure of the contribution of x in predicting y 4.0 Assuming that x provides no information for the prediction of y, the best prediction for the value of y is y SS yy yi y 2 -1.0 -0.2 For every 1% increase in the amount of drug in the bloodstream, the mean reaction time is estimated to increase by 0.7 seconds over the sampled range of drug amounts from 1% to 5%. ˆy y Reaction Time (sec.) 0.7 1.5 2.3 3.2 In the context of the problem: 0 23 1 2 3 Drug (%) 4 5 24 6 4.0 Coefficient of Determination (continued) Reaction Time (sec.) 0.7 1.5 2.3 3.2 SSE yi ˆyi Coefficient of Determination (continued) SS yy yi y 2 2 --total sample variation around mean i SSE yi ˆyi --unexplained sample variability after 2 fitting SS yy SSE --explained sample variability attributable to linear relationship -0.2 SS yy SSE explained proportion of total sample total SSyy -1.0 variability explained by the linear relationship 0 1 2 3 4 5 Drug (%) 25 26 Coefficient of Determination (continued) r2 SS yy SSE SSE 1 SS yy SS yy } Unexplained variability In simple linear regression r 2 is computed as the square of the correlation coefficient, r 0 r2 1 Interpretation— r 2 .75 means that the sum of squared deviations of the y values about their predicted values has been reduced by 75% by the use ŷ, instead of y , to predict y of the least squares equation. 27 7