Stats ch03.s03-3
Chapter 3: Summarizing Descriptive Relationships

Scatter Plots

We can prepare a scatter plot by placing one point for each observation in the data set, using that observation's pair of values for the two variables as the point's coordinates. The scatter plot provides a picture of the data, including the following:

1. the range of each variable;
2. the pattern of values over the range;
3. a suggestion as to a possible relationship between the two variables;
4. an indication of outliers (extreme points).

Scatter Plots (Excel Example)

[Figure: Rent vs. Apartment Size (Sq. Ft); x-axis: Size, y-axis: Rent]

Covariance

The covariance is a measure of the linear relationship between two variables. A positive value indicates a direct, or increasing, linear relationship, and a negative value indicates a decreasing linear relationship. The sample covariance is defined by the equation

$$\operatorname{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$$

where $x_i$ and $y_i$ are the observed values, $\bar{X}$ and $\bar{Y}$ are the sample means, and $n$ is the sample size.

[Figure: Scatter Plots of Idealized Positive and Negative Covariance. Figure 3.5a: Positive Covariance; Figure 3.5b: Negative Covariance; axes x and y]

Correlation Coefficient

The sample correlation coefficient, $r_{xy}$, is computed by the equation

$$r_{xy} = \frac{\operatorname{Cov}(x, y)}{s_x s_y}$$

where $s_x$ and $s_y$ are the sample standard deviations of x and y.

The correlation ranges from $-1$ to $+1$, with:

• $r_{xy} = +1$ indicating a perfect positive linear relationship: the X and Y points would plot an increasing straight line.
• $r_{xy} = 0$ indicating no linear relationship between X and Y.
• $r_{xy} = -1$ indicating a perfect negative linear relationship: the X and Y points would plot a decreasing straight line.

Positive correlations indicate positive, or increasing, linear relationships. Values closer to $+1$ indicate data points closer to a straight line, and values closer to 0 indicate greater deviations from a straight line.
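The covariance and correlation formulas above translate directly into code. The following is a minimal Python sketch using only the standard library; the function names and the five-point data set are hypothetical illustrations, not data from the chapter.

```python
import math


def sample_covariance(x, y):
    """Sample covariance: s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_std(v):
    """Sample standard deviation (n - 1 divisor); Cov(v, v) is the sample variance."""
    return math.sqrt(sample_covariance(v, v))


def correlation(x, y):
    """Sample correlation r_xy = Cov(x, y) / (s_x * s_y).

    Undefined (raises ZeroDivisionError) if either variable is constant.
    """
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


# Hypothetical toy data (not from the chapter).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(sample_covariance(x, y))  # 1.5: positive, so an increasing linear relationship
print(correlation(x, y))        # approximately 0.7746
```

Points lying exactly on an increasing line, such as y = 2x + 1, give $r_{xy} = +1$ up to floating-point rounding, and points on a decreasing line give $-1$, matching the interpretation of the correlation's endpoints.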
Negative correlations indicate decreasing linear relationships. Values closer to $-1$ indicate points closer to a straight line, and values closer to 0 indicate greater deviations from a straight line.

Scatter Plots and Correlation

[Figure: three scatter plots of Y against X: (a) r = .8, (b) r = -.8, (c) r = 0]

Linear Relationships

Linear relationships can be represented by the basic equation

$$Y = \beta_0 + \beta_1 X$$

where Y is the dependent, or endogenous, variable, which is a function of X, the independent, or exogenous, variable. The model contains two parameters, $\beta_0$ and $\beta_1$, which are defined as model coefficients. The coefficient $\beta_0$ is the intercept on the Y-axis, and the coefficient $\beta_1$ is the change in Y for every unit change in X.

The nominal assumption made in linear applications is that different values of X can be set and that, for each one, the underlying linear process being studied produces a corresponding mean value of Y. The linear equation model computes the mean of Y for every value of X. This idea is the basis for many economic and business procedures, including demand functions, production functions, consumption functions, sales forecasts, and many other application areas.

[Figure: Linear Function and Data Points, showing the fitted line $\hat{y} = b_0 + b_1 x_1$, an observed data point $(x_{1i}, y_i)$, and its residual $e_i$; axes $x_1$ and y]

Least Squares Regression

Least squares regression is a technique used to obtain estimates (i.e., numerical values) for the linear coefficients $\beta_0$ and $\beta_1$. These estimates are usually denoted $b_0$ and $b_1$, respectively. They are the values that minimize the sum of squared residuals $\sum_{i=1}^{n} e_i^2$, where $e_i = y_i - \hat{y}_i$ is the vertical distance between observation i and the fitted line.
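For a single X variable, the least squares estimates have a standard closed form: $b_1 = s_{xy} / s_x^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$. The Python sketch below assumes that closed form; the function names and the toy data are hypothetical illustrations, not the chapter's data.

```python
def least_squares(x, y):
    """Fit y_hat = b0 + b1 * x by ordinary least squares (one predictor).

    Closed-form solution:
        b1 = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar) ** 2)
           = s_xy / s_x**2   (the n - 1 divisors cancel)
        b0 = y_bar - b1 * x_bar
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """SSE = sum of e_i**2, where e_i = y_i - (b0 + b1 * x_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (not from the chapter).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = least_squares(x, y)
print(b0, b1)                               # approximately 2.2 and 0.6
print(sum_squared_residuals(x, y, b0, b1))  # approximately 2.4

# Moving a coefficient away from the least squares values increases SSE.
print(sum_squared_residuals(x, y, b0 + 0.1, b1) > sum_squared_residuals(x, y, b0, b1))
```

For these toy data the residuals sum to zero, and nudging either coefficient away from $(b_0, b_1)$ only raises the sum of squared residuals, which is the least squares criterion at work.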
Excel Regression Output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.560264658
R Square            0.313896487
Adjusted R Square   0.303341048
Standard Error      0.330316589
Observations        67

ANOVA
              df    SS             MS             F              Significance F
Regression     1    3.244673026    3.244673026    29.73789125    8.22E-07
Residual      65    7.092088168    0.109109049
Total         66    10.33676119

              Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept     1.6384          0.2679            6.1147    0          1.1033       2.1735
Satverb       0.0274          0.005             5.4532    0          0.0174       0.0375

The y-intercept is $b_0 = 1.6384$ and the slope is $b_1 = 0.0274$: each one-unit increase in Satverb corresponds to an estimated increase of 0.0274 in the mean of Y.

Cross Tables

Cross tables present the number of observations defined by the joint occurrence of specific intervals for two variables. The combination of all possible intervals for the two variables defines the cells in a table.

Cross Table of Household Demand for Products by Residence

Area     Lumber    Paint    Tools    None    Total
East       100       50       50       50      250
North       50       95       45       60      250
West        65       70       75       40      250
Total      215      215      170      150      750

Key Words
Least Squares Estimation Procedure
Least Squares Regression
Sample Correlation Coefficient
Sample Covariance
Scatter Plot
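Returning to the Excel regression output earlier in this section: several of its summary statistics are simple functions of the ANOVA sums of squares. The sketch below is a hypothetical helper (not part of Excel) that recomputes them from SSR = 3.244673026, SSE = 7.092088168, and n = 67 as printed in that output, assuming one predictor (k = 1).

```python
import math


def regression_summary(ss_regression, ss_residual, n, k=1):
    """Derive Excel-style summary statistics from ANOVA sums of squares.

    ss_regression: regression (explained) sum of squares, SSR
    ss_residual:   residual (unexplained) sum of squares, SSE
    n: number of observations; k: number of predictors (hypothetical default 1)
    """
    ss_total = ss_regression + ss_residual          # SST = SSR + SSE
    df_regression, df_residual = k, n - k - 1       # df_total = n - 1
    ms_regression = ss_regression / df_regression   # MSR
    ms_residual = ss_residual / df_residual         # MSE
    r_square = ss_regression / ss_total             # share of SST explained
    return {
        "Multiple R": math.sqrt(r_square),
        "R Square": r_square,
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / df_residual,
        "Standard Error": math.sqrt(ms_residual),   # standard error of the estimate
        "MS Residual": ms_residual,
        "F": ms_regression / ms_residual,
    }


# SS values and n copied from the Excel output.
stats = regression_summary(3.244673026, 7.092088168, n=67)
for name, value in stats.items():
    print(f"{name:18s} {value:.6f}")
```

The recomputed values agree with the Excel output to about six significant digits. With a single predictor, Multiple R equals the absolute value of the correlation $r_{xy}$, and the slope's t Stat squared ($5.4532^2 \approx 29.74$) matches F.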