Session 9

The Coefficient of Correlation, r
The coefficient of correlation (r) is a measure of the strength of the relationship between two variables.
- It shows the direction and strength of the linear relationship between two interval or ratio-scale variables.
- It can range from -1.00 to +1.00.
- Values of -1.00 or +1.00 indicate perfect correlation.
- Values close to 0.0 indicate weak correlation.
- Negative values indicate an inverse relationship and positive values indicate a direct relationship.

Correlation Coefficient - Interpretation
[Figure: Minitab scatter plots showing strong positive correlation, no correlation, and weak negative correlation]

Coefficient of Determination
The coefficient of determination ($r^2$) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).
- It is the square of the coefficient of correlation.
- It ranges from 0 to 1.
- It does not give any information on the direction of the relationship between the variables.

Testing the Significance of the Correlation Coefficient
$H_0: \rho = 0$ (the correlation in the population is 0)
$H_1: \rho \neq 0$ (the correlation in the population is not 0)
Reject $H_0$ if $t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$, where the test statistic is $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n-2$ degrees of freedom.

Testing the Significance of the Correlation Coefficient - Copier Sales Example
The hypotheses are the same as above. With $\alpha = 0.05$ and $n = 10$ observations ($n - 2 = 8$ degrees of freedom), the decision rule becomes:
Reject $H_0$ if $t > t_{0.025,\,8}$ or $t < -t_{0.025,\,8}$, that is, if $t > 2.306$ or $t < -2.306$.

Stationarity
A strictly stationary process is one where the distribution of its values remains the same as time proceeds, implying that the probability that a value lies in a particular interval is the same now as at any point in the past or the future. In practice, however, we tend to use the criteria for a 'weakly stationary process' to determine whether a series is stationary.
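Before turning to weak stationarity, the significance test for r described earlier in this session can be sketched in Python. This is only an illustration under stated assumptions: the data are hypothetical (the copier sales figures are not listed in these notes), the function names `pearson_r` and `t_statistic` are invented for this sketch, and the statistic used is the standard t = r√(n−2)/√(1−r²). The critical value 2.306 is the t(0.025, 8) value quoted in the example for n = 10.

```python
import math


def pearson_r(x, y):
    """Sample coefficient of correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical data with n = 10 (NOT the copier sales data).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r = pearson_r(x, y)
r_squared = r ** 2            # coefficient of determination
t = t_statistic(r, len(x))

# Two-tailed critical value for alpha = 0.05 and 8 degrees of freedom,
# taken from the example in the notes.
t_critical = 2.306
reject = abs(t) > t_critical

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, t = {t:.2f}, reject H0: {reject}")
```

For a different sample size the critical value has to be looked up again, since it depends on n − 2 degrees of freedom.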
Weakly Stationary Series
A weakly stationary process or series has the following properties:
- constant mean
- constant variance
- constant autocovariance structure
The last property means that the covariance between two observations depends only on how far apart they are, not on where they fall in time. For example, the covariance between $y_{t-1}$ and $y_{t-2}$ is the same as that between $y_{t-5}$ and $y_{t-6}$.

Implications of Non-stationary Data
If the variables in an OLS regression are not stationary, the regression tends to produce a high R-squared statistic and a low Durbin-Watson (DW) statistic, the latter indicating a high level of autocorrelation. This happens because the drift in the variables is often related but not directly accounted for in the regression, which acts as an omitted-variable effect.

Stationary Data
It is important to determine whether the data are stationary before running the regression. This can be done in a number of ways:
- plotting the data
- assessing the autocorrelation function (ACF)
- using a specific test on the significance of the autocorrelation coefficients
- using specific tests to be covered later

Correlogram
The sample correlogram is the plot of the ACF against the lag k. As the ACF lies between -1 and +1, the correlogram also lies between these values. It can be used to assess stationarity: if the ACF falls immediately from 1 to 0 and then stays at about 0 thereafter, the series is stationary.
If the ACF declines gradually from 1 to 0 over a prolonged period of time, the series is not stationary.
[Figure: correlogram of a stationary time series, plotting the ACF against the lag k]

Autocorrelation
Autocorrelation measures the correlation between successive observations over time:

$$r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$

where
- $r_k$ = autocorrelation for a k-period lag
- $Y_t$ = value of the time series at period t
- $Y_{t-k}$ = value of the time series k periods before period t
- $\bar{Y}$ = mean of the time series
- $n$ = number of observations

Interpreting the autocorrelations:
- If the time series is stationary, the value of $r_k$ should diminish rapidly towards zero.
- If there is a trend, $r_k$ will decline towards zero slowly.
- If a seasonal pattern exists, the value of $r_k$ may be significantly different from zero at k = 4 for quarterly data, or k = 12 for monthly data.
A plot of the autocorrelations against the lag k is called an autocorrelation function or a correlogram.

To determine whether the autocorrelation at lag k is significantly different from zero:
$H_0: \rho_k = 0$
$H_1: \rho_k \neq 0$
For any k, reject $H_0$ if $|r_k|$ exceeds the critical value, which in large samples is approximately $z_{\alpha/2}/\sqrt{n}$.
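The autocorrelation formula and the significance check can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are invented for this sketch, both series are made up, and the rejection bound is assumed to be the large-sample value 1.96/√n for a 5% two-tailed test.

```python
import math


def autocorrelation(series, k):
    """Sample autocorrelation r_k of a series at lag k."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((y - mean) ** 2 for y in series)
    # Pairs (Y_t, Y_{t-k}); with 0-based indexing t runs from k to n - 1.
    numerator = sum(
        (series[t] - mean) * (series[t - k] - mean) for t in range(k, n)
    )
    return numerator / denominator


def correlogram(series, max_lag):
    """Autocorrelations r_1 .. r_max_lag, i.e. the values plotted in a correlogram."""
    return [autocorrelation(series, k) for k in range(1, max_lag + 1)]


def significant_lags(series, max_lag, z=1.96):
    """Lags where |r_k| exceeds the assumed large-sample bound z / sqrt(n)."""
    bound = z / math.sqrt(len(series))
    acf = correlogram(series, max_lag)
    return [k for k, r_k in enumerate(acf, start=1) if abs(r_k) > bound]


# Hypothetical trending series: the ACF declines slowly towards zero.
trend = list(range(1, 21))
print([round(r_k, 2) for r_k in correlogram(trend, 5)])
print(significant_lags(trend, 5))

# Hypothetical quarterly pattern repeated for five years: r_4 stands out.
quarterly = [10, 20, 30, 40] * 5
print(round(autocorrelation(quarterly, 4), 2))
```

The trending series shows autocorrelations that fall only gradually as k increases, while the quarterly series shows a large autocorrelation at k = 4, matching the patterns described for trend and seasonality.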