Session 9
The Coefficient of Correlation, r
The Coefficient of Correlation (r) is a measure of the strength
of the relationship between two variables.





It shows the direction and strength of the linear relationship between two interval or ratio-scale variables.
It can range from -1.00 to +1.00.
Values of -1.00 or +1.00 indicate perfect and strong correlation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship and positive values indicate a direct relationship.
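The properties above can be checked numerically. The following is a minimal sketch in plain Python, assuming the usual sample formula for r (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the data are hypothetical illustrations, not part of the session material.

```python
import math

def pearson_r(x, y):
    """Sample coefficient of correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (not from the session)
x = [1, 2, 3, 4, 5]
direct = [2, 4, 6, 8, 10]    # rises exactly in step with x
inverse = [10, 8, 6, 4, 2]   # falls exactly in step with x

print(pearson_r(x, direct))   # perfect direct relationship
print(pearson_r(x, inverse))  # perfect inverse relationship
```

With these made-up series the two results sit at the ends of the range: +1 for the direct relationship and -1 for the inverse one.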
Correlation Coefficient - Interpretation
[Figure: Minitab scatter plots showing strong positive correlation, no correlation, and weak negative correlation]
Coefficient of Determination
The coefficient of determination (r²) is the proportion of
the total variation in the dependent variable (Y) that is
explained or accounted for by the variation in the
independent variable (X). It is the square of the coefficient
of correlation.


It ranges from 0 to 1.
It does not give any information on the direction of
the relationship between the variables.
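As a rough illustration of "proportion of variation explained", the sketch below fits a least-squares line and computes 1 - SSE/SST, then compares it with the square of r. It assumes a simple one-variable regression. All function names and the data are hypothetical, chosen only for illustration.

```python
import math

def deviations_sums(x, y):
    """Return (sxy, sxx, syy): sums of cross-products and squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy

def correlation(x, y):
    sxy, sxx, syy = deviations_sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def explained_proportion(x, y):
    """1 - SSE/SST for the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy, sxx, syy = deviations_sums(x, y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared residuals around the fitted line
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - sse / syy  # syy is the total variation (SST)

# Hypothetical illustrative data (not from the session)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
y_flipped = [-v for v in y]  # same strength, opposite direction

print(correlation(x, y) ** 2, explained_proportion(x, y))
print(correlation(x, y_flipped), explained_proportion(x, y_flipped))
```

Flipping the sign of y changes the sign of r but leaves r² unchanged, which is why r² carries no information about direction.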
Testing the Significance of
the Correlation Coefficient
H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Reject H0 if:
t > t_{α/2, n-2} or t < -t_{α/2, n-2}
Testing the Significance of
the Correlation Coefficient – Copier Sales Example
H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Reject H0 if:
t > t_{α/2, n-2} or t < -t_{α/2, n-2}
t > t_{0.025, 8} or t < -t_{0.025, 8}
t > 2.306 or t < -2.306
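The decision rule can be sketched in code. The slides do not show the test statistic itself, so this sketch assumes the standard one for this test, t = r·sqrt(n-2) / sqrt(1-r²), with n-2 degrees of freedom. The critical value 2.306 is the one given above (n = 10, 5% level, two-tailed). The sample correlation used here is a hypothetical value, not the copier sales result.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 observations")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRITICAL = 2.306  # t_{0.025, 8} from the slide (n = 10, two-tailed 5% test)

r_sample = 0.60     # hypothetical sample correlation, for illustration only
n = 10
t = correlation_t_statistic(r_sample, n)
reject_h0 = abs(t) > T_CRITICAL

print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

With only ten observations, a sample correlation of 0.60 is not enough to reject H0 at the 5% level, whereas a value such as 0.80 would be.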
Stationarity


A strictly stationary process is one where the
distribution of its values remains the same as
time proceeds, implying that the probability that a
value lies in a particular interval is the same now
as at any point in the past or the future.
However, we tend to use the criteria for a
‘weakly stationary process’ to determine whether a
series is stationary or not.
Weakly Stationary Series


A stationary process or series has the following
properties:
- constant mean
- constant variance
- constant autocovariance structure
The last property means that the covariance between y(t-1) and y(t-2) is the same as that between y(t-5) and y(t-6): it depends only on the lag between the observations, not on the point in time.
Implications of Non-stationary data


If the variables in an OLS regression are not
stationary, they tend to produce regressions
with high R-squared statistics and low Durbin-Watson (DW)
statistics, indicating high levels of
autocorrelation.
This is caused by the drift in the variables often
being related but not directly accounted for in
the regression, hence the omitted variable
effect.
Stationary Data

It is important to determine whether our data are
stationary before running the regression. This can be
done in a number of ways:
- plotting the data
- assessing the autocorrelation function
- using a specific test on the significance of
the autocorrelation coefficients
- specific tests to be covered later
Correlogram




The sample correlogram is the plot of the autocorrelation
function (ACF) against the lag k.
As the ACF lies between -1 and +1, the
correlogram also lies between these values.
It can be used to assess stationarity: if the
ACF falls immediately from 1 to 0 and then stays
at about 0 thereafter, the series is stationary.
If the ACF declines gradually from 1 to 0 over a
prolonged period of time, then the series is not
stationary.
[Figure: Correlogram of a stationary time series, plotting the ACF (vertical axis) against the lag k (horizontal axis)]
Autocorrelation

Measures the correlation between successive
observations over time.
\[
r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}
\]

r_k = autocorrelation for a k-period lag
Y_t = value of the time series at period t
Y_{t-k} = value of the time series k periods before period t
\bar{Y} = mean of the time series
n = number of observations in the time series
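The autocorrelation at lag k can be computed directly from these definitions. The sketch below is a minimal plain-Python version, assuming the usual form of the estimator: the lagged cross-products of deviations are summed over the periods where both Y_t and Y_{t-k} exist, and divided by the total sum of squared deviations. The function name and the data are hypothetical.

```python
def autocorrelation(series, k):
    """Sample autocorrelation r_k of a sequence at lag k."""
    n = len(series)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(series)")
    mean = sum(series) / n
    # Lagged cross-products of deviations, for periods where Y_{t-k} exists
    numerator = sum(
        (series[t] - mean) * (series[t - k] - mean) for t in range(k, n)
    )
    # Total sum of squared deviations (zero only for a constant series)
    denominator = sum((y - mean) ** 2 for y in series)
    return numerator / denominator

# Hypothetical illustrative series (not from the session)
y = [1, 2, 3, 4, 5]

print(autocorrelation(y, 0))  # lag 0 is always 1
print(autocorrelation(y, 1))
```

For this short series the lag-1 value works out to 4/10 = 0.4: the cross-products sum to 4 and the squared deviations sum to 10.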



If the time series is stationary, the value of r_k
should diminish rapidly towards zero.
If there is a trend, r_k will decline towards zero
slowly.
If a seasonal pattern exists, the value of r_k may
be significantly different from zero at k = 4 for quarterly data,
or k = 12 for monthly data.
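These three patterns can be seen on artificial data. The sketch below builds a hypothetical trending series, a hypothetical quarterly seasonal series and a simulated random (stationary) series, then prints the first eight autocorrelations of each. The series, the seed and the function names are all assumptions made for illustration.

```python
import random

def autocorrelation(series, k):
    """Sample autocorrelation r_k of a sequence at lag k."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum(
        (series[t] - mean) * (series[t - k] - mean) for t in range(k, n)
    )
    denominator = sum((y - mean) ** 2 for y in series)
    return numerator / denominator

def acf(series, max_lag):
    """Autocorrelations for lags 1..max_lag."""
    return [autocorrelation(series, k) for k in range(1, max_lag + 1)]

n = 40
trend = [float(t) for t in range(n)]             # steady upward trend
seasonal = [10.0, 20.0, 30.0, 20.0] * (n // 4)   # repeats every 4 quarters
rng = random.Random(42)                          # fixed seed for repeatability
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # simulated stationary series

for name, series in [("noise", noise), ("trend", trend), ("seasonal", seasonal)]:
    print(name, [round(r, 2) for r in acf(series, 8)])
```

For the trending series the lag-1 autocorrelation is 0.925 and the later lags fall away only gradually. The seasonal series has a lag-1 autocorrelation of zero but a lag-4 autocorrelation of 0.9. The simulated noise values depend on the seed, so no particular figures are quoted for them here.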

A plot of the autocorrelations against the lag k is called an
autocorrelation function or a correlogram.

To determine whether the autocorrelation at lag k
is significantly different from zero:
H0: ρ_k = 0
H1: ρ_k ≠ 0

For any k, reject H0 if |r_k| > 2 / √n (approximately, at the 5% significance level), where n is the number of observations.
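A significance check of this kind can be sketched as follows. The bound used here is an assumption: the common large-sample rule that, at roughly the 5% level, an autocorrelation is significant when |r_k| exceeds 1.96/√n, where n is the number of observations. The function names and the quarterly series are hypothetical.

```python
import math

def autocorrelation(series, k):
    """Sample autocorrelation r_k of a sequence at lag k."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum(
        (series[t] - mean) * (series[t - k] - mean) for t in range(k, n)
    )
    denominator = sum((y - mean) ** 2 for y in series)
    return numerator / denominator

def significant_lags(series, max_lag, z=1.96):
    """Lags 1..max_lag whose |r_k| exceeds the assumed bound z / sqrt(n)."""
    bound = z / math.sqrt(len(series))
    return [
        k for k in range(1, max_lag + 1)
        if abs(autocorrelation(series, k)) > bound
    ]

# Hypothetical quarterly series with a repeating seasonal pattern
quarterly = [10.0, 20.0, 30.0, 20.0] * 10

print(significant_lags(quarterly, 8))
```

With 40 observations the assumed bound is about 0.31. For this artificial series only the even lags exceed it, including the seasonal lags k = 4 and k = 8.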

