Download Daily Temperature Analysis - Neas

Daily Temperature Analysis
VEE Time Series – Spring 2014
Vincentius Hadi Hartono
Contents
Introduction
Data
Analysis
    Statistics
    Daily Temperature
    Seasonally Adjusted
    Correlogram
    Regression
    Durbin-Watson Test Statistic
    Box-Pierce Q Test
Conclusion
Introduction
Weather in a specific region can vary considerably, even on the same calendar day in different
years. We can use regression and time series techniques to estimate the temperature in a
given year. In this case, Los Angeles will be used as the example. The reasons are that I grew up
and lived in Los Angeles for 7 years and that the weather in this city does not go to extremes. It is
interesting to see, analyze, and predict the weather in my hometown.
Data
The data is obtained from http://academic.udayton.edu/kissock/http/Weather/citylistWorld.htm. The
data runs from Jan 1, 1995 to Sept 17, 2014, covering 7200 days over nearly 20 years. There are 14 days
missing in the data; these have been filled in using linear interpolation between the adjacent days.
The temperatures are in degrees Fahrenheit.
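The interpolation step can be sketched as follows. This is only a minimal illustration, assuming the missing days are marked as None in a list of daily temperatures; the function name fill_missing and the sample values are made up for the example and are not taken from the data set.

```python
def fill_missing(temps):
    """Return a copy of temps with None gaps filled by linear interpolation."""
    filled = list(temps)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i - 1              # last known day before the gap
            end = i
            while end < n and filled[end] is None:
                end += 1               # first known day after the gap
            if start < 0 or end == n:
                raise ValueError("cannot interpolate a gap at the edge of the series")
            # Equal steps between the two known neighbours.
            step = (filled[end] - filled[start]) / (end - start)
            for j in range(i, end):
                filled[j] = filled[start] + step * (j - start)
            i = end
        else:
            i += 1
    return filled


# Hypothetical values: one missing day, then two missing days in a row.
print(fill_missing([60.0, None, 64.0]))        # [60.0, 62.0, 64.0]
print(fill_missing([60.0, None, None, 66.0]))  # [60.0, 62.0, 64.0, 66.0]
```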
Analysis
Statistics
The table below lists the mean, median, mode, minimum, maximum, and standard deviation of the
daily temperature in Los Angeles over the period covered by the data.
Mean     Median   Mode     Minimum   Max      Std Dev
62.45    62.60    58.80    44.80     84.60    5.79
From the table above, we can say that the temperature in Los Angeles does not vary extremely. The
difference between the maximum and the minimum is only about 40 degrees (84.60 - 44.80 = 39.80). The
standard deviation of 5.79 degrees also shows that the temperature does not vary much. The mean and
the median support the claim as well, since the difference between the two is small (62.45 versus 62.60).
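As a rough sketch, the same six statistics can be computed with Python's standard statistics module. The sample values below are made up for illustration, and using the sample standard deviation (statistics.stdev) rather than the population version is an assumption.

```python
import statistics


def summarize(temps):
    """Summary statistics for a list of daily temperatures."""
    return {
        "mean": statistics.mean(temps),
        "median": statistics.median(temps),
        "mode": statistics.mode(temps),
        "min": min(temps),
        "max": max(temps),
        "std_dev": statistics.stdev(temps),  # sample standard deviation (assumed)
    }


# Hypothetical sample, not the Los Angeles data.
sample = [58.8, 58.8, 61.0, 62.6, 65.3, 70.1]
for name, value in summarize(sample).items():
    print(f"{name}: {value:.2f}")
```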
Daily Temperature
Below is the graph of the daily temperature in 1995. As we can see, the graph is rough and hard to
analyze. We can use a moving average to smooth the graph. In this case, we choose a 31-day moving
average because it gives a smoother graph than the 7-day and 15-day moving averages.
[Figure: Daily Temperature. Series: Daily Temperature; horizontal axis: day 1 to 361; vertical axis: 0 to 90.]
As we can see, the 31-day moving average graph is smoother than the raw data.
[Figure: 31 Days Moving Average. Series: 31 Days Moving Average; horizontal axis: day 1 to 353; vertical axis: 0.0 to 80.0.]
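A moving average of this kind can be sketched in a few lines. The function below is an illustration only: it returns one average per full window, and whether each value is plotted against the middle or the last day of its window is a choice that the text does not specify.

```python
def moving_average(values, window=31):
    """Simple moving average; returns len(values) - window + 1 averages."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the series length")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest day, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Small hypothetical series with a 3-day window.
print(moving_average([1, 2, 3, 4, 5], window=3))  # [2.0, 3.0, 4.0]
```

A wider window averages over more days, which is why the 31-day version looks smoother than the 7-day or 15-day versions, at the cost of losing more points at the ends of the year.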
Seasonally Adjusted
We can see in both graphs above that the temperature rises and falls with the change in season (winter,
spring, summer, fall, and then back to winter). We can de-seasonalize the data using an additive model;
the result is shown in the graph below.
[Figure: Seasonally Adjusted Temperature. Series: Seasonally Adjusted Temperature; horizontal axis: year 1995 to 2013; vertical axis: -15.0000 to 25.0000.]
As we see in the above graph, the seasonally adjusted data mostly lies between -5 and 5, with many
values around 0 and with spikes between years. The reason for the spikes is that at the beginning and
the end of the year, the temperature is far from the average temperature.
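One common form of the additive model treats each observation as a seasonal component plus a remainder, and subtracts the seasonal component. The sketch below assumes the seasonal component for a day is the average temperature on that day of the year across all years; this is one possible choice and may differ from the calculation behind the graph. The function name and sample values are hypothetical.

```python
from collections import defaultdict


def seasonally_adjust(temps, days_of_year):
    """Subtract the average for each day of the year (additive model)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for temp, day in zip(temps, days_of_year):
        totals[day] += temp
        counts[day] += 1
    seasonal = {day: totals[day] / counts[day] for day in totals}
    return [temp - seasonal[day] for temp, day in zip(temps, days_of_year)]


# Two hypothetical "years" with two days each.
temps = [50.0, 70.0, 54.0, 66.0]
days = [1, 2, 1, 2]
print(seasonally_adjust(temps, days))  # [-2.0, 2.0, 2.0, -2.0]
```

By construction the adjusted values average to zero for each day of the year, which is consistent with a series that fluctuates around 0.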
Correlogram
Below, we can see the graph of the sample autocorrelation for 1995 (the first year of the data). I chose
the 1995 numbers because this is the only year in which we can see significant autocorrelations at the
first three lags (0.7817, 0.5129, and 0.3452, respectively) before they drop toward 0 as the lag
increases. Since the autocorrelations decay in this way, the series is mean reverting, which is
consistent with an autoregressive process.
[Figure: Sample Autocorrelation. Series: Sample Autocorrelation; horizontal axis: lag 1 to 353; vertical axis: -0.1000 to 0.9000.]
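For reference, the sample autocorrelation at lag k is the sum of products of deviations from the mean that are k days apart, divided by the total sum of squared deviations. A minimal sketch, using the usual textbook definition and a made-up series:

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


# Hypothetical series; a correlogram plots these values against the lag.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
print([round(autocorrelation(series, k), 4) for k in range(0, 3)])
```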
Regression
Using all the data for Los Angeles, regressions were run for AR(1), AR(2), and AR(3) models.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.781266268
R Square             0.610376982
Adjusted R Square    0.610322747
Standard Error       2.243563042
Observations         7186

ANOVA
              df      SS             MS             F              Significance F
Regression    1       56649.54412    56649.54412    11254.33568    0
Residual      7184    36161.20368    5.033575122
Total         7185    92810.74779

                Coefficients     Standard Error   t Stat          P-value        Lower 95%       Upper 95%
Intercept       -0.002052356     0.026466509      -0.077545389    0.938191844    -0.053934502    0.04982979
X Variable 1    0.781290182      0.007364655      106.0864538     0              0.766853291     0.795727074

AR (1)
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.797049056
R Square             0.635287198
Adjusted R Square    0.635185649
Standard Error       2.170809466
Observations         7186

ANOVA
              df      SS             MS             F              Significance F
Regression    2       58961.47991    29480.73995    6255.974451    0
Residual      7183    33849.26789    4.712413739
Total         7185    92810.74779

                Coefficients     Standard Error   t Stat          P-value        Lower 95%       Upper 95%
Intercept       -0.002775245     0.025608282      -0.108372933    0.913702898    -0.052975014    0.047424525
X Variable 1    0.978831469      0.011415654      85.74467122     0              0.956453428     1.00120951
X Variable 2    -0.252868323     0.011416376      -22.14961477    3.2883E-105    -0.275247779    -0.230488867

AR (2)
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.799833678
R Square             0.639733913
Adjusted R Square    0.639583425
Standard Error       2.157685421
Observations         7186

ANOVA
              df      SS             MS             F              Significance F
Regression    3       59374.18282    19791.39427    4251.088404    0
Residual      7182    33436.56498    4.655606374
Total         7185    92810.74779

                Coefficients     Standard Error   t Stat          P-value        Lower 95%       Upper 95%
Intercept       -0.002375366     0.025453498      -0.093321797    0.925650519    -0.052271714    0.047520982
X Variable 1    1.006750853      0.011727722      85.84368241     0              0.983761065     1.02974064
X Variable 2    -0.360943416     0.016140769      -22.36221961    3.9012E-107    -0.392584073    -0.329302758
X Variable 3    0.110431178      0.011729002      9.415223574     6.23003E-21    0.087438881     0.133423474

AR (3)
From the regression statistics in Excel, we can see that the R square for AR (1) is 0.6104, lower than
the R square for AR (2) at 0.6353 and for AR (3) at 0.6397, while AR (3) has the lowest standard error
of them all (2.1577, against 2.1708 for AR (2) and 2.2436 for AR (1)). We can conclude that it is better
to use AR (2) when analyzing the data: moving from AR (2) to AR (3) improves the fit only slightly, so
the extra term is not worth the time and effort.
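The regressions above were run in Excel. As an illustration of what an AR(p) regression does, the sketch below fits y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p} by ordinary least squares using only the Python standard library. The helper names solve and fit_ar and the test series are made up for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ar(series, p):
    """Least-squares AR(p) fit; returns ([c, phi_1, ..., phi_p], residuals)."""
    # Each row is [1, y_{t-1}, ..., y_{t-p}] and the target is y_t.
    rows = [[1.0] + [series[t - k] for k in range(1, p + 1)]
            for t in range(p, len(series))]
    y = series[p:]
    k = p + 1
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    residuals = [yt - sum(c * v for c, v in zip(coef, r))
                 for r, yt in zip(rows, y)]
    return coef, residuals


# Hypothetical series that follows y_t = 1 + 0.5 * y_{t-1} exactly.
series = [0.0]
for _ in range(9):
    series.append(1.0 + 0.5 * series[-1])
coef, residuals = fit_ar(series, 1)
print([round(c, 6) for c in coef])  # [1.0, 0.5]
```

The residuals returned here are the quantities that the Durbin-Watson and Box-Pierce tests examine.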
Durbin-Watson Test Statistic
To determine whether the residuals of each model are autocorrelated, there are two tests: the Durbin-Watson test and the Box-Pierce Q test. Below are the results for the Durbin-Watson statistic (DWS).
        AR (1)      AR (2)      AR (3)
DWS     1.604917    1.944158    2.005551
For the Durbin-Watson test, the statistic ranges between 0 and 4. A value of 2, the midpoint between 0
and 4, means that the residuals have no serial correlation. As we see in the results, for AR (1) the
statistic is below 2, which means the residuals are positively correlated, so AR (1) is not a good model
to use. Given that the standard error for AR (2) is close to that of AR (3) and its DWS is close to 2,
we can use AR (2) when calculating.
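The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch, with made-up residuals that show the two extremes of the scale:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 means no first-order serial correlation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0, strong positive correlation
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0, strong negative correlation
```

For long series the statistic is approximately 2(1 - r), where r is the lag-1 autocorrelation of the residuals, which is why values below 2 point to positive correlation.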
Box-Pierce Q Test
The other way to test for correlation is the Box-Pierce Q test. For this test, we find the 10% critical
value and compare it to the Box-Pierce Q statistic. If the Box-Pierce Q statistic is below the critical
value, we cannot reject the hypothesis that the residuals are uncorrelated. Below are the results.
Model    Box-Pierce Q Statistics    10% Critical Value
AR(1)    4,741.73                   7,339.05
AR(2)    3,976.39                   7,339.05
AR(3)    3,197.07                   7,339.05
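As a sketch of the calculation, the Box-Pierce Q statistic is the number of observations times the sum of squared residual autocorrelations up to a chosen lag, and it is compared with a chi-square critical value. The code below uses the Wilson-Hilferty approximation for the chi-square quantile so that it needs only the standard library. The function names are made up, and the 7185 degrees of freedom in the example is an assumption, chosen because the regressions report 7186 observations.

```python
from statistics import NormalDist


def box_pierce_q(residuals, max_lag):
    """Box-Pierce Q: n times the sum of squared autocorrelations for lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum((residuals[t] - mean) * (residuals[t - k] - mean)
                  for t in range(k, n)) / denom
        q += r_k ** 2
    return n * q


def chi_square_critical(df, alpha=0.10):
    """Approximate upper-tail chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


# Hypothetical residuals for the Q statistic.
print(box_pierce_q([1.0, -1.0, 1.0, -1.0], max_lag=1))  # 2.25
# Assumed degrees of freedom; the result is close to the 7,339.05 in the table.
print(round(chi_square_critical(7185), 2))
```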