Daily Temperature Analysis
VEE Time Series – Spring 2014
Vincentius Hadi Hartono

Contents
Introduction
Data
Analysis
  Statistics
  Daily Temperature
  Seasonally Adjusted
  Correlogram
  Regression
  Durbin-Watson Test Statistic
  Box-Pierce Q Test
Conclusion

Introduction
Weather in a given region can vary considerably from year to year, even on the same calendar day.
We can use regression and time series techniques to estimate the temperature in a given year. In this case, Los Angeles is used as the example. I chose it because I grew up and lived in Los Angeles for 7 years and because the weather in this city does not go to extremes. It is interesting to see, analyze, and predict the weather in my hometown.

Data
The data is obtained from http://academic.udayton.edu/kissock/http/Weather/citylistWorld.htm. It runs from Jan 1, 1995 to Sept 17, 2014, covering 7200 days over 19 years. There are 14 missing days, which have been filled by linear interpolation between the adjacent days. Temperatures are in degrees Fahrenheit.

Analysis

Statistics
The table below lists the mean, median, mode, minimum, maximum, and standard deviation of the daily temperature in Los Angeles over the sample period.

Mean     Median   Mode     Minimum  Max      Std Dev
62.45    62.60    58.80    44.80    84.60    5.79

From the table above, the temperature in Los Angeles does not vary extremely. The difference between the maximum and the minimum is only around 40 degrees, and the standard deviation of 5.79 degrees shows that the temperature does not vary much. The small difference between the mean and the median (0.15 degrees) also supports this.

Daily Temperature
Below is the graph of the daily temperature in 1995. The graph is rough and hard to analyze, so we can smooth it with a moving average. We choose a 31-day moving average because it gives a smoother graph than the 7-day and 15-day moving averages.

[Figure: Daily Temperature, 1995 (daily values by day of year)]

The 31-day moving average graph is smoother than the raw data.
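The comparison of 7-day, 15-day, and 31-day windows described above can be sketched as follows. This is a minimal illustration, not the calculation used in the report: the series `synthetic` is made-up data standing in for one year of temperatures, the helper names `moving_average` and `roughness` are hypothetical, and the report does not say whether its moving average is centered or trailing, so the sketch simply averages over every full window.

```python
import numpy as np


def moving_average(values, window):
    """Return the simple moving average, using full windows only."""
    values = np.asarray(values, dtype=float)
    if window < 1 or window > values.size:
        raise ValueError("window must be between 1 and len(values)")
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the whole window fits,
    # so the result has len(values) - window + 1 points.
    return np.convolve(values, kernel, mode="valid")


def roughness(values):
    """Standard deviation of day-to-day changes; smaller means smoother."""
    return float(np.std(np.diff(values)))


# Synthetic stand-in for one year of daily temperatures.
# ASSUMPTION: this is NOT the Los Angeles data from the report.
rng = np.random.default_rng(0)
days = np.arange(365)
synthetic = (
    62.0
    + 6.0 * np.sin(2 * np.pi * (days - 110) / 365)
    + rng.normal(0.0, 3.0, days.size)
)

print("raw", len(synthetic), round(roughness(synthetic), 3))
for window in (7, 15, 31):
    smoothed = moving_average(synthetic, window)
    print(window, len(smoothed), round(roughness(smoothed), 3))
```

A longer window averages out more of the day-to-day noise, at the cost of losing `window - 1` points and blurring short-lived changes.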
[Figure: 31 Days Moving Average, 1995 (by day of year)]

Seasonally Adjusted
Both graphs above show that the temperature rises and falls with the change of season (winter, spring, summer, fall, and then back to winter). We can de-seasonalize the data using the additive model, as shown in the graph below.

[Figure: Seasonally Adjusted Temperature, 1995 to 2013]

In the graph above, the data mostly lies between -5 and 5, with many values around 0 and with spikes between years. The reason for the spikes is that at the beginning and the end of the year, the temperature is far from the average temperature.

Correlogram
Below is the graph of the sample autocorrelation for 1995, the first year of the data. I chose 1995 because it is the only year in which we can see significant autocorrelations at the leading lags (0.7817, 0.5129, and 0.3452, respectively) before the autocorrelation drops to 0 as the lag increases. Because the autocorrelation decays toward zero, the series is mean reverting, which suggests an autoregressive process.

[Figure: Sample Autocorrelation, 1995 (by lag)]

Regression
Using all the data for Los Angeles, regressions were run for AR(1), AR(2), and AR(3).
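An AR(p) regression of this kind regresses each value on its own previous p values plus an intercept. The sketch below shows one way to do that with ordinary least squares. It is an illustration under stated assumptions, not the Excel workflow used in the report: `fit_ar` is a hypothetical helper, and `sim` is a simulated AR(2) series with made-up coefficients, not the Los Angeles data.

```python
import numpy as np


def fit_ar(series, order):
    """Fit y_t = c + phi_1*y_(t-1) + ... + phi_p*y_(t-p) + e_t by least squares."""
    y = np.asarray(series, dtype=float)
    n = y.size
    if order < 1 or n <= 2 * order + 1:
        raise ValueError("series is too short for the requested order")
    target = y[order:]
    # Column k holds the series lagged by k days, aligned with target.
    lags = np.column_stack([y[order - k:n - k] for k in range(1, order + 1)])
    design = np.column_stack([np.ones(target.size), lags])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(np.sum((target - target.mean()) ** 2))
    dof = target.size - (order + 1)  # residual degrees of freedom
    return {
        "intercept": float(coef[0]),
        "phi": coef[1:],
        "r_squared": 1.0 - ss_res / ss_tot,
        "standard_error": float(np.sqrt(ss_res / dof)),
        "residuals": residuals,
    }


# Simulated AR(2) series. ASSUMPTION: the coefficients and noise level are
# invented for illustration and are not estimates from the report.
rng = np.random.default_rng(1)
true_phi = (0.9, -0.2)
noise = rng.normal(0.0, 2.0, 5000)
sim = np.zeros(noise.size)
for t in range(2, sim.size):
    sim[t] = true_phi[0] * sim[t - 1] + true_phi[1] * sim[t - 2] + noise[t]

for p in (1, 2, 3):
    fit = fit_ar(sim, p)
    print(p, np.round(fit["phi"], 3), round(fit["r_squared"], 3),
          round(fit["standard_error"], 3))
```

The returned `r_squared` and `standard_error` correspond to the "R Square" and "Standard Error" rows of a regression summary, which makes it easy to compare orders side by side.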
SUMMARY OUTPUT, AR(1)

Regression Statistics
Multiple R          0.781266268
R Square            0.610376982
Adjusted R Square   0.610322747
Standard Error      2.243563042
Observations        7186

ANOVA
             df     SS            MS            F             Significance F
Regression   1      56649.54412   56649.54412   11254.33568   0
Residual     7184   36161.20368   5.033575122
Total        7185   92810.74779

               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept      -0.002052356   0.026466509      -0.077545389   0.938191844   -0.053934502   0.04982979
X Variable 1   0.781290182    0.007364655      106.0864538    0             0.766853291    0.795727074

SUMMARY OUTPUT, AR(2)

Regression Statistics
Multiple R          0.797049056
R Square            0.635287198
Adjusted R Square   0.635185649
Standard Error      2.170809466
Observations        7186

ANOVA
             df     SS            MS            F             Significance F
Regression   2      58961.47991   29480.73995   6255.974451   0
Residual     7183   33849.26789   4.712413739
Total        7185   92810.74779

               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept      -0.002775245   0.025608282      -0.108372933   0.913702898   -0.052975014   0.047424525
X Variable 1   0.978831469    0.011415654      85.74467122    0             0.956453428    1.00120951
X Variable 2   -0.252868323   0.011416376      -22.14961477   3.2883E-105   -0.275247779   -0.230488867

SUMMARY OUTPUT, AR(3)

Regression Statistics
Multiple R          0.799833678
R Square            0.639733913
Adjusted R Square   0.639583425
Standard Error      2.157685421
Observations        7186

ANOVA
             df     SS            MS            F             Significance F
Regression   3      59374.18282   19791.39427   4251.088404   0
Residual     7182   33436.56498   4.655606374
Total        7185   92810.74779

               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept      -0.002375366   0.025453498      -0.093321797   0.925650519   -0.052271714   0.047520982
X Variable 1   1.006750853    0.011727722      85.84368241    0             0.983761065    1.02974064
X Variable 2   -0.360943416   0.016140769      -22.36221961   3.9012E-107   -0.392584073   -0.329302758
X Variable 3   0.110431178    0.011729002      9.415223574    6.23003E-21   0.087438881    0.133423474

From the regression statistics in Excel, the R Square for AR(1) (0.610) is lower than the R Square for AR(2) (0.635) and AR(3) (0.640), and AR(3) has the lowest standard error of the three (2.158). However, the improvement from AR(2) to AR(3) is small, so we conclude that AR(2) is the better choice: the extra lag in AR(3) is not worth the added time and effort.

Durbin-Watson Test Statistic
Two tests can be used to determine whether autocorrelation remains after fitting each model: the Durbin-Watson test and the Box-Pierce Q test. Below is the result for the Durbin-Watson statistic (DWS).

Model   AR(1)      AR(2)      AR(3)
DWS     1.604917   1.944158   2.005551

The Durbin-Watson statistic ranges between 0 and 4. A value of 2, the midpoint of that range, means there is no autocorrelation. Values below 2 indicate positive autocorrelation and values above 2 indicate negative autocorrelation. For AR(1), the statistic is 1.60, which is below 2 and indicates positive autocorrelation, so AR(1) is not a good choice. Since the standard error for AR(2) is close to that of AR(3) and its DWS is close to 2, we can use AR(2) for the calculations.

Box-Pierce Q Test
The other way to test for autocorrelation is the Box-Pierce Q test. For this test, we find the 10% critical value and compare it with the Box-Pierce Q statistic. If the Q statistic is below the critical value, we cannot reject the hypothesis of no autocorrelation. Below is the result.

Model                     AR(1)      AR(2)      AR(3)
Box-Pierce Q Statistic    4,741.73   3,976.39   3,197.07
10% Critical Value        7,339.05   7,339.05   7,339.05

For all three models, the Box-Pierce Q statistic is below the 10% critical value.
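Both test statistics discussed above can be computed directly from a model's residuals. The sketch below is a minimal illustration, assuming the standard textbook definitions: the Durbin-Watson statistic as the sum of squared successive differences divided by the sum of squared residuals, and Box-Pierce Q as the number of residuals times the sum of squared sample autocorrelations up to a chosen lag. The function names, the `max_lag` value of 20, and the simulated residual series are all assumptions for illustration; the report does not state how many lags it used.

```python
import numpy as np


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


def sample_autocorrelation(values, lag):
    """Sample autocorrelation of a series at a positive lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[lag:] * x[:-lag]) / np.sum(x ** 2))


def box_pierce_q(residuals, max_lag):
    """Box-Pierce Q: n times the sum of squared autocorrelations, lags 1..max_lag."""
    e = np.asarray(residuals, dtype=float)
    r = [sample_autocorrelation(e, k) for k in range(1, max_lag + 1)]
    return float(e.size * np.sum(np.square(r)))


# Simulated residuals. ASSUMPTION: invented data, not the report's residuals.
rng = np.random.default_rng(2)
white = rng.normal(0.0, 1.0, 2000)      # no autocorrelation
correlated = np.zeros(white.size)       # positive autocorrelation
for t in range(1, correlated.size):
    correlated[t] = 0.5 * correlated[t - 1] + white[t]

for name, resid in (("white", white), ("correlated", correlated)):
    print(name, round(durbin_watson(resid), 3),
          round(box_pierce_q(resid, max_lag=20), 2))
```

In this sketch the uncorrelated series should give a Durbin-Watson value near 2 and a small Q, while the positively autocorrelated series should give a value well below 2 and a much larger Q. The Q statistic is then compared with a chi-square critical value, as in the table above.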