Fowler, Keith, Skau

Executive Summary

An earthquake is the release of energy in the form of waves that travel through the ground. Interest in earthquake prediction and analysis technology has grown with the expense and density of cities. This analysis is of the 1995 Kobe earthquake in Japan, which rated a 6.8 on the Richter scale. This earthquake was one of the most devastating in Japan's history, and the shocks were detectable around the world. Data measuring the vertical acceleration of waves from this earthquake was recorded at Tasmania University in Hobart, Australia. The analysis of this data aims to extract relevant information about the earthquake event and compare it to background noise by fitting a sinusoidal curve to the time series data with nonlinear least squares. General information about Rayleigh waves, as well as the assumptions made in the analysis, will be discussed. The null hypothesis that there is no net acceleration will also be tested. Results show that there is in fact a difference between background oscillations and the earthquake event. The analysis also demonstrated that there is an offset in the data that can be attributed to the calibration of the seismograph. The 1995 Kobe earthquake showed some interesting anomalies that are successfully analyzed and explained in this analysis.

Table of Contents

Executive Summary
Table of Contents
Introduction
Methods and Results
Discussion
Appendix A
Appendix B

Introduction

The 1995 Kobe earthquake was one of the most destructive earthquakes in Japan's history. The earthquake was centered about 20 km from Kobe, Japan, and rated a 6.8 on the Richter scale. The analysis of this earthquake was done using data taken by Tasmania University in Hobart, Australia. The vertical acceleration of a seismograph was recorded for fifty-one minutes at one-second intervals. The raw acceleration from the seismograph was recorded in nanometers per second squared. The type of wave recorded by this type of seismograph is called the Rayleigh wave. Rayleigh waves are some of the most destructive earthquake waves. Although the waves travel horizontally along the surface of the earth, the ground moves vertically as they pass, so the recorded vertical motion is treated here as a transverse wave. Sources indicate that typical velocities of Rayleigh waves vary from 1000 m/s to 5000 m/s.
For this analysis a velocity of 1000 m/s is used to infer some physical meanings. The distance between the epicenter of the earthquake and the seismograph is so large that it needed to be taken into consideration. It was determined that the distance effect would result in a linear scaling of the amplitudes of the acceleration. This assumption leads to the result that the distance from the epicenter to the seismograph is unnecessary for this analysis. The assumption that there is a linear relation between the vertical position, velocity, and acceleration of the ground was also made due to the nature of the data. By assuming a linear transformation between position, velocity, and acceleration, the frequencies found for the acceleration are also the frequencies of the ground position and velocity.

The initial plotting of the data as a time series and a histogram yielded several intriguing anomalies. The time series plot seems to indicate that there are two different times of interest. The beginning of the time series plot appears to be background noise unrelated to the earthquake. The background is followed by what appears to be an event, which is distinguishable by an increased amplitude. These two sections of data are analyzed separately to determine if there are any differences in the acceleration trends. The actual analysis of these two sections consists of a nonlinear sinusoidal fit.

The null hypothesis is that there is no net vertical acceleration. The thirteen-bin histogram of the accelerations shows that the peak of the histogram appears to be non-zero. Analysis was done to determine if the mean acceleration is statistically different from zero.

Methods and Results

Time Independent Analysis

Plots of the data suggest that the mean acceleration is non-zero. Since such a result would demand an explanation, a formal test for significance was performed on the entire data set to determine whether the mean differs from zero.
A one-sample t-test (Appendix A) returned the result that the true mean is not zero. This result was given with a p-value well below 0.0001. A 95 percent confidence interval for the true mean was calculated to be from 1947 nm/s^2 to 2223 nm/s^2. This result provides formal evidence that the data does in fact have a non-zero mean, and so an explanation must be produced.

A second qualitative observation made of the original data plot was the presence of two separate phases, one earlier and one later in the series. The hypothesis that the earlier, calmer phase was simply background noise preceding the actual earthquake recorded during the second phase unfortunately cannot be definitively tested. However, the two phases can be separated and analyzed independently, at least supporting the hypothesis that some sort of phase change does take place in the data.

Initially, we plot a histogram of the original data, displaying the distribution of accelerations throughout the entire sample. R calculated the mean and standard deviation of this data, and using this information a Gaussian was superimposed on the plot (Appendix A). From the figure, it can be observed that this Gaussian fails to adequately describe the data. However, by separating the data into the earlier and later sections, the analysis can be repeated with much more satisfactory results; again, see the figures provided.

Now that the data is divided, and there is evidence that this division is sensible, further analysis can be performed to compare these two regions. Keeping in mind the non-zero mean discovered earlier from the entire data set, it seems sensible to test the mean of each region, for one might have a mean of zero and the other might be the cause of the discrepancy. Performing a t-test on both sections yields the following results.
The early set of data, exhibiting smaller amplitudes and suspected of being composed mostly of background noise, was found to have a non-zero mean with a p-value below 0.0001. Likewise, the later data, which exhibited larger oscillations, was also found to have a non-zero mean with a very low p-value. The 95 percent confidence intervals were found as follows: the mean of the earlier data was estimated between 2544 and 2912 nm/s^2, and the mean of the later data was estimated between 1596 and 3398 nm/s^2. Since the interval for the earlier data lies entirely within the interval for the later data, we find no evidence that these two sections of data have different means. However, we do note that the sets have very different standard deviations. The earlier data, with a smaller range of acceleration values, has a standard deviation of about 3,600 nm/s^2, while the later period had a much higher standard deviation of about 13,000 nm/s^2. From this we conclude that the regions exhibit different oscillatory behavior, and thus the oscillations of each should be examined separately.

Time Dependent Analysis

As previously stated, we observed that the data could be distinguished by two relatively different types of behavior. Between the initial measurement and about 1500 seconds is what we have determined to be the earlier part, or noise, and from 1500 seconds to the end of measurement is what we have determined to be the larger, later part of the earthquake. Although this division may not accurately represent the actual physical transition in the data, because of the large number of measurements, choosing the somewhat arbitrary value of 1500 to partition the data does not reduce the validity of our statistical analysis. Having noticed the oscillatory behavior of the accelerations, our first intention was to understand the frequencies of oscillation of the earthquake.
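
The partition at 1500 seconds and the per-phase summaries from the time-independent analysis can be sketched in R as follows. This is only an illustration on synthetic data, not the Hobart record: the vector `accel`, the offset of 2500 nm/s^2, and the two noise levels are assumptions chosen to resemble the values reported above. The final F test for equal variances is an addition that the report itself does not perform.

```r
# Synthetic stand-in for the seismograph record (NOT the Hobart data):
# a constant offset plus noise whose spread increases after 1500 s.
set.seed(1)
offset <- 2500                                  # assumed offset, nm/s^2
accel <- c(rnorm(1500, mean = offset, sd = 3600),
           rnorm(1500, mean = offset, sd = 13000))

# Partition the series at the (somewhat arbitrary) 1500 s mark
split_at <- 1500
early <- accel[1:split_at]
late  <- accel[(split_at + 1):length(accel)]

# One-sample t-tests of the null hypothesis that the mean is zero
early_test <- t.test(early, mu = 0)
late_test  <- t.test(late, mu = 0)

# 95 percent confidence intervals for the mean of each phase
early_test$conf.int
late_test$conf.int

# Standard deviations of each phase
c(early = sd(early), late = sd(late))

# F test for equal variances (an addition, not part of the report)
var_test <- var.test(early, late)
var_test$p.value
```

With real data, `accel` would be replaced by the recorded acceleration column.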
We then realized that our data could possibly be fitted by the following model:

Y = A + B*sin(C*(t-D))

where Y is the measured acceleration in nm/s^2, A is the vertical shift of the accelerations, B is the amplitude of the curve, C is the angular frequency of oscillations, D is the phase shift, and t is the time in seconds from the first second of the particular subset of data.

At first, we attempted to fit the full subsets (the earlier part and the later part) separately, each with its own model. However, it was difficult to find coefficients that fit a whole subset well; parts of the model would fit accurately, but other parts would be significantly off. To understand the angular frequency of each subset, we took a smaller subset of each part and modeled it to estimate the angular frequency of its parent subset. For the earlier part of the earthquake, we used the representative subset between 400 and 449 seconds, and for the later part of the earthquake, we used the representative subset between 2018 and 2083 seconds. There was no specific justification for picking these subsets except that they appeared to be the most sinusoidal. Using the nonlinear least squares method in R to fit our model, we obtained the following fits for each of our smaller subsets:

For the earlier subset, we calculated the following coefficients for our model:

A = 2666.699
B = 3524.596
C = -1.147
D = 3.184

And, for the later subset, we calculated the following coefficients for our model:

A = 2521.4846
B = 27988.9959
C = -0.2776
D = -6.5806

These coefficients fit each subset of data very accurately, as shown by their respective graphs as well as the summary of each fit (Appendix B). First, we see that the models disagree in coefficient A, suggesting that each partition has a different offset in acceleration, but as previously shown, this difference is not statistically significant.
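
The fitting step described above can be tried on synthetic data with a minimal sketch like the one below. It is not the fit from Appendix B: the "true" parameters, the noise level, and the starting values are assumptions chosen to resemble the later subset, and `t_sec` and `y` are illustrative names. The function `model` mirrors the formula Y = A + B*sin(C*(t-D)) given in the text.

```r
# Model from the text: Y = A + B*sin(C*(t - D))
model <- function(a, b, c, d, t) a + b * sin(c * (t - d))

# Synthetic data (NOT the recorded quake): assumed parameters plus noise
set.seed(42)
t_sec <- 0:65
truth <- list(a = 2500, b = 28000, c = 0.28, d = 6.5)
y <- model(truth$a, truth$b, truth$c, truth$d, t_sec) +
  rnorm(length(t_sec), sd = 5000)

# nls() needs starting values reasonably close to the solution,
# especially for the angular frequency c, or it may fail to converge
fit <- nls(y ~ model(a, b, c, d, t_sec),
           start = list(a = 2000, b = 25000, c = 0.285, d = 6.3))

# Estimated coefficients and their standard errors
coef(fit)
summary(fit)$coefficients
```

The sensitivity to starting values is one plausible reason a single sinusoid was hard to fit over a full subset: a small error in c accumulates into a large phase error over hundreds of seconds, while a window of 50 to 66 seconds keeps that drift small.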
The amplitudes of the two models also differ greatly, allowing us to determine that much of the damage caused by the earthquake can be attributed to the later part of the data. This is supported by the frequency of oscillations and physical wavelength of the transverse waves. In the later earthquake model, the angular frequency of 0.2776 rad/s (the negative sign can be dropped because we are only interested in the magnitude) corresponds to a frequency of 0.0442 Hz (by f = C/(2π)) and to an average period of about 22.63 seconds (τ = 1/f). Using these values and the assumption that the surface waves travel at around 1000 m/s, the wavelength (λ = vτ) of these waves was approximately 22,630 meters. In comparison, the frequency for the earlier data was approximately 0.18255 Hz, the period about 5.478 seconds, with a wavelength of about 5,478 meters. Of course, the coefficient D in each model is relatively unimportant in a physical sense, and is only used in fitting the model.

Discussion

All formal t-tests conducted yield the conclusion that the data has a non-zero mean acceleration value. This has been interpreted as a calibration error in the measuring device, for if the ground did accelerate as the data suggests, then the 51 minute span over which this was measured would result in a net displacement of 16 cm. Since this data was measured in Australia, thousands of miles from the epicenter, such a drastic effect is simply unreasonable. Furthermore, after separating the data into a high-amplitude and a low-amplitude phase, analyzing histograms with respect to best-fit Gaussian curves, and performing a t-test on the separated data, we found no significant difference in the means of the two phases, but drastically different standard deviations. These results suggest that there is a difference between the two phases, and we re-assert our hypothesis that the high-amplitude data was the actual earthquake, while the preceding phase was simply background noise.
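
The conversions used in the time-dependent analysis, from fitted angular frequency to frequency, period, and wavelength, can be checked with a few lines of R. This is a small sketch using the magnitudes of the fitted C coefficients and the 1000 m/s wave speed assumed in the text; the variable names are illustrative.

```r
# Magnitudes of the fitted angular frequencies from the text, in rad/s
omega <- c(early = 1.147, late = 0.2776)
v <- 1000                        # assumed Rayleigh wave speed, m/s

freq_hz    <- omega / (2 * pi)   # f = C / (2*pi)
period_s   <- 1 / freq_hz        # tau = 1 / f
wavelength <- v * period_s       # lambda = v * tau, in meters

# Tabulate the three derived quantities for both subsets
data.frame(freq_hz, period_s, wavelength)
```

Because the wavelength scales directly with the assumed speed, using 5000 m/s instead of 1000 m/s would multiply both wavelengths by five while leaving their ratio unchanged.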
A closer comparison between the background and the quake event in the time dependent analysis yields some interesting results. The amplitude of the oscillations divided by one half of the wavelength of the waves creates a normalized destructive factor for the two types of waves. In essence, this is rise over run. Though the two numbers have no meaning alone, relative to each other one can see that the number representing the background, about 1.2, is much smaller than the number representing the quake event, which is 2.5. This indicates that the slope of the ground changes more during the quake event than during background oscillations, as expected.

Appendix A

R Code:

For Making Plots:

> # grid of acceleration values for the Gaussian overlays
> t <- c(-40000:40000)
> plot(quake$V1,xlab="Time",ylab="Acceleration")
> hist(earlyquake,prob=T)
> lines(t,dnorm(t,mean(earlyquake),sd(earlyquake)))
> hist(latequake,prob=T)
> lines(t,dnorm(t,mean(latequake),sd(latequake)))
> hist(quake$V1,prob=T,xlab="total quake",ylab="Density",main="Histogram of Total Quake")
> lines(t,dnorm(t,mean(quake$V1),sd(quake$V1)))
> lines(t,dnorm(t,mean(earlyquake),sd(earlyquake)),col="Red")
> lines(t,dnorm(t,mean(latequake),sd(latequake)),col="Blue")

T Test for Mean of Total Quake Data:

> t.test(quake)

        One Sample t-test

data:  quake
t = 29.5622, df = 6095, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 1946.787 2223.319
sample estimates:
mean of x
 2085.053

T Tests for Early and Later Phase of Data:

> t.test(latequake)

        One Sample t-test

data:  latequake
t = 5.4426, df = 800, p-value = 6.988e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 1596.717 3398.206
sample estimates:
mean of x
 2497.462

> t.test(earlyquake)

        One Sample t-test

data:  earlyquake
t = 29.071, df = 1499, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 2544.166 2912.341
sample estimates:
mean of x
 2728.253

Computing Mean and Standard Deviation for all 3 sets:

> sd(quake)
[1] 7697.899
> mean(quake)
[1] 2645.606
> sd(earlyquake)
[1] 3634.721
> mean(earlyquake)
[1] 2728.253
> sd(latequake)
[1] 12987.1
> mean(latequake)
[1] 2497.462

Appendix B

# MIDDLE EARTHQUAKE (later subset, 2018 to 2083 seconds)
> # sinusoidal model: Y = A + B*sin(C*(t-D))
> model=function(a,b,c,d,t){a+b*sin(c*(t-d))}
> midquake=quake$V2[2018:2083]
> midtime=c(0:65)
> plot.ts(midquake,ylab="Acceleration (nm/s^2)",main="Earthquake from 2018 to 2083 seconds")
> fit1=nls(midquake ~ model(a,b,c,d,midtime),start=list(a=2500,b=30000,c=-.3,d=-5))
> fit1
Nonlinear regression model
  model: midquake ~ model(a, b, c, d, midtime)
   data: parent.frame()
         a          b          c          d
 2521.4846 27988.9959    -0.2776    -6.5806
 residual sum-of-squares: 1.534e+09

Number of iterations to convergence: 5
Achieved convergence tolerance: 5.408e-06
> lines(model(2521.484582, 27988.995921 , -0.277636, -6.580625, midtime),col="red")
> summary(fit1)

Formula: midquake ~ model(a, b, c, d, midtime)

Parameters:
    Estimate Std. Error  t value Pr(>|t|)
a  2.521e+03  6.347e+02    3.972 0.000188 ***
b  2.799e+04  8.824e+02   31.718  < 2e-16 ***
c -2.776e-01  1.636e-03 -169.702  < 2e-16 ***
d -6.581e+00  2.579e-01  -25.520  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4975 on 62 degrees of freedom

Number of iterations to convergence: 6
Achieved convergence tolerance: 2.66e-06

# EARLY QUAKE (earlier subset, 400 to 449 seconds)
> earlyquake=quake$V2[400:449]
> earlytime=c(0:49)
> plot.ts(earlyquake,ylab="Acceleration (nm/s^2)",xlab="Time since 400 seconds",main="Earthquake from 400 to 449 seconds")
> fit2=nls(earlyquake ~ model(a,b,c,d,earlytime),start=list(a=2700,b=3500,c=-1.2,d=3))
> fit2
Nonlinear regression model
  model: earlyquake ~ model(a, b, c, d, earlytime)
   data: parent.frame()
       a        b        c        d
2666.699 3524.596   -1.147    3.184
 residual sum-of-squares: 1.85e+08

Number of iterations to convergence: 13
Achieved convergence tolerance: 6.616e-06
> lines(model(2666.699, 3524.596, -1.147, 3.184 , earlytime),col="red")
> summary(fit2)

Formula: earlyquake ~ model(a, b, c, d, earlytime)

Parameters:
    Estimate Std. Error  t value Pr(>|t|)
a  2.667e+03  2.841e+02    9.388 2.92e-12 ***
b  3.525e+03  4.017e+02    8.773 2.20e-11 ***
c -1.147e+00  7.865e-03 -145.775  < 2e-16 ***
d  3.184e+00  1.746e-01   18.239  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2005 on 46 degrees of freedom

Number of iterations to convergence: 13
Achieved convergence tolerance: 6.616e-06