Fowler,Keith,Skau
Executive Summary
An earthquake is the release of energy in the form of a wave that travels through the ground.
Earthquake prediction and analysis technology has seen increasing interest as cities grow in size and
density. This analysis is of the 1995 Kobe earthquake in Japan, which rated 6.8 on the Richter
scale. This earthquake was one of the most devastating in Japan's history and the shocks were
detectable around the world. Data measuring the vertical acceleration of waves from this earthquake
was recorded at Tasmania University in Hobart, Australia.
The analysis of this data aims to extract relevant information about the earthquake event and
compare it to background noise by fitting a sinusoidal curve to the time series data with a nonlinear fit.
General information about Rayleigh waves as well as assumptions made in the analysis will be
discussed. The null hypothesis that there is no net acceleration will also be tested.
Results show that there is in fact a difference between background oscillations and the
earthquake event. The analysis also demonstrated that there is an offset in the data that can be attributed
to a calibration error in the seismograph. The 1995 Kobe earthquake showed some interesting anomalies that
are analyzed and explained in this report.
Table of Contents
Executive Summary
Table of Contents
Introduction
Methods and Results
Discussion
Appendix A
Appendix B
Introduction
The 1995 Kobe earthquake was one of the most destructive earthquakes in Japan's history. The
earthquake was centered about 20 km from Kobe, Japan and rated a 6.8 on the Richter scale. The
analysis of this earthquake was done using data taken by Tasmania University in Hobart, Australia. The
vertical acceleration of a seismograph was recorded for fifty-one minutes at one-second intervals. The
raw acceleration from the seismograph was recorded in nanometers per second squared.
The seismograph used records Rayleigh waves, which are among the most destructive earthquake
waves. Although the waves travel horizontally along the surface of the earth, the ground moves
vertically, so Rayleigh waves are treated here as transverse waves. Sources indicate that typical
velocities of Rayleigh waves vary from 1000 m/s to 5000 m/s. For this analysis a velocity of 1000 m/s
is used to infer some physical meanings.
The distance between the epicenter of the earthquake and the seismograph is large enough that its
effect had to be considered. It was assumed that the distance results only in a linear scaling of the
amplitudes of the acceleration. Under this assumption, the distance from the epicenter to the
seismograph is not needed for this analysis. A linear relation between the vertical position, velocity,
and acceleration of the ground was also assumed due to the nature of the data. Under a linear
transformation between position, velocity, and acceleration, the frequencies found for the acceleration
are also the frequencies of the ground position and velocity.
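The claim that position and acceleration share the same frequencies can be checked numerically. The sketch below is not part of the original analysis: it builds a synthetic sinusoidal position signal (the sample count and angular frequency are hypothetical), approximates acceleration by a second finite difference, and compares the dominant Fourier component of the two signals.

```r
# Sketch with a synthetic position signal; no seismograph data is used here.
n     <- 512
t_sec <- 0:(n - 1)                     # one sample per second
omega <- 2 * pi * 16 / n               # hypothetical angular frequency, rad/s
pos   <- sin(omega * t_sec)            # synthetic ground position

# The second finite difference approximates acceleration (sampling step is 1 s).
acc     <- diff(pos, differences = 2)
pos_mid <- pos[2:(n - 1)]              # position samples aligned with acc

# Index of the largest non-constant Fourier component (cycles per window).
dominant_bin <- function(x) {
  spec <- Mod(fft(x - mean(x)))
  which.max(spec[2:(length(x) %/% 2)])
}

bin_pos <- dominant_bin(pos_mid)
bin_acc <- dominant_bin(acc)
bin_pos == bin_acc                     # TRUE when both share a dominant frequency
```

For a pure sinusoid the second difference is just a rescaled copy of the position, which is why only the amplitude changes and not the frequency.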
The initial plotting of the data as a time series and a histogram yielded several intriguing
anomalies. The time series plot seems to indicate that there are two different times of interest. The
beginning of the time series plot appears to be background noise unrelated to the earthquake. The
background is followed by what appears to be an event which is distinguishable by an increased
amplitude. These two different sections of data are analyzed separately to determine if there are any
differences in the acceleration trends. The actual analysis of these two sections consists of a nonlinear
sinusoidal fit. The null hypothesis is that there is no net vertical acceleration. The thirteen-bin
histogram of the accelerations shows a peak that appears to be non-zero, so a test was done to
determine whether the mean acceleration is statistically different from zero.
Methods and Results
Time Independent Analysis
Plots of the data suggest that the mean acceleration is non-zero. Since such a result would
demand an explanation, a formal significance test was performed on the entire data set to determine
whether the mean differs from zero. A one-sample t-test (Appendix A) rejected the null hypothesis that
the true mean is zero, with a p-value well below 0.0001. A 95 percent confidence interval for the true
mean was calculated to be from 1947 nm/s^2 to 2223 nm/s^2. This result provides formal evidence that the
data do in fact have a non-zero mean, and so an explanation must be produced.
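The shape of this test can be sketched in R on simulated data. The offset, noise level, and sample count below are hypothetical stand-ins chosen only to resemble the scale of the record; they are not the measured values, and the actual calls used on the data are in Appendix A.

```r
# Sketch only: simulated values stand in for the seismograph record.
set.seed(1)
n      <- 3060                           # 51 minutes at one sample per second
offset <- 2000                           # hypothetical calibration offset, nm/s^2
accel  <- offset + rnorm(n, mean = 0, sd = 7700)   # hypothetical noise level

res <- t.test(accel, mu = 0)             # H0: the true mean acceleration is zero
res$p.value                              # a small p-value leads to rejecting H0
res$conf.int                             # 95 percent confidence interval for the mean
```

If the confidence interval excludes zero, the data are inconsistent with a zero mean at the 5 percent level.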
A second qualitative observation made from the original data plot was the presence of two separate
phases, one earlier and one later in the series. The hypothesis that the earlier, calmer phase was simply
background noise preceding the actual earthquake recorded during the second phase unfortunately
cannot be definitively tested. However, the two phases can be separated and analyzed independently, at
least supporting the hypothesis that some sort of phase change does take place in the data. Initially, we
plotted a histogram of the original data, displaying the distribution of accelerations throughout the entire
sample. R calculated the mean and standard deviation of this data, and using this information a
Gaussian was superimposed on the plot (Appendix A). From the figure, it can be observed that this
Gaussian fails to adequately describe the data. However, when the data are separated into the earlier and
later sections, the analysis can be repeated with much more satisfactory results; again, see the figures
provided. Now that the data is divided, and there is evidence that this division is sensible, further
analysis can be performed to compare these two regions. Keeping in mind the non-zero mean
discovered earlier from the entire data set, it seems sensible to test the mean of each region, for
one might have a mean of zero and the other might be the cause of the discrepancy. A t-test was
performed on each section (Appendix A). The early set of data, exhibiting smaller amplitudes and
suspected of being composed mostly of background noise, was found to have a non-zero mean with a
p-value below 0.0001. Likewise, the later data, which exhibited larger oscillations, was also found to
have a non-zero mean with a very low p-value. The 95 percent confidence intervals were as
follows: the mean of the earlier data was estimated between 2544 and 2912 nm/s^2, and the mean of the
later data was estimated between 1597 and 3398 nm/s^2. Since the interval for the earlier data lies
entirely within the interval for the later data, we find no evidence that these two sections of data have
different means. However, we do note that the sets have markedly different standard deviations. The
earlier data, with a smaller range of acceleration values, has a standard deviation of about 3,600 nm/s^2,
while the later quake period has a much higher standard deviation of about 13,000 nm/s^2. From this we
conclude that the regions exhibit different oscillatory behavior, and thus the oscillations of each should
be examined separately.
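The comparison above relies on overlapping confidence intervals and on inspecting the two standard deviations. A more direct comparison could use a two-sample test, which is not what the original analysis did. The sketch below applies a Welch t-test to the means and an F test to the variances of simulated segments. The segment sizes, means, and standard deviations are hypothetical, and both tests assume independent, roughly normal samples, which consecutive seismograph readings may not satisfy.

```r
# Sketch only: simulated segments, not the recorded data.
set.seed(2)
early <- rnorm(1500, mean = 2700, sd = 3600)    # hypothetical background phase
late  <- rnorm(1500, mean = 2500, sd = 13000)   # hypothetical quake phase

mean_test <- t.test(early, late)     # Welch two-sample t-test on the means
var_test  <- var.test(early, late)   # F test on the ratio of variances

mean_test$p.value                    # evidence for or against different means
var_test$p.value                     # evidence for or against different spreads
sd(early); sd(late)
```

With spreads this different, the F test rejects equal variances decisively, while the difference in means is small relative to the noise.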
Time Dependent Analysis
As previously stated, the data can be divided into two regions with distinctly different
behavior. The region between the initial measurement and about 1500 seconds is what we have
taken to be the earlier part, or noise, and the region from 1500 seconds to the end of measurement
is what we have taken to be the larger, later part of the earthquake. Although this division may not
accurately represent the actual physical transition in the data, due to the volume of measurements,
choosing the somewhat arbitrary value of 1500 to partition the data does not reduce the validity of our
statistical analysis.
Our first intention was to understand the frequencies of oscillations of the earthquake as we
noticed the oscillatory behavior of the accelerations. We then realized that our data could possibly be
fitted by the following model:
Y = A + B*sin(C*(t-D))
where Y is the measured acceleration in nm/s^2, A is the vertical shift of the accelerations, B is the
amplitude of the curve, C is the angular frequency of oscillations, D is the phase shift, and t is the time
in seconds from the first second of the particular subset of data. At first, we attempted to fit each full
subset (the earlier part and the later part) separately with its own model. However, it was difficult to
find coefficients that fit a whole subset well; parts of the model would fit
accurately, but other parts would be significantly off. To estimate the angular
frequency of each subset, we instead took a smaller window of each part and fitted the model to it,
treating the result as representative of its parent subset. For the earlier part of the
earthquake, we used the window between 400 and 449 seconds, and for the later part of
the earthquake, we used the window between 2018 and 2083 seconds. There was no specific
justification for picking these windows except that they appeared to be the most sinusoidal.
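How such a fit behaves can be sketched on synthetic data. The example below generates a noisy sinusoid from hypothetical "true" coefficients and recovers them with R's nls. The coefficients, noise level, and starting values are assumptions made for illustration only; the calls used on the actual data are listed in Appendix B.

```r
# Sketch only: synthetic window, not the recorded data.
set.seed(3)
model <- function(a, b, c, d, t) a + b * sin(c * (t - d))

t_sec <- 0:65                                          # 66 one-second samples
truth <- list(a = 2500, b = 28000, c = 0.28, d = 2)    # hypothetical coefficients
y     <- model(truth$a, truth$b, truth$c, truth$d, t_sec) +
         rnorm(length(t_sec), sd = 3000)               # hypothetical noise

# nls needs starting values close to the solution, especially for c and d,
# because the sum of squares of a sinusoidal model has many local minima.
fit <- nls(y ~ model(a, b, c, d, t_sec),
           start = list(a = 2000, b = 25000, c = 0.27, d = 1.5))
coef(fit)
```

The sensitivity to starting values is one plausible reason a single sinusoid is hard to fit over a long record: a small error in the angular frequency accumulates into a large phase error over many cycles.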
Using the nonlinear least squares method in R to fit our model, we obtained the following fits
for each of our smaller subsets:
For the earlier subset, we calculated the following coefficients for our model:
A = 2666.699 B = 3524.596 C = -1.147 D = 3.184
And, for the later subset, we calculated the following coefficients for our model:
A = 2521.4846 B = 27988.9959 C = -0.2776 D = -6.5806
These coefficients fit each subset of data closely, as shown by their respective graphs as well as
the summary of each fit (Appendix B). First, we see that the models disagree in coefficient A,
suggesting that each partition has a different offset in acceleration, but as previously shown, this
difference is not statistically significant. The amplitudes of the two models also differ by nearly a
factor of eight, allowing us to attribute much of the damage caused by the earthquake
to the later part of the data. This is supported by the frequency of oscillations and physical wavelength
of the transverse waves. In the later earthquake model, the angular frequency of 0.2776 rad/s (the
negative sign can be dropped because we are only interested in the magnitude) corresponds to a
frequency of 0.0442 Hz (by f=C/2π) and to a period of about 22.63 seconds (τ=1/f). Using
these values and the assumed surface wave velocity of 1000 m/s, the wavelength (λ=vτ) of
these waves was approximately 22,630 meters. In comparison, the frequency for the earlier data was
approximately 0.18255 Hz, the period about 5.478 seconds, and the wavelength about 5,478 meters. Of
course, the coefficient D in each model is relatively unimportant in a physical sense, and is only used in
fitting the model.
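The conversions from angular frequency to frequency, period, and wavelength can be reproduced with a few lines of R. The sketch below uses the fitted values of C reported above and the assumed wave velocity of 1000 m/s; the function name wave_summary is introduced here for illustration only.

```r
# Convert a fitted angular frequency C (rad/s) into frequency, period, and wavelength.
wave_summary <- function(C, v) {
  f      <- abs(C) / (2 * pi)    # frequency in Hz; the sign of C is irrelevant
  period <- 1 / f                # period in seconds
  c(frequency_hz = f, period_s = period, wavelength_m = v * period)
}

v <- 1000                             # assumed Rayleigh wave velocity, m/s
late_wave  <- wave_summary(-0.2776, v)   # later (quake) window
early_wave <- wave_summary(-1.147, v)    # earlier (background) window

round(late_wave, 4)
round(early_wave, 4)
```

Because the wavelength scales directly with the assumed velocity, a velocity at the upper end of the quoted 1000 to 5000 m/s range would give wavelengths five times longer.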
Discussion
All formal t-tests conducted yield the conclusion that the data has a non-zero mean acceleration
value. This has been interpreted as a calibration error in the measuring device, for if the ground did
accelerate as the data suggests, then the 51 minute span over which this was measured would result in a
net displacement of 16 cm. Since this data was measured in Australia, thousands of miles from the
epicenter, such a drastic effect is simply unreasonable.
Furthermore, after separating the data into a low and a high amplitude phase, comparing
histograms with best-fit Gaussian curves, and performing a t-test on each phase, we found that while
the two phases exhibited statistically indistinguishable means, they exhibited drastically different
standard deviations. These results suggest that there is a difference between the two phases, and we
re-assert our hypothesis that the high amplitude data was the actual earthquake, while the preceding
phase was simply background noise.
A closer comparison between the background and the quake event in the time dependent analysis
yields some interesting results. Dividing the amplitude of the oscillations by one half of the wavelength
of the waves gives a normalized destructive factor for the two types of waves. In essence, this is
rise over run. Though the two numbers have no meaning alone, relative to each other one can see that
the number representing the background, about 1.3, is much smaller than the number representing the
quake event, about 2.5. This indicates that the slope of the ground changes more during the quake
event than during background oscillations, as expected.
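The destructive factor can be recomputed from the fitted coefficients. The sketch below uses the amplitudes B and angular frequencies C reported for the two fits together with the assumed 1000 m/s wave velocity; the function name destructive_factor is introduced here for illustration. Since the amplitude is an acceleration in nm/s^2 and the wavelength is in meters, the result is meaningful only as a relative comparison.

```r
# Ratio of fitted amplitude to half the wavelength ("rise over run").
destructive_factor <- function(B, C, v = 1000) {
  wavelength <- v * 2 * pi / abs(C)   # wavelength in meters from angular frequency
  abs(B) / (wavelength / 2)
}

factor_early <- destructive_factor(B = 3524.596,   C = -1.147)    # background window
factor_late  <- destructive_factor(B = 27988.9959, C = -0.2776)   # quake window

factor_early
factor_late
factor_late / factor_early            # how much steeper the quake event is
```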
Appendix A
R Code:
For Making Plots:
> t <- c(-40000:40000)   # grid of acceleration values for drawing the Gaussian curves
> plot(quake$V1,xlab="Time",ylab="Acceleration")
> hist(earlyquake,prob=T)
> lines(t,dnorm(t,mean(earlyquake),sd(earlyquake)))
> hist(latequake,prob=T)
> lines(t,dnorm(t,mean(latequake),sd(latequake)))
> hist(quake$V1,prob=T,xlab="total quake",ylab="Density",main="Histogram of Total Quake")
> lines(t,dnorm(t,mean(quake$V1),sd(quake$V1)))
> lines(t,dnorm(t,mean(earlyquake),sd(earlyquake)),col="Red")
> lines(t,dnorm(t,mean(latequake),sd(latequake)),col="Blue")
T Test for Mean of Total Quake Data:
> t.test(quake)
One Sample t-test
data: quake
t = 29.5622, df = 6095, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
1946.787 2223.319
sample estimates:
mean of x
2085.053
T Tests for Early and Later Phase of Data:
> t.test(latequake)
One Sample t-test
data: latequake
t = 5.4426, df = 800, p-value = 6.988e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
1596.717 3398.206
sample estimates:
mean of x
2497.462
> t.test(earlyquake)
One Sample t-test
data: earlyquake
t = 29.071, df = 1499, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
2544.166 2912.341
sample estimates:
mean of x
2728.253
Computing Mean and Standard Deviation for all 3 sets:
> sd(quake)
[1] 7697.899
> mean(quake)
[1] 2645.606
> sd(earlyquake)
[1] 3634.721
> mean(earlyquake)
[1] 2728.253
> sd(latequake)
[1] 12987.1
> mean(latequake)
[1] 2497.462
Appendix B
# MIDDLE EARTHQUAKE
> model=function(a,b,c,d,t){a+b*sin(c*(t-d))}
> midquake=quake$V2[2018:2083]
> midtime=c(0:65)
> plot.ts(midquake,ylab="Acceleration (nm/s^2)",main="Earthquake from 2018 to 2083 seconds")
> fit1=nls(midquake ~ model(a,b,c,d,midtime),start=list(a=2500,b=30000,c=-.3,d=-5))
> fit1
Nonlinear regression model
model: midquake ~ model(a, b, c, d, midtime)
data: parent.frame()
         a          b          c          d
 2521.4846 27988.9959    -0.2776    -6.5806
residual sum-of-squares: 1.534e+09
Number of iterations to convergence: 5
Achieved convergence tolerance: 5.408e-06
> lines(model(2521.484582, 27988.995921 , -0.277636, -6.580625, midtime),col="red")
> summary(fit1)
Formula: midquake ~ model(a, b, c, d, midtime)
Parameters:
    Estimate Std. Error  t value Pr(>|t|)
a  2.521e+03  6.347e+02    3.972 0.000188 ***
b  2.799e+04  8.824e+02   31.718  < 2e-16 ***
c -2.776e-01  1.636e-03 -169.702  < 2e-16 ***
d -6.581e+00  2.579e-01  -25.520  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4975 on 62 degrees of freedom
Number of iterations to convergence: 6
Achieved convergence tolerance: 2.66e-06
#EARLY QUAKE
> earlyquake=quake$V2[400:449]
> earlytime=c(0:49)
> plot.ts(earlyquake,ylab="Acceleration (nm/s^2)",xlab="Time since 400 seconds",main="Earthquake from 400 to 449 seconds")
> fit2=nls(earlyquake ~ model(a,b,c,d,earlytime),start=list(a=2700,b=3500,c=-1.2,d=3))
> fit2
Nonlinear regression model
model: earlyquake ~ model(a, b, c, d, earlytime)
data: parent.frame()
       a        b        c        d
2666.699 3524.596   -1.147    3.184
residual sum-of-squares: 1.85e+08
Number of iterations to convergence: 13
Achieved convergence tolerance: 6.616e-06
> lines(model(2666.699, 3524.596, -1.147, 3.184 , earlytime),col="red")
> summary(fit2)
Formula: earlyquake ~ model(a, b, c, d, earlytime)
Parameters:
    Estimate Std. Error  t value Pr(>|t|)
a  2.667e+03  2.841e+02    9.388 2.92e-12 ***
b  3.525e+03  4.017e+02    8.773 2.20e-11 ***
c -1.147e+00  7.865e-03 -145.775  < 2e-16 ***
d  3.184e+00  1.746e-01   18.239  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2005 on 46 degrees of freedom
Number of iterations to convergence: 13
Achieved convergence tolerance: 6.616e-06