Data Analysis
Do it yourself!
What to do with your data?
• Report it to professionals (e.g., AAVSO)
– Excellent! A real service to science; don’t
neglect this
• Publish observations (e.g., JAAVSO)
• Analyze it – yourself!
But …
• I’m not a mathematician
– Let the computer do the math
• I’m not a programmer
– Get programs from the net (often free)
• I don’t know how to use or interpret them
– Neither do the pros!
– Practice, practice, practice …
Time Series Analysis
• A time series is a set of data pairs
$(t_a, x_a), \quad a = 1, 2, 3, \dots, N$
• t is the time, x is the data value
• Usually, times are assumed error-free
• Data = Signal + Error
$x(t) = f(t) + \epsilon$
• x can be anything, e.g., the brightness of a variable star,
the time of an eclipse, or eggs/day from a laying hen
Basic properties of data x
Actual
• Mean = $\mu$ = expected value
• Standard deviation = $\sigma$ = expected rms difference from mean
Estimated
• Average = estimated $\mu$
• Sample standard deviation = estimated $\sigma$
Average and sample standard deviation
• Average
$\bar{x} = \frac{1}{N} \sum_{a=1}^{N} x_a$
• Sample standard deviation
$s = \sqrt{\frac{1}{N-1} \sum_{a=1}^{N} (x_a - \bar{x})^2}$
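A minimal sketch of the two formulas above in standard-library Python. The magnitudes are hypothetical, for illustration only; note the N - 1 divisor in the sample standard deviation.

```python
import math

def average(x):
    """Average x-bar = (1/N) * sum(x): the estimate of the mean mu."""
    return sum(x) / len(x)

def sample_std(x):
    """Sample standard deviation s (divides by N - 1): the estimate of sigma."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two data values")
    xbar = average(x)
    return math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))

# Hypothetical magnitudes of a star, for illustration only
mags = [10.0, 10.2, 9.8, 10.1, 9.9]
print(average(mags), sample_std(mags))
```

Python's `statistics.mean` and `statistics.stdev` compute the same two quantities.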
Method #1: world’s best
• Eye + Brain: Look at the data!
• Plot x as a function of t: Explore!
• Scientific name:
Visual Inspection
• World’s best – but not infallible
• Programs:
– TS (http://www.aavso.org)
– MAGPLOT (http://www.aavso.org)
Method #2: Fourier Analysis
• Period analysis and curve-fitting
• Powerful, well-understood, popular
• Programs:
– TS (http://www.aavso.org)
– PerAnSo (http://www.peranso.com)
Method #3: Wavelet Analysis
• Time-frequency analysis
• Old methods bad for uneven time sampling, new method (WWZ) good
• Programs:
– WWZ (http://www.aavso.org)
– WinWWZ (http://www.aavso.org)
Visual Inspection
Let’s take a look
Fourier Analysis
Fourier analysis for period search
• Match the data to sine/cosine waves
$f(t) = c_0 + c_1 \cos(2\pi\nu t) + c_2 \sin(2\pi\nu t)$
• $\nu$ = frequency
• Period = $P = 1/\nu$
• Amplitude = A = size of fluctuation
• Obvious choice is period; mathematically sound choice is frequency
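Once the frequency $\nu$ is fixed, the model is linear in $c_0, c_1, c_2$, so ordinary least squares solves it. A minimal sketch, assuming one trial frequency and equally weighted data; the helper names (`fit_sinusoid`, `solve3`, `det3`) are hypothetical, and the amplitude is taken as $A = \sqrt{c_1^2 + c_2^2}$.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a @ coef = b by Cramer's rule (fine for this small case)."""
    d = det3(a)
    return [det3([row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]) / d
            for k in range(3)]

def fit_sinusoid(t, x, nu):
    """Least-squares fit of x(t) = c0 + c1*cos(2*pi*nu*t) + c2*sin(2*pi*nu*t).

    Returns (c0, c1, c2, A) with amplitude A = sqrt(c1**2 + c2**2).
    """
    rows = [(1.0, math.cos(2 * math.pi * nu * ti), math.sin(2 * math.pi * nu * ti))
            for ti in t]
    # Normal equations (B^T B) coef = B^T x, with one row of B per observation
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    btx = [sum(r[i] * xi for r, xi in zip(rows, x)) for i in range(3)]
    c0, c1, c2 = solve3(btb, btx)
    return c0, c1, c2, math.hypot(c1, c2)

# Hypothetical, unevenly spaced, noise-free data at nu = 0.1 (period P = 10)
t = [0.7 * k + 0.1 * (k % 3) for k in range(40)]
x = [5 + 2 * math.cos(2 * math.pi * 0.1 * ti) + math.sin(2 * math.pi * 0.1 * ti)
     for ti in t]
print(fit_sinusoid(t, x, 0.1))
```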
Null Hypothesis (important!)
• Null hypothesis: no time variation at all
• So $f(t) = \mu$ = constant
• So $x_a = \mu + \epsilon_a$
• Quite important! Often neglected. Even the pros often forget this.
Is it real?
• Fit produces a test statistic under the null hypothesis
• It is usually "$\chi^2$ per degree of freedom" ($\chi^2/d$, with d degrees of freedom)
• Linear fit (1 degree of freedom): $\chi^2 > 4$ is significant (not just by accident) at 95% confidence; the exact threshold is 3.84
• 95% confidence means 5% false-alarm probability
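A sketch of the significance test for a linear trend, assuming every data value has the same, known measurement error sigma (an assumption; real data often need per-point weights). The test statistic is the drop in chi-square from the constant model (the null hypothesis) to a straight line, and its false-alarm probability for one degree of freedom is erfc(sqrt(chi2 / 2)). The function name is hypothetical.

```python
import math

def linear_trend_test(t, x, sigma):
    """Compare a straight-line fit with the null hypothesis x_a = mu + eps_a.

    Assumes every x_a has the same, known error sigma.
    Returns (chi2, false_alarm_probability) for 1 extra degree of freedom.
    """
    n = len(t)
    tbar = sum(t) / n
    xbar = sum(x) / n
    stt = sum((ti - tbar) ** 2 for ti in t)
    stx = sum((ti - tbar) * (xi - xbar) for ti, xi in zip(t, x))
    slope = stx / stt
    # Drop in the sum of squared residuals (constant -> constant + slope),
    # scaled by sigma**2: chi-square with 1 degree of freedom under the null
    chi2 = slope * stx / sigma ** 2
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    fap = math.erfc(math.sqrt(chi2 / 2))
    return chi2, fap

# Hypothetical data: a slow rise of 0.1 per time unit, errors sigma = 0.1
chi2, fap = linear_trend_test([0, 1, 2, 3, 4], [0.0, 0.1, 0.2, 0.3, 0.4], 0.1)
print(f"chi2 = {chi2:.1f}, false-alarm probability = {fap:.4f}, significant: {fap < 0.05}")
```

A false-alarm probability below 0.05 is the "95% confidence" criterion; it corresponds to chi2 above about 3.84.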
Meaning of significance
• Significance does not mean the signal is
linear, sinusoidal, periodic, etc.
• It only means the null hypothesis is
incorrect, i.e., the signal is not constant
• Important!!!
Pre-whitening
• If you find a significant fit, then subtract the
estimated signal, leaving residuals
• Analyze the residuals for more structure
• This process is called pre-whitening
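Pre-whitening in its simplest form: this sketch takes a straight line as the estimated signal (a choice made for brevity; for a periodic signal one would subtract the fitted sinusoid instead). The least-squares residuals have zero mean and no remaining linear trend, and they are what gets searched again for more structure. The function name and data are hypothetical.

```python
import math

def prewhiten_linear(t, x):
    """Fit x(t) = intercept + slope * t by least squares; return the residuals.

    The residuals are the data with the estimated signal subtracted.
    """
    n = len(t)
    tbar = sum(t) / n
    xbar = sum(x) / n
    slope = (sum((ti - tbar) * (xi - xbar) for ti, xi in zip(t, x))
             / sum((ti - tbar) ** 2 for ti in t))
    intercept = xbar - slope * tbar
    residuals = [xi - (intercept + slope * ti) for ti, xi in zip(t, x)]
    return slope, intercept, residuals

# Hypothetical data: a trend plus a small oscillation hiding underneath it
t = [0.0, 1.3, 2.1, 3.8, 4.4, 6.0, 7.7, 8.2]
x = [3 + 0.5 * ti + 0.2 * math.sin(2 * math.pi * 0.3 * ti) for ti in t]
slope, intercept, resid = prewhiten_linear(t, x)
print(slope, intercept, resid)  # resid would now be analyzed for periodicity
```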
How to choose frequency?
• Test all reasonable values, get a “strength of
fit” for each. Common is “chi-square per
degree of freedom” (but there are many)
• Plot frequency vs. fit: this is the Fourier transform (aka periodogram, aka power spectrum)
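A brute-force frequency scan, sketched under stated assumptions: the "strength of fit" used here is the drop in the sum of squared residuals when a constant plus sinusoid is fitted instead of a constant alone (divide by sigma squared to read it as a chi-square). The grid range and step are arbitrary choices for illustration, and the grid must stay above zero frequency. Function names and data are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a @ coef = b by Cramer's rule."""
    d = det3(a)
    return [det3([row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]) / d
            for k in range(3)]

def fit_strength(t, x, nu):
    """Sum of squares explained by c0 + c1 cos + c2 sin beyond a constant alone."""
    rows = [(1.0, math.cos(2 * math.pi * nu * ti), math.sin(2 * math.pi * nu * ti))
            for ti in t]
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    btx = [sum(r[i] * xi for r, xi in zip(rows, x)) for i in range(3)]
    c = solve3(btb, btx)
    xbar = sum(x) / len(x)
    ss_constant = sum((xi - xbar) ** 2 for xi in x)  # null hypothesis
    ss_sinusoid = sum((xi - sum(ci * ri for ci, ri in zip(c, r))) ** 2
                      for r, xi in zip(rows, x))
    return ss_constant - ss_sinusoid

def periodogram(t, x, freqs):
    """Strength of fit at every trial frequency (all frequencies > 0)."""
    return [fit_strength(t, x, nu) for nu in freqs]

# Hypothetical uneven sampling of a noise-free signal at nu = 0.25
t = [k + 0.3 * math.sin(k * k) for k in range(60)]
x = [10 + 2 * math.sin(2 * math.pi * 0.25 * ti + 0.5) for ti in t]
freqs = [0.01 + 0.001 * k for k in range(490)]
power = periodogram(t, x, freqs)
best = freqs[power.index(max(power))]
print(f"best frequency {best:.3f}, period {1 / best:.2f}")
```

A common rule of thumb is to make the grid step a small fraction of 1/(total time span), so that no peak falls between grid points.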
Fourier decomposition
Any periodic function of period P (frequency $\nu = 1/P$) can be expressed as a Fourier series:
$F(t) = a + b_1 \sin(2\pi\nu t) + c_1 \cos(2\pi\nu t)$
$\quad + b_2 \sin(4\pi\nu t) + c_2 \cos(4\pi\nu t)$
$\quad + b_3 \sin(6\pi\nu t) + c_3 \cos(6\pi\nu t) + \dots$
$\quad + b_n \sin(2\pi n\nu t) + c_n \cos(2\pi n\nu t) + \dots$
Fundamental + harmonics
For a pure sinusoid, expect response at frequency $\nu$.
For a general periodic signal at a given frequency, expect a fundamental component at $\nu$, as well as harmonics at frequencies $2\nu$, $3\nu$, $4\nu$, etc.
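To see the harmonics concretely, this sketch estimates the Fourier-series coefficients $b_n$, $c_n$ of a hypothetical square wave by numerical integration over one period. For a square wave the exact values are $b_n = 4/(\pi n)$ for odd n and zero for even n, with all $c_n = 0$: a fundamental at $\nu$ plus harmonics at $3\nu$, $5\nu$, and so on. The sample count is an arbitrary choice.

```python
import math

def fourier_coefficients(f, period, n_max, samples=10000):
    """Estimate b_n, c_n of F(t) = a + sum_n [b_n sin(2 pi n nu t) + c_n cos(2 pi n nu t)],
    nu = 1/period, by midpoint-rule integration over one period."""
    nu = 1.0 / period
    dt = period / samples
    ts = [(k + 0.5) * dt for k in range(samples)]
    values = [f(ti) for ti in ts]
    coeffs = []
    for n in range(1, n_max + 1):
        w = 2 * math.pi * n * nu  # harmonic n sits at frequency n * nu
        b = 2 / period * dt * sum(v * math.sin(w * ti) for v, ti in zip(values, ts))
        c = 2 / period * dt * sum(v * math.cos(w * ti) for v, ti in zip(values, ts))
        coeffs.append((n, b, c))
    return coeffs

def square_wave(t, period=2.0):
    """Hypothetical test signal: +1 for the first half of each cycle, -1 for the second."""
    return 1.0 if (t % period) < period / 2 else -1.0

for n, b, c in fourier_coefficients(square_wave, 2.0, 6):
    print(f"{n} * nu: b = {b:+.4f}, c = {c:+.4f}")
```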
Lots of Fourier methods
• FFT: fast Fourier transform
– Not just fast: it's wicked fast
– Requires even time spacing
– Requires N = integer power of 2 (classic radix-2 FFT)
– Beware!
• DFT: discrete Fourier transform
– Applies to any time sampling, but gives incorrect results for highly uneven sampling (as in astronomy!)
– Beware!
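A sketch of the classic DFT amplitude estimate, 2/N times the magnitude of the sum of (x_a minus the average) times exp(-2 pi i nu t_a), which can be evaluated for any set of times. With evenly spaced data covering whole cycles it recovers the true amplitude; with the hypothetical clumped sampling below (observing runs that always catch the same part of the cycle) the same signal gives an amplitude far from the true value of 3. The function name and sampling pattern are illustrative assumptions.

```python
import cmath
import math

def dft_amplitude(t, x, nu):
    """Classic DFT amplitude estimate at frequency nu for arbitrary times t."""
    n = len(x)
    xbar = sum(x) / n
    s = sum((xi - xbar) * cmath.exp(-2j * math.pi * nu * ti) for ti, xi in zip(t, x))
    return 2.0 * abs(s) / n

# Evenly spaced: 100 points covering 5 whole cycles of nu = 0.5, amplitude 3
t_even = [0.1 * k for k in range(100)]
x_even = [3 * math.cos(2 * math.pi * 0.5 * ti) for ti in t_even]

# Hypothetical uneven sampling: short runs of 10 points, one run every 4 time units
t_uneven = [0.05 * k for k in range(400) if k % 80 < 10]
x_uneven = [3 * math.cos(2 * math.pi * 0.5 * ti) for ti in t_uneven]

print(dft_amplitude(t_even, x_even, 0.5), dft_amplitude(t_uneven, x_uneven, 0.5))
```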
Problems from uneven
time sampling
• Aliasing: false peaks, often from a periodic
data density
• Aliases at $\nu = \nu_{\mathrm{signal}} \pm n\,\nu_{\mathrm{data}}$, for integer n
• Common in astronomy: data density has a period P = 1 yr = 365.2422 d, so $\nu_{\mathrm{data}} \approx 0.002738$ cycles/day
• Solution: pre-whitening
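The alias rule is simple arithmetic; this sketch lists candidate aliases of a signal frequency for a yearly data-density cycle. Negative results are folded to positive, since for real-valued data a frequency -nu cannot be told apart from +nu. The function name, the default n_max, and the 100-day example star are hypothetical.

```python
NU_DATA_YEAR = 1 / 365.2422  # cycles per day, from a 1-year data-density period

def alias_frequencies(nu_signal, nu_data=NU_DATA_YEAR, n_max=2):
    """Alias frequencies nu_signal +/- n * nu_data for n = 1..n_max, folded positive."""
    aliases = []
    for n in range(1, n_max + 1):
        for sign in (1, -1):
            aliases.append(abs(nu_signal + sign * n * nu_data))
    return sorted(aliases)

# Hypothetical star with a 100-day period (nu = 0.01 cycles/day)
for nu in alias_frequencies(0.01):
    print(f"alias at {nu:.6f} c/d, period {1 / nu:.1f} d")
```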
Aliasing
[Figure: Aliasing: UZ Hya]
Problems from uneven
time sampling
• Miscalculation of frequency (slightly) and amplitude (greatly); this sabotages pre-whitening
Solution: better Fourier methods
(for astronomy)
• Lomb-Scargle modified periodogram
– Improvement over FFT, DFT
• CLEAN spectrum
– Bigger improvement
• DCDFT: date-compensated discrete Fourier
transform (this is the one you want)
• CLEANEST spectrum: DCDFT-like for
multiple frequencies
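A sketch of the Lomb-Scargle periodogram from the list above, using the standard formula: a time offset tau defined by tan(2 omega tau) = sum sin(2 omega t) / sum cos(2 omega t) makes the sine and cosine terms orthogonal, and the power is normalized by the sample variance. It is not the DCDFT: it subtracts the data average once rather than fitting a constant at every frequency. The test data and frequency grid are hypothetical.

```python
import math

def lomb_scargle(t, x, nu):
    """Lomb-Scargle power at frequency nu (> 0), normalized by the sample variance."""
    w = 2 * math.pi * nu
    n = len(x)
    xbar = sum(x) / n
    var = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    # Offset tau chosen so the sine and cosine terms are orthogonal
    tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                     sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
    c = [math.cos(w * (ti - tau)) for ti in t]
    s = [math.sin(w * (ti - tau)) for ti in t]
    yc = sum((xi - xbar) * ci for xi, ci in zip(x, c))
    ys = sum((xi - xbar) * si for xi, si in zip(x, s))
    return (yc ** 2 / sum(ci * ci for ci in c)
            + ys ** 2 / sum(si * si for si in s)) / (2 * var)

# Hypothetical uneven sampling of a noise-free signal at nu = 0.25
t = [k + 0.3 * math.sin(k * k) for k in range(60)]
x = [10 + 2 * math.sin(2 * math.pi * 0.25 * ti + 0.5) for ti in t]
freqs = [0.01 + 0.001 * k for k in range(490)]
power = [lomb_scargle(t, x, nu) for nu in freqs]
best = freqs[power.index(max(power))]
print(f"best frequency {best:.3f}")
```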
DCDFT
• Much better estimates of period, amplitude
Let’s take a look
• Peranso (uses DCDFT and CLEANEST)
• Available from CBA Belgium
– http://www.peranso.com
[Figure: Fourier transform (CLEANEST) of TU Cas]
Wavelet Analysis
Wavelets
• Fit sine/cosine-like functions of brief
duration
• Shift them through time
• Gives a time-frequency analysis
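The idea of fitting brief sine/cosine-like functions and shifting them through time can be sketched as a weighted least-squares sinusoid fit with a Gaussian window centred at time tau. This is a simplified stand-in, NOT the WWZ statistic: the window constant c = 0.0125 is an assumed value, and the helper names and test signal (whose frequency jumps from 0.1 to 0.2 at t = 100) are hypothetical. Because the window width scales as 1/nu, it always spans a few cycles.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a @ coef = b by Cramer's rule."""
    d = det3(a)
    return [det3([row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]) / d
            for k in range(3)]

def windowed_amplitude(t, x, nu, tau, c=0.0125):
    """Amplitude of a weighted fit of c0 + c1 cos + c2 sin at frequency nu,
    with Gaussian weights exp(-c * omega**2 * (t - tau)**2) centred on tau.
    Simplified sketch, not the WWZ statistic itself."""
    omega = 2 * math.pi * nu
    w = [math.exp(-c * omega ** 2 * (ti - tau) ** 2) for ti in t]
    rows = [(1.0, math.cos(omega * ti), math.sin(omega * ti)) for ti in t]
    btb = [[sum(wi * r[i] * r[j] for wi, r in zip(w, rows)) for j in range(3)]
           for i in range(3)]
    btx = [sum(wi * r[i] * xi for wi, r, xi in zip(w, rows, x)) for i in range(3)]
    _, c1, c2 = solve3(btb, btx)
    return math.hypot(c1, c2)

# Hypothetical uneven data whose frequency jumps from 0.1 to 0.2 at t = 100
t = [0.5 * k + 0.1 * math.sin(k * k) for k in range(400)]
x = [math.sin(2 * math.pi * (0.1 if ti < 100 else 0.2) * ti) for ti in t]
for tau in (50, 150):
    print(tau, [round(windowed_amplitude(t, x, nu, tau), 2) for nu in (0.1, 0.2)])
```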
Problems
• Same old same old: uneven time spacing, especially variable data density, invalidates the results
• But: even worse than Fourier
• Essentially useless for most astronomical
data
Wavelet methods
• DWT: discrete wavelet transform
– Just not right for unevenly sampled data
(astronomy!)
• Solution: WWZ =
weighted wavelet Z-transform
Let’s take a look
Data Analysis
• Do it yourself
• Use your eyes and brain
• Healthy skepticism
• Enjoy!