Big Data Sources - Cedarville University

Big Data
Steven Gollmer
Cedarville University
Working with Large Data
• Accessing data
• Collection and calibration assumptions
• Selecting appropriate parameters
• Formatting
• Calculation
• Testing hypothesis
Hipparcos Space Astrometry
• Main Page
– http://www.rssd.esa.int/index.php?project=HIPPARCOS
• Data Catalogues
– http://www.rssd.esa.int/index.php?project=HIPPARCOS&page=Overview
– http://cdsweb.u-strasbg.fr/
• Software
– Desktop - http://www.rssd.esa.int/index.php?project=HIPPARCOS&page=Celestia2000
– Search tool - http://www.rssd.esa.int/index.php?project=HIPPARCOS&page=multisearch2
• Data Format
– Flexible Image Transport System (FITS) http://fits.gsfc.nasa.gov/
Sloan Digital Sky Survey
• Main Page
– http://www.sdss.org/
• Data
– 9th Data Release - http://www.sdss3.org/dr9/
– Archive Server - http://dr9.sdss3.org/
• Software
– IDL - http://www.sdss3.org/dr9/software/
Weather Data
• NOAA National Climatic Data Center
– http://www.ncdc.noaa.gov/
– Popular Data - http://www.ncdc.noaa.gov/mostpopular-data
• Environmental Modeling Center
– http://www.emc.ncep.noaa.gov/
TERRA/AQUA
• http://terra.nasa.gov
• http://aqua.nasa.gov
• Data
– LARC DAAC - http://eosweb.larc.nasa.gov/
– LAADS Web http://ladsweb.nascom.nasa.gov/index.html
• Format
– NetCDF http://www.unidata.ucar.edu/software/netcdf/
– HDF - http://www.hdfgroup.org/
Other Topics of Interest
• Topics of Interest
– Extra-Solar Planets
– Asteroid Mapping and Near Earth Detection
– Earthquakes
• Agencies and Products
– NASA - http://www.nasa.gov/home/index.html
– ESA - http://www.esa.int/ESA
– USGS - http://www.usgs.gov/
– GOES - http://www.goes.noaa.gov/
– Paleoclimatology - http://www.ncdc.noaa.gov/paleo/pubs/pcn/pcnproxy.html
Hypothesis Testing
• P-value
– Probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true.
– Usually reject the null hypothesis if p < 0.05 or 0.01 (5% or 1%)
– May have more stringent criteria for rejection.
• T-test
– Assume a normal distribution
– One-sample test: $t = \frac{\bar{x} - \mu_0}{S / \sqrt{n}}$, where $\bar{x}$ is the sample mean
– Two-sample test: $t = \frac{M_x - M_y}{\sqrt{\frac{S_x^2}{n_x} + \frac{S_y^2}{n_y}}}$
– S – Estimate of standard deviation
– M – Estimate of the mean
– n – Number of samples
– Check significance using T distribution table
• Compare t value and degrees of freedom
– 1 sample df = $n - 1$
– 2 sample df = $n_x + n_y - 2$
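The two t statistics above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a full test procedure: the function names `one_sample_t` and `two_sample_t` and the two sample groups are invented for this example, and the resulting t value still has to be compared against a T distribution table for the returned degrees of freedom.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test against the hypothesized mean mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (mean - mu0) / (s / math.sqrt(n)), n - 1


def two_sample_t(x, y):
    """Return (t, df) using the two-sample formula and df = n_x + n_y - 2."""
    nx, ny = len(x), len(y)
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # S_x^2 and S_y^2
    t = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return t, nx + ny - 2


# Hypothetical measurements, invented purely for illustration.
group_a = [5.1, 4.9, 5.6, 5.3, 5.0]
group_b = [4.4, 4.8, 4.5, 4.9, 4.6]

t1, df1 = one_sample_t(group_a, mu0=5.0)
t2, df2 = two_sample_t(group_a, group_b)
print(f"one-sample: t = {t1:.3f}, df = {df1}")
print(f"two-sample: t = {t2:.3f}, df = {df2}")
```

`statistics.stdev` and `statistics.variance` use the n − 1 divisor, which matches the "estimate of standard deviation" meaning of S.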
Example
• Data
– 2.3, 4.2, 3.6, 3.1, 2.8, 3.9
• Hypothesis
– Data is from a distribution with mean μ = 2.5
• Statistics
– x̄ = 3.317
– S = 0.7139
– df = 5
• Result
– t = 2.80
• 2 tail rejection
– p = 0.05 is 2.571
– p = 0.02 is 3.365
– Since 2.571 < 2.80 < 3.365, reject the null hypothesis at the 5% level but not at the 2% level
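The arithmetic in this example can be checked with a short Python sketch. It uses the six data values and the hypothesized mean of 2.5 from the slide; the critical values are simply copied from the slide rather than computed, and the variable names are chosen here for illustration.

```python
import math
import statistics

data = [2.3, 4.2, 3.6, 3.1, 2.8, 3.9]
mu0 = 2.5  # hypothesized mean from the slide

n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
df = n - 1
t = (x_bar - mu0) / (s / math.sqrt(n))

# Two-tailed critical values for df = 5, as quoted on the slide.
critical = {0.05: 2.571, 0.02: 3.365}

print(f"mean = {x_bar:.3f}, S = {s:.4f}, df = {df}, t = {t:.2f}")
for alpha, t_crit in critical.items():
    verdict = "reject" if abs(t) > t_crit else "fail to reject"
    print(f"alpha = {alpha}: {verdict} the null hypothesis")
```

The computed t falls between the two critical values, so the null hypothesis is rejected at the 5% level but not at the 2% level.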
Z-Value
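• Assume a normal random variable
– $x \sim N(\mu, \sigma^2)$
– μ – mean
– σ – standard deviation
– Density: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• Z-value
– $z = \frac{x - \mu}{\sigma}$
– $z \sim N(0, 1)$
• If the number of samples is large, a z-test can be used for the one-sample test instead of a t-test.
– $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-u^2}\,du$
– One tail: $P(Z \le z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$, so the upper-tail p-value is $p = 1 - P(Z \le z)$
– Two tail: $P(|Z| \le |z|) = \mathrm{erf}(|z|/\sqrt{2})$, so the p-value is $p = 1 - \mathrm{erf}(|z|/\sqrt{2})$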
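As a rough sketch of how the error function turns a z-value into a probability, the following Python uses `math.erf` from the standard library. Note that erf(z/√2) is the probability of falling within ±z of the mean, so the rejection p-value is taken here as its complement; the function names are illustrative choices, not part of the original slides.

```python
import math


def normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_one_tail(z):
    """Upper-tail probability P(Z >= z)."""
    return 1.0 - normal_cdf(z)


def p_two_tail(z):
    """Probability of a value at least |z| from the mean, in either direction."""
    return 1.0 - math.erf(abs(z) / math.sqrt(2.0))


for z in (1.645, 1.96, 2.576):
    print(f"z = {z}: one-tail p = {p_one_tail(z):.4f}, "
          f"two-tail p = {p_two_tail(z):.4f}")
```

The familiar thresholds fall out of this: z = 1.96 gives a two-tail p of about 0.05, and z = 1.645 gives a one-tail p of about 0.05.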