Big Data
Steven Gollmer
Cedarville University

Working with Large Data
• Accessing data
• Collection and calibration assumptions
• Selecting appropriate parameters
• Formatting
• Calculation
• Testing hypothesis

Hipparcos Space Astrometry
• Main Page
  – http://www.rssd.esa.int/index.php?project=HIPPARCOS
• Data Catalogues
  – http://www.rssd.esa.int/index.php?project=HIPPARCOS&page=Overview
  – http://cdsweb.u-strasbg.fr/
• Software
  – Desktop - http://www.rssd.esa.int/index.php?project=HIPPARCOS&page=Celestia2000
  – Search tool - http://www.rssd.esa.int/index.php?project=HIPPARCOS&page=multisearch2
• Data Format
  – Flexible Image Transport System (FITS) - http://fits.gsfc.nasa.gov/

Sloan Digital Sky Survey
• Main Page
  – http://www.sdss.org/
• Data
  – 9th Data Release - http://www.sdss3.org/dr9/
  – Archive Server - http://dr9.sdss3.org/
• Software
  – IDL - http://www.sdss3.org/dr9/software/

Weather Data
• NOAA National Climatic Data Center
  – http://www.ncdc.noaa.gov/
  – Popular Data - http://www.ncdc.noaa.gov/mostpopular-data
• Environmental Modeling Center
  – http://www.emc.ncep.noaa.gov/

TERRA/AQUA
• http://terra.nasa.gov
• http://aqua.nasa.gov
• Data
  – LARC DAAC - http://eosweb.larc.nasa.gov/
  – LAADS Web - http://ladsweb.nascom.nasa.gov/index.html
• Format
  – NetCDF - http://www.unidata.ucar.edu/software/netcdf/
  – HDF - http://www.hdfgroup.org/

Other Topics of Interest
• Topics of Interest
  – Extra-Solar Planets
  – Asteroid Mapping and Near Earth Detection
  – Earthquakes
• Agencies and Products
  – NASA - http://www.nasa.gov/home/index.html
  – ESA - http://www.esa.int/ESA
  – USGS - http://www.usgs.gov/
  – GOES - http://www.goes.noaa.gov/
  – Paleoclimatology - http://www.ncdc.noaa.gov/paleo/pubs/pcn/pcnproxy.html

Hypothesis Testing
• P-value
  – Probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true.
  – Usually reject the null hypothesis if p < 0.05 or 0.01 (5% or 1%).
  – May have more stringent criteria for rejection.
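• T-test
  – Assume a normal distribution
  – One-sample test: $t = \dfrac{\bar{x} - \mu_0}{S / \sqrt{n}}$
  – Two-sample test: $t = \dfrac{M_x - M_y}{\sqrt{\dfrac{S_x^2}{n_x} + \dfrac{S_y^2}{n_y}}}$
  – S: estimate of the standard deviation
  – M: estimate of the mean
  – n: number of samples
  – Check significance using a T distribution table
• Compare t value and degrees of freedom
  – One-sample: df = n - 1
  – Two-sample: df = n1 + n2 - 2

Example
• Data
  – 2.3, 4.2, 3.6, 3.1, 2.8, 3.9
• Hypothesis
  – Data is from a distribution with mean $\mu = 2.5$
• Statistics
  – $\bar{x} = 3.317$
  – S = 0.7139
  – df = 5
• Result
  – T = 2.80
• Two-tail rejection (critical t values for df = 5)
  – p = 0.05 is 2.571
  – p = 0.02 is 3.365
  – Since 2.571 < 2.80 < 3.365, the hypothesis is rejected at p = 0.05 but not at p = 0.02.

Z-Value
• Assume a normal random variable
  – $x \sim N(\mu, \sigma^2)$
  – $\mu$: mean
  – $\sigma$: standard deviation
  – $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• Z-Value
  – $z = \dfrac{x - \mu}{\sigma}$
  – $z \sim N(0, 1)$
• If the number of samples is large, a z-test can be used for the one-sample test instead of a t-test.
  – $\operatorname{erf}(x) = \dfrac{2}{\sqrt{\pi}} \int_0^x e^{-u^2} \, du$
  – One Tail: $p = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$
  – Two Tail: $p = \operatorname{erf}(z/\sqrt{2})$
  – Here p is the probability of lying below z (one tail) or within ±z (two tail); the probability in the tail(s) beyond z is 1 - p.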
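
The t-test example and the erf formulas above can be checked numerically. What follows is a minimal sketch in Python using only the standard library. The function names (`one_sample_t`, `two_sample_t`, `normal_cdf`, `within_probability`) are made up for this illustration and do not come from the slides. The critical values 2.571 and 3.365 are the table values quoted in the example, not computed here.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """One-sample t statistic: t = (mean - mu0) / (S / sqrt(n)); df = n - 1."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


def two_sample_t(x, y):
    """Two-sample t statistic: (Mx - My) / sqrt(Sx^2/nx + Sy^2/ny); df = nx + ny - 2."""
    n_x, n_y = len(x), len(y)
    m_x, m_y = statistics.mean(x), statistics.mean(y)
    v_x, v_y = statistics.variance(x), statistics.variance(y)  # S^2 estimates
    t = (m_x - m_y) / math.sqrt(v_x / n_x + v_y / n_y)
    return t, n_x + n_y - 2


def normal_cdf(z):
    """One tail: probability that a standard normal variable lies below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def within_probability(z):
    """Two tail: probability that a standard normal variable lies within +/- z."""
    return math.erf(z / math.sqrt(2.0))


# Data and hypothesized mean from the Example slide.
data = [2.3, 4.2, 3.6, 3.1, 2.8, 3.9]
t, df = one_sample_t(data, mu0=2.5)
print(f"t = {t:.2f}, df = {df}")  # the slide reports T = 2.80 with df = 5

# Two-tail critical t values for df = 5, copied from the slide's table lookup.
critical = {0.05: 2.571, 0.02: 3.365}
for p, t_crit in critical.items():
    decision = "reject" if abs(t) > t_crit else "do not reject"
    print(f"p = {p}: critical t = {t_crit}, {decision} the hypothesis")

# Z-value check: about 95% of a standard normal lies within +/- 1.96.
print(f"within +/-1.96: {within_probability(1.96):.4f}")
```

The critical values are kept as a small lookup table because the standard library has no t distribution. With SciPy installed, `scipy.stats.t.ppf` could replace the table, but that dependency is not assumed here.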