Project Summary for 2011 SULI Program at SLAC
July 1, 2011
Ashley Marie Parker
Mentor: Deborah Bard
1) Does “blended” mean two or more superimposed objects that have different redshifts?
2) A blended object has “multiple spectral peaks”? Don’t ordinary stars have many peaks?
3) Is the learning done on blended data or only on unblended data?
4) How much spectral data does SDSS have?
The goal of this project is to use Sloan Digital Sky Survey (SDSS) data together with machine
learning techniques to increase the reliability of photometric redshift analysis for
“blended” galaxies. This is, to our knowledge, an original research project. It is intended to
yield a photometric redshift determination method for “blended” galaxies that can be
implemented in the next generation of sky survey databases, namely that of the Large Synoptic
Survey Telescope (LSST).
The SDSS's newest data release, DR8, covers approximately one third of the sky and
includes all photometric measurements that will be taken with the SDSS imaging camera. A
photometric redshift is measured using photometry: the light from an object is observed
through several filters, and the overall magnitude in each filter is used to determine the
redshift. Photometry is much less time consuming than the alternative, spectroscopic
redshift determination. Measuring a redshift spectroscopically requires collecting
significantly more light from the object, so that the full spectrum can be seen rather than just
the intensity in each filter. This makes the spectroscopic method far more accurate, but also
more time consuming and therefore less practical for a large-scale data set. For this project,
only galaxies whose redshift has been determined by both photometric and
spectroscopic methods will be analyzed, so that an accurate redshift measurement exists
against which to test results from new machine learning techniques.
A “blended” object is defined by the SDSS database as a light source (a galaxy, star, etc.)
for which spectral analysis shows multiple spectral peaks in the single light source,
meaning that multiple objects are present. Within the database, the “frames pipeline” analyzes
the data to determine whether a light-emitting object is “blended”. If so, a de-blending
algorithm is used to separate the multiple objects into “child” objects whose spectra add up to
the “parent” image. Objects that are flagged as “blended” are given a unique parentID number
greater than zero; for objects that are not “blended”, parentID is set to zero.
The initial step of the project was to learn SQL (Structured Query Language), which is
used to write queries that acquire data from SDSS. The large volume of data available in the
SDSS database makes it well suited for “training” a machine learning program, since the
example data should show nearly every variation in galaxy type. This project made use of
CasJobs DR8, a program which uses SQL queries to acquire large amounts of data from the most
recent data release. The queries requested specific useful quantities: the spectroscopic
redshift measurement, two separate photometric redshift measurements using the “random
forest” and “robust fit” methods, magnitudes in the bands u, g, r, i, z, parentID, and
uncertainty measurements for all relevant quantities.
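
As an illustration of the kind of query described above, the following minimal sketch
assembles a CasJobs-style SQL string in Python. The table and column names (Galaxy, SpecObj,
Photoz, PhotozRF, bestObjID, zErr, err_u and so on) are assumptions that should be checked
against the DR8 schema browser, and build_query is a hypothetical helper, not part of any
SDSS tool. It is not the query actually used in this project.

```python
def build_query(max_rows=1000):
    """Return a SQL string that joins photometry, spectra and both photo-z tables.

    All table and column names are assumptions to verify against the DR8 schema.
    """
    return f"""
SELECT TOP {int(max_rows)}
    gal.objID, gal.parentID,
    gal.u, gal.g, gal.r, gal.i, gal.z,
    gal.err_u, gal.err_g, gal.err_r, gal.err_i, gal.err_z,
    spec.z AS z_spec, spec.zErr AS z_spec_err,
    rf.z AS z_phot_rf, rf.zErr AS z_phot_rf_err,
    fit.z AS z_phot_fit, fit.zErr AS z_phot_fit_err
FROM Galaxy AS gal
JOIN SpecObj AS spec ON spec.bestObjID = gal.objID
JOIN PhotozRF AS rf ON rf.objID = gal.objID
JOIN Photoz AS fit ON fit.objID = gal.objID
""".strip()


# Text that could be pasted into the CasJobs query window.
query = build_query(max_rows=1000)
```

Using inner joins keeps only galaxies that have a spectroscopic redshift and both photometric
estimates, which matches the selection described earlier in this summary.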
Data was requested for all objects of the type “galaxy” and downloaded for
use in the ROOT data analysis framework or possibly WEKA. Data was requested for both
“blended” and unblended objects so that it can be determined whether the photometric
redshift measurements for blended galaxies are less accurate than those for unblended
objects. It is hypothesized that the photometric redshift measurements will be less
precise for parentID > 0. If this is the case, there will be an investigation into which of the
predetermined photometric redshift determination methods is most accurate.
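
The comparison described above could be sketched as follows. This is a minimal illustration
under stated assumptions: the accuracy measure (RMS of the residual normalized by 1 + z_spec)
is a common choice but is not specified in this summary, the dictionary keys z_spec,
z_phot_rf and z_phot_fit are hypothetical column names, and the example rows are made up.

```python
def rms_scatter(rows, method):
    """RMS of (z_phot - z_spec) / (1 + z_spec) for one photometric method."""
    residuals = [(r[method] - r["z_spec"]) / (1.0 + r["z_spec"]) for r in rows]
    return (sum(x * x for x in residuals) / len(residuals)) ** 0.5


def compare(rows, methods=("z_phot_rf", "z_phot_fit")):
    """Scatter of each method, separately for blended and unblended galaxies."""
    groups = {
        "blended": [r for r in rows if r["parentID"] > 0],
        "unblended": [r for r in rows if r["parentID"] == 0],
    }
    return {
        name: {m: rms_scatter(members, m) for m in methods}
        for name, members in groups.items()
        if members  # skip empty groups to avoid dividing by zero
    }


def best_method(scatter_by_method):
    """Name of the method with the smallest scatter."""
    return min(scatter_by_method, key=scatter_by_method.get)


# Made-up rows for illustration only; these are not SDSS measurements.
example_rows = [
    {"parentID": 0, "z_spec": 0.10, "z_phot_rf": 0.11, "z_phot_fit": 0.13},
    {"parentID": 0, "z_spec": 0.25, "z_phot_rf": 0.24, "z_phot_fit": 0.22},
    {"parentID": 7, "z_spec": 0.30, "z_phot_rf": 0.38, "z_phot_fit": 0.33},
]
table = compare(example_rows)
winners = {group: best_method(scores) for group, scores in table.items()}
```

If the hypothesis holds, the scatter in the "blended" group would be larger than in the
"unblended" group for both methods.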
After the best predetermined photometric redshift method has been identified,
machine learning techniques will be applied to the acquired data in order to find a more
accurate determination method for “blended” galaxies, assuming there is one. If this study
proves useful, there will also be an investigation into “blending” with other object types,
such as when stars and galaxies are blended. Ultimately, it is hoped that this work will be
useful for the future LSST database, which will not contain spectroscopic redshift
measurements for all objects and will therefore need to make use of the photometric method.
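
This summary does not say which machine learning technique will be used. Purely as an
illustration of the idea of training on galaxies with known spectroscopic redshifts, the
sketch below uses a k-nearest-neighbour average in color space. The choice of technique, the
use of adjacent-band colors as features, the function names, and the training values are all
assumptions made for this example.

```python
import math


def colors(mags):
    """Adjacent-band colors (u-g, g-r, r-i, i-z) from five magnitudes."""
    return [a - b for a, b in zip(mags, mags[1:])]


def knn_photoz(train, query_mags, k=3):
    """Average the spectroscopic redshifts of the k nearest training galaxies.

    train is a list of (magnitudes, z_spec) pairs; distance is Euclidean in color space.
    """
    target = colors(query_mags)
    ranked = sorted(train, key=lambda pair: math.dist(colors(pair[0]), target))
    nearest = ranked[:k]
    return sum(z for _, z in nearest) / len(nearest)


# Made-up (u, g, r, i, z magnitudes, z_spec) pairs; these are not SDSS data.
train = [
    ([19.5, 18.0, 17.2, 16.8, 16.5], 0.08),
    ([20.1, 18.4, 17.4, 16.9, 16.6], 0.12),
    ([21.0, 19.0, 17.8, 17.2, 16.9], 0.20),
    ([21.8, 19.6, 18.1, 17.5, 17.1], 0.30),
]
estimate = knn_photoz(train, [20.0, 18.3, 17.35, 16.9, 16.6], k=2)
```

Training such an estimator separately on galaxies with parentID > 0 would be one way to test
whether a dedicated method improves on the existing estimates for “blended” galaxies.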