Download Photometric Determination of Quasar Candidates

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Gravitational lens wikipedia , lookup

Astronomical spectroscopy wikipedia , lookup

Transcript
**FULL TITLE**
ASP Conference Series, Vol. **VOLUME**, **YEAR OF PUBLICATION**
**NAMES OF EDITORS**
Photometric Determination of Quasar Candidates
Sheelu Abraham and Ninan Sajeeth Philip
Santhom Computing Facility, St. Thomas College, Kozhencheri, India
-689641
Abstract.
We describe an efficient and fast method for the detection and
classification of quasars using a machine learning tool, making use of photometric information from SDSS DR7 data release. The photometric information used
are the ten independent colours that can be derived from the 5 filters available
with SDSS and the machine learning algorithm used is a difference boosting neural network (DBNN) that uses Bayesian classification rule. An adaptive learning
algorithm was used to prepare the training sample for each region. Cross validations were done with SDSS spectroscopy and it was found that the method
could detect quasars with above 96.96% confidence regarding their true classification. The completeness at this stage was 99.01%. Contaminants were mainly
stars and the incorrectly classified quasars belonged to a few specific patches
of redshifts. Color plots indicated that the colors of some stars and quasars in
those redshits were indistinguishable from each other and was the major cause of
their incorrect classification. A confidence value (computed posterior Bayesian
belief of the network) was assigned to every object that was classified. Most of
the incorrect classifications had a low confidence value. This information may
be used to filter out contaminants and improve the classification accuracy at the
cost of reduced completeness.
Introduction
Modern surveys are capable of gathering several terabytes of data on a daily
basis. Data management and its classification thus becomes a real challenge in
astronomy. Automated classification tools such as Bayesian classifiers, Support
Vector Machines, Decision trees etc have found extensive application in large surveys. Thus the need for more robust photometric classification and photometric
redshift estimation tools are rapidly becoming apparent. This brief paper describes the use of a machine learning tool for classification and the identification
of interesting objects from large surveys considering photometric identification
of quasar candidates from Sloan Digital Sky Survey (SDSS1 ) as an example. Although it may be argued that spectroscopy is more reliable for classification, the
inherent spectroscopic limitations at fainter levels pauses significant limitations
on deep sky surveys.
SDSS is an all sky survey covering 11663 deg2 of the northern sky providing
optical images of 357 million celestial objects (Abazajian et al. 2009). We used
a Difference Boosting Neural Network (DBNN) (Philip & Joseph 2000), which
1
http://www.sdss.org/dr7/
1
2
Abraham and Philip
is a machine learning implementation of the Bayesian theorem, for this study.
DBNN is an extension of Naive Bayes learning algorithm (Elkan 1997). It is
a supervised neural network and thus requires a training data to configure the
network before it can be used for classification of unseen data.
Object Classification
The classification was restricted to the colour space dominated by quasars (Richards
et al. 2004) in SDSS Data Release 7 that gave 90033 unresolved objects for which
spectroscopic identification was available. The dataset consisted of point sources
of all spectral types from the region. A representative sample of this data was
taken as the training sample.
The classification was done using the ten colours, derived from five filters,
namely u-g, u-r, u-i, u-z, g-r, g-i, g-z, r-i, r-z, i-z and the u magnitude, all
obtained from the SDSS DR7 photometric catalog after applying necessary corrections. The classification label taken was the object’s spectral type given by
SDSS.
In the training process, the Bayesian estimate for each of the training example is computed and stored by the network. This information is later used
for classification when a new data is presented to the network. When a prediction is made, in addition to a predicted spectral type, DBNN returns the
computed Bayesian posterior estimate for its prediction. This value can be used
as a measure of the confidence the network has for each prediction.
In our example, the training data consisted of 1698 stars, 2875 quasars,
529 galaxies and 178 late-type stars. After training, the network was tested on
the entire dataset containing 90033 objects. The network predictions for the
objects are given in Table 1 . It may be noted that the network was able to
classify quasars with a completeness of 99.01% and was able to correctly pick
up about 96.96% quasars from the total number of quasars in the region. The
contamination from other sources were about 3.04%.
Table 1. The total number of objects and the percentage of correct classification by the machine learning tool for each class are shown.
Object
Total no. of objects
Correctly Predicted
Total Predicted
% of prediction
Star
Galaxy
Quasar
Star-Late
11867
585
77552
29
9964
53
76789
2
10703
129
79198
3
83.96
6.77
99.01
6.89
Total
90033
86808
90033
96.42
The confidence plot for correctly and incorrectly classified objects are shown
in Figure 1. In the histogram, the gray region corresponds to correctly classified
(passed) objects and the black to incorrectly classified (failed) objects. It may
Photometric Determination of Quasar Candidate
3
Figure 1. Confidence levels of correctly classified and incorrectly classified
objects. Gray colour corresponds to correctly classified objects and black to
incorrectly classified. Note the relative abundance of incorrectly classified
objects at lower confidence levels.
Figure 2. Colour merges between quasars classified as stars and stars classified as quasars are shown. The ten colour features used are given on the x-axis
and their ranges are on the y-axis. Gray colour corresponds to quasars incorrectly classified as stars and white to stars incorrectly classified as quasars.
be noted that we can remove most of the incorrectly classified objects by adding
certain cut off on the confidence level. This process, however, removes a good
number of correctly classified objects that falls within that confidence level. This
is the cost of the confidence cut off.
Beyond Classification
There were 763 quasars that got incorrectly classified in the dataset. We analyzed the failed data in detail. The main contaminants were stars (∼90.7%).
The colour plots shown in Figure 2 shows the colours of those failed objects in
all the ten features. It may be noted that the features for both classes span
the same colour space in all ten features. The reason for this is obvious in
Figure 3(a) which reveals that failures are at some specific patches of redshift,
where the redshifted colours of quasars matches that of a few stars.
4
Abraham and Philip
Figure 3. The incorrectly classified objects in the data set are from specific patches of redshifts. The right side plot shows histogram after removing
objects with confidence below 60%. Note that it removes most of the contaminants.
The important point to be noted here is that Bayes rule assigns lower confidence value to objects belonging to such overlapping patches in the feature
space and that they can be easily filtered out for separate analysis as shown in
Figure 3. Of these, we found 197 quasars that were incorrectly labeled as galaxy
by SDSS.
Conclusion
In addition to photometric classification, machine learning tools can have interesting and useful applications in the study of the outliers that may be used to
discover the inherent limitations in a system and the better understanding of the
universe. It is found that using a Bayesian network, objects that exhibit peculiar feature can be detected easily by filtering out objects with lower confidence
values. Low confidence could also signify the presence of a new class of objects
in the dataset. This is a very desirable feature for the selection of candidates
for follow up observations.
Acknowledgments. This work was supported by ISRO Respond project
grant No: ISRO/RES/2/339/2007-08. Funding for the SDSS and SDSS-II has
been provided by the Alfred P. Sloan Foundation, the Participating Institutions,
the National Aeronautis and Space Administration, the National Science Foundation, the U.S. Department of Energy, the Japanese Monbukagakusho, and the
Max Planck Society. The SDSS web site is htpp://www.sdss.org/.
References
Abazajian, K. N., et al. 2009, ApJS, 182, 543
Philip, N. S., & Joseph, K. B. 2000, Journal of Intelligent Data Analysis, 4, 463
Elkan, C. 1997, Technical Report No.CS97-557
Richards, G. T., et al. 2004, ApJS, 155, 257