Astronomical e-Science in Edinburgh
Introduction: astronomy and e-science
Astronomy and e-science are perfect partners.
Astronomy needs the new techniques being
developed in e-science to help it cope with the data
avalanche it is experiencing: the volume of
astronomical data available online doubles every
eighteen months or so, with the largest sky survey
databases growing at several Terabytes per year.
Conversely, astronomy data provide an ideal
testbed for many e-science developments: in the
words of Jim Gray of Microsoft Research, “It has no
commercial value [and] no privacy concerns. It is
real and well documented…high-dimensional,
spatial and temporal data (with confidence
intervals), [generated by] many different
instruments from many different places and many
different times. There is a lot of it.”
The principal e-science initiative within astronomy
is the creation of an international Virtual
Observatory (VO), which will federate the world’s
significant astronomical data sources and integrate
them with the hardware and software required to
exploit that federation scientifically.
Astronomy and e-science in Edinburgh
The University of Edinburgh is home to the Wide
Field Astronomy Unit (WFAU[1]), which curates the
largest databases in UK astronomy: the science
archives from optical and near-infrared sky
surveys. WFAU is also one of the UK data centres
within AstroGrid[2], the UK’s VO project, itself part
of the Astrophysical Virtual Observatory (AVO[3])
consortium, which is undertaking an EU-funded
R&D study into the design of a data Grid for
European astronomy.
Collaborations exist between WFAU astronomers
and researchers in the University’s School of
Informatics interested in data management and in
data mining, since, in both cases, the scale and
complexity of sky survey databases present
interesting problems.
The astronomical e-science community in
Edinburgh is completed by NeSC, whose
workshops and training courses have greatly
benefited AstroGrid, and which collaborates with
both WFAU and AstroGrid through the edikt[4] and
OGSA-DAI[5] projects. In what follows we describe
in more detail some of the projects currently
underway within this community.
Sky Survey Science Archives
As astronomical databases grow to multi-TB scales
they must change from being passive data
repositories to incorporate data exploration, as well
as data access, facilities. AstroGrid aims to provide
data exploration services for a set of major UK
databases, which it will federate. The largest
amongst these is the new 1.5TB SuperCOSMOS
Science Archive (SSA), based upon a legacy
photographic sky survey scanned by the
SuperCOSMOS measuring machine (shown right).
The SSA is also being used as a testbed for the
larger WFCAM Science Archive, which will hold sky
survey data from the new wide field camera
(WFCAM) on the UK Infrared Telescope. WFCAM
(shown below) will generate over 20TB of data per
year for seven years from 2004. A key
requirement for these large
sky survey databases is effective spatial indexing,
so WFAU and NeSC are collaborating with IBM,
Microsoft and Oracle to study the SSA as a
prototype of a large, spatially-indexed database,
comparing the spatial options supported by the
three companies’ products. The SSA will also be a
scalability testbed for the data access and data
exploration services being developed by AstroGrid,
and over the next few years the SSA and the WFCAM
Science Archive (WSA) will become key components
of the Virtual Observatory, with a growing array of
web and Grid services deployed on them, allowing
astronomers to do research on the TB scale.
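To illustrate what spatial indexing buys, here is a minimal sketch of one simple scheme: the sky is cut into declination zones, each kept sorted by right ascension, so a cone search only examines a small window of rows. The class name, zone height and method names are hypothetical, and this is not a description of the indexing actually used in the SSA or in any of the three vendors' products.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


class ZoneIndex:
    """Hypothetical index: declination zones, each sorted by right ascension."""

    def __init__(self, sources, zone_height=0.5):
        # sources: iterable of (source_id, ra_deg, dec_deg), with 0 <= ra < 360
        self.zone_height = zone_height
        zones = defaultdict(list)
        for source_id, ra, dec in sources:
            zones[self._zone(dec)].append((ra, dec, source_id))
        self.zones = {zone: sorted(rows) for zone, rows in zones.items()}

    def _zone(self, dec):
        return math.floor((dec + 90.0) / self.zone_height)

    def cone_search(self, ra0, dec0, radius):
        """Return [(source_id, separation_deg)] within radius, nearest first."""
        dec_lo, dec_hi = max(-90.0, dec0 - radius), min(90.0, dec0 + radius)
        # Conservative RA half-width of the cone; the whole circle near a pole.
        c = math.cos(math.radians(max(abs(dec_lo), abs(dec_hi))))
        s = math.sin(math.radians(radius) / 2)
        half_width = 180.0 if s >= c else math.degrees(2 * math.asin(s / c))
        found = {}
        for zone in range(self._zone(dec_lo), self._zone(dec_hi) + 1):
            rows = self.zones.get(zone, [])
            for shift in (-360.0, 0.0, 360.0):  # handle wrap-around at RA = 0
                lo = bisect_left(rows, (ra0 - half_width + shift,))
                hi = bisect_right(rows, (ra0 + half_width + shift, math.inf))
                for ra, dec, source_id in rows[lo:hi]:
                    sep = separation_deg(ra0, dec0, ra, dec)
                    if sep <= radius:  # exact test on the few candidates
                        found[source_id] = sep
        return sorted(found.items(), key=lambda item: item[1])
```

Calling `ZoneIndex(sources).cone_search(ra, dec, radius)` with all angles in degrees returns the matching source identifiers with their separations. The coarse zone and RA window do the cheap filtering; the exact great-circle test is applied only to the rows that survive it.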
Data Mining and Machine Learning
Foremost amongst the services deployed on these
new science archives will be data mining
algorithms, to enable scientists to extract scientific
knowledge from the terabytes of data. The first
application of data mining with the SSA has been
to help clean up the dataset itself[6]. Optical sky
surveys include a number of “junk” objects (due to
artificial satellites, aeroplanes, optical effects
around bright stars, etc.) which often cannot be
distinguished from stars or galaxies on the basis of
their measured attributes alone. They can,
however, be identified from their unlikely spatial
arrangements. The plots below show the detection
of a satellite trail (left) and a diffraction halo around
a bright star (right) through the application of a
machine learning algorithm looking for statistically
unlikely linear and elliptical arrangements of
objects, respectively: in each case the junk objects
number only a few hundred, from a catalogue of
several hundred thousand in the particular survey
image. This algorithm has applicability beyond the
SSA, and it is intended that it will be implemented
as a prototype Grid service by WFAU and NeSC.
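The idea of flagging statistically unlikely linear arrangements can be sketched with a simplified Hough-style vote: every object votes for all the narrow strips that pass through it, and a strip holding far more objects than a uniform background would supply is suspicious. This is an illustrative stand-in, not the algorithm referenced in [6]; the function names, the strip width, the unit-square coordinates and the 6-sigma threshold are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def most_populated_strip(points, strip_width=0.01, n_angles=180):
    """Hough-style voting: return indices of the points in the fullest strip."""
    trig = [(math.cos(math.pi * k / n_angles), math.sin(math.pi * k / n_angles))
            for k in range(n_angles)]
    cells = defaultdict(list)
    for index, (x, y) in enumerate(points):
        for k, (cos_a, sin_a) in enumerate(trig):
            # Perpendicular distance from the origin to the line through
            # (x, y) whose normal points along angle k.
            rho = x * cos_a + y * sin_a
            cells[(k, math.floor(rho / strip_width))].append(index)
    return max(cells.values(), key=len)


def is_unlikely(count, n_points, strip_width=0.01, n_sigma=6.0):
    """Crude significance test, assuming uniform points in the unit square."""
    expected = n_points * strip_width * math.sqrt(2.0)  # longest possible strip
    return count > expected + n_sigma * math.sqrt(expected)


# Synthetic demonstration: 1000 random objects plus a 60-object straight trail.
rng = random.Random(0)
background = [(rng.random(), rng.random()) for _ in range(1000)]
trail = [(0.1 + 0.8 * i / 59, 0.3 + 0.5 * (0.1 + 0.8 * i / 59)) for i in range(60)]
points = background + trail
strip = most_populated_strip(points)
flagged = strip if is_unlikely(len(strip), len(points)) else []
```

Here `flagged` holds the indices of the objects lying in the over-populated strip. A real catalogue would need a background model that follows the local object density, and a separate search for elliptical arrangements around bright stars.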
Association techniques
A second area where machine learning is being
used is in the association of entries in different
databases which represent observations of the
same celestial source in different passbands. The
angular resolution inherent in the two sets of
observations can differ markedly, as shown
schematically below, where many objects from a
high resolution observation (in red) lie within the
blue ellipse, which denotes the region of sky within
which a source from a much lower resolution image
is constrained to lie. In this situation proximity
alone cannot decide which of the red sources is the
most likely counterpart of the blue source. Machine
learning techniques are being used to discover the
attributes of the population of red sources which
correlate with being close to blue sources. This work
is part of a PhD project, funded by a PPARC e-science
studentship, which aims to deliver a suite of
association services for use by AstroGrid.
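The reasoning above can be sketched in a few lines: first learn how much more common each value of an attribute is among red sources lying near blue positions than among red sources in general, then use that ratio to weight the candidates inside one error ellipse. Everything here is an assumption for illustration: the choice of magnitude as the attribute, the flat-sky offsets, the Gaussian positional weight and the function names. It is not the method being developed in the project described.

```python
import math
from bisect import bisect_right


def ellipse_radius_sq(dx, dy, semi_major, semi_minor, angle_deg):
    """Squared offset in units of the ellipse axes (<= 1 means inside)."""
    t = math.radians(angle_deg)
    u = dx * math.cos(t) + dy * math.sin(t)    # component along the major axis
    v = -dx * math.sin(t) + dy * math.cos(t)   # component along the minor axis
    return (u / semi_major) ** 2 + (v / semi_minor) ** 2


def enrichment(near_values, all_values, edges):
    """Per-bin ratio: fraction of 'near' sources / fraction of all sources."""
    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for value in values:
            counts[bisect_right(edges, value)] += 1
        return [count / len(values) for count in counts]
    near, everything = fractions(near_values), fractions(all_values)
    return [n / e if e > 0 else 0.0 for n, e in zip(near, everything)]


def rank_candidates(candidates, ellipse, edges, ratios):
    """candidates: (source_id, dx, dy, magnitude); returns ids, best first."""
    scored = []
    for source_id, dx, dy, magnitude in candidates:
        r_sq = ellipse_radius_sq(dx, dy, *ellipse)
        if r_sq > 1.0:
            continue  # outside the region allowed by the low-resolution image
        positional = math.exp(-0.5 * r_sq)  # assumed Gaussian-like weighting
        scored.append((positional * ratios[bisect_right(edges, magnitude)],
                       source_id))
    return [source_id for _, source_id in sorted(scored, reverse=True)]
```

With magnitude bins split at 18 and 20, a population in which bright red sources are five times over-represented near blue positions would rank a bright candidate 2 arcseconds from the centre above a faint one 0.5 arcseconds away, which is the situation where proximity alone gives the wrong answer.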
Virtual Observatory data formats
The interoperability of data sets is the key to the
Virtual Observatory, so XML has been advocated
as a VO exchange format, with the development of
VOTable[7]. VOTable is a new XML standard for
tabular astronomical datasets, developed under the
auspices of the International Virtual Observatory
Alliance (IVOA[8]), which is the standards agency
for the VO.
Fully tagged XML is, of course, verbose and thus
inefficient for storing the large datasets often
found in astronomy. Researchers in Edinburgh are
working on two possible solutions to this problem.
The first is BinX[9], an XML schema for binary
data, being developed by the edikt project, and
described in more detail in their flyer. BinX
performs the conversion between VOTable and
FITS[10], a compact binary data format commonly
used in astronomy: this means that astronomers
can choose when to store their data in readily
interoperable XML and when in compact binary.
The second approach, being developed by Peter
Buneman’s database research group in the School
of Informatics, is to restructure the VOTable file
into a vectorized format, which is more compact
and can be queried much more quickly. Both these
approaches will be developed into working
prototypes for AstroGrid.
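The trade-off between tagged XML, compact binary and a column-wise layout can be seen with a small synthetic table. The sketch below borrows the TABLEDATA, TR and TD element names from VOTable but does not produce a complete, valid VOTable document, and it uses neither BinX nor FITS; the table contents, field types and function names are invented for the example, and the column-wise layout only conveys the general idea of storing each attribute as one vector.

```python
import struct
import xml.etree.ElementTree as ET

# Synthetic table: (id, ra, dec, magnitude) for 1000 made-up sources.
rows = [(i, 10.0 + 0.001 * i, -5.0 + 0.002 * i, 18.5) for i in range(1000)]


def to_tagged_xml(rows):
    """Every value wrapped in its own element, as in fully tagged XML."""
    table = ET.Element("TABLEDATA")
    for row in rows:
        tr = ET.SubElement(table, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = repr(value)
    return ET.tostring(table)


def to_row_binary(rows):
    """Fixed-width little-endian records: int32, float64, float64, float32."""
    return b"".join(struct.pack("<iddf", *row) for row in rows)


def to_columns(rows):
    """Column-wise layout: one packed vector per attribute."""
    ids, ras, decs, mags = zip(*rows)
    n = len(rows)
    return {"id": struct.pack(f"<{n}i", *ids),
            "ra": struct.pack(f"<{n}d", *ras),
            "dec": struct.pack(f"<{n}d", *decs),
            "mag": struct.pack(f"<{n}f", *mags)}


def max_dec(columns):
    """A query on one attribute touches only that attribute's vector."""
    count = len(columns["dec"]) // struct.calcsize("<d")
    return max(struct.unpack(f"<{count}d", columns["dec"]))


xml_bytes = to_tagged_xml(rows)
row_bytes = to_row_binary(rows)
columns = to_columns(rows)
print(len(xml_bytes), len(row_bytes), sum(len(v) for v in columns.values()))
```

The printed sizes show the tagged form taking several times the space of either binary form, while the column-wise form is the same size as the row-wise binary but lets `max_dec` read a single vector instead of every record.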
AstroGrid
WFAU is one of the data centres comprising the
AstroGrid consortium, and AstroGrid developers
and researchers based in Edinburgh are engaged
in the full range of the project’s activities. Some of
the first fruits of this labour will be demonstrated on
the NeSC stand at All Hands 2003, as well as on
the PPARC stand.
References
1. Wide Field Astronomy Unit (WFAU): www-wfau.roe.ac.uk
2. AstroGrid: www.astrogrid.org
3. Astrophysical Virtual Observatory (AVO): www.euro-vo.org
4. edikt: www.edikt.org
5. OGSA-DAI: www.ogsadai.org.uk
6. Amos Storkey’s “junk” detection page: www.anc.ed.ac.uk/~amos/sattrackres.html
7. VOTable: http://cdsweb.u-strasbg.fr/doc/VOTable
8. International Virtual Observatory Alliance (IVOA): www.ivoa.net
9. BinX: www.edikt.org/binx
10. Flexible Image Transport System (FITS): fits.gsfc.nasa.gov