HUBBLE LEGACY ARCHIVE PROJECT @ STSCI
Astronomical Data Tagging
Web 2.0 meets Astronomy in the HLA
Niall I. Gaffney, W. Warren Miller
(STScI)
What is the HLA
• Hubble Legacy Archive
– Joint project STScI, ST-ECF, CADC
– Providing best archive data products from HST data
• Improving WCS solutions
• Combining data
• Extracting image photometry and GRISM spectra
• Create Simple and Powerful User Interface
– Typical HST archive user visits once a year
– Get the right data into the user's own environment
• Users want to use their daily applications (e.g. web)
• Users have their own data analysis system
HLA UI Philosophy
• UI “Requirements” from users
– Interfaces must be simple, understandable, powerful, rich, self-explanatory
• “Google like”
– Interface must feature the Data and not the Query
– Interface must NOT get in the way of getting data and using them in the tools users are accustomed to
– Interface should expose information that previous interfaces have not been able to
Early Data Release - Target Oriented
Who else does this…
What is Web 2.0
• Web 2.0 is a change in how we use the network
• Web 2.0 is NOT dynamic web pages (AJAX)
– Web 2.0 is enabled by AJAX
• Web 2.0 is applications and APIs delivered via the web
– Netscape vs. Google
– DoubleClick vs. AdSense
– My Home Page vs. My Blog or MySpace
• A synergy between services and information to provide a more focused information service
• User aware and user provided (context)
• Tim O’Reilly article with long discussion
http://www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
YouTube - Data and Tags
Where to get Tags for our Data
• Proposal data not enough (one target in a sea)
• Astronomers are few and busy
– It's not “Browse or Perish”, it's “Publish or Perish”
What we did
• Used a “basic footprint” (aka cone search) with Simbad to identify objects within a given field
– Not a true footprint, as the objects returned are all points
• Used Simbad to then get bibcodes for the objects
• Used ADS to get keywords for each bibcode
• Harvested other data from HST proposal information (abstract, proposed targets…)
• Used Apache Lucene as our search engine
• Modified the Apache Lucene search demo
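The harvesting chain above (cone search, then objects to bibcodes, then bibcodes to keywords) can be sketched offline. This is a minimal illustration, not the HLA implementation: the in-memory dictionaries are hypothetical stand-ins for the Simbad and ADS lookups, and the object names, bibcodes, and function names are all invented for the example.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for the small angles typical of one field
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(catalog, ra, dec, radius_deg):
    """Return names of catalog objects (treated as points) within radius_deg of (ra, dec)."""
    return [name for name, (obj_ra, obj_dec) in catalog.items()
            if angular_separation_deg(ra, dec, obj_ra, obj_dec) <= radius_deg]

def harvest_keywords(pointing, radius_deg, catalog, object_bibcodes, bibcode_keywords):
    """Collect the keywords of every paper linked to every object in the cone."""
    keywords = set()
    for obj in cone_search(catalog, *pointing, radius_deg):
        for bibcode in object_bibcodes.get(obj, []):
            keywords.update(bibcode_keywords.get(bibcode, []))
    return sorted(keywords)

# Toy stand-ins for the remote services (hypothetical data, not real catalog entries)
CATALOG = {"OBJ A": (10.00, 20.00), "OBJ B": (10.02, 20.01), "OBJ C": (50.00, -30.00)}
OBJECT_BIBCODES = {"OBJ A": ["BIB1", "BIB2"], "OBJ B": ["BIB2"], "OBJ C": ["BIB3"]}
BIBCODE_KEYWORDS = {
    "BIB1": ["galaxies: clusters"],
    "BIB2": ["gravitational lensing", "galaxies: clusters"],
    "BIB3": ["stars: variables"],
}

tags = harvest_keywords((10.01, 20.00), 0.05, CATALOG, OBJECT_BIBCODES, BIBCODE_KEYWORDS)
print(tags)
```

The resulting tag list is what would be handed to the search index as one document per observation. Because the cone returns points only, an extended object whose catalog position lies just outside the radius is missed, which is the limitation the "not a true footprint" note refers to.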
How well did this work
• 43% of the 2769 ACS WFC “visits” in the past 2 years
• 38% of “visits” are parallels (semi-random pointing)
• Average ~ 22 keywords per observation with keywords
[Figure: histogram “Keywords”; x-axis: Number of Keywords (1 to 101), y-axis: Count (0 to 120)]
DEMO
Where to go next
• Scientific input needed
– Is “More Like This” scientifically useful or annoying more often than not? Can it be tweaked?
• Footprints and more Footprints
– Intersection of observation footprints with object footprints improves tags (especially for smaller fields)
– Real time evaluation for cutouts and surveys (seconds not minutes)
• Standardize tags more
– Case, spelling, removal of irrelevant words (e.g. “Galaxy Clusters General” -> “Galaxy Clusters”, “Colour” -> “Color”, “Charged Coupled Device” -> /dev/null)
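The tag standardization described above could look something like the following sketch. The three lookup tables are assumptions seeded only with the examples from the slide, lowercase is chosen arbitrarily as the canonical case, and the function names are hypothetical.

```python
import re

# Assumed lookup tables, seeded only with the examples given on the slide
SPELLING = {"colour": "color", "colours": "colors"}
FILLER_WORDS = {"general"}                   # words that add nothing to a tag
DROPPED_TAGS = {"charged coupled device"}    # whole tags to discard

def normalize_tag(tag):
    """Return a canonical lowercase tag, or None if the tag should be discarded."""
    # Fold case and collapse runs of whitespace before splitting into words
    words = re.sub(r"\s+", " ", tag.strip().lower()).split(" ")
    # Drop filler words, then apply the spelling map to what is left
    words = [SPELLING.get(word, word) for word in words if word not in FILLER_WORDS]
    normalized = " ".join(words)
    if not normalized or normalized in DROPPED_TAGS:
        return None
    return normalized

def normalize_tags(tags):
    """Normalize a list of tags, dropping discards and duplicates, keeping first-seen order."""
    result = []
    for tag in tags:
        normalized = normalize_tag(tag)
        if normalized is not None and normalized not in result:
            result.append(normalized)
    return result

print(normalize_tags(["Galaxy Clusters", "galaxy  clusters General", "Colour",
                      "Charged Coupled Device"]))
```

Normalizing before indexing means variants of the same keyword collapse into one term, so a search on one spelling also matches observations tagged with the other.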
AstroTube