HUBBLE LEGACY ARCHIVE PROJECT @ STSCI

Astronomical Data Tagging
Web 2.0 meets Astronomy in the HLA
Niall I. Gaffney, W. Warren Miller (STScI)

What is the HLA
• Hubble Legacy Archive
  – Joint project: STScI, ST-ECF, CADC
  – Providing best archive data products from HST data
    • Improving WCS solutions
    • Combining data
    • Extracting image photometry and GRISM spectra
• Create a simple and powerful user interface
  – Typical HST archive user visits once a year
  – Get the right data into the user's own environment
    • Users want to use their daily applications (e.g. web)
    • Users have their own data analysis system

HLA UI Philosophy
• UI “Requirements” from users
  – Interfaces must be simple, understandable, powerful, rich, self-explanatory
    • “Google like”
  – Interface must feature the Data and not the Query
  – Interface must NOT get in the way of getting data and using them in the tools users are accustomed to
  – Interface should expose information that previous interfaces have not been able to

Early Data Release - Target Oriented

Who else does this…

What is Web 2.0
• Web 2.0 is a change in how we use the network
• Web 2.0 is NOT dynamic web pages (AJAX)
  – Web 2.0 is enabled by AJAX
• Web 2.0 is applications and APIs delivered via the web
  – Netscape vs. Google
  – DoubleClick vs. AdSense
  – My Home Page vs. My Blog or MySpace
• A synergy between services and information to provide a more focused information service
• User aware and user provided (context)
• Tim O’Reilly article with long discussion: http://www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

YouTube - Data and Tags

Where to get Tags for our Data
• Proposal data not enough (one target in a sea)
• Astronomers are few and busy
  – It's not “Browse or Perish”, it's “Publish or Perish”

What we did
• Used a “basic footprint” (aka cone search) with Simbad to identify objects within a given field
  – Not a true footprint, as the objects returned are all points
• Used Simbad to then get bibcodes for objects
• Used ADS to get keywords for each bibcode
• Harvested other data from HST proposal information (abstract, proposed targets…)
• Used Apache Lucene as our search engine
• Modified the Apache Lucene search demo

How well did this work
• 43% of the 2769 ACS WFC “visits” in the past 2 years
• 38% of “visits” are parallels (semi-random pointing)
• Average ~22 keywords per observation with keywords
[Figure: histogram titled “Keywords”; x-axis: Number of Keywords (1 to 101), y-axis: Count (0 to 120)]

DEMO

Where to go next
• Scientific input needed
  – Is More Like This useful or annoying scientifically more often than not? Can it be tweaked?
• Footprints and more Footprints
  – Intersecting observation footprints with object footprints improves tags (especially for smaller fields)
  – Real-time evaluation for cutouts and surveys (seconds, not minutes)
• Standardize tags more
  – Case, spelling, removal of irrelevant words (e.g. “Galaxy Clusters General” -> “Galaxy Clusters”, “Colour” -> “Color”, “Charged Coupled Device” => /dev/null)

AstroTube
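
Sketch: standardizing tags
The "Standardize tags more" item under "Where to go next" could look roughly like the Python below. This is only an illustrative sketch, not the HLA implementation: the rule tables (SPELLING, TRAILING_NOISE, DROPPED_TAGS) and the function names are hypothetical and contain nothing beyond the three examples given on the slide. A real rule set would need scientific input.

```python
# Hypothetical rule tables, seeded only with the examples from the slide.
SPELLING = {"colour": "color"}              # spelling variants -> preferred form
TRAILING_NOISE = {"general"}                # words dropped from the end of a tag
DROPPED_TAGS = {"charged coupled device"}   # tags discarded entirely


def normalize_tag(tag):
    """Return a canonical form of one keyword, or None if it should be dropped."""
    # Lowercase and split on any whitespace so case and spacing no longer matter.
    words = tag.lower().split()
    # Map spelling variants onto the preferred spelling.
    words = [SPELLING.get(word, word) for word in words]
    # Strip irrelevant trailing qualifiers such as "General".
    while words and words[-1] in TRAILING_NOISE:
        words.pop()
    cleaned = " ".join(words)
    if not cleaned or cleaned in DROPPED_TAGS:
        return None
    # Title case is an assumed display convention; note that it would
    # mangle acronyms (e.g. "AGN" becomes "Agn").
    return cleaned.title()


def normalize_tags(tags):
    """Normalize a list of keywords, dropping rejects and duplicates, keeping order."""
    seen = set()
    result = []
    for tag in tags:
        canonical = normalize_tag(tag)
        if canonical is not None and canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


if __name__ == "__main__":
    raw = ["GALAXY CLUSTERS", "Galaxy Clusters General", "colour", "Charged Coupled Device"]
    print(normalize_tags(raw))
```

Normalizing before indexing means that variants such as "Galaxy Clusters General" and "GALAXY CLUSTERS" collapse into one tag, so a search or "More Like This" query matches them together.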