PODS NODS EOSDIS PO.DAAC: Lessons Learned
Victor Zlotnicki
Jet Propulsion Laboratory, California Institute of Technology

PODS NODS EOSDIS PO.DAAC
• 198x: NASA Pilot X Data Systems
• 1985?: transferred to NASA Disciplines
• 1994: EOSDIS 'operational'
• 1997: TRMM & EOSDIS
• 2003-5: EOSDIS 'evolution'

SERVICE MODEL, PODS, 1980s
Pilot Ocean Data System. Original emphasis: a central service to store, subset, browse, and deliver data.
• Beautiful software: browse by modelled orbit and modelled sensor viewing geometry (SEASAT data), subset along track, display and deliver subsets, etc.
• Directory to find data (the 'GOLD' Catalog)
• Hardware: VAX 780
• Jim Brown, Carol Miller, Chuck Klose …
Problems:
• Cost of adding each new, derived data set (solved with 'levels of service')
• Decreasing price of computer hardware at the user's end (less need to subset centrally)

PODS under NASA OCEANOGRAPHY (NASA HQ OCEANS 1984; JPL OCEANS GROUP 1983)
• Must provide computing to JPL oceanographers
• A JPL scientist must be Group Supervisor of the PODS manager
• Must be available to reprocess TOPEX and other data (the Project would not)
• Managers 1984-2006: J.C. Klose, D. Halpern, D. Collins, P. Liggett

SERVICE MODEL, EOSDIS
• Satellite command & control
• Telemetry
• Level 0 processing
• Level 1, 2, 3 processing
• Delivery to science users
• Delivery to the general public
• Also: GCMD, TRMM (1997), …
Problems:
• Handling all these conflicting requirements within the same 'NASA project' structure
• Web use grew without NASA help

SOME LESSONS ¿LEARNED?
Centralized, large data Projects:
• Live longer than 'Programs' that group small tasks
• Are less cost-effective
• Extra funds can be put to good, honest use (data recovery, 'Pathfinder' Climate Time Series generation)
• Find it hard to adapt to technological changes
Other lessons:
• Command & control, downlink, and level 0 processing must be decoupled from the rest.
• Derived products (e.g. Pathfinder Time Series, reprocessed and streamlined GDRs) are a great way to improve data quality, shrink volume, and make data available to many.
• It is still hard to find data today. We need a 'Google for data'.

Trends in Data Storage
Memory:
• 4 GB memory stick: $120
• 1 GB ECC memory: $500
• 32-bit OS: 4 GB memory max (2^32 bytes)
• 64-bit OS: 1 TB memory in PCs today (2^64 bytes, about 16x10^9 GB theoretical maximum; this arithmetic is checked in a sketch below)
Optical disk:
• 4.6 GB DVD: $0.50
Magnetic disk ($/TB; 2010 row estimated):

  Year    Cheap     Good RAID   Better RAID
  1992    1,000k    4,000k      8,000k
  1996       92k      366k        732k
  2000        8k       33k         67k
  2004      0.8k        3k          6k
  2006      0.2k      0.9k        1.8k
  2010     0.02k     0.08k       0.17k

~60% annual increase in density; the implied annual price drop is checked in a sketch below.
Sources: Steve Gilheany, http://www.berghell.com/whitepapers.htm; http://www.pricegrabber.com

Trends in Network Transfer
• ~1980: dial-up Telemail (0.3, 1.2 kb/s) at work
• ~1990: 10 Mb/s at work, 1.2 kb/s at home
• 2006: JPL offices have 100 Mb/s standard (1 Gb/s on request, 10 Gb/s in 2007); 0.5+ Mb/s at home
• 2006: NREN 10 Gb/s between Ames & GSFC
• ABILENE: high-speed optical backbone across the continental US (10 Gb/s, to 100 Gb/s in 2006/2007); separate from the Internet; serves research centers
• NATIONAL LAMBDA RAIL: 10 Gb/s optical Ethernet
What these speeds mean for moving a large data set is worked out in a sketch below.

Trends in Distributed Computing
• GRID computing: split a huge calculation among many disparate, geographically separate, administratively separate computers.
• Goal: solve problems too big for any single supercomputer.
• Computational Grids focus on computationally intensive operations.
• Data Grids: 'the controlled sharing and management of large amounts of distributed data'.
• Equipment Grids: e.g. a telescope, where the surrounding Grid is used to control the equipment remotely and to analyse the data collected.
• Source: Wikipedia
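To make the scatter/gather idea concrete, here is a minimal sketch of grid-style decomposition. It is not real Grid middleware (no cross-site scheduling or authentication); a local process pool stands in for the geographically separate machines, and the per-chunk function is a placeholder.

```python
# Minimal sketch of grid-style decomposition: one large job is split into
# independent chunks, the chunks are farmed out to workers, and the partial
# results are gathered. A local process pool stands in for the distributed,
# administratively separate machines of a real Grid.
from multiprocessing import Pool

def process_chunk(chunk):
    # Placeholder for an expensive per-chunk computation.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
    with Pool() as pool:
        partials = pool.map(process_chunk, chunks)  # scatter
    print(sum(partials))                            # gather
```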
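Returning to the storage-trends slide: the 4 GB and ~16x10^9 GB figures are pure address-space arithmetic, which the following lines verify.

```python
# Address-space limits cited on the storage slide: a CPU with n-bit
# pointers can address at most 2**n bytes of memory.
GIB = 2**30  # one binary gigabyte, in bytes

print(f"32-bit limit: {2**32 / GIB:.0f} GB")  # 4 GB, as on the slide
print(f"64-bit limit: {2**64 / GIB:.2e} GB")  # ~1.7e10 GB, the slide's ~16x10^9 GB
```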
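The magnetic-disk table invites a similar check: the annual price decline implied by the endpoints of the 'Cheap' column works out to roughly a 1.8x improvement per year, consistent in spirit with the cited ~60% annual density growth.

```python
# Implied annual price improvement from the magnetic-disk table
# ('Cheap' column, $/TB in thousands of dollars).
cost = {1992: 1000, 2010: 0.02}   # first and last rows of the table

span = 2010 - 1992                # 18 years
ratio = cost[1992] / cost[2010]   # 50,000x cheaper overall
annual = ratio ** (1 / span)      # per-year improvement factor
print(f"~{annual:.2f}x cheaper per year, "
      f"i.e. a {1 - 1/annual:.0%} annual price drop")
```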
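And for the network-transfer slide, this sketch estimates how long a 1 TB data set would take to move at the link speeds cited there. It uses raw line rate and ignores protocol overhead, so these are lower bounds.

```python
# Lower-bound transfer times for a 1 TB data set at the link speeds
# cited on the network-trends slide (raw line rate, no overhead).
TB_BITS = 8 * 10**12  # 1 TB in bits

links_bps = {
    "1980 dial-up (1.2 kb/s)":     1.2e3,
    "1990 office LAN (10 Mb/s)":   10e6,
    "2006 JPL office (100 Mb/s)":  100e6,
    "2006 NREN/Abilene (10 Gb/s)": 10e9,
}

for name, bps in links_bps.items():
    days = TB_BITS / bps / 86_400
    print(f"{name}: {days:,.2f} days")
```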
2006-10 Trends in Web Services
• Web Service: a software system designed to support interoperable machine-to-machine interaction over a network (Wikipedia, 2006). A minimal sketch of the pattern appears after the summary.
• Web Services + Grid Computing: a product is created 'on the fly' from data and algorithms scattered 'out there'. Example: SciFlo (http://sciflo.jpl.nasa.gov)
• Danger: would you write a scientific paper, or base policy, on untraceable computations?

SUMMARY
• NASA, NOAA, and USGS do have a responsibility to manage satellite data so as to maximize its use and the return on the hardware investment.
• Huge centralized data projects have the advantage of survivability and the disadvantage of inertia.
• Scientific 'stewardship', frequent reprocessing, and 'higher level' products are cost-effective ways to improve quality and decrease volume.
• Failure to understand technological trends, and to build a true 'open architecture' system, may cause a data system to be built to solve a problem that no longer exists by the time the system is completed.
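As promised on the web-services slide, here is a minimal sketch of a machine-to-machine request. The endpoint and parameters are hypothetical (this is not SciFlo's actual API); recording the request alongside the result is one small answer to the traceability concern raised there.

```python
# Minimal machine-to-machine web-service request: a client asks a remote
# service for a derived product over HTTP. The endpoint and parameters are
# hypothetical, purely to illustrate the pattern.
import json
import urllib.request
from urllib.parse import urlencode

params = urlencode({"variable": "sea_surface_height", "date": "2006-01-01"})
url = f"https://example.org/subset?{params}"  # hypothetical service endpoint

with urllib.request.urlopen(url) as response:
    product = json.load(response)

# Keep the exact request with the product, so the computation stays traceable.
provenance = {"request_url": url}
print(provenance)
```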