The GRID Adventures: SDSC's Storage Resource Broker and Web Services in Digital Library Applications
Arcot Rajasekar, Reagan Moore, Bertram Ludäscher, Ilya Zaslavsky
[email protected]
San Diego Supercomputer Center
University of California, San Diego
RCDL’02, Dubna, October 15-17 2002

Data and Knowledge Systems Staff
• Reagan Moore
• Chaitan Baru
• Data Mining Lab (Tony Fountain)
• Advanced Query Processing Lab (Amarnath Gupta)
• Knowledge-Based Integration Lab (Bertram Ludäscher)
• Data Grid Lab (Arcot Rajasekar)
• Spatial Information Systems Lab (Ilya Zaslavsky)
+ 2-3 programmers in each lab, + graduate and undergraduate students
Now: connecting research with production databases and data grid solutions

Overview
• Intro
  – SDSC and NPACI
• Part I: technologies
  – What is Data Grid
  – Data, Information, and Knowledge Infrastructures at SDSC/DICE
  – SDSC Storage Resource Broker, with examples
  – MIX (Mediation of Information Using XML), and Knowledge-Based Mediation
• Part II: case studies
  – BIRN: the First Operational Data Grid
  – Web Services Demos
  – Persistent Archives at SDSC
• Summary

A Distributed National Laboratory for Computational Science and Engineering

1st Teraflops System for US Academia
• 1 TFLOPs IBM SP
  – 144 8-processor compute nodes
  – 12 2-processor service nodes
  – 1,176 Power3 processors at 222 MHz
  – Initially > 640 GB memory (4 GB/node), upgrade to > 1 TB later
  – 6.8 TB switch-attached disk storage
• Largest SP with 8-way nodes
• High-performance access to HPSS

Bioinformatics Infrastructure for Large-Scale Analyses
• Next-generation tools for accessing, manipulating, and analyzing biological data
  – Biology, Stanford University
  – DICE, SDSC
• Analysis of Protein Data Bank, GenBank and other databases
• Accelerate key discoveries for health and medicine
• Supporting and leveraging new data grid projects, such as BIRN in biology

Part I: technologies
What is Data Grid
Data, Information, and Knowledge Infrastructures at SDSC/DICE
SDSC Storage Resource Broker
MIX (Mediation of Information Using XML), and Knowledge-Based Mediation

What are Data Grids?
• Power Grid Analogy
  – Multiple power generators
  – Complex transmission networks with switching
  – Simple Usage Interface – plug and play
  – Guaranteed Supply - Meeting of demands (peak and lull)
  – Complex cost function
• More than one data provider
• Best movement of data across computer networks
• Seamless Access to Data with good ‘Finding Aids’
• Guarantee of Data Access
• Access Control, Quotas & Complex Usage Costing

Data Grids
Data Grid - linking multiple data collections
  Separate name spaces
  Separate schema
  Separate administration domains
  Heterogeneous database instances
[Diagram: Database A and Database B linked through the data grid]
The data grid is itself a collection that provides mechanisms to hide latency and manage semantics

Federated Digital Libraries
Virtual Data Grid - linking multiple data collections
  Ability to execute processes to recreate derived data
[Diagram: Database A with Services and Database B with Services, linked through the virtual data grid]
The virtual data grid integrates data grid and digital library technology to manage processes

Why Data Grids: Data Handling Problems
• Large Datasets; Large Number of Datasets; Scaling
• Distributed, Heterogeneous Storage
• Virtualization & Transparency
• Collaboration, Access Control, Authentication, Security
• Replication, Coherency, Synchronization
• Fault Tolerance and Load Distribution
• Scheduling, Caching & Data Placements
• Data Migration over Time & Space
• Data/Collection Curation
• Uniform Name Space
• Handling Legacy Data and Data/Resource Evolution
• User-friendly Interfaces – foster collaborations

Why Data Grids: Metadata Problems
• Types of Metadata – Relational to XML to unstructured
• Standardized to User-defined Metadata
• Large Number of Attributes; Large Size; Scaling
• Federation - integration over space
• Evolution - integration over time
• Evolution - integration over contexts
• Discovery and Search
• Presentation – user friendly
• Extraction and Maintenance

DAKS Data Management Hierarchy
• Model-Based Information Management
  – Rule-based ontology mapping, conceptual-level mediation - CMIX
• Information Mediation
  – Data federation across multiple libraries - MIX
• Digital Library
  – Interoperable services for information discovery and presentation - SDLIP
• Data Collection
  – Tools for managing data set collections on databases - MCAT
• Data Handling
  – Systems for data retrieval from remote storage - SRB
• Persistent Archives
  – Storage of data collections for 30+ years

SRB as a Solution
• The Storage Resource Broker is middleware
• It virtualizes resource access
• It mediates access to distributed heterogeneous resources
• It uses a MetaCATalog (MCAT) to facilitate the brokering
• It integrates data and metadata
[Diagram: Application → SRB Server (with MCAT and HRM) → Distributed Storage Resources (database systems, archival storage systems, file systems, ftp, http, …): DB2, Oracle, Illustra, ObjectStore; HPSS, ADSM, UniTree; UNIX, NTFS, HTTP, FTP]

SDSC Storage Resource Broker & Meta-data Catalog
[Diagram: Applications (C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Prolog Predicate; Web) on top of SRB. MCAT holds Resource, Mthd, User; User Defined; Dublin Core; Application Meta-data. Beneath SRB: Archives (HPSS, ADSM, UniTree, DMF), HRM, File Systems (Unix, NT, Mac OSX), Databases (DB2, Oracle, Sybase), Metadata Extraction, Remote Proxies, DataCutter]

SRB Space
[Diagram: a network of SRB servers connecting Clients, Data Repositories, Digital Libraries and a Meta Catalog]
DR - Data Repository; DL - Dig Library; MC - Meta Catalog

MySRB: Web-based Access to the SRB
• Browse in Hierarchical Collections
• Registration of (remote) Legacy Files & Directories
• Registration of SQL Objects
• Registration of URLs
• Data Movement Operations
  – Ingest & Re-Ingest, Delete, Unlink
  – Replicate, Copy, Move, S-Link
• Access Control Operations
  – Read, Write, Own, Curate, Annotate, …
  – Ticket-based Access
• Version Control Operations
  – Read Lock, Write Lock, Unlock
  – Check In Check Out

Meta data Management in MySRB
• Types of Meta Data
  – System-level Metadata
    • Size, resource, owner, date, access control, …
  – User-defined Meta data
    • for data & collections
    • <name,value,unit> triples
    • No limits in number of metadata
    • Support for Collection-level schemas – Comments, default values, drop-down lists
    • Support for Standardized Schemas – (eg. Dublin Core)
  – Annotations
    • Supports textual annotations
    • Annotator, date, context also registered

SRB Projects
• Digital Libraries
  – UCB, Umich, UCSB, Stanford, CDL
  – NSF NSDL - UCAR / DLESE
• NASA Information Power Grid
• DOE ASCI Data Visualization Corridor
• Astronomy
  – National Virtual Observatory
  – 2MASS Project (2 Micron All Sky Survey)
• Particle Physics
  – Particle Physics Data Grid (DOE)
  – GriPhyN
  – SLAC Synchrotron Data Repository
• Medicine
  – Visible Embryo (NLM)
• Earth Systems Sciences
  – ESIPS
  – LTER
• Persistent Archives
  – NARA
  – LOC
• Neuro Science & Molecular Science
  – TeleScience, Brain Images, BIRN
  – JCSG (SSRL/SLAC), AfCS, …

Large Data Project Examples
• Astronomy:
  – National Virtual Observatory
    • Integrate 18 sky surveys - (ITR prop)
  – 2MASS Project (2 Micron All Sky Survey)
    • 10 TB; 5 million files
    • Co-locate Images for Spatial Access
    • Data Mining across entire collection
    • Replicate to CalTech HPSS
• Particle Physics:
  – Particle Physics Data Grid (DOE)
  – GriPhyN (NSF ITR proj)
    • CERN LHC 1 PB/yr (1 billion obj)
    • Multi-Lab integration
  – SLAC Synchrotron Data Repository

National Virtual Observatory Data Grid
[Layered diagram]
1. Portals and Workbenches
2. Knowledge & Resource Management: Concept space
3. Metadata View, Data View: Bulk Data Analysis, Catalog Analysis (Standard APIs and Protocols)
4. Grid: Security, Caching, Replication, Backup, Scheduling
5. Information Discovery, Metadata delivery, Data Discovery, Data Delivery (Standard Metadata format, Data model, Wire format)
6. Catalog Mediator, Data mediator (Catalog/Image Specific Access)
7. Compute Resources, Derived Collections, Catalogs, Data Archives

Digital Sky Data Ingestion
[Diagram: input tapes from telescopes → SRB on SUN E10K with 800 GB Data Cache → HPSS (10 TB) at SDSC; star catalog in Informix on SUN at IPAC CALTECH]

Digital Sky Data Ingestion
• The input data was on tapes in a random (temporal…) order.
• Ingestion took nearly 1.5 years, almost continuously: 4 parallel streams (4 MB/sec per stream), 24*7*365
• Total 10+ TB: 5 million 2 MB images in 147,000 containers.
• SRB performed a spatial sort on data insertion (scientists view/analyze data by neighborhood). The disk cache (800 GB) for the HPSS containers was utilized.
• Ingestion speed limited by input tape reads
  – Only two tapes per day can be read
• Work flow incorporated persistent features to deal with network outages and other failures.
• C API was utilized for fine-grain control and to be able to manipulate and insert metadata into the Informix catalog at IPAC Caltech.
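
The spatial sort on insertion can be pictured as regrouping images that arrive in tape order into containers by sky neighborhood. The sketch below is only an illustration under stated assumptions, not SRB code: the `Image` record, the 5-degree grid cell, and the `assign_containers` helper are all hypothetical. The default of 34 images per container is simply the average implied by the figures above (5 million images / 147,000 containers).

```python
from collections import defaultdict
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Image:
    """Hypothetical image record; the real catalog schema is not shown in the slides."""
    name: str
    ra_deg: float   # right ascension in degrees, 0 <= ra < 360
    dec_deg: float  # declination in degrees, -90 <= dec <= 90


def sky_cell(img, cell_deg=5.0):
    """Map an image to a coarse (ra, dec) grid cell. The cell size is an assumption."""
    return (math.floor(img.ra_deg / cell_deg),
            math.floor((img.dec_deg + 90.0) / cell_deg))


def assign_containers(images, cell_deg=5.0, per_container=34):
    """Group images by sky cell, then cut each cell into fixed-size containers."""
    by_cell = defaultdict(list)
    for img in images:  # images arrive in tape (temporal) order
        by_cell[sky_cell(img, cell_deg)].append(img)

    containers = []
    for cell in sorted(by_cell):  # neighbors on the sky end up adjacent
        members = by_cell[cell]
        for start in range(0, len(members), per_container):
            containers.append((cell, members[start:start + per_container]))
    return containers
```

Calling `assign_containers` on a list of `Image` records returns `(cell, images)` pairs, so images from the same neighborhood share a container regardless of when their tape was read.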
More information: http://www.ipac.caltech.edu/2mass

DigSky Conclusion
• SRB can handle large number of files
• Metadata access is still less than ½ sec delay
• Replication of large collections
• Single command for geographical replication
• On-the-fly sorting (out-of-tape sorting)
• Availability of data otherwise not possible
• Near-line access to 5 million files (10 TB)
• Successfully used in web-access & large scale analysis (daily)

Demonstration
• goto mySRB
• For Additional Information: http://www.npaci.edu/dice/srb [email protected]

MIX: Mediation of Information using XML

Mediation of Information using XML (MIX)
[Diagram: an XML Query is answered over XML View Document(s) provided by Wrappers around each source: a Data Source (eg. home ads), a Native XML Database, and a Legacy Source]
Export:
• Schema & Metadata (DTD, RDF,…)
• Capabilities

A Typical Mediation Scenario
[Diagram: User Interface sends a Query to and receives Results from the Mediator (integrated views over heterogeneous sources); the Mediator sends Query “fragments” to Wrappers, which convert the incoming query and outgoing data; sources: SQL Database, GIS, HTML]

The Home Buyer Scenario
[Diagram: Web Client sends an XMAS Query to and receives Results (XML) from the MIXm Mediator, which draws Data from the “Homes” mediator, the “Neighborhood” mediator and the “Schools” mediator]
Sources:
• National test scores
• N’hood info, Community info (demographics), (name, ZIP): www.sandag.cog.ca.us
• Crime info (ZIP, stats): www.sannet.gov
• Home info (real estate): www.realtor.com
• Schools info (address, size): www.asd.com
• School district info (scores, spending, ZIP): www.homeadvisor.msn.com

Home Buyer GUI
[Screenshot]

An XML Query (XMAS)

  $C:<*.condo> <address zip=$Z/> </condo> AT www.condo.com
  AND
  $S:<*.school type=elementary> <address zip=$Z/> </school> AT schools.org
  ...

  <RealEstateAgent>
    <name>J. Smith</name>
    <condos>
      <condo>
        <address ... zip=92037>
        <price>$170k OBO</price>
        <bedrooms>2</bedrooms>
      </condo>
    </condos>
  </RealEstateAgent>

  <folder> $C $S for $S </folder> for $C

  <condosAndSchools>
    <folder>
      <condo>
        <address ... zip=92037>
        <price>$170k OBO</price>
        <bedrooms>2</bedrooms>
      </condo>
      <school>
        <name>La Jolla High</name>
        <address … zip=92037>
      </school>
      <school>…</school>
    </folder>

Home Buyer GUI (Answers)
[Screenshot: Generated XMAS Query and XML Answer Document]

Our Research
• In what query language does the user pose a query?
• How does the query engine of the XMAS mediator rewrite the query?
• How does the mediator combine/restructure/post-process partial results?
• What data model and query transformation scheme should the wrappers use for different source types?
[Diagram: User Query → Mediator → (XML) → wrappers W1, W2, W3 over sources S1, S2, S3]
For details: http://www.npaci.edu/DICE/MIX

New MIX Challenges from Scientific Applications
• Complex Data
  – SDSC’s Scientific Data Applications (current/planned, e.g. Neurosciences: NCMIR, NIH BIRN, Earth sciences: GEON, GeoGrid, ...) show that syntactic/structural integration is insufficient for ...
Complex Multiple-World Mediation Problems:
  – complex, disjoint, seemingly unrelated data
  – “hidden semantics” in complex, indirect relationships
=> Semantic (aka Model/Knowledge-Based) Mediation
  – lift mediation to the level of conceptual models (CMs)
  – use domain experts’ knowledge formalized as rules over CMs
=> Specialized Extensions
  • temporal, geospatial, statistical, DQ/accuracy... operations
=> Extend Mediation Scope and Power via Deductive Rules

INFORMATION MEDIATION WITH DOMAIN MAPS

An Unresolved Challenge
How do nerve cells change as we learn and remember?
A multi-resolution study of the rat hippocampus at Boston University

Dendritic spine morphology and its variations
density = #spines/length
Reconstructions from the Synapse Lab, Boston University

Hypothesis
• Distribution of spines changes with learning
• Each spine type performs a different task in information transmission
Observations
• Spine density, size, shape and PSD vary with maturity
• Spine neck geometry controls peak Calcium amount
• Calcium flow parameters depend on the different subclasses of spines
Next Questions
• Does anyone else have corroborative evidence for these observations?
• Are these observations true in other comparable parts of the brain?
• Is this consistent with the distribution of Calcium-binding proteins?

Example for Formalizing Domain Knowledge: Domain Map for SYNAPSE and NCMIR
A domain map comprises
• Description Logic facts ...
  - concepts ("classes")
  - roles ("associations")
• derived properties ...
• ... expressed as logic rules
  - (e.g. F-logic)
Domain expert knowledge:
  Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).
[Diagram: the domain map, shown next to the equivalent Description Logic facts]

Extended Mediator Architecture for Semantic Mediation
[Diagram: USER/Client exchanges CM Queries & Results (in XML) with the Integrated View (CM). The Mediator Engine holds the Domain Map (DM), the Integrated View Definition (IVD), an XSB Engine, FL rule proc., LP rule proc., Graph proc., GCM, and a CM Plug-In with Logic API (capabilities). Each source S1, S2, S3 sits behind an XML-Wrapper and a CM-Wrapper that exports CM S1, CM S2, CM S3]

Comparison & Summary: Semantic Mediation
Columns: (A) (Complex) Single World / Simple Multiple World; (B) Complex Multiple World
• Integration target
  – A: global schema (common / shared)
  – B: 1..n shared domain maps
• Example scenario
  – A: suppliers’ catalogs / home buyer
  – B: complex scientific data (neuroscience, geoscience,…)
• Schema level overlap
  – A: large / small
  – B: none … small
• Instance level overlap
  – A: large / none
  – B: none
• Source correlation
  – A: direct, instance / schema level
  – B: indirect, conceptual (knowledge) level
• Techniques
  – A: schema transformations, schema integration; “structural” integration
  – B: domain maps, formalized domain knowledge (“semantic bridges”) => model-based (“semantic”) mediation
• Integration languages, Expressiveness
  – A: relational, semistructured, queries & transformations (e.g., SQL, XQuery, XSLT)
  – B: conceptual (description logics), object-oriented, deductive features (e.g., GCM, F-logic)
• Integrators
  – A: DB expert
  – B: domain expert + KRDB expert

Part II: case studies
BIRN
Web Services
Persistent Archives

NIH is Funding a Brain Imaging Federated Repository
Biomedical Informatics Research Network (BIRN)
NIH Plans to Expand to Other Organs and Many Laboratories
Part of the UCSD CRBS (Center for Research on Biological Structure)
National Partnership for Advanced Computational Infrastructure

Infrastructure for Sharing Neuroscience Data
SOURCES:
• NCMIR, U.C. San Diego
• Caltech Neuroimaging
• Center for Imaging Science, Johns Hopkins
• Center for Computational Biology, Montana State
• Laboratory of Neuro Imaging (LONI), UCLA
• Computational Neurobiology Laboratory, Salk Inst.
• Van Essen Laboratory, Washington University
• …
Data Management Infrastructure (DAKS/NPACI)
• MIX Mediation in XML
• MCAT information discovery
• SRB data handling
• HPSS storage
• ...
[Diagram: sources (Surface atlas, Van Essen Lab; stereotaxic atlas, LONI; MCell, CNL, Salk; NCMIR, UCSD; CCB, Montana SU) rest on the Data Management Infrastructure (“Data Grid”: GTOMO, Telemicroscopy, Globus, SRB/MCAT, HPSS); the Knowledge-based GRID infrastructure layer above it is marked “? ? ? ?”]

The Need for Semantic Integration
Cross-source queries:
  What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
[Diagram: Semantic (knowledge-based) mediation services. A Mediator with an Integrated View Definition and Integrated View, where cross-source relationships are modeled (all marked “???”), sits above Wrappers, where data, relationships and constraints are modeled (CMs). Sources: protein localization, morphometry, neurotransmission, and Web sources (CaBP, Expasy)]

Hidden Semantics: Protein Localization

  <protein_localization>
    <neuron type=“purkinje cell” />
    <protein channel=“red”>
      <name>RyR</>
      ….
    </protein>
    <region h_grid_pos=“1” v_grid_pos=“A”>
      <density>
        <structure fraction=“0.8”>
          <name>spine</>
          <amount name=“RyR”>0</>
        </>
        <structure fraction=“0.2”>
          <name>branchlet</>
          <amount name=“RyR”>30</>
        </>

[Image labels: Purkinje Cell layer of Cerebellar Cortex; Molecular layer of Cerebellar Cortex; Fragment of dendrite]

Mediation Services: Source Registration (System Issues)
[Diagram: registration dimensions are Source Data Type, Result Delivery, Query Capability, and Access Protocol. Values shown: table, tree, file; ARC, SQL, XML, DOOD QL; Tuple-at-a-time, Stream, Set-at-a-time, Binary for Viewer; SRB, HTTP, JDBC; Selections, SPJ]

Mediation Services: Source Registration (Semantics Issues)
• Domain Map Registration
  – provide concept space/ontology
    • … as a private object (“myANATOM”)
    • … merge with others (give “semantic bridges”)
    • … and check for conflicts
• Conceptual Model Registration
  – schema: classes, associations, attributes
  – domain constraints
  – “put data into context” (linking data to the domain map)

Mediation Services: Integrated View Definition

  DERIVE protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
  FROM I:protein_label_image[ proteins ->> {Protein};
           organism -> Organism;
           anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}] ,   % from PROLAB
       NAE:neuro_anatomic_entity[name->Anatom;                                   % from ANATOM
           located_in->>{Brain_region}],
       AS..segments..features[name->Feature_name; value->Value].

• provided by the domain expert and mediation engineer
• declarative language (here: Frame-logic)

Mediation Services: Semantic Annotation Tools
[Figure labels: line drawing; annotation; (spatial) database for mediation]

Part II: case studies
Web Services

Find school districts in San Diego where computer ownership rates among residents are over 80%

Web Services Demo 1
[Diagram: Clients (AxioMap, Polexis) → Java Servlet → XML Mediator (Enosys), queried with XML query (XCQL), and Spatial Mediator → XML over WSDL/SOAP → two Web Servers running Java Servlets over Oracle DBMS: the Sociology Workbench (San Diego Digital Divide Survey) and the boundaries of municipalities and school districts]

Web Services Demo 2
Web spatial source, EPA data, ArcObjects spatial service
[Diagram: Spatial Mediator (Java Servlet) → XML over WSDL/SOAP → Web Server with the ESRI ArcObjects Coordinate Conversion Service; XML Wrappers over the EPA Envirofacts Website and Local Pollution Data]

Web Services Demo 3
GIS source, UCR/FBI data
WSDL: for spatial analysis, survey data analysis, DBMS query
Process flow across Web services:
• Counties crossed by an interstate
• Counties with decrease in homicide rates over … %, 1993-99
• Counties with decrease in victims of firearms over … %, 1993-99
[Diagram: WSDL services shown: UCR summaries, Oracle; Victim data, SWB; Spatial Query, ArcIMS/ArcObjects]

Part II: case studies
Persistent Archives

Persistent Archives
• NARA project
• Store & Recover Data after 400 years
• 5 million emails
• 33 million web pages
• 90 million personnel records

Persistent Archives
• Challenges: each of the software and hardware systems may become obsolete
  – the storage media may degrade
  – the storage system may become obsolete
  – the database backups may become obsolete, with no way to recover the collection (structure)
  – the digital object formats may become obsolete, with no helper application that can read them
• Persistent archive is a migration mechanism
  – support for automatic migration to new technology; automatic ingestion, management, access, catalog discovery
• Infrastructure independence
  – Non-proprietary formatting -- Collection management -- Data set access
  – Authentication -- Presentation
• Persistent archive is an interoperability system
  – XML as a (meta-) information markup language

Persistent Archive
Persistent archive
  Describe archived data as collections
  Describe processes used to create collections
  Manage evolution of technology
[Diagram: Database A (today) and Database A (tomorrow) linked through the Virtual Data Grid]
The persistent archive is itself a virtual data grid that provides mechanisms to manage migration to new technology

Information Hierarchy (Simplest Definitions)
• Data
  – digital object, i.e., the object representation as a bit stream
• Information
  – any tagged data, where tags are treated as information attributes
  – attributes may be tagged data within the digital object, or tagged data that is associated with the digital object
• Knowledge
  – higher-order concepts and relationships between attributes
  – relationships can be procedural, temporal, structural, spatial, functional, ... and described in a Logic formalism (semantic networks, description logics, conceptual graphs, ...) which is often rule-based (e.g. Datalog, Frame-Logic)

What Types of Interoperability are Needed?
• Data management (digital objects)
  – ability to work with multiple types of storage systems, across separate administration domains
• Information management (attributes)
  – ability to define a collection independent of database choice
  – ability to migrate collection onto new databases
• Knowledge management (relationships)
  – ability to manage relationships and high-level domain concepts
  – ability to map concepts to collection attributes

From XML-Based to Knowledge-Based Archives
• Collection-based archival with XML: save data "as is" plus...
  – ... separate content from presentation
  – ... tag your data (take a lift in the info hierarchy)
  – ... use a self-describing, semistructured data format (XML)
• Knowledge-based archival: now add ...
  – ... conceptual level information
  – ... integrity constraints
  – ... explanations/derivation rules:
    • archiving only results y=f(x) vs. archiving the rules/function "f" (e.g. f = “the Florida procedure”...)
=> employ knowledge representation languages

Knowledge-Based Persistent Archive
[Matrix diagram: rows Knowledge, Information, Data; columns Ingest Services, Management, Access Services]
• Knowledge: Relationships Between Concepts; Knowledge Repository for Rules; Knowledge or Topic-Based Query / Browse
• Information: Attributes, Semantics; Information Repository; Attribute-based Query
• Data: Fields, Containers, Folders; Storage (Replicas, Persistent IDs); Feature-based Query
• Interface labels: Rules - KQL; XTM DTD; SDLIP; XML DTD; (Topic Maps / Model-based Access); Grids; MCAT/HDF; (Data Handling System - SRB / FTP / HTTP)

Knowledge-Based Archival: Senate Example
Data provider says: “Please archive all records of legislative activities of the 106th senate!”
Integrity constraints, eg:
  (1) {senators_with_file} = UNION (sponsor, cosponsors, submitted_by)
  (2) {senators} = {sponsors} = {co-sponsors}
Violation:
  – the rhs is a SUPERSET of the lhs !
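
Constraint (1) is a plain set equality, so a violation of the "rhs is a superset of the lhs" kind can be found with set differences. The following is a minimal sketch under assumptions: the function name `check_union_constraint` and the sample names in the usage note are hypothetical, and the slides do not say how the archive actually evaluated its constraints.

```python
def check_union_constraint(senators_with_file, sponsor, cosponsors, submitted_by):
    """Check constraint (1): {senators_with_file} = UNION(sponsor, cosponsors, submitted_by).

    Returns a pair (missing_from_lhs, missing_from_rhs). Both sets are empty
    when the constraint holds. A non-empty first set means the right-hand side
    is a strict superset of the left-hand side.
    """
    lhs = set(senators_with_file)
    rhs = set(sponsor) | set(cosponsors) | set(submitted_by)
    return rhs - lhs, lhs - rhs


def constraint_holds(senators_with_file, sponsor, cosponsors, submitted_by):
    """True only if both sides of constraint (1) are the same set."""
    missing_from_lhs, missing_from_rhs = check_union_constraint(
        senators_with_file, sponsor, cosponsors, submitted_by)
    return not missing_from_lhs and not missing_from_rhs
```

For example, with made-up names, `check_union_constraint({"A", "B"}, {"A"}, {"B", "C"}, set())` reports `"C"` as a name that occurs in the records but has no file of its own.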
Exceptions:
  – (Chafee, John), (Gramm, Phil), (Miller, Zell)
(Possible) Explanations:
  – senators who joined (Zell), passed away (Chafee), were forgotten (Gramm)!?
Checking ICs:
  IF sponsor(X), not senator(X) THEN ADD(exception_log, missing_senator_info(X))
  IF condition THEN action
  Action = LOG, WARN, ABORT, ...

NARA Herbicides Collection: Introduction

The Herbicides Collection - input
From EBCDIC tapes:

  6507213207565 6507243207565 6507253207565 6507263207565 6507273207565 6507283207565 6507293207565
  6508022022365 AS890255 6508022022365 AS940140 6508042022365 AS925205 6508042022365 AS970065
  6508062022365 BS290320 6508062022365 BS275298 6508073207565 YT080110 6508073207565 YT110060
  6508113207565 6508123207565 6508151022465 YD350155 6508151022465 YD450150
  260404040 260606060 260606060 260606060 260606060 260505050 260404040 060202020
  040000{0000D0000000{048{ 060000{0000D0000000{072{ 060000{0000D0000000{072{ 060000{0000D0000000{072{
  060000{0000D0000000{072{ 050000{0000D0000000{060{ 040000{0000D0000000{048{ 010000{0000C0000000{012{
  000{000{
  {0000000{0000000{0000000{0000000{ {0000000{0000000{0000000{0000000{ {0000000{0000000{0000000{0000000{
  {0000000{0000000{0000000{0000000{ {0000000{0000000{0000000{0000000{ {0000000{0000000{0000000{0000000{
  {0000000{0000000{0000000{0000000{ {0000000{0000000{0000000{0000000{1A 1B 000{000{
  060202020 006000{0000C0000000{007B {0000000{0000000{0000000{0000000{1A 000{000{ 1B 000{000{
  060202020 004000{0000C0000000{004H {0000000{0000000{0000000{0000000{1A 000{000{ 1B 000{000{
  260202020 020000{0000D0000000{024{ {0000000{0000000{0000000{0000000{1A 000{000{ 1B 000{000{
  260202020 020000{0000D0000000{024{ {0000000{0000000{0000000{0000000{
  260202020 020000{0000D0000000{024{ {0000000{0000000{0000000{0000000{
  020202020 008000{0000C0000000{009F {0000000{0000000{0000000{0000000{1A 000{000{ 1B

The Herbicides Collection - preservation
Converted to XML:

  <YEAR><yearnum>66</yearnum>
  <MONTH><monthnum>01</monthnum>
  <DATE><datenum>01</datenum>
  <MISSION><num>206866</num>
  <RUN><code>A</code>
    <ctz>3</ctz><multi></multi><prov>27</prov>
    <aircrafts>
      <scheduled>02</scheduled><airborne>02</airborne><productive>02</productive>
    </aircrafts>
    <agent>O</agent><gal>02000</gal><hits>0</hits>
    <aborts>
      <maintenance>0</maintenance><weather>0</weather><battle_damage>0</battle_damage><other>0</other>
    </aborts>
    <type>D</type><area>024</area><rsult></rsult>
    <UTM> <utmid>1A</utmid> <utm_coor>YS240780</utm_coor> </UTM>
    <UTM> <utmid>1B</utmid> <utm_coor>YS290630</utm_coor> </UTM></RUN>
  <RUN><code>B</code>
    <ctz>3</ctz><multi></multi><prov>27</prov>
    <aircrafts>
      <scheduled>02</scheduled><airborne>02</airborne><productive>02</productive>
    </aircrafts>
    <agent>O</agent><gal>02000</gal><hits>0A</hits>
    <aborts>
      <maintenance>0</maintenance><weather>0</weather><battle_damage>0</battle_damage><other>0</other>
    </aborts>
    <type>D</type><area>024</area><rsult></rsult>

[Diagram label: MAPPING]

From Geography Markup to Rendering
XML encoding of geographic features (such as GML):

  <?xml version="1.0" encoding="iso-8859-1"?>
  <rs>
  <r><name>Horton Plaza</name><URL></URL><labelpos>41.46,77.51</labelpos><c>5076,1540 4986,1540 4895,1539 4803,1539 4715,1539 4622,1539 4534,1538 4534,1641 4534,1745 4534,1856 4622,1856 4711,1856 4800,1856 4893,1855 4984,1855 5075,1854 5075,1749 5076,1646 </c></r>
  <r><name>Gaslamp</name><URL></URL><labelpos>44.60,83.00</labelpos><c>5162,1013 5084,1057 5083,1116 5081,1222 5079,1326 5079,1433 5076,1540 5076,1646 5075,1749 5075,1854 5167,1854 5257,1855 5257,1750 5259,1647 5260,1541 5262,1434 5262,1328 5263,1222 5263,1013 </c></r>
  ...

VML or SVG or… (here SVG):

  <?xml version="1.0"?>
  <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20000303 Stylable//EN"
    "http://www.w3c.org/2000/svg10-20000303-stylable" [
  <!ENTITY base "fill:#ff0000;stroke:#000000;stroke-width:1;">
  ]>
  <svg width="100%" height="100%" viewBox="0 0 11590 7547" style="shape-rendering:geometricPrecision; text-rendering:optimizeLegibility">
  <g id="karta" transform="scale(1, -1) translate(0, -7547)">
  <g id="base" style="&base;">
  <path id="a1" title="Horton Plaza" style="fill:#00ff00;" d="M5076,1540L 4986,1540 4895,1539 4803,1539 4715,1539 4622,1539 4534,1538 4534,1641 4534,1745 4534,1856 4622,1856 4711,1856 4800,1856 4893,1855 4984,1855 5075,1854 5075,1749 5076,1646 5076,1540z"/>
  <path id="a2" title="Gaslamp" style="fill:#ffff00;" d="M5162,1013L 5084,1057 5083,1116 5081,1222 5079,1326 5079,1433 5076,1540 5076,1646 5075,1749 5075,1854 5167,1854 5257,1855 5257,1750 5259,1647 5260,1541 5262,1434 5262,1328 5263,1222 5263,1013 5162,1013z"/>
  </g></g></svg>

XML Map Viewer for the Herbicides Collection
[Screenshot]

Conclusion
• Necessity & Requirements of a Virtual Data Grid
• SRB – a proven solution
  – It is an existing middle-ware
  – Field-tested in multiple projects
  – Proven Scalability: users, data & resources
• New element of data grid: knowledge management
• Working solutions
  – BIRN: the first real data grid complete with knowledge management and cross-ontology bridges
  – Web services, to expose grid functionality in a uniform way
  – Archiving data, information and knowledge as a grid activity
• www.npaci.edu/DICE/
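
The step from the geography markup to SVG in the "From Geography Markup to Rendering" slide can be sketched with Python's standard `xml.etree.ElementTree`. This is a rough illustration, not the tool used for the herbicides map viewer: the helper names `region_to_path` and `regions_to_svg` and the `fills` argument are assumptions. Only the element names (`rs`, `r`, `name`, `c`) and the shape of the path data (move to the first point, line through the rest, return to the first point, close with `z`) are taken from the slide.

```python
import xml.etree.ElementTree as ET


def region_to_path(region, path_id, fill):
    """Turn one <r> element (a named polygon) into an SVG <path> element."""
    name = region.findtext("name", default="")
    points = region.findtext("c", default="").split()
    if not points:
        raise ValueError(f"region {name!r} has no coordinates")
    # Same shape as the slide's path data: M<first>L <rest...> <first>z
    d = "M" + points[0] + "L " + " ".join(points[1:] + [points[0]]) + "z"
    # The title attribute mirrors the slide; it is not required by SVG.
    return ET.Element("path", {"id": path_id, "title": name,
                               "style": f"fill:{fill};", "d": d})


def regions_to_svg(xml_text, fills):
    """Convert an <rs> document into an SVG string, one path per region."""
    root = ET.fromstring(xml_text)
    svg = ET.Element("svg", {"width": "100%", "height": "100%"})
    group = ET.SubElement(svg, "g", {"id": "base"})
    for index, (region, fill) in enumerate(zip(root.iter("r"), fills), start=1):
        group.append(region_to_path(region, f"a{index}", fill))
    return ET.tostring(svg, encoding="unicode")
```

Passing the `<rs>` document and a list of fill colors (one per region) yields an SVG string whose paths can be wrapped in the same `viewBox` and flip transform that the slide uses.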