Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Relational model wikipedia , lookup
Functional Database Model wikipedia , lookup
Clusterpoint wikipedia , lookup
Oracle Database wikipedia , lookup
CERN/IT/DB ORACLE 9i for Physics Data CERN Testplan Database Group, IT DB Workshop 11-13 July 2001 CERN/IT/DB Main Areas of Interest VLDB & Oracle database architecture C++ Binding (OCCI) Data consistency and concurrent access Database performance CERN/IT/DB VLDB & Oracle Database Architecture Scalability Maximum database size Maximum size and number of DB entities Runtime limits Storage Overhead Mass Storage System integration Data Import/Export CERN/IT/DB Maximum Database Size ORACLE stores data in files or disk partitions Similar characteristics - in most places in this presentation datafile can be replaced by disk partition Maximum number of datafiles per database Oracle 8i – 216 files Oracle 9 – maybe more Maximum datafile size 4 million blocks, up to 64GB Oracle 8i database: ~100TB Oracle 9i …? CERN/IT/DB Limitations of DB Entities Size limitations of most likely used database types: Objects LOBS VArrays How big can they be and how many can we have of: Tables, columns, rows Table partitions Indexes How many classes in the schema? CERN/IT/DB Runtime Limits Server side Instances per database Locks SGA size Sessions Open files Client side Cache size Max. # opened objects Single object larger than cache CERN/IT/DB Storage overhead What storage overhead is introduced by the various levels in the Oracle storage hierarchy? Motivation: O(PB) -> even factor 2 is a lot Can we reliably estimate the storage consumption? Which overheads are most relevant for our particular data models? CERN/IT/DB Storage overhead (cont’d) Physical & logical storage hierarchy database, tablespace, extent, segment, block table, partition, column, row Relational data types NUMBER, VARCHAR, LOB, index, nested table, REF OO data types OID, VArray CERN/IT/DB MSS/HSM Interface Too much data (many PBs) to keep it all online at the same time Need an integration with a MSS Only files that are not in use (closed) can be moved to MSS and removed from disk ORACLE instance does not close files when the client closes connection Changing tablespace status to offline will close all files that belong to it Moving tablespaces offline/online needs to be done manually CERN/IT/DB Data Export/Import Data distribution to other computing sites needs to be done efficiently in self-consistent portions Transportable tablespaces seem to be the best solution Experience in the DB group already exists Need more testing with object tables • How the restrictions for “self containment” relate to object references? Compare with the standard ORACLE data export tools See if transportable tablespaces can be used with MSS Is is possible to import back the same tablespace? Is it worth working around the “self containment” restrictions in order to have the schema in MSS? CERN/IT/DB C++ Binding (OCCI) Persistent objects Client-side object cache Large collections Complex Object Retrieval Object Type Translator (OTT) CERN/IT/DB Object Persistency C++ object persistency is implemented using ORACLE SQL object types (user defined types) C++ object attributes are stored as ORACLE native types Which Oracle data types are of central importance for physics data models? NUMBER VARRAY Large Objects: LOBs, BFILEs Is their current implementation and interface adequate for our use cases? CERN/IT/DB Client-side Object Cache Cache size management control size to adapt to access patterns Object pinning creation, navigation C pointers, REFs Object caching, garbage collection, refreshing… CERN/IT/DB Large Collections How to create & handle large collections of O(109)? Oracle supports: ordered collections: VARRAYs unordered collections: nested tables OCCI maps both to STL vectors Hierarchical structure needed for scalability Adequate for physics data use cases? What about index-organized tables? CERN/IT/DB Group fetching OO data models interrelated objects forming graphs navigate/traverse: fetch & pin each object individually costly network round-trips (client/server) Oracle’s Complex Object Retrieval pre-fetching (depth level) does it fit to physics data models? CERN/IT/DB Object Type Translator OTT translates an object data model from ORACLE SQL object types into C++ classes Each type is translated 1-1 The inheritance tree is preserved Areas of focus: How to convert existing C++ designs to SQL? Which C++ types are allowed? Which SQL types are usefull? Can OTT handle complicated data models? CERN/IT/DB Using OTT Does OTT support many developers working simultaneously on a shared data store? Does OTT support continuous development? Creating OTT conventions in HEP Type naming conventions • Each persistent class requires several type names – SQL, OCCI generated, user File naming conventions Makefile rules (a la HepODBMS makefiles) CERN/IT/DB General Database Features Concurrent access Data consistency Performance CERN/IT/DB Concurrency Shared access to a large collections of complex objects by many (hundreds) simultaneous readers • Need to support hundreds of readers… • … with an aggregate rate “close” to disk rate for user data by many writers ( e.g. a reconstruction farm) • Some concurrency problems in the past – with shared and updated meta data (e.g. registry of collections) – excessive reload of unchanged data after transaction commits by both at the same time? CERN/IT/DB Transaction Model Understand how Oracle’s transaction model affects our (existing) applications Objy: Default: read and write locks Optionally: MROW read locks Oracle: no read locks does promise better concurrency • Could we allow our (high priority) writers to work concurrently with (low priority) readers? navigational applications with the intend to update may have to do a “late refresh” CERN/IT/DB Data Consistency In the presence of faulty application software How do to recover from a SEGV in our reconstruction software? Can we insure that faulty client programs can not corrupt shared data or database metadata? but also… What is the impact of redo and undo logging on the data loading rate? • Could/should we switch off logging? CERN/IT/DB Hardware and Media Faults How to recover from a temporary or permanent Hardware Fault – – – – – a single disk sector an offline file (e.g. Tape) a disk a complete node a complete site What manual cleanup is required in our application? How fast can we get online again? CERN/IT/DB Availability How close can we get to 24x7 access to our store? Which DBA activities require to bring an instance down? • Backup? Any activities would require all instances, tablespaces, files, … to be accessible? CERN/IT/DB Performance – Data Transfer • What is the achievable “sequential” user data rate? How much CPU is required per MB/s transfer rate? • On DB client and server? How does the storage overhead of “Number” affect us in I/O bandwidth and CPU? • How do different object sizes affect the transfer rate? • Blocking may influence storage overheads • Number of objects affects CPU requirements • What rate for sparse (“random”) access? CERN/IT/DB Performance – Cache Efficiency • Repeat above measurements… • … with hot and cold server side buffer cache • Navigate through a cyclic object list of increasing length using PL/SQL • Repeat with hot and cold client side object cache • Do the same using OCCI • How much cache memory do we need per MB of user data? CERN/IT/DB Other Test Topics Data clustering Object naming Database links Server-side queries Stored procedures Data replication CERN/IT/DB Final Goals Create a VLDB: 10-100TB With MSS interface? Implement a real life data model in OCCI Working on the CMS data model Prove that OCCI can be used in HEP applications Cache management, performance, reliability