Download Future testplan - hep-proj-database Site

CERN/IT/DB ORACLE 9i for Physics Data CERN Testplan Database Group, IT DB Workshop 11-13 July 2001 CERN/IT/DB Main Areas of Interest  VLDB & Oracle database architecture  C++ Binding (OCCI)  Data consistency and concurrent access  Database performance CERN/IT/DB VLDB & Oracle Database Architecture  Scalability  Maximum database size  Maximum size and number of DB entities  Runtime limits  Storage Overhead  Mass Storage System integration  Data Import/Export CERN/IT/DB Maximum Database Size ORACLE stores data in files or disk partitions  Similar characteristics - in most places in this presentation datafile can be replaced by disk partition Maximum number of datafiles per database  Oracle 8i – 216 files  Oracle 9 – maybe more  Maximum datafile size  4 million blocks, up to 64GB Oracle 8i database: ~100TB Oracle 9i …? CERN/IT/DB Limitations of DB Entities  Size limitations of most likely used database types:  Objects  LOBS  VArrays  How big can they be and how many can we have of:  Tables, columns, rows  Table partitions  Indexes  How many classes in the schema? CERN/IT/DB Runtime Limits Server side  Instances per database  Locks  SGA size  Sessions  Open files Client side  Cache size  Max. # opened objects  Single object larger than cache CERN/IT/DB Storage overhead What storage overhead is introduced by the various levels in the Oracle storage hierarchy?  Motivation: O(PB) -> even factor 2 is a lot  Can we reliably estimate the storage consumption?  Which overheads are most relevant for our particular data models? CERN/IT/DB Storage overhead (cont’d) Physical & logical storage hierarchy  database, tablespace, extent, segment, block  table, partition, column, row Relational data types  NUMBER, VARCHAR, LOB, index, nested table, REF OO data types  OID, VArray CERN/IT/DB MSS/HSM Interface Too much data (many PBs) to keep it all online at the same time  Need an integration with a MSS Only files that are not in use (closed) can be moved to MSS and removed from disk  ORACLE instance does not close files when the client closes connection Changing tablespace status to offline will close all files that belong to it Moving tablespaces offline/online needs to be done manually CERN/IT/DB Data Export/Import  Data distribution to other computing sites  needs to be done efficiently  in self-consistent portions  Transportable tablespaces seem to be the best solution  Experience in the DB group already exists  Need more testing with object tables • How the restrictions for “self containment” relate to object references?  Compare with the standard ORACLE data export tools  See if transportable tablespaces can be used with MSS  Is is possible to import back the same tablespace?  Is it worth working around the “self containment” restrictions in order to have the schema in MSS? CERN/IT/DB C++ Binding (OCCI) Persistent objects Client-side object cache Large collections Complex Object Retrieval Object Type Translator (OTT) CERN/IT/DB Object Persistency  C++ object persistency is implemented using ORACLE SQL object types (user defined types)  C++ object attributes are stored as ORACLE native types  Which Oracle data types are of central importance for physics data models?  NUMBER  VARRAY  Large Objects: LOBs, BFILEs  Is their current implementation and interface adequate for our use cases? CERN/IT/DB Client-side Object Cache Cache size management  control size to adapt to access patterns Object pinning  creation, navigation  C pointers, REFs Object caching, garbage collection, refreshing… CERN/IT/DB Large Collections How to create & handle large collections of O(109)? Oracle supports:  ordered collections: VARRAYs  unordered collections: nested tables OCCI maps both to STL vectors  Hierarchical structure needed for scalability Adequate for physics data use cases?  What about index-organized tables? CERN/IT/DB Group fetching OO data models  interrelated objects forming graphs  navigate/traverse: fetch & pin each object individually  costly network round-trips (client/server) Oracle’s Complex Object Retrieval  pre-fetching (depth level)  does it fit to physics data models? CERN/IT/DB Object Type Translator OTT translates an object data model from ORACLE SQL object types into C++ classes  Each type is translated 1-1  The inheritance tree is preserved Areas of focus:     How to convert existing C++ designs to SQL? Which C++ types are allowed? Which SQL types are usefull? Can OTT handle complicated data models? CERN/IT/DB Using OTT Does OTT support many developers working simultaneously on a shared data store? Does OTT support continuous development? Creating OTT conventions in HEP  Type naming conventions • Each persistent class requires several type names – SQL, OCCI generated, user  File naming conventions  Makefile rules (a la HepODBMS makefiles) CERN/IT/DB General Database Features Concurrent access Data consistency Performance CERN/IT/DB Concurrency Shared access to a large collections of complex objects  by many (hundreds) simultaneous readers • Need to support hundreds of readers… • … with an aggregate rate “close” to disk rate for user data  by many writers ( e.g. a reconstruction farm) • Some concurrency problems in the past – with shared and updated meta data (e.g. registry of collections) – excessive reload of unchanged data after transaction commits  by both at the same time? CERN/IT/DB Transaction Model Understand how Oracle’s transaction model affects our (existing) applications Objy:  Default: read and write locks  Optionally: MROW read locks Oracle: no read locks  does promise better concurrency • Could we allow our (high priority) writers to work concurrently with (low priority) readers?  navigational applications with the intend to update may have to do a “late refresh” CERN/IT/DB Data Consistency In the presence of faulty application software  How do to recover from a SEGV in our reconstruction software?  Can we insure that faulty client programs can not corrupt shared data or database metadata? but also…  What is the impact of redo and undo logging on the data loading rate? • Could/should we switch off logging? CERN/IT/DB Hardware and Media Faults  How to recover from a temporary or permanent Hardware Fault – – – – – a single disk sector an offline file (e.g. Tape) a disk a complete node a complete site  What manual cleanup is required in our application?  How fast can we get online again? CERN/IT/DB Availability How close can we get to 24x7 access to our store?  Which DBA activities require to bring an instance down? • Backup?  Any activities would require all instances, tablespaces, files, … to be accessible? CERN/IT/DB Performance – Data Transfer • What is the achievable “sequential” user data rate?  How much CPU is required per MB/s transfer rate? • On DB client and server?  How does the storage overhead of “Number” affect us in I/O bandwidth and CPU? • How do different object sizes affect the transfer rate? • Blocking may influence storage overheads • Number of objects affects CPU requirements • What rate for sparse (“random”) access? CERN/IT/DB Performance – Cache Efficiency • Repeat above measurements… • … with hot and cold server side buffer cache • Navigate through a cyclic object list of increasing length using PL/SQL • Repeat with hot and cold client side object cache • Do the same using OCCI • How much cache memory do we need per MB of user data? CERN/IT/DB Other Test Topics Data clustering Object naming Database links Server-side queries Stored procedures Data replication CERN/IT/DB Final Goals Create a VLDB: 10-100TB  With MSS interface? Implement a real life data model in OCCI  Working on the CMS data model Prove that OCCI can be used in HEP applications  Cache management, performance, reliability

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Future testplan - hep-proj-database Site