World’s Largest Databases
Howard Fosdick
(630)-279-4286
(C) 2004 FCI

Who Am I?
Hands-on DBA (and SA) for …
• Oracle, DB2, SQL Server
• Unix, Linux, Windows
• Founder IDUG, MWDUG, CAMP
• Author, Speaker
Independent Contractor
(630)-279-4286
[email protected]

Outline
1. What’s a “Big Database”
2. DSS
3. OLTP
4. Observations

Statistics Sources
1. Winter Corp.
   -- Database Top Ten
   -- Yearly survey
   -- Vendor neutral
   -- Free at: www.wintercorp.com
2. Survey.com
   -- High-End BI/DW Competitive Analysis
   -- Survey of 150 companies w/ big warehouses
   -- Free at: www.survey.com
“Thank You” to both sources

Classifying Large Databases
DSS
   Decision Support Systems (DSS)
   Online Analytical Processing (OLAP)
   Data Warehouses (DW)
   Multi-dimensional Databases (MDD)
   + Query oriented, mainly Read-only
OLTP
   Online Transaction Processing (OLTP)
   + Update with short transactions (transaction = small CPU & data resources)
Commercial IT vs. Scientific/Research databases

What’s a Large Database ?
Database Size
   - User data
   - User data plus metadata & indexes
   - DASD farm
   VLDB = Very Large Database
Users
   - Concurrent users
   - Total user population
Load
   - Concurrent queries
   - Queries / day or hour (simple vs complex queries)
Good definitions and measurements are key to success

II. World’s Biggest DSS Systems

Data Warehouses vs. Data Marts
DW
• Application neutral
• Service multiple organizational needs
DM
• Application specific
• Organizationally focused
Largest systems are usually data warehouses

What’s Driving the Growth of Large Data Warehouses ?
Web Sites -- Clickstream data
Retail -- Transaction Level Detail (TLD)
Understanding customer behavior means $$$ !

   !!!!! Super Big Groceries !!!!!
   Preferred Customer Card #283736
   Hello, I’m Scot94
   03/04/04 02:38   3284 03 2918 33
   Store 493   Loc 229

   PRETTY-LADY HAIRCLR    1     5.99
   AARP MAGAZINE          1     4.95
   DIAPERS                2    10.00
   BEER SIX-PACK          1     3.45
   Tax                          2.40
   BAL                         36.79
   Cash                        40.00
   Change                       3.21

   Save this Receipt -- Get $2.00 off on Prozac When You Buy Super-Baby Food !
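The receipt above is one record of transaction-level detail. As a rough illustration of what a warehouse stores per sale, here is a minimal Python sketch of that receipt as a data structure. The class and field names (LineItem, Transaction, extended_price) are hypothetical, and the pairing of each item with its quantity and price is an assumption read off the receipt layout, not something the slide spells out.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple


@dataclass(frozen=True)
class LineItem:
    """One purchased item: the finest grain of detail kept per sale."""
    description: str
    quantity: int
    unit_price: Decimal

    @property
    def extended_price(self) -> Decimal:
        # Quantity times unit price for this line.
        return self.quantity * self.unit_price


@dataclass(frozen=True)
class Transaction:
    """A whole receipt, keyed by store and preferred-customer card."""
    store: int
    card_number: str
    items: Tuple[LineItem, ...]
    tax: Decimal
    cash: Decimal

    @property
    def balance(self) -> Decimal:
        # Sum of all line items plus tax.
        subtotal = sum((i.extended_price for i in self.items), Decimal("0"))
        return subtotal + self.tax

    @property
    def change(self) -> Decimal:
        return self.cash - self.balance


# Item/price pairing below is assumed from the receipt's column order.
receipt = Transaction(
    store=493,
    card_number="283736",
    items=(
        LineItem("PRETTY-LADY HAIRCLR", 1, Decimal("5.99")),
        LineItem("AARP MAGAZINE", 1, Decimal("4.95")),
        LineItem("DIAPERS", 2, Decimal("10.00")),
        LineItem("BEER SIX-PACK", 1, Decimal("3.45")),
    ),
    tax=Decimal("2.40"),
    cash=Decimal("40.00"),
)

print(receipt.balance, receipt.change)  # 36.79 3.21
```

Under that pairing the totals match the receipt (BAL 36.79, Change 3.21). Keeping every line item, rather than only the receipt total, is what lets queries tie a customer card to individual products.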
What’s Driving the Growth of Large Data Warehouses ?
Necessary Preconditions --
• Cheap Hardware
• Higher reliability / availability (based on dynamic hardware swapping)
• Better Software
• Lax privacy laws in USA
   • EU curtails cross-usage of data
   • EU has stronger privacy laws

World’s Largest DSS Systems
© 2003 Winter Corp.
• Way bigger than just 3 years ago
• All Unix “mainframes”
• All use SANs (Storage Area Networks) (aka ESS)
• No IBM Mainframes
• No Windows or Wintel
• No SQL Server
• No Linux or Open Source databases
• NCR/Teradata niche market at 2.7% (Gartner 05/28/03)
• Goodbye Informix!
Database Size = disk storage for user tables, indices, aggregates

Large DSS Systems
Query Users -> Unix “mainframe” -> Storage Area Network
Unix “mainframe”: Sun E12/15K, HP Superdome, IBM Regatta
Storage Area Network: EMC, Hitachi, HP, LSI
Unix “mainframes” --
+ Dynamically add/drop CPUs, RAM (Sun calls it partitioning)
+ High reliability (as good as clusters or Mainframes)
+ Capacity on Demand
SANs --
+ Flash (“snap”) backup (OS-level backup)
+ Large Cache
+ Intelligent data placement/movement

Example Evolution -- Scaling a Unix “Mainframe”
Concurrent users: 12 -> 25 -> 35
Configuration: 8 CPUs @ 16 Gig RAM -> 32 CPUs @ 64 Gig RAM -> 64 CPUs @ 64 Gig RAM
Other upgrades:
   Oracle 8i -> 9i
   Sun E10K -> E12K

World’s Largest DSS Systems -- Windows
© 2003 Winter Corp.
• Way smaller than Unix systems
• Way bigger than just 3 years ago
• Oracle vs SQL Server (like market share battle for Windows DBMSs)
• Also use SANs (Storage Area Networks)
• No IBM DB2 UDB
• No Teradata

World’s Largest DSS Systems -- By Peak Workload
© 2003 Winter Corp.

Where did IBM Mainframes Go ?
1994: Big Iron -> 2004: Big Silicon
Poof!
-- Goodbye…
   -- Largest databases
   -- Smaller mainframes (VM, VSE)
   -- Reliability advantage eroded
   -- High cost per CPU
+ Hello Linux !
+ Good for --
   + Consolidation platform
   + Legacy systems
   + Virtualization (multi-OS platform)

Oracle Rising
• Joined the Top Ten list 3 to 5 years ago
• 8i added essential DSS technologies ...
   + Partitions
   + New ROW ID (for bigger databases)
   + Thorough Parallelism (DML, DDL, utilities)
   + Index improvements (bit mapped IXs, function-based, desc, others)
   + Resource Manager (proactive)
   + Materialized Views
   + Large memory mgmt
   + Optimizer is Partition-aware
   + Online DDL operations and Utilities

Example Oracle Warehouses
© 2003 Winter Corp.

                    Amazon              Best Buy                         Colgate               Telecom Italia Mobile
System              HP Superdome        Sun 15K                          IBM p690 Regatta      HP AlphaServer
Architecture        SMP                 SMP                              SMP                   Cluster
Storage             EMC                 EMC                              IBM                   EMC
Processors          64                  24                               24                    2 node cluster
Oracle Version      9i                  8i                               9i                    8i
DB Size             13 T                6.3 T                            3.8 T                 16 T
Number of Tables    600                 4025                             27,000                1,200
Detail Data         Clickstream data    Sales Transaction data           Varied detail data    Call detail records
User Population     800                 16,000                           6,200                 400
Concurrent Users    55-60               600-700                          600-700               55
DBAs                2                   2                                n/a                   3
Peak Workload       4300 queries / day  150,000 queries / 4 hour period  14,200 steps / day    700 M records loaded / day

Why Not Oracle Clustering ?
+ Great for non-disruptive scaling of existing systems . . .
But the biggest systems tend not to use it
-- Unix “mainframe” no longer requires clustering for reliability, availability or easy scalability
-- Clustering means complexity in minimizing the…
   -- Locking issues
9i improved this via Cache Fusion -- but SMP Unix “mainframe” will still be favored

Where’s SQL Server 2000 ?
• Big in OLTP but lacks essential DSS technologies ...
   -- Parallelism restricted to SELECTs
      -- Needs it for other DML, DDL, utilities
   -- Partitions
   -- Wintel restriction
Yukon ?
-- Many new features. . . ready for “Top Ten” DSS ?
(Features = partitioning, database mirroring, mirrored backups, online indexing & restore, fast recovery, ANSI 1999 T-SQL, CLR support, native XML, XML Query, better .NET support, Reporting Services, Service Broker (async messaging), extensible data types…)

Where’s Open Source ?
Linux
+ 2.6 kernel now out
+ More CPUs (to 16)
+ More RAM (> 4+ Gig)
+ Better threading, file system support
MySQL and PostgreSQL
-- Top out at 500,000 page views per day (EWeek 2003) (or 15 per second)
+ Improving rapidly
Prediction -- open source will support big databases but not “Top Ten” list sites

Risks of Large DWs
• 40% of IT projects fail due to … Management (time & budget issues)
• “Large warehouses are unforgiving” -- Survey.com
• Design issues critical
   • Database Design
   • Query design (and EXPLAINs)
   • ETL design and scheduling
• Pre-program wherever possible (control users and the resources they use)
• Monitoring and alerts
• Scale gradually (staggered loads on a schedule…)
• Benchmarks (after each Scaling Point)

Risks of Large DWs
• Partitioning data properly is critical
   • For better physical management (utilities)
   • Optimizers use this info
   • Parallelism via multiple partitions
• How to partition
   • Depends on data usage
   • Examples: geographical, hash, unique id, ranges…

III. World’s Biggest OLTP Systems

World’s Largest OLTP Systems
© 2003 Winter Corp.
• Wintel “mainframes” arrive !
• SQL Server arrives
• Use SANs
• CA can do the job (but has tiny overall database market share)
• Oracle has big systems -- but not in the top ten

World’s Largest OLTP Systems
-- Unix
-- Windows
© 2003 Winter Corp.

World’s Largest OLTP Systems -- By Number of Rows
© 2003 Winter Corp.

OLTP Observations
• Wintel “mainframes” w/ SQL Server displace MVS/CICS
• SQL Server dominates Wintel OLTP
• Great for pre-programmed, resource-limited txns
• Oracle dominates Unix OLTP

IV.
Observations

Architectures
   Shared-disk Clusters
   Shared-nothing (Massively Parallel Processing or MPP)
   Large SMP “mainframe”
The “architectural debate” means far less than it used to !

Vendor Architectures

Product             Architecture                    Implementation
DB2 UDB for z/OS    Shared-disk clustering          DB2 Data Sharing on Sysplex
DB2 UDB for LUW     Shared nothing                  DB2 UDB ESE partitioning feature
Oracle              Shared-disk clustering or SMP   Real Application Clusters (RAC) -- previously known as Oracle Parallel Server (OPS)
SQL Server 2000     Shared nothing or SMP           Customer-developed partitioning based on SQL Server features
Teradata            Shared nothing                  Teradata on NCR MPP

DBMS Licensing Costs
Cost scale, from $$$$$ down to $:
   Teradata
   Oracle, DB2 UDB
   SQL Server 2000
   Open Source (MySQL, PostgreSQL)
Chart labels: Biggest DSS Systems, Biggest OLTP Systems
+ Low-cost SQL Server supports the biggest OLTP systems
-- Pressure on Teradata to keep its niche
+ Open Source DBMSs have a role but it’s not “Top Ten” databases
Database pricing varies by the options selected and by the deal an IT organization cuts with the vendor.
TCO ? Your mileage may vary!

DW Labor Costs
© 2002 Survey.com
Like TCO, Labor Costs may be an un-measurable …
• Figures applicable across sites ?
• Every vendor claims lowest labor costs
• “Terabytes per DBA” may be non-linear!
• 1 or 2 DBAs for a 24/7 site ?
• Development staff will be larger than Maintenance staff
• Your mileage will vary

Multi-Machine Mixed Systems
Sabre / Travelocity (EWeek, 2/23/04)
   45 Linux w/ MySQL servers (Fare look-up and routing)
   17 Himalaya Non-stop w/ Master database (Transactional updates)

Multi-Machine Mixed Systems
Omaha Steaks (EWeek 2003)
   * 50,000 to 68,000 daily sessions
   * 1 year in Production / 8 Million sessions
   17 Linux w/ MySQL servers (Shopping cart)
   ISeries DB2 (Transactional updates)

Conclusions
• Databases are growing exponentially
• IT is closing in on Scientific/Research databases
• “Multiple machine” mixed systems are becoming popular (Monolithic central databases are no longer the only game in town)
• “Mixed use” databases are becoming more common
   • Multiple applications
   • Read and update
• Open Source supports large systems -- but not “Top Ten”
• VLDBs are instructive -- but unique in some ways

? ? ? questions... ? ? ?
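Appendix sketch: the multi-machine mixed systems described above split work between a pool of read-mostly servers and one transactional master. The Python below is only an illustration of that routing idea, not how Sabre or Omaha Steaks implemented it. The class name MixedSystemRouter, the operation labels "read" and "update", the round-robin policy, and the server names are all hypothetical.

```python
import itertools
from typing import Iterable


class MixedSystemRouter:
    """Send reads to a pool of look-up servers and updates to one master."""

    def __init__(self, read_servers: Iterable[str], update_server: str) -> None:
        servers = list(read_servers)
        if not servers:
            raise ValueError("at least one read server is required")
        # Round-robin is an assumed policy; the slides do not name one.
        self._reads = itertools.cycle(servers)
        self._update_server = update_server

    def route(self, operation: str) -> str:
        """Return the name of the server that should handle the operation."""
        if operation == "read":
            return next(self._reads)
        if operation == "update":
            return self._update_server
        raise ValueError(f"unknown operation: {operation!r}")


# Hypothetical names: 45 look-up servers plus one transactional master.
router = MixedSystemRouter(
    read_servers=[f"mysql-{n:02d}" for n in range(1, 46)],
    update_server="master",
)
```

Calling router.route("read") repeatedly walks through the look-up pool, while router.route("update") always returns the master. That mirrors the split on the slides between look-up traffic and transactional updates.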