Extreme Performance Data Warehousing

The Rise of the Intelligent Economy
“From recession comes an opportunity to reset a number of industry structures…there is an opportunity to infuse industries with technologies that position them to operate more effectively in the next 50 years.”
Lessons Learned in Building the Intelligent Economy, May 2010

All Businesses Want Better Insight
Typical questions by industry:

Retail
• What stores should be closed or sold?
• Which customers will respond to a new promotion?

Telecommunications
• What are the issues affecting churn by region?
• What is the average revenue per user (ARPU)?

Healthcare
• What are the most common patient service requests?
• What is the average level of clinical supplies on-hand?

Financial Services
• How will new online services impact deposits?
• How does the average loan compare to last year?

Utilities
• Who do we target for the energy efficiency program?
• What resources are needed to restore an outage?

Public Sector
• What is the trend on budget and expenditures?
• What is the most cost-effective way to manage waste?
Challenge: Much More Data to Analyze
Data Warehouse Size and Growth

Data warehouse size     Today    In 3 Years
More than 10 TB         17%      34%
3 - 10 TB               19%      25%
1 - 3 TB                21%      18%
500 GB - 1 TB           20%      12%
Less than 500 GB        21%      5%

Source: TDWI Next Generation Data Warehouse Platforms Report, 2009

Challenge: No Single Source of Truth
Expensive Data Warehouse Architecture
[Diagram: separate Data Marts, OLAP, and Data Mining systems connected by ETL]

Challenge: User Requirements Not Met
High Churn in Data Warehouse Platforms
• Poor query response: 45%
• Can't support advanced analytics: 40%
• Inadequate data load speed: 39%
• Can't scale to large data volumes: 37%
• Cost of scaling up is too expensive: 33%
• Poorly suited to real-time or on demand workloads: 29%
• Current platform is a legacy we must phase out: 23%
• Can't support data modeling we need: 23%
• We need platform that supports mixed workloads: 21%
Source: TDWI Next Generation Data Warehouse Platforms Report, 2009

Consolidate Onto a Single Platform
Faster Performance, Single Source of Truth
[Diagram: Data Marts, Online Analytics, ETL, and Data Mining on Oracle Database 11g and Oracle Exadata Database Machine]

Oracle Exadata Database Machine
For OLTP, Data Warehousing & Consolidated Workloads
• Improve query performance by 10x
  – Better insight into customer requirements
  – Expand revenue opportunities
• Consolidate OLTP and analytic workloads
  – Lower admin and maintenance costs
  – Reduce points of failure
• Integrate analytics and data mining
  – Complex and predictive analytics
• Lower risk
  – Streamline deployment
  – One support contact

Oracle Exadata Database Machine Family
Oracle Exadata Database Machine X2-2
Oracle Database Server Pool
• 8 2-processor Database Servers
  – 96 CPU Cores
  – 768 GB Memory
  – Oracle Linux or Solaris 11 Express
Exadata Storage Server Pool
• 14 Storage Servers
  – 5 TB Smart Flash Cache
  – 336 TB Disk Storage
Unified Server/Storage Network
• 40 Gb/sec InfiniBand Links
Available in full, half, quarter racks

Oracle Exadata Database Machine Family
Oracle Exadata Database Machine X2-8
Oracle Database Server Pool
• 2 8-processor Database Servers
  – 128 CPU Cores
  – 2 TB Memory
  – Oracle Linux or Solaris 11 Express
Exadata Storage Server Pool
• 14 Storage Servers
  – 5 TB Smart Flash Cache
  – 336 TB Disk Storage
Unified Server/Storage Network
• 40 Gb/sec InfiniBand Links
Full and multi-rack configuration

Traditional Query Problem
What Were Yesterday’s Sales?
Select sum(sales) where salesdate= '22-Jan-2010'…
[Diagram: Return entire Sales table; Discard most of sales table; Sum]
• Data is pushed to database server for processing
• I/O rates are limited by speed and number of disk drives
• Network bandwidth is strained, limiting performance and concurrency

Exadata Smart Scan
Improve Query Performance by 10x or More
What Were Yesterday’s Sales?
Select sum(sales) where salesdate= '22-Jan-2010'…
[Diagram: Return Sales for Jan 22 2010; Sum]
• Data intensive processing runs in Exadata Storage Servers
• Rows and columns filtered as data streams from disks
• Complex operations also run in storage
• Parallelizes query execution and removes bottlenecks

Built-in Analytics
Secure, Scalable Platform for Advanced Analytics
Oracle OLAP: Analyze and summarize
Oracle Data Mining: Uncover and predict
• Complex and predictive analytics embedded into Oracle Database 11g
• Reduce cost of additional hardware, management resources
• Improve performance by eliminating data movement and duplication

Exadata Storage Index
Transparent I/O Elimination with No Overhead
[Diagram: table with columns A, B, C, D and an index on B. First set of rows: B values 1, 3, 5 (Min B = 1, Max B = 5). Second set of rows: B values 5, 8, 3 (Min B = 3, Max B = 8). Query: Select * from Table where B<2. Only first set of rows can match.]
• Maintain summary information about table data in memory
• Eliminate disk I/Os if MIN / MAX never match “where” clause
• Completely automatic and transparent

Exadata Hybrid Columnar Compression
Reduce Disk Space Requirements
[Chart: Data in terabytes (axis 0 to 100) for Uncompressed, Data Warehouse Appliances, and Oracle; series labels OLTP Data, DW Data, Archive Data; ratios shown: 1.4x, 2.5x, 3x, 10x, 15x]

Benefits Multiply
Converting Terabytes to Gigabytes
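The data volumes shown on this slide (10 TB, 1 TB, 100 GB, 20 GB, 5 GB) imply a reduction factor at each step. The following is a minimal sketch of that arithmetic, not anything taken from the deck: it assumes decimal units (1 TB = 1000 GB) and that the five volumes are successive stages of one query, and the names STAGES_GB, reduction_factors, and total_reduction are illustrative.

```python
from fractions import Fraction

# Data volumes from the "Benefits Multiply" slide, expressed in GB.
# Assumption: decimal units (1 TB = 1000 GB) and sequential stages.
STAGES_GB = [10_000, 1_000, 100, 20, 5]


def reduction_factors(sizes):
    """Return the factor by which each stage shrinks the previous one."""
    return [Fraction(prev, cur) for prev, cur in zip(sizes, sizes[1:])]


def total_reduction(sizes):
    """Return the overall factor from the first stage to the last."""
    return Fraction(sizes[0], sizes[-1])


if __name__ == "__main__":
    print([int(f) for f in reduction_factors(STAGES_GB)])
    print(int(total_reduction(STAGES_GB)))
```

Under these assumptions the per-stage factors are 10, 10, 5, and 4, which multiply to an overall factor of 2000 between the 10 TB starting point and the 5 GB final figure.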
[Diagram: 10 TB of User Data; 1 TB of User Data (With 10x Compression); 100 GB of User Data (With Partition Pruning); 20 GB of User Data and 5 GB of User Data (With Storage Indexes, With Smart Scan); Sub second “10 TB” Scan; No Indexes]

Partition to Manage Data Growth
Compress Data and Lower Storage Costs
[Diagram: Active Data, 3x OLTP Compression; Read Only Data, 10x DW Compression; Archive Data, 15-50x Archive Compression]
• Distribute partitions across multiple compression tiers
• Free up storage space and execute queries faster
• No changes to existing applications

Turkcell Runs 10x Faster on Exadata
Compresses Data Warehouse by 10x
• Replaced high-end SMP Server and 10 Storage Cabinets
• Reduced Data Warehouse from 250TB to 27TB
• Using OLTP & Hybrid Columnar Compression
• Ready for future growth where data doubles every year
• Experiencing 10x faster query performance
• Delivering over 50,000 reports per month
• Average report run time reduced from 27 to 2.5 mins
• Up to 400x performance gain on some reports

Softbank Runs 2x–8x Faster on Exadata
36 Teradata Racks Replaced by 3 Exadata Racks
[Diagram: Teradata, 36 Racks; Exadata, 3 Racks]

Oracle Exadata for Data Warehousing

Oracle Exadata Momentum
Rapid adoption in all geographies and industries

Exadata Smart Flash Cache
Extreme Performance for OLTP Applications
[Diagram: Frequently Used Data; Infrequently Used Data]
• Full rack has 5 TB of Smart Flash Cache
• Can process over 1 million IOs per second
• 50 GB/sec query throughput on uncompressed data
• 5x more I/Os than 1000 Disk Enterprise Storage Array

Oracle Database 11g
The Best Database for Data Warehousing
[Diagram: Real Application Clusters, Advanced Compression, Partitioning, OLAP, Data Mining]
• World record performance for fast access to information
• Manage growing volumes of information cost-effectively
• Reduce costs through server and data consolidation

ETL with Oracle Database 11g
[Diagram: Non-Oracle Source, BCP Unload, FTP; Oracle Source, Data Pump Unload, SCP; Staging Raw Files; Parallel Loads]
• Fast data loading using DBFS and External Tables
• Fast transforms in Oracle Database 11g via Parallel DML operations
• Best-in-class performance for large batch oriented data loads

The Concept of Partitioning
Maintain Consistent Performance as Database Grows
[Diagram: SALES as a Large Table, as a Partition (Jan, Feb), and as a Composite Partition (Europe, USA; Jan, Feb)]
• Difficult to Manage
• Divide and Conquer
• Higher Performance
• Easier to Manage
• Match to business needs
• Improve Performance

Partition for Performance
Partition Pruning
[Diagram: Sales Table with partitions 5/19, 5/20, 5/21, 5/22]
What was the total sales amount for May 20 and May 21 2010?
Select sum(sales_amount)
From SALES
Where sales_date between
to_date('05/20/2010','MM/DD/YYYY')
And
to_date('05/22/2010','MM/DD/YYYY');
• Performs operations only on relevant partitions
• Dramatically reduces amount of data retrieved from disk
• Improves query performance and optimizes resource utilization

In-Memory Parallel Execution
Efficient use of memory on clustered servers
In-Memory Parallel Query in Database Tier
• Compress more data into available memory on cluster
• Intelligent algorithm
  – Places table fragments in memory on different nodes
• Reduces disk IO and speeds query execution
© 2010 Oracle Corporation

Automated Degree of Parallelism
[Diagram: Automatically determine DOP. Enough parallel servers available: execute immediately. Queue statements if not enough parallel servers are available (queued DOP values shown: 8, 16, 32, 64). When the required number of servers is available, execute first statement.]
• Optimizer derives the best Degree of Parallelism
• Based on resource requirements of all concurrent operations
• Less DBA management, better resource utilization

Summary Management
Improve Response Time with Materialized Views
[Diagram: SQL Query; Query Rewrite; Relational Star Schema (Region, Date, Products, Channel); Materialized Views (Sales by Region, Sales by Date, Sales by Product, Sales by Channel)]
• Pre-summarized information stored within Oracle Database 11g
• Separate database object, transparent to queries
• Supports sophisticated transparent query rewrite
• Fast incremental refresh of changed data

Cube Organized Materialized Views
[Diagram: SQL Query; Query Rewrite; Automatic Refresh; Summaries; Region, Date, Products, Channel]
• Exposes Oracle OLAP cubes as relational materialized views
• Provides SQL access to data stored in OLAP cubes
• Any BI tool or SQL application can leverage OLAP cubes

Oracle OLAP
Built-in Access to Analytic Calculations
• How do sales in the Western region this quarter compare with sales a year ago?
• What will sales next quarter be?
• What factors can we alter to improve the sales forecast?

• Multidimensional analytic engine that analyzes summary data
• Offers improved query performance and fast, incremental updates
• Embedded in Oracle Database instance and storage

Oracle OLAP and OBIEE
Calculations Computed Faster in OLAP Engine

Oracle Data Mining
Find Hidden Patterns, Make Predictions

Retail
• Customer Segmentation
• Response Modeling

Financial Services
• Credit Scoring
• Possibility of default

Communications
• Customer churn
• Network intrusion

Utilities
• Product bundling
• Predict power line failure

Healthcare
• Patient outcome prediction
• Fraud detection

Public Sector
• Tax fraud
• Crime analysis

• Collection of data mining algorithms that solve business problems
• Simplifies development of predictive BI applications
• Embedded in Oracle Database instance and storage

Oracle Data Mining and OBIEE
Prediction and Probability Results Integrated in Reports

Oracle Spatial and OBIEE
• Enrich BI with map visualization of Oracle Spatial data
• Enable location analysis in reporting, alerts and notifications
• Use maps to guide data navigation, filtering and drill-down
• Increase ROI from geospatial and non-spatial data

Oracle Exadata Intelligent Warehouse For Industries
[Diagram: Data Models, Business Intelligence, Exadata]
• Combine deep industry knowledge with data warehousing expertise
• Help jump-start design and implementation of data warehouses
• Available for Retail and Communications industries

Advanced Customer Services for Oracle Data Warehousing and Exadata

Lifecycle Services
• Installs & configuration
• Upgrades
• Patching
• Performance assessment

Operations Management
• Quality of Service monitoring
• Incident & problem management
• Configuration management
• Onsite or remote management

Expert Services
• Solution architectures
• Service delivery
• Data loading & migration
• Backup & recovery

Solution Support Center
• 24x7 Expert Service Desk
• Problem priority & escalation
• Integrated with Premier support

• Proactive advice to reduce deployment costs and risk
• Preventive assessments to identify & resolve issues
• Predictive management for high quality of service

Oracle Exadata for Data Warehouse

Extreme Performance Data Warehousing
Integrated Technology Stack
[Diagram: BI Applications, BI Tools, ELT Tools, Data Models, Database, Smart Storage]
• Single source of truth
• Easy to deploy and manage
• Extreme performance
• Meets all end user requirements
• Lower cost of ownership

Oracle #1 for Data Warehousing
Source: IDC, July 2009 – “Worldwide Data Warehouse Management Tools 2008 Vendor Shares”

Oracle Database 11g R2: Fully Optimized to Run on Intel Xeon Processors
• Robust optimization using Intel Compilers and Intel Integrated Performance Primitives (Intel IPP)
  – Speeds up and increases efficiency of compression and decompression of data
• Optimized encryption for security without compromise
  – Strong protection of data in transit with Oracle Advanced Security using Intel IPP
  – Use of AES-NI to deliver more efficient encryption and decryption
• Performance optimizations enable Oracle 11g to work with the hardware to increase throughput and overall performance
• Advanced Reliability Features (RAS)
  – Machine Check Architecture Recovery
  – Error Containment
  – Corrected Error Signaling

Oracle Software Optimized for Intel Xeon Processors: Huge Performance Gain with AES-NI Encryption/Decryption
[Chart: Database Encryption/Decryption, Oracle Database 11g decryption time, lower is better; Intel® Xeon® processor X5560 w/o Intel® AES-NI compared with Intel® Xeon® processor X5680]
"Across the stack, increases of 50 percent in both core count and cache drive up performance on the Intel Xeon processor 5600 series. We are especially excited about accelerated encryption using AES New Instructions."
Marie-Anne Neimat, VP Embedded Databases Development, Oracle

Oracle Database Servers with Intel Xeon 7500
[Infographic: 2005: 30 Legacy Database Servers (Single Core Servers); 2010: 2 Servers (Intel Xeon 7500 based Servers); 15:1; ~$70k HW Investment.
Estimated IT Benefits: Floor Space 94% Reduction; Annual Energy Costs 90% Reduction.
Estimated Business Benefits over 4 years: Lower Operating Costs; Lower Software Costs. Figures and labels shown: As low as; $200K; $1.28M; 4 Month; Power / Cooling Savings; SW Licensing Savings; Estimated Payback.]
Source: Intel estimates as of February 2010. Performance comparison using internal workload. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software