Extreme Performance Data Warehousing
Presenter's Name
Presenter's Title
The Rise of the Intelligent Economy
“From recession comes an opportunity to reset a number of
industry structures…there is an opportunity to infuse
industries with technologies that position them to operate
more effectively in the next 50 years.”
Lessons Learned in Building the Intelligent Economy, May 2010
All Businesses Want Better Insight
Industry: Typical Questions
Retail
• What stores should be closed or sold?
• Which customers will respond to new promotion?
Telecommunications
• What are the issues affecting churn by region?
• What is the average revenue per user (ARPU)?
Healthcare
• What are most common patient service requests?
• What is average level of clinical supplies on-hand?
Financial Services
• How will new online services impact deposits?
• How does average loan compare to last year?
Utilities
• Who do we target for energy efficiency program?
• What resources are needed to restore an outage?
Public Sector
• What is the trend on budget and expenditures?
• What is most cost-effective way to manage waste?
Challenge: Much More Data to Analyze
Data Warehouse Size and Growth
Data Warehouse Size    Today    In 3 Years
More than 10 TB        17%      34%
3 - 10 TB              19%      25%
1 - 3 TB               21%      18%
500 GB - 1 TB          20%      12%
Less than 500 GB       21%      5%
Source: TDWI Next Generation Data Warehouse Platforms Report, 2009
Challenge: No Single Source of Truth
Expensive Data Warehouse Architecture
[Diagram: separate Data Marts, OLAP, and Data Mining systems connected by multiple ETL flows]
Challenge: User Requirements Not Met
High Churn in Data Warehouse Platforms
Poor query response: 45%
Can't support advanced analytics: 40%
Inadequate data load speed: 39%
Can't scale to large data volumes: 37%
Cost of scaling up is too expensive: 33%
Poorly suited to real-time or on demand workloads: 29%
Current platform is a legacy we must phase out: 23%
Can't support data modeling we need: 23%
We need platform that supports mixed workloads: 21%
Source: TDWI Next Generation Data Warehouse Platforms Report, 2009
Consolidate Onto a Single Platform
Faster Performance, Single Source of Truth
[Diagram: Data Marts, Online Analytics, ETL, and Data Mining consolidated in Oracle Database 11g on Oracle Exadata Database Machine]
Oracle Exadata Database Machine
For OLTP, Data Warehousing & Consolidated Workloads
• Improve query performance by 10x
– Better insight into customer requirements
– Expand revenue opportunities
• Consolidate OLTP and analytic workloads
– Lower admin and maintenance costs
– Reduce points of failure
• Integrate analytics and data mining
– Complex and predictive analytics
• Lower risk
– Streamline deployment
– One support contact
Oracle Exadata Database Machine Family
Oracle Exadata Database Machine X2-2
Oracle Database Server Pool
• 8 2-processor Database Servers
– 96 CPU Cores
– 768 GB Memory
– Oracle Linux or Solaris 11 Express
Exadata Storage Server Pool
• 14 Storage Servers
– 5 TB Smart Flash Cache
– 336 TB Disk Storage
Unified Server/Storage Network
• 40 Gb/sec InfiniBand Links
Available in full, half, quarter racks
Oracle Exadata Database Machine Family
Oracle Exadata Database Machine X2-8
Oracle Database Server Pool
• 2 8-processor Database Servers
– 128 CPU Cores
– 2 TB Memory
– Oracle Linux or Solaris 11 Express
Exadata Storage Server Pool
• 14 Storage Servers
– 5 TB Smart Flash Cache
– 336 TB Disk Storage
Unified Server/Storage Network
• 40 Gb/sec InfiniBand Links
Full and multi-rack configuration
Traditional Query Problem
What Were Yesterday's Sales?
Select sum(sales) where salesdate = '22-Jan-2010' …
[Diagram: storage returns the entire Sales table; the database server discards most of the Sales table, then computes the sum]
• Data is pushed to database server for processing
• I/O rates are limited by speed and number of disk drives
• Network bandwidth is strained, limiting performance and concurrency
Exadata Smart Scan
Improve Query Performance by 10x or More
What Were Yesterday's Sales?
Select sum(sales) where salesdate = '22-Jan-2010' …
[Diagram: storage returns only the Sales rows for Jan 22 2010; the database server computes the sum]
• Data intensive processing runs in Exadata Storage Servers
• Rows and columns filtered as data streams from disks
• Complex operations also run in storage
• Parallelizes query execution and removes bottlenecks
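The difference between the two query paths can be illustrated with a small Python model. This is only a sketch under stated assumptions: the `SALES` rows, the column layout, and both function names are hypothetical, and nothing here reflects how Exadata is actually implemented. It simply counts how many rows would cross the network in each case.

```python
from datetime import date

# Hypothetical stand-in for a Sales table: (sales_date, product, amount).
SALES = [
    (date(2010, 1, 21), "widget", 100.0),
    (date(2010, 1, 22), "widget", 250.0),
    (date(2010, 1, 22), "gadget", 50.0),
    (date(2010, 1, 23), "gadget", 75.0),
]


def traditional_scan(table, day):
    """Storage ships every row and column; the database server filters and sums."""
    shipped = list(table)  # the entire table crosses the network
    total = sum(amount for d, _, amount in shipped if d == day)
    return total, len(shipped)


def smart_scan(table, day):
    """Storage filters rows and keeps only the needed column before shipping."""
    shipped = [amount for d, _, amount in table if d == day]
    return sum(shipped), len(shipped)


traditional_total, traditional_rows = traditional_scan(SALES, date(2010, 1, 22))
smart_total, smart_rows = smart_scan(SALES, date(2010, 1, 22))
print(traditional_total, traditional_rows)  # 300.0 4
print(smart_total, smart_rows)              # 300.0 2
```

Both paths return the same sum; only the volume of data moved differs, which is the point the slide makes about network bandwidth.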
Built-in Analytics
Secure, Scalable Platform for Advanced Analytics
Oracle OLAP
Analyze and summarize
Oracle Data Mining
Uncover and predict
• Complex and predictive analytics embedded into Oracle Database 11g
• Reduce cost of additional hardware, management resources
• Improve performance by eliminating data movement and duplication
Exadata Storage Index
Transparent I/O Elimination with No Overhead
Table columns: A B C D
First set of rows: B = 1, 3, 5. Index: Min B = 1, Max B = 5
Second set of rows: B = 5, 8, 3. Index: Min B = 3, Max B = 8
Select * from Table where B<2
Only first set of rows can match
• Maintain summary information about table data in memory
• Eliminate disk I/Os if MIN / MAX never match “where” clause
• Completely automatic and transparent
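The min/max idea can be sketched in a few lines of Python. This is an illustrative model, not Oracle's implementation: the `REGIONS` lists reuse the column B values from the slide, and the function names are hypothetical.

```python
# Two hypothetical storage regions holding column B values, as on the slide.
REGIONS = [
    [1, 3, 5],
    [5, 8, 3],
]


def build_storage_index(regions):
    """Record (min, max) of column B for each region."""
    return [(min(r), max(r)) for r in regions]


def scan_less_than(regions, index, bound):
    """Return B values < bound, reading only regions that could contain a match."""
    matches, regions_read = [], 0
    for values, (lo, _hi) in zip(regions, index):
        if lo >= bound:
            continue  # no value here can satisfy B < bound, so skip the I/O
        regions_read += 1
        matches.extend(v for v in values if v < bound)
    return matches, regions_read


index = build_storage_index(REGIONS)
matches, regions_read = scan_less_than(REGIONS, index, 2)
print(index)         # [(1, 5), (3, 8)]
print(matches)       # [1]
print(regions_read)  # 1
```

For `B < 2`, the second region has a minimum of 3, so it is never read.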
Exadata Hybrid Columnar Compression
Reduce Disk Space Requirements
[Chart: Data – Terabytes, axis 0 to 100, comparing Uncompressed data with compressed data]
Data Warehouse Appliances: 1.4x, 2.5x
Oracle OLTP Data: 3x
Oracle DW Data: 10x
Oracle Archive Data: 15x
Benefits Multiply
Converting Terabytes to Gigabytes
10 TB of User Data
With 10x Compression: 1 TB of User Data
With Partition Pruning: 100 GB of User Data
With Storage Indexes: 20 GB of User Data
With Smart Scan: 5 GB of User Data
Sub second "10 TB" Scan, No Indexes
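The arithmetic behind the slide can be checked with a short Python sketch. It assumes the sizes pair with the techniques in the order 10 TB, 1 TB, 100 GB, 20 GB, 5 GB, and it takes 1 TB as 1000 GB; the names `STEPS` and `reduction_factors` are illustrative.

```python
# Data volume remaining after each step, in GB (1 TB taken as 1000 GB).
STEPS = [
    ("User data", 10_000),
    ("10x compression", 1_000),
    ("Partition pruning", 100),
    ("Storage indexes", 20),
    ("Smart Scan", 5),
]


def reduction_factors(steps):
    """Return (step name, factor) for each step relative to the previous volume."""
    return [
        (name, prev / size)
        for (_, prev), (name, size) in zip(steps, steps[1:])
    ]


factors = reduction_factors(STEPS)
overall = STEPS[0][1] / STEPS[-1][1]
print(factors)
# [('10x compression', 10.0), ('Partition pruning', 10.0),
#  ('Storage indexes', 5.0), ('Smart Scan', 4.0)]
print(overall)  # 2000.0
```

The individual factors (10, 10, 5, 4) multiply to the overall 2000x reduction in data that has to be read.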
Partition to Manage Data Growth
Compress Data and Lower Storage Costs
Active Data: 3x OLTP Compression
Read Only Data: 10x DW Compression
Archive Data: 15-50x Archive Compression
• Distribute partitions across multiple compression tiers
• Free up storage space and execute queries faster
• No changes to existing applications
Turkcell Runs 10x Faster on Exadata
Compresses Data Warehouse by 10x
• Replaced high-end SMP Server and 10 Storage Cabinets
• Reduced Data Warehouse from 250TB to 27TB
• Using OLTP & Hybrid Columnar Compression
• Ready for future growth where data doubles every year
• Experiencing 10x faster query performance
• Delivering over 50,000 reports per month
• Average report runs reduced from 27 to 2.5 mins
• Up to 400x performance gain on some reports
Softbank Runs 2x–8x Faster on Exadata
36 Teradata Racks Replaced by 3 Exadata Racks
Teradata: 36 Racks
Exadata: 3 Racks
Oracle Exadata for Data Warehousing
Oracle Exadata Momentum
Rapid adoption in all geographies and industries
Exadata Smart Flash Cache
Extreme Performance for OLTP Applications
Frequently Used Data
Infrequently Used Data
• Full rack has 5 TB of Smart Flash Cache
• Can process over 1 million IOs per second
• 50 GB/sec query throughput on uncompressed data
• 5x more I/Os than 1000 Disk Enterprise Storage Array
Oracle Database 11g
The Best Database for Data Warehousing
Real Application Clusters
Advanced Compression
Partitioning
OLAP
Data Mining
• World record performance for fast access to information
• Manage growing volumes of information cost-effectively
• Reduce costs through server and data consolidation
ETL with Oracle Database 11g
[Diagram: Non-Oracle Source (BCP Unload, FTP) and Oracle Source (Data Pump Unload, SCP) feed Raw Files in Staging, followed by Parallel Loads]
• Fast data loading using DBFS and External Tables
• Fast transforms in Oracle Database 11g via Parallel DML operations
• Best-in-class performance for large batch oriented data loads
The Concept of Partitioning
Maintain Consistent Performance as Database Grows
Large Table (SALES)
• Difficult to Manage
Partition (SALES split into Jan, Feb)
• Divide and Conquer
• Easier to Manage
• Improve Performance
Composite Partition (SALES split into Europe and USA, each split into Jan, Feb)
• Higher Performance
• Match to business needs
Partition for Performance
Partition Pruning
Sales Table partitions: 5/19, 5/20, 5/21, 5/22
What was the total sales amount for May 20 and May 21 2010?
Select sum(sales_amount)
From SALES
Where sales_date >= to_date('05/20/2010','MM/DD/YYYY')
And sales_date < to_date('05/22/2010','MM/DD/YYYY');
• Performs operations only on relevant partitions
• Dramatically reduces amount of data retrieved from disk
• Improves query performance and optimizes resource utilization
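A toy Python model makes the pruning step concrete. It is a sketch only: the `PARTITIONS` dictionary, its sample amounts, and `sum_sales` are hypothetical, and the half-open date range is chosen so that exactly May 20 and May 21 qualify.

```python
from datetime import date

# Hypothetical daily partitions of a SALES table:
# partition key -> list of (sales_date, sales_amount).
PARTITIONS = {
    date(2010, 5, 19): [(date(2010, 5, 19), 10.0)],
    date(2010, 5, 20): [(date(2010, 5, 20), 20.0), (date(2010, 5, 20), 5.0)],
    date(2010, 5, 21): [(date(2010, 5, 21), 30.0)],
    date(2010, 5, 22): [(date(2010, 5, 22), 40.0)],
}


def sum_sales(partitions, start, end):
    """Sum amounts with start <= sales_date < end, scanning only relevant partitions."""
    scanned = [day for day in partitions if start <= day < end]
    total = sum(amount for day in scanned for _, amount in partitions[day])
    return total, scanned


total, scanned = sum_sales(PARTITIONS, date(2010, 5, 20), date(2010, 5, 22))
print(total)         # 55.0
print(len(scanned))  # 2
```

Only two of the four partitions are touched; the 5/19 and 5/22 partitions are never read.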
In-Memory Parallel Execution
Efficient use of memory on clustered servers
In-Memory Parallel Query in Database Tier
• Compress more data into available memory on cluster
• Intelligent algorithm
– Places table fragments in memory on different nodes
• Reduces disk IO and speeds query execution
© 2010 Oracle Corporation
Automated Degree of Parallelism
Queue statements if not enough parallel servers available
[Diagram: statements with DOP values 64, 32, 16, and 8]
Automatically determine DOP
Enough parallel servers available: execute immediately
Otherwise: queue the statement; when the required number of servers is available, execute the first queued statement
• Optimizer derives the best Degree of Parallelism
• Based on resource requirements of all concurrent operations
• Less DBA management, better resource utilization
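The queuing behavior described above can be modeled with a small first-in, first-out simulation in Python. This is a sketch under assumptions: the `ParallelServerPool` class, the pool size of 64, and the per-statement DOP values are all hypothetical, and the DOP is supplied by the caller rather than derived by an optimizer.

```python
from collections import deque


class ParallelServerPool:
    """Toy model: a statement runs if enough servers are free, else waits in FIFO order."""

    def __init__(self, total_servers):
        self.free = total_servers
        self.queue = deque()  # waiting statements as (name, dop)
        self.running = {}     # name -> dop

    def submit(self, name, dop):
        # New arrivals wait behind earlier waiters so the first queued runs first.
        if not self.queue and dop <= self.free:
            self._start(name, dop)
        else:
            self.queue.append((name, dop))

    def finish(self, name):
        self.free += self.running.pop(name)
        # Start queued statements in arrival order while the head of the queue fits.
        while self.queue and self.queue[0][1] <= self.free:
            self._start(*self.queue.popleft())

    def _start(self, name, dop):
        self.free -= dop
        self.running[name] = dop


pool = ParallelServerPool(total_servers=64)
pool.submit("q1", 32)  # runs immediately, 32 servers left
pool.submit("q2", 16)  # runs immediately, 16 servers left
pool.submit("q3", 32)  # queued: only 16 servers free
pool.submit("q4", 8)   # queued behind q3
queued_before = list(pool.queue)
pool.finish("q1")      # frees 32 servers; q3 starts, then q4
print(queued_before)         # [('q3', 32), ('q4', 8)]
print(sorted(pool.running))  # ['q2', 'q3', 'q4']
print(pool.free)             # 8
```

Strict FIFO ordering is a design choice in this sketch: `q4` would fit when it arrives, but it waits so that `q3` is not starved.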
Summary Management
Improve Response Time with Materialized Views
[Diagram: a SQL Query against the Relational Star Schema (Region, Date, Products, Channel) passes through Query Rewrite to Materialized Views: Sales by Region, Sales by Date, Sales by Product, Sales by Channel]
• Pre-summarized information stored within Oracle Database 11g
• Separate database object, transparent to queries
• Supports sophisticated transparent query rewrite
• Fast incremental refresh of changed data
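The combination of pre-summarized data, transparent rewrite, and incremental refresh can be sketched in Python. This is a conceptual model, not Oracle's mechanism: the `SalesSummary` class and its methods are hypothetical, and the "rewrite" is reduced to answering from the summary whenever it is up to date.

```python
from collections import defaultdict


class SalesSummary:
    """Toy 'sales by region' summary with incremental refresh."""

    def __init__(self):
        self.detail = []                     # (region, amount) fact rows
        self.by_region = defaultdict(float)  # pre-summarized totals
        self._refreshed_upto = 0             # count of detail rows already folded in

    def insert(self, region, amount):
        self.detail.append((region, amount))

    def refresh(self):
        """Fold in only the rows added since the last refresh."""
        for region, amount in self.detail[self._refreshed_upto:]:
            self.by_region[region] += amount
        self._refreshed_upto = len(self.detail)

    def total_for(self, region):
        """Answer from the summary when it is fresh, otherwise scan the detail rows."""
        if self._refreshed_upto == len(self.detail):
            return self.by_region.get(region, 0.0), "summary"
        return sum(a for r, a in self.detail if r == region), "detail"


s = SalesSummary()
s.insert("West", 100.0)
s.insert("East", 40.0)
s.refresh()
fresh = s.total_for("West")
s.insert("West", 25.0)
stale = s.total_for("West")
s.refresh()
refreshed = s.total_for("West")
print(fresh)      # (100.0, 'summary')
print(stale)      # (125.0, 'detail')
print(refreshed)  # (125.0, 'summary')
```

The caller always asks the same question; whether the answer comes from the summary or the detail rows is decided internally, which mirrors the "transparent to queries" bullet.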
Cube Organized Materialized Views
[Diagram: a SQL Query passes through Query Rewrite to cube Summaries over Region, Date, Products, and Channel, maintained by Automatic Refresh]
• Exposes Oracle OLAP cubes as relational materialized views
• Provides SQL access to data stored in OLAP cubes
• Any BI tool or SQL application can leverage OLAP cubes
Oracle OLAP
Built-in Access to Analytic Calculations
• How do sales in the Western region this quarter compare with sales a year ago?
• What will sales next quarter be?
• What factors can we alter to improve the sales forecast?
• Multidimensional analytic engine that analyzes summary data
• Offers improved query performance and fast, incremental updates
• Embedded in Oracle Database instance and storage
Oracle OLAP and OBIEE
Calculations Computed Faster in OLAP Engine
Oracle Data Mining
Find Hidden Patterns, Make Predictions
Retail
• Customer Segmentation
• Response Modeling
Financial Services
• Credit Scoring
• Possibility of default
Communications
• Customer churn
• Network intrusion
Utilities
• Product bundling
• Predict power line failure
Healthcare
• Patient outcome prediction
• Fraud detection
Public Sector
• Tax fraud
• Crime analysis
• Collection of data mining algorithms that solve business problems
• Simplifies development of predictive BI applications
• Embedded in Oracle Database instance and storage
Oracle Data Mining and OBIEE
Prediction and Probability Results Integrated in Reports
Oracle Spatial and OBIEE
• Enrich BI with map visualization of Oracle Spatial data
• Enable location analysis in reporting, alerts and notifications
• Use maps to guide data navigation, filtering and drill-down
• Increase ROI from geospatial and non-spatial data
Oracle Exadata Intelligent Warehouse
For Industries
[Diagram: Data Models, Business Intelligence, Exadata]
• Combine deep industry knowledge with data warehousing expertise
• Help jump-start design and implementation of data warehouses
• Available for Retail and Communications industries
Advanced Customer Services
for Oracle Data Warehousing and Exadata
Lifecycle Services
• Installs & configuration
• Upgrades
• Patching
• Performance assessment
Operations Management
• Quality of Service monitoring
• Incident & problem management
• Configuration management
• Onsite or remote management
Expert Services
• Solution architectures
• Service delivery
• Data loading & migration
• Backup & recovery
Solution Support Center
• 24x7 Expert Service Desk
• Problem priority & escalation
• Integrated with Premier support
• Proactive advice to reduce deployment costs and risk
• Preventive assessments to identify & resolve issues
• Predictive management for high quality of service
Oracle Exadata for Data Warehouse
Extreme Performance Data Warehousing
Integrated Technology Stack
[Stack diagram: BI Applications, BI Tools, ELT Tools, Data Models, Database, Smart Storage]
• Single source of truth
• Easy to deploy and manage
• Extreme performance
• Meets all end user requirements
• Lower cost of ownership
Oracle #1 for Data Warehousing
Source: IDC, July 2009 – “Worldwide Data Warehouse Management Tools 2008 Vendor Shares”
Oracle Database 11g R2: Fully Optimized to Run on Intel Xeon Processors
• Robust optimization using Intel Compilers and Intel Integrated Performance Primitives (Intel IPP)
– Speeds up and increases efficiency of compression and decompression of data
• Optimized encryption for security without compromise
– Strong protection of data in transit with Oracle Advanced Security using Intel IPP
– Use of AES-NI to deliver more efficient encryption and decryption
• Performance optimizations enable Oracle 11g to work with the hardware to increase throughput and overall performance
• Advanced Reliability Features (RAS)
– Machine Check Architecture Recovery
– Error Containment
– Corrected Error Signaling
Oracle Software Optimized for Intel Xeon Processors: Huge Performance Gain with AES-NI Encryption/Decryption
[Chart: Database Encryption/Decryption, Oracle Database 11g, Decryption Time (lower is better), comparing Intel® Xeon® processor X5560 w/o Intel® AES-NI with Intel® Xeon® processor X5680]
"Across the stack, increases of 50 percent in both core count and cache drive up performance on the Intel Xeon processor 5600 series. We are especially excited about accelerated encryption using AES New Instructions."
Marie-Anne Neimat, VP Embedded Databases Development, Oracle
Oracle Database Servers with Intel Xeon 7500
2005: 30 Legacy Database Servers (Single Core Servers)
2010: 2 Servers (Intel Xeon 7500 based Servers)
15:1
~$70k HW Investment
Estimated IT Benefits
• Floor Space: 94% reduction
• Annual Energy Costs: 90% reduction
Estimated Business Benefits Over 4 Years
• Lower Operating Costs: $200K Power / Cooling savings
• Lower Software Costs: $1.28M SW Licensing savings
• Estimated Payback: as low as 4 months
Source: Intel estimates as of February 2010. Performance comparison using internal workload. Results have
been estimated based on internal Intel analysis and are provided for informational purposes only. Any
difference in system hardware or software