DATA WAREHOUSING:
SQL SERVER PARALLEL DATA WAREHOUSE AU3 UPDATE
Dandy Weyn
Sr. Technical Product Manager
Microsoft Corporation
This document has been prepared for limited distribution within Microsoft. This document contains materials and information that Microsoft considers confidential, proprietary, and significant for the protection of its business. The distribution of this document is limited to those solely involved with the program described within.
@ilikesql
Confidential and Proprietary © 2011 Microsoft
Last Updated: Monday, May 22, 2017
FAST GROWING INDUSTRY AND ENTERPRISE DATA
Problem:
• Data warehousing systems continue to grow at a fast pace
• New types of large data sets and sources have emerged
• Data is not in a uniform format and shape
What is needed?
A solution that:
• Scales from a few TBs to PBs of data
• Allows adding capacity/power as needed
• Offers a variety of choices tailored towards custom needs
• Handles all the data:
  o Structured, semi-structured and unstructured
  o Unicode and non-Unicode
MICROSOFT DATA WAREHOUSE OFFERINGS
Effort to Build    | Very High | Very Low | Moderate | Moderate | Moderate | Moderate | Very Low
Capacity           | Variable  | 5 TB     | 14 TB    | 20 TB    | 40 TB    | 80 TB    | 500 TB+
Concurrency        | Variable  | Light    | Light    | Medium   | Medium   | High     | Very High
Query Complexity   | Variable  | Medium   | Medium   | Medium   | Medium   | High     | Very High
SQL SERVER | APPLIANCES
SQL SERVER PARALLEL DATA WAREHOUSE
• Tier-1 Enterprise Data Warehouse Appliance Offering
  o High scalability from tens to hundreds of terabytes
  o High performance through the MPP system
• Flexibility and Choice
  o Choice of deployment options through distributed architecture
• Most Comprehensive Solution
  o Complete data warehouse solution spanning desktop, enterprise data warehouse, and data marts
PDW – CLIENT CONNECTIVITY
[Diagram labels: Client Drivers; Support/Patching; ETL Load Interface; Corporate Backup Solution; Control Rack; Data Rack]
MICROSOFT PDW APPLIANCE – POWERED BY DELL
[Diagram labels: PowerEdge R610; MD3620f; Storage Nodes; Database Servers; Spare Database Server; Control Nodes (R710), Active / Passive; Landing Zone (R510); Management Servers (R610); Backup Node (R710 and MD3600f w/MD1200's); Dual Fiber Channel; Dual InfiniBand; Client Drivers; Data Center Monitoring; ETL Load Interface; Corporate Backup Solution; Corporate Network; Private Network]
PDW – QUERY PROCESSING
[Diagram: an incoming query and the SQL Server instances in the appliance]
CONTROL NODE
• Client connections always go through the control node
• Contains no persistent user data
• Parallel Data Warehouse advantages:
  o Processes SQL requests
  o Prepares execution plan
  o Orchestrates distributed execution
• Local SQL Server processes final query plan and aggregates results
• Provided by DataDirect
  o Open Database Connectivity (ODBC), Object Linking and Embedding Database (OLE DB), Java Database Connectivity (JDBC), and ActiveX® Data Objects (ADO.NET) client drivers
  o Wire protocol (SequeLink)
  o Drivers are available in 32-bit and 64-bit versions
MANAGEMENT NODE
• Provides support and patching for the appliance
• Holds image for re-deployment of compute node
• Holds Active Directory
LANDING ZONE
• Provides high-capacity storage for data files from ETL processes
• Is available as a sandbox for other applications and scripts that run on the internal network
• Provides SQL Server Integration Services
[Diagram labels: Source; Landing Zone; Files; DWLoader or SQL Server Integration Services; Data Loader; Compute Nodes]
DATA RACK
• Servers: 10 active + 1 passive
• InfiniBand, FC and Ethernet switching
• Expansion: grow from 1–4 data racks, storage options, test/dev system
• Consists of compute nodes and storage nodes
COMPUTE NODE
• Each MPP node is a highly tuned symmetric multiprocessing (SMP) node with standard interfaces
• Provides dedicated hardware, database, and storage
• Runs SQL Server
• Spare node provides failover in case of node failure
• Drives are configured as RAID 1
BACKUP NODE
• Provides integrated backup solution
• Integrates with 3rd party backup option
• Orderable in different sizes
DATA LAYOUT APPROACHES
Replicated
A table structure exists as a full copy within each discrete Parallel Data Warehouse node.
Distributed
A table structure is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the database management system (DBMS).
Ultra Shared-Nothing
Provides the ability to design a schema of both distributed and replicated tables to minimize data movement between nodes.
• Small sets of data can be more efficiently stored in full (replicated).
• Certain set operations (such as single-node operations) are more efficient against full sets of data.
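The two layouts can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: the node count, the CRC32 hash, and the `sales` and `regions` tables are hypothetical and are not taken from the appliance, whose actual hash function is not described here.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical node count, chosen only for illustration


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a distribution-column value to a node with a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, column, num_nodes=NUM_NODES):
    """Distributed layout: each row lives on exactly one node, chosen by hashing one column."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(str(row[column]), num_nodes)].append(row)
    return {n: nodes.get(n, []) for n in range(num_nodes)}


def replicate(rows, num_nodes=NUM_NODES):
    """Replicated layout: every node holds a full copy of the table."""
    return {n: list(rows) for n in range(num_nodes)}


# Hypothetical tables: a larger fact table and a small lookup table.
sales = [{"order_id": i, "customer_id": i % 7, "amount": 10 * i} for i in range(20)]
regions = [{"region_id": 1, "name": "EMEA"}, {"region_id": 2, "name": "APAC"}]

distributed_sales = distribute(sales, "customer_id")
replicated_regions = replicate(regions)

for node in range(NUM_NODES):
    print(node, len(distributed_sales[node]), len(replicated_regions[node]))
```

Because `regions` is present in full on every node, a join from `sales` to `regions` could be answered on each node without moving rows, which is the motivation the slide gives for replicating small tables.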
ULTRA SHARED-NOTHING ARCHITECTURE
Extends Traditional Shared-Nothing Design
• Pushes shared-nothing architecture into the SMP node: there is IO and CPU affinity within SMP nodes
  o Eliminates contention for user queries
  o Uses full resources for each user query
• Provides multiple physical instances of tables
  o Distributes large tables
  o Replicates small tables
• Redistributes rows as needed
Provides Fault Tolerance
• All hardware components have redundancy (including CPUs, disks, networks, power, and storage processors)
• Control and compute nodes use failover clustering
• Management nodes have active and standby states
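As a rough illustration of "redistributes rows as needed", the Python sketch below re-hashes rows that were placed by one column onto a different column, so that rows sharing the new key end up on the same node. The node count, hash function, and `orders` table are assumptions made for this example, not details of the appliance.

```python
import zlib


def node_for(value, num_nodes):
    """Stable hash of a column value to a node number (CRC32, an assumption)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes


def place(rows, column, num_nodes):
    """Initial placement: hash every row on `column`."""
    placement = {n: [] for n in range(num_nodes)}
    for row in rows:
        placement[node_for(row[column], num_nodes)].append(row)
    return placement


def redistribute(placement, column, num_nodes):
    """Re-hash every row on `column`; return the new placement and the number of rows moved."""
    new_placement = {n: [] for n in range(num_nodes)}
    moved = 0
    for source_node, rows in placement.items():
        for row in rows:
            target = node_for(row[column], num_nodes)
            new_placement[target].append(row)
            if target != source_node:
                moved += 1
    return new_placement, moved


NUM_NODES = 4  # hypothetical
orders = [{"order_id": i, "customer_id": i % 5} for i in range(40)]

by_order = place(orders, "order_id", NUM_NODES)
by_customer, rows_moved = redistribute(by_order, "customer_id", NUM_NODES)
print("rows moved between nodes:", rows_moved)
```

After the redistribution, an aggregation grouped by `customer_id` can run independently on each node, because no customer's rows are split across nodes.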
SQL SERVER 2008 R2 PARALLEL DATA WAREHOUSE APPLIANCE UPDATE 3
• Improve Performance: Cost-Based Optimizer
• Broaden Functionality: Collations and Stored Procedures
• Expand Flexibility: Entry Appliances
THEME: PERFORMANCE AT SCALE
COST-BASED OPTIMIZER
Goal:
• Generate better execution plans
Functionality:
• Large space of execution alternatives explored
• Best alternative picked based on cost
• Cost model is sensitive to the amount of data to be moved
Benefits:
• Leverages existing SQL Server optimizer and years of development
• 10X or more performance improvement compared to AU2
• Plan adaptable to heuristics change
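The idea of a cost model that is sensitive to data movement can be sketched in a few lines of Python. This is a toy model, not the PDW optimizer: the `Table` class, the byte estimates, and the rule that cost equals bytes sent between nodes are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    size_bytes: int
    distributed_on: Optional[str]  # None means replicated on every node


def shuffle_bytes(table, num_nodes):
    """Re-hashing a table: on average (n-1)/n of its rows change nodes."""
    return table.size_bytes * (num_nodes - 1) // num_nodes


def broadcast_bytes(table, num_nodes):
    """Copying a distributed table in full to every node."""
    return table.size_bytes * (num_nodes - 1)


def candidate_plans(left, right, join_column, num_nodes):
    """Enumerate join strategies with their estimated data movement in bytes."""
    replicated = left.distributed_on is None or right.distributed_on is None
    aligned = left.distributed_on == join_column and right.distributed_on == join_column
    if replicated or aligned:
        return {"local join, no movement": 0}
    return {
        "shuffle on " + join_column: sum(
            shuffle_bytes(t, num_nodes)
            for t in (left, right)
            if t.distributed_on != join_column
        ),
        "broadcast " + left.name: broadcast_bytes(left, num_nodes),
        "broadcast " + right.name: broadcast_bytes(right, num_nodes),
    }


def cheapest(plans):
    """Pick the alternative with the least estimated data movement."""
    return min(plans, key=plans.get)


# Hypothetical tables and sizes.
orders = Table("orders", 1_000_000_000_000, "order_id")
customers = Table("customers", 50_000_000, "customer_id")
lineitem = Table("lineitem", 4_000_000_000_000, "part_id")

print(cheapest(candidate_plans(orders, customers, "customer_id", 10)))
print(cheapest(candidate_plans(orders, lineitem, "order_id", 10)))
```

Under these assumptions the small `customers` table is broadcast rather than shuffling the large `orders` table, while the join of two large tables is resolved by shuffling only the table that is not already distributed on the join column.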
[Chart: TPCH - Power Metric. AU2: 19711; AU3: 54602]
[Chart: TPCH - Total Elapsed Time (s). AU2: 59,314; AU3: 9,969]
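The ratios implied by the reported TPCH figures can be worked out directly. The short Python calculation below uses only the four numbers shown in the charts; the variable names are introduced here for illustration.

```python
# Figures as reported in the two TPCH charts.
au2_power, au3_power = 19711, 54602
au2_elapsed_s, au3_elapsed_s = 59314, 9969

power_ratio = au3_power / au2_power              # higher power metric is better
elapsed_speedup = au2_elapsed_s / au3_elapsed_s  # lower elapsed time is better
time_saved_pct = 100 * (1 - au3_elapsed_s / au2_elapsed_s)

print(f"Power metric ratio (AU3/AU2): {power_ratio:.2f}x")
print(f"Total elapsed time speedup:   {elapsed_speedup:.2f}x")
print(f"Elapsed time reduction:       {time_saved_pct:.1f}%")
```

On these aggregate numbers the power metric improves by a little under 3x and the total elapsed time by a little under 6x.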
THEME: PERFORMANCE AT SCALE
ZERO DATA CONVERSIONS
Goal:
– Eliminate CPU utilization spent on data conversions
Functionality:
– Using ODBC instead of ADO.NET for reading and writing data
– Minimizing appliance resource utilization for data moves
Benefits:
– Better resource (CPU) utilization
– Move operations 6x or more faster than in AU2
[Chart: DMS CPU Utilization - TPCH. CPU (%) from 0 to 60 for queries Q1 to Q22, AU2 vs. AU3]
[Chart: Improvement Factor, from 0 to 7, for Replicated table load, Shuffle, Replicate, Trim, and Broadcast]
* Improvement factor calculated based on PDW PGQL
THEME: PERFORMANCE AT SCALE
PDW ENTRY APPLIANCE (“… FOR THE RIGHT PRICE …”)
Goal:
– Appliance for the lower end of the market
Functionality:
– ~40% less processing power (4+1 Compute Nodes)
– Up to 50 TB disk capacity (4 Storage Arrays)
– Dell-based hardware reference architecture
– Complete PDW functionality (no less, no more)
Benefits:
– ~40% cheaper than a one-rack appliance
– The lowest cost/TB on the market
– Increased flexibility and choice (appliances for different needs)
THEME: SQL SERVER COMPATIBILITY
STORED PROCEDURES
Goal:
– Common code encapsulation and reuse
Functionality:
– System and user-defined stored procedures
– Invocation using RPC or EXECUTE
– Support for: control flow logic, input parameters
Benefits:
– Enables common logic re-use
– Allows porting existing scripts
– Increases compatibility with SQL Server
Syntax:
CREATE { PROC | PROCEDURE } [dbo.]procedure_name
    [ { @parameter data_type } [ = default ] ] [ ,...n ]
AS { [ BEGIN ] sql_statement [;] [ ...n ] [ END ] } [;]

ALTER { PROC | PROCEDURE } [dbo.]procedure_name
    [ { @parameter data_type } [ = default ] ] [ ,...n ]
AS { [ BEGIN ] sql_statement [;] [ ...n ] [ END ] } [;]

DROP { PROC | PROCEDURE } { [dbo.]procedure_name } [;]

[ { EXEC | EXECUTE } ]
    {
        { [database_name.][schema_name.]procedure_name }
        [ { value | @variable } ] [ ,...n ]
    } [;]

{ EXEC | EXECUTE }
    ( { @string_variable | [ N ]'tsql_string' } [ + ...n ] ) [;]
THEME: IMPROVED INTEGRATION
HADOOP CONNECTOR
Goal:
– Handle both structured and unstructured data
Functionality:
– Bi-directional (import/export) interface between MSFT Hadoop and PDW
– Delimited file support
– Adapter uses existing PDW tools (bulk loader, dwsql)
– Data transfer to/from PDW Landing Zone node over FTP channel
Benefits:
– Low cost solution that handles all the data
– Additional agility, flexibility and choice
[Diagram labels: Hadoop; SQOOP-based adapter; Config file; HDFS; Landing Zone Node; Bulk Data Loader; PDW agent; dwsql; PDW]
THEME: IMPROVED INTEGRATION
COLLATIONS
Goal:
– Support local and international customers / data
Functionality:
– Fixed server level collation
– User-defined column level collation
– Supporting all Windows collations
– Allow COLLATE clauses in queries and DML
Benefits:
– Store all the data in PDW w/ additional querying flexibility
– Existing DDLs and Query scripts
– SQL Server alignment and functionality
Examples:
CREATE TABLE T (
    c1 varchar(3) COLLATE traditional_Spanish_ci_ai,
    c2 varchar(10) COLLATE …)

SELECT c1 COLLATE Latin1_General_Bin2
FROM T

SELECT * FROM T
ORDER BY c1 COLLATE Latin1_General_Bin2
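The practical difference between a binary collation and a case-insensitive, accent-insensitive (ci_ai) collation can be approximated in Python. This is only an analogy: the `ci_ai_key` helper is hypothetical, code-point ordering stands in for the binary collation, and the sketch ignores language-specific rules such as the traditional Spanish treatment of "ch" and "ll".

```python
import unicodedata


def ci_ai_key(text: str) -> str:
    """Approximate a case- and accent-insensitive comparison key."""
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.casefold()


# Escapes keep the accented letters as single code points (U+00C1, U+00E1).
words = ["zorro", "\u00c1vila", "avena", "Zaragoza", "\u00e1vido"]

binary_order = sorted(words)               # plain code-point order
ci_ai_order = sorted(words, key=ci_ai_key)  # ignores case and accents

print(binary_order)
print(ci_ai_order)
print(ci_ai_key("\u00c1vila") == ci_ai_key("avila"))
```

In code-point order, uppercase letters sort before lowercase ones and accented letters sort after all unaccented ones, whereas the ci_ai key interleaves them alphabetically. The same contrast is what an ORDER BY with an explicit COLLATE clause lets a query choose.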
DISTRIBUTED ARCHITECTURE / HUB - SPOKE
SSRS
Excel/Excel Services
SharePoint
SSIS
PowerPivot
FLEXIBLE BUSINESS ALIGNMENT
Parallel database copy technology enables rapid data movement and consistency between EDW and data marts
Supports user groups with very different service-level agreements (SLAs):
• Performance
• Capacity
• Loading
• Concurrency
Create SQL Server 2012, Fast Track Data Warehouse for SQL 2012, and SQL Server Analysis Services Data Marts
A distributed architecture gives you the flexibility to add or change diverse workloads or user groups while maintaining data consistency across the enterprise
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.