Yossi Elkayam
Sr. BI & Azure Architect
Microsoft Services
[email protected]
Age of data
BI, Database
Source: http://db-engines.com/en/ranking
Microsoft Data Platform: Comprehensive, Connected, Choice
• Cloud, relational: Azure SQL DB, Azure SQL DW, SQL Server Azure VM
• Cloud, beyond relational: Azure Data Lake, HDInsight, DocumentDB
• On-premises: SQL Server, APS
• Federated Query across on-premises and cloud
• Power BI, Azure Machine Learning, Azure Data Factory
Cortana Analytics: Azure Internet of Things (IoT) Suite & Dynamics
• Devices: RTOS, Linux, Windows, Android, iOS
• Device connectivity & management: protocol adaptation, field gateways, cloud gateway (Event Hubs & IoT Hub)
• Hot path analytics: Azure Stream Analytics, Azure HDInsight Storm
• Hot path business logic: Service Fabric & Actor Framework
• Batch analytics & visualizations: Azure HDInsight, AzureML, Power BI, Azure Data Factory
• Analytics & operationalized insights
• Presentation & business connectivity: App Service, Websites; Dynamics, BizTalk Services, Notification Hubs
Streaming Analytics
• Event producers: applications, devices, sensors, web and social
• Collection: cloud gateways (web APIs), field gateways
• Event queuing system: Event Hubs, Kafka/RabbitMQ/ActiveMQ
• Transformation (stream processing): Stream Analytics, Storm on HDInsight, Azure ML
• Long-term storage (storage adapters): Apache HBase on HDInsight, DocumentDB, MongoDB, SQL
• Search and query: Solr, Azure Search
• Presentation and action: live dashboards, web/thick-client dashboards, data analytics (Excel), Event Hub to devices to take action
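Stream Analytics expresses time-windowed aggregation in its own SQL dialect (for example with a `TumblingWindow` in the GROUP BY). As a rough Python illustration of what a tumbling window computes, assuming events arrive as hypothetical `(timestamp_in_seconds, value)` pairs, the function below averages values per fixed, non-overlapping window. It is not how the service is implemented.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average event values per fixed, non-overlapping time window.

    events: iterable of (timestamp_seconds, value) pairs (hypothetical shape).
    Returns {window_start: average}, ordered by window start.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in events:
        # Each event falls into exactly one window: [start, start + window_seconds).
        window_start = ts - (ts % window_seconds)
        sums[window_start] += value
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


if __name__ == "__main__":
    readings = [(0, 10), (5, 20), (10, 30), (19, 50), (20, 1)]
    print(tumbling_window_avg(readings, 10))
```

Because windows never overlap, every event is counted once; a hopping or sliding window would instead assign an event to several windows.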
IoT Scenario: Connected Cars / Devices
• Ingestion: devices send data through cloud gateways to Event Hubs (queue service)
• Processing: Apache Storm gets the data, gets reference data and applies business logic
• Stores: DocumentDB (document store), HBase (NoSQL store), SQL Azure (relational store); steps labelled "store raw data" and "store reporting data"
• Output: Event Hubs (queue service) and a Power BI live dashboard
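The processing stage in this scenario (get data, get reference data, business logic, store raw and reporting data) can be sketched in plain Python. Everything here is hypothetical: the event shape (`car_id`, `speed`), the reference table, and the speeding rule stand in for whatever a real Storm topology would do.

```python
# Hypothetical reference data: per-car attributes looked up during processing.
REFERENCE_DATA = {
    "car-1": {"model": "Sedan", "speed_limit": 120},
    "car-2": {"model": "Truck", "speed_limit": 90},
}


def process_event(event, reference, raw_store, reporting_store, alerts):
    """Handle one telemetry event following the stages named in the diagram."""
    # Store raw data: keep a copy of the event exactly as received.
    raw_store.append(dict(event))
    # Get reference data: enrich the event with attributes for this car, if any.
    ref = reference.get(event["car_id"], {})
    enriched = {**event, **ref}
    # Store reporting data: the enriched record feeds reporting and dashboards.
    reporting_store.append(enriched)
    # Business logic (hypothetical rule): flag cars exceeding their speed limit.
    limit = ref.get("speed_limit")
    if limit is not None and event["speed"] > limit:
        alerts.append(event["car_id"])
    return enriched


if __name__ == "__main__":
    raw, reporting, alerts = [], [], []
    for e in [{"car_id": "car-1", "speed": 130}, {"car_id": "car-2", "speed": 80}]:
        process_event(e, REFERENCE_DATA, raw, reporting, alerts)
    print(alerts)
```

Events without reference data are still stored, just not enriched or checked against a limit.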
Microsoft Data Platform (on-premises and cloud)
• Data visualization (analyze and authoring): Analytical Reports, Microsoft Excel, Power BI Desktop, Mobile Reports, Mobile Report Publisher, Paginated Reports, Report Builder, Report Designer
• Consume: Power BI Service, Power BI Web Portal, Reporting Services Portal, SharePoint, Cortana, Windows Phone App, Android App, iOS App
• SQL Server Database Engine: Relational, XML, JSON, Spatial, FullText, Binary, Image, FileTable, Filestream; Row Store, Columnstore, In-Memory, Buffer Pool Ext., Query Store, StretchDB, Resource Governor; SQL Agent, Database Mail, Linked Servers, Managed Backup, Backup to Azure
• Security: Row level security, Transparent Data Encryption, Always Encrypted, Data Masking, Auditing, Compliance
• BI and advanced analytics: Analysis Services (Tabular, Multi-Dimensional; PowerPivot, BISM, Data Mining, KPI), Reporting Services (Native, SharePoint Integrated; Delivery), R Services
• Information management & data orchestration: Integration Services, Data Quality Services, Master Data Services, Azure Data Factory
• Data warehousing: Dimensional Modeling, Star, Snowflake, Polybase; Reference Architectures, Appliances; APS (Massively Parallel Processing) with Polybase
• HA/DR: AlwaysOn (AG, FCI), Replication, Transactional Replication, Log Shipping; SQL Server in Azure VM as an AG async replica
• Cloud: SQL Server in Azure VM, SQL Database + Elastic Scale (LRS, Geo-Replication), Azure SQL Data Warehouse (MPP), Azure Data Lake Store, Azure Data Lake Analytics, Azure HDInsight (Map Reduce, Pig, Hive, HBase, Storm, Spark), Table Storage, Blob Storage, Redis Cache, DocumentDB, Azure Search, Event Hub, Stream Analytics, Azure ML, Azure Data Catalog, Azure Marketplace
• Common tools: SQL Server Management Studio, SQL Server Data Tools, command line (PowerShell, BCP, SQLCMD), Migration & Upgrade Tools (SSMA, Upgrade Advisor, MAP Toolkit)
• Deployment: physical / virtual
“Big Data Reality Framework…”
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/fdp.41344
Personal story
Ever-growing data, ever-shrinking IT
End users, DBAs, storage admins & TDMs
...Reinventing ourselves
SQL Server 2016: Everything built-in
• Security vulnerabilities: SQL Server compared with Oracle, MySQL and SAP HANA (chart values not recoverable)
• Performance at massive scale: TPC-H non-clustered 10TB ranking (#1, #2, #3 shown; "Oracle is #4")
• Self-service BI per user: Microsoft compared with Tableau and Oracle (figures shown: $120, $480, $2,230)
• In-memory across all workloads
• Consistent experience from on-premises to cloud
The above graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Microsoft. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
National Institute of Standards and Technology Comprehensive Vulnerability Database update 10/2015
TPC-H non-clustered results as of 04/06/15, 5/04/15, 4/15/14 and 11/25/13, respectively. http://www.tpc.org/tpch/results/tpch_perf_results.asp?resulttype=noncluster
SQL Data Warehouse Architecture
Applications and users connect to the control node, which runs the massively parallel processing (MPP) engine (SQL DB + DMS). Each compute node (four are shown) also runs SQL DB + DMS, all on Azure infrastructure and storage.
• Control node, "the brain": an endpoint for connections and tools; coordinates storage/compute activity.
• Compute nodes, "the brawn": handle query processing, with the ability to scale up/down.
• Data Movement Services (DMS): coordinates data movement from nodes/storage.
• Storage: Blob storage [WASB(S)]; data can be added/loaded into WASB(S) without incurring compute costs.
• Data loading: PolyBase, ADF, SSIS, REST, OLE, ODBC, AZCopy, PS
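SQL Data Warehouse does this distribution internally; the Python below only illustrates the general MPP scatter-and-merge pattern described above. The column names are hypothetical, and Python's built-in `hash()` stands in for the service's own distribution function.

```python
from collections import defaultdict

NUM_COMPUTE_NODES = 4  # illustrative: the diagram shows four compute nodes


def distribute(rows, dist_col, n=NUM_COMPUTE_NODES):
    """Hash-distribute rows across n compute nodes by a distribution column."""
    nodes = [[] for _ in range(n)]
    for row in rows:
        nodes[hash(row[dist_col]) % n].append(row)
    return nodes


def partial_sum(partition, group_col, value_col):
    """Work done on one compute node: SUM(value) GROUP BY group over its own rows."""
    partial = defaultdict(float)
    for row in partition:
        partial[row[group_col]] += row[value_col]
    return partial


def mpp_sum(rows, group_col, value_col, dist_col, n=NUM_COMPUTE_NODES):
    """Control node: scatter work to the compute nodes, then merge partial results."""
    partitions = distribute(rows, dist_col, n)
    partials = [partial_sum(p, group_col, value_col) for p in partitions]
    result = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            result[key] += value
    return dict(result)
```

The per-node partial sums could run in parallel; only their small results travel back to the control node, which is what lets the design scale out.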
How does Stretch work?
• Source SQL Server: creates a secure connection between the source SQL Server and Azure, across the internet boundary.
• Microsoft Azure: provisions a remote instance (remote database and remote table) and begins migration.
• Hot/active data stays in the source database; cold/historical (closed) data moves to the remote table by trickle migration.
• Apps and queries continue to run against both the local database and the remote endpoint.
• Security controls and maintenance remain local.
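Stretch itself is configured inside SQL Server; the Python below is only an illustration of the hot/cold split and trickle migration idea. The `is_cold` predicate is a hypothetical stand-in for whatever rule marks a row as closed, and lists stand in for the local and remote tables.

```python
from datetime import date


def trickle_migrate(local_rows, remote_rows, is_cold, batch_size=100):
    """Move up to batch_size cold rows from the local table to the remote table.

    Returns the number of rows moved; call repeatedly until it returns 0.
    """
    moved = 0
    i = 0
    while i < len(local_rows) and moved < batch_size:
        if is_cold(local_rows[i]):
            remote_rows.append(local_rows.pop(i))  # the row now lives only remotely
            moved += 1
        else:
            i += 1
    return moved


def query(local_rows, remote_rows, predicate):
    """Apps query one logical table: results come from both local and remote rows."""
    return [row for row in local_rows + remote_rows if predicate(row)]


if __name__ == "__main__":
    local = [{"id": 1, "closed": date(2014, 1, 1)}, {"id": 2, "closed": None}]
    remote = []
    while trickle_migrate(local, remote, lambda r: r["closed"] is not None, batch_size=1):
        pass
    print(len(local), len(remote))
```

Moving rows in small batches keeps each step cheap, while the query side never needs to know where a row currently lives.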
Hadoop ecosystem:
• Distributed storage (HDFS)
• Distributed processing (MapReduce)
• Query (Hive)
• Scripting (Pig)
• NoSQL database (HBase)
• Machine learning (Mahout)
• Data integration (Sqoop/REST/ODBC)
• Workflow & scheduling (Oozie)
• Coordination (ZooKeeper)
• Management & monitoring (Ambari)
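To make the "Distributed Processing (MapReduce)" layer concrete, here is the classic word-count example written as three plain Python functions for the map, shuffle and reduce phases. It runs in a single process and is only a sketch of the programming model; real jobs on HDInsight run distributed, typically in Java or through Hadoop Streaming.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine every value emitted for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big compute"]))
```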
Azure Data Lake
Batch, real-time and interactive analytics made easy
Azure Data Lake Analytics service and managed clusters (HDInsight) run on YARN; the Azure Data Lake Store, accessed through WebHDFS, holds unstructured, semi-structured and structured data.
Principles
• Maximize return on accessible data
• Reduce time to value
• Reduce time to insight
Approach
• Productivity from day one (developers, scientists, analysts)
• Open (YARN, HDFS), designed for the cloud
• All data available for analysis
• Leverages existing skills: use SQL, Spark, Hive, Storm, HBase
• Dynamically scales to meet your business objectives
• Managed and supported with an enterprise-grade SLA
Ingest all data, regardless of requirements (e.g. from devices)
Store all data in native format, without schema definition
Do analysis using analytic engines like Hadoop: batch queries, interactive queries, real-time analytics, machine learning, data warehouse
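"Store all data in native format without schema definition" is often called schema-on-read. A minimal Python illustration, assuming JSON-lines records kept in an in-memory list that stands in for a Data Lake Store folder; the function names are hypothetical.

```python
import json


def ingest(store, raw_text):
    """Land data as-is: no schema is checked or applied at write time."""
    store.append(raw_text)


def read_with_schema(store, schema):
    """Schema-on-read: parse and type each record only when it is queried.

    schema maps column name -> Python type; missing or null columns come back as None.
    """
    for raw_text in store:
        record = json.loads(raw_text)
        yield {
            col: (typ(record[col]) if record.get(col) is not None else None)
            for col, typ in schema.items()
        }


if __name__ == "__main__":
    lake = []
    ingest(lake, '{"SensorKey": "7", "Speed": "88.5"}')
    ingest(lake, '{"SensorKey": 8, "Extra": true}')
    print(list(read_with_schema(lake, {"SensorKey": int, "Speed": float})))
```

Different consumers can read the same raw records with different schemas, and fields such as `Extra` stay available for later analysis even though this schema ignores them.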
PolyBase
Query relational and non-relational data with T-SQL
[Diagram: a T-SQL query sent to SQL Server returns one result combining relational rows (Name, DOB, State) with data held in Hadoop / Data Lake Store.]
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = Hadoop, LOCATION = 'hdfs://10.193.26.177:8020',
RESOURCE_MANAGER_LOCATION = '10.193.26.178:8050');
Once per Hadoop Cluster
CREATE EXTERNAL FILE FORMAT TextFile
WITH ( FORMAT_TYPE = DELIMITEDTEXT,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec',
FORMAT_OPTIONS (FIELD_TERMINATOR ='|', USE_TYPE_DEFAULT = TRUE));
Once per File Format
CREATE EXTERNAL TABLE [dbo].[Customer] (
[SensorKey] int NOT NULL,
int NOT NULL, -- column name missing in the source
[Speed] float NOT NULL
)
WITH (LOCATION='//Sensor_Data//May2014/sensordata.tbl',
DATA_SOURCE = HadoopCluster,
FILE_FORMAT = TextFile
);
HDFS File Path
-- Requires an existing database master key (CREATE MASTER KEY ENCRYPTION BY PASSWORD = '...')
CREATE DATABASE SCOPED CREDENTIAL HadoopCredential
WITH IDENTITY = 'hadoopUserName', SECRET = 'hadoopPassword';
Once per Hadoop User
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = Hadoop, LOCATION = 'hdfs://10.193.26.177:8020',
RESOURCE_MANAGER_LOCATION = '10.193.26.178:8050',
CREDENTIAL = HadoopCredential);
Once per Hadoop cluster per user
CREATE EXTERNAL FILE FORMAT TextFile
WITH ( FORMAT_TYPE = DELIMITEDTEXT,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec',
FORMAT_OPTIONS (FIELD_TERMINATOR ='|', USE_TYPE_DEFAULT = TRUE));
Once per File Format
CREATE EXTERNAL TABLE [dbo].[Customer] (
[SensorKey] int NOT NULL,
int NOT NULL, -- column name missing in the source
[Speed] float NOT NULL
)
WITH (LOCATION='//Sensor_Data//May2014/',
DATA_SOURCE = HadoopCluster,
FILE_FORMAT = TextFile
);
HDFS File Path
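PolyBase does the parsing inside SQL Server; the Python below only mimics what the external file format declares: gzip compression, a `|` field terminator, and `USE_TYPE_DEFAULT = TRUE`, which stores a missing value as the column type's default (0 for numbers, an empty string for strings) instead of NULL. The second column's name is missing in the table definition above, so `UnnamedColumn` is a hypothetical placeholder.

```python
import gzip
import io

# Mirrors the external table above; "UnnamedColumn" is a hypothetical name.
SCHEMA = [("SensorKey", int), ("UnnamedColumn", int), ("Speed", float)]
TYPE_DEFAULTS = {int: 0, float: 0.0, str: ""}


def read_delimited(gz_bytes, schema=SCHEMA, terminator="|", use_type_default=True):
    """Yield typed rows from gzip-compressed, delimiter-separated text."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split(terminator)
            # Pad short lines so every column gets a (possibly empty) field.
            fields += [""] * (len(schema) - len(fields))
            row = {}
            for (name, typ), raw in zip(schema, fields):
                if raw == "":
                    # Missing value: type default (USE_TYPE_DEFAULT = TRUE) or NULL (None).
                    row[name] = TYPE_DEFAULTS[typ] if use_type_default else None
                else:
                    row[name] = typ(raw)
            yield row


if __name__ == "__main__":
    data = gzip.compress(b"1|10|55.5\n2||\n")
    for row in read_delimited(data):
        print(row)
```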
Federated queries: Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store
Benefits
• Avoid moving large amounts of data across the network between stores
• Single view of data irrespective of physical location
• Minimize data proliferation issues caused by maintaining multiple copies
• Single query language for all data
• Each data store maintains its own sovereignty
• Design choices based on the need
Azure Data Lake Analytics runs a U-SQL query across Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB and Azure SQL Data Warehouse.
• Push SQL expressions (filters, joins) to remote SQL sources
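Pushing SQL expressions to remote sources means the filter runs where the data lives, so fewer rows cross the network. A small Python sketch of the effect, using an in-memory SQLite database as a hypothetical stand-in for a remote SQL source (U-SQL performs the pushdown itself; nothing here is its API):

```python
import sqlite3


def make_remote_source():
    """An in-memory SQLite database standing in for a remote SQL source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, state TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "WA", 10.0), (2, "ME", 20.0), (3, "WA", 5.0)],
    )
    return conn


def query_without_pushdown(conn, state):
    """Fetch every row, then filter locally: all rows cross the 'network'."""
    rows = conn.execute("SELECT id, state, amount FROM orders ORDER BY id").fetchall()
    return [r for r in rows if r[1] == state], len(rows)


def query_with_pushdown(conn, state):
    """Push the filter into the remote query: only matching rows come back."""
    rows = conn.execute(
        "SELECT id, state, amount FROM orders WHERE state = ? ORDER BY id", (state,)
    ).fetchall()
    return rows, len(rows)
```

Both functions return the same answer; the second value each returns is the number of rows transferred, which is where pushdown saves work.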
JSON in SQL Server: drivers, pillars and benefits
• Modern services exchange data in JSON format
• Information is stored in JSON format
• Handling variety of data and model changes
• Support for complex analysis on JSON documents
• Fast built-in JSON/relational data conversion
• Combination of relational and JSON data
• The power of T-SQL and the SQL Server engine
• Integration with all SQL Server components
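Built-in functions:
• ISJSON: tests whether a string contains valid JSON
• JSON_VALUE: extracts a scalar value from JSON text
• JSON_QUERY: extracts an object or an array from JSON text
• JSON_MODIFY: updates a property value and returns the updated JSON text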
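The split between JSON_VALUE (scalars) and JSON_QUERY (objects and arrays) is easy to misremember, so here is a rough Python analogue of ISJSON, JSON_VALUE and JSON_QUERY in lax mode, where a path that does not match returns NULL (`None`). It supports only a simplified path subset (`$`, `.key`, `[index]`), leaves out JSON_MODIFY, and may differ from SQL Server on edge cases; all helper names are hypothetical.

```python
import json
import re

# Simplified path subset: $, .key (word characters only), [index].
_STEP = re.compile(r"\.(\w+)|\[(\d+)\]")


def _parse_path(path):
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    rest, steps, pos = path[1:], [], 0
    for m in _STEP.finditer(rest):
        if m.start() != pos:
            raise ValueError(f"unsupported path: {path!r}")
        steps.append(m.group(1) if m.group(1) is not None else int(m.group(2)))
        pos = m.end()
    if pos != len(rest):
        raise ValueError(f"unsupported path: {path!r}")
    return steps


def _walk(text, path):
    """Follow the path; return (found, node). A missing step yields (False, None)."""
    node = json.loads(text)
    for step in _parse_path(path):
        if isinstance(step, str) and isinstance(node, dict) and step in node:
            node = node[step]
        elif isinstance(step, int) and isinstance(node, list) and step < len(node):
            node = node[step]
        else:
            return False, None
    return True, node


def is_json(text):
    """Rough analogue of ISJSON: 1 if the text parses as JSON, else 0."""
    try:
        json.loads(text)
    except (TypeError, ValueError):
        return 0
    return 1


def json_value(text, path):
    """Rough analogue of JSON_VALUE (lax): a scalar as text, else None."""
    found, node = _walk(text, path)
    if not found or node is None or isinstance(node, (dict, list)):
        return None
    return node if isinstance(node, str) else json.dumps(node)


def json_query(text, path):
    """Rough analogue of JSON_QUERY (lax): an object/array as JSON text, else None."""
    found, node = _walk(text, path)
    if not found or not isinstance(node, (dict, list)):
        return None
    return json.dumps(node)
```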
[
{
"Number":"SO43659",
"Date":"2011-05-31T00:00:00",
"AccountNumber":"AW29825",
"Price":59.99,
"Quantity":1
},
{
"Number":"SO43661",
"Date":"2011-06-01T00:00:00",
"AccountNumber":"AW73565",
"Price":24.99,
"Quantity":3
}
]
OPENJSON: transforms JSON text to a table
SO43659 | 2011-05-31T00:00:00 | MSFT | 59.99 | 1
SO43661 | 2011-06-01T00:00:00 | Nokia | 24.99 | 3
FOR JSON: formats a result set as JSON text
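The two directions can be mimicked in Python to show the shape of the conversion: each array element becomes one row, and each row becomes one object. This is a rough analogue of `OPENJSON ... WITH (...)` and `FOR JSON PATH`, not SQL Server's behavior in every detail; the function names are hypothetical and the data is the order array from the slide.

```python
import json

ORDERS_JSON = """[
  {"Number": "SO43659", "Date": "2011-05-31T00:00:00", "AccountNumber": "AW29825",
   "Price": 59.99, "Quantity": 1},
  {"Number": "SO43661", "Date": "2011-06-01T00:00:00", "AccountNumber": "AW73565",
   "Price": 24.99, "Quantity": 3}
]"""


def open_json(text, columns):
    """Rough analogue of OPENJSON ... WITH (...): one row per array element."""
    return [tuple(item.get(col) for col in columns) for item in json.loads(text)]


def for_json(rows, columns):
    """Rough analogue of FOR JSON PATH: one object per row.

    NULL (None) values are omitted, as FOR JSON does unless INCLUDE_NULL_VALUES is set.
    """
    return json.dumps(
        [{col: val for col, val in zip(columns, row) if val is not None} for row in rows]
    )


if __name__ == "__main__":
    cols = ["Number", "Price", "Quantity"]
    rows = open_json(ORDERS_JSON, cols)
    print(rows)
    print(for_json(rows, cols))
```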
Part of NoSQL family
Built for simplicity, scale and performance
Non-relational
No enforced schema
{
"name": "Microsoft",
"homepage_url": "www.microsoft.com",
"blog_url": "blogs.microsoft.com/",
"products": [
{
"name": "Azure",
"permalink": "azure.com"
}
],
"offices": [
{
"address1": "1 Redmond Way",
"zip_code": "98052",
"city": "Redmond",
"state_code": "WA",
"country_code": "USA"
}
]
}
{
"id": "itemdata2344",
"data": "TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBc3Npb24gZnJvbSBvdGhlcibmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbhc3VyZ4=="
}
(Org chart: Jill manages Ben and Susan; Ben manages Andrew; Susan manages Sven; Sven manages Thomas.)
{id: "Jill" },
{id: "Ben", manager: "Jill" },
{id: "Susan", manager: "Jill" },
{id: "Andrew", manager: "Ben" },
{id: "Sven", manager: "Susan" },
{id: "Thomas", manager: "Sven" }
Getting the manager of any employee is trivial:
SELECT org.manager FROM org WHERE org.id = "Susan"
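Only the single-hop lookup is trivial with this model. Walking the full management chain takes one lookup per level, and finding direct reports means searching on the `manager` field instead of `id`. A Python sketch over the six documents from the slide; the helper names are hypothetical.

```python
# The six documents from the slide, as Python dicts.
ORG = [
    {"id": "Jill"},
    {"id": "Ben", "manager": "Jill"},
    {"id": "Susan", "manager": "Jill"},
    {"id": "Andrew", "manager": "Ben"},
    {"id": "Sven", "manager": "Susan"},
    {"id": "Thomas", "manager": "Sven"},
]
BY_ID = {doc["id"]: doc for doc in ORG}


def manager_of(emp_id):
    """One lookup, like the query above; None at the top of the tree."""
    return BY_ID[emp_id].get("manager")


def management_chain(emp_id):
    """Walk up the tree one lookup at a time until reaching the root."""
    chain = []
    current = manager_of(emp_id)
    while current is not None:
        chain.append(current)
        current = manager_of(current)
    return chain


def direct_reports(mgr_id):
    """Reverse direction: scan for documents whose manager is mgr_id."""
    return [doc["id"] for doc in ORG if doc.get("manager") == mgr_id]


if __name__ == "__main__":
    print(manager_of("Susan"), management_chain("Thomas"), direct_reports("Jill"))
```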
{
id: "CDC101",
title: "The Fundamentals of Database Design",
titleWords: ["database","design","database design"],
credits: 10
}
Consider using a regex to transform words to lowercase and remove punctuation.
Strip out stop words such as “to”, “the” and “of”.
Denormalize keywords into key phrases.
SELECT books.title
FROM books
WHERE ARRAY_CONTAINS(books.titleWords, "database")
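The preparation steps above (lowercase, strip punctuation, drop stop words, add key phrases) can be sketched in Python before the document is written. The stop-word list is an illustrative subset, key phrases are limited to pairs of adjacent non-stop words, and `title_words` and `array_contains` are hypothetical helpers; unlike the slide's example, this sketch also keeps "fundamentals".

```python
import re

STOP_WORDS = {"a", "an", "and", "of", "the", "to"}  # illustrative subset


def title_words(title):
    """Build a titleWords-style array: keywords first, then two-word key phrases."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())  # lowercase, drop punctuation
    keywords = []
    for tok in tokens:
        if tok not in STOP_WORDS and tok not in keywords:
            keywords.append(tok)
    phrases = []
    # Only words that were adjacent in the original title form a phrase.
    for first, second in zip(tokens, tokens[1:]):
        if first not in STOP_WORDS and second not in STOP_WORDS:
            phrase = f"{first} {second}"
            if phrase not in phrases:
                phrases.append(phrase)
    return keywords + phrases


def array_contains(values, item):
    """Client-side stand-in for the ARRAY_CONTAINS filter in the query above."""
    return item in values


if __name__ == "__main__":
    words = title_words("The Fundamentals of Database Design")
    print(words, array_contains(words, "database"))
```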
{
id: "...",
timestampMinute: "...",
readings: [
{minute:0, reading:123},
{minute:1, reading:456},...
{minute:59,reading:999}
]
}
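The document above groups sixty per-minute readings under one document instead of storing one document per reading. A Python sketch of building such bucketed documents, assuming each document covers one hour; the `id` scheme and the `timestamp` field name are hypothetical, since the slide leaves those values elided.

```python
from collections import defaultdict
from datetime import datetime


def bucket_by_hour(samples):
    """Group (datetime, value) samples into one document per hour."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append({"minute": ts.minute, "reading": value})
    docs = []
    for hour in sorted(buckets):
        docs.append({
            "id": hour.strftime("%Y%m%d%H"),  # hypothetical id scheme
            "timestamp": hour.isoformat(),  # hypothetical field name
            "readings": sorted(buckets[hour], key=lambda r: r["minute"]),
        })
    return docs


if __name__ == "__main__":
    samples = [(datetime(2016, 1, 1, 10, 1), 456), (datetime(2016, 1, 1, 10, 0), 123)]
    print(bucket_by_hour(samples))
```

Fewer, larger documents mean fewer writes and reads for a time range, at the cost of updating one document as each new minute arrives.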
{
id: "...",
timestamp: "...",
logData: {attr1: value1, attr2: value2, ...}
}