Start to Finish with Azure Data Factory


Transcript
Andy Roberts
Data Architect
[email protected]
Session Objectives
• Understand where ADF fits in Cortana Analytics
• Understand how ADF works, and its components
• Be able to deploy and manage a simple ADF implementation
Key Takeaway:
• ADF can be used in real-world data pipeline scenarios, quickly and easily
A Suite of Products that allows you to Predict Outcomes, Prescribe Actions, and Automate Decisions:
• Cortana
• Power BI
• Azure Stream Analytics
• Azure HDInsight
• Azure Machine Learning
• Azure SQL DB, SQL Data Warehouse, DocumentDB
• Azure Data Lake
• Azure Event Hubs
• Azure Data Catalog
• Azure Data Factory
All running on Microsoft Azure, these products cover the stages of the Cortana Analytics process (https://tinyurl.com/caprocess): Ingest, Store, Transform, Analyze, and Act, with orchestration across them.
Azure Data Factory: create, orchestrate, and manage data movement and enrichment through the cloud.
ADF Components
ADF Logical Flow
ADF Process
1. Define Architecture: Set up objectives and flow
2. Create the Data Factory: Portal, PowerShell, VS
3. Create Linked Services: Connections to data and services
4. Create Datasets: Input and output
5. Create Pipeline: Define activities
6. Monitor and Manage: Portal or PowerShell, alerts and metrics
Define data sources, processing requirements, and output – also management and monitoring.
Example – Churn
An Azure Data Factory pipeline for churn: data sources (call log files and a customer table) are ingested, combined into customer call details during the Transform & Analyze stage, and published to a customer churn table identifying customers likely to churn.
Our ADF:
• Business Goal: Transform and analyze web logs each month
• Design Process: Transform raw weblogs stored in a temporary location, using a Hive query, storing the results in Blob Storage
Flow: web logs in HDFS → file store → files ready for analysis and use in AzureML
Portal, PowerShell, and Visual Studio
Using the Portal:
• Use in non-MS clients
• Use for exploration
• Use when teaching or in a demo
Using PowerShell:
• Use in MS clients
• Use for automation
• Use for quick set up and tear down
PowerShell ADF Example
1. Run Add-AzureAccount and enter the user name and password.
2. Run Get-AzureSubscription to view all the subscriptions for this account.
3. Run Select-AzureSubscription to select the subscription that you want to work with.
4. Run Switch-AzureMode AzureResourceManager.
5. Run New-AzureResourceGroup -Name ADFTutorialResourceGroup -Location "West US".
6. Run New-AzureDataFactory -ResourceGroupName ADFTutorialResourceGroup -Name DataFactory(your alias)Pipeline -Location "West US".
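Taken together, the steps above can be sketched as one short script. These are the classic Azure PowerShell cmdlets that predate the Az module; the subscription and factory names below are placeholders, not values from the deck, and the script requires an Azure sign-in rather than running standalone:

```powershell
# Sign in and choose the subscription to work in
Add-AzureAccount
Get-AzureSubscription
Select-AzureSubscription -SubscriptionName "MySubscription"  # placeholder: your subscription name

# ADF lives in the Azure Resource Manager world, so switch modes first
Switch-AzureMode AzureResourceManager

# Create a resource group, then the data factory inside it
New-AzureResourceGroup -Name ADFTutorialResourceGroup -Location "West US"
New-AzureDataFactory -ResourceGroupName ADFTutorialResourceGroup `
    -Name "MyAliasPipeline" -Location "West US"  # placeholder factory name
```

Because mode switching and resource-group creation are one-time setup, only the final cmdlet needs repeating when tearing down and recreating factories during experimentation.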
Using Visual Studio:
• Use in mature dev environments
• Use when integrated into a larger development process
A connection to data or a connection to a compute resource – also termed a "Data Store".
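As a sketch of how such a connection is declared, an ADF (v1) linked service is defined in JSON. The example below shows an Azure Storage connection; the service name, account name, and key are placeholders, not values from the deck:

```json
{
    "name": "StorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
        }
    }
}
```

Datasets refer back to this definition by its `name`, so one linked service can be shared by many input and output datasets.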
Data Options
Source
Blob
Table
SQL Database
SQL Data Warehouse
DocumentDB
Data Lake Store
SQL Server on IaaS
OnPrem File System
OnPrem SQL Server
OnPrem Oracle Database
OnPrem MySQL Database
OnPrem DB2 Database
OnPrem Teradata Database
OnPrem Sybase Database
OnPrem PostgreSQL Database
Sink
Depending on the source, data can be copied to sinks drawn from: Blob, Table, SQL Database, SQL Data Warehouse, SQL Server on IaaS, DocumentDB, OnPrem SQL Server, OnPrem File System, and Data Lake Store (each source supports a subset of these sinks).
Activity Options
Transformation activity → Compute environment
• Hive → HDInsight [Hadoop]
• Pig → HDInsight [Hadoop]
• MapReduce → HDInsight [Hadoop]
• Hadoop Streaming → HDInsight [Hadoop]
• Machine Learning activities: Batch Execution and Update Resource → Azure VM
• Stored Procedure → Azure SQL
• Data Lake Analytics U-SQL → Azure Data Lake Analytics
• DotNet → HDInsight [Hadoop] or Azure Batch
Dataset Concepts
A dataset is a named reference or pointer to data.
{
    "name": "<name of dataset>",
    "properties":
    {
        "structure": [ ],
        "type": "<type of dataset>",
        "external": <boolean flag to indicate external data>,
        "typeProperties":
        {
        },
        "availability":
        {
        },
        "policy":
        {
        }
    }
}
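Filling in that skeleton for the monthly web-log scenario, a concrete dataset might look like the sketch below. The dataset name, linked service name, and folder path are illustrative values, not from the deck:

```json
{
    "name": "WebLogOutputTable",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {
            "folderPath": "weblogs/processed/",
            "format": { "type": "TextFormat", "columnDelimiter": "," }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        }
    }
}
```

The `availability` section drives scheduling: a frequency of `Month` with interval 1 produces one output slice per month, matching the business goal above.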
Pipeline Concepts
A pipeline is a logical grouping of activities.
{
    "name": "PipelineName",
    "properties":
    {
        "description" : "pipeline description",
        "activities":
        [
        ],
        "start": "<start date-time>",
        "end": "<end date-time>"
    }
}
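Tying the skeleton to the web-log scenario, a pipeline with a single Hive transformation activity could be sketched as follows. The pipeline, activity, dataset, linked service, and script names are illustrative placeholders; the start and end dates bound the period over which slices are produced:

```json
{
    "name": "WebLogPipeline",
    "properties": {
        "description": "Transform raw web logs with a Hive query each month",
        "activities": [
            {
                "name": "TransformWebLogs",
                "type": "HDInsightHive",
                "linkedServiceName": "HDInsightLinkedService",
                "inputs":  [ { "name": "RawWebLogTable" } ],
                "outputs": [ { "name": "WebLogOutputTable" } ],
                "typeProperties": {
                    "scriptPath": "scripts/transformlogs.hql",
                    "scriptLinkedService": "StorageLinkedService"
                },
                "scheduler": { "frequency": "Month", "interval": 1 }
            }
        ],
        "start": "2016-01-01T00:00:00Z",
        "end": "2016-12-31T00:00:00Z"
    }
}
```

The activity's `scheduler` frequency should agree with the output dataset's `availability`, so each monthly slice triggers one Hive run.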
Scheduling, Monitoring, Disposition
Locating Failures within a Pipeline