Start to Finish with Azure Data Factory
Transcript
Andy Roberts
Data Architect

Session Objectives
• Understand where ADF fits in Cortana Analytics
• Understand how ADF works, and its components
• Be able to deploy and manage a simple ADF implementation
Key Takeaway: ADF can be used in real-world data pipeline scenarios, quickly and easily

Cortana Analytics
A suite of products that allow you to predict outcomes, prescribe actions and automate decisions
• Cortana
• Power BI
• Azure Stream Analytics
• Azure HDInsight
• Azure Machine Learning
• Azure SQL DB, Data Warehouse, DocumentDB
• Azure Data Lake
• Azure Event Hubs
• Azure Data Catalog
• Azure Data Factory
Microsoft Azure
Cortana Analytics Process: https://tinyurl.com/caprocess
Stages: Ingest, Transform, Store, Analyze, Act, Orchestrate

Azure Data Factory
Create, orchestrate, and manage data movement and enrichment through the cloud

ADF Components

ADF Logical Flow

ADF Process
1. Define Architecture: Set up objectives and flow
2. Create the Data Factory: Portal, PowerShell, VS
3. Create Linked Services: Connections to Data and Services
4. Create Datasets: Input and Output
5. Create Pipeline: Define Activities
6. Monitor and Manage: Portal or PowerShell, Alerts and Metrics

Define data sources, processing requirements, and output; also management and monitoring

Example - Churn
Azure Data Factory:
• Data Sources: Call Log Files, Customer Table
• Ingest: Call Log Files, Customer Table
• Transform & Analyze: Customer Call Details, Customers Likely to Churn
• Publish: Customer Churn Table

Our ADF:
• Business Goal: Transform and analyze web logs each month
• Design Process: Transform raw web logs stored in a temporary location, using a Hive query, storing the results in Blob Storage
• Input: Web logs in HDFS file store
• Output: Files ready for analysis and use in AzureML

Portal, PowerShell and Visual Studio

Using the Portal
• Use in Non-MS Clients
• Use for Exploration
• Use when teaching or in a Demo

Using PowerShell
• Use in MS Clients
• Use for Automation
• Use for quick set up and tear down

PowerShell ADF Example
1. Run Add-AzureAccount and enter the user name and password.
2. Run Get-AzureSubscription to view all the subscriptions for this account.
3. Run Select-AzureSubscription to select the subscription that you want to work with.
4. Run Switch-AzureMode AzureResourceManager
5. Run New-AzureResourceGroup -Name ADFTutorialResourceGroup -Location "West US"
6. Run New-AzureDataFactory -ResourceGroupName ADFTutorialResourceGroup -Name DataFactory(your alias)Pipeline -Location "West US"

Using Visual Studio
• Use in mature dev environments
• Use when integrated into larger development process

Connection to Data or Connection to Compute Resource; also termed "Data Store"

Data Options
Source data stores:
• Blob
• Table
• SQL Database
• SQL Data Warehouse
• DocumentDB
• Data Lake Store
• SQL Server on IaaS
• OnPrem File System
• OnPrem SQL Server
• OnPrem Oracle Database
• OnPrem MySQL Database
• OnPrem DB2 Database
• OnPrem Teradata Database
• OnPrem Sybase Database
• OnPrem PostgreSQL Database
Sink data stores listed against these sources: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, OnPrem File System, SQL Server on IaaS, DocumentDB, Data Lake Store

Activity Options
Transformation activity                                               Compute environment
Hive                                                                  HDInsight [Hadoop]
Pig                                                                   HDInsight [Hadoop]
MapReduce                                                             HDInsight [Hadoop]
Hadoop Streaming                                                      HDInsight [Hadoop]
Machine Learning activities: Batch Execution and Update Resource      Azure VM
Stored Procedure                                                      Azure SQL
Data Lake Analytics U-SQL                                             Azure Data Lake Analytics
DotNet                                                                HDInsight [Hadoop] or Azure Batch

Named reference or pointer to data

Dataset Concepts
    {
        "name": "<name of dataset>",
        "properties": {
            "structure": [ ],
            "type": "<type of dataset>",
            "external": <boolean flag to indicate external data>,
            "typeProperties": { },
            "availability": { },
            "policy": { }
        }
    }

Logical Grouping of Activities

Pipeline Concepts
    {
        "name": "PipelineName",
        "properties": {
            "description": "pipeline description",
            "activities": [ ],
            "start": "<start date-time>",
            "end": "<end date-time>"
        }
    }

Scheduling, Monitoring, Disposition

Locating Failures within a Pipeline
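
The dataset and pipeline templates shown in the slides can be filled in programmatically before they are deployed. The following is a minimal Python sketch, not part of the original session, that builds both definitions with the same keys as the templates and checks that none are missing. The names RawWebLogs and WebLogPipeline, the "AzureBlob" type, the monthly availability values, and the start and end dates are all illustrative assumptions, and the empty activities list is only a skeleton that a real pipeline would need to populate.

```python
import json

# Property keys copied from the dataset and pipeline templates in the slides.
DATASET_KEYS = {"structure", "type", "external",
                "typeProperties", "availability", "policy"}
PIPELINE_KEYS = {"description", "activities", "start", "end"}


def make_dataset(name, dataset_type, external, availability):
    """Return a dataset definition shaped like the dataset template."""
    return {
        "name": name,
        "properties": {
            "structure": [],
            "type": dataset_type,
            "external": external,
            "typeProperties": {},
            "availability": availability,
            "policy": {},
        },
    }


def make_pipeline(name, description, activities, start, end):
    """Return a pipeline definition shaped like the pipeline template."""
    return {
        "name": name,
        "properties": {
            "description": description,
            "activities": activities,
            "start": start,
            "end": end,
        },
    }


def missing_keys(definition, expected):
    """Return the expected keys absent from definition['properties'], sorted."""
    return sorted(expected - set(definition.get("properties", {})))


# Hypothetical values for the monthly web-log scenario described in the talk.
dataset = make_dataset(
    "RawWebLogs",                            # assumed name
    "AzureBlob",                             # assumed dataset type
    True,                                    # data produced outside the factory
    {"frequency": "Month", "interval": 1},   # assumed schedule
)
pipeline = make_pipeline(
    "WebLogPipeline",                        # assumed name
    "Transform raw web logs with a Hive query",
    [],                                      # activities left empty in this sketch
    "2016-01-01T00:00:00Z",                  # assumed start
    "2016-12-31T00:00:00Z",                  # assumed end
)

print(json.dumps(dataset, indent=2))
print(json.dumps(pipeline, indent=2))
```

Generating the JSON from small helper functions keeps the key names in one place, so a typo in a property name shows up in missing_keys rather than at deployment time.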