DATA - PASS
Yossi Elkayam
Sr. BI & Azure Architect, Microsoft Services

Age of data
• BI
• Database (ranking: http://db-engines.com/en/ranking)
• Cloud

Cloud data stores
• Relational: SQL Server in an Azure VM, Azure SQL DB, Azure SQL DW
• Beyond relational: Azure Data Lake, HDInsight, DocumentDB

Microsoft Data Platform
[Diagram: Power BI, Azure Machine Learning, Azure Data Factory and Federated Query in the cloud; SQL Server and APS on-premises. Themes: Comprehensive, Connected, Choice.]

Cortana Analytics

Azure Internet of Things (IoT) Suite & Dynamics
• Devices: RTOS, Linux, Windows, Android, iOS
• Device Connectivity & Management: Field Gateway (protocol adaptation), Cloud Gateway (protocol adaptation), Event Hubs & IoT Hub
• Analytics & Operationalized Insights:
  • Batch Analytics & Visualizations: Azure HDInsight, AzureML, Power BI, Azure Data Factory
  • Hot Path Analytics: Azure Stream Analytics, Azure HDInsight Storm
  • Hot Path Business Logic: Service Fabric & Actor Framework
• Presentation & Business Connectivity: App Service, Websites; Dynamics, BizTalk Services, Notification Hubs

Streaming Analytics
• Event producers: applications, devices, sensors, web and social
• Collection: cloud gateways (web APIs), field gateways
• Event queuing system: Event Hubs, Kafka/RabbitMQ/ActiveMQ
• Transformation (stream processing): Storm on HDInsight, Stream Analytics, Azure ML
• Long-term storage (storage adapters): Apache HBase on HDInsight, DocumentDB, MongoDB, SQL
• Presentation and action: live dashboards, web/thick client dashboards, search and query (Solr, Azure Search), data analytics (Excel), event hub, devices to take action

IoT Scenario - Connected Cars / Devices
[Diagram: devices send events through cloud gateways to Event Hubs / Queue Service. Apache Storm gets reference data, applies business logic, stores raw data and stores reporting data. Stores: DocumentDB (document store), HBase (NoSQL store), SQL Azure (relational store). Power BI gets data from the stores; a live dashboard is fed from Event Hubs / Queue Service.]

Microsoft Data Platform (cloud and on-premises)
• Cloud: SQL Server in Azure VM (AG async replica), SQL Database + Elastic Scale, LRS / Geo-Replication, Redis Cache, DocumentDB, Table Storage, Blob Storage, Azure Marketplace, Azure HDInsight (MapReduce, Pig, Hive, HBase, Storm, Spark), Azure SQL Data Warehouse (MPP), Azure Data Lake Store, Azure Data Lake Analytics, Azure Data Factory, Azure Data Catalog, Azure Search, Event Hub, Stream Analytics, Azure ML, Power BI Service
• SQL Server Database Engine: Buffer Pool Extension, Query Store, StretchDB, Columnstore, Row Store, Resource Governor, In-Memory, R Services
• Data types: Relational, XML, JSON, Spatial, FullText, Binary, Image, FileTable, Filestream
• Security: Row-level security, Transparent Data Encryption, Always Encrypted, Data Masking, Auditing, Compliance
• Management: SQL Agent, Database Mail, Linked Servers, Managed Backup, Backup to Azure
• HA/DR: AlwaysOn (AG, FCI), Replication, Transactional Replication, Log Shipping
• BI and Advanced Analytics: Analysis Services (Tabular, Multi-Dimensional, PowerPivot, BISM, Data Mining, KPI)
• Reporting Services (Native, SharePoint Integrated): Paginated Reports, Mobile Reports, Analytical Reports, Report Builder, Report Designer, Mobile Report Publisher, Delivery, Reporting Services Portal
• Data visualization (analyze, author, consume): Microsoft Excel, Power BI Desktop, Power BI Web Portal, Windows Phone App, Android App, iOS App, SharePoint, Cortana
• Information Management & Data Orchestration: Data Quality Services, Master Data Services, Integration Services
• Data Warehousing: Dimensional Modeling, Star, Snowflake, PolyBase; Reference Architectures; Appliances (APS, Massively Parallel Processing, PolyBase)
• Common Tools: SQL Server Management Studio, SQL Server Data Tools, Command Line (PowerShell, BCP, SQLCMD), Migration & Upgrade Tools (SSMA, Upgrade Advisor, MAP Toolkit)
• Physical / Virtual Deployment

"Big Data Reality Framework…"
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/fdp.41344

A personal story

Ever growing data, ever shrinking IT
• End-users
• DBAs
• Storage Admins & TDM's

...reinventing ourselves

SQL Server 2016: Everything built-in
• In-memory across all workloads
• Consistent experience from on-premises to cloud
• Self-service BI per user
• At massive scale
[Chart: self-service BI cost per user for Microsoft, Tableau and Oracle; values shown: $120, $480, $2,230.]
[Chart: database comparison across SQL Server, Oracle, MySQL and SAP HANA.]
[Chart: TPC-H non-clustered 10TB ranking (#1, #2, #3); Oracle is #4.]

The above graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Microsoft. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

National Institute of Standards and Technology Comprehensive Vulnerability Database update 10/2015.
TPC-H non-clustered results as of 04/06/15, 5/04/15, 4/15/14 and 11/25/13, respectively. http://www.tpc.org/tpch/results/tpch_perf_results.asp?resulttype=noncluster

SQL Data Warehouse Architecture
• Application or user connection
• Control node, "The Brain" (SQL DB + DMS): an endpoint for connection and tools; coordinates storage/compute activity
• Data loading: PolyBase, ADF, SSIS, REST, OLE, ODBC, AZCopy, PS
• Massively Parallel Processing (MPP) engine
• Compute nodes, "The Brawn" (SQL DB + DMS on each node): handle query processing, with the ability to scale up/down
• Data Movement Services (DMS): coordinates data movement from nodes/storage
• Storage (Azure infrastructure and storage, Blob storage [WASB(S)]): add/load data to WASB(S) without incurring compute costs

How does Stretch work?
• Creates a secure connection between the source SQL Server and Azure
• Provisions a remote instance and begins migration
• Apps and queries continue to run against both the local database and the remote endpoint
• Security controls and maintenance remain local
[Diagram: the source database holds hot/active data and cold/closed/historical data; cold data trickle-migrates across the Internet boundary to a remote table in a remote database on Microsoft Azure.]

Hadoop stack
• Machine Learning (Mahout)
• Query (Hive)
• Scripting (Pig)
• NoSQL Database (HBase)
• Distributed Processing (MapReduce)
• Distributed Storage (HDFS)
• Data Integration (Sqoop/REST/ODBC)
• Workflow & Scheduling (Oozie)
• Coordination (ZooKeeper)
• Management & Monitoring (Ambari)

Azure Data Lake
Batch, real-time and interactive analytics made easy
• Azure Data Lake Analytics service
• Managed clusters (HDInsight)
• YARN
• WebHDFS
• Store: unstructured, semi-structured, structured

Principles
• Maximize return on accessible data
• Reduce time to value
• Reduce time to insight

Approach
• Productivity day one (developers, scientists, analysts)
• Open (YARN, HDFS), designed for the cloud
• All data available for analysis
• Leverages existing skills: use SQL, Spark, Hive, Storm, HBase
• Dynamically scales to meet your business objectives
• Managed and supported with an enterprise-grade SLA

Data lake workflow
• Ingest all data regardless of requirements (for example, from devices)
• Store all data in native format without schema definition
• Do analysis using analytic engines like Hadoop: batch queries, interactive queries, real-time analytics, machine learning, data warehouse

PolyBase
Query relational and non-relational data with T-SQL
[Diagram: a single T-SQL query combines SQL Server rows (Name, DOB, State; for example Denny Usher, 11/13/58, and Gina Burch, 04/29/76) with data held in Hadoop / Data Lake Store.]

PolyBase setup without a credential

Once per Hadoop cluster:

    CREATE EXTERNAL DATA SOURCE HadoopCluster
    WITH (TYPE = HADOOP,
          LOCATION = 'hdfs://10.193.26.177:8020',
          RESOURCE_MANAGER_LOCATION = '10.193.26.178:8050');

Once per file format:

    CREATE EXTERNAL FILE FORMAT TextFile
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec',
          FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE));

External table (LOCATION is the HDFS file path):

    CREATE EXTERNAL TABLE [dbo].[Customer] (
        [SensorKey] int NOT NULL,
        [...] int NOT NULL,      -- column name lost in the source slide
        [Speed] float NOT NULL
    )
    WITH (LOCATION = '//Sensor_Data//May2014/sensordata.tbl',
          DATA_SOURCE = HadoopCluster,
          FILE_FORMAT = TextFile);

PolyBase setup with a credential

Once per Hadoop user:

    CREATE DATABASE SCOPED CREDENTIAL HadoopCredential
    WITH IDENTITY = 'hadoopUserName', SECRET = 'hadoopPassword';

Once per Hadoop cluster per user:

    CREATE EXTERNAL DATA SOURCE HadoopCluster
    WITH (TYPE = HADOOP,
          LOCATION = 'hdfs://10.193.26.177:8020',
          RESOURCE_MANAGER_LOCATION = '10.193.26.178:8050',
          CREDENTIAL = HadoopCredential);

Once per file format: the same CREATE EXTERNAL FILE FORMAT TextFile statement as in the setup without a credential.

External table (LOCATION is an HDFS directory path):

    CREATE EXTERNAL TABLE [dbo].[Customer] (
        [SensorKey] int NOT NULL,
        [...] int NOT NULL,      -- column name lost in the source slide
        [Speed] float NOT NULL
    )
    WITH (LOCATION = '//Sensor_Data//May2014/',
          DATA_SOURCE = HadoopCluster,
          FILE_FORMAT = TextFile);

Federated queries: Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store.

Benefits
• Avoid moving large amounts of data across the network between stores
• Single view of data irrespective of physical location
• Minimize data proliferation issues caused by maintaining multiple copies
• Single query language for all data
• Each data store maintains its own sovereignty
• Design choices based on the need

[Diagram: a U-SQL query in Azure Data Lake Analytics reaches Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB and Azure SQL Data Warehouse, and pushes SQL expressions (filters, joins) to the remote SQL sources.]

JSON support in SQL Server: drivers, pillars, benefits
• Handling a variety of data and model changes
• Information is stored in JSON format
• Support for complex analysis on JSON documents
• Modern services exchange data in JSON format
• Fast built-in JSON/relational data conversion
• Combination of relational and JSON data
• The power of T-SQL and the SQL Server engine
• Integration with all SQL Server components

Built-in functions
• ISJSON
• JSON_VALUE
• JSON_MODIFY
• JSON_QUERY

JSON text:

    [
      { "Number":"SO43659", "Date":"2011-05-31T00:00:00", "AccountNumber":"AW29825", "Price":59.99, "Quantity":1 },
      { "Number":"SO43661", "Date":"2011-06-01T00:00:00", "AccountNumber":"AW73565", "Price":24.99, "Quantity":3 }
    ]

Table:

    SO43659 | 2011-05-31T00:00:00 | MSFT  | 59.99 | 1
    SO43661 | 2011-06-01T00:00:00 | Nokia | 24.99 | 3

• OPENJSON: transforms JSON text to a table.
• FOR JSON: formats a result set as JSON text.
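To make the two directions concrete, here is a minimal Python sketch of the same round trip: JSON text to rows, and rows back to JSON text. It is only an analogy for what OPENJSON and FOR JSON do, not T-SQL and not how SQL Server implements them. The helper names `open_json` and `for_json` and the `COLUMNS` list are hypothetical; the sample data is the order JSON from the slide.

```python
import json

# Sample order data taken from the slide's JSON text.
ORDERS_JSON = """
[
  {"Number": "SO43659", "Date": "2011-05-31T00:00:00",
   "AccountNumber": "AW29825", "Price": 59.99, "Quantity": 1},
  {"Number": "SO43661", "Date": "2011-06-01T00:00:00",
   "AccountNumber": "AW73565", "Price": 24.99, "Quantity": 3}
]
"""

# Column list chosen for this illustration (plays the role of a schema).
COLUMNS = ["Number", "Date", "AccountNumber", "Price", "Quantity"]


def open_json(text, columns):
    """Hypothetical helper: turn a JSON array of objects into row tuples.

    Keys missing from an object become None, similar to a NULL column.
    """
    return [tuple(obj.get(col) for col in columns) for obj in json.loads(text)]


def for_json(rows, columns):
    """Hypothetical helper: format row tuples as JSON text, one object per row."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


rows = open_json(ORDERS_JSON, COLUMNS)
round_trip = json.loads(for_json(rows, COLUMNS))

for row in rows:
    print(row)
```

Each printed tuple corresponds to one table row; feeding the rows back through `for_json` reproduces the original objects.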
OPENJSON example document:

    {
      "name": "Microsoft",
      "homepage_url": "www.microsoft.com",
      "blog_url": "blogs.microsoft.com/",
      "products": [
        { "name": "Azure", "permalink": "azure.com" }
      ],
      "offices": [
        { "address1": "1 Redmond Way", "zip_code": "98052", "city": "Redmond", "state_code": "WA", "country_code": "USA" }
      ]
    }

DocumentDB
• Part of the NoSQL family
• Built for simplicity, scale and performance
• Non-relational
• No enforced schema

Document with an encoded binary payload:

    {
      "id": "itemdata2344",
      "data": "TWFuIGlzIGRpc3Rpbmd1aXNoZWQsI G5vdCBvbmx5IGJ5IGhpcyByZWFzb24 sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHB c3Npb24gZnJvbSBvdGhlcibmltYWxz LCB3aGljaCBpcyBhIGx1c3Qgb2YdGhlI G1pbmQsIHRoYXQgYnkgYSBwZXJzZX ZlcmFuY2Ugb2YgZGVsaWdodCBpb B0aGUgY29udGludWVkIGFuZCBpbm RlZmF0aWdhYmxlIGdlbmVyYXRpb24 gb2Yga25vd2xlZGdlLCBleGNlZWRzIH RZSBzaG9ydCB2ZWhlbWVuY2Ugb2 YgYW55IGNhcm5hbCBwbhc3VyZ4=="
    }

Modeling a hierarchy (Jill, Ben, Susan, Andrew, Sven, Thomas):

    {id: "Jill" },
    {id: "Ben", manager: "Jill" },
    {id: "Susan", manager: "Jill" },
    {id: "Andrew", manager: "Ben" },
    {id: "Sven", manager: "Susan" },
    {id: "Thomas", manager: "Sven" }

Getting the manager of any employee is trivial:

    SELECT manager FROM org WHERE id = "Susan"

Modeling keywords for search:

    {
      id: "CDC101",
      title: "The Fundamentals of Database Design",
      titleWords: ["database", "design", "database design"],
      credits: 10
    }

• Consider using a RegEx to transform words to lowercase and remove punctuation.
• Strip out stop words like "to", "the", "of", etc.
• Denormalize keywords into key phrases.

    SELECT books.title FROM books WHERE ARRAY_CONTAINS(books.titleWords, "database")

Modeling time-series readings, one document per minute bucket:

    {
      id: "...",
      timestampMinute: "...",
      readings: [ {minute:0, reading:123}, {minute:1, reading:456}, ... {minute:59, reading:999} ]
    }

Modeling log data with free-form attributes:

    {
      id: "...",
      timestamp: "...",
      logData: {attr1: value1, attr2: value2, ...}
    }
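The keyword advice for the titleWords array (lowercase with a RegEx, remove punctuation, strip stop words, denormalize into key phrases) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the deck's implementation: the `STOP_WORDS` set, the `title_words` function and the two-word phrase limit are hypothetical choices, and the final filter only mimics the ARRAY_CONTAINS query in memory.

```python
import re

# Assumed, illustrative stop-word list; a real application would use a fuller one.
STOP_WORDS = {"to", "the", "of", "a", "an", "and", "in"}


def title_words(title, max_phrase_len=2):
    """Hypothetical helper: build a titleWords-style array from a title.

    Steps: lowercase, remove punctuation with a regex, drop stop words,
    then append key phrases made of adjacent remaining words.
    """
    cleaned = re.sub(r"[^\w\s]", "", title.lower())
    words = [w for w in cleaned.split() if w not in STOP_WORDS]
    result = list(words)
    for n in range(2, max_phrase_len + 1):
        for i in range(len(words) - n + 1):
            result.append(" ".join(words[i:i + n]))
    return result


book = {
    "id": "CDC101",
    "title": "The Fundamentals of Database Design",
    "credits": 10,
}
book["titleWords"] = title_words(book["title"])

# In-memory stand-in for:
#   SELECT books.title FROM books WHERE ARRAY_CONTAINS(books.titleWords, "database")
books = [book]
matches = [b["title"] for b in books if "database" in b["titleWords"]]
print(book["titleWords"])
print(matches)
```

Unlike the hand-picked array on the slide, this sketch keeps every non-stop word (including "fundamentals"), so the generated array is a superset of the slide's example. Which words and phrases to keep is a design choice that depends on the searches the application must serve.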