The New Possibilities in Microsoft Business Intelligence
Johan Åhlén & Tim Peterson, SolidQ
Guest speakers: Tim Mallalieu & Miguel Llopis, Microsoft

• "Information is the Oil of the 21st Century - BI and Analytics are the Refinery" (Gartner)

Presenters
• Johan Åhlén
  – SolidQ Mentor & Sweden CEO
  – President, Swedish SQL Server User Group
  – Microsoft SQL Server MVP
  – Blog: www.joinsights.com
• Tim Peterson
  – SolidQ Mentor & Nordic Board Member
  – Co-author of the SSAS 2008 R2 Maestros course
  – Blog: http://timpetersonbi.wordpress.com

Data Explorer presenters
• Timothy Mallalieu
  – Group Program Manager, Cloud Data Services Team, Microsoft
  – Blog: http://blogs.msdn.com/timmall
• Miguel Llopis
  – Program Manager, Cloud Data Services Team, Microsoft
  – Blog: http://blogs.msdn.com/mllopis

Challenge: New data sources
(Diagram)
• Timeline: Yesterday, Today, Tomorrow
• Questions: What was the result? How does the customer see us? How can we continue to succeed? Efficient processes?
• Perspectives around Vision/Strategy: Business Processes, Customers, Finance
• New sources: social media, competitor data, etc.

Challenge: Data explosion
• The volume of information stored worldwide is at least doubling each year. (EMC)
• 87% of performance issues in application databases are related in some way to data growth. (OAUG)

Challenge: The BI dilemma
(Diagram: management's perceived value versus developer's effort for Data Warehouse / ETL, Operational Analytics, and Scorecards and Dashboards)

The New Possibilities
• New Data Sources
• Big Data
• Windows Azure Marketplace
• PDW
• Self-service BI
• PowerPivot
• Power View
• Codename Data Explorer
• Hadoop (not covered in this session)

Codename Data Explorer
End-to-end self-service BI
DEMO

The Business Intelligence Semantic Model
• The Past: the Unified Dimensional Model (UDM) in Analysis Services 2005/2008
• The Future: the Business Intelligence Semantic Model (BISM) in Analysis Services 2012
  – Multidimensional model
  – Tabular model

Upgrading to BISM
• Upgrading to 2012 BISM Multidimensional
  – Almost no change from Analysis Services 2008
  – No preparation needed
  – Some improvements
• Upgrading to 2012 BISM Tabular
  – Very different structure
  – Standard recommendation: start over!

Tabular/Multidimensional Differences

Feature                        Multidimensional                 Tabular
Calculations                   MDX                              DAX
Querying                       MDX                              DAX or MDX
Use with Crescent              No                               Yes
In-Memory                      No                               Yes - as option
Aggregations                   Yes (optional)                   No
Querying Relational Database   Yes - as option (ROLAP/HOLAP)    Yes - as option (Direct Query)
Client Choice Direct Query     No                               Yes - as option

Multidimensional/Tabular Advantages

Criterion                              Advantage          Reason
Speed                                  Tabular            When in-memory
Scalability                            Multidimensional   MOLAP scales more than Vertipaq
Ease of Use                            Tabular            More like relational, DAX like Excel formulas, less tuning needed
Migration from AS2008                  Multidimensional   Almost no change
Integration with PowerPivot in Excel   Tabular            Uses the same Vertipaq engine

Advantages (Continued)

Criterion                                    Advantage          Reason
Use with Crescent                            Tabular            Only option for now
Multidimensional Logic                       Multidimensional   More with MDX
Querying Relational Database - Ease of Use   Tabular            Direct Query appears to be easier than ROLAP
Querying Relational Database - Logic         Multidimensional   Direct Query supports limited DAX logic

Migrating from AS2008 Cubes to 2012 BISM Tabular Model
DEMO

The Parallel Data Warehouse
• Large capacity data warehouse
  – 100s of terabytes
  – Massively Parallel Processing (MPP)
• Sold as an appliance
  – Software/hardware package
  – Multiple servers running the SQL Server database engine
  – Pre-configured and centrally managed, so it is manageable

PDW Configuration
• Control Rack
  – Control Nodes
  – Management Nodes
  – Landing Zone
  – Backup Nodes
• 1-4 Data Racks
  – Compute Nodes
  – Storage Nodes

PDW Data Racks
• Each rack has 10 active nodes and 1 passive node (in case one of the other nodes fails)
• Each node has 16 processors
• Each node receives 8 distributions (instances of a distributed table)
• A full 4 data rack system has 320 distributions
  – 4 racks × 10 nodes × 8 distributions

How the processing is distributed
• Replicated Tables
  – Full copy with all data created on every node
  – Used for dimension tables
• Distributed Tables
  – Table created on every node, each with a portion of the data
  – Data divided as evenly as possible: use a hash function on a key with a large number of values
  – Used for fact tables (and very large dimension tables)

Three Types of PDW Joins
• Ultra Shared-Nothing Join
  – Join made between a distributed table and a replicated table
  – Fully local on every node
• Shared-Nothing Join
  – Join made between two distributed tables with compatible distribution keys
• Redistribution (or Shuffle) Join
  – Join made between two distributed tables that do not have compatible distribution keys

Speed of Joins
• At TechNet in May a demo was done comparing a Shared-Nothing Join and a Redistribution Join
  – 6 billion rows joined with 1.5 billion rows
  – The only difference between the two demos was that one had compatible distribution keys and the other did not
• The Shared-Nothing Join took 3 seconds
• The Redistribution Join took 3 minutes

PDW Database Design
• If you have a multidimensional data structure (star schema), your design is almost done
  – Replicate the dimension tables
  – Distribute the fact tables
• If you have one large dimension table, you can distribute the fact tables along the same key as the dimension table
  – You will still have excellent performance

How Do You Get Speed in Retrieving Data?
• Create good indexes
• Put data into a multidimensional database
• Add aggregation tables in the relational database or aggregations in the multidimensional database
• Create a better type of index for data retrieval (columnar)
• Put all the data into memory and compress it (Vertipaq)

Speed: The PDW Solution
• Use Massively Parallel Processing
  – Divide the data into small parts
  – Retrieve the data from each of the parts
  – Combine all the results together
• MPP gives the most effective result when you have a very large amount of data
• And you can still use indexes to improve performance further
  – Columnar indexes in Denali

Using PDW with Analysis Services
• Using the multidimensional model with ROLAP
• Using the multidimensional model with HOLAP
• Using the tabular model with Direct Query

Microsoft's Vision for Cloud Data Services
Microsoft Codename "Data Explorer"
• Add & Manage Data Sources, Classify, Transform, Snapshot, Understand, Mash up, Publish, Recommend, Cleanse, Sell
• http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx

Demo - codename "Data Explorer"
DEMO

Learn more
• Power View
  – http://joinsights.com/tag/power-view/
• Migrating to BISM Tabular
  – Link to Tim's whitepaper
• Windows Azure Data Market
  – https://datamarket.azure.com/
• Parallel Data Warehouse (PDW)
  – http://www.microsoft.com/sqlserver/en/us/editions/data-warehouse.aspx
• Codename "Data Explorer"
  – http://blogs.msdn.com/b/dataexplorer

THANK YOU!
For attending this session and PASS SQLRally Nordic 2011, Stockholm
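
Supplementary sketch: the PDW slides describe spreading a table over 320 distributions with a hash function and classify joins into three types. The Python below illustrates those ideas only and is not PDW's actual implementation. The hash (SHA-256), the function names `distribution_for` and `join_type`, the table descriptions as dicts, and the rule that "compatible" keys means "same key column" are all assumptions made for illustration.

```python
import hashlib

# Figures taken from the "PDW Data Racks" slide.
RACKS = 4
ACTIVE_NODES_PER_RACK = 10
DISTRIBUTIONS_PER_NODE = 8
TOTAL_DISTRIBUTIONS = RACKS * ACTIVE_NODES_PER_RACK * DISTRIBUTIONS_PER_NODE  # 320


def distribution_for(key, total=TOTAL_DISTRIBUTIONS):
    """Map a distribution-key value to a distribution number.

    Assumption: SHA-256 stands in for whatever hash the appliance uses.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total


def join_type(left, right):
    """Classify a join using the three categories from the slides.

    Each table is a hypothetical dict: {"kind": "replicated"} or
    {"kind": "distributed", "key": "<column name>"}.
    Simplification: keys are "compatible" when the column names match.
    """
    kinds = {left["kind"], right["kind"]}
    if kinds == {"distributed", "replicated"}:
        return "ultra shared-nothing"
    if kinds == {"distributed"}:
        if left["key"] == right["key"]:
            return "shared-nothing"
        return "redistribution"
    # Two replicated tables: a case the slides do not discuss.
    return "both replicated (not covered in the slides)"


if __name__ == "__main__":
    fact = {"kind": "distributed", "key": "customer_id"}
    big_dim = {"kind": "distributed", "key": "customer_id"}
    other_fact = {"kind": "distributed", "key": "order_id"}
    small_dim = {"kind": "replicated"}

    print(TOTAL_DISTRIBUTIONS)
    print(join_type(fact, small_dim))
    print(join_type(fact, big_dim))
    print(join_type(fact, other_fact))
    print(distribution_for(12345))
```

Because rows with the same key value always hash to the same distribution, two tables distributed on the same key can be joined locally on each node, which is why the design slide suggests distributing a fact table along the key of a very large dimension table.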