The New Possibilities in Microsoft
Business Intelligence
Johan Åhlén & Tim Peterson, SolidQ
Guest speakers: Tim Mallalieu & Miguel Llopis, Microsoft
• "Information is the Oil of the 21st Century - BI and Analytics are the Refinery" (Gartner)
Presenters
• Johan Åhlén
– SolidQ Mentor & Sweden CEO
– President, Swedish SQL Server User Group
– Microsoft SQL Server MVP
– Blog: www.joinsights.com
• Tim Peterson
– SolidQ Mentor & Nordic Board Member
– Co-author of the SSAS 2008 R2 Maestros course
– Blog: http://timpetersonbi.wordpress.com
Data Explorer presenters
• Timothy Mallalieu
– Group Program Manager, Cloud Data Services Team
– Microsoft
– Blog: http://blogs.msdn.com/timmall
• Miguel Llopis
– Program Manager, Cloud Data Services Team
– Microsoft
– Blog: http://blogs.msdn.com/mllopis
Challenge: New data sources
[Diagram: a timeline from yesterday ("What was the result?") through today ("How does the customer see us?", "Efficient processes?") to tomorrow ("How can we continue to succeed?"), spanning Business Processes, Customers, Finance and Vision/Strategy, with new sources such as Social Media and Competitor Data feeding in.]
Challenge: Data explosion
• The volume of information stored worldwide is at least doubling each year. (EMC)
• 87% of performance issues in application databases are related in some way to data growth. (OAUG)
Challenge: The BI dilemma
[Diagram: management's perceived value plotted against developer's effort for operational analytics, data warehouse/ETL, and scorecards and dashboards.]
The New Possibilities
• New Data Sources: Windows Azure Marketplace, Codename "Data Explorer"
• Big Data: PDW, Hadoop (not covered in this session)
• Self-service BI: PowerPivot, Power View, Codename "Data Explorer"
End-to-end self-service BI
DEMO
The Business Intelligence Semantic Model
• The Past – the Unified Dimensional Model (UDM) in Analysis Services 2005/2008
• The Future – the Business Intelligence Semantic Model (BISM) in Analysis Services 2012
– Multidimensional model
– Tabular model
Upgrading to BISM
• Upgrading to 2012 BISM Multidimensional
– Almost no change from Analysis Services 2008
– No preparation needed
– Some improvements
• Upgrading to 2012 BISM Tabular
– Very different structure
– Standard recommendation – start over!
Tabular/Multidimensional Differences
• Calculations: Multidimensional – MDX; Tabular – DAX
• Querying: Multidimensional – MDX; Tabular – DAX or MDX
• Use with Crescent: Multidimensional – No; Tabular – Yes
• In-Memory: Multidimensional – No; Tabular – Yes (as option)
• Aggregations: Multidimensional – Yes (optional); Tabular – No
• Querying Relational Database: Multidimensional – Yes, as option (ROLAP/HOLAP); Tabular – Yes, as option (Direct Query)
• Client choice of Direct Query: Multidimensional – No; Tabular – Yes (as option)
Multidimensional/Tabular Advantages
• Speed: Tabular – when in-memory
• Scalability: Multidimensional – MOLAP scales further than Vertipaq
• Ease of Use: Tabular – more like relational, DAX is like Excel formulas, less tuning needed
• Migration from AS2008: Multidimensional – almost no change
• Integration with PowerPivot in Excel: Tabular – uses the same Vertipaq engine
Advantages (Continued)
• Use with Crescent: Tabular – only option for now
• Multidimensional Logic: Multidimensional – more is possible with MDX
• Querying Relational Database, Ease of Use: Tabular – Direct Query appears to be easier than ROLAP
• Querying Relational Database, Logic: Multidimensional – Direct Query supports only limited DAX logic
Migrating from AS2008 Cubes to 2012 BISM
Tabular Model
DEMO
The Parallel Data Warehouse
• Large-capacity data warehouse
– Hundreds of terabytes
– Massively Parallel Processing (MPP)
• Sold as an appliance
– Software/hardware package
– Multiple servers running the SQL Server database engine
– Pre-configured and centrally managed, which keeps it manageable
PDW Configuration
• Control Rack
– Control nodes
– Management nodes
– Landing Zone
– Backup nodes
• 1-4 Data Racks
– Compute Nodes
– Storage Nodes
PDW Data Racks
• Each rack has 10 active nodes and 1 passive node (in case one of the active nodes fails)
• Each node has 16 processors
• Each node receives 8 distributions (instances of a distributed table)
• A full 4-data-rack system has 320 distributions
– 4 racks × 10 nodes × 8 distributions
How the processing is distributed
• Replicated Tables
– Full copy with all data created on every node
– Used for dimension tables
• Distributed Tables
– Table created on every node, each holding a portion of the data
– Data divided as evenly as possible
• Use a hash function on a key with a large number of distinct values (see the sketch below)
– Used for fact tables (and very large dimension tables)
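As a rough sketch of the distribution idea (illustrative Python only; the hashing scheme, table names and row values are made up and are not PDW's actual implementation), a distributed table hashes its distribution key into one of the 320 distributions, while a replicated table is simply copied in full to every compute node:

```python
# Conceptual sketch of replicated vs. distributed tables (not PDW's actual
# implementation). A full system has 4 racks x 10 nodes x 8 = 320 distributions;
# each row of a distributed table lands in exactly one of them.
import zlib

NUM_DISTRIBUTIONS = 320  # 4 racks x 10 nodes x 8 distributions per node

def distribution_for(key_value) -> int:
    """Hash the distribution key to pick one of the 320 distributions."""
    return zlib.crc32(str(key_value).encode()) % NUM_DISTRIBUTIONS

# Hypothetical fact rows keyed on CustomerKey (a key with many distinct
# values, so rows spread fairly evenly across the distributions).
fact_sales = [
    {"CustomerKey": 1001, "Amount": 19.90},
    {"CustomerKey": 2002, "Amount": 45.00},
    {"CustomerKey": 3003, "Amount": 7.25},
]

distributed = {}  # distribution id -> the rows stored there
for row in fact_sales:
    distributed.setdefault(distribution_for(row["CustomerKey"]), []).append(row)

# A replicated dimension table is copied in full to every compute node.
dim_customer = [{"CustomerKey": 1001, "Name": "Contoso"}]
replicated = {node: list(dim_customer) for node in range(40)}  # 40 compute nodes
```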
Three Types of PDW Joins
• Ultra Shared-Nothing Join
– Join made between a distributed table and a replicated table
– Fully local on every node
• Shared-Nothing Join
– Join made between two distributed tables with compatible distribution keys
• Redistribution (or Shuffle) Join
– Join made between two distributed tables that do not have compatible distribution keys (see the sketch below)
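A toy way to see which join type applies, based only on made-up table metadata (the real PDW query engine is far more sophisticated than this sketch):

```python
# Toy classification of the three PDW join types from table metadata
# (illustration only; table names and keys are invented).

def join_strategy(left, right, join_key):
    """left/right are dicts like {"kind": "distributed", "dist_key": "CustomerKey"}."""
    if "replicated" in (left["kind"], right["kind"]):
        # Every node already holds the full replicated table,
        # so the join runs fully locally on each node.
        return "ultra shared-nothing join"
    if left["dist_key"] == right["dist_key"] == join_key:
        # Both tables are hashed on the join key, so matching rows already
        # live in the same distribution and no data movement is needed.
        return "shared-nothing join"
    # Otherwise one table must be re-hashed on the join key at query time.
    return "redistribution (shuffle) join"

fact_sales   = {"kind": "distributed", "dist_key": "CustomerKey"}
fact_returns = {"kind": "distributed", "dist_key": "CustomerKey"}
fact_orders  = {"kind": "distributed", "dist_key": "OrderKey"}
dim_date     = {"kind": "replicated",  "dist_key": None}

print(join_strategy(fact_sales, dim_date, "DateKey"))         # ultra shared-nothing join
print(join_strategy(fact_sales, fact_returns, "CustomerKey")) # shared-nothing join
print(join_strategy(fact_sales, fact_orders, "CustomerKey"))  # redistribution (shuffle) join
```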
Speed of Joins
• At TechNet in May, a demo was done comparing a Shared-Nothing Join and a Redistribution Join
– 6 billion rows joined with 1.5 billion rows
– The only difference between the two demos was that one had compatible distribution keys and the other did not
• Shared-Nothing Join took 3 seconds
• Redistribution Join took 3 minutes
PDW Database Design
• If you have a multidimensional data structure (star schema), your design is almost done
– Replicate the dimension tables
– Distribute the fact tables
• If you have one large dimension table, you can distribute the fact tables along the same key as the dimension table
– You will still have excellent performance
How Do You Get Speed in Retrieving Data?
• Create good indexes
• Put data into a multidimensional database
• Add aggregation tables in the relational database or aggregations in the multidimensional database
• Create a better type of index for data retrieval (columnar)
• Put all the data into memory and compress it (Vertipaq) – see the sketch below
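A toy illustration of the last two points, using invented data and a simple run-length encoding (this is not how Vertipaq or columnstore indexes actually encode anything): a column layout lets a scan read only the column it needs, and repetitive columns compress very well.

```python
# Toy row store vs. column store with simple run-length encoding.
# Not how VertiPaq encodes data; it only illustrates why columnar,
# compressed, in-memory data is fast to scan.

rows = [
    {"Country": "SE", "Product": "Bike",   "Amount": 100},
    {"Country": "SE", "Product": "Bike",   "Amount": 250},
    {"Country": "NO", "Product": "Helmet", "Amount": 40},
]

# Column store: keep one list per column instead of one dict per row.
columns = {col: [r[col] for r in rows] for col in rows[0]}

def run_length_encode(values):
    """Collapse repeated adjacent values into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded

# A SUM over Amount only ever reads the Amount column...
total = sum(columns["Amount"])
# ...and low-cardinality columns compress extremely well.
print(run_length_encode(columns["Country"]))  # [('SE', 2), ('NO', 1)]
print(total)                                  # 390
```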
Speed – The PDW Solution
• Use Massively Parallel Processing (MPP)
– Divide the data into small parts
– Retrieve the data from each of the parts
– Combine all the results (see the sketch below)
• MPP gives the greatest benefit when you have a very large amount of data
• And you can still use indexes to improve performance further
– Columnstore indexes in Denali (SQL Server 2012)
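A minimal sketch of that divide/retrieve/combine pattern, with a thread pool standing in for PDW compute nodes (the data and the partitioning scheme are arbitrary):

```python
# Minimal sketch of the MPP pattern described above: divide the data into
# parts, compute a partial result per part, then combine the partials.
from concurrent.futures import ThreadPoolExecutor

amounts = list(range(1, 1_000_001))        # pretend this is a huge fact column
parts = [amounts[i::8] for i in range(8)]  # "distribute" the rows across 8 workers

def partial_sum(part):
    # Each worker aggregates only its own slice of the data.
    return sum(part)

with ThreadPoolExecutor(max_workers=8) as pool:
    partials = list(pool.map(partial_sum, parts))

print(sum(partials))  # combine step -> 500000500000
```

Each worker touches only its own slice, which is why the pattern pays off most on very large data volumes.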
Using PDW with Analysis Services
• Using the multidimensional model with ROLAP
• Using the multidimensional model with HOLAP
• Using the tabular model with Direct Query
Microsoft’s Vision for Cloud Data Services
Microsoft Codename “Data Explorer”
• Add & manage data sources, classify, transform, snapshot, understand, mash up, publish, recommend, cleanse, sell
http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx
Demo - codename “Data Explorer”
DEMO
Learn more
• Power View
– http://joinsights.com/tag/power-view/
• Migrating to BISM Tabular
– Link to Tim’s whitepaper
• Windows Azure Data Market
– https://datamarket.azure.com/
• Parallel Data Warehouse (PDW)
– http://www.microsoft.com/sqlserver/en/us/editions/data-warehouse.aspx
• Codename “Data Explorer”
– http://blogs.msdn.com/b/dataexplorer
THANK YOU!
For attending this session and
PASS SQLRally Nordic 2011, Stockholm