Download SharePoint 2010 Business Intelligence Module 6: Analysis Services

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Cluster analysis wikipedia , lookup

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
SharePoint 2010
Business Intelligence
Module 6: Analysis Services
Overview
Analysis Services
Lesson: Analysis Services
Introduction
ETL
OLAP Terms
Storage Modes
Queries
Tools
Mining Models
Introduction
Analysis Services provides access to large data sets
Running SQL queries against a 100 million row table just
doesn’t work
 When your data sets get large, you need a better way of
handling the load
Online analytical processing (OLAP) provides all your
answers with speed!
 Analysis services is an OLAP implementation
ETL
The Extract, Transform and Load (ETL) process is vital
to the OLAP results
 If you input junk, you get junk back out!
 All data that enters the OLAP database must be valid or
your results could be exponentially wrong!
Make sure that all possible means are employed when
ensuring only valid data is entered into the system and
that it is entered only once!
 Common issue is data inserted more than once which
causes invalid results
OLAP Terms
Data Source
 A source or destination of data
Fact table
 A table that contains numerical information with keys that map to the dimension
keys
Measure
 A single numerical value in a fact table
Dimension
 A set of labels/attributes that describe the measures in a fact table
Cube
 A set of aggregations of all the dimensions and facts complied together to
produce valuable information
Perspective
 A subset of dimensions and measures specific to some group of users
Data Modeling
 Using statistical analysis to determine patterns in large sets of data
Tools
Visual Studio / BI Workbench
 Several project templates to facilitate the creation of
cubes, dimensions and work with data sources/views
IntelliCube
 A heuristic analysis tool for automatic generation of a
cube based on its data and relationships
External Viewer for
Outliers
Candidate keys
Value distributions
Patterns
Business Intelligence Development Studio
BI Studio is just Visual Studio with project templates
installed
Project templates provide item templates and wizards
 New 2008 Wizards are much easier to use and more
powerful
Context sensitive functionality keeps toolbars and
menus trimmed to what elements you need based on
what you are looking at
Because it is Visual Studio it is fully customizable
 3rd party add-ons can make it even more powerful
Designing an Analysis Services Database
BI Studio will be used to create new AS Databases
Steps include:
 Define data source
 Define data view
 Create a new cube
 Define fact tables
 Define dimensions (data and time)
 Define measures
 Aggregate/Run the cube
Key Performance Indicators (KPI)
KPIs are used to show very simply whether a target is
being accomplished
 Components include the Goal, Value, Status, and Trend
 Example: Sales and Quality targets
Analysis Services allows you to build MDX expressions
off of Cube data to build KPIs
 KPI values can be queried from client applications
 Allows for visual display of meaningful data
Actions
Actions
 Allows client application users to be able to interact with
what the data means
 Example: Browse to a customer or product via URL
Types of Actions:
 CommandLine, DataSet, Drillthrough, Html, Custom,
Report, URL
 Define and assign to objects in the Cube
Perspectives
Perspectives work similar to views in a relational
database
 Shows different users the data they need to see for a
particular role they may be in
Used to reduce complexity of cube data
Not meant to be security mechanism for data
Storage Modes
Data in an Analysis Services database is stored differently
than a relational database
 Optimized storage provides the OLAP query performance
Partition
 MOLAP – multidimensional OLAP (fact data and aggregations
are stored in special format)
 ROLAP – Relational OLAP (fact data and aggregations remain
in relational database)
 HOLAP – Hybrid OLAP (fact data is relational, aggregations
are stored in special format)
Dimension (dimension attributes only)
 MOLAP – stored in special format
 ROLAP – stay in relational format
Querying Cubes
SQL is not used in OLAP databases
MDX (multidimensional queries) is used for querying
cubes
DMX (Data Mining queries) is used for querying data
mining models
MDX Queries
Multidimensional Expressions (MDX) are used to query
multidimensional data
Some common terms are:
 Cell – the space at an intersection of a measure and
attribute
 Tuple – a unique cell based on a set of attribute
members
 Set – an order set of tuples with same dimensionality
Calculated Members and Named Sets
Calculated Members are used when you need to
determine something at query time
 Can be query or session scoped
 Value are only stored in memory not on disk
Named Sets are basically predefined MDX queries that
can be reused in other queries
 Used to group dimension members
Analyzing Data with Data Mining Algorithms
Out of the box, Analysis services provides five algorithms:
 Classification (Decision Tree)
Predict one or more discrete variables, based on the other attributes in
the dataset
 Regression (Time Series)
Predict one or more continuous variables, such as profit or loss, based
on other attributes in the dataset
 Segmentation (Clustering)
Divide data into groups, or clusters, of items that have similar
properties.
 Association (Association)
Find correlations between different attributes in a dataset
 Sequence analysis (Sequence Clustering)
Summarize frequent sequences or episodes in data, such as a Web
path flow
What could go wrong?
Cubes really are simple things to build and utilize
 As simple as they are, it is easy to create a cube that has
the wrong data
Always validate that the data that is displayed in the
Cube is valid and accurate
 Never hurts to be overly aggressive when testing Cube
data
Lab 1: Analysis Services
Explore Analysis Services
Lab 2: Building a Cube
Explore Cubes and Dimensions
Lab 3: Data Mining Algorithms
Explore Data Mining Models
Review
Your instructor will ask a series of
questions on this module
Summary
Extra Large databases are not easily queried for data
Analysis Services is an OLAP tool to manage large
databases
Ensure that your ETL process is accurate
Data Mining Algorithms can help you find patterns you didn’t
know about before