Download BI2016 - Home

Document related concepts
no text concepts found
Transcript
Business Intelligence
We help people see and understand their data
(Tableau)
Prof. Stefano Bordoni
[email protected]
Day/hours:
Start/end date:
Delivery Type:
Total Hours/lessons:
Thursday 8,30-10
29/09/2016
Lecture/Lab
48/24
Friday 8,30-10
23/12/2016
Course framework and learning resources (Text book, Guides, Help, Tutorial, Whitepapers, Support):
Course Credits:
6/8h
Module 1: Microsoft BI Stack - Excel
Excel File -> help https://support.office.com/en-us/excel
Module 2: Tableau
http://www.tableau.com/learn
Module 3: Data mining with Microsoft Excel
https://technet.microsoft.com/en-us/library/bb510516.aspx
Module 4 (optional): Zoho ecosystem
https://www.zoho.com/
Related resources and additional documentation (slides, software, exercises, tutorial, source data)
In common: Slides (BI2016.pptx), Data sources, Exercises, Exam information: Syllabus Exam.docx
http://morgana.unimore.it/bordoni_stefano/BI2016/
Recommended:
Self-service Business Intelligence e Data Mining con Microsoft Excel, Bordoni 2013, ed. Pitagora
Tableau Your Data! Fast and Easy Visual Analysis with Tableau Software [Paperback] Dan Murray
Learning assessment method :
Multiple choice final test. Not mandatory midterm Exam. A self assessment copy of the exam software is available at:
http://morgana.unimore.it/bordoni_stefano/BI2016/tester/testcourseBI2016.rar
Free Software to download:
Power Pivot: http://www.microsoft.com/it-it/download/details.aspx?id=29074
DM add-ins: http://www.microsoft.com/it-it/download/details.aspx?id=35578
SQL Server® 2012 Evaluation: http://www.microsoft.com/it-it/download/details.aspx?id=29066
Tableau Public: http://www.tableausoftware.com/public/ Desktop: http://www.tableau.com/products/desktop
Course goals:
Create meaningful visualizations by learning the science of data visualization and visual best practices. Provide competence to become an
analyst and developer of Self-service Business Intelligence and Data Mining applications
Free Microsoft Software from “Webstore dreamspark”*
http://e5.onthehub.com/d.ashx?s=u1vjaekwzj
*to install iso file you may use virtualclonedrive (http://www.slysoft.com/it/virtual-clonedrive.html) or
Course modules - Documentation and resources related to Exam
Modules
Software
Related resources and source files
Introduction to BI and Self-service BI
BI process in 4 steps:
1 - Business objective and data preparation
2 - Data Analysing
3 - Producing Results
4 - Sharing results
Excel 2010
Access 2010
Google site
Tableau
Zoho reports
MYBI.mdb - Dataset_libro.mdb - Sample - Superstore
Subset (Excel).xlsx
Microsoft BI stack:
BI process and Dashboard in Excel
Excel 2010
Access 2010
Power Pivot
OneDrive
2
Tableau
Tableau
Desktop 9.3
Public 9.3
Online
3
Data Mining in Excel
Excel 2010
DM add-in
Excel (functions, connections) File -> help
Tutorial: “Functions and techniques for reports and
dashboard.xlsx“
Tutorial: “Dashboard in Excel with code.xlsm“
http://www.tableau.com/learn
More on tableau:
Video, Whitepapers, Quickstart, OnlineHelp,
Knowledge base, Community, TC14 content,
Reference guide
Excel Data mining functions and commands
“DMAddins_SampleData.xlsx”
“DMAddins_SampleData modificato per libro.xlsx”
https://technet.microsoft.com/enus/library/bb510516.aspx
4
Optional - recommended
Zoho ecosystem (Reports, CRM)
1
Zoho
https://www.zoho.com/
https://www.zoho.com/reports/
https://www.zoho.com/reports/help/
2
Introduction to Visual Analytics, Business Intelligence and Data Mining. Highlights. Subscribe 4 account: Zoho, Tableau Public,
Google Site, One Drive. Slides Show dash. Build a Tableau dashboard
Self-service BI process. How to develop a dashboard in 3 different environment: Zoho, Excel and Tableau. Course structure.
Documentation and text books. Data sources (Sample superstore) https://public.tableau.com/s/resources
3
Data connections: mybi.mdb, sample superstore, custbase.csv
Excel 1
4
Small, Large, Rank, CountIf, SumIf, Lookup, VLookup, Match, Index
Excel 2
5
SumProduct 1-2, Offset 1-2
Excel 3
6
Offset 3-4, Date, Left, Right, Mid
Excel 4
7
Table, Pivot table
Excel 5
8
Sparkline, other techniques, conditional formatting
Excel 6
9
Speedometer, thermometer
Excel 7
10
Dashboard in Excel
Excel 8
11
Dashboard in Excel
Excel 9
12
Google site and object embedding
Excel 10
13
Getting Started Fundamentals -Highlights
Tableau 1
14
Connecting to Data
Tableau 2
15
Visual Analytics
Tableau 3
16
Mapping
Tableau 4
17
Dashboards and Stories
Tableau 5
18
Calculations
Tableau 6
19
How to -Special charts
Tableau 7
20
Publishing: Server – Online –Public – Embedding
Tableau 8
21
Link Analysis
DM addin
22
Cluster Analysis
DM addin
23
Predictive Analysis
DM addin
24
FAQ and assessment
Assessment
1
Intro 1
Intro 2
Business Intelligence and
Self service Business Intelligence
Definitions
BI Goal
How To Select A Software Stack
BI vs Self service BI
Which Technology
Which Vendor
Business Intelligence and
Self service Business Intelligence
Definitions
BI Goal
How To Select A Software Stack
BI vs Self service BI
Which Technology
Which Vendor
Performance measurement system
BI - PMS: system of measures and indicators that
compares goals with current results. It verifies
expectations, detects potential deviations to correct
and improve business decisions
Analytical CRM, Data and Text Mining:
support sales pipeline. It provides Customer
and Marketing Intelligence analysis
Data layer
ETL process - Staging Area - Data Warehouse - Data Mart / (KDD) Knowledge discovery in Database
Analysis layer
Query traditional
MDA OLAP cubes
Univariate and multivariate statistical analysis
Data and Text mining
Query and MDA:
Drill down
Drill up
Slice and dice
Executive information system
CRM Executive
BSC
I
N
T
E
L
L
I
G
E
N
Z
A
Static Report
(operational level)
Customer and marketing
intelligence
Dashboard
(management level)
Widgets (executive
level)
Web MKT
EIS - ESS
KPI and Alert
Customer value
Customer satisfaction
DSS – Customer analytics
Reporting and sharing
Risk Management
Fraud Detection
Customer e Marketing
intelligence
Fuzzy Expert system
Business Intelligence (BI)
1.
BI is a set of processes, technologies, techniques, tools, skills, applications
and practices used to provide information and support decision making
2.
Help people see and understand their data (Tableau)
3.
BI is getting the right information into the right people’s hands in a format
that allows them to understand the data quickly
4.
BI is a broad category of applications and technologies for gathering,
storing, analyzing and providing data to help enterprise users make better
decisions (TechTarget)
5.
BI is an umbrella term that refers to a variety of software applications used
to analyze an organization’s raw data. Is made up of several related
activities, including data mining, online analytical processing, querying and
reporting (CIO.com)
6.
BI is a three steps process to collect consistent Data Sources (1) analyse
data (2) and provide knowledge workers with unique, interactive, correct,
navigable, real time results (3)
PMS Perfomance measurement system (see also EIS, ESS): «if you don’t measure results, you can’t tell success
from failure». PMS is the whole system of measures and indicators that lets the Company to support and verify
the correct implementation of the strategy and assess goals meeting or need for corrective actions
KPI (key performance indicators): a business metric used to evaluate factors that are
crucial to the success of an organization. A KPI is a measurable value that demonstrates
how effectively a company is achieving key business objectives. KPIs differ per organization,
industry, department (marketing and sales have different KPIs)
Report: A document providing information usually organized in a typical tabular form.
It’s the first format of results, in the past often used as synonimous of print
Widgets (CFO and executive): Also known as gauge chart, dial chart, thermometer,
bullet chart or speedometer, it often uses needles to show key business information
(sales, expired credit…) as a reading on a dial.
Dashboard :(cruscotto, cockpit, tableau de bord): it’s an easy and visual
information system similar to a car or plane dashboard. "An easy to
read, often single page, real-time user interface, showing a graphical
presentation of the current status (snapshot) and historical trends of an
organization’s Key Performance Indicators (KPIs) to enable
instantaneous and informed decisions to be made at a glance.“ They
often support drill-down, slice & dice, pivoting and other interactive
actions to navigate through different scenarios. Used by managers,
dashboards may be tailored for a specific role and display metrics
targeted for a single point of view or department.
Balanced Scorecard: a "4 perspective" approach to identify 15-20 measures to
track the implementation of strategy. The original four "perspectives" proposed
are: Financial (short term), Customer (middle term), Internal business processes
(middle term), Learning and growth (long term). The Balanced Scorecard
"translates an organization's mission and strategy into a comprehensive set of
performance measures that provide the framework for a strategic measurement
and management system
What happened - Reporting
Why did it happen - Analysis
What’s happening now - Monitoring
What might happen - Predicting
Who Uses What:
Executive/Managers
Business Analysts
Middle management
IT /
Operational employees
Data, information and knowledge
Knowledge-Wisdom
Information become knowledge when is
used to make decisions and take actions
accordingly
Information
It’s the outcome of a data processing and
has a specific meaning in a specific
domain
Data
Entity and transaction stored as simple
facts
Business Intelligence and
Self service Business Intelligence
Definitions
BI Goal
How To Select A Software Stack
BI vs Self service BI
Which Technology
Which Vendor
Which is our commitment?
We have to select and gather transactional data from OLTP (DBMS – ERP):
Flat files
Relational databases
Big data
And transform data into information and
knowledge to support decisions
Business and knowledge needs - goals
BI is a 3-4 steps process!!!
DATA LAYER
Check data existence, select, collect, gather, clean, preprocess, transform:
In brief: Prepare data
DISCOVER-ANALYZE LAYER
Which information do we need?
Data Analysis – Results evaluation (KPI, Report, Widget, Dashboard)
DISPLAY- VISUALIZE LAYER
Who and where need access to results?
Data Presentation and Sharing
(Local, Web Server, Cloud, Embedded)
Business Intelligence and
Self service Business Intelligence
Definitions
BI Goal
How To Select A Software Stack
BI vs Self service BI
Which Technology
Which Vendor
In a typical BI architecture all data from all data
sources is pulled into a centralized database
through a process called Extract-TransformLoad (ETL). This database is called a data
warehouse (DW). “A Data warehouse is a
subject-oriented, integrated, time-variant and
non-volatile collection of data in support of
management’s decision making process.”
A data warehouse contains a transformed copy
of line-of-business (LOB) data. You load data to
your DW occasionally, on a schedule - typically
in an overnight job. The DW data is not online,
real-time data. It’s considered the company’s
‘single source of truth’ by containing a
compilation of only clean and accurate data.
A data mart (DM) is a subset of the data
warehouse that is oriented to a specific
business group or team. Whereas data
warehouses have an enterprise-wide depth,
the information in data marts usually is limited
and only pertains to a single department.
Business analytics software is offered in
three different configurations:
• Back-End Software Stacks: Provide
only back-end functionality such as
data storage, transformation and
management (i.e., data warehouse and
data mart functionality, as well as ETL
capabilities).
• Front-End Software Stacks: Provide
only front-end (end-user facing)
functionality, such as data visualization
and visual analysis.
• Full Software Stacks: Deliver both
back-end and front-end functionality.
To determine the best database
technology for your business
analytics project, you’ll need to
consider the number of data
sources, the need for ETL, the
complexity of the data, and the
scale and scope of the project.
For projects with limited scope
that utilize a single data source,
a data mart solution is
probably your best bet. When
your requirements grow to
include multiple data sources
with terabytes of data, along
with your data analytics needs,
a data warehouse is the
solution to support scale.
What are the general objectives for the
solution? These may include some or all of
the following: data prep (integration,
cleaning, normalization etc.); analysis;
visualizations; reporting; and reporting
automation.
What are the specific objectives for the
solution? For example, is this to replace an
existing tool or tool set? To replace or fix an
existing process or set of processes? To create
a new process? If it’s to “replace,” what was
working and not working with the previous
process(es)? If it’s “new,” what are the specific
bottlenecks and pain points illustrating the
need for a new solution?
What is the scale of implementation? Is it
small-scale (department-level), or enterpriselevel?
Who will be using the software? Who are the
users, what are their roles and what are the
levels of their technical proficiency?
Who are the customers? Who are the internal
(or external) customers, and what are the
expectations for deliverables?
How well is the solution supported? Is the
solution provider flexible and available to
work with the organization to train,
implement, customize (if needed) or
otherwise support the solution?
When Would You Need the Full Software Stack?
In most cases, implementing only a back-end stack or a front-end stack
would not suffice to ensure a real, effective and scalable business analytics
solution. In particular, you would need a full stack when:
• Thousands of tables: Finding the appropriate tables and columns you
need for a report can be painful using a front end stack
• Bad naming conventions in an LOB database (es. Tables named Table1,
Table2, and so on, and columns named Column1, Column2)
• Data quality could be low: parts of the data could be missing; other
parts could be wrong
• End users want to access centralized data and maintain a single version
of the truth rather than build a solution around CSV/Excel extracts.
• ETL functionality is required, which often happens when multiple data
sources are involved and when the data is especially dirty or especially
large.
• The data sources cannot be directly queried, either because they are
not supported or because they are part of a critical operational system.
Business Intelligence and
Self service Business Intelligence
Definitions
BI Goal
How To Select A Software Stack
BI vs Self service BI
Which Technology
Which Vendor
Business intelligence (BI) e Self-service (SSBI)
Business Intelligence (BI)
Self service Business Intelligence (SSBI)
A traditional (corporate) BI solution is handled by the IT
department and includes typically an ETL phase, a
staging area and a data warehouse, a data modeling
(OLAP, cube), a set of queries and a web based system of
reports and dashboards to share results
“Self-service business intelligence (SSBI) is an approach
to data analytics that enables business users to access
and work with corporate information without the IT
department’s involvement (except, of course, to set up
the data warehouse and data marts underpinning the
business intelligence (BI) system and deploy the selfservice query and reporting tools)”
Pros
Reduce complexity of tables and names
Build a DW as ‘single source of truth’ as a sole source
for decision making
A DW contains transformed, merged, cleaned or
cleansed (more radical), historical data
Certified data quality and integrity
Manage large amount of data sources
Cons
Pros
It’s user-friendly
It’s intuitive
It’s interactive
Empower users by giving them tools that let them
design their own reports
Puts more control in the hands of users
Easy to use, less IT support, results are quickly
delivered
It’s slow
It’s rigid
It is expensive, complex, difficult to use, IT or ERP Cons
specific solution
Self-service BI tools are NOT a replacement for a data
DW preparation and Creating reports is very time mart or data warehouse (they don’t have back-end
consuming
functionality)
Can have high costs of implementations and
maintenance
Traditional BI
“il prodotto dell’analisi di dati quantitativi di business. È destinata a produrre riflessioni atte a consentire ai responsabili
aziendali di operare decisioni consapevoli e informate, oltre che a stabilire, modificare e trasformare le strategie e i
processi di business in modo da trarne vantaggi competitivi, migliorare le performance operative e la profittabilità e, più
in generale, raggiungere gli obiettivi prefissati.”
Self Service BI: QlikView, Tableau, Zoho
Self Service BI
Business Intelligence and
Self service Business Intelligence
Definitions
BI Goal
How To Select A Software Stack
BI vs Self service BI
Which Technology
Which Vendor
Business Intelligence – Different context
Who decides which and how many Reports?
Who and how share BI results?
IT dept decide a set of fixed results
Local, on premise
Through a company web Serve
Managers can build their own reports
On Cloud, pay as you go
Commercial Software starting from 10k/y
Free Software under GNU license (free)
How much can I pay for?
On-premise, Local, Personal solution
(Microsoft BI, QlikView, Tableau Desktop)
Data preparation (ETL, modeling and eventual DWH), exploratory and advanced data analysis (evaluation,
processing) to get insights, results delivery (KPI, reports, dashboard, widget) and publishing (sharing) are
developed in a stand alone computer. Results are not delivered through web, but can be shared as embedded
objects (Es 2).
Es 1: http://morgana.unimore.it/bordoni_stefano/BI2013/Cap4/Dashboard%20libro%20%20con%20codice.xlsm
Es 2:
https://sites.google.com/site/bompaniste/
Soluzione Web-based
(Pentaho, SAP-BO, Tableau Server)
Data preparation (eventual DWH), analysis and results are produced, delivered and shared through a Web
server. It requires a data server, a web server and a server BI software
Es:
http://155.185.65.51:8080/pentaho
Cloud Solution -> Software as a service
(Tableau Online)
Data preparation (DWH) must be developed
internally, while data evaluation and
visualization can be obtained on premise or
through Saas resources On Cloud. Results
delivering and sharing is always On Cloud. Es:
http://public.tableausoftware.com/views/dashDemo/Dashboardfinale?amp;:embed=y&:displ
ay_count=no
https://public.tableausoftware.com/views/gettingstart/SalesDashboardGettingstart?:embed=
y&:display_count=no
Cloud Solution -> Software as a service
(Zoho)
Data preparation (DWH) must be developed
internally, while data processing, visualization
and sharing are developed on Cloud through
Saas resources. When an internal DWH is
available, I can buy any requested hardwaresoftware capabilities “On demand” and “Pay
as you go” from the Saas provider. Es:
https://reports.zoho.com/ZDBDataSheetView.cc?DBID=4000000037680
Hype cycle for business intelligence
Business Intelligence and
Self service Business Intelligence
Definitions
BI Goal
How To Select A Software Stack
BI vs Self service BI
Which Technology
Which Vendor
Gartner is a private,
undisputed authority on
IT industry rating
https://public.tableau.com/views/GartnerMag
icQuadrant/GartnerMagicQuadrant?:showVizH
ome=no
COMPARISON CRITERIA
Integration
BI infrastructure
Metadata
management
Development
tools
Collaboration
Information
Delivery
Reporting.
Dashboards
Ad hoc query
Search-based
Mobile BI
Analysis
Online analytical
processing (OLAP)
Interactive visualization
Predictive modeling and
data mining
Scorecards
3 environments: Tableau, Microsoft, Zoho
Local, server, cloud, embedded
Local, server, cloud, embedded
Cloud, embedded
Data Mart – Self service BI Technologies
OLAP (Online Analytical Processing):
• An OLAP Cube’s main purpose is to enable fast performing multidimensional slicing and dicing. OLAP achieves fast performance by precalculating metrics (field aggregations) for all sets and subsets of unique
values in all dimensions (fields) ‘over-night’. This avoids performing these
slow operations in real-time during the workday. Storing the results of
these pre- calculations takes exponentially more storage resources than
the actual raw data does, limiting the size of raw data that can make up a
cube to gigabyte scale.
• IMDB (In-Memory Databases): Similar to OLAP, IMDB’s primary purpose
is fast performance. It achieves fast performance by loading the entire
data mart into RAM, thus avoiding slow disk-reads (“I/O Bottlenecks”).
The size of data mart is effectively limited by the size of RAM, which
today is limited to gigabytes in size.
BI typical techniques based on OLAP - Cubes - Pivoting
Slice and dice: filtering data into smaller parts, and repeating this process until arriving at the right level of detail for analysis
Drill down: move deeper into a chain of data, from high-level information to more detailed. Move downward through a data hierarchy
Drill through: moves horizontally through a database showing the
underlying data which produced a certain outcome
‘One source, one truth’ vs
‘The western world is being measured using
Excel’
Advantages
Disadvantages
Good quality: Excel is a good, low cost software
to develop BI solutions
Tool familiarity. Excel is western world calculator.
No time or cost for training and support. Users
can easily help each other
Built in flexibility: analysis can be done or
changed as and when needed
Rapid development: no IT load or support
required and less time to develop quickly
Hands-off application: when linked to a data
source, any elements of the worksheet refresh
automatically and offer real time solution
Little to no incremental costs: any application has
no additional license fee or training costs
Excel isn’t scalable: it’s difficult to distribute,
manage and control Excel application for high
number of users
Excel produces multiple version of truth: it’s not
a problem of Excel, but personal copies of
worksheets may be changed and manipulated
Hidden errors: poorly constructed spreadsheet
often contain errors
Spreadsheet spawn spreadsheet: any additional
user of a spreadsheet tends to change it. When
spawn again, it spreads throughout the
organization out of control nor documentation
Lack of referential integrity: it’s very hard to
understand where the data come from, when
changes were made and what rules lie behind
formulas
Business Intelligence and
Self service Business Intelligence
Examples
BI Goal
How To Select A Software Stack
BI vs Self service BI
Which Technology
Which Vendor
Widget showing revenue
Sales Dashboard
dashboard
Mobile
Report
BI Widgets
Soluzione professionale analisi dati Kyros
Soluzione originale Self Service BI
con Excel 2010
4 requisiti fondamentali:
congruità e tempestività nella fase ETL
costruzione di una Staging area, DWH e Data marts
risultati corretti, aggiornati, intuitivi, interattivi e navigabili
separazione dati, elaborazione, presentazione e distribuzione
SSBI process in Excel
Data modeling
The ETL phase (Extract, Transform and Load) and
data modeling is resolved by the the command GET
External DATA or through Power Pivot which creates
a permanent connection to data source and a data
model. Both Excel and PowerPivot use ROLAP
(Relational On Line Analytical Processing) technology
to model a cube from a dataset or a relational
database (DBMS)
Data analysis
It can be realized by simple formulas, pivot tables,
charts, or using the new set of Power BI tools (Power
Query, Map, View, Pivot, KPI), that pull source data
from Staging Area.
Data Presentation
visuals are the end-result of any business
intelligence undertaking. The outcome of data
analysis shall be transformed through reports,
tables, pivot tables, charts, dashboard, widget or a
mix of visualizations to provide consistent
information and insights
Publishing and sharing
The product's client user interface for the BI
consumer-role can be local on desktop or Webbased. There are many different possible solutions
to publish and share results on a host web service.
E.g. Power BI is a business analytics service which
fits into the cloud-based computing model of
Software-as-a-Service (SaaS)
Definizione di layout, misure e metriche
SALES DASHBOARD 2013
Revenue analysis
Quali sono i risultati di fatturato degli
ultimi due anni?
Quali tendenze evidenzia l'andamento
mensile delle vendite?
Quali tendenze evidenzia un grafico a
tachimetro che confronta i risultati di
fatturato negli ultimi due anni?
Quali tendenze evidenzia un
grafico a termometro che
confronta i risultati di
fatturato negli ultimi due
anni?
Customer ranking
Product analysis
Quali sono i dati di fatturato dei primi/ultimi 5 clienti?
Quali sono i dati di fatturato dei primi/ultimi 10 prodotti?
Geographic analysis
Come si distribuiscono per area geografica i valori di
fatturato e quantità vendute?
Salesman analysis
Quali sono i dati di vendita di un venditore a
scelta rispetto ad una certa area?
Cosa mostra il grafico delle vendite di un
venditore a scelta rispetto all'andamento medio?
Cosa mostra il confronto tra i grafici delle vendite
degli ultimi due anni per Macro area ?
Quali sono i primi 5 venditori per volumi di
fatturato?
Quali dati ci servono?
Risultati desiderati dal management
1.
Classifica Clienti - Migliori/Peggiori (Top/Bottom) 5
2.
Ranking Clienti (Gold, Silver, Bronze)
3.
Venduto per Cliente
4.
Analisi vendite per Agente
5.
Trend e benchmark
6.
Analisi vendite per Area di vendita
7.
Analisi vendite per Prodotto - Migliori 10
In quali archivi si trovano le variabili che consentono di costruire i Risultati?
Quali variabili consentono di costruire i Risultati?
Variabile
Classifica
Clienti
Ranking
Clienti
Ven
dut
o
per
Clie
nte
Trend e
benchm
ark
Analisi
Vendut
o Area
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
Risultato
CUSTOMER_DESCRIPTION
INVOICE_DATE
INVOICED_QUANTITY
INVOICED_AMOUNT_EUR
MACRO_AREA
SALE_AREA
SALESMAN_DESCRIPTION
ITEM_CATEGORY_DESCRIPTION
Analisi
vendite
per
Agente
Vendita
per
Prodott
o
X
X
X
X
X
X
X
X
X
Nome Variabile
1) CUSTOMER_DESCRIPTION
2) INVOICE_DATE
3) INVOICED_QUANTITY
4) INVOICED_AMOUNT_EUR
5) MACRO_AREA
6) SALE_AREA
7) SALESMAN_DESCRIPTION
8) ITEM_CATEGORY_DESCRIPTION
Tabella
DWH_CUSTOMER
DWH_SALE_ORDER_LINE
DWH_SALE_ORDER_LINE
DWH_SALE_ORDER_LINE
DWH_SALE_ORDER_LINE
DWH_SALE_ORDER_LINE
DWH_SALESMAN
DWH_SALE_ORDER_LINE
Situazione Venduto
Trend e benchmark
Classifica Clienti Top/Bot 5
Migliori 10 Prodotti
Venduto per Cliente
Analisi Venduto Area
Vendite per Agente
Intelligence geomarketing
uses geolocation (geographic information) in the process of planning and implementation of
marketing activities:
• Where do my customers live, and where do they buy?
• Who are my competitors?
• Visualize any data in a geographic context by linking it to a digital map
• Where should I locate my next Point of Sale?
• Creation of sales territories
• Where are my sales representatives? Driving time zones, saleseman performance analysis
• Are products sold better in particular zones? Where and why?
Data mining – Analytical CRM
Process of extracting previously
unknown, valid, actionable
information from large database and
then using that knowledge to make
crucial business decisions
Customer Relationship Management
(CRM)



is a business strategy that optimizes revenue and profitability while
promoting customer satisfaction and loyalty. CRM technologies
enable strategy, and identify and manage customer relationships,
in person or virtually. CRM software provides functionality to
companies in four segments: sales, marketing, customer service
and digital commerce (Gartner)
was coined in the 1990s and stands for: “managing the relationship
with your customer”. With CRM, you can store customer and
prospect contact information, accounts, leads and sales
opportunities in one central location, ideally in the cloud so the
information is accessible by many, in real time. Today it is used to
describe IT systems and software designed to help you manage and
increase the profitability of customers over their entire relationship
with the business (Customer’s Lifetime Value )
is a “customer centric” business strategy that includes relationship,
range of products and services, software and IT tools to store and
analyse data
KDD (knowledge discovery in databases)


Problem solving space (business objectives
determination)
Data preparation






Data selection
Data preprocessing (Cleaning data, invalid, noise-outliers,
missing value, skew distribution)
Data transformation
Data mining
Analysis of results (post processing)
Assimilation of knowledge (call to action)
KDD process
Inductive analysis (Data and Text mining)
1. Link Analysis
2. Cluster Analysis - Segmentation
3. Predictive Modeling - Classification
4. Data Dimensionality Reduction
Discovering the associations between the products or services that customers tend to purchase
together. In particular:
Detect the most associated product with all the others (loss leader)
Detect the best consequent for any antecedent (e.g. in a center aisle box)
Detect the best antecedent (shirt?) for each consequent (tie) to locate the best placement in the
store (e.g. shelves) or in a web e-commerce page
Detect products to create a bundle
Detect association in phone tapping
Detect crime and fraud pattern
DATA and TEXT MINING
Inductive Approach
Discovering the associations between the products or services that customers tend to
purchase in a sequence over time
How to manage stock with a proper inventory policy (sequences of sales)
Detect promotions to offer in sequence (sequence of products or services)
How to check that the users of a Web site reaches a page following the desired path
(clickstream analysis)
Partition a database into segment of similar, homogeneous records.
Detect n ideal customer profiles
Detect the impact of a marketing campiagn towards lead, prospect or potential customer
Detect migration from a segment to another
Analyze an existing database to determine regular occurence between data and results
Churn analysis (detect customers who leave the company), who buy what, who do
specific actions (join the club… )
Who reply a mail for a better targeted campaign and higher redemption
Analysis to determine the meaning, purpose, or effect of any type of communication, by
studying and evaluating the textual content of:
A market survey
An advertising campaign
Level of satisfaction of employees
practice of examining large collections of written resources in order to generate new information
(text mining)
Descriptive statistical methods that generate graphical displays from data matrices
Competitive Product Positioning Map – perceptual map
Brand map (words or phrases are most associated with different suppliers)
Associatian
discovery
and
Sequence
pattern
discovery
Cluster Analysis
(static and
dynamic)
Predictive
Analysis
(Classificazione)
Content Analysis
And Text mining
Correspondence
analysis -Factor
analysis
Link Analysis
Cluster Analysis
Tree Classification
NO
Link analysis: associations discovery
(A priori, Agrewal 1994)


Association Discovery
 An association rule has two parts, an antecedent (if) and a
consequent (then). An antecedent is an item A found in the
data. A consequent is an item B that is found in
combination with the antecedent. MBA discovers the
probability (confidence) of the co-occurrence of the two
(or more) items (product A and B) within the same event
(basket with A), the frequency of this combination over
total transactions (rule support) and the eventual increased
probability to sell B, once sold A (lift)
Sequential pattern discovery
 discover all significant sequential patterns (e.g. in products
selling)
Link analysis: associations discovery

MBA metrics (KPI)
 Item Support: is an indication of how frequently the items
A appear on total transaction (The minimum frequent
itemset support is a user-specified percentage that limits
the number of itemsets used for association rules) supportA  NN
N
supportA  B 
 Association Support
N
 Association confidence: frequency of co-selling of A->B on
N
the total transactions (basket) with A
confidence A  B 
A
A B
A B
NA

Lift: leverage effect on B sales probability once sold A
lift A  B 
confidence A  B
supportB
Association Discovery: MBA
Discover the associations between the products that
customers tend to purchase together
The association rule:
Stout
Lemonade
1. Lemonade is present (sold) once
every two selling (basket) of stout
beer (confidence)
2. Probability to sell lemonade (lift)
raises 2.7 times
3. Co-selling appear on 3% of total
transactions (rule support)
267*11,6%= 31
267*3,75%= 10
267*5,24% = 14
3%=8
Acqua minerale 31
Antigelo 10
Sapone A14
Sequential pattern discovery
Clickstreams analysis: explores the route that a
certain number of visitors choose when clicking or
navigating through a website
Cluster Analysis (Segmentation)
is the task of grouping a set of objects in such a way that
objects in the same group (called a cluster) are very similar
to one another and very different from the items in other
clusters. Customer segmentation divides the customer
base into distinct and internally homogeneous groups, such
as low value, high value customers, and low value
customers with high spending potential.
– Objective / variables
– Clustering method (Hierarchical / non Hierarchical)
– Distance Measures (e.g. Euclidean distance for
quantitative variables, Jaccard index , Condorcet for
qualitative variables)
– Technique (k-means, Demographic, Kohonen’s maps,
EM (Expectation Maximization)
– Cluster Profiling
Segmentation Variables
Demographic: is one of the most commonly used forms of
segmentation as it is one of the best ways to diversify
individuals. Variables:
–
–
–
–
–
–
–
–
–
–
Gender
Age
Income
Family Status, size, life cycle
Education
Occupation
Ethnicity
Life stage
Nationality
Social economic ranking
Segmentation Variables
Psycographic: uses people personality, lifestyle, their
activities, interests, opinions as well as the psychological
aspects of consumer buying behavior into account.
Psychographic data used for segmentation purposes are
typically from research instruments, such as surveys, polls,
and queries. Main variables are:
–
–
–
–
–
–
Values (ideal, achievements, religion…)
Attitude (primary motivation, self expression)
Lifestyle (social life, hobby…)
Activities (job, sport, travels…)
Interests (family, home, media…)
Opinions (politics, culture, money…)
Segmentation Variables
Customer type – buyer, shopping behaviour: explores the
economical features of customers
–
–
–
–
–
–
Loyalty to supplier
Usage rate/ buying pattern
Order size
Buying on occasions
Benefits sought
Type of customer (lead, qualified lead, prospect,
potential, active, loyal customer)
– Type of buyer (ABC class, avg ticket, quantity, classes
of purchased goods, regular, occasional)
– Attitude towards product, channel
– Profitability
Segmentation Variables
Website’s navigation visitors (web mining – web analytics).
Customers’ profiling through log files and navigation style):
– Recency: the time elapsed since the last purchase
– Frequency: the number of purchases made over a
defined time period
– Monetary value of the monetary
– Time spent on the site (stickiness)
– Sequence e number of pagine visitated (clickstream
analysis)
– Access link (banner or direct access)
– Value of the past purchases (or of a visiting session)
– Purchase yes/no
Desirability of a segmentation model
not all of the achieved segmentations are meaningful:
Measurable: The characteristics of the segments can be
measured and identified
Substantial: The segments should be large and profitable
enough, and worth investing in. They should also be
feasible to reach with a tailored marketing program
Accessible: The segments should be meaningful to the
company and be effectively reached and served
Differentiable: Each segment needs to be distinguishable
from others and to respond differently to a marketing
program
Actionable: Effective marketing programs can be carried
out for the selected segments by considering the objectives
and resources of the company
Segmentation goals
Static: One – time (occasional) modeling you don’t run on
a regular basis
• P ideal profiles (personas) are based on data points
and need to be run only once. A static segmentation
is useful to create profiles, one-off campaigns,
identify tribes, targeted sales support
Dynamic: changes in segment structures and composition
(it can identify temporal patterns of customers’
characteristics and their purchasing behavior e.g.
emerging, disappearing and changing segments )
Dynamic: changes in segment memberships of individual
customers (segment migration analysis) to identify patterns
and movements among segments of leads, prospect or
high potential targeted customer
Predictive Modeling: (Classification tree)
• If the target (output, dependent) variable is categorical
we use Classification tree, Kohonen maps or Logistic
Regression, if quantitative Neural network, Linear
regression or Radial Basis Function
• The aim of classification is to unveil the relationship
between explanatory (independent) and target
(dependent) variables and to predict future events by
analyzing historical and current data. The model ranks
the importance of independent variables to explain and
predict the value of a target variable
• A classification model is trained based upon training data
with labeled classes, aiming to search for relationships
between explanatory variables and the target variable
• A decision tree works by dividing data into smaller sets
by recursively applying two way and/or multi-way splits
to find the best variables for the partition at each split
Classification tree
Income Range
Life insurance
Credit card insurance
Gender
Age
40-50
No
No
M
45
30-40
Yes
No
F
40
40-50
No
No
M
42
30-40
Yes
Yes
M
43
50-60
Yes
No
F
38
20-30
No
No
F
55
30-40
Yes
Yes
M
35
20-30
No
No
M
27
30-40
No
No
M
43
30-40
Yes
No
F
41
40-50
Yes
No
F
43
20-30
Yes
No
M
29
50-60
Yes
No
F
39
40-50
No
No
M
55
20-30
Yes
Yes
F
19
Best explanatory variable
Income Range
20-30
30-40
40-50
50-60
Yes
2
4
1
2
9
No
2
1
3
0
6
15
Yes
Corr. Prediction
(precision, purity)
11 over 15 =
73%
Branches =
4
Yes
Information Gain for Attribute = 11/15/4 =0,183
NO
Yes
Best explanatory variable
Credit card insurance
Yes
No
Yes
3
6
No
0
6
Yes
NO
Corr. Prediction
(precision, purity)
9 su 15 =
60%
Branches =
2
Information Gain for Attribute = 9/15/2 =0,300
Best explanatory variable
Age
<43
>43
Yes
9
0
No
3
3
Yes
NO
Corr. Prediction
(precision, purity)
12 su 15 =
80%
Branches =
2
Information Gain for Attribute = 9/15/2 =0,400
Full tree
Age
>43
<=43
No (3/0)
Gender
Male
Female
Credit
card ins.
Yes (6/0)
No
No (4/1)
Yes
Yes (2/0)
Number of record correctly classified = 14 over 15
Classification (confusion) Matrix
Yes
(predicted)
No
Totale actual
(predicted)
Yes (people who actually
bought the life insurance)
8
1
9
No (people who actually didn’t
buy the life insurance)
0
6
6
Total predicted
8
7
15
•true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
•true negatives (TN): We predicted no, and they don't have the disease.
•false positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
•false negatives (FN): We predicted no, but they actually do have the disease.
Number of record correctly classified = 14 over 15
Number of false positives = 0
Number of false negatives = 1
Overall accuracy of the model: 93%