Big Data Analytics
Analysis of high-volume and unstructured Data
Stefan Weingaertner, DYMATRIX CONSULTING GROUP
KNIME Meetup Italia, 10th October 2013
KNIME Meetup Italia 2013
© DYMATRIX CONSULTING GROUP
Agenda
1. Company Introduction
2. Big Data - an Introduction
3. Big Data Analytics on high-volume Data
4. Big Data Analytics on unstructured Data
5. Livedemo: Advanced Email Classification
6. Q&A
Company Introduction
DYMATRIX – The analytical CRM Company
» Solution provider for Customer Intelligence, Marketing Automation and Advanced Predictive Analytics
» Consulting, development and implementation know-how, based on more than 900 projects with mid- and large-cap companies across industries
» Goal- and client-oriented project execution based on award-winning, established solutions
» Owner-managed and independent
Our Consulting Competence Centers
Business Intelligence
» Conception of (big) data warehouse and business intelligence architectures
» Planning & Forecasting
» Balanced Scorecard
» Enterprise Reporting Systems
» Dashboards
» Sales Controlling
Advanced Analytics
» Customer Segmentation
» Customer Value Analysis
» Propensity Modeling (Cross-/Upsell/Churn)
» Shopping Basket Analysis
» Credit Rating Analysis & Credit Scoring
» Text Mining
» Data Mining Automation
» Big Data Analytics
Campaign Management
» Design and Optimization of Campaign Processes and Workflows
» Implementation of Campaign Management Systems
» Integration of Data Mining Models in Campaign Processes
» Campaign Optimization
» Consulting & Implementation of Next Best Activity Processes
E-commerce insight
» Web Tracking
» Web Controlling
» Web Mining
» Real Time Recommendation
» Social Media Tracking & Analysis
» Web Performance Measurement
» Customer Journey Analytics
Analysis of client-oriented processes
Initial situation – analysis – conception of processes for customer retention and its optimization, customer reactivation and new customer activation – benchmarking against industry leaders
Solution Portfolio – The Customer Insight Suite
DynaCampaign
» Intelligent multi-touchpoint campaign management platform
» Planning, target group selection, execution and response measurement of campaigns
» Event-triggered realtime campaigning
DynaMine
» End2end automation of data mining processes
» Intelligent model management for automation of preprocessing, training & scoring of models
DynaCision
» Realtime decision management platform
» Design & execution of complex embedded decision processes
DynaSocial
» Social CRM platform to listen, track, identify and quantify customer needs and sentiments
Our KNIME Solution Nodes & KNIME Consulting Services
PMML2SQL / PMML2SAS Converter
» Convert PMML to executable SQL code for in-database scoring
» Convert PMML to executable SAS code for model scoring within SAS
Big Data Integration
» Access any Hadoop large-scale distributed batch processing infrastructure from KNIME
» Efficiently distribute large amounts of data & preprocessing across a set of machines
Uplift Modeling
» Predictive modeling nodes to predict the incremental response to marketing actions
» For up-sell, cross-sell, churn and retention activities
Interactive Scorecard Builder
» Interactive scorecard building nodes for design of credit or marketing scorecards
KNIME Consulting Services
+ Business Consulting
+ Analytical Consulting
+ Technical Consulting
+ Trainings
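To give a rough idea of what converting PMML to SQL for in-database scoring can involve, here is a minimal Python sketch for a tiny decision tree. It is not the DYMATRIX converter. The PMML fragment is hypothetical and namespace-free, only `SimplePredicate` nodes are handled, numeric values are assumed (string values would need quoting), and the helper names `predicate_sql`, `leaf_rules` and `tree_to_sql` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free PMML fragment used only for illustration.
PMML_TREE = """
<TreeModel functionName="classification">
  <Node score="no">
    <True/>
    <Node score="yes">
      <SimplePredicate field="age" operator="lessThan" value="30"/>
    </Node>
    <Node score="no">
      <SimplePredicate field="age" operator="greaterOrEqual" value="30"/>
    </Node>
  </Node>
</TreeModel>
"""

# PMML comparison operators and their SQL counterparts.
OPERATORS = {
    "equal": "=", "notEqual": "<>", "lessThan": "<",
    "lessOrEqual": "<=", "greaterThan": ">", "greaterOrEqual": ">=",
}


def predicate_sql(node):
    """Translate the SimplePredicate of a tree node into a SQL condition."""
    pred = node.find("SimplePredicate")
    if pred is None:  # <True/> or no predicate: the node always matches
        return "1=1"
    return f'{pred.get("field")} {OPERATORS[pred.get("operator")]} {pred.get("value")}'


def leaf_rules(node, path=()):
    """Yield (condition, score) for every leaf, AND-ing conditions along the path."""
    conditions = path + (predicate_sql(node),)
    children = node.findall("Node")
    if not children:
        yield " AND ".join(conditions), node.get("score")
    for child in children:
        yield from leaf_rules(child, conditions)


def tree_to_sql(pmml_text, target="prediction"):
    """Render the tree as a single SQL CASE expression."""
    root = ET.fromstring(pmml_text).find("Node")
    whens = [f"  WHEN {cond} THEN '{score}'" for cond, score in leaf_rules(root)]
    return "CASE\n" + "\n".join(whens) + f"\nEND AS {target}"


sql = tree_to_sql(PMML_TREE)
print(sql)
```

The resulting CASE expression can be placed in a SELECT statement, so the model is scored where the data lives instead of exporting the data first.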
References
Telecommunication
Travel, Transportation
Retail, Service Provider
Banks, Insurances
Media
Utilities, Industries, Public
Schwäbisch Hall
Big Data - an Introduction
A Characterization of Big Data
[Figure: Big Data characterized along three axes: Volume (terabyte to zettabyte), batch to streaming, and structured to structured & unstructured]
Source: Understanding Big Data (Zikopoulos et al.), 2012
Challenge: Big Data Collection & Integration
[Figure: customer lifecycle with the labels Needs, Possibilities, Decisions, Approach, Purchase, Delivery, Usage, Service & Support, Remember]
Source: Phil Winters, 2011
Big Data Analytics: Learn, Target & Influence!
[Figure: customer lifecycle with the labels Needs, Possibilities, Decisions, Approach, Purchase, Delivery, Usage, Service & Support, Remember]
Source: Phil Winters, 2011
Big Data Analytics on high-volume Data
[Figure: Big Data characterization: Volume (terabyte to zettabyte), batch to streaming, structured to structured & unstructured]
[Figure: Hadoop architecture layers]
Analytic Applications: Big Data Access
Hadoop Extensions: Hive, HBase, MapReduce Routines, Mahout
Hadoop Core: Hadoop Distributed File System (HDFS), MapReduce
Big Data Sources
[Figure: Hadoop architecture layers]
Analytic Applications: Big Data Analytics, PMML2SQL Converter
Hadoop Extensions: Hive, HBase, MapReduce Routines, Mahout
Hadoop Core: Hadoop Distributed File System (HDFS), MapReduce
Big Data Sources
Big Data Analytics on unstructured Data
[Figure: Big Data characterization: Volume (terabyte to zettabyte), batch to streaming, structured to structured & unstructured]
Big Data is not just about structured data…
80% of the world’s data is unstructured.
Unstructured data is growing at 15 times the rate of structured data.
Source: Google Trends April 6, 2012
Imagine…
» …to classify all customer-related text messages by
» …to identify unknown trends
» …to identify cause and effect relations
» …to react to that information, e.g.
• Source / Origin
• Sentiment
• Technical Problems
• Product or Service
• Needs
• Business Transaction
• Usability
• Context
• Competition
• etc.
The KNIME platform supports these efforts with comprehensive Text Analytics & Network Analytics capabilities!
Deutsche Telekom: Social Earthquake
Facebook Posts & Comments March & April 2013
[Chart: number of Facebook posts and comments by sentiment (negative, neutral, positive), 1 March to 26 April 2013, y-axis 0 to 1000]
» First rumours: limitation of bandwidth (21.3. – 23.3.)
» „DSL-Drossel“: official press release on limitation of bandwidth leads to a social earthquake (22.4. – 27.4.)
DYMATRIX Text Mining Process
DYMATRIX Text Mining Process (KNIME Text Processing)
Text Datasources
  Datasources
    • Facebook
    • Twitter
    • Emails
    • Data Provider like GNIP, Datasift etc.
    • Crawled Data
    • etc.
  For Machine Learning
    • Provide Training Data for Classification (e.g. Sentiment)
Text Enrichment
  Language Detection
    • English
    • German
    • Many more…
  Language-specific NLP POS Tagging
    • Penn Treebank Tagger
    • STTS Tagger
  Text Cleansing
    • Stop Words
    • Punctuation
    • Stemming
  Sentiment Amplifier
    • Matching of Sentiment & Emoticon Dictionaries
Subject Matching
  Text Tagging with any Subjects
    • Products
    • Brands
    • Business Transactions
    • Service
    • Complaints
    • Requests
    • etc.
  Fuzzy Matching with Dictionary Tagger
    • Matching of Subject Dictionaries
Sentiment Classification
  Text Vectorization
    • Creation of text predictors to predict sentiments
  Machine Learning
    • Classification with Predictive Analytics (e.g. Decision Tree)
  Retraining Interface
    • Adjustment of misclassified messages for permanent optimization of classification
Information Delivery
  Text Data Mart
    • Make information available in central Text Data Mart for visualization, alerting etc.
  Fields of Application
    • Email-Routing
    • Event-triggered Campaign Management
    • etc.
DYMATRIX Text Mining Process: Datasources
Access any Text Datasource to start the Text Mining Process
» Facebook
» Twitter
» Emails
» Crawler
» Data Provider like GNIP, Datasift etc.
Example post on the Vodafone UK Facebook fan page
DYMATRIX Text Mining Process: Text Enrichment
Original Facebook Message
Why not sort your signal issues out instead of bringing
new phones out!!!! Wk 3 of crap signal but yet paying
FULL monthly contract! Vodafone sort it.
Sentiment Amplifier
Why not sort your signal issues out instead of bringing
new phones out!!!! Wk 3 of crap [----] signal but yet paying
FULL monthly contract! Vodafone sort it.
Penn Treebank POS Tagger (English Messages)
Why[WRB] not[RB] sort[VBG] your[PRP] signal[VBP] issues
[VBZ] out[IN] instead[RB] of[IN] bringing[VBG] new[JJ]
phones[NNS]!!!![SYM] Wk[NNP] 3[CD] of[IN] crap[NN]
but[CC] yet[RB] paying[VBG] FULL[NNP] monthly[RB]
contract[NN] ![SYM] Vodafone[NNP] sort[VBG] it[PRP]
.[SYM]
Removal of Stop Words & Punctuation
sort[VBG] signal[VBP] issues [VBZ] instead[RB]
bringing[VBG] phones[NNS] Wk[NNP] 3[CD] crap[NN]
paying[VBG] monthly[RB] contract[NN] Vodafone[NNP]
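The text cleansing step (removal of stop words and punctuation) can be sketched in a few lines of Python. This is only an illustration of the idea, not the KNIME Text Processing nodes used in the workflow. The stop-word list is a hypothetical, hand-picked subset, the function name `cleanse` is made up, and the output is not meant to match the slide's result token for token.

```python
import re

# Hypothetical stop-word list; a real list would be much longer and language-specific.
STOP_WORDS = {"why", "not", "your", "out", "of", "but", "yet", "it", "new", "full"}


def cleanse(message):
    """Lower-case the message, drop punctuation, and remove stop words."""
    # Keeping only runs of letters and digits discards punctuation such as "!!!!".
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return [token for token in tokens if token not in STOP_WORDS]


message = (
    "Why not sort your signal issues out instead of bringing "
    "new phones out!!!! Wk 3 of crap signal but yet paying "
    "FULL monthly contract! Vodafone sort it."
)
cleaned = cleanse(message)
print(" ".join(cleaned))
```

Stemming, which the process overview also lists under text cleansing, is left out here to keep the sketch short.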
DYMATRIX Text Mining Process: Subject Matching
Original Facebook Message
Why not sort your signal issues out instead of bringing
new phones out!!!! Wk 3 of crap signal but yet paying
FULL monthly contract! Vodafone sort it.
Subject Matching (Fuzzy Matching)
Why not sort your signal issues out instead of bringing
new phones out!!!! Wk 3 of crap signal [NETWORK] but
yet paying FULL monthly contract! Vodafone sort it
[COMPLAINT].
Matched subjects
BUSINESS TRANSACTION: Complaint
NETWORK: No Signal
PRODUCT: Nokia Lumia 925
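Fuzzy matching against a subject dictionary can be approximated with Python's standard `difflib` module. The sketch below is an assumption-laden stand-in for the dictionary tagger, not its actual implementation: the dictionary entries, the similarity cutoff of 0.8 and the function name `tag_subjects` are all hypothetical.

```python
import difflib
import re

# Hypothetical subject dictionary: dictionary term -> subject tag.
SUBJECT_DICTIONARY = {
    "signal": "NETWORK",
    "coverage": "NETWORK",
    "complaint": "COMPLAINT",
    "unhappy": "COMPLAINT",
}


def tag_subjects(message, cutoff=0.8):
    """Return the subject tags whose dictionary terms approximately occur in the message."""
    tags = set()
    terms = list(SUBJECT_DICTIONARY)
    for token in re.findall(r"[a-z]+", message.lower()):
        # get_close_matches tolerates small spelling errors such as "signl".
        matches = difflib.get_close_matches(token, terms, n=1, cutoff=cutoff)
        if matches:
            tags.add(SUBJECT_DICTIONARY[matches[0]])
    return tags


tags = tag_subjects("Wk 3 of crap signl")
print(tags)
```

A higher cutoff reduces false matches at the cost of missing more misspellings, so the threshold usually needs tuning per dictionary.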
DYMATRIX Text Mining Process: Sentiment Classification
Text Classification with Decision Tree
Original Facebook Message
Why not sort your signal issues out instead of bringing
new phones out!!!! Wk 3 of crap signal but yet paying
FULL monthly contract! Vodafone sort it.
Output from Text Enrichment
Text Vectorization (Transformation)
Predictors relevant for Text Classification, e.g.
- Emoticons positive/negative
- Fragments positive/negative
- Words positive/negative
- Author-related Inputs
- Length of message
- Likes
- Comments
- Other linguistic Inputs
Resulting Classification
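The vectorization step turns a message into numeric predictors that a classifier can use. The following Python sketch builds a few of the predictors listed above (positive/negative words and emoticons, message length, likes, comments). The word and emoticon lists are hypothetical, and `classify` is a hand-written rule standing in for a decision tree that would normally be learned from training data.

```python
import re

# Hypothetical sentiment dictionaries for illustration only.
POSITIVE_WORDS = {"great", "love", "thanks"}
NEGATIVE_WORDS = {"crap", "issues", "slow"}
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def vectorize(message, likes=0, comments=0):
    """Turn a message plus simple metadata into a dictionary of predictors."""
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return {
        "words_positive": sum(t in POSITIVE_WORDS for t in tokens),
        "words_negative": sum(t in NEGATIVE_WORDS for t in tokens),
        "emoticons_positive": sum(message.count(e) for e in POSITIVE_EMOTICONS),
        "emoticons_negative": sum(message.count(e) for e in NEGATIVE_EMOTICONS),
        "length": len(message),
        "likes": likes,
        "comments": comments,
    }


def classify(features):
    """Hand-written stand-in for a learned decision tree."""
    score = (features["words_positive"] + features["emoticons_positive"]
             - features["words_negative"] - features["emoticons_negative"])
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


message = (
    "Why not sort your signal issues out instead of bringing "
    "new phones out!!!! Wk 3 of crap signal but yet paying "
    "FULL monthly contract! Vodafone sort it."
)
features = vectorize(message, likes=4, comments=2)
label = classify(features)
print(features, label)
```

In a trained model the thresholds and the relative weight of each predictor come from labelled messages, which is what the retraining interface in the process overview feeds.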
DYMATRIX Text Mining Process: Information Delivery
Make information available in central Text Data Mart
Original Facebook Message
Why not sort your signal issues out instead of bringing
new phones out!!!! Wk 3 of crap signal but yet paying
FULL monthly contract! Vodafone sort it.
+ Sentiment
+ Business Transaction
+ Product Relevance
+ Network
Visualization in DynaSocial
Other Fields of Application
» Subject-oriented Email-Classification & Email-Routing
DYMATRIX Text Mining Process: KNIME Workflow
Benefits
KNIME Server: Develop once, deploy everywhere!
» Text Enrichment & Classification Workflows can be used for classification of any electronic text message (e.g. Social Content, Blogs, Emails).
» KNIME Server-based Text Enrichment & Classification Workflows can be deployed as a webservice and called easily from any other application.
Benefits
» Uniform Sentiment- and Classification-Handling for all customer-related messages.
» Batch- or Realtime-Execution from any application.
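From the calling application's side, using a classification workflow deployed as a webservice amounts to sending the message text and reading back the result. The Python sketch below only builds such a request. The URL is a placeholder and not a KNIME Server path, and the JSON payload with `text` and `source` fields is an assumption, since the slides do not describe the service interface.

```python
import json
import urllib.request

# Hypothetical placeholder; the real service URL depends on the server setup.
SERVICE_URL = "https://knime-server.example.com/classify"


def build_classification_request(message, source):
    """Build an HTTP POST request carrying one message to be classified."""
    payload = json.dumps({"text": message, "source": source}).encode("utf-8")
    return urllib.request.Request(
        SERVICE_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_classification_request("Wk 3 of crap signal", "facebook")
print(request.get_method(), request.full_url)

# Sending is left out on purpose because the endpoint above does not exist:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Because the caller only depends on the request and response format, the same workflow can serve batch jobs and realtime callers alike.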
Application Integration I: DynaSocial
Social Media Monitoring & Analytics
DynaSocial – Social Media Excellence Architecture
[Architecture diagram; labels grouped by component]
Inputs: Facebook, Twitter, Social Media Data Provider, Social Service Platforms, Client individual Sources, Emails
Social Media Analytics Content Extractor
Social Media Analytics Data Management: Generic Big Data Model
Advanced Social Media Analytics (Text Mining & Network Mining): Text Enrichment & Classification, Network Insights
Social Media Analytics Dashboard
Social Engagement: Integrated Social Inbox including all Social Touchpoints
DynaSocial Configuration Center: Data Sources, Sentiments & Classifications, Reports & Dashboard
DynaSocial Management Dashboard
» Activities
» Platform Distribution
» Overall Sentiments
» Sentiment Ratio
» Trends compared to competition (Share of Voice)
» Top Keywords
» Key Influencer
» Geographic Distribution
» Flexible Selection of Time Windows
» …
DynaSocial Management Dashboard (Project Example)
Application Integration II: Advanced Email-Classification
Multidimensional realtime Email-Classification
Email Classification: MS Exchange Connector
1. Incoming Email
2. .NET Batch: Call .NET procedure and transfer email contents to KNIME Server via webservice call.
3. KNIME Server: Call KNIME Text Enrichment & Classification Workflows and return classification results.
4. Classification results are returned to Exchange Server and are saved persistently with object categories.
5. Any clients having access to Exchange Server get the same classification.
Clients: Microsoft Outlook, Microsoft Exchange Webservice, Microsoft Outlook Webaccess, Other Email-Clients
Livedemo
Realtime Email-Classification
Q&A
Contact
DYMATRIX CONSULTING GROUP GmbH
Zeppelin Carré
Lautenschlagerstrasse 2
D-70173 Stuttgart
Your Contact: Stefan Weingaertner
Phone: +49.711.22.007.88 - 12
Fax: +49.711.22.007.88 - 88
E-Mail: [email protected]
Web: www.dymatrix.de
Thank you for your attention.
We are happy to answer any of your questions!