Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
Finding new business potential
with Big Data Analytics
Carsten Frisch
Oracle Business Analytics
DOAG 2015 Business Solutions Conference
Darmstadt, June 10, 2015
Speaker
» Carsten Frisch
» Senior Sales Consultant
» Business Analytics
Big Data Discovery Lead - DE/CH Cluster
» Contact
+49 (0)6103 397-380
» [email protected]
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Monetizing New Insights
Business Cases for Big Data and the Discovery Lab
Financial Services
Enabling Rich Customer Experience Across Channels Is A Key Focus For Banks
Customers have become more demanding and their loyalties are diffused with low-switching costs. The customer experience expectations for banking services (across channels) are being reset by the experiences being provided by retailers and online providers elsewhere.
[Figure: 360 degree view of customer across the channels Email, Mail, Sales, Branch, Phone, Mobile, Online, ATM]
Source: Redefining Customer Experience, Infosys Whitepaper; PWC Report 2012
Banks Need To Move Towards Personalization And
Targeted Marketing To Enhance Customer Experience
Top 3 Emerging Changes in Customer Behavior That Impact Banking (% of respondents)
» Using Direct and Self-Service Channels: 63%
» Seeking Better, More Personal Advice: 49%
» Price Sensitivity, Discount Seeking: 44%
Customer Demand…
» … More personalized services, offers and enhanced customer experience
» … More relevant services and transparent access to information across all channels consistently
» … Increase simplicity, self-control, mobility of banking services
Customers are making web / mobile their primary channel of interaction with their banks. These channels are already heavily personalized and there is a rising demand for more personalized services and offers from customers.
Source: Enhancing The Banking Customer Value Proposition Through Technology-led Innovation, Accenture
Market Challenges Are Compelling Banks To Focus On
Customer Insight And Real-Time Offers
INDUSTRY CHALLENGES / KEY BIG DATA CAPABILITIES
ENHANCE CUSTOMER EXPERIENCE
» Develop deep client relationships by offering superior service
» Analyze internal customer logs and social media activity to generate indications of customer dissatisfaction allowing time to act
» Analyze behavior profiles, spending habits, and segmentation to gain view on customer risk and enhance risk management capabilities
REAL-TIME OFFERS
» Generate real-time, context sensitive, targeted offers based on analytical insights on spending patterns
» Rapid time to market and improved customer value
» Leverage insights from social media during various stages of product and service development
OPTIMIZE OPERATIONS
» Provide more visibility into performance in order to facilitate timely and cost effective management of operations
» Discover opportunities to achieve greater efficiency across global operations
» Understand and forecast performance and drive strategies that improve operations and financial results
Source: Oracle Financial Services Industry Solutions Overview; Oracle Insight; PWC Report
Leveraging Big Data for Competitive Advantage in FS
Customer Insight
Social Media Sentiment & Engagement
Big Data Augmentation
Personalised Services
New Product Launch
Optimise Operations
Data Monetisation
Real Time Offers
New Revenue Streams
Context Sensitive Offers / Ads
Location Based Offers / Ads
Compliance Processes
Fraud Detection
Information as a Service
Risk Management
Fast Data
Quality of Models
Financial risk
Security risk
Digital Business,
Data-driven Decisioning
Characteristics of Digital Business Leaders
They ‘Reframe’ Challenges
Looking at them from new perspectives and multiple angles
They Sprint
They work at pace - researching, testing and evaluating current ideas while generating new ones
They Appreciate That Failure Can Be Good
and are not afraid of new ideas
They Convert Data Into Value
They invest heavily in analyzing their own data and data from external sources to establish patterns and un-noticed opportunities
Data-driven Decisions
1. Identify (business) question: Become clear about all aspects of the decision to be taken or the problem to be solved.
2. Verify earlier findings: Try to identify alternatives to your perception. Find out who has investigated such or a similar problem in the past and the approach that has been taken.
3. Design of a solution model: Formulate a detailed hypothesis how specific variables might influence the result of the chosen model.
4. Gather all necessary data: Gather all available information about the variables of your hypothesis. The relevance of a dataset might address your business question directly or needs to be derived.
5. Analyse the data: Apply a statistical model and evaluate the correctness of the approach. Repeat this procedure until the right method has been identified.
6. Present & implement results: Frame the results obtained in a comprehensible story. This kind of presentation intends to motivate decision makers and relevant stakeholders to take action.
Data Science + Knowledge Discovery
Non-Analysts & Executives: should take a closer look at steps 1 and 6 of the analysis process if they plan to make use of statistical analysis.
Adapted from Thomas H. Davenport, Harvard Business Manager 2013
Vertical and Horizontal Data Science Skills
Vertical: Deep technical skills (The Specialist)
» Eigenvalues, Lasso-related regressions
» Experts in Bayesian networks, R
» Support Vector Machine
» Hadoop, NoSQL, Data Modeling, DW
» Data Warehouse
Horizontal: Cross-discipline knowledge (The Unicorn)
» Machine Learning & Statistics
» Visualization skills
» Domain expertise
» Storytelling experts
» Programming experience
» Aware of pitfalls & rules of thumb
Look for the individual Unicorn or build a Data Science Team?
Enabling Data-driven Innovations in Organizations
Executive: Decisions affecting strategy and direction
Data Scientists: Information analysis to meet strategic goals
Business Analysts: Day-to-day performance of a business unit
Information Consumer: Reporting on individual transactions
Automated Process: Decisions affecting execution of individual transactions
[Figure labels: Perf. Mgmt.; ACC; Knowledge Discovery; Insight; Dynamic Dashboards and Reports; BICC; Volume and Fixed Reporting; Knowledge Driven Business Process]
Analytical Competence Center (ACC)
» Separate group reporting to CxO, not part of a Business Intelligence Competence Center (BICC)
» Mission: broadening the adoption of Analytics across the organization
» Skilled resource pool of Data Scientists, Statisticians and Business Experts
» Data-driven approach (not development-driven) with privileged access to enterprise data sources
» Group will be assigned to projects for a limited time
Discovery Lab
Information Management – Conceptual View
[Figure labels: Data Streams; Structured Enterprise Data; Other Data; Event Engine; Data Reservoir; Data Factory; Enterprise Information Store; Business Intelligence; Discovery Lab; Discovery Output; Events & Data; Actionable Events; Actionable Insights; Actionable Information; Execution; Innovation; Line of governance; BICC; ACC]
Source: Oracle White Paper “Information Management and Big Data – A Reference Architecture”
Discovery Lab: Design Pattern
» Iterative development approach – data oriented NOT development oriented
» Small group of highly skilled individuals (aka “Data Scientists” or a team organized as an Analytical Competence Center, ACC) with privileged access to enterprise data sources
» Specific focus on identifying commercial value for exploitation
» Wide range of tools and techniques applied
» Typically separate infrastructure but could also be unified Reservoir if resource managed effectively
» Data provisioned through Data Factory or own ETL processes
Discovery Lab: Activity Cycles
Discovery Lab: Data Provisioning
Two flows are marked in the figure (1 and 2): the Data Factory flow and the General BI flow.
General BI flow: The majority of BI development activity will be from existing sources – performed by the BICC developing new reports to existing or new channels.
ACC may quickly develop new reporting through mashups from any available internal and external sources and may use advanced analytical tools for innovative analysis.
[Figure labels: Pre-Built Intelligence Assets; Virtualisation & Information Services; Analysis Processing & Delivery]
[Figure labels, BICC: Scorecards; Charts & Graphs; Ad Hoc Query & Analysis Tools; Intelligence Analysis Tools; OLAP Tools; Forecasting & Simulation Tools; Reporting Tools; Dashboards & Reports; Query & Search Tools; Statistics Tools]
[Figure labels, Discovery Lab & Development Environment: Sandbox – Project 1; Sandbox – Project 2; Sandbox – Project 3; Raw Data; Data store; Analytical Processing]
[Figure labels, ACC, Data Science (Primary Toolset): Data Modelling Tools; Programming & Scripting; Data & Text Mining Tools; Faceted Query Tools; Data Quality & Profiling; Graphical rendering tools]
Unified: Big Data Management and Analytics…
» Quickly find, explore, transform, analyze and share discoveries in Big Data Discovery
» Publish results to the Hadoop Distributed File System (HDFS)
» Use to build predictive models with Oracle R for Hadoop
» Connect published HDFS files to secure Oracle DB using Oracle Big Data SQL
» No data movement required
» Seamlessly extends existing DWH and BI investments with non-traditional data in Hadoop
[Figure labels, Experiment, Prototype & Collaborate: Data Reservoir; Hadoop (HDFS); BDA; Oracle Big Data Discovery; Oracle R for Hadoop; Polystructured Data; Tables in Hadoop]
[Figure labels, Productize, Secure & Govern: Data Warehouse; Oracle Database; Exadata; Structured Data; Tables in DB; Oracle Advanced Analytics (Data Mining, Oracle R Enterprise); Oracle BI Foundation Suite (ROLAP/MOLAP, Mobile,…); Exalytics In-Memory Appliance; Oracle SQL Queries]
[Figure labels, connection between both sides: Oracle Big Data SQL; SQL join]
Need To Get Analytic Value Fast
Data Uncertainty: 80% effort typically spent on evaluating and preparing data
» Not familiar and overwhelming
» Potential value not obvious
» Requires significant manipulation
Tool Complexity: Overly dependent on scarce and highly skilled resources
» Early Hadoop tools only for experts
» Existing BI tools not designed for Hadoop
» Emerging solutions lack broad capabilities
Oracle Big Data Discovery
Oracle Big Data Discovery: The Visual Face of Hadoop
find, explore, transform, discover, share
Oracle Big Data Discovery: Components
Hadoop Cluster (Oracle Big Data Appliance or Commodity Hardware with Cloudera CDH 5.)
[Figure labels: name node; data nodes; BDD node; Hadoop 2.x: Filesystem (HDFS), Workload Mgmt (YARN), Metadata (HCatalog); MapReduce; Spark; Other Hadoop Workloads: Hive, Pig, Oracle Big Data SQL (Oracle Big Data Appliance only)]
Oracle Big Data Discovery Workloads
Studio
• Web UI: Find, Explore, Transform, Discover, Share
In-Memory Discovery Indexes
• DGraph: Search, Guided Navigation, Analytics
Data Processing, Workflow & Monitoring
• Profiling: catalog entry creation, data type & language detection, schema configuration
• Sampling: dgraph (index) file creation
• Transforms: >100 functions
• Enrichments: location (geo), text (cleanup, sentiment, entity, key-phrase, whitelist tagging)
Self-Service Provisioning & Data Transfer
• Personal Data: Upload CSV and XLS to HDFS
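The data type detection listed under Profiling on the Components slide can be pictured as trying progressively more general parsers on the values of a column. The following Python sketch illustrates that idea only; it is not Big Data Discovery's profiling logic, and the function name and the type labels ("long", "double", "string") are assumptions made for this example.

```python
def detect_type(values):
    """Guess a column type from raw string values: 'long', 'double' or 'string'."""
    # Ignore missing and empty values when guessing the type.
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "string"
    # Try the narrowest type first; fall back to the next one on failure.
    for type_name, parser in (("long", int), ("double", float)):
        try:
            for v in non_empty:
                parser(v)
            return type_name
        except ValueError:
            continue
    return "string"


# Hypothetical columns of an uploaded CSV file.
columns = {
    "customer_id": ["1001", "1002", " 1003"],
    "amount": ["12.50", "7", ""],
    "city": ["Darmstadt", "Berlin", None],
}
detected = {name: detect_type(vals) for name, vals in columns.items()}
print(detected)
```

Trying the narrowest type first keeps integer columns from being reported as floating point, while any value that fails both parsers makes the whole column a string.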
Oracle Big Data Discovery: Preparation of Data Sources
Data sources have to be created as Hive Tables and registered in the Hive Metastore:
» Hive Table definition for fixed-width or delimited files
» Hive Table with a standard Regex SerDe (“Serializer-Deserializer”) to map more complex file structures by using Regular Expressions into regular table columns
» Hive Table using a custom developed SerDe to map nested file structures of a JSON file into regular table columns
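To illustrate the idea behind a Regex SerDe, the following minimal Python sketch maps one line of a web server log into named columns, using a regular expression with one capturing group per column. It is not Hive code; the log layout, the pattern, and the column names are assumptions chosen for illustration.

```python
import re

# Assumed log layout (hypothetical): host, two ignored fields, timestamp,
# request, status code, response size.
LOG_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)'
)
COLUMNS = ["host", "time", "request", "status", "size"]


def parse_line(line):
    """Return a dict of column name -> string, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # One capturing group per column, in order.
    return dict(zip(COLUMNS, match.groups()))


sample = '10.0.0.1 - - [10/Jun/2015:09:15:02 +0200] "GET /index.html HTTP/1.1" 200 5120'
row = parse_line(sample)
print(row)
```

Every captured value stays a string in this sketch; converting columns to numeric or date types would be a separate step.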
Oracle Big Data Discovery: Preparation of Data Sources
There are multiple ways to get new Data Sets loaded…
» Big Data Discovery: Upload of XLS and CSV files and automatic Hive Table creation
» HUE (Hadoop User Experience): Upload of various file formats, table creation wizards, web-based Hive Query Client
» Hive Command Line: Interface is similar to the MySQL command line
Oracle Big Data Discovery: Preparation of Data Sources
…or by using your favorite Data Integration / ETL Tool
Oracle Data Integrator 12.1.3 with Advanced Big Data Option
(Supporting HDFS, Hive, HBase, Sqoop, Pig, Spark)
Knowledge modules shown:
» IKM File To Hive (Load Data)
» IKM Hive Transform
» IKM Hive Control Append
» IKM File-Hive To Oracle (OLH, OSCH)
» LKM HBase to Hive
» IKM Hive to HBase
» IKM SQL to Hive-HBase-File (SQOOP)
» IKM File-Hive to SQL (SQOOP)
[Figure labels, sources and targets: File (FS/HDFS); Hive; HBase; Oracle DB; Any RDBMS]
Oracle Big Data Discovery: Data Ingestion
Data Processing Workflow including Profiling and Enrichment
[Figure: table access_logs in Hive / HCatalog (100m rows); sample of access_logs (1m rows); Profiling and Enrichment Process; data set access_logs in BDD (1m rows, shown as “1M of 100M”)]
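The Data Ingestion figure shows a 1M-row sample taken from a 100M-row table. How Big Data Discovery draws its sample is not stated in the slides; the following minimal Python sketch uses reservoir sampling, a well-known technique swapped in here for illustration, to show how a fixed-size uniform sample can be drawn in a single pass over a large source. The function name and the toy sizes are assumptions.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Draw k rows uniformly at random from an iterable in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Toy stand-in for a large table: 100,000 rows sampled down to 1,000.
rows = range(100_000)
subset = reservoir_sample(rows, 1_000, seed=42)
print(len(subset), "of", len(rows))
```

Because the source is read only once and only k rows are held in memory, the same approach works when the full table is too large to load.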
Demonstration
Oracle Big Data Discovery
Catalog
» Access a rich, interactive catalog of all data in Hadoop
» Familiar search and guided navigation for ease of use
» See data set summaries, user annotation and recommendations
» Provision personal and enterprise data to Hadoop via self-service
Explore
» Visualize all attributes by type
» Sort attributes by information potential
» Assess attribute statistics, data quality and outliers
» Use scratch pad to uncover correlations between attributes
Transform
» Intuitive, user driven data wrangling
» Extensive library of powerful data transformations and enrichments
» Preview results, undo, commit and replay transforms
» Test on sample data then apply to full data set in Hadoop
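The "test on sample data, then apply to the full data set" workflow can be sketched as a recorded list of transformation steps that is previewed on a sample and later replayed on all rows. This is a minimal Python illustration, not Big Data Discovery's transform API; the class, method, and column names are hypothetical.

```python
class TransformScript:
    """Records row-level transforms so they can be previewed, undone and replayed."""

    def __init__(self):
        self.steps = []  # list of (description, function) pairs

    def add(self, description, func):
        self.steps.append((description, func))
        return self

    def undo(self):
        """Drop the most recently added step, if any."""
        if self.steps:
            self.steps.pop()
        return self

    def apply(self, rows):
        """Replay all recorded steps on copies of the rows; the input stays unchanged."""
        result = []
        for row in rows:
            new_row = dict(row)
            for _, func in self.steps:
                new_row = func(new_row)
            result.append(new_row)
        return result


# Hypothetical data: the "full" set has two rows, the sample is its first row.
full_data = [{"city": " darmstadt ", "amount": "12.50"},
             {"city": "BERLIN", "amount": "7"}]
sample = full_data[:1]

script = TransformScript()
script.add("trim and title-case city",
           lambda r: {**r, "city": r["city"].strip().title()})
script.add("convert amount to float",
           lambda r: {**r, "amount": float(r["amount"])})

preview = script.apply(sample)       # preview on the sample first
committed = script.apply(full_data)  # then replay on the full data set
print(preview)
print(committed)
```

Keeping the steps as data rather than applying them immediately is what makes undo and replay cheap: nothing is changed until the script is applied.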
Transform – User friendly…
Preferred method for the Business Analyst
Transform – … but flexible
(based on Groovy Programming Language / Library)
Preferred Method for IT / Data Engineer / Data Scientist …
Discover
» Join and blend data for deeper perspectives
» Easy usage - compose project pages via drag and drop
» Use powerful search and guided navigation to ask questions
» See new patterns in rich, interactive data visualizations
Share
» Share projects, bookmarks and snapshots with others
» Build galleries and tell big data stories
» Collaborate and iterate as a team
» Publish blended data to HDFS for leverage in other tools
Data Discovery & Analytics
Data Discovery & Analytics Lifecycle
Typical Effort
Data Discovery & Analytics Lifecycle
More Time left for Analysis and Interpretation of Results
Analytics: More Data Variety available – Better Results
Example: Marketing Campaigns
Getting "lift" on responders
Data Mining-based prediction results with Response Modelling including hundreds of input variables like:
» Demographic data
» Purchase POS transactional data
» Polystructured data, text & comments
» Spatial location data
» Long term vs. recent historical behaviour
» Web visits
» Sensor data
» …
[Figure: x-axis Population Size (% of Total Cases), 0 to 100; y-axis % of Positive Responders, 0 to 100; curves: Naïve Guess or Random, Model with 20 variables, Model with 75 variables, Model with 250 variables]
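The marketing campaign chart plots cumulative gains: cases are sorted by predicted response score, and for each share of the population contacted, the share of all positive responders captured is reported. A minimal Python sketch of that calculation follows; the function name is hypothetical and the scores and labels are made-up toy values, not results from the presentation.

```python
def cumulative_gains(scores, responded):
    """Return (population %, % of positive responders) points, best scores first."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    total_cases = len(ranked)
    total_positive = sum(1 for _, r in ranked if r)
    if total_positive == 0:
        raise ValueError("at least one positive responder is required")
    points = [(0.0, 0.0)]
    captured = 0
    for i, (_, r) in enumerate(ranked, start=1):
        captured += 1 if r else 0
        points.append((100.0 * i / total_cases, 100.0 * captured / total_positive))
    return points


# Toy data (assumed): model scores and whether the customer actually responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

points = cumulative_gains(scores, responded)
for population_pct, responders_pct in points:
    print(f"{population_pct:5.1f}% of population -> {responders_pct:5.1f}% of responders")
```

A random selection would follow the diagonal (contacting 20% of the population yields about 20% of the responders); a model provides lift wherever its curve lies above that line.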
Oracle Advanced Analytics
Native SQL Data Mining/Analytic Functions + High-performance R Integration
Oracle R Enterprise (ORE)
» Allows distributed processing of huge data volumes
» Benefits from DB features, e.g. Security and SQL access
» R Studio = GUI for Data Analysts
Oracle Data Mining (ODM)
» Implemented in the Oracle Database kernel
» Direct access via PL/SQL API and SQL operators
» Oracle Data Miner GUI embedded in SQL Developer