Architecture Services
Big Data – Big Changes
Business Analyst Professional Development Day
September 2013
Contents
What is Advanced Analytics & Big Data?
Business Intelligence, Advanced Analytics and Big Data seem to be used synonymously, but they are different and build on each other from a maturity perspective
Big Data & Analytics Continuum
Leveraging “Big Data” should be done on a stable foundation: examples
Skills of the Data Analyst / Scientist
New skills and levels of maturity, certifications and training
What is Advanced Analytics?
Advanced Analytics comprises both Business Intelligence technologies and complex analytic practices that are used to uncover relationships and patterns within large volumes of historical data; these patterns can then be used to predict future behavior and events or to improve operational results.
Hindsight
• Standard Reports: What happened? When did it happen?
• Ad hoc Reports: How many? How often? Where?
• Query Drilldown: Where exactly is the problem? How do I find the answers?
Current Sight and Foresight
• Alerts: When should I react? What actions are needed now?
• Forecasting: What if these trends continue? How much is needed? When will it be needed?
• Statistical Analysis: Why is this happening? What opportunities am I missing?
• Predictive Analytics: What will happen next? How will it affect my business?
• Optimization: How can we get better? What is the best decision?
What is Big Data? Volume, Variety, Velocity (and sometimes Veracity and Value)
Definition: “Dealing with information management challenges that don’t natively fit with traditional approaches to handling the problem.” (Tom Deutsch, IBM)
Where has Nationwide been, and where can we go?
Internal data: captured or streamed today in systems and data warehouses (e.g., policy admin, claims)
Comprehensive advanced analytics have been built around marketing, product and pricing, and other areas of the business. These efforts are mostly disconnected, and some use rudimentary, inefficient technologies focused mainly on moving data rather than getting value out of it.
NEW internal data: not previously captured (e.g., emails, clickstream, mobile, telematics, unstructured notes from agents or claims adjusters)
NEW external data: from non-traditional sources (e.g., internet, social networks, demographic, local economy, price elasticity, mobile location stream, localized competitor intelligence)
Industry estimates suggest that 80% of enterprise data is in unmodeled or unstructured forms, where it is nearly inaccessible and traditional modeling does not fit. Applying text extraction techniques to large, varied volumes of data such as SEC filings produces new structured metrics that can be combined with traditional BI data for analysis and exploration. Text is also trapped in large description fields in our operational data stores, such as the Claims DW.
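As a rough illustration of turning free text into structured metrics that can sit next to BI data, the sketch below counts a few red-flag phrases per document and left-joins the counts onto an existing record. The phrase list, the document key `ACME-10K` and the BI fields are hypothetical; a real text-extraction pipeline would use far richer techniques than phrase counting.

```python
import re

# Hypothetical red-flag phrases; a real project would define these with the analysts.
RED_FLAG_TERMS = ("going concern", "material weakness", "default", "restatement")


def text_metrics(doc_id: str, text: str) -> dict:
    """Turn one free-text document into a row of structured, numeric metrics."""
    lowered = text.lower()
    row = {"doc_id": doc_id, "word_count": len(re.findall(r"[a-z']+", lowered))}
    for term in RED_FLAG_TERMS:
        # Whole-phrase matches only, so "default" does not match "defaulted".
        pattern = r"\b" + re.escape(term) + r"\b"
        row[term.replace(" ", "_")] = len(re.findall(pattern, lowered))
    row["red_flag_total"] = sum(row[t.replace(" ", "_")] for t in RED_FLAG_TERMS)
    return row


def join_with_bi(metrics: list, bi_rows: dict) -> list:
    """Left-join text-derived metrics onto existing structured (BI) records by document key."""
    return [{**bi_rows.get(m["doc_id"], {}), **m} for m in metrics]


if __name__ == "__main__":
    filing = ("The auditor raised substantial doubt about going concern. "
              "A material weakness was reported.")
    bi = {"ACME-10K": {"company": "ACME", "revenue_musd": 120}}  # hypothetical BI record
    print(join_with_bi([text_metrics("ACME-10K", filing)], bi))
```

The output rows are ordinary columns, so they can be loaded into the warehouse and analyzed with the same BI tools as the rest of the structured data.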
Big Data & Analytics Continuum (levels listed top to bottom along a Business Value axis)
Cognitive
What is the right question? What is the most likely answer?
• Reasoning
• Learning
• Natural Language
Prescriptive
What’s the next best action? What will happen when and why?
• Optimization
• Rules
• Constraints
Predictive
What could happen? What if these trends continue?
• Machine Learning
• Forecasting
• Statistical Analysis
Descriptive
What has happened and why? How many, how often, who & where?
• Alerts & Drill Down
• Ad hoc Reports
• Standard Reports
Information Layer
How do I integrate new data sources? How is data managed and stored?
• Big Data Platforms
• Content Management
• RDBMS and Integration
When entering the Big Data space, make sure your foundational competencies are solid. Information Management capabilities such as data integration, extensible data modeling, data quality and data governance become even more important when dealing with these new, uncertain, high-volume data sources. Additionally, to achieve the full ROI, you must have a mature analytics methodology, appropriately skilled resources and the right technology.
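One concrete form of the data quality competency is a gate that profiles each batch from a new source before it is integrated. The sketch below is a minimal, assumed version: it checks the share of missing values in required columns and duplicate key values. The column names, the trip-level telematics batch and the 5% threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    row_count: int
    null_rates: dict = field(default_factory=dict)
    duplicate_keys: list = field(default_factory=list)
    failures: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def check_quality(records, key, required, max_null_rate=0.05):
    """Profile a batch from a new source before it is integrated downstream.

    Flags required columns whose share of missing values exceeds max_null_rate,
    and values of `key` that appear more than once.
    """
    report = QualityReport(row_count=len(records))
    for col in required:
        missing = sum(1 for r in records if r.get(col) in (None, ""))
        rate = missing / len(records) if records else 1.0
        report.null_rates[col] = rate
        if rate > max_null_rate:
            report.failures.append(f"{col}: null rate {rate:.0%} exceeds {max_null_rate:.0%}")
    seen, dups = set(), set()
    for r in records:
        k = r.get(key)
        if k in seen:
            dups.add(k)
        seen.add(k)
    report.duplicate_keys = sorted(str(k) for k in dups)
    if dups:
        report.failures.append(f"{key}: {len(dups)} duplicate value(s)")
    return report


if __name__ == "__main__":
    # Hypothetical telematics batch: one missing mileage value and one repeated trip ID.
    batch = [
        {"trip_id": "T1", "trip_miles": 12.5, "hard_brakes": 0},
        {"trip_id": "T2", "trip_miles": None, "hard_brakes": 3},
        {"trip_id": "T2", "trip_miles": 8.0, "hard_brakes": 1},
    ]
    report = check_quality(batch, key="trip_id", required=["trip_miles", "hard_brakes"])
    print(report.passed, report.failures)
```

A failing report would normally route the batch to a data steward rather than silently loading it, which is where data governance comes in.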
Use Case – Machine Learning: Advanced Analytics, Structured
Open Source R was chosen to accelerate the model development process for the intern. Several external R packages were added to provide complete SVM capability in R as a desktop tool. Supplemental data preparation of the S&P financial data was handled with various scripts and spreadsheets.
The project will provide knowledge transfer to Freedom Specialty, which currently intends to implement the model in SAS.
Selected Results
Accrual Score (Bankruptcy) Prediction
The machine learning technique called Support Vector Machine (SVM) was selected. This supervised learning technique takes a training set of labeled results, each described by a set of factors, and constructs a model that predicts the label for new cases.
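The project itself used R packages, and the sketch below is not that code. It is a minimal pure-Python linear SVM trained with the Pegasos stochastic subgradient method on the regularized hinge loss, assuming two numeric factors per case and labels coded +1 (e.g., bankruptcy) and -1. The data points are made up for illustration, and for simplicity the bias is handled as an extra constant feature (so it is regularized along with the weights).

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    X: list of feature lists; y: labels in {-1, +1}. A constant 1.0 feature is
    appended so a bias term is learned together with the weights.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: shrink the weights toward zero.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                # Hinge-loss step: move toward examples inside or on the wrong side of the margin.
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Classify one case by the sign of its score."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical standardized factors (e.g., two financial ratios) with labels.
    X = [[2.0, 3.0], [3.0, 2.0], [2.5, 2.5], [3.0, 3.0],
         [-2.0, -3.0], [-3.0, -2.0], [-2.5, -2.5], [-3.0, -3.0]]
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print(w, [predict(w, x) for x in X])
```

Production SVM libraries add kernels, an unregularized bias and tuned hyperparameters; this sketch only shows the supervised idea of learning a separating boundary from labeled cases.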
Model Validation Results (Jan 2013; further optimization pending)
Metric       Positive   Negative
Precision    0.81       0.74
Recall       0.70       0.83
F1 score     0.75       0.78
Overall accuracy: 0.77
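For readers new to these metrics, the sketch below shows how per-class precision, recall and F1 and overall accuracy are computed from a confusion matrix, and checks that the reported F1 scores follow from the reported precision and recall via F1 = 2PR / (P + R). The confusion-matrix counts in the usage line are hypothetical; the project's actual counts are not given.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def class_metrics(tp, fp, fn, tn):
    """Precision, recall and F1 for both classes, plus overall accuracy."""

    def prf(hits, false_alarms, misses):
        precision = hits / (hits + false_alarms) if hits + false_alarms else 0.0
        recall = hits / (hits + misses) if hits + misses else 0.0
        score = f1(precision, recall) if precision + recall else 0.0
        return precision, recall, score

    return {
        "positive": prf(tp, fp, fn),
        # For the negative class, true negatives are the hits,
        # false negatives are its false alarms and false positives its misses.
        "negative": prf(tn, fn, fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # The reported F1 scores are consistent with the reported precision and recall.
    print(round(f1(0.81, 0.70), 2), round(f1(0.74, 0.83), 2))  # 0.75 0.78
    print(class_metrics(tp=7, fp=1, fn=3, tn=9))  # hypothetical counts
```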
Cross Business Interest
• Freedom Specialty Insurance
• Enterprise Applications Investments
• NF opportunities just beginning to be explored
Although Freedom’s project was a predictive modeling effort, the business is eager to analyze the “fine print” of unstructured text in filings and media reports, looking for red flags that help triage the workload for analysts.
Principle: Start with solid advanced analytics capabilities and add “Big Data” for additional ROI
Use Case – Speech Analytics: Volume, Variety (Unstructured)
Hypothesis:
Determine whether there are certain words, used more often during a first notice of loss call, that would indicate a fraudulent claim.
This will result in false positives! The word patterns should be combined with claims, billing and contact history to enhance the accuracy of the model.
• Convert first notice of loss call history to text and store it in the big data platform.
• Sort the call text into two categories: calls that resulted in fraud and calls that did not.
• Mine the data for word patterns and determine whether word usage differs between fraudulent and non-fraudulent claims.
• Build a model or rules to execute against calls in real time using streaming technology.
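The "mine for word patterns" and "build a model" steps can be sketched with a naive-Bayes-style log-likelihood ratio: for each word, compare its smoothed relative frequency in fraudulent versus non-fraudulent transcripts, then score a new call by summing the weights of its words. This is an assumed technique chosen for illustration, not the method the project used; the transcripts are invented, and the real-time streaming step is out of scope here (`score_call` is simply what a streaming job might apply to each incoming transcript).

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def word_log_odds(fraud_calls, legit_calls, alpha=1.0):
    """Per-word log ratio of add-alpha smoothed relative frequency, fraud vs. non-fraud."""
    fraud = Counter(w for call in fraud_calls for w in tokenize(call))
    legit = Counter(w for call in legit_calls for w in tokenize(call))
    vocab = set(fraud) | set(legit)
    fraud_total = sum(fraud.values()) + alpha * len(vocab)
    legit_total = sum(legit.values()) + alpha * len(vocab)
    return {
        w: math.log((fraud[w] + alpha) / fraud_total) - math.log((legit[w] + alpha) / legit_total)
        for w in vocab
    }


def score_call(text, log_odds):
    """Sum learned word weights for a new transcript; unseen words contribute 0."""
    return sum(log_odds.get(w, 0.0) for w in tokenize(text))


if __name__ == "__main__":
    # Invented transcripts for illustration only.
    fraud = ["the car was stolen last night i already have an estimate",
             "stolen vehicle cash settlement please as soon as possible"]
    legit = ["i was rear ended at the light the other driver stopped",
             "a tree fell on the garage during the storm"]
    weights = word_log_odds(fraud, legit)
    print(score_call("my car was stolen", weights), score_call("hail storm damage", weights))
```

With so few words per call, a score like this will indeed flag many legitimate claims, which is why it is best used as one input alongside claims, billing and contact history.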
Principle: “Big data” does not replace your existing analytics built on your structured data warehouse. Big Data is simply an additional data set that enhances an existing set of capabilities and should not be used out of context.
New Roles, New Skills
Data Analyst / Data Scientist
• What is Data Analysis?
• How do you recognize patterns in data?
• What is the process for inspecting the data?
• How do you identify data cleansing and transformation rules?
• Why and how do you visualize your findings and information?
• How do you manage, manipulate and query large, complex data on Hadoop as an analyst?
• What statistical model is most appropriate for the problem scenario? What other types of models might be appropriate?
Types of Tools Used
• R
• SPSS
• Tableau
• Data mining tools such as Teradata Miner
• Hadoop implementation-specific tools such as BigSQL & BigSheets (IBM)
Other Considerations
• Certifications: Certified Analytics Professional (CAP) from INFORMS
• Nationwide / IBM Client Center for Advanced Analytics
Appendix
More Terminology to Learn
Classes of Advanced
Analytics Problems
With a wide range of advanced modeling techniques…
• ARMA
• Linear Regression
• CART
• Logistic Regression
• CIR++
• Monte Carlo Simulation
• Classification
• Compression Nets
• Multinomial Logistic Regression
• Clustering
• Decision Trees
• Neural Networks
• Forecasting
• Discrete Time Survival Analysis
• Optimization: LP; IP; NLP
• Optimization
• D-Optimality
• Restricted Boltzmann Machine
• Ensemble Model
• Sensitivity Trees
• Gaussian Mixture Model
• SVD, A-SVD, SVD++
• Genetic Algorithm
• Anomaly Detection
• SVM
• Gradient Boosted Trees
• Projection on Latent Structures
• Natural Language Processing
• Hierarchical Clustering
• Spectral Graph Theory
• Intelligent Data Design
• K-Means
• Regression
• Simulation
• Sparse Data Inference
• Poisson Mixture Model
• Kalman Filter
• KNN
Big Data Analytics – The Landscape
The technologies that address big data problems are broad and diverse; it is not just Hadoop.
Presentation
Touchpoints – Just Two Use Cases