Download Big Data in Munich RE

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

The Measure of a Man (Star Trek: The Next Generation) wikipedia , lookup

Pattern recognition wikipedia , lookup

Data (Star Trek) wikipedia , lookup

Time series wikipedia , lookup

Transcript
Center for International Earth Science Information Network - CIESIN - Columbia University. 2016. Gridded Population of the World,
Version 4 (GPWv4): Population Count. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC).
http://dx.doi.org/10.7927/H4X63JVC.
Big Data Analytics @ Munich Re
SAS Global Forum Executive Program - Orlando
Wolfgang Hauner
Chief Data Officer, Munich Re
Marc Wewers
IT Architect, Munich Re
Agenda
1
Data Analytics Framework
3
Method Example: AI
2
Technology
4
Case Study: Cross Selling
2
Big Data in Trend Radar
Augmented and virtual worlds
3D Printing
Computing
Everywhere
Industrialization 4.0
Big Data
Loc-based services
Telematics
Smart Home
Digitalization
Wearable Devices
Predictive
Analytics
Robotics/Drones
Internet
of Things
Open Data
Collaborative
Consumption
Big Data
Citizen Development
Mobile Health
Services
Crowdsourcing
Virtual
Assistant
Systems
Digital
Identity
Integrated
Systems
Cybersecurity
Automated
Decision
Taking
User Centered Design
On-Demand-Everything
Context-aware
Computing
Autonomous Systems
and Devices
Digitization
Risk-based
Security
Web 4.0
Web-Scale IT
Internet of Things
Software-defined
Anything
Haptic
Technologies
New Payment
Models
Cloud/Client
Architecture
3
When does it become BIG Data
Zettabyte
Exabyte
Petabyte
Terabyte
Gigabyte
Megabyte
Kilobyte
Byte
40,000,000,000,000,000,000,000
Google,
Facebook,
Microsoft, …
All words ever
spoken by
humans
Yes or No
Petabyte storage
big data platform
4 TB in Memory
Big Data Platform
MR
Data contained
in a library floor
3.5 inch
floppy disk
4 KB Commodore
VC 20
 43 zettabytes of data will probably be generated by 2020
 300 times the volume in 2005
Source: IBM
4
Big Data Analytics is a Combination of Methods,
Technology, Data and People
Technology
Data
 Hardware
(Compute power)
 Internal Data
 Software
(SAS, R, Spark, …)
 External Data
 Structured Data
 Unstructured Data
Methods
People
 Regression Models
 Data Scientists
 Machine Learning
Models
 Data Engineers
 Text Mining
Big Data
Analytics
 Business People
 IT Architects
5
Building the Team, and the Environment
Visualization
Modelling
Data Storage
Programming
BusinessUnits
Business-/
Domain
knowledge
System
Implementation
IT
Maths
Story-telling
DB Administration
Statistics
6
Building the infrastructure
BI Lab
Production
A2P
User Interface
User Interface
User Interface
User Interface
User Interface
User Interface
HANA
Hadoop Stack
SAS
HANA
Hadoop Stack
SAS
HANA
HANA
Long term unstructured and structured data
Data Lake (HDFS)
Industry icon, tool box and image: used under license from shutterstock.com
Other icons: Munich Re
7
Source: Munich Re
Wolfgang Hauner / Big Data Analytics & Artificial Intelligence
8
Which topics drive our clients?
Supply
Chain
Predictive
UW
Churn
Analysis
Textmining
Data
Sources
Up-/CrossSelling
Social Media
Analysis
Fraud
Detection
Big Data
Technology
Telematics
Geospatial
Sensor
Data/IoT
9
Big Data use cases in insurance
Make the uninsurable insurable
 Diabetics
 Wind Energy
Image: dpa Picture Alliance
Image: Getty Images
Consolidate the information and process
 Automated underwriting
 Risk management platform
Artificial Intelligence supported workflow
Image: used under license from shutterstock.com
Image: used under license from shutterstock.com
 Early Loss Detection
 Visual Loss Adjustment
Image: used under license from shutterstock.com
Image: Getty Images
10
Agenda
1
Data Analytics Framework
3
Method Example: AI
2
Technology
4
Case Study: Cross Selling
11
Design principles for Big Data & Analytics Platform
Self-Service
SAS & Hortonworks
Automation
Multi Tenancy
One Central Datalake
Hybrid
DevOps
Continious improvement
On-Prem & Cloud
12
Roadmap to Production via Lab environment
Design
Setup / Build


Setup of first
BI-Lab Hadoop
Cluster
Q2 - 2015
Enhance / Optimize
Run
Enhance / Optimize


Setup of new
BI-Lab Hadoop Cluster
Q3 - 2015
Stabilization of BI-Lab Hadoop cluster
Authentication & Security
Automation

On-boarding & support of
Big Data & Analytics pilots
Q4 - 2015
Pilot SAS – Hadoop Integration
New BI-Lab Hadoop Cluster available
• Large shared cluster
• Dedicated clusters
• Single-Node cluster
13
Building the Big Data & Analytics Platform
Production Environment
Design Setup / Build


Run
Enhance


Optimize
Enhance / Optimize


•
•
•
•
SAS 9.4 M3
SAS Visual Analytics (VA)
Self-Service Data Upload
SAS Embedded Process
for Hadoop
•
•
•
SAS Enterprise Guide (EG)
SAS MS Office Add-in
Data Access to SAP HANA,
Oracle & MS SQL-Server
•
•
•
SAS Enterprise Miner
SAS Contextual Analysis
SAS Mobile BI iOS App
•
•
SAS VA Row Level Security
SAS HA
•
•
•
•
•
HDP 2.3
Hue
Hive
Ambari
Ranger with LDAP
•
•
•
•
Sqoop
Pig
Spark 1.4
Oozie
•
•
•
•
•
HDP 2.4.2
Ambari Views
Spark 1.6
Solr Cloud
Tesseract
•
•
•
HDP 2.5
Atlas
Zeppelin
•
•
•
Data Catalogue Tool
Data Lineage
Compliance & Security
2016
12
01
Start setup platform
2017
02
03
04
05
06
07
08
09
10
2-week iterations with
Rolling Upgrades
Release v1.0
Release v2.0
Release v3.0
11
12
Release v4.0
14
Big Data & Analytics Production Environments
IT and Business Deployment
Scheduled
Analytics
Self-Service
Ad-hoc Analytics
Integration
Sandbox
IT Deployment
Production
Business
Deployment
Big Data & Analytics Production Environments
Scalability
Scalability
SAS
Production
HWX
Integration
Sandbox
LASR
LASR
EP
Hive
YARN
EP
Hive
YARN
LASR
LASR
EP
Hive
YARN
EP
Hive
YARN
SAS
LASR
EP
Hive
YARN
EP
EP
EP
EP
Hive
YARN
Hive
YARN
Hive
YARN
Hive
YARN
EP
Hive
YARN
EP
Hive
YARN
EP
Hive
YARN
EP
EP
Hive
YARN
Hive
YARN
LASR
„Simplified“ Server Architecture SAS and HDP
SAS
Mmgt &
Metadata
SAS
In-Memory
LASR
SAS
In-Memory
LASR
SAS
In-Memory
LASR
SAS
In-Memory
LASR
x<y
EP = Embedded Process
“bring calculation to data”
Ambari Views,
Zeppelin, …
Hadoop
Frontend
Hadoop
Mmgt &
Metadata
SAS EP
SAS EP
SAS EP
SAS EP
SAS EP
SAS EP
SAS EP
Spark
Spark
Spark
Spark
Spark
Spark
Spark
Solr
Solr
Solr
Sol
Solr
Solr
Solr
Hive
Hive
Hive
Hive
Hive
Hive
Hive
YARN
YARN
YARN
YARN
YARN
YARN
YARN
HDFS
HDFS
HDFS
HDFS
HDFS
HDFS
HDFS
Data Node 1
Data Node 2
Data Node 3
Data Node x
Data Node x+1
Data Node x+2
Data Node y
SERVER / OS
SERVER / OS
SERVER / OS
17
Lessons learned
1
Make use of Lab environment
4
Automatization
2
Enable Self-Service
5
Security
3
Agile IT-Project management
6
YARN queue management
18
Agenda
1
Data Analytics Framework
3
Method Example: AI
2
Technology
4
Case Study: Cross Selling
19
Artificial Intelligence (AI) is coming …
1st Machine Age
Automation of physical tasks
Image: used under license from shutterstock.com
2nd Machine Age
Automation of cognitive tasks
Image: dpa Picture Alliance
20
AI Evolution
Rule based
Hybrid
Image: dpa Picture Alliance / Stan_Honda
Learning based
Image: dpa Picture Alliance / Seth Wenig
Image: dpa Picture Alliance / Lee Jin-man
1997
2011
2016
 IBM’s deep blue defeats world
chess champion
 IBM’s Watson AI system wins
Jeopardy match against
human players
 Google’s DeepMind defeats top
ranked Go player (Lee Se-dol)
 Purely rule based
 Purely machine learning based
 Mixed machine learning and
rule based
21
Insurance specific AI
Insurance
specific AI
Image: used under license from
shutterstock.com
Image: dpa Picture Alliance
Image: dpa Picture Alliance
Image: used under license from
shutterstock.com
Munich Re as industry
leader in insurancespecific-AI
Image: used under license from
shutterstock.com
General AI
Image: dpa Picture Alliance
Google, Facebook,
Microsoft, Open AI, IBM,
Siemens, GE, Bosch, …
Image: used under license from
shutterstock.com
Image: Getty Images
Stanford, Oxford, MIT,
Open-Source
community, …
Image: used under license from
shutterstock.com
22
Methods
Neural Network
Input
Hidden
Output
No pothole identified
 System of interconnected nodes,
exchanging information
 Weights of connections can be
adjusted by supervised/
unsupervised “learning”
Image: Getty Images
Image: Getty Images
 Pros: Accuracy usually high,
prediction fast
Pothole identified
Image: used under license from shutterstock.com
Image: used under license from shutterstock.com
No pothole identified
Image: used under license from shutterstock.com
 Cons: “Black box” – acquired
knowledge not easily
comprehensible, training effort
high, appropriate data needed
 Application areas, e.g., speech
recognition, computer vision,
medical diagnosis, automated
trading, game-playing (AlphaGo)
Image: used under license from shutterstock.com
23
AI perspective in insurance
Sales
Underwriting
Claims
Insurance
value chain
Image: Emely / Cultura / Corbis
AI trends
Image: used under license from shutterstock.com
Image: used under license from shutterstock.com
Automated advisory
Automated underwriting
Smart claims support
 Complement face-to-face
interaction
 Leverage all data
and resources
 Enable fast response
 Enable innovative
distribution (e.g., P2P)
 Improve workflow and
customer experience
 Serve low volume, high
frequency segment
 Serve low volume, high
frequency segment
 Obtain more
detailed information
 Handle low volume,
high frequency claims
24
Use case example
Automated detection of triggers for insurance demand (for marketing campaigns)
German Geo Analytics Project
Project overview
 Using machine learning from geo data for insurance
 Implementation of algorithms for automated detection of triggers for insurance
demand from high resolution aerial images, e.g., solar panels
Image: dpa Picture Alliance
Achievements and outlook
 Algorithm for automated detection of solar panels and satellite dishes
implemented (high recognition performance)
 More detection capabilities (e.g., cars) and scaling to all of Germany easily
possible and planned for next project phase
Solar
panels
 First regional pilot currently in discussion
Image: dpa Picture Alliance
25
Agenda
1
Data Analytics Framework
3
Method Example: AI
2
Technology
4
Case Study: Cross Selling
26
Next Generation Sales Analytics Solutions:
Data Processing &
Data Enrichment
Machine Learning
Digital companies
established sales
analytics solutions
to improve
online-sales
$$
Scoring & Result Visualisation
Munich Re
transfers this
expertise to the
insurance sector
in order to predict
new cross-selling
opportunities
Verwendung unter Lizenz von Shutterstock.com
27
Our solution includes four main steps to improve cross-selling
for our client’s portfolio
$$
Data Processing &
Validation
Client data is
transmitted,
cleansed, validated
and enriched (using
external and MR
data).
Machine Learning
Using machine
learning methods
like “Random
Forrest”, hundreds
of decision trees will
be generated
simulating customer
characteristics.
Discuss Results
with Client
Campaign Design
and Execution
Results are
discussed with client.
Data Analysis is
modified, if
necessary, and a
detailed report on the
findings is sent to the
client.
Campaign is
designed and tested
on a small sample.
Findings of this test
are used to optimise
the strategy.
Campaign is carried
out.
28
How does it work? The portfolio is clustered based on
comparable characteristics
We make use of the methodology: “Clients who bought this, also bought that!”
Using machine learning methods, we identify typical
clusters of insureds that bought target product, e.g.
personal accident:
€75 average premium
3 policies
36 years old
6 years with client
Resides in N.
Germany
Afterwards, we match insureds that do not have the target
product yet with the clusters calculated in the first step.
This provides us with the individual probability of buying
the target product for a single insured.
Compare
€85 average premium
2 policies
32 years old
4 years with client
Resides in N.
Germany
Probability of
Buying: 37%
29
The clusters are the result of a data-driven decision tree
Description
Private Liability
= yes
Group A:
• N=300, Probability: 37%
Private Liability
= no
Group B:
• N=900, Probability: 18%
Northern
Germany
Group C:
• N=700, Probability: 9%
Southern
Germany
Group D:
• N=500, Probability: 3%
Duration > 8
Group E:
• N=500, Probability: 8%
Duration ≤ 8
Group F:
• N=250, Probability: 14%
Personal
Accident = yes
Group G:
• N=400, Probability: 9%
Personal
Accident = no
Group H:
• N=800, Probability: 2%
Building Ins. =
yes
# Policies ≥ 2
Building Ins. =
no
Age > 55
# Policies < 2
Age ≤ 55
 Clusters calculated are the
leaf nodes of a decision
tree.
 Each and every insured is
now assigned to a certain
leaf node according to his
or her attributes.
 The buying probability of a
customer is determined by
which leaf node he or she
is assigned.
 “Random Forest” is the
predominant machine
learning method in our
sales analytics approach
30
The clusters are visualised in an interactive report which can
be examined further
Each leaf node corresponds to one combined tile diagram:
Expected profit by sub-group
Frequency
Description
DESCRIPTION
 By selecting
a single tile,
more insight in the
Here, all insureds
our
composition
of theof
cluster
client
are visualized who
is available.
don’t have a HH.
 Again, the darker the color
of each tile, the higher the
Again, the darker the color of
profit potential.
each tile, the higher the profit
 The same
concept applies
potential.
for size. Tile size
corresponds
to the number
In addition,
the bigger
the tile,
of more
insureds
withininsureds
that
the
potential
cluster. are in.
Profit
31
The selected cluster is visualised and compared to the
whole portfolio
Portfolio visualisation:
Description
 The output shows the
distribution of different
characteristics within the
portfolio compared to the
chosen cluster.
 A comparison of cluster
characteristics against the
average portfolio reveals
new insights on targeted
customers.
32
Additional interesting findings can be made by comparing the
current hot spots of a product with the potential hot spots
Geocoded distribution of personal accident
Geocoded potential for personal accident
33
Campaign design and execution is an essential step to
harvest cross-selling opportunities
Define Campaigns from Analytics Results:
A/B-Testing
To ensure smooth transition and deployment of the
analysis steps, MR supports your sales campaign:

Discuss and set-up campaign strategy

Integrate existing campaign management

Select appropriate sales channels for
campaign (preferred: online channels)


Provide A/B-Testing approach to clarify on
ideal customer approach
Compare with recent campaign set-ups and
optimise first tests
Select top-scored
insureds with
highest potential
according to …
Insureds that don‘t
have target product
Sample A
x x
 
… existing
approach
Compare
hit ratio
… Munich Re‘s
approach
Sample B
x 
 
We support campaign design & execution to ensure full leverage of our Sales Analytics Solution
34
Image:
Bayerische
Zugspitzbahn
Bergbahn
/ Lechner
Center for International Earth Science Information Network - CIESIN
- Columbia
University.
2016. Gridded
Population AG
of the
World,
Version 4 (GPWv4): Population Count. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC).
http://dx.doi.org/10.7927/H4X63JVC.
Thank you!
Contact:
Wolfgang Hauner
Chief Data Officer, Munich Re
[email protected]
Marc Wewers
IT Architect, Munich Re
[email protected]
© 2017 Münchener Rückversicherungs-Gesellschaft © 2017 Munich Reinsurance Company