Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Center for International Earth Science Information Network - CIESIN - Columbia University. 2016. Gridded Population of the World, Version 4 (GPWv4): Population Count. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://dx.doi.org/10.7927/H4X63JVC. Big Data Analytics @ Munich Re SAS Global Forum Executive Program - Orlando Wolfgang Hauner Chief Data Officer, Munich Re Marc Wewers IT Architect, Munich Re Agenda 1 Data Analytics Framework 3 Method Example: AI 2 Technology 4 Case Study: Cross Selling 2 Big Data in Trend Radar Augmented and virtual worlds 3D Printing Computing Everywhere Industrialization 4.0 Big Data Loc-based services Telematics Smart Home Digitalization Wearable Devices Predictive Analytics Robotics/Drones Internet of Things Open Data Collaborative Consumption Big Data Citizen Development Mobile Health Services Crowdsourcing Virtual Assistant Systems Digital Identity Integrated Systems Cybersecurity Automated Decision Taking User Centered Design On-Demand-Everything Context-aware Computing Autonomous Systems and Devices Digitization Risk-based Security Web 4.0 Web-Scale IT Internet of Things Software-defined Anything Haptic Technologies New Payment Models Cloud/Client Architecture 3 When does it become BIG Data Zettabyte Exabyte Petabyte Terabyte Gigabyte Megabyte Kilobyte Byte 40,000,000,000,000,000,000,000 Google, Facebook, Microsoft, … All words ever spoken by humans Yes or No Petabyte storage big data platform 4 TB in Memory Big Data Platform MR Data contained in a library floor 3.5 inch floppy disk 4 KB Commodore VC 20 43 zettabytes of data will probably be generated by 2020 300 times the volume in 2005 Source: IBM 4 Big Data Analytics is a Combination of Methods, Technology, Data and People Technology Data Hardware (Compute power) Internal Data Software (SAS, R, Spark, …) External Data Structured Data Unstructured Data Methods People Regression Models Data Scientists Machine Learning Models Data Engineers Text Mining Big Data Analytics Business People IT Architects 5 Building the Team, and the Environment Visualization Modelling Data Storage Programming BusinessUnits Business-/ Domain knowledge System Implementation IT Maths Story-telling DB Administration Statistics 6 Building the infrastructure BI Lab Production A2P User Interface User Interface User Interface User Interface User Interface User Interface HANA Hadoop Stack SAS HANA Hadoop Stack SAS HANA HANA Long term unstructured and structured data Data Lake (HDFS) Industry icon, tool box and image: used under license from shutterstock.com Other icons: Munich Re 7 Source: Munich Re Wolfgang Hauner / Big Data Analytics & Artificial Intelligence 8 Which topics drive our clients? Supply Chain Predictive UW Churn Analysis Textmining Data Sources Up-/CrossSelling Social Media Analysis Fraud Detection Big Data Technology Telematics Geospatial Sensor Data/IoT 9 Big Data use cases in insurance Make the uninsurable insurable Diabetics Wind Energy Image: dpa Picture Alliance Image: Getty Images Consolidate the information and process Automated underwriting Risk management platform Artificial Intelligence supported workflow Image: used under license from shutterstock.com Image: used under license from shutterstock.com Early Loss Detection Visual Loss Adjustment Image: used under license from shutterstock.com Image: Getty Images 10 Agenda 1 Data Analytics Framework 3 Method Example: AI 2 Technology 4 Case Study: Cross Selling 11 Design principles for Big Data & Analytics Platform Self-Service SAS & Hortonworks Automation Multi Tenancy One Central Datalake Hybrid DevOps Continious improvement On-Prem & Cloud 12 Roadmap to Production via Lab environment Design Setup / Build Setup of first BI-Lab Hadoop Cluster Q2 - 2015 Enhance / Optimize Run Enhance / Optimize Setup of new BI-Lab Hadoop Cluster Q3 - 2015 Stabilization of BI-Lab Hadoop cluster Authentication & Security Automation On-boarding & support of Big Data & Analytics pilots Q4 - 2015 Pilot SAS – Hadoop Integration New BI-Lab Hadoop Cluster available • Large shared cluster • Dedicated clusters • Single-Node cluster 13 Building the Big Data & Analytics Platform Production Environment Design Setup / Build Run Enhance Optimize Enhance / Optimize • • • • SAS 9.4 M3 SAS Visual Analytics (VA) Self-Service Data Upload SAS Embedded Process for Hadoop • • • SAS Enterprise Guide (EG) SAS MS Office Add-in Data Access to SAP HANA, Oracle & MS SQL-Server • • • SAS Enterprise Miner SAS Contextual Analysis SAS Mobile BI iOS App • • SAS VA Row Level Security SAS HA • • • • • HDP 2.3 Hue Hive Ambari Ranger with LDAP • • • • Sqoop Pig Spark 1.4 Oozie • • • • • HDP 2.4.2 Ambari Views Spark 1.6 Solr Cloud Tesseract • • • HDP 2.5 Atlas Zeppelin • • • Data Catalogue Tool Data Lineage Compliance & Security 2016 12 01 Start setup platform 2017 02 03 04 05 06 07 08 09 10 2-week iterations with Rolling Upgrades Release v1.0 Release v2.0 Release v3.0 11 12 Release v4.0 14 Big Data & Analytics Production Environments IT and Business Deployment Scheduled Analytics Self-Service Ad-hoc Analytics Integration Sandbox IT Deployment Production Business Deployment Big Data & Analytics Production Environments Scalability Scalability SAS Production HWX Integration Sandbox LASR LASR EP Hive YARN EP Hive YARN LASR LASR EP Hive YARN EP Hive YARN SAS LASR EP Hive YARN EP EP EP EP Hive YARN Hive YARN Hive YARN Hive YARN EP Hive YARN EP Hive YARN EP Hive YARN EP EP Hive YARN Hive YARN LASR „Simplified“ Server Architecture SAS and HDP SAS Mmgt & Metadata SAS In-Memory LASR SAS In-Memory LASR SAS In-Memory LASR SAS In-Memory LASR x<y EP = Embedded Process “bring calculation to data” Ambari Views, Zeppelin, … Hadoop Frontend Hadoop Mmgt & Metadata SAS EP SAS EP SAS EP SAS EP SAS EP SAS EP SAS EP Spark Spark Spark Spark Spark Spark Spark Solr Solr Solr Sol Solr Solr Solr Hive Hive Hive Hive Hive Hive Hive YARN YARN YARN YARN YARN YARN YARN HDFS HDFS HDFS HDFS HDFS HDFS HDFS Data Node 1 Data Node 2 Data Node 3 Data Node x Data Node x+1 Data Node x+2 Data Node y SERVER / OS SERVER / OS SERVER / OS 17 Lessons learned 1 Make use of Lab environment 4 Automatization 2 Enable Self-Service 5 Security 3 Agile IT-Project management 6 YARN queue management 18 Agenda 1 Data Analytics Framework 3 Method Example: AI 2 Technology 4 Case Study: Cross Selling 19 Artificial Intelligence (AI) is coming … 1st Machine Age Automation of physical tasks Image: used under license from shutterstock.com 2nd Machine Age Automation of cognitive tasks Image: dpa Picture Alliance 20 AI Evolution Rule based Hybrid Image: dpa Picture Alliance / Stan_Honda Learning based Image: dpa Picture Alliance / Seth Wenig Image: dpa Picture Alliance / Lee Jin-man 1997 2011 2016 IBM’s deep blue defeats world chess champion IBM’s Watson AI system wins Jeopardy match against human players Google’s DeepMind defeats top ranked Go player (Lee Se-dol) Purely rule based Purely machine learning based Mixed machine learning and rule based 21 Insurance specific AI Insurance specific AI Image: used under license from shutterstock.com Image: dpa Picture Alliance Image: dpa Picture Alliance Image: used under license from shutterstock.com Munich Re as industry leader in insurancespecific-AI Image: used under license from shutterstock.com General AI Image: dpa Picture Alliance Google, Facebook, Microsoft, Open AI, IBM, Siemens, GE, Bosch, … Image: used under license from shutterstock.com Image: Getty Images Stanford, Oxford, MIT, Open-Source community, … Image: used under license from shutterstock.com 22 Methods Neural Network Input Hidden Output No pothole identified System of interconnected nodes, exchanging information Weights of connections can be adjusted by supervised/ unsupervised “learning” Image: Getty Images Image: Getty Images Pros: Accuracy usually high, prediction fast Pothole identified Image: used under license from shutterstock.com Image: used under license from shutterstock.com No pothole identified Image: used under license from shutterstock.com Cons: “Black box” – acquired knowledge not easily comprehensible, training effort high, appropriate data needed Application areas, e.g., speech recognition, computer vision, medical diagnosis, automated trading, game-playing (AlphaGo) Image: used under license from shutterstock.com 23 AI perspective in insurance Sales Underwriting Claims Insurance value chain Image: Emely / Cultura / Corbis AI trends Image: used under license from shutterstock.com Image: used under license from shutterstock.com Automated advisory Automated underwriting Smart claims support Complement face-to-face interaction Leverage all data and resources Enable fast response Enable innovative distribution (e.g., P2P) Improve workflow and customer experience Serve low volume, high frequency segment Serve low volume, high frequency segment Obtain more detailed information Handle low volume, high frequency claims 24 Use case example Automated detection of triggers for insurance demand (for marketing campaigns) German Geo Analytics Project Project overview Using machine learning from geo data for insurance Implementation of algorithms for automated detection of triggers for insurance demand from high resolution aerial images, e.g., solar panels Image: dpa Picture Alliance Achievements and outlook Algorithm for automated detection of solar panels and satellite dishes implemented (high recognition performance) More detection capabilities (e.g., cars) and scaling to all of Germany easily possible and planned for next project phase Solar panels First regional pilot currently in discussion Image: dpa Picture Alliance 25 Agenda 1 Data Analytics Framework 3 Method Example: AI 2 Technology 4 Case Study: Cross Selling 26 Next Generation Sales Analytics Solutions: Data Processing & Data Enrichment Machine Learning Digital companies established sales analytics solutions to improve online-sales $$ Scoring & Result Visualisation Munich Re transfers this expertise to the insurance sector in order to predict new cross-selling opportunities Verwendung unter Lizenz von Shutterstock.com 27 Our solution includes four main steps to improve cross-selling for our client’s portfolio $$ Data Processing & Validation Client data is transmitted, cleansed, validated and enriched (using external and MR data). Machine Learning Using machine learning methods like “Random Forrest”, hundreds of decision trees will be generated simulating customer characteristics. Discuss Results with Client Campaign Design and Execution Results are discussed with client. Data Analysis is modified, if necessary, and a detailed report on the findings is sent to the client. Campaign is designed and tested on a small sample. Findings of this test are used to optimise the strategy. Campaign is carried out. 28 How does it work? The portfolio is clustered based on comparable characteristics We make use of the methodology: “Clients who bought this, also bought that!” Using machine learning methods, we identify typical clusters of insureds that bought target product, e.g. personal accident: €75 average premium 3 policies 36 years old 6 years with client Resides in N. Germany Afterwards, we match insureds that do not have the target product yet with the clusters calculated in the first step. This provides us with the individual probability of buying the target product for a single insured. Compare €85 average premium 2 policies 32 years old 4 years with client Resides in N. Germany Probability of Buying: 37% 29 The clusters are the result of a data-driven decision tree Description Private Liability = yes Group A: • N=300, Probability: 37% Private Liability = no Group B: • N=900, Probability: 18% Northern Germany Group C: • N=700, Probability: 9% Southern Germany Group D: • N=500, Probability: 3% Duration > 8 Group E: • N=500, Probability: 8% Duration ≤ 8 Group F: • N=250, Probability: 14% Personal Accident = yes Group G: • N=400, Probability: 9% Personal Accident = no Group H: • N=800, Probability: 2% Building Ins. = yes # Policies ≥ 2 Building Ins. = no Age > 55 # Policies < 2 Age ≤ 55 Clusters calculated are the leaf nodes of a decision tree. Each and every insured is now assigned to a certain leaf node according to his or her attributes. The buying probability of a customer is determined by which leaf node he or she is assigned. “Random Forest” is the predominant machine learning method in our sales analytics approach 30 The clusters are visualised in an interactive report which can be examined further Each leaf node corresponds to one combined tile diagram: Expected profit by sub-group Frequency Description DESCRIPTION By selecting a single tile, more insight in the Here, all insureds our composition of theof cluster client are visualized who is available. don’t have a HH. Again, the darker the color of each tile, the higher the Again, the darker the color of profit potential. each tile, the higher the profit The same concept applies potential. for size. Tile size corresponds to the number In addition, the bigger the tile, of more insureds withininsureds that the potential cluster. are in. Profit 31 The selected cluster is visualised and compared to the whole portfolio Portfolio visualisation: Description The output shows the distribution of different characteristics within the portfolio compared to the chosen cluster. A comparison of cluster characteristics against the average portfolio reveals new insights on targeted customers. 32 Additional interesting findings can be made by comparing the current hot spots of a product with the potential hot spots Geocoded distribution of personal accident Geocoded potential for personal accident 33 Campaign design and execution is an essential step to harvest cross-selling opportunities Define Campaigns from Analytics Results: A/B-Testing To ensure smooth transition and deployment of the analysis steps, MR supports your sales campaign: Discuss and set-up campaign strategy Integrate existing campaign management Select appropriate sales channels for campaign (preferred: online channels) Provide A/B-Testing approach to clarify on ideal customer approach Compare with recent campaign set-ups and optimise first tests Select top-scored insureds with highest potential according to … Insureds that don‘t have target product Sample A x x … existing approach Compare hit ratio … Munich Re‘s approach Sample B x We support campaign design & execution to ensure full leverage of our Sales Analytics Solution 34 Image: Bayerische Zugspitzbahn Bergbahn / Lechner Center for International Earth Science Information Network - CIESIN - Columbia University. 2016. Gridded Population AG of the World, Version 4 (GPWv4): Population Count. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://dx.doi.org/10.7927/H4X63JVC. Thank you! Contact: Wolfgang Hauner Chief Data Officer, Munich Re [email protected] Marc Wewers IT Architect, Munich Re [email protected] © 2017 Münchener Rückversicherungs-Gesellschaft © 2017 Munich Reinsurance Company