Big Data Analytics
Analysis of high-volume and unstructured Data
Stefan Weingaertner, DYMATRIX CONSULTING GROUP
KNIME Meetup Italia, 10th October 2013
KNIME Meetup Italia 2013 © DYMATRIX CONSULTING GROUP

Agenda
1 Company Introduction
2 Big Data - an Introduction
3 Big Data Analytics on high-volume Data
4 Big Data Analytics on unstructured Data
5 Livedemo: Advanced Email Classification
6 Q&A

Company Introduction

DYMATRIX – The analytical CRM Company
» Solution provider for Customer Intelligence, Marketing Automation and Advanced Predictive Analytics
» Consulting, development and implementation know how, based upon more than 900 projects with mid- and large cap companies across industries
» Goal- and client-oriented project execution based upon award winning, established solutions
» Owner managed and independent

Our Consulting Competence Centers

Business Intelligence
» Conception of (big) data warehouse and business intelligence architectures
» Planning & Forecasting
» Balanced Scorecard
» Enterprise Reporting Systems
» Dashboards
» Sales Controlling

Advanced Analytics
» Customer Segmentation
» Customer Value Analysis
» Propensity Modeling (Cross-/Upsell/Churn)
» Shopping Basket Analysis
» Credit Rating Analysis & Credit Scoring
» Text Mining
» Data Mining Automation
» Big Data Analytics

Campaign Management
» Design and Optimization of Campaign Processes and Workflows
» Implementation of Campaign Management Systems
» Integration of Data Mining Models in Campaign Processes
» Campaign Optimization
» Consulting & Implementation of Next Best Activity Processes

E-commerce insight
» Web Tracking
» Web Controlling
» Web Mining
» Real Time Recommendation
» Social Media Tracking & Analysis
» Web Performance Measurement
» Customer Journey Analytics

Analysis of client oriented processes
Initial situation –
Analysis – Conception of processes for customer retention and its optimization, customer reactivation and new customer activation – benchmarking against industry leaders

Solution Portfolio – The Customer Insight Suite

DynaCampaign
» Intelligent multi-touchpoint campaign management platform
» Planning, target group selection, execution and response measurement of campaigns
» Event-triggered realtime campaigning

DynaMine
» End-to-end automation of data mining processes
» Intelligent model management for automation of preprocessing, training & scoring of models

DynaCision
» Realtime decision management platform
» Design & execution of complex embedded decision processes

DynaSocial
» Social CRM platform to listen, track, identify and quantify customer needs and sentiments

Our KNIME Solution Nodes & KNIME Consulting Services

PMML2SQL / PMML2SAS Converter
» Convert PMML to executable SQL code for in-database scoring
» Convert PMML to executable SAS code for model scoring within SAS

Big Data Integration
» Access any Hadoop large-scale distributed batch processing infrastructure from KNIME
» Efficiently distribute large amounts of data & preprocessing across a set of machines

Uplift Modeling
» Predictive modeling nodes to predict the incremental response to marketing actions
» For up-sell, cross-sell, churn and retention activities

Interactive Scorecard Builder
» Interactive scorecard building nodes for the design of credit or marketing scorecards

KNIME Consulting Services
+ Business Consulting
+ Analytical Consulting
+ Technical Consulting
+ Trainings

References
» Telecommunication
» Travel, Transportation
» Retail, Service Provider
» Banks, Insurances
» Media
» Utilities, Industries, Public
Schwäbisch Hall

Big Data - an
Introduction

A Characterization of Big Data
[Diagram: Big Data spans Terabyte to Zettabyte (Volume), Batch to Streaming, and Structured to Structured & Unstructured]
Source: Understanding Big Data (Zikopoulos et al.), 2012

Challenge: Big Data Collection & Integration
[Diagram labels: Needs, Remember, Possibilities, Service & Support, Decisions, Usage, Approach, Delivery, Purchase]
Source: Phil Winters, 2011

Big Data Analytics: Learn, Target & Influence!
[Diagram labels: Needs, Remember, Possibilities, Service & Support, Decisions, Usage, Approach, Delivery, Purchase]
Source: Phil Winters, 2011

Big Data Analytics on high-volume Data
[Diagram repeated: A Characterization of Big Data]

[Architecture diagram, layers as labeled on the slide]
» Analytic Applications
» Big Data Access
» Hadoop Extensions: Hive, HBase, MapReduce Routines, Mahout
» Hadoop Core: Hadoop Distributed File System (HDFS), MapReduce
» Big Data Sources

[Architecture diagram, layers as labeled on the slide]
» Analytic Applications
» Big Data Analytics: PMML2SQL Converter
» Hadoop Extensions: Hive, HBase, MapReduce Routines, Mahout
» Hadoop Core: Hadoop Distributed File System (HDFS), MapReduce
» Big Data Sources

Big Data Analytics on unstructured Data
[Diagram repeated: A Characterization of Big Data]

Big Data is not just about structured data…
» 80% of the world’s data is unstructured.
» Unstructured data is growing at 15 times the rate of structured data.
Source: Google Trends April 6, 2012

Imagine…
» …to classify all customer related text messages by
» …to identify unknown trends
» …to identify cause and effect relations
» …to react on that information
e.g. Source / Origin, Sentiment, Technical Problems, Product or Service Needs, Business Transaction, Usability, Context, Competition etc.
The KNIME platform supports these efforts with comprehensive Text Analytics & Network Analytics capabilities!

Deutsche Telekom: Social Earthquake
Facebook Posts & Comments, March & April 2013
[Chart: Facebook posts & comments from 1 March to 26 April 2013, split into negative, neutral and positive; vertical axis from 0 to 1000]
» First rumours: limitation of bandwidth (21.3. – 23.3.)
» "DSL-Drossel": official press release on the limitation of bandwidth leads to a social earthquake (22.4. – 27.4.)

DYMATRIX Text Mining Process

DYMATRIX Text Mining Process (KNIME Text Processing)

Text Datasources
Datasources:
• Facebook
• Twitter
• Emails
• Data Provider like GNIP, Datasift etc.
• Crawled Data
• etc.
For Machine Learning:
• Provide Training Data for Classification (e.g. Sentiment)

Text Enrichment
Language Detection:
• English
• German
• Many more…
Language-specific NLP
POS Tagging:
• Penn Treebank Tagger
• STTS Tagger
Text Cleansing:
• Stop Words
• Punctuation
• Stemming
Sentiment Amplifier:
• Matching of Sentiment & Emoticon Dictionaries

Subject Matching
Text Tagging with any Subjects:
• Products
• Brands
• Business Transactions
• Service
• Complaints
• Requests
• etc.
Fuzzy Matching with Dictionary Tagger:
• Matching of Subject Dictionaries

Sentiment Classification
Text Vectorization:
• Creation of text predictors to predict sentiments
Machine Learning:
• Classification with Predictive Analytics (e.g.
Decision Tree)
Retraining Interface:
• Adjustment of misclassified messages for permanent optimization of classification

Information Delivery
Text Data Mart:
• Make information available in central Text Data Mart for visualization, alerting etc.
Fields of Application:
• Email-Routing
• Event triggered Campaign Management
• etc.

DYMATRIX Text Mining Process: Datasources
Access any Text Datasource to start the Text Mining Process
» Facebook
» Twitter
» Emails
» Crawler
» Data Provider like GNIP, Datasift etc.
Example post on the Vodafone UK Facebook fan page

DYMATRIX Text Mining Process: Text Enrichment

Original Facebook Message:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

Sentiment Amplifier:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap [----] signal but yet paying FULL monthly contract! Vodafone sort it.
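The Sentiment Amplifier step can be read as a dictionary lookup that marks sentiment-bearing tokens. The following is a minimal sketch in Python under stated assumptions: the dictionary entries, the signed strength scale, the marker format and the function names are illustrative and are not taken from the DYMATRIX or KNIME implementation.

```python
# Hypothetical sentiment and emoticon dictionary: term -> signed strength.
# Negative values mean negative sentiment; the scale is an assumption.
SENTIMENT_DICT = {"crap": -4, "great": 3, ":(": -2, ":)": 2}


def marker(strength: int) -> str:
    """Render a strength as a bracketed run of '+' or '-' signs."""
    sign = "+" if strength > 0 else "-"
    return "[" + sign * abs(strength) + "]"


def amplify(text: str) -> str:
    """Append a sentiment marker after every token found in the dictionary."""
    out = []
    for token in text.split():
        out.append(token)
        # Try the raw token first (emoticons), then the lower-cased word
        # with surrounding punctuation removed.
        key = token if token in SENTIMENT_DICT else token.lower().strip(".,!?")
        if key in SENTIMENT_DICT:
            out.append(marker(SENTIMENT_DICT[key]))
    return " ".join(out)


print(amplify("Wk 3 of crap signal but yet paying FULL monthly contract!"))
```

With the assumed strength of -4 for "crap", the sketch marks the token in the same style as the slide's example; a production dictionary would be far larger and language-specific.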
Penn Treebank POS Tagger (English Messages):
Why[WRB] not[RB] sort[VBG] your[PRP] signal[VBP] issues[VBZ] out[IN] instead[RB] of[IN] bringing[VBG] new[JJ] phones[NNS] !!!![SYM] Wk[NNP] 3[CD] of[IN] crap[NN] but[CC] yet[RB] paying[VBG] FULL[NNP] monthly[RB] contract[NN] ![SYM] Vodafone[NNP] sort[VBG] it[PRP] .[SYM]

Removal of Stop Words & Punctuation:
sort[VBG] signal[VBP] issues[VBZ] instead[RB] bringing[VBG] phones[NNS] Wk[NNP] 3[CD] crap[NN] paying[VBG] monthly[RB] contract[NN] Vodafone[NNP]

DYMATRIX Text Mining Process: Subject Matching

Original Facebook Message:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

Subject Matching (Fuzzy Matching):
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal [NETWORK] but yet paying FULL monthly contract! Vodafone sort it [COMPLAINT].
» BUSINESS TRANSACTION: Complaint
» NETWORK: No Signal
» PRODUCT: Nokia Lumia 925

DYMATRIX Text Mining Process: Sentiment Classification
Text Classification with Decision Tree

Original Facebook Message:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

Output from Text Enrichment
Text Vectorization (Transformation)
Predictors relevant for Text Classification, e.g.
- Emoticons positive/negative
- Fragments positive/negative
- Words positive/negative
- Author-related Inputs
- Length of message
- Likes
- Comments
- Other linguistic Inputs
Resulting Classification

DYMATRIX Text Mining Process: Information Delivery
» Visualization in DynaSocial
» Make information available in central Text Data Mart

Original Facebook Message:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
+ Sentiment + Business Transaction + Product Relevance + Network

Other Fields of Application
» Subject-oriented Email-Classification & Email-Routing

DYMATRIX Text Mining Process: KNIME Workflow

Benefits

KNIME Server: Develop once, deploy everywhere!
» Text Enrichment & Classification Workflows can be used for classification of any electronic text message (e.g. Social Content, Blogs, Emails).
» KNIME Server-based Text Enrichment & Classification Workflows can be deployed as a webservice and called easily from any other application.
Benefits
» Uniform sentiment and classification handling for all customer-related messages.
» Batch or realtime execution from any application.
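The idea of one enrichment and classification workflow serving every channel, in batch or in realtime, can be sketched as a single function that callers invoke per message or over a list. This is a pure-Python stand-in under stated assumptions, not the KNIME workflow: the stop-word list, the subject and sentiment dictionaries, the fuzzy-matching cutoff of 0.8 and the dictionary-vote sentiment rule are all hypothetical. The slides describe a decision tree trained on text predictors; the vote below only stands in for that model.

```python
from difflib import get_close_matches

# All dictionaries below are hypothetical examples.
STOP_WORDS = {"why", "not", "your", "out", "of", "but", "yet", "it", "the"}
SUBJECTS = {"signal": "NETWORK", "contract": "CONTRACT", "invoice": "BILLING"}
NEGATIVE = {"crap", "bad"}
POSITIVE = {"great", "thanks"}


def classify(text: str) -> dict:
    """Cleanse one message, tag subjects by fuzzy matching, guess sentiment."""
    # Text cleansing: lower-case, strip punctuation, drop stop words.
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]

    # Subject matching: tolerate small spelling differences (cutoff assumed).
    subjects = set()
    for token in tokens:
        for hit in get_close_matches(token, list(SUBJECTS), n=1, cutoff=0.8):
            subjects.add(SUBJECTS[hit])

    # Sentiment: simple dictionary vote, standing in for a trained model.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"subjects": sorted(subjects), "sentiment": sentiment}


def classify_batch(messages: list[str]) -> list[dict]:
    """Batch execution reuses the same function as the single-message call."""
    return [classify(m) for m in messages]


single = classify("Wk 3 of crap signal but yet paying FULL monthly contract!")
batch = classify_batch(["Great service, thanks!", "My signl is bad"])
```

Because both entry points share `classify`, a social post, a blog comment and an email receive the same handling, which is the point of the "uniform handling" benefit. The misspelled "signl" in the batch example shows why fuzzy rather than exact dictionary matching is useful for user-written text.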
Application Integration I: DynaSocial
Social Media Monitoring & Analytics

DynaSocial – Social Media Excellence Architecture
[Architecture diagram, components as labeled on the slide]
» Data Sources: Facebook, Twitter, Social Media Data Provider, Social Service Platforms, Client-specific Sources, Emails
» Social Media Analytics Content Extractor
» Social Media Analytics Data Management: Generic Big Data Model
» Advanced Social Media Analytics (Text Mining & Network Mining): Text Enrichment & Classification, Network Insights, Sentiments & Classifications
» Social Media Analytics Dashboard: Reports & Dashboard
» Social Engagement: Integrated Social Inbox including all Social Touchpoints
» DynaSocial Configuration Center

DynaSocial Management Dashboard
» Activities
» Platform Distribution
» Overall Sentiments
» Sentiment Ratio
» Trends compared to competition (Share of Voice)
» Top Keywords
» Key Influencer
» Geographic Distribution
» Flexible Selection of Time Windows
» …

DynaSocial Management Dashboard (Project Example)

Application Integration II: Advanced Email-Classification
Multidimensional realtime Email-Classification

Email Classification: MS Exchange Connector
[Diagram label: Microsoft Outlook]
1 Incoming Email
2 .NET Batch: Call .NET Procedure and transfer email contents to KNIME Server via Webservice Call.
3 KNIME Server: Call KNIME Text Enrichment & Classification Workflows and return classification results.
4 Classification results are returned to Exchange Server and are saved persistently with object categories.
5 Any clients having access to Exchange Server get the same classification.
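The five steps above amount to: classify once on the server side, store the result with the message, and let every client read the stored categories. The sketch below uses in-memory stand-ins only; `Email`, `MailStore`, `classify_remote` and the category names are hypothetical, and no real Exchange or KNIME Server API is called.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    message_id: str
    body: str
    categories: list[str] = field(default_factory=list)


def classify_remote(body: str) -> list[str]:
    """Stand-in for the webservice call to the classification workflow.

    A real connector would send the email contents over the network;
    here a trivial keyword rule returns hypothetical categories.
    """
    categories = []
    if "signal" in body.lower():
        categories.append("NETWORK")
    if "sort it" in body.lower():
        categories.append("COMPLAINT")
    return categories


class MailStore:
    """In-memory stand-in for the mail server's message store."""

    def __init__(self) -> None:
        self._messages: dict[str, Email] = {}

    def on_incoming(self, email: Email) -> None:
        # Steps 1-4: receive, classify via the service, persist categories.
        email.categories = classify_remote(email.body)
        self._messages[email.message_id] = email

    def categories_for(self, message_id: str) -> list[str]:
        # Step 5: every client reads the same stored classification.
        return list(self._messages[message_id].categories)


store = MailStore()
store.on_incoming(Email("m1", "Crap signal for weeks. Vodafone sort it."))
outlook_view = store.categories_for("m1")
webaccess_view = store.categories_for("m1")
```

Storing the categories with the message, rather than classifying in each client, is what lets different clients show identical results without any client-side logic.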
[Diagram labels: Microsoft Exchange Webservice, Microsoft Outlook Webaccess, Other Email-Clients]

Livedemo
Realtime Email-Classification

Q&A

Contact
DYMATRIX CONSULTING GROUP GmbH
Zeppelin Carré
Lautenschlagerstrasse 2
D-70173 Stuttgart
Your Contact: Stefan Weingaertner
Phone: +49.711.22.007.88 - 12
Fax: +49.711.22.007.88 - 88
E-Mail: [email protected]
Web: www.dymatrix.de

Thank you for your attention. We are happy to answer any of your questions!