Oracle Advanced Analytics, Data Mining, Predictive Analytics, Big Data, Exalytics: What, Where, When?
Charlie Berger
Sr. Director Product Management, Data Mining and Advanced Analytics
Oracle Corporation
www.twitter.com/CharlieDataMine

Sources for "Big Data" are Growing
• 383+ million Twitter accounts (100m+ tweeting)
• 835+ million Facebook subscribers
• 1.2+ billion mobile Web users
• Over 6 million OnStar subscribers
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

Structured Data & "Big Data"
• Structured data from applications
• Semi-structured "Big Data" from social media and logs, sensors, feeds, etc.

"Big Data" + "Big Data Analytics"
"There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing." (Google CEO Eric Schmidt)
• 1.8 trillion gigabytes of data was created in 2011
• More than 90% is unstructured data
• Approx. 500 quadrillion files
• Quantity doubles every 2 years
Requires capability to rapidly:
• Collect and integrate
• Understand
• Respond and act
[Chart: gigabytes of data created (in billions), structured vs. unstructured data, 2005 to 2015. Source: IDC 2011. Content provided by Cloudera.]

"Big data" warrants innovative processing solutions for a variety of new and existing data to provide real business benefits. But processing large volumes or wide varieties of data remains merely a technological solution unless it is tied to business goals and objectives.
Customer Information Suddenly Worth Billions
There's another mining boom you may have missed
http://www.theage.com.au/technology/technology-news/digging-for-data-the-new-mining-boom-20120511-1yhu5.html#ixzz1uwlsl0gk
"It's about building algorithms and crunching facts and numbers. It's mining for data. Big data is the new business black. It's a catch-all phrase for the billions of transactions and other bits of information about their customers, suppliers and operations logged by businesses and governments the world over every day. Yesterday's storage problem has become today's strategic asset. Turns out there's gold in them thar files. Enterprises are using data analysis not just to improve their everyday business processes, but also to build predictive models of consumer behavior. Retailers, telcos, airlines, hotels, health care and credit card companies are among those with information-rich customer data."

Sample of Big Data Analytics Use Cases Today
Challenged by: Data Volume, Velocity, Variety
• AUTOMOTIVE: Auto sensors reporting location, problems
• HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; warranty analysis
• OIL & GAS: Drilling exploration sensor analysis
• COMMUNICATIONS: Location-based advertising
• CONSUMER PACKAGED GOODS: Sentiment analysis of what's hot, problems
• LIFE SCIENCES: Clinical trials; genomics
• MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
• RETAIL: Consumer sentiment; optimized marketing
• TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; customer sentiment
• FINANCIAL SERVICES: Risk & portfolio analysis
• ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; web-site optimization
• UTILITIES: Smart Meter analysis
• EDUCATION & RESEARCH: Experiment sensor analysis
• HEALTH CARE: Patient sensors, monitoring, EHRs; quality of care
• LAW ENFORCEMENT & DEFENSE: Threat analysis, social media monitoring, photo analysis

Oracle Big Data and Big Data Analytics Platform
Accelerate time to market & reduce risk with end-to-end solution
Stages: Stream, Acquire, Organize / Discover, Analyze, Visualize / Decide
Oracle is the industry leader in database and information management. Oracle provides all the components you need to get real results from your big data initiatives.

Oracle Exalytics
BI at the Speed of Thought
• Oracle Exalytics In-Memory Machine is the world's first engineered system specifically designed to deliver high performance analysis, modeling and planning
• Built using industry-standard hardware, market-leading business intelligence software and in-memory database technology
• Oracle Exalytics is an optimized system that delivers answers to all your business questions with unmatched speed, intelligence, simplicity and manageability
• Oracle Exalytics delivers BI and Advanced/Predictive Analytics at the Speed of Thought
[Diagram labels: Advanced/Predictive Analytics, R]

Oracle Real-Time Decisions
Powering the Intelligent Enterprise and Decision Framework
• Decision Management
  – Collaborative environment to define decision management strategies
  – Business user controls over decision optimization logic
  – Cross-channel customer experience management framework
• Learning Engine
  – Automatically learn from each interaction and discover important correlations
  – Learning can be analyzed by way of user friendly reports
  – Learning can be used to make predictions
  – Can be deployed independently from decision engine
• Decision Engine
  – Combines rules and [automated] predictive models to define contextual, optimal and personalized decision logic
  – Decision logic is highly scalable and self-adjusts based on company defined performance goals
  – Can make SQL calls to previously built ODM models
[Diagram labels: Choices / Assets; Real-time Context + Historical Data; Performance Goals; Rules & Predictive Models; Decisions; Closed-Loop Learning; Insight & Foresights; Recommendations]

Oracle Advanced Analytics, Data Mining, Predictive Analytics, Big Data, Exalytics: What, Where, When?

"But the bigger you get, the more likely you are to be doing extensive data mining and the more likely you are to be implementing or moving towards in-database analytics."
Quote from "Customary Data Warehouse Concepts vs. Hadoop: Forrester Makes the Call", Mark Brunelli, Senior News Editor. Published: 11 Aug 2011
http://searchdatamanagement.techtarget.com/news/2240039468/Customary-data-warehouse-concepts-vs-Hadoop-Forrester-makes-the-call?vgnextfmt=print
Oracle's Big Data / Big Data Analytics Integrated Solution
[Diagram: components laid out across four stages]
Stages: Acquire, Organize & Discover, Analyze, Decide
Components shown: Oracle Big Data Appliance, Cloudera Hadoop, Oracle NoSQL, Open-Source R, Big Data Connectors, Oracle Data Integrator, InfiniBand, Oracle Exadata, Oracle Database, Oracle Advanced Analytics (ODM + ORE), Endeca Information Discovery, Oracle Exalytics, Oracle Business Intelligence, Oracle Real-Time Decisions

Oracle Advanced Analytics: What, Where, When?
[Decision flowchart; the Yes/No branch structure did not survive extraction.]
Questions asked:
• Do I have some data and a problem to solve?
• Does the data fit on a piece of paper?
• Is the data stored in an Oracle Database?
• Is the data outside of the database?
• Do you need real-time predictions?
• Are you looking for complex patterns and relationships?
• Is the problem mostly sums, %s, pie charts and maybe a map?
• Is OBIEE fast enough?
Outcomes:
• Print it out or use Excel
• Use OBIEE
• Use Oracle Data Mining and/or Oracle R Enterprise
• Consider storing data in Oracle
• Consider RTD
• Consider Predictive Analytics
• Consider Exalytics
• Bummer!
Inspired by the "Do I Need Hadoop or SQL?" decision flow chart: https://s3.amazonaws.com/aaroncordova-published/DataFlowchart.svg

Data Mining / Predictive Analytics
Overview and Value
• Predictive analytics enables you to develop mathematical models to help you better understand the variables driving success
• Predictive analytics relies on formulas that compare past successes and failures, and then uses those formulas to predict future outcomes
• Predictive analytics, pattern recognition, and classification problems have been long used in the financial services and insurance industries
• Predictive analytics is about using statistics, data mining, and game theory to analyze current and historical facts in order to make predictions about future events.
• The value of predictive analytics is obvious.
  – The more you understand customer behavior and motivations, the more effective your marketing will be.
  – The more you understand why some customers are loyal and how to attract and retain different customer segments, the more you can develop relevant, compelling messages and offers.
http://www.marketingprofs.com/articles/2010/3567/the-nine-most-common-data-mining-techniques-used-in-predictive-analytics

Advanced Analytics
Becoming Strategic and Mission Critical
Competing on Analytics, by Tom Davenport
"Some companies have built their very businesses on their ability to collect, analyze, and act on data."

Oracle Advanced Analytics Option
Extending the Database into a Comprehensive Advanced Analytics Platform
• Big Data Analytics: Oracle Advanced Analytics Option
  – Oracle Data Mining
    • SQL & PL/SQL focused in-database data mining and predictive analytics
  – Oracle R Enterprise
    • Integrates Open Source R statistical programming language with the Oracle Database
• STRATEGY:
  – Extend Oracle Database into comprehensive adv. analytics platform
    • More than a "tool" for data analysts
  – Enable "next-gen" enterprise-wide advanced analytical applications
Oracle In-Database Advanced Analytics
Comprehensive Advanced Analytics Platform
Oracle R Enterprise
• Popular open source R statistical programming language & environment
• Integrated with database for scalability
• Wide range of statistical and advanced analytical functions
• R embedded in enterprise appls & OBIEE
• Exploratory data analysis
• Extensive graphics
• Open source R (CRAN) packages
• Integrated with Hadoop for HPC
Oracle Data Mining
• SQL kernel; automated knowledge discovery inside the Database
• 12 in-database data mining algorithms
• Text mining
• Predictive analytics applications development environment
• Star schema and transactional data mining
• Exadata "scoring" of ODM models
• SQL Developer/Oracle Data Miner GUI
[Diagram labels: Statistics, Advanced Analytics, Data & Text Mining, Predictive Analytics]

Independent Samples T-Test (Pooled Variances)
• Query compares the mean of AMOUNT_SOLD between MEN and WOMEN within CUST_INCOME_LEVEL ranges. Returns observed t value and its related two-sided significance.

    SELECT substr(cust_income_level,1,22) income_level,
           avg(decode(cust_gender,'M',amount_sold,null)) sold_to_men,
           avg(decode(cust_gender,'F',amount_sold,null)) sold_to_women,
           stats_t_test_indep(cust_gender, amount_sold, 'STATISTIC','F') t_observed,
           stats_t_test_indep(cust_gender, amount_sold) two_sided_p_value
    FROM sh.customers c, sh.sales s
    WHERE c.cust_id=s.cust_id
    GROUP BY rollup(cust_income_level)
    ORDER BY 1;

[Screenshot: SQL Plus]

Data Mining Provides Better Information, Valuable Insights and Predictions
Cell Phone Churners vs.
Loyal Customers
• Segment #1: IF CUST_MO > 14 AND INCOME < $90K, THEN Prediction = Cell Phone Churner, Confidence = 100%, Support = 8/39
• Segment #3: IF CUST_MO > 7 AND INCOME < $175K, THEN Prediction = Cell Phone Churner, Confidence = 83%, Support = 6/39
[Chart: customers plotted against Customer Months; callout: Insight & Prediction]
Source: Inspired by Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management by Michael J. A. Berry, Gordon S. Linoff

Data Mining Provides Better Information, Valuable Insights and Predictions
Cell Phone Fraud vs. Loyal Customers
[Chart: customers plotted against Customer Months; callout: ?]
Source: Inspired by Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management by Michael J. A. Berry, Gordon S. Linoff

Finding Needles in Haystacks
• Haystacks are usually BIG
• Needles are typically small and rare

Look for What is "Different"

A Real Fraud Example
• My actual credit card statement. Can you see the fraud?

    Date     Time      Type  Merchant      Amount
    May 22    1:14 PM  FOOD  Monaco Café   $127.38
    May 22    7:32 PM  WINE  Wine Bistro    $28.00
    …
    June 14   2:05 PM  MISC  Mobil Mart     $75.00
    June 14   2:06 PM  MISC  Mobil Mart     $75.00
    June 15  11:48 AM  MISC  Mobil Mart     $75.00
    June 15  11:49 AM  MISC  Mobil Mart     $75.00
    May 28    6:31 PM  WINE  Acton Shop     $31.00
    May 29    8:39 PM  FOOD  Crossroads    $128.14
    June 16  11:48 AM  MISC  Mobil Mart     $75.00
    June 16  11:49 AM  MISC  Mobil Mart     $75.00

Callouts: Monaco? Gas Station? All same $75 amount? Pairs of $75?
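
The "pairs of $75" pattern in the statement above can be checked mechanically. The following is a minimal sketch in Python (chosen so that it runs without an Oracle database; it is not the deck's method and does not use Oracle Data Mining). The rows are transcribed from the slide, the year is an assumption because the slide shows only month, day and time, and the function name `flag_repeated_charges` and the five-minute window are hypothetical choices made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Rows transcribed from the statement on the slide. The year is an
# assumption: the slide shows only month, day and time.
STATEMENT = [
    ("2012-05-22 13:14", "FOOD", "Monaco Café", 127.38),
    ("2012-05-22 19:32", "WINE", "Wine Bistro", 28.00),
    ("2012-06-14 14:05", "MISC", "Mobil Mart", 75.00),
    ("2012-06-14 14:06", "MISC", "Mobil Mart", 75.00),
    ("2012-06-15 11:48", "MISC", "Mobil Mart", 75.00),
    ("2012-06-15 11:49", "MISC", "Mobil Mart", 75.00),
    ("2012-05-28 18:31", "WINE", "Acton Shop", 31.00),
    ("2012-05-29 20:39", "FOOD", "Crossroads", 128.14),
    ("2012-06-16 11:48", "MISC", "Mobil Mart", 75.00),
    ("2012-06-16 11:49", "MISC", "Mobil Mart", 75.00),
]


def flag_repeated_charges(rows, window=timedelta(minutes=5)):
    """Return (merchant, amount, first_time, second_time) for each pair of
    consecutive charges with the same merchant and amount within `window`.

    Hypothetical helper for illustration only.
    """
    # Group charge times by (merchant, amount).
    charges = defaultdict(list)
    for stamp, _category, merchant, amount in rows:
        charges[(merchant, amount)].append(
            datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        )

    # Within each group, compare each charge with the next one in time.
    flagged = []
    for (merchant, amount), times in charges.items():
        times.sort()
        for first, second in zip(times, times[1:]):
            if second - first <= window:
                flagged.append((merchant, amount, first, second))
    return sorted(flagged, key=lambda pair: pair[2])


if __name__ == "__main__":
    for merchant, amount, first, second in flag_repeated_charges(STATEMENT):
        print(f"{merchant}: two ${amount:.2f} charges, "
              f"{first:%Y-%m-%d %H:%M} and {second:%H:%M}")
```

A fixed rule like this only finds a pattern someone already thought to look for; the deck's point is that data mining models look for whatever is "different" without the rule being written in advance.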
Oracle's In-Database Advanced Analytics Option
Value Proposition
• 10-100x PERFORMANCE
  – Integrated features of the Database
  – Perform analytics in-DB to eliminate data movement
  – Reduce information latency: from days or weeks to minutes or hours
• 10x LOWER TOTAL COST OF OWNERSHIP
  – Eliminate/minimize expensive annual usage fees associated with traditional stats/mining packages
  – Leverage Oracle DB, DW & BI technology platform

In-Database Data Mining
Traditional Analytics (hours, days or weeks)
• Steps shown: data extraction, data import, data preparation and transformation, data mining model building, data mining model "scoring"
• Stages shown: Source Data, Datasets / Work Area, Analytical Processing, Process Output, Target
Oracle Data Mining (secs, mins or hours)
• Data remains in the Database
• Embedded data preparation
• Cutting edge machine learning algorithms inside the SQL kernel of Database
• SQL: most powerful language for data preparation and transformation
• Steps shown: data preparation, model building, model "scoring" with embedded data prep
Results (savings)
• Faster time for "Data" to "Insights"
• Lower TCO: eliminates data movement and data duplication
• Maintains security

Big Data Analysis Example
Social Network Analysis (SNA)
• Identify social relationships
  – Communities, friends and families
  – Hubs, influencers, lone wolves
• SNA-based strategies for Acquisition, Retention, and Customer Value Growth
  – Word of mouth
  – Promote positive messages
  – Suppress negative effects
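
As an illustration of the "hubs" and "lone wolves" idea on the Social Network Analysis slide, the sketch below counts distinct contacts per subscriber. It is written in Python so that it runs standalone, and it rests on several assumptions: the call records and subscriber names are invented, the number of distinct contacts is used as a simple stand-in for influence, and the hub threshold of 3 is arbitrary. The deck does not say how hubs or influencers are actually identified.

```python
# Hypothetical call records (caller, callee); not data from the deck.
CALLS = [
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
    ("bob", "cat"), ("eve", "ann"), ("fay", "gus"),
]
# Hypothetical subscriber list; "hal" appears in no call record.
SUBSCRIBERS = ["ann", "bob", "cat", "dan", "eve", "fay", "gus", "hal"]


def contact_counts(subscribers, calls):
    """Count distinct contacts per subscriber, ignoring call direction."""
    contacts = {name: set() for name in subscribers}
    for caller, callee in calls:
        contacts.setdefault(caller, set()).add(callee)
        contacts.setdefault(callee, set()).add(caller)
    return {name: len(peers) for name, peers in contacts.items()}


def label_roles(counts, hub_threshold=3):
    """Label each subscriber as hub, lone wolf, or ordinary member.

    The threshold is an arbitrary illustrative choice.
    """
    roles = {}
    for name, count in counts.items():
        if count >= hub_threshold:
            roles[name] = "hub"
        elif count == 0:
            roles[name] = "lone wolf"
        else:
            roles[name] = "member"
    return roles


if __name__ == "__main__":
    counts = contact_counts(SUBSCRIBERS, CALLS)
    for name, role in sorted(label_roles(counts).items()):
        print(f"{name}: {counts[name]} contacts, {role}")
```

With labels like these, a retention or word-of-mouth campaign could be pointed at the hubs first, which is the kind of SNA-based strategy the slide lists.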
Big Data Analytics in Retail
Deeper Analytics: Consider each customer's demographics and past and most recent POS behavior
• POS data: shifting "market basket" items (Target Stores predicts that a woman is pregnant)
• Product description data: identify product clusters ("Green" products, "Favorite colors")
• Mine both and identify customer segments ("Country Squires", "Green", "New Empty Nests")
• Track and monitor shifts in customer behaviors and household purchases
• Target promotions for up-selling and cross-selling
• Anticipate customer's likelihood to respond and optimize selling strategies
• Deploy predictive models for real-time customer recommendations
Deeper Analytics: Leverage all customer touch-point information
• 1:1 Marketing: treat each customer as an individual relationship
• Look for opportunities to combine sales, service, web, call center, payment, etc. data
• Customer's comments with Reps telegraph customer's sentiment and needs
• Geo-localized information provides opportunity for real-time recommendations
• Purchases, changing consumption, provide opportunities for cross-selling/up-selling

Big Data Analytics in Communications
Deeper Analytics: Enhanced churn prediction with social network analytics
• Consider each customer's value as part of their social network
• Focus retention campaigns on high-value social networks
• Identify new prospective high-value customers and their "friends and families"
• Target promotions for up-selling and cross-selling to key social network influencers
• Identify rotational churners and exclude from retention offers
Deeper Analytics: Predictive network monitoring and anomaly detection
• Mine network traffic performance data
• Identify patterns in network behavior
• Proactively manage networks and customer service levels
• Real-time prediction of future degraded service levels
Big Data Analytics in Financial Services
Deeper Analytics: 360° view of the customer
• Integrate silos of multi-business CRM data within large corporations
• Combine customer data from multiple sources: investments, retail banking, mortgage
• Gain 360° perspective of all touch points with a customer
• Develop "best" customer profiles and sell them the right product at the right time
• Anticipate customer's needs for new products and services as their lifestyles change
Deeper Analytics: Identify and combat fraud
• Real-time check fraud
• Transactional data combined with demographic data
• Monitor velocity of recent purchases and amounts of checks written vs. historical averages
• Flag transactions and individuals that appear "different" from normal behavior

Big Data Analytics in Insurance
Deeper Analytics: Automated deep analytics for fraud and abuse in claims processing
• Include more data in analysis:
  – Transactional data: trends in frequency of previous claims and transactions, e.g. increasing rate of claims made and amount of claims
  – Unstructured data: assessors' reports, police reports, witness interviews, e.g. "fractured" + "wrist" << "broken" + "femur"
• Investigate claims that have the highest expected risk (P(fraud) x $$ claim)
• Focus scarce investigative resources and create feedback loop for automated analysis
[See http://www.information-management.com/issues/20030701/6995-1.html among other "text mining insurance claims" references]
Deeper Analytics: Individualized auto-insurance policies based on vehicle telematics
• Insurers gain insight into customer's driving habits
• More accurate assessments of risks
• Individualized pricing based on driving habits and risks
• Guide and motivate customers to improve their driving habits
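
The expected-risk rule on the insurance claims slide, P(fraud) x claim amount, amounts to a simple ranking. The sketch below shows that arithmetic in Python (used so the example runs standalone). The claim identifiers, probabilities and amounts are invented for illustration, and in practice the probability would come from a scoring model such as the ones discussed elsewhere in the deck; `rank_by_expected_risk` is a hypothetical name.

```python
# Hypothetical claims: (claim id, modeled fraud probability, claim amount in $).
# None of these values come from the deck.
CLAIMS = [
    ("C-101", 0.65, 2000.0),
    ("C-102", 0.10, 50000.0),
    ("C-103", 0.90, 500.0),
    ("C-104", 0.30, 12000.0),
]


def rank_by_expected_risk(claims, top_n=2):
    """Return the top_n (claim id, expected risk) pairs, highest risk first.

    Expected risk is the fraud probability times the claim amount.
    """
    scored = [(claim_id, p_fraud * amount) for claim_id, p_fraud, amount in claims]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for claim_id, risk in rank_by_expected_risk(CLAIMS):
        print(f"{claim_id}: expected risk ${risk:,.2f}")
```

In this made-up data the claim with the highest fraud probability (C-103) carries the least money at risk, so it falls below larger claims with lower probabilities. That is the reason for ranking on the product rather than on probability alone when investigative resources are scarce.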
Big Data Analytics
High Performance Operations
Deeper Analytics: Learn from manufacturing, warranty, service data
• Devices report back product's performance
• Analyze usage and part failure correlations and patterns
• Identify new strategies for improved product design and service plans
• Increase product uptime, performance and quality
Deeper Analytics: Characterize and understand all product performance scenarios
• Streaming data from multiple sensors, weather, water, etc.
• Clustering and response modeling to optimize each scenario
"The USA holds 250 sensors to collect raw data: pressure sensors on the wing; angle sensors on the adjustable trailing edge of the wing sail .... … But collecting data was only the beginning. BMW ORACLE Racing also had to manage that data, analyze it, and present useful results. The team turned to Oracle Data Mining in Oracle Database 11g."

SQL Developer 3.0/Oracle Data Miner 11g Release 2 GUI
New GUI
• Graphical User Interface for data analyst
• SQL Developer Extension (OTN download)
• Explore data: discover new insights
• Build and evaluate data mining models
• Apply predictive models
• Share analytical workflows
• Deploy SQL Apply code/scripts

Oracle Data Miner Nodes (Partial List)
• Tables and Views
• Transformations
• Explore Data
• Modeling
• Text

Oracle Data Mining and Unstructured Data
• Oracle Data Mining mines unstructured, i.e. "text", data
• Include free text and comments in ODM models
• Cluster and classify documents
• Oracle Text used to preprocess unstructured text

Oracle Data Miner 11g Release 2 GUI
Churn Demo: Simple Conceptual Workflow
Churn models to predict and "profile" likely churners
Fraud Prediction Demo

    drop table CLAIMS_SET;
    exec dbms_data_mining.drop_model('CLAIMSMODEL');

    -- Settings table: algorithm choice and automatic data preparation
    create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
    insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
    insert into CLAIMS_SET values ('PREP_AUTO','ON');
    commit;

    -- Classification with a NULL target and SVM builds a one-class
    -- (anomaly detection) model on the CLAIMS table
    begin
      dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
        'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
    end;
    /

    -- Top 5 most suspicious fraud policy holder claims
    -- (class '0' is the atypical class of the one-class model)
    select * from
      (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
              rank() over (order by prob_fraud desc) rnk
       from (select POLICYNUMBER,
                    prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
             from CLAIMS
             where PASTNUMBEROFCLAIMS in ('2to4', 'morethan4')))
    where rnk <= 5
    order by percent_fraud desc;

    POLICYNUMBER PERCENT_FRAUD        RNK
    ------------ ------------- ----------
            6532         64.78          1
            2749         64.17          2
            3440         63.22          3
             654          63.1          4
           12650         62.36          5

Automated Monthly "Application"! Just add:

    Create View CLAIMS2_30 As
    Select * from CLAIMS2
    Where mydate > SYSDATE - 30

Real-time Prediction for a Customer
• On-the-fly, single record apply with new data (e.g. from call center)

    Select prediction_probability(CLAS_DT_1_1, 'Yes'
      USING 7800 as bank_funds,
            125 as checking_amount,
            20 as credit_balance,
            55 as age,
            'Married' as marital_status,
            250 as MONEY_MONTLY_OVERDRAWN,
            1 as house_ownership)
    from dual;

[Diagram: channels around "Get Advice": Call Center, Social Media, Branch, ECM, BI, Web, Email, CRM]
[Diagram label, continued: Mobile]

RTD Calling In-Database Predictive Analytics
• When appropriate, RTD can make SQL queries requesting retrieval of previously built in-database predictive models OR additional real-time ODM predictions based on current data
RTD:
• Operationalize decisions
• Self-learning models
• Arbitration of scores and KPIs
[Diagram labels: RTD; SQL Call; ODM & ORE; RTD Score Returned; In-Database Algorithms and Data Mining; Realtime Scoring Platform]

Exadata + Data Mining 11g Release 2
"DM Scoring" Pushed to Storage! Faster
• In 11g Release 2, SQL predicates and Oracle Data Mining models are pushed to storage level for execution
For example, find the US customers likely to churn:

    select cust_id
    from customers
    where region = 'US'
    and prediction_probability(churnmod, 'Y' using *) > 0.8;

Predictive Analytics Applications (Partial list as of 6/12)
• Oracle CRM Sales Prospector
  – SaaS: prediction of sales opportunities, what to sell, amount, timing
• Fusion CRM Sales Prediction Engine
  – Sales Prediction: prediction of sales opportunities, what to sell, amount, timing, etc.
• Oracle Fusion Human Capital Management (HCM)
  – Predictive Workforce: employee turnover and performance prediction and "What if?" prediction
• Oracle Industry Data Models: factory installed data mining for specific industries
  – Communications Data Model: churn, segmentation, profiling, etc.
  – Retail Data Model: market basket analysis, loyalty, etc.
  – Airline Data Model: frequent flyer loyalty, FF profiles, targeted promotions, etc.
• Oracle Spend Classification: auto review/real-time correction of submission mistakes
• Oracle Identity Management
  – Adaptive Access Manager: real-time security at user login
• Oracle FMW
  – Complex Event Processing integrated with ODM models
• Oracle Advanced Customer Support
  – Predictive Incident Monitoring (PIM) Service for Oracle Database customers
• Oracle Retail Customer Analytics
  – Market basket analysis application for Retail GBU

CRM Sales Prospector / Fusion Sales Prediction Engine
Factory Installed PA/ODM Methodologies
Oracle Sales Prospector
• Oracle Data Mining predicts likelihood of purchases
• Oracle Data Mining recommends products customer is likely to buy
• Oracle Data Mining suggests likely references
• ODM predictions exposed via Social CRM Dashboards
• Oracle Database 11g: Social CRM schema ships with Oracle Database EE 11g + Data Mining Option

Oracle Communications Industry Data Model Example
Better Information for OBIEE Dashboards
ODM's predictions & probabilities are available in the Database for reporting using Oracle BI EE and other tools

Exadata with Analytics and Business Intelligence
Better Together
• In-database data mining builds predictive models that predict customer behavior
• OBIEE's integrated spatial mapping shows where customers are "most likely" to be HIGH and VERY HIGH value customers in the future

Fusion HCM Predictive Analytics
Factory Installed PA/ODM Methodologies
Oracle Data Mining's factory-installed predictive analytics show employees likely to leave, top reasons, expected performance and real-time "What if?" analysis
Retail GBU (formerly Retek, GA in Q1FY13)
Market Basket Analysis
Market Basket Analysis to identify co-occurring items found in "baskets" and potential product bundles

R Statistical Programming Language
• Open source language and environment
• Used for statistical computing and graphics
• Strength in easily producing publication-quality plots
• Highly extensible with open source community R packages

Oracle R Enterprise Compute Engines
1. User R Engine on desktop
• R-SQL Transparency Framework intercepts R functions for scalable in-database execution
• Function intercept for data transforms, statistical functions and advanced analytics
• Interactive display of graphical results and flow control as in standard R
• Submit entire R scripts for execution by Oracle Database
2. Database Compute Engine
• Scale to large datasets
• Access tables, views, and external tables, as well as data through DB LINKS
• Leverage database SQL parallelism
• Leverage new and existing in-database statistical and data mining capabilities
3. R Engine(s) spawned by Oracle DB
• Database can spawn multiple R engines for database-managed parallelism
• Efficient data transfer to spawned R engines
• Emulate map-reduce style algorithms and applications
• Enables "lights-out" execution of R scripts
[Diagram labels: R Engine; Oracle Database; User tables; SQL; Results; Open Source R; Other R packages; Oracle R Enterprise packages]

R Graphics

    R> boxplot(split(CARSTATS$mpg, CARSTATS$model.year), col = "green")

MPG increases over time…
Oracle R Enterprise Statistics Engine
https://blogs.oracle.com/R/entry/introduction_to_the_ore_statistics
Significance Tests
• Chi-square, McNemar, Bowker
• Simple and weighted kappas
• Cochran-Mantel-Haenszel correlation
• Cramer's V
• Binomial, KS, t, F, Wilcox
Distribution Functions (density function, probability function, quantile)
• Beta distribution
• Binomial distribution
• Cauchy distribution
• Chi-square distribution
• Exponential distribution
• F-distribution
• Gamma distribution
• Geometric distribution
• Log Normal distribution
• Logistic distribution
• Negative Binomial distribution
• Normal distribution
• Poisson distribution
• Sign Rank distribution
• Student t distribution
• Uniform distribution
• Weibull distribution
Other Functions
• Gamma function
• Natural logarithm of the Gamma function
• Digamma function
• Trigamma function
• Error function
• Complementary error function
Base SAS Equivalents
• Freq, Summary, Sort
• Rank, Corr, Univariate

Oracle R Enterprise
ARIMA Forecasting Script

    year200801 <- ONTIME_S[(ONTIME_S$YEAR==2008) & (ONTIME_S$MONTH==1),]
    y <- ore.pull(year200801)
    gc()
    delays <- tapply(y$ARRDELAY, y$DAYOFMONTH, mean, na.rm=TRUE)
    delays <- ts(delays, start=1, end=31, frequency=1)
    # Create a Kalman filter with the first 5 delays and predict the rest
    preds <- c()
    ses <- c()
    # 1 step predictions
    for (i in 5:length(delays)) {
      fit <- arima(delays[1:i], c(1,2,1))
      # predict 1 step into the future.
      pred <- predict(fit)
      preds <- c(preds, pred$pred)
      ses <- c(ses, pred$se)
    }
    plot(5:length(delays), preds, type='l', col='green',
         ylim=range(c(preds+2*ses, preds-2*ses)),
         xlab="Day of month",
         ylab="Predicted average delay (in minutes)",
         main="Average delays by day for January 2008")
    lines(5:length(delays), preds+2*ses, col='red')
    lines(5:length(delays), preds-2*ses, col='red')
    points(5:length(delays), as.vector(delays[5:length(delays)]))
    # Legend colors match the points (black), prediction (green) and bounds (red)
    legend(23, -8, c("Delay", "Predicted delay", "2 se confidence"),
           col=c("black", "green", "red"), lty=c(0, 1, 1), pch=c(1, -1, -1), merge=TRUE)

Oracle BI Applications
Comprehensive, Prebuilt, Best Practice Analytics
Industries: Auto, Comms & Media, Complex Mfg, Consumer Sector, Energy, Financial Services, High Tech, Insurance & Health, Life Sciences, Public Sector, Travel & Trans
• Sales: Pipeline Analysis, Forecast Accuracy, Sales Team Effectiveness, Up-sell / Cross-sell, Cycle Times, Lead Conversion
• Service & Contact Center: Service Effectiveness, Customer Satisfaction, Resolution Rates, Service Rep Efficiency, Service Cost, Churn & Service Trends
• Marketing: Campaign Effectiveness, Customer Insight, Product Propensity, Loyalty & Attrition, Market Basket Analysis, Campaign ROI
• Procurement & Spend: Direct / Indirect Spend, Buyer Productivity, Off Contract Purchases, Supplier Performance, Purchase Cycle Time, Employee Expenses
• Supply Chain & Order Management: Revenue and Backlog, Inventory, Fulfillment Status, Customer Status, Order Cycle Time, BOM Analysis
• Financials: General Ledger, Accounts Receivable, Accounts Payable, Cash Flow, Profitability, Expense Management
• Human Resources: Employee Productivity, Compensation, Compliance Reporting, Workforce Profile, Retention Analysis, Return on Human Capital
Callout: Opportunities for predictive analytics/data mining
Source adapters: [names not recoverable] and Other Operational & Analytic Sources
Platform: Oracle BI Suite Enterprise Edition Plus
Oracle In-Database Advanced Analytics
Comprehensive Advanced Analytics Platform
Oracle R Enterprise
• Popular open source statistical programming language & environment
• Integrated with database for scalability
• Wide range of statistical and advanced analytical functions
• R embedded in enterprise appls & OBIEE
• Exploratory data analysis
• Extensive graphics
• Open source R (CRAN) packages
• Integrated with Hadoop for HPC
Oracle Data Mining
• Automated knowledge discovery inside the Database
• 12 in-database data mining algorithms
• Text mining
• Predictive analytics applications development environment
• Star schema and transactional data mining
• Exadata "scoring" of ODM models
• SQL Developer/Oracle Data Miner GUI
[Diagram labels: Statistics, Advanced Analytics, Data & Text Mining, Predictive Analytics]

Oracle Advanced Analytics, Data Mining, Predictive Analytics, Big Data, Exalytics: What, Where, When?
Answers:
1. Oracle Advanced Analytics Option
2. In-database
3. Start now!
[Chart repeated from the opening slides: gigabytes of data created (in billions), 2005 to 2015; 1.8 trillion gigabytes created in 2011; more than 90% unstructured; approx. 500 quadrillion files; quantity doubles every 2 years]

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.