BIWA SIG Wednesday TechCast Series
START TIME: 12 NOON Eastern

Data Mining Made Easy! Introducing Oracle Data Miner 11g Release 2 New "Workflow" GUI
Charlie Berger - Oracle

AUDIO DIAL-IN NUMBERS
US Toll-Free Number: 866 682 4770
Conference ID: 1683901
Security Code: 334451
International Toll-Free Numbers: http://www.intercall.com/oracle/access_numbers.htm

Copyright 2010 Oracle Corporation

BIWA SIG Wednesday TechCast Series
• Welcome to BIWA’s 21st TechCast!
• Visit www.oraclebiwa.org for updates on our future TechCasts
• Future TechCasts will include top-rated presentations from COLLABORATE 10 – IOUG Forum’s BIWA Training Days
• Everyone invited to present!

Oracle BIWA SIG Basics
• Worldwide association of 2000+ professionals interested in Oracle Database-centric BI, data warehousing, and analytical products, features and options.
• Membership is still FREE!
• Open forum to foster success in use and development of Oracle BIWA products.
• BIWA’s goals include sharing best practices and novel and interesting use cases of Oracle BIWA-centric technology
• Search the BIWA knowledgebase of past presentations/TechCasts
• See Mission Statement and Charter at oraclebiwa.org.
• National conferences in 2007, 2008, 2010

BIWA Training Days at COLLABORATE 10 - IOUG Forum
April 18-22, 2010, Las Vegas, NV
• COLLABORATE 10 – IOUG, OAUG, Quest: 5,000 attendees, 200+ Exhibits
• BIWA presented a conference within a conference called “Get Analytical with BIWA Training Days”
• Hands on Labs, BI Boot Camp, BI Deep Dives, Reception

60+ Sessions with topics covering:
• Data Warehousing: Optimizer, Partitioning, ETL
• OBIEE
• Oracle Data Mining
• OLAP
• Essbase
• Data Visualization
• BI Publisher

SUBMITTING a BIWA TechCast
• Any Oracle user or professional may submit abstracts for 45-min webcasts to IOUG Oracle BIWA SIG Community (Visit: www.oraclebiwa.org)
• Audience is technical
• Presenters are encouraged to include a significant amount of technical detail.
• Live demos are strongly encouraged

Today’s Presenter: Charlie Berger
• Senior Director for Data Mining Technologies at Oracle
• Heads product management for Oracle Database's data mining and predictive analytics technology:
  • Oracle Data Mining
  • Text mining
  • Statistical functions
• Winner of IOUG’s 2010 Ken Jacobs Award for User Community Contribution
• 20+ years experience in Data Mining and Analytics in technology leaders including Thinking Machines and BBN Software

Data Mining Made Easy! Introducing Oracle Data Miner 11g Release 2 New "Workflow" GUI
Charlie Berger
Sr. Director Product Management, Data Mining Technologies
Oracle Corporation
[email protected]
www.twitter.com/CharlieDataMine

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.
The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Oracle Data Miner 11gR2 (New)
• Workflow
• Multi-attribute statistics and graphs
• Multiple model build and evaluation
• SQL Developer
• SQL model deploy
• Ability to save/share analytical workflows
• Beta release available by OOW 2010

Oracle Data Miner “Classic”
• Mining Activities
• Wizards driven
• Univariate statistics and graphs
• Single model build and evaluation
• Available now for 10g through 11gR2

Oracle Data Miner 11gR2 Availability
• Customers
  • Beta release to coincide with SQL Developer 3.0 release at OOW 2010
  • Download from OTN
  • Watch ODM OTN pages, ODM Blog and Twitter for announcements
• Internal Oracle Employees
  • Available to download GUI and use hosted environment for customer demos
  • Contact Product Management, [email protected]
• Available on Amazon Cloud under special arrangement

What is Data Mining?
• Automatically sifts through data to find hidden patterns, discover new insights, and make predictions
• Data Mining can provide valuable results:
  • Predict customer behavior (Classification)
  • Predict or estimate a value (Regression)
  • Segment a population (Clustering)
  • Identify factors more associated with a business problem (Attribute Importance)
  • Find profiles of targeted people or items (Decision Trees)
  • Determine important relationships and “market baskets” within the population (Associations)
  • Find fraudulent or “rare events” (Anomaly Detection)

Analytics: Strategic and Mission Critical
• Competing on Analytics, by Tom Davenport
  • “Some companies have built their very businesses on their ability to collect, analyze, and act on data.”
  • “Although numerous organizations are embracing analytics, only a handful have achieved this level of proficiency. But analytics competitors are the leaders in their varied fields: consumer products, finance, retail, and travel and entertainment among them.”
• “Organizations are moving beyond query and reporting” - IDC 2006
• Super Crunchers, by Ian Ayres
  • “In the past, one could get by on intuition and experience. Times have changed. Today, the name of the game is data.” - Steven D. Levitt, author of Freakonomics
  • “Data-mining and statistical analysis have suddenly become cool.... Dissecting marketing, politics, and even sports, stuff this complex and important shouldn't be this much fun to read.” - Wired

• 11 years “stem celling analytics” into Oracle
• Designed advanced analytics into database kernel to leverage relational database strengths
• Naïve Bayes and Association Rules: 1st algorithms added
• Leverages counting, conditional probabilities, and much more
• Now, analytical database platform
• 12 cutting edge machine learning algorithms and 50+ statistical functions
• A data mining model is a schema object in the database, built via a PL/SQL API and scored via built-in SQL functions.
• When building models, leverage existing scalable technology (e.g., parallel execution, bitmap indexes, aggregation techniques) and add new core database technology (e.g., recursion within the parallel infrastructure, IEEE float, etc.)
• True power of embedding within the database is evident when scoring models using built-in SQL functions (incl. Exadata)

    select cust_id
    from customers
    where region = 'US'
    and prediction_probability(churnmod, 'Y' using *) > 0.8;

The Forrester Wave™: Predictive Analytics And Data Mining Solutions, Q1 2010
Oracle Data Mining Cited as a Leader; 2nd place in Current Offering
• Ranks 2nd place in Current Offering
• “Oracle focuses on in-database mining in the Oracle Database, on integration of Oracle Data Mining into the kernel of that database, and on leveraging that technology in Oracle’s branded applications.”

The Forrester Wave is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of Forrester Research, Inc. The Forrester Wave is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources.
Opinions reflect judgment at the time and are subject to change.

In-Database Data Mining
[Diagram: Traditional Analytics compared with Oracle Data Mining]
Traditional Analytics (hours, days or weeks)
• Stages shown: Source Data, Data Extraction, Data Preparation and Transformation, Data Mining Model Building, Data Mining Model “Scoring”, Data Import
• SAS path shown: SAS Work Area, SAS Processing, Process Output, Target
Oracle Data Mining (secs, mins or hours)
• Stages shown: Embedded Data Prep, Model Building, Model “Scoring”
• Data remains in the Database
• Cutting edge machine learning algorithms inside the SQL kernel of Database
• SQL: most powerful language for data preparation and transformation
Results (Savings)
• Faster time for “Data” to “Insights”
• Lower TCO: eliminates
  • Data Movement
  • Data Duplication
• Maintains Security

Oracle Data Mining Algorithms

Problem                Algorithm                          Applicability
Classification         Logistic Regression (GLM)          Classical statistical technique
                       Decision Trees                     Popular / Rules / transparency
                       Naïve Bayes                        Embedded app
                       Support Vector Machine             Wide / narrow data / text
Regression             Multiple Regression (GLM)          Classical statistical technique
                       Support Vector Machine             Wide / narrow data / text
Anomaly Detection      One Class SVM                      Lack examples
Attribute Importance   Minimum Description Length (MDL)   Attribute reduction; Identify useful data; Reduce data noise
Association Rules      Apriori                            Market basket analysis; Link analysis
Clustering             Hierarchical K-Means               Product grouping; Text mining; Gene and protein analysis
                       Hierarchical O-Cluster
Feature Extraction     NMF                                Text analysis; Feature reduction

Oracle Data Mining + Exadata
• In 11gR2, SQL predicates and Oracle Data Mining models are pushed to storage level for execution

For example, find the US customers likely to churn (the scoring function is executed in Exadata):

    select cust_id
    from customers
    where region = 'US'
    and prediction_probability(churnmod, 'Y' using *) > 0.8;

Predictive Analytics Applications Powered by Oracle Data Mining (Partial List as of March 2010)
• CRM OnDemand: Sales Prospector
• Oracle Communications Data Model
• Oracle Open World - Schedule Builder
• Oracle Retail Data Model
• Spend Classification

Example: Simple, Predictive SQL
Select customers who are more than 85% likely to be HIGH VALUE customers & display their AGE & MORTGAGE_AMOUNT

    SELECT * from (
      SELECT A.CUST_ID, A.AGE, MORTGAGE_AMOUNT,
             PREDICTION_PROBABILITY(CUST_INSUR_LT46939_DT, 'VERY HIGH' USING A.*) prob
      FROM CBERGER.CUST_INSUR_LTV A)
    WHERE prob > 0.85;

Fraud Prediction Demo

    drop table CLAIMS_SET;
    exec dbms_data_mining.drop_model('CLAIMSMODEL');

    -- Settings table: choose the SVM algorithm and automatic data preparation
    create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
    insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
    insert into CLAIMS_SET values ('PREP_AUTO','ON');
    commit;

    -- Classification with a null target and SVM builds a one-class (anomaly detection) model
    begin
      dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
        'CLAIMS2', 'POLICYNUMBER', null, 'CLAIMS_SET');
    end;
    /

    -- Top 5 most suspicious fraud policy holder claims
    select * from
      (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
              rank() over (order by prob_fraud desc) rnk
       from (select POLICYNUMBER,
                    prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
             from CLAIMS2
             where PASTNUMBEROFCLAIMS in ('2 to 4', 'more than 4')))
    where rnk <= 5
    order by percent_fraud desc;

    POLICYNUMBER PERCENT_FRAUD        RNK
    ------------ ------------- ----------
            6532         64.78          1
            2749         64.17          2
            3440         63.22          3
             654          63.1          4
           12650         62.36          5

Real-time Prediction
On-the-fly, single record apply with new data (e.g. from call center)

    with records as (select
        78000 SALARY,
        250000 MORTGAGE_AMOUNT,
        6 TIME_AS_CUSTOMER,
        12 MONTHLY_CHECKS_WRITTEN,
        55 AGE,
        423 BANK_FUNDS,
        'Married' MARITAL_STATUS,
        'Nurse' PROFESSION,
        'M' SEX,
        4000 CREDIT_CARD_LIMITS,
        2 N_OF_DEPENDENTS,
        1 HOUSE_OWNERSHIP
      from dual)
    select s.prediction prediction, s.probability probability
    from (
      select PREDICTION_SET(CUST_INSUR_LT46939_DT, 1 USING *) pset
      from records) t, TABLE(t.pset) s;

Integration with Oracle BI EE
• Oracle Data Mining results available to Oracle BI EE administrators
• Oracle BI EE defines results for end user presentation

Example: Better Information for OBI EE Reports and Dashboards
• ODM’s predictions and prediction probabilities are available in the Database for reporting using Oracle BI EE and other reporting tools

Demo(s)

Getting Started

Oracle By Example: Online Course

Cue Cards: Just-in-time Assistance

Additional Information
• Preview of the new Oracle Data Miner 11g R2 “work flow” New GUI
• Oracle Data Mining 11gR2 presentation at Oracle Open World 2009
• Oracle Data Mining Blog
• Funny YouTube video that features Oracle Data Mining
• Oracle Data Mining on the Amazon Cloud
• Oracle Data Mining 11gR2 data sheet
• Oracle Data Mining 11gR2 white paper
• New TechCast (audio and video recording): ODM overview and several demos
• Fraud and Anomaly Detection using Oracle Data Mining 11g presentation
• Algorithm technical summary with links to Documentation
• Getting Started w/ ODM page w/ instructions to download
  • Oracle Data Miner graphical user interface (GUI)
  • ODM Step-by-Step Tutorial
  • Demo datasets
• ODM Discussion Forum on OTN (great for posting questions/answers)
• ODM 11g Sample Code (examples of ODM SQL and Java APIs applied in several use cases; great for developers)
• Oracle’s 50+ SQL based statistical functions (t-test, ANOVA, Pearson’s, etc.)

Oracle Data Mining

“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”
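
The SQL-based statistical functions listed under Additional Information (t-test, ANOVA, Pearson's) can be called directly in a query, much like the scoring functions in the earlier slides. The sketch below is illustrative only. It assumes the CUST_INSUR_LTV table from the predictive SQL example has numeric SALARY, AGE and MORTGAGE_AMOUNT columns, a SEX column with exactly two values ('M' and 'F'), and a MARITAL_STATUS column. Those columns appear in the real-time prediction record, but the deck does not confirm that they exist in that table.

```sql
-- Hedged sketch: column names and the 'M'/'F' coding of SEX are assumptions.
SELECT
  -- Pearson's correlation coefficient between two numeric columns
  CORR(age, mortgage_amount)                              AS pearson_r,
  -- Independent-samples t-test: does mean SALARY differ between the two SEX groups?
  STATS_T_TEST_INDEP(sex, salary, 'STATISTIC', 'M')       AS t_statistic,
  STATS_T_TEST_INDEP(sex, salary, 'TWO_SIDED_SIG', 'M')   AS t_p_value,
  -- One-way ANOVA: does mean SALARY differ across MARITAL_STATUS groups?
  STATS_ONE_WAY_ANOVA(marital_status, salary, 'F_RATIO')  AS anova_f,
  STATS_ONE_WAY_ANOVA(marital_status, salary, 'SIG')      AS anova_p_value
FROM cust_insur_ltv;
```

Because these are aggregate functions, the query returns a single row. Adding a GROUP BY (for example, on PROFESSION, if that column exists) would produce one set of statistics per group.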