Going from Zero to 60 with Oracle Advanced Analytics
An intro to how YOU can get gold from your data

Audience
I’m addressing those who:
* Haven’t used Oracle Advanced Analytics or similar products before (RapidMiner, MATLAB, Actian, Weka, etc.)
* Want to find out if and how predictive analytics can benefit them
We’ll start at the beginning and point you in the right direction (hopefully).
Image courtesy of marcolm at FreeDigitalPhotos.net

What do I mean by “gold”?
* Ore > Refining Process > Gold > Value
* Data > Patterns > Models > Value
Image courtesy of Mcstrother at Wikimedia Commons

Where’s the Value?
Answers! (Just make sure they’re right…)

Answers to WHAT???
* What is the output value given a set of inputs?
* What kinds of groups could this data be organized into?
* What similarities exist inside this data set?
* What is the sentiment of this sentence?
* What inputs to this model really matter?
* What data points in this set just don’t fit?

When you leave…
Ask yourselves: what can YOU do with Advanced Analytics?
* Can you start asking some of the types of questions we’ll ask today?
* Can your company benefit from the answers to those questions?
* Try some of these capabilities out in a sandbox:
  * Your own enterprise sandbox
  * VirtualBox
  * apex.oracle.com => this is awesome!

Oracle Advanced Analytics
* Available as an extra-cost add-on to the Enterprise Edition
  * Sorry, ONLY the Enterprise Edition (afaik…)
* Integrated with the architectures of several Oracle applications
  * OBIEE: generic and customizable integration
  * HCM Fusion: employee performance predictions
  * CRM Fusion: sales opportunity predictions (what to sell, when, and for how much)
  * ORCA: market basket and next-offer analysis
  * Industry-specific models: communications, airline, etc.
* The GUI is SQL Developer
  * We’ll focus on this today
Image © Oracle

Getting things running…
Images courtesy of ecee.colorado.edu

…it’s NOT that complicated
* Oracle By Example (OBE) has some great tutorials
* Google “Oracle Data Mining 12c OBE Series” and you’ll find it
* These are great “0-60” tutorials that show you exactly how to get SQL Developer, Oracle Advanced Analytics, and Oracle R Enterprise up and running

The steps… (50,000 ft view)
* Get your data
* Feed it to a model
* Tweak it until it’s accurate
* Use your model

Does order matter? Everyone has an opinion…
Lots of paradigms:
* KDD (www.kdd.org)
* SEMMA (Wikipedia: SEMMA)
* Five A’s (Google SPSS)
* CRISP-DM (Wikipedia: Cross Industry Standard Process for Data Mining)
All are similar and contain the phases we’ll talk about today. Just remember about the NFL…

NFL
No Free Lunch Theorem: no one algorithm (or defined process) is always better than another.
* “Sometimes one process is better, sometimes it’s not”
* “IT DEPENDS”

Common DM Steps
* Pre-Process Data
* Create a Model
* Evaluate Performance
* Use the Model
* Tune the Model

Steps of Data Mining: Pre-Process
This can be a headache…
* Pre-Processing involves getting your data ready for analysis
* PL/SQL and SQL can be used to further prepare your data
* We’ll go over how Oracle Advanced Analytics makes Pre-Processing fast and easy

Steps of Data Mining: Common Pre-Processing Tasks
* Get your data (it’s closer than you think)
* Format it (use SQL and PL/SQL)
* Sample it
* Bin it
* Normalize it
* “Outlier” it (ok… no more made-up words)

Pre-Processing Overview: Pre-Process Demo
It doesn’t have to be a headache anymore!
* Sampling: sometimes you don’t want it all
* Binning: group numbers into categories
* Normalization: put data on the same scale
* Deal with outliers
* Deal with missing values

Sample It (who doesn’t love samples?)
* Sometimes you don’t want it all
* OAA provides several sampling options
  * Sampling at the field level
  * Sample size can be a percentage or a given number of records
  * Sample types can be Random, Stratified, or Top N
* Sampling creates many smaller data sets from a single big one

Bin It
* Binning takes scalar values (say 0.1 through 99.0) and groups them into discrete “bins” or categories
* For example, take 10 FICO scores (-999, 403, 428, 446, 698, 700, 740, 782, 812, 849) and bin them into 3 categories, “Yes”, “No”, and “Maybe”:
  * “No”: -999, 403, 428, 446
  * “Maybe”: 698, 700, 740
  * “Yes”: 782, 812, 849
* These bins can be used in algorithms (models) that can’t work on scalar values
Pre-Processing > Bin data into chunks

Normalize It
* This isn’t your 3NF normalization
* Normalization means adjusting values measured on different scales to a common one (usually 0-1)
* Example: 2 fields called Rate and Amount
  * Rate has a scale of 1% to 29%
  * Amount has a scale of -9,999,999 to 9,999,999
  * A change of .10 in the Rate scale has a bigger impact than a change of .10 in the Amount scale
* OAA has several methods built in (Min Max, Z Score, Linear, and others)
Pre-Processing > Make your data Normal

Outlier It
* OAA will detect outliers for you
* You can use various definitions of outliers: standard deviations, percent ranges, and arbitrary value ranges
* You can replace outlier values with null, edge values, etc.
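
The binning, normalization, and outlier-replacement ideas above can be sketched in a few lines of plain Python. This is only an illustration of the concepts, not how OAA implements them: the function names, the bin cut points (600 and 760), and the valid FICO range (300 to 850) are assumptions chosen so the ten scores from the slide land in the same three bins.

```python
# Illustrative sketch only; cut points and the valid range are assumptions.
FICO_SCORES = [-999, 403, 428, 446, 698, 700, 740, 782, 812, 849]


def bin_fico(score, maybe_from=600, yes_from=760):
    """Map a numeric score to one of three categories (assumed cut points)."""
    if score < maybe_from:
        return "No"
    if score < yes_from:
        return "Maybe"
    return "Yes"


def min_max(value, low, high):
    """Min-max normalization: rescale value from [low, high] onto [0, 1]."""
    return (value - low) / (high - low)


def replace_outlier(value, low, high, mode="null"):
    """Treat values outside [low, high] as outliers: null them or clip to the edge."""
    if low <= value <= high:
        return value
    if mode == "null":
        return None
    return low if value < low else high


# Binning: group the ten scores into "No" / "Maybe" / "Yes".
bins = {}
for score in FICO_SCORES:
    bins.setdefault(bin_fico(score), []).append(score)

# Normalization: Rate (1 to 29) and Amount (-9,999,999 to 9,999,999) on one scale.
rate_scaled = min_max(15, 1, 29)
amount_scaled = min_max(0, -9_999_999, 9_999_999)

# The same raw change of 0.10 moves the scaled Rate far more than the scaled Amount.
rate_step = 0.10 / (29 - 1)
amount_step = 0.10 / (9_999_999 - (-9_999_999))

# Outliers: replace out-of-range scores with null, or clip them to the edge value.
nulled = [replace_outlier(s, 300, 850) for s in FICO_SCORES]
clipped = [replace_outlier(s, 300, 850, mode="edge") for s in FICO_SCORES]

print(bins)
print(rate_scaled, amount_scaled)
print(nulled[0], clipped[0])
```

With these assumed cut points, the -999 score still falls into the “No” bin, which is one reason to deal with outliers before binning rather than after.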
Example: FICO scores usually come in ranges between “about” 300 and “about” 850, but sometimes they come in as negatives, 999, or some (seemingly) randomly generated very large number.
Pre-Processing > Single out the odd ones

Automatic Data Preparation
* Some algorithms need data put into certain formats
* OAA has options to prepare this data for you automatically
* OAA supports Binning, Normalization, Missing Value Replacement, etc.
* When testing and applying data to models, ADP applies the same transformations

Create a Model
You have your questions. What kinds of answers do you want?

Answer Type                                   Model
Generate Grouping or Organization             Clustering
Discrete Value or Predefined Category         Classification
A Number                                      Regression
Free-form text details, Comment Sentiment     Text Processing
Anomalies or data sets that aren't normal     Anomaly Detection

I’ve found 14 different model types (and subtypes) that Advanced Analytics natively has to offer.

Steps of Data Mining: Model Types

Clustering: Automated Grouping
* Feed a clustering model data and it will group the records and tell you:
  * The various groups that exist inside the data you gave it
  * How the groups differ from each other
  * Why it put any given data point in the group it did
* Once you’ve got a model you like, you can use Advanced Analytics to assign a new data point to a group
* Let’s use this to segment members (account holders)
Phases of Data Mining > Creating Models

Demo 1: Member Segmentation
Question: What groups do members (account holders) fall into?

Classification: Supervised Grouping
* Similar to Clustering, but you pick the group(s) you want
* Predicts one column from your dataset by looking at the other columns
* Let’s use this to predict loans more likely to be written off

Demo 2: Write-Off Classification
Question: Given details from loan applications, which loans are more likely to be written off than others?
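
To make “predicts one column from your dataset by looking at the other columns” concrete, here is a minimal sketch of a classifier in plain Python. It uses a nearest-centroid rule as a stand-in, which is not the algorithm OAA uses in the demo. The loan records, the two feature names (debt-to-income and credit utilization, both already on a 0-1 scale), and the labels are all hypothetical.

```python
# Illustrative sketch only: hypothetical loan records and a stand-in algorithm.
from math import dist

# Each row: (debt_to_income, credit_utilization, label); features are on 0-1.
TRAINING = [
    (0.15, 0.20, "paid"),
    (0.25, 0.30, "paid"),
    (0.20, 0.10, "paid"),
    (0.60, 0.85, "written_off"),
    (0.70, 0.90, "written_off"),
    (0.65, 0.75, "written_off"),
]


def fit_centroids(rows):
    """Build the model: average the feature vectors of each labeled group."""
    sums = {}
    for *features, label in rows:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    return {label: [t / n for t in total] for label, (total, n) in sums.items()}


def predict(centroids, features):
    """Use the model: return the label whose centroid is closest to the record."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(TRAINING)
low_risk = predict(centroids, (0.18, 0.25))
high_risk = predict(centroids, (0.68, 0.80))
print(low_risk, high_risk)
```

The label column is what makes this supervised: the groups (“paid”, “written_off”) are chosen up front, whereas a clustering model would have to discover its own groups from the feature columns alone.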

Regression: Predicting X because you know Y
* Similar to Classification, but the predicted value is a scalar (number) value, not a discrete (group) value
* Attempts to find a function that fits the data given to it
* Training data builds the model; testing data shows how good the model really is
* Let’s use this to look at a simple Payment Amount function

[Figure: Classic Regression. Chart of data points with one linear and two polynomial trend lines.]

Demo 3: Payment Amount Model
Question: Given details from loan applications, what payment amount range can be expected?

Association Discovery: Correlating “stuff”
* A.K.A. Market Basket Analysis/Discovery
* Give this model groups of data and it will output the patterns it detects
* Examples:
  * Amazon: “Items Recommended for You”
  * Netflix: “Movies you Might Like”
  * Wal-Mart’s classic (and untrue) finding that people buy beer and diapers on Thursdays
  * Target’s famous (and true) ability to detect pregnant women based upon purchases
* Let’s use this to build a “Next Product Suggestion” model

Demo 4: Product Suggestion
Question: Which products commonly go together?

Anomaly Detection: Find the oddballs
* Ever played “One of these things is not like the others”?
* This model type finds data points that are outside the norm
* Useful for fraud detection
* Sorry, no demo for this one. Check out YouTube for one, though (search for Oracle Advanced Analytics Anomaly Detection)

Text Analysis: Get the Gist
* Text strings can be broken down into Tokens and Themes
* Example: “When I started Oracle, what I wanted to do was to create an environment where I would enjoy working.
  That was my primary goal” – Larry Ellison
  * [started],[Oracle],[wanted],[create],[environment],[enjoy],[working],[primary],[goal]
  * [when],[I],[what],[to],[do],[was],[to],[an],[where],[would],[that],[my]
* These can be stemmed using dictionary operations
  * work, worked, working, works => [work]
* Let’s use this to get a general sense of satisfaction from surveys

Demo 5: Comment Sentiment
Question: What is the sentiment of comments given in feedback surveys?

Maintaining your Model
* Models will get old and outdated as they “age”
* New data should be added and the model reprocessed
* If the data structure changes or new fields are used, reprocess your model
Phases of Data Mining > Maintain the Model

Questions?
* What will your next steps be?
* What questions can you ask?
* Check out YouTube for some great tutorials!
* Oracle Docs: Oracle® Data Mining Concepts

Credits
All images, where not attributed, are courtesy of istockphoto.com or are otherwise used with permission. Attributed images are copyright © or trademark (TM) of their respective owners. No sponsorship or endorsement shall be implied by the educational fair use of these images.