RStat Predictive Modeling & Scoring Applications for Operational BI
Kathy Kendall, Strategic Product Manager

Agenda
* What is Predictive Modeling?
* What are Scoring Applications?
* What is RStat?
* DEMO: Building a Scoring Application
* Scoring Applications Project Life Cycle

Predictive Modeling
1. Examines large volumes of historical data,
2. evaluates them statistically,
3. identifies mathematical formulas or sets of rules,
4. which can be applied to new data
5. to predict an outcome, i.e. to score.

New data can be scored through:
* Scoring Applications
* In-Database Scoring

Copyright 2007, Information Builders.

Scoring Applications
* Integrate model formulas / rules into web applications that provide a UI to identify new data to be scored.
* Return the predicted value in a report, a graph, a map, a dashboard or a process flow.

In-Database Scoring
* PMML (Predictive Model Markup Language): XML standard for expressing statistical & data mining models.
* In-database scoring translates PMML into SQL scripts that score directly in the database.

What is RStat?
RStat is the first fully integrated environment for creating BI, modeling, and scoring applications. It offers:
* Low Cost: Built on the open source R engine, RStat eliminates all statistical software licensing costs. Organizations pay only for maintenance and support.
* Better License Management: The full integration within Developer Studio allows organizations to scale down the number of other statistical software licenses that are used primarily for query and analysis, i.e. for pure BI functions.
* One Tool For All Users: Having a single BI and modeling tool allows organizations to better maintain, manage, and share resources across BI and statistical projects.
* Top 10 Data Mining Algorithms: RStat includes the most commonly used statistical and data mining algorithms plus extensive model evaluation tools that will satisfy 90% of your enterprise data mining requirements.
* Simple User Interface: RStat's simple and intuitive user interface allows organizations to deploy it to more analysts with less training compared to other statistical packages.
* Deployment On Any Platform: The unique WF scoring routines can be deployed on all WF supported platforms, giving organizations independence from expensive statistical servers and thus eliminating any additional software and maintenance costs.
* Reputation & Extensibility: R is used by over 1MM analysts worldwide, is taught in many universities, and has over 1000 packaged extensions for many different types of analysis, giving your organization instant access to more models and techniques than any other statistical software.

Comprehensive List of Data Mining Models

Supervised modeling for classification and prediction: a target (dependent) variable and a training set are required to build the model.
* Regression: Linear, GLM, Logistic, Poisson and Multinomial
* Decision Trees
* randomForests (the algorithm name is the same)
* Boosting (AdaBoost algorithm)
* Support Vector Machines
* Neural Networks (feed-forward neural network model)

Unsupervised modeling for classification only: classification is generated directly on the original data, without building a model first, i.e., dependent variables and training data are not required.
* Clustering: both K-means and Hierarchical clustering
* Association Rules

Hypothesis testing: t-tests, variance tests.
Descriptive statistics: summary statistics, distributions, correlations, principal components.
Model evaluation: confusion table, risk chart, lift chart, ROC curve, precision and sensitivity charts.

RStat: Scoring Application for Marketing
Using historical & demographic data to predict future purchases

Building a Model In RStat

Model Output

Deploying a Model

Building a Scoring Application

Scoring Applications: Adhoc Analysis

Scoring Applications Project Life Cycle: 3 to 9 Months

Business Requirements (15%)
* Business Objectives: Background Information, Prior History, Resources
* Modeling Objectives: Assumptions, Constraints, Risks, Contingency, Terminology, Tools, Techniques, Criteria for Success
* Project Plan
* Role: Project Manager

Data Assessment (20%)
* Data Extraction
* Data Definitions
* Data Exploration
* Data Quality Verification
* Data Assessment Report
* Roles: BI Developer 80%, Statistician 20%

Data Preparation (25%)
* Data Fields Selection
* Data Cleansing
* Data Construction: derived attributes, generated records
* Missing Values Treatment
* Integrated Data: Merge and Overlay
* Transform & Format Data
* Final Data Set Report
* Roles: BI Developer 80%, Statistician 20%

Modeling (15%)
* Modeling Technique Selection
* Document Assumptions for Modeling
* Generate Test Design
* Build Model
* Assess Model
* Role: Statistician

Model Evaluation (10%)
* Evaluate model against criteria for success and business goals
* Model Documentation
* Model Approval

Deployment (15%)
* Application UI Design
* Application Workflow Design
* Platform considerations
* Development
* Testing
* Final Deployment

Roles for Model Evaluation and Deployment, as listed on the slide: Statistician, Project Manager, BI Developer

Sources:
* CRISP-DM (Cross Industry Standard Process for Data Mining)
* TDWI Best Practices in Predictive Analytics, Q1 2007

Thank you!
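
The effort split on the Project Life Cycle slide can be turned into rough per-phase durations. The sketch below is a minimal illustration, assuming the percentages map to the six phases in slide order (15, 20, 25, 15, 10, 15) and that effort translates linearly into calendar time. The names `PHASE_SHARES` and `phase_durations` are hypothetical and not part of RStat.

```python
# Effort share per phase, as read from the Project Life Cycle slide.
# Assumption: percentages are assigned to phases in slide order.
PHASE_SHARES = {
    "Business Requirements": 15,
    "Data Assessment": 20,
    "Data Preparation": 25,
    "Modeling": 15,
    "Model Evaluation": 10,
    "Deployment": 15,
}


def phase_durations(total_months):
    """Split a project length (in months) across phases by effort share."""
    if sum(PHASE_SHARES.values()) != 100:
        raise ValueError("phase shares must add up to 100%")
    return {
        phase: total_months * share / 100
        for phase, share in PHASE_SHARES.items()
    }


if __name__ == "__main__":
    shortest = phase_durations(3)  # lower bound quoted on the slide
    longest = phase_durations(9)   # upper bound quoted on the slide
    for phase in PHASE_SHARES:
        print(f"{phase:<22} {shortest[phase]:.2f} to {longest[phase]:.2f} months")
```

Under these assumptions, Data Preparation, the largest phase, takes about 0.75 months of a 3-month project and about 2.25 months of a 9-month one.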