Predictive Analytics Pilot Certificate Program Learning Objectives

Module 1: What is Predictive Analytics & R Basics?
• Identify the problem and assess whether it should be addressed with predictive modeling.
• Understand the differences and similarities between traditional analysis techniques.
• Learn predictive modeling tools - layout and basic commands of R.
• Learn predictive modeling tools - practice writing basic R scripts and complete additional suggested practice, if necessary.

Module 2: Effective Problem Definition and Project Management
• Translate a vague question into one that can be analyzed with data, statistics and machine learning to solve a business problem.
• Design use cases and evaluate/prioritize them based on available data and technology, significance of business impact and/or implementation considerations.
• Select and implement appropriate technology to efficiently utilize statistical and machine learning techniques, taking into account problem objectives and implementation constraints.
• List and understand the importance of key principles in creating and managing a predictive modeling team.

Module 3: Data Design, Transformation & Visualization
• Identify common data types: structured, unstructured and semi-structured.
• Learn variable types and applicable terminology.
• Identify and evaluate the quality (including common data problems) of appropriate data sources for a problem.
• Identify the types of regulatory, professional standard and ethical issues surrounding predictive modeling and data collection/use, and where they apply to situations.
• Introduce the lapse, mortality and health datasets used for exercises.
• Implement effective data design: time frame, sampling, granularity.
• Use common data blending techniques, e.g. fuzzy matching.
• Learn how, why and when to transform the data, using scaling, normalization, standardization, binarization, encoding and imputation.
• Apply each technique using an example model.
• Create and interpret histograms, bar charts and frequency plots.
• Visualize data using one-way, two-way and box plots to identify potential errors, outliers and trends in the data.

Module 4: Data Exploration
• Identify data issues by exploring one variable to understand whether its distribution is as expected and to detect any outliers.
• Determine the significant relationships between two variables using scatter plots, calculating correlations and investigating conditional means.
• Determine relationships between many variables and select material ones using principal component analysis.
• Determine relationships between many variables and select material ones using independent component analysis.
• Determine relationships between many variables and select material ones using singular value decomposition.
• Take appropriate action when results of data exploration deviate from what is expected and apply judgment to resolve those differences.

Module 5: Feature Generation & Selection
• Define the term "feature" and understand how it differs from "variable".
• Use subject matter expertise and prior knowledge about the data to create features that lead to more effective models.
• List the principles, advantages, disadvantages and limitations of using filter-based selection techniques for tuning a data set to be used in modelling.
• Select appropriate features for a model using Pearson, Kendall and Spearman correlation as selection criteria.
• Select appropriate features for a model using mutual information as the selection criterion.
• Select appropriate features for a model using chi-squared as the selection criterion.
• List the principles, advantages, disadvantages and limitations of using permutation-based selection techniques for tuning a data set to be used in modelling.
• Apply concepts such as accuracy, precision and recall to select features to be used in classification modelling problems.
• Apply concepts such as MSE, RSE and coefficient of determination to select features to be used in regression modelling problems.
• List the principles, advantages, disadvantages and limitations of using algorithm-based selection techniques for tuning a data set to be used in modelling.
• Use Ridge, Lasso, Elastic Net and tree-based methods to select appropriate features to be used for modelling (detailed lessons in section 5).
• Text mining: Apply various text mining methods to generate appropriate features for use when modelling text data.

Module 6: Model Development & Validation
• Understand how different business problems affect the decisions made about model development and validation.
• Understand the difference between supervised, unsupervised and reinforcement learning and identify examples of problems each would be applied to.
• Understand the difference between classification and regression problems and explain the features of models that make them suitable/unsuitable for each type.
• Understand and explain the concepts of bias, variance and model complexity, the bias-variance tradeoff, and the implications this holds for building robust models.
• Understand the importance of using train, test & holdout data samples during modeling and be able to apply this method appropriately to fit and validate a model. List advantages, disadvantages and common pitfalls when using this method.
• Understand and apply the method of using cross validation during modeling. List advantages, disadvantages and common pitfalls when using this method.
• Supervised learning - For each of the following techniques, understand when it is appropriate (including advantages, disadvantages and limitations), describe data needed, apply the method to data, and interpret and describe the results.
  o Decision trees
  o Generalized linear models (identity, Poisson, gamma, Tweedie, binomial and shrinkage methods)
  o Ensemble methods (bagging, boosting and blending, specifically gradient boosting machines)
• Unsupervised learning - For each of the following techniques, understand when it is appropriate (including advantages, disadvantages and limitations), describe data needed, apply the method to data, and interpret and describe the results.
  o K-means clustering
  o Hierarchical clustering
• Advanced topics - For each of the following techniques, understand when it is appropriate (including advantages, disadvantages and limitations) and describe data needed.
  o Instance-based learning
  o Support vector machines
  o Bayesian learning
  o Additive models
  o Topic modeling
  o Neural networks
  o Gaussian mixture models
  o Genetic equation search
  o Grid search