Predictive Analytics
Pilot Certificate Program
Learning Objectives
Module 1: What is Predictive Analytics & R Basics?
• Identify the problem and assess whether it should be addressed with predictive modeling. Understand the differences and similarities between predictive modeling and traditional analysis techniques.
• Learn predictive modeling tools: layout and basic commands of R.
• Learn predictive modeling tools: practice writing basic R scripts and complete additional suggested practice, if necessary.
Module 2: Effective Problem Definition and Project Management
• Translate a vague question into one that can be analyzed with data, statistics and machine learning to solve a business problem.
• Design use cases and evaluate/prioritize them based on available data and technology, significance of business impact and/or implementation considerations.
• Select and implement appropriate technology in order to use statistical and machine learning techniques efficiently, taking into account problem objectives and implementation constraints.
• List and understand the importance of key principles in creating and managing a predictive modeling team.
Module 3: Data Design, Transformation & Visualization
• Identify common data types: structured, unstructured and semi-structured.
• Learn variable types and applicable terminology.
• Identify and evaluate the quality (including common data problems) of appropriate data sources for a problem.
• Identify the types of regulatory, professional standard and ethical issues surrounding predictive modeling and data collection/use, and where they apply.
• Introduce the lapse, mortality and health datasets used for exercises.
• Implement effective data design: time frame, sampling, granularity.
• Use common data blending techniques, e.g. fuzzy matching.
• Learn how, why and when to transform the data using scaling, normalization, standardization, binarization, encoding and imputation.
• Apply each technique using an example model.
• Create and interpret histograms, bar charts and frequency plots.
• Visualize data using one-way, two-way and box plots to identify potential errors, outliers and trends in the data.
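As an illustration of the scaling and standardization objective above, the following is a minimal sketch. It is written in Python only for illustration (the program itself uses R); the function names `min_max_scale` and `standardize` and the sample data are hypothetical and not taken from the course material.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

ages = [20, 30, 40, 50, 60]  # made-up illustrative data
scaled = min_max_scale(ages)
z_scores = standardize(ages)
print(scaled)
print([round(z, 3) for z in z_scores])
```

Min-max scaling preserves the shape of the distribution within fixed bounds, while standardization expresses each value as a number of standard deviations from the mean; which one suits a given model is part of the "how, why and when" in the objective.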
Module 4: Data Exploration
• Identify data issues by exploring one variable to confirm that its distribution is as expected and to detect any outliers.
• Determine the significant relationships between two variables using scatter plots, calculating correlations and investigating conditional means.
• Determine relationships between many variables and select material ones using principal component analysis.
• Determine relationships between many variables and select material ones using independent component analysis.
• Determine relationships between many variables and select material ones using singular value decomposition.
• Take appropriate action when results of data exploration deviate from what is expected and apply judgment to resolve those differences.
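For the two-variable objective above, a Pearson correlation coefficient can be computed by hand as sketched below. This is an illustrative Python sketch (the program itself uses R); the function name `pearson` and the data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]            # made-up illustrative data
r_up = pearson(x, [2, 4, 6, 8, 10])    # perfectly increasing relationship
r_down = pearson(x, [10, 8, 6, 4, 2])  # perfectly decreasing relationship
r_flat = pearson(x, [2, 1, 3, 1, 2])   # no linear relationship
print(round(r_up, 6), round(r_down, 6), round(r_flat, 6))
```

A coefficient near zero rules out only a linear relationship, which is one reason the objective pairs correlations with scatter plots and conditional means.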
Module 5: Feature Generation & Selection
• Define the term "feature" and understand how it differs from "variable".
• Use subject matter expertise and prior knowledge about the data to create features that lead to more effective models.
• List the principles, advantages, disadvantages and limitations of using filter-based selection techniques for tuning a data set to be used in modeling.
• Select appropriate features for a model using Pearson, Kendall and Spearman correlation as selection criteria.
• Select appropriate features for a model using mutual information as a selection criterion.
• Select appropriate features for a model using chi-squared as a selection criterion.
• List the principles, advantages, disadvantages and limitations of using permutation-based selection techniques for tuning a data set to be used in modeling.
• Apply concepts such as accuracy, precision and recall to select features to be used in classification modeling problems.
• Apply concepts such as MSE, RSE and coefficient of determination to select features to be used in regression modeling problems.
• List the principles, advantages, disadvantages and limitations of using algorithm-based selection techniques for tuning a data set to be used in modeling.
• Use Ridge, Lasso, Elastic Net and tree-based methods to select appropriate features to be used for modeling (Ridge, Lasso, Elastic Net, Trees (detailed lessons in section 5)).
• Text mining: apply various text mining methods in order to generate appropriate features for use when modeling text data.
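The classification measures named in this module (accuracy, precision, recall) can be computed from true and predicted labels as in the sketch below. It is an illustrative Python sketch (the program itself uses R); the function name `classification_metrics` and the labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up illustrative labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

In this made-up example there are 3 true positives, 2 false positives, 1 false negative and 2 true negatives, so precision (3 of 5 positive predictions correct) is lower than recall (3 of 4 actual positives found). Comparing these measures across candidate feature sets is one way to apply them to feature selection.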
Module 6: Model Development & Validation
• Understand how different business problems affect the decisions made about model development and validation.
• Understand the difference between supervised, unsupervised and reinforcement learning and identify examples of problems each would be applied to.
• Understand the difference between classification and regression problems and explain the features of models that make them suitable/unsuitable for each type.
• Understand and explain the concepts of bias, variance and model complexity, the bias-variance tradeoff, and the implications this holds for building robust models.
• Understand the importance of using train, test & holdout data samples during modeling and be able to apply this method appropriately to fit and validate a model. List advantages, disadvantages and common pitfalls when using this method.
• Understand and apply the method of using cross validation during modeling. List advantages, disadvantages and common pitfalls when using this method.
• Supervised learning - For each of the following techniques, understand when it is appropriate (including advantages, disadvantages, and limitations), describe data needed, apply the method to data, and interpret and describe the results.
  o Decision Trees
  o Generalized Linear models (identity, Poisson, gamma, Tweedie, binomial and shrinkage methods)
  o Ensemble methods (bagging, boosting and blending; specifically gradient boosting machines)
• Unsupervised learning - For each of the following techniques, understand when it is appropriate (including advantages, disadvantages, and limitations), describe data needed, apply the method to data, and interpret and describe the results.
  o K-means clustering
  o Hierarchical clustering
• Advanced topics - For each of the following techniques, understand when it is appropriate (including advantages, disadvantages, and limitations), describe data needed
  o Instance based learning
  o Support Vector Machines
  o Bayesian Learning
  o Additive models
  o Topic modeling
  o Neural networks
  o Gaussian mixture models
  o Genetic equation search
  o Grid search
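For the cross validation objective in this module, the splitting step of k-fold cross validation can be sketched as follows. This is an illustrative Python sketch (the program itself uses R); the function name `k_fold_indices` is hypothetical, and shuffling the rows before splitting is deliberately left out to keep the sketch short.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each observation is held out exactly once, so a model fitted on each training set and scored on the matching test set gives k performance estimates to average. With ordered data (for example, sorted by date or by outcome), skipping the shuffle is itself one of the common pitfalls the objective refers to.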