Model Deployment
Dr. Saed Sayad
University of Toronto
2010
http://chem-eng.utoronto.ca/~datamining/
Model Deployment
• Creation of the model is generally not the end of the
project. Even if the purpose of the model is to increase
knowledge of the data, the knowledge gained will need to
be organized and presented in a way that the customer can
use it.
• Depending on the requirements, the deployment phase can
be as simple as generating a report or as complex as
implementing a repeatable data mining process.
• In many cases it will be the customer, not the data analyst,
who will carry out the deployment steps. However, even if
the analyst will not carry out the deployment effort, it is
important for the customer to understand up front what
actions will need to be carried out in order to actually make
use of the created models.
Model Deployment - Poll
May 2009
http://www.kdnuggets.com/
Model Deployments
• Use the data mining tool
• Programming Scripts
– Java, C, VB, …
– SAS, SPSS, …
• SQL Scripts
– TSQL, PL-SQL, …
– SQL functions
• PMML (Predictive Model Markup Language)
Using Data Mining Tool (Orange)
http://www.ailab.si/orange/
Programming Scripts - Visual Basic
SQL Scripts - SQL Function
select RegressionModel(null,25000,'street')
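One way to make a call like the one above work is to register the scoring logic as a user-defined SQL function. The following is a minimal sketch using Python's built-in sqlite3 module rather than TSQL or PL-SQL. The meaning of the three arguments, the coefficients, the category levels, and the handling of NULL are all invented for illustration; the slides do not show the model behind RegressionModel.

```python
import sqlite3

# Hypothetical coefficients; the real model is not shown in the slides.
INTERCEPT = 10.0
COEF_X1 = 0.5                                      # weight of the first numeric input
COEF_X2 = 0.001                                    # weight of the second numeric input
CATEGORY_EFFECT = {"street": 2.0, "avenue": 3.0}   # assumed category levels
X1_DEFAULT = 40.0                                  # assumed replacement for a missing first input


def regression_model(x1, x2, category):
    """Score one row. A SQL NULL arrives in Python as None."""
    if x1 is None:
        x1 = X1_DEFAULT
    if x2 is None:
        return None  # no replacement defined in this sketch, so no prediction
    return INTERCEPT + COEF_X1 * x1 + COEF_X2 * x2 + CATEGORY_EFFECT.get(category, 0.0)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the name used on the slide.
conn.create_function("RegressionModel", 3, regression_model)
score = conn.execute("select RegressionModel(null, 25000, 'street')").fetchone()[0]
print(score)
```

Once registered, the model can be applied to every row of a table with an ordinary select statement, which is the appeal of this deployment route.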
• PMML is an XML-based language used to
define statistical and data mining models and
to share these between compliant
applications.
• PMML defines a standard not only to
represent data-mining models, but also data
handling and data transformations (pre and
post processing).
PMML
• It is developed by the DMG (Data Mining
Group) to avoid proprietary issues and
incompatibilities and to deploy models.
• PMML eliminates the need for custom model
deployment and allows for the clear
separation of tasks: model development vs.
model deployment.
Predictive Models supported by PMML
• Regression
• Neural Networks
• Support Vector Machines
• Decision Trees
• Naïve Bayes
• Clustering
• Sequences
• Rule Sets
• Association Rules
• Time-Series (as of PMML 4.0)
• Text Models
PMML Processes
1. Pre-Processing
– Data Dictionary: Allows for the explicit specification of valid, invalid and missing
values.
– Mining Schema: Used to define the appropriate treatment to be applied to
missing and invalid values.
– Transformations: Allow for variable discretization, normalization, and mapping
with handling of missing and default values.
– Built-in Functions: Arithmetic expressions, handling of date and time as well as
strings. Also used for implementing IF-THEN-ELSE logic and Boolean operations.
2. Models
– PMML allows for several predictive modeling techniques to be fully expressed.
3. Post-Processing
– Scaling of model outputs can be performed with PMML element Targets.
PMML Components
PMML Components - Header
• Header: contains general information about the PMML
document, such as copyright information for the
model, its description, and information about the
application used to generate the model such as name
and version.
• It also contains an attribute for a timestamp which can
be used to specify the date of model creation.
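As an illustration, a header fragment can be read with Python's standard XML parser. This is a sketch only: the copyright text, description, application name, version, and date are made up, and the timestamp is written as a Timestamp child element, which is how the DMG schema carries it as far as we know.

```python
import xml.etree.ElementTree as ET

# Illustrative values only; none of them come from the slides.
HEADER_XML = """
<Header copyright="Example Corp" description="Illustrative regression model">
  <Application name="ExampleMiner" version="1.0"/>
  <Timestamp>2010-01-15T10:30:00</Timestamp>
</Header>
"""

header = ET.fromstring(HEADER_XML)
app = header.find("Application")

# Collect the general information a consuming application might log.
info = {
    "copyright": header.get("copyright"),
    "description": header.get("description"),
    "application": app.get("name"),
    "version": app.get("version"),
    "created": header.findtext("Timestamp"),
}
print(info)
```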
PMML Components – Data Dictionary
• Data Dictionary: contains definitions for all the
possible fields used by the model. It is here that a
field is defined as continuous, categorical, or
ordinal.
• Depending on this definition, the appropriate
value ranges are then defined as well as the data
type (such as, string or double).
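The sketch below shows what such definitions can look like and how a scoring application might check incoming values against them. The field names, ranges, and category values are hypothetical, and the is_valid helper is not part of PMML; it handles only closed intervals and simple value lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical fields: one continuous, one categorical, one ordinal.
DICTIONARY_XML = """
<DataDictionary numberOfFields="3">
  <DataField name="income" optype="continuous" dataType="double">
    <Interval closure="closedClosed" leftMargin="0" rightMargin="500000"/>
  </DataField>
  <DataField name="address_type" optype="categorical" dataType="string">
    <Value value="street"/>
    <Value value="avenue"/>
    <Value value="unknown" property="missing"/>
  </DataField>
  <DataField name="rating" optype="ordinal" dataType="integer">
    <Value value="1"/>
    <Value value="2"/>
    <Value value="3"/>
  </DataField>
</DataDictionary>
"""


def is_valid(field, raw):
    """Check a raw value against the range or value list declared for one DataField."""
    interval = field.find("Interval")
    if interval is not None:
        low = float(interval.get("leftMargin"))
        high = float(interval.get("rightMargin"))
        return low <= float(raw) <= high  # closedClosed: both ends are included
    # Values without a property attribute count as valid in this sketch.
    allowed = [v.get("value") for v in field.findall("Value")
               if v.get("property", "valid") == "valid"]
    return str(raw) in allowed


fields = {f.get("name"): f for f in ET.fromstring(DICTIONARY_XML)}
print(is_valid(fields["income"], 25000), is_valid(fields["address_type"], "unknown"))
```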
PMML Components – Data Transformations
• Data Transformations: transformations allow for the
mapping of user data into a more desirable form to be
used by the mining model. PMML defines several kinds
of simple data transformations.
– Normalization: map values to numbers; the input can be
continuous or discrete.
– Discretization: map continuous values to discrete values.
– Value mapping: map discrete values to discrete values.
– Functions: derive a value by applying a function to one or
more parameters.
– Aggregation: used to summarize or collect groups of
values.
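The five kinds of transformation listed above can be sketched as plain Python functions. This shows only the idea behind each transformation and is not a PMML engine; the function names, bin boundaries, and sample data are assumptions made for the example.

```python
def normalize(x, points):
    """Piecewise-linear mapping through (original, normalized) pairs in ascending order."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("value outside the normalization range")


def discretize(x, bins, default=None):
    """Return the label of the first half-open bin [low, high) that contains x."""
    for low, high, label in bins:
        if low <= x < high:
            return label
    return default


def map_value(value, table, default=None):
    """Map one discrete value to another, falling back to a default."""
    return table.get(value, default)


def aggregate(rows, key, value, func=sum):
    """Group rows by one field and summarize another field per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {k: func(v) for k, v in groups.items()}


# Hypothetical inputs.
income_norm = normalize(25000, [(0, 0.0), (100000, 1.0)])
income_band = discretize(25000, [(0, 20000, "low"), (20000, 60000, "medium"),
                                 (60000, float("inf"), "high")])
address_code = map_value("street", {"street": 1, "avenue": 2}, default=0)
larger = max(3, 7)  # "Functions": derive a value from one or more parameters
totals = aggregate([{"id": "a", "amt": 5}, {"id": "b", "amt": 2}, {"id": "a", "amt": 1}],
                   key="id", value="amt")
print(income_norm, income_band, address_code, larger, totals)
```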
Data Transformations
PMML Components – Model
• Model: contains the definition of the data mining
model. For example, a feed-forward neural network is
represented in PMML by a "NeuralNetwork" element
which contains attributes such as:
– Model Name (attribute modelName)
– Function Name (attribute functionName)
– Algorithm Name (attribute algorithmName)
– Activation Function (attribute activationFunction)
– Number of Layers (attribute numberOfLayers)
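A consuming application can read these attributes straight off the element. The fragment below is a hypothetical, heavily trimmed NeuralNetwork element: the attribute values are invented, and the child elements a real model needs (mining schema, inputs, layers, outputs) are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute values; required child elements are omitted for brevity.
MODEL_XML = """
<NeuralNetwork modelName="ExampleNet" functionName="regression"
               algorithmName="backpropagation" activationFunction="logistic"
               numberOfLayers="2">
</NeuralNetwork>
"""

model = ET.fromstring(MODEL_XML)
attribute_names = ("modelName", "functionName", "algorithmName",
                   "activationFunction", "numberOfLayers")
summary = {name: model.get(name) for name in attribute_names}
layer_count = int(summary["numberOfLayers"])  # attributes are strings until converted
print(summary, layer_count)
```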
PMML Components – Mining Schema
• Mining Schema: the mining schema lists all fields used in the model. This
can be a subset of the fields as defined in the data dictionary. It contains
specific information about each field, such as:
– Name (attribute name): must refer to a field in the data dictionary
– Usage type (attribute usageType): defines the way a field is to be used in the
model. Typical values are: active, predicted, and supplementary. Predicted
fields are those whose values are predicted by the model.
– Outlier Treatment (attribute outliers): defines the outlier treatment to be used.
In PMML, outliers can be treated as missing values, as extreme values (based
on the definition of high and low values for a particular field), or as is.
– Missing Value Replacement Policy (attribute missingValueReplacement): if
this attribute is specified then a missing value is automatically replaced by the
given value.
– Missing Value Treatment (attribute missingValueTreatment): indicates how
the missing value replacement was derived (e.g. as value, mean or median).
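The attributes above can be turned into a small input-preparation step. The sketch below is one possible reading, not a reference implementation: the field names and limits are hypothetical, only numeric fields are handled, and applying the outlier policy before the missing-value replacement is a choice made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mining schema with two active inputs and one predicted field.
SCHEMA_XML = """
<MiningSchema>
  <MiningField name="income" usageType="active"
               outliers="asExtremeValues" lowValue="0" highValue="200000"
               missingValueReplacement="45000" missingValueTreatment="asMean"/>
  <MiningField name="age" usageType="active"
               outliers="asMissingValues" lowValue="18" highValue="100"/>
  <MiningField name="price" usageType="predicted"/>
</MiningSchema>
"""


def prepare(field, value):
    """Apply the outlier policy, then the missing-value replacement, to one numeric value."""
    low, high = field.get("lowValue"), field.get("highValue")
    policy = field.get("outliers", "asIs")
    if value is not None and low is not None and high is not None:
        low, high = float(low), float(high)
        if policy == "asExtremeValues":
            value = min(max(value, low), high)   # clamp to the declared limits
        elif policy == "asMissingValues" and not (low <= value <= high):
            value = None                         # treat the outlier as missing
    replacement = field.get("missingValueReplacement")
    if value is None and replacement is not None:
        value = float(replacement)
    return value


schema = {f.get("name"): f for f in ET.fromstring(SCHEMA_XML)}
active = [name for name, f in schema.items() if f.get("usageType") == "active"]
print(active, prepare(schema["income"], 500000), prepare(schema["income"], None))
```

Here missingValueTreatment="asMean" only records that the replacement value 45000 was derived as a mean; the sketch does not compute anything from it.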
Model and Schema
PMML Components – Targets
• Targets: allow for post-processing of the
predicted value in the form of scaling if the
output of the model is continuous.
• Targets can also be used for classification tasks. In
this case, the attribute priorProbability specifies a
default probability for the corresponding target
category. It is used if the prediction logic itself did
not produce a result. This can happen, e.g., if an
input value is missing and there is no other
method for treating missing values.
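Both uses of Targets can be sketched in a few lines. The field names, scaling numbers, and prior probabilities below are hypothetical. The rescale helper applies the factor and then the constant and ignores the other Target attributes; the classify helper falls back to the priors only when the model returned nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical targets: one continuous (scaled) and one categorical (with priors).
TARGETS_XML = """
<Targets>
  <Target field="price" rescaleFactor="1000" rescaleConstant="500"/>
  <Target field="risk">
    <TargetValue value="high" priorProbability="0.25"/>
    <TargetValue value="low" priorProbability="0.75"/>
  </Target>
</Targets>
"""

targets = {t.get("field"): t for t in ET.fromstring(TARGETS_XML)}


def rescale(target, raw):
    """Scale a continuous model output: raw * rescaleFactor + rescaleConstant."""
    factor = float(target.get("rescaleFactor", "1"))
    constant = float(target.get("rescaleConstant", "0"))
    return raw * factor + constant


def prior_distribution(target):
    """Read the default probability of each target category."""
    return {tv.get("value"): float(tv.get("priorProbability"))
            for tv in target.findall("TargetValue")}


def classify(target, model_probabilities):
    """Pick the most probable category, using the priors if the model gave no result."""
    probs = model_probabilities or prior_distribution(target)
    return max(probs, key=probs.get)


print(rescale(targets["price"], 57.0))
print(classify(targets["risk"], None))
print(classify(targets["risk"], {"high": 0.9, "low": 0.1}))
```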
Targets
PMML 4.0 – New Features
• Improved Pre-Processing Capabilities: Additions to built-in functions
include a range of Boolean operations and an If-Then-Else function.
• Time Series Models: New exponential smoothing models; also
placeholders for ARIMA, Seasonal Trend Decomposition, and Spectral Analysis,
which are to be supported in the near future.
• Model Explanation: Saving of evaluation and model performance
measures to the PMML file itself.
• Multiple Models: Capabilities for model composition, ensembles, and
segmentation (e.g., combining of regression and decision trees).
• Extensions of Existing Elements: Addition of multi-class classification for
Support Vector Machines, improved representation for Association Rules,
and the addition of Cox Regression Models.
References
• http://www.dmg.org/
• http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language