Supplemental digital content 4:
Supplemental Text S4. Classification Algorithms and Parameter Settings
Support Vector Machine
The Support Vector Machine (SVM) is a supervised learning method that generates
input-output mapping functions from a set of labeled training data. The mapping function can
be either a classification function or a regression function. For classification, nonlinear kernel
functions are often used to transform the input data into a high-dimensional feature space in
which they become more separable than in the original input space. A maximum-margin
hyperplane is then constructed in that space. The resulting model depends on only a subset of
the training data near the class boundaries (the support vectors) [1].
In our research, two different kernel functions were selected: the linear kernel and the radial
basis function (RBF) kernel. The primary parameters were set as follows:
Stopping Criteria = 1×10⁻³; Regression Precision = 0.1; RBF γ = 0.1; γ = 1; Bias = 0;
Degree = 3.
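For illustration, the two configurations could be reproduced with scikit-learn's SVC. This is a minimal sketch under the assumption of a generic implementation (the original software package is not named above), with tol mapping to the stopping criterion and gamma to the RBF γ; the training data names are placeholders.

```python
# Minimal sketch of the two SVM configurations described above, using
# scikit-learn's SVC as a stand-in implementation.
from sklearn.svm import SVC

# Linear-kernel SVM; tol corresponds to the stopping criterion of 1e-3.
svm_linear = SVC(kernel="linear", tol=1e-3)

# RBF-kernel SVM; gamma corresponds to the "RBF γ = 0.1" setting.
svm_rbf = SVC(kernel="rbf", gamma=0.1, tol=1e-3)

# Fit on labeled training data (X_train: feature matrix, y_train: labels).
# svm_linear.fit(X_train, y_train)
# y_pred = svm_rbf.fit(X_train, y_train).predict(X_test)
```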
Neural Network
A neuron, the basic element of a neural network, is a simple virtual device that accepts
many inputs, sums them, applies a transfer function, and produces a result, either as a model
prediction or as input to other neurons.
Many such neurons, connected in a systematic way, make up a neural network. The
neurons in a network are arranged in layers. Typically, there is one input layer for input
neurons, one or more hidden layers for internal processing units, and one output layer for
output neurons. Each layer is fully connected to the preceding layer and the following layer.
The connections between neurons have weights associated with them, which determine
the strength of the influence one neuron has on another. Information flows from the input
layer through the hidden layers to the output layer to generate predictions. The network
"learns" to generate better and better predictions by adjusting the connection weights during
training so that its predictions match the target values of specific records.
In our research, the neural networks used were feed-forward neural networks. In a
feed-forward neural network, the input neurons have their activations set to the values of the
encoded input fields. The activation of each neuron in a hidden or output layer is calculated
as [2]:

a_i = σ( Σ_{j∈J} w_ij · o_j )    (1)

where a_i is the activation of neuron i, J is the set of neurons in the preceding layer, w_ij is
the weight of the connection between neuron i and neuron j, o_j is the output of neuron j, and
σ(x) is the sigmoid or logistic transfer function:

σ(x) = 1 / (1 + e^(−x))    (2)
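Equations (1) and (2) can be traced in a few lines of code. The following is a minimal sketch (using NumPy; the weight and output values are hypothetical) of how one neuron's activation is computed from the outputs of the preceding layer.

```python
import numpy as np

def sigmoid(x):
    # Equation (2): the logistic transfer function.
    return 1.0 / (1.0 + np.exp(-x))

def activation(weights, outputs):
    # Equation (1): weighted sum of the preceding layer's outputs,
    # passed through the sigmoid. weights[j] is w_ij, outputs[j] is o_j.
    return sigmoid(np.dot(weights, outputs))

# Hypothetical example: a neuron with three incoming connections.
w = np.array([0.5, -0.2, 0.1])
o = np.array([1.0, 0.3, 0.7])
print(activation(w, o))  # sigmoid(0.51) ≈ 0.625
```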
The primary parameters were set as follows:
Number of Hidden Layers = 2; Layer 1 = 20; Layer 2 = 15; Persistence = 200; α = 0.9;
Initial η = 0.3; η Decay = 30; High η = 0.1; Low η = 0.01.
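A roughly equivalent network could be configured with scikit-learn's MLPClassifier; this is a sketch under the assumption that sigmoid activations and stochastic gradient descent match the description above. "Persistence" and the η decay/high/low schedule have no direct counterparts in this API and are omitted here.

```python
# Approximate translation of the network settings above; not the authors'
# original software.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(20, 15),   # Layer 1 = 20, Layer 2 = 15
    activation="logistic",         # sigmoid transfer function, Eq. (2)
    solver="sgd",
    learning_rate_init=0.3,        # Initial η
    momentum=0.9,                  # α
)
# mlp.fit(X_train, y_train)
```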
Decision Tree (C4.5)
Decision tree learning is a method commonly used in data mining whose goal is to create
a model that predicts the value of a target variable based on several input variables. The basic
components of a decision tree are nodes, edges, and leaves. Each node corresponds to one of
the input variables; there are edges to children for each of the possible values of that input
variable; and each leaf represents a value of the target variable given the values of the input
variables represented by the path from the root to the leaf.
C4.5, developed by Ross Quinlan, is an algorithm for generating decision trees [3].
C4.5 builds decision trees from a set of training data using the concept of information entropy.
At each node of the tree, it selects the attribute that most effectively splits the sample set into
subsets enriched in one class or the other. Its criterion is the normalized information gain
(difference in entropy) that results from choosing an attribute to split the data: the attribute
with the highest normalized information gain is chosen to make the decision. The C4.5
algorithm then recurses on the smaller sublists.
The primary parameters were set as follows:
Maximum Depth = 7; Pruning Severity = 75; Minimum Percentage of Records in a Node =
0.05; Minimum Percentage of Records for a Split = 0.10.
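To make the splitting criterion concrete, the following sketch computes C4.5's gain ratio (normalized information gain) for one candidate attribute; the toy data at the bottom are hypothetical.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits.
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(labels, attribute_values):
    # C4.5's criterion: information gain from splitting on the attribute,
    # normalized by the split information (entropy of the attribute itself).
    total = len(labels)
    subsets = {}
    for lab, val in zip(labels, attribute_values):
        subsets.setdefault(val, []).append(lab)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical example: a binary attribute that separates the classes perfectly.
y = ["pos", "pos", "neg", "neg"]
a = ["left", "left", "right", "right"]
print(gain_ratio(y, a))  # 1.0
```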
Logistic Regression
Logistic regression is a statistical technique for classifying records based on the values of
input fields. It is similar to linear regression but takes a categorical target field instead of a
numeric one, and it supports two different models: a binomial model (two classes) and a
multinomial model (more than two classes).
Logistic regression builds a set of equations that relate the input field values to the
probabilities associated with each of the output field categories. The target category with the
highest probability is assigned as the predicted output value for each record.
In our research, the binomial model was selected because there were two classes of subjects.
The primary parameters were set as follows:
Singularity Tolerance = 1×10⁻⁸; Scale = 1; Maximum Iterations = 100; Log-likelihood
Convergence = 0; Parameter Convergence = 1×10⁻⁶; σ = 0.
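A minimal sketch of an equivalent binomial model using scikit-learn's LogisticRegression (assumed here as a stand-in; the original software is not named), with tol and max_iter loosely mirroring the convergence settings above:

```python
from sklearn.linear_model import LogisticRegression

# Binomial (two-class) logistic regression; tol approximates the parameter
# convergence tolerance, max_iter the iteration limit listed above.
logreg = LogisticRegression(tol=1e-6, max_iter=100)
# logreg.fit(X_train, y_train)
# probs = logreg.predict_proba(X_test)   # probability of each category
# y_pred = logreg.predict(X_test)        # category with the highest probability
```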
Linear Discriminant Analysis
Linear discriminant analysis (LDA) is a method used in machine learning to find a
linear combination of features that characterizes or separates two or more classes of
objects.
LDA is closely related to ANOVA (analysis of variance) and regression analysis, which
also attempt to express one dependent variable as a linear combination of other features or
measurements [4]. In the other two methods, however, the dependent variable is a numerical
quantity, while for LDA it is a categorical variable.
For two classes, LDA assumes that the class-conditional probability density functions are
both normally distributed, with mean and covariance parameters (μ₀, Σ₀) and (μ₁, Σ₁),
respectively. Under this assumption, the Bayes-optimal solution is to predict a point as
belonging to the second class if the log of the likelihood ratio is below some threshold T.
The primary parameters were set as follows:
Prior Probabilities = All Groups Equal; Covariance Matrix = Within-groups; Estimation =
Maximum-likelihood Estimators; Tolerance to Singularity = 1×10⁻⁴.
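For illustration, the equal-priors setting could be approximated with scikit-learn's LinearDiscriminantAnalysis; a minimal sketch, assuming two groups:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Equal prior probabilities for the two groups ("All Groups Equal");
# tol mirrors the singularity tolerance of 1e-4 listed above.
lda = LinearDiscriminantAnalysis(priors=[0.5, 0.5], tol=1e-4)
# lda.fit(X_train, y_train)
# y_pred = lda.predict(X_test)
```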
References:
[1] Ben-Hur A, Weston J. A User's Guide to Support Vector Machines. Methods Mol Biol.
2010;609:223-239.
[2] Fine TL. Feedforward Neural Network Methodology, 3rd ed. New York: Springer-Verlag;
1999.
[3] Quinlan JR. C4.5: Programs for Machine Learning. Burlington: Morgan Kaufmann
Publishers; 1993.
[4] McLachlan GJ. Discriminant Analysis and Statistical Pattern Recognition. Hoboken: Wiley; 2004.