Data Mining Packages in R:
logistic regression and SVM
Jiang Du
March 2008
Logistic Regression
• lrm in package "Design" (since superseded by package "rms")
• glm in package "stats"
• …
Logistic Regression: lrm
lrm(formula, data, subset,
na.action=na.delete, method="lrm.fit",
model=FALSE, x=FALSE, y=FALSE,
penalty=0, penalty.matrix, tol=1e-7,
var.penalty=c('simple','sandwich'), weights,
normwt, ...)
• Formula
– a formula object. An offset term can be included. The offset causes fitting of a model such as logit(Y=1) = Xβ + W, where W is the offset variable having no estimated coefficient. The response variable can be any data type; lrm converts it, in alphabetic or numeric order, to an S factor variable and recodes it 0,1,2,...
• Data
– data frame to use. Default is the current
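The response recoding described under Formula can be mimicked in base R. This sketch does not call lrm; recode_like_lrm is a hypothetical helper that assumes the recoding is equivalent to building a factor with factor() (which sorts character values alphabetically and numeric values numerically) and shifting the level codes to start at 0.

```r
# Hypothetical helper (not part of Design/rms): mimic the documented
# 0,1,2,... recoding of a response by sorting its distinct values
recode_like_lrm <- function(y) {
  f <- factor(y)        # levels sorted alphabetically or numerically
  as.integer(f) - 1L    # first level -> 0, second -> 1, ...
}

recode_like_lrm(c("low", "high", "mid", "high"))  # levels high, low, mid: 1 0 2 0
recode_like_lrm(c(3, 1, 2))                       # 2 0 1
recode_like_lrm(c(10, 9, 10))                     # numeric order, 9 before 10: 1 0 1
```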
## S3 method for class 'lrm':
predict(object, ..., type=c("lp", "fitted",
"fitted.ind", "mean", "x", "data.frame", "terms",
"adjto", "adjto.data.frame", "model.frame"),
se.fit=FALSE, codes=FALSE)
• Object
– an object created by lrm
• ...
– arguments passed to predict.Design, such as kint and newdata (which is used if you are predicting on new data). See predict.Design to see how NAs are handled.
Logistic Regression: lrm
• Loading the package
– library(Design)
• Fitting training data
– model = lrm(Class ~ X + Y + Z, data=train)
• Prediction on new data
– To get logit(Y=1)
• predict(model, newdata = test, type = "lp")
– To get Pr(Y=1)
• predict(model, newdata = test, type = "fitted.ind")
The models fit by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.
In addition to + and :, a number of other operators are useful in model formulae. The * operator denotes factor crossing: a*b is interpreted as a+b+a:b. The ^ operator indicates crossing to the specified degree. For example, (a+b+c)^2 is identical to (a+b+c)*(a+b+c), which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions. The %in% operator indicates that the terms on its left are nested within those on the right. For example, a + b %in% a expands to the formula a + a:b. The - operator removes the specified terms, so that (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also be used to remove the intercept term: y ~ x - 1 is a line through the origin. A model with no intercept can also be specified as y ~ x + 0 or y ~ 0 + x.
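Because terms() parses a formula without evaluating its variables, the expansion rules above can be checked directly in R. A minimal sketch; expand_terms is a hypothetical helper name, and the comments describe which terms appear rather than the exact order in which R lists them.

```r
# Hypothetical helper: list the term labels R derives from a model formula.
# terms() only parses the formula, so a, b, c, x and y need not exist.
expand_terms <- function(f) attr(terms(f), "term.labels")

expand_terms(y ~ a * b)                     # a, b, a:b
expand_terms(y ~ (a + b + c)^2)             # a, b, c and all second-order interactions
expand_terms(y ~ (a + b + c) * (a + b + c)) # same terms as the ^2 form
expand_terms(y ~ (a + b + c)^2 - a:b)       # a, b, c, a:c, b:c

# The "intercept" attribute is 0 once the intercept has been removed
attr(terms(y ~ x), "intercept")       # 1
attr(terms(y ~ x - 1), "intercept")   # 0
attr(terms(y ~ 0 + x), "intercept")   # 0
```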
Logistic Regression: glm
• Fitting training data
– model = glm(Class ~ X + Y + Z, data=train, family = binomial)
• Prediction on new data
– To get logit(Y=1)
• predict(model, newdata = test)
– To get Pr(Y=1)
• predict(model, newdata = test, type = "response")
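A self-contained run of the glm steps above, using only base R. The data are simulated and hypothetical; with a two-level factor response, glm models the probability of the second level (here "yes"), which plays the role of Y=1. The check at the end shows that the two prediction types differ only by the inverse logit.

```r
# Simulated, hypothetical data: numeric predictors X, Y, Z and a two-level
# factor response Class; glm models Pr(Class == second level), here "yes"
set.seed(1)
n <- 200
train <- data.frame(X = rnorm(n), Y = rnorm(n), Z = rnorm(n))
p_true <- plogis(0.5 + 1.2 * train$X - 0.8 * train$Y)
train$Class <- factor(ifelse(runif(n) < p_true, "yes", "no"))
test <- data.frame(X = rnorm(5), Y = rnorm(5), Z = rnorm(5))

model <- glm(Class ~ X + Y + Z, data = train, family = binomial)

lp   <- predict(model, newdata = test)                     # logit(Y=1), default type = "link"
prob <- predict(model, newdata = test, type = "response")  # Pr(Y=1)

# The two scales are related by the inverse logit, plogis()
all.equal(prob, plogis(lp))  # TRUE
```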
SVM
• svm in "e1071"
• ksvm in "kernlab"
SVM: svm
• kernel: the kernel used in training and predicting. You might
consider changing some of the following parameters,
depending on the kernel type.
– linear:
• u'*v
– polynomial:
• (gamma*u'*v + coef0)^degree
– radial basis:
• exp(-gamma*|u-v|^2)
– sigmoid:
• tanh(gamma*u'*v + coef0)
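The four kernel formulas can be evaluated by hand to see what gamma, coef0 and degree do. The functions below are hypothetical helpers, not part of e1071; their default arguments assume e1071's documented defaults (gamma = 1/(data dimension), coef0 = 0, degree = 3), and |u-v|^2 is taken to be the squared Euclidean distance.

```r
# Hypothetical helpers (not exported by e1071) evaluating the kernel formulas
# for two numeric vectors u and v of the same length
linear_kernel <- function(u, v) {
  sum(u * v)                                    # u'*v
}
polynomial_kernel <- function(u, v, gamma = 1 / length(u), coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree           # (gamma*u'*v + coef0)^degree
}
radial_kernel <- function(u, v, gamma = 1 / length(u)) {
  exp(-gamma * sum((u - v)^2))                  # exp(-gamma*|u-v|^2)
}
sigmoid_kernel <- function(u, v, gamma = 1 / length(u), coef0 = 0) {
  tanh(gamma * sum(u * v) + coef0)              # tanh(gamma*u'*v + coef0)
}

u <- c(1, 2)
v <- c(3, -1)
linear_kernel(u, v)                                          # 1*3 + 2*(-1) = 1
polynomial_kernel(u, v, gamma = 0.5, coef0 = 1, degree = 3)  # (0.5*1 + 1)^3 = 3.375
radial_kernel(u, v, gamma = 0.5)                             # exp(-0.5 * 13) = exp(-6.5)
sigmoid_kernel(u, v, gamma = 0.5)                            # tanh(0.5)
```

With two-dimensional inputs, the default gamma here is 0.5, so the last two calls would give the same values without the explicit gamma argument.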
SVM: svm
• Loading the package
– library(e1071)
• Training
– model = svm(Class ~ X + Y + Z, data=train, type = "C", kernel = "linear")
• Prediction
– predict(model, newdata = test)
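An end-to-end version of the training and prediction steps, as a sketch assuming package e1071 is installed. The data are simulated and hypothetical: Class depends linearly on X and Y, so a linear kernel should separate it well. svm() appears to partially match type = "C" to "C-classification"; spelling out the full name is equivalent.

```r
library(e1071)

# Simulated, hypothetical data: Class depends linearly on X and Y; Z is noise
set.seed(42)
make_data <- function(n) {
  d <- data.frame(X = rnorm(n), Y = rnorm(n), Z = rnorm(n))
  d$Class <- factor(ifelse(d$X + d$Y > 0, "pos", "neg"))
  d
}
train <- make_data(200)
test  <- make_data(50)

# C-classification with a linear kernel, as on the slide
model <- svm(Class ~ X + Y + Z, data = train, type = "C", kernel = "linear")
pred  <- predict(model, newdata = test)   # a factor of predicted classes

# Confusion matrix and test-set accuracy (compared as characters so that
# differing factor level sets cannot cause an error)
table(predicted = pred, actual = test$Class)
accuracy <- mean(as.character(pred) == as.character(test$Class))
accuracy
```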