Dr. Eick
COSC 6342 2011 Midterm Exam Review List
Relevant Material: The midterm exam has been scheduled for Th., March 10, 1-2:30p in
343 PGH. The exam is open book and notes. All transparencies covered in the
lectures and the following pages of the textbook are relevant for the 2011 midterm exam:
1-14, 21-28, 30-42, 47-55, 61-73, 76-84, 87-93, 110-120, 125-128, 143-157, 163-172,
174-181 (except line smoother), 186-197 (no regression trees).
Moreover, I recommend reading the descriptions of K-means, EM, and kNN in the “Top
10 data mining algorithms” article posted on the webpage.
Checklist: hypothesis class, VC-dimension, basic regression, overfitting, underfitting,
training set, test set, validation set, cross-validation, model complexity, triple trade-off
(Dietterich), performance measure/loss function.
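As a quick refresher on cross-validation, here is a minimal k-fold sketch in NumPy; the fit and score callables and the fold count are illustrative placeholders for whatever model and performance measure you use:

```python
# Minimal k-fold cross-validation sketch (fit/score are caller-supplied placeholders).
import numpy as np

def k_fold_cv(X, y, fit, score, k=5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))            # shuffle example indices
    folds = np.array_split(idx, k)           # k roughly equal folds
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])       # train on the other k-1 folds
        scores.append(score(model, X[test], y[test]))  # evaluate on the held-out fold
    return np.mean(scores)                    # average validation performance
```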
Bayes’ Theorem, Naïve Bayesian approach, losses and risks, decision rules.
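For reference, Bayes’ theorem for the class posterior and the expected risk of taking action α_i (standard forms, in the textbook’s notation):

```latex
P(C_i \mid x) = \frac{p(x \mid C_i)\, P(C_i)}{p(x)}
              = \frac{p(x \mid C_i)\, P(C_i)}{\sum_{k} p(x \mid C_k)\, P(C_k)}

R(\alpha_i \mid x) = \sum_{k} \lambda_{ik}\, P(C_k \mid x)
\quad \text{(decision rule: choose the } \alpha_i \text{ minimizing } R(\alpha_i \mid x))
```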
Maximum likelihood estimation, variance and bias, noise, Bayes’ estimator and MAP,
parametric classification, model selection procedures, multivariate Gaussian, covariance
matrix, Mahalanobis distance, PCA (goals and objectives, what does it actually do, what
is it used for, how many principal components do we choose?), multidimensional scaling
(only what is it used for, and how is it different from PCA).
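A minimal PCA sketch via eigendecomposition of the sample covariance matrix (function and variable names are illustrative); the explained-variance ratios it returns are one common way to decide how many principal components to keep:

```python
# Minimal PCA sketch (illustrative; eigendecomposition of the covariance matrix).
import numpy as np

def pca(X, n_components):
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]           # sort directions by variance, descending
    W = vecs[:, order[:n_components]]        # top principal directions
    explained = vals[order] / vals.sum()     # proportion of variance per component
    return Xc @ W, explained                 # projected data, variance ratios
```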
K-means (prototype-based/representative-based clustering, how does the algorithm work,
optimization procedure, algorithm properties), EM (assumptions of the algorithm,
mixture of Gaussians, how does it work, how is cluster membership estimated, how is the
model updated from cluster membership, relationship to K-means).
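A minimal K-means sketch showing the alternating assignment and update steps (illustrative only; no handling of empty clusters):

```python
# Minimal K-means sketch (Lloyd's iteration; illustrative, no empty-cluster handling).
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # init with k random points
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)  # point-center distances
        labels = d.argmin(axis=1)                               # assignment step
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])  # update step
        if np.allclose(new, centers):                           # stop when centers settle
            break
        centers = new
    return labels, centers
```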
Non-parametric density estimation (histogram, naïve estimator, Gaussian kernel
estimator), non-parametric regression (Running mean and kernel smoother, how is it
different from regular regression), k-nearest neighbor classification (transparencies only).
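A minimal 1-D Gaussian kernel density estimator sketch, following the standard form p̂(x) = (1/Nh) Σ_t K((x − x_t)/h) with K the standard normal density (names and the bandwidth choice are illustrative):

```python
# Minimal 1-D Gaussian kernel density estimator sketch (h is the bandwidth).
import numpy as np

def kde_gauss(x, sample, h):
    u = (x[:, None] - sample[None, :]) / h            # scaled distances to each x_t
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)      # standard normal kernel
    return K.sum(axis=1) / (len(sample) * h)          # p_hat(x) = (1/(N*h)) * sum K(.)
```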
Decision trees (how are they generated from data sets, how do they classify data, pruning,
properties of decision tree classifiers).
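A minimal sketch of the impurity computations behind decision-tree splitting (entropy and information gain; names are illustrative):

```python
# Minimal entropy/information-gain sketch for decision-tree splitting (illustrative).
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                 # class proportions at this node
    return -(p * np.log2(p)).sum()            # impurity in bits

def information_gain(parent, left, right):
    n = len(parent)                           # split quality: impurity drop,
    return (entropy(parent)                   # weighted by child sizes
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
```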