Dr. Eick COSC 6342 2011 Midterm Exam Review List

Relevant Material: The midterm exam has been scheduled for Thursday, March 10, 1-2:30pm in 343 PGH. The exam is open book and notes. All transparencies covered in the lectures and the following pages of the textbook are relevant for the 2011 midterm exam: 1-14, 21-28, 30-42, 47-55, 61-73, 76-84, 87-93, 110-120, 125-128, 143-157, 163-172, 174-181 (except line smoother), 186-197 (no regression trees). Moreover, I recommend reading the descriptions of K-means, EM, and kNN in the "Top 10 data mining algorithms" article posted on the webpage.

Checklist:

Hypothesis class, VC-dimension, basic regression, overfitting, underfitting, training set, test set, validation set, cross-validation, model complexity, triple trade-off (Dietterich), performance measure/loss function.

Bayes' theorem, naïve Bayesian approach, losses and risks, decision rules.

Maximum likelihood estimation, variance and bias, noise, Bayes' estimator and MAP, parametric classification, model selection procedures, multivariate Gaussian, covariance matrix, Mahalanobis distance, PCA (goals and objectives, what it actually does, what it is used for, how many principal components do we choose?), multidimensional scaling (only what it is used for, and how it differs from PCA).

K-means (prototype-based/representative-based clustering, how the algorithm works, optimization procedure, algorithm properties), EM (assumptions of the algorithm, mixture of Gaussians, how it works, how cluster membership is estimated, how the model is updated from cluster membership, relationship to K-means).

Non-parametric density estimation (histogram, naïve estimator, Gaussian kernel estimator), non-parametric regression (running mean and kernel smoother, how it differs from regular regression), k-nearest-neighbor classification (transparencies only).
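As a study aid for the K-means item above, here is a minimal pure-Python sketch of the assign-then-update loop (the function name, toy data, and seeding are mine, not from the course or textbook):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Prototype-based clustering: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # illustrative init: k random points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # Update step: each centroid moves to the mean of its cluster,
        # which is what minimizes the within-cluster squared error.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = tuple(sum(dim) / len(c) for dim in zip(*c))
    return centroids, clusters
```

The two alternating steps are the optimization procedure the checklist refers to: each step can only decrease the total squared distance of points to their assigned centroids, so the algorithm converges (to a local optimum that depends on initialization).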
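For the non-parametric density estimation item, a one-line Gaussian kernel estimator can be sketched as follows (the function name and bandwidth choice are mine; the formula is the standard kernel density estimate, a Gaussian bump of width h centered at each sample point, averaged):

```python
import math

def gaussian_kde(sample, x, h):
    """Estimate the density at x from a 1-D sample using Gaussian kernels
    of bandwidth h: average of N(xi, h^2) densities evaluated at x."""
    n = len(sample)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2)
               for xi in sample) / (n * h * math.sqrt(2 * math.pi))
```

The bandwidth h plays the role of the histogram bin width: small h gives a spiky, high-variance estimate, large h an oversmoothed, high-bias one.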
Decision trees (how they are generated from data sets, how they classify data, pruning, properties of decision tree classifiers).
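For the k-nearest-neighbor classification item on the checklist, the whole method fits in a few lines; this sketch (names and toy usage are mine) predicts by majority vote among the k training points closest to the query:

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_tuple, label) pairs.
    Predict the majority label among the k nearest training points
    (squared Euclidean distance)."""
    neighbors = sorted(train,
                       key=lambda fl: sum((a - b) ** 2
                                          for a, b in zip(fl[0], query)))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Note that there is no training phase at all: the "model" is the stored data set, which is why kNN is called a non-parametric, lazy learner.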