Dr. Eick
COSC 6342 2013 Midterm Exam Review List
The midterm exam has been scheduled for Wed., March 20, 2:30-2:50p in our classroom.
The exam is open book and notes—you are not allowed to use any computers!
Relevant Material: All transparencies covered in the lectures (except the transparencies
associated with Topic 4 and the DENCLUE transparencies of Topic 9) and the following
pages of the textbook (second edition) are relevant for the 2013 midterm exam:
1-14, 21-28, 30-42, 47-55, 61-73, 76-84, 87-93, 108-120, 125-128, 143-154, 163-172,
174-181 (except line smoother), 186-197, DBSCAN paper, some Wikipedia pages
referenced by transparencies. Moreover, I recommend reading the descriptions of
K-means, EM, and kNN in the “Top 10 data mining algorithms” article posted on the
webpage.
Checklist:
What is machine learning?
hypothesis class, VC-dimension, basic regression, overfitting, underfitting, training set,
test set, cross-validation, model complexity, triple trade-off (Dietterich), performance
measure/loss function.
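As a study aid for the cross-validation item above, the index-splitting step can be sketched in a few lines of pure Python. This is only an illustrative sketch; the function name and the round-robin fold assignment are my own choices, not from the course material:

```python
# Minimal k-fold cross-validation index splitting (illustrative sketch).
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k folds; yield (train, test) index lists."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Each of the 5 folds serves once as the test set and four times as training data.
splits = list(k_fold_indices(10, 5))
```

The model is fit on each training split and its loss is measured on the held-out fold; averaging the k losses estimates generalization error and helps detect overfitting.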
Bayes’ Theorem, Naïve Bayesian approach, losses and risks, derive optimal decision
rules for a given cost/risk function.
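The derivation of an optimal decision rule from Bayes’ theorem and a loss matrix can be made concrete with a small numerical sketch. The priors, likelihoods, and loss values below are made-up illustrative numbers, not from the textbook:

```python
# Hedged sketch: Bayes' theorem plus a minimum-risk decision rule.
def posterior(priors, likelihoods):
    """P(C_i | x) proportional to p(x | C_i) * P(C_i), normalized to sum to 1."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)  # evidence p(x)
    return [j / z for j in joint]

def min_risk_action(post, loss):
    """Pick the action a minimizing expected risk R(a|x) = sum_i loss[a][i] * P(C_i|x)."""
    risks = [sum(l_ai * p for l_ai, p in zip(row, post)) for row in loss]
    return min(range(len(risks)), key=lambda a: risks[a])

post = posterior([0.6, 0.4], [0.2, 0.5])          # two classes, toy numbers
action = min_risk_action(post, [[0, 1], [1, 0]])  # 0/1 loss -> pick the MAP class
```

With 0/1 loss the minimum-risk rule reduces to choosing the class with the highest posterior; a non-symmetric loss matrix shifts the decision boundary accordingly.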
Maximum likelihood estimation, variance and bias, noise, Bayes’ estimator and MAP,
parametric classification, model selection procedures, multivariate Gaussian, covariance
matrix, Mahalanobis distance, PCA (goals and objectives, what does it actually do, what
is it used for, how many principal components do we choose?), multidimensional scaling
(only what is it used for, and how is it different from PCA).
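For the PCA item, “what does it actually do” can be summarized as: center the data, eigen-decompose the covariance matrix, and project onto the top eigenvectors. A minimal NumPy sketch on a toy 2-D dataset (the data values are illustrative only):

```python
import numpy as np

# Hedged sketch of PCA via eigen-decomposition of the covariance matrix.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])
Xc = X - X.mean(axis=0)              # center each feature
cov = np.cov(Xc, rowvar=False)       # 2x2 sample covariance matrix
vals, vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
order = np.argsort(vals)[::-1]       # reorder: largest eigenvalue first
vals, vecs = vals[order], vecs[:, order]
Z = Xc @ vecs[:, :1]                 # project onto the first principal component
explained = vals[0] / vals.sum()     # fraction of total variance retained
```

The `explained` ratio is the usual basis for deciding how many principal components to keep (e.g., enough components to retain 90-95% of the variance).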
K-means (prototype-based/representative-based clustering, how does the algorithm work,
optimization procedure, algorithm properties), EM (assumptions of the algorithm,
mixture of Gaussians, how does it work, how is cluster membership estimated, how is the
model updated from cluster membership, relationship to K-means), DBSCAN (how does
it form clusters, input parameters, how is it different from K-means/EM).
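The two alternating steps of K-means named above (assignment, then centroid update) can be sketched in pure Python on 1-D points. The data and the hand-picked initial centers are illustrative; a real run would initialize randomly:

```python
# Minimal K-means sketch (1-D points, fixed initial centers for determinism).
def kmeans_1d(points, centers, iters=10):
    clusters = [[] for _ in centers]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.2, 9.8], [0.0, 5.0])
```

EM for a mixture of Gaussians generalizes this: the assignment step becomes a soft posterior membership (E-step) and the update step re-estimates means, covariances, and mixture weights from those memberships (M-step).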
Non-parametric density estimation (histogram, naïve estimator, Gaussian kernel
estimator), non-parametric regression (Running mean and kernel smoother, how is it
different from regular regression), k-nearest neighbor classification (transparencies only).
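For the kNN classification item, the core idea is a majority vote among the k nearest training examples. A minimal pure-Python sketch with a made-up 1-D training set (the data and function name are illustrative):

```python
from collections import Counter

# Hedged k-nearest-neighbor classifier sketch (1-D features).
def knn_classify(train, query, k=3):
    """train: list of (feature, label) pairs. Return the majority label
    among the k training points nearest to the query."""
    nearest = sorted(train, key=lambda t: abs(t[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [(1.0, 'a'), (1.1, 'a'), (0.9, 'a'), (5.0, 'b'), (5.2, 'b')]
label = knn_classify(train, 1.05, k=3)
```

Unlike the parametric classifiers discussed earlier, kNN needs no training phase: all computation happens at query time, which is why editing and condensing techniques (Topic 8) matter for keeping the stored training set small.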
Decision trees (how are they generated from data sets, how do they classify data, pruning,
properties of decision tree classifiers), Regression Trees (what they are used for, how are
they generated from a dataset).
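The “how are they generated from data sets” part of the decision-tree item boils down to repeatedly picking the split with the highest information gain. A sketch of one such split choice on a 1-D toy dataset (data values are illustrative; candidate thresholds are midpoints between consecutive points):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(points):
    """points: list of (x, label). Return the threshold t maximizing the
    information gain of the split x <= t, together with that gain."""
    points = sorted(points)
    base = entropy([l for _, l in points])
    best_t, best_gain = None, -1.0
    for i in range(1, len(points)):
        t = (points[i - 1][0] + points[i][0]) / 2
        left = [l for x, l in points if x <= t]
        right = [l for x, l in points if x > t]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(points)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

t, gain = best_split([(1, 'a'), (2, 'a'), (3, 'b'), (4, 'b')])
```

Growing the full tree applies this split search recursively to each resulting subset; pruning then removes splits that do not improve held-out accuracy. Regression trees use the same recursive structure but minimize squared error in the leaves instead of entropy.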
Transparencies and Other Teaching Material
Course Organization ML Spring 2011
Topic 1: Introduction to Machine Learning (Eick/Alpaydin Introduction, Tom Mitchell's
Introduction to ML---only slides 1-8 and 15-16 will be used)
Topic 2: Supervised Learning (examples of classification techniques: Decision Trees, kNN)
Topic 3: Bayesian Decision Theory (excluding Belief Networks) and Naive Bayes (Eick
on Naive Bayes)
Topic 4: Using Curve Fitting as an Example to Discuss Major Issues in ML (read Bishop
Chapter 1 in conjunction with this material; not covered in 2011)
Topic 5: Parametric Model Estimation
Topic 6: Dimensionality Reduction Centering on PCA (PCA Tutorial, Arindam
Banerjee's More Formal Discussion of the Objectives of Dimensionality Reduction)
Topic 7: Clustering1: Mixture Models, K-Means and EM (Introduction to Clustering,
Modified Alpaydin transparencies, Top 10 Data Mining Algorithms paper)
Topic 8: Non-Parametric Methods Centering on kNN and Density Estimation (kNN,
Non-Parametric Density Estimation, Summary Non-Parametric Density Estimation,
Editing and Condensing Techniques to Enhance kNN, Toussaint's survey paper on
editing, condensing and proximity graphs)
Topic 9: Clustering 2: Density-based Clustering (DBSCAN paper, DENCLUE2 paper)
Topic 10: Decision Trees