ML course, 2016 fall
What you should know:
Weeks 1, 2 and 3: Basic issues in Probability and Information Theory
Read: Chapter 2 from the Foundations of Statistical Natural Language Processing book by
Christopher Manning and Hinrich Schütze, MIT Press, 2002, and/or Probability Theory Review
for Machine Learning, Samuel Ieong, November 6, 2006 (https://see.stanford.edu/materials/aimlcs229/cs229prob.pdf).
Week 1: Random events
Concepts/definitions:
• event/sample space, random event
• probability function
• conditional probabilities
• independent random events (2 forms);
conditionally independent random events (2 forms)
Theoretical results/formulas:
• elementary probability formula:
P(A) = (# favorable cases) / (# all possible cases)
• the “multiplication” rule; the “chain” rule
• “total probability” formula (2 forms)
• Bayes formula (2 forms)
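A minimal numerical illustration of the last two formulas above, in Python (the disease/test numbers are hypothetical, chosen only for illustration):

# Hypothetical two-class example: disease D with prior P(D) = 0.01,
# test T with P(T|D) = 0.95 and P(T|not D) = 0.05.
p_d = 0.01
p_t_given_d = 0.95
p_t_given_not_d = 0.05

# total probability: P(T) = P(T|D) P(D) + P(T|not D) P(not D)
p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

# Bayes formula: P(D|T) = P(T|D) P(D) / P(T)
p_d_given_t = p_t_given_d * p_d / p_t
print(p_t, p_d_given_t)   # ~0.0590 and ~0.161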
Exercises illustrating the above concepts/definitions and theoretical results/formulas,
in particular proofs of certain properties derived from the definition of the probability function,
for instance: P(∅) = 0, P(Ā) = 1 − P(A), A ⊆ B ⇒ P(A) ≤ P(B)
Ciortuz et al.’s exercise book: ch. 1, ex. 1-5 [6-7], 43-46 [47-49]
Week 2: Random variables
Concepts/definitions:
• random variables;
random variables obtained through function composition
• discrete random variables;
probability mass function (pmf)
examples: Bernoulli, binomial, geometric, Poisson
distributions
• cumulative distribution function (cdf)
• continuous random variables;
probability density function (pdf)
examples: Gaussian, exponential, Gamma distributions
• expectation (mean), variance, standard deviation; covariance. (See definitions!)
• multiple random variables;
joint, marginal, conditional distributions
Theoretical results/formulas:
• for any discrete variable X: Σ_x p(x) = 1, where p is the pmf of X;
for any continuous variable X: ∫ p(x) dx = 1, where p is the pdf of X
• E[X + Y ] = E[X] + E[Y ]
E[aX + b] = aE[X] + b
Var[aX] = a² Var[X]
Var[X] = E[X²] − (E[X])²
Cov(X, Y ) = E[XY ] − E[X]E[Y ]
• X, Y independent variables ⇒
Var[X + Y ] = Var[X] + Var[Y ]
• X, Y independent variables ⇒
Cov(X, Y ) = 0, i.e. E[XY ] = E[X]E[Y ]
• independence of random variables;
conditional independence of random variables
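A quick empirical sanity check of several of the identities above (a sketch in Python, assuming NumPy is available; the particular distributions and the sample size are arbitrary):

import numpy as np
rng = np.random.default_rng(0)
x = rng.binomial(n=10, p=0.3, size=1_000_000)   # X ~ Binomial(10, 0.3)
y = rng.exponential(scale=2.0, size=1_000_000)  # Y ~ Exponential, independent of X

# Var[X] = E[X^2] - (E[X])^2
print(np.var(x), np.mean(x**2) - np.mean(x)**2)
# E[X + Y] = E[X] + E[Y]  (holds with or without independence)
print(np.mean(x + y), np.mean(x) + np.mean(y))
# X, Y independent  =>  Var[X + Y] = Var[X] + Var[Y]  and  Cov(X, Y) ≈ 0
print(np.var(x + y), np.var(x) + np.var(y))
print(np.cov(x, y)[0, 1])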
Advanced issues:
• the likelihood function (see also Week 12)
• vector of random variables;
covariance matrix for a vector of random variables;
positive [semi-]definite matrices,
negative [semi-]definite matrices
• For any vector of random variables, the covariance matrix is symmetric and positive semidefinite.
Ciortuz et al.’s exercise book: ch. 1, ex. 25
Exercises illustrating the above concepts/definitions and theoretical results/formulas, concentrating
especially on:
− identifying in a given problem’s text the underlying probabilistic distribution: either a basic one
(e.g., Bernoulli, binomial, categorical, multinomial etc.), or one derived [by function composition
or] by summation of identically distributed random variables (see for instance pr. 14, 51, 53);
− computing probabilities (see for instance pr. 51, 53a,c);
− computing means / expected values of random variables (see for instance pr. 2, 14, 52, 53b,d);
− verifying the [conditional] independence of two or more random variables (see for instance pr. 12,
17, 53, 54).
Ciortuz et al.’s exercise book: ch. 1, ex. 8-17 [18-22], 50-56 [57-59]
Week 3: Elements of Information Theory
Concepts/definitions:
• entropy;
specific conditional entropy;
average conditional entropy;
joint entropy;
information gain (mutual information)
Advanced issues:
• relative entropy;
• cross-entropy
Theoretical results/formulas:
• 0 ≤ H(X) ≤ H(1/n, 1/n, . . . , 1/n), the upper bound being the entropy of the uniform
distribution over the n values of X
• IG(X; Y ) ≥ 0
• H(X, Y ) = H(X) + H(Y |X) = H(Y ) + H(X|Y )
(generalisation: the chain rule)
• IG(X; Y ) = H(X) − H(X|Y ) = H(Y ) − H(Y |X)
• H(X, Y ) = H(X) + H(Y ) iff X and Y are indep.
• IG(X; Y ) = 0 iff X and Y are independent
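A small sketch, in Python, of how these quantities relate on a toy pair of discrete variables (the paired samples below are made up for illustration):

import math
from collections import Counter

def entropy(values):
    # Shannon entropy (base 2) of the empirical distribution of `values`
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# toy paired samples of (X, Y)
x = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']
y = [ 0,   0,   1,   1,   1,   1,   0,   1 ]

h_x = entropy(x)
h_y = entropy(y)
h_xy = entropy(list(zip(x, y)))        # joint entropy H(X, Y)
h_y_given_x = h_xy - h_x               # average conditional entropy H(Y|X)
ig = h_y - h_y_given_x                 # IG(X; Y) = H(Y) - H(Y|X) >= 0
print(h_x, h_y, h_xy, h_y_given_x, ig)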
Exercises illustrating the above concepts/definitions and theoretical results/formulas, concentrating especially on:
− computing different types of entropies (see ex. 35 and 37);
− proofs of some basic properties (see ex. 34a, 67), including the functional analysis of the
entropy of the Bernoulli distribution, as a basis for drawing its plot;
− finding the best IG(Xi ; Y ), given a dataset with X1 , . . . , Xn as input variables and Y
as output variable (see pr. LDP-104, i.e. CMU, 2013 fall, E. Xing, W. Cohen, Sample
Questions, pr. 4).
Ciortuz et al.’s exercise book: ch. 1, ex. 33ab-35, 37, 66-70 [33c-g, 36, 38-41, 71]
Weeks 4 and 5: Decision Trees
Read: Chapter 3 from Tom Mitchell’s Machine Learning book.
Important Note:
See the Overview (rom.: “Privire de ansamblu”) document for Ciortuz et al.’s exercise book,
chapter 2. It is in fact a “road map” for what we will be doing here. (This note applies also to
all chapters.)
Week 4: decision trees and their properties; application of the ID3 algorithm; properties of ID3:
Ciortuz et al.’s exercise book, ch. 2, ex. 1-7, 20-28
Week 5: extensions to the ID3 algorithm:
− handling of continuous attributes: Ciortuz et al.’s ex. book, ch. 2, ex. 8-9 [10], 29-30 [31-33]
− overfitting: Ciortuz et al.’s ex. book, ch. 2, ex. 8, 19bc, 29, 37b
− pruning strategies for decision trees: Ciortuz et al.’s ex. book, ch. 2, ex. 17, 18, 34, 35
− using other impurity measures as a [local] optimality criterion in ID3 (see the sketch below): Ciortuz et al.’s ex. book,
ch. 2, ex. 13
− other issues: Ciortuz et al.’s ex. book, ch. 2, ex. 11, 12, 14-16, 19ad, 36, 37a
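Regarding the other impurity measures mentioned above, a short Python sketch comparing the three impurity functions commonly considered as local split criteria (entropy, as used by ID3 via information gain; the Gini index; the misclassification error), evaluated on a hypothetical class distribution at a node:

import math

def entropy(p):
    # Shannon entropy (base 2) of a class-probability vector p
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def gini(p):
    # Gini index: expected error when labels are drawn at random from p
    return 1 - sum(pi ** 2 for pi in p)

def misclassification_error(p):
    return 1 - max(p)

# hypothetical class proportions at a candidate node of a decision tree
p = [0.7, 0.2, 0.1]
print(entropy(p), gini(p), misclassification_error(p))
# ID3 minimizes entropy (equivalently, maximizes information gain);
# CART-style trees typically use the Gini index instead.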
Weeks 6 and 7: Bayesian Classifiers
Read: Chapter 6 from Tom Mitchell’s Machine Learning book (except subsections 6.11 and
6.12.2).
Week 6: application of Naive Bayes and Joint Bayes algorithms:
Ciortuz et al.’s exercise book, ch. 3, ex. 1-4, 11-14
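A minimal sketch of the Naive Bayes decision rule on a toy categorical dataset (Python; the data are invented and no smoothing is used, so this only illustrates the computation, not the exercise-book problems):

from collections import Counter

# toy training data (two categorical attributes, one class label), invented for illustration
data = [(('sunny', 'hot'),  'no'),
        (('sunny', 'mild'), 'no'),
        (('rainy', 'mild'), 'yes'),
        (('rainy', 'hot'),  'yes'),
        (('sunny', 'mild'), 'yes')]

priors = {c: n / len(data) for c, n in Counter(c for _, c in data).items()}   # P(C)

def likelihood(value, j, c):
    # empirical P(X_j = value | C = c), estimated by counting (no smoothing)
    rows = [x for x, cls in data if cls == c]
    return sum(1 for x in rows if x[j] == value) / len(rows)

def nb_score(x, c):
    # Naive Bayes assumption: P(x | c) = prod_j P(x_j | c), so score = P(c) * prod_j P(x_j | c)
    score = priors[c]
    for j, value in enumerate(x):
        score *= likelihood(value, j, c)
    return score

x_new = ('sunny', 'hot')
print({c: nb_score(x_new, c) for c in priors})   # the predicted class is the argmax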
Week 7:
computation of the [training] error rate of Naive Bayes:
Ciortuz et al.’s exercise book, ch. 3, ex. 5-7, 15-17;
some properties of Naive Bayes and Joint Bayes algorithms:
Ciortuz et al.’s exercise book, ch. 3, ex. 8, 20;
comparisons with other classifiers:
Ciortuz et al.’s exercise book, ch. 3, ex. 18-19.
Advanced issues:
classes of hypotheses in ML: MAP (maximum a posteriori) hypotheses vs. ML (maximum likelihood) hypotheses
Ciortuz et al.’s exercise book, ch. 1 (Probabilities and Statistics), ex. 32, 63, and ch. 3, ex. 9-10.
Week 7++: The Gaussian distribution:
Read: “Pattern Classification”, R. Duda, P. Hart, D. Stork, 2nd ed. (Wiley-Interscience, 2000),
Appendices A.4.12, A.5.1 and A.5.2.
The uni-variate Gaussian distribution: pdf, mean and variance, and cdf;
the variable transformation that takes the non-standard (general) case to the standard case (see the short illustration below);
[Advanced issue: the Central Limit Theorem – the i.i.d. case]
The multi-variate Gaussian distribution: pdf, the case of diagonal covariance matrix
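A short illustration of the standardizing transformation mentioned above: if X ~ N(µ, σ²), then Z = (X − µ)/σ ~ N(0, 1), hence P(X ≤ x) = Φ((x − µ)/σ). The Python sketch below uses arbitrary parameter values:

import math

def std_normal_cdf(z):
    # Phi(z), the cdf of N(0, 1), written via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 100.0, 15.0          # hypothetical non-standard Gaussian
x = 130.0
z = (x - mu) / sigma             # the standardizing transformation
print(z, std_normal_cdf(z))      # z = 2.0, so P(X <= 130) ≈ 0.977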
Ciortuz et al.’s exercise book: ch. 1, ex. 23, 24, [27,] 26;
(parameter estimation: 31, 61, 64)
Advanced issues:
Gaussian Bayes classifiers and their relationship to logistic regression
Read: the draft of an additional chapter to T. Mitchell’s Machine Learning book: Generative
and discriminative classifiers: Naive Bayes and logistic regression (especially section 3)
Ciortuz et al.’s exercise book, ch. 3, ex. 21-26.
Week 8: midterm EXAM
Week 9: Instance-Based Learning
Read: Chapter 8 from Tom Mitchell’s Machine Learning book.
application of the k-NN algorithm:
Ciortuz et al.’s exercise book, ch. 4, pr. 1-7, 15-21, 23a
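A minimal sketch of the k-NN classification rule (Euclidean distance, unweighted majority vote; Python; the labelled points below are invented for illustration):

from collections import Counter
import math

# toy labelled training points (2-D)
train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'),
         ((4.0, 4.0), 'B'), ((4.2, 3.9), 'B'), ((3.8, 4.1), 'B')]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(x, k=3):
    # keep the k training points closest to x ...
    nearest = sorted(train, key=lambda pt: euclidean(x, pt[0]))[:k]
    # ... and take an unweighted majority vote over their labels
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict((1.1, 0.9)))   # 'A'
print(knn_predict((3.9, 4.0)))   # 'B'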
comparisons with the ID3 algorithm:
Ciortuz et al.’s exercise book, ch. 4, pr. 11, 13, 23b.
Shepard’s algorithm: Ciortuz et al.’s exercise book, ch. 4, pr. 8.
Advanced issues:
some theoretical properties of the k-NN algorithm:
Ciortuz et al.’s exercise book, ch. 4, pr. 9, 10, 22, 24.
Weeks 10-11, 12-14: Clustering
Read: Chapter 14 from Manning and Schütze’s Foundations of Statistical Natural Language
Processing book.
Week 10: Hierarchical Clustering:
see section B in the overview (Rom.: “Privire de ansamblu”) of the Clustering chapter in Ciortuz
et al.’s exercise book;
pr. 1-6, 25-28, 46-47.
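A sketch of bottom-up (agglomerative) clustering with single linkage on one-dimensional points (Python; the points and the stopping criterion, a fixed number of clusters, are chosen only for illustration; complete and average linkage are analogous):

def single_link_distance(c1, c2):
    # single linkage: distance between the two closest members of the clusters
    return min(abs(a - b) for a in c1 for b in c2)

points = [1.0, 1.5, 5.0, 5.2, 9.0]
clusters = [[p] for p in points]          # start with singleton clusters

while len(clusters) > 2:                  # stop at a chosen number of clusters
    # find the pair of clusters with the smallest single-link distance
    i, j = min(((i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters))),
               key=lambda ij: single_link_distance(clusters[ij[0]], clusters[ij[1]]))
    clusters[i] += clusters[j]            # merge them
    del clusters[j]

print(clusters)   # [[1.0, 1.5, 5.0, 5.2], [9.0]]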
Week 11: The k-Means Algorithm
see section C1 in the overview of the Clustering chapter in Ciortuz et al.’s exercise book;
application: pr. 7-11, 29-30,
properties (convergence and other issues): pr. 12-13, 31-34,
a “kernelized” version of k-Means: pr. 35,
comparison with the hierarchical clustering algorithms: pr. 14, 36,
implementation: pr. 48.
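A minimal sketch of one run of the k-Means (Lloyd) algorithm on one-dimensional data (Python; the data and the hand-picked initial centroids are illustrative):

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids = [1.0, 2.0]                 # hypothetical initialisation, k = 2

for _ in range(10):                    # iterate until the centroids stop changing
    # assignment step: each point goes to its nearest centroid
    clusters = [[], []]
    for p in points:
        j = min(range(2), key=lambda c: abs(p - centroids[c]))
        clusters[j].append(p)
    # update step: each centroid becomes the mean of its cluster
    new_centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    if new_centroids == centroids:
        break
    centroids = new_centroids

print(centroids, clusters)             # [1.5, 8.5] and the two groups of points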
Week 12:
Estimating the parameters of probability distributions: the MAP and MLE methods.
Read: Tom Mitchell, Estimating Probabilities, additional chapter to the Machine Learning book
(draft of 24 January 2016).
Ciortuz et al.’s exercise book, ch. 1, pr. 28-31, 60-62, 64, 65.
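A tiny numerical illustration of the two estimation methods for the parameter θ of a Bernoulli distribution (Python; the coin-flip data and the Beta prior hyperparameters are made up): MLE maximizes P(data | θ), while MAP maximizes P(θ | data) ∝ P(data | θ) P(θ).

# observed coin flips: 1 = heads, 0 = tails (hypothetical data)
data = [1, 1, 1, 0, 1, 0, 1, 1]
n_heads, n = sum(data), len(data)

# MLE for Bernoulli(theta): the sample frequency
theta_mle = n_heads / n

# MAP with a Beta(alpha, beta) prior (conjugate to the Bernoulli likelihood):
# theta_map = (n_heads + alpha - 1) / (n + alpha + beta - 2)
alpha, beta = 2, 2                     # hypothetical prior, slightly favouring theta = 0.5
theta_map = (n_heads + alpha - 1) / (n + alpha + beta - 2)

print(theta_mle, theta_map)            # 0.75 vs. 0.7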
Week 13:
Using the EM algorithm to solve GMMs (Gaussian Mixture Models), the uni-variate case.
Read: Tom Mitchell, Machine Learning, sections 6.12.1 and 6.12.3;
see section C2 in the overview of the Clustering chapter in Ciortuz et al.’s exercise book;
pr. 15bc-18, 38.
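A condensed Python sketch of the EM iterations for a two-component uni-variate GMM (the data, the initial parameters and the fixed number of iterations are arbitrary; no convergence test is performed):

import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]        # toy sample with two apparent groups
pi = [0.5, 0.5]                              # mixing weights
mu = [0.0, 6.0]                              # initial means
var = [1.0, 1.0]                             # initial variances

for _ in range(20):
    # E step: responsibilities gamma[i][k] = P(component k | x_i, current parameters)
    gamma = []
    for x in data:
        w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
        s = sum(w)
        gamma.append([wk / s for wk in w])
    # M step: re-estimate weights, means and variances from the responsibilities
    for k in range(2):
        nk = sum(g[k] for g in gamma)
        pi[k] = nk / len(data)
        mu[k] = sum(g[k] * x for g, x in zip(gamma, data)) / nk
        var[k] = sum(g[k] * (x - mu[k]) ** 2 for g, x in zip(gamma, data)) / nk

print(pi, mu, var)   # the means approach roughly 1.0 and 5.0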
Week 14: EM for GMMs (Gaussian Mixture Models), the multi-variate case:
Ciortuz et al.’s exercise book, the Clustering chapter: pr. 19-22, 23 (the general version), 39-45;
implementation: pr. 49.
Advanced issues: The EM Algorithmic Schemata
Read: Tom Mitchell, Machine Learning, sect. 6.12.2
convergence: Ciortuz et al.’s exercise book, ch. 6, pr. 1-2;
EM for solving mixtures of certain distributions: Bernoulli: pr. 3, 11, categorical: pr. [4,] 5, 12,
Gamma: pr. 14, exponential: pr. 9;
EM for solving a mixture (linear combination) of unspecified distributions: pr. 8;
application to text categorization: pr. [6,] 13;
a “hard” version of the EM algorithm: pr. 15.
Weeks 15-16: final EXAM