CSE 5331/7331
Fall 2007
Machine Learning
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Some slides extracted from Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002. Other slides are from CS 545 at Colorado State University, by Chuck Anderson.

Table of Contents
• Introduction (Chuck Anderson)
• Statistical Machine Learning Examples
  – Estimation
  – EM
  – Bayes Theorem
• Decision Tree Learning
• Neural Network Learning

The slides in this introductory section are from CS545: Machine Learning, by Chuck Anderson, Department of Computer Science, Colorado State University, Fall 2006.

What is Machine Learning?
• Statistics ≈ the science of inference from data
• Machine learning ≈ multivariate statistics + computational statistics
• Multivariate statistics ≈ prediction of values of a function assumed to underlie a multivariate dataset
• Computational statistics ≈ computational methods for statistical problems (aka statistical computation) + statistical methods which happen to be computationally intensive
• Data mining ≈ exploratory data analysis, particularly with massive/complex datasets

Kinds of Learning
Learning algorithms are often categorized according to the amount of information provided:
• Least information:
  – Unsupervised learning is more exploratory.
  – Requires samples of inputs only; must find regularities.
• More information:
  – Reinforcement learning is the most recent.
  – Requires samples of inputs, actions, and rewards or punishments.
• Most information:
  – Supervised learning is the most common.
  – Requires samples of inputs and desired outputs.

Examples of Algorithms
• Supervised learning
  – Regression
    » multivariate regression
    » neural networks and kernel methods
  – Classification
    » linear and quadratic discriminant analysis
    » k-nearest neighbors
    » neural networks and kernel methods
• Reinforcement learning
  – multivariate regression
  – neural networks
• Unsupervised learning
  – principal components analysis
  – k-means clustering
  – self-organizing networks

Table of Contents
• Introduction (Chuck Anderson)
• Statistical Machine Learning Examples
  – Estimation
  – EM
  – Bayes Theorem
• Decision Tree Learning
• Neural Network Learning

Point Estimation
• Point Estimate: estimate a population parameter.
• May be made by calculating the parameter for a sample.
• May be used to predict a value for missing data.
• Ex:
  – R contains 100 employees
  – 99 have salary information
  – Mean salary of these is $50,000
  – Use $50,000 as the value of the remaining employee's salary.
• Is this a good idea? (See the sketch below.)

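A minimal sketch of the mean-imputation idea above; the figures mirror the slide's example, and for simplicity every known salary is set to $50,000 so the sample mean matches the slide:

    # Mean imputation, following the slide's example (figures are hypothetical).
    salaries = [50_000.0] * 99      # the 99 known salaries; here all $50,000

    mean_salary = sum(salaries) / len(salaries)  # point estimate of the mean
    missing_salary = mean_salary                 # impute the 100th employee's salary
    print(f"Imputed salary: ${missing_salary:,.0f}")

Whether this is a good idea depends on how representative the sample is of the missing case.
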
Estimation Error
• Bias: the difference between the expected value of the estimator and the actual parameter value:

    \mathrm{Bias} = E[\hat{\theta}] - \theta

• Mean Squared Error (MSE): the expected value of the squared difference between the estimate and the actual value:

    \mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]

• Why square? (Squaring keeps positive and negative errors from cancelling.)
• Root Mean Squared Error (RMSE): the square root of the MSE, \mathrm{RMSE} = \sqrt{\mathrm{MSE}}.

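A small Python sketch of these error measures for the sample mean as an estimator; the simulated data and its parameters are assumptions, since the slide specifies no dataset:

    import math
    import random

    random.seed(0)
    TRUE_MEAN = 50_000.0  # the "actual value" of the parameter (assumed)

    # Draw many samples and compute the sample mean of each (the estimator).
    estimates = []
    for _ in range(1000):
        sample = [random.gauss(TRUE_MEAN, 10_000.0) for _ in range(99)]
        estimates.append(sum(sample) / len(sample))

    bias = sum(estimates) / len(estimates) - TRUE_MEAN
    mse = sum((e - TRUE_MEAN) ** 2 for e in estimates) / len(estimates)
    rmse = math.sqrt(mse)
    print(f"bias={bias:.1f}  MSE={mse:.1f}  RMSE={rmse:.1f}")
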
Jackknife Estimate
• Jackknife Estimate: an estimate of a parameter obtained by omitting one value from the set of observed values.
• Ex: the estimate of the mean for X = {x_1, ..., x_n}, omitting the i-th value, is

    \hat{\mu}_{(i)} = \frac{1}{n-1} \sum_{j \neq i} x_j

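A minimal sketch of these leave-one-out estimates; the observations are made up for illustration:

    # Jackknife estimates of the mean: omit one observation at a time.
    X = [3.0, 5.0, 7.0, 9.0, 11.0]  # hypothetical observed values
    n = len(X)

    for i in range(n):
        est = (sum(X) - X[i]) / (n - 1)  # mean with the i-th value omitted
        print(f"omit x_{i+1}: estimate = {est:.2f}")
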
Maximum Likelihood Estimate (MLE)
• Obtain parameter estimates that maximize the probability that the sample data occurs for the specific model.
• The joint probability of observing the sample data is obtained by multiplying the individual probabilities.
• Likelihood function:

    L(\Theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \Theta)

• Maximize L.

MLE Example
• Coin toss five times: {H,H,H,H,T}
• Assuming a fair coin with H and T equally likely, the likelihood of this sequence is:

    L = (0.5)^5 = 0.03125

• However, if the probability of an H is 0.8, then:

    L = (0.8)^4 (0.2) = 0.08192

MLE Example (cont'd)
• General likelihood formula, with x_i = 1 for a head and x_i = 0 for a tail:

    L(p \mid x_1, \ldots, x_5) = \prod_{i=1}^{5} p^{x_i} (1-p)^{1-x_i}

• The estimate for p that maximizes L is then 4/5 = 0.8.

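A quick numeric check of this example: a brute-force scan over candidate values of p (a sketch, not the closed-form maximization):

    # Verify that p = 0.8 maximizes the likelihood of {H,H,H,H,T}.
    tosses = [1, 1, 1, 1, 0]  # 1 = heads, 0 = tails

    def likelihood(p, xs):
        L = 1.0
        for x in xs:
            L *= p if x == 1 else (1.0 - p)
        return L

    best_p = max((i / 100 for i in range(1, 100)), key=lambda p: likelihood(p, tosses))
    print(best_p, likelihood(best_p, tosses))  # 0.8 0.08192...
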
Expectation-Maximization (EM)
• Solves estimation problems with incomplete data.
• Obtain initial estimates for the parameters.
• Iteratively use the estimates for the missing data and continue until convergence.

EM Example
[Figure: worked EM example; not reproduced in this transcript.]

EM Algorithm
[Figure: the EM algorithm; not reproduced in this transcript.]

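Since the worked example and algorithm figures are not reproduced, here is a stand-in: a minimal sketch of EM-style estimation of a mean when some values are missing (the data and the number of missing values are made up). The E-step fills each missing value with the current mean; the M-step re-estimates the mean from the completed data:

    # EM-style estimation of a mean with missing data (toy sketch).
    observed = [10.0, 20.0, 30.0]  # known values (hypothetical)
    n_missing = 2                  # two values are missing
    n_total = len(observed) + n_missing

    mu = sum(observed) / len(observed)  # initial estimate from the observed data
    for _ in range(100):
        filled = observed + [mu] * n_missing  # E-step: impute missing values
        new_mu = sum(filled) / n_total        # M-step: re-estimate the mean
        if abs(new_mu - mu) < 1e-9:           # converged?
            break
        mu = new_mu
    print(mu)  # 20.0
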
Bayes Theorem
• Posterior probability: P(h_1 | x_i)
• Prior probability: P(h_1)
• Bayes Theorem:

    P(h_1 \mid x_i) = \frac{P(x_i \mid h_1) \, P(h_1)}{P(x_i)}

• Assigns probabilities to hypotheses given a data value.

Bayes Theorem Example
• Credit authorizations (hypotheses): h1 = authorize purchase, h2 = authorize after further identification, h3 = do not authorize, h4 = do not authorize but contact police.
• Assign twelve data values for all combinations of credit and income:

    Credit     Income=1  Income=2  Income=3  Income=4
    Excellent  x1        x2        x3        x4
    Good       x5        x6        x7        x8
    Bad        x9        x10       x11       x12

• From training data: P(h1) = 60%; P(h2) = 20%; P(h3) = 10%; P(h4) = 10%.

Bayes Example (cont'd)
• Training data:

    ID  Income  Credit     Class  xi
    1   4       Excellent  h1     x4
    2   3       Good       h1     x7
    3   2       Excellent  h1     x2
    4   3       Good       h1     x7
    5   4       Good       h1     x8
    6   2       Excellent  h1     x2
    7   3       Bad        h2     x11
    8   2       Bad        h2     x10
    9   3       Bad        h3     x11
    10  1       Bad        h4     x9

Bayes Example (cont'd)
• Calculate P(xi|hj) and P(xi).
• Ex: P(x7|h1) = 2/6; P(x4|h1) = 1/6; P(x2|h1) = 2/6; P(x8|h1) = 1/6; P(xi|h1) = 0 for all other xi.
• Predict the class for x4:
  – Calculate P(hj|x4) for all hj.
  – Place x4 in the class with the largest value.
  – Ex:
    » P(h1|x4) = P(x4|h1) P(h1) / P(x4) = (1/6)(0.6)/0.1 = 1
    » So x4 goes in class h1 (see the sketch below).

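A minimal sketch of this computation in Python, with all probabilities estimated by counting over the training table above:

    from collections import Counter

    # Training data from the table above: (class, data value) pairs.
    rows = [("h1", "x4"), ("h1", "x7"), ("h1", "x2"), ("h1", "x7"), ("h1", "x8"),
            ("h1", "x2"), ("h2", "x11"), ("h2", "x10"), ("h3", "x11"), ("h4", "x9")]

    n = len(rows)
    class_counts = Counter(h for h, _ in rows)   # for P(hj)
    joint_counts = Counter(rows)                 # for P(xi|hj)
    x_counts = Counter(x for _, x in rows)       # for P(xi)

    def posterior(h, x):
        # Bayes theorem: P(h|x) = P(x|h) * P(h) / P(x), estimated by counts.
        if x_counts[x] == 0:
            return 0.0
        p_x_given_h = joint_counts[(h, x)] / class_counts[h]
        p_h = class_counts[h] / n
        p_x = x_counts[x] / n
        return p_x_given_h * p_h / p_x

    best = max(class_counts, key=lambda h: posterior(h, "x4"))
    print(best, posterior(best, "x4"))  # h1 1.0 (up to float rounding)
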
Table of Contents
• Introduction (Chuck Anderson)
• Statistical Machine Learning Examples
  – Estimation
  – EM
  – Bayes Theorem
• Decision Tree Learning
• Neural Network Learning

Twenty Questions Game
[Figure: twenty questions game illustration; not reproduced.]

Decision Trees
• Decision Tree (DT):
  – A tree where the root and each internal node are labeled with a question.
  – The arcs represent each possible answer to the associated question.
  – Each leaf node represents a prediction of a solution to the problem.
• A popular technique for classification; the leaf node indicates the class to which the corresponding tuple belongs (see the sketch below).

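As an illustration of this definition (the credit scenario is borrowed from the earlier Bayes example; the tree itself is made up), a DT can be written as nested question nodes:

    # A decision tree as nested tuples: (question_fn, {answer: subtree_or_leaf}).
    # Leaves are class labels; internal nodes ask a question of the tuple.
    tree = (
        lambda t: t["credit"],               # root question: credit rating?
        {
            "Excellent": "authorize",        # leaf
            "Good": (
                lambda t: t["income"] >= 3,  # internal node: income at least 3?
                {True: "authorize", False: "identify further"},
            ),
            "Bad": "do not authorize",       # leaf
        },
    )

    def classify(node, tuple_):
        if isinstance(node, str):            # leaf: return the predicted class
            return node
        question, branches = node
        return classify(branches[question(tuple_)], tuple_)

    print(classify(tree, {"credit": "Good", "income": 4}))  # authorize
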
Decision Tree Example
[Figure: example decision tree; not reproduced.]

Decision Trees
• How do you build a good DT?
• What is a good DT?
• Answer: supervised learning.

Comparing DTs
[Figure: two decision trees for the same problem, one balanced and one deep; not reproduced.]

Decision Tree Induction is often based on Information Theory

Information
[Figure: illustration of information content; not reproduced.]

DT Induction
• When all the marbles in the bowl are mixed up, little information is given.
• When the marbles in the bowl are all from one class and those in the other two classes are on either side, more information is given.
• Use this approach with DT induction!

Information/Entropy
• Given probabilities p_1, p_2, ..., p_s whose sum is 1, entropy is defined as:

    H(p_1, \ldots, p_s) = \sum_{i=1}^{s} p_i \log(1/p_i)

• Entropy measures the amount of randomness, surprise, or uncertainty.
• Goal in classification (see the sketch below):
  – no surprise
  – entropy = 0

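A minimal sketch of the entropy computation; base-2 logarithms are assumed here (the slide does not fix the base):

    import math

    def entropy(probs):
        # H = sum of p_i * log2(1 / p_i); terms with p_i == 0 contribute nothing.
        return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

    print(entropy([1.0]))              # 0.0   -> one pure class, no surprise
    print(entropy([0.5, 0.5]))         # 1.0   -> two equally likely classes
    print(entropy([1/3, 1/3, 1/3]))    # 1.585 -> three classes fully mixed up
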
Table of Contents
• Introduction (Chuck Anderson)
• Statistical Machine Learning Examples
  – Estimation
  – EM
  – Bayes Theorem
• Decision Tree Learning
• Neural Network Learning

Neural Networks
• Based on the observed functioning of the human brain: Artificial Neural Networks (ANN).
• Our view of neural networks is very simplistic.
• We view a neural network (NN) from a graphical viewpoint.
• Alternatively, an NN may be viewed from the perspective of matrices.
• Used in pattern recognition, speech recognition, computer vision, and classification.

Neural Networks
• A Neural Network (NN) is a directed graph F = <V, A> with vertices V = {1, 2, ..., n} and arcs A = {<i,j> | 1 <= i,j <= n}, with the following restrictions:
  – V is partitioned into a set of input nodes VI, hidden nodes VH, and output nodes VO.
  – The vertices are also partitioned into layers.
  – Any arc <i,j> must have node i in layer h-1 and node j in layer h.
  – Arc <i,j> is labeled with a numeric weight wij.
  – Node i is labeled with a function fi.
(A small sketch of this graph view follows.)

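As an illustration (not from the slides), here is a minimal sketch of this graph view: a tiny layered network with weighted arcs and a node function, evaluated by forward propagation. The weights and layer sizes are made up.

    import math

    def sigmoid(x):
        # Node function f_i; squashes output into the range [0, 1].
        return 1.0 / (1.0 + math.exp(-x))

    # Layers of the directed graph: 2 input nodes, 2 hidden nodes, 1 output node.
    # weights[h][j][i] labels the arc <i, j> from node i in layer h to node j
    # in layer h+1 (all weight values here are made up).
    weights = [
        [[0.5, -0.3], [0.8, 0.2]],   # input -> hidden
        [[1.0, -1.5]],               # hidden -> output
    ]

    def forward(inputs):
        values = inputs
        for layer in weights:
            # Each node sums its weighted incoming arcs, then applies f_i.
            values = [sigmoid(sum(w * v for w, v in zip(node_w, values)))
                      for node_w in layer]
        return values

    print(forward([1.0, 0.0]))  # one output value in (0, 1)
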
Neural Network Example
[Figure: example neural network; not reproduced.]

NN Node
[Figure: a single NN node; not reproduced.]

NN Activation Functions
• Functions associated with nodes in the graph.
• Output may be in the range [-1,1] or [0,1].
[Figure: plots of activation functions; not reproduced. A sketch of two common choices follows.]

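The two output ranges mentioned above correspond to two standard activation functions, the logistic sigmoid and the hyperbolic tangent; these specific choices are conventional ones, not named on the slide:

    import math

    def sigmoid(x):
        # Logistic sigmoid: output in the range [0, 1].
        return 1.0 / (1.0 + math.exp(-x))

    def tanh(x):
        # Hyperbolic tangent: output in the range [-1, 1].
        return math.tanh(x)

    for x in (-2.0, 0.0, 2.0):
        print(f"x = {x:+.1f}  sigmoid = {sigmoid(x):.3f}  tanh = {tanh(x):+.3f}")
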
NN Learning
• Propagate input values through the graph.
• Compare output to desired output.
• Adjust weights in the graph accordingly (a sketch of one simple update rule follows).

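A minimal sketch of this loop for a single sigmoid node, using a perceptron-style gradient update; the training data, initial weights, and learning rate are all made up, and this is one simple instance of "adjust weights accordingly", not the specific algorithm the course goes on to present:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # One sigmoid node learning AND-like behavior (hypothetical training data).
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0),
            ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
    w = [0.1, -0.1]   # initial weights (assumed)
    bias = 0.0
    rate = 0.5        # learning rate (assumed)

    for _ in range(5000):
        for inputs, target in data:
            # 1. Propagate input values through the node.
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, inputs)) + bias)
            # 2. Compare the output to the desired output.
            err = target - out
            # 3. Adjust weights accordingly (gradient step for squared error).
            grad = err * out * (1.0 - out)
            w = [wi + rate * grad * xi for wi, xi in zip(w, inputs)]
            bias += rate * grad

    for inputs, target in data:
        out = sigmoid(sum(wi * xi for wi, xi in zip(w, inputs)) + bias)
        print(inputs, round(out, 2), "->", target)  # outputs approach the targets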