cs339: Artificial Intelligence
Wordsmithing
Bayes Classifiers
It can be fun and amazing to see how a computer can learn just like people do. It
is especially cool to see when we can use a computer to put things into two different
categories. It is important that we compute the right statistics for a set of data,
called the training data. A set of data is a normal distribution if it has a humped
shape, like a camel. The center of the hump is the average value and how fat the
hump is is another important measurement. These two items can be measured from
the training data. These two items specify which normal distribution best matches
the training data. This is important because we can use these to classify the data
samples.
A classification problem is to use the training data to decide how to classify
new data. Suppose a new data item, x, comes in. We can look back at the training
data to see how items similar to x were categorized. Then we can classify x the
same way. More formally, we know that our training data is normal. So we can
measure its mean and variance. This allows us to compute the probability P(x|C)
which is the probability of x given C. This is the normal formula. We also use the
evidence and the probability of each category to give us the category probability,
so we can decide how to classify x, because the category for x is the one with the
higher probability and therefore is more likely to have given us x.
Bayesian Dichotomizers
A Bayesian dichotomizer is a supervised learning technique to classify data
samples into one of two categories. We use a training set of data, consisting of
randomly drawn data samples (or observations) and the category (or label) to which
they belong, to learn to classify a future data sample, x, into one of two categories,
C1 or C2. The classifier uses Bayes' formula of conditional probability:
P(C_i \mid x) = \frac{P(x \mid C_i)\, P(C_i)}{P(x)} \qquad (1)
The term P(Ci|x), called the posterior probability, gives the probability that the data sample x originated from a particular category Ci. P(Ci), the prior probability, is our belief about the probability of a sample coming from category Ci before we observe x; it is measured by computing the percentage of each category in a training set of samples, assuming that future samples will be drawn at the same relative frequency. P(x|Ci) is called the likelihood of observing sample x given that we draw from category Ci. Finally, the evidence, P(x), gives us the probability of observing data sample x; this term acts as a normalizing constant in computing the posteriors.
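For a concrete sense of how these terms combine, here is a small worked example with illustrative numbers of our own (not taken from any particular dataset). Suppose the priors are P(C1) = 0.6 and P(C2) = 0.4, and a new sample x has likelihoods P(x|C1) = 0.2 and P(x|C2) = 0.5. The evidence is P(x) = (0.2)(0.6) + (0.5)(0.4) = 0.32, so the posteriors are P(C1|x) = 0.12/0.32 = 0.375 and P(C2|x) = 0.20/0.32 = 0.625. Since P(C2|x) > P(C1|x), we classify x into C2.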
The key to a Bayesian classifier is computing the likelihoods, P(x|Ci), by assuming these probabilities fit a well-known distribution, typically the normal distribution. A training set of data samples is separated into the two categories specified by the labels. Measuring the mean (µ) and variance (σ²) of the training data in each category allows us to use Equation 2 as a closed-form expression for each likelihood. The likelihood computations are then used in the posteriors (Equation 1) to decide which of the two categories is most likely responsible for observation x.
P(x \mid C_i) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad (2)

Normal Probability Distribution
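To make the procedure concrete, the following Python sketch implements a dichotomizer along the lines described above; the function names, variable names, and training data are ours and purely illustrative, not part of the original handout. It estimates the prior, mean, and variance of each category from labeled training data, evaluates Equation 2 for the likelihoods, and compares the numerators of Equation 1 to classify a new sample.

import math

def fit(samples, labels):
    """Estimate the prior, mean, and variance of each category."""
    params = {}
    for c in set(labels):
        xs = [x for x, y in zip(samples, labels) if y == c]
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        params[c] = (len(xs) / len(samples), mu, var)  # (prior, mu, var)
    return params

def likelihood(x, mu, var):
    """Equation 2: the normal density for P(x | Ci)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(x, params):
    """Equation 1: return the category with the larger posterior."""
    return max(params, key=lambda c: likelihood(x, params[c][1], params[c][2]) * params[c][0])

# Hypothetical one-dimensional training data with two labels.
samples = [1.0, 1.2, 0.8, 3.9, 4.1, 4.0]
labels = ["C1", "C1", "C1", "C2", "C2", "C2"]
params = fit(samples, labels)
print(classify(1.1, params))  # expected: C1
print(classify(3.8, params))  # expected: C2

Note one design choice in the sketch: because the evidence P(x) is identical for both categories, classify compares the numerators P(x|Ci)P(Ci) directly rather than dividing by P(x); the winning category is the same either way.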
• Remove casual language.
It can be fun and amazing to see how a computer can learn just like people
do. It is especially cool to see when we can use a computer to put things into
two different categories.
Artificial Intelligence is the study of imitating human intelligence in algorithms. One aspect of AI is classification.
• Use terminology correctly.
A set of data is a normal distribution if it has a humped shape, like a camel.
The center of the hump is the average value and how fat the hump is is another important measurement.
A normal distribution, characterized by a single hump, is specified by the
mean of the data (the center of the hump) and its variance (the spread of the
hump).
• Combine short choppy sentences.
These two items can be measured from the training data. These two items
specify which normal distribution best matches the training data. This is important because we can use these to classify the data samples.
The mean and variance, which are measured from the training data, are used
to classify new data samples.
Measuring the mean (µ) and variance (σ²) of the training data in each category allows us to use Equation 2 as a closed-form expression for each likelihood.
• Rework run-on or extra long sentences.
We also use the evidence and the probability of each category to give us the
category probability, so we can decide how to classify x, because the category for x is the one with the higher probability and therefore is more likely
to have given us x.
The evidence and prior probabilities are combined with the likelihood to
compute the posterior probabilities, which can be used to classify a new data
sample.
• Be precise. Be concise.
A classification problem is to use the training data to decide how to classify
new data. Suppose a new data item, x, comes in. We can look back at the
training data to see how other items were categorized. Then we can classify
x the same way.
In supervised learning, statistical properties present in the training data are
used to classify new observations.