cs339: Artificial Intelligence
Wordsmithing
Bayes Classifiers
It can be fun and amazing to see how a computer can learn just like people do. It
is especially cool to see when we can use a computer to put things into two different
categories. It is important that we compute the right statistics for a set of data,
called the training data. A set of data is a normal distribution if it has a humped
shape, like a camel. The center of the hump is the average value and how fat the
hump is is another important measurement. These two items can be measured from
the training data. These two items specify which normal distribution best matches
the training data. This is important because we can use these to classify the data
samples.
A classification problem is to use the training data to decide how to classify
new data. Suppose a new data item, x, comes in. We can look back at the training
data to see how items similar to x were categorized. Then we can classify x the
same way. More formally, we know that our training data is normal. So we can
measure its mean and variance. This allows us to compute the probability P(x|C)
which is the probability of x given C. This is the normal formula. We also use the
evidence and the probability of each category to give us the category probability,
so we can decide how to classify x, because the category for x is the one with the
higher probability and therefore is more likely to have given us x.
Bayesian Dichotomizers
A Bayesian dichotomizer is a supervised learning technique to classify data
samples into one of two categories. We use a training set of data, consisting of
randomly drawn data samples (or observations) and the category (or label) to which
they belong, to learn to classify a future data sample, x, into one of two categories,
C1 or C2. The classifier uses Bayes' formula of conditional probability:
P(C_i \mid x) = \frac{P(x \mid C_i)\, P(C_i)}{P(x)} \qquad (1)
The term P(Ci|x), called the posterior probability, gives the probability that the data sample x originated from a particular category Ci. P(Ci), the prior probability, is our belief about the probability of a sample coming from category Ci before we observe x; it is measured by computing the percentage of each category in a training set of samples, assuming that future samples will be drawn at the same relative frequency. P(x|Ci) is called the likelihood of observing sample x given that we draw from category Ci. Finally, the evidence, P(x), gives us the probability of observing data sample x; this term acts as a normalizing constant in computing the posteriors.
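For a concrete sense of how these terms combine, here is a small worked example with illustrative numbers of our own (not taken from any particular dataset). Suppose the priors are P(C1) = 0.6 and P(C2) = 0.4, and a new sample x has likelihoods P(x|C1) = 0.2 and P(x|C2) = 0.5. The evidence is P(x) = (0.2)(0.6) + (0.5)(0.4) = 0.32, so the posteriors are P(C1|x) = 0.12/0.32 = 0.375 and P(C2|x) = 0.20/0.32 = 0.625. Since P(C2|x) > P(C1|x), we classify x into C2.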
The key to a Bayesian classifier is computing the likelihoods, P(x|Ci), by assuming these probabilities fit a well-known distribution, typically the normal distribution. A training set of data samples is separated into the two categories specified by the labels. Measuring the mean (µ) and variance (σ²) of the training data in each category allows us to use Equation 2 as a closed-form expression for each likelihood. The likelihood computations are then used in the posteriors (Equation 1) to decide which of the two categories is most likely responsible for observation x.
P(x \mid C_i) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad (2)

Normal Probability Distribution
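To make the procedure concrete, the following Python sketch implements a dichotomizer along the lines described above; the function names, variable names, and training data are ours and purely illustrative, not part of the original handout. It estimates the prior, mean, and variance of each category from labeled training data, evaluates Equation 2 for the likelihoods, and compares the numerators of Equation 1 to classify a new sample.

import math

def fit(samples, labels):
    """Estimate the prior, mean, and variance of each category."""
    params = {}
    for c in set(labels):
        xs = [x for x, y in zip(samples, labels) if y == c]
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        params[c] = (len(xs) / len(samples), mu, var)  # (prior, mu, var)
    return params

def likelihood(x, mu, var):
    """Equation 2: the normal density for P(x | Ci)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(x, params):
    """Equation 1: return the category with the larger posterior."""
    return max(params, key=lambda c: likelihood(x, params[c][1], params[c][2]) * params[c][0])

# Hypothetical one-dimensional training data with two labels.
samples = [1.0, 1.2, 0.8, 3.9, 4.1, 4.0]
labels = ["C1", "C1", "C1", "C2", "C2", "C2"]
params = fit(samples, labels)
print(classify(1.1, params))  # expected: C1
print(classify(3.8, params))  # expected: C2

Note one design choice in the sketch: because the evidence P(x) is identical for both categories, classify compares the numerators P(x|Ci)P(Ci) directly rather than dividing by P(x); the winning category is the same either way.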
• Remove casual language.
It can be fun and amazing to see how a computer can learn just like people
do. It is especially cool to see when we can use a computer to put things into
two different categories.
Artificial Intelligence is the study of imitating human intelligence in algorithms. One aspect of AI is classification.
• Use terminology correctly.
A set of data is a normal distribution if it has a humped shape, like a camel.
The center of the hump is the average value and how fat the hump is is another important measurement.
A normal distribution, characterized by a single hump, is specified by the
mean of the data (the center of the hump) and its variance (the spread of the
hump).
• Combine short choppy sentences.
These two items can be measured from the training data. These two items
specify which normal distribution best matches the training data. This is important because we can use these to classify the data samples.
The mean and variance, which are measured from the training data, are used
to classify new data samples.
Measuring the mean (µ) and variance (σ²) of the training data in each category allows us to use Equation 2 as a closed-form expression for each likelihood.
• Rework run-on or extra long sentences.
We also use the evidence and the probability of each category to give us the
category probability, so we can decide how to classify x, because the category for x is the one with the higher probability and therefore is more likely
to have given us x.
The evidence and prior probabilities are combined with the likelihood to
compute the posterior probabilities, which can be used to classify a new data
sample.
• Be precise. Be concise.
A classification problem is to use the training data to decide how to classify
new data. Suppose a new data item, x, comes in. We can look back at the
training data to see how other items were categorized. Then we can classify
x the same way.
In supervised learning, statistical properties present in the training data are
used to classify new observations.