0) Set up your data

You have a bunch of labeled examples. A label comes about in one of two ways:

1) You asked a person to draw a shape of (or not of) a certain type; that is a positive (or negative) label for that type.
2) You asked a person to label a shape of a certain type.

If you asked one person to draw a shape and then two people to verify the shape, that provides you with 3 labels for that shape. At this point you have 2 options for setting up your data.

Option a) Include all of your labels as separate examples (i.e., a drawn shape with two verifiers, as in the example above, will be added to your list of data examples three times, once for each label).

Option b)

1) Identify Fundamental and Contextual Features

For example, if you are choosing feature values to determine whether two lines are parallel, possible features that may affect whether people label two lines as parallel are:

Fundamental features:
Angle between the two lines

Contextual features:
Angle of line 1
Angle of line 2
Whether the lines are close to horizontal
Whether the lines are close to vertical
Ratio of line lengths
Length of the longest line
Length of the shortest line
Closest distance between the two lines
Furthest distance between the two lines
Distance between the two closest endpoints
Distance between the two center points
Screen size
Ratio of the longest length to the screen size

Contextual features that you may not have access to:
Pressure (line thickness)
Accompanying words
Accompanying hand gestures
Drawing domain
Other lines on the page

You get the idea. There are many possible features; I have only listed a handful, and there are many more. Your job is to list as many as you possibly can.

2) Measure Feature Values

In an ideal world, you would compute the value of each of the features above for each of your positive and negative examples. However, since this is a class project, you will select all of your fundamental features and at least one contextual feature.
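To make the feature measurements concrete, here is a minimal sketch of how the parallel-lines features might be computed. The segment representation (a pair of endpoints) and the particular feature choices are illustrative assumptions, not part of the assignment.

```python
import math

def line_features(l1, l2):
    """Compute three candidate 'parallel lines' features for two segments.

    Each segment is ((x1, y1), (x2, y2)). Names are illustrative.
    """
    def angle(seg):
        # Undirected angle of the segment, in degrees in [0, 180).
        (x1, y1), (x2, y2) = seg
        return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

    def length(seg):
        (x1, y1), (x2, y2) = seg
        return math.hypot(x2 - x1, y2 - y1)

    def midpoint(seg):
        (x1, y1), (x2, y2) = seg
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    diff = abs(angle(l1) - angle(l2))
    len1, len2 = length(l1), length(l2)
    (mx1, my1), (mx2, my2) = midpoint(l1), midpoint(l2)
    return {
        "angle_between": min(diff, 180.0 - diff),   # fundamental feature
        "midpoint_distance": math.hypot(mx2 - mx1, my2 - my1),
        "length_ratio": max(len1, len2) / min(len1, len2),
    }
```

For two horizontal segments five units apart, this yields an angle of 0, a midpoint distance of 5, and a length ratio of 1.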
(You should have at least two features, but ideally three or more features total.)

For each example, compute the value of each of your features. E.g.:

Feature 1 (angle) = 23
Feature 2 (distance between midpoints) = 13
Feature 3 (length ratio) = 2.1

giving (23, 13, 2.1) for example 1. Do this for each example you have.

3) Compute Mean and Standard Deviation for Positive and Negative Examples

Group the positive and negative examples into two groups:

Group 1 = positive examples
Group 2 = negative examples

For each feature,

a) Compute the mean value of the feature for each group. The mean = E(X) is the average of all of the values, where X is the actual value of a feature and E(X) is the expected value for that feature (the mean, for our purposes):

E(X) = (x1 + x2 + x3 + … + xn) / N

b) Compute the standard deviation of the feature for each group:

standard deviation = sqrt( E( (X - E(X))^2 ) )
                   = sqrt( E(X^2) - (E(X))^2 )
                   = sqrt( ((x1 - E(X))^2 + (x2 - E(X))^2 + (x3 - E(X))^2 + … + (xn - E(X))^2) / N )

or, equivalently,

variance = v = E(X^2) - (E(X))^2
standard deviation = sqrt(v)

All of the above formulas are functionally equivalent; I wrote them in several different ways to help with comprehension. In English: to compute the standard deviation, simply compute the distance from each feature value to the mean, and square that distance. Then take the average of all of the squared distances. That number is the variance. The square root of the variance is the standard deviation. The standard deviation represents how far from the mean you can expect the values to stray.

Note that, in actuality, we are not computing the true standard deviation, because we do not have access to every possible positive and negative example. Rather, we are computing the standard deviation of our sample set, which we hope is representative of the underlying population. Thus, we should really be dividing by (N - 1), rather than N.
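The computations in step 3 can be sketched in a few lines; the function names are illustrative, and the standard deviation divides by N - 1 as discussed above.

```python
import math

def mean(values):
    """E(X): the average of the values."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation: average squared distance to the mean,
    divided by N - 1 rather than N, then square-rooted."""
    m = mean(values)
    variance = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return math.sqrt(variance)
```

For the values 23, 25, 21, 27 the mean is 24 and the sample standard deviation is sqrt(20/3), about 2.58.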
The idea behind this is that the sample necessarily fits itself more closely than it fits the actual population, so our sample set is going to have a slight bias. (Our sample statistic is not an unbiased estimator of the true standard deviation; it tends to underestimate the population standard deviation.) Thus we add a little fudge factor to make the standard deviation slightly larger: by dividing by a slightly smaller number, we enlarge the standard deviation to account for this. You should thus divide by N - 1 rather than N in your computations of the standard deviation (but not the mean), since you are computing the sample deviation. No matter which value you use, make sure that you make your choice (and reasoning) clear in your paper.

Here is another explanation, which I stole from a website (http://helios.bto.ed.ac.uk/bto/statistics/tress3.html):

"Why do we use n-1 and not n? You should just accept this as standard and necessary practice! However, it has a reason, based on the fact that we almost always use the sample variance to obtain an estimate of the population variance (a population being all the measurements or events of the same type that could ever be found). Put in simple terms, the population variance is derived from the sample mean and from the deviation (d) of each measurement from the sample mean. But if we lacked any one of these measurements (the mean or a single d value) we could calculate it from the other information. So, with n measurements (data points) only n-1 of them are free to vary when we know the mean - we could calculate the missing one. 'n-1' is therefore the number of degrees of freedom of our data."

4) Perform a t-test, for each feature, to determine whether your two groups are statistically different from each other

A) Compute the value of t.
E(X1P) = expected value (mean) of feature 1 in the positive examples
E(X1N) = expected value (mean) of feature 1 in the negative examples
S1P = standard deviation of feature 1 in the positive examples
S1N = standard deviation of feature 1 in the negative examples
V1P = S1P^2 = variance of feature 1 in the positive examples
V1N = S1N^2 = variance of feature 1 in the negative examples

t = (E(X1N) - E(X1P)) / sqrt(V1N + V1P)

However... the formula for t above assumes that you have an equal number of positive and negative examples (which you probably do not). Thus, for the bottom half you should instead use a more complicated expression:

NP = number of positive examples
NN = number of negative examples

bottom value = sqrt( [ (NP - 1) V1P + (NN - 1) V1N ] [ (1/NN) + (1/NP) ] / [ NN + NP - 2 ] )

t = (E(X1N) - E(X1P)) / bottom value

Our samples are dependent, so really the formula is more complicated than this, but we shall stick with this formula for simplicity.

B) Compute the probability that the two groups are significantly different. (Actually, standard practice is to form a null hypothesis that the two groups are the same, compute that probability, and then invert it.)

Compute your degrees of freedom = total number of examples - 2.

Go to: http://helios.bto.ed.ac.uk/bto/statistics/table1.html#Student's%20t%20test

Find the row associated with your degrees of freedom, then scroll to the right to find the largest tabled value that is still below your t-value. The p-value heading that column bounds the probability that the two groups are from the same population. You will state this as follows: "With 98 degrees of freedom and a t-value of 3.41, we get a p-value of less than .01, implying that our two groups are significantly different." There is no such thing as "almost" significantly different. Notice that the values I chose above were very "close" to the next cutoff, but I said nothing about that.

Compute the p-value for each of your features.
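The pooled t computation from step 4A can be sketched as follows. Note the square root in the denominator; the sign of t only reflects which group mean is larger, so use its absolute value when consulting the table.

```python
import math

def pooled_t(pos, neg):
    """Two-sample t statistic with pooled variance.

    pos, neg: the values of one feature in the positive and negative
    groups. Returns (t, degrees of freedom).
    """
    n_p, n_n = len(pos), len(neg)
    m_p = sum(pos) / n_p                                  # E(X1P)
    m_n = sum(neg) / n_n                                  # E(X1N)
    v_p = sum((x - m_p) ** 2 for x in pos) / (n_p - 1)    # V1P (sample variance)
    v_n = sum((x - m_n) ** 2 for x in neg) / (n_n - 1)    # V1N
    bottom = math.sqrt(((n_p - 1) * v_p + (n_n - 1) * v_n)
                       * (1.0 / n_n + 1.0 / n_p) / (n_n + n_p - 2))
    return (m_n - m_p) / bottom, n_p + n_n - 2
```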
I expect that your fundamental features will be significantly different, but not your other features. (Note that at this point I switch to using f = feature value.)

5) Create a 1-feature exemplar-based linear classifier for each of your features.

The exemplar is the 'ideal' example of a group, calculated as the mean of that group (one exemplar for the positive group and one for the negative group). Each example will be classified into the group whose exemplar it is closest to.

a) Put your classifier into a linear equation of the form y = af + b, such that if y is less than zero the example is classified into the negative group, and if y is greater than zero it is classified into the positive group. Because your classifier is linear and uses only one feature, it will look like y = f - c, where c is the average of the two group means.

You have just built your first perceptual-based linear classifier! Congratulations!

b) Reclassify each of your examples according to your exemplar-based classifier. Report the following:

True positives: the number of positive examples classified correctly
True negatives: the number of negative examples classified correctly
False positives: the number of negative examples classified incorrectly as positive
False negatives: the number of positive examples classified incorrectly as negative

Do this for each of your features.

6) Create a 1-feature standard-deviation-aware linear classifier for each of your features.

This classifier classifies based on Z, the number of standard deviations a value lies from each exemplar, rather than a flat distance:

ZP = (f - E(FP)) / SP
ZN = (f - E(FN)) / SN

Compute the values of ZP and ZN, and classify each shape into the group whose Z value is closest to zero.

a) Put your classifier in the form y = af + b, as you were instructed to above. This should be elementary math for each of you.

You have just built a second perceptual-based linear classifier! Congratulations!
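The classification rules in steps 5 and 6 are small enough to state directly in code. A sketch, with illustrative names and one feature value f per example:

```python
def exemplar_classify(f, mean_pos, mean_neg):
    """Step 5: assign the example to the group whose mean (exemplar) is
    closer; equivalent to the cutoff c = (mean_pos + mean_neg) / 2."""
    return "positive" if abs(f - mean_pos) < abs(f - mean_neg) else "negative"

def zscore_classify(f, mean_pos, std_pos, mean_neg, std_neg):
    """Step 6: assign the example to the group whose exemplar is the
    fewest standard deviations away."""
    zp = abs(f - mean_pos) / std_pos
    zn = abs(f - mean_neg) / std_neg
    return "positive" if zp < zn else "negative"
```

When the two groups have unequal deviations the rules can disagree: f = 6 with a positive mean of 2 (std 4) and a negative mean of 10 (std 1) is equidistant from both means, but only one standard deviation from the positive exemplar versus four from the negative one.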
b) Reclassify each of your examples according to your standard-deviation-aware linear classifier. Report the number of true positives, true negatives, false positives, and false negatives. Do this for each of your features.

7) Create a 1-feature perceptron linear classifier for each of your features.

A perceptron works using a gradient-descent-style search. Start with the cutoff value computed in part 6. Slowly move the cutoff value up and down, trying to improve the total accuracy ((true positives + true negatives) / total number of examples). Stop when you have the best possible cutoff.

a) Put your classifier in the form y = af + b, as above.

You have just built a third perceptual-based linear classifier! Congratulations!

b) Reclassify each of your examples according to your perceptron linear classifier. Report the number of true positives, true negatives, false positives, and false negatives. Do this for each of your features.

8) Combine 2 or more of your features to create a multi-feature non-weighted exemplar-based classifier.

Compute the distance from each example to each group's exemplar by summing the per-feature distances, e.g.:

abs(F1 - E(F1)) + abs(F2 - E(F2)) + abs(F3 - E(F3))

and classify each example into the group whose exemplar is closer. Note that this classifier is not linear (because of the absolute values).

a) Put your classifier into an equation of the form y = f(F1, F2, F3).

b) Reclassify each of your examples according to your multi-feature non-weighted classifier. Report the number of true positives, true negatives, false positives, and false negatives.

9) Combine 2 or more of your features to create a multi-feature weighted exemplar-based classifier.

For each group (positive and negative), compute a covariance matrix.
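One simple way to carry out the cutoff search described in step 7: total accuracy only changes when the cutoff crosses a data value, so it suffices to test the midpoints between consecutive sorted values and keep the best. A sketch, assuming larger feature values indicate the positive class (flip the comparison if yours run the other way):

```python
def best_cutoff(pos, neg):
    """Search cutoffs c for the rule 'f > c => positive' and return the
    cutoff with the best total accuracy, plus that accuracy."""
    def accuracy(c):
        tp = sum(1 for f in pos if f > c)     # true positives
        tn = sum(1 for f in neg if f <= c)    # true negatives
        return (tp + tn) / (len(pos) + len(neg))
    vals = sorted(set(pos + neg))
    # Midpoints between consecutive values, plus one cutoff below all data.
    candidates = [(a + b) / 2.0 for a, b in zip(vals, vals[1:])]
    candidates.append(vals[0] - 1.0)
    best = max(candidates, key=accuracy)
    return best, accuracy(best)
```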
a) Compute a covariance matrix for the positive examples. Looking only at the positive examples, compute:

covariance of feature i and feature j for the positive examples: for each example e, compute (fi - E(Fi)) * (fj - E(Fj)), then sum these values over all examples. (We should also divide by N - 1, but we won't, to save work later.)

b) Compute a similar matrix for the negative examples. For simplicity, we will assume that all of the classes have the same variance (which we already saw was not true in your standard-deviation-aware classifier above, but for simplicity, we shall assume it).

c) Then compute the common covariance matrix:

degrees of freedom = number of positive examples + number of negative examples - 2

CCF(i,j) = common covariance of feature i and feature j
         = (covariance of features i and j for the positive examples + covariance of features i and j for the negative examples) / degrees of freedom

d) For each group (positive and negative), compute the weight for each feature. For example, for the positive group:

WP1 = CCF(1,1) * E(F1P) + CCF(1,2) * E(F2P) + CCF(1,3) * E(F3P) + …
WP2 = CCF(2,1) * E(F1P) + CCF(2,2) * E(F2P) + CCF(2,3) * E(F3P) + …

e) Compute the initial weight for each group (positive and negative):

WP0 = -(WP1 * E(F1P) + WP2 * E(F2P) + WP3 * E(F3P) + …) / 2
WN0 = -(WN1 * E(F1N) + WN2 * E(F2N) + WN3 * E(F3N) + …) / 2

f) Create classifier values from an example's feature values f1, f2, f3, …:

VP = WP0 + WP1 * f1 + WP2 * f2 + WP3 * f3 + …
VN = WN0 + WN1 * f1 + WN2 * f2 + WN3 * f3 + …

Shapes are classified into the group with the larger V value.

g) Put your classifier in the form y = af + b, as above.

You have just built a multi-feature perceptual-based linear classifier! Congratulations!

h) Reclassify each of your examples according to your multi-feature linear classifier. Report the number of true positives, true negatives, false positives, and false negatives.

10) Graph your features.
Note where your classifiers have divided your data. Is there a better division?
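For reference, steps 9a-9f can be sketched end to end. The code follows the handout's formulas exactly as written; be aware that the classical linear-discriminant formulation computes the weights from the inverse of the common covariance matrix, so treat this as a sketch of the recipe above rather than a general-purpose implementation.

```python
def covariance_classifier(pos, neg):
    """Build the step-9 classifier. pos/neg are lists of feature vectors
    (lists of floats). Returns a function mapping a feature vector to
    "positive" or "negative"."""
    k = len(pos[0])                       # number of features

    def mean_vec(group):
        return [sum(e[i] for e in group) / len(group) for i in range(k)]

    def cov_sum(group, m):
        # Unnormalized covariance: sum over examples of (fi - E(Fi)) * (fj - E(Fj)).
        return [[sum((e[i] - m[i]) * (e[j] - m[j]) for e in group)
                 for j in range(k)] for i in range(k)]

    mp, mn = mean_vec(pos), mean_vec(neg)
    cp, cn = cov_sum(pos, mp), cov_sum(neg, mn)
    dof = len(pos) + len(neg) - 2
    # Common covariance matrix CCF(i, j).
    ccf = [[(cp[i][j] + cn[i][j]) / dof for j in range(k)] for i in range(k)]

    def weights(m):
        # Per-feature weights (step 9d) and the initial weight W0 (step 9e).
        w = [sum(ccf[i][j] * m[j] for j in range(k)) for i in range(k)]
        w0 = -sum(w[i] * m[i] for i in range(k)) / 2.0
        return w0, w

    wp0, wp = weights(mp)
    wn0, wn = weights(mn)

    def classify(f):
        # Classifier values (step 9f): the larger V wins.
        vp = wp0 + sum(wp[i] * f[i] for i in range(k))
        vn = wn0 + sum(wn[i] * f[i] for i in range(k))
        return "positive" if vp > vn else "negative"

    return classify
```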