0) Set up your data
You have a bunch of labeled examples. A label comes about in one of two ways: 1) You
asked a person to draw a shape of (or not of) a certain type; that is a positive (or
negative) label for that type. 2) You asked a person to label an existing shape as being
(or not being) of a certain type. If you asked one person to draw a shape and then two
people to verify it, that gives you 3 labels for that shape.
At this point you have 2 options for setting up your data.
Option a) Include all of your labels as separate examples (i.e., a drawn shape with two
verifiers, as in the example above, will be added to your list of data examples three
times – once for each label).
Option b)
1) Identify Fundamental and Contextual Features:
For example, if you are choosing feature values to determine whether two lines are
parallel, possible features that may affect whether people label two lines as parallel
are:
Fundamental Features:
Angle between the two lines
Contextual Features:
Angle of line 1
Angle of line 2
If lines are close to horizontal
If lines are close to vertical
Ratio of line lengths
Length of longest line
Length of shortest line
Closest distance between the two lines
Furthest distance between the two lines
Distance between two closest endpoints
Distance between two center points
Screen size
Ratio of longest length to screen size
Contextual Features that you may not have access to:
Pressure (line thickness)
Accompanying words
Accompanying hand gestures
Drawing domain
Other lines on the page
You get the idea. There are many possible features; I have only listed a handful, and
there are many more.
Your job is to list as many as you possibly can.
2) Measure Feature Values
In an ideal world, you would compute the value for each of the features above for each of
your positive and negative examples. However, since this is a class project, you will
select all of your fundamental features, and at least one contextual feature. (You should
have at least two features, but ideally 3 or more features total.)
For each example, compute the value of each of your features.
E.g.,
Feature 1 (angle) = 23
Feature 2 (distance between midpoints) = 13
Feature 3 (length ratio) = 2.1
 (23, 13, 2.1) for example 1.
Do this for each example you have.
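For instance, here is a minimal Python sketch of computing such a feature vector for a pair of line segments (the helper names and the particular three features are my own choices, not requirements of the assignment):

import math

def angle_of(p1, p2):
    # Angle of the segment p1->p2 in degrees, folded into [0, 180).
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0])) % 180.0

def line_features(a1, a2, b1, b2):
    # Feature 1: angle between the two lines, in [0, 90].
    diff = abs(angle_of(a1, a2) - angle_of(b1, b2))
    angle_between = min(diff, 180.0 - diff)
    # Feature 2: distance between the two midpoints.
    mid_a = ((a1[0] + a2[0]) / 2.0, (a1[1] + a2[1]) / 2.0)
    mid_b = ((b1[0] + b2[0]) / 2.0, (b1[1] + b2[1]) / 2.0)
    mid_dist = math.dist(mid_a, mid_b)
    # Feature 3: ratio of the longer line length to the shorter one.
    len_a, len_b = math.dist(a1, a2), math.dist(b1, b2)
    length_ratio = max(len_a, len_b) / min(len_a, len_b)
    return (angle_between, mid_dist, length_ratio)

# One drawn example: two roughly parallel strokes, given as endpoint pairs.
print(line_features((0, 0), (100, 5), (0, 20), (90, 26)))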
3) Compute Mean and Standard Deviation for Positive and Negative Examples
Group the positive and negative examples into two groups.
Group 1 = positive examples
Group 2 = negative examples
For each feature,
a) compute the mean value of the feature for each group. The mean = E(X) is the average
of all of the values. X = the actual value of a feature. E(X) is the expected value for that
feature, which is the mean for our purposes. E(X) =( x1 + x2 + x3 + … + xn) / N
b) compute the standard deviation of the feature for each group. The standard deviation is
E(X – E(X))2))
or
 = ((E(X2) – (E)X))2)
or
(((x1 - E(X))2 + (x2 - E(X))2 + (x3 - E(X))2 + … + (xn - E(X))2) / N).
or
variance = v = (E(X2) – (E)X))2
v
All of the above formulas are functionally equivalent. I wrote them in several different
ways to help with comprehension.
In English, to compute the value of the standard deviation, simply compute the distance
from each feature value to the mean, and square that distance. Then take the average of
all of the squares of the distances. That number is the variance. The square root of that is
the standard deviation.
The standard deviation represents how far from the mean you can expect the values to stray.
Note, in actuality, we are not computing the true standard deviation because we do not
have access to every possible positive and negative example. Rather, we are computing
the standard deviation of our sample set, which we hope to be a representation of the
underlying population. Thus, we should really be dividing by (N-1), rather than N. The
idea behind this is that the deviations are measured from the sample mean rather than
from the true population mean, so they come out a little too small, and our sample set is
going to have a slight bias. (Our sample is not an unbiased estimator for the true
standard deviation; it tends to underestimate the population standard deviation.) Thus we
add a little fudge factor to make the standard deviation slightly larger: by dividing by
a slightly smaller number, we enlarge the standard deviation to account for this. You
should thus divide by N-1 rather than N in your computations of the standard deviation
(but not the mean), since you are computing the sample deviation. No matter which value
you use, make sure that you make your choice (and reasoning) clear in your paper.
Here is another explanation, which I stole from a website:
(http://helios.bto.ed.ac.uk/bto/statistics/tress3.html)
Why do we use n-1 and not n?
You should just accept this as standard and necessary practice! However, it has a reason,
based on the fact that we almost always use the sample variance to obtain an estimate of
the population variance (a population being all the measurements or events of the same
type that could ever be found). Put in simple terms, the population variance is derived
from the sample mean and from the deviation (d) of each measurement from the sample
mean. But if we lacked any one of these measurements (the mean or a single d value) we
could calculate it from the other information. So, with n measurements (data points) only
n-1 of them are free to vary when we know the mean - we could calculate the missing
one. "n-1" is therefore the number of degrees of freedom of our data.
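A short Python sketch of this step (the feature vectors below are made-up numbers, just to show the shape of the computation; it divides by N-1 as discussed above):

import math

def mean(values):
    return sum(values) / len(values)

def sample_std(values):
    # Sample standard deviation: divide by N - 1, not N.
    m = mean(values)
    variance = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return math.sqrt(variance)

# Hypothetical feature vectors: (angle, midpoint distance, length ratio).
positives = [(3.0, 12.0, 1.1), (5.0, 15.0, 1.3), (2.0, 10.0, 1.0)]
negatives = [(40.0, 30.0, 2.5), (55.0, 42.0, 3.0), (48.0, 35.0, 2.2)]

for group_name, group in (("positive", positives), ("negative", negatives)):
    for i in range(len(group[0])):
        column = [example[i] for example in group]
        print(group_name, "feature", i + 1,
              "mean =", mean(column), "std =", sample_std(column))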
4) Perform a t-test to determine if your two groups are statistically different from
each other for each feature
A) Compute the value of t.
E(X1P) = Expected value of feature 1 in the positive examples
E(X1N) = Expected value of feature 1 in the negative examples
S1P = Standard deviation of feature 1 in positive examples
S1N = Standard deviation of feature 1 in negative examples
V1P = S1P^2 = Variance of feature 1 in positive examples
V1N = S1N^2 = Variance of feature 1 in negative examples
t = (E(X1N) - E(X1P)) / sqrt(V1N + V1P)
However… the formula for t above assumes that you have an equal number of positive
and negative sample sizes (which you probably do not).
Thus, instead, for the bottom half you should use a more complicated equation:
Np = number of positive examples
NN = number of negative examples
Bottom value = sqrt( [(NP - 1) V1P + (NN - 1) V1N] [(1/NN) + (1/NP)] / [NN + NP - 2] )
t = (E(X1N) - E(X1P)) / bottom value
Our samples are dependent, so really the formula is more complicated than this, but we
shall stick with this formula for simplicity.
B) Compute the probability that the two groups are significantly different. (Actually,
standard practice is that we create a null hypothesis that the two groups are the same, and
compute that probability and then invert it.)
Compute your degrees of freedom = total number of examples – 2.
Go to: http://helios.bto.ed.ac.uk/bto/statistics/table1.html#Student's%20t%20test
Find the row associated with your degrees of freedom, then scroll to the right to find
the largest tabulated value that is still below your t-value. The p-value at the top of
that column is the probability that the two groups are from the same population.
You will state this as the following: “With 98 degrees of freedom and a t-value of 3.41,
we get a p-value of less than .01, implying that our two groups are significantly
different.”
There is no such thing as “almost” significantly different. Notice that the values I
chose above were very “close” to the next value, but I said nothing about that.
Compute the p-value for each of your features. I expect that your fundamental features
will be significantly different, but not your other features.
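Here is a Python sketch of the t computation for one feature, using the unequal-sample-size formula above (the sample values are hypothetical; you still look the resulting t-value up in the table by hand):

import math
import statistics

def t_statistic(pos_values, neg_values):
    # Pooled-variance t statistic for one feature with unequal sample sizes.
    n_p, n_n = len(pos_values), len(neg_values)
    mean_p, mean_n = statistics.mean(pos_values), statistics.mean(neg_values)
    var_p = statistics.variance(pos_values)   # sample variance, divides by N - 1
    var_n = statistics.variance(neg_values)
    pooled = ((n_p - 1) * var_p + (n_n - 1) * var_n) / (n_p + n_n - 2)
    bottom = math.sqrt(pooled * (1.0 / n_n + 1.0 / n_p))
    # The sign of t does not matter for the table lookup; use its absolute value.
    return (mean_n - mean_p) / bottom, n_p + n_n - 2

# Hypothetical angle values for the two groups.
t, dof = t_statistic([3.0, 5.0, 2.0, 4.0], [40.0, 55.0, 48.0])
print("t =", t, "with", dof, "degrees of freedom")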
(Note at this point I switch to using f = feature value.)
5) Create a 1-feature exemplar-based linear-classifier for each of your features.
The exemplar is the ‘ideal’ example of the group. This is calculated to be the mean of
each group (positive and negative group).
Each example will be classified into the group whose exemplar it is closest to.
a) Put your linear classifier into a linear equation of the form y = af + b, such that if
y is less than zero the example is classified into the negative group, and if y is
greater than zero it is classified into the positive group.
Because your classifier is linear and has only one feature, your classifier will look
like y = f - c, where c is the average of the two group means.
You have just built your first perceptual-based linear classifier! Congratulations!
b) Reclassify each of your examples according to your exemplar-based classifier.
Report the following:
True positives: the number of positive examples classified correctly
True negatives: the number of negative examples classified correctly
False positives: the number of negative examples classified incorrectly as positives
False negatives: the number of positive examples classified incorrectly as negatives
Do this for each of your features.
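A Python sketch of the one-feature exemplar classifier and its confusion counts (the feature values are hypothetical; the cutoff c is the midpoint of the two group means):

import statistics

def exemplar_classifier(pos_values, neg_values):
    mean_p, mean_n = statistics.mean(pos_values), statistics.mean(neg_values)
    c = (mean_p + mean_n) / 2.0   # cutoff halfway between the two exemplars
    if mean_p > mean_n:
        return lambda f: f - c > 0    # True means "positive"
    return lambda f: f - c < 0

def confusion_counts(classify, pos_values, neg_values):
    tp = sum(1 for f in pos_values if classify(f))
    fn = len(pos_values) - tp
    fp = sum(1 for f in neg_values if classify(f))
    tn = len(neg_values) - fp
    return tp, tn, fp, fn

positives = [3.0, 5.0, 2.0, 4.0]   # e.g. angle between the lines
negatives = [40.0, 55.0, 48.0]
classify = exemplar_classifier(positives, negatives)
print("TP, TN, FP, FN =", confusion_counts(classify, positives, negatives))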
6) Create a 1-feature standard-deviation-aware linear-classifier for each of your
features.
This classifier classifies based on Z, the number of standard deviations a value is away
from the exemplar, rather than on a flat distance.
ZP = (f – E(FP)) / SP
ZN = (f – E(FN)) / SN
Compute the values of ZP and ZN, and classify each shape into the group whose Z value is smaller in absolute value (i.e., whose exemplar is closest).
a) Put your linear classifier in terms of y = af + b, as you were instructed to above.
This should be elementary math for each of you.
You have just built a second perceptual-based linear classifier! Congratulations!
b) Reclassify each of your examples according to your standard-deviation-aware linear
classifier. Report the number of true positives, true negatives, false positives, and false
negatives.
Do this for each of your features.
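A Python sketch of the standard-deviation-aware version (hypothetical values again; a shape goes to the group whose exemplar is fewer standard deviations away):

import statistics

def z_classifier(pos_values, neg_values):
    mean_p, std_p = statistics.mean(pos_values), statistics.stdev(pos_values)
    mean_n, std_n = statistics.mean(neg_values), statistics.stdev(neg_values)
    def classify(f):
        z_p = abs(f - mean_p) / std_p   # standard deviations from the positive exemplar
        z_n = abs(f - mean_n) / std_n   # standard deviations from the negative exemplar
        return z_p < z_n                # True means "positive"
    return classify

positives = [3.0, 5.0, 2.0, 4.0]
negatives = [40.0, 55.0, 48.0]
classify = z_classifier(positives, negatives)
print([classify(f) for f in (4.5, 30.0, 60.0)])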
7) Create a 1-feature perceptron linear classifier for each of your features.
A perceptron works using a gradient-descent-style method. Start with the cutoff value
computed in part 6. Slowly move the cutoff value up and down, trying to improve the total
accuracy ((true positives + true negatives) / total number of examples). Stop when you
have the best possible cutoff.
a) Put your linear classifier in the form of y = af + b as above.
You have just built a third perceptual-based linear classifier! Congratulations!
b) Reclassify each of your examples according to your perceptron linear classifier.
Report the number of true positives, true negatives, false positives, and false negatives.
Do this for each of your features.
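A Python sketch of the cutoff search (a simple hill-climbing loop rather than a true gradient computation; the step size, iteration count, and data are my own choices):

def accuracy(cutoff, pos_values, neg_values):
    # Assumes positives tend to have smaller feature values than negatives.
    tp = sum(1 for f in pos_values if f < cutoff)
    tn = sum(1 for f in neg_values if f >= cutoff)
    return (tp + tn) / (len(pos_values) + len(neg_values))

def tune_cutoff(start, pos_values, neg_values, step=0.5, iterations=200):
    best = start
    for _ in range(iterations):
        candidates = (best - step, best, best + step)
        best = max(candidates, key=lambda c: accuracy(c, pos_values, neg_values))
    return best

positives = [3.0, 5.0, 2.0, 4.0, 11.0]
negatives = [9.0, 40.0, 55.0, 48.0]
cutoff = tune_cutoff(20.0, positives, negatives)   # start from the step-6 cutoff
print("cutoff =", cutoff, "accuracy =", accuracy(cutoff, positives, negatives))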
8) Combine 2 or more of your features to create a multi-feature non-weighted
exemplar-based classifier.
For each example, compute its distance to the exemplar of each group by summing the absolute differences over your features.
E.g.: abs(F1 – E(F1)) + abs( F2 – E(F2)) + abs(F3 – E(F3)).
Note that this classifier is not linear (because of the absolute value).
a) Put your classifier into an equation of the form y = f(F1, F2, F3).
b) Reclassify each of your examples according to your multi-feature non-weighted
classifier. Report the number of true positives, true negatives, false positives, and false
negatives.
Do this for each of your features.
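A Python sketch of the non-weighted multi-feature classifier (each example is a tuple of feature values; the data are hypothetical):

import statistics

def exemplar(group):
    # Per-feature mean of a group of feature tuples.
    return [statistics.mean(column) for column in zip(*group)]

def l1_distance(example, exemplar_values):
    return sum(abs(f - e) for f, e in zip(example, exemplar_values))

positives = [(3.0, 12.0, 1.1), (5.0, 15.0, 1.3), (2.0, 10.0, 1.0)]
negatives = [(40.0, 30.0, 2.5), (55.0, 42.0, 3.0), (48.0, 35.0, 2.2)]
pos_ex, neg_ex = exemplar(positives), exemplar(negatives)

def classify(example):
    # True means "positive": the example is closer to the positive exemplar.
    return l1_distance(example, pos_ex) < l1_distance(example, neg_ex)

print(classify((4.0, 13.0, 1.2)), classify((50.0, 33.0, 2.4)))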
9) Combine 2 or more of your features to create a multi-feature weighted exemplar-based
classifier.
For each group (positive and negative), compute a covariance matrix.
a) Compute a covariance matrix for positive examples:
Looking only at the positive examples, compute:
Covariance of feature i and feature j for the positive examples =
For each example e, compute (fi - E(Fi))*(fj - E(Fj)), and sum these values over all of
the examples (we should also divide by N-1, but we won't, to save work later).
b) Compute a similar matrix for the negative examples.
For simplicity, we will assume that all of the classes have the same variance (which we
already saw was not true in your standard-deviation-aware classifier above, but we shall
assume it anyway).
c) Then compute the common covariance matrix:
degrees of freedom = number of positive examples + number of negative examples - 2.
CCF(i,j) = Common Covariance of feature i and feature j = (covariance of features i and j
for positive examples + covariance of features i and j for negative examples) / degrees of
freedom
d) For each group (positive and negative example groups), compute the weight for each
feature using the inverse of the common covariance matrix, CCF^-1.
For example, to compute the weights of the first two features in the positive example
group:
WP1 = CCF^-1(1,1) * E(F1P) + CCF^-1(1,2) * E(F2P) + CCF^-1(1,3) * E(F3P) + …
WP2 = CCF^-1(2,1) * E(F1P) + CCF^-1(2,2) * E(F2P) + CCF^-1(2,3) * E(F3P) + …
e) Compute the initial weight for each group (positive and negative).
Wp0 = -(WP1 * E(F1P) + WP2 * E(F2P) + WP3 * E(F3P) + …)/2
WN0 = -(WN1 * E(F1N) + WN2 * E(F2N) + WN3 * E(F3N) + …)/2
f) Create classifier values:
VP = WP0 + WP1 * f1 + WP2 * f2 + WP3 * f3 + …
VN = WN0 + WN1 * f1 + WN2 * f2 + WN3 * f3 + …
where f1, f2, f3, … are the feature values of the shape being classified.
Shapes are classified to the group with the largest V value.
g) Put your linear classifier in the form of y = a1*f1 + a2*f2 + … + b, as above.
You have just built a multi-feature perceptual-based linear classifier! Congratulations!
h) Reclassify each of your examples according to your multi-feature linear classifier.
Report the number of true positives, true negatives, false positives, and false negatives.
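A numpy sketch of steps a) through f) (the data are hypothetical, and the weights are formed with the inverse of the common covariance matrix as in step d)):

import numpy as np

positives = np.array([(3.0, 12.0, 1.1), (5.0, 15.0, 1.4), (2.0, 10.0, 1.0), (4.0, 11.0, 1.3)])
negatives = np.array([(40.0, 30.0, 2.5), (55.0, 42.0, 3.0), (48.0, 35.0, 2.2)])

mean_p, mean_n = positives.mean(axis=0), negatives.mean(axis=0)

# Steps a) and b): un-normalized covariance matrices (sums of products of deviations).
cov_p = (positives - mean_p).T @ (positives - mean_p)
cov_n = (negatives - mean_n).T @ (negatives - mean_n)

# Step c): common covariance matrix, divided by the degrees of freedom.
dof = len(positives) + len(negatives) - 2
common_cov = (cov_p + cov_n) / dof

# Step d): weights from the inverse of the common covariance matrix.
inv_cov = np.linalg.inv(common_cov)
w_p, w_n = inv_cov @ mean_p, inv_cov @ mean_n

# Step e): initial weights for each group.
w_p0 = -0.5 * (w_p @ mean_p)
w_n0 = -0.5 * (w_n @ mean_n)

def classify(example):
    # Step f): the shape goes to the group with the larger V value.
    v_p = w_p0 + w_p @ example
    v_n = w_n0 + w_n @ example
    return v_p > v_n   # True means "positive"

print(classify(np.array([4.0, 12.0, 1.1])), classify(np.array([50.0, 36.0, 2.6])))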
10) Graph your features. Note where your classifiers have divided your data. Is
there a better division?
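If you want to produce the graph programmatically, a minimal matplotlib sketch for one pair of features might look like this (the points and the cutoff line are hypothetical):

import matplotlib.pyplot as plt

positives = [(3.0, 12.0), (5.0, 15.0), (2.0, 10.0), (4.0, 11.0)]
negatives = [(40.0, 30.0), (55.0, 42.0), (48.0, 35.0)]

plt.scatter([f1 for f1, f2 in positives], [f2 for f1, f2 in positives],
            marker="o", label="positive examples")
plt.scatter([f1 for f1, f2 in negatives], [f2 for f1, f2 in negatives],
            marker="x", label="negative examples")
plt.axvline(x=22.0, linestyle="--", label="feature 1 cutoff from a classifier")
plt.xlabel("feature 1 (angle between lines)")
plt.ylabel("feature 2 (distance between midpoints)")
plt.legend()
plt.show()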