BIRKBECK
(University of London)
MSc/MRes EXAMINATION FOR INTERNAL STUDENTS
MSc IN ADVANCED INFORMATION SYSTEMS
MSc IN WEB INFORMATION MANAGEMENT
MRes IN COMPUTER SCIENCE
SCHOOL OF COMPUTER SCIENCE AND INFORMATION
SYSTEMS
COMPUTATIONAL INTELLIGENCE AND VISUALISATION
OFFICIAL MODULE NUMBER: COIY022P
DATE OF EXAMINATION: Friday 2nd June 2006
DURATION OF PAPER: 14.30-16.30
There are six questions on this paper. Candidates should attempt any
FOUR of them. Calculators are permitted.
© Birkbeck College 2006
Question 1.
1D distributions
1.1. Consider feature x whose range is between 1 and 10. When the range of x is divided into 9 bins (intervals of length one: [1,2), [2,3), …, [9,10]), the x frequencies in the corresponding bins are: 10, 20, 10, 20, 30, 20, 40, 20, 30. Please answer these questions:
1.1.1 How many observations of x are available? (1 mark)
1.1.2 What can be said about the value of the median of x? (3 marks)
1.1.3 Provide the minimum and maximum estimates of the average of x. (5 marks)
1.1.4 What can be said of the 20% quantiles of x? (4 marks)
1.1.5 What is the distribution of x when the number of bins is 3? What is the qualitative variance (Gini coefficient) for this distribution? (7 marks)
1.2. Given a triangular fuzzy set defined by the triple (1, 5, 6), draw a graph of its
membership function. (1 mark)
1.3. Given three triangular fuzzy sets defined by triples (0,1,2), (0, 2, 3), and (1, 3, 4),
determine the corresponding central triangular fuzzy set. (4 marks)
Answer:
1.1.1 There are 200 observations.
1.1.2 The median lies between the 100th and 101st values in sorted order, that is, in the 6th bin; hence it is between 6 and 7.
1.1.3 The minimum estimate of the mean is computed with the minimal values in the bins:
(1×10 + 2×20 + 3×10 + 4×20 + 5×30 + 6×20 + 7×40 + 8×20 + 9×30)/200 = 1140/200 = 5.7
The maximum estimate is calculated by the same formula with all bin values increased by 1 (the bin width), which leads to 5.7 + 1 = 6.7.
1.1.4 20% of 200 is 40. Since the cumulative frequency reaches exactly 40 at the end of the 3rd bin, the 20% quantile from the left end of x is 4, while the 20% quantile from the right end must lie in the 8th bin, that is, between 8 and 9.
1.1.5 The three-bin distribution (over the intervals [1,4), [4,7), [7,10]) will be 40, 70, 90 or, in relative frequencies, 0.2, 0.35, 0.45, which leads to the Gini index equal to 1 − 0.2^2 − 0.35^2 − 0.45^2 = 0.635.
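
For concreteness, here is a short Python sketch reproducing the arithmetic of 1.1.1, 1.1.3 and 1.1.5 from the bin frequencies (the variable names are illustrative):

freqs = [10, 20, 10, 20, 30, 20, 40, 20, 30]   # frequencies in bins [1,2), ..., [9,10]
n = sum(freqs)                                  # 1.1.1: number of observations -> 200

# 1.1.3: the minimum estimate of the mean uses the left end of each bin;
# the maximum estimate shifts every value by the bin width 1
left_ends = range(1, 10)
mean_min = sum(l * f for l, f in zip(left_ends, freqs)) / n    # -> 5.7
mean_max = mean_min + 1                                        # -> 6.7

# 1.1.5: merge into 3 bins of width 3 and compute the Gini index
three = [sum(freqs[i:i + 3]) for i in range(0, 9, 3)]          # -> [40, 70, 90]
p = [f / n for f in three]                                     # -> [0.2, 0.35, 0.45]
gini = 1 - sum(q * q for q in p)                               # -> 0.635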
1.2
The membership function of the triangular fuzzy set (1, 5, 6) is zero outside [1, 6], rises linearly from 0 at x = 1 to 1 at x = 5, and falls linearly back to 0 at x = 6. [Figure: triangular membership function with peak at x = 5]
1.3
The central triangular fuzzy set is defined by the component-wise averages of the triples, that is, (1/3, 2, 3).
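
A minimal Python sketch of the triangular membership function and of the averaging in 1.3 (the function name is illustrative):

def tri_membership(x, a, b, c):
    # Membership of x in the triangular fuzzy set (a, b, c): 0 at a, 1 at b, 0 at c
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# 1.3: the central set is the component-wise average of the defining triples
triples = [(0, 1, 2), (0, 2, 3), (1, 3, 4)]
central = tuple(sum(t) / len(triples) for t in zip(*triples))   # -> (1/3, 2.0, 3.0)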
Question 2.
Multivariate data.
Consider a data table for 6 students and 2 features, as follows:
Student   Mark   Occupation
   1       60       IT
   2       80       IT
   3       80       IT
   4       60       AN
   5       40       AN
   6       40       AN
2.1. Data standardisation: Pre-process the data into a quantitative format and standardise it
using the averages and ranges. (5 marks)
2.2. Euclidean distances squared: Compute the between-entity distance matrix and draw an
edge-weighted graph whose vertices correspond to entities and edges to distances. (5 marks)
2.3. Compute the data scatter and determine contributions of features to it. (7 marks)
2.4. After data have been quantified, you must have different columns (features) for IT and
AN occupations.
Can you determine the correlation coefficient between these two features? (4 marks)
What is the inner product of these two features? (4 marks)
Answer:
2.1
The pre-processed and standardised data are:

Student   Mark     IT      AN
   1       0       0.5    -0.5
   2       0.5     0.5    -0.5
   3       0.5     0.5    -0.5
   4       0      -0.5     0.5
   5      -0.5    -0.5     0.5
   6      -0.5    -0.5     0.5

because the means of the original quantified table are (60, 0.5, 0.5) and the ranges are (40, 1, 1).
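
A minimal Python sketch of this pre-processing, that is, 0/1 dummy coding of Occupation followed by mean/range standardisation (variable names are illustrative):

marks = [60, 80, 80, 60, 40, 40]
occupations = ["IT", "IT", "IT", "AN", "AN", "AN"]

# Quantify: one 0/1 dummy column per occupation category
data = [[m, 1.0 if o == "IT" else 0.0, 1.0 if o == "AN" else 0.0]
        for m, o in zip(marks, occupations)]

# Standardise each column by subtracting its mean and dividing by its range
cols = list(zip(*data))
means = [sum(c) / len(c) for c in cols]          # -> [60, 0.5, 0.5]
ranges = [max(c) - min(c) for c in cols]         # -> [40, 1, 1]
std = [[(v - m) / r for v, m, r in zip(row, means, ranges)] for row in data]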
2.2
The squared Euclidean distance between, for example, entities 1 and 2 is d(1,2) = (0 − 0.5)^2 + (0.5 − 0.5)^2 + (−0.5 − (−0.5))^2 = 0.25 + 0 + 0 = 0.25.
The distance graph is the edge-weighted graph on vertices 1–6 whose edge weights are the squared Euclidean distances:

       1      2      3      4      5      6
1      0     0.25   0.25   2      2.25   2.25
2             0      0     2.25   3      3
3                    0     2.25   3      3
4                           0     0.25   0.25
5                                  0     0
6                                         0

[Figure: edge-weighted graph over entities 1–6 with these weights]
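
The whole matrix can be obtained with a few more lines of Python, continuing the sketch above (std is the standardised table):

def d2(x, y):
    # Squared Euclidean distance between two rows
    return sum((a - b) ** 2 for a, b in zip(x, y))

n = len(std)
dist = [[d2(std[i], std[j]) for j in range(n)] for i in range(n)]
# e.g. dist[0][1] == 0.25, dist[0][3] == 2.0, dist[1][4] == 3.0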
2.3
The data scatter is the sum of all squared entries of the table in 2.1; since there are 16 non-zero entries, each equal to ±0.5, it is 16 × 0.25 = 4. The Mark feature contributes 4 × 0.25 = 1 to it, and each of IT and AN contributes 6 × 0.25 = 1.5. These contributions can be expressed as 25%, 37.5%, and 37.5%.
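
In the same sketch, the scatter and the per-feature contributions come out as:

scatter = sum(v * v for row in std for v in row)                # -> 4.0
contrib = [sum(row[j] ** 2 for row in std) for j in range(3)]   # -> [1.0, 1.5, 1.5]
shares = [c / scatter for c in contrib]                         # -> [0.25, 0.375, 0.375]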
2.4
Features IT and AN in 2.1 sum to zero at every entity, which means they are linearly related (AN = −IT), so that their correlation coefficient is −1. The inner product of the quantified 0/1 columns is 0, because no student belongs to both occupations.
Question 3.
Neural networks.
3.1. What is an artificial neuron? (5 marks)
3.2. Explain the concept of perceptron and its relation to the gradient algorithm. (12 marks)
3.3. What are the main steps of the back-propagation algorithm for a neural network with one
hidden layer? (8 marks)
Answer:
3.1
An artificial neuron is a system implementing a mapping of a vector of its inputs x = (x1, …, xM) into a value f(w1·x1 + w2·x2 + … + wM·xM − w0), where w1, …, wM are (wiring) weights, w0 is the bias, and f is the neuron activation function, such as the sign function or the sigmoid s(x) = 1/(1 + exp(−x)).
3.2
The perceptron is an artificial neuron that implements the following algorithm to learn weights to recognise a pattern, that is, to minimise the error of prediction of target values u that are equal to either 1 or −1 and are associated with the input feature vector x over a number of instances of known pairs (x, u):
0. Initialise the weights w randomly or to zero.
1. For each training instance (xi, ui):
   a. compute ůi = sign(<w, xi>);
   b. if ůi ≠ ui, update the weights w according to the equation
      w(new) = w(old) + η(ui − ůi)xi,
      where η, a real number between 0 and 1, is the so-called learning rate.
2. Stop at convergence.
The gradient optimisation (steepest ascent/descent, or hill-climbing) of a function f(x) of a multidimensional variable works as follows: given an initial state x0, perform a sequence of iterations, each finding a new x location. Each iteration updates the old x value as follows:
x(new) = x(old) ± η·grad(f(x(old))),
where grad(f(x)) is the vector of partial derivatives of f with respect to the components of x. It is known from calculus that the vector grad(f(x)) points in the direction of the steepest rise of f at the point x. Thus + is used for maximisation of f(x), and − for minimisation.
The value of η controls the length of the change and should be small (to guarantee not overshooting the slope), but not too small (to guarantee changes when grad(f(x(old))) becomes small; indeed, grad(f(x(old))) = 0 at the optimum point).
It can be proven that the partial derivative of the quadratic error criterion with respect to wt, in the case when only one incoming entity (xi, ui) is considered, is equal to −2(ui − ûi)xit, which is similar to the perceptron learning rule. Thus, the perceptron is similar to the gradient optimisation, except that the continuous ûi is replaced in it by the discrete ůi = sign(ûi).
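
A minimal Python sketch of the perceptron rule described above; the toy data and the learning rate eta are illustrative assumptions:

def perceptron(X, u, eta=0.5, epochs=100):
    # Learn weights w; the last component plays the role of the bias
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        changed = False
        for xi, ui in zip(X, u):
            xi = list(xi) + [1.0]               # absorb the bias as a constant input
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) >= 0 else -1
            if pred != ui:                      # update only on misclassification
                w = [wj + eta * (ui - pred) * xj for wj, xj in zip(w, xi)]
                changed = True
        if not changed:                         # converged: no updates in a full pass
            break
    return w

# Usage on a linearly separable toy set with targets +1/-1 (logical AND)
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
u = [-1, -1, -1, 1]
print(perceptron(X, u))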
3.3
The back-propagation algorithm is an implementation of the gradient optimisation method in the framework of a multilayer neural network. It starts with random weights and runs a pre-specified number of epochs (or until convergence) by processing entities (x, u) in a random order. Given an entity (xi, ui), first, it is fed forward through the net to produce a computed output value u′ and the error e = ui − u′. Then this error is back-propagated along the net topology to compute the gradient of the weights, which is then used to update the weights.
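
A compact Python sketch of one back-propagation step for a net with a single hidden layer of sigmoid neurons and a linear output; the sizes and the rate eta are illustrative assumptions:

import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, u, W1, w2, eta=0.1):
    # Feed forward: hidden activations h, then the (linear) output u_hat
    h = [sigmoid(sum(wij * xj for wij, xj in zip(row, x))) for row in W1]
    u_hat = sum(w2j * hj for w2j, hj in zip(w2, h))
    e = u - u_hat
    # Back-propagate the error and update the weights by the gradient rule
    for j, hj in enumerate(h):
        delta_j = e * w2[j] * hj * (1 - hj)     # the sigmoid derivative is h(1-h)
        w2[j] += eta * e * hj                   # output-layer update
        for k in range(len(x)):
            W1[j][k] += eta * delta_j * x[k]    # hidden-layer update
    return e

# Usage: 2 inputs, 3 hidden neurons, random initial weights
random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
w2 = [random.uniform(-1, 1) for _ in range(3)]
for _ in range(1000):
    backprop_step([1.0, 0.0], 1.0, W1, w2)
print(backprop_step([1.0, 0.0], 1.0, W1, w2))   # the error is now close to 0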
Question 4.
MST and Single linkage clustering.
4.1. Find a maximum spanning tree in the following similarity graph using Prim’s algorithm,
stating the order in which the edges are added to the tree. (17 marks)
[Figure: an edge-weighted similarity graph on eight vertices A–H. The edges used in the model answer are AC = 10, AD = 10, DB = 11, DG = 8, GH = 10, HE = 8 and EF = 12; the remaining edge weights shown in the figure (6, 4, 4, 5, 7, 2, 3, 5) cannot be attributed to particular edges from the transcript.]
4.2. What is the total length of the tree? In what sense is the algorithm “greedy”? (3 marks)
4.3. Find a three-cluster single linkage partition by cutting the MST found above. (5 marks)
Answer:
4.1. A possible answer, starting from A: the maximum link from A is either AC or AD (weight 10), of which we select AC. The maximum link from {A, C} is AD (weight 10). The maximum link from {A, C, D} to the rest is DB (weight 11). The next maximum links are DG (8), GH (10), HE (8) and EF (12). This leads to the following MST (edges highlighted):
[Figure: the MST with edges AC = 10, AD = 10, DB = 11, DG = 8, GH = 10, HE = 8, EF = 12]
4.2
The total length is 69. The algorithm is greedy because at each step it considers the acquisition of only one entity, in the best possible way at that step.
4.3
The MST must be cut at its two weakest links, which are DG (8) and HE (8), thus leading to the following three single-link clusters: ABCD, GH, and EF.
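
A short Python sketch of Prim's algorithm adapted to maximum spanning trees. The edge list below is a partial reconstruction: the seven MST edges are confirmed by the answer above, while AB = 6 and the omission of the other edges are assumptions, since the full figure is not recoverable:

def prim_max_spanning_tree(weights, start):
    # Greedily grow the tree by the heaviest edge crossing the cut
    vertices = set()
    for a, b in weights:
        vertices.update((a, b))
    tree, visited = [], {start}
    while visited != vertices:
        best = max(((a, b, w) for (a, b), w in weights.items()
                    if (a in visited) != (b in visited)), key=lambda e: e[2])
        tree.append(best)
        visited.update((best[0], best[1]))
    return tree

weights = {("A", "B"): 6, ("A", "C"): 10, ("A", "D"): 10, ("B", "D"): 11,
           ("D", "G"): 8, ("G", "H"): 10, ("H", "E"): 8, ("E", "F"): 12}
mst = prim_max_spanning_tree(weights, "A")
print(mst, sum(w for _, _, w in mst))   # total length 69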
Question 5.
K-Means clustering.
Consider a data table of 7 entities (1, 2, …, 7) and 2 features (F1, F2):
Entity   F1   F2
  1      25    0
  2      15    2
  3      18    1
  4      10    2
  5      22    1
  6      25    0
  7      25    1
5.1. Standardise the data with the feature averages and ranges. (4 marks)
5.2. Set K=2 and initial seeds of two clusters so that they should be as far from each other as
possible. Assign entities to the seeds with the Minimum distance rule. (8 marks)
5.3. Calculate centroids of the found clusters; compare them with the initial seeds. (4 marks)
5.4. Is there any chance that the found clusters are final in the K-means process? (3 marks)
5.5. Take one of the clusters found in 5.2 and determine the relative feature contributions to
the cluster. Comment on the results. (6 marks)
Answers:
5.1.
The means are 20 and 1; the ranges, 15 and 2. Subtracting the means and dividing by the ranges, one obtains the standardised data:

Entity    F1      F2
  1      0.33   -0.5
  2     -0.33    0.5
  3     -0.13    0
  4     -0.67    0.5
  5      0.13    0
  6      0.33   -0.5
  7      0.33    0
5.2
Entities (rows) 1 and 4 are farthest away from each other, with the distance d(1,4)=2.
By taking them as initial seeds, the Minimum distance rule assigns entities 3,5,6,7 to
seed 1, and entity 2 to seed 4. This produces clusters 13567 and 24.
5.3
The centroid of cluster 13567 is (0.2, −0.2) (the seed was (0.33, −0.5)); the centroid of cluster 24 is (−0.5, 0.5) (the seed was (−0.67, 0.5)).
5.4
The found clusters are indeed final, because applying the Minimum distance rule to the entities with the centroids from 5.3 leads to the same clusters as in 5.2.
5.5
The relative feature contributions are proportional to their centroid values squared;
the centroids in 5.3 lead to the same relative contribution weights for both features in
both clusters. This means that the features have the same degree of variation within
the clusters.
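
A Python sketch of the computations in 5.2–5.3, that is, the Minimum distance rule followed by centroid recomputation (names are illustrative):

def assign(points, seeds):
    # Minimum distance rule: each entity goes to its nearest seed
    def d2(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))
    clusters = [[] for _ in seeds]
    for p in points:
        clusters[min(range(len(seeds)), key=lambda k: d2(p, seeds[k]))].append(p)
    return clusters

def centroid(cluster):
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

# Standardised data from 5.1 (rows are entities 1..7)
pts = [(0.33, -0.5), (-0.33, 0.5), (-0.13, 0.0), (-0.67, 0.5),
       (0.13, 0.0), (0.33, -0.5), (0.33, 0.0)]
clusters = assign(pts, seeds=[pts[0], pts[3]])   # the farthest entities 1 and 4 as seeds
print([centroid(c) for c in clusters])           # ~ (0.2, -0.2) and (-0.5, 0.5)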
Question 6.
Nature inspired algorithms.
6.1. Explain the structure of a genetic algorithm (GA). (5 marks)
6.2. Explain the main steps of a GA for K-Means clustering in the setting of cluster label strings. (10 marks)
6.3. Explain the main steps of the evolutionary algorithm for K-Means clustering. (10 marks)
Answer:
6.1
A genetic algorithm is defined by a population comprising a number of structured entities, called chromosomes, that evolve by imitating the following biological mechanisms:
1. Selection
2. Cross-over
3. Mutation
These mechanisms are applied to carry the population from one iteration to the next. The initial population is typically selected randomly. The evolution stops when the population's fitness does not change any more or when a pre-specified limit on the number of iterations is reached.
6.2
A partition S = {S1, … , SK} of the entity set is represented by a “chromosome”
which is the string of cluster labels assigned to the entities in the order i=1,…, N. If, for
instance, N=8, and the entities are e1, e2, e3, e4, e5, e6, e7, e8, then the string 12333112
represents partition S with three classes, S1={e1, e6, e7}, S2={e2, e8}, and S3={e3, e4, e5},
which can be easily seen from the diagram
e1 e2 e3 e4 e5 e6 e7 e8
1 2 3 3 3 1 1 2
The main steps are:
(1) Initial setting: randomly generate strings s1, …, sP of K integers 1, …, K and compute the value of the K-Means criterion for each.
(2) Selection: randomly select mating pairs.
(3) Cross-over: for each of the mating pairs, generate a random number r between 0 and 1; if r is smaller than a pre-specified probability p (typically, p is taken to be about 0.7–0.8), then perform a crossover; otherwise the mates themselves are taken as the result.
(4) Mutation: randomly alter a character in each chromosome.
(5) Elitist survival: store the best-fitting chromosome seen so far and put this record chromosome into the population in place of the worst one.
(6) Halt test: typically, a limit on the number of iterations; if it has not been reached, go to (2); otherwise, halt.
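
A minimal Python sketch of the chromosome representation and the cross-over/mutation operators on cluster label strings; single-point cross-over is an assumption, as the original does not specify the cross-over type:

import random

def crossover(s1, s2, p=0.7):
    # Single-point cross-over of two label strings with probability p
    if random.random() < p:
        cut = random.randrange(1, len(s1))
        return s1[:cut] + s2[cut:], s2[:cut] + s1[cut:]
    return s1, s2                       # the mates themselves are the result

def mutate(s, K):
    # Randomly alter one character (cluster label) of the chromosome
    i = random.randrange(len(s))
    return s[:i] + [random.randint(1, K)] + s[i + 1:]

# Chromosome 12333112 encodes S1={e1,e6,e7}, S2={e2,e8}, S3={e3,e4,e5}
s = [1, 2, 3, 3, 3, 1, 1, 2]
print(crossover(s, mutate(s, K=3)))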
6.3
In an evolutionary K-Means algorithm, a chromosome is represented by the set of K centroids c1, c2, …, cK, which can be considered a string of K×V real ("float") numbers, where V is the number of features. In contrast to the GA representation, the length of the string here does not depend on the number of entities, which can be an advantage when the number of entities is massive. Furthermore, each centroid in the string is analogous to a gene in the chromosome. Computations are performed similarly to those in the GA, with the following steps:
(1) Initial setting: randomly generate strings of K centroids and compute the value of the K-Means criterion for each.
(2) Selection: randomly select mating pairs.
(3) Cross-over: for each of the mating pairs, generate a random number r between 0 and 1; if r is smaller than a pre-specified probability p (typically, p is taken to be about 0.7–0.8), then perform a crossover; otherwise the mates themselves are taken as the result.
(4) Mutation: add small normally distributed noise to each chromosome.
(5) Elitist survival: store the best-fitting chromosome seen so far and put this record chromosome into the population in place of the worst one.
(6) Halt test: typically, a limit on the number of iterations; if it has not been reached, go to (2); otherwise, halt.