An Image Database Retrieval
Scheme Based Upon
Multivariate Analysis
and Data Mining
Presented by C.C. Chang
Dept. of Computer Science and Information
Engineering, National Chung Cheng University
Outline
Introduction
Image Retrieval
The Proposed Scheme Based Upon PCA and
Data Mining
Image Feature Extraction
Data Mining for Image Features
Illustration
Future Work
Conclusions
Introduction
(Figure: a query image is matched against an image database.)
Introduction
Image retrieval systems are either text-based or content-based.
Text-based retrieval: query by keywords, e.g., setting sun, mountain, ocean, purple, …
Introduction
Content-based image retrieval
Images are indexed by their content: color, shape, texture, and other features.
Feature extraction methods
Histogram
Neural network (NN)
Support vector machines (SVM)
Genetic algorithm (GA)
Principal component analysis (PCA)
…
The proposed scheme based upon PCA
and data mining
Principal component analysis (PCA)
Given a set of points Y1, Y2, …, YM, where every Yi is characterized by a set of variables X1, X2, …, XN, we want to find a direction D = (d1, d2, …, dN) such that the variance of the points projected onto D is maximized.
Principal component analysis (PCA)
Algorithm of PCA
(1) Code the variables Y = (Y1, Y2, …, YN) to have zero means and unit variances.
(2) Calculate the covariance matrix C of the samples.
(3) Find the eigenvalues λ1, λ2, …, λN of C, where λi ≥ λi+1 for i = 1, 2, …, N−1, and let D1, D2, …, DN denote the corresponding eigenvectors.
(4) D1 is the first principal component direction, D2 is the second principal component direction, …, and DN is the Nth principal component direction.
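As an illustrative sketch (not part of the original slides), the four steps map directly onto a few lines of NumPy, assuming the samples are given as an M*N array:

```python
import numpy as np

def pca(samples):
    """PCA sketch: samples is an M x N array (M points, N variables)."""
    # (1) Code the variables to zero mean and unit variance.
    coded = (samples - samples.mean(axis=0)) / samples.std(axis=0)
    # (2) Covariance matrix C of the coded samples.
    C = np.cov(coded, rowvar=False)
    # (3) Eigenvalues and eigenvectors of C; eigh returns them in
    #     ascending order, so reverse to get lambda_i >= lambda_{i+1}.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) Column i of eigvecs is the (i+1)-th principal component direction.
    return eigvals, eigvecs
```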
Principal component analysis (PCA)
Let A be an n*n covariance matrix. λ is an eigenvalue of A, and x is an eigenvector associated with λ, if
Ax = λx = λIx, where I is the n*n identity matrix.
The characteristic polynomial of the matrix A: det(A − λI) = 0.
Principal component analysis (PCA)
For example, let A be a 2*2 matrix:
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad
A - \lambda I = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1-\lambda & 2 \\ 3 & 4-\lambda \end{bmatrix}$$
The characteristic polynomial is $\det(A - \lambda I) = (1-\lambda)(4-\lambda) - 2 \cdot 3 = \lambda^2 - 5\lambda - 2$.
PCA
For example, 40 samples with 2 variables, X1 and X2.
Covariance matrix:
$$V = \begin{bmatrix} 604.400 & 561.648 \\ 561.648 & 592.519 \end{bmatrix}$$
Eigenvalues: λ1 = 1160.139, λ2 = 36.780
Principal component analysis (PCA)
Eigenvectors: D1 = [0.710 0.703], D2 = [−0.703 0.710]
Image Feature Extraction - PCA
The gray level values of an image form a matrix M. (Figure: the gray-level matrix M of the example image.)
Image Feature Extraction - PCA
The example image has 10*10 pixels. Each block contains 4 pixels, so the number of blocks (NB) is 25.
Image Feature Extraction - PCA
Each block Bi of M becomes one row of the block matrix A:
$$A = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_{25} \end{bmatrix} = \begin{bmatrix} 20 & 8 & 15 & 6 \\ 5 & 17 & 20 & 3 \\ \vdots & \vdots & \vdots & \vdots \\ 33 & 14 & 21 & 20 \end{bmatrix}$$
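A small NumPy sketch of this block decomposition (assuming the 4 pixels of a block form a 2*2 square, which the slides imply but do not state):

```python
import numpy as np

def block_matrix(image):
    """Split a 10x10 gray-level image into 25 blocks of 2x2 = 4 pixels;
    block B_i becomes row i of the 25x4 matrix A."""
    h, w = image.shape
    rows = [image[r:r + 2, c:c + 2].ravel()   # the 4 pixel values of one block
            for r in range(0, h, 2)
            for c in range(0, w, 2)]
    return np.array(rows)                     # shape (25, 4)
```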
Image Feature Extraction - PCA
(1) Compute the covariance matrix CM of an image. Let Cs denote the s-th column of the block matrix A. Then
$$\mathrm{Cov}(C_s, C_t) = \frac{1}{N}\sum_{k=1}^{N}\left(b_{sk} - \bar{C}_s\right)\left(b_{tk} - \bar{C}_t\right), \qquad \mathrm{Var}(C_k) = \mathrm{Cov}(C_k, C_k)$$
where N = NB is the number of blocks and $\bar{C}_s$ is the mean of column Cs.
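In code, the biased (1/N) covariance of this formula can be computed directly; np.cov would use 1/(N−1) by default, so it is done by hand in this sketch:

```python
import numpy as np

def covariance_matrix(A):
    """CM[s, t] = (1/N) * sum_k (b_sk - mean(C_s)) * (b_tk - mean(C_t)),
    where C_s is column s of the block matrix A and N is the number of blocks."""
    N = A.shape[0]
    centered = A - A.mean(axis=0)
    return centered.T @ centered / N
```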
Image Feature Extraction - PCA
(1) Compute the covariance matrix of an image. For the block matrix A above,
$$CM = \begin{bmatrix} 4850.45 & 4176.54 & 5340.23 & 3990.09 \\ 4176.54 & 5914.63 & 5808.51 & 5341.36 \\ 5340.23 & 5808.51 & 7476.71 & 6132.48 \\ 3990.09 & 5341.36 & 6132.48 & 6635.93 \end{bmatrix}$$
Image Feature Extraction - PCA
(2) Determine eigenvalues and eigenvectors: solve det(CM − λI) = 0, where I is the 4*4 identity matrix. The eigenvalues are
λ1 = 21860, λ2 = 1743, λ3 = 877.335, and λ4 = 393.73,
with corresponding eigenvectors. (Figure: the four eigenvectors of CM.)
Image Feature Extraction - PCA
(3) Form the principal components (PCs)
The first eigenvector is
$$V_1 = \begin{bmatrix} 0.419 \\ 0.488 \\ 0.57 \\ 0.511 \end{bmatrix}$$
Each block of M is projected onto V1; for the first block B1 = (20, 8, 15, 6):
23.9 = 20 * 0.419 + 8 * 0.488 + 15 * 0.57 + 6 * 0.511
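The projection step in NumPy, reproducing the 23.9 value above:

```python
import numpy as np

V1 = np.array([0.419, 0.488, 0.57, 0.511])   # first eigenvector of CM
B1 = np.array([20, 8, 15, 6])                # first block (row B1 of A)
S1 = float(B1 @ V1)                          # projected value of B1 onto V1
print(round(S1, 1))                          # -> 23.9, as on the slide
```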
Image Feature Extraction - PCA
(4) Normalize the projected values
$$T_j = \begin{cases} 1, & S_j < t_1 \\ 2, & t_1 \le S_j \le t_2 \\ 3, & S_j > t_2 \end{cases}$$
where Sj is the projected value of block j and t1 < t2 are the quantization thresholds.
(Figure: the normalized projected values of the example image.)
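A sketch of this quantization; the slides do not give the threshold values, so t1 and t2 are left as parameters:

```python
import numpy as np

def normalize(S, t1, t2):
    """Map each projected value S_j to a level T_j in {1, 2, 3}
    using two thresholds t1 < t2."""
    S = np.asarray(S)
    return np.where(S < t1, 1, np.where(S <= t2, 2, 3))
```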
Principal component analysis (PCA)
PCA is a popular multivariate analysis technique, which can be used to extract features from images and to filter candidate images from the image database.
Nevertheless, the number of candidate images offered by PCA is usually very large for a huge image database.
Therefore, a data mining technique is applied to speed up retrieval and to increase the accuracy rate.
Data Mining – Association Rules
I = {A, B, C, D}, Minimum Support = 3
(Figure: the transaction database, the candidate 1-itemsets, and the frequent 1-itemsets.)
Data Mining – Association Rules
I = {A, B, C, D}, Minimum Support = 3
(Figure: the candidate 2-itemsets and the frequent 2-itemsets.)
Data Mining – Association Rules
From the frequent 2-itemsets, with Minimum Confidence = 100%:
Conf(A → C) = Sup(A ∪ C) / Sup(A) = 4/4 = 100%
Conf(C → A) = Sup(A ∪ C) / Sup(C) = 4/5 = 80%
Conf(C → D) = Sup(C ∪ D) / Sup(C) = 3/5 = 60%
Conf(D → C) = Sup(C ∪ D) / Sup(D) = 3/3 = 100%
Association Rules: A → C, D → C
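A minimal sketch of this support/confidence computation; the transaction table itself appears only as a figure in the slides, so the `db` below is a stand-in chosen to reproduce the support counts above:

```python
from itertools import combinations

def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions)

def rules(transactions, min_sup, min_conf):
    """Frequent 2-itemsets -> rules with Conf(X -> Y) = Sup(X u Y) / Sup(X)."""
    items = sorted(set().union(*transactions))
    found = []
    for x, y in combinations(items, 2):
        pair = {x, y}
        if support(pair, transactions) < min_sup:
            continue                          # not a frequent 2-itemset
        for lhs, rhs in ((x, y), (y, x)):
            conf = support(pair, transactions) / support({lhs}, transactions)
            if conf >= min_conf:
                found.append((lhs, rhs, conf))
    return found

# Stand-in transactions over I = {A, B, C, D}, matching the slide's supports.
db = [{'A', 'B', 'C'}, {'A', 'C'}, {'A', 'C', 'D'}, {'A', 'C', 'D'}, {'C', 'D'}]
print(rules(db, min_sup=3, min_conf=1.0))     # -> [('A', 'C', 1.0), ('D', 'C', 1.0)]
```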
Data Mining for Image Features
NPIDB^H: the database of normalized projected images (NPIDB) in the horizontal direction.
Data Mining for Image Features
NPIDB^H, Minimum Support = 3
(Figure: the candidate 1-itemsets, frequent 1-itemsets, and candidate 2-itemsets.)
Data Mining for Image Features
NPIDB^H, frequent 2-itemsets, Minimum Confidence = 75%:
Conf(1 → 1) = Sup(1 ∪ 1) / Sup(1) = 4/5 = 80%
Conf(1 → 2) = Sup(1 ∪ 2) / Sup(1) = 3/5 = 60%
Conf(1 → 3) = Sup(1 ∪ 3) / Sup(1) = 4/5 = 80%
Conf(3 → 1) = Sup(1 ∪ 3) / Sup(3) = 4/5 = 80%
Data Mining for Image Features
Association Rules in Horizontal Direction
1 → 1, 1 → 3, 3 → 1
Data Mining for Image Features
NPIDB^V: the database of normalized projected images (NPIDB) in the vertical direction.
Data Mining for Image Features
Association Rules in Vertical Direction
1 → 1, 1 → 3
Data Mining for Image Features
NPIDB^D: the database of normalized projected images (NPIDB) in the diagonal direction.
PCA and data mining
Illustration
450 full-color images
300 blocks for each image
4*4 pixels for a block
Illustration
A query image Q
The set of eigenvalues of Q is {0, 2, 4, 6, 8}.
Illustration
The rules of Q are shown in the figure. The file name is "SW003.JPG."
Future Work - VQ and PCA
Vector Quantization (VQ)
An image is separated into a set of input vectors; each input vector is matched with a codeword of the codebook.
Vector Quantization (VQ)
Definition of vector quantization (VQ):
Q: R^k → Y, where Y is a finite subset of R^k.
VQ is composed of the following three parts:
Codebook generation process,
Encoding process, and
Decoding process.
Vector Quantization (VQ)
(Figure: an image is encoded into an index table using the codebook.)
Vector Quantization (VQ)
Codebook generation
(Figure: the training images are separated into vectors to form the training set.)
Vector Quantization (VQ)
Codebook generation
(Figure: codebook initiation; an initial codebook of 256 codewords, indexed 0 to 255, is chosen from the training set.)
Vector Quantization (VQ)
Training using an iteration algorithm: each training vector is assigned to the closest codeword of codebook Ci, partitioning the training set into index sets (e.g., (1, 2, 5, 9, 45, …) for one codeword); the mean value of each index set then replaces the old codeword, producing the new codebook Ci+1.
(Figure: the training set, the index sets, and codebooks Ci and Ci+1.)
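One iteration of this training procedure in NumPy (an illustrative sketch; initialization and the stopping criterion are omitted):

```python
import numpy as np

def train_iteration(codebook, training_set):
    """One codebook update: partition the training vectors into index sets
    by nearest codeword, then replace each codeword by the mean of its set."""
    # Distance from every training vector to every codeword.
    d = np.linalg.norm(training_set[:, None, :] - codebook[None, :, :], axis=2)
    nearest = d.argmin(axis=1)                 # index-set membership per vector
    new_codebook = codebook.astype(float).copy()
    for i in range(len(codebook)):
        members = training_set[nearest == i]   # vectors mapped to codeword i
        if len(members):                       # an empty set keeps the old codeword
            new_codebook[i] = members.mean(axis=0)
    return new_codebook                        # codebook C_{i+1}
```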
Example
(Figure: a codebook of eight codewords cw1, …, cw8.)
To encode an input vector, for example v = (150, 145, 121, 130):
(1) Compute the distance between v and every codeword in the codebook:
d(v, cw1) = 114.2, d(v, cw2) = 188.3, d(v, cw3) = 112.3, d(v, cw4) = 124.6,
d(v, cw5) = 122.3, d(v, cw6) = 235.1, d(v, cw7) = 152.5, d(v, cw8) = 63.2
(2) d(v, cw8) is the smallest, so we choose index 8 to replace the input vector v.
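The same full search in NumPy; the codeword values appear only in the slide's figure, so a random stand-in codebook is used here:

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.integers(0, 256, size=(8, 4)).astype(float)  # stand-in for cw1..cw8
v = np.array([150.0, 145.0, 121.0, 130.0])

d = np.linalg.norm(codebook - v, axis=1)  # (1) distance from v to every codeword
index = int(d.argmin()) + 1               # (2) 1-based index of the closest codeword
print(index)
```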
The Encoding algorithm using PCA
The covariance matrix of the codewords:
$$\begin{bmatrix} 500.12 & 486.52 & 468.15 & 513.96 \\ 486.52 & 474.47 & 455.52 & 500.06 \\ 468.15 & 455.52 & 455.95 & 491.66 \\ 513.96 & 500.06 & 491.66 & 548.94 \end{bmatrix}$$
The Encoding algorithm using PCA
From the covariance matrix, we compute
D1: (0.5038, 0.4904, 0.4788, 0.5259), λ1 = 19552,
D2: (−0.4915, −0.5126, 0.4293, 0.5580), λ2 = 151,
D3: (−0.0294, −0.0292, 0.7658, −0.6418), λ3 = 86, and
D4: (0.7098, −0.7042, −0.0108, −0.0134), λ4 = 6.
D1: (0.5038, 0.4904, 0.4788, 0.5259) is taken as the new coordinate axis; D1 preserves 98.77% of the variance of the codewords.
The Encoding algorithm using PCA
The codewords are sorted by their projected values onto D1: (0.5038, 0.4904, 0.4788, 0.5259).
(Figure: the sorted codebook and the corresponding projected values of the codewords.)
The Encoding algorithm using PCA
Encode an input vector v = (150, 145, 121, 130):
• Transform v to α = D1 · v = (0.5038, 0.4904, 0.4788, 0.5259) · (150, 145, 121, 130)^T = 272.98.
• 321.93 is the closest projected value to 272.98.
• For 321.93, d(v, cw'5) = 63.2
• For 162.60, d(v, cw'4) = 122.3
• For 382.84, d(v, cw'6) = 114.2
• So we choose cw'5 to replace v.
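A sketch of this projected-value search, assuming (as in the example) that only the nearest projected value and its two neighbours are checked with a full distance computation:

```python
import numpy as np

def pca_encode(v, sorted_codebook, projected, D1):
    """Encode v by comparing projected values first: project v onto D1,
    locate the closest projected codeword values by binary search, and
    compute full distances only for those few candidates."""
    alpha = float(D1 @ v)                          # e.g. 272.98 for the v above
    pos = int(np.searchsorted(projected, alpha))   # `projected` is sorted ascending
    candidates = range(max(0, pos - 1), min(len(projected), pos + 2))
    return min(candidates,
               key=lambda i: np.linalg.norm(sorted_codebook[i] - v))
```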
VQ and PCA for image retrieval
Association Rules:
8 → 5, 4 → 3, 2 → 3, 1 → 1
VQ and PCA for image retrieval
Association Rules:
8 → 5, 4 → 3, 2 → 3, 1 → 1
For the rule 8 → 5: the projected value 432.11 maps to 321.93; projected values in the range 400 ~ 460 map to 290 ~ 350.
(Figure: a query image and its projected image, with projected values in 400 ~ 460 mapping to 290 ~ 350.)
Conclusions
We proposed an efficient image retrieval scheme based upon a multivariate analysis technique and a data mining technique:
PCA extracts the image features.
Association rules match the candidate images.
VQ and PCA are proposed for similar image retrieval.