An Image Database Retrieval Scheme Based Upon Multivariate Analysis and Data Mining
Presented by C.C. Chang
Dept. of Computer Science and Information Engineering, National Chung Cheng University

Outline
- Introduction
- Image Retrieval
- The Proposed Scheme Based Upon PCA and Data Mining
- Image Feature Extraction
- Data Mining for Image Features
- Illustration
- Future Works
- Conclusions

Introduction
- A query image is submitted and matched against an image database.
- Image retrieval systems fall into two categories: text-based and content-based.
- Text-based retrieval: query by keywords, e.g. "setting sun, mountain, ocean, purple, ...".
- Content-based image retrieval: images are indexed by their content, such as color, shape, texture, and other features.
- Feature extraction methods include the histogram, neural networks (NN), support vector machines (SVM), genetic algorithms (GA), principal component analysis (PCA), and so on.

The Proposed Scheme Based Upon PCA and Data Mining

Principal component analysis (PCA)
- Given a set of points Y1, Y2, ..., YM, where every Yi is characterized by a set of variables X1, X2, ..., XN, we want to find a direction D = (d1, d2, ..., dN) such that the variance of the points projected onto D is maximized.

Algorithm of PCA
1. Start by coding the variables Y = (Y1, Y2, ..., YN) to have zero means and unit variances.
2. Calculate the covariance matrix C of the samples.
3. Find the eigenvalues λ1, λ2, ..., λN of C, where λi ≥ λi+1 for i = 1, 2, ..., N-1, and let D1, D2, ..., DN denote the corresponding eigenvectors.
4. D1 is the first principal component direction, D2 is the second principal component direction, ..., and DN is the Nth principal component direction.

Principal component analysis (PCA)
- Let A be an n*n covariance matrix. λ is an eigenvalue of A, and x is an eigenvector associated with λ, if Ax = λx = λIx, i.e. (A - λI)x = 0, where I is the n*n identity matrix.
- The eigenvalues are the roots of the characteristic polynomial of A, det(A - λI) = 0.
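The PCA algorithm above can be sketched in a few lines of NumPy. The sample points below are made up for illustration, and scaling to unit variance is omitted, matching the numeric examples on the following slides.

```python
# A minimal NumPy sketch of the PCA algorithm above. The sample points
# are made up for illustration; scaling to unit variance is omitted.
import numpy as np

def pca_directions(samples):
    """Return the eigenvalues (sorted descending) and the matching
    principal component directions of the samples' covariance matrix."""
    X = np.asarray(samples, dtype=float)
    X = X - X.mean(axis=0)                 # step 1: zero means
    C = np.cov(X, rowvar=False)            # step 2: covariance matrix
    lam, vecs = np.linalg.eigh(C)          # step 3: eigen-decomposition
    order = np.argsort(lam)[::-1]          # enforce lam[i] >= lam[i+1]
    return lam[order], vecs[:, order].T    # rows are D1, D2, ...

# Hypothetical 2-variable samples; D1 is the max-variance direction.
lam, D = pca_directions([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                         [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
```

Projecting the centered samples onto `D[0]` gives one value per sample, exactly as in the 40-sample example that follows.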
Principal component analysis (PCA)
- For example, for a 2*2 matrix A = [a b; c d]:
    A - λI = [a-λ  b; c  d-λ],
  and the characteristic polynomial is det(A - λI) = (a-λ)(d-λ) - bc = λ^2 - (a+d)λ + (ad - bc); its roots are the eigenvalues of A.

PCA example
- 40 samples with 2 variables, X1 and X2.
- Covariance matrix:
    V = [ 604.400  561.648
          561.648  592.519 ]
- Eigenvalues: λ1 = 1160.139 and λ2 = 36.780.
- Eigenvectors: D1 = [0.710, 0.703] and D2 = [-0.703, 0.710].

Image Feature Extraction - PCA
- An image M of 10*10 pixels (gray-level values).
- M is divided into blocks of 4 pixels each, so the number of blocks (NB) is 25.
- Each block is a sample with 4 variables C1, C2, C3, C4, one per pixel position; for example, block B1 = (20, 8, 15, 6).

(1) Compute the covariance matrix CM of the image:
    Cov(Cs, Ct) = (1/NB) Σ_{k=1}^{NB} (b_sk - C̄s)(b_tk - C̄t),
    Var(Ck) = Cov(Ck, Ck),
where b_sk is the value of variable Cs in block k and C̄s is the mean of Cs.

(2) Determine the eigenvalues and eigenvectors. Solving (CM - λI)x = 0 for

    CM = [ 4850.45  4176.54  5340.23  3990.09
           4176.54  5914.63  5808.51  5341.36
           5340.23  5808.51  7476.71  6132.48
           3990.09  5341.36  6132.48  6635.93 ]

gives the eigenvalues λ1 = 21860, λ2 = 1743, λ3 = 877.335, and λ4 = 393.73, together with the corresponding eigenvectors.

(3) Form the principal components (PCs): project each block onto the first eigenvector V1 = (0.419, 0.488, 0.57, 0.511). For block B1 = (20, 8, 15, 6):
    20 * 0.419 + 8 * 0.488 + 15 * 0.57 + 6 * 0.511 = 23.9.

(4) Normalize the projected values: each projected value is mapped to a discrete level in {1, 2, 3} by comparing it with two thresholds Sj and Tj (level 1 below Sj, level 2 between Sj and Tj, level 3 above Tj).

Principal component analysis (PCA)
- PCA is a popular multivariate analysis technique, which can be used to extract features from images and to filter candidate images from an image database.
- Nevertheless, the number of candidate images offered by PCA is usually very large for a huge image database.
- Therefore, a data mining technique is applied to speed up retrieval and to increase the accuracy rate.
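Steps (1)-(4) of the feature extraction can be sketched as follows, assuming 2*2 blocks and illustrative thresholds S and T; the slides do not specify the threshold values, so these are placeholders.

```python
# A sketch of feature-extraction steps (1)-(4), assuming 2*2 blocks and
# placeholder thresholds S and T (the slides' values are unspecified).
import numpy as np

def block_features(image, S, T):
    h, w = image.shape
    # Split the image into 2*2 blocks; each block is one sample with
    # four variables C1..C4 (one per pixel position).
    blocks = np.array([image[i:i + 2, j:j + 2].ravel()
                       for i in range(0, h, 2)
                       for j in range(0, w, 2)], dtype=float)
    CM = np.cov(blocks, rowvar=False, bias=True)  # (1) covariance, 1/NB
    lam, vecs = np.linalg.eigh(CM)                # (2) eigen-decomposition
    V1 = vecs[:, np.argmax(lam)]                  # first principal direction
    proj = blocks @ V1                            # (3) projected values
    levels = np.where(proj < S, 1,
                      np.where(proj < T, 2, 3))   # (4) 3-level normalization
    return proj, levels

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(10, 10))         # a made-up 10*10 image
proj, levels = block_features(img, S=100.0, T=300.0)
```

For a 10*10 image this yields 25 projected values and 25 levels, one per block, matching the NB = 25 example above.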
Data Mining - Association Rules
- I = {A, B, C, D}; Minimum Support = 3.
- Candidate 1-itemsets whose support is at least 3 become the frequent 1-itemsets.
- Candidate 2-itemsets whose support is at least 3 become the frequent 2-itemsets.
- With Minimum Confidence = 100%, rules are derived from the frequent 2-itemsets:
    Conf(A → C) = Sup(A, C) / Sup(A) = 4/4 = 100%
    Conf(C → A) = Sup(A, C) / Sup(C) = 4/5 = 80%
    Conf(C → D) = Sup(C, D) / Sup(C) = 3/5 = 60%
    Conf(D → C) = Sup(C, D) / Sup(D) = 3/3 = 100%
- Association rules: A → C and D → C.

Data Mining for Image Features
- NPIDB-H: the database of normalized projected images (NPIDB) in the horizontal direction.
- With Minimum Support = 3, the candidate 1-itemsets, frequent 1-itemsets, candidate 2-itemsets, and frequent 2-itemsets are generated from NPIDB-H.
- With Minimum Confidence = 75%:
    Conf(1 → 1) = Sup(1, 1) / Sup(1) = 4/5 = 80%
    Conf(1 → 2) = Sup(1, 2) / Sup(1) = 3/5 = 60%
    Conf(1 → 3) = Sup(1, 3) / Sup(1) = 4/5 = 80%
    Conf(3 → 1) = Sup(1, 3) / Sup(3) = 4/5 = 80%
- Association rules in the horizontal direction: 1 → 1, 1 → 3, and 3 → 1.
- NPIDB-V: the database of normalized projected images in the vertical direction; association rules in the vertical direction: 1 → 1 and 1 → 3.
- NPIDB-D: the database of normalized projected images in the diagonal direction.

PCA and Data Mining

Illustration
- 450 full-color images; 300 blocks for each image; 4*4 pixels for a block.
- For a query image Q, the set of eigenvalues of Q is {0, 2, 4, 6, 8}.
- The rules of Q are mined and matched against the database; the retrieved file name is "SW003.JPG".

Future Works - VQ and PCA

Vector Quantization (VQ)
- An image is separated into a set of input vectors.
- Each input vector is matched with a codeword of the codebook.
- Definition of vector quantization: VQ is a mapping Q: R^k → Y, where Y is a finite subset of R^k.
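The support/confidence computation used in the association-rule mining above can be sketched as follows. The transactions are made up so that their supports match the figures quoted on the slides (Sup(A)=4, Sup(C)=5, Sup(D)=3, Sup(A,C)=4, Sup(C,D)=3); the slides' actual transaction table is not reproduced.

```python
# A sketch of the Apriori-style mining above. Transactions are made up
# to be consistent with the supports quoted on the slides.
from itertools import combinations

transactions = [{"A", "B", "C", "D"}, {"A", "B", "C", "D"},
                {"A", "C"}, {"A", "C"}, {"C", "D"}]

def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def mine_rules(min_sup=3, min_conf=1.0):
    items = sorted(set().union(*transactions))
    frequent1 = [i for i in items if support({i}) >= min_sup]
    rules = []
    for x, y in combinations(frequent1, 2):       # candidate 2-itemsets
        if support({x, y}) >= min_sup:            # frequent 2-itemsets
            for a, b in ((x, y), (y, x)):
                if support({a, b}) / support({a}) >= min_conf:
                    rules.append((a, b))          # rule a -> b
    return rules

derived = mine_rules()   # Minimum Support = 3, Minimum Confidence = 100%
# derived == [('A', 'C'), ('D', 'C')]
```

The same routine with `min_conf=0.75` mirrors the NPIDB-H mining, where 80%-confidence rules survive and 60%-confidence ones are pruned.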
- VQ is composed of the following three parts: the codebook generation process, the encoding process, and the decoding process.
- Encoding maps an image to an index table; decoding reconstructs the image from the index table and the codebook.

Codebook generation
- Training images are separated into vectors, which form the training set.
- An initial codebook (e.g. codewords 0 to 255) is selected from the training set.
- Training uses an iterative algorithm: each training vector is assigned to its nearest codeword, yielding index sets such as (1, 2, 5, 9, 45, ...); the mean vector of each index set is computed; and the old codewords are replaced by these means, turning codebook Ci into the new codebook Ci+1.

Encoding example
To encode an input vector, for example v = (150, 145, 121, 130):
(1) Compute the distance between v and all codewords in the codebook:
    d(v, cw1) = 114.2, d(v, cw2) = 188.3, d(v, cw3) = 112.3, d(v, cw4) = 124.6,
    d(v, cw5) = 122.3, d(v, cw6) = 235.1, d(v, cw7) = 152.5, d(v, cw8) = 63.2.
(2) Since cw8 is the nearest codeword, we choose index 8 to replace the input vector v.

The Encoding Algorithm Using PCA
- The covariance matrix of the codebook:

      [ 500.12  486.52  468.15  513.96
        486.52  474.47  455.52  500.06
        468.15  455.52  455.95  491.66
        513.96  500.06  491.66  548.94 ]

- From the covariance matrix, we compute
      D1 = (0.5038, 0.4904, 0.4788, 0.5259), λ1 = 19552,
      D2 = (-0.4915, -0.5126, 0.4293, 0.5580), λ2 = 151,
      D3 = (-0.0294, -0.0292, 0.7658, -0.6418), λ3 = 86,
      D4 = (0.7098, -0.7042, -0.0108, -0.0134), λ4 = 6.
- D1 = (0.5038, 0.4904, 0.4788, 0.5259) is used as the projection coordinate; D1 preserves 98.77% of the variance of the codewords.
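The full-search encoding step illustrated with v = (150, 145, 121, 130) can be sketched as follows; the codebook here is hypothetical, so the computed distances differ from the slides' figures.

```python
# A sketch of full-search VQ encoding. The codebook is hypothetical,
# not the slides' actual codewords.
import math

codebook = [(96, 90, 80, 85), (40, 35, 50, 45), (120, 110, 100, 105),
            (60, 70, 65, 55), (150, 145, 120, 131)]

def encode(v):
    """Full search: return the index of the nearest codeword
    under the Euclidean distance."""
    distances = [math.dist(v, cw) for cw in codebook]
    return distances.index(min(distances))

idx = encode((150, 145, 121, 130))   # index of the nearest codeword
```

Full search costs one distance computation per codeword; sorting the codebook by its first principal component, as described next, avoids most of those computations.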
The Encoding Algorithm Using PCA
- The new sorted codebook: the codewords are sorted by their projected values onto D1 = (0.5038, 0.4904, 0.4788, 0.5259), and each sorted codeword cw'i is stored together with its projected value.
- To encode an input vector v = (150, 145, 121, 130):
  - Transform v to α = D1 * v = (0.5038, 0.4904, 0.4788, 0.5259) * (150, 145, 121, 130)^T = 272.98.
  - Among the stored projected values, 321.93 is the closest to 272.98.
  - For 321.93, d(v, cw'5) = 63.2; for its neighbors 162.60 and 382.84, d(v, cw'4) = 122.3 and d(v, cw'6) = 114.2.
  - So we choose cw'5 to replace v.

VQ and PCA for image retrieval
- Association rules: 8 → 5, 4 → 3, 2 → 3, and 1 → 1.
- For the query image, each projected value (for example 432.11 or 321.93) is compared with the projected values of the database images; a projected value such as 400 is matched against candidates falling in a range such as 290 ~ 460, so a database value of 350 counts as a match.

Conclusions
- An efficient image retrieval scheme based upon a multivariate analysis technique and a data mining technique.
- PCA: extracting image features.
- Association rules: matching the candidate images.
- VQ and PCA: similar image retrieval.
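As a closing sketch, the PCA-guided codeword search described above can be written as: codewords are sorted by projection onto D1, the query's projected value is located by binary search, and only the nearby codewords get a full distance check. D1 is the slides' example direction; the codewords are hypothetical, so the numeric results differ from the slides' figures.

```python
# A sketch of the PCA-guided nearest-codeword search. D1 is the slides'
# example direction; the codewords below are hypothetical.
import bisect
import math

D1 = (0.5038, 0.4904, 0.4788, 0.5259)
codebook = [(50, 40, 45, 55), (90, 80, 85, 70), (110, 100, 120, 95),
            (170, 160, 150, 165), (200, 210, 190, 205)]

def project(v):
    """Projected value of v onto the first principal direction D1."""
    return sum(d * x for d, x in zip(D1, v))

sorted_cb = sorted(codebook, key=project)        # the sorted codebook
proj_vals = [project(cw) for cw in sorted_cb]    # stored projected values

def encode_pca(v, window=1):
    """Locate v's projection among the sorted projected values, then
    full-search only the neighboring codewords."""
    alpha = project(v)
    i = bisect.bisect_left(proj_vals, alpha)
    lo, hi = max(0, i - window), min(len(sorted_cb), i + window + 1)
    return min(range(lo, hi), key=lambda j: math.dist(v, sorted_cb[j]))

best = encode_pca((150, 145, 121, 130))   # index into the sorted codebook
```

With a small neighbor window, only a handful of Euclidean distances are computed instead of one per codeword, which is the speed-up the scheme relies on.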