							Transcript						
					
Databases & Data Mining Joined Specialization Project
"Data Mining Classification Tool"
By Mateusz Żochowski & Jakub Strzemżalski

Agenda
- General description of the problem
- Functionality
- Data mining aspects
  - Algorithm and optimization
- Database aspects
  - General entities scheme

General Description
- Universal tool
- Different kinds of objects (e.g. preprocessed photos, hospital patient data)
- Finding similar objects
- Decision problems

Functionality
- Independent system – user operated
  - Using data sets already provided or uploading new types
  - Influence on the way data is processed
- Possible usage in bigger systems as a processing engine
  - Additional module used as a helping tool in more complex systems

General Use Case

Data Mining
- General ideas
- K-NN algorithm
  - Description of an object
  - Definition of a distance
  - Brief explanation of the algorithm
- Optimization
  - Problem of comparing a large number of objects
  - Optimized solution – using the grouping idea

Definitions
- Objects

K-NN
- K – Nearest Neighbors
  - The idea behind k-NN
  - Aim – finding the k objects most similar to the one we are analyzing and eventually assigning the appropriate decision
  - Method – calculating the distance from the analyzed object to the others in our database and finding the closest ones

K-NN: Graphical representation

Definitions
- Distance
  - Calculations in multidimensional space
  - Coefficients
    - α (alpha)
    - w_i – weights, underlining the importance of particular attributes
    - n – number of all the attributes

$$D(O_1, O_2) = \sum_{i=1}^{n} w_i \cdot \left| a_i(O_1) - a_i(O_2) \right|$$

Optimization
- The reason – the cost of multidimensional distance computation from one object to all other elements
- Solution – improved k-NN
- Result – better efficiency, because the narrowed set of possibly similar objects reduces the number of distance computations

Step 1 – Group-oriented plane division
Step 2 – New object appears
Step 3
Step 4
Step 5

Grouping problem
- The problem – assigning objects to appropriate groups according to the chosen distance definition
- Solution – some clustering algorithm
- Brief example – k-means algorithm

Database – entities

Database
- The general structure of the database results from optimization issues
- Due to the universal purpose of the system, the database may contain many different tables of objects
- System tables are needed for defining experiments
- Group Member as a temporary table?

Summary
- There is still a lot of work to do...
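
The k-NN method, the weighted distance, and the grouping idea from the slides above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`weighted_distance`, `knn_classify`, `grouped_knn_classify`), the representation of an object as a tuple of numeric attributes with a decision label, and the use of one centroid per group are all assumptions made for illustration. The slides do not describe Steps 3 to 5 in text, so the narrowing shown here (search only the groups whose centroids are closest to the new object) is one plausible reading. The α coefficient is left out because the slides do not say where it enters the formula.

```python
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted distance D(O1, O2) = sum_i w_i * |a_i(O1) - a_i(O2)|."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def knn_classify(query, objects, weights, k=3):
    """Plain k-NN: compare the query with every stored object.

    `objects` is a list of (attributes, decision) pairs (assumed layout).
    Returns the most common decision among the k closest objects.
    """
    nearest = sorted(
        objects, key=lambda obj: weighted_distance(query, obj[0], weights)
    )[:k]
    votes = Counter(decision for _, decision in nearest)
    return votes.most_common(1)[0][0]


def grouped_knn_classify(query, groups, weights, k=3, n_groups=1):
    """Group-narrowed k-NN (assumed reading of the optimization slides).

    `groups` is a list of (centroid, members) pairs, e.g. produced by a
    clustering algorithm such as k-means. Only the members of the
    `n_groups` groups with the closest centroids are compared with the
    query, which reduces the number of distance computations.
    """
    closest = sorted(
        groups, key=lambda grp: weighted_distance(query, grp[0], weights)
    )[:n_groups]
    candidates = [obj for _, members in closest for obj in members]
    return knn_classify(query, candidates, weights, k)


# Hypothetical example data: two attributes, two decisions.
objects = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
weights = (1.0, 1.0)
groups = [((1.0, 1.0), objects[:3]), ((5.0, 5.0), objects[3:])]

decision_plain = knn_classify((1.1, 1.0), objects, weights, k=3)
decision_grouped = grouped_knn_classify((4.9, 5.0), groups, weights, k=3)
```

Searching only the nearest groups trades accuracy for speed: a true nearest neighbor that lies just across a group boundary can be missed, and raising `n_groups` reduces that risk at the cost of more distance computations.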