DataBases & Data Mining Joined Specialization Project
"Data Mining Classification Tool"
By Mateusz Żochowski & Jakub Strzemżalski

Slide 2: Agenda
- General description of the problem
- Functionality
- Data mining aspects
- Algorithm and optimisation
- Database aspects
- General entities scheme

Slide 3: General Description
- Universal tool
- Different kinds of objects (e.g. preprocessed photos, hospital patient data)
- Finding similar objects
- Decision problems

Slide 4: Functionality
- Independent system, user operated
- Using sets of data already provided or uploading new types
- Influence on the way data is processed
- Possible usage in bigger systems as a processing engine
- Additional module used as a helping tool in more complex systems

Slide 5: General Use Case
(diagram)

Slide 6: Data Mining, General Ideas
- k-NN algorithm
  - Description of an object
  - Definition of a distance
  - Brief explanation of the algorithm
- Optimisation
  - Problem of comparing a large number of objects
  - Optimised solution using the grouping idea

Slide 7: Definitions
- Objects

Slide 8: k-NN (k-Nearest Neighbours)
- The idea behind k-NN
- Aim: find the k objects most similar to the one being analysed and then assign it the appropriate decision
- Method: compute the distance from the analysed object to the other objects in the database and pick the closest ones

Slide 9: k-NN
- Graphical representation (figure)

Slide 10: Definitions
- Distance: calculated in multidimensional space
- Coefficients:
  - $\alpha_i$ (alpha)
  - $w_i$: weights expressing the importance of particular attributes
  - $n$: number of all attributes
- $D(O_1, O_2) = \sum_{i=1}^{n} w_i \cdot |a_i(O_1) - a_i(O_2)|^{\alpha_i}$

Slide 11: Optimisation
- The reason: the cost of computing multidimensional distances from one element to all the others
- Solution: improved k-NN
- Result: better efficiency, because narrowing the set of possibly similar objects reduces the number of distance computations

Slide 12: Step 1: group-oriented plane division (figure)

Slide 13: Step 2: a new object appears (figure)

Slide 14: Step 3 (figure)

Slide 15: Step 4 (figure)

Slide 16: Step 5 (figure)

Slide 17: Grouping problem
- The problem: assigning objects to appropriate groups according to the chosen distance definition
- Solution: a clustering algorithm
- Brief example: the k-means algorithm

Slide 18: Database, entities
(diagram)

Slide 19: Database
- The general structure of the database results from optimisation issues
- Because the system is meant to be universal, the database may contain many different tables of objects
- System tables are needed for defining experiments
- Group Member as a temporary table?

Slide 20: Summary
- There is still a lot of work to do...
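
The weighted distance, the plain k-NN decision and the grouping optimisation described in the slides could be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation: the function names (`distance`, `knn_decide`, `grouped_knn_decide`), the sample data and the pre-computed group centres are all hypothetical, and the alpha coefficients are assumed to act as per-attribute exponents on the absolute attribute difference. The slides for Steps 3 to 5 are figures only, so the narrowing rule used here (search only the members of the group whose centre is closest to the new object) is one plausible reading. It trades exactness for speed, since true neighbours lying just across a group boundary can be missed.

```python
from collections import Counter


def distance(o1, o2, weights, alphas):
    """Weighted distance: sum over attributes of w_i * |a_i(o1) - a_i(o2)| ** alpha_i."""
    return sum(w * abs(a - b) ** al
               for a, b, w, al in zip(o1, o2, weights, alphas))


def _vote(candidates, query, objects, decisions, k, weights, alphas):
    """Rank candidate indices by distance to the query and take a majority vote."""
    ranked = sorted(candidates,
                    key=lambda i: distance(query, objects[i], weights, alphas))
    nearest = ranked[:k]
    votes = Counter(decisions[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest


def knn_decide(query, objects, decisions, k, weights, alphas):
    """Plain k-NN: the query is compared with every stored object."""
    return _vote(range(len(objects)), query, objects, decisions, k, weights, alphas)


def grouped_knn_decide(query, objects, decisions, groups, centers, k,
                       weights, alphas, n_groups=1):
    """Grouped k-NN (assumed variant): compare the query with the group
    centres first, then search only the members of the closest group(s)."""
    closest = sorted(range(len(centers)),
                     key=lambda g: distance(query, centers[g], weights, alphas))
    candidates = [i for g in closest[:n_groups] for i in groups[g]]
    return _vote(candidates, query, objects, decisions, k, weights, alphas)


# Hypothetical sample data: six objects with two attributes each.
objects = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
           (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
decisions = ["A", "A", "A", "B", "B", "B"]

# Groups and their centres, as a clustering step such as k-means might give.
groups = [[0, 1, 2], [3, 4, 5]]
centers = [(1.5, 1.3), (8.5, 8.7)]

weights = (1.0, 1.0)   # all attributes equally important
alphas = (1.0, 1.0)    # exponent 1 gives a weighted Manhattan distance

query = (2.0, 2.0)
plain_result = knn_decide(query, objects, decisions, 3, weights, alphas)
grouped_result = grouped_knn_decide(query, objects, decisions, groups,
                                    centers, 3, weights, alphas)
```

With this data both variants return the same decision and the same neighbours, but the grouped variant computes two centre distances plus three member distances instead of six object distances. On a large database with many groups that difference is where the efficiency gain mentioned in the Optimisation slide would come from. Raising `n_groups` widens the search and reduces the risk of missing neighbours near a group boundary.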