Data Mining
The Art and Science of Obtaining Knowledge from Data
Dr. Saed Sayad
5/22/2017, University of Toronto

Agenda
• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities

Explosion of Data
• Data in the world doubles every 20 months!
• NASA’s Earth Orbiting System: 46 megabytes of data per second, 4,000,000,000,000 bytes a day
• FBI fingerprints image library: 200,000,000,000,000 bytes
• In-line image analysis for particle detection: 1 megabyte in one second

What do we need?
Fast, accurate, and scalable data analysis techniques to extract useful knowledge. The answer is data mining.

What is Data Mining?
“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”

[Diagram: Data -> Data Mining -> Knowledge]

[Diagram: Data Mining draws on Data Analysis (AI, Machine Learning, Statistics) and Database (Data Warehouse, OLAP)]

Database

             Text Files     Relational Database             Multidimensional Database
Entities     File           Table                           Cube
Attributes   Row and Col    Record, Field, Index            Dimension, Level, Measurement
Methods      Read, Write    Select, Insert, Update, Delete  Drill down, Drill up, Drill through
Language     -              SQL                             MDX

Data Analysis
• Classification
• Regression
• Clustering
• Association
• Sequence Analysis

Data Analysis
• Input variables or attributes: numeric (age, income, …) or categorical (gender, occupation, …)
• Model: linear models or decision trees
• Output variables or targets: numeric (regression) or categorical (classification), e.g. (0, 1) or (good, bad)
[Diagram labels: inputs X1, X2; weights W1, W2; outputs Y1, Y2]

Data Analysis (cont.)
• Clustering [Plot axes: Income, Age]
• Association
  1, chips, coke, chocolate
  2, gum, chips
  3, chips, coke
  4, …
  Probability (chips, coke)?
• Sequence Analysis
  …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…
  [Diagram labels: Xt-1, T, Xt]

Data Mining in Research Life Cycle
[Diagram labels: Questions, Needs, Report, Library Search, Data Analysis, Modeling, Database, Research Data, Experiment]

Data Mining: Modeling Steps
1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment

Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”

Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”

1. Problem Definition
“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”
[Table: muscle contraction level (H/L) of biceps and triceps for each movement. Flexion: biceps H, triceps L. Extension: biceps L, triceps H. Supination and Pronation: H L H L.]
[Figure: arm movements: Supination, Pronation, Flexion, Extension]

2. Data Preparation
The dataset includes 80 records. There are two input variables: biceps signal and triceps signal. There is one output variable with four possible values: Supination, Pronation, Flexion, and Extension.

3. Exploration
[Scatter plot: Triceps signal vs. Record#, by class (Flexion, Extension, Supination, Pronation)]

3. Exploration (cont.)
[Scatter plot: Biceps signal vs. Record#, by class (Flexion, Extension, Supination, Pronation)]

4. Modeling
Classification:
• OneR
• Decision Tree
• Naïve Bayesian
• K-Nearest Neighbors
• Neural Networks
• Linear Discriminant Analysis
• Support Vector Machines
• …

6. Model Deployment
A neural network model was successfully implemented inside the robotic arm.

Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”

Plastics Extrusion
[Figure: plastic pellets, plastic melt]

Film Extrusion
[Figure: extruder, plastic film, defect due to particle contaminant]

In-Line Monitoring
[Figure: transition piece, window ports]

In-Line Monitoring
[Figure: optical assembly, light, light source, extruder and interface, imaging computer]

[Image: Melt Without Contaminant Particles (WO)]
[Image: Melt With Contaminant Particles (WP)]

1. Problem Definition
Classify images into those with particles (WP) and those without particles (WO).

2. Data Preparation
• 2000 images
• 54 input variables, all numeric
• One output variable with two possible values: With Particle, Without Particle

2. Data Preparation (cont.)
• Images were pre-processed to remove noise.
• Dataset 1, sharp images: 1350 images, including 1257 without particles and 91 with particles
• Dataset 2, sharp and blurry images: 2000 images, including 1909 without particles or with blurry particles, and 91 with particles
• 54 input variables, all numeric
• One output variable with two possible values (WP and WO)

3. Exploration
Demo!

4. Modeling
Classification:
• OneR
• Decision Tree
• 3-Nearest Neighbors
• Naïve Bayesian

5. Evaluation
10-fold cross-validation

Dataset                 Attributes  Classes  OneR  C4.5  3-NN  Bayes
Sharp Images            54          2        99.9  99.8  99.8  95.8
Sharp + Blurry Images   54          2        98.5  97.8  97.8  93.3
Sharp + Blurry Images   54          3        87    87    84    79

If pixel_density_max < 142 then WP

6. Deploy model
A Visual Basic program will be developed to implement the model.
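The one-attribute rule and the 10-fold cross-validation in the evaluation step can be illustrated with a minimal Python sketch. This is not the code used in the project: the data below is synthetic and cleanly separable, the function names (fit_threshold, predict, cross_validate) are hypothetical, and the only thing taken from the slides is the rule shape "if the attribute is below a cut point then WP". It only shows how a single-threshold rule can be fitted on nine folds and scored on the tenth.

```python
def fit_threshold(values, labels, positive="WP"):
    """Return the cut point t with the fewest training errors for the
    one-attribute rule: value < t -> positive class, otherwise the other class."""
    distinct = sorted(set(values))
    # Candidate cut points are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best_t, best_errors = None, None
    for t in candidates:
        errors = sum((v < t) != (y == positive) for v, y in zip(values, labels))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def predict(value, threshold):
    """Apply the rule: below the threshold means 'with particle' (WP)."""
    return "WP" if value < threshold else "WO"


def cross_validate(values, labels, k=10):
    """k-fold cross-validation with interleaved folds (record i goes to fold i % k).
    Real data should normally be shuffled first; it is skipped here so the
    example stays deterministic."""
    n = len(values)
    correct = 0
    for fold in range(k):
        test_idx = range(fold, n, k)
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        t = fit_threshold([values[i] for i in train_idx],
                          [labels[i] for i in train_idx])
        correct += sum(predict(values[i], t) == labels[i] for i in test_idx)
    return correct / n


# Synthetic stand-in for a single attribute such as pixel_density_max (assumption):
# 180 images without particles (bright) and 20 with particles (dark).
values = [150 + 0.5 * i for i in range(180)] + [60 + 4 * i for i in range(20)]
labels = ["WO"] * 180 + ["WP"] * 20

threshold = fit_threshold(values, labels)
accuracy = cross_validate(values, labels)
print(f"rule: if value < {threshold} then WP")
print(f"10-fold cross-validation accuracy: {accuracy:.3f}")
```

Because the synthetic classes do not overlap, every fold is classified correctly here; the blurry-image datasets in the table are the realistic case where a single cut point starts to make errors.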
Challenges and Opportunities
• Data mining is a ‘top ten’ emerging technology.
• High-paying jobs in the financial, medical, and engineering fields.
• Faster, more accurate, and more scalable techniques.
• Incremental, on-line, and real-time learning algorithms.
• Parallel and distributed data processing techniques.

Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems. You can be part of the solution!