A Look at Data Mining
Presented by: Charles Hollingsworth, Flavia Peynado, Ritch Overton
DSc8020, Group Presentation, July 31, 2002

What is Data Mining?
It may be described as the process of extracting previously unidentified, valid, and actionable information from large databases and then using the information to make crucial business decisions.

Why the need for data mining?
The business environment is constantly changing.
• Customer Behavior Patterns
• Market Saturation
• New niche markets
• Increased commoditization
• Time to market
• Shorter product life cycles
• Increased competition and business risks

Drivers
• The Customer
• Products
• Competition
• Operations/Data Assets

Enablers
• Data flood
• Growth of data warehousing
• New IT solutions
• New research in machine learning

Process overview contd.
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Data Transformation
5. Data Mining
6. Analysis of results
7. Assimilation of results

Effort needed at each stage of data mining
[Figure: bar chart of effort (vertical axis, 0 to 60) for each stage. The stage labels are only partly legible in the source; they appear to include identification of objectives, data preparation, data mining, and analysis of results.]

Visualization
• Goal is to provide a summary and overview of a dataset
• Promotes Understanding: Deconstructive process
• Promotes Trust: Constructive process
• Narrows the gap between human and computer during data analysis

Types of Visualization Tools
• Histograms
• Bar Charts
• Time-Series Plots
• Decision Trees
• Scatter Plots
• Coxcomb Plots
• Pie Charts
• Stereograms
• Line Plots
• Moseley's X-rays

Histogram
Graphically illustrates how many observations fall in various categories.
[Figure: "Histogram for Diameter". Vertical axis: 0 to 100. Categories: <=0.455, then .455-.465 through .535-.545 in steps of .01, then >0.545.]

Bar Chart
Categories are placed on the vertical axis, instead of on the horizontal axis as in a histogram.

Scatter Plot
Graphical representation of the relationship between two variables.
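The counting behind the histogram slide can be sketched in a few lines. This is a minimal illustration, not the presenters' method: the bin edges are read off the category labels of the "Histogram for Diameter" figure, the right-closed intervals are an assumption inferred from the "<=0.455" label, and `SAMPLE_DIAMETERS` is made-up data.

```python
from bisect import bisect_left

# Bin edges 0.455, 0.465, ..., 0.545, read off the slide's category labels.
# Built from integers so each edge is the float nearest its decimal value.
EDGES = [(455 + 10 * i) / 1000 for i in range(10)]

# One label per category: an open-ended first and last bin plus nine 0.01-wide bins.
LABELS = (
    ["<=0.455"]
    + [f".{455 + 10 * i}-.{465 + 10 * i}" for i in range(9)]
    + [">0.545"]
)


def histogram_counts(values):
    """Count how many observations fall in each category.

    Assumption: intervals are right-closed, so 0.465 belongs to ".455-.465".
    bisect_left returns the number of edges strictly below the value,
    which is exactly the index of the matching category.
    """
    counts = {label: 0 for label in LABELS}
    for v in values:
        counts[LABELS[bisect_left(EDGES, v)]] += 1
    return counts


# Hypothetical diameter measurements, for illustration only.
SAMPLE_DIAMETERS = [0.452, 0.461, 0.468, 0.472, 0.499,
                    0.501, 0.503, 0.512, 0.538, 0.551]

if __name__ == "__main__":
    # Text rendering: one row per category, one '#' per observation.
    for label, n in histogram_counts(SAMPLE_DIAMETERS).items():
        print(f"{label:>10} | {'#' * n}")
```

A plotting library would draw the bars, but the category counts are the substance of the chart.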
Scatter Plot
[Figure: scatter plot of Salary (vertical axis, 0 to 25) against Domestic Gross (horizontal axis, 0 to 200).]

Pie Chart
Radii are used to divide a circle into wedges. The resulting angles represent the values of the wedges.
[Figure: pie chart, "Spring 2000 Salary Survey". Wedges: <$30,000; $30,000 to $39,999; $40,000 to $49,999; $50,000 to $59,999; $60,000 to $69,999; More than $70,000; No Answer.]

Line Plot
Connects consecutive data points to enhance visualization.

Time-Series Plot: Playfair’s
• Helpful in forecasting future values
• Time variable is placed on the horizontal axis
• Makes patterns in data more apparent
• The area between two time-series curves was emphasized to show the difference between them, representing the balance of trade.

Decision Trees
Conventions for Decision Trees:
1. Composed of nodes (points in time) and branches (possible decisions).
2. Squares represent decision nodes, circles represent probability nodes, triangles represent end nodes.
3. Probabilities are listed on probability branches.
4. Monetary values are listed on the branches where they occur.
5. The decision maker has no control over probability branches.

Coxcomb Plot
In 1858, Florence Nightingale constructed graphs of her own design, which she called “Coxcombs”. The radii in a Coxcomb vary, as opposed to the angle of the wedge in a pie chart.

Stereogram
Luigi Perozzo, from the Annali di Statistica, 1880
The population of Sweden from 1750-1875 by age groups

Moseley’s X-rays
Led Henry Moseley to discover that the atomic number is more than a serial number; that it has some physical basis. Moseley proposed that the atomic number was the number of electrons in the atom of the specific element.
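The decision-tree conventions listed above (decision, probability, and end nodes, with probabilities and monetary values on branches) can be sketched as a small data structure. The slides do not say how a tree is evaluated, so the expected-monetary-value roll-back used here is an assumption, as are the class names `End`, `Chance`, and `Decision` and the example figures.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class End:
    """End node (triangle): nothing further happens."""
    payoff: float = 0.0


@dataclass
class Chance:
    """Probability node (circle): branches are (probability, cash flow, child)."""
    branches: List[Tuple[float, float, "Node"]]


@dataclass
class Decision:
    """Decision node (square): branches are (label, cash flow, child)."""
    branches: List[Tuple[str, float, "Node"]]


Node = Union[End, Chance, Decision]


def expected_value(node: Node) -> float:
    """Roll the tree back from the end nodes (assumed EMV criterion)."""
    if isinstance(node, End):
        return node.payoff
    if isinstance(node, Chance):
        # The decision maker has no control here, so weight by probability.
        if abs(sum(p for p, _, _ in node.branches) - 1.0) > 1e-9:
            raise ValueError("probabilities on a chance node must sum to 1")
        return sum(p * (cash + expected_value(child))
                   for p, cash, child in node.branches)
    # Decision node: the decision maker picks the most valuable branch.
    return max(cash + expected_value(child) for _, cash, child in node.branches)


def best_choice(node: Decision) -> str:
    """Label of the branch with the highest expected monetary value."""
    return max(node.branches, key=lambda b: b[1] + expected_value(b[2]))[0]


# Hypothetical example: launching costs 40; it pays 100 with probability 0.6
# and 10 with probability 0.4. Doing nothing costs and pays nothing.
TREE = Decision(branches=[
    ("launch", -40.0, Chance(branches=[(0.6, 100.0, End()), (0.4, 10.0, End())])),
    ("do nothing", 0.0, End()),
])

if __name__ == "__main__":
    print(best_choice(TREE), expected_value(TREE))
```

Monetary values sit on the branches where they occur, as in convention 4, so the launch cost and the two payoffs are added along the path and never stored at the end nodes.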
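The contrast drawn above between a pie chart (the wedge angle carries the value) and a Coxcomb (the radius varies) can be made concrete with a short calculation. This is a sketch under assumptions: the Coxcomb radius is scaled with the square root of the value so that wedge area tracks the value, which is the usual polar-area convention but is not stated on the slide, and the function names and input numbers are illustrative only.

```python
from math import sqrt


def pie_angles(values):
    """Pie chart: equal radius, wedge angle (degrees) proportional to value."""
    total = sum(values)
    return [360.0 * v / total for v in values]


def coxcomb_radii(values, max_radius=1.0):
    """Coxcomb: every wedge spans the same angle and the radius varies.

    Assumption: radius grows with the square root of the value, so the
    wedge area (not the radius itself) is proportional to the value.
    """
    largest = max(values)
    return [max_radius * sqrt(v / largest) for v in values]


if __name__ == "__main__":
    values = [1, 1, 2]  # hypothetical category values
    print("pie angles:   ", pie_angles(values))
    print("coxcomb radii:", coxcomb_radii(values))
    print("coxcomb angle per wedge:", 360.0 / len(values))
```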
Other Visualization Tools
• Doughnut
• Area Chart
• Box Plot
• Radar

Algorithms
• Predictive
• Regression
• Classification
• Descriptive
• Parallel Formulation of Classification
• Association Rule Discovery
• Sequential Pattern Discovery
• Analysis
• Clustering

Applying
• Relevance to managers
• Decreasing Costs
• Valuing Appropriately
• Effective Implementation

Conclusion
• Converging Developments
• Data compilation
• Processing power
• Maturing Algorithms
• Visualization
• Accessible Resources
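As an illustration of one of the listed algorithms, association rule discovery, the sketch below computes the two measures commonly used to score a rule "A implies B": support and confidence. The slides do not define these measures, so they are brought in here as standard definitions; the market-basket transactions and the function names are hypothetical.

```python
# Hypothetical market-basket data: each transaction is a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = count_containing(antecedent | consequent, transactions)
    return both / count_containing(antecedent, transactions)


if __name__ == "__main__":
    rule = ({"diapers"}, {"beer"})
    print("support:   ", support(rule[0] | rule[1], TRANSACTIONS))
    print("confidence:", confidence(*rule, TRANSACTIONS))
```

A full discovery algorithm would search over candidate itemsets and keep the rules above chosen support and confidence thresholds; this sketch only scores one rule.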