Data Mining Strategies

Scales of Measurement
  Stevens, S.S. (1946). On the theory of scales of measurement. Science, 103, 677-680.

Four Scales
  Categorical (nominal)
  Ordinal (only order matters)
  Interval (the difference between two values is meaningful)
  Ratio (a value of 0.0 means none of the quantity is present; Kelvin is a ratio scale, but Celsius and Fahrenheit are not)

What to Know about the Scales
  The measurement principle involved for each scale
  Examples of the measurement scales
  Permissible arithmetic operations for each scale

Categorical Scale Data
  The values of the scale have no numeric meaning
  Examples
    Gender
    Ethnicity
    Marital status
    Hair color
  Operations
    Counting (only)

Ordinal Scale Data
  The categories can be ordered
  But the intervals between adjacent scale values are indeterminate
  Examples
    Movie ratings (0, 1 or 2 thumbs up)
    U.S.D.A. beef (good, choice, prime)
    The rank order of anything
  Operations
    Counting
    Greater than or less than operations

Interval Scale Data
  Intervals between adjacent scale values are equal
  Examples
    Degrees Fahrenheit
    Most personality measures
    IQ intelligence score
  Operations
    Counting
    Greater than or less than operations
    Addition and subtraction of scale values

Ratio Scale Data
  There is a rational zero point for the scale (an absolute zero)
  Examples
    Degrees Kelvin
    Annual income in dollars
    Length, distance, size (cm, kB, inches, km)
  Operations
    All of the above, plus
    Multiplication and division of scale values
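
The permissible operations accumulate from one scale to the next, which can be sketched as a small lookup. This is only an illustration: the names `SCALES`, `NEW_OPERATIONS`, `permissible_operations` and `is_permissible` are hypothetical and do not come from the slides; only the scale hierarchy itself does.

```python
# Hypothetical names; the ordering and operations follow the slides above.
SCALES = ["categorical", "ordinal", "interval", "ratio"]

# Operations each scale adds on top of the scales that precede it.
NEW_OPERATIONS = {
    "categorical": {"count"},
    "ordinal": {"compare"},            # greater than / less than
    "interval": {"add", "subtract"},
    "ratio": {"multiply", "divide"},
}


def permissible_operations(scale):
    """Return every operation allowed on `scale`, including inherited ones."""
    allowed = set()
    for name in SCALES[: SCALES.index(scale) + 1]:
        allowed |= NEW_OPERATIONS[name]
    return allowed


def is_permissible(scale, operation):
    """True if `operation` is meaningful for data measured on `scale`."""
    return operation in permissible_operations(scale)


if __name__ == "__main__":
    # Degrees Fahrenheit are interval data, so ratios ("twice as hot") are not meaningful.
    print(is_permissible("interval", "divide"))
    # Degrees Kelvin are ratio data, so division is meaningful.
    print(is_permissible("ratio", "divide"))
```

For instance, subtracting two Fahrenheit readings is allowed under this sketch, while dividing them is not, because Fahrenheit has no absolute zero.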
Variables
  Independent variable: the input, x
  Dependent variable: the output, f(x)
  Example: f(x) = 3 + 2x^2

Data Mining Strategies
  Unsupervised (no dependent variables used)
    Clustering
    Market Basket Analysis
    Information Visualization
  Supervised (at least one dependent variable used for training)
    Classification
    Estimation
    Prediction

Clustering
  Cluster analysis divides data into groups (clusters) that are meaningful, useful or both
  Clusters capture the natural structure of the data
  Clustering allows us to think about the data at a new level of abstraction
  Cluster analysis is often the first step in a data mining project
  [Figures: cluster of stars; water clusters; cellular clusters]

Cluster Analysis
  Uses information found in the data that describes objects and their relationships
  Goal: objects within a group should be similar to one another and different from objects in other groups
  The greater the similarity within groups and the greater the difference between groups, the better the clustering

How Many Clusters?
  [Figures: three clusters identified; six clusters identified]

Types of Clustering
  Partitional clustering
  Hierarchical clustering
  Exclusive clustering
  Overlapping clustering
  Fuzzy clustering
  Complete clustering
  Partial clustering

Partitional Clustering
  A division of a set of data into non-overlapping clusters
  Each data point is in exactly one cluster
  [Figure: example of partitional clustering]

Hierarchical Clustering
  Permits subclusters (nested clusters within clusters)
  [Figure: example of hierarchical clustering]

Exclusive Clustering
  Each object is assigned to a single cluster

Overlapping Clustering
  Non-exclusive
  A data point can belong to two or more clusters simultaneously

Fuzzy Clustering
  Every data point belongs to every cluster with a membership weight
  Membership ranges from 0 (absolutely does not belong) to 1 (absolutely belongs)
  The sum of the membership weights for each point is 1
  [Figure: two clusters, C1 and C2, with three example points weighted C1 75% / C2 25%, C1 40% / C2 60%, and C1 1% / C2 99%]

Complete Clustering
  Assigns every data point to a cluster
  No data point is left out of a cluster

Partial Clustering
  Does not assign every data point to a cluster
  Some data points may not belong to any cluster
    Noise
    Outliers
    Uninteresting background
  Example: classify newspaper stories
    Many fall into
      Global warming
      Terrorism
    Some stories are unique
      "Cable Tie just graduated from the CofC in CS"

Chris Starr: A centroid is the center of a cluster.

K-Means
  1. Select K points as initial centroids
  2. Repeat
     1. Form K clusters by assigning each point to its closest centroid
     2. Recompute the centroid of each cluster
  3. Until centroids do not change
  The centroids are repositioned until stable in the K-means algorithm.

Observe Your Environment
  Start looking for clusters around you
  Think about how the clusters are formed
  Are they hierarchical?
  Are they fuzzy clusters?
  Are they complete clusters?
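
The K-Means steps listed earlier can be sketched in plain Python. This is a minimal illustration and not a reference implementation. Several details are assumptions the slides do not specify: Euclidean distance, random selection of the initial centroids, a `max_iter` safety cap, and keeping the old centroid when a cluster ends up empty. The names `closest`, `mean` and `k_means` are hypothetical.

```python
import math
import random


def closest(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance, an assumption)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    """Return (centroids, clusters) for a list of equal-length coordinate tuples."""
    rng = random.Random(seed)
    # Step 1: select K points as initial centroids (chosen at random here).
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2.1: form K clusters by assigning each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Step 2.2: recompute the centroid of each cluster.
        # Assumption: an empty cluster keeps its previous centroid.
        new_centroids = [
            mean(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Step 3: stop once the centroids do not change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, clusters = k_means(data, k=2)
    print(centroids)
    print(clusters)
```

Because the starting centroids are random, different seeds can lead to different final clusters on less clearly separated data. If the loop stops at `max_iter` rather than at convergence, the returned clusters reflect the assignment made before the last centroid update.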
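
Returning to fuzzy clustering: the slides require only that each point's membership weights lie between 0 and 1 and sum to 1. They do not say how the weights are computed. The sketch below assumes one common choice, weights proportional to the inverse squared distance to each centroid, which is the form used by fuzzy c-means with a fuzzifier of 2. The function name `membership_weights` is hypothetical.

```python
import math


def membership_weights(point, centroids):
    """Membership weight of `point` in each cluster, normalized to sum to 1.

    Assumption: weights are proportional to 1 / (squared Euclidean distance).
    """
    squared = [math.dist(point, c) ** 2 for c in centroids]
    if 0.0 in squared:
        # The point sits exactly on one or more centroids: it belongs fully
        # to those clusters (shared equally) and not at all to the others.
        hits = squared.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in squared]
    inverse = [1.0 / d for d in squared]
    total = sum(inverse)
    return [w / total for w in inverse]


if __name__ == "__main__":
    # Two clusters, C1 centred at (0, 0) and C2 centred at (3, 0).
    c1_c2 = [(0.0, 0.0), (3.0, 0.0)]
    print(membership_weights((1.0, 0.0), c1_c2))  # closer to C1, so C1 gets the larger weight
```

A point lying exactly on a centroid gets weight 1 for that cluster and 0 for the rest, matching the "absolutely belongs" and "absolutely does not belong" ends of the range.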