The types of an attribute
A simple way to specify the type of an attribute is to identify the properties of numbers that
correspond to underlying properties of the attribute.

Properties of Attribute Values
The type of an attribute depends on which of the following properties it possesses:
– Distinctness: = ≠
– Order: < >
– Addition: + -
– Multiplication: * /
There are four types of attributes, each possessing the properties of the types before it:
– Nominal (distinctness)
Examples: ID numbers, eye color, zip codes
– Ordinal (distinctness and order)
Examples: rankings (e.g., taste of potato chips on a scale from 1 to 10), grades, height in {tall,
medium, short}
– Interval (distinctness, order, and addition)
Examples: calendar dates, temperatures in Celsius or Fahrenheit
– Ratio (all four properties)
Examples: temperature in Kelvin, length, time, counts
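The link between the four properties and the four attribute types can be sketched in Python. This is only an illustration under stated assumptions: the names `PROPERTIES_BY_TYPE` and `supports` are hypothetical, and the table assumes the usual cumulative convention in which each type also has the properties of the weaker types. The temperature values are made up.

```python
# Hypothetical lookup table: which properties each attribute type possesses.
# Assumes the cumulative convention nominal < ordinal < interval < ratio.
PROPERTIES_BY_TYPE = {
    "nominal": {"distinctness"},
    "ordinal": {"distinctness", "order"},
    "interval": {"distinctness", "order", "addition"},
    "ratio": {"distinctness", "order", "addition", "multiplication"},
}


def supports(attribute_type: str, prop: str) -> bool:
    """Return True if the given attribute type possesses the given property."""
    return prop in PROPERTIES_BY_TYPE[attribute_type]


# Temperature shows the interval/ratio difference: differences are meaningful
# on both scales, but ratios are only meaningful on the Kelvin scale.
celsius_a, celsius_b = 10.0, 20.0
kelvin_a, kelvin_b = celsius_a + 273.15, celsius_b + 273.15

celsius_ratio = celsius_b / celsius_a  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
kelvin_ratio = kelvin_b / kelvin_a     # roughly 1.035, the physically meaningful ratio
```

Here `supports("interval", "multiplication")` is false, which matches the observation that a ratio of two Celsius temperatures has no physical meaning.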
3.4.2 Describing attributes by the number of values
– Discrete Attribute: Has only a finite or countably infinite set of values. Examples: zip codes,
counts, or the set of words in a collection of documents. Often represented as integer variables.
Binary attributes are a special case of discrete attributes.
– Continuous Attribute: Has real numbers as attribute values. Examples: temperature, height, or
weight. Practically, real values can only be measured and represented using a finite number of
digits. Continuous attributes are typically represented as floating-point variables.
– Asymmetric Attribute: Only the presence of a non-zero attribute value is regarded as important.
A preliminary investigation of the data gives a better understanding of its specific characteristics
and can help to answer some of the data mining questions. It helps:
– To select pre-processing tools
– To select appropriate data mining algorithms
– Things to look at: class balance, dispersion of attribute values, skewness, outliers, missing
values, and attributes that vary together.
– Visualization tools are important: histograms, box plots, scatter plots.
– Many datasets have a discrete (binary) class attribute. Data mining algorithms may give poor
results because of the class imbalance problem, so identify the problem in an initial phase.
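A first look at these characteristics needs only the Python standard library. The sketch below uses made-up toy data, with `None` marking a missing value. The moment-based skewness estimate and the 1.5 × IQR outlier rule are common conventions chosen here for illustration, not methods prescribed by these notes.

```python
from collections import Counter
import statistics

# Made-up toy data: one numeric attribute and a binary class label.
values = [2.0, 3.0, 3.5, None, 4.0, 4.5, 5.0, 30.0]
labels = ["no", "no", "no", "no", "no", "no", "yes", "no"]

# Class balance: how many objects fall into each class.
class_counts = Counter(labels)

# Missing values: count them, then continue with the observed values only.
present = [v for v in values if v is not None]
n_missing = len(values) - len(present)

# Dispersion of the attribute values.
mean = statistics.mean(present)
median = statistics.median(present)
stdev = statistics.stdev(present)

# Skewness: simple moment-based estimate (one of several definitions in use).
pop_std = statistics.pstdev(present)
skewness = sum((v - mean) ** 3 for v in present) / (len(present) * pop_std ** 3)

# Outliers: values more than 1.5 * IQR outside the quartiles (assumed rule).
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
outliers = [v for v in present if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(class_counts, n_missing, round(skewness, 2), outliers)
```

In this toy data one class has a single member, the skewness is positive, and the value 30.0 is flagged as an outlier, which are the kinds of issues worth identifying before choosing an algorithm.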
General characteristics of data sets:

Dimensionality: the dimensionality of a data set is the number of attributes that the objects in
the data set possess. The curse of dimensionality refers to the difficulties of analyzing
high-dimensional data.

Sparsity: in some data sets, such as those with asymmetric features, most attributes of an object
have the value 0 and only a few are non-zero.

Resolution: data can often be obtained at different levels of resolution.
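These three characteristics can be computed for a small example. The sketch below is illustrative only: the data matrix and the hourly readings are made up, and averaging in blocks of four is just one assumed way of moving to a coarser resolution.

```python
# Toy data matrix (made up): rows are objects, columns are attributes.
data = [
    [0, 0, 3, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 2],
]

# Dimensionality: the number of attributes each object possesses.
dimensionality = len(data[0])

# Sparsity: the fraction of entries that are zero.
n_entries = len(data) * dimensionality
n_zero = sum(1 for row in data for value in row if value == 0)
sparsity = n_zero / n_entries

# Resolution: the same measurements viewed at a coarser level.
# Hypothetical hourly readings are averaged in blocks of four.
hourly = [10.0, 12.0, 11.0, 13.0, 20.0, 22.0, 21.0, 23.0]
block = 4
coarse = [sum(hourly[i:i + block]) / block for i in range(0, len(hourly), block)]

print(dimensionality, sparsity, coarse)
```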
There are several varieties of data sets; some of them are discussed below.
1. Record
– Data Matrix
– Document Data
– Transaction Data
2. Graph
– World Wide Web
– Molecular Structures
3. Ordered
– Spatial Data
– Temporal Data
– Sequential Data
– Genetic Sequence Data
Record Data
Data that consists of a collection of records, each of which consists of a fixed set of attributes.
Transaction or market basket Data
A special type of record data, where each transaction (record) involves a set of items. For example,
consider a grocery store. The set of products purchased by a customer during one shopping trip
constitutes a transaction, while the individual products that were purchased are the items.
Transaction data is a collection of sets of items, but it can be viewed as a set of records whose
fields are asymmetric attributes.
Transaction data can be represented as a sparse data matrix (the market basket representation):
– Each record (line) represents a transaction
– Attributes are binary and asymmetric
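The market basket representation can be sketched in a few lines of Python. The transactions are made up for illustration, and the names `items`, `matrix`, and `sparse_rows` are hypothetical. Because the attributes are asymmetric, only the non-zero entries carry information, so each row can also be stored as a list of the column indices that are set.

```python
# Made-up grocery transactions: each set holds the items of one shopping trip.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
]

# One column per distinct item, in a fixed (sorted) order.
items = sorted(set().union(*transactions))

# Binary data matrix: one row per transaction, 1 if the item was purchased.
matrix = [[1 if item in t else 0 for item in items] for t in transactions]

# Sparse form: keep only the column indices of the non-zero entries.
sparse_rows = [[j for j, v in enumerate(row) if v] for row in matrix]

for row in matrix:
    print(row)
```

With these transactions the columns are `beer, bread, cola, diapers, eggs, milk`, and the first row is `[0, 1, 0, 0, 0, 1]`, which is stored sparsely as `[1, 5]`.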