AI Week 23
Machine Learning
Data Mining – Week 2
Lee McCluskey, room 2/07
Email [email protected]
http://scom.hud.ac.uk/scomtlm/cha2555/
Focus on one area: Data Mining
Data Mining involves discovering patterns in large databases or data
warehouses for different purposes. It is the science of extracting
meaningful information from (large) databases.
Applications - Market analysis and Retail, Decision support, Financial
analysis, Discovering environmental trends
Two Types of Learning: Data Mining can be supervised (“Learning from
Example”) or unsupervised (“Learning from Observation”)
Data Mining is often part of a larger process aimed at getting more out of
data warehouses, and it involves data cleansing.
Data cleansing is the process of identifying and removing or correcting
corrupted records in a database, so that the data is consistent with
other similar data sets in the database. E.g. the process may remove
invalid post codes or spurious extreme values (e.g. -999999.999).
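A minimal sketch of the "removing" side of data cleansing, in Python. The record fields (`postcode`, `value`), the sentinel constant and the simplified postcode pattern are assumptions made for illustration; real cleansing rules depend on the data source, and correcting (rather than dropping) records is not shown.

```python
import re

# Hypothetical sentinel meaning "no reading": the spurious extreme value above.
SENTINEL = -999999.999

# Deliberately simplified UK-style postcode pattern (an assumption; it does
# not implement the full official postcode rules).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def is_clean(record):
    """Return True if the record passes both (hypothetical) checks."""
    postcode = record.get("postcode", "").strip().upper()
    value = record.get("value")
    if not POSTCODE_RE.match(postcode):
        return False  # invalid post code
    if value is None or value == SENTINEL:
        return False  # missing or spurious extreme value
    return True


def cleanse(records):
    """Keep only the records that pass the checks (removal, not correction)."""
    return [r for r in records if is_clean(r)]


records = [
    {"postcode": "AB1 2CD", "value": 12.5},
    {"postcode": "NOT A CODE", "value": 3.0},
    {"postcode": "AB1 2CD", "value": -999999.999},
]
print(cleanse(records))
```

Comparing floats with `==` is acceptable here only because the sentinel is written literally by the source system; a computed value would need a tolerance.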
Artform Research
Group
Association Rule Mining (ARM)
This is an “unsupervised learning activity”: briefly,
looking for strong associations between features
in data.
Definitions: A transactional database is a set of
“transactions”, e.g. the details of individual sales.
A transaction can be thought of as an “item-set”
where each item is an attribute-value pair:
{height = 6, temp = 20, weather = warm}
As a special case we could have nominal item-sets:
{bread, cheese, milk}
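One way to hold both kinds of item-set in Python is as a `frozenset` of items: attribute-value pairs become strings such as `"temp=20"`, while nominal items are just names. The helper names below are hypothetical; frozensets are an assumed choice because item order does not matter and they can be used as dictionary keys later.

```python
def record_to_itemset(record):
    """Turn an attribute-value record into an item-set of 'attr=value' items."""
    return frozenset(f"{attr}={value}" for attr, value in record.items())


def contains(transaction, itemset):
    """A transaction contains an item-set if every item of it is present."""
    return set(itemset) <= transaction


# Attribute-value transaction, as in {height = 6, temp = 20, weather = warm}.
t1 = record_to_itemset({"height": 6, "temp": 20, "weather": "warm"})

# Nominal item-set: the items are just names.
t2 = frozenset({"bread", "cheese", "milk"})

print(contains(t1, {"weather=warm"}), contains(t2, {"bread", "butter"}))
```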
Association Rule Mining (ARM): Important Definitions
An association rule is an expression
X => Y
where X, Y are item-sets and X ∩ Y = ∅ (they share no items).
The support of an association rule is defined as the
proportion of transactions in the database that contain
X ∪ Y.
The confidence of an association rule is defined as the
probability that a transaction contains Y given that it
contains X, that is

$$\mathrm{conf}(X \Rightarrow Y) = \frac{\text{no. of transactions containing } X \cup Y}{\text{no. of transactions containing } X}$$
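Both definitions translate directly into code. This minimal Python sketch assumes each transaction is a Python set of items; the function names, the toy baskets and the choice to return 0 when no transaction contains X (where confidence is undefined) are assumptions, not part of the definitions above.

```python
from fractions import Fraction


def support(transactions, x, y):
    """Proportion of transactions that contain X ∪ Y."""
    xy = set(x) | set(y)
    hits = sum(1 for t in transactions if xy <= t)
    return Fraction(hits, len(transactions))


def confidence(transactions, x, y):
    """(no. of transactions containing X ∪ Y) / (no. containing X)."""
    xy = set(x) | set(y)
    with_x = [t for t in transactions if set(x) <= t]
    if not with_x:
        return Fraction(0)  # assumed convention: undefined confidence -> 0
    hits = sum(1 for t in with_x if xy <= t)
    return Fraction(hits, len(with_x))


# Hypothetical shopping baskets.
D = [{"bread", "cheese", "milk"}, {"bread", "milk"}, {"bread", "cheese"}, {"milk"}]
print(support(D, {"bread"}, {"milk"}), confidence(D, {"bread"}, {"milk"}))  # 1/2 2/3
```

`Fraction` keeps the results exact, which makes hand-checked answers easy to compare.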
Example
A trader deals in the following currencies in a series of 8 transactions…
Transaction  Currencies traded
1            Sterling, Yen, Dollar, Euro
2            Dollar, Euro, Rand, Sterling, Ruble
3            Pesos, Euro, Ruble, Rupee, Yen
4            Rupee, Sterling, Ruble, Euro, Dollar
5            Sterling, Dinars, Rand, Yen
6            Pesos, Kroner, Sterling, Dollar
7            Ruble, Rupee, Kroner, Sterling, Pesos
8            Dollar, Euro, Sterling
What are the SUPPORT and CONFIDENCE of the following rules?
{Ruble} → {Rupee}
{Sterling, Euro} → {Ruble}
{Sterling, Euro} → {Ruble, Pesos}
Find an association rule from the set of transactions that has
- at least 2 items in its antecedent,
- better support and better confidence than all the rules above.
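A short Python sketch for checking your answers once you have worked them by hand. It recomputes support and confidence for the three rules over the 8 transactions, then brute-forces rules with 2 or 3 antecedent items. Restricting the search to single-item consequents is an assumption made to keep it small, so it may miss qualifying rules with larger consequents.

```python
from fractions import Fraction
from itertools import combinations

# The 8 transactions from the table, one set per transaction.
D = [
    {"Sterling", "Yen", "Dollar", "Euro"},
    {"Dollar", "Euro", "Rand", "Sterling", "Ruble"},
    {"Pesos", "Euro", "Ruble", "Rupee", "Yen"},
    {"Rupee", "Sterling", "Ruble", "Euro", "Dollar"},
    {"Sterling", "Dinars", "Rand", "Yen"},
    {"Pesos", "Kroner", "Sterling", "Dollar"},
    {"Ruble", "Rupee", "Kroner", "Sterling", "Pesos"},
    {"Dollar", "Euro", "Sterling"},
]


def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in D if itemset <= t)


def supp_conf(x, y):
    """(support, confidence) of the rule X -> Y over D."""
    n_x, n_xy = count(x), count(x | y)
    conf = Fraction(n_xy, n_x) if n_x else Fraction(0)
    return Fraction(n_xy, len(D)), conf


RULES = [
    ({"Ruble"}, {"Rupee"}),
    ({"Sterling", "Euro"}, {"Ruble"}),
    ({"Sterling", "Euro"}, {"Ruble", "Pesos"}),
]
for x, y in RULES:
    s, c = supp_conf(x, y)
    print(sorted(x), "->", sorted(y), "support", s, "confidence", c)

# Search: 2 or 3 antecedent items, one consequent item (an assumption),
# strictly better support AND confidence than every rule in RULES.
max_s = max(supp_conf(x, y)[0] for x, y in RULES)
max_c = max(supp_conf(x, y)[1] for x, y in RULES)
items = sorted(set().union(*D))
better = []
for k in (2, 3):
    for ante in combinations(items, k):
        x = set(ante)
        for item in items:
            if item not in x:
                s, c = supp_conf(x, {item})
                if s > max_s and c > max_c:
                    better.append((ante, item, s, c))
print(better)
```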
Aims of ARM
Given a transactional database D, the association rule
problem is to find all rules whose support and
confidence exceed user-specified thresholds,
denoted minimum support (MinSupp) and minimum
confidence (MinConf) respectively.
The aim is the discovery of the most significant associations
between the items in a transactional data set. This process
primarily involves discovering so-called frequent item-sets, i.e. item-sets whose support in the transactional data
set is above MinSupp. Rules are then formed from the frequent item-sets and kept only if their confidence is above MinConf.
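One common way to find the frequent item-sets is a level-wise, Apriori-style search (the notes do not name a specific algorithm): any subset of a frequent item-set must itself be frequent, so candidates of size k+1 are built only from frequent item-sets of size k. The Python sketch below uses hypothetical function names and treats both thresholds as "at least" rather than strictly "above"; rules are generated from the frequent item-sets afterwards.

```python
from fractions import Fraction
from itertools import combinations


def frequent_itemsets(transactions, min_supp):
    """Level-wise (Apriori-style) search for all item-sets whose support is
    at least min_supp. Returns {frozenset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return Fraction(sum(1 for t in transactions if itemset <= t), n)

    # Level 1 candidates: every single item that occurs.
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates at this level that reach MinSupp.
        scored = {c: supp(c) for c in level}
        current = {c: s for c, s in scored.items() if s >= min_supp}
        frequent.update(current)
        # Join pairs of frequent k-item-sets into (k+1)-item candidates and
        # prune any candidate that has an infrequent k-subset.
        k += 1
        level = set()
        for a, b in combinations(current, 2):
            cand = a | b
            if len(cand) == k and all(
                frozenset(sub) in current for sub in combinations(cand, k - 1)
            ):
                level.add(cand)
    return frequent


def generate_rules(frequent, min_conf):
    """Split each frequent item-set Z into X -> Z - X and keep the rule if
    its confidence supp(Z) / supp(X) is at least min_conf."""
    rules = []
    for z, s in frequent.items():
        for r in range(1, len(z)):
            for ante in combinations(z, r):
                x = frozenset(ante)
                conf = s / frequent[x]  # x is frequent because z is
                if conf >= min_conf:
                    rules.append((x, z - x, s, conf))
    return rules


D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]  # toy data
freq = frequent_itemsets(D, Fraction(1, 2))
for x, y, s, c in generate_rules(freq, Fraction(2, 3)):
    print(sorted(x), "->", sorted(y), s, c)
```

The same two functions can be run on the 8 currency transactions with any MinSupp and MinConf you choose.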
Contrast: Classification Rule Mining
The output of DM is a (set of) classification rule(s)
WHERE classes are known a priori (supervised
learning) and there is only one class on the RHS.
Features => C(1)
….
Features => C(n)
Classification Rule Mining
Size = medium, colour = green, shape = square => c1
Size = small, colour = red, shape = square => c1
Size = small, colour = blue, shape = circle => c1
Size = small, colour = green, shape = triangle => c2
Size = large, colour = white, shape = circle => c2
The aim is to find “hypotheses” that are
Characteristic – true of all members of a class
Discriminating – not true of ANY members of other classes
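A Python sketch of the two tests on the five examples above. Representing a hypothesis as a list of conjunctive rules, which covers an example if any one rule matches it, is an assumption; it is needed here because no single attribute = value test is shared by all three c1 examples. The function names are hypothetical.

```python
# The five training examples from the slide: (attributes, class).
EXAMPLES = [
    ({"size": "medium", "colour": "green", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "red", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "blue", "shape": "circle"}, "c1"),
    ({"size": "small", "colour": "green", "shape": "triangle"}, "c2"),
    ({"size": "large", "colour": "white", "shape": "circle"}, "c2"),
]


def covers(hypothesis, example):
    """A hypothesis is a list of conjunctive rules (dicts of attribute=value
    tests); it covers an example if any of its rules matches the example."""
    return any(
        all(example.get(a) == v for a, v in rule.items()) for rule in hypothesis
    )


def characteristic(hypothesis, cls, examples=EXAMPLES):
    """True of all members of the class."""
    return all(covers(hypothesis, x) for x, c in examples if c == cls)


def discriminating(hypothesis, cls, examples=EXAMPLES):
    """Not true of any member of another class."""
    return not any(covers(hypothesis, x) for x, c in examples if c != cls)


h = [{"shape": "square"}, {"colour": "blue"}]
print(characteristic(h, "c1"), discriminating(h, "c1"))  # True True
```

By contrast, `[{"shape": "square"}]` alone is discriminating for c1 but not characteristic, since the third c1 example is a circle.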
Associative Classification
If we fuse ARM and CRM we get “Associative Classification”:
use the association technique, but learn rules about particular
items (the class labels).
Associative Classification is a branch of data mining that
combines classification and association rule mining. In
other words, it utilises association rule discovery methods on
classification data sets.
Typically:
Find Association Rules using ARM
Sift out the “Class Association Rules”, i.e. those that have the
class of interest on their Right Hand Side
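The two typical steps can be sketched in Python on the CRM example from the earlier slide, with the class label turned into one more item (`class=c1`, `class=c2`). Brute-force enumeration stands in for a real ARM algorithm, and the function names and thresholds are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations

# The classification examples from the CRM slide.
ROWS = [
    ({"size": "medium", "colour": "green", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "red", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "blue", "shape": "circle"}, "c1"),
    ({"size": "small", "colour": "green", "shape": "triangle"}, "c2"),
    ({"size": "large", "colour": "white", "shape": "circle"}, "c2"),
]
# Each row becomes a transaction; the class label is just another item.
D = [
    frozenset([f"{a}={v}" for a, v in attrs.items()] + [f"class={cls}"])
    for attrs, cls in ROWS
]


def count(itemset):
    return sum(1 for t in D if itemset <= t)


def association_rules(min_supp, min_conf):
    """Step 1 (brute-force ARM): every rule X -> Y, X and Y non-empty and
    disjoint, whose support and confidence reach the thresholds."""
    # Only item-sets occurring in some transaction can have non-zero support.
    candidates = {
        frozenset(s) for t in D for k in range(2, len(t) + 1)
        for s in combinations(t, k)
    }
    found = []
    for z in candidates:
        s = Fraction(count(z), len(D))
        if s < min_supp:
            continue
        for k in range(1, len(z)):
            for ante in combinations(z, k):
                x = frozenset(ante)
                c = Fraction(count(z), count(x))
                if c >= min_conf:
                    found.append((x, z - x, s, c))
    return found


def is_car(rule):
    """Step 2: a class association rule has a single class item as its RHS."""
    _, y, _, _ = rule
    return len(y) == 1 and next(iter(y)).startswith("class=")


CARS = [r for r in association_rules(Fraction(1, 5), Fraction(1)) if is_car(r)]
for x, y, s, c in CARS:
    print(sorted(x), "=>", sorted(y), "support", s, "confidence", c)
```

Rules such as {colour=green} => {class=c1, shape=square} can pass step 1 but are discarded by `is_car`, which is exactly the sifting step.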
Example in Road Traffic Control
Data:
Numeric data records from individual CARS
(date, time, position, actual speed, expected
speed)
Textual data on INCIDENTS
(date, time start, time cleared, position, severity,
road type, area, incident category, cause,
road-effect, traffic-effect, reporter ..)
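To mine this data with ARM, each numeric car record has to become an item-set of attribute-value items. A minimal Python sketch under several assumptions: date and time are merged into one `when` timestamp, speeds are discretised by the actual/expected ratio with made-up cut-offs, and an incident is attached when it starts at the same position within a hypothetical 30-minute horizon. All field names and sample values are invented for illustration.

```python
from datetime import datetime, timedelta


def speed_band(actual, expected, slow=0.5, reduced=0.9):
    """Discretise actual/expected speed (hypothetical cut-offs)."""
    ratio = actual / expected
    if ratio < slow:
        return "slow"
    if ratio < reduced:
        return "reduced"
    return "normal"


def to_transaction(car, incidents, horizon=timedelta(minutes=30)):
    """Items for one car record, plus any incident that starts at the same
    position within `horizon` afterwards (hypothetical join rule)."""
    items = {
        f"position={car['position']}",
        f"speed={speed_band(car['actual_speed'], car['expected_speed'])}",
    }
    for inc in incidents:
        if inc["position"] == car["position"] and (
            car["when"] <= inc["time_start"] <= car["when"] + horizon
        ):
            items.add(f"incident={inc['category']}")
    return frozenset(items)


# Invented sample data.
car = {"when": datetime(2024, 1, 8, 8, 0), "position": "P1",
       "actual_speed": 20.0, "expected_speed": 60.0}
incidents = [{"time_start": datetime(2024, 1, 8, 8, 20), "position": "P1",
              "category": "heavy traffic"}]
print(sorted(to_transaction(car, incidents)))
```

Transactions built this way can then be fed to ARM to look for rules such as {speed=slow} => {incident=heavy traffic}.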
Example in Road Traffic Control
• associations between variations in speeds and near-future incidents
• the effect of a particular type of incident (e.g. roadworks) on
average speeds on nearby trunk roads
• looking for predictors of "heavy/slow traffic" incidents:
look for associations with speed variations or
accidents on roads downstream from the incident
position (which may have caused the incident)
• looking for associations between speeds around a
bypass and a later "heavy traffic" incident within the
town bypassed
• identifying the roads that have the most impact in causing
congestion
• formulating rules that can predict conditions after a
period of road works or an incident (depending on the
specific road, type of incident etc).
Conclusions
Data Mining is a powerful set of techniques
to help discover hidden knowledge.
It can be supervised or unsupervised.
ARM (Association Rule Mining), CRM (Classification Rule Mining)
and AC (Associative Classification)
are three important classes of technique
used in DM.