Download AssociationRule

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Association Rule
Dr. Jieh-Shan George YEH
[email protected]
http://cran.r-project.org/web/packages/arules/index.html
ARULES: MINING ASSOCIATION
RULES AND FREQUENT ITEMSETS
Association Rule
• Association rules are rules presenting
association or correlation between itemsets.
• An association rule is in the form of A => B,
where A and B are two disjoint itemsets,
referred to respectively as the lhs (left-hand
side) and rhs (right-hand side) of the rule.
• The three most widely-used measures for
selecting interesting rules are support,
confidence and lift.
Association Rule
• support is the percentage of cases in the data
that contains both A and B
• confidence is the percentage of cases
containing A that also contain B
• lift is the ratio of confidence to the percentage
of cases containing B.
Example: The Titanic Dataset
• The Titanic dataset in the datasets package is
a 4-dimensional table with summarized
information on the fate of passengers on the
Titanic according to social class, sex, age and
survival.
• titanic.raw.rdata
– http://www.rdatamining.com/data/titanic.raw.rda
ta?attredirects=0&d=1
load("titanic.raw.rdata")
str(Titanic)
table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
- attr(*, "dimnames")=List of 4
..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
..$ Sex : chr [1:2] "Male" "Female"
..$ Age : chr [1:2] "Child" "Adult"
..$ Survived: chr [1:2] "No" "Yes"
df <- as.data.frame(Titanic)
head(df)
Mining Associations with Apriori
• Description
• Mine frequent itemsets, association rules or association
hyperedges using the Apriori algorithm. The Apriori algorithm
employs level-wise search for frequent itemsets. The
implementation of Apriori used includes some improvements
(e.g., a prefix tree and item sorting).
• Usage
apriori(data, parameter = NULL, appearance = NULL,
control = NULL)
Mining Associations with Apriori
• Arguments
• data
– object of class transactions or any data structure which can be coerced into
transactions (e.g., a binary matrix or data.frame).
• parameter
– object of class APparameter or named list. The default behavior is to mine
rules with support 0.1, confidence 0.8, and maxlen 10.
• appearance
– object of class APappearance or named list. With this argument item
appearance can be restricted. By default all items can appear unrestricted.
• control
– object of class APcontrol or named list. Controls the performance of the
mining algorithm (item sorting, etc.)
library(arules)
# find association rules with default settings
rules.all <- apriori(titanic.raw)
rules.all
inspect(rules.all)
rhs containing "Survived" only
# rules with rhs containing "Survived" only
rules <- apriori(titanic.raw, control = list(verbose=F),
parameter = list(minlen=2, supp=0.005,
conf=0.8),
appearance = list(rhs=c("Survived=No",
"Survived=Yes"),
default="lhs"))
quality(rules) <- round(quality(rules), digits=3)
rules.sorted <- sort(rules, by="lift")
inspect(rules.sorted)
find redundant rules
# find redundant rules
subset.matrix <- is.subset(rules.sorted, rules.sorted)
subset.matrix[lower.tri(subset.matrix, diag=T)] <NA
redundant <- colSums(subset.matrix, na.rm=T) >= 1
which(redundant)
# remove redundant rules
rules.pruned <- rules.sorted[!redundant]
inspect(rules.pruned)
######
rules <- apriori(titanic.raw,
parameter = list(minlen=3, supp=0.002, conf=0.2),
appearance = list(rhs=c("Survived=Yes"),
lhs=c("Class=1st", "Class=2nd", "Class=3rd",
"Age=Child", "Age=Adult"),
default="none"),
control = list(verbose=F))
rules.sorted <- sort(rules, by="confidence")
inspect(rules.sorted)
http://cran.r-project.org/web/packages/arulesViz/index.html
ARULESVIZ: VISUALIZING ASSOCIATION
RULES AND FREQUENT ITEMSETS
###Visualizing Association Rules
library(arulesViz)
plot(rules.all)
plot(rules.all, method="grouped")
plot(rules.all, method="graph")
plot(rules.all, method="graph", control=list(type="items"))
plot(rules.all, method="paracoord",
control=list(reorder=TRUE))
plot(rules.all)
plot(rules.all, method="grouped")
plot(rules.all, method="graph")
plot(rules.all, method="graph",
control=list(type="items"))
plot(rules.all, method="paracoord",
control=list(reorder=TRUE))
Other Packages
• arulesSequences: Mining frequent sequences
– http://cran.rproject.org/web/packages/arulesSequences/ind
ex.html
• arulesNBMiner: Mining NB-Frequent
Itemsets and NB-Precise Rules
– http://cran.rproject.org/web/packages/arulesNBMiner/index.
html
Related documents