Download AssociationRule

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Association Rule
Dr. Jieh-Shan George YEH
[email protected]
http://cran.r-project.org/web/packages/arules/index.html
ARULES: MINING ASSOCIATION
RULES AND FREQUENT ITEMSETS
Association Rule
• Association rules are rules presenting
association or correlation between itemsets.
• An association rule is in the form of A => B,
where A and B are two disjoint itemsets,
referred to respectively as the lhs (left-hand
side) and rhs (right-hand side) of the rule.
• The three most widely-used measures for
selecting interesting rules are support,
confidence and lift.
Association Rule
• support is the percentage of cases in the data
that contains both A and B
• confidence is the percentage of cases
containing A that also contain B
• lift is the ratio of confidence to the percentage
of cases containing B.
Example: The Titanic Dataset
• The Titanic dataset in the datasets package is
a 4-dimensional table with summarized
information on the fate of passengers on the
Titanic according to social class, sex, age and
survival.
• titanic.raw.rdata
– http://www.rdatamining.com/data/titanic.raw.rda
ta?attredirects=0&d=1
load("titanic.raw.rdata")
str(Titanic)
table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
- attr(*, "dimnames")=List of 4
..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
..$ Sex : chr [1:2] "Male" "Female"
..$ Age : chr [1:2] "Child" "Adult"
..$ Survived: chr [1:2] "No" "Yes"
df <- as.data.frame(Titanic)
head(df)
Mining Associations with Apriori
• Description
• Mine frequent itemsets, association rules or association
hyperedges using the Apriori algorithm. The Apriori algorithm
employs level-wise search for frequent itemsets. The
implementation of Apriori used includes some improvements
(e.g., a prefix tree and item sorting).
• Usage
apriori(data, parameter = NULL, appearance = NULL,
control = NULL)
Mining Associations with Apriori
• Arguments
• data
– object of class transactions or any data structure which can be coerced into
transactions (e.g., a binary matrix or data.frame).
• parameter
– object of class APparameter or named list. The default behavior is to mine
rules with support 0.1, confidence 0.8, and maxlen 10.
• appearance
– object of class APappearance or named list. With this argument item
appearance can be restricted. By default all items can appear unrestricted.
• control
– object of class APcontrol or named list. Controls the performance of the
mining algorithm (item sorting, etc.)
library(arules)
# find association rules with default settings
rules.all <- apriori(titanic.raw)
rules.all
inspect(rules.all)
rhs containing "Survived" only
# rules with rhs containing "Survived" only
rules <- apriori(titanic.raw, control = list(verbose=F),
parameter = list(minlen=2, supp=0.005,
conf=0.8),
appearance = list(rhs=c("Survived=No",
"Survived=Yes"),
default="lhs"))
quality(rules) <- round(quality(rules), digits=3)
rules.sorted <- sort(rules, by="lift")
inspect(rules.sorted)
find redundant rules
# find redundant rules
subset.matrix <- is.subset(rules.sorted, rules.sorted)
subset.matrix[lower.tri(subset.matrix, diag=T)] <NA
redundant <- colSums(subset.matrix, na.rm=T) >= 1
which(redundant)
# remove redundant rules
rules.pruned <- rules.sorted[!redundant]
inspect(rules.pruned)
######
rules <- apriori(titanic.raw,
parameter = list(minlen=3, supp=0.002, conf=0.2),
appearance = list(rhs=c("Survived=Yes"),
lhs=c("Class=1st", "Class=2nd", "Class=3rd",
"Age=Child", "Age=Adult"),
default="none"),
control = list(verbose=F))
rules.sorted <- sort(rules, by="confidence")
inspect(rules.sorted)
http://cran.r-project.org/web/packages/arulesViz/index.html
ARULESVIZ: VISUALIZING ASSOCIATION
RULES AND FREQUENT ITEMSETS
###Visualizing Association Rules
library(arulesViz)
plot(rules.all)
plot(rules.all, method="grouped")
plot(rules.all, method="graph")
plot(rules.all, method="graph", control=list(type="items"))
plot(rules.all, method="paracoord",
control=list(reorder=TRUE))
plot(rules.all)
plot(rules.all, method="grouped")
plot(rules.all, method="graph")
plot(rules.all, method="graph",
control=list(type="items"))
plot(rules.all, method="paracoord",
control=list(reorder=TRUE))
Other Packages
• arulesSequences: Mining frequent sequences
– http://cran.rproject.org/web/packages/arulesSequences/ind
ex.html
• arulesNBMiner: Mining NB-Frequent
Itemsets and NB-Precise Rules
– http://cran.rproject.org/web/packages/arulesNBMiner/index.
html
Related documents