Lecture 14
Data mining and knowledge discovery

• Introduction, or what is data mining?
• Data warehouse and query tools
• Decision trees
• Case study: Profiling people with high blood pressure
• Summary

Slides are based on Negnevitsky, Pearson Education, 2005
What is data mining?
• Data is what we collect and store, and knowledge is what helps us to make informed decisions.
• The extraction of knowledge from data is called data mining.
• Data mining can also be defined as the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules.
• The ultimate goal of data mining is to discover knowledge.
Why data mining
• The explosive growth of data: from terabytes to petabytes
– Data collection and data availability
» Automated data collection tools, database systems, the Web, a computerised society
– Major sources of abundant data
» Business: Web, e-commerce, transactions, stocks, …
» Science: remote sensing, bioinformatics, scientific simulation, …
» Society and everyone: news, digital cameras, YouTube
• We are drowning in data, yet starving for knowledge!
Why Not Traditional Data Analysis?
• Tremendous amount of data
– Algorithms must be highly scalable to handle terabytes of data
• High dimensionality of data
– Microarray data may have tens of thousands of dimensions
• High complexity of data
– Data streams and sensor data
– Time-series data, temporal data, sequence data
– Structured data, graphs, social networks and multi-linked data
– Heterogeneous databases and legacy databases
– Spatial, spatiotemporal, multimedia, text and Web data
– Software programs, scientific simulations
• New and sophisticated applications
Knowledge Discovery (KDD) Process
• Data mining is the core of the knowledge discovery process
• [Figure: the KDD process flow – Databases → Data Cleaning / Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge]
KDD Process: Several Key Steps
• Learning the application domain
– relevant prior knowledge and goals of the application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of the effort!)
• Data reduction and transformation
– finding useful features, dimensionality/variable reduction, invariant representation
KDD Process: Several Key Steps (continued)
• Choosing the functions of data mining
– summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: the search for patterns of interest
• Pattern evaluation and knowledge presentation
– visualization, transformation, removing redundant patterns, etc.
• Use of the discovered knowledge
Data Mining: Confluence of Multiple Disciplines
• [Figure: data mining at the centre of its contributing disciplines – database technology, machine learning, pattern recognition, statistics, algorithms, visualization and other disciplines]
Architecture: Typical Data Mining System
• [Figure: layered architecture, top to bottom – Graphical user interface; Pattern evaluation; Data mining engine (both supported by a Knowledge base); Database or data warehouse server (data cleaning, integration and selection); underlying sources: databases, a data warehouse, the World Wide Web and other information repositories]
Data Mining Functionalities (1)
• Frequent patterns, association, correlation vs. causality
– Diaper → Beer [support 0.5%, confidence 75%] (correlation or causality?)
• Classification and prediction
– Construct models (functions) that describe and distinguish classes or concepts for future prediction
» e.g., classify countries based on climate, or classify cars based on gas mileage
– Predict unknown or missing numerical values
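The two numbers attached to the Diaper → Beer rule are its support and confidence. A minimal sketch of how they are computed from transaction counts (the transactions below are invented for illustration):

# A minimal sketch of computing support and confidence for an association rule
# such as Diaper -> Beer. The transactions are invented for illustration.
transactions = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"diaper", "beer"}, transactions))        # 0.4  (2 of 5 transactions)
print(confidence({"diaper"}, {"beer"}, transactions))   # ~0.67 (2 of the 3 diaper transactions)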
Data Mining Functionalities (2)
• Cluster analysis
– The class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
– Maximize intra-class similarity and minimize inter-class similarity
• Outlier analysis
– An outlier is a data object that does not comply with the general behavior of the data
– Noise or exception? Useful in fraud detection and rare-event analysis
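Both ideas can be illustrated in a few lines. A minimal sketch, with invented points, in which scikit-learn's KMeans stands in for cluster analysis and a simple distance rule flags a candidate outlier:

# A minimal sketch of cluster analysis and a simple outlier check.
# The points are invented; KMeans(n_clusters=2) groups them without any class labels.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],      # one group of houses
                   [8.0, 8.2], [8.3, 7.9], [7.8, 8.1],      # a second group
                   [4.5, 20.0]])                            # does not fit either group well

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)

# Outlier analysis: flag points unusually far from their own cluster centre
distances = np.linalg.norm(points - kmeans.cluster_centers_[kmeans.labels_], axis=1)
print(points[distances > distances.mean() + 2 * distances.std()])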
Data Mining Functionalities (3)
• Trend and evolution analysis
– Trend and deviation analysis, e.g., regression analysis
– Sequential pattern mining, e.g., digital camera → large SD memory card
– Periodicity analysis
– Similarity-based analysis
• Other pattern-directed or statistical analyses
Top-10 Most Popular DM Algorithms: 18 Identified Candidates (I)
• Classification
– #1. C4.5: Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
– #2. CART: Breiman, L., Friedman, J., Olshen, R. and Stone, C. Classification and Regression Trees. Wadsworth, 1984.
– #3. k-Nearest Neighbours (kNN): Hastie, T. and Tibshirani, R. 1996. Discriminant Adaptive Nearest Neighbor Classification. TPAMI, 18(6).
– #4. Naive Bayes: Hand, D. J. and Yu, K. 2001. Idiot's Bayes: Not So Stupid After All? Internat. Statist. Rev. 69, 385-398.
(II)
• Statistical Learning
– #5. SVM: Vapnik, V. N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag.
– #6. EM: McLachlan, G. and Peel, D. 2000. Finite Mixture Models. J. Wiley, New York.
• Association Analysis
– #7. Apriori: Agrawal, R. and Srikant, R. Fast Algorithms for Mining Association Rules. In VLDB '94.
– #8. FP-Tree: Han, J., Pei, J. and Yin, Y. 2000. Mining frequent patterns without candidate generation. In SIGMOD '00.
(III)
• Link Mining
– #9. PageRank: Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. In WWW-7, 1998.
– #10. HITS: Kleinberg, J. M. 1998. Authoritative sources in a hyperlinked environment. SODA, 1998.
(IV)
• Clustering
– #11. K-Means: MacQueen, J. B. Some methods for classification and analysis of multivariate observations. In Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, 1967.
– #12. BIRCH: Zhang, T., Ramakrishnan, R. and Livny, M. 1996. BIRCH: an efficient data clustering method for very large databases. In SIGMOD '96.
• Bagging and Boosting
– #13. AdaBoost: Freund, Y. and Schapire, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 1 (Aug. 1997), 119-139.
(V)
• Sequential Patterns
– #14. GSP: Srikant, R. and Agrawal, R. 1996. Mining Sequential Patterns: Generalizations and Performance Improvements. In Proc. 5th International Conference on Extending Database Technology, 1996.
– #15. PrefixSpan: Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U. and Hsu, M.-C. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In ICDE '01.
• Integrated Mining
– #16. CBA: Liu, B., Hsu, W. and Ma, Y. M. Integrating classification and association rule mining. KDD-98.
(VI)
• Rough Sets
– #17. Finding reduct: Pawlak, Z. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Norwell, MA, 1992.
• Graph Mining
– #18. gSpan: Yan, X. and Han, J. 2002. gSpan: Graph-Based Substructure Pattern Mining. In ICDM '02.
Top-10 Algorithms Finally Selected at ICDM ’06
• #1: C4.5 (61 votes)
• #2: K-Means (60 votes)
• #3: SVM (58 votes)
• #4: Apriori (52 votes)
• #5: EM (48 votes)
• #6: PageRank (46 votes)
• #7: AdaBoost (45 votes)
• #7: kNN (45 votes)
• #7: Naive Bayes (45 votes)
• #10: CART (34 votes)
Conferences and Journals on Data Mining
• KDD conferences
– ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
– SIAM Data Mining Conf. (SDM)
– (IEEE) Int. Conf. on Data Mining (ICDM)
– Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD)
– Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
• Other related conferences
– ACM SIGMOD
– VLDB
– (IEEE) ICDE
– WWW, SIGIR
– ICML, CVPR, NIPS
• Journals
– Data Mining and Knowledge Discovery (DAMI or DMKD)
– IEEE Trans. on Knowledge and Data Eng. (TKDE)
– KDD Explorations
– ACM Trans. on KDD
Data warehouse
• Modern organisations must respond quickly to any change in the market. This requires rapid access to current data, normally stored in operational databases.
• However, an organisation must also determine which trends are relevant. This task is accomplished with access to historical data that are stored in large databases called data warehouses.
• The main characteristic of a data warehouse is its capacity. A data warehouse is really big – it includes millions, even billions, of data records.
• The data stored in a data warehouse is
– time dependent – linked together by the times of recording – and
– integrated – all relevant information from the operational databases is combined and structured in the warehouse.
Query tools
• A data warehouse is designed to support decision-making in the organisation. The information needed can be obtained with query tools.
• Query tools are assumption-based – a user must ask the right questions.
How is data mining applied in practice?
• Many companies use data mining today, but refuse to talk about it.
• In direct marketing, data mining is used for targeting people who are most likely to buy certain products and services.
• In trend analysis, it is used to determine trends in the marketplace, for example, to model the stock market.
• In fraud detection, data mining is used to identify insurance claims, cellular phone calls and credit card purchases that are most likely to be fraudulent.
• Motivation: finding latent relationships in data
– What products are often purchased together? Beer and diapers?!
– What are the subsequent purchases after buying a PC?
– What kinds of DNA are sensitive to this new drug?
– Can we automatically classify web documents?
• Applications
– Market basket data analysis (shelf-space planning, increasing sales, promotions)
– Cross-marketing
– Catalog design
– Sales campaign analysis
– Web log (click stream) analysis
– DNA sequence analysis
Data mining tools
• Data mining is based on intelligent technologies already discussed in this book. It often applies such tools as neural networks and neuro-fuzzy systems.
• However, the most popular tool used for data mining is the decision tree.
Decision trees
• A decision tree can be defined as a map of the reasoning process. It describes a data set by a tree-like structure.
• Decision trees are particularly good at solving classification problems.
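To make this concrete, here is a minimal sketch of fitting a decision tree classifier with scikit-learn. The tiny data set, feature names and thresholds are invented for illustration and are not the lecture's data:

# A minimal sketch of training a decision tree classifier with scikit-learn.
# The data (household income, homeownership -> responded) is invented.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [household_income, owns_home (1 = yes, 0 = no)]
X = [[15_000, 0], [22_000, 0], [48_000, 1], [61_000, 1], [30_000, 0], [75_000, 1]]
y = ["no", "no", "yes", "yes", "no", "yes"]          # did the household respond?

tree = DecisionTreeClassifier(criterion="gini", max_depth=2)
tree.fit(X, y)

# The fitted tree is a readable map of the reasoning process
print(export_text(tree, feature_names=["income", "owns_home"]))
print(tree.predict([[55_000, 1]]))                   # -> ['yes']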
ID3
• An example training set for ID3: each record is described by three attributes (height, hair colour, eye colour) and labelled with class w or e.
– (tall, blond, blue) → w
– (short, silver, blue) → w
– (short, black, blue) → w
– (tall, blond, brown) → w
– (tall, silver, blue) → w
– (short, blond, blue) → w
– (short, black, brown) → e
– (tall, silver, black) → e
– (short, black, brown) → e
– (tall, black, brown) → e
– (tall, black, black) → e
– (short, blond, black) → e
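ID3 chooses the attribute to split on by information gain. A minimal sketch of that calculation on the training set above (the entropy and gain code is a standard illustration, not code from the slides):

# A minimal sketch of ID3's attribute selection on the training set above:
# compute the entropy of the class labels and the information gain of each attribute.
from collections import Counter
from math import log2

data = [
    ("tall", "blond", "blue", "w"), ("short", "silver", "blue", "w"),
    ("short", "black", "blue", "w"), ("tall", "blond", "brown", "w"),
    ("tall", "silver", "blue", "w"), ("short", "blond", "blue", "w"),
    ("short", "black", "brown", "e"), ("tall", "silver", "black", "e"),
    ("short", "black", "brown", "e"), ("tall", "black", "brown", "e"),
    ("tall", "black", "black", "e"), ("short", "blond", "black", "e"),
]
attributes = ["height", "hair", "eyes"]

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, attr_index):
    base = entropy([r[-1] for r in rows])
    # group rows by the attribute value and subtract the weighted entropy of the groups
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

for i, name in enumerate(attributes):
    print(f"gain({name}) = {information_gain(data, i):.3f}")
# ID3 splits first on the attribute with the largest gain (here, eye colour).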
• A decision tree consists of nodes, branches and leaves.
• The top node is called the root node. The tree always starts from the root node and grows down by splitting the data at each level into new nodes. The root node contains the entire data set (all data records), and child nodes hold respective subsets of that set.
• All nodes are connected by branches.
• Nodes at the end of branches are called terminal nodes, or leaves.
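A minimal sketch of this node–branch–leaf structure as a data type (the field names are illustrative only; the "Homeownership" split anticipates the example tree shown later):

# A minimal sketch of the node/branch/leaf structure described above.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TreeNode:
    records: list                                   # the subset of data records held by this node
    split_predictor: Optional[str] = None           # predictor used to split this node (None for a leaf)
    branches: dict = field(default_factory=dict)    # predictor value -> child TreeNode
    label: Optional[str] = None                     # class label, set only for terminal nodes (leaves)

    def is_leaf(self) -> bool:
        return not self.branches

# The root node holds the entire data set; child nodes hold successive subsets.
root = TreeNode(records=["record 1", "record 2", "record 3"])
root.split_predictor = "Homeownership"
root.branches = {
    "yes": TreeNode(records=["record 1"], label="responded"),
    "no":  TreeNode(records=["record 2", "record 3"], label="not responded"),
}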
How does a decision tree select splits?
• A split in a decision tree corresponds to the predictor with the maximum separating power. The best split does the best job in creating nodes where a single class dominates.
• One of the best-known methods of calculating the predictor's power to separate data is based on the Gini coefficient of inequality.
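A minimal sketch of scoring candidate splits by separating power. Note that it uses the Gini impurity measure from CART, a close relative of the Lorenz-curve Gini coefficient described on the following slides; the toy records and predictor names are invented:

# A minimal sketch of choosing a split by class purity, using the Gini impurity (as in CART).
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions: 0 means a single class dominates completely."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def split_score(rows, predictor):
    """Weighted impurity of the child nodes produced by splitting on `predictor`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[predictor], []).append(row["class"])
    n = len(rows)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())

# Toy records: two candidate predictors and a class label
rows = [
    {"homeowner": "yes", "income": "low",  "class": "not responded"},
    {"homeowner": "yes", "income": "high", "class": "responded"},
    {"homeowner": "no",  "income": "low",  "class": "not responded"},
    {"homeowner": "no",  "income": "high", "class": "responded"},
    {"homeowner": "yes", "income": "low",  "class": "not responded"},
    {"homeowner": "no",  "income": "high", "class": "responded"},
]

# The predictor with the lowest weighted child impurity has the most separating power
best = min(["homeowner", "income"], key=lambda p: split_score(rows, p))
print(best)   # -> income (it separates the two classes perfectly here)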
Major Issues in Data Mining (1)
• Mining methodology
– Mining different kinds of knowledge from diverse data types, e.g., biological, stream and Web data
– Performance: efficiency, effectiveness and scalability
– Pattern evaluation: the interestingness problem
– Incorporation of background knowledge
– Handling noise and incomplete data
– Parallel, distributed and incremental mining methods
– Integration of the discovered knowledge with existing knowledge: knowledge fusion
(2)
• User interaction
– Data mining query languages and ad hoc mining
– Expression and visualization of data mining results
– Interactive mining of knowledge at multiple levels of abstraction
• Applications and social impacts
– Domain-specific data mining and invisible data mining
– Protection of data security, integrity and privacy
Summary (1)
• Data mining: discovering interesting patterns from large amounts of data
• A natural evolution of database technology, in great demand, with wide applications
• A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation and knowledge presentation
(2)
• Mining can be performed in a variety of information repositories
• Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
• Data mining systems and architectures
• Major issues in data mining
Thank you
An example of a decision tree

Household – responded: 112, not responded: 888, Total: 1000 – split by Homeownership
  Yes: responded: 9, not responded: 334, Total: 343
  No:  responded: 103, not responded: 554, Total: 657 – split by Household Income
    ≤ $20,700: responded: 14, not responded: 158, Total: 172
    ≥ $20,701: responded: 89, not responded: 396, Total: 485 – split by Savings Account
      Yes: responded: 86, not responded: 188, Total: 274
      No:  responded: 3, not responded: 208, Total: 211
The Gini coefficient
• The Gini coefficient is a measure of how well the predictor separates the classes contained in the parent node.
• Gini, an Italian economist, introduced it as a rough measure of the amount of inequality in the income distribution in a country.
Computation of the Gini coefficient
• [Figure: a Lorenz curve – cumulative % of income (vertical axis) plotted against cumulative % of population (horizontal axis), together with the diagonal line of perfect equality]
• The Gini coefficient is calculated as the area between the curve and the diagonal divided by the area below the diagonal. For a perfectly equal wealth distribution, the Gini coefficient is equal to zero.
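A minimal numerical sketch of that definition (the income figures are invented): build the Lorenz curve from the sorted incomes and take the area between it and the diagonal, relative to the area below the diagonal.

# A minimal sketch of computing the Gini coefficient from a list of incomes,
# following the area-based definition above. The income figures are invented.
def gini_coefficient(incomes):
    x = sorted(incomes)
    n = len(x)
    total = sum(x)
    # Lorenz curve: cumulative share of income at each cumulative share of population
    lorenz = [0.0]
    running = 0.0
    for value in x:
        running += value
        lorenz.append(running / total)
    # area under the Lorenz curve by the trapezoid rule (population step is 1/n)
    area_under_lorenz = sum((lorenz[i] + lorenz[i + 1]) / 2 for i in range(n)) / n
    area_under_diagonal = 0.5
    # area between the diagonal and the curve, divided by the area below the diagonal
    return (area_under_diagonal - area_under_lorenz) / area_under_diagonal

print(gini_coefficient([20_000] * 5))                                          # perfect equality -> 0.0
print(round(gini_coefficient([10_000, 20_000, 30_000, 60_000, 180_000]), 3))   # unequal incomes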
Selecting an optimal decision tree:
(a) Splits selected by Gini

Class A: 100, Class B: 50, Total: 150 – split by Predictor 1
  yes: Class A: 63, Class B: 38, Total: 101 – split by Predictor 2
    yes: Class A: 4, Class B: 37, Total: 41 – split by Predictor 3
      yes: Class A: 0, Class B: 36, Total: 36
      no:  Class A: 4, Class B: 1, Total: 5
    no:  Class A: 59, Class B: 1, Total: 60
  no:  Class A: 37, Class B: 12, Total: 49 – split by Predictor 4
    yes: Class A: 25, Class B: 4, Total: 29 – split by Predictor 5
      yes: Class A: 2, Class B: 1, Total: 3
      no:  Class A: 23, Class B: 3, Total: 26
    no:  Class A: 12, Class B: 8, Total: 20 – split by Predictor 6
      yes: Class A: 1, Class B: 8, Total: 9
      no:  Class A: 11, Class B: 0, Total: 11
Selecting an optimal decision tree:
(b) Splits selected by guesswork

Class A: 100, Class B: 50, Total: 150 – split by Predictor 5
  yes: Class A: 19, Class B: 14, Total: 33 – split by Predictor 2
    yes: Class A: 17, Class B: 6, Total: 23
    no:  Class A: 2, Class B: 8, Total: 10
  no:  Class A: 81, Class B: 36, Total: 117 – split by Predictor 3
    yes: Class A: 46, Class B: 21, Total: 67 – split by Predictor 1
      yes: Class A: 37, Class B: 14, Total: 51 – split by Predictor 4
        yes: Class A: 8, Class B: 9, Total: 17
        no:  Class A: 29, Class B: 5, Total: 34
      no:  Class A: 9, Class B: 7, Total: 16
    no:  Class A: 35, Class B: 15, Total: 50 – split by Predictor 6
      yes: Class A: 23, Class B: 9, Total: 32
      no:  Class A: 12, Class B: 6, Total: 18
Gain chart of Class A
• [Figure: gain chart – % of Class A captured (vertical axis) versus % of population (horizontal axis), with one curve for the Gini splits and one for manual split selection]
Can we extract rules from a decision tree?
• The path from the root node to a bottom leaf reveals a decision rule.
• For example, the rule associated with the right bottom leaf of the tree built with Gini splits can be represented as follows:

if   (Predictor 1 = no)
and  (Predictor 4 = no)
and  (Predictor 6 = no)
then class = Class A
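A minimal sketch of extracting one such rule per root-to-leaf path (my own illustration: the tree below is hand-coded to mirror the reconstructed Gini-split figure, with each leaf labelled by its majority class):

# A minimal sketch of turning root-to-leaf paths into if-then rules.
# Internal nodes are (predictor, yes_subtree, no_subtree); leaves are class labels.
tree = ("Predictor 1",
        ("Predictor 2", ("Predictor 3", "Class B", "Class A"), "Class A"),
        ("Predictor 4", ("Predictor 5", "Class A", "Class A"),
                        ("Predictor 6", "Class B", "Class A")))

def extract_rules(node, conditions=()):
    if isinstance(node, str):                      # a leaf: emit one rule
        clauses = " and ".join(f"({p} = {v})" for p, v in conditions)
        print(f"if {clauses} then class = {node}")
        return
    predictor, yes_branch, no_branch = node
    extract_rules(yes_branch, conditions + ((predictor, "yes"),))
    extract_rules(no_branch, conditions + ((predictor, "no"),))

extract_rules(tree)
# The last rule printed is the one on the slide:
# if (Predictor 1 = no) and (Predictor 4 = no) and (Predictor 6 = no) then class = Class A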
Case study:
Profiling people with high blood pressure
• A typical task for decision trees is to determine conditions that may lead to certain outcomes.
• Blood pressure can be categorised as optimal, normal or high. Optimal pressure is below 120/80, normal is between 120/80 and 130/85, and hypertension is diagnosed when blood pressure is over 140/90.
A data set for a hypertension study
Community Health Survey: Hypertension Study (California, U.S.A.) – one completed questionnaire, with the respondent's answers ticked:

Gender:            [x] Male   [ ] Female
Age:               [ ] 18–34 years   [ ] 35–50 years   [x] 51–64 years   [ ] 65 or more years
Race:              [x] Caucasian   [ ] African American   [ ] Hispanic   [ ] Asian or Pacific Islander
Marital Status:    [ ] Married   [ ] Separated   [x] Divorced   [ ] Widowed   [ ] Never Married
Household Income:  [ ] Less than $20,700   [ ] $20,701–$45,000   [x] $45,001–$75,000   [ ] $75,001 and over
A data set for a hypertension study (continued)
Community Health Survey: Hypertension Study (California, U.S.A.)

Alcohol Consumption:  [ ] Abstain from alcohol   [ ] Occasional (a few drinks per month)   [x] Regular (one or two drinks per day)   [ ] Heavy (three or more drinks per day)
Smoking:              [ ] Nonsmoker   [ ] 1–10 cigarettes per day   [x] 11–20 cigarettes per day   [ ] More than one pack per day
Caffeine Intake:      [ ] Abstain from coffee   [x] One or two cups per day   [ ] Three or more cups per day
Salt Intake:          [ ] Low-salt diet   [x] Moderate-salt diet   [ ] High-salt diet
Physical Activities:  [ ] None   [x] One or two times per week   [ ] Three or more times per week
Height: 170 cm        Weight: 93 kg
Blood Pressure:       [ ] Optimal   [ ] Normal   [x] High
Data cleaning
• Decision trees are only as good as the data they represent. Unlike neural networks and fuzzy systems, decision trees do not tolerate noisy and polluted data. Therefore, the data must be cleaned before we can start data mining.
• We might find that such fields as Alcohol Consumption or Smoking have been left blank or contain incorrect information.
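A minimal sketch of what such a cleaning step could look like with pandas (the file name, category spellings and thresholds are illustrative assumptions, not part of the case study):

# A minimal sketch of basic data cleaning with pandas. File name, categories and
# cleaning policies are illustrative assumptions only.
import pandas as pd

df = pd.read_csv("hypertension_survey.csv")

# Standardise obviously inconsistent category spellings
df["Smoking"] = df["Smoking"].replace({"non-smoker": "Nonsmoker", "NONSMOKER": "Nonsmoker"})

# Treat blank strings as missing values
df = df.replace({"": pd.NA})

# Drop records whose key fields were left blank, since decision trees
# do not tolerate noisy or incomplete data well
df = df.dropna(subset=["Alcohol Consumption", "Smoking", "Blood Pressure"])

# Remove physically impossible measurements (illustrative thresholds)
df = df[df["Height"].between(100, 230) & df["Weight"].between(30, 250)]

print(len(df), "clean records remain")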
Data enriching
• From such variables as weight and height we can easily derive a new variable, obesity. This variable is calculated with the body-mass index (BMI), that is, the weight in kilograms divided by the square of the height in metres. Men with BMIs of 27.8 or higher and women with BMIs of 27.3 or higher are classified as obese.
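A minimal sketch of deriving the new variable with the BMI rule stated above (the example record reuses the 170 cm / 93 kg male respondent from the survey form):

# A minimal sketch of enriching the data with an "obesity" variable,
# using the BMI rule stated on the slide.
def bmi(weight_kg: float, height_m: float) -> float:
    """Body-mass index: weight in kilograms divided by the square of the height in metres."""
    return weight_kg / height_m ** 2

def is_obese(weight_kg: float, height_m: float, gender: str) -> bool:
    threshold = 27.8 if gender == "Male" else 27.3
    return bmi(weight_kg, height_m) >= threshold

# The male respondent from the survey form: 170 cm and 93 kg -> BMI ~32.2, classified as obese
print(round(bmi(93, 1.70), 1), is_obese(93, 1.70, "Male"))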
A data set for a hypertension study (continued)
Community Health Survey: Hypertension Study (California, U.S.A.)

Obesity:  [x] Obese   [ ] Not Obese
Growing a decision tree

Blood Pressure (root): optimal 319 (32%), normal 528 (53%), high 153 (15%), Total 1000

Split by Age:
  18–34 years:       optimal 88 (56%),  normal 64 (41%),  high 5 (3%),   Total 157
  35–50 years:       optimal 208 (35%), normal 340 (57%), high 48 (8%),  Total 596
  51–64 years:       optimal 21 (12%),  normal 90 (52%),  high 62 (36%), Total 173
  65 or more years:  optimal 2 (3%),    normal 34 (46%),  high 38 (51%), Total 74
Growing a decision tree (continued)

51–64 years: optimal 21 (12%), normal 90 (52%), high 62 (36%), Total 173

Split by Obesity:
  Obese:      optimal 3 (3%),   normal 53 (49%), high 51 (48%), Total 107
  Not Obese:  optimal 18 (27%), normal 37 (56%), high 11 (17%), Total 66
Growing a decision tree (continued)

Obese: optimal 3 (3%), normal 53 (49%), high 51 (48%), Total 107

Split by Race:
  Caucasian:         optimal 2 (5%),  normal 24 (55%), high 17 (40%), Total 43
  African American:  optimal 0 (0%),  normal 13 (35%), high 24 (65%), Total 37
  Hispanic:          optimal 0 (0%),  normal 11 (58%), high 8 (42%),  Total 19
  Asian:             optimal 1 (12%), normal 5 (63%),  high 2 (25%),  Total 8
Solution space of the hypertension study
• The solution space is first divided into four rectangles by age; then age group 51–64 is further divided into those who are obese and those who are not; and finally, the group of obese people is divided by race.
Solution space of the hypertension study
• [Figure: the solution space partitioned into rectangles labelled with the leaf totals – 157, 596, 74 and 66, with the obese 51–64 group further divided by race into 43, 37, 19 and 8]
Hypertension study: forcing a split

Blood Pressure (root): optimal 319 (32%), normal 528 (53%), high 153 (15%), Total 1000

Split by Age:
  18–34 years:       optimal 88 (56%),  normal 64 (41%),  high 5 (3%),   Total 157
  35–50 years:       optimal 208 (35%), normal 340 (57%), high 48 (8%),  Total 596
    forced split by Gender:
      Male:    optimal 111 (36%), normal 168 (55%), high 28 (9%), Total 307
      Female:  optimal 97 (34%),  normal 172 (59%), high 20 (7%), Total 289
  51–64 years:       optimal 21 (12%),  normal 90 (52%),  high 62 (36%), Total 173
    forced split by Gender:
      Male:    optimal 11 (13%), normal 48 (56%), high 27 (31%), Total 86
      Female:  optimal 10 (12%), normal 42 (48%), high 35 (40%), Total 87
  65 or more years:  optimal 2 (3%),    normal 34 (46%),  high 38 (51%), Total 74
Advantages of decision trees
• The main advantage of the decision-tree approach to data mining is that it visualises the solution; it is easy to follow any path through the tree.
• Relationships discovered by a decision tree can be expressed as a set of rules, which can then be used in developing an expert system.
Drawbacks of decision trees
• Continuous data, such as age or income, have to be grouped into ranges, which can unwittingly hide important patterns.
• Handling of missing and inconsistent data – decision trees can produce reliable outcomes only when they deal with “clean” data.
• Inability to examine more than one variable at a time. This confines trees to only the problems that can be solved by dividing the solution space into several successive rectangles.
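A minimal sketch of the first drawback – grouping a continuous variable into ranges with pandas. The bin edges follow the age groups used in the case study; the data values and column name are invented:

# A minimal sketch of grouping a continuous variable into ranges with pandas.
import pandas as pd

ages = pd.Series([23, 31, 45, 52, 58, 67, 72], name="Age")

# Group ages into the ranges used in the hypertension study
age_groups = pd.cut(ages,
                    bins=[17, 34, 50, 64, 120],
                    labels=["18-34", "35-50", "51-64", "65 or more"])
print(age_groups.value_counts().sort_index())
# A different choice of bin edges could merge or split these groups and hide a pattern.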
• In spite of all these limitations, decision trees have become the most successful technology used for data mining.
• The ability to produce clear sets of rules makes decision trees particularly attractive to business professionals.