Predictive Clustering of Australian Vegetation Data
Saso Dzeroski, Valentin Gjorgjioski, Matt White
(in alphabetical order)
The Relational Database
• A relational database was constructed from the data provided by Matt White
• Three tables:
– sites: the sites and their attributes
– species: the species and their attributes
– sitemeasurements: links sites and species, giving the cover of a given species at a given site
Attributes of sites
• dem
• twi
• solar
• thinvk
• thk
• eff_rain
• mint_jul
• max_feb
• grnd_dpth
• salinity
Attributes/properties of species
• Lifelook
• Sprflow
• Sumflow
• Autflow
• Winflow
• Hitecat
• Aquatic
• Parasite
• Fleshyf
• Fleshyl
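A minimal sketch of how the three tables could be declared, using Python's built-in sqlite3 module. The table names and attribute names come from the slides; the key columns (site_id, species_id), all column types, and storing the raw cover class as text are assumptions for illustration, not the actual database design.

```python
import sqlite3

# Attribute names as listed on the slides.
SITE_ATTRS = ["dem", "twi", "solar", "thinvk", "thk", "eff_rain",
              "mint_jul", "max_feb", "grnd_dpth", "salinity"]
SPECIES_ATTRS = ["lifelook", "sprflow", "sumflow", "autflow", "winflow",
                 "hitecat", "aquatic", "parasite", "fleshyf", "fleshyl"]


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the sites, species and sitemeasurements tables.

    Hypothetical layout: the site_id/species_id keys and every column type
    are assumptions; only the table and attribute names come from the slides.
    """
    site_cols = ", ".join(f"{a} REAL" for a in SITE_ATTRS)        # site descriptors
    species_cols = ", ".join(f"{a} TEXT" for a in SPECIES_ATTRS)  # species traits
    conn.executescript(f"""
        CREATE TABLE sites (site_id INTEGER PRIMARY KEY, {site_cols});
        CREATE TABLE species (species_id INTEGER PRIMARY KEY, {species_cols});
        -- one row per (site, species) pair; cover kept as the recorded class
        CREATE TABLE sitemeasurements (
            site_id INTEGER REFERENCES sites(site_id),
            species_id INTEGER REFERENCES species(species_id),
            cover TEXT,
            PRIMARY KEY (site_id, species_id)
        );
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(names)  # ['sitemeasurements', 'sites', 'species']
```

The composite primary key on sitemeasurements encodes the idea that each species has at most one cover record per site.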
Data Preprocessing
• We removed all measurements with very low cover (<0.5/+) and a few exotic species, as suggested by Matt White (*)
• We converted the measured cover values from nominal to numeric using the following mapping:
• 1 -> 2.5
• 2 -> 15
• 3 -> 37.5
• 4 -> 62.5
• 5 -> 87.5
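The filtering and recoding steps can be sketched in Python as follows. The mapped values appear to be the midpoints of the percentage-cover ranges 0-5, 5-25, 25-50, 50-75 and 75-100 (our reading; the slides give only the mapping). The record layout (site, species, cover_class) and treating every code other than 1 to 5 as very low cover are assumptions.

```python
# Cover classes 1-5 mapped to numeric cover values, as on the slide.
COVER_VALUE = {"1": 2.5, "2": 15.0, "3": 37.5, "4": 62.5, "5": 87.5}


def preprocess(measurements, exotic_species=frozenset()):
    """Filter and recode (site, species, cover_class) records.

    Assumption: any cover code outside classes 1-5 (e.g. '+' or '<0.5')
    denotes very low cover and is dropped, as are the given exotic species.
    """
    cleaned = []
    for site, species, cover_class in measurements:
        if species in exotic_species:
            continue  # excluded exotic species
        if cover_class not in COVER_VALUE:
            continue  # very low cover (or an unrecognised code)
        cleaned.append((site, species, COVER_VALUE[cover_class]))
    return cleaned


rows = [("S1", "sp_a", "3"), ("S1", "sp_b", "+"), ("S2", "sp_x", "1")]
print(preprocess(rows, exotic_species={"sp_x"}))  # [('S1', 'sp_a', 37.5)]
```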
Experimental setup
• The data mining task
– Descriptive attributes describe the sites
– Target attributes are aggregations of species characteristics
• The aggregation of species characteristics is described below
Aggregation of species characteristics
• We aggregate the characteristics of species using the following algorithm
• For each characteristic, we create one target attribute for each value of that characteristic
• e.g., autflow has two values (0 and 1), so we create two target attributes:
– autflow=0
– autflow=1
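• We calculate the value of each new attribute for a given site $S_i$ as the share of the site's total cover contributed by species having that value; for the target autflow=1:

$$
\mathrm{autflow}_1(S_i) = \frac{\sum_{S_p \in S_i,\ \mathrm{autflow}(S_p) = 1} \mathrm{cover}(S_i, S_p)}{\mathrm{cover}(S_i)},
\qquad
\mathrm{cover}(S_i) = \sum_{S_p \in S_i} \mathrm{cover}(S_i, S_p)
$$

where $S_p \in S_i$ ranges over the species recorded at site $S_i$.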
Experimental setup
• Data mining approach
– Predictive clustering trees (PCTs) are used for multi-target prediction
– The descriptive and target attributes are as described above
– The software package CLUS is used
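PCTs for numeric targets choose each split so as to reduce the variance of the target vector, summed over all targets. The function below is a simplified, hypothetical illustration of that heuristic for one numeric descriptive attribute; CLUS's own implementation (for example target normalisation and stopping criteria) may differ and is not reproduced here.

```python
from statistics import pvariance


def summed_variance(rows):
    """Sum over target columns of the population variance of each column."""
    if not rows:
        return 0.0
    return sum(pvariance(col) for col in zip(*rows))


def best_split(x, y):
    """Find the threshold t on x minimising the size-weighted summed variance.

    x: values of one numeric descriptive attribute (e.g. twi), one per site
    y: target vectors (e.g. aggregated species characteristics), one per site
    Returns (threshold, score); threshold is None if no split helps.
    """
    n = len(x)
    best = (None, summed_variance(y))
    for t in sorted(set(x))[:-1]:  # the largest value would leave x > t empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * summed_variance(left)
                 + len(right) * summed_variance(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


x = [1, 2, 3, 4]
y = [(0.0, 1.0), (0.0, 1.0), (1.0, 0.0), (1.0, 0.0)]
print(best_split(x, y))  # (2, 0.0)
```

A full PCT learner would apply this search to every descriptive attribute, take the best test, and recurse on both subsets until a size limit or stopping criterion is reached.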
An example tree
• Maxsize=16 (8 leaves)
Description of clusters
• Clusters are described in detail in an Excel table
– Site properties
• twi > 815
• max_feb > 3108
– Species properties (Cluster A):
• lifelook=SS: 19%
• sprflow=1: 76%
• sumflow=1: 58%
• autflow='': 64%
• winflow='': 52%
• hitecat=1: 36%
• aquatic='': 99%
• parasite='': 100%
• fleshyf='': 91%
• fleshyl='': 80%
Description of clusters
– Site properties
• twi > 815
• max_feb <= 3108
• numberspp > 9
• thinvk > 4.758
– Species properties (Cluster B):
• lifelook=S: 26%
• sprflow=1: 80%
• sumflow=1: 69%
• autflow='': 72%
• winflow='': 67%
• hitecat=2: 23%
• aquatic='': 95%
• parasite='': 99%
• winflow=1: 33%
• fleshyf='': 90%
• fleshyl='': 97%
Description of clusters
– Site properties
• twi > 815
• max_feb <= 3108
• numberspp > 9
• thinvk <= 4.758
– Species properties (Cluster C):
• lifelook=MTG: 27%
• sprflow=1: 82%
• sumflow=1: 76%
• autflow='': 65%
• winflow='': 80%
• hitecat=2: 38%
• aquatic='': 95%
• parasite='': 100%
• winflow=1: 20%
• fleshyf='': 96%
• fleshyl='': 96%
Description of clusters
– Site properties
• twi > 815
• max_feb <= 3108
• numberspp <= 9
• dem > 15
– Species properties (Cluster D):
• lifelook=H: 16%
• sprflow=1: 74%
• sumflow=1: 75%
• autflow='': 63%
• winflow='': 85%
• hitecat=2: 24%
• aquatic='': 71%
• parasite='': 100%
• winflow=1: 15%
• fleshyf='': 99%
• fleshyl='': 95%
Description of clusters
– Site properties
• twi > 815
• max_feb <= 3108
• numberspp <= 9
• dem <= 15
– Species properties (Cluster E):
• lifelook=H: 36%
• sprflow=1: 65%
• sumflow=1: 55%
• autflow=1: 57%
• winflow='': 78%
• hitecat=1: 48%
• aquatic='': 84%
• parasite='': 100%
• winflow=1: 22%
• fleshyf='': 94%
• fleshyl='': 56%
Description of clusters
– Site properties
• twi <= 815
• dem > 1260
– Species properties (Cluster F):
• lifelook=S: 20%
• sprflow=1: 53%
• sumflow=1: 83%
• autflow='': 80%
• winflow='': 91%
• hitecat=2: 33%
• aquatic='': 99%
• parasite='': 100%
• winflow=1: 9%
• fleshyf='': 92%
• fleshyl='': 100%
Description of clusters
– Site properties
• twi <= 815
• dem <= 1260
• eff_rain > 19
– Species properties (Cluster G):
• lifelook=T: 21%
• sprflow=1: 65%
• sumflow=1: 66%
• autflow='': 74%
• winflow='': 82%
• hitecat=2: 21%
• aquatic='': 99%
• parasite='': 99%
• winflow=1: 18%
• fleshyf='': 94%
• fleshyl='': 100%
Description of clusters
– Site properties
• twi <= 815
• dem <= 1260
• eff_rain <= 19
– Species properties (Cluster H):
• lifelook=T: 20%
• sprflow=1: 75%
• sumflow=1: 71%
• autflow='': 70%
• winflow='': 74%
• hitecat=2: 24%
• aquatic='': 98%
• parasite='': 99%
• winflow=1: 26%
• fleshyf='': 95%
• fleshyl='': 99%
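The site properties listed for clusters A to H are the tests on the paths from the root of the example tree to its eight leaves, so the tree can be rewritten as nested rules. The sketch below is reconstructed only from those cluster descriptions (the tree figure itself is not reproduced); the function name and the dict-based site representation are hypothetical, and numberspp is used exactly as named in the splits.

```python
def assign_cluster(site):
    """Route a site (dict of attribute values) to cluster A-H.

    Hypothetical helper; every threshold is copied from the site
    properties of the corresponding cluster description.
    """
    if site["twi"] > 815:
        if site["max_feb"] > 3108:
            return "A"
        if site["numberspp"] > 9:
            return "B" if site["thinvk"] > 4.758 else "C"
        return "D" if site["dem"] > 15 else "E"
    if site["dem"] > 1260:
        return "F"
    return "G" if site["eff_rain"] > 19 else "H"


print(assign_cluster({"twi": 900, "max_feb": 3000, "numberspp": 12, "thinvk": 5.0}))  # B
```

The rules use 7 internal tests and 8 leaves (15 nodes in total), consistent with the Maxsize=16 setting of the example tree.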