Predictive Clustering of Australian Vegetation Data
Saso Dzeroski, Valentin Gjorgjioski, Matt White (in alphabetical order)

The Relational Database
• A relational database was constructed from the data provided by Matt White
• Three tables:
  – sites: sites with their corresponding attributes
  – species: species with their corresponding attributes
  – sitemeasurements: links sites to their measurements and records the cover of a given species at a given site

Attributes of sites
• dem
• twi
• solar
• thinvk
• thk
• eff_rain
• mint_jul
• max_feb
• grnd_dpth
• salinity

Attributes/properties of species
• Lifelook
• Sprflow
• Sumflow
• Autflow
• Winflow
• Hitecat
• Aquatic
• Parasite
• Fleshyf
• Fleshyl

Data Preprocessing
• We remove all measurements with very low cover (<0.5/+) and a few exotic species (as suggested by Matt White (*))
• We changed the measured cover values from nominal to numeric using the following mapping:
  – 1 -> 2.5
  – 2 -> 15
  – 3 -> 37.5
  – 4 -> 62.5
  – 5 -> 87.5

Experimental setup
• The data mining task
  – Descriptive attributes are descriptions of sites
  – Target attributes are aggregations of characteristics of species
• The aggregation of species characteristics is described below

Aggregation of species characteristics
• We aggregate over the characteristics of species using the following algorithm
• For each characteristic, we create several target attributes, one for each value of the given characteristic
• e.g. autflow has two values (0 and 1), so we create two target attributes:
  – autflow=0
  – autflow=1
• We calculate the value of each new attribute for a given site Si, with Sp ranging over the species recorded at Si, so that:

$$\mathrm{autflow1}(S_i) = \frac{\sum_{Sp \in S_i,\ \mathrm{autflow}(Sp) = 1} \mathrm{cover}(S_i, Sp)}{\mathrm{cover}(S_i)}, \qquad \mathrm{cover}(S_i) = \sum_{Sp \in S_i} \mathrm{cover}(S_i, Sp)$$

Experimental setup
• Data mining approach
  – Use PCTs for multi-target prediction
  – With descriptive and target attributes as described above
  – The software package CLUS is used

An example tree
• Maxsize=16 (8 leaves)

Description of clusters
• The clusters are described in detail in the Excel table

Cluster A
– Site properties
  • twi > 815
  • max_feb > 3108
– Species properties
  • lifelook=SS 19%
  • sprflow=1 76%
  • sumflow=1 58%
  • autflow='' 64%
  • winflow='' 52%
  • hitecat=1 36%
  • aquatic='' 99%
  • parasite='' 100%
  • fleshyf='' 91%
  • fleshyl='' 80%

Cluster B
– Site properties
  • twi > 815
  • max_feb <= 3108
  • numberspp > 9
  • thinvk > 4.758
– Species properties
  • lifelook=S 26%
  • sprflow=1 80%
  • sumflow=1 69%
  • autflow='' 72%
  • winflow='' 67%
  • hitecat=2 23%
  • aquatic='' 95%
  • parasite='' 99%
  • winflow=1 33%
  • fleshyf='' 90%
  • fleshyl='' 97%

Cluster C
– Site properties
  • twi > 815
  • max_feb <= 3108
  • numberspp > 9
  • thinvk <= 4.758
– Species properties
  • lifelook=MTG 27%
  • sprflow=1 82%
  • sumflow=1 76%
  • autflow='' 65%
  • winflow='' 80%
  • hitecat=2 38%
  • aquatic='' 95%
  • parasite='' 100%
  • winflow=1 20%
  • fleshyf='' 96%
  • fleshyl='' 96%

Cluster D
– Site properties
  • twi > 815
  • max_feb <= 3108
  • numberspp <= 9
  • dem > 15
– Species properties
  • lifelook=H 16%
  • sprflow=1 74%
  • sumflow=1 75%
  • autflow='' 63%
  • winflow='' 85%
  • hitecat=2 24%
  • aquatic='' 71%
  • parasite='' 100%
  • winflow=1 15%
  • fleshyf='' 99%
  • fleshyl='' 95%

Cluster E
– Site properties
  • twi > 815
  • max_feb <= 3108
  • numberspp <= 9
  • dem <= 15
– Species properties
  • lifelook=H 36%
  • sprflow=1 65%
  • sumflow=1 55%
  • autflow=1 57%
  • winflow='' 78%
  • hitecat=1 48%
  • aquatic='' 84%
  • parasite='' 100%
  • winflow=1 22%
  • fleshyf='' 94%
  • fleshyl='' 56%

Cluster F
– Site properties
  • twi <= 815
  • dem > 1260
– Species properties
  • lifelook=S 20%
  • sprflow=1 53%
  • sumflow=1 83%
  • autflow='' 80%
  • winflow='' 91%
  • hitecat=2 33%
  • aquatic='' 99%
  • parasite='' 100%
  • winflow=1 9%
  • fleshyf='' 92%
  • fleshyl='' 100%

Cluster G
– Site properties
  • twi <= 815
  • dem <= 1260
  • eff_rain > 19
– Species properties
  • lifelook=T 21%
  • sprflow=1 65%
  • sumflow=1 66%
  • autflow='' 74%
  • winflow='' 82%
  • hitecat=2 21%
  • aquatic='' 99%
  • parasite='' 99%
  • winflow=1 18%
  • fleshyf='' 94%
  • fleshyl='' 100%

Cluster H
– Site properties
  • twi <= 815
  • dem <= 1260
  • eff_rain <= 19
– Species properties
  • lifelook=T 20%
  • sprflow=1 75%
  • sumflow=1 71%
  • autflow='' 70%
  • winflow='' 74%
  • hitecat=2 24%
  • aquatic='' 98%
  • parasite='' 99%
  • winflow=1 26%
  • fleshyf='' 95%
  • fleshyl='' 99%
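
The data preprocessing step (dropping very low cover and exotic species, then mapping cover classes 1 to 5 to numeric values) could be sketched as follows. This is a minimal illustration, not the authors' code: the function name `preprocess`, the tuple layout of a measurement, and the assumption that very low cover is recorded as a code outside 1 to 5 (such as "+") are all hypothetical.

```python
# Numeric cover value for each cover class, exactly as listed in the mapping.
COVER_CLASS_TO_VALUE = {"1": 2.5, "2": 15.0, "3": 37.5, "4": 62.5, "5": 87.5}


def preprocess(measurements, exotic_species=frozenset()):
    """Filter and recode raw measurements.

    measurements: iterable of (site_id, species_id, cover_class) tuples
                  (hypothetical layout of the sitemeasurements table).
    exotic_species: species ids to discard (hypothetical parameter).
    Returns a list of (site_id, species_id, numeric_cover) tuples.
    """
    cleaned = []
    for site_id, species_id, cover_class in measurements:
        if species_id in exotic_species:
            continue  # drop exotic species
        value = COVER_CLASS_TO_VALUE.get(str(cover_class).strip())
        if value is None:
            # Assumption: any code outside 1..5 (e.g. "+") marks very low cover.
            continue
        cleaned.append((site_id, species_id, value))
    return cleaned


raw = [("s1", "sp_a", "3"), ("s1", "sp_b", "+"), ("s2", "sp_x", "1")]
result = preprocess(raw, exotic_species={"sp_x"})
```

Here `result` keeps only the `sp_a` record at site `s1`, recoded from class 3 to 37.5.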
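
The aggregation of species characteristics into per-site target attributes (one attribute per characteristic value, equal to the share of the site's total cover held by species with that value) could be sketched as below. The function name `aggregate_targets` and the dictionary layout of the inputs are assumptions made for illustration; only the cover-weighted fraction itself comes from the slides.

```python
from collections import defaultdict


def aggregate_targets(cover, species_traits):
    """Compute cover-weighted target attributes for each site.

    cover: {site: {species: numeric cover}} (hypothetical layout).
    species_traits: {species: {characteristic: value}} (hypothetical layout).
    Returns {site: {"characteristic=value": fraction of total site cover}}.
    """
    # One target attribute per (characteristic, value) pair seen in the data.
    names = sorted(
        f"{characteristic}={value}"
        for traits in species_traits.values()
        for characteristic, value in traits.items()
    )
    names = list(dict.fromkeys(names))  # de-duplicate, keep sorted order

    targets = {}
    for site, species_cover in cover.items():
        total = sum(species_cover.values())  # cover(Si)
        sums = defaultdict(float)
        for species, c in species_cover.items():
            for characteristic, value in species_traits[species].items():
                sums[f"{characteristic}={value}"] += c
        if total > 0:
            targets[site] = {name: sums[name] / total for name in names}
        else:
            targets[site] = {name: 0.0 for name in names}
    return targets


cover = {"site1": {"sp_a": 37.5, "sp_b": 62.5}}
traits = {"sp_a": {"autflow": 1}, "sp_b": {"autflow": 0}}
targets = aggregate_targets(cover, traits)
```

For this toy site the total cover is 100, so `autflow=1` gets 37.5/100 and `autflow=0` gets 62.5/100; the two targets for one characteristic always sum to 1 when the site has any cover.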
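
The site properties listed for clusters A to H read as the root-to-leaf paths of the eight-leaf example tree, so cluster membership can be written as nested tests. The sketch below is reconstructed only from those listed thresholds; the function name `assign_cluster` and the dictionary representation of a site are hypothetical, and the order of tests along each path is assumed to follow the listing order.

```python
def assign_cluster(site):
    """Return the cluster label (A-H) for a site given as a dict of attributes.

    Thresholds are copied from the per-cluster site properties; the tree
    shape is an assumption inferred from them.
    """
    if site["twi"] > 815:
        if site["max_feb"] > 3108:
            return "A"
        if site["numberspp"] > 9:
            return "B" if site["thinvk"] > 4.758 else "C"
        return "D" if site["dem"] > 15 else "E"
    if site["dem"] > 1260:
        return "F"
    return "G" if site["eff_rain"] > 19 else "H"


example_site = {
    "twi": 900, "max_feb": 3000, "numberspp": 12,
    "thinvk": 5.0, "dem": 200, "eff_rain": 25,
}
label = assign_cluster(example_site)
```

The example site has twi > 815, max_feb <= 3108, numberspp > 9 and thinvk > 4.758, so it falls into cluster B.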