
Predictive Clustering of Australian Vegetation Data

Saso Dzeroski, Valentin Gjorgjioski, Matt White (in alphabetical order)

The Relational Database
• A relational database was constructed from the data provided by Matt White
• Three tables:
  – sites: the sites and their attributes
  – species: the species and their attributes
  – sitemeasurements: links sites to measurements and gives the cover of a given species at a given site

Attributes of sites
• dem
• twi
• solar
• thinvk
• thk
• eff_rain
• mint_jul
• max_feb
• grnd_dpth
• salinity

Attributes/properties of species
• Lifelook
• Sprflow
• Sumflow
• Autflow
• Winflow
• Hitecat
• Aquatic
• Parasite
• Fleshyf
• Fleshyl

Data Preprocessing
• We remove all measurements with very low cover (<0.5/+) and a few exotic species (as suggested by Matt White)
• We change the measured cover values from nominal to numeric using the following mapping:
  • 1 -> 2.5
  • 2 -> 15
  • 3 -> 37.5
  • 4 -> 62.5
  • 5 -> 87.5

Experimental setup
• The data mining task
  – Descriptive attributes are the attributes of sites
  – Target attributes are aggregations of the characteristics of species
• The aggregation of species characteristics is described below

Aggregation of species characteristics
• We aggregate over the characteristics of species using the following algorithm
• For each characteristic, we create several target attributes, one for each value of that characteristic
• E.g. autflow has two values (0 and 1), so we create two target attributes:
  – autflow=0
  – autflow=1
• We calculate the value of each new attribute for a given site Si by summing over the species Sp recorded at that site:

$$\mathrm{autflow}_{=1}(S_i) = \frac{\sum_{S_p \in S_i,\ \mathrm{autflow}(S_p) = 1} \mathrm{cover}(S_i, S_p)}{\mathrm{cover}(S_i)}, \qquad \mathrm{cover}(S_i) = \sum_{S_p \in S_i} \mathrm{cover}(S_i, S_p)$$

Experimental setup
• Data mining approach
  – Use predictive clustering trees (PCTs) for multi-target prediction
  – With descriptive and target attributes as described above
  – The software package CLUS is used

An example tree
• Maxsize=16 (8 leaves)

Description of clusters
• The clusters are described in detail in an Excel table
• Each cluster below lists the site properties that define it and the species properties of the cluster

Cluster A
– Site properties
  • twi > 815
  • max_feb > 3108
– Species properties
  • lifelook=SS 19%
  • sprflow=1 76%
  • sumflow=1 58%
  • autflow='' 64%
  • winflow='' 52%
  • hitecat=1 36%
  • aquatic='' 99%
  • parasite='' 100%
  • fleshyf='' 91%
  • fleshyl= 80%

Cluster B
– Site properties
  • twi > 815
  • max_feb <= 3108
  • numberspp > 9
  • thinvk > 4.758
– Species properties
  • lifelook=S 26%
  • sprflow=1 80%
  • sumflow=1 69%
  • autflow= 72%
  • winflow= 67%
  • hitecat=2 23%
  • aquatic= 95%
  • parasite= 99%
  • winflow=1 33%
  • fleshyf= 90%
  • fleshyl= 97%

Cluster C
– Site properties
  • twi > 815
  • max_feb <= 3108
  • numberspp > 9
  • thinvk <= 4.758
– Species properties
  • lifelook=MTG 27%
  • sprflow=1 82%
  • sumflow=1 76%
  • autflow= 65%
  • winflow= 80%
  • hitecat=2 38%
  • aquatic= 95%
  • parasite= 100%
  • winflow=1 20%
  • fleshyf= 96%
  • fleshyl= 96%

Cluster D
– Site properties
  • twi > 815
  • max_feb <= 3108
  • numberspp <= 9
  • dem > 15
– Species properties
  • lifelook=H 16%
  • sprflow=1 74%
  • sumflow=1 75%
  • autflow= 63%
  • winflow= 85%
  • hitecat=2 24%
  • aquatic= 71%
  • parasite= 100%
  • winflow=1 15%
  • fleshyf= 99%
  • fleshyl= 95%

Cluster E
– Site properties
  • twi > 815
  • max_feb <= 3108
  • numberspp <= 9
  • dem <= 15
– Species properties
  • lifelook=H 36%
  • sprflow=1 65%
  • sumflow=1 55%
  • autflow=1 57%
  • winflow= 78%
  • hitecat=1 48%
  • aquatic= 84%
  • parasite= 100%
  • winflow=1 22%
  • fleshyf= 94%
  • fleshyl= 56%

Cluster F
– Site properties
  • twi <= 815
  • dem > 1260
– Species properties
  • lifelook=S 20%
  • sprflow=1 53%
  • sumflow=1 83%
  • autflow= 80%
  • winflow= 91%
  • hitecat=2 33%
  • aquatic= 99%
  • parasite= 100%
  • winflow=1 9%
  • fleshyf= 92%
  • fleshyl= 100%

Cluster G
– Site properties
  • twi <= 815
  • dem <= 1260
  • eff_rain > 19
– Species properties
  • lifelook=T 21%
  • sprflow=1 65%
  • sumflow=1 66%
  • autflow= 74%
  • winflow= 82%
  • hitecat=2 21%
  • aquatic= 99%
  • parasite= 99%
  • winflow=1 18%
  • fleshyf= 94%
  • fleshyl= 100%

Cluster H
– Site properties
  • twi <= 815
  • dem <= 1260
  • eff_rain <= 19
– Species properties
  • lifelook=T 20%
  • sprflow=1 75%
  • sumflow=1 71%
  • autflow= 70%
  • winflow= 74%
  • hitecat=2 24%
  • aquatic= 98%
  • parasite= 99%
  • winflow=1 26%
  • fleshyf= 95%
  • fleshyl= 99%
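
The cover mapping from the preprocessing step and the aggregation of species characteristics into per-site targets can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `aggregate_targets`, the tuple layout of the measurements, and the toy sites and species are all assumptions made for illustration. Only the class-to-cover mapping and the cover-weighted share come from the slides.

```python
from collections import defaultdict

# Cover class -> numeric cover, as given in the preprocessing mapping.
COVER_MIDPOINT = {1: 2.5, 2: 15.0, 3: 37.5, 4: 62.5, 5: 87.5}


def aggregate_targets(measurements, species_traits):
    """Return {site: {"trait=value": cover-weighted share at that site}}.

    measurements   -- iterable of (site, species, cover_class) tuples (assumed layout)
    species_traits -- {species: {trait: value}} (assumed layout)
    """
    total_cover = defaultdict(float)
    weighted = defaultdict(lambda: defaultdict(float))
    for site, species, cover_class in measurements:
        cover = COVER_MIDPOINT.get(cover_class)
        if cover is None:
            # Classes outside the mapping (e.g. very low cover) are skipped.
            continue
        total_cover[site] += cover
        for trait, value in species_traits[species].items():
            weighted[site][f"{trait}={value}"] += cover
    # Divide each weighted sum by the total cover of the site.
    return {
        site: {key: w / total_cover[site] for key, w in by_key.items()}
        for site, by_key in weighted.items()
    }


# Hypothetical toy data, not taken from the vegetation database.
measurements = [
    ("s1", "sp_a", 2),
    ("s1", "sp_b", 4),
    ("s1", "sp_c", 1),
    ("s2", "sp_a", 5),
]
species_traits = {
    "sp_a": {"autflow": 1, "winflow": 0},
    "sp_b": {"autflow": 0, "winflow": 0},
    "sp_c": {"autflow": 1, "winflow": 1},
}
targets = aggregate_targets(measurements, species_traits)
print(targets["s1"]["autflow=1"])
```

In this sketch a value that never occurs at a site simply has no key for that site, which corresponds to a share of 0. For every characteristic, the shares at a site sum to 1, which matches the cluster descriptions where, for example, the two winflow shares add up to 100%.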
