Projecting the Presence of Pecans:
Mining Environmental Data to Identify Ecological Limits of Species Distribution
Amber Johnson, Sarah Kemick, & Brian White
WHAT WE DID – A STEP BY STEP OUTLINE
Step 1: Identify weather stations that lie within boundaries of the distribution of interest and
record in a data file.

This can be done “by hand” (like we did Thursday before the MyWorld GIS
presentation)
o Observe an overlay and use station numbers to identify the stations that
fall within the distribution.
o This took two people a couple of hours to turn into a data file.
OR

This can be done using MyWorld GIS software.
o Construct a project with a shape file for the distribution of interest and the
locations of data points with environmental data.
o Analyze the relationship of the two layers to return values from the
distribution layer to a file. [File can be exported and used in Excel.]
o Our pecan data linked to our weather station locations in two quick steps
using MyWorld GIS. [<30 minutes from installing software to result
table!!!]
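For readers who want to see what the overlay in Step 1 amounts to, here is a minimal sketch of a point-in-polygon test. It is not how MyWorld GIS works internally; it assumes the distribution boundary is a simple polygon of (longitude, latitude) vertices, and the station numbers, coordinates, and rectangle below are made up for illustration.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices; the closing edge is implied.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular "distribution" and three hypothetical stations.
distribution = [(-100.0, 28.0), (-88.0, 28.0), (-88.0, 38.0), (-100.0, 38.0)]
stations = {101: (-95.0, 32.0), 102: (-110.0, 35.0), 103: (-90.0, 40.0)}

# 1 = station falls within the distribution, 0 = it does not.
presence = {sid: int(point_in_polygon(lon, lat, distribution))
            for sid, (lon, lat) in stations.items()}
print(presence)
```

The resulting presence/absence column is what gets recorded in the data file alongside each station's environmental variables.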
Step 2: Convert file to .arff type (for description see links on project site) for use with Weka
data mining software.
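The conversion in Step 2 can be sketched as below. This is only an illustration, assuming every column is numeric except the class column, which becomes a nominal attribute; the function name, relation name, and sample values are hypothetical and are not our project data.

```python
import csv
import io


def csv_to_arff(csv_text, relation, class_column):
    """Convert CSV text to ARFF text; all columns numeric except the class."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    class_idx = header.index(class_column)
    class_values = sorted({row[class_idx] for row in data})

    lines = [f"@relation {relation}", ""]
    for name in header:
        if name == class_column:
            # Nominal attribute: list the allowed class labels in braces.
            lines.append(f"@attribute {name} {{{','.join(class_values)}}}")
        else:
            lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical two-station sample using variable names from this handout.
sample = "MWM,ELEV,Pecan\n27.1,250,1\n19.4,4200,0\n"
arff = csv_to_arff(sample, "pecans", "Pecan")
print(arff)
```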
Step 3: Select model type and run Weka data mining software –
1. Try different algorithms to find best fit
2. Consider which algorithms are most informative
3. Save “classifier” [model] once it is trained on the input data
4. Apply classifier to data [will cross-validate leaving out 10% of the file at a time]
and save predictions to the file.
o Output file will be a comma separated value file.
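"Leaving out 10% of the file at a time" means every record is predicted once by a model that never saw it. A rough sketch of that bookkeeping follows. It assumes a plain shuffled split, whereas Weka reports a stratified one, so treat it as an illustration of the idea rather than a copy of what Weka does; the function name and seed are our own.

```python
import random


def ten_fold_indices(n_instances, k=10, seed=1):
    """Shuffle row indices and deal them into k roughly equal folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


n = 4637  # number of instances in our file
folds = ten_fold_indices(n)

sizes = []
for held_out in folds:
    held = set(held_out)
    train = [i for i in range(n) if i not in held]
    # A model would be trained on `train` and asked to predict `held_out`.
    sizes.append((len(train), len(held_out)))
print(sizes)
```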
Step 4: Attach map latitude and map longitude to model predictions in the comma separated
value (CSV) file.

Have a small file handy with site names, latitudes and longitudes for
mapping which can be easily pasted into the output file.

Remember S Latitude and W Longitude should be negative values to map in
a coordinate system. I usually call these variables MapLat and MapLong.
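Instead of pasting by hand, Step 4 could also be scripted. The sketch below assumes the predictions file has a Site column and that the lookup table stores unsigned coordinates with a hemisphere letter; the site numbers, coordinates, and the "predicted" column name are hypothetical.

```python
import csv
import io


def signed_coordinate(value, hemisphere):
    """Return decimal degrees, negative for the S and W hemispheres."""
    value = abs(float(value))
    return -value if hemisphere.upper() in ("S", "W") else value


# Hypothetical lookup table: site -> (latitude, N/S, longitude, E/W).
sites = {
    "1001": ("31.5", "N", "97.2", "W"),
    "1002": ("33.9", "S", "18.4", "E"),
}

# Hypothetical model output with one prediction per site.
predictions_csv = "Site,predicted\n1001,1\n1002,0\n"

out = io.StringIO()
reader = csv.DictReader(io.StringIO(predictions_csv))
writer = csv.DictWriter(
    out,
    fieldnames=["Site", "predicted", "MapLat", "MapLong"],
    lineterminator="\n",
)
writer.writeheader()
for row in reader:
    lat, ns, lon, ew = sites[row["Site"]]
    row["MapLat"] = signed_coordinate(lat, ns)
    row["MapLong"] = signed_coordinate(lon, ew)
    writer.writerow(row)

result = out.getvalue()
print(result)
```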
Step 5: Open MyWorld GIS
1. In the Construct Mode:
a) Select layers from among included maps.
o Maps exist at many scales – world, country, state level maps are easily
accessible; you can find many others for free on the web. MyWorld GIS
comes with many maps and datasets.
b) In the FILE menu, IMPORT LAYER FROM FILE to add data from a CSV
file.
c) Drag layers up in the construct list to place them on top of other layers.
d) Pick the column of data by which to mark points.
e) Adjust the color scheme and number of colors showing as necessary.
2. In the Visualize Mode:

You can select a tool that displays information from a table for any point on
which you click – this data can be accumulated in a single file and exported.
3. In the Analyze Mode:

You can query the data to collect a list of stations, states, etc. with some
property.

Plus much, much more we haven’t had time to explore yet!
WHAT WE HAVE TO SHOW FOR IT
1. Several models that use environmental variables to project the presence or absence
of pecans.


o Even with our limited knowledge of plant physiology and community
ecology, these models are fun to think about in terms of what they tell us
about the niche of the pecan tree. [Example to follow.]
o The models could be used to compare model strategies or to investigate
ecological settings.
2. Maps with flexible options for what information is showing at any given time.
3. A few nascent ideas about how to develop the use of these data and tools into
curricular materials. [Please help us imagine further uses by providing feedback!]
4. Greater knowledge and confidence using Data Mining and GIS tools.
AN EXAMPLE OF A MODEL
The Weka software can fit data distributions using many different types of algorithms
from nearest neighbor to logistic models to decision trees. For this example, we focus
on a rule-based classification model (JRip to those who know/care).
The output looks like this:
JRIP rules:
===========
(MWM >= 26.5) and (BAR5 >= 14.6915) and (PTOAE >= 1.1925) and (ELEV <=
300) => Pecan=1 (82.0/4.0)
(AE >= 652.4943) and (PTOAE >= 1.1295) and (WATDGRC <= 3) and (WRET >=
104.8334) and (ELEV <= 625) => Pecan=1 (72.0/4.0)
(MWM >= 24.6) and (CVRAIN <= 44.3185) and (WSTORAGE >= 181.796) and (ELEV
<= 1030) => Pecan=1 (165.0/50.0)
(MWM >= 24.3) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (PTOWATR >=
10.8738) and (Site <= 1517) => Pecan=1 (51.0/4.0)
(AE >= 622.0895) and (COKLM >= 506.9) and (EXPREY <= 520.5728) and
(PTOWATR >= 8.7045) => Pecan=1 (59.0/13.0)
(MWM >= 24.8) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (RLOW <=
46.74) => Pecan=1 (52.0/24.0)
(MWM >= 27.22) and (RLOW >= 71.88) and (EXPREY <= 439.1472) and (WRET >=
102.7854) and (TEMP <= 56.0959) => Pecan=1 (15.0/1.0)
(MWM >= 27.44) and (CVRAIN <= 34.6388) and (WSTORAGE <= 161.2) => Pecan=1
(77.0/37.0)
=> Pecan=0 (4064.0/52.0)
Number of Rules : 9
Time taken to build model: 69.58 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances        4394        94.7595 %
Incorrectly Classified Instances       243         5.2405 %
Kappa statistic                          0.7153
Mean absolute error                      0.0744
Root mean squared error                  0.2155
Relative absolute error                 39.4775 %
Root relative squared error             70.2129 %
Total Number of Instances             4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.974     0.275     0.968       0.974    0.971       0
0.725     0.026     0.765       0.725    0.744       1
=== Confusion Matrix ===
    a     b   <-- classified as
 4040   109 |  a = 0
  134   354 |  b = 1
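The summary figures can be re-derived from the four counts in the confusion matrix. The sketch below is our own arithmetic check, not Weka code; the variable names are ours, and it treats Pecan=1 as the positive class.

```python
# Confusion matrix from the output above: rows are actual, columns predicted.
tn, fp = 4040, 109   # actual 0: classified as 0, classified as 1
fn, tp = 134, 354    # actual 1: classified as 0, classified as 1
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision_1 = tp / (tp + fp)          # of stations predicted 1, share truly 1
recall_1 = tp / (tp + fn)             # of stations truly 1, share predicted 1
f_measure_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Cohen's kappa: agreement beyond what the row/column totals give by chance.
expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / total ** 2
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy    {accuracy:.4%}")
print(f"precision 1 {precision_1:.3f}")
print(f"recall 1    {recall_1:.3f}")
print(f"F-measure 1 {f_measure_1:.3f}")
print(f"kappa       {kappa:.4f}")
```

Note that the overall accuracy is dominated by the 4149 stations without pecans; the recall of 0.725 for class 1 is the figure that says how many pecan stations the rules actually find.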
WHAT DOES THAT MEAN?
Let’s look at one of the 9 rules in detail:
(MWM >= 26.5) and (BAR5 >= 14.6915) and (PTOAE >= 1.1925) and (ELEV <= 300) => Pecan=1
(82.0/4.0)
MWM is the Mean Temperature in the Warmest Month (°C)
BAR5 is the Biomass Accumulation Ratio
o This is the amount of net above ground productivity added to standing
biomass each year.
o Higher values indicate areas where we would find rapidly growing forests;
low values could be slow growing forests or grasslands.
PTOAE is the ratio of Potential Evapotranspiration (PET) to Actual Evapotranspiration
o Higher values mark warmer/drier settings where precipitation is not high
enough to match PET.
ELEV is the elevation of the weather station in feet.
So, this statement can be read as:
Where the Mean Temperature of the Warmest Month is greater than or equal to
26.5 deg C, and where Biomass Accumulation Ratio is greater than or equal to
14.69, and where the ratio of Potential to Actual Evapotranspiration is greater
than or equal to 1.19, and where the Elevation is less than or equal to 300 feet,
expect to find pecans.
In other words, pecans are found in warm locations where a moderate amount of
the productivity accumulates as standing biomass (think tree trunks, branches, etc.)
in environments on the dry side and at low elevations.
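Written as code, the rule is just four comparisons joined by "and". The sketch below uses the thresholds from the rule itself; the function name and the two example stations are hypothetical. Keep in mind that JRip rules form an ordered list, so a station that fails this rule may still be assigned Pecan=1 by one of the later rules.

```python
def first_pecan_rule(station):
    """First JRip rule: True when the rule predicts Pecan=1."""
    return (station["MWM"] >= 26.5          # warmest-month mean temp, deg C
            and station["BAR5"] >= 14.6915  # biomass accumulation ratio
            and station["PTOAE"] >= 1.1925  # potential / actual evapotranspiration
            and station["ELEV"] <= 300)     # elevation, feet


# Hypothetical stations, not taken from our data file.
warm_lowland = {"MWM": 27.8, "BAR5": 16.2, "PTOAE": 1.25, "ELEV": 180}
cool_upland = {"MWM": 21.0, "BAR5": 16.2, "PTOAE": 1.25, "ELEV": 2400}

print(first_pecan_rule(warm_lowland))
print(first_pecan_rule(cool_upland))
```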