```
Agronomic Spatial Variability and Resolution
What is it?
How do we describe it?
What does it imply for
precision management?
Agronomic Variability
• Fundamental assumption of precision farming
• Agronomic factors vary spatially within a field
• If these factors can be measured, then crop
yield and/or net economic returns can be
optimized
Agronomic Variables
• Soils
– Classification
– Texture
– Organic matter
– Water holding capacity
• Topography
– Slope
– Aspect
• Fertility
– pH
– Nitrogen
– Phosphorus
– Potassium
– Other nutrients
• Plant available water
• Crop Cultivar
Agronomic Variables
• Temperature
• Rainfall
• Weeds
– Species
– Population
• Insects
– Species
– Feeding patterns
• Tillage Practices
• Soil Compaction
• Diseases
– Macro and micro
environment
• Crop Stand
• Method and Uniformity of
Application
– Fertilizers
– Crop protectants
What is variability
• Variability - difference in the magnitude of
measurements of a variable
– Values can change randomly because of error in
the sensor
– Systematic error or bias
– Values can change because of changes in the
underlying factor
• As time changes (Temporal)
• As location changes (Spatial)
Why statistically describe
measurements?
• Raw data sets are too large to understand or
interpret
• Statistics provide a means of summarizing
data and can be readily interpreted for making
management decisions
• Statistics can define relationships among
variables
Statistical Analyses Commonly Used
In Precision Agriculture
• Descriptive Statistics
  – Measures of Central Tendency
    • Mean
    • Median
  – Measures of Dispersion
    • Range
    • Standard Deviation
    • Coefficient of Variation
  – Normal Distributions
• Regression
• Geostatistics - Semivariance Analysis
Measures of Central Tendency
• When a factor, such as crop yield, is
measured at different locations within a
field, values may vary greatly
• This variation can appear to be random
• The set of these measurements is a
population
• A value exists that is the central or usual
value of the population
Measures of Central Tendency
• This is important because dimensions
representing Biological Material are
generally reported as single “expected”
values.
Examples:
http://www.nue.okstate.edu/By_Plant_Variability_Corn.htm
Mean or Average Value
• Most common measure of central tendency
• Definition:
For n measurements X_1, X_2, X_3, …, X_n:
\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i
Mean or Average
• The mean or average value is useful if the
measured value is normally distributed
(Bell Curve)
– Most biological processes are normally
distributed
– Spatially distributed measurements are often
not normally distributed
• To calculate the mean in Excel
= Average (Col Row:Col Row)
Definition of (Col Row : Col Row)
• First Col: column letter of the upper left cell of an
array of data
• First Row: row number of the upper left cell of an
array of data
• Second Col: column letter of the lower right cell of an
array of data
• Second Row: row number of the lower right cell of an
array of data
• The “:” instructs Excel to include all data
between the two corner cells
The Median Value
• For skewed distributions, it is the better
predictor of the expected or central value
• Calculated by ranking the values from high
to low
– For an odd number of measurements, the
median is the middle value
– For an even number of measurements, the
median is the average of the two middle values
• In Excel, the median is calculated using the
following formula:
= Median (Col Row : Col Row)
Normal vs. Skewed Distribution
[Figure: normal and skewed distribution curves, with the mean and median marked on each]
Normality
• Physical measurements of biological materials are generally
normally distributed about the mean. There are several tests of
normality which will be discussed in your statistics courses.
However, three “quick and dirty” tests can be accessed easily
from Excel
• The first is simply comparing the mean and median values. If
the values are nearly the same, the measurement is likely
distributed normally.
• Excel has function calls to calculate Skewness and Kurtosis.
These statistics can be used to test for normality
Normality
• Kurtosis measures how peaked or flat a
distribution is relative to a normal distribution.
A value of ‘0’ indicates that there is no
deviation from a normal distribution. A
positive value indicates that more values are
clustered near the mean or far out in the tails. A
negative value means a “flat” top of the curve.
• = Kurt (Col Row : Col Row)
Normality
• Skewness is a measure of the asymmetry of the
tails of the distribution. A positive value
indicates an asymmetrical tail extending toward
higher (more positive) values. A negative value
indicates a tail extending toward lower (more
negative) values.
• =Skew (Col Row : Col Row)
Measures of Dispersion
• Measures of dispersion describe the
distribution of the set of measurements
Maximum and Minimum Values
• The maximum value is the highest value in the
data set
• In Excel the maximum value is calculated by:
= Max(Col Row:Col Row)
• The minimum value is the lowest value in the
data set and is calculated by:
= Min(Col Row:Col Row)
Range of the Sample Set
• Difference between the maximum and
minimum values of the measurement
• Calculated in Excel by the following formula:
= Max (Col Row:Col Row) - Min (Col Row:Col Row)
Standard Deviation
• For a normally distributed sample set, the standard
deviation is 1/2 of the width of the interval, centered
on the mean, that contains ≈68% of the values for
the population
s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}
Standard Deviation
• For a normal distribution (Bell Curve)
≈ 95% of the samples from a population will lie in the
interval
\bar{X} - 1.96s \le Z \le \bar{X} + 1.96s
Where: \bar{X} is the mean (average) value
Z is a value (measurement)
s is the standard deviation
• The standard deviation is calculated in Excel using the
following formula:
= Stdev (Col Row : Col Row)
Coefficient of Variation
• The magnitude of the differences between large values
and their means tends to be large. The differences
between small values and their means tend to be small.
• Consequently, a high yielding field is likely to have a
higher standard deviation than a low yielding field, even if
the variability is lower in the high yield field or the same
as the lower yielding field.
Coefficient of Variation
• Thus, variation about two means of different
magnitudes cannot easily be compared.
• Comparisons can be made by calculating the
relative variation, or the normalized standard
deviation.
• This measurement is called the Coefficient of
Variation.
Coefficient of Variation
• The Coefficient of Variation or C.V. is
calculated by dividing the standard
deviation of the data set by its mean.
Often that value is multiplied by 100 and
the C.V. is expressed as a percentage.
• Experience with similar data sets is
required to determine if the C.V. is
unusually large.
Mean, Standard Deviation and
Coefficient of Variation
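Population = Y
Mean Plant Spacing = \bar{X}
Std. Dev. = s
CV = \frac{s}{\bar{X}}
Population = ½ Y
Mean Plant Spacing = 2\bar{X}
Std. Dev. = \sqrt{\frac{2^2 \sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = 2s
CV = \frac{2s}{2\bar{X}} = \frac{s}{\bar{X}}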
Correlation
• One objective of Biosystems engineering and Agronomy is to
alter the level of one variable (e.g. soil nitrate) to change the
response of another variable (e.g. grain yield).
• There are other confounding factors affecting grain yield, such
as soil pH, which cannot always be accounted for.
Correlation
• Scientists still need to determine the degree
to which the two variables vary together.
• The correlation coefficient or r is that
measure.
• The correlation coefficient, r, lies between -1
and 1. Positive values indicate that X and Y
tend to increase or decrease together.
[Figure: two example scatter plots of y against x]
Correlation
• Values of r near 0 indicate that there is little or no
relationship between the two variables.
• The coefficient of determination or r² is important in
precision farming because, when the samples are
collected by location in the field, it indicates the
percentage of the variability in the dependent variable
(e.g. yield) explained by the independent variable (e.g. N
fertilizer).
Correlation
• For example, if the r² of soil N and grain yield is 90% then
90% of the variability in grain yield across the field can be
explained by soil nitrate. Spatially varying the N fertilizer rate
based on the nitrate level in the soil should have a large effect on
grain yield.
• In Excel, correlation r is calculated by the following:
= Correl (Col Row : Col Row, Col Row : Col Row)
To calculate r², simply square the value of r.
Regression
• Excel has the capability of fitting mathematical
models (linear and non-linear curves) to data
which relate dependent to independent
variables. Regression (curve fitting) can be
performed using the Charting GUI in Excel.
You can also directly calculate the slope and
intercept for a linear model using the
commands
Regression
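• = Intercept (Col Row : Col Row, Col Row : Col Row)
and
• = Slope (Col Row : Col Row, Col Row : Col Row)
• In both commands the first range holds the dependent (y)
values and the second range holds the independent (x) values
• Regression R² is a measure, as a decimal fraction,
of how well the model fits the data. For linear
regression, the regression R² can be directly
calculated by squaring the correlation
coefficient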
Data presentation
• Always be wary of data.
– What is the error?
– What is the scale of the axis?
• If it is a fertilizer trial, was there a 0 check?
[Figure: bar chart of fertilizer-trial treatments (0 Check, 50%, 100%, 100% + 2 gal, 100% + 3 gal, 100% + 2 pt, 100% + 3 pt, 100% + 2 oz); y-axis from 0 to 90. Annotation: "5 bushel and $30 increase due to 2 pt of MikesMagic Juice over 2 gal Joes Sauce"]
[Figure: chart with series CY 08-09, CY 09-10 and CY 10-11 plotted against x values 1 to 14; y-axis from 0.6 to 60.6]
[Figure: "Improving Wheat Profits" chart; yield in bu/ac (axis from 0 to 45) for 28-0-0 @ 150 lbs/A and for HH @ 1gal/A + SuperN 250-0 @ 2 gal/A; dollar labels $110.70 and $88.65; annotation "$22.05/A profit increase"]
What question if any do you ask?
The 3 R’s
• r: correlation coefficient
– P and K, slope and texture, N and OM
– Are they correlated at that site?
• r²: coefficient of determination
– N and yield, irrigation and yield, lime and soil pH
– Independent (controlled) and dependent (result)
• R²: regression; how well does a model explain
the data? Linear, quadratic, linear plateau
Regression R²
Spatial Interpolation
• Interpolation:
In the mathematical field of numerical
analysis, interpolation is a method of
constructing new data points within the range
of a discrete set of known data points.
• Methods
– Proximal / Inverse Distance
– Moving Average / distance weighted
– Triangulation
– Spline
– Kriging provides a confidence in estimates produced.
Inverse Distance Weighting
• Inverse Distance Weighting (IDW) is a type of
deterministic method for multivariate interpolation
with a known scattered set of points. The values
assigned to unknown points are calculated with a
weighted average of the values available at the known
points.
• The name of this type of method comes from the
weighted average it applies, which uses the
inverse of the distance to each known point ("amount
of proximity") when assigning weights.
IDW
• Inputs: the known values, the distances between points, and a power
• The power sets how much distance influences the value of the unknown point
• Identify the power that produces the minimum root mean square prediction error (RMSPE)
[Figure: Shepard's interpolation in 1 dimension, from 4 scattered points]
Kriging
• Kriging is a group of geostatistical techniques
to interpolate the value of a random field
(e.g., the elevation, z, of the landscape as a
function of the geographic location) at an
unobserved location from observations of its
value at nearby locations.
• Kriging belongs to the family of linear least
squares estimation algorithms
• Use of variograms.
Kriging
Example of one-dimensional
data interpolation by kriging,
with confidence intervals.
Squares indicate the location
of the data. The kriging
interpolation is in red. The
confidence intervals are in
green.
• In IDW, the weight, λi, depends solely on the distance to
the prediction location. However, in Kriging, the weights are
based not only on the distance between the measured
points and the prediction location but also on the overall
spatial arrangement among the measured points.
• To use the spatial arrangement in the weights, the spatial
autocorrelation must be quantified.
• Thus, in Ordinary Kriging, the weight, λi, depends on a
fitted model to the measured points, the distance to the
prediction location, and the spatial relationships among the
measured values around the prediction location.
Impact of Resolution of samples
```
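The descriptive statistics covered in the slides (mean, median, range, standard deviation, coefficient of variation, correlation, slope and intercept) can also be computed outside Excel. The following is a minimal sketch in Python using only the standard library. The function names `describe`, `pearson_r` and `linear_fit`, and all of the data values, are hypothetical and chosen only for illustration. The spacing example mirrors the slide where halving the plant population doubles both the mean spacing and the standard deviation, so the CV should stay the same.

```python
import math
import statistics


def describe(values):
    """Summarize a list of measurements with the statistics from the slides."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": s,
        "cv_percent": 100 * s / mean,  # coefficient of variation as a percentage
    }


def _sums(x, y):
    """Return mean of x, mean of y, and the sums of cross-products and squares."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Correlation coefficient r between two equal-length lists."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept of y on x."""
    mx, my, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical plant spacings (illustrative values only).
spacing = [6.0, 7.0, 8.0, 9.0, 10.0]
doubled = [2 * v for v in spacing]  # half the population, twice the spacing

# Hypothetical soil nitrate (x) and grain yield (y) pairs.
nitrate = [5.0, 10.0, 15.0, 20.0, 25.0]
yield_bu = [30.0, 38.0, 41.0, 52.0, 55.0]

if __name__ == "__main__":
    print(describe(spacing))
    print(describe(doubled))
    r = pearson_r(nitrate, yield_bu)
    print("r =", r, "r^2 =", r ** 2)
    print("slope, intercept =", linear_fit(nitrate, yield_bu))
```

Comparing the `cv_percent` entries for `spacing` and `doubled` is a quick way to see why the CV, rather than the standard deviation, is used to compare variability between data sets with different means.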
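The inverse distance weighting slides describe a weighted average whose weights are the inverse of the distance raised to a power, with the power chosen to minimize the root mean square prediction error (RMSPE). The sketch below is one way this could look in Python. The function names `idw`, `loo_rmspe` and `best_power`, the sample points, and the candidate powers are all hypothetical. Computing the RMSPE by leave-one-out cross-validation is an assumption made here for illustration; the slides do not say how the prediction error is obtained.

```python
import math


def idw(known, target, power=2.0):
    """Inverse distance weighted estimate at target.

    known  -- list of ((x, y), value) pairs for the measured points
    target -- (x, y) location to estimate
    power  -- exponent applied to distance; larger values favor nearby points
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a measured point
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


def loo_rmspe(known, power):
    """RMSPE from leave-one-out cross-validation (an assumed procedure)."""
    squared_errors = []
    for i, (point, value) in enumerate(known):
        others = known[:i] + known[i + 1:]  # hold out one measured point
        squared_errors.append((idw(others, point, power) - value) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_power(known, candidates=(1.0, 2.0, 3.0)):
    """Return the candidate power with the smallest leave-one-out RMSPE."""
    return min(candidates, key=lambda p: loo_rmspe(known, p))


# Hypothetical soil test values at four field locations (illustrative only).
samples = [
    ((0.0, 0.0), 10.0),
    ((2.0, 0.0), 20.0),
    ((0.0, 2.0), 14.0),
    ((2.0, 2.0), 24.0),
]

if __name__ == "__main__":
    print("estimate at field center:", idw(samples, (1.0, 1.0)))
    print("power with lowest RMSPE:", best_power(samples))
```

Unlike kriging, this estimate depends only on distance to the prediction location, so it provides no confidence interval for the interpolated value.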