Variability
• The differences between individuals in a population
• Measured by calculations such as Standard Error, Confidence Interval
and Sampling Error.
• Variability gives an indication of the effort required to estimate
population parameters with confidence.
Standard Error (of the mean)
• The standard deviation of all possible sample means around the true
population mean. It is estimated by dividing the sample standard
deviation by the square root of the sample size.
• The notation for standard error is a lower case s with an x bar
subscript, or SE.
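As a minimal sketch of the calculation described above, the standard error of the mean can be computed directly from a list of sample values. The function name and the sample values are illustrative assumptions, not data from the handbook.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

# Hypothetical plot volumes, for illustration only
sample = [12.0, 15.0, 9.0, 14.0, 10.0]
print(round(standard_error(sample), 3))
```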
Standard Error for finite populations
• If the total population is small or if the sample comprises more than 5
to 10 percent of the population, the sample mean is probably closer
to the population mean than with infinite populations. As a result, the
standard error of the mean is also smaller. The finite population
correction factor serves to reduce the standard error when relatively
large samples are drawn from finite populations. If every unit in the
population is measured then the true mean is known and the
standard error becomes zero, as there are no other possible means
using that sample size.
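The effect of the correction can be sketched as follows. This assumes the correction factor takes the form sqrt(1 - n/N), where N is the number of units in the population; some texts use sqrt((N - n)/(N - 1)) instead. The function name and the standard deviation of 20 are hypothetical.

```python
import math

def standard_error_finite(s, n, N):
    """Standard error of the mean with a finite population correction.

    Assumes the correction has the form sqrt(1 - n/N).
    """
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and N")
    return (s / math.sqrt(n)) * math.sqrt(1 - n / N)

# Hypothetical standard deviation of 20 in a population of 90 units
for n in (9, 45, 90):
    print(n, round(standard_error_finite(20.0, n, 90), 3))
```

When n equals N the correction factor is zero, so the standard error is zero, which matches the statement that measuring every unit leaves no uncertainty about the mean.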
Confidence Interval
• Used to specify the precision of the sample mean in relation to the
population mean. It is derived by adding and subtracting a tabulated t
value for the desired level of confidence times the standard error.
• The t value comes from the Student's t distribution, which is really a family
of distributions based on the normal distribution, each defined by its
degrees of freedom, the sample size minus one (represented as ν).
For a chosen probability level, 95% for example, the t value is found from
a lookup table using the degrees of freedom. The t value is the multiplier
applied to the standard error to obtain the confidence interval.
The t value increases as sample size decreases, leading to wider confidence
intervals and decreased precision.
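The lookup-and-multiply procedure can be sketched like this. The table holds standard two-sided 95% critical values of Student's t for selected degrees of freedom; the function names are assumptions, and falling back to the next lower tabulated degrees of freedom is a conservative shortcut chosen for this sketch rather than interpolation.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for selected degrees of freedom
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
        15: 2.131, 20: 2.086, 30: 2.042, 60: 2.000, 120: 1.980}

def t_value_95(df):
    """Use the largest tabulated df not exceeding df (gives a slightly larger t)."""
    if df < 1:
        raise ValueError("degrees of freedom must be at least 1")
    return T_95[max(k for k in T_95 if k <= df)]

def confidence_interval_95(values):
    """Return (lower, upper) = sample mean -/+ t * standard error."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    half_width = t_value_95(n - 1) * se
    return mean - half_width, mean + half_width

# Hypothetical sample of five plot volumes (4 degrees of freedom)
low, high = confidence_interval_95([12.0, 15.0, 9.0, 14.0, 10.0])
print(round(low, 2), round(high, 2))
```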
Student’s t distribution
For t = 2 there is about a 95% chance that the confidence interval
from your sample includes the true population mean.
Confidence Interval
Effect of Standard Deviation
The red distribution has a mean of 40 and a standard deviation of 5;
the blue distribution has a mean of 60 and a standard deviation of 10.
For the red distribution, 68% of the distribution is between 35 and 45;
for the blue distribution, 68% is between 50 and 70.
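A one-standard-deviation band for a normal distribution is simply the mean minus and plus the standard deviation. The sketch below computes that band, and the share of the distribution inside it, for the means and standard deviations quoted for the two curves; the function name is an assumption.

```python
from statistics import NormalDist

def one_sd_band(mean, sd):
    """Return mean - sd, mean + sd, and the share of a normal
    distribution that falls between them."""
    lower, upper = mean - sd, mean + sd
    dist = NormalDist(mu=mean, sigma=sd)
    share = dist.cdf(upper) - dist.cdf(lower)
    return lower, upper, share

for mean, sd in ((40, 5), (60, 10)):
    lower, upper, share = one_sd_band(mean, sd)
    print(f"mean={mean}, sd={sd}: {lower} to {upper} holds {share:.1%}")
```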
Sampling Error
• Rather than work with absolute confidence limits, convert them to a
percent of the sample mean, which is called the sampling error. The
notation in the handbook is an upper case E. Take the confidence
interval quantity (the t value times the standard error) and scale it
to the sample mean by dividing by the sample mean. Express this value
as a percent by multiplying by 100.
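The conversion described above amounts to E = (t × standard error ÷ sample mean) × 100. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def sampling_error_percent(t, standard_error, sample_mean):
    """Sampling error E (%) = (t * standard error / sample mean) * 100."""
    if sample_mean == 0:
        raise ValueError("sample mean must be non-zero")
    return t * standard_error / sample_mean * 100

# Hypothetical inputs: t = 2, standard error = 5, sample mean = 50
print(sampling_error_percent(2, 5.0, 50.0))
```

Because E is relative to the mean, it can be compared across strata or sales whose average volumes differ.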
Sampling Error Example – Step 1 (Calculate Standard Error)
Plots 1/5 acre in size were used in this example, and the stratum
covers 18 acres. So the total number of plots for the stratum would be
5 plots per acre times 18 acres = 90 potential plots.
Notice the application of the Finite Correction Factor (FCF) for this
method.
Sampling Error Example – Step 2
Recall the Standard Error was calculated as 8.3 ft³.
A sampling error of 36.2% is considerably larger than the level we set to begin with (10%). Implications?
Determining Sample Size
• Since t is very close to 2 (about 1.96) for 95% confidence at very
large sample sizes, we will use t = 2.
• E is the desired sampling error; we will use 10%.
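With t and E fixed, the number of sample units follows from the coefficient of variation. The sketch below assumes the common infinite-population form n = t² × CV² ÷ E², with CV and E both in percent; the function name and the CV of 50% are hypothetical.

```python
import math

def plots_needed(cv_percent, error_percent, t=2.0):
    """Number of sample units: n = (t^2 * CV^2) / E^2, rounded up."""
    if error_percent <= 0:
        raise ValueError("desired sampling error must be positive")
    return math.ceil((t ** 2) * (cv_percent ** 2) / (error_percent ** 2))

# Hypothetical CV of 50% with t = 2 and a desired sampling error of 10%
print(plots_needed(50.0, 10.0))
```

Rounding up rather than to the nearest integer keeps the expected sampling error at or below the target.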
Sample Coefficient of Variation refresher
• Because populations with large means tend to have larger standard
deviations than those with small means, the coefficient of variation
permits a comparison of relative variability about different-sized
means. The sample coefficient of variation is an expression of the
standard deviation as a percentage of the mean. It is usually
represented by upper case CV. It is calculated as the standard
deviation divided by the mean multiplied by 100 to convert to a
percentage.
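A short sketch of the calculation, using two made-up samples whose standard deviations differ tenfold but whose relative variability is the same. The function name and data are illustrative assumptions.

```python
import statistics

def coefficient_of_variation(values):
    """Sample CV (%) = sample standard deviation / sample mean * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return statistics.stdev(values) / mean * 100

# Two hypothetical samples: same relative spread, different-sized means
small = [8.0, 10.0, 12.0]
large = [80.0, 100.0, 120.0]
print(round(coefficient_of_variation(small), 1),
      round(coefficient_of_variation(large), 1))
```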
Sampling Intensity
Once a reasonable number of sample units are selected (i.e., over 20 to
30), the changes in the estimates of the mean and standard deviation
from adding another sampling unit probably won't amount to much. The
biggest gain to be had is in the estimation of standard error. Recall
that the standard error is calculated by dividing the sample standard
deviation by the square root of the sample size. So as sample size
increases, the standard error decreases.
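To see how the standard error shrinks as units are added, the sketch below holds the standard deviation fixed at a hypothetical value of 30 and varies only the sample size. The function name and numbers are assumptions for illustration.

```python
import math

def standard_error_for(s, n):
    """Standard error of the mean for sample standard deviation s and size n."""
    return s / math.sqrt(n)

# Hypothetical standard deviation of 30 units, held constant as n grows
s = 30.0
for n in (5, 10, 20, 30, 60, 120):
    print(f"n={n:>3}  SE={standard_error_for(s, n):.2f}")
```

Because of the square root, halving the standard error takes four times as many sample units.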
Effect of CV change
• At high coefficients of variation, very large numbers of samples are
needed to obtain the desired confidence.
• At lower desired errors, the number of samples needed approaches the
population size.
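Both effects can be explored with a finite-population version of the sample size calculation. This sketch assumes the sampling error is E = t × CV ÷ sqrt(n) × sqrt(1 - n/N), which rearranges to n = 1 ÷ (E² ÷ (t² × CV²) + 1 ÷ N). The function name, the population of 90 units, and the CV and E values in the loop are hypothetical.

```python
import math

def plots_needed_finite(cv_percent, error_percent, population, t=2.0):
    """Sample size for a finite population of `population` units.

    Assumes E = t * CV / sqrt(n) * sqrt(1 - n/N), which rearranges to
    n = 1 / (E^2 / (t^2 * CV^2) + 1 / N).
    """
    if error_percent <= 0 or cv_percent <= 0 or population <= 0:
        raise ValueError("inputs must be positive")
    base = (error_percent ** 2) / ((t ** 2) * (cv_percent ** 2))
    n = math.ceil(1 / (base + 1 / population))
    return min(population, n)  # never ask for more units than exist

# Hypothetical population of 90 units
N = 90
for cv in (30, 60, 100):
    for e in (20, 10, 5):
        print(f"CV={cv:>3}%  E={e:>2}%  n={plots_needed_finite(cv, e, N)}")
```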
Sampling Intensity
The USFS Way
Sample Selection – from Precruise data
1. Determine the desired sampling error for the sale as a whole (set to 10%).
2. Subdivide (or stratify) the sale population into sampling components as needed to
reduce the variability within the sampling strata.
3. Calculate the coefficient of variation (CV) by stratum and a weighted CV over all
strata.
4. Calculate number of plots for the sale as a whole and then distribute by stratum.
Number of Plots
Value of t is assumed to be 2
Error is set at 10%
Distribute Plots by Stratum
For each stratum, the calculation would look like this:
• n1 = (17.6 * 185) / 67.9 = 48 plots
• n2 = (7.7 * 185) / 67.9 = 21 plots
• n3 = (7.2 * 185) / 67.9 = 20 plots
• n4 = (35.4 * 185) / 67.9 = 96 plots
• These total the 185 plots for the sale.
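The two steps, total plots and then distribution by stratum, can be sketched together. This assumes the total comes from n = t² × CV² ÷ E² with the weighted CV of 67.9, and that the four values 17.6, 7.7, 7.2 and 35.4 are each stratum's share of that weighted CV (they sum to 67.9). The function names are hypothetical.

```python
import math

def total_plots(weighted_cv, error_percent=10.0, t=2.0):
    """Total plots for the sale: n = (t^2 * CV^2) / E^2, rounded up."""
    return math.ceil((t ** 2) * (weighted_cv ** 2) / (error_percent ** 2))

def distribute_plots(stratum_cv_shares, n_total):
    """Allocate n_total plots in proportion to each stratum's share of the
    weighted CV, rounding each allocation to the nearest whole plot."""
    total_share = sum(stratum_cv_shares)
    return [round(share * n_total / total_share) for share in stratum_cv_shares]

shares = [17.6, 7.7, 7.2, 35.4]   # stratum values from the example above
weighted_cv = sum(shares)         # about 67.9
n_total = total_plots(weighted_cv)
print(n_total, distribute_plots(shares, n_total))
```

Rounding each stratum separately happens to sum to the total here; in general the rounded allocations may need a one-plot adjustment to match.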