Lecture 5 – Statistics and Sampling
Topics:
1. Statistics related to sampling.
Handouts/Readings:
1. Chapter 2 - section 2.1-2.14 - Avery and Burkhart
2. Chapter 3 - section 3.1-3.12 - Avery and Burkhart
Assignment:
Complete problem set #3 before next Monday's (Feb. 11) lecture.
Notes:
Review of basic statistics:
The variance and standard deviation provide measures of the
dispersion of individual observations about their arithmetic mean.
Ex. - if we calculated the estimated standard deviation for n
measurements of tree heights to be 1.2 cm, then we can expect
about 2/3 of the n measurements of individual tree heights to
fall within ± 1.2 cm of the estimated mean.
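As a rough illustration of the mean, variance, and standard deviation, the
sketch below uses Python's statistics module on a small set of tree heights.
The height values are hypothetical and chosen only for illustration; they do
not come from the lecture or the textbook.

```python
import statistics

# Hypothetical tree heights (illustrative values, not from the lecture).
heights = [18.2, 19.5, 20.1, 20.4, 21.0, 21.3, 21.9, 22.4, 23.0, 24.2]

mean = statistics.mean(heights)
variance = statistics.variance(heights)   # sample variance, divisor n - 1
std_dev = statistics.stdev(heights)       # square root of the sample variance

# Count the observations that fall within one standard deviation of the mean.
within = [h for h in heights if abs(h - mean) <= std_dev]
fraction_within = len(within) / len(heights)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
print(f"fraction within one std dev = {fraction_within:.2f}")
```

For data that are roughly bell-shaped, the fraction printed on the last line
should land somewhere near 2/3, in line with the rule of thumb above.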
Just as the standard deviation is a measure of the variation of
individual observations about their mean, the standard error
represents the variation among sample means. It is essentially
the standard deviation among means of samples of a fixed size n.
We use the standard error to calculate confidence intervals and
to determine the sample size needed for a specified sampling
precision.
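The standard error for an infinite population (or a population
of unknown size) is:
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
For a finite population (i.e., we know how many total sampling
units, N, there are) it is:
$$s_{\bar{x}} = \sqrt{\frac{s^2}{n}\left(\frac{N - n}{N}\right)}$$
The finite population correction factor, (N - n)/N, reduces the
standard error.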
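A minimal sketch of the two standard-error calculations, assuming the usual
textbook forms s/sqrt(n) for an infinite population and
sqrt((s^2/n)(N - n)/N) for a finite one. The function names and the input
values (s = 2000, n = 25, N = 500) are illustrative assumptions.

```python
import math

def standard_error(s, n):
    """Standard error of the mean for an infinite (or unknown-size) population."""
    return s / math.sqrt(n)

def standard_error_finite(s, n, N):
    """Standard error of the mean with the finite population correction (N - n) / N."""
    return math.sqrt((s ** 2 / n) * ((N - n) / N))

# Illustrative inputs: standard deviation 2000, 25 sampled units,
# and a hypothetical population of 500 sampling units.
se_infinite = standard_error(2000, 25)
se_finite = standard_error_finite(2000, 25, 500)

print(f"infinite population: {se_infinite:.1f}")
print(f"finite population:   {se_finite:.1f}")
```

The finite-population value is always the smaller of the two, and it drops
to zero when every unit in the population is measured (n = N).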
Confidence intervals establish an interval which, at some
specified probability level, would be expected to include the
population mean.
We use the standard error and t values to establish these limits:
$$\bar{x} \pm t \, s_{\bar{x}}$$
Ex. - A 95% confidence interval says that if the population was
repeatedly sampled, 95% of all possible samples would produce
confidence intervals that contain the true population mean. If
20 samples were taken, only 1 of 20 (P = 0.05) would be expected
to produce a confidence interval that misses the population mean.
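The repeated-sampling interpretation can be checked with a small simulation.
The sketch below assumes a normally distributed population with a
hypothetical mean and standard deviation, samples of 25 units, and the
tabulated two-sided 95% t value for 24 degrees of freedom (2.064). All
constants and the function name are assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

TRUE_MEAN = 4400.0   # hypothetical population mean
TRUE_SD = 2000.0     # hypothetical population standard deviation
N_PLOTS = 25         # sampling units per sample
T_95 = 2.064         # two-sided 95% t value, 24 degrees of freedom (t table)
N_SAMPLES = 2000     # number of repeated samples to draw

def confidence_interval(sample, t_value):
    """Return (lower, upper) limits: mean +/- t * standard error."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - t_value * se, mean + t_value * se

hits = 0
for _ in range(N_SAMPLES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N_PLOTS)]
    low, high = confidence_interval(sample, T_95)
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / N_SAMPLES
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should come out close to 0.95, i.e. roughly 1 interval in
20 misses the true mean.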
Sampling designs:
Sample design - is the method employed to select non-overlapping
sampling units.
There are many different kinds of sampling designs (overhead fig. 3-1 - Avery and Burkhart). One of the most common is simple
random sampling.
Simple random sampling - provides for an equal and independent
chance of every possible combination of sampling units being
selected.
Sampling units can be selected with or without replacement.
Sampling with replacement allows each unit to appear as often as
it is selected. Sampling without replacement allows each unit
selected to appear only once.
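The difference between the two selection methods can be sketched with
Python's random module. The sampling frame of 100 numbered plots, the sample
size, and the seed are hypothetical choices for illustration.

```python
import random

rng = random.Random(11)  # seeded generator, used only to make the sketch repeatable

# Hypothetical sampling frame: 100 plots numbered 1..100.
plot_ids = list(range(1, 101))
n = 10

# Without replacement: each plot can be selected at most once.
without_replacement = rng.sample(plot_ids, n)

# With replacement: a plot may be drawn more than once.
with_replacement = rng.choices(plot_ids, k=n)

print("without replacement:", sorted(without_replacement))
print("with replacement:   ", sorted(with_replacement))
```

random.sample never repeats a unit, while random.choices draws each unit
independently and so can return the same plot more than once.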
Determining sampling intensity:
When planning an inventory we want to select enough sampling
units to achieve a desired precision, so that our sample is
statistically sound and practically efficient.
We can calculate the required sampling intensity using a formula
based on the relationship of the confidence limits on the mean
(assuming an infinite population):
$$n = \left(\frac{t \, s}{E}\right)^2$$
Where,
n = the number of sampling units
t = the critical value from the t-distribution table
s = the standard deviation
E = the desired half-width of the confidence interval
The standard deviation, s, can be estimated by (a) obtaining a
small preliminary sample of the population or (b) using
information obtained from previous sampling of the same or a
similar population.
Ex. - Suppose we have conducted a preliminary inventory of 25
plots to estimate volume per acre of a timber stand. From that
sample we estimated the mean to be 4,400 bd ft per acre and the
standard deviation to be 2,000 bd ft per acre. We want to
determine the sampling intensity needed to be within ± 500 bd ft
per acre, with a confidence level of 95%.
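One way to work this example is sketched below, assuming the
infinite-population form n = (t * s / E)^2 and rounding up to a whole plot.
The t value of 2.064 is the tabulated two-sided 95% value for 24 degrees of
freedom (25 preliminary plots minus 1); treating that as the right t to use
is an assumption, and the function name is illustrative.

```python
import math

def required_sample_size(t, s, E):
    """Sampling units needed for an infinite population: (t * s / E)^2, rounded up."""
    return math.ceil((t * s / E) ** 2)

t_24df = 2.064   # two-sided 95% t value, 24 degrees of freedom (t table)
s = 2000         # standard deviation from the preliminary sample, bd ft per acre
E = 500          # desired half-width of the confidence interval, bd ft per acre

n_required = required_sample_size(t_24df, s, E)
print(f"plots required: {n_required}")
```

With these inputs (t * s / E)^2 is about 68.2, so 69 plots would be needed.
Using the rougher approximation t = 2 gives 64 plots instead.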
When sampling from a finite population, the sample-size formula
is:
$$n = \frac{1}{\dfrac{E^2}{t^2 s^2} + \dfrac{1}{N}}$$
Where,
n = the number of sampling units required
N = the number of sampling units in the population
t = the critical value from the t-distribution table
s = the standard deviation
E = the desired half-width of the confidence interval
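A minimal sketch of the finite-population calculation, assuming the form
n = 1 / (E^2 / (t^2 * s^2) + 1/N). The population size of 1,000 sampling
units is hypothetical; the other inputs reuse the timber-stand figures
(s = 2,000, E = 500) with the tabulated t value for 24 degrees of freedom.

```python
import math

def required_sample_size_finite(t, s, E, N):
    """Sampling units needed when the population holds N sampling units, rounded up."""
    return math.ceil(1.0 / (E ** 2 / (t ** 2 * s ** 2) + 1.0 / N))

# Hypothetical population of 1,000 sampling units.
n_finite = required_sample_size_finite(2.064, 2000, 500, 1000)
print(f"plots required (N = 1000): {n_finite}")

# With a very large N the result approaches the infinite-population answer.
n_large = required_sample_size_finite(2.064, 2000, 500, 10 ** 9)
print(f"plots required (N = 10^9): {n_large}")
```

Because the 1/N term enlarges the denominator, the finite-population answer
is never larger than the infinite-population one.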
If we know the allowable error and have an estimate of the
coefficient of variation (CV), the required sample size (for an
infinite population) is:
$$n = \left(\frac{t \cdot CV}{AE}\right)^2$$
Where,
AE = the allowable error, as a percent of the mean
t = the critical value from the t-distribution table
CV = coefficient of variation, in percent
This allows you to estimate the number of observations needed to
estimate a population mean within ± X percent at a defined
probability level.
For a finite population the formula is:
$$n = \frac{1}{\left(\dfrac{AE}{t \cdot CV}\right)^2 + \dfrac{1}{N}}$$
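The percent-based versions can be sketched the same way, assuming
n = (t * CV / AE)^2 for an infinite population and
n = 1 / ((AE / (t * CV))^2 + 1/N) for a finite one, with CV and AE both
expressed in percent. The inputs (CV = 30%, AE = 10%, t = 2, N = 200) and
the function names are illustrative assumptions.

```python
import math

def sample_size_from_cv(t, cv, ae):
    """Infinite population: (t * CV / AE)^2, rounded up. CV and AE in percent."""
    return math.ceil((t * cv / ae) ** 2)

def sample_size_from_cv_finite(t, cv, ae, N):
    """Finite population of N units: 1 / ((AE / (t * CV))^2 + 1/N), rounded up."""
    return math.ceil(1.0 / ((ae / (t * cv)) ** 2 + 1.0 / N))

n_inf = sample_size_from_cv(2, 30.0, 10.0)
n_fin = sample_size_from_cv_finite(2, 30.0, 10.0, 200)

print(f"infinite population: {n_inf} observations")
print(f"finite population (N = 200): {n_fin} observations")
```

Working in percent means no volume units are needed: only the relative
variability and the relative allowable error enter the calculation.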
Effects of plot size on variability:
Small sample plots generally exhibit more variability than larger
plots. Large plots tend to average out the effect of clumping and
openings.
If the coefficient of variation (CV) has been estimated for plots
of a given size, we can approximate the CV for different sized
plots using the following formula:
$$(CV_2)^2 = (CV_1)^2 \sqrt{\frac{P_1}{P_2}}$$
Where,
CV2 = estimated CV for new plot size
CV1 = known CV for plots of known size
P1 = plot size used to estimate CV1
P2 = new plot size
Ex. - If the CV for 1/5-acre plots is 30%, the estimated CV for
1/10-acre plots would be:
$$CV_2 = \sqrt{30^2 \sqrt{\frac{0.2}{0.1}}} \approx 36\%$$
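The plot-size adjustment can be sketched as a short function, assuming the
relationship CV2^2 = CV1^2 * sqrt(P1 / P2). The function name is
illustrative; the inputs are the 1/5-acre and 1/10-acre plots from the
example.

```python
import math

def cv_for_new_plot_size(cv1, p1, p2):
    """Approximate CV for plots of size p2, given CV cv1 measured on plots of size p1."""
    return math.sqrt(cv1 ** 2 * math.sqrt(p1 / p2))

# CV of 30% on 1/5-acre plots, converted to 1/10-acre plots.
cv2 = cv_for_new_plot_size(30.0, 1 / 5, 1 / 10)
print(f"estimated CV for 1/10-acre plots: {cv2:.1f}%")
```

The result is about 35.7%, which rounds to 36%. Halving the plot size raises
the CV, consistent with smaller plots being more variable.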
You can then compare the number of plots needed for each plot
size.
Ex. - Assume the 1/5-acre plots produced a sample mean of 4 cd
per plot (20 cd per acre) and the sample standard deviation is 6
cd per acre (a CV of 30%, as in the previous example). We want to
estimate the total number of 1/5-acre plots needed to estimate
the mean volume per acre within ± 2 cd per acre at a 95%
probability level (approximating t = 2):
$$n = \left(\frac{2 \times 6}{2}\right)^2 = 36 \text{ plots}$$
Compared to 1/10-acre plots, which would have a standard
deviation of 7.2 cd per acre (0.36*20 = 7.2):
$$n = \left(\frac{2 \times 7.2}{2}\right)^2 = 51.84 \approx 52 \text{ plots}$$
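The comparison between the two plot sizes can be sketched as follows,
assuming n = (t * s / E)^2 with t approximated as 2 and the result rounded up
to a whole plot. The total-area figures at the end are an added illustration,
not part of the original example, and the function name is hypothetical.

```python
import math

def plots_needed(t, s, E):
    """Number of plots for an infinite population: (t * s / E)^2, rounded up."""
    return math.ceil((t * s / E) ** 2)

E = 2.0  # allowable error, cd per acre
t = 2    # approximate t value for a 95% probability level

n_fifth = plots_needed(t, 6.0, E)   # 1/5-acre plots, s = 6.0 cd per acre
n_tenth = plots_needed(t, 7.2, E)   # 1/10-acre plots, s = 7.2 cd per acre

# Total area that would have to be measured under each option.
area_fifth = n_fifth * (1 / 5)
area_tenth = n_tenth * (1 / 10)

print(f"1/5-acre plots:  {n_fifth} plots, {area_fifth:.1f} acres measured")
print(f"1/10-acre plots: {n_tenth} plots, {area_tenth:.1f} acres measured")
```

Under these assumptions the smaller plots require more plots (52 versus 36)
but less total measured area (5.2 versus 7.2 acres), so the better choice
depends on travel and setup time per plot as well as measurement time.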