Lecture 5 – Statistics and Sampling

Topics:
1. Statistics related to sampling.

Handouts/Readings:
1. Chapter 2 - sections 2.1-2.14 - Avery and Burkhart
2. Chapter 3 - sections 3.1-3.12 - Avery and Burkhart

Assignment: Complete problem set #3 before next Monday's (Feb. 11) lecture.

Notes:

Review of basic statistics:

The variance and standard deviation measure the dispersion of individual observations about their arithmetic mean.

Ex. - If the estimated standard deviation for n measurements of tree height is 1.2 cm, then we can expect about 2/3 of the n individual measurements to fall within ± 1.2 cm of the estimated mean.

Just as the standard deviation measures the variation of individual observations about their mean, the standard error measures the variation among sample means. It is essentially the standard deviation among means of samples of a fixed size n. We use the standard error to calculate confidence intervals and to determine the sample size needed for a specified sampling precision.

The standard error for an infinite population (or a population of unknown size) is:

$$ s_{\bar{y}} = \frac{s}{\sqrt{n}} $$

For a finite population (i.e., we know how many total individuals there are) it is:

$$ s_{\bar{y}} = \sqrt{\frac{s^2}{n}\left(\frac{N-n}{N}\right)} $$

Where,
s = the standard deviation
n = the number of sampling units in the sample
N = the number of sampling units in the population

The finite population correction, (N - n)/N, reduces the standard error.

Confidence intervals establish an interval that, at a specified probability level, would be expected to include the population mean. We use the sample mean, the standard error, and t values to establish these limits:

$$ \bar{y} \pm t \, s_{\bar{y}} $$

Ex. - A 95% confidence interval says that if the population were repeatedly sampled, 95% of all possible samples would produce confidence intervals that contain the population mean. If 20 samples were taken, only 1 of the 20 (P = 0.05) would be expected to produce an interval that does not contain the population mean.

Sampling designs:

A sampling design is the method used to select non-overlapping sampling units. There are many different kinds of sampling designs (overhead fig. 3-1 - Avery and Burkhart).
One of the most common is simple random sampling.

Simple random sampling - gives every possible combination of sampling units an equal and independent chance of being selected. Sampling units can be selected with or without replacement. Sampling with replacement allows a unit to appear in the sample as often as it is selected. Sampling without replacement allows each selected unit to appear only once.

Determining sampling intensity:

When planning an inventory we want to select enough sampling units to reach the desired precision, so that the sample is both statistically sound and practically efficient. We can calculate the required sampling intensity using a formula based on the relationship of the confidence limits on the mean (assuming an infinite population):

$$ n = \frac{t^2 s^2}{E^2} $$

Where,
n = the number of sampling units
t = the critical value from the t-distribution table
s = the standard deviation
E = the desired half-width of the confidence interval

The standard deviation, s, can be estimated by (a) obtaining a small preliminary sample of the population or (b) using information obtained from previous sampling of the same or a similar population.

Ex. - Suppose we have conducted a preliminary inventory of 25 plots to estimate volume per acre of a timber stand. From that sample we estimated a mean of 4,400 bd ft per acre and a standard deviation of 2,000 bd ft per acre. We want to determine the sampling intensity needed to be within ± 500 bd ft per acre at a confidence level of 95%. Using t = 2.064 (24 degrees of freedom):

$$ n = \frac{(2.064)^2 (2000)^2}{(500)^2} = 68.2, \text{ or 69 plots} $$
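The arithmetic for the timber-stand example can be checked with a short script. This is only a sketch: the function names are illustrative, the t value of 2.064 (two-sided 95%, 24 degrees of freedom) is read from a standard t table rather than computed, and the infinite-population formulas from these notes are assumed. It also reports the standard error and confidence-interval half-width of the 25-plot preliminary sample for comparison with the ± 500 bd ft target.

```python
import math

# Two-sided 95% t value for 24 degrees of freedom (25 preliminary plots),
# taken from a standard t table; an assumption of this sketch.
T_95_DF24 = 2.064


def standard_error(s, n):
    """Standard error of the mean for an infinite population: s / sqrt(n)."""
    return s / math.sqrt(n)


def plots_needed(t, s, e):
    """Sampling units needed for half-width e: n = t^2 s^2 / E^2, rounded up."""
    return math.ceil((t * s / e) ** 2)


# Values from the preliminary inventory in the example (bd ft per acre).
mean, s, n_prelim = 4400.0, 2000.0, 25

se = standard_error(s, n_prelim)            # 2000 / 5
half_width = T_95_DF24 * se                 # half-width achieved by 25 plots
n_required = plots_needed(T_95_DF24, s, 500.0)

print(f"standard error of preliminary sample: {se:.1f}")
print(f"95% CI of preliminary sample: {mean:.0f} +/- {half_width:.1f}")
print(f"plots needed for +/- 500 bd ft per acre: {n_required}")
```

The preliminary sample alone gives a half-width of roughly 826 bd ft per acre, which is why more plots are needed to reach ± 500. Because t itself depends on the sample size, the result is a first approximation.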
When sampling from a finite population, the sample-size formula is:

$$ n = \frac{N t^2 s^2}{N E^2 + t^2 s^2} $$

Where,
n = the number of sampling units required
N = the number of sampling units in the population
t = the critical value from the t-distribution table
s = the standard deviation
E = the desired half-width of the confidence interval

If we know the allowable error and have an estimate of the coefficient of variation (CV), the required sample size (for an infinite population) is:

$$ n = \left(\frac{t \cdot CV}{AE}\right)^2 $$

Where,
AE = the allowable error, in percent
t = the critical value from the t-distribution table
CV = the coefficient of variation, in percent

This allows you to estimate the number of observations needed to estimate a population mean within ± X percent at a defined probability level. For a finite population the formula is:

$$ n = \frac{N (t \cdot CV)^2}{N \cdot AE^2 + (t \cdot CV)^2} $$

Effect of plot size on variability:

Small sample plots generally exhibit more variability than larger plots. Large plots tend to average out the effect of clumping and openings. If the coefficient of variation (CV) has been estimated for plots of a given size, we can approximate the CV for different sized plots using the following formula:

$$ CV_2^2 = CV_1^2 \sqrt{\frac{P_1}{P_2}} $$

Where,
CV2 = estimated CV for new plot size
CV1 = known CV for plots of known size
P1 = plot size used to estimate CV1
P2 = new plot size

Ex. - If the CV for 1/5-acre plots is 30%, the estimated CV for 1/10-acre plots would be:

$$ CV_2 = \sqrt{(30)^2 \sqrt{\frac{0.2}{0.1}}} = \sqrt{1272.8} \approx 36\% $$

You can then compare the number of plots needed for each plot size.

Ex. - Assume the 1/5-acre plots produced a sample mean of 4 cd per plot (20 cd per acre) and a sample standard deviation of 6 cd per acre (30% of the mean, using the CV from the previous example). We want to estimate the total number of 1/5-acre plots needed to estimate the mean volume per acre within ± 2 cd per acre at a 95% probability level (approximating t = 2):

$$ n = \frac{(2)^2 (6)^2}{(2)^2} = 36 \text{ plots} $$

Compare this to 1/10-acre plots, which would have a standard deviation of 7.2 cd per acre (0.36*20 = 7.2):

$$ n = \frac{(2)^2 (7.2)^2}{(2)^2} = 51.8, \text{ or 52 plots} $$
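The plot-size comparison can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are illustrative, the CV relationship is taken as CV2^2 = CV1^2 * sqrt(P1/P2) (the form that yields the 36% used in these notes), the converted CV is rounded to a whole percent as the notes do, and t is approximated as 2.

```python
import math


def cv_for_plot_size(cv1, p1, p2):
    """Approximate CV (percent) for plot size p2, given cv1 at plot size p1.

    Assumes CV2^2 = CV1^2 * sqrt(P1 / P2).
    """
    return math.sqrt(cv1 ** 2 * math.sqrt(p1 / p2))


def plots_needed(t, s, e):
    """Sampling units needed for half-width e: n = t^2 s^2 / E^2, rounded up."""
    return math.ceil((t * s / e) ** 2)


mean_per_acre = 20.0   # cd per acre, from the 1/5-acre plots
cv_fifth = 30.0        # percent, 1/5-acre plots

# Convert the CV to 1/10-acre plots and round to a whole percent,
# following the rounding used in the notes.
cv_tenth = round(cv_for_plot_size(cv_fifth, 0.2, 0.1))

# Standard deviations in cd per acre for each plot size.
s_fifth = cv_fifth / 100 * mean_per_acre
s_tenth = cv_tenth / 100 * mean_per_acre

# Plots needed to be within +/- 2 cd per acre, with t approximated as 2.
n_fifth = plots_needed(2.0, s_fifth, 2.0)
n_tenth = plots_needed(2.0, s_tenth, 2.0)

print(f"estimated CV for 1/10-acre plots: {cv_tenth}%")
print(f"1/5-acre plots needed: {n_fifth}")
print(f"1/10-acre plots needed: {n_tenth}")
```

Under these assumptions the script reports 36 plots of 1/5 acre and 52 plots of 1/10 acre.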