Lecture 4 - Probability and Statistics

Topics:
1. Descriptive Statistics.

Handouts/Readings:
1. Notes - can be found on the course web page.

Assignment: Complete problem set #2 before next Monday's (Feb. 4) lecture.

Notes:

Review of terminology:

Population - the entire collection of elements in a given area and time period. There are no limits to the size of a population. Size is determined entirely by the collection of elements about which information is desired.

Sample - the portion of the population that is measured. It is used to make inferences about the population.

Sampling units - the units in which the population is defined and that are available to be selected in the sample.

Sampling frame - a list of all the sampling units in the population from which the sample will be selected.

Parameter - a population characteristic. It is the value that would be obtained for the characteristic of interest if every object in the population were measured.

Sample estimate - the value of the parameter as estimated from a sample of the population.

Sample estimator - the mathematical formula used to calculate the sample estimate.

Descriptive statistics:

Mean - the arithmetic average of a series of values or observations. We can calculate two different means:

(a) the population mean:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

Where:
$\mu$ = population mean
$y_i$ = value of the ith population unit
$N$ = total number of all sampling units in the population

And, (b) a sample estimator of the population mean:

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Where:
$\bar{y}$ = sample mean
$y_i$ = value of the ith sampling unit
$n$ = total number of sampling units in a particular sample

**Note: we rarely calculate a population parameter, because of a lack of time and/or economic resources. Instead we sample a population and calculate an estimate of the parameter. In addition, sampling may actually result in fewer systematic errors than complete enumeration (i.e., a 100% inventory).
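As a minimal sketch of the two mean formulas above, the Python snippet below computes a population mean and a sample mean. The diameter values and the names `population`, `sample`, and `mean` are hypothetical and chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    if len(values) == 0:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical population: diameters (inches) of all N = 8 trees in a small stand.
population = [10.0, 12.0, 9.0, 11.0, 14.0, 8.0, 13.0, 11.0]

# Hypothetical sample: the n = 4 trees that were actually measured.
sample = [10.0, 9.0, 14.0, 13.0]

population_mean = mean(population)  # the parameter (mu)
sample_mean = mean(sample)          # the sample estimate (y-bar)

print(f"population mean = {population_mean}")
print(f"sample mean     = {sample_mean}")
```

The same function serves both cases; what differs is whether it is applied to every unit in the population or only to the units selected in the sample.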
Variance - describes the variability of individuals in a population with respect to a particular characteristic. For example, not all tree diameters or heights will be the same. Just as a population has a mean, it also has a variance. The sample variance is calculated as:

$$s_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$

Where:
$y_i$ = value of the ith sampling unit
$\bar{y}$ = sample mean
$n$ = sample size

**Note: the units associated with the variance are square units, for example square inches. To return the units to inches we take the square root of the variance:

$$s_y = \sqrt{s_y^2}$$

This is called the standard deviation. It is a measure of how much the unit values of the sample vary from the mean. The standard deviation estimates the dispersion of the unit values in a population about the true mean.

**Note: it takes a sample of at least size n = 2 to calculate the variance and standard deviation.

Standard error - the standard deviation of the means, obtained by taking the square root of the variance of the means. Both estimators provide a measure of the variation among sample means. The variance of the means is calculated as:

$$s_{\bar{y}}^2 = \frac{\sum_{i=1}^{m} (\bar{y}_i - \bar{\bar{y}})^2}{m - 1}$$

Where:
$\bar{y}_i$ = sample estimate of the mean for sample i
$\bar{\bar{y}}$ = mean of the m sample estimates
$m$ = total number of sample estimates
$s_{\bar{y}}^2$ = the variance of the means

The standard error is then calculated as:

$$s_{\bar{y}} = \sqrt{s_{\bar{y}}^2}$$

**Note: the concept of standard error makes sense only in repeated sampling, since it is the standard deviation of different sample estimates. It takes a sample size of at least n = 2 to calculate the variance and standard deviation, and likewise it takes at least 2 sample means to calculate a standard error. The problem is that in most inventory situations only one sample will be taken. However, the Central Limit Theorem (CLT) allows the variance of the means and the standard error to be estimated from only one sample.
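The calculations above can be sketched in a few lines of Python. The diameters and the four repeated-sample means are hypothetical values made up for illustration, and the function names are not from the notes.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance with the n - 1 divisor; needs at least two values."""
    n = len(values)
    if n < 2:
        raise ValueError("variance requires at least two values")
    y_bar = mean(values)
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)


# Hypothetical tree diameters (inches) from one sample of n = 5 trees.
diameters = [8.0, 10.0, 12.0, 14.0, 16.0]
variance = sample_variance(diameters)   # units: square inches
std_dev = math.sqrt(variance)           # units: inches

# Hypothetical means from m = 4 repeated samples of the same population.
sample_means = [11.0, 12.0, 13.0, 12.0]
variance_of_means = sample_variance(sample_means)
standard_error = math.sqrt(variance_of_means)

print(f"variance = {variance:.2f} in^2, standard deviation = {std_dev:.2f} in")
print(f"variance of means = {variance_of_means:.3f}, standard error = {standard_error:.3f}")
```

The variance of the means reuses `sample_variance` because it is the same formula applied to sample means instead of individual unit values, with m in place of n.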
The variance of the means can be estimated by:

$$s_{\bar{y}}^2 = \frac{s_y^2}{n}$$

The standard error can be estimated by:

$$s_{\bar{y}} = \sqrt{\frac{s_y^2}{n}} = \frac{s_y}{\sqrt{n}}$$

Where:
$s_y^2$ = sample variance
$s_y$ = sample standard deviation
$n$ = sample size

**Note: given a normal distribution, approximately 68% of the sample means lie within 1 standard error of the true mean, 95% within 2 standard errors, and 99% within 2.6 standard errors.

If it is known that 95% of sample means lie within 2 standard errors of the true mean, and if the standard error can be estimated from a sample, then the interval in which the true mean should lie can be quantified. This interval is called a 95% confidence interval. For sample size n > 30, confidence intervals can be constructed as follows:

$$\bar{y} \pm Z \cdot s_{\bar{y}}$$

Where:
$\bar{y}$ = sample mean
$Z$ = a value from the normal distribution
$s_{\bar{y}}$ = standard error

**Note: values for Z vary depending on the probability level. Z = 1.96 for 95% confidence limits (you can use a value of 2 for quick approximations). For small samples (n <= 30), the t distribution and a t value are used instead, because the normal approximation provided by the CLT is not reliable for small samples.

A 95% confidence interval means that if a population is repeatedly sampled, 95% of all possible random samples will produce 95% confidence intervals that contain the true population mean. The other 5% will produce confidence intervals that do not contain the true population mean.

Allowable error - the amount of error that can be tolerated in an inventory estimate on which rational decisions are to be based. It is calculated differently for different sampling methods. For simple random sampling it is calculated as a percentage of the sample mean:

$$AE = \frac{B_M}{\bar{y}} \times 100$$

Where:
$B_M$ = the upper bound
$\bar{y}$ = sample mean

When we do inventories we are also interested in estimating totals.

Ex. - An inventory of a 55-acre white spruce stand was carried out by taking a random sample of 50 1/5-acre sample plots. The merchantable cubic foot volume of sawtimber on each 1/5-acre plot was determined. The average volume per sample plot and its associated standard error are:

$\bar{y}$ = 510 ft3/sampling unit
$s_{\bar{y}}$ = 100 ft3/sampling unit

1. How many 1/5-acre units are there in the tract?
5 units/acre * 55 acres = 275 units

2. What is the estimate of total cubic foot volume of sawtimber on this tract?
T = 510 ft3/unit * 275 units = 140,250 ft3

3. What is the standard error of the total cubic foot volume of sawtimber on this tract?
ST = 100 ft3/unit * 275 units = 27,500 ft3

4. What is the approximate 95% confidence interval for the total cubic foot volume of sawtimber on this tract?
140,250 ± 2(27,500) = 85,250 ft3 to 195,250 ft3

Coefficient of variation - the ratio of the standard deviation of the sampling units ($s_y$) to the mean, expressed as a percentage. The CV puts variability on a relative basis and allows the variation in 2 or more populations to be compared. It is calculated as:

$$CV = \frac{s_y}{\bar{y}} \times 100$$

Ex. - Compare the variability in tree heights in a 20-year-old natural stand with those in a 4-year-old plantation.
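The white spruce example and the coefficient of variation can be checked with a short Python sketch. The function names are illustrative, not from the notes. The notes give no data for the tree height comparison, so the means and standard deviations used for the two stands below are hypothetical.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return (lower, upper) limits: estimate +/- z * standard_error."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * std_dev / mean


# Values taken from the white spruce example.
mean_per_unit = 510   # ft^3 per 1/5-acre sampling unit
se_per_unit = 100     # ft^3 per sampling unit (standard error of the mean)
units_per_acre = 5
acres = 55

n_units = units_per_acre * acres      # number of 1/5-acre units in the tract
total = mean_per_unit * n_units       # estimated total volume, ft^3
se_total = se_per_unit * n_units      # standard error of the total, ft^3
lower, upper = confidence_interval(total, se_total, z=2)  # quick 95% limits

print(f"units = {n_units}, total = {total} ft^3, SE of total = {se_total} ft^3")
print(f"approximate 95% CI: {lower} to {upper} ft^3")

# HYPOTHETICAL height data (feet): (standard deviation, mean) for each stand.
natural_stand = (8.0, 40.0)
plantation = (1.5, 5.0)

cv_natural = coefficient_of_variation(*natural_stand)
cv_plantation = coefficient_of_variation(*plantation)
print(f"CV natural stand = {cv_natural:.1f}%, CV plantation = {cv_plantation:.1f}%")
```

With these made-up numbers the plantation has the smaller standard deviation in feet but the larger CV, which is the kind of comparison the CV is meant to make possible when the means differ widely.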