Lecture 4 – Probability and Statistics
Topics:
1. Descriptive Statistics.
Handouts/Readings:
1. Notes - can be found on the course web page.
Assignment:
Complete problem set #2 before next Monday's (Feb. 4) lecture.
Notes:
Review of terminology:
Population - is the complete set of elements in a given area and
time period. There are no limits to the size of a population. Size
is determined entirely by the collection of elements about which
information is desired.
Sample - is the portion of the population that is measured. It is
used to make inferences about the population.
Sampling units – the units in which the population is defined and
that are available to be selected in the sample.
Sampling Frame – is a list of all the sampling units in the
population from which the sample will be selected.
Parameter - is a population characteristic. It is the value that
would be obtained for the characteristic of the population of
interest if every object in the population were measured.
Sample estimate - is the value of the parameter as estimated
from the population sample.
Sample estimator – is a mathematical formula that is used to
calculate the sample estimate.
Descriptive statistics:
Mean – is the arithmetic average of a series of values or
observations. We can calculate two different means: (a) the
population mean:
$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$
Where:
$\mu$ = population mean
$y_i$ = value of the ith population unit
N = total number of all sampling units in the population
And, (b) a sample estimator of the population mean:
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
Where:
$\bar{y}$ = sample mean
$y_i$ = value of the ith sampling unit
n = total number of all sampling units in a particular
sample
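The two means can be illustrated with a short Python sketch. The tree diameters below are made-up values for a hypothetical population of 12 trees, and the sample size of 4 is an arbitrary choice for illustration.

```python
import random

# Hypothetical population: diameters (inches) of N = 12 trees (made-up values).
population = [8.2, 9.1, 10.4, 7.8, 11.0, 9.6, 8.9, 10.1, 9.3, 8.5, 10.8, 9.7]

# Population mean: sum over all N units divided by N.
mu = sum(population) / len(population)

# Sample mean: sum over the n sampled units divided by n.
rng = random.Random(1)               # fixed seed so the draw is repeatable
sample = rng.sample(population, 4)   # simple random sample without replacement, n = 4
y_bar = sum(sample) / len(sample)

print(f"population mean = {mu:.3f}")
print(f"sample mean     = {y_bar:.3f}")
```

The sample mean will generally differ from the population mean; how much it differs from sample to sample is what the variance and standard error quantify.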
**Note: we rarely calculate a population parameter due to a lack
of time and/or economic resources. Instead we sample a
population and calculate an estimate of the parameter. In
addition, sampling may actually result in fewer systematic errors
than complete enumeration (i.e., a 100% inventory).
Variance – measures the variability of the individual units of a population
with respect to a particular characteristic. For example, not all tree
diameters or heights will be the same. Just as a population has a
mean, it will also have a variance. The sample variance is
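calculated as:
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$
Where:
$s^2$ = sample variance
$y_i$ = value of the ith sampling unit
$\bar{y}$ = sample mean
n = sample size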
**Note: the units associated with the variance are square
units. For example, square inches. To return the units to inches
we take the square root of the variance:
$$s = \sqrt{s^2}$$
This is called the standard deviation. It is a measure of how
much the unit values of the sample vary from the mean. The
standard deviation estimates the dispersion of the unit values in a
population about the true mean.
**Note: it takes a sample of at least size n = 2 to calculate the
variance and standard deviation.
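As a minimal sketch, the sample variance (with the n - 1 divisor) and the standard deviation can be computed as follows. The function names and the five diameters are illustrative assumptions, not part of the course material.

```python
import math

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        # With n = 1 the divisor n - 1 is zero, so the variance is undefined.
        raise ValueError("need at least n = 2 observations")
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)

def sample_std_dev(values):
    """Standard deviation: square root of the sample variance (original units)."""
    return math.sqrt(sample_variance(values))

# Hypothetical tree diameters in inches (made-up values).
diameters = [8.0, 10.0, 12.0, 9.0, 11.0]
s2 = sample_variance(diameters)   # square inches
s = sample_std_dev(diameters)     # inches
print(f"variance = {s2:.3f} in^2, standard deviation = {s:.3f} in")
```

Python's standard library offers the same calculations as `statistics.variance` and `statistics.stdev`; the explicit version is shown only to mirror the formula.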
Standard error – is the standard deviation of the sample means and is
obtained by taking the square root of the variance of the means.
Both estimators provide a measure of the variation among sample
means. The variance of the means is calculated as:
$$s_{\bar{y}}^2 = \frac{\sum_{i=1}^{m} (\bar{y}_i - \bar{\bar{y}})^2}{m - 1}$$
Where:
$\bar{y}_i$ = sample estimate of the mean for sample i
$\bar{\bar{y}}$ = mean of the m sample estimates
m = total number of sample estimates
$s_{\bar{y}}^2$ = the variance of the means
The standard error is then calculated as:
$$s_{\bar{y}} = \sqrt{s_{\bar{y}}^2}$$
**Note: the concept of standard error makes sense only in
repeated sampling since it is the standard deviation of different
sample estimates. It takes a sample size of at least n = 2 to
calculate the variance and standard deviation, and likewise it
takes at least 2 sample means to calculate a standard error.
The problem is that in most inventory situations only one sample is
taken. However, the Central Limit Theorem (CLT) allows a
variance of the means and standard error to be estimated from
only one sample. The variance of the means can be estimated by:
$$s_{\bar{y}}^2 = \frac{s^2}{n}$$
Where:
$s^2$ = sample variance
n = sample size
The standard error can be estimated by:
$$s_{\bar{y}} = \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}}$$
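The following sketch compares the two routes to a standard error: (a) the standard deviation of many repeated sample means, and (b) the single-sample estimate, the sample standard deviation divided by the square root of n. The simulated population (mean 500, standard deviation 100), the sample size, and the number of repetitions are all assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable

# Hypothetical population of 10,000 plot volumes (simulated, not real data).
population = [rng.gauss(500, 100) for _ in range(10_000)]

n = 50      # sampling units per sample
m = 2_000   # number of repeated samples

# (a) Repeated sampling: standard deviation of the m sample means.
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(m)]
se_repeated = statistics.stdev(sample_means)

# (b) One sample only: sample standard deviation divided by sqrt(n).
one_sample = rng.sample(population, n)
se_single = statistics.stdev(one_sample) / math.sqrt(n)

print(f"standard error from {m} repeated samples: {se_repeated:.2f}")
print(f"standard error estimated from one sample: {se_single:.2f}")
```

The two values should be of similar size, although the single-sample estimate varies from one sample to the next because it depends on that sample's standard deviation.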
**Note: given a normal distribution - approximately 68% of the
sample means lie within 1 standard error of the true mean, 95%
within 2 standard errors, and 99% within 2.6 standard errors.
If it is known that 95% of sample means lie within 2 standard
errors of the true mean and if the standard error can be
estimated from a sample, then the interval in which the true
mean should lie can be quantified. This interval is called a 95%
confidence interval. For sample size n > 30, confidence intervals
can be constructed as follows:
$$\bar{y} \pm Z \cdot s_{\bar{y}}$$
Where:
$\bar{y}$ = sample mean
Z = a value from the normal distribution
$s_{\bar{y}}$ = standard error
**Note: values for Z vary depending on the probability level. Z =
1.96 (you can use a value of 2 for quick approximations) for 95%
confidence limits. For small samples (n <= 30), a t value from the t
distribution with n - 1 degrees of freedom is used in place of Z, because
the normal approximation provided by the CLT is less reliable for small samples.
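A large-sample confidence interval of this kind might be computed as in the sketch below. The helper names `z_value` and `confidence_interval` and the 42 plot volumes are hypothetical, and the standard error is taken as the sample standard deviation divided by the square root of n.

```python
import math
import statistics

def z_value(level=0.95):
    """Two-sided Z value for a confidence level, e.g. 0.95 gives about 1.96."""
    return statistics.NormalDist().inv_cdf(0.5 + level / 2)

def confidence_interval(values, level=0.95):
    """Large-sample (n > 30) interval: sample mean +/- Z * standard error."""
    n = len(values)
    y_bar = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    half_width = z_value(level) * std_err
    return y_bar - half_width, y_bar + half_width

# Hypothetical plot volumes (made-up values), n = 42 so the Z-based form applies.
volumes = [400 + 10 * (i % 21) for i in range(42)]
lower, upper = confidence_interval(volumes)
print(f"95% confidence interval: {lower:.1f} to {upper:.1f}")
```

For n <= 30 the same structure applies, but the Z value would be replaced by a t value with n - 1 degrees of freedom; the Python standard library does not provide t quantiles, so that case is not shown here.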
A 95% confidence interval means that if a population is
repeatedly sampled, 95% of all possible random samples will
produce 95% confidence intervals that contain the true
population mean. The other 5% will produce confidence intervals
that do not contain the true population mean.
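The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch: the population is assumed normal with a known mean of 500 and standard deviation of 100, and the sample size and number of trials are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(7)   # fixed seed so the run is repeatable

TRUE_MEAN = 500.0   # assumed population mean (known only because we simulate)
TRUE_SD = 100.0     # assumed population standard deviation
n = 50              # sampling units per sample
trials = 2_000      # number of repeated samples

covered = 0
for _ in range(trials):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    y_bar = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Does this sample's 95% interval contain the true mean?
    if y_bar - 1.96 * std_err <= TRUE_MEAN <= y_bar + 1.96 * std_err:
        covered += 1

coverage = covered / trials
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should be close to 0.95. It tends to fall slightly below that figure because Z = 1.96 is used with an estimated standard error rather than a t value.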
Allowable error – the amount of error that can be tolerated in
an inventory estimate on which rational decisions are to be
based. It is calculated differently for different sampling
methods. For a simple random sampling method it is calculated
as:
Where:
BM = the upper bound
Y = sample mean
When we do inventories we are also interested in estimating
totals.
Ex. – An inventory of a 55-acre white spruce stand was carried
out by taking a random sample of 50 1/5-acre sample plots. The
merchantable cubic foot volume of sawtimber on each 1/5-acre
plot was determined. The average volume per sample plot and its
associated standard error are:
$\bar{y}$ = 510 ft³/sampling unit
$s_{\bar{y}}$ = 100 ft³/sampling unit
1. How many 1/5-acre units are there in the tract?
5 units/acre * 55 acres = 275 units
2. What is the estimate of total cubic foot volume of
sawtimber on this tract?
T = 510 ft³/unit * 275 units = 140,250 ft³
3. What is the standard error of the total cubic foot volume
of sawtimber on this tract?
ST = 100 ft³/unit * 275 units = 27,500 ft³
4. What is the approximate 95% confidence interval for the
total cubic foot volume of sawtimber on this tract?
140,250 ± 2(27,500) = 85,250 ft³ to 195,250 ft³
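The arithmetic of this example can be reproduced with a short sketch. The function name `total_with_interval` and its parameter names are hypothetical; the inputs are the figures given in the example, with Z approximated by 2.

```python
def total_with_interval(mean_per_unit, se_per_unit, units_per_acre, acres, z=2):
    """Scale a per-unit mean and standard error up to a tract total."""
    n_units = units_per_acre * acres        # number of sampling units in the tract
    total = mean_per_unit * n_units         # estimated total volume
    se_total = se_per_unit * n_units        # standard error of the total
    interval = (total - z * se_total, total + z * se_total)
    return n_units, total, se_total, interval

# Values from the white spruce example: 510 and 100 ft^3 per 1/5-acre unit,
# 5 units per acre, 55 acres.
n_units, total, se_total, (lower, upper) = total_with_interval(510, 100, 5, 55)
print(f"units in tract:       {n_units}")
print(f"estimated total:      {total:,} ft^3")
print(f"standard error:       {se_total:,} ft^3")
print(f"approximate 95% CI:   {lower:,} to {upper:,} ft^3")
```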
Coefficient of variation – is the ratio of the standard deviation of
the sampling units (s) to the mean, expressed as a percentage.
The CV puts variability on a relative basis and allows for the
comparison of variation between 2 or more populations. It is
calculated as:
$$CV = \frac{s}{\bar{y}} \times 100$$
Ex. – Compare the variability in tree heights in a 20-year-old
natural stand with those in a 4-year-old plantation.
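A comparison of this kind could be sketched as follows. The tree heights below are made-up values chosen only to illustrate the calculation; they are not data from the course, and the function name is an assumption.

```python
import statistics

def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / sample mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical tree heights in feet (made-up values, not course data).
natural_stand = [28, 35, 22, 40, 31, 25]         # 20-year-old natural stand
plantation = [6.0, 6.5, 5.8, 6.2, 6.4, 6.1]      # 4-year-old plantation

cv_natural = coefficient_of_variation(natural_stand)
cv_plantation = coefficient_of_variation(plantation)
print(f"natural stand CV: {cv_natural:.1f}%")
print(f"plantation CV:    {cv_plantation:.1f}%")
```

Because the CV is relative to the mean, the two stands can be compared directly even though their average heights differ greatly.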