Sampling
Sampling: Experiment
Show that the distribution of averages from a non-normal distribution tends to
follow a normal distribution.
Obtain 20 or so uniform random numbers, sum them, and compute the average.
Compute 100,000 of these averages.
Plot a histogram of the averages.
What do you see?
What you observed is an effect based on the Central Limit Theorem.
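The experiment can be sketched in a few lines of Python. This is only one possible version, using the standard library: the seed, the exact count of 20 numbers per average, the choice of Uniform(0, 1), and the text histogram (in place of a plotted one) are all assumptions made for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable (an arbitrary choice)

N_PER_AVERAGE = 20      # "20 or so" uniform random numbers per average
N_AVERAGES = 100_000    # number of averages to compute

# Each average is the mean of N_PER_AVERAGE draws from Uniform(0, 1).
averages = [
    sum(random.random() for _ in range(N_PER_AVERAGE)) / N_PER_AVERAGE
    for _ in range(N_AVERAGES)
]

# Bin the averages into a coarse text histogram over [0, 1].
N_BINS = 20
counts = [0] * N_BINS
for a in averages:
    counts[min(int(a * N_BINS), N_BINS - 1)] += 1

for i, c in enumerate(counts):
    left_edge = i / N_BINS
    print(f"{left_edge:4.2f} {'#' * (c // 500)}")

# Summary statistics of the averages.
mean_of_averages = sum(averages) / N_AVERAGES
sd_of_averages = math.sqrt(
    sum((a - mean_of_averages) ** 2 for a in averages) / N_AVERAGES
)
print("mean of averages:", round(mean_of_averages, 3))
print("sd of averages:  ", round(sd_of_averages, 3))
```

Although each individual number is uniformly distributed, the histogram of the averages should come out bell shaped and centred near 0.5.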
Central Limit Theorem (CLT)
The Central Limit Theorem is a statement about the characteristics of the sampling distribution of means
of random samples from a given population.
The Central Limit Theorem consists of three statements:
[1] The mean of the sampling distribution of means is equal to the mean of the population from
which the samples were drawn.
[2] The variance of the sampling distribution of means is equal to the variance of the population from
which the samples were drawn divided by the size of the samples.
[3] If the original population is distributed normally (i.e. it is bell shaped), the sampling distribution of
means will also be normal. If the original population is not normally distributed, the sampling distribution
of means will increasingly approximate a normal distribution as sample size increases (i.e. when increasingly
large samples are drawn).
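Statements [1] and [2] can be checked numerically. The sketch below assumes a deliberately skewed population, an exponential distribution with mean 1 and variance 1, and compares the mean and variance of many sample means with the predicted values. The function names, sample sizes, and number of repetitions are illustrative choices, and statement [3] (the shape of the distribution) is not tested here.

```python
import random

random.seed(1)  # arbitrary seed for repeatability

POP_MEAN = 1.0   # Exponential(rate=1) has mean 1
POP_VAR = 1.0    # ... and variance 1

def sample_means(sample_size, n_samples=20_000):
    """Return the means of n_samples random samples of the given size."""
    return [
        sum(random.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(n_samples)
    ]

def mean_and_variance(values):
    """Return the mean and (population-style) variance of a list of numbers."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

results = {}
for n in (5, 25, 100):
    m, v = mean_and_variance(sample_means(n))
    results[n] = (m, v)
    print(f"N={n:3d}  mean of means={m:.3f} (population mean {POP_MEAN})  "
          f"variance of means={v:.4f} (population variance / N = {POP_VAR / n:.4f})")
```

For each sample size, the mean of the sample means should sit close to 1, and their variance close to 1/N.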
Standard Error (SE)
[Figure: six samples drawn from a population, labelled with their averages $\bar{x}_1$ to $\bar{x}_6$]
Compute the average weight ($\bar{x}_i$) of the rabbits in each sample: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \bar{x}_6, \dots$
1. Because of the central limit theorem, the averages are approximately normally distributed.
2. The mean of the averages converges to the mean of the population.
3. The standard deviation of the averages is called the standard error.
4. $SE = \sigma_x / \sqrt{N}$, where $\sigma_x$ is the population standard deviation and $N$ is the sample size.
Central Limit Theorem (CLT)
The normal distribution arises repeatedly in biology.
Gauss and Laplace noticed that measurement errors tend to follow a normal distribution.
Quetelet and Galton observed that the normal distribution fits data on the heights and weights of
human and animal populations. This holds true for many other characters as well.
The formula for the SE can be derived using arguments based on error
propagation, which we will return to later.
Standard Error
Question:
Assume you sample from a normal distribution. Will the standard error be smaller,
the same or larger than the population standard deviation?
Standard Error
The standard error of the mean (SE of the mean) estimates the variability between
sample means that you would obtain if you took multiple samples from
the same population.
The standard error of the mean estimates the variability between samples.
The standard deviation measures the variability within a single sample.
This is much more interesting.
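The difference between the two quantities shows up when the sample size changes. The sketch below assumes a hypothetical normal population (mean 50, standard deviation 10, both invented for illustration) and draws one sample at each of several sizes. The standard deviation within the sample should stay near 10, while the estimated standard error shrinks as the sample grows.

```python
import math
import random
import statistics

random.seed(2)  # arbitrary seed for repeatability

MU, SIGMA = 50.0, 10.0  # hypothetical population mean and standard deviation

sd_by_n = {}
se_by_n = {}
for n in (10, 100, 1000, 10000):
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    sd = statistics.stdev(sample)   # variability within this one sample
    se = sd / math.sqrt(n)          # estimated variability between sample means
    sd_by_n[n], se_by_n[n] = sd, se
    print(f"N={n:5d}  SD={sd:6.2f}  SE={se:6.3f}")
```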
Standard Error
The standard error equation also states that if we want to reduce the error of our
estimate of a mean value by a factor of ten, we have to gather one hundred times
more data.
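A quick calculation illustrates the factor of ten. The standard deviation and sample size below are arbitrary illustrative values, and `standard_error` is a helper name introduced only for this sketch.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for standard deviation sigma and sample size n."""
    return sigma / math.sqrt(n)

sigma = 2.0   # illustrative standard deviation (arbitrary units)
n = 16        # illustrative starting sample size

se_now = standard_error(sigma, n)               # 2.0 / 4  = 0.5
se_more_data = standard_error(sigma, 100 * n)   # 2.0 / 40 = 0.05

print(se_now, se_more_data, se_now / se_more_data)
```

Because the sample size sits under a square root, multiplying it by 100 divides the standard error by only 10.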
Standard Error
Example
Three measurements of gene expression yield the values 1.34, 3.23, and 2.11
Find the mean and standard deviation of these values.
Find the standard deviation of the mean.
Note that the three values represent a sample of expression values.
Since we don't have the population standard deviation we must use the
sample standard deviation to estimate the standard error.
The larger N, the better the approximation.
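One way to work through this example is with Python's standard `statistics` module. The sketch assumes the sample standard deviation (dividing by N - 1) is the intended estimate, as the note above suggests.

```python
import math
import statistics

values = [1.34, 3.23, 2.11]   # the three expression measurements

mean = statistics.mean(values)
sd = statistics.stdev(values)        # sample standard deviation (divides by N - 1)
se = sd / math.sqrt(len(values))     # estimated standard error of the mean

print(f"mean = {mean:.2f}, sd = {sd:.2f}, se = {se:.2f}")
# prints: mean = 2.23, sd = 0.95, se = 0.55
```

With only three measurements, the sample standard deviation is itself a rough estimate, so the resulting standard error should be read with caution.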
Standard Error
The cholesterol levels in a sample of 25 men aged 55 to 60 were found to have a mean of
5.2 mmol/L. The standard deviation of the sample was 0.6 mmol/L.
Compute the standard error of the mean from the sample.
$SE \approx 0.6 / \sqrt{25} = 0.12$ mmol/L
What does this mean?
Form up in pairs. Each pair writes a sentence that explains what this means.
What should you use in your graphs when plotting data?
Common ways to describe variability in data