Last lecture summary
• The nature of the normal distribution
• Non-Gaussian distributions
New stuff
Lognormal distribution
• Frazier et al. measured the ability of the drug isoprenaline
to relax the bladder muscle.
• The results are expressed as the EC50, which is the
concentration required to relax the bladder halfway
between its minimum and maximum possible relaxation.
Lognormal distribution
Geometric mean
Arithmetic mean: $\bar{x} = 1333\ \text{nM}$
Mean of the base-10 logarithms: $\overline{\log_{10} x} = 2.71$
Geometric mean: $10^{2.71} = 513\ \text{nM}$
Geometric mean: transform all values to their logarithms,
calculate the mean of the logarithms, then transform this mean
back to the units of the original data (antilog).
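The recipe above can be sketched in R. The EC50 values below are made up for illustration (the slide does not list the raw data of Frazier et al.), and `geometric_mean` is a hypothetical helper name, not a base R function.

```r
# Hypothetical EC50 values in nM (illustrative only, not the published data)
ec50 <- c(120, 250, 400, 520, 800, 1500, 6000)

# Geometric mean: mean of the base-10 logarithms, transformed back (antilog)
geometric_mean <- function(x) {
  stopifnot(all(x > 0))   # logarithms need strictly positive values
  10^mean(log10(x))
}

arithmetic <- mean(ec50)             # pulled upward by the large values
geometric  <- geometric_mean(ec50)   # less sensitive to the long right tail

cat("arithmetic mean:", round(arithmetic), "nM\n")
cat("geometric mean: ", round(geometric), "nM\n")

# The antilog step from the slide: 10^2.71 is about 513
slide_value <- round(10^2.71)
```

For right-skewed data such as these, the geometric mean comes out below the arithmetic mean, which is the pattern the slide shows (513 nM versus 1333 nM).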
The nature of the lognormal distribution
• Lognormal distributions arise when multiple random
factors are multiplied together to determine the value.
• A typical example: cancer (cell division is multiplicative).
• Lognormal distributions are very common in many
scientific fields.
• Drug potency is lognormal.
• To analyse lognormal data, do not use methods that
assume the Gaussian distribution. You will get misleading
results (e.g., outliers that do not really exist).
• A better way is to convert the data to logarithms and
analyse the converted values.
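A minimal simulation of the multiplicative mechanism, as a sketch only: the number of factors, their uniform range, and the `skewness` helper are assumptions chosen for illustration, not taken from the lecture.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n_values  <- 10000  # number of simulated observations (assumed)
n_factors <- 20     # random factors multiplied per observation (assumed)

# Each observation is the product of many independent positive random factors
factors <- matrix(runif(n_values * n_factors, min = 0.5, max = 1.5),
                  nrow = n_values)
x <- apply(factors, 1, prod)

# Simple moment-based skewness (hypothetical helper; 0 means symmetric)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw <- skewness(x)       # clearly positive: long right tail
skew_log <- skewness(log(x))  # close to zero: roughly symmetric

cat("skewness of raw values:", round(skew_raw, 2), "\n")
cat("skewness of log values:", round(skew_log, 2), "\n")
```

The raw products are strongly right-skewed, while their logarithms are roughly symmetric. That is why the logarithms, not the raw values, are the ones to feed into Gaussian-based methods.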
How normal is normal?
Checking normality
1. Eyeball histograms
2. Eyeball QQ plots
3. There are tests
http://www.nate-miller.org/blog/how-normal-is-normal-a-q-q-plot-approach
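The checks listed above could look like this in R. This is a sketch on simulated data: the sample sizes and distribution parameters are assumptions, `qq_cor` is a hypothetical helper, and Shapiro-Wilk is just one of the available tests.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

gaussian  <- rnorm(200, mean = 50, sd = 10)       # simulated normal data
lognormal <- rlnorm(200, meanlog = 3, sdlog = 1)  # simulated right-skewed data

# QQ plot coordinates: theoretical normal quantiles (x) against the data (y).
# Points close to a straight line suggest normality, so the correlation
# between x and y is near 1. Use qqnorm(v); qqline(v) to draw the plot,
# and hist(v) to eyeball the histogram.
qq_cor <- function(v) {
  qq <- qqnorm(v, plot.it = FALSE)
  cor(qq$x, qq$y)
}

# Shapiro-Wilk test: a small p-value is evidence against normality
p_gaussian  <- shapiro.test(gaussian)$p.value
p_lognormal <- shapiro.test(lognormal)$p.value

cat("QQ correlation, gaussian: ", round(qq_cor(gaussian), 3), "\n")
cat("QQ correlation, lognormal:", round(qq_cor(lognormal), 3), "\n")
cat("Shapiro-Wilk p, gaussian: ", signif(p_gaussian, 2), "\n")
cat("Shapiro-Wilk p, lognormal:", signif(p_lognormal, 2), "\n")
```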
QQ plot
• Q stands for 'quantile'. Quantiles are cut points that divide
the sorted data into groups of equal size. The 2-quantile is
called the median, the 3-quantiles are called terciles, the
4-quantiles are called quartiles (similarly, the 10-quantiles
are deciles and the 100-quantiles are percentiles).
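In R, quantiles can be computed with the base function `quantile()`. The data below (the integers 0 to 100) are an assumption chosen so that the cut points are easy to read.

```r
# Illustrative data: the integers 0..100 (assumed for the example)
x <- 0:100

median_x  <- quantile(x, probs = 0.5)                   # the 2-quantile
terciles  <- quantile(x, probs = c(1, 2) / 3)           # the 3-quantiles
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))    # the 4-quantiles
deciles   <- quantile(x, probs = seq(0.1, 0.9, by = 0.1))

print(median_x)
print(terciles)
print(quartiles)
print(deciles)
```

Note that `quantile()` supports several interpolation rules through its `type` argument. The default is used here, so results on small samples may differ slightly from hand calculations.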
Typical normal QQ plot
http://emp.byui.edu/BrownD/Stats-intro/dscrptv/graphs/qq-plot_egs.htm
QQ plot of left-skewed distribution
http://emp.byui.edu/BrownD/Stats-intro/dscrptv/graphs/qq-plot_egs.htm
QQ plot of right-skewed distribution
http://emp.byui.edu/BrownD/Stats-intro/dscrptv/graphs/qq-plot_egs.htm
SAMPLING DISTRIBUTIONS
(Czech: výběrová rozdělení)
Histogram
Three samples, each of size $n = 9$:
Sample 1: $\bar{x} = 19.44$, $s = 2.45$
Sample 2: $\bar{x} = 16.89$, $s = 9.17$
Sample 3: $\bar{x} = 17.22$, $s = 6.14$
Sampling distribution of the sample mean
• Czech: výběrové rozdělení výběrového průměru
Sweet demonstration of the sampling distribution of the mean
[Figure: example samples drawn during the demonstration, with their sample means (mean = 3.3 and mean = 1.7)]
Data 2015
Population:
4,3,3,5,0,4,4,4,3,4,2,6,8,2,4,3,5,7,3,3
25 samples (n=3) and their averages
3,5,3,4,2,3,3,3,5,5,3,4,3,4,5,4,4,4,6,3,4,3,4,3,4
http://blue-lover.blog.cz/1106/lentilky
Histogram of 2015 data
2015, n = 3, number of samples = 25
Going further
• So far, we have generated 25 samples with n = 3.
• To improve our histogram, we need more samples.
• However, we don't want to spend ages in the classroom.
• Thus, I have prepared a simulation for you. In this
simulation, I use data from 2014 and I generate all
possible samples of size n = 3.
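A random-sampling version of such a simulation can be sketched as follows. The 2014 data are not included in this transcript, so the 2015 population is used as a stand-in. Sampling with replacement and the number of simulated samples are assumptions.

```r
set.seed(2015)  # arbitrary seed, for reproducibility only

# Population from the "Data 2015" slide (stand-in for the 2014 data)
population <- c(4,3,3,5,0,4,4,4,3,4,2,6,8,2,4,3,5,7,3,3)

n         <- 3      # sample size
n_samples <- 10000  # number of simulated samples (assumed)

# Draw a sample of size n (with replacement, an assumption) and keep its mean
sample_means <- replicate(n_samples,
                          mean(sample(population, size = n, replace = TRUE)))

cat("population mean:     ", mean(population), "\n")
cat("mean of sample means:", round(mean(sample_means), 3), "\n")

# hist(sample_means) would draw the simulated sampling distribution
```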
Sampling distribution, n = 3
1 540 samples
Sampling distribution, n = 5
42 504 samples
Sampling distribution, n = 10
20 030 010 samples
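The three sample counts above are consistent with counting unordered samples drawn with replacement from a population of 20 values, that is $\binom{N+n-1}{n}$ with $N = 20$. This is an inference from the numbers, not something the slides state. The sketch below checks the counts and enumerates the n = 3 case, again using the 2015 population as a stand-in for the 2014 data.

```r
# Number of unordered samples of size n drawn with replacement from N values
count_samples <- function(N, n) choose(N + n - 1, n)

N <- 20  # assumed population size (matches the 2015 population)
counts <- count_samples(N, c(3, 5, 10))
print(counts)

# Enumerate all such samples for n = 3 (2015 population as a stand-in)
population <- c(4,3,3,5,0,4,4,4,3,4,2,6,8,2,4,3,5,7,3,3)
idx <- expand.grid(i = 1:N, j = 1:N, k = 1:N)
idx <- idx[idx$i <= idx$j & idx$j <= idx$k, ]  # keep each unordered triple once
sample_means <- (population[idx$i] + population[idx$j] + population[idx$k]) / 3

cat("number of enumerated samples:", nrow(idx), "\n")
cat("mean of all sample means:    ", mean(sample_means), "\n")
```

The computed counts are 1540, 42504 and 20030010, matching the slide. The mean of all enumerated sample means equals the population mean.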
Central limit theorem (CLT)
• The distribution of sample means is approximately normal.
• This holds irrespective of the shape of the underlying
distribution (provided its variance is finite).
• The distribution of sample means increasingly
approximates a normal distribution as the sample size $n$
increases.
Non-Gaussian distribution
1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,5,5,6,7,7,8,8,8,9,9,9,9,10,10,10,10,10,11,11,11,11,11,11
Sampling distribution, n = 2
Sampling distribution, n = 4
Sampling distribution, n = 6
Sampling distribution, n = 8
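The four sampling distributions can be approximated by simulation. This is a sketch, not the procedure used for the slides: sampling with replacement, the number of simulated samples, and the helper `simulate_means` are assumptions.

```r
set.seed(7)  # arbitrary seed, for reproducibility only

# The non-Gaussian (U-shaped) population listed on the slide
population <- c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,5,5,6,7,7,
                8,8,8,9,9,9,9,10,10,10,10,10,11,11,11,11,11,11)

# Means of many random samples of size n (with replacement, an assumption)
simulate_means <- function(n, n_samples = 5000) {
  replicate(n_samples, mean(sample(population, size = n, replace = TRUE)))
}

sizes <- c(2, 4, 6, 8)
means_by_n <- lapply(sizes, simulate_means)
names(means_by_n) <- paste0("n=", sizes)

# The centre stays near the population mean; the spread shrinks as n grows
centres <- sapply(means_by_n, mean)
spreads <- sapply(means_by_n, sd)
print(round(rbind(centre = centres, spread = spreads), 2))

# hist(means_by_n[["n=8"]]) would draw one of the four histograms
```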
Back to CLT
• Once we know that the sampling distribution of the
sample mean is normal, we want to characterize this
distribution.
• Which numbers characterize a distribution?
  • mean
  • standard deviation
Back to CLT
• The mean $M$ (sometimes also denoted $\mu_{\bar{x}}$) of the
sampling distribution is equal to the population mean:
$M = \mu_{\bar{x}} = \mu$
• The standard deviation $SE$ (sometimes also denoted
$\sigma_{\bar{x}}$) of the sampling distribution is equal to the
population standard deviation divided by the square root of $n$:
$SE = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
• $SE$ is called the standard error (Czech: směrodatná chyba).
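To see how the standard error depends on the sample size, here is a short sketch using the population from the "Data 2015" slide. The helper `standard_error` and the chosen sample sizes are assumptions for illustration.

```r
# Population from the "Data 2015" slide
population <- c(4,3,3,5,0,4,4,4,3,4,2,6,8,2,4,3,5,7,3,3)

mu <- mean(population)
# Population SD divides by N; R's sd() divides by N - 1, so compute it directly
sigma <- sqrt(mean((population - mu)^2))

# Standard error of the mean for several sample sizes
standard_error <- function(sigma, n) sigma / sqrt(n)
n  <- c(3, 5, 10)
se <- standard_error(sigma, n)

print(data.frame(n = n, SE = round(se, 3)))
```

For this population, $\sigma$ is about 1.74, so the standard error is about 1.0 for n = 3 and gets smaller for larger n.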
M and SE
Let's have a look at our demonstration data:
1. Calculate population mean, population standard
deviation and standard error for n=3.
2. Take all our sample means and calculate their mean. It
should be close to the population mean.
3. Take all our sample means and calculate their standard
deviation. It should be close to the standard error.
M and SE
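# Population and the 25 sample means (n = 3) from the "Data 2015" slide
data.set2015 <- c(4,3,3,5,0,4,4,4,3,4,2,6,8,2,4,3,5,7,3,3)
prumery2015 <- c(3,5,3,4,2,3,3,3,5,5,3,4,3,4,5,4,4,4,6,3,4,3,4,3,4)
pop_mean <- mean(data.set2015)              # population mean
pop_sd <- sd(data.set2015) * sqrt(19/20)    # sd() divides by N - 1; rescale to the population SD (N = 20)
se <- pop_sd / sqrt(3)                      # standard error for n = 3
sampl_mean <- mean(prumery2015)             # mean of the sample means
sampl_sd <- sd(prumery2015)                 # SD of the sample means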
Quiz
• As the sample size increases, the standard error
  • increases
  • decreases
• As the sample size increases, the shape of the sampling
distribution gets
  • skinnier
  • wider
Sampling distribution applet
[Figure: applet panels showing the parent distribution, the sample data, and the sampling distributions of selected statistics]
http://onlinestatbook.com/stat_sim/sampling_dist/index.html