Statistical distributions
Lars Valter
Statistician
LARC and Unit for Health Analysis, County Council of Östergötland
The basics
A variable
• is able or likely to change or be changed
• does not always have the same value (it is not a constant)
• can vary between subjects
Variables
• Numerical (quantitative)
  • Discrete
  • Continuous
• Categorical (qualitative)
  • Ordinal
  • Nominal
Variable examples
• Number of children, visits to a primary health care (PHC) centre (numerical discrete)
• Weight, blood pressure (numerical continuous)
• Disease stage, education level (categorical ordinal)
• Sex, blood group, field of education (categorical nominal)
Example:
In a population survey, a sample of 13,000 people was drawn and n = 6997 responded. One of many questions was "What is your height?"
[Figures: the distribution of the reported heights in the example]
Statistical distributions can be
described in mathematical terms.
The normal distribution:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
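The density formula can be evaluated directly. The sketch below is a minimal illustration using only the Python standard library; the function name `normal_pdf` is hypothetical, `statistics.NormalDist` serves only as a cross-check, and the values μ = 37.0 and σ = 0.5 are illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1 / (math.sqrt(2 * math.pi) * sigma)
    return coefficient * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Cross-check against the standard library's implementation.
for x in (-1.0, 0.0, 2.5):
    assert math.isclose(normal_pdf(x, 0.0, 1.0), NormalDist(0.0, 1.0).pdf(x))

# The total area under the density is (approximately) 1:
# a simple Riemann sum over mu +/- 8 sigma with illustrative mu and sigma.
mu, sigma, step = 37.0, 0.5, 0.001
grid = [mu - 8 * sigma + i * step for i in range(int(16 * sigma / step))]
area = sum(normal_pdf(x, mu, sigma) * step for x in grid)
print(round(area, 4))
```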
The two most important measures for numerical variables are
• μ: the mean (also called the expected value or average), a measure of the central tendency of the distribution
• σ: the standard deviation, a measure of the dispersion of the distribution
[Figure: the distribution of height, with the mean μ and the standard deviation σ marked]
In the example the mean height is μ = … and the standard deviation is σ = 9.3.
Normal distributions
[Figure: two normal distributions with the same standard deviation but different means]
[Figure: three normal distributions with the same mean but different standard deviations]
The normal distribution is characterised by its mean and standard deviation.
All normal distributions can be transformed into the same standardised normal distribution by the formula:
$$z = \frac{x - \mu}{\sigma}$$
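A small sketch of the standardisation formula, assuming illustrative values μ = 37.0 and σ = 0.5 and a few made-up observations; the helper name `standardise` is hypothetical. It also checks that a probability is unchanged by standardising.

```python
from statistics import NormalDist

def standardise(x, mu, sigma):
    """Transform x from a normal distribution with (mu, sigma) to z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Illustrative observations, assuming mu = 37.0 and sigma = 0.5.
observations = [37.24, 37.73, 36.55]
z_values = [standardise(x, 37.0, 0.5) for x in observations]
print([round(z, 2) for z in z_values])  # [0.48, 1.46, -0.9]

# P(X <= x) in the original distribution equals P(Z <= z) in the standardised one.
x, mu, sigma = 37.73, 37.0, 0.5
assert abs(NormalDist(mu, sigma).cdf(x) - NormalDist().cdf(standardise(x, mu, sigma))) < 1e-12
```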
Standardised normal distribution (z): μ = 0 and σ = 1
[Figure: the standardised normal distribution, with the horizontal axis in standard deviations]
Example: The variable age in a population is normally distributed with mean μ and standard deviation σ. Find the proportion of the population above 65.
Calculate
$$z = \frac{x - \mu}{\sigma} = \frac{65 - \mu}{\sigma} = 1$$
and look up the answer in a table for a standardised normally distributed variable.
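The table lookup can be reproduced with the standard library's `statistics.NormalDist`. This is a minimal sketch; the values μ = 55 and σ = 10 in the second calculation are hypothetical, chosen only because they give z = 1 (any μ and σ with (65 − μ)/σ = 1 give the same answer).

```python
from statistics import NormalDist

# Proportion above z = 1 in the standardised normal distribution (the "table lookup").
p_above = 1 - NormalDist(mu=0.0, sigma=1.0).cdf(1.0)
print(round(p_above, 4))  # 0.1587

# The same answer without standardising, for hypothetical mu = 55 and sigma = 10.
p_above_direct = 1 - NormalDist(mu=55.0, sigma=10.0).cdf(65.0)
assert abs(p_above - p_above_direct) < 1e-12
```

So roughly 15.9% of the population is older than 65 whenever 65 lies exactly one standard deviation above the mean.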
Other important continuous distributions
• Student's t-distribution is characterised by its degrees of freedom (n − 1 when estimating a single mean from a sample of size n)
• The chi-square (χ²) distribution is used in the analysis of categorical variables (both ordinal and nominal). It is characterised by its degrees of freedom
• The F-distribution is characterised by a pair of degrees of freedom
A quantitative discrete variable
X = number of doctor's appointments at a PHC centre (2013)

Number of appointments   Proportion of population
0                        0.467
1                        0.263
2                        0.130
3                        0.065
4                        0.033
5                        0.018
6                        0.010
...                      ...
Mean = 1.12, standard deviation = 1.60
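For a discrete variable the mean is μ = Σ x·p(x) and the variance is σ² = Σ (x − μ)²·p(x). A minimal sketch follows; the helper `discrete_mean_sd` and the small complete distribution at the end are hypothetical. Because the table only lists the rows 0 to 6, summing those rows gives a lower bound for the mean rather than the reported 1.12.

```python
import math

def discrete_mean_sd(pmf):
    """Mean and standard deviation of a discrete distribution given as {value: probability}."""
    total = sum(pmf.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities sum to {total}, not 1")
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, math.sqrt(variance)

# The rows 0..6 listed in the table; the rows for 7+ appointments are not shown.
listed = {0: 0.467, 1: 0.263, 2: 0.130, 3: 0.065, 4: 0.033, 5: 0.018, 6: 0.010}
print(round(sum(listed.values()), 3))                    # 0.986, so 1.4% lies in the omitted rows
print(round(sum(x * p for x, p in listed.items()), 2))   # 1.0, a lower bound for the full mean

# A hypothetical complete distribution, for which both measures can be computed.
mean, sd = discrete_mean_sd({0: 0.5, 1: 0.3, 2: 0.2})
```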
All types of variables (continuous, ordinal and nominal) can be dichotomized into a binary variable.
Some variables are binary from the start.
Example of a binary qualitative variable in a population survey: "What is your sex?" (men / women)
[Figure: distribution of the variable sex]
Coding a binary variable with the values 0 and 1 (also called a Bernoulli variable) is very useful for statistical analysis.
Example: 0 = Male, 1 = Female
The mean (π) of a binary variable coded 1 and 0 is the proportion of ones.
The mean of the variable sex is 0.55, i.e. 55% of the respondents are women.
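A quick check that the mean of a 0/1-coded variable equals the proportion of ones. The survey answers below are hypothetical; the coding 0 = Male, 1 = Female follows the slide.

```python
from statistics import fmean

# Hypothetical answers to "What is your sex?", coded 0 = Male, 1 = Female.
answers = ["Female", "Male", "Female", "Female", "Male",
           "Female", "Male", "Female", "Male", "Female"]
coded = [1 if a == "Female" else 0 for a in answers]

proportion_female = coded.count(1) / len(coded)
mean_of_codes = fmean(coded)
print(mean_of_codes)  # 0.6, the proportion of ones
assert mean_of_codes == proportion_female
```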
Dichotomizing the number of doctor's appointments into a binary variable:
X = 0 if no appointments
X = 1 otherwise (at least one appointment)
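A minimal sketch of the dichotomization. The individual counts are hypothetical; the proportion with at least one appointment follows from the table (0.467 had none).

```python
def at_least_one(appointments):
    """Dichotomize a count: X = 0 if no appointments, X = 1 if at least one."""
    return 0 if appointments == 0 else 1

# Hypothetical individual counts of appointments.
counts = [0, 2, 0, 1, 5, 0, 3]
binary = [at_least_one(c) for c in counts]  # [0, 1, 0, 1, 1, 0, 1]

# From the table: everyone except the 0-row has X = 1.
pi = 1 - 0.467
print(round(pi, 3))  # 0.533
```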
From population to sample
Population: mean μ, standard deviation σ
Sample: mean x̄, standard deviation s
A sample must be randomly drawn.
This is the most important condition when creating a sample.
There are special cases of sampling, e.g. stratified sampling, but they still contain a random component.
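A sketch of simple random sampling and of stratified sampling (a simple random sample within each stratum), using Python's `random.Random.sample`. The population of 13,000 IDs, the split into two strata and the helper names are all hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, seed=None):
    """Hypothetical helper: draw a simple random sample separately within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name]) for name, units in strata.items()}

# Hypothetical population of 13,000 person IDs, split into two strata.
population = list(range(13000))
sample = simple_random_sample(population, 25, seed=1)

strata = {"men": population[:6500], "women": population[6500:]}
by_stratum = stratified_sample(strata, {"men": 10, "women": 15}, seed=1)
```

Both designs keep a random component: within each stratum, which units end up in the sample is decided by chance.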
Example: Body temperature. Sampling from a population of healthy people.
Mean: μ = 37.0
Standard deviation: σ = 0.5
[Figure: population distribution of body temperature]
A sample (n = 25) from the population:
37.24 37.73 36.55 ...
The sample size = n
The sample mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{37.24 + 37.73 + 36.55 + \cdots}{25} = 37.02$$
The sample standard deviation:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{(37.24 - 37.02)^2 + (37.73 - 37.02)^2 + (36.55 - 37.02)^2 + \cdots}{24}} = 0.38$$
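The two formulas translate directly into code. Only the first three of the 25 observations are listed, so this sketch uses just those three values and will not reproduce 37.02 and 0.38; the helper names are hypothetical, and `statistics.stdev` (which also divides by n − 1) is used as a cross-check.

```python
import math
import statistics

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Sample standard deviation with the n - 1 denominator."""
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Only the first three listed observations.
xs = [37.24, 37.73, 36.55]
print(round(sample_mean(xs), 2), round(sample_sd(xs), 2))

assert math.isclose(sample_sd(xs), statistics.stdev(xs))
```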
[Figures: results from different samples drawn from the same population]
x̄ = 37.02, s = 0.38
x̄ = 36.87, s = 0.50
x̄ = 37.03, s = 0.60
x̄ = 36.77, s = 0.50
The sampling distribution
The theoretical distribution of sample means is a fundamental concept for statistical analysis.
The mean of the sampling distribution is μ, the same as the population mean.
The standard deviation of the sampling distribution is called the standard error and is
$$\mathrm{se} = \frac{\sigma}{\sqrt{n}}$$
Estimated from a sample:
$$\mathrm{se} = \frac{s}{\sqrt{n}}$$
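Both versions of the standard error share one computation; only the input differs (σ if it is known, otherwise the sample estimate s). A minimal sketch with a hypothetical helper name; the inputs 0.5 and 0.38 are the body-temperature σ and the first sample's s from these slides.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n). Pass sigma if known, otherwise s."""
    return sd / math.sqrt(n)

print(standard_error(0.5, 25))   # population sigma = 0.5, n = 25
print(standard_error(0.38, 25))  # estimated from a sample with s = 0.38
```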
Population: mean μ = 37, standard deviation σ = 0.5
The sampling distribution (n = 25): mean μ = 37, standard deviation
$$\text{standard error} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{25}} = 0.1$$
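The sampling distribution can be approximated by simulation: draw many samples of n = 25 from a normal population with μ = 37 and σ = 0.5 and look at the spread of their means. This is a sketch, not a proof; the number of repetitions and the seed are arbitrary choices.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=2013):
    """Draw `reps` random samples of size n from N(mu, sigma) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

means = simulate_sample_means(mu=37.0, sigma=0.5, n=25, reps=10_000)
print(round(statistics.fmean(means), 3))  # close to mu = 37
print(round(statistics.stdev(means), 3))  # close to sigma / sqrt(n) = 0.1
```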
If the population is normally distributed, the sampling distribution is normally distributed with mean μ and standard deviation $\sigma/\sqrt{n}$.
What if the population is not normally distributed?
• The mean and standard deviation of the sampling distribution are still μ and $\sigma/\sqrt{n}$.
What about the shape of the sampling distribution?
• The fundamental central limit theorem helps us.
The central limit theorem
It goes something like this:
• If the population is normally distributed, so is the sampling distribution.
• If the population is symmetrical but not normally distributed, the sampling distribution is approximately normal even for a rather small sample size.
• If the population is skewed, you need a rather large sample for the sampling distribution to be approximately normal.
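The effect of skewness can be illustrated by simulation. This sketch assumes an exponential population (mean 1, strongly right-skewed) as the example of a skewed population and measures the skewness of the sample means for growing n; a value near 0 indicates a symmetrical, more normal-looking sampling distribution. The helper names, the number of repetitions and the seed are arbitrary choices.

```python
import random
import statistics

def skewness(xs):
    """Moment coefficient of skewness: 0 for a symmetrical distribution."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m3 = statistics.fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5

def sample_means_from_skewed(n, reps=20_000, seed=1):
    """Means of `reps` samples of size n from an exponential population with mean 1."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

for n in (2, 10, 50):
    print(n, round(skewness(sample_means_from_skewed(n)), 2))
```

For this particular population the skewness of the sample mean shrinks roughly like 2/√n, which is why a skewed population needs a larger sample before the normal approximation is reasonable.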
A sample mean from a Bernoulli distributed population will also be approximately normally distributed if both
$$n \cdot \pi > 5 \quad \text{and} \quad n \cdot (1 - \pi) > 5$$
where π is the proportion of ones in the population.
If you have a sample, use p, the proportion of ones in the sample, as an estimator of π.
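The rule of thumb is easy to check for given n and π (or p). A minimal sketch with a hypothetical helper name, using the strict inequalities from the slide:

```python
def normal_approx_ok(n, pi):
    """Rule of thumb: n * pi > 5 and n * (1 - pi) > 5."""
    return n * pi > 5 and n * (1 - pi) > 5

for n in (10, 20, 30):
    print(n, normal_approx_ok(n, 0.2))
```

With π = 0.2, the condition first holds at n = 30 in this list (30 · 0.2 = 6); at n = 10 and n = 20 the products are 2 and 4.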
A Bernoulli distributed variable with π = 0.2 has μ = π and standard deviation $\sqrt{\pi(1-\pi)}$ (here $\sqrt{0.2 \cdot 0.8} = 0.4$).
[Figure: the population distribution]
The sampling distribution of the mean from a Bernoulli variable has mean μ and standard deviation (s.e.)
$$\mathrm{se} = \sqrt{\frac{\pi(1-\pi)}{n}}$$
π = 0.2, n = 10: mean = 0.2, standard error = 0.13
π = 0.2, n = 20: mean = 0.2, standard error = 0.09
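The standard deviation and the standard errors quoted above can be recomputed in a few lines. A minimal sketch; the function names are hypothetical.

```python
import math

def bernoulli_sd(pi):
    """Standard deviation of a 0/1 variable with P(1) = pi."""
    return math.sqrt(pi * (1 - pi))

def bernoulli_se(pi, n):
    """Standard error of the sample proportion: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

print(round(bernoulli_sd(0.2), 2))      # 0.4
print(round(bernoulli_se(0.2, 10), 2))  # 0.13
print(round(bernoulli_se(0.2, 20), 2))  # 0.09
```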
Some other discrete distributions
• The binomial distribution: the sum of n Bernoulli distributed variables. Mean = nπ
• The hypergeometric distribution: sampling without replacement from a limited population. Mean = nπ
• The Poisson distribution: the number of events during a time period. Mean = the average number of events per time unit.
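The stated means can be verified numerically by summing k·P(k) over each probability function. This sketch uses the standard textbook formulas with `math.comb`; the parameters (n = 10, π = 0.2, a hypothetical finite population of N = 50 with K = 10 ones so that π = K/N = 0.2, and a hypothetical Poisson rate of 3 events per time unit) are illustrative only.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, pi):
    """P(k ones among n independent Bernoulli(pi) variables)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

def hypergeometric_pmf(k, N, K, n):
    """P(k ones in a sample of n drawn without replacement from N units, K of which are ones)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def poisson_pmf(k, lam):
    """P(k events in a period with on average lam events)."""
    return exp(-lam) * lam**k / factorial(k)

n, pi = 10, 0.2
binom_mean = sum(k * binomial_pmf(k, n, pi) for k in range(n + 1))

N, K = 50, 10  # hypothetical finite population with pi = K / N = 0.2
hyper_mean = sum(k * hypergeometric_pmf(k, N, K, n) for k in range(n + 1))

lam = 3.0  # hypothetical average number of events per time unit
pois_mean = sum(k * poisson_pmf(k, lam) for k in range(100))  # tail beyond 99 is negligible

print(round(binom_mean, 6), round(hyper_mean, 6), round(pois_mean, 6))
```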