Drawing Inferences from Data
Berghold, IMI, MUG
Estimation, standard error and
confidence interval
Estimation
• The sample mean and the sample variance were used to
describe a typical value and the variation in the sample.
• We may similarly use the population mean (the expected
mean) and the population variance to describe the typical
value and the variation in the population.
• These values are often referred to as the theoretical values.
The sample mean and the sample variance are considered
estimates of the analogous population quantities (with
certain properties, e.g. unbiasedness).
• Estimation means using sample information to draw
conclusions about the value of a population parameter.
Sampling
10 random samples from the same (normally distributed)
population, each with sample size n = 100:

Sample   Mean   SD
1        3.20   0.565
2        3.20   0.590
3        3.31   0.486
4        3.27   0.574
5        3.18   0.542
6        3.31   0.575
7        3.13   0.606
8        3.11   0.524
9        3.26   0.648
10       3.32   0.582
Sampling distribution
The distribution of all possible sample means is
called the sampling distribution of the mean.
In general, the sampling distribution of any
statistic is the distribution of the values of the
statistic which would arise from all possible
samples.
[Figure: two panels, both labelled "Distribution of x-values", with μ marked on the axis]
Sampling distribution
Given a population with mean μ and standard deviation σ, the
sampling distribution of the mean based on repeated random
samples of size n has the following properties:
• The expected value of the sample means is equal to the
population mean μ of the individual measurements.
• The standard deviation of the distribution of the sample
means is $\sigma/\sqrt{n}$.
• If the distribution in the population is Normal then the
sampling distribution of the mean is also Normal.
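
The first two properties can be checked with a small simulation. The sketch below is only an illustration: the population parameters (μ = 3.2, σ = 0.57), the number of repetitions and the random seed are assumptions chosen for the demonstration, not values taken from the slides.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 3.2, 0.57  # assumed population parameters (illustrative only)
N = 100                # size of each sample
REPS = 2000            # number of repeated samples (assumption)

# Draw REPS samples of size N from a Normal population and keep each mean.
sample_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to MU
sd_of_means = statistics.stdev(sample_means)    # should be close to SIGMA / sqrt(N)
theoretical_se = SIGMA / math.sqrt(N)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {MU})")
print(f"SD of sample means:   {sd_of_means:.4f} (sigma/sqrt(n) = {theoretical_se:.4f})")
```

With more repetitions the two simulated values should move closer to μ and σ/√n.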
Sampling distribution
Central limit theorem: the distribution of the
sample means is approximately normally
distributed, regardless of the shape of the original
population distribution as long as the samples are
large enough.
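
A minimal sketch of the central limit theorem, assuming a clearly non-Normal population: an exponential distribution with rate 1 (mean 1, standard deviation 1). The sample size n = 50, the number of repetitions and the seed are arbitrary choices for illustration. If the sample means are approximately Normal, roughly 95% of them should fall within μ ± 1.96·σ/√n.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

N = 50             # sample size (assumed "large enough" for this population)
REPS = 5000        # number of repeated samples (assumption)
MU = SIGMA = 1.0   # mean and SD of the exponential distribution with rate 1

half_width = 1.96 * SIGMA / math.sqrt(N)

# Means of repeated samples from a skewed (exponential) population.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(N)])
    for _ in range(REPS)
]

# Share of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - MU) <= half_width for m in sample_means) / REPS
print(f"share of sample means within mu +/- 1.96*sigma/sqrt(n): {within:.3f}")
```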
Standard error of sample mean (SE)
Variability of sample means will have the following properties:
• It will be less among the means of large samples than small
samples.
• It will be less than the variability of the individual
observations in the population.
• It will increase with greater variability among the individual
values in the population.
• It is estimated by $s/\sqrt{n}$:
$$\frac{s}{\sqrt{n}} = \frac{0.542}{\sqrt{100}} = \frac{0.542}{10} = 0.0542 \quad \text{(sample 5)}$$
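
The same arithmetic as a short sketch; the function name `standard_error` is a hypothetical helper introduced here for illustration.

```python
import math

def standard_error(s: float, n: int) -> float:
    """Estimated standard error of the mean: sample SD divided by sqrt(n)."""
    return s / math.sqrt(n)

# Sample 5 from the sampling table: SD = 0.542, n = 100.
se_sample_5 = standard_error(0.542, 100)
print(f"{se_sample_5:.4f}")  # prints 0.0542
```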
Standard error of sample mean (SE)
It is a measure of the uncertainty
of a single sample mean as an estimate
of the population mean.
Confidence Interval
To get an idea of the uncertainty associated with a
single sample estimate, a (1−α) confidence
interval is constructed from the data of the sample.
95% confidence interval (σ is known):
$$P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha = 95\%$$
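
A minimal sketch of the known-σ interval. The helper name `z_confidence_interval` is hypothetical, and the inputs are assumptions for illustration: the mean 3.18 is borrowed from sample 5, while σ = 0.57 is an assumed population value, not one stated in the slides.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sigma, n, level=0.95):
    """(1 - alpha) confidence interval for mu when sigma is known."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for level = 0.95
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

# Assumed inputs: sample mean 3.18, known sigma 0.57, n = 100.
lower, upper = z_confidence_interval(mean=3.18, sigma=0.57, n=100)
print(f"95% CI: [{lower:.3f} ; {upper:.3f}]")
```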
Confidence Interval
All intervals that meet the general requirement
P(lower limit ≤ "true parameter" ≤ upper limit) = 1 − α
are called confidence intervals with confidence level 1 − α.
Confidence Interval
Frequency interpretation: if the experiment is
repeated a large number of times and a 95% confidence
interval is computed for each replication, then 95% of
these confidence intervals will contain the true value of
the unknown parameter.
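
The frequency interpretation can be illustrated by simulation. This is a sketch under assumed conditions: a Normal population with μ = 3.2 and known σ = 0.57, samples of size 25, and 4000 replications. None of these numbers come from the slides.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

MU, SIGMA = 3.2, 0.57  # assumed "true" population parameters
N, REPS = 25, 4000     # sample size and number of replications (assumptions)

half_width = 1.96 * SIGMA / math.sqrt(N)  # known-sigma 95% interval

covered = 0
for _ in range(REPS):
    x_bar = statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    # Does this replication's interval contain the true mean?
    if x_bar - half_width <= MU <= x_bar + half_width:
        covered += 1

coverage = covered / REPS
print(f"share of intervals containing mu: {coverage:.3f}")
```

The printed share should be close to 0.95. Any single interval either contains μ or it does not; the 95% describes the procedure over many replications.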
95% confidence interval (σ is unknown, small samples):
$$P\left(\bar{x} - t_{n-1;\,1-\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1;\,1-\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha = 95\%$$
t-distribution ("Student's t")
The critical ratio, using s as an estimate of σ, defined as
$$\frac{\bar{x} - \mu}{s/\sqrt{n}}$$
is not normally distributed, but follows a t-distribution
with f = n − 1 degrees of freedom (William Gosset, 1908).
The t-distribution is similar to the standard normal
distribution in that it is symmetric with mean 0, but its
standard deviation depends on a parameter called the
degrees of freedom.
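
A small simulation sketch of why the Normal critical value is too small when s replaces σ in small samples. The setup is an assumption for illustration: samples of size n = 5 from a standard Normal population. The value 2.776 is the tabulated 0.975 quantile of the t-distribution with 4 degrees of freedom; it is hard-coded because the Python standard library has no t quantile function.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the run is repeatable

N, REPS = 5, 10000     # small samples; number of replications (assumptions)
MU, SIGMA = 0.0, 1.0   # assumed standard Normal population
Z_CRIT = 1.96          # 0.975 quantile of the standard Normal
T_CRIT = 2.776         # tabulated 0.975 quantile of t with N - 1 = 4 df

ratios = []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample SD with n - 1 in the denominator
    ratios.append((x_bar - MU) / (s / math.sqrt(N)))

# Share of critical ratios falling outside each pair of limits.
beyond_z = sum(abs(t) > Z_CRIT for t in ratios) / REPS
beyond_t = sum(abs(t) > T_CRIT for t in ratios) / REPS
print(f"outside +/-1.96:  {beyond_z:.3f}")
print(f"outside +/-2.776: {beyond_t:.3f}")
```

Noticeably more than 5% of the ratios fall outside ±1.96, while about 5% fall outside ±2.776, which reflects the heavier tails of the t-distribution.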
t-distribution
Example
The parameter PTT (partial thromboplastin time)
is assessed in a sample of 25 children. The mean is
42 sec, the standard deviation is 4 sec.
Calculate a 95% confidence interval for the
expected mean.
Example
Assumptions: we assume that the PTT-values are normally
distributed and that σ is estimated by the standard deviation
s of the sample.
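$$\bar{x} - t_{n-1;\,1-\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1;\,1-\alpha/2}\,\frac{s}{\sqrt{n}}$$
With n = 25, the critical value is $t_{24;\,0.975} \approx 2.06$:
$$\left[\,42 - 2.06 \cdot \frac{4}{\sqrt{25}}\;;\; 42 + 2.06 \cdot \frac{4}{\sqrt{25}}\,\right] \approx [40.35\,;\,43.65]$$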
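
The PTT calculation as a short sketch. The critical value 2.064 is the tabulated 0.975 quantile of the t-distribution with 24 degrees of freedom, hard-coded here as an assumption because the Python standard library does not provide t quantiles.

```python
import math

n, x_bar, s = 25, 42.0, 4.0  # sample size, mean (sec), SD (sec) from the example
T_CRIT = 2.064               # tabulated t quantile: 24 df, 1 - alpha/2 = 0.975

se = s / math.sqrt(n)        # standard error of the mean: 4 / 5 = 0.8
half_width = T_CRIT * se
lower, upper = x_bar - half_width, x_bar + half_width

print(f"95% CI: [{lower:.2f} ; {upper:.2f}] sec")  # prints 95% CI: [40.35 ; 43.65] sec
```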
Points to Remember
• The level of significance α: for α = 1% (99% confidence)
the interval is wider than for α = 5% (95% confidence).
• The sample size n: the larger the sample size n, the more
precise the estimate. Halving the width of the confidence
interval requires four times the sample size.
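
Both points can be checked numerically. The sketch below uses the known-σ (Normal) interval for simplicity, with an assumed σ = 4; `ci_width` is a hypothetical helper name. With the t-based interval the factor of two holds only approximately, because the critical value also changes with n.

```python
import math
from statistics import NormalDist

def ci_width(sigma, n, level):
    """Full width of the known-sigma confidence interval for the mean."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return 2.0 * z * sigma / math.sqrt(n)

SIGMA = 4.0  # assumed population SD (illustrative)

w_95_n25 = ci_width(SIGMA, 25, 0.95)
w_95_n100 = ci_width(SIGMA, 100, 0.95)  # four times the sample size
w_99_n25 = ci_width(SIGMA, 25, 0.99)    # smaller alpha, same n

print(f"95%, n=25:  {w_95_n25:.3f}")
print(f"95%, n=100: {w_95_n100:.3f}")
print(f"99%, n=25:  {w_99_n25:.3f}")
```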
Points to Remember
• 95% reference range:
$$\bar{x} \pm 2s$$
(approximately 95% of the observations lie in this interval)
• Standard error of the mean:
$$\frac{s}{\sqrt{n}}$$
(tells us about the uncertainty of the estimate of the mean)
• 95% confidence interval:
$$P\left[\mu \in \bar{x} \pm t_{n-1;\,1-\alpha/2}\,\frac{s}{\sqrt{n}}\right] = 0.95 \quad (\alpha = 5\%)$$
(tells us how often the "true" value will lie in such an interval)