Sampling Distributions—Statistics as RV’s
Sampling Distributions
• Sample values are measurements or observations of
RVs ⇒ the value for a sample statistic will vary in
a random manner from sample to sample
New Definition
Sampling distribution (of a sample statistic): the
distribution of possible values of a statistic for repeated
samples of the same size from a population
• Sample statistics are RVs because different samples
can lead to different values for the sample statistics
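To see why a sample statistic behaves like a random variable, the minimal sketch below draws several samples of the same size from one population and prints the mean of each. The population (200 integer scores generated with a fixed seed) and the names `population` and `sample_mean` are assumptions for illustration, not data from these slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 200 integer scores between 40 and 100.
population = [random.randint(40, 100) for _ in range(200)]

def sample_mean(pop, n):
    """Mean of one simple random sample of size n, drawn without replacement."""
    return statistics.mean(random.sample(pop, n))

# Five samples of the same size give five values of the same statistic.
means = [sample_mean(population, 10) for _ in range(5)]
for i, m in enumerate(means, start=1):
    print(f"Sample {i}: mean = {m:.2f}")
```

The population never changes between draws; only the sample does, and the statistic changes with it.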
Statistical Inference
Want to make conclusions about population parameters
on the basis of sample statistics.
• Confidence Intervals: interval of values that the
researcher is fairly certain will cover the true,
unknown value of the population parameter
• Hypothesis Tests (significance testing): uses
sample data to attempt to reject a hypothesis about
the population

Summary statistics for the Midterm scores:
Min:         0.00000
1st Qu.:    78.25000
Mean:       81.00518
Median:     84.00000
3rd Qu.:    91.00000
Max:        99.00000
Total N:   198.00000
NA's:        4.00000
Std Dev.:   18.34231
[Figure: histogram of Midterm scores]
Mean Midterm Score = 81.01
SD Midterm Score = 18.34
[Figure: histogram of Midterm scores]

            mean10   mean20   mean30   mean50
Sample 1     86.89    83.05    84.45    82.22
Sample 2     77.40    73.16    82.76    82.90
Sample 3     85.90    78.68    84.30    82.06
Samples 4 to 6 (column order not recoverable from the transcript):
79.90, 86.40, 80.40, 73.15, 81.10, 82.61, 82.14, 80.31, 81.47, 81.06, 78.51, 82.52

[Figure: histograms of mean10.out[1:1000], mean20.out[1:1000], mean30.out[1:1000] and mean50.out[1:1000]; panels n = 10, n = 20, n = 30, n = 50]
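The histograms above come from repeatedly drawing samples of size n and recording each sample mean. A minimal sketch of that procedure follows. The class scores themselves are not in the transcript, so `population` is a hypothetical stand-in, and sampling with replacement is an assumption (the slides do not say which scheme was used). The dictionary `mean_out` loosely mirrors the `mean10.out` ... `mean50.out` objects named on the plots.

```python
import random
import statistics

random.seed(2)

# Hypothetical stand-in for the class scores (the real data are not in the
# slides): 194 values, roughly centered near 81, clipped to the 0-99 range.
population = [min(99, max(0, round(random.gauss(81, 18)))) for _ in range(194)]

def sampling_distribution(pop, n, draws=1000):
    """Return `draws` sample means, each from a random sample of size n
    drawn with replacement (an assumption, see the note above)."""
    return [statistics.mean(random.choices(pop, k=n)) for _ in range(draws)]

# One list of 1000 sample means per sample size.
mean_out = {n: sampling_distribution(population, n) for n in (10, 20, 30, 50)}

mu = statistics.mean(population)
for n, means in mean_out.items():
    print(f"n = {n:2d}: mean of means = {statistics.mean(means):.2f}, "
          f"SD of means = {statistics.stdev(means):.2f} "
          f"(population mean = {mu:.2f})")
```

With any reasonable population, the mean of the sample means should sit close to the population mean, and their spread should narrow as n grows.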
Start with what distribution?
[Figure: histograms of mean10.out[1:1000], mean20.out[1:1000], mean30.out[1:1000] and mean50.out[1:1000]; panels n = 10, n = 20, n = 30, n = 50]
[Figure: p(x) versus x, for x from 0 to 10]
1000 observations from Lognormal(0,1)
$X \sim \mathrm{Lognormal}(\mu, \sigma) \Rightarrow Y = \log(X) \sim N(\mu, \sigma)$
[Figure: histogram of 1000 observations from Lognormal(0,1)]

Sampling Distributions for means from U(0,10)
[Figure: histograms of sample means; panels n = 5, n = 10, n = 20, n = 30]
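The two parent distributions above, U(0,10) and Lognormal(0,1), can be compared with a short simulation. The sketch below is illustrative only: the helper names `skewness` and `means_of_samples`, the seed, and the choice of 1000 repetitions per sample size are assumptions. It reports the skewness of the simulated sample means as a rough measure of how far each sampling distribution is from symmetric.

```python
import random
import statistics

random.seed(3)

def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by SD cubed."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

def means_of_samples(draw, n, reps=1000):
    """Means of `reps` samples of size n; `draw()` returns one observation."""
    return [statistics.fmean([draw() for _ in range(n)]) for _ in range(reps)]

parents = {
    "U(0,10)": lambda: random.uniform(0, 10),
    "Lognormal(0,1)": lambda: random.lognormvariate(0, 1),
}

skew = {}
for name, draw in parents.items():
    for n in (5, 10, 20, 30):
        skew[(name, n)] = skewness(means_of_samples(draw, n))
        print(f"{name:15s} n = {n:2d}: skewness of sample means = "
              f"{skew[(name, n)]:.2f}")
```

The symmetric uniform parent should give sample means with skewness near zero even at small n, while means from the strongly right-skewed lognormal parent should stay noticeably skewed at the same sample sizes.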
Sampling Distributions
[Figure: histograms of sample means; panels n=10, n=20, n=30]

Many common, practical problems involve estimating
mean values from samples; interest is really in making
an inference about the mean, µ, of some population

What do we know: the sample mean, x̄, is often a
good estimator of µ

Example: Midterm exam scores; we will consider the
class data to represent the population

Consider histograms for x̄ based on n = 10, 20, 30 and
50 samples

µ = 81.01, σ = 18.34
We are using parameter designators since we are
taking the full class data to be the population
The normal probability distribution approximates the computer-generated sampling distribution very well
[Figure: density histograms of mean20.out[1:1000], mean30.out[1:1000] and mean50.out[1:1000]; µ = 81.01]
Sampling Distributions
µ = 81.01, σ = 18.34

                                  mean10   mean20   mean30   mean50
Sample 1                           86.89    83.05    84.45    82.22
Sample 2                           77.40    73.16    82.76    82.90
Mean of Sampling Distribution
based on 1000 draws of size n      80.86    80.71    81.10    80.95
SD of Sampling Distribution
based on 1000 draws of size n       5.78     4.13     3.09     2.36
Sigma/sqrt(n)                       5.80     4.10     3.35     2.59

Properties of the sampling distribution of x̄
1. Mean of sampling distribution equals mean of
sampled population:
$\mu_{\bar{x}} = E(\bar{x}) = \mu$
2. Standard deviation of sampling distribution equals
(standard deviation of sampled population) / (square root of sample size):
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$

For the midterm scores example:
$\sigma_{\bar{x}} = 18.34 / \sqrt{10} = 5.80$
$\sigma_{\bar{x}} = 18.34 / \sqrt{20} = 4.10$
$\sigma_{\bar{x}} = 18.34 / \sqrt{30} = 3.35$
$\sigma_{\bar{x}} = 18.34 / \sqrt{50} = 2.59$
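The Sigma/sqrt(n) values for the midterm example can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `standard_error` is an assumption, while σ = 18.34 and the sample sizes come from the slides.

```python
import math

sigma = 18.34  # population SD of the midterm scores, from the slides

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Rounded to two decimals, as in the slides.
se = {n: round(standard_error(sigma, n), 2) for n in (10, 20, 30, 50)}
for n, value in se.items():
    print(f"n = {n}: sigma / sqrt(n) = {value:.2f}")
# prints 5.80, 4.10, 3.35 and 2.59 for n = 10, 20, 30 and 50
```

Because the sample size sits under a square root, quadrupling n only halves the standard deviation of x̄.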
Central Limit Theorem (CLT)
Consider a random sample of n observations selected
from a population (any population) with mean µ
and standard deviation σ.
Then, when n is sufficiently large, the sampling
distribution of x̄ will be approximately a normal
distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
The larger the sample size, the better will be the normal
approximation to the sampling distribution of x̄.
If X is normally distributed then x̄ will be normally
distributed.
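A quick simulation can illustrate the theorem for a clearly non-normal population. The sketch below assumes an Exponential population with rate 1 (so µ = 1 and σ = 1), a sample size of 30 and 2000 repetitions. All of these are illustrative choices, not values from the slides. It checks that the sample means have mean close to µ, standard deviation close to σ/√n, and that roughly 95% of them fall within 1.96 standard deviations of µ, as a normal distribution would predict.

```python
import math
import random
import statistics

random.seed(4)

mu, sigma = 1.0, 1.0   # mean and SD of an Exponential(rate = 1) population
n, reps = 30, 2000     # sample size and number of repeated samples

# Each entry is the mean of one random sample of size n.
means = [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
         for _ in range(reps)]

se = sigma / math.sqrt(n)  # theoretical SD of the sample mean
coverage = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(f"mean of sample means = {statistics.fmean(means):.3f} (mu = {mu})")
print(f"SD of sample means   = {statistics.stdev(means):.3f} "
      f"(sigma / sqrt(n) = {se:.3f})")
print(f"share within mu +/- 1.96 * sigma / sqrt(n) = {coverage:.3f}")
```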
Central Limit Theorem (CLT)
The sum of a random sample of n observations, ∑x,
will also possess a sampling distribution that is
approximately normal for large samples;
$\mu_{\sum x} = n\mu; \quad \sigma_{\sum x} = \sqrt{n}\,\sigma$
How large is large?
The greater the skewness of the sampled population
distribution, the larger the sample size must be before
the normal distribution is an adequate approximation
for the sampling distribution of x̄; for many sampled
populations, sample sizes of n ≥ 30 will suffice.

Z-scores
$Z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ where $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
$Z$ approaches $N(0,1)$ as $n \to \infty$
If $X \sim N(\mu, \sigma^2)$ then $Z \sim N(0,1)$:
we do not need to depend on the CLT to “induce”
normality; the normality is exact, not approximate
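As a worked use of the Z-score, the sketch below standardizes a sample mean from the midterm example and looks up an upper-tail probability under N(0,1). The population values µ = 81.01 and σ = 18.34 come from the slides. The sample size n = 30, the threshold of 85 and the helper `normal_cdf` are assumptions chosen for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 81.01, 18.34  # population mean and SD, from the slides
n = 30                    # hypothetical sample size
x_bar = 85.0              # hypothetical threshold for the sample mean

# Standardize: Z = (x_bar - mu) / (sigma / sqrt(n))
z = (x_bar - mu) / (sigma / math.sqrt(n))

# Approximate P(sample mean > 85), treating Z as N(0, 1).
p_upper = 1.0 - normal_cdf(z)

print(f"z = {z:.3f}")
print(f"P(sample mean > {x_bar}) is approximately {p_upper:.4f}")
```

The probability is only approximate here because the midterm scores are not normally distributed, so the calculation leans on the CLT with n = 30.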