Chapter 5 Contents
I
The Concept of a Sampling Distribution
I
Properties of Sampling Distributions: Unbiasedness and
Minimum Variance
I
The Sampling Distribution of the Sample Mean and the
Central Limit Theorem
I
The Sampling Distribution of the Sample Proportion
Models for the population of a variable
I
We have two equivalent ways of describing a population
I
Through relative frequency
I
Through the probability distribution of a sample drawn at
random.
This is called the Probability Model of the population
Table: Population proportions
value        0     2     3     4
proportion   .1    .4    .3    .2
I
Suppose we draw many, say 1000 samples from the
population
I
At the end look at the proportion of 4,3,2,0 in the 1000
samples
I
What would these proportions be?
I
The proportions would be close to the population
proportions
[Figures: simulate1sample1, simulate1sample2, simulate1samplenormal]
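The experiment described above can be sketched in a few lines of Python. This is only an illustration, not the simulation used for the figures: the function name `simulate_proportions`, the seed, and the use of the standard `random` module are assumptions made here.

```python
import random
from collections import Counter

# Population from the table above: values and their proportions.
values = [0, 2, 3, 4]
proportions = [0.1, 0.4, 0.3, 0.2]

def simulate_proportions(n_draws=1000, seed=0):
    """Draw n_draws values at random; return the observed proportion of each value."""
    rng = random.Random(seed)  # arbitrary seed, only for reproducibility
    draws = rng.choices(values, weights=proportions, k=n_draws)
    counts = Counter(draws)
    return {v: counts[v] / n_draws for v in values}

observed = simulate_proportions()
for v, p in zip(values, proportions):
    # population proportion next to the proportion seen in the 1000 draws
    print(v, p, observed[v])
```

Increasing `n_draws` should bring the observed proportions closer to the population proportions.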
I
A variable of interest in a population has a distribution,
described by a histogram.
The population mean is µ and the s.d. is σ
I
The probability distribution of a sample X from the
population is another description of the population:
E(X) = µ and σX = σ
I
If we construct a new population by taking one sample many
times, this population will have a distribution close to
the original population, and mean and s.d. close to µ and σ
Table: Population proportions
value        3      4     5     6     7
proportion   3/16   1/4   1/8   1/4   3/16
I
µ = 5, σ = √2
I
If we draw a single sample many times, 62.5% of these will be within µ ± 1
(values 4, 5, 6: 1/4 + 1/8 + 1/4 = 5/8)
I
Suppose that we draw two independent samples and take
their average X̄
I
What percentage of these will be within µ ± 1?
I
To answer this, we need to calculate the distribution of the
sum of the two samples
I
The possible values of the sum are: 6,7,8,9,10,11,12,13,14
I
Prob of getting 6 = P(3,3) = (3/16)(3/16)
I
Prob of getting 7 = P(3,4) + P(4,3) = (3/16)(1/4) + (1/4)(3/16)
I
Prob of getting 8 = P(3,5) + P(4,4) + P(5,3) = (3/16)(1/8) + (1/4)(1/4) + (1/8)(3/16)
I
and so on
I
Tedious computation shows µX̄ = 5, σX̄ = 1
I
X̄ still has the original µ = 5 as center
I
The s.d has decreased
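The tedious computation can be delegated to a short script. The sketch below enumerates all pairs of draws with exact fractions. The variable names are chosen here for illustration. It reproduces µX̄ = 5 and σX̄ = 1 (variance 1) and also answers the earlier question: the mean of two draws falls within µ ± 1 with probability 95/128, about 74%.

```python
from fractions import Fraction
from itertools import product

# Population from the table above.
population = {3: Fraction(3, 16), 4: Fraction(1, 4), 5: Fraction(1, 8),
              6: Fraction(1, 4), 7: Fraction(3, 16)}

# Exact distribution of the mean of two independent draws.
mean_dist = {}
for (x1, p1), (x2, p2) in product(population.items(), repeat=2):
    xbar = Fraction(x1 + x2, 2)
    mean_dist[xbar] = mean_dist.get(xbar, 0) + p1 * p2

mu = sum(x * p for x, p in mean_dist.items())
var = sum((x - mu) ** 2 * p for x, p in mean_dist.items())
within = sum(p for x, p in mean_dist.items() if abs(x - mu) <= 1)

print("mean:", mu, "variance:", var)
print("P(X-bar within mu +/- 1):", within, "=", float(within))
```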
I
What if we take 25 samples and compute the mean X̄ ?
I
Too tedious to calculate so we will simulate to get an
approximation
I
Simulation: take 25 samples and compute their mean. Repeat this
process many times to create a population of X̄ values. Find
the mean and s.d. of this population.
[Figure: simulate 25 samples]
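A minimal sketch of such a simulation follows, assuming the five-value population from the table above. The function name `simulate_means`, the number of repetitions, and the seed are illustrative choices, not taken from the slides. The simulated s.d. should come out near √2/√25 ≈ 0.28.

```python
import random
import statistics

values = [3, 4, 5, 6, 7]
weights = [3, 4, 2, 4, 3]  # proportions 3/16, 1/4, 1/8, 1/4, 3/16, scaled by 16

def simulate_means(n=25, repetitions=10_000, seed=1):
    """Repeat: draw n values at random, record their mean."""
    rng = random.Random(seed)  # arbitrary seed, only for reproducibility
    return [statistics.fmean(rng.choices(values, weights=weights, k=n))
            for _ in range(repetitions)]

means = simulate_means()
print("mean of the X-bar population:", statistics.fmean(means))
print("s.d. of the X-bar population:", statistics.pstdev(means))
```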
I
Suppose we have a population which is N(5, 2). Then
roughly 68% of the population will be within 5 ± 2.
I
So if we take many samples, roughly 68% of these
samples will lie within one s.d. of the mean
I
Suppose we take four samples, calculate their average and
repeat this process many times.
I
We now have a population of averages of four samples
from the population.
I
Question: What percentage of this will be within 5 ± 2?
[Figure: simulate 4 samples]
I
For X̄ the mean is (close to) 5 and s.d is approximately 1.
I
So 95% of the sample means would be within 5 ± 2
[Figure: simulate 25 samples]
I
For X̄ the mean is (close to) 5 and s.d is approximately .4.
I
So virtually all (more than 99.99%) of the sample means would be
within 5 ± 2, which is 5 s.d. on either side of the mean
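The percentages for a normal population can also be computed exactly rather than simulated, using the fact that the mean of n draws from N(µ, σ) is N(µ, σ/√n). The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is an assumption made here.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 5, 2  # the N(5, 2) population, with 2 being the s.d.

def prob_within(n, half_width=2):
    """P(mu - half_width < X-bar < mu + half_width) for the mean of n draws."""
    xbar = NormalDist(mu, sigma / sqrt(n))
    return xbar.cdf(mu + half_width) - xbar.cdf(mu - half_width)

for n in (1, 4, 25):
    # sample size, s.d. of X-bar, probability that X-bar lies within 5 +/- 2
    print(n, sigma / sqrt(n), prob_within(n))
```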
Sampling Distribution of mean
I
From a normal population draw n independent samples.
Let X̄ be the average of the n samples
I
Since the samples are random, so is X̄ . Hence X̄ is a
random variable
I
The distribution of X̄ is called the Sampling Distribution of
X̄
Statistic and parameters
I
Quantities calculated for the whole population are called
parameters.
I
Examples: the average, median, standard deviation, and
percentiles of a population
I
We generally denote the population mean by µ, the s.d. by σ, etc.
I
Typically these population quantities are not known and we
use the analogous quantities computed from the sample to
estimate these
Statistic and parameters
I
Quantities calculated from the sample are called statistics
I
Examples: the mean of a sample, the median of a sample, the s.d. of a
sample
I
Not surprisingly, the sample mean is used as an estimate
of the population mean, and the sample s.d. is used as an
estimate of the population s.d.
I
These estimates satisfy some nice properties, such as
unbiasedness. We will not get into these
Properties of the Sampling
Distribution of x̄
1. Mean of the sampling distribution equals mean of
sampled population, that is,
µx̄ = E(x̄) = µ.
2. Standard deviation of the sampling distribution
equals
(standard deviation of sampled population) / (square root of sample size).
That is, σx̄ = σ/√n.
Standard Error of the Mean
The standard deviation σx̄ is often referred to
as the standard error of the mean.
I
If a random sample of n observations is selected from a
population with a normal distribution, the sampling
distribution of X̄ will be a normal distribution
I
If the population is N(µ, σ), then X̄ will be N(µ, σ/√n)
Central Limit theorem
Consider a random sample of n observations selected from a
population (any probability distribution) with mean µ and
standard deviation σ. Then, when n is sufficiently large, the
sampling distribution of X̄ will be approximately a normal
distribution with mean µ and standard deviation σ/√n.
The larger the sample size, the better will be the normal
approximation to the sampling distribution of X̄
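The theorem can be illustrated with a clearly non-normal population. The sketch below assumes an exponential population with mean 1 and s.d. 1, a choice made here for illustration and not taken from the slides. For each n it reports the mean and s.d. of the simulated X̄ values next to σ/√n, plus the fraction of X̄ values within one standard error of µ, which should approach the normal value of about 0.68 as n grows.

```python
import random
import statistics
from math import sqrt

mu = sigma = 1.0  # mean and s.d. of the exponential(1) population (assumed example)

def sample_means(n, repetitions=5000, seed=2):
    """Means of `repetitions` random samples of size n from the skewed population."""
    rng = random.Random(seed)  # arbitrary seed, only for reproducibility
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(repetitions)]

for n in (2, 10, 50):
    means = sample_means(n)
    se = sigma / sqrt(n)
    within = sum(abs(m - mu) <= se for m in means) / len(means)
    print(n, round(statistics.fmean(means), 3),
          round(statistics.pstdev(means), 3), round(se, 3), round(within, 3))
```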
Problems 21,24,29,68,74
Problem 21
I
n = 100, µ = 30, σ = 16
I
µX̄ = 30, σX̄ = 16/√100 = 1.6
I
P(X̄ > 28) = P(Z > (28−30)/1.6) = P(Z > −1.25) = 0.8944
I
P(22.1 < X̄ < 26.8) = P((22.1−30)/1.6 < Z < (26.8−30)/1.6)
I
= P(−4.94 < Z < −2) = 0.0228
I
Part c and d are similar
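Table lookups like these can be checked numerically. The sketch below evaluates the two probabilities of Problem 21 with `statistics.NormalDist`, treating X̄ as normal with mean 30 and s.d. 16/√100; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, mu, sigma = 100, 30, 16
xbar = NormalDist(mu, sigma / sqrt(n))  # approximate sampling distribution of X-bar

p_greater_28 = 1 - xbar.cdf(28)
p_between = xbar.cdf(26.8) - xbar.cdf(22.1)

print("P(X-bar > 28)          =", round(p_greater_28, 4))
print("P(22.1 < X-bar < 26.8) =", round(p_between, 4))
```

The same pattern, with different µ, σ and n, applies to the other sample-mean problems in this chapter.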
Problem 24
I
µ = 96850, σ = 30,000, n = 50
I
µX̄ = 96850, σX̄ = 30000/√50 = 4242.64
I
Approximately Normal with mean = 96850, s.d = 4242.64
I
z score of x̄ = 89500 is (89500−96850)/4242.64 = −1.73
I
P(X̄ > 89500) = P(Z > −1.73) = .9582
Problem 29
I
µ = .53, σ = .193, n = 50
I
µX̄ = .53, σX̄ = .193/√50 = .0273
I
Approximately Normal with mean .53 and s.d .0273
I
P(X̄ > .58) = P(Z > (.58−.53)/.0273) = P(Z > 1.832) = .0335
I
The z value of .59 before tensioning is (.59−.53)/.0273 = 2.2 and
after tensioning is (.59−.58)/.0273 = 0.37
I
The z-value before tensioning is much larger, so the sample
more likely came after tensioning
Problem 68
I
n = 344, x̄ = 19.1, σ = 6
I
Approximately normal with s.d = 6/√344 = .3235
I
If µ = 18.5, P(X̄ > 19.1) = P(Z > (19.1−18.5)/.3235) = .0322
I
If µ = 19.5, P(X̄ > 19.1) = P(Z > (19.1−19.5)/.3235) = .8925
I
µ = 19.1
I
Less than 19.1. If not, P(X̄ > 19.1) would be more than .5
Problem 74
I
(a) µ = 157, σ = 3, n = 40, µX̄ = 157, σX̄ = 3/√40 = .474,
X̄ = 157 − 1.3 = 155.7
I
P(X̄ < 155.7) = P(Z < −1.3/.474) = P(Z < −2.74) = .0031
I
(b) µ = 156, σ = 3, more likely. If mean is 158, less likely
I
(c) µ = 157, σ = 2, less likely. If σ = 6, more likely
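The "more likely / less likely" answers in parts (b) and (c) can be checked by recomputing P(X̄ < 155.7) under each alternative. A sketch, with the helper name `prob_mean_below` chosen here for illustration:

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_below(x, mu, sigma, n=40):
    """P(X-bar < x) when X-bar is approximately N(mu, sigma / sqrt(n))."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(x)

x = 155.7
# (mu, sigma) pairs: part (a), then the alternatives from parts (b) and (c)
for mu, sigma in [(157, 3), (156, 3), (158, 3), (157, 2), (157, 6)]:
    print(mu, sigma, prob_mean_below(x, mu, sigma))
```

A smaller µ or a larger σ moves more of the distribution of X̄ below 155.7, which is why those cases make the observed sample mean more likely.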
I
Let X be the number of successes in n independent trials
with p - the probability of success in each trial
I
We know that X is Bin(n,p)
I
X/n is the proportion of successes in n trials
I
X/n is usually denoted by p̂. This is because p̂ serves as an
estimate of p
Sampling distribution of sample proportion
I
µp̂ = E(p̂) = p
This follows from E(X) = np
I
V(p̂) = p(1−p)/n
I
Again follows from V(X) = np(1 − p) for a binomial(n,p)
s.d(p̂) = √(p(1−p)/n). We write this as
σp̂ = √(p(1 − p)/n)
Sampling distribution of sample proportion
I
The exact sampling distribution of p̂ is quite messy,
especially if n is large
I
The normal approximation to binomial helps us to
approximate the sampling distribution by a normal
I
If n is large, the sampling distribution of p̂ is approximately
Normal with
µ = µp̂ = p and σ = σp̂ = √(p(1 − p)/n)
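The formulas for the mean and s.d. of p̂ can be checked by simulation. The sketch below uses n = 200 and p = 0.3, values picked here purely for illustration, and compares the simulated mean and s.d. of p̂ with p and √(p(1−p)/n).

```python
import random
import statistics
from math import sqrt

def simulate_p_hat(n, p, repetitions=5000, seed=3):
    """Simulate p-hat = X/n, where X counts successes in n independent trials."""
    rng = random.Random(seed)  # arbitrary seed, only for reproducibility
    return [sum(rng.random() < p for _ in range(n)) / n
            for _ in range(repetitions)]

n, p = 200, 0.3  # hypothetical values, not from the problems in this chapter
p_hats = simulate_p_hat(n, p)

print("simulated mean of p-hat:", statistics.fmean(p_hats), " theory:", p)
print("simulated s.d. of p-hat:", statistics.pstdev(p_hats),
      " theory:", sqrt(p * (1 - p) / n))
```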
Problems 39, 43, 46, 69
Problem 39
I
X is Binomial with n = 250, p = .85
I
p̂ = X/n, E(p̂) = p = .85
I
σp̂ = √(p(1−p)/n) = √(.85∗.15/250) = .0226
I
So p̂ is approximately normal with mean = .85 and s.d = .0226
I
P(p̂ < .9) = P(Z < (.9−.85)/.0226) = .9864
Problem 43
I
X is Binomial with n = 1000, p = .67
I
p̂ = X/n, E(p̂) = p = .67
I
σp̂ = √(p(1−p)/n) = √(.67∗.33/1000) = .0149
I
So p̂ is approximately normal with mean = .67 and s.d = .0149
I
P(p̂ < .75) = P(Z < (.75−.67)/.0149) ≈ 1
I
P(p̂ > .5) = P(Z > (.5−.67)/.0149) ≈ 1
Problem 46
I
For High IQ, p = .44; Average IQ, p = .26; Low IQ, p = .14
I
X is Bin(500, .44). Find P(X > 150)
I
Two ways of doing this
I
Normal approximation to Binomial:
P(Z > (150.5−500∗.44)/√(500∗.44∗.56)) ≈ 1
I
P(X > 150) = P(p̂ > 150/500) = P(Z > ((150/500)−.44)/√((.44∗.56)/500))
I
This is the same as the normal approximation but without the
continuity correction
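The two routes can be compared with the exact binomial probability. The sketch below sums the Bin(500, .44) probabilities for X = 151, ..., 500 and evaluates both normal approximations. Variable names are illustrative, and the script is a check, not part of the original solution.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 500, 0.44

# Exact binomial probability P(X > 150) = P(X >= 151).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(151, n + 1))

z = NormalDist()  # standard normal

# Normal approximation to the binomial, with continuity correction.
with_cc = 1 - z.cdf((150.5 - n * p) / sqrt(n * p * (1 - p)))

# Same question phrased through p-hat, without continuity correction.
without_cc = 1 - z.cdf((150 / n - p) / sqrt(p * (1 - p) / n))

print(exact, with_cc, without_cc)
```

All three values are extremely close to 1, since 150 lies more than six standard deviations below the mean np = 220.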
Problem 69
I
X is Binomial with n = 250, p = .2
I
p̂ = X/n, E(p̂) = p = .2
I
σp̂ = √(p(1−p)/n) = √(.2∗.8/250) = .0253
I
E(p̂) ± 2σp̂ = .2 ± (2 ∗ .0253) = (.1494, .2506)
I
So p̂ is approx. normal(.2, .0253)
I
P(.1494 < p̂ < .2506) = P((.1494−.2)/.0253 < Z < (.2506−.2)/.0253)
= P(−2 < Z < 2) = .954
I
So roughly 95% of the samples would have p̂ that fall in the
interval (.1494, .2506)
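The interval p ± 2σp̂ and its probability under the normal approximation can be recomputed directly. A short sketch using the values n = 250 and p = .2 from the problem, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

n, p = 250, 0.2
sd = sqrt(p * (1 - p) / n)           # s.d. of p-hat
low, high = p - 2 * sd, p + 2 * sd   # interval p +/- 2 s.d.

p_hat = NormalDist(p, sd)            # normal approximation to p-hat
prob = p_hat.cdf(high) - p_hat.cdf(low)

print("s.d. of p-hat:", round(sd, 4))
print("interval:", (round(low, 4), round(high, 4)))
print("P(p-hat in interval):", round(prob, 4))
```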