STA301 – Statistics and Probability
Lecture No 32:
• Sampling Distribution of $\hat{p}$
• Sampling Distribution of $\bar{X}_1 - \bar{X}_2$
We discussed the mean and the standard deviation of the sampling distribution of $\bar{X}$, and, towards the end of the lecture, we
considered the very important theorem known as the Central Limit Theorem.
Let us now consider the real-life application of this concept with the help of an example:
EXAMPLE:
A construction company has 310 employees who have an average annual salary of Rs.24,000. The standard
deviation of annual salaries is Rs.5,000.
Suppose that the employees of this company launch a demand that the government should institute a law by
which their average salary should be at least Rs.24,500, and suppose that the government decides to check the validity
of this demand by drawing a random sample of 100 employees of this company and acquiring information regarding
their present salaries. What is the probability that, in a random sample of 100 employees, the average salary will exceed
Rs.24,500 (so that the government decides that the demand of the employees of this company is unfounded, and hence
does not pay attention to the demand, although, in reality, it was justified)?
SOLUTION:
The sample size (n = 100) is large enough to assume that the sampling distribution of $\bar{X}$ is approximately
normal with mean
$$\mu_{\bar{x}} = \mu = \text{Rs.}\,24000$$
and standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{5000}{\sqrt{100}} \sqrt{\frac{310-100}{310-1}} = \text{Rs.}\,412.20.$$
NOTE:
Here we have used the finite population correction factor (fpc), because the sample size
n = 100 is greater than 5 percent of the population size N = 310.
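Since $\bar{X}$ is approximately N(24000, 412.20), therefore
$$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - 24000}{412.20}$$
is approximately N(0, 1). We are required to evaluate $P(\bar{X} > 24500)$.
At $\bar{x} = 24500$, we find that
$$z = \frac{24500 - 24000}{412.20} = 1.21.$$
[Figure: normal curve with 24000 and 24500 marked on the $\bar{X}$-axis, corresponding to 0 and 1.21 on the Z-axis.]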
Using the table of areas under the standard normal curve, we find that the area between z = 0 and z = 1.21 is 0.3869.
Virtual University of Pakistan
[Figure: normal curve with the area 0.3869 marked between 24000 and 24500 on the $\bar{X}$-axis (0 and 1.21 on the Z-axis).]
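Hence,
$$P(\bar{X} > 24500) = P(Z > 1.21) = 0.5 - P(0 < Z < 1.21) = 0.5 - 0.3869 = 0.1131.$$
[Figure: normal curve with the area 0.3869 between z = 0 and z = 1.21 and the right-tail area 0.1131 beyond z = 1.21 (24500 on the $\bar{X}$-axis).]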
Hence, the chances are only 11% that, in a random sample of 100 employees from this particular construction company,
the average salary will exceed Rs.24,500. In other words, the chances are 89% that, in such a sample, the average salary
will not exceed Rs.24,500.
Hence, the chances are considerably high that the government might pay attention to the employees’ demand.
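The arithmetic of this example can be checked with a short script. This is only a sketch: the helper names `normal_cdf` and `se_mean_fpc` are illustrative and not part of the lecture, and the normal area is computed from the error function rather than read from a table, so the last digits may differ slightly from the rounded table values used above.

```python
import math


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def se_mean_fpc(sigma: float, n: int, N: int) -> float:
    """Standard error of the sample mean with the finite population correction."""
    return (sigma / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))


# Figures taken from the example (salaries in rupees)
mu, sigma, N, n = 24000.0, 5000.0, 310, 100

se = se_mean_fpc(sigma, n, N)   # close to 412.2
z = (24500.0 - mu) / se         # close to 1.21
prob = 1.0 - normal_cdf(z)      # close to 0.11

print(f"standard error = {se:.2f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > 24500) = {prob:.4f}")
```

Because z is not rounded to 1.21 before the area is computed, the printed probability can differ from 0.1131 in the third decimal place.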
SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION:
In this regard, the first point to be noted is that, whenever the elements of a population can be classified into
two categories, technically called “success” and “failure”, we may be interested in the proportion of “successes” in the
population. If X denotes the number of successes in the population, then the proportion of successes in the population is
given by
$$p = \frac{X}{N}.$$
Similarly, if we draw a sample of size n from the population, the proportion of successes in the sample is given by
$$\hat{p} = \frac{X}{n},$$
where X represents the number of successes in the sample.
It is interesting to note that X is a binomial random variable and the binomial parameter p is being called a proportion
of successes here. The sample proportion has different values in different samples. It is obviously a random variable
and has a probability distribution.
This probability distribution of the proportions of successes in all possible random samples of size n, is called the
sampling distribution of p̂.
We illustrate this sampling distribution with the help of the following examples:
EXAMPLE-1:
A population consists of six values 1, 3, 6, 8, 9 and 12. Draw all possible samples of size n = 3 without replacement
from the population and find the proportion of even numbers in each sample. Construct the sampling distribution of
sample proportions and verify that
i) $\mu_{\hat{p}} = p$
ii) $\mathrm{Var}(\hat{p}) = \dfrac{pq}{n} \cdot \dfrac{N-n}{N-1}$.
SOLUTION:
The number of possible samples of size n = 3 that could be selected without replacement from a population of
size N = 6 is
$$\binom{6}{3} = 20.$$
Let $\hat{p}$ represent the proportion of even numbers in the sample. Then the 20 possible samples and the proportion of even
numbers are given as follows:

Sample No.    Sample Data    Sample Proportion $\hat{p}$
 1            1, 3, 6        1/3
 2            1, 3, 8        1/3
 3            1, 3, 9        0
 4            1, 3, 12       1/3
 5            1, 6, 8        2/3
 6            1, 6, 9        1/3
 7            1, 6, 12       2/3
 8            1, 8, 9        1/3
 9            1, 8, 12       2/3
10            1, 9, 12       1/3
11            3, 6, 8        2/3
12            3, 6, 9        1/3
13            3, 6, 12       2/3
14            3, 8, 9        1/3
15            3, 8, 12       2/3
16            3, 9, 12       1/3
17            6, 8, 9        2/3
18            6, 8, 12       1
19            6, 9, 12       2/3
20            8, 9, 12       2/3
The sampling distribution of sample proportion is given below:
Sampling Distribution of $\hat{p}$:

$\hat{p}$    No. of Samples    Probability $f(\hat{p})$    $\hat{p} f(\hat{p})$    $\hat{p}^2 f(\hat{p})$
0            1                 1/20                        0                       0
1/3          9                 9/20                        3/20                    1/20
2/3          9                 9/20                        6/20                    4/20
1            1                 1/20                        1/20                    1/20
Σ            20                1                           10/20                   6/20
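Now
$$\mu_{\hat{p}} = \sum \hat{p} f(\hat{p}) = \frac{10}{20} = 0.5,$$
and
$$\sigma^2_{\hat{p}} = \sum \hat{p}^2 f(\hat{p}) - \left[ \sum \hat{p} f(\hat{p}) \right]^2 = \frac{6}{20} - \left( \frac{10}{20} \right)^2 = \frac{1}{20} = 0.05.$$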
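The enumeration in this example can be reproduced by brute force. The sketch below lists every sample of size 3 drawn without replacement, computes the proportion of even numbers in each, and compares the resulting mean and variance with p and with (pq/n)(N − n)/(N − 1). The variable names are illustrative, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 3, 6, 8, 9, 12]
N, n = len(population), 3

# Proportion of even numbers in every possible sample (without replacement)
p_hats = [
    Fraction(sum(v % 2 == 0 for v in sample), n)
    for sample in combinations(population, n)
]

# Mean and variance of the sampling distribution (all samples equally likely)
mean_p_hat = sum(p_hats) / len(p_hats)
var_p_hat = sum(ph ** 2 for ph in p_hats) / len(p_hats) - mean_p_hat ** 2

# Population proportion of even numbers and the formula with the fpc
p = Fraction(sum(v % 2 == 0 for v in population), N)
q = 1 - p
var_formula = (p * q / n) * Fraction(N - n, N - 1)

print(len(p_hats), mean_p_hat, var_p_hat, p, var_formula)
```

Both the enumerated variance and the formula evaluate to 1/20, and the enumerated mean equals p = 1/2.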
To verify the given relations, we first calculate the population proportion p. Thus:
$$p = \frac{X}{N},$$
where X represents the number of even numbers in the population.
In other words,
$$p = \frac{3}{6} = 0.5.$$
Hence, we find that $\mu_{\hat{p}} = 0.5 = p$, and
$$\frac{pq}{n} \cdot \frac{N-n}{N-1} = \frac{0.25}{3} \cdot \frac{6-3}{6-1} = \frac{0.25}{5} = 0.05 = \mathrm{Var}(\hat{p}).$$
Hence, two properties of the sampling distribution of $\hat{p}$ are verified.
The sampling distribution of $\hat{p}$ has the following important properties:
PROPERTIES OF THE SAMPLING DISTRIBUTION OF $\hat{p}$:
Property No. 1:
The mean of the sampling distribution of proportions, denoted by $\mu_{\hat{p}}$, is equal to the population
proportion p, that is,
$$\mu_{\hat{p}} = p.$$
Property No. 2:
The standard deviation of the sampling distribution of proportions, called the standard error of $\hat{p}$ and
denoted by $\sigma_{\hat{p}}$, is given as:
a) $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$ when the sampling is performed with replacement;
b) $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} \sqrt{\dfrac{N-n}{N-1}}$ when sampling is done without replacement from a finite population.
(As in the case of the sampling distribution of $\bar{X}$, $\sqrt{\dfrac{N-n}{N-1}}$ is known as the finite population correction factor (fpc).)
Property No. 3:
SHAPE OF THE DISTRIBUTION:
The sampling distribution of $\hat{p}$ is the binomial distribution. However, for sufficiently large sample sizes, the sampling distribution of $\hat{p}$ is
approximately normal.
As $n \to \infty$, the sampling distribution of $\hat{p}$ approaches normality, with
$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}.$$
As a rule of thumb, the sampling distribution of $\hat{p}$ will be approximately normal whenever both np and nq are equal
to or greater than 5.
Let us apply this concept to a real-world situation:
EXAMPLE-2:
Ten percent of the 1-kilogram boxes of sugar in a large warehouse are underweight. Suppose a retailer buys a
random sample of 144 of these boxes. What is the probability that at least 5 percent of the sample boxes will be
underweight?
SOLUTION:
Here the statistic is the sample proportion. The sample size (n = 144) is large enough to assume that the sample
proportion is approximately normally distributed with the following mean and standard error.
Mean of the sampling distribution of $\hat{p}$:
$$\mu_{\hat{p}} = p = 0.10,$$
and
Standard Error of $\hat{p}$:
$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.10)(0.90)}{144}} = \frac{0.3}{12} = 0.025.$$
Therefore, the sampling distribution of $\hat{p}$ is approximately N(0.10, 0.025), and hence
$$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{\hat{p} - 0.10}{0.025}$$
is approximately N(0, 1).
We are required to find the probability that the proportion of underweight boxes in the sample is equal to or greater
than 5%, i.e., we require
$$P(\hat{p} \ge 0.05).$$
In this regard, a very important point to be noted is that, just as we use a continuity correction of ± ½ whenever we
consider the normal approximation to the binomially distributed random variable X, in this situation, since
$$\hat{p} = \frac{X}{n},$$
we need to use a continuity correction of
$$\pm \frac{1}{2n}$$
in the case of the sampling distribution of $\hat{p}$.
Applying the continuity correction in this problem, we have:
$$\begin{aligned}
P(\hat{p} \ge 0.05) &= P\left( \hat{p} \ge 0.05 - \frac{1}{2(144)} \right) \\
&= P\left( \hat{p} \ge 0.05 - \frac{1}{288} \right) \\
&= P\left( \frac{\hat{p} - 0.10}{0.025} \ge \frac{0.05 - 1/288 - 0.10}{0.025} \right) \\
&= P(Z \ge -2.14) \\
&= P(-2.14 < Z < 0) + P(0 < Z < \infty) \\
&= 0.4838 + 0.5 = 0.9838.
\end{aligned}$$
[Figure: normal curve of $\hat{p}$ centred at 0.10, with the area 0.4838 between z = -2.14 and z = 0 and the area 0.5 to the right of z = 0.]
Hence, the probability that at least 5% of the sample boxes are underweight is as high as 98%!
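The calculation above can be sketched in a few lines of code. The helper `normal_cdf` and the variable names are illustrative assumptions, and the normal area comes from the error function instead of the printed table, so the result agrees with 0.9838 only up to rounding.

```python
import math


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p, n = 0.10, 144
q = 1.0 - p

# Rule of thumb for the normal approximation: np >= 5 and nq >= 5
normal_ok = n * p >= 5 and n * q >= 5

se = math.sqrt(p * q / n)            # standard error of the sample proportion
lower = 0.05 - 1.0 / (2 * n)         # 0.05 with the continuity correction 1/(2n)
z = (lower - p) / se                 # close to -2.14
prob = 1.0 - normal_cdf(z)           # P(Z >= z)

print(normal_ok, round(se, 4), round(z, 2), round(prob, 4))
```

Here np = 14.4 and nq = 129.6, so the rule of thumb is comfortably satisfied.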
The sampling distributions of $\bar{X}$ and $\hat{p}$ pertain to the situation when we are drawing all possible samples of a
particular size from one particular population.
Next, we will discuss the case when we are dealing with all possible samples drawn from two populations, such that
the samples from the two populations are independent.
In this regard, we will consider the sampling distributions of $\bar{X}_1 - \bar{X}_2$ and $\hat{p}_1 - \hat{p}_2$.
We begin with the sampling distribution of $\bar{X}_1 - \bar{X}_2$:
SAMPLING DISTRIBUTION OF DIFFERENCES BETWEEN MEANS
Suppose we have two distinct populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ respectively.
Let independent random samples of sizes $n_1$ and $n_2$ be selected from the respective populations, and the
differences $\bar{x}_1 - \bar{x}_2$ between the means of all possible pairs of samples be computed.
Then, a probability distribution of the differences $\bar{X}_1 - \bar{X}_2$ can be obtained. Such a distribution is called the
sampling distribution of the differences of sample means $\bar{X}_1 - \bar{X}_2$.
We illustrate the sampling distribution of $\bar{X}_1 - \bar{X}_2$ with the help of the following example:
EXAMPLE:
Draw all possible random samples of size $n_1 = 2$ with replacement from a finite population consisting of 4, 6,
8. Similarly, draw all possible random samples of size $n_2 = 2$ with replacement from another finite population consisting
of 1, 2, 3.
a) Find the possible differences between the sample means of the two populations.
b) Construct the sampling distribution of $\bar{X}_1 - \bar{X}_2$ and compute its mean and variance.
c) Verify that
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
SOLUTION:
Whenever we are sampling with replacement from a finite population, the total number of possible samples is
$N^n$ (where N is the population size, and n is the sample size). Hence, in this example, there are $3^2 = 9$ possible
samples which can be drawn with replacement from each population.
These two sets of samples and their means are given below:

From Population 1                              From Population 2
Sample No.   Sample Value   $\bar{x}_1$        Sample No.   Sample Value   $\bar{x}_2$
1            4, 4           4                  1            1, 1           1.0
2            4, 6           5                  2            1, 2           1.5
3            4, 8           6                  3            1, 3           2.0
4            6, 4           5                  4            2, 1           1.5
5            6, 6           6                  5            2, 2           2.0
6            6, 8           7                  6            2, 3           2.5
7            8, 4           6                  7            3, 1           2.0
8            8, 6           7                  8            3, 2           2.5
9            8, 8           8                  9            3, 3           3.0
a) Since there are 9 samples from the first population as well as 9 from the second, there are 81 possible
combinations of $\bar{x}_1$ and $\bar{x}_2$.
The 81 possible differences $\bar{x}_1 - \bar{x}_2$ are presented in the following table (columns: $\bar{x}_1$; rows: $\bar{x}_2$):

$\bar{x}_2$ \ $\bar{x}_1$    4     5     6     5     6     7     6     7     8
1.0                          3.0   4.0   5.0   4.0   5.0   6.0   5.0   6.0   7.0
1.5                          2.5   3.5   4.5   3.5   4.5   5.5   4.5   5.5   6.5
2.0                          2.0   3.0   4.0   3.0   4.0   5.0   4.0   5.0   6.0
1.5                          2.5   3.5   4.5   3.5   4.5   5.5   4.5   5.5   6.5
2.0                          2.0   3.0   4.0   3.0   4.0   5.0   4.0   5.0   6.0
2.5                          1.5   2.5   3.5   2.5   3.5   4.5   3.5   4.5   5.5
2.0                          2.0   3.0   4.0   3.0   4.0   5.0   4.0   5.0   6.0
2.5                          1.5   2.5   3.5   2.5   3.5   4.5   3.5   4.5   5.5
3.0                          1.0   2.0   3.0   2.0   3.0   4.0   3.0   4.0   5.0

b) The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is as follows:
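$d = \bar{x}_1 - \bar{x}_2$    f     Probability $f(d)$    $d\,f(d)$    $d^2 f(d)$
1.0                            1     1/81                  1/81         1.0/81
1.5                            2     2/81                  3/81         4.5/81
2.0                            5     5/81                  10/81        20.0/81
2.5                            6     6/81                  15/81        37.5/81
3.0                            10    10/81                 30/81        90.0/81
3.5                            10    10/81                 35/81        122.5/81
4.0                            13    13/81                 52/81        208.0/81
4.5                            10    10/81                 45/81        202.5/81
5.0                            10    10/81                 50/81        250.0/81
5.5                            6     6/81                  33/81        181.5/81
6.0                            5     5/81                  30/81        180.0/81
6.5                            2     2/81                  13/81        84.5/81
7.0                            1     1/81                  7/81         49.0/81
Σ                              81    1                     324/81       1431/81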
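Thus the mean and the variance are
$$\mu_{\bar{x}_1 - \bar{x}_2} = \sum (\bar{x}_1 - \bar{x}_2) f(\bar{x}_1 - \bar{x}_2) = \sum d\,f(d) = \frac{324}{81} = 4,$$
and
$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sum d^2 f(d) - \left[ \sum d\,f(d) \right]^2 = \frac{1431}{81} - \left( \frac{324}{81} \right)^2 = \frac{53}{3} - 16 = \frac{5}{3} = 1.67.$$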
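Parts (a) and (b) can also be reproduced by enumeration. The sketch below builds all 9 × 9 = 81 pairs of sample means, tabulates their differences, and compares the resulting mean and variance with the difference of the population means and with the sum of variance-over-sample-size terms for the two populations. The helper names `sample_means` and `pop_var` are illustrative, and exact fractions are used throughout.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

pop1, pop2 = [4, 6, 8], [1, 2, 3]
n1 = n2 = 2


def sample_means(pop, n):
    """Means of all ordered samples of size n drawn with replacement."""
    return [Fraction(sum(s), n) for s in product(pop, repeat=n)]


def pop_var(pop):
    """Population variance (divisor N, not N - 1)."""
    mu = Fraction(sum(pop), len(pop))
    return sum((x - mu) ** 2 for x in pop) / len(pop)


# All 81 differences between a mean from population 1 and one from population 2
diffs = [m1 - m2 for m1 in sample_means(pop1, n1) for m2 in sample_means(pop2, n2)]
freq = Counter(diffs)          # frequency of each distinct difference
total = len(diffs)

mean_d = sum(diffs) / total
var_d = sum(d * d for d in diffs) / total - mean_d ** 2

mean_formula = Fraction(sum(pop1), len(pop1)) - Fraction(sum(pop2), len(pop2))
var_formula = pop_var(pop1) / n1 + pop_var(pop2) / n2

for d in sorted(freq):
    print(float(d), freq[d])
print(mean_d, var_d, mean_formula, var_formula)
```

The enumerated mean is 4 and the enumerated variance is 5/3, the same values obtained from the two formulas.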
c) In order to verify the properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$, we first need to compute the mean and
variance of each population.
The mean and variance of the first population are:
$$\mu_1 = \frac{4 + 6 + 8}{3} = 6,$$
and
$$\sigma_1^2 = \frac{(4-6)^2 + (6-6)^2 + (8-6)^2}{3} = \frac{8}{3}.$$
The mean and variance of the second population are:
$$\mu_2 = \frac{1 + 2 + 3}{3} = 2,$$
and
$$\sigma_2^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3}.$$
Now
$$\mu_{\bar{x}_1 - \bar{x}_2} = 4 = 6 - 2 = \mu_1 - \mu_2,$$
and
$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{8}{3} \cdot \frac{1}{2} + \frac{2}{3} \cdot \frac{1}{2} = \frac{4}{3} + \frac{1}{3} = \frac{5}{3} = 1.67 = \sigma^2_{\bar{x}_1 - \bar{x}_2}.$$
Hence, two properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ are satisfied. The sampling distribution of the
differences $\bar{X}_1 - \bar{X}_2$ has the following properties:
PROPERTIES OF THE SAMPLING DISTRIBUTION OF $\bar{X}_1 - \bar{X}_2$:
Property No. 1:
The mean of the sampling distribution of $\bar{X}_1 - \bar{X}_2$, denoted by $\mu_{\bar{X}_1 - \bar{X}_2}$, is equal to the difference
between population means, that is,
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2.$$
Property No. 2:
In case of sampling with or without replacement from two infinite populations, the standard deviation of the sampling
distribution of $\bar{X}_1 - \bar{X}_2$ (i.e. the standard error of $\bar{X}_1 - \bar{X}_2$), denoted by $\sigma_{\bar{X}_1 - \bar{X}_2}$, is given by
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
The above expression for the standard error of $\bar{X}_1 - \bar{X}_2$ also holds for finite populations when sampling is
performed with replacement. In case of sampling without replacement from a finite population, the formula for the
standard error of $\bar{X}_1 - \bar{X}_2$ will be suitably modified.
Property No. 3:
Shape of the distribution:
a) If the POPULATIONS are normally distributed, the sampling distribution of $\bar{X}_1 - \bar{X}_2$, regardless of sample
sizes, will be normal with mean $\mu_1 - \mu_2$ and variance
$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
In other words, the variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
is normally distributed with zero mean and unit variance.
b) If the POPULATIONS are non-normal and if both sample sizes are large (i.e., greater than or equal to 30), then the
sampling distribution of the differences between means is approximately a normal distribution by the Central Limit
Theorem.
In this case too, the variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
will be approximately normally distributed with mean zero and variance one.
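As a small illustration of the standardised variable Z defined above, the following sketch evaluates it for one pair of sample means. The function name `z_for_mean_difference` is hypothetical, and the inputs reuse the population means and variances of the earlier example (6 and 2, 8/3 and 2/3, with n1 = n2 = 2) purely to show the arithmetic; with samples that small from non-normal populations, Z would not actually be close to normal.

```python
import math


def z_for_mean_difference(xbar1, xbar2, mu1, mu2, var1, var2, n1, n2):
    """Standardise a difference of sample means using sigma1^2/n1 + sigma2^2/n2."""
    se = math.sqrt(var1 / n1 + var2 / n2)   # standard error of the difference
    return ((xbar1 - xbar2) - (mu1 - mu2)) / se


# Observed means 7 and 2 (one of the 81 pairs in the example), difference 5
z = z_for_mean_difference(7.0, 2.0, 6.0, 2.0, 8 / 3, 2 / 3, 2, 2)
print(round(z, 4))
```

The observed difference of 5 lies one unit above the expected difference of 4, which is about 0.77 standard errors.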