STA301 – Statistics and Probability
Lecture No 31:

Sampling Distribution of $\bar{X}$
Mean and Standard Deviation of the Sampling Distribution of $\bar{X}$
Central Limit Theorem
In today’s lecture, we begin the third and last part of this course, INFERENTIAL STATISTICS: that branch of
Statistics which enables us to draw conclusions or inferences about various phenomena on the basis of real data
collected on a sample basis.
In this regard, the first point to be noted is that statistical inference can be divided into two main branches: estimation and hypothesis-testing.
Estimation itself can be further divided into two branches: point estimation and interval estimation.
Statistical Inference
    Estimation
        Point Estimation
        Interval Estimation
    Hypothesis Testing
The second important point is that the concept of sampling distributions forms the basis for both estimation and
hypothesis-testing.
SAMPLING DISTRIBUTION:
The probability distribution of any statistic (such as the mean, the standard deviation, the proportion of successes in a
sample, etc.) is known as its sampling distribution.
In this regard, the first point to be noted is that there are two ways of sampling: sampling with replacement, and
sampling without replacement.
In case of a finite population containing N elements, the total number of possible samples of size n that can be drawn
from this population with replacement is $N^n$.
In case of a finite population containing N elements, the total number of possible samples of size n that can be drawn
from this population without replacement is $\binom{N}{n}$.
We begin with the sampling distribution of $\bar{X}$.
We illustrate the concept of the sampling distribution of $\bar{X}$ with the help of the following example:
EXAMPLE:
 
Let us examine the case of an annual Ministry of Transport test to which all cars, irrespective of age, have to be
submitted. The test looks for faulty brakes, steering, lights and suspension, and it is discovered after the first year that
approximately the same number of cars have 0, 1, 2, 3, or 4 faults.
The above situation is equivalent to the following:
Let X denote the number of faults in a car.
Virtual University of Pakistan
Then X can take the values 0, 1, 2, 3, and 4,
and the probability of each of these X values is 1/5.
Hence, we have the following probability distribution:
No. of Faults (x)    Probability f(x)
0                    1/5
1                    1/5
2                    1/5
3                    1/5
4                    1/5
Total                1
In order to compute the mean and standard deviation of this probability distribution, we carry out the
following computations:
No. of Faults (x)    Probability f(x)    x f(x)      x^2 f(x)
0                    1/5                 0           0
1                    1/5                 1/5         1/5
2                    1/5                 2/5         4/5
3                    1/5                 3/5         9/5
4                    1/5                 4/5         16/5
Total                1                   10/5 = 2    30/5 = 6
MEAN AND VARIANCE OF THE POPULATION DISTRIBUTION:
$$\mu = E(X) = \sum x f(x) = 2$$
$$\sigma^2 = \operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \sum x^2 f(x) - \left[\sum x f(x)\right]^2 = 6 - 2^2 = 6 - 4 = 2$$
Practically speaking, only a sample of the cars will be tested on any one occasion, and, as such, we are interested in
considering the results that would be obtained if a sample of vehicles is tested.
Let us consider the situation when only two cars are tested after being selected at the roadside by a mobile testing
station.
The following table gives all the possible situations:
NO. OF FAULTS (First Car, Second Car)
                 Second Car
First Car    0        1        2        3        4
0            (0,0)    (0,1)    (0,2)    (0,3)    (0,4)
1            (1,0)    (1,1)    (1,2)    (1,3)    (1,4)
2            (2,0)    (2,1)    (2,2)    (2,3)    (2,4)
3            (3,0)    (3,1)    (3,2)    (3,3)    (3,4)
4            (4,0)    (4,1)    (4,2)    (4,3)    (4,4)
The above situation is equivalent to drawing all possible samples of size 2 from this probability distribution (i.e. the
population) WITH REPLACEMENT. From the above list of 25 samples, we can work out all the possible sample
means. These are indicated in the following table:
SAMPLE MEANS
                 Second Car
First Car    0      1      2      3      4
0            0.0    0.5    1.0    1.5    2.0
1            0.5    1.0    1.5    2.0    2.5
2            1.0    1.5    2.0    2.5    3.0
3            1.5    2.0    2.5    3.0    3.5
4            2.0    2.5    3.0    3.5    4.0
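The list of samples and the table of sample means can be generated mechanically. The sketch below is illustrative only (the variable names are assumptions, not from the lecture); it uses `itertools.product` to enumerate every ordered pair (first car, second car) drawn with replacement.

```python
from itertools import product

faults = [0, 1, 2, 3, 4]

# All ordered samples of size 2 drawn with replacement: (first car, second car).
samples = list(product(faults, repeat=2))

# Sample mean of each sample, keyed by the sample itself.
sample_means = {s: sum(s) / len(s) for s in samples}

print(len(samples))  # 25, i.e. 5 ** 2
# One printed row per value of the first car, one column per value of the second car.
for first in faults:
    print(first, [sample_means[(first, second)] for second in faults])
```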
It is immediately evident that some of these possible sample means occur several times.
In view of this, it is sensible to construct a frequency distribution of the sample means.
This is given in the following table:
Sample Mean $\bar{x}$    No. of Samples f
0.0                      1
0.5                      2
1.0                      3
1.5                      4
2.0                      5
2.5                      4
3.0                      3
3.5                      2
4.0                      1
Total                    25
If we divide each of the above frequencies by the total frequency 25, we obtain the probabilities of the various values
of $\bar{X}$. (This is so because every one of the 25 possible situations is equally likely to occur, and hence the probabilities of
the various possible values of $\bar{X}$ can be computed using the classical definition of probability, i.e. m/n: the number of
favorable outcomes divided by the total number of possible outcomes.)
Hence, we obtain the following probability distribution:
Sample Mean $\bar{x}$    No. of Samples f    Probability $P(\bar{X} = \bar{x})$
0.0                      1                   1/25
0.5                      2                   2/25
1.0                      3                   3/25
1.5                      4                   4/25
2.0                      5                   5/25
2.5                      4                   4/25
3.0                      3                   3/25
3.5                      2                   2/25
4.0                      1                   1/25
Total                    25                  25/25 = 1
The above is referred to as the SAMPLING DISTRIBUTION of the mean.
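As a rough illustration, the frequency and probability columns of this sampling distribution can be obtained by counting how often each sample mean occurs among the 25 equally likely samples. The names `frequency` and `sampling_dist` are assumptions made for this sketch. Note that Python's `Fraction` prints probabilities in lowest terms, so 5/25 appears as 1/5.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faults = [0, 1, 2, 3, 4]
n = 2  # sample size

# Frequency of each sample mean over all 5**2 = 25 equally likely samples.
frequency = Counter(Fraction(sum(s), n) for s in product(faults, repeat=n))
total = sum(frequency.values())

# Classical probability: favourable outcomes divided by total outcomes.
sampling_dist = {m: Fraction(f, total) for m, f in sorted(frequency.items())}

for m, p in sampling_dist.items():
    print(float(m), frequency[m], p)
```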
The visual picture of the sampling distribution is as follows:
[Figure: line chart of the sampling distribution of $\bar{X}$ for n = 2. Horizontal axis: $\bar{x}$ from 0.0 to 4.0 in steps of 0.5. Vertical axis: $P(\bar{x})$, marked from 1/25 to 5/25.]
Next, we wish to compute the mean and standard deviation of this distribution.
As we are already aware, for the probability distribution of a random variable X, the mean is given by
$\mu = E(X) = \sum x f(x)$ and the variance is given by $\sigma^2 = \operatorname{Var}(X) = E(X^2) - [E(X)]^2$.
The point to be noted is that, in case of the sampling distribution of $\bar{X}$, our random variable is not X but $\bar{X}$.
Hence, the mean and variance of our sampling distribution are given by
MEAN AND VARIANCE OF THE SAMPLING DISTRIBUTION OF $\bar{X}$:
$$\mu_{\bar{x}} = E(\bar{X}) = \sum \bar{x} f(\bar{x})$$
$$\sigma_{\bar{x}}^2 = \operatorname{Var}(\bar{X}) = E(\bar{X}^2) - [E(\bar{X})]^2 = \sum \bar{x}^2 f(\bar{x}) - \left[\sum \bar{x} f(\bar{x})\right]^2$$
The square root of the variance is the standard deviation, and the standard deviation of a sampling distribution is termed
its standard error. In order to find the mean and standard error of the sampling distribution of $\bar{X}$ in this example, we
carry out the following computations:
Sample Mean $\bar{x}$    Probability $f(\bar{x}) = P(\bar{X} = \bar{x})$    $\bar{x} f(\bar{x})$    $\bar{x}^2 f(\bar{x})$
0.0                      1/25                                               0                       0
0.5                      2/25                                               1/25                    1/50
1.0                      3/25                                               3/25                    6/50
1.5                      4/25                                               6/25                    18/50
2.0                      5/25                                               10/25                   40/50
2.5                      4/25                                               10/25                   50/50
3.0                      3/25                                               9/25                    54/50
3.5                      2/25                                               7/25                    49/50
4.0                      1/25                                               4/25                    32/50
Total                    25/25 = 1                                          50/25 = 2               250/50 = 5
Hence, in this example, we have:
$$\mu_{\bar{x}} = E(\bar{X}) = \sum \bar{x} f(\bar{x}) = 50/25 = 2$$
And
$$\sigma_{\bar{x}}^2 = \operatorname{Var}(\bar{X}) = E(\bar{X}^2) - [E(\bar{X})]^2 = \sum \bar{x}^2 f(\bar{x}) - \left[\sum \bar{x} f(\bar{x})\right]^2 = 5 - 2^2 = 5 - 4 = 1$$
$$\sigma_{\bar{x}} = \sqrt{\sigma_{\bar{x}}^2} = \sqrt{1} = 1$$
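The same computation can be sketched in Python, starting from the sampling distribution for n = 2 tabulated earlier. The variable names are illustrative assumptions; exact fractions are used for the mean and variance, and the standard error is converted to a floating-point number by `math.sqrt`.

```python
import math
from fractions import Fraction

# Sampling distribution of the mean for n = 2, as tabulated in the lecture.
means = [Fraction(k, 2) for k in range(9)]      # 0.0, 0.5, ..., 4.0
counts = [1, 2, 3, 4, 5, 4, 3, 2, 1]            # number of samples giving each mean
probs = [Fraction(c, 25) for c in counts]

# Mean of the sampling distribution: sum of x-bar times its probability.
mean_xbar = sum(m * p for m, p in zip(means, probs))
# Variance: E(X-bar^2) minus the square of E(X-bar).
var_xbar = sum(m * m * p for m, p in zip(means, probs)) - mean_xbar ** 2
# Standard error: square root of the variance of the sampling distribution.
std_error = math.sqrt(var_xbar)

print(mean_xbar, var_xbar, std_error)  # 2 1 1.0
```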
These computations lead to the following two very important properties of the sampling distribution of $\bar{X}$:
Property No.1:
In the case of sampling with replacement as well as in the case of sampling without replacement, we have:
$$\mu_{\bar{x}} = \mu$$
In this example, $\mu = 2$ and $\mu_{\bar{x}} = 2$. Hence $\mu_{\bar{x}} = \mu$.
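Property No.2:
In case of sampling with replacement:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
In this example, $\sigma = \sqrt{2}$, so that
$$\frac{\sigma}{\sqrt{n}} = \frac{\sqrt{2}}{\sqrt{2}} = 1$$
and $\sigma_{\bar{x}} = 1$. Hence $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.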
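Property No.2 is not peculiar to this example. A short derivation is sketched here, writing $X_1, X_2, \ldots, X_n$ for the individual observations in the sample (notation introduced only for this sketch) and assuming that the observations are independent, as they are when sampling with replacement:

```latex
\begin{align*}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) \\
  &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
     \qquad \text{(the } X_i \text{ are independent)} \\
  &= \frac{n\,\sigma^2}{n^2} = \frac{\sigma^2}{n},
  \qquad\text{so that}\qquad
  \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} .
\end{align*}
```

With $\sigma^2 = 2$ and n = 2, this gives a variance of 1 and a standard error of 1, in agreement with the computation for the car example.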
NOTE:
In case of sampling without replacement from a finite population:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$
The factor $\sqrt{\dfrac{N-n}{N-1}}$ is known as the finite population correction (fpc).
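The effect of the fpc can be seen on a small case. The sketch below assumes a hypothetical finite population of N = 5 cars having 0, 1, 2, 3 and 4 faults (this finite population is an assumption for illustration; the lecture's example is a probability distribution), draws every sample of size n = 2 without replacement, and compares the variance of the sample means with the value given by the formula.

```python
import math
from fractions import Fraction
from itertools import combinations

# Hypothetical finite population of N = 5 cars with 0, 1, 2, 3 and 4 faults.
population = [0, 1, 2, 3, 4]
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma_sq = Fraction(sum((x - mu) ** 2 for x in population), N)

# All samples of size n drawn WITHOUT replacement, each equally likely.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
mean_xbar = sum(means) / len(means)
var_xbar = sum((m - mean_xbar) ** 2 for m in means) / len(means)

# Variance predicted by the formula: (sigma^2 / n) * (N - n) / (N - 1).
var_formula = sigma_sq / n * Fraction(N - n, N - 1)

print(mean_xbar, var_xbar, var_formula)  # 2 3/4 3/4
print(math.sqrt(var_xbar))               # standard error, about 0.866
```

Without the correction, the formula $\sigma/\sqrt{n}$ would give a standard error of 1, which overstates the variability when 2 of only 5 items are sampled.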
The point to be noted is that, if the sample size n is much smaller than the population size N, then the fpc is approximately
equal to 1, and, as such, the fpc is not required.
Hence, in sampling from a finite population, we apply the fpc only if the sample size is greater than 5% of the
population size. Next, we consider the shape of the sampling distribution of $\bar{X}$.
As indicated by the line chart, the above sampling distribution is perfectly symmetric and triangular. But let
us consider what happens to the shape of the sampling distribution if the sample size is increased. If in the car
tests, instead of taking samples of size 2, we had taken all possible samples of size 3, our sampling distribution would contain
$5^3 = 125$ sample means, and it would be in the following form:
SAMPLING DISTRIBUTION FOR SAMPLES OF SIZE 3
$\bar{x}$    No. of Samples    $f(\bar{x})$
0.00         1                 1/125
0.33         3                 3/125
0.67         6                 6/125
1.00         10                10/125
1.33         15                15/125
1.67         18                18/125
2.00         19                19/125
2.33         18                18/125
2.67         15                15/125
3.00         10                10/125
3.33         6                 6/125
3.67         3                 3/125
4.00         1                 1/125
Total        125               1
The graph of this distribution is as follows:
[Figure: line chart of the sampling distribution of $\bar{X}$ for n = 3. Horizontal axis: $\bar{x}$ from 0.00 to 4.00 in steps of 1/3. Vertical axis: $P(\bar{x})$, marked from 4/125 to 20/125.]
If in the car tests, instead of taking samples of size 2, we had taken all possible samples of size 4, our sampling distribution
would contain $5^4 = 625$ sample means, and it would be in the following form:
SAMPLING DISTRIBUTION FOR SAMPLES OF SIZE 4
$\bar{x}$    No. of Samples    $f(\bar{x})$
0.00         1                 1/625
0.25         4                 4/625
...          ...               ...
The graph of this distribution is as follows:
[Figure: line chart of the sampling distribution of $\bar{X}$ for n = 4. Horizontal axis: $\bar{x}$ from 0.00 to 4.00 in steps of 0.25. Vertical axis: $P(\bar{x})$, marked from 20/625 to 100/625.]
As in the case of the sampling distribution of $\bar{X}$ based on samples of size 2, each of these two distributions has a mean
of 2 faults. It is clear from the above figures that as larger samples are taken, the shape of the sampling
distribution undergoes discernible changes.
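The sampling distributions for n = 2, 3 and 4 can all be produced by the same enumeration. The following is a sketch under the assumptions of the car example (five equally likely values, sampling with replacement); the function name `sample_mean_counts` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faults = [0, 1, 2, 3, 4]

def sample_mean_counts(n):
    """Count how many of the 5**n ordered samples give each sample mean."""
    return Counter(Fraction(sum(s), n) for s in product(faults, repeat=n))

for n in (2, 3, 4):
    counts = sample_mean_counts(n)
    total = sum(counts.values())                             # 5 ** n samples
    mean = sum(m * c for m, c in counts.items()) / total     # mean of the sampling distribution
    # Prints: sample size, number of samples, mean, frequencies in order of x-bar.
    print(n, total, mean, [counts[m] for m in sorted(counts)])
```

For each sample size the printed mean is 2, while the list of frequencies moves from the triangular pattern for n = 2 towards a more rounded, bell-like pattern for n = 4.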
In all three cases the line charts are symmetrical, but as the sample size increases, the overall configuration
changes from a triangular distribution to a bell-shaped distribution. When relatively large samples are taken, this
bell-shaped distribution assumes the form of a ‘normal’ distribution (also called the ‘Gaussian’ distribution), and this
happens irrespective of the form of the parent population. (For example, in the problem currently under consideration,
the population distribution of the number of faults in a car is rectangular, i.e. uniform.)
This leads us to the following fundamentally important theorem:
CENTRAL LIMIT THEOREM:
The theorem states that:
“If a variable X from a population has mean $\mu$ and finite variance $\sigma^2$, then the sampling distribution of the
sample mean $\bar{X}$ approaches a normal distribution with mean $\mu$ and variance $\sigma^2/n$ as the sample size n approaches
infinity.”
As $n \to \infty$, the sampling distribution of $\bar{X}$ approaches normality, with
$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Due to the Central Limit Theorem, the normal distribution has found a central place in the theory of statistical
inference. (Since, in many situations, the sample is large enough for our sampling distribution to be approximately
normal, we can utilize the mathematical properties of the normal distribution to draw inferences about the
variable of interest.)
The rule of thumb in this regard is that if the sample size, n, is greater than or equal to 30, then we can
assume that the sampling distribution of $\bar{X}$ is approximately normal.
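One way to see the approach to normality in the car example is to compute, for several sample sizes, the exact probability that the sample mean falls within 1.96 standard errors of the population mean. For a normal distribution this probability is about 0.95. The sketch below is illustrative only: the function `sum_distribution` and the chosen sample sizes are assumptions, and the discreteness of the distribution means the results will not match 0.95 exactly.

```python
import math

def sum_distribution(n, values=(0, 1, 2, 3, 4)):
    """Exact distribution of the sum of n independent draws, each value equally likely."""
    p = 1 / len(values)
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for v in values:
                new[total + v] = new.get(total + v, 0.0) + prob * p
        dist = new
    return dist

mu, sigma = 2.0, math.sqrt(2.0)   # population mean and standard deviation

for n in (2, 5, 30):
    se = sigma / math.sqrt(n)      # standard error of the sample mean
    dist = sum_distribution(n)
    # Probability that the sample mean lies within 1.96 standard errors of mu.
    inside = sum(p for s, p in dist.items() if abs(s / n - mu) <= 1.96 * se)
    print(n, round(inside, 4))
```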
On the other hand:
If the POPULATION sampled is normally distributed, then the sampling distribution of $\bar{X}$ will also be
normal regardless of sample size. In other words, $\bar{X}$ will be normally distributed with mean $\mu$ and variance $\sigma^2/n$.