STA301 – Statistics and Probability
Lecture No. 24

Chebychev’s Inequality

Concept of Continuous Probability Distribution

Mathematical Expectation, Variance & Moments of a Continuous Probability Distribution
We begin with a discussion of Chebychev’s Inequality in the case of a discrete probability distribution.
Chebychev’s Inequality:
If X is a random variable having mean $\mu$ and variance $\sigma^2 > 0$, and k is any positive constant, then the probability that a value of X falls within k standard deviations of the mean is at least $1 - 1/k^2$. That is:
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}.$$
Alternatively, we may state Chebychev’s theorem as follows: given the probability distribution of the random variable X with mean $\mu$ and standard deviation $\sigma$, the probability of observing a value of X that differs from $\mu$ by k or more standard deviations cannot exceed $1/k^2$. As indicated earlier, this inequality is due to the Russian mathematician P.L. Chebychev (1821-1894), and it provides a means of understanding how the standard deviation measures variability about the mean of a random variable. It holds for all probability distributions having finite mean and variance.
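As a quick numerical illustration of the bound, the short Python sketch below evaluates $1 - 1/k^2$ for a few values of k. The function name `chebychev_lower_bound` is a hypothetical helper introduced only for this illustration, and clamping the bound at 0 for k ≤ 1 is our own choice (for such k the inequality gives no useful information).

```python
def chebychev_lower_bound(k: float) -> float:
    """Lower bound on P(mu - k*sigma < X < mu + k*sigma) given by Chebychev's inequality.

    Hypothetical helper for illustration. For k <= 1 the expression 1 - 1/k**2
    is not positive, so the bound is clamped at 0 (it says nothing in that case).
    """
    if k <= 0:
        raise ValueError("k must be a positive constant")
    return max(0.0, 1.0 - 1.0 / k**2)


for k in (1, 1.5, 2, 3):
    print(f"k = {k}: probability within k standard deviations is at least {chebychev_lower_bound(k):.4f}")
```

For k = 2 the bound is 0.75 and for k = 3 it is about 0.889, whatever the shape of the distribution.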
Let us apply this concept to the example of the number of petals on the flowers of a particular species that we
considered earlier:
EXAMPLE:
If a biologist is interested in the number of petals on a particular flower, this number may take the values 3, 4, 5, 6, 7, 8 and 9, and each one of these values will have its own probability.
The probability distribution of the random variable X is:

No. of Petals (X)   P(x)
x1 = 3              0.05
x2 = 4              0.10
x3 = 5              0.20
x4 = 6              0.30
x5 = 7              0.25
x6 = 8              0.075
x7 = 9              0.025
Total               1
The mean of this distribution is:
$$\mu = E(X) = \sum x P(x) = 5.925 \approx 5.9$$
And the standard deviation of this distribution is:
$$\sigma = \text{S.D.}(X) = \sqrt{E(X^2) - [E(X)]^2} = \sqrt{36.925 - (5.925)^2} = \sqrt{36.925 - 35.106} = \sqrt{1.819} \approx 1.3$$
According to Chebychev’s inequality, the probability is at least $1 - 1/2^2 = 1 - 1/4 = 3/4 = 0.75$ that X will lie between $\mu - 2\sigma$ and $\mu + 2\sigma$, i.e. between $5.9 - 2(1.3)$ and $5.9 + 2(1.3)$, i.e. between 3.3 and 8.5.
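The arithmetic above can be checked with a small Python sketch. The dictionary `petals` simply re-enters the probability table of this example, and `mean_and_sd` is a hypothetical helper that uses the shortcut $\sigma^2 = E(X^2) - [E(X)]^2$ applied in the text.

```python
import math

# Petal distribution from the table above: value -> probability
petals = {3: 0.05, 4: 0.10, 5: 0.20, 6: 0.30, 7: 0.25, 8: 0.075, 9: 0.025}


def mean_and_sd(dist):
    """Mean and standard deviation of a discrete distribution given as {x: P(x)}."""
    mu = sum(x * p for x, p in dist.items())          # E(X)
    ex2 = sum(x * x * p for x, p in dist.items())     # E(X^2)
    return mu, math.sqrt(ex2 - mu ** 2)               # sigma = sqrt(E(X^2) - [E(X)]^2)


mu, sigma = mean_and_sd(petals)
print(f"mean = {mu:.3f}, sd = {sigma:.3f}")
k = 2
print(f"interval mu +/- {k} sd: ({mu - k * sigma:.2f}, {mu + k * sigma:.2f})")
```

Without rounding, σ ≈ 1.349 and the interval μ ± 2σ is roughly (3.23, 8.62); it contains the same petal counts (4 to 8) as the rounded interval (3.3, 8.5) used in the lecture.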
Let us have another look at the probability distribution:
Virtual University of Pakistan
Page 185
No. of Petals (X)   P(x)
x1 = 3              0.05
x2 = 4              0.10
x3 = 5              0.20
x4 = 6              0.30
x5 = 7              0.25
x6 = 8              0.075
x7 = 9              0.025
Total               1
According to this distribution, the probability that X lies between 3.3 and 8.5 (that is, that X takes one of the values 4, 5, 6, 7 or 8) is
0.10 + 0.20 + 0.30 + 0.25 + 0.075 = 0.925,
which is greater than 0.75, as guaranteed by Chebychev’s inequality.
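The comparison between the actual probability and Chebychev’s bound can be sketched in Python as follows, using the endpoints 3.3 and 8.5 from the text. `prob_between` is a hypothetical helper that adds P(x) over the values lying strictly inside the interval.

```python
# Petal distribution from the table above: value -> probability
petals = {3: 0.05, 4: 0.10, 5: 0.20, 6: 0.30, 7: 0.25, 8: 0.075, 9: 0.025}


def prob_between(dist, lo, hi):
    """P(lo < X < hi) for a discrete distribution given as {x: P(x)}."""
    return sum(p for x, p in dist.items() if lo < x < hi)


actual = prob_between(petals, 3.3, 8.5)   # values 4, 5, 6, 7, 8
bound = 1 - 1 / 2**2                      # Chebychev lower bound for k = 2
print(f"actual = {actual:.3f}, Chebychev bound = {bound:.2f}")
```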
Finally, and most importantly, we will use the concepts in Chebychev’s Rule and the Empirical Rule to build the foundation for statistical inference-making. The method is illustrated in the next example.
EXAMPLE:
Suppose you invest a fixed sum of money in each of five business ventures. Assume you know that 70% of
such ventures are successful, the outcomes of the ventures are independent of one another, and the probability
distribution for the number, x, of successful ventures out of five is:
x      0      1      2      3      4      5
P(x)   .002   .029   .132   .309   .360   .168

a) Find $\mu = E(X)$. Interpret the result.
b) Find $\sigma = \sqrt{E[(X - \mu)^2]}$. Interpret the result.
c) Graph P(x).
d) Locate $\mu$ and the interval $\mu \pm 2\sigma$ on the graph. Use either Chebychev’s Rule or the Empirical Rule to approximate the probability that x falls in this interval. Compare this result with the actual probability.
e) Would you expect to observe fewer than two successful ventures out of five?
SOLUTION:
a) Applying the formula,
$$\mu = E(X) = \sum x P(x) = 0(.002) + 1(.029) + 2(.132) + 3(.309) + 4(.360) + 5(.168) = 3.50$$
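A minimal Python sketch of this computation, with the probability table of the example entered as a dictionary (`ventures` is just an illustrative name). It also checks that the probabilities add up to 1, as they must for any probability distribution.

```python
# Number of successful ventures out of five -> probability (table of the example)
ventures = {0: 0.002, 1: 0.029, 2: 0.132, 3: 0.309, 4: 0.360, 5: 0.168}

# The probabilities of a distribution must add up to 1
assert abs(sum(ventures.values()) - 1) < 1e-9

# mu = E(X) = sum of x * P(x)
mu = sum(x * p for x, p in ventures.items())
print(f"E(X) = {mu:.2f}")
```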
INTERPRETATION:
On average, the number of successful ventures out of five will equal 3.5. (It should be remembered that
this expected value has meaning only when the experiment – investing in five business ventures – is repeated a large
number of times.)
b) Now we calculate the variance of X. We know that
$$\sigma^2 = E[(X - \mu)^2] = \sum (x - \mu)^2 P(x)$$
Hence, we need to construct a column of $x - \mu$:

x    P(x)    x - μ    (x - μ)²    (x - μ)²P(x)
0    .002    -3.5     12.25       0.02
1    .029    -2.5      6.25       0.18
2    .132    -1.5      2.25       0.30
3    .309    -0.5      0.25       0.08
4    .360     0.5      0.25       0.09
5    .168     1.5      2.25       0.38

Thus, the variance is $\sigma^2 = 1.05$ and the standard deviation is
$$\sigma = \sqrt{\sigma^2} = \sqrt{1.05} \approx 1.02$$
This value measures the spread of the probability distribution of X, the number of successful ventures out of five.
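The variance table can be reproduced with the Python sketch below (variable names are illustrative). It keeps full precision in every column, so it reports $\sigma^2 = 1.048$; the value 1.05 above is what one gets by adding the column entries after rounding them to two decimals, and both give σ ≈ 1.02.

```python
import math

ventures = {0: 0.002, 1: 0.029, 2: 0.132, 3: 0.309, 4: 0.360, 5: 0.168}
mu = sum(x * p for x, p in ventures.items())   # E(X) = 3.5

# Columns of the table: x, P(x), x - mu, (x - mu)^2, (x - mu)^2 P(x)
rows = [(x, p, x - mu, (x - mu) ** 2, (x - mu) ** 2 * p) for x, p in ventures.items()]
for x, p, d, d2, d2p in rows:
    print(f"{x}  {p:.3f}  {d:+.1f}  {d2:5.2f}  {d2p:.4f}")

variance = sum(row[4] for row in rows)   # sigma^2 = sum of (x - mu)^2 P(x)
sigma = math.sqrt(variance)
print(f"variance = {variance:.3f}, sd = {sigma:.2f}")
```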
c) The graph of P(x) is shown in the following figure, with the mean $\mu$ and the interval
$$\mu \pm 2\sigma = 3.50 \pm 2(1.02) = 3.50 \pm 2.04 = (1.46, 5.54)$$
marked on the graph.
[Figure: graph of p(x) against x = 0 to 5, marking μ = 3.5 and the interval endpoints μ - 2σ = 1.46 and μ + 2σ = 5.54]
Note particularly that $\mu = 3.5$ locates the centre of the probability distribution. Since this distribution is a theoretical relative frequency distribution that is moderately mound-shaped, we expect (from Chebychev’s Rule) at least 75% and, more likely (from the Empirical Rule), approximately 95% of observed x values to fall in the interval $\mu \pm 2\sigma$, that is, between 1.46 and 5.54.
It can be seen from the above figure that the actual probability that X falls in the interval $\mu \pm 2\sigma$ is the sum of P(x) for the values x = 2, 3, 4 and 5.
This probability is P(2) + P(3) + P(4) + P(5)
= .132 +.309 + .360 + .168
= .969.
Therefore, 96.9% of the probability distribution lies within 2 standard deviations of the mean. This percentage is CONSISTENT with both Chebychev’s Rule and the Empirical Rule.
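The following Python sketch recomputes μ and σ without intermediate rounding and adds up P(x) over the values of x strictly inside μ ± 2σ (names are illustrative). With unrounded σ ≈ 1.0237 the interval is about (1.45, 5.55) rather than (1.46, 5.54), but it contains the same values x = 2, 3, 4, 5, so the probability is again .969.

```python
import math

ventures = {0: 0.002, 1: 0.029, 2: 0.132, 3: 0.309, 4: 0.360, 5: 0.168}
mu = sum(x * p for x, p in ventures.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in ventures.items()))

lo, hi = mu - 2 * sigma, mu + 2 * sigma
# P(mu - 2 sigma < X < mu + 2 sigma): add P(x) for the values inside the interval
inside = sum(p for x, p in ventures.items() if lo < x < hi)
print(f"interval = ({lo:.2f}, {hi:.2f}), probability = {inside:.3f}")
print("Chebychev's Rule: at least 0.75; Empirical Rule: about 0.95")
```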
e) Fewer than two successful ventures out of five implies that x = 0 or x = 1.
Since both these values of x lie outside the interval $\mu \pm 2\sigma$, we know from the Empirical Rule that such a result is unlikely (with approximate probability of only .05).
The exact probability, $P(X \le 1)$, is $P(0) + P(1) = .002 + .029 = .031$.
Consequently, in a single experiment where we invest in five business ventures, we would not expect to
observe fewer than two successful ones.
The key question:
What is the significance of Chebychev’s Inequality and the Empirical Rule?
The answer to this question is that both rules give us a certain IDEA of the amount of data lying between the mean minus a certain number of standard deviations and the mean plus that same number of standard deviations.
Given any data-set, the moment we compute the mean and standard deviation, we HAVE an idea regarding the two points (i.e. the mean minus two standard deviations and the mean plus two standard deviations) between which the BULK of our data lies.
If our data-set is hump-shaped, we obtain this idea through the Empirical Rule; if we have no reason to believe that our data-set is hump-shaped, we obtain this idea through Chebychev’s Rule.
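To see the two rules side by side, the sketch below compares Chebychev’s lower bound with the proportions the Empirical Rule is based on. It assumes, as is usual, that the Empirical Rule’s figures (about 68%, 95% and 99.7% within 1, 2 and 3 standard deviations) come from the normal curve, for which $P(|Z| < k) = \mathrm{erf}(k/\sqrt{2})$; the function names are hypothetical.

```python
import math


def normal_within(k):
    """P(|Z| < k) for a standard normal Z, i.e. erf(k / sqrt(2)) (assumed basis of the Empirical Rule)."""
    return math.erf(k / math.sqrt(2))


def chebychev_within(k):
    """Chebychev's lower bound 1 - 1/k^2, clamped at 0."""
    return max(0.0, 1 - 1 / k**2)


for k in (1, 2, 3):
    print(f"k = {k}: Empirical Rule ~ {normal_within(k):.3f}, Chebychev at least {chebychev_within(k):.3f}")
```

For every k the normal-curve proportion exceeds the Chebychev bound, which is the price Chebychev’s Rule pays for holding for every distribution.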
Next, we begin the discussion of CONTINUOUS RANDOM VARIABLES.
In this regard, the first point to be noted is that so far we have discussed discrete random variables – quantities that are countable. Continuous random variables, on the other hand, are quantities that are measurable. As stated in the very first lecture, continuous variables result from measurement, and can therefore take any value within a certain range. For example, the height of a normal Pakistani adult male may take any value between 5 feet 4 inches and 6 feet. The temperature at a place, the amount of rainfall and the time to failure for an electronic system are all examples of continuous random variables. Formally speaking, a continuous random variable can be defined as follows:
CONTINUOUS RANDOM VARIABLE:
A random variable X is defined to be continuous if it can assume every possible value in an interval [a, b], a < b, where a and b may be $-\infty$ and $+\infty$ respectively.
The probabilities of such a variable are described by a function f(x), called the probability density function, abbreviated to p.d.f., or simply the density function of the random variable X.
A continuous probability distribution looks something like this:
[Figure: a smooth density curve f(x) plotted against X]
A p.d.f. has the following properties:
i) $f(x) \ge 0$, for all x
ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
iii) The probability that X takes on a value in the interval [c, d], c < d, is given by:
$$P(c < X < d) = \int_c^d f(x)\,dx,$$
which is the area under the curve y = f(x) between X = c and X = d, as shown in the following figure:
[Figure: the curve f(x) with the area P(c < X < d) shaded between X = c and X = d]
The TOTAL area under the curve is 1.
In other words:
1) f(x) is a non-negative function,
2) the integration takes place over all possible values of the random variable X between the specified limits, and
3) the probabilities are given by appropriate areas under the curve.
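Properties (ii) and (iii) can be checked numerically. The sketch below uses composite Simpson’s rule (a standard numerical integration method, swapped in here for illustration; the lecture itself integrates analytically) on a hypothetical density f(x) = 3x² for 0 ≤ x ≤ 1, which is 0 elsewhere, so only [0, 1] needs to be integrated. The names `simpson` and `density` are illustrative.

```python
def simpson(f, a, b, n=100):
    """Composite Simpson's rule approximation of the integral of f over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # interior points get weight 4 (odd i) or 2 (even i)
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def density(x):
    """Hypothetical p.d.f.: f(x) = 3x^2 for 0 <= x <= 1, and 0 elsewhere."""
    return 3 * x * x if 0 <= x <= 1 else 0.0


# Property (i): f(x) >= 0 on a grid of points
assert all(density(i / 100 - 0.5) >= 0 for i in range(201))
total_area = simpson(density, 0, 1)       # property (ii): f is 0 outside [0, 1], so this is the total area
p_half_to_one = simpson(density, 0.5, 1)  # property (iii): P(0.5 < X < 1)
print(f"total area = {total_area:.6f}, P(0.5 < X < 1) = {p_half_to_one:.6f}")
```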
Since
$$P(X = k) = \int_k^k f(x)\,dx = 0,$$
the probability that a continuous random variable X takes any particular value k is always zero. That is why, for a continuous random variable, probability is measurable only over an interval.
Further, since for a continuous random variable X, P(X = x) = 0 for every x, the following four probabilities are equal:
$$P(c < X < d), \quad P(c \le X < d), \quad P(c < X \le d) \quad \text{and} \quad P(c \le X \le d).$$
They may be different for a discrete random variable. The values (expressed as intervals) of a continuous random variable and their associated probabilities can be expressed by means of a formula.
We now discuss the distribution function of a continuous random variable.
CONTINUOUS RANDOM VARIABLE:
A random variable X may also be defined as continuous if its distribution function F(x) is continuous and is
differentiable everywhere except at isolated points in the given range. In contrast with the graph of the distribution
function of a discrete variable, the graph of F(x) in the case of a continuous variable has no jumps or steps but is a
continuous function for all x-values, as shown in the following figure:
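[Figure: graph of a continuous distribution function F(x) rising from 0 to 1 along the X-axis, with the values F(a) and F(b) marked]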
Since F(x) is a non-decreasing function of x, we have
i) $f(x) \ge 0$,
ii) $F(x) = \int_{-\infty}^{x} f(t)\,dt$, for all x.
The relationship between f(x) and F(x) is as follows: f(x) is obtained by finding the derivative of F(x), i.e.
$$\frac{dF(x)}{dx} = f(x).$$
Let us now explain the above concepts with the help of an example:
EXAMPLE:
a) Find the value of k so that the function f(x) defined as follows may be a density function:
$$f(x) = \begin{cases} kx, & 0 < x < 2 \\ 0, & \text{elsewhere} \end{cases}$$
b) Compute P(X = 1).
c) Compute P(X > 1).
d) Compute the distribution function F(x).
e) Compute $P(X \le 1/2 \mid 1/3 < X < 2/3)$.
SOLUTION
a) The function f(x) will be a density function if
i) $f(x) \ge 0$ for every x, and
ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
The first condition is satisfied when k > 0.
The second condition will be satisfied if $\int_{-\infty}^{\infty} f(x)\,dx = 1$, i.e. if
$$1 = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{2} f(x)\,dx + \int_{2}^{\infty} f(x)\,dx,$$
i.e. if
$$1 = \int_{-\infty}^{0} 0\,dx + \int_{0}^{2} kx\,dx + \int_{2}^{\infty} 0\,dx = 0 + k\left[\frac{x^2}{2}\right]_0^2 + 0 = 2k.$$
This gives k = 1/2.
We had
$$f(x) = \begin{cases} kx, & 0 < x < 2 \\ 0, & \text{elsewhere} \end{cases}$$
and since we have obtained k = 1/2, hence:
$$f(x) = \begin{cases} x/2, & 0 < x < 2 \\ 0, & \text{elsewhere} \end{cases}$$
b) Since X is a continuous random variable, P(X = 1) = 0.
c) P(X > 1) is obtained by computing the area under the curve (in this case, a straight line) between X = 1 and X = 2:
[Figure: the line f(x) = x/2 for 0 < x < 2, with the region between X = 1 and X = 2 shaded]
This area is obtained as follows:
$$P(X > 1) = \text{area of shaded region} = \int_1^2 f(x)\,dx = \int_1^2 \frac{x}{2}\,dx = \left[\frac{x^2}{4}\right]_1^2 = 1 - \frac{1}{4} = \frac{3}{4}.$$
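Parts (a) and (c) can be verified with exact rational arithmetic using Python’s standard `fractions` module. The helper `integral_kx` is hypothetical; it simply applies the antiderivative $kx^2/2$ used above.

```python
from fractions import Fraction


def integral_kx(k, a, b):
    """Exact value of the integral of k*x dx from a to b, i.e. k(b^2 - a^2)/2."""
    return k * (b**2 - a**2) / 2


# Part (a): the total area is the integral of kx over (0, 2) = 2k, and it must equal 1
k = Fraction(1) / integral_kx(Fraction(1), 0, 2)
# Part (c): P(X > 1) = integral of x/2 over (1, 2)
p_greater_than_1 = integral_kx(k, 1, 2)
print(f"k = {k}, P(X > 1) = {p_greater_than_1}")
```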
d) To compute the distribution function, we need to find:
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
We do so step by step, as shown below:
For any x such that $-\infty < x < 0$,
$$F(x) = \int_{-\infty}^{x} 0\,dt = 0.$$
If $0 \le x < 2$, we have
$$F(x) = \int_{-\infty}^{0} 0\,dt + \int_{0}^{x} \frac{t}{2}\,dt = 0 + \left[\frac{t^2}{4}\right]_0^x = \frac{x^2}{4},$$
and, finally, for $x \ge 2$ we have
$$F(x) = \int_{-\infty}^{0} 0\,dt + \int_{0}^{2} \frac{t}{2}\,dt + \int_{2}^{x} 0\,dt = 0 + 1 + 0 = 1.$$
Hence
$$F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{x^2}{4}, & 0 \le x < 2 \\ 1, & x \ge 2 \end{cases}$$
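The piecewise distribution function can be written directly in Python (a sketch; the names `F` and `f` are illustrative). It checks that $1 - F(1)$ agrees with part (c), and that a central finite-difference slope of F at an interior point matches f there, in line with $dF(x)/dx = f(x)$.

```python
def F(x):
    """Distribution function of f(x) = x/2 on 0 < x < 2 (result of part d)."""
    if x < 0:
        return 0.0
    if x < 2:
        return x * x / 4
    return 1.0


def f(x):
    """Density f(x) = x/2 for 0 < x < 2, and 0 elsewhere."""
    return x / 2 if 0 < x < 2 else 0.0


# P(X > 1) = 1 - F(1), which should agree with part (c)
print(1 - F(1))

# f(x) = dF(x)/dx: a central finite difference at x = 1.2 should be close to f(1.2)
h = 1e-6
slope = (F(1.2 + h) - F(1.2 - h)) / (2 * h)
print(slope, f(1.2))
```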
We will discuss the computation of the conditional probability
$$P(X \le 1/2 \mid 1/3 < X < 2/3).$$
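Since the lecture defers this computation, here is only a hedged sketch of one way to carry it out. It assumes the usual definition $P(A \mid B) = P(A \cap B)/P(B)$ and uses the distribution function $F(x) = x^2/4$ on $0 \le x < 2$ found in part (d); the event $\{X \le 1/2\} \cap \{1/3 < X < 2/3\}$ is $\{1/3 < X \le 1/2\}$. Variable names are illustrative.

```python
from fractions import Fraction


def F(x):
    """Distribution function from part (d): 0 for x < 0, x^2/4 for 0 <= x < 2, 1 for x >= 2."""
    if x < 0:
        return Fraction(0)
    if x < 2:
        return Fraction(x) ** 2 / 4
    return Fraction(1)


lower, upper, cut = Fraction(1, 3), Fraction(2, 3), Fraction(1, 2)
numerator = F(cut) - F(lower)      # P(1/3 < X <= 1/2)
denominator = F(upper) - F(lower)  # P(1/3 < X < 2/3)
conditional = numerator / denominator
print(conditional)
```

Under these assumptions the result is $(1/16 - 1/36)/(1/9 - 1/36) = 5/12$.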