3.7 Continuous probability.
So far we have been considering examples where the outcomes form a finite or countably infinite set. In
many situations it is more natural to model a situation where the outcomes could be any real number or
vectors of real numbers. In these situations we usually model probabilities by means of integrals.
Example 1. A bank is studying the amount of time, T, between arrivals of successive
customers. Assume we are measuring time in minutes. The bank is interested in the probability that T
will take various values. If we imagine that T can be any non-negative real number, then the sample
space is the set of all non-negative real numbers, i.e. $S = \{t : t \ge 0\}$. This is an uncountable set. In
situations such as this, the probability that T assumes any particular value may be 0, so we are interested
in the probability that the outcome lies in various intervals. For example, what is the probability that T
will be between 2 and 3 minutes?
A common way to describe probabilities in situations such as this is by means of an integral. We try to find
a function f(t) such that the probability that T lies in any interval $a \le t \le b$ is equal to $\int_a^b f(t)\,dt$, i.e.
(1)    $\Pr\{a \le T \le b\} = \int_a^b f(t)\,dt.$
A function f(t) with this property is called a probability density function for the outcomes of the experiment.
We can regard T as a random variable and f(t) is also called the probability density function for the random
variable T.
Since $\Pr\{T = a\} = 0$ and $\Pr\{T = b\} = 0$, the probabilities $\Pr\{a < T \le b\}$, $\Pr\{a \le T < b\}$, and $\Pr\{a < T < b\}$ are all
equal to this integral. In order that the non-negativity property and normalization axiom hold (formulas
(1.10) and (1.11) in section 3.1), one should have
(2)    $f(t) \ge 0$ for all t,
(3)    $\int_{-\infty}^{\infty} f(t)\,dt = 1.$
For example, suppose after some study the bank has come up with the following model. They feel that
(4)    $\Pr\{a \le T \le b\} = \int_a^b \frac{e^{-t/2}}{2}\,dt$ for $0 \le a \le b$.
This is an example of an exponential density. In this case $f(t) = \frac{e^{-t/2}}{2}$ for $t \ge 0$ and f(t) = 0 for t < 0. Note
that f(t) satisfies (2) and (3). This integral can be evaluated in terms of elementary functions, so we could
just as well say
$\Pr\{a \le T \le b\} = e^{-a/2} - e^{-b/2}.$
However, it is useful to keep in mind the integral representation (4).
For example, the probability that T would be between 2 and 3 minutes would be
$\Pr\{2 \le T \le 3\} = e^{-2/2} - e^{-3/2} \approx 0.145.$
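As a quick numerical check of this computation, the following Python sketch approximates the integral of the density e^(-t/2)/2 over [2, 3] with a midpoint rule and compares it with the closed form. The names `density` and `prob_between` and the number of steps are illustrative choices, not part of the text.

```python
import math

def density(t):
    """Exponential density from the bank example: f(t) = e^(-t/2)/2 for t >= 0."""
    return 0.5 * math.exp(-t / 2) if t >= 0 else 0.0

def prob_between(a, b, steps=100_000):
    """Approximate Pr{a <= T <= b} by a midpoint Riemann sum of the density."""
    h = (b - a) / steps
    return h * sum(density(a + (i + 0.5) * h) for i in range(steps))

numeric = prob_between(2, 3)
closed_form = math.exp(-2 / 2) - math.exp(-3 / 2)
print(round(numeric, 3), round(closed_form, 3))
```

Both values should agree to three decimal places with the 0.145 obtained above.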
Cumulative distribution functions. Just as with a random variable that only takes on a finite set of values,
it is sometimes convenient to use its cumulative distribution function F(t). If the random variable is T then
(5)    $F(t) = \Pr\{T \le t\} = \int_{-\infty}^{t} f(s)\,ds.$
Note that F'(t) = f(t). In the example above where $f(t) = \frac{e^{-t/2}}{2}$ for $t \ge 0$ and f(t) = 0 for t < 0, one has
$F(t) = 1 - e^{-t/2}$ for $t \ge 0$ and F(t) = 0 for t < 0.
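The relation F'(t) = f(t) can be checked numerically. The sketch below assumes the bank example, where integrating the density e^(-t/2)/2 from 0 to t gives F(t) = 1 - e^(-t/2), and compares a central-difference approximation of F' with f at a few points. The helper `derivative` and the sample points are illustrative choices.

```python
import math

def f(t):
    """Density of the bank example."""
    return 0.5 * math.exp(-t / 2) if t >= 0 else 0.0

def F(t):
    """Cumulative distribution function of the bank example."""
    return 1.0 - math.exp(-t / 2) if t >= 0 else 0.0

def derivative(g, t, h=1e-5):
    """Central-difference approximation of g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)

for t in (0.5, 1.0, 2.0, 4.0):
    # The last two columns should agree closely.
    print(t, round(derivative(F, t), 6), round(f(t), 6))
```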
Exponential random variables. The example above is an instance of an exponential random variable.
These are random variables whose density function has the form $f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$ and f(t) = 0 for t < 0, where $\lambda > 0$ is a constant.
They are often used to model the time between successive events of a certain type, for example
1. the times between breakdowns of a computer,
2. the times between arrivals of customers at a store,
3. the times between sign-ons of users on a computer network,
4. the times between incoming phone calls,
5. the times it takes to serve customers at a store,
6. the times users stay connected to a computer network,
7. the lengths of phone calls,
8. the times it takes for parts to break, e.g. light bulbs to burn out.
Exponential random variables have an interesting property called the memoryless property. If T is an
exponential random variable, then the conditional probability that T is greater than some value t + s given
that T is greater than s is the same as the probability that T is greater than t. In symbols
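$\Pr\{T > t + s \mid T > s\} = \Pr\{T > t\}.$
This follows from the fact that $\Pr\{T > t\} = e^{-\lambda t}$, since $\Pr\{T > t + s \mid T > s\} = e^{-\lambda(t+s)}/e^{-\lambda s} = e^{-\lambda t}$. Suppose, for example, the time between arrivals of bank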
customers is an exponential random variable and it has been ten minutes since the arrival of the last
customer. Then the probability that the next customer will arrive in the next two minutes is the same as if a
customer had just arrived.
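The memoryless property can be illustrated with a short Python sketch. It assumes the rate 1/2 of the bank example, checks the identity exactly from the survival function e^(-rate t) with s = 10 and t = 2, and then estimates the same conditional probability by simulation. The simulation uses a smaller waiting time (3 minutes, an arbitrary choice) so that enough samples exceed it; the sample size and seed are also arbitrary.

```python
import math
import random

RATE = 0.5  # assumed rate for the bank example (mean of 2 minutes between arrivals)

def survival(t, rate=RATE):
    """Pr{T > t} = e^(-rate * t) for an exponential random variable."""
    return math.exp(-rate * t)

# Exact check: Pr{T > t + s | T > s} = Pr{T > t + s} / Pr{T > s}.
s, t = 10.0, 2.0
conditional = survival(t + s) / survival(s)
print(conditional, survival(t))

# Simulation check with a smaller s so that enough samples exceed it.
random.seed(0)
samples = [random.expovariate(RATE) for _ in range(200_000)]
s_sim = 3.0
waited = [x for x in samples if x > s_sim]
estimate = sum(x > s_sim + t for x in waited) / len(waited)
print(estimate)
```

The simulated estimate should be close to Pr{T > 2}, regardless of how long one has already waited.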
Means or expected values. For a random variable X that only takes on a discrete set of values $x_1, \dots, x_n$
with probabilities $\Pr\{X = x_k\} = f(x_k)$, its mean or expected value is
$\mu = E(X) = x_1 f(x_1) + \dots + x_n f(x_n).$
For a continuous random variable T with density function f(t) we replace the sum by an integral to calculate
its mean, i.e.
(6)    $\mu = E(T) = \int_{-\infty}^{\infty} t f(t)\,dt.$
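For an exponential random variable we have
(7)    $\mu = \int_0^\infty t \lambda e^{-\lambda t}\,dt = \left. -t e^{-\lambda t} \right|_0^\infty + \int_0^\infty e^{-\lambda t}\,dt = -0 + 0 - \left. \frac{e^{-\lambda t}}{\lambda} \right|_0^\infty = \frac{1}{\lambda}.$
We used integration by parts with u = t and $dv = \lambda e^{-\lambda t}\,dt$ so that du = dt and $v = -e^{-\lambda t}$.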
In the example above where the time between arrivals of bank customers was an exponential random
variable with $\lambda = 1/2$, the average time between arrivals is $1/\lambda = 2$ min.
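The mean of 2 minutes can also be checked numerically. The sketch below approximates the integral of t times the exponential density with rate 1/2 by a midpoint rule. Truncating the integral at 100 minutes is an assumption made for the computation; the tail beyond that point is negligible for this rate. The function name and step count are illustrative.

```python
import math

def exponential_mean(rate, upper=100.0, steps=200_000):
    """Midpoint-rule approximation of the integral of t * rate * e^(-rate t) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t * rate * math.exp(-rate * t)
    return h * total

mean = exponential_mean(0.5)
print(mean)  # should be very close to 1 / 0.5 = 2
```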
Problem 1. The lifetime of a certain type of light bulb has density function f(t) given by f(t) = 0 for t <
100 days and $f(t) = 20000/t^3$ for t > 100 days.
a. Find the probability that the lifetime is more than 110 days.
b. Find the average lifetime.
Problem 2. Taxicabs pass by at an average rate of 20 per hour. Assume the time between taxicabs is
an exponential random variable. What is the probability that a taxicab will pass by in the next minute?
Example 2 (Machine replacement). Consider the hard drive on my office computer. It costs
c1 = $300 to replace if it is replaced before it fails. If it fails before it is replaced, it costs an additional
c2 = $1000 in terms of down time for my computer. This is in addition to the $300 replacement cost.
Suppose that after it has been installed it is equally likely to fail anytime in the next five years.
Suppose every hard drive fails by the end of the 5th year. Let
T = the time at which the hard drive fails
q = the time at which you replace it if it hasn't already failed
$C = C_q$ = the cost of a replacement
$T_q$ = the time at which the drive is actually replaced, i.e. T if it fails before time q, and q otherwise.
a. Find the probability density function f(t) and cumulative distribution function F(t) for T.
b. Find the expected cost E(C) of a replacement.
c. Find the expected time $E(T_q)$ of a replacement.
d. Find the long run average cost $z(q) = E(C)/E(T_q)$ of a replacement.
e. When should the hard drive be replaced so as to minimize the long run replacement cost?
Since the drive is equally likely to fail at any time in the next five years, f(t) should be constant for t
between 0 and 5 and f(t) should be 0 for t less than 0 and greater than 5. Since the integral of f(t) over
all t should be 1, we must have f(t) = 1/5 for $0 \le t \le 5$ and f(t) = 0 for t < 0 and t > 5. This is an
example of a uniform probability distribution. It is constant in an interval and zero elsewhere.
The cumulative distribution function is just the integral of the density function from $-\infty$ to t. So
F(t) = 0 for t < 0 and F(t) = t/5 for $0 \le t \le 5$ and F(t) = 1 for t > 5.
The cost is always $c_1$, plus an additional $c_2$ if the drive fails before time q, which happens with probability F(q). So, for $0 \le q \le 5$,
$E(C) = c_1 + c_2 F(q) = 300 + 1000q/5 = 300 + 200q.$
$T_q$ is an example of a random variable that is a mixture of continuous and discrete. It is continuous for t < q
and it is discrete at t = q. Its density function is the same as that of T for t < q,
i.e. f(t) = 1/5 for $0 \le t < q$. Its density function is 0 for t > q. It has a probability mass of 1 - F(q)
at t = q. To compute $E(T_q)$ we combine the formulas for discrete and continuous random variables, i.e.
we integrate t f(t) over the region where $T_q$ is continuous and sum t times the probability mass over the points where $T_q$ is discrete.
So $E(T_q) = \int_0^q t f(t)\,dt + q(1 - F(q)) = \int_0^q \frac{t}{5}\,dt + q(1 - q/5) = q^2/10 + q - q^2/5 = q - q^2/10.$
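The formulas E(C) = 300 + 200q and E(T_q) = q - q^2/10 can be sanity-checked by simulation. The sketch below draws failure times uniformly on [0, 5], takes the replacement time to be the smaller of the failure time and q, and averages the resulting costs and times. The replacement age q = 2, the number of trials, and the seed are arbitrary choices for illustration.

```python
import random

C1, C2, LIFE = 300.0, 1000.0, 5.0  # costs in dollars, maximum lifetime in years

def simulate(q, trials=200_000, seed=1):
    """Estimate E(C) and E(T_q) for replacement age q by simulation."""
    rng = random.Random(seed)
    total_cost = 0.0
    total_time = 0.0
    for _ in range(trials):
        t_fail = rng.uniform(0.0, LIFE)   # failure time T, uniform on [0, 5]
        total_time += min(t_fail, q)      # T_q: failure or planned replacement, whichever is first
        total_cost += C1 + (C2 if t_fail < q else 0.0)
    return total_cost / trials, total_time / trials

q = 2.0
cost_est, time_est = simulate(q)
print(cost_est, 300 + 200 * q)    # simulated vs. formula for E(C)
print(time_est, q - q * q / 10)   # simulated vs. formula for E(T_q)
```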
$z(q) = \frac{E(C)}{E(T_q)} = \frac{c_1 + c_2 F(q)}{\int_0^q t f(t)\,dt + q(1 - F(q))} = \frac{300 + 200q}{q - q^2/10} = \frac{3000 + 2000q}{10q - q^2}.$
To minimize z, we can minimize $u = z/1000 = \frac{3 + 2q}{10q - q^2}$. We have $u' = \frac{2q^2 + 6q - 30}{(10q - q^2)^2}$. So u' = 0 when
$q^2 + 3q - 15 = 0$. The positive solution to this equation is about q = 2.65. So we should replace the
hard drive after 2.65 years.
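The optimal replacement age can be cross-checked numerically. The sketch below evaluates z(q) = (300 + 200q)/(q - q^2/10) on a grid over 0 < q ≤ 5 and compares the grid minimizer with the positive root of q^2 + 3q - 15 = 0. The grid spacing of 0.001 years and the function names are illustrative choices.

```python
import math

C1, C2, LIFE = 300.0, 1000.0, 5.0  # costs in dollars, maximum lifetime in years

def expected_cost(q):
    """E(C) = c1 + c2 * F(q) with F(q) = q / 5."""
    return C1 + C2 * q / LIFE

def expected_time(q):
    """E(T_q) = q - q^2 / 10 for the uniform lifetime on [0, 5]."""
    return q - q * q / (2 * LIFE)

def z(q):
    """Long run average cost per year when replacing at age q."""
    return expected_cost(q) / expected_time(q)

# Positive root of q^2 + 3q - 15 = 0 from the quadratic formula.
q_star = (-3 + math.sqrt(9 + 60)) / 2

# Brute-force search over 0 < q <= 5 in steps of 0.001 years.
grid = [i / 1000 for i in range(1, 5001)]
q_grid = min(grid, key=z)
print(round(q_star, 3), q_grid)
```

The two values should agree to within the grid spacing, which supports q ≈ 2.65 being a minimum rather than just a critical point.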