Probability Theory and Random Variables
One of the most noticeable aspects of many computer science related phenomena is the lack of
certainty. When a job is submitted to a batch oriented computer system, the exact time the job
will be completed is uncertain. The number of jobs submitted tomorrow is probably not known
either. Similarly, the exact response time for an interactive inquiry system cannot be predicted.
If several computers are attached to a local area network, some of them may try to communicate
at almost the same time and thus cause a collision on the network. How often this will happen
during a given period of time is a random number. In order to work with such observed,
uncertain processes, we need to put them into a mathematical framework. This is the purpose of
this chapter.
To apply probability theory to the process under study, we view it as a random experiment, that
is, as an experiment whose outcome is not known in advance but for which the set of all
possible individual outcomes is known.
The sample space Ω (Stichprobenraum, espace échantillon) of a random experiment is the set
of all possible simple outcomes of the experiment. These possible outcomes are called
elementary events (Elementarereignisse, événements élémentaires). Elementary events are
mutually exclusive.
Example 1
Tossing a fair die: Ω = {1, 2, 3, 4, 5, 6}
Example 2
Tossing two fair dice, consider the sum of points: Ω = {2, 3, 4, ..., 12}
Example 3
Tossing a fair coin again and again until the first head appears; the number of tosses needed: Ω = {1, 2, 3, ...}
Example 4
Measuring the "response time" of a computer program: Ω = {t | t > 0}
Sample spaces can be finite or infinite. They are also classified as discrete if the number of
sample points is finite or countably infinite. A sample space is continuous if its elementary
events consist of all the numbers of some finite or infinite interval of the real line.
An event (Ereignis, événement) is defined as a subset of a sample space.
The empty set ∅ represents the impossible event, whereas Ω stands for the certain event.
Example 5 In Example 1, if A = {2,3,5}, then A is the event of tossing a
prime number while B = {1,3,5} is the event of tossing an odd number.
Example 6 In Example 4, A = {t | 20 < t < 30} is the event that the response time is between
20 and 30 seconds.
Note that, in general, events are not mutually exclusive. E.g. in Example 5 the events A and B
may occur simultaneously.
If A and B are two events of a given experiment, then A ∪ B and A ∩ B are also events. The
event A* = Ω \ A is the complementary event to A.
If A ∩ B = ∅ then A and B are mutually exclusive or incompatible.
The probability (Wahrscheinlichkeit, probabilité) of an event is defined as a real function P[·]
which satisfies the following
Axioms 1
(1) 0 ≤ P[A] for every event A
(2) P[Ω] = 1
(3) P[A ∪ B] = P[A] + P[B] if the events A and B are mutually exclusive
It is immediate from (3) that for any finite collection A1, A2, ..., An of mutually exclusive
events
P[A1 ∪ A2 ∪ ... ∪ An] = P[A1] + P[A2] + ... + P[An]
We assume as Axiom (4) that this remains true if n tends to infinity:
(4) P[A1 ∪ A2 ∪ A3 ∪ ...] = P[A1] + P[A2] + P[A3] + ...
The above axioms lead to the following important consequences:
Theorem 1
(a) P[∅] = 0
(b) P[A*] = 1 − P[A] for every event A
(c) P[A ∪ B] = P[A] + P[B] − P[A ∩ B] for any events A, B
(d) A ⊆ B implies P[A] ≤ P[B] for any events A, B
Example 7 We toss two fair dice and consider the total number of spots as the result of the
experiment. We define the following events:
A : the result is a prime number, i.e. A = {2, 3, 5, 7, 11}
B : the result is less than 5, i.e. B = {2, 3, 4}
The conditional probability (bedingte Wahrscheinlichkeit, probabilité conditionnelle) P[A|B],
read as "conditional probability of A, given B", is defined as
P[A|B] = P[A ∩ B] / P[B]   (provided P[B] > 0)
Two events A and B are independent if and only if P[A|B] = P[A].
Using the above definition of the conditional probability immediately leads us to the following
equivalent definition:
Two events A and B are independent if and only if P[A ∩ B] = P[A] · P[B]
Check whether the events A and B in Example 7 are independent or not!
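As a sketch of this check (an addition, not in the original notes): enumerating the 36 equally likely outcomes of the two dice gives P[A ∩ B] = 1/12 but P[A] · P[B] = 5/72, so A and B are not independent.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of tossing two fair dice (Example 7).
outcomes = list(product(range(1, 7), repeat=2))

def P(pred):
    """Probability that a predicate on (die1, die2) holds."""
    return Fraction(sum(1 for o in outcomes if pred(o)), len(outcomes))

primes = {2, 3, 5, 7, 11}
P_A  = P(lambda o: sum(o) in primes)                 # result is prime
P_B  = P(lambda o: sum(o) < 5)                       # result is less than 5
P_AB = P(lambda o: sum(o) in primes and sum(o) < 5)  # both at once

print(P_A, P_B, P_AB, P_A * P_B)   # 5/12 1/6 1/12 5/72
print(P_AB == P_A * P_B)           # False -> A and B are not independent
```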
The concept of two events A and B being independent should not be confused with the concept
of their being mutually exclusive. In fact, if A and B are mutually exclusive then
P[A ∩ B] = P[∅] = 0,
however, if A and B are independent and P[A] ≠ 0 and P[B] ≠ 0 then
P[A ∩ B] = P[A] · P[B] ≠ 0.
Hence, mutually exclusive events with nonzero probabilities are never independent.
One of the main uses of conditional probability is to assist in the calculation of unconditional
probability by the use of the following theorem.
Theorem 2 (Law of total probability)
Let A1, A2, A3, ..., An be events such that
(a) Ai ∩ Aj = ∅ if i ≠ j (mutually exclusive events)
(b) P[Ai] > 0 for i = 1, 2, 3, ..., n
(c) A1 ∪ A2 ∪ ... ∪ An = Ω
The family of events A1, A2, A3, ..., An is then called a partition of Ω. For any event A, the
probability P[A] can then be calculated as follows:
P[A] = P[A1] · P[A|A1] + P[A2] · P[A|A2] + ... + P[An] · P[A|An]
Example 8 There are three coins in a box. The first one is fair. The second one is weighted
so that heads is twice as likely to appear as tails, and the third one shows tails on both sides. A
coin is randomly drawn from the box and tossed. Find the probability of heads.
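A short computation along Theorem 2 (an added sketch, not part of the original notes) gives the answer 7/18 ≈ 0.389:

```python
from fractions import Fraction

# Example 8: partition by which coin is drawn (each with probability 1/3),
# and the conditional probability of heads for each coin.
P_coin  = [Fraction(1, 3)] * 3
P_heads = [Fraction(1, 2),   # fair coin
           Fraction(2, 3),   # heads twice as likely as tails
           Fraction(0)]      # coin with tails on both sides

# Law of total probability: P[H] = sum of P[Ai] * P[H|Ai]
P_H = sum(pc * ph for pc, ph in zip(P_coin, P_heads))
print(P_H)   # 7/18
```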
Another direct consequence of the definition of conditional probability is the following
theorem:
Theorem 3 (Bayes' Theorem)
Suppose the events A1, A2, A3, ..., An form a partition of Ω. Then for any event A with
P[A] > 0
P[Ai|A] = P[Ai] · P[A|Ai] / P[A]
Example 9 In a certain college, 25 % of the male students and 15 % of the female students
are studying computer science. 70 % of all students are men. A student is chosen at random. (a)
Find the probability that the student is studying computer science. (b) If the student is studying
computer science, find the probability that the student is female.
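The following sketch (added here, not in the original) solves both parts; the partition of the student body is {male, female}, total probability answers (a) and Bayes' theorem answers (b):

```python
# Example 9: partition of the student body by sex.
P_male, P_female = 0.70, 0.30
P_cs_given_male, P_cs_given_female = 0.25, 0.15

# (a) Law of total probability.
P_cs = P_male * P_cs_given_male + P_female * P_cs_given_female
print(P_cs)                                  # 0.22

# (b) Bayes' theorem: P[female | computer science].
print(P_female * P_cs_given_female / P_cs)   # 0.2045...
```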
Example 10 A red die and a white one are tossed simultaneously. Let A, B, and C be events
defined as follows:
A: the red die turns up odd,
B: the white die turns up odd,
C: the total number of spots of both dice is odd.
The system of events A1, A2, A3, ..., An is called independent if
P[Ai ∩ Aj] = P[Ai] · P[Aj]
P[Ai ∩ Aj ∩ Ak] = P[Ai] · P[Aj] · P[Ak]
...
P[A1 ∩ A2 ∩ ... ∩ An] = P[A1] · P[A2] · ... · P[An]
for all combinations of indices such that
1 ≤ i < j < ... < k ≤ n.
Reconsider the events in Example 10.
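This reconsideration can again be done by enumeration. The sketch below (an addition to the text) shows that A, B, and C are pairwise independent, yet the system of all three is not, because A and B together determine C:

```python
from fractions import Fraction
from itertools import product, combinations

outcomes = list(product(range(1, 7), repeat=2))   # (red, white)

def P(pred):
    """Probability that pred holds; all 36 outcomes equally likely."""
    return Fraction(sum(1 for o in outcomes if pred(o)), len(outcomes))

def A(o): return o[0] % 2 == 1            # the red die turns up odd
def B(o): return o[1] % 2 == 1            # the white die turns up odd
def C(o): return (o[0] + o[1]) % 2 == 1   # the total number of spots is odd

# Pairwise products: P[X and Y] == P[X] * P[Y] for every pair.
for X, Y in combinations((A, B, C), 2):
    print(X.__name__, Y.__name__,
          P(lambda o: X(o) and Y(o)) == P(X) * P(Y))   # True for all pairs

# The triple product fails: red odd and white odd force an even total.
print(P(lambda o: A(o) and B(o) and C(o)),
      P(A) * P(B) * P(C))   # 0 vs 1/8 -> the system is not independent
```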
Exercise 1 Let A and B be events with P[A] = 0.3, P[A ∪ B] = 0.4, and P[B] = p. Calculate
p if (a) A and B are mutually exclusive, (b) A and B are independent, and (c) A is a subset of B.
Exercise 2 A pair of fair dice is thrown. If the numbers appearing are different, find the
probability that (a) the sum is 6, (b) the sum is 4 or less, (c) the sum is even.
Exercise 3 Let three fair coins be tossed. Let A = "all heads or all tails", B = "at least two
heads", and C = "at most two heads". Of the pairs (A,B), (A,C) and (B,C), which are
independent? Is the system of the events A, B, C independent?
Exercise 4 20 % of all HiQuHard computers are assembled on Mondays, 25 % of them on
each of Tuesdays, Wednesdays and Thursdays, and 5 % on Fridays. The company statistician has
determined that 4 % of the computers produced on Monday are "lemons", i.e. they are
defective. 1 % of the computers made on Tuesday, Wednesday or Thursday are lemons, and
3 % of the computers manufactured on Friday are lemons. You find that your HiQuHard
computer is a lemon. What is the probability it was manufactured on Monday?
Exercise 5 Suppose there are n people in a room. (a) What is the probability p(n) that at
least two persons have the same birthday? (b) What is the smallest n such that p(n) exceeds 0.5?
Consider day and month only, and forget about leap years.
A random variable X (Zufallsvariable, variable aléatoire) is a function from a sample space Ω
of a given random experiment into the real numbers, i.e. each outcome of the experiment is
related to a real number.
Example 11 An urn contains 2 red balls and 3 white ones. We draw one ball after the other
(without putting it back) until we get a red one. The number of draws is our random variable X.
The probability distribution function (PDF, Wahrscheinlichkeitsverteilung, distribution de
probabilité) of a random variable is defined as
F(x) = P[X ≤ x]
In the case of a discrete random variable we have F(x) = Σ_{xi ≤ x} P[X = xi], i.e. the sum of
the probabilities of all values xi not exceeding x.
In Example 11: P[X = 1] = 2/5, P[X = 2] = 3/10, P[X = 3] = 1/5, P[X = 4] = 1/10, so F jumps
at x = 1, 2, 3, 4 to the values 2/5, 7/10, 9/10, 1.
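These values can be confirmed by enumerating all orderings of the five balls (an added sketch, not in the original notes):

```python
from fractions import Fraction
from itertools import permutations

# Example 11: urn with 2 red (R) and 3 white (W) balls, drawn without
# replacement; X = position of the first red ball.
balls = "RRWWW"
counts = {}
for order in permutations(range(5)):   # all 120 equally likely orderings
    x = next(i + 1 for i, j in enumerate(order) if balls[j] == "R")
    counts[x] = counts.get(x, 0) + 1

total = sum(counts.values())
pmf = {x: Fraction(c, total) for x, c in counts.items()}
for x in sorted(pmf):
    print(f"P[X = {x}] = {pmf[x]}")    # 2/5, 3/10, 1/5, 1/10

# Probability distribution function F(x) = P[X <= x]
F = Fraction(0)
for x in sorted(pmf):
    F += pmf[x]
    print(f"F({x}) = {F}")             # 2/5, 7/10, 9/10, 1
```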
The PDF has the following properties:
(1) lim_{x → −∞} F(x) = 0
(2) lim_{x → +∞} F(x) = 1
(3) if x1 < x2 then F(x1) ≤ F(x2), i.e. F(x) increases monotonically
There are two important functions that are defined for random variables: mean or average or
expectation value (Mittelwert oder Erwartungswert, moyenne ou espérance mathématique) and
variance (Varianz, variance). For a discrete random variable with values xi:
E[X] = Σi xi · P[X = xi]   (expectation value)
Var[X] = Σi (xi − E[X])² · P[X = xi]   (variance)
The square root of the variance is known as the standard deviation σ (Streuung, écart type).
Consider a basic experiment with two possible outcomes (success and failure) with constant
probabilities p (for success) and q = 1 − p (for failure). The experiment is repeated n times, and
the number of successful outcomes is taken as the value of a random variable X.
Let p(k) be the probability for the event that X takes the value k: p(k) = P[X = k].
Then
pBinomial(k,n,p) = C(n,k) · p^k · q^(n−k)   with q = 1 − p,
where C(n,k) = n! / (k! (n−k)!) is the binomial coefficient.
This is the famous binomial distribution. Its expectation value and variance are:
E[X] = n·p
Var[X] = n·p·q
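As an added illustration (using the parameters that also appear in Exercise 8 below), the following sketch evaluates the binomial pmf and confirms the formulas for E[X] and Var[X] numerically:

```python
from math import comb

def p_binomial(k, n, p):
    """pBinomial(k, n, p) = C(n, k) * p^k * q^(n-k) with q = 1 - p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.3
pmf = [p_binomial(k, n, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
var  = sum((k - mean)**2 * pk for k, pk in enumerate(pmf))
print(mean, n * p)           # ~6.0 and 6.0   (E[X] = np)
print(var, n * p * (1 - p))  # ~4.2 and 4.2   (Var[X] = npq)
```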
For large values of n (and if p and q are not too close to zero) the evaluation of
pBinomial(k,n,p) may be numerically difficult or even impossible. The binomial distribution
may then be approximated by the normal distribution with the same mean and variance.
pNormal(x,n,p) ≈ f(x) = 1/(σ·√(2π)) · e^(−(x−μ)²/(2σ²))
with μ = n·p
and σ = √(n·p·q)
Example 12 A telephone operator services 10 incoming telephone calls per hour on
average. Let us consider the minutes of an hour as "basic experiments" and declare an arriving
call as a "successful outcome" of the basic experiment. What is the probability for exactly k =
12 "successful minutes" within a given hour?
Binomial distribution: n = 60, p = 10/60 = 1/6, so
pBinomial(12, 60, 1/6) = C(60,12) · (1/6)^12 · (5/6)^48 ≈ 0.102
With the normal distribution we get: μ = n·p = 10, σ = √(n·p·q) ≈ 2.89, so f(12) ≈ 0.109.
Evidently the number of "successful minutes" is not necessarily identical to the number of
arriving calls, because it may happen that more than one call arrives within a single minute. We
can improve the result of our probability calculation by considering "successful seconds".
In order to do so, we replace:
n = 60 → n = 3600 (seconds instead of minutes per hour)
p = 10/60 = 1/6 → p = 10/3600 = 1/360
Note that while applying this substitution, the product np (equal to the mean value of the
binomial distribution!) remains unchanged.
We can even go further and take the limits n → ∞ and p → 0 in such a way that np = λ
remains constant. We then obtain the famous Poisson distribution:
pPoisson(k,λ) = (λ^k / k!) · e^(−λ)
The mean value is E[X] = λ and the variance is also Var[X] = σ² = λ.
In our Example 12 we obtain for k = 12 and λ = 10:
pPoisson(12,10) = (10^12 / 12!) · e^(−10) ≈ 0.095
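The three values of Example 12 can be reproduced as follows (an added sketch; the last lines also show the minutes-to-seconds refinement described above moving the binomial value toward the Poisson limit):

```python
from math import comb, exp, factorial, pi, sqrt

n, p, k = 60, 10 / 60, 12   # Example 12: 10 calls/hour, "successful minutes"
lam = n * p                 # = 10, held fixed in all refinements

binom = comb(n, k) * p**k * (1 - p)**(n - k)
mu, sigma = n * p, sqrt(n * p * (1 - p))
normal = exp(-(k - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
poisson = lam**k * exp(-lam) / factorial(k)
print(binom, normal, poisson)   # ~0.102, ~0.109, ~0.095

# Refinement to "successful seconds": n = 3600, p = 10/3600.
n2, p2 = 3600, 10 / 3600
print(comb(n2, k) * p2**k * (1 - p2)**(n2 - k))   # ~0.095, near the Poisson value
```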
Example 13 In Example 12, what is the probability for the event that less than seven calls
arrive in a given hour? The probability distribution function F(x) gives us the answer:
F(6) = P[X ≤ 6] = Σ_{k=0}^{6} (10^k / k!) · e^(−10) ≈ 0.130
Example 14 Consider again the situation of our telephone operator. We start our observation
(and our clock) at an arbitrarily chosen moment in time and measure the time until the first
telephone call arrives. We take this time as our random variable T.
Note that the starting point on the time axis is not important. We can choose it wherever we
like. In particular, we can choose it at the moment when a telephone call arrives. If we do so, T
is the time between two arriving calls: the interarrival time, or pause, for short.
Clearly, T is a continuous random variable. However, we shall first work with a discrete time
interval Δt (e.g. a minute) and measure T in units of this interval. We may then calculate the
probability of a pause T of a certain maximum length, say t = 12 min:
F(t) = P[T ≤ t] = 1 − P[no call arrives in n = t/Δt consecutive intervals] = 1 − (1 − p)^n
In our example λ = 10 h⁻¹ = 1/6 min⁻¹ represents the average arrival rate.
We then have
F(t) = 1 − (1 − λ·Δt)^(t/Δt)
In our numerical example:
F(12 min) = 1 − (5/6)^12 ≈ 0.888
However, the random variable T should be measured in much shorter time units Δt in order to
get a "more correct" result for F(t). The solution is to take the limit Δt → 0 and n → ∞, such
that t = n·Δt remains constant. The result is the exponential distribution:
F(t) = 1 − e^(−λt)
We have proved the following important fact:
If the basic events of an experiment are independent of each other, then their number
in a given interval of time is Poisson distributed, while the interarrival time distribution is
exponential.
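The passage to the limit can be observed numerically (an added sketch): as Δt shrinks, the discrete distribution function approaches 1 − e^(−λt):

```python
from math import exp

lam = 1 / 6   # arrival rate in min^-1 (10 calls per hour)
t = 12        # pause length in minutes

# Discrete model: F(t) = 1 - (1 - lam*dt)^(t/dt) for shrinking unit dt.
for dt in (1, 0.1, 0.001):
    n = round(t / dt)
    print(dt, 1 - (1 - lam * dt)**n)
# dt = 1     -> 0.8878...
# dt = 0.001 -> 0.8647...

# Exponential limit: F(t) = 1 - exp(-lam * t)
print(1 - exp(-lam * t))   # 0.8646...
```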
If X is a continuous random variable (like the interarrival time in the preceding example) we
cannot use the definitions of the mean value and the variance given earlier. In order to be able
to define these functions for continuous random variables too, we define the probability
density function (pdf) f(x) as the derivative of the probability distribution function F(x):
F(x) = P[X ≤ x]   (probability distribution function)
f(x)·dx = F(x + dx) − F(x) = P[x ≤ X ≤ x + dx]
f(x) = dF(x)/dx
The mean value or average and the variance of a continuous random variable are defined as:
E[X] = ∫_{−∞}^{+∞} x · f(x) dx
Var[X] = ∫_{−∞}^{+∞} (x − E[X])² · f(x) dx
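As an added sketch, a crude numerical integration of the exponential pdf (the distribution of Exercise 10, with an assumed rate λ = 1/6 min⁻¹ as in Example 14) reproduces E[T] = 1/λ and Var[T] = 1/λ²:

```python
from math import exp

lam = 1 / 6                        # rate of the exponential distribution
f = lambda t: lam * exp(-lam * t)  # pdf f(t) = dF/dt = lam * e^(-lam*t)

# Midpoint-rule integration over [0, T]; the tail beyond T is negligible.
dt, T = 0.001, 200
ts = [(i + 0.5) * dt for i in range(int(T / dt))]
mean = sum(t * f(t) * dt for t in ts)
var  = sum((t - mean)**2 * f(t) * dt for t in ts)

print(mean, 1 / lam)     # ~6.0   (E[T] = 1/lambda)
print(var, 1 / lam**2)   # ~36.0  (Var[T] = 1/lambda^2)
```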
Exercise 7 We throw a 5 cent coin and a fair die and take the absolute value of the
difference of the numbers that appear. The result is our random variable X. Draw the probability
distribution function F(x) and calculate mean and variance.
Exercise 8 Use a mathematical software package to draw a picture of the binomial
distribution pBinomial(k,n,p) for n = 20, p = 0.3. Compare it with a normal distribution having
the same mean and variance.
Exercise 9 Use a mathematical software package to draw a picture of the Poisson
distribution pPoisson(k,λ) for λ = 25, k = 0...100.
Exercise 10 Calculate mean and variance of the exponential distribution F(t) = 1 − e^(−λt)
Exercise 11 A particle detector counts 15 events per minute on average. (a) What is the
probability of counting more than 20 events during a particular minute? (b) Find the probability
of a pause shorter than 0.1 sec between two consecutive events.