Notes: Section 001 (Dawes)

20160927 and 20160930
Order Notation and Probability
Order Notation
We have looked at functions as ordered pairs. A more traditional way (but completely
equivalent) to think of a function is as something that takes an input value and produces an
output value. We are now going to develop a way to establish a relationship between
functions. This relation will be extremely important in your future study of data structures
(CISC-235) and algorithms (CISC-365).
Let f and g be two functions from ℕ to ℝ - that is, f: ℕ → ℝ and g: ℕ → ℝ.
Suppose there exists a positive value M such that |f(n)| ≤ M·|g(n)| for all n, with only a finite number of exceptions.
Then we say f is order g, and we write f(n) is O(g(n)).
This is the way the text defines order. It’s perfectly correct, but I don’t think it is the most
easily understood form. I prefer the following:
Suppose there exists a positive value M and an integer k ≥ 0 such that |f(n)| ≤ M·|g(n)| for all n ≥ k.
Then we say f is order g, and we write f(n) is O(g(n)).
The two forms of the definition are completely equivalent. You can use whichever one you
prefer.
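To make the second form concrete, here is a small Python sketch (the particular f, g, M and k below are my own illustrative choices, not from the notes) that tests whether |f(n)| ≤ M·|g(n)| holds for every n from k up to some finite limit. Passing the test does not prove f is O(g), but a failure does rule out that particular M and k.

# Check |f(n)| <= M * |g(n)| for all n >= k, up to a finite test limit.
def satisfies_big_o(f, g, M, k, limit=1000):
    return all(abs(f(n)) <= M * abs(g(n)) for n in range(k, limit + 1))

f = lambda n: 2 * n + 5    # example function
g = lambda n: n            # candidate g(n)
print(satisfies_big_o(f, g, M=3, k=5))    # True: 2n + 5 <= 3n for all n >= 5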
We also sometimes read “f(n) is O(g(n))” as “f is big-O of g” ... but what does it mean?
Suppose the definition is satisfied, so |f(n)| ≤ M·|g(n)| for all n ≥ k.
This is equivalent to saying that |f(n)| / |g(n)| stays at most M as n → ∞ (wherever g(n) ≠ 0), which is another way of saying that even when n gets really large, f(n) can never get too much larger than g(n) ... and this means that even if f(n) goes up as n increases, g(n) goes up at least as fast.
Ok, that’s what it means ... but why is it useful?
If f(n) represents the running time of an algorithm (and n represents the size of the input)
then f(n) may be very complicated and hard to compute. But if we can find a simple g(n)
such that f(n) is O(g(n)), then we can use g(n) to describe the running time of the algorithm.
Let’s look at an example or two.
Example 1:
Can we find an M and a k for which 3n + 20 is O(n²)?
I claim that M = 1 and k = 7 works:

n     3n + 20    n²
0     20         0
1     23         1
2     26         4
3     29         9
4     32         16
5     35         25
6     38         36
7     41         49
8     44         64
...   ...        ...

Thus we can see that 3n + 20 is O(n²).
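A quick Python check of the claim above (a sketch, testing the inequality over a finite range):

# Verify that 3n + 20 <= 1 * n^2 for all n from 7 up to a finite test limit.
f = lambda n: 3 * n + 20
g = lambda n: n * n
M, k = 1, 7
print(all(f(n) <= M * g(n) for n in range(k, 10001)))    # True
print(f(6) <= M * g(6))    # False: 38 > 36, so no k <= 6 works with M = 1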
Example 2:
Can we find an M and a k for which 3n² + 100 is O(n²)?
For this example, I claim M = 4, k = 10 works:

n     3n² + 100    4n²
0     100          0
1     103          4
2     112          16
3     127          36
4     148          64
5     175          100
6     208          144
7     247          196
8     292          256
9     343          324
10    400          400
11    463          484
...   ...          ...

And we see that 3n² + 100 is O(n²).
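To go with Example 2, here is a sketch of a small helper (my own, not from the notes) that searches for the smallest k that works with M = 4, scanning down from a finite cutoff. It agrees with the claim k = 10.

# Find the smallest k (searching down from a cutoff) such that
# |f(n)| <= M * |g(n)| for every n from k up to the cutoff.
def smallest_k(f, g, M, cutoff=1000):
    k = cutoff
    for n in range(cutoff, -1, -1):           # scan downward from the cutoff
        if abs(f(n)) <= M * abs(g(n)):
            k = n                              # the inequality still holds at n
        else:
            break                              # first failure from above ends the search
    return k                                   # returns cutoff if it never holds in range

f = lambda n: 3 * n * n + 100
g = lambda n: n * n
print(smallest_k(f, g, M=4))                   # 10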
From this we can extrapolate that for any constants a, b and c, an² + bn + c is O(n²) .... we can let M = |a| + |b| + |c| (one choice that always works), and let k be the smallest element of ℕ for which |an² + bn + c| ≤ M·n² holds from that point on.
And from this we can extrapolate that when f(n) is any polynomial a_t n^t + a_(t-1) n^(t-1) + ... + a_1 n + a_0 (where the a_i values are constants), then f(n) is O(n^t).
The text develops these ideas in detail, and I recommend reading this section carefully.
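A brief sketch of why the choice M = |a| + |b| + |c| works in the quadratic case: for all n ≥ 1,
\[
|an^2 + bn + c| \;\le\; |a|n^2 + |b|n + |c| \;\le\; |a|n^2 + |b|n^2 + |c|n^2 \;=\; (|a| + |b| + |c|)\,n^2 ,
\]
so the definition is satisfied with this M and with k = 1 (or any larger k).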
We can state a few simple facts about this relationship between functions.
1. For all functions f, f(n) is O(f(n)) (so the big-O relation is reflexive)
2. f(n) is O(g(n)) does not necessarily imply that g(n) is O(f(n)) (so the big-O relation is not symmetric)
3. If f(n) is O(g(n)) and g(n) is O(h(n)), then f(n) is O(h(n)) (so the big-O relation is transitive; see the sketch just below)
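Here is a one-line sketch of why fact 3 holds, using the second form of the definition:
\[
|f(n)| \le M_1 |g(n)| \ \text{for all } n \ge k_1 \quad \text{and} \quad |g(n)| \le M_2 |h(n)| \ \text{for all } n \ge k_2
\]
\[
\Longrightarrow \quad |f(n)| \le M_1 M_2 |h(n)| \ \text{for all } n \ge \max(k_1, k_2),
\]
so f(n) is O(h(n)) with M = M_1·M_2 and k = max(k_1, k_2).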
In practice, when we use the big-O classification all of our functions are non-negative, so the absolute
value signs in the definition become irrelevant.
There are two other classifications to define, both of which are related to big-O.
Big-Ω Classification:
Let f and g be functions defined as before. We say f(n) is Ω(g(n)) (which we read as “f is big-Omega of g”) if there exists M > 0 such that |f(n)| ≥ M·|g(n)| with finitely many exceptions
(or equivalently, if there exist M > 0 and k ≥ 0 such that |f(n)| ≥ M·|g(n)| for all n ≥ k).
Big-Θ Classification:
We can show that the big-O and big-Ω relationships are independent of each other. That is, if f(n) is O(g(n)), then f(n) may or may not be Ω(g(n)), and vice versa.
However, sometimes we find that for particular functions f and g, f(n) is O(g(n)) and, as it turns out, f(n) is also Ω(g(n)). When this happens, we say f(n) is Θ(g(n)) .... which we read as “f is Theta of g”.
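For instance (my own example, in the spirit of Example 2): 3n² + 100 is both O(n²) and Ω(n²), and therefore Θ(n²):
\[
3n^2 + 100 \;\ge\; 3n^2 \ \text{for all } n \ge 0, \ \text{so } 3n^2 + 100 \text{ is } \Omega(n^2) \ (M = 3,\ k = 0);
\]
\[
3n^2 + 100 \;\le\; 4n^2 \ \text{for all } n \ge 10, \ \text{so } 3n^2 + 100 \text{ is } O(n^2) \ (M = 4,\ k = 10).
\]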
Probability
In casual conversation, people tend to use the concepts of probability quite loosely. We are going to
give precise definitions and formulas to give probability theory a firm footing in discrete mathematics.
Sample Spaces
Probability theory is based on the idea of observing the outcome of an experiment that has different
possible outcomes. We call this process of experimenting and observing sampling. The type of
experiment we are usually talking about here is one with a fixed set of possible outcomes, such as
flipping a coin, tossing a six-sided die, or picking one ball out of a box of different coloured balls.
A sample space consists of the set of possible outcomes of an experiment, and a function P( ) that
assigns a value to each outcome. P( ) must have the following properties:
0 ≤ P(x) ≤ 1 for each outcome x, and the sum of P(x) over all outcomes x equals 1.
We call P( ) a probability function.
We will sometimes write (S,P) to identify the sample space where S is the set of outcomes and P is the
probability function.
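As a concrete (if informal) picture, a sample space can be modelled in Python as a dictionary whose keys are the outcomes and whose values are their probabilities. This sketch, with names of my own choosing, just checks the two properties above:

# A sample space (S, P) as a dict: the keys form S, the values are P(x).
def is_probability_function(P, tolerance=1e-9):
    in_range = all(0 <= p <= 1 for p in P.values())        # 0 <= P(x) <= 1 for every outcome x
    total_is_one = abs(sum(P.values()) - 1) < tolerance    # the probabilities sum to 1
    return in_range and total_is_one

coin = {"heads": 0.5, "tails": 0.5}     # flipping a fair coin
print(is_probability_function(coin))     # True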
Any function that satisfies the requirements is a valid probability function, but we usually want our
probability functions to correspond to the “likelihood” of the different outcomes occurring. But that is
dangerously close to a circular definition, since “likelihood” is often used as a synonym for
“probability”.
We can get a concrete sense of what we want our probability functions to do by considering an
experiment which we sample many, many times. If we count the number of times a particular outcome
occurs and divide that by the number of samples, we expect that this ratio will change less and less as
the number of samples increases. The limit of this ratio as the number of samples goes to ∞ is what
we want as the probability of that particular outcome.
For example, consider a box containing a red ball, a yellow ball and a blue ball. If the balls are all
identical in size and weight, we expect that if we take out one ball, record its colour and return it to the
box, over and over again, the number of times we withdraw the red ball divided by the total number of
samples, will get closer and closer to 1/3 ... as will the ratios for the yellow ball and the blue ball.
Now suppose the box contains 2 red balls and 1 yellow ball. The possible outcomes from our sampling
experiment are {red, yellow}. Again assuming that the balls are identical except for their colours, we
expect that the ratio for red (the ratio of occurrences / samples) will approach 2/3, while the ratio for
yellow approaches 1/3.
So in general, we want P(x) to be the limit of the ratio “occurrences of x” / “number of samples” as the number of
samples goes towards ∞.
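A small simulation sketch of the 2-red, 1-yellow box (hypothetical code, not part of the notes): the printed ratio for red should drift toward 2/3 as the number of samples grows, though the exact values vary from run to run.

import random

# Simulate drawing one ball (with replacement) from a box with 2 red balls and 1 yellow ball.
box = ["red", "red", "yellow"]
for trials in (100, 10_000, 1_000_000):
    reds = sum(1 for _ in range(trials) if random.choice(box) == "red")
    print(trials, reds / trials)    # the ratio of reds drifts toward 2/3 as trials grows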
Events
If (S,P) is a sample space, we use the word event to describe any subset of S. For example, if S =
{1,2,3,4,5} then A = {1,3,5} is an event. {} is also an event, and so is S.
Suppose we sample (S,P) by conducting the experiment once. If the outcome of the sample is an
element of A, we say that A has occurred.
Now we can define the probability of an event: the probability of event A is the sum of the
probabilities of the elements of A. In notation, P(A) = Σ P(x), where the sum is taken over all x ∈ A.
For example, given S as above, suppose P(1) = P(2) = P(3) = P(4) = P(5) = 0.2
Then P({1,3,5}) = 0.6
But suppose P(1) = 0.4, P(2) = 0.1, P(3) = 0.3, P(4) = 0.1, P(5) = 0.1
Then P({1,3,5}) = 0.8
In casual discussions of probability, people often forget that the probability of an event depends on the
probability function – they assume (without cause) that all outcomes are equally probable. A very
simple example is when the experiment consists of rolling two fair 6-sided dice and adding the
numbers that come up. The possible outcomes are {2,3,4,...,12} but the probabilities are not all equal.
So the probability of the event {6,7,8} is very different from the probability of the event {2,3,4}, even though both
contain the same number of outcomes.
To be honest, the (unjustified) assumption that all outcomes of an experiment are equally probable
sometimes shows up in scientific discussions as well.
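To make this concrete, here is a short sketch (my own illustration) that builds the exact probability function for the sum of two fair dice and compares the two events just mentioned:

from fractions import Fraction
from itertools import product

# Probability function for the sum of two fair 6-sided dice: 36 equally likely (d1, d2) pairs.
P = {}
for d1, d2 in product(range(1, 7), repeat=2):
    P[d1 + d2] = P.get(d1 + d2, Fraction(0)) + Fraction(1, 36)

prob_event = lambda A: sum(P[x] for x in A)    # P(A) = sum of P(x) over x in A
print(prob_event({6, 7, 8}))    # 4/9
print(prob_event({2, 3, 4}))    # 1/6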
Combinations of Events
Let (S,P) be a sample space, and let A and B be events in that sample space.
What can we say about A ∪ B? More precisely, can we compute P(A ∪ B) from P(A) and P(B)?
Unfortunately P(A) and P(B) do not give enough information to compute P(A ∪ B). Consider this
example. Let the experiment be tossing a single 6-sided die. We will assume that all outcomes are
equally probable (i.e. P(i) = 1/6 for all i ∈ {1,2,3,4,5,6})
Let A = {1,2,3}, B = {2,3,4}, C = {4,5,6}
P(A) = P(B) = P(C) = 1/2, but P(A ∪ B) = P({1,2,3,4}) = 2/3 while P(A ∪ C) = 1
In the first case two events with individual probabilities = 1/2 have a combined probability = 2/3, and
in the second case two events with individual probabilities = 1/2 have a combined probability = 1
The difference of course is that A and B overlap (i.e. they have non-empty intersection) while A and C
do not overlap ... and we can’t tell that just by looking at their probabilities. Fortunately the solution is
obvious as soon as we recognize the problem. We just apply our old friend the Principle of
Inclusion/Exclusion, and arrive at this formula:
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Note that this means the only time P(A ∪ B) = P(A) + P(B) is when P(A ∩ B) = 0
Also observe that using this equation, if we know any three of the terms we can deduce the fourth.
For example if we know P(A ∪ B) = 0.7, P(A) = 0.3, P(A ∩ B) = 0.2, then we know P(B) = 0.6
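A quick sketch (names are my own) that checks the inclusion/exclusion formula on the single-die example above:

from fractions import Fraction

# Single fair 6-sided die: P(i) = 1/6 for each outcome.
P = {i: Fraction(1, 6) for i in range(1, 7)}
prob = lambda E: sum(P[x] for x in E)

A, B, C = {1, 2, 3}, {2, 3, 4}, {4, 5, 6}
print(prob(A.union(B)) == prob(A) + prob(B) - prob(A.intersection(B)))    # True: 2/3 = 1/2 + 1/2 - 1/3
print(prob(A.union(C)) == prob(A) + prob(C) - prob(A.intersection(C)))    # True: 1 = 1/2 + 1/2 - 0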
Here are some more useful facts about the probabilities of events
P(∅) = 0
P(S) = 1
P(A ∪ B) ≤ P(A) + P(B)
P(Ā) = 1 – P(A), where Ā represents the complement of A
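A brief sketch of why the last two facts hold, using inclusion/exclusion and the fact that A and Ā together cover S:
\[
P(A \cup B) = P(A) + P(B) - P(A \cap B) \le P(A) + P(B) \quad \text{since } P(A \cap B) \ge 0;
\]
\[
P(A) + P(\bar{A}) = P(A \cup \bar{A}) = P(S) = 1 \quad \text{since } A \cap \bar{A} = \emptyset, \ \text{so } P(\bar{A}) = 1 - P(A).
\]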
Conditional Probability
Before we sample, we can compute the probability that a particular event will occur. After we sample,
if we know the outcome then we know with complete certainty whether the event occurred or not. For
example, if the event A = {1,5,6} and the outcome of the sample = 3, then A did not occur.
But what if we are given partial information about the outcome? Can we compute the probability that
event A occurred?
More formally, let A and B be events. For a particular sample, suppose all we know is that B occurred
(we don’t know the actual value of the outcome). What is the probability that A also occurred?
We express this as “what is the probability of A, given B?” and we use the notation P(A | B)
Before we analyze this we can do some examples.
Suppose A ∩ B = ∅ (e.g. A = {1,2} and B = {3,4,5})
Then A and B cannot occur simultaneously, so P(A | B) = 0
Suppose B ⊆ A (e.g. A = {1,2,4,5} and B = {2,5})
Then A occurs whenever B occurs, so P(A | B) = 1
For other situations a bit more thought is required. If event B has occurred, the only way A can also
have occurred is if the outcome is in A ∩ B, so we clearly need to compute P(A ∩ B) ... but that is not
the end of the story! Because we know B occurred, the set of possible outcomes is reduced to just B.
What we need to know is what fraction of the total probability of B is concentrated in A ∩ B.
Our solution then is P(A | B) = P(A ∩ B) / P(B)
A few examples will help.
Example 1: Suppose S = {1,2,..., 100} and P(x) = 1/100 for each x ∈ S.
Let A = {1,2,3} and B = {1,2,4,5}
P(A) = 3/100
P(B) = 4/100
P(A ∩ B) = P({1,2}) = 2/100
So with no information about the outcome, A is not very probable. But if we know that B occurred, it
is clearly very probable that A also occurred. In fact, 2 out of the 4 outcomes in B are also in A, so it
seems logical that P(A | B) = 1/2. And indeed, our formula gives P(A | B) = (2/100) / (4/100) = 1/2
Now if the outcomes do not all have equal probability, it may be less obvious that the formula will give
the correct result. We consider this in the next example.
Example 2: Suppose we have a fair 10-sided die, where 4 of the sides show “1”, 2 sides show “2”, and
the others show “3”, “4”, “5” and “6”. So S = {1,2,3,4,5,6} and P is shown by this table
n    P(n)
1    0.4
2    0.2
3    0.1
4    0.1
5    0.1
6    0.1

Let A = {1,3,4} and let B = {1,2,3,6}. Using the formula, we get P(A | B) = 0.5 / 0.8 = 5/8 ... but does this
make sense?
By restricting the possible outcomes to B, it is as if outcomes 4 and 5 now have probability 0. All of
the probability is concentrated in B (i.e. in the situation we are dealing with, P(B) = 1). Within B,
outcome 1 has half of the probability (0.4 out of 0.8) so now outcome 1 has a “conditional probability”
of P(1) = 1/2. Similarly the conditional probabilities of 2, 3 and 6 are 1/4, 1/8 and 1/8. So the
outcomes that are of interest to us – i.e. the ones that are in A ∩ B, so 1 and 3 – have conditional
probabilities 1/2 and 1/8, giving a total conditional probability for A, given B, of 1/2 + 1/8 = 5/8.
You can see that the formula P(A | B) = P(A ∩ B) / P(B) is really just a quick way to express the same
thought process we just went through in the example.
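Finally, a short sketch (the dictionary and helper names are my own) that runs Example 2 through the formula:

from fractions import Fraction

# The 10-sided die relabelled 1..6: P(n) taken from the table above.
P = {1: Fraction(4, 10), 2: Fraction(2, 10), 3: Fraction(1, 10),
     4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 10)}
prob = lambda E: sum(P[x] for x in E)

def conditional(A, B):
    # P(A | B) = P(A intersect B) / P(B)
    return prob(A.intersection(B)) / prob(B)

A, B = {1, 3, 4}, {1, 2, 3, 6}
print(conditional(A, B))    # 5/8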