Pattern Recognition
Probability Theory
Probability Space
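A probability space is a three-tuple $(\Omega, \sigma, P)$ with:
• $\Omega$ − the set of elementary events
• $\sigma$ − a σ-algebra
• $P$ − a probability measure
A σ-algebra over $\Omega$ is a system of subsets, i.e. $\sigma \subseteq \mathcal{P}(\Omega)$ ($\mathcal{P}$ is the power set), with:
• $\Omega \in \sigma$
• $A \in \sigma \Rightarrow \Omega \setminus A \in \sigma$
• $A_i \in \sigma,\ i = 1, 2, \dots \Rightarrow \bigcap_i A_i \in \sigma$
i.e. $\sigma$ is closed with respect to the complement and countable conjunction (intersection).
It follows: $\emptyset \in \sigma$, and $\sigma$ is closed with respect to countable disjunction (union), due to De Morgan's laws.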
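The two consequences stated on this slide can be written out in one line each. This is only a sketch of the standard argument; the symbols $\Omega$, $\sigma$ and $A_i$ are the usual notation and are an assumption wherever the transcript lost them.

```latex
\begin{align*}
\emptyset &= \Omega \setminus \Omega \in \sigma
  && \text{(complement of } \Omega \in \sigma \text{)} \\
\bigcup_{i} A_i &= \Omega \setminus \bigcap_{i} \left( \Omega \setminus A_i \right) \in \sigma
  && \text{(De Morgan, then complement and countable intersection)}
\end{align*}
```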
Pattern Recognition: Probability Theory
Probability Space
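Examples:
• $\sigma = \{\emptyset, \Omega\}$ (smallest) and $\sigma = \mathcal{P}(\Omega)$ (largest) σ-algebras over $\Omega$
• the minimal σ-algebra over $\Omega$ containing a particular subset $A \subseteq \Omega$ is $\sigma = \{\emptyset, A, \Omega \setminus A, \Omega\}$
• $\Omega$ discrete and finite, $\sigma = \mathcal{P}(\Omega)$
• $\Omega = \mathbb{R}$, the Borel-algebra (contains all intervals amongst others)
• etc.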
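For a finite $\Omega$ the axioms can be checked mechanically. The following is a minimal sketch, not part of the original slides; the helper names `power_set` and `is_sigma_algebra` and the four-element set are hypothetical choices made for illustration. For a finite family, closure under pairwise intersection is enough to get closure under countable intersection.

```python
from itertools import combinations


def power_set(omega):
    """Return all subsets of a finite set as a set of frozensets."""
    items = list(omega)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}


def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    # Axiom 1: omega itself belongs to the family.
    if omega not in family:
        return False
    # Axiom 2: closed with respect to the complement.
    if any(omega - a not in family for a in family):
        return False
    # Axiom 3: closed with respect to intersection (pairwise suffices here).
    return all(a & b in family for a in family for b in family)


omega = {1, 2, 3, 4}          # hypothetical finite set of elementary events
A = {1, 2}                    # a particular subset

smallest = [set(), omega]
minimal_with_A = [set(), A, omega - A, omega]
largest = power_set(omega)
not_closed = [set(), A, omega]   # the complement of A is missing

for name, fam in [("smallest", smallest), ("minimal with A", minimal_with_A),
                  ("largest", largest), ("not closed", not_closed)]:
    print(name, is_sigma_algebra(omega, fam))
```

The last family fails because it contains $A$ but not $\Omega \setminus A$.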
Probability Measure
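$P : \sigma \to [0, 1]$ is a “measure” (σ-additive) with the normalization $P(\Omega) = 1$.
σ-additivity: let $A_i \in \sigma$ be pairwise disjoint subsets, i.e. $A_i \cap A_j = \emptyset$ for $i \neq j$; then
$P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$
Note: there are sets for which there is no measure.
Examples: the set of irrational numbers, function spaces, etc.
Cf. the Banach–Tarski paradox.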
(For us) practically relevant cases
• The set $\Omega$ is “good-natured”, i.e. $\mathbb{R}^n$, discrete finite sets etc.
• $\sigma = \mathcal{P}(\Omega)$, i.e. the algebra is the power set
• We often consider a (composite) “event” $A \subseteq \Omega$ as the union of the elementary ones
• Probability of an event is $P(A) = \sum_{\omega \in A} P(\omega)$
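In the discrete case with the power set as the algebra, the probability of a composite event is the sum over its elementary events. A minimal sketch follows; the three-element space and its probabilities are hypothetical values chosen for illustration, and `prob` is an illustrative helper name.

```python
from fractions import Fraction

# Hypothetical finite space: probabilities of the elementary events.
P = {"red": Fraction(1, 2), "green": Fraction(1, 3), "blue": Fraction(1, 6)}


def prob(event, P):
    """P(A) as the sum of P(omega) over the elementary events omega in A."""
    return sum((P[w] for w in event), Fraction(0))


A = {"red", "blue"}      # a composite event
B = {"green"}            # disjoint from A

print(prob(A, P))        # probability of the composite event
print(prob(set(P), P))   # normalization: the whole space
```

Because `A` and `B` are disjoint, `prob(A | B, P)` equals `prob(A, P) + prob(B, P)`, which is additivity in the finite case.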
Random variables
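Here a special case – real-valued random variables.
A random variable $\xi$ for a probability space $(\Omega, \sigma, P)$ is a mapping $\xi : \Omega \to \mathbb{R}$, satisfying
$\{\omega : \xi(\omega) \le r\} \in \sigma \quad \forall r \in \mathbb{R}$
(always holds for power sets, $\sigma = \mathcal{P}(\Omega)$).
Note: elementary events are not numbers – they are elements of an abstract set $\Omega$.
Random variables, in contrast, are numbers, i.e. they can be summed, subtracted, squared etc.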
Distributions
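Cumulative distribution function of a random variable $\xi$:
$F_\xi(r) = P(\{\omega : \xi(\omega) \le r\})$
Probability distribution of a discrete random variable $\xi : \Omega \to \mathbb{Z}$:
$p_\xi(r) = P(\{\omega : \xi(\omega) = r\})$
Probability density of a continuous random variable $\xi : \Omega \to \mathbb{R}$:
$p_\xi(r) = \frac{\partial F_\xi(r)}{\partial r}$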
Distributions
Why is it necessary to make it so complicated (through the cumulative
distribution function)?
Example – a Gaussian.
The probability of any particular real value is zero → a “direct” definition
of a “probability distribution” is senseless.
A definition through the cumulative distribution function, however, is possible.
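The point about the Gaussian can be checked numerically: the probability of an interval is a difference of two values of the cumulative distribution function, and it shrinks to zero as the interval shrinks to a point. This is a minimal sketch for a standard Gaussian using the error function from the standard library; the function names and the chosen widths are illustrative assumptions.

```python
import math


def gaussian_cdf(r, mu=0.0, sigma=1.0):
    """Cumulative distribution function F(r) of a Gaussian."""
    return 0.5 * (1.0 + math.erf((r - mu) / (sigma * math.sqrt(2.0))))


def interval_prob(a, b, mu=0.0, sigma=1.0):
    """P(a < xi <= b) = F(b) - F(a)."""
    return gaussian_cdf(b, mu, sigma) - gaussian_cdf(a, mu, sigma)


# Intervals centred at 0 with shrinking width.
widths = [1.0, 0.1, 0.001]
probs = [interval_prob(-w / 2, w / 2) for w in widths]

for w, p in zip(widths, probs):
    print(f"width {w}: probability {p:.6f}")

# A single point is an interval of width zero.
print(interval_prob(0.3, 0.3))
```

Dividing each probability by its width approaches the density at 0, which is the derivative of the cumulative distribution function.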
Mean
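The mean (average, expectation, …) of a random variable $\xi$ is
$\mathbb{E}_P(\xi) = \sum_{\omega \in \Omega} P(\omega) \cdot \xi(\omega) = \sum_r p_\xi(r) \cdot r$
The arithmetic mean is a special case:
$\frac{1}{N} \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} p(i) \cdot x_i$
with $p(i) = \frac{1}{N}$ (uniform probability distribution)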
Mean
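The probability of an event $A \subseteq \Omega$ can be expressed as the mean
value of a corresponding “indicator” variable:
$P(A) = \sum_{\omega \in A} P(\omega) = \sum_{\omega \in \Omega} P(\omega) \cdot \xi(\omega) = \mathbb{E}_P(\xi)$
with $\xi(\omega) = 1$ if $\omega \in A$ and $\xi(\omega) = 0$ otherwise.
Often, the set of elementary events can be associated with a random
variable (just enumerate all $\omega \in \Omega$).
Then one can speak about a “probability distribution over $\Omega$”
(instead of the probability measure).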
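The identity between an event's probability and the mean of its indicator variable is easy to verify on a small space. A minimal sketch follows, assuming a hypothetical uniform space with six elementary events; `indicator` and `mean` are illustrative names.

```python
from fractions import Fraction

# Hypothetical uniform space with six elementary events.
P = {w: Fraction(1, 6) for w in range(1, 7)}
A = {2, 4, 6}   # an event


def indicator(event):
    """Random variable that is 1 on the event and 0 elsewhere."""
    return lambda w: 1 if w in event else 0


def mean(rv):
    """Mean of a random variable: sum of P(omega) * rv(omega)."""
    return sum(P[w] * rv(w) for w in P)


prob_A = sum(P[w] for w in A)        # P(A) summed directly
mean_indicator = mean(indicator(A))  # the same value as a mean

print(prob_A, mean_indicator)
```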
Example 1 – numbers of a die
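The set of elementary events: $\Omega = \{a, b, c, d, e, f\}$ (the six faces)
Probability measure: $P(\{\omega\}) = \frac{1}{6}$ for every face, so $P(A) = \frac{|A|}{6}$
Random variable: $\xi(a) = 1,\ \xi(b) = 2,\ \dots,\ \xi(f) = 6$
Cumulative distribution: $F_\xi(r) = 0$ for $r < 1$, $F_\xi(r) = \frac{\lfloor r \rfloor}{6}$ for $1 \le r < 6$, $F_\xi(r) = 1$ for $r \ge 6$
Probability distribution: $p_\xi(1) = p_\xi(2) = \dots = p_\xi(6) = \frac{1}{6}$
Mean value: $\mathbb{E}_P(\xi) = \frac{1 + 2 + \dots + 6}{6} = 3.5$
Another random variable (squared numbers of a die): $\xi'(a) = 1,\ \xi'(b) = 4,\ \dots,\ \xi'(f) = 36$
Mean value: $\mathbb{E}_P(\xi') = \frac{1 + 4 + \dots + 36}{6} = \frac{91}{6} \approx 15.17$
Note: $\mathbb{E}_P(\xi^2) \neq \mathbb{E}_P(\xi)^2$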
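The die example can be reproduced with exact fractions. This is a minimal sketch assuming a fair die whose faces are labelled with the abstract names `a` to `f`; the labels and helper names are illustrative choices.

```python
from fractions import Fraction

omega = ["a", "b", "c", "d", "e", "f"]            # faces as abstract labels
P = {w: Fraction(1, 6) for w in omega}            # fair die
xi = {w: i + 1 for i, w in enumerate(omega)}      # face -> number shown
xi_sq = {w: xi[w] ** 2 for w in omega}            # face -> squared number


def mean(rv, P):
    """Mean value: sum of P(omega) * rv(omega)."""
    return sum(P[w] * rv[w] for w in P)


def cdf(rv, P, r):
    """Cumulative distribution: P({omega : rv(omega) <= r})."""
    return sum(P[w] for w in P if rv[w] <= r)


print(mean(xi, P))         # mean of the numbers
print(mean(xi_sq, P))      # mean of the squared numbers
print(mean(xi, P) ** 2)    # square of the mean, a different value
print(cdf(xi, P, 3))       # probability of at most 3
```

The gap between the mean of the squares and the square of the mean is the variance of the die number.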
Example 2 – two independent dice numbers
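The set of elementary events (6x6 faces): $\Omega = \{a, b, c, d, e, f\} \times \{a, b, c, d, e, f\}$
Probability measure: $P(\{\omega\}) = \frac{1}{36}$ for every $\omega \in \Omega$, so $P(A) = \frac{|A|}{36}$
Two random variables:
1. The number of the first die: $\xi_1(\omega_1, \omega_2)$ = the number on face $\omega_1$
2. The number of the second die: $\xi_2(\omega_1, \omega_2)$ = the number on face $\omega_2$
Probability distributions: $p_{\xi_1}(1) = \dots = p_{\xi_1}(6) = \frac{1}{6}$ and $p_{\xi_2}(1) = \dots = p_{\xi_2}(6) = \frac{1}{6}$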
Example 2 – two independent dice numbers
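Consider the new random variable $\xi = \xi_1 + \xi_2$
The probability distribution $p_\xi$ is not uniform anymore: $p_\xi(2), \dots, p_\xi(12) \propto (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)$
Mean value is $\mathbb{E}_P(\xi) = 7$
In general for mean values:
$\mathbb{E}_P(\xi_1 + \xi_2) = \sum_{\omega \in \Omega} P(\omega) \cdot (\xi_1(\omega) + \xi_2(\omega)) = \mathbb{E}_P(\xi_1) + \mathbb{E}_P(\xi_2)$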
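The distribution of the sum and the additivity of mean values can be verified by enumerating all 36 elementary events. A minimal sketch follows; for brevity it assumes the faces are represented directly by their numbers rather than by abstract labels, and the helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

faces = range(1, 7)
omega = list(product(faces, faces))               # 36 elementary events
P = {w: Fraction(1, 36) for w in omega}           # two fair, independent dice


def xi1(w):
    """Number of the first die."""
    return w[0]


def xi2(w):
    """Number of the second die."""
    return w[1]


def xi(w):
    """Sum of both numbers."""
    return xi1(w) + xi2(w)


def distribution(rv):
    """Probability distribution p(r) = P({omega : rv(omega) = r})."""
    p = defaultdict(Fraction)
    for w, pw in P.items():
        p[rv(w)] += pw
    return dict(p)


def mean(rv):
    """Mean value: sum of P(omega) * rv(omega)."""
    return sum(P[w] * rv(w) for w in omega)


p_sum = distribution(xi)
for r in sorted(p_sum):
    print(r, p_sum[r])
print(mean(xi), mean(xi1) + mean(xi2))
```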
Random variables of higher dimension
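Analogously: let $\xi : \Omega \to \mathbb{R}^n$ be a mapping ($n = 2$ for simplicity), with $\xi = (\xi_1, \xi_2)$, $\xi_1 : \Omega \to \mathbb{R}$ and $\xi_2 : \Omega \to \mathbb{R}$.
Cumulative distribution function:
$F_\xi(r, s) = P(\{\omega : \xi_1(\omega) \le r\} \cap \{\omega : \xi_2(\omega) \le s\})$
Joint probability distribution (discrete):
$p_\xi(r, s) = P(\{\omega : \xi_1(\omega) = r\} \cap \{\omega : \xi_2(\omega) = s\})$
Joint probability density (continuous):
$p_\xi(r, s) = \frac{\partial^2 F_\xi(r, s)}{\partial r \, \partial s}$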
Independence
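Two events $A \in \sigma$ and $B \in \sigma$ are independent, if
$P(A \cap B) = P(A) \cdot P(B)$
Interesting: events $A$ and $\bar{B} = \Omega \setminus B$ are independent, if $A$ and $B$ are independent.
Two random variables are independent, if
$F_\xi(r, s) = F_{\xi_1}(r) \cdot F_{\xi_2}(s) \quad \forall r, s$
It follows (example for continuous $\xi$):
$p_\xi(r, s) = \frac{\partial^2 F_\xi(r, s)}{\partial r \, \partial s} = \frac{\partial F_{\xi_1}(r)}{\partial r} \cdot \frac{\partial F_{\xi_2}(s)}{\partial s} = p_{\xi_1}(r) \cdot p_{\xi_2}(s)$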
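Independence of events, including the statement about the complement, can be checked on the two-dice space. This is a minimal sketch; the three events below are hypothetical examples chosen so that one pair is independent and one is not.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # two fair dice, 36 outcomes


def prob(event):
    """Uniform probability measure: |A| / |Omega|."""
    return Fraction(len(event), len(omega))


def independent(a, b):
    """Events are independent if P(A and B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


A = {w for w in omega if w[0] % 2 == 0}        # first die shows an even number
B = {w for w in omega if w[1] >= 5}            # second die shows 5 or 6
C = {w for w in omega if w[0] + w[1] >= 10}    # the sum is at least 10
not_B = omega - B                              # complement of B

print(independent(A, B))       # events about different dice
print(independent(A, not_B))   # the complement keeps independence
print(independent(A, C))       # the sum depends on the first die
```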
Conditional Probabilities
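Conditional probability:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Independence (“almost” equivalent): $A$ and $B$ are independent, if
$P(A \mid B) = P(A)$ and/or $P(B \mid A) = P(B)$
Bayes’ theorem (formula, rule):
$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$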
Further definitions (for random variables)
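Shorthand: $p(x, y) \equiv p_\xi(x, y)$
Marginal probability distribution: $p(x) = \sum_y p(x, y)$
Conditional probability distribution: $p(x \mid y) = \frac{p(x, y)}{p(y)}$
Note: $\sum_x p(x \mid y) = 1$
Independent probability distributions: $p(x, y) = p(x) \cdot p(y)$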
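These definitions translate directly into operations on a joint table. A minimal sketch follows; the joint distribution below is a hypothetical example, and the function names are illustrative.

```python
from fractions import Fraction

# Hypothetical joint distribution p(x, y) with x in {0, 1} and y in {"a", "b"}.
joint = {
    (0, "a"): Fraction(1, 8), (0, "b"): Fraction(3, 8),
    (1, "a"): Fraction(1, 4), (1, "b"): Fraction(1, 4),
}


def marginal_x(joint):
    """p(x) = sum over y of p(x, y)."""
    p = {}
    for (x, _), v in joint.items():
        p[x] = p.get(x, Fraction(0)) + v
    return p


def marginal_y(joint):
    """p(y) = sum over x of p(x, y)."""
    p = {}
    for (_, y), v in joint.items():
        p[y] = p.get(y, Fraction(0)) + v
    return p


def conditional_x_given_y(joint, y):
    """p(x | y) = p(x, y) / p(y)."""
    py = marginal_y(joint)[y]
    return {x: v / py for (x, yy), v in joint.items() if yy == y}


def is_independent(joint):
    """True if p(x, y) = p(x) * p(y) for all pairs."""
    px, py = marginal_x(joint), marginal_y(joint)
    return all(v == px[x] * py[y] for (x, y), v in joint.items())


print(marginal_x(joint))
print(conditional_x_given_y(joint, "a"))
print(is_independent(joint))
```

Each conditional distribution sums to one over `x`, as stated in the note on the slide.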
Example
Let the probability of being ill be $p(\text{ill})$.
Let the conditional probability of having a temperature in that case be $p(\text{temp} \mid \text{ill})$.
However, one may have a temperature without any illness, with probability $p(\text{temp} \mid \overline{\text{ill}})$.
What is the probability of being ill, given that one has a
temperature?
Example
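Bayes’ rule:
$p(\text{ill} \mid \text{temp}) = \frac{p(\text{temp} \mid \text{ill}) \cdot p(\text{ill})}{p(\text{temp} \mid \text{ill}) \cdot p(\text{ill}) + p(\text{temp} \mid \overline{\text{ill}}) \cdot p(\overline{\text{ill}})}$
The result is not as high as one might expect; the reason is the very low prior probability
of being ill.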
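The effect of a low prior can be seen with concrete numbers. The values below are hypothetical, chosen only for illustration (they are not the numbers from the original slide): a prior of 1/100, a temperature in 9 of 10 cases of illness, and in 1 of 20 cases without illness.

```python
from fractions import Fraction


def posterior_ill(p_ill, p_temp_given_ill, p_temp_given_healthy):
    """Bayes' rule: p(ill | temp) from the prior and both conditionals."""
    p_healthy = 1 - p_ill
    # Total probability of having a temperature.
    p_temp = p_temp_given_ill * p_ill + p_temp_given_healthy * p_healthy
    return p_temp_given_ill * p_ill / p_temp


# Hypothetical numbers, not taken from the source.
p_ill = Fraction(1, 100)
p_temp_given_ill = Fraction(9, 10)
p_temp_given_healthy = Fraction(1, 20)

result = posterior_ill(p_ill, p_temp_given_ill, p_temp_given_healthy)
print(result, float(result))   # 2/13, about 0.154
```

Even though a temperature is very likely when ill, the posterior stays far below the conditional probability of 9/10 because healthy people vastly outnumber ill ones.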
Further topics
The model
Let two random variables be given:
• The first one is typically discrete (i.e. $k \in K$) and is called “class”
• The second one is often continuous ($x \in X$) and is called “observation”
Let the joint probability distribution $p(x, k)$ be “given”.
As $k$ is discrete, it is often specified by $p(x, k) = p(k) \cdot p(x \mid k)$
The recognition task: given $x$, estimate $k$.
Usual problems (questions):
• How to estimate $k$ from $x$?
• The joint probability is not always explicitly specified.
• The set $K$ is sometimes huge (remember the Hopfield networks)
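To make the model concrete, here is a minimal sketch with two classes and a one-dimensional observation. Everything specific in it is an assumption for illustration: the priors, the Gaussian form of $p(x \mid k)$, and the choice to answer with the class of highest posterior probability, which is one common strategy and not something this slide prescribes.

```python
import math

# Hypothetical model: priors p(k) and Gaussian p(x | k) per class.
priors = {"k1": 0.7, "k2": 0.3}
params = {"k1": (0.0, 1.0), "k2": (3.0, 1.0)}   # (mean, standard deviation)


def gaussian_pdf(x, mu, sigma):
    """Probability density of a Gaussian at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def joint(x, k):
    """p(x, k) = p(k) * p(x | k)."""
    mu, sigma = params[k]
    return priors[k] * gaussian_pdf(x, mu, sigma)


def posterior(x):
    """p(k | x) = p(x, k) / sum over k' of p(x, k')."""
    scores = {k: joint(x, k) for k in priors}
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}


def recognize(x):
    """Pick the class with the highest posterior (an assumed strategy)."""
    post = posterior(x)
    return max(post, key=post.get)


for x in (0.0, 1.5, 3.0):
    print(x, recognize(x), posterior(x))
```

With only two classes the enumeration over $K$ is trivial; the slide's remark about huge sets $K$ is exactly the case where such a loop stops being feasible.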
Further topics
The learning task:
Often (almost always) the probability distribution is known up to free
parameters. How to choose them (learn from examples)?
Next themes:
1. Recognition, Bayesian Decision Theory
2. Probabilistic (generative) learning, Maximum-Likelihood principle
3. Discriminative models, recognition and learning
4. Support Vector Machines