ESS011
Mathematical statistics and signal processing
Lecture 6: The random variable, its probability law and first two moments
Tuomas A. Rajala
Chalmers TU
March 27, 2014
Where are we
Over the past three lectures we have
learned how to talk about randomness,
collected the basic tools of probability theory, and
learned how to manipulate given probabilities to form new ones.
Now let's talk about modeling the events.
This lecture is a crash course in the basic definitions and terminology of models for random outcomes.
Motivation
How would you describe the roll of a die? "1 out of 6", "equally likely", ...
Perhaps in a figure: p_k = 1/6 for k = 1, ..., 6, like so →
[Figure: bar plot of P(outcome is x) against the outcome of a die x = 1, ..., 6, each bar at height 1/6.]
How would you describe the coin toss? "50-50", "equally likely"... p = 0.5.
How would you describe an experiment with outcomes 1, ..., 25 that would have the probabilities as given in the figure →
[Figure: bar plot of P(outcome is x) against the outcome x = 1, ..., 25.]
Random variable: layman’s definition
We have been describing the outcome of an experiment with events.
Now let’s describe events using variables.
Variable = an element, feature, or factor that is liable to vary or change
Random variable (r.v.) A variable which describes the outcome of a
random experiment as a number.
We denote random variables before the experiment by uppercase letters,
e.g. X, Y , and after the experiment with lowercase letters, e.g.
x = 0, y = 4.5.
One must encode the outcomes into numbers, so e.g. heads = 0, tails = 1.
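As a small illustration of this encoding, here is a minimal Python sketch (the helper name toss_coin is just a convenient choice for this example):

    import random

    # Encode the two outcomes of a coin toss as numbers: heads = 0, tails = 1.
    def toss_coin():
        return random.choice([0, 1])

    x = toss_coin()          # one realized value of the random variable X
    print("observed x =", x)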
Random variable: mathematical definition *
Mathematical definition: Let S be a non-empty sample space, (S, S, P )
be a probability space, and let B(R) be the set of all Borel sets of R (all
”nice” subsets).
Random variable (r.v.) A function X : S → R is called a random variable if
X⁻¹(B) = {s : X(s) ∈ B} ∈ S for all B ∈ B(R).
The mathematical definition says X is S-measurable.
Which means that whatever X does, P can handle it.
Discrete and continuous random variables
We classify random variables based on their outcome properties.
Discrete random variable If r.v. X takes values on a finite or countably
infinite set (can be mapped to N), it is called a discrete r.v.
Continuous random variable If r.v. X takes values on an uncountably infinite set, it is called a continuous r.v.
E.g.: The number of people in a queue (discrete), the time we need to
wait (continuous)
We study only these two. Others include: processes, graphs, sequences,
fields, point patterns...
Examples of random variables
Die cast: X(s) = s. Then P ({score 5}) = P ({s : X(s) = 5}) or
P (X = 5) for short.
Coin sequence (see book example 3.1.2): Let’s toss a coin repeatedly,
and observe when the first tails comes up. Denote this random number
by Y = 1, 2, 3, .... Then
P(first tails at toss k) = P(Y = k).
Continuous: X = the height of a male Swede. The graph on the right could depict a rough model for its probabilities.
[Figure: density P(height = x) against height x in meters, roughly between 1.4 and 2.2.]
Distribution: discrete
From possible to probable: Likelihood of the outcomes a r.v. can have.
The set of all probabilities {P(X = s_i) : for all s_i} is called the distribution of a discrete r.v. X.
For example, a die: {1, 2, 3, 4, 5, 6} → {1/6, 1/6, ..., 1/6}
The coin sequence: As we have P(tails) = P(not tails), and the tosses are independent,
P(Y = 1) = 1/2
P(Y = 2) = P(1st not tails) P(2nd is tails) = 1/2 · 1/2
and so forth (see later), the distribution is
{1, 2, 3, ...} → {1/2, 1/4, 1/8, ...}
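The same distribution can also be seen empirically. A rough simulation sketch in Python (the helper first_tails is named only for this example):

    import random

    # Simulate the coin sequence: toss until the first tails, record the toss number Y.
    def first_tails():
        y = 1
        while random.random() < 0.5:   # "not tails" on this toss, with probability 1/2
            y += 1
        return y

    n = 100_000
    counts = {}
    for _ in range(n):
        y = first_tails()
        counts[y] = counts.get(y, 0) + 1

    for k in range(1, 6):
        print(k, counts.get(k, 0) / n, "vs", 0.5 ** k)   # relative frequency vs 1/2^k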
Density: discrete
The listing of the distribution is practical only for small sample spaces.
Often we can (and want to) describe the distributions by special
functions: define
Discrete density Let X be a discrete r.v. The function f such that
f(x) = P(X = x),   x ∈ R
is called the density function of X.
Also known as the probability mass function (pmf) or the point mass
function.
The density defines the distribution, and vice versa.
Density: continuous
For a continuous random variable X this is a bit trickier: For any single
value x ∈ R we unfortunately have P (X = x) = 0 (height exactly 1.8?).
We define the density as follows:
Continuous density The function f is called the density of a continuous
r.v. X, iff
1. f(x) ≥ 0 for all x ∈ R
2. ∫_{−∞}^{∞} f(x) dx = 1
3. P(a ≤ X ≤ b) = ∫_a^b f(x) dx for a, b ∈ R.
The density of a continuous X is identified with its distribution.
Remember: For continuous X we have P (X = x) = 0 for every single
value x, even if f (x) > 0!
Density: notes
Here is a check-list for a function to be a density.
Discrete density: A real-valued function f is a discrete density, iff
1. f(x) ≥ 0 for all x
2. Σ_{all x} f(x) = 1
Continuous density: A real-valued function f is a continuous density, iff
1. f(x) ≥ 0 for all x
2. ∫_{−∞}^{∞} f(x) dx = 1
Notes:
We will only consider sample points for which f(x) > 0; where f is not defined, we assume f(x) = 0
Continuous X ⇔ continuous function f
We often write X ∼ f when X has density f
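The check-list can also be run numerically. A sketch in Python, where the die density is taken from the lecture and f(x) = 2x on [0, 1] is a hypothetical continuous density chosen only for this illustration:

    # Check-list in code: non-negativity and total mass 1.
    die_pmf = {k: 1 / 6 for k in range(1, 7)}
    assert all(p >= 0 for p in die_pmf.values())        # condition 1
    assert abs(sum(die_pmf.values()) - 1.0) < 1e-12     # condition 2

    def f(x):
        return 2.0 * x if 0.0 <= x <= 1.0 else 0.0      # f(x) = 0 where not defined

    # Crude midpoint-rule integral over [0, 1]; the density is 0 elsewhere.
    n = 100_000
    h = 1.0 / n
    integral = sum(f((i + 0.5) * h) * h for i in range(n))
    print(integral)                                     # close to 1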
Example continues
For coin sequence: Due to the independence, f(k) := P(Y = k) = 1/2^k, k = 1, 2, .... Is it a density? The values are positive, so we only need to check if it sums to 1:
Σ_{k=1}^{∞} f(k) = Σ_{k=1}^{∞} (1/2)^k
is a converging geometric series with limit
(1/2) / (1 − 1/2) = 1,
so f is a density for Y.
[Figure: bar plot of P(Y = k) against k = 1, ..., 26.]
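A few partial sums in Python already suggest the limit (a numerical sketch only, not part of the derivation):

    # Partial sums of (1/2)^k approach 1, illustrating the geometric-series limit.
    total = 0.0
    for k in range(1, 51):
        total += 0.5 ** k
    print(total)   # 0.999999..., i.e. the series sums to 1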
Cumulative distribution function
Another useful function:
Cumulative distribution function (CDF) Let X be a r.v. with density
f . Then the function F with
F(a) := P(X ≤ a),   a ∈ R
is called the cumulative distribution function (CDF) of X.
The connection between density and CDF is
For discrete X:   F(a) = Σ_{x ≤ a} f(x)
For continuous X:   F(a) = ∫_{−∞}^{a} f(x) dx,   f(x) = F′(x)
Note that for continuous X, P(X ≤ a) = P(X < a), as P(X = a) = 0.
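For the discrete case, the relation is just a cumulative sum. A Python sketch for the coin-sequence density f(k) = 1/2^k (variable names chosen only for this example):

    # Build the CDF F(a) = sum of f(k) over k <= a.
    ks = range(1, 11)
    f = {k: 0.5 ** k for k in ks}

    F = {}
    running = 0.0
    for k in ks:
        running += f[k]
        F[k] = running

    print(F[3])            # 0.875
    print(1 - 0.5 ** 3)    # the same value, 1 - 1/2^3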
Examples contd.
Coin toss: How many times must we toss the coin so that the probability of getting tails is ≥ 99%? I.e., for what k is P(Y ≤ k) ≥ 0.99? First,
F(a) = Σ_{k ≤ a} f(k) = Σ_{k ≤ a} 1/2^k.
It is a truncated geometric series and has the finite form F(a) = 1 − 1/2^a.
The solution is given by
F(a) ≥ 0.99  ⇔  2^a ≥ 100  ⇔  a ≥ log2(100) ≈ 6.64,
so a = 7 tosses suffice.
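A brute-force check of this value (a sketch only):

    # Smallest k with P(Y <= k) = 1 - (1/2)^k >= 0.99.
    k = 1
    while 1 - 0.5 ** k < 0.99:
        k += 1
    print(k)   # 7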
Heights of men: What is the probability that a randomly picked Swedish
man is between a and b meters tall?
P(a < X < b) = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx = F(b) − F(a).
Expectation
We are often interested in only some aspects of distributions.
Most often we want to know the average behaviour: what we would get on average if we repeated the experiment.
Example of heights: You are selling life jackets, one size only. What is
the average size of a male Swede?
Butterflies example (the coin sequence in disguise): You are a lepidopterist and want to study at least one male of a special butterfly species. Your trap catches males and females in a 50:50 ratio, but only one at a time. How many do you need to catch on average to get that one male?
Expectation: general
Note: If X is a random variable, and h is a real-valued function, h(X) is also a random variable.
E.g. for the life jackets the chest diameter could be some h(x), where x
is height.
Then we can ask for the expected value of h(X).
Expectation (general) For a random variable X with a density f , and a
function h, the expected value of h(X) is defined as
E[h(X)] = Σ_x h(x) f(x)                    (discrete)
E[h(X)] = ∫_{−∞}^{∞} h(x) f(x) dx           (continuous)
Technical condition: E|h(X)| < ∞.
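As an illustration of the discrete formula, a Python sketch; the fair die and h(x) = x² are hypothetical choices made only for this example:

    # E[h(X)] = sum over x of h(x) f(x), for a fair die and h(x) = x^2.
    f = {x: 1 / 6 for x in range(1, 7)}

    def h(x):
        return x ** 2

    E_hX = sum(h(x) * p for x, p in f.items())
    print(E_hX)   # 91/6 ≈ 15.17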
Expected value
The quantity most often asked for describes the expected outcome of X itself, i.e. h(x) = x:
Expected value For a random variable X with a density f the expected
value is defined as
E(X) = Σ_x x f(x)                    (discrete)
E(X) = ∫_{−∞}^{∞} x f(x) dx           (continuous)
Note:
Theoretical average value: Where is the distribution concentrated
”expected value” or ”mean” or ”average”
When data is involved most often ”average”
Usually denoted by µ
Useful equations: Ec = c and E(cX) = cE(X) for any constant c
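A quick numerical sanity check of the last rule, again with the fair die as a hypothetical example:

    # Check E(cX) = c E(X) for the fair die and c = 3.
    f = {x: 1 / 6 for x in range(1, 7)}
    c = 3
    EX = sum(x * p for x, p in f.items())          # 3.5
    EcX = sum(c * x * p for x, p in f.items())
    print(EcX, c * EX)                             # both 10.5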
Example of expectation
The butterflies (i.e. coins): Y is the time of the first male. We need on average
E(Y) = Σ_{k=1}^{∞} k f(k) = Σ_{k=1}^{∞} k/2^k = ... = 2
samples (proof later).
Consider the following distribution (book Table 3.6) of heartbeat rates, S = {40, 60, 68, 70, 72, 80, 100}:
S → {.01, .04, .05, .8, .05, .04, .01}
The average is µ = 70 (dashed line in the figure).
[Figure: bar plot of P(X = x) against heart beat rate x, with a dashed line at µ = 70.]
But another distribution could be
S → {.4, .05, .04, .02, .04, .05, .4}
The average is also µ = 70.
[Figure: bar plot of P(X = x) against heart beat rate x for the second distribution.]
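Both averages can be recomputed directly from the definition (a small Python sketch using the probabilities listed above):

    # Means of the two heartbeat-rate distributions above.
    S  = [40, 60, 68, 70, 72, 80, 100]
    p1 = [.01, .04, .05, .8, .05, .04, .01]
    p2 = [.4, .05, .04, .02, .04, .05, .4]

    mu1 = sum(x * p for x, p in zip(S, p1))
    mu2 = sum(x * p for x, p in zip(S, p2))
    print(mu1, mu2)   # both 70.0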
Variance
Expected value is only one descriptor of the distribution. A variability
number provides more information. The most commonly used one is
Variance Let X be a random variable with density f . Then
Var(X) := E[(X − E(X))²]
is called the variance.
Measures the expected squared deviation from the mean
Common notation: Var(X) = σ²
Large variance = large deviations from the mean are to be expected
Useful formulas: Var(c) = 0 and Var(cX) = c² Var(X) for a constant c.
Variance example
Example: For butterflies Var(Y ) = 2 (details later)
The following formula is useful for computation:
Var(X) = E(X²) − [E(X)]²
Proof: Expand the square inside the definition.
Example: For heartbeat distribution 1, E(X²) = 4926.4, so
Var₁(X) = 4926.4 − 70² = 26.4.
For heartbeat distribution 2, Var₂(X) = 5630.32 − 70² = 730.32.
Definitely not the same distribution.
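The same numbers follow from the computational formula (a small Python sketch reusing the heartbeat distributions):

    # Var(X) = E(X^2) - [E(X)]^2 for the two heartbeat distributions.
    S  = [40, 60, 68, 70, 72, 80, 100]
    p1 = [.01, .04, .05, .8, .05, .04, .01]
    p2 = [.4, .05, .04, .02, .04, .05, .4]

    def var(values, probs):
        mu  = sum(x * q for x, q in zip(values, probs))
        EX2 = sum(x * x * q for x, q in zip(values, probs))
        return EX2 - mu ** 2

    print(var(S, p1))   # ≈ 26.4
    print(var(S, p2))   # ≈ 730.32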
Standard deviation
Variance is a squared difference, so it does not share the units with the
random variable.
More common in practice is to talk about
Standard deviation (sd) For a r.v. X with variance σ² the quantity
σ = √Var(X) = √(σ²)
is called the standard deviation.
Example:
heartbeats 1: standard deviation σ₁ = 5.14 beats per minute
heartbeats 2: standard deviation σ₂ = 27.02 beats per minute
Rule of thumb: roughly 95% of the time the outcome falls within two σ of µ.
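A small Python sketch computing the standard deviations and checking the rule of thumb on heartbeat distribution 1:

    import math

    # Standard deviations, and the probability mass within mu ± 2*sd for distribution 1.
    S  = [40, 60, 68, 70, 72, 80, 100]
    p1 = [.01, .04, .05, .8, .05, .04, .01]
    mu, var1, var2 = 70.0, 26.4, 730.32
    sd1, sd2 = math.sqrt(var1), math.sqrt(var2)     # ≈ 5.14 and ≈ 27.02
    within = sum(q for x, q in zip(S, p1) if abs(x - mu) <= 2 * sd1)
    print(sd1, sd2, within)                         # mass within two sd of the mean: 0.98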
Summary
Many things today:
Random variable: a model for events
Distribution: probabilities of random variables
Density and cumulative distribution function: compact expressions
for a distribution
Expectation/mean: Theoretical average number
Variance/standard deviation: scale of variation around mean
[Figure: the bar plot from the first slide, P(outcome is x) against the outcome x = 1, ..., 25.]
The figure on the first slide: density of a random variable with a Poisson distribution, mean 9 and sd 3.
Next time we look at some of the commonly used families of distributions.