Probability Theory and Statistics
28.10.1998
Copyright Teemu Kerola 1998
• ref: Law & Kelton 1991, Chapter 4
• Topics: Random Numbers, Distributions, Samples, Confidence Intervals

Random Numbers
• Random number (satunnaisluku)
• Sample space (otosavaruus)
  – discrete, e.g. S = {1, 2, 3, 4, 5, 6}
  – continuous, e.g. S = [0, 1] or S = [0, 1)
• Random variable (satunnaismuuttuja)
  – one way to generate random numbers
  – determined by a function or rule: R -> S
  – random variables: X, Y = 5X, Z = g(X, Y)
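The "function or rule: R -> S" idea can be sketched in Python. The helper name `die` and the mapping below are illustrative assumptions, not from the slides: a uniform random number u in [0, 1) is mapped to a die face in S = {1, ..., 6}.

```python
import random

def die(u: float) -> int:
    """Rule R -> S: map u in [0, 1) to a die face in {1, ..., 6}."""
    return int(6 * u) + 1

def Y(u: float) -> int:
    """Derived random variable, as on the slide: Y = 5X."""
    return 5 * die(u)

random.seed(1)                                   # hypothetical seed
rolls = [die(random.random()) for _ in range(10)]
assert all(1 <= r <= 6 for r in rolls)
assert die(0.0) == 1 and die(0.999) == 6
```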
Random Variable X
• Distribution function (kertymäfunktio): F(x) = P(X ≤ x)
• Density function (tiheysfunktio): f(x)
• Both F(x) and f(x) determine the (same) distribution
• X discrete?
  – f(x) = P(X = x) = P(x); e.g. dice: P(X = 6) = 1/6
  – P(a < X ≤ b) = ∑ f(x) over a < x ≤ b = F(b) − F(a)
• X continuous?
  – P(a < X ≤ b) = ∫ f(x) dx over [a, b] = F(b) − F(a)
  – e.g. temperature: P(20 ≤ X ≤ 24) = 80%

Discrete distribution X = Random[1,6]
[Figure: density f(x) = 1/6 for x = 1, ..., 6 (sum = 1); distribution F(x) rising in steps of 1/6 from 0 to 1]
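The discrete formulas above can be checked with a short sketch, assuming a fair die (Python, exact arithmetic via `fractions`):

```python
from fractions import Fraction

# Discrete density for a fair die, X = Random[1,6]: f(x) = P(X = x)
f = {x: Fraction(1, 6) for x in range(1, 7)}

def F(x: int) -> Fraction:
    """Distribution function F(x) = P(X <= x)."""
    return sum(p for k, p in f.items() if k <= x)

assert f[6] == Fraction(1, 6)          # P(X = 6) = 1/6
assert sum(f.values()) == 1            # densities sum to 1
# P(a < X <= b) = sum of f(x) over a < x <= b = F(b) - F(a)
a, b = 2, 5
assert sum(f[x] for x in f if a < x <= b) == F(b) - F(a) == Fraction(1, 2)
```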
Continuous distribution X = Random[1,6]
[Figure: density f(x) = 1/5 on [1, 6] (area = 1); distribution F(x) rising linearly from 0 at x = 1 to 1 at x = 6]

Continuous distribution X = Random[0,1)
[Figure: density f(x) = 1 on [0, 1) (area = 1); distribution F(x) rising linearly from 0 at x = 0 to 1 at x = 1]
Continuous distribution X = Expon(a)
• memoryless property (unohtavaisuusominaisuus)
[Figure: density f(x) starting at 1/a at x = 0 and decaying exponentially (area = 1); probability slices P(X ∈ [x, x+∆x]) and P(X ∈ [x', x'+∆x]) illustrate the memoryless property]
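A minimal numeric check of the memoryless property, assuming (as the 1/a intercept in the figure suggests) that Expon(a) has density f(x) = (1/a)e^(−x/a) with mean a:

```python
import math

a = 2.0                                # hypothetical mean of Expon(a)

def S(x: float) -> float:
    """Survival function P(X > x) = e^(-x/a)."""
    return math.exp(-x / a)

def F(x: float) -> float:
    """Distribution function F(x) = 1 - e^(-x/a)."""
    return 1.0 - S(x)

# Memoryless property: P(X > s + t | X > s) = P(X > t)
s, t = 1.3, 0.7
assert abs(S(s + t) / S(s) - S(t)) < 1e-12

# Median (50% percentile): F(X0.5) = 0.5  =>  X0.5 = a ln 2
assert abs(F(a * math.log(2)) - 0.5) < 1e-12
```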
Variance and Deviation
• mean E(X) = EX = µ = ∫ x f(x) dx
  – describes where the values are located
• variance (varianssi) Var(X) = σ² = E[(X − µ)²] = E(X²) − µ²
  – describes how close to the average the values are
• standard deviation (hajonta) σ = √σ²
[Figure: densities with the same mean µ but very small, small, and large variance]
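The two variance formulas above agree; a sketch for a fair die (exact arithmetic, Python):

```python
from fractions import Fraction

# Fair die: P(x) = 1/6 for x = 1..6
P = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in P.items())              # EX = sum of x P(x)
EX2 = sum(x * x * p for x, p in P.items())         # E(X^2)
var = EX2 - mu ** 2                                # Var(X) = E(X^2) - mu^2

assert mu == Fraction(7, 2)                        # EX = 3.5
assert var == Fraction(35, 12)                     # Var(X) = 35/12
# same result via the defining form E[(X - mu)^2]
assert var == sum((x - mu) ** 2 * p for x, p in P.items())
```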
Covariance (kovarianssi)
• X and Y are random variables
  – Does X depend on Y, or vice versa, or what?
• Covariance Cov(X,Y) = CXY = E[(X − µX)(Y − µY)] = E(XY) − µX µY
• CXY = CYX
• CXX = Cov(X,X) = Var(X) = σ²
  – CXY = 0: X, Y uncorrelated (X large => Y ???)
  – CXY > 0: X, Y positively correlated (X large => Y large)
  – CXY < 0: X, Y negatively correlated (X large => Y small)
Standard Normal Distribution
• Normal distribution Norm(µ, σ²)
• Standard Normal Distribution = Z-distribution = Norm(0, 1) = N(0, 1)
• Z-percentiles: Z0.95 = 1.645 = Z1−α/2 for α = 10%
[Figure: N(0, 1) density; 60% of probability mass within (−0.84, 0.84), 90% within (−1.65, 1.65)]
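The tabulated Z-percentiles can be reproduced with Python's stdlib `statistics.NormalDist`, a modern stand-in for the printed tables the slides refer to:

```python
from statistics import NormalDist

Z = NormalDist(0, 1)                       # standard normal N(0, 1)

z95 = Z.inv_cdf(0.95)                      # Z0.95
assert abs(z95 - 1.645) < 0.001            # Z0.95 = 1.645 (= Z1-alpha/2, alpha = 10%)

# probability mass between symmetric percentiles, as in the figure
mass90 = Z.cdf(1.645) - Z.cdf(-1.645)      # ~90% of probability mass
mass60 = Z.cdf(0.84) - Z.cdf(-0.84)        # ~60% of probability mass
assert round(mass90, 2) == 0.90
assert round(mass60, 2) == 0.60
```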
Properties for Random Var. X
• median (mediaani) X0.5: F(X0.5) = 0.5
• mean, expected value (keskiarvo) EX = E(X) = µ
  – X discrete: EX = ∑ x P(x)
  – X continuous: EX = ∫ x f(x) dx
• percentiles (fraktiili)
  – 5% percentile = X0.05, 50% percentile = X0.5, 95% percentile = X0.95
[Figure: density f(x) with 5% of probability mass in each tail; distribution F(x) marking X0.05, X0.5, and X0.95]

Correlation (korrelaatio)
• Problem: CXY varies much in magnitude
  – cannot deduce much from its magnitude alone
• Solution: scale with the (geometric average of the) variances
• Correlation Cor(X,Y) = ρXY = CXY / √(σX² σY²) ∈ [−1, 1]
  – ρXY = 0: X, Y uncorrelated
  – ρXY > 0: X, Y positively correlated
  – ρXY < 0: X, Y negatively correlated
  – ρXY = 0.98: X, Y highly positively correlated
• Correlation does not mean causal dependence
  – Fig. on homework vs. exam results
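A sketch of the covariance and correlation formulas on small hypothetical data, using population-style averages rather than the sample estimators introduced later:

```python
import math

def cov(xs, ys):
    """Cov(X,Y) = E(XY) - muX muY, with expectations as plain averages."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - mx * my

def cor(xs, ys):
    """Cor(X,Y) = Cov(X,Y) / sqrt(Var(X) Var(Y)), always in [-1, 1]."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

X = [1.0, 2.0, 3.0, 4.0]
Y = [5 * x for x in X]                 # Y = 5X: X large => Y large
assert cov(X, Y) > 0                   # positively correlated
assert abs(cor(X, Y) - 1.0) < 1e-9     # perfectly (linearly) correlated
# Cxx = Cov(X,X) = Var(X)
assert abs(cov(X, X) - (sum(x * x for x in X) / 4 - 2.5 ** 2)) < 1e-9
```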
Independent X and Y? (riippumattomuus)
• Random variables X, Y
• Joint probability density function (yhteisjakauman tiheysfunktio) f(x,y):
  P(X ∈ A, Y ∈ B) = ∫A ∫B f(x,y) dx dy
• X and Y are independent if
  f(x,y) = fX(x) · fY(y)
  where fX(x) = ∫ f(x,y) dy and fY(y) = ∫ f(x,y) dx (over (−∞, ∞))
  are the marginal probability density functions

Set of Random Variables
• Many random variables Xi, i ∈ [1, N] or i ∈ [1, ∞]
• E(Xi) = EXi = µXi = µi
• Var(Xi) = σXi² = σi²
• Cov(Xi, Xj) = CXi,Xj = Ci,j = Cij
• Cor(Xi, Xj) = ρXi,Xj = ρi,j = ρij
  – all comparisons pair-wise!

IID Set of Random Variables
• Random variables X1, X2, ...
• Xi's identically distributed?
  – EXi = µi = µ        ∀i = 1, 2, ...
  – Var(Xi) = σi² = σ²  ∀i = 1, 2, ...
  – fXi(x) = f(x)       ∀i = 1, 2, ...
• Xi's independent?
  – Xi and Xj are independent ∀ i ≠ j ∈ {1, 2, ...}
• Now, the Xi's are independent and identically distributed (IID)
  – almost never the case in real life
  – almost always needed for the math to work out

Sample
• Problem: What is the distribution of X?
  – What are EX = µ and Var(X) = σ²?
• Solution: try it out and see
• Sample: one set of values for X
  – throw a die 100 times? 10000 times?
  – measure response time from 100 trials?
  – write down the sales sum for the next 10 years?
• New question: what can we tell about X from the sample?
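"Try it out and see", as a sketch: the seed and sample size below are hypothetical choices, but a large sample of die throws should land close to EX = 3.5.

```python
import random

random.seed(42)                          # hypothetical seed, for reproducibility
sample = [random.randint(1, 6) for _ in range(10_000)]   # 10000 die throws

mean = sum(sample) / len(sample)
# For a fair die EX = 3.5; the mean of a large IID sample lands close to it.
assert abs(mean - 3.5) < 0.1
assert set(sample) == {1, 2, 3, 4, 5, 6}
```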
Sample Properties
• Sample set (otosjoukko) S = { Xi | i = 1, ..., n }
• Sample points (otospisteet) Xi
  – should be a set of IID random variables
  – Xi ~ X, i.e., Xi has the same (unknown) distribution as X:
    EXi = EX = µ and Var(Xi) = Var(X) = σ²
• Sample mean (otoskeskiarvo)
  X(n) = ∑i Xi / n
  – X(n) is an unbiased estimator (harhaton estimaatti) for EX = µ: E X(n) = µ
  – with large n, the sample mean is close to µ
  – how close? how large an n?

Sample Properties (cont.)
• Variance for the sample mean: Var(X(n)) = σ² / n
  – problem: σ² = Var(X) is unknown
  – solution: use the sample variance (otosvarianssi) s²(n) instead:
    s²(n) = ∑i [Xi − X(n)]² / (n − 1) = ( ∑i Xi² − n X²(n) ) / (n − 1)
  – now
    s²(n) / n = ∑i [Xi − X(n)]² / ( n(n − 1) )
    is an unbiased estimator for Var(X(n))
  – we now know how good an estimator the sample mean is
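Both forms of s²(n) given above agree; a sketch on hypothetical data, cross-checked against Python's stdlib `statistics` module, which uses the same n − 1 divisor:

```python
from statistics import mean, variance

X = [1.0, 2.0, 3.0, 4.0, 5.0]                            # hypothetical sample
n = len(X)

xbar = sum(X) / n                                        # sample mean X(n)
s2_a = sum((x - xbar) ** 2 for x in X) / (n - 1)         # deviation form
s2_b = (sum(x * x for x in X) - n * xbar ** 2) / (n - 1) # computational form

assert xbar == mean(X) == 3.0
assert abs(s2_a - s2_b) < 1e-12                          # the two forms agree
assert s2_a == variance(X) == 2.5                        # stdlib: same estimator
# estimated variance of the sample mean: s^2(n) / n
assert s2_a / n == 0.5
```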
Central Limit Theorem (CLT)
(keskeinen raja-arvolause)
• "When n becomes large, the normalized sample mean becomes normally distributed":
  ( X(n) − µ ) / √(σ²/n)  →  N(0, 1)  as n → ∞

How to Use the Central Limit Theorem?
• Use the CLT for percentiles:
  P( −z1−α/2 ≤ ( X(n) − µ ) / √(σ²/n) ≤ z1−α/2 ) = 1 − α
• Use the unbiased estimator s²(n) for σ²:
  P( −z1−α/2 ≤ ( X(n) − µ ) / √(s²(n)/n) ≤ z1−α/2 ) ≈ 1 − α
• Problem: what if n is not very large (n ≤ 30)?
• Solution: use the Student distribution instead:
  ( X(n) − µ ) / √(s²(n)/n) ≈ T(n − 1)
  where n − 1 is the degrees of freedom (vapausaste) for T
• Percentiles for N(0, 1) and T(k) are tabulated

Confidence Interval (CI)
(luottamusväli)
• Solve for µ to get a confidence interval for µ
• (1 − α) confidence interval:
  P( µ ∈ X(n) ± z1−α/2 √(s²(n)/n) ) = 1 − α        (n > 30)
  or
  P( µ ∈ X(n) ± tn−1, 1−α/2 √(s²(n)/n) ) = 1 − α   (n ≤ 30)
• 90% confidence interval, α = 10%:
  P( µ ∈ X(n) ± z0.95 √(s²(n)/n) ) = 90%           (n > 30)
  P( µ ∈ X(n) ± tn−1, 0.95 √(s²(n)/n) ) = 90%      (n ≤ 30)

Confidence Interval
[Figure: distribution for the normalized µ based on the sample; 90% confidence interval (±1.65) and 60% confidence interval (±0.84)]
• A 60% CI is narrower, but more likely to be wrong, i.e.,
  40% chance that it does not contain the true mean

Effect of Sample Size, Strong Law of Large Numbers
• distribution N(µ, s²) for µ (the true, unknown mean) based on the sample
  – large sample: s² small
  – small sample: s² large
[Figure: wide density for a small sample, narrow density for a large sample]
• Strong Law of Large Numbers (suurten lukujen laki):
  X(n) → µ with probability 1, as n → ∞

Example A
• Question: X is normally distributed. What is the mean µ of X?
• Answer:
  – take a sample of 10 observations of X:
    1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09
  – compute X(10) and s²(10): 1.34 and 0.17
  – get the 95% percentile of the T-distribution with 9 degrees of freedom:
    t9, 0.95 = 1.83
  – compute the 90% confidence interval for µ:
    X(10) ± t9, 0.95 √(s²(10)/10) = (1.34 − 0.24, 1.34 + 0.24) = (1.10, 1.58)
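Example A can be reproduced as follows. The Python stdlib has no t-distribution, so the tabulated value t9,0.95 = 1.83 from the slides is hard-coded:

```python
from math import sqrt
from statistics import mean, variance

# The 10 observations from Example A
X = [1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09]
n = len(X)

xbar = mean(X)                       # X(10)
s2 = variance(X)                     # s^2(10), n-1 divisor
assert round(xbar, 2) == 1.34
assert round(s2, 2) == 0.17

t9_95 = 1.83                         # tabulated t-percentile, 9 deg. of freedom
half = t9_95 * sqrt(s2 / n)          # half-width of the 90% CI
assert round(half, 2) == 0.24

lo, hi = xbar - half, xbar + half    # 90% confidence interval for mu
assert 1.10 < lo < 1.11 and 1.57 < hi < 1.59
```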
Example B
• Question: I know the distribution. Is the mean µ = 5? ("Two-tailed hypothesis" H0)
• Answer:
  – take a sample of 10, compute the test factor (testisuure) t9:
    t9 = ( X(10) − 5 ) / √(s²(10)/10)
  – get the 90% confidence interval from the T-distribution with 9 degrees of freedom:
    (−t9, 0.95 , t9, 0.95) = (−1.83, 1.83)
  – if the test factor t9 is not in there, reject H0, i.e., say "no"
    • 10% chance of being wrong! (why?)
  – if t9 is in there, accept H0, i.e., say "yes"

Example C
• Question: do X and Y have the same mean?
• Answer:
  – take samples of X and Y, with sample means X(n) and Y(m)
  – compute the test factor tn+m−2:
    tn+m−2 = ( X(n) − Y(m) ) / ( s √(1/n + 1/m) )
    where s² = ( ∑i (Xi − X)² + ∑i (Yi − Y)² ) / (n + m − 2)
  – get the 90% confidence interval from the T-distribution
  – if tn+m−2 is not in there, reject H0, i.e., say "no"
    • 10% chance of being wrong
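A sketch of Example C's pooled test factor on hypothetical data; the percentile t7,0.95 ≈ 1.89 is a tabulated value, not computed here:

```python
from math import sqrt

# Hypothetical samples of X and Y
X = [5.1, 4.9, 5.3, 5.0]
Y = [4.8, 5.2, 5.0, 4.9, 5.1]
n, m = len(X), len(Y)

xbar = sum(X) / n
ybar = sum(Y) / m
# pooled sample variance, n + m - 2 degrees of freedom
s2 = (sum((x - xbar) ** 2 for x in X)
      + sum((y - ybar) ** 2 for y in Y)) / (n + m - 2)
t = (xbar - ybar) / (sqrt(s2) * sqrt(1 / n + 1 / m))

assert round(t, 2) == 0.68
# tabulated t7,0.95 ~ 1.89; |t| inside (-1.89, 1.89) => accept H0 ("same mean")
assert abs(t) < 1.89
```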
Usual Assumptions That Usually Do Not Apply
• Normal distribution
  – central limit theorem
  – strong law of large numbers
• Independent samples
  – X1, X2, ...
• The assumptions required for probability theory and statistics to work do not usually apply
• Results may still be usable
• Watch out, and always check how far you are from the required assumptions