Calculating the entropy on-the-fly
Daniel Lewandowski
Faculty of Information Technology and Systems,
TU Delft
Introducing a function h
h is a measure of the uncertainty about the outcome of an experiment modelled using probability distributions.
Assumptions
We assume that h:
•depends only on the probability of the outcome of an experiment or event,
•takes values in the non-negative real numbers,
•is a continuous and decreasing function,
•satisfies h(p1·p2) = h(p1) + h(p2).
These assumptions force h to be of the form:
h(p) = -C log(p), where C is a positive constant.
Definition of entropy
The entropy H is the expectation of the
function h.
Example:
x_1, x_2, …, x_n are realizations of a random variable X with probabilities p_1, p_2, …, p_n respectively. Then the entropy of X is:
H(X) = -C Σ_i p_i log(p_i)
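As an illustration (not part of the original slides), a minimal Python sketch of this definition with C = 1; the function name and the choice of logarithm base are ours:

```python
import math

def discrete_entropy(probs, base=math.e):
    """H(X) = -sum_i p_i * log(p_i), with C = 1.

    The base of the logarithm fixes the unit (e -> nats, 2 -> bits)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries 1 bit, i.e. ln(2) ~ 0.693 nats, of uncertainty.
print(discrete_entropy([0.5, 0.5], base=2))  # 1.0
print(discrete_entropy([0.5, 0.5]))          # 0.6931...
```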
Units in which the entropy is measured
log2(x) – bits
log3(x) – trits
ln(x) – nats
log10(x) – Hartleys
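To convert between these units, divide by the natural logarithm of the target base: for example, the 1,4189 nats of the standard normal distribution on the next slide correspond to 1,4189 / ln 2 ≈ 2,047 bits.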
Entropy of some continuous distributions
•The standard normal (Gaussian) distribution: H = 1,4189
•The Weibull distribution (scale = 1,127; shape = 2,5): H = 0,5496
•The Weibull distribution (scale = 1,107; shape = 1,5): H = 0,8892
•The gamma distribution (shape = rate = 5): H = 0,5441
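These values can be reproduced from standard closed-form results. A small check (not from the slides) using SciPy's differential entropy in nats, assuming the parameterizations given above (Weibull scale/shape, gamma with shape = rate = 5), which match the quoted values to about three decimal places:

```python
from scipy import stats

# Differential entropies in nats; the shape parameter is passed positionally.
print(stats.norm().entropy())                          # ~1.4189
print(stats.weibull_min(2.5, scale=1.127).entropy())   # ~0.5496
print(stats.weibull_min(1.5, scale=1.107).entropy())   # ~0.889
print(stats.gamma(5, scale=1/5).entropy())             # ~0.5441
```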
Approximation of the density
Y_1, Y_2, …, Y_n – samples, sorted in increasing order
D_0, D_1, …, D_n – midpoints:
D_0 = Y_1 - (Y_2 - Y_1)/2
D_i = Y_{i+1} - (Y_{i+1} - Y_i)/2, for i = 1, …, n-1
D_n = Y_n + (Y_n - Y_{n-1})/2
Computations
The density above Y_i is estimated as:
P_i = 1 / (N (D_i - D_{i-1}))
The entropy is then computed as:
H = -(1/N) Σ_{i=1}^{N} ln( 1 / (N (D_i - D_{i-1})) )
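A minimal Python sketch of this estimator (our own illustration, not the author's VBA program); the function name is ours, and the samples are sorted inside the function since the midpoint construction assumes ordered data:

```python
import numpy as np

def spacing_entropy(samples):
    """Entropy estimate H = -(1/N) * sum_i ln(1 / (N * (D_i - D_{i-1}))),
    with the midpoints D_0, ..., D_N defined as on the previous slide."""
    y = np.sort(np.asarray(samples, dtype=float))
    n = len(y)
    d = np.empty(n + 1)
    d[1:n] = (y[:-1] + y[1:]) / 2.0        # interior midpoints D_1 .. D_{n-1}
    d[0] = y[0] - (y[1] - y[0]) / 2.0      # D_0, extrapolated to the left
    d[n] = y[-1] + (y[-1] - y[-2]) / 2.0   # D_n, extrapolated to the right
    widths = np.diff(d)                    # D_i - D_{i-1}, one cell per sample
    return float(np.mean(np.log(n * widths)))

# Example (not from the slides): 5000 standard normal samples give an
# ungrouped estimate of roughly 1,15 instead of the exact 1,4189.
rng = np.random.default_rng(0)
print(spacing_entropy(rng.standard_normal(5000)))
```

Run on samples from the distributions above, this should reproduce, approximately, the ungrouped (biased) values reported later in the slides.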
Grouping samples
Remark: The result of calculating the entropy without grouping samples is biased – the bias is asymptotically equal to γ - 1 + ln 2 (γ – Euler's constant).
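As a small illustration (our reading of the remark, with the sign convention taken from the numerical results later in the slides: the ungrouped estimate underestimates the true entropy by roughly γ - 1 + ln 2 ≈ 0,27 nats), the correction amounts to adding this constant back:

```python
import math

EULER_GAMMA = 0.5772156649015329           # Euler's constant
BIAS = EULER_GAMMA - 1.0 + math.log(2.0)   # ~0.2704 nats

# Check against the slides: the ungrouped 5000-sample estimate 1,143918 for
# the standard normal plus the bias gives ~1,414, close to the exact 1,4189.
print(1.143918 + BIAS)
```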
[Figure: four panels, 1000 samples each, showing the entropy estimate over 20 iterations, ungrouped and grouped (by 5, 25, 50, 100), for the standard normal (theoretical value 1,4189), the Weibull with shape 2,5 (0,5496), the Weibull with shape 1,5 (0,8892) and the gamma distribution (0,5441); y-axis: entropy, x-axis: iteration nr.]
Numerical test – 5000 samples
Entropy estimates: 1,324; 1,418; 1,438; 1,458; 1,399; 1,372; 1,159 (exact: 1,418).
[Figure: the red line marks the exact density function of a standard normal variable.]
Results – 20 iterations (1000 samples)
                          standard normal   Weibull (shape 2,5)   Weibull (shape 1,5)   gamma (shape = rate = 5)
by 1's     mean           1,149             0,285                 0,613                 0,274
           deviation      0,025             0,0181                0,018                 0,0266
by 25's    mean           1,424             0,546                 0,881                 0,546
           deviation      0,014             0,0222                0,0209                0,0243
by 50's    mean           1,459             0,568                 0,904                 0,575
           deviation      0,033             0,0266                0,0267                0,029
by 100's   mean           1,506             0,616                 0,943                 0,628
           deviation      0,039             0,028                 0,0332                0,034
Compare results to exact solutions from slide 6
Updating the distribution
[Diagram: before updating, the samples and midpoints are ordered … D_k, Y_k, D_{k+1}, Y_{k+1}, D_{k+2} …; after updating, the new sample Y_{N+1} falls between Y_k and Y_{k+1} and the midpoint D_{k+1} is replaced by the two new midpoints D_{N+1} and D_{N+2}: … D_k, Y_k, D_{N+1}, Y_{N+1}, D_{N+2}, Y_{k+1}, D_{k+2} …]
Updating the entropy
H_N – the entropy calculated based on N samples. Then:
H_{N+1} = [ N (H_N - ln N) - A + B ] / (N + 1) + ln(N + 1)
where:
A = ln(D_{k+1} - D_k) + ln(D_{k+2} - D_{k+1})
B = ln(D_{N+1} - D_k) + ln(D_{N+2} - D_{N+1}) + ln(D_{k+2} - D_{N+2})
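A minimal Python sketch of this update step (our own illustration, not the author's VBA implementation); it handles only the interior case shown in the diagram, where the new sample falls strictly between two existing samples, and all names are ours:

```python
import bisect
import math

def update_entropy(h_n, y, d, x):
    """One on-the-fly update step (sketch).

    h_n : current entropy estimate based on the N sorted samples in y
    y   : sorted list of the N samples
    d   : list of the N+1 midpoints; the cell of sample y[i] is (d[i], d[i+1])
    x   : new sample, assumed to fall strictly between two existing samples

    Returns H_{N+1} and updates y and d in place.
    """
    n = len(y)
    k = bisect.bisect_left(y, x)        # x is inserted between y[k-1] and y[k]
    if k == 0 or k == n:
        raise NotImplementedError("boundary case not shown on the slide")

    left, right = y[k - 1], y[k]
    m1 = left + (x - left) / 2.0        # new midpoint between y[k-1] and x
    m2 = x + (right - x) / 2.0          # new midpoint between x and y[k]

    # A: log-widths of the two cells destroyed by the insertion
    a = math.log(d[k] - d[k - 1]) + math.log(d[k + 1] - d[k])
    # B: log-widths of the three cells that replace them
    b = math.log(m1 - d[k - 1]) + math.log(m2 - m1) + math.log(d[k + 1] - m2)

    h_next = (n * (h_n - math.log(n)) - a + b) / (n + 1) + math.log(n + 1)

    y.insert(k, x)
    d[k:k + 1] = [m1, m2]               # replace the old midpoint with the new two
    return h_next
```

In practice one would compute a first estimate with the batch formula from the "Computations" slide on a handful of samples (the program starts updating from N = 4) and then call update_entropy once per incoming sample.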
The program - properties
•Uses the approach from the previous slide
•Starts updating the entropy from N = 4
•Written in VBA; uses the spreadsheet only to store the samples
•Results are exactly the same as those computed in Matlab (for the same samples)
•Does not group samples
Results
Comparison of results obtained with the formula and with the program
(5000 samples – without grouping and without adding the bias correction).

                           formula     program
standard normal            1,143918    1,143918
Weibull (shape 2,5)        0,290707    0,290707
Weibull (shape 1,5)        0,634027    0,634027
gamma (shape = rate = 5)   0,280102    0,280102

The program updates the entropy H_N starting from N = 4.
Results, cont.
                           exact solution   program (1000 samples)   program (5000 samples)
standard normal            1,4189           1,4272                   1,4209
Weibull (shape 2,5)        0,5496           0,5482                   0,5709
Weibull (shape 1,5)        0,8892           0,8543                   0,8849
gamma (shape = rate = 5)   0,5441           0,5206                   0,5544
Relative information – theoretical value = 2,0345
I(X|Y), 1000 samples, X=N(0,1), Y=N(5,3)
[Figure: relative information estimates over 20 iterations; series: ungrouped, grouped by 2's, grouped by 5's, grouped by 10's; y-axis: relative information (roughly 1,80–2,40), x-axis: iteration nr.]