Aspects of Bayesian Inference and
Statistical Disclosure Control in Python

Duncan Smith
Confidentiality and Privacy Group
CCSR
University of Manchester
Introduction

* Bayesian Belief Networks (BBNs)
  - probabilistic inference

* Statistical Disclosure Control (SDC)
  - deterministic inference (attribution)
Bayesian Belief Networks

* Decision-making in complex domains
* Hard and soft evidence
* Correlated variables
* Many variables
Bayes’ Rule

P(A|B) = P(B|A) P(A) / P(B)

A prior belief and evidence combined to give a posterior belief
Venn Diagram
[Venn diagram of events A and B: regions 'A only', 'B only', 'Both A & B', and 'Neither A nor B']
Inference
1. Prior probability table P(A)

          a     ā
         0.7   0.3

2. Conditional probability table P(B|A)

                a     ā
          b    3/7   2/3
          b̄    4/7   1/3
         sum    1     1

3. Produce joint probability table by multiplication

                a     ā    sum
          b    0.3   0.2   0.5
          b̄    0.4   0.1   0.5
         sum   0.7   0.3    1

4. Condition on the evidence B = b

                a     ā    sum
          b    0.3   0.2   0.5

5. Normalise table probabilities to sum to 1

                a     ā
          b    0.6   0.4
def Bayes(prior, conditional, obs_level):
    """Simple Bayes for two categorical variables. 'prior' is a
    Python list. 'conditional' is a list of lists ('row' variable
    conditional on 'column' variable). 'obs_level' is the index of
    the observed level of the row variable."""
    levels = len(prior)
    # condition on the observed level
    result = conditional[obs_level]
    # multiply values by prior probabilities
    result = [result[i] * prior[i] for i in range(levels)]
    # get the marginal probability of the observed level
    marg_prob = sum(result)
    # normalise the current values to sum to 1
    posterior = [value / marg_prob for value in result]
    return posterior
Note: conditioning can be carried out before calculating the joint
probabilities, reducing the cost of inference
>>> A = [0.7, 0.3]
>>> B_given_A = [[3.0/7, 2.0/3], [4.0/7, 1.0/3]]
>>> Bayes(A, B_given_A, 0)
[0.59999999999999998, 0.39999999999999997]
>>>

* The posterior distribution can be used as a new prior and combined
  with evidence from further observed variables, as sketched below

* Although computationally efficient, this 'naïve' approach implies
  assumptions that can lead to problems
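For instance, the Bayes function above can be chained, reusing each
posterior as the next prior. A minimal sketch, assuming a second,
hypothetical table C_given_A for a variable C also conditional on A:

>>> C_given_A = [[0.5, 0.2], [0.5, 0.8]]   # hypothetical P(C|A), indexed [c][a]
>>> posterior_A = Bayes(A, B_given_A, 0)   # [0.6, 0.4], as above
>>> Bayes(posterior_A, C_given_A, 1)       # evidence C at level 1; ~[0.484, 0.516]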
Naive Bayes

P(A,B,C) = P(C|A) P(B|A) P(A)

[Graph: arcs from A to B and from A to C]

A 'correct' factorisation

P(A,B,C) = P(C|A,B) P(B|A) P(A)

[Graph: arcs from A to B, from A to C, and from B to C]
Conditional independence

* The Naive Bayes example assumes:

  P(C|A,B) = P(C|A)

* But if valid, the calculation is easier and fewer probabilities
  need to be specified

The conditional independence implies that if A is observed, then
evidence on B is irrelevant in calculating the posterior of C
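A minimal sketch of this point, with hypothetical numbers for P(C|A):
under the naive factorisation the posterior of C given A = a is the
same whichever level of B is observed.

P_A = [0.7, 0.3]
P_B_given_A = [[3.0/7, 2.0/3], [4.0/7, 1.0/3]]   # indexed [b][a], as before
P_C_given_A = [[0.5, 0.2], [0.5, 0.8]]           # indexed [c][a], hypothetical

# joint P(a, b, c) under the naive factorisation
joint = [[[P_C_given_A[c][a] * P_B_given_A[b][a] * P_A[a]
           for c in range(2)] for b in range(2)] for a in range(2)]

# once A is observed, conditioning on B leaves the posterior of C
# unchanged: P(C|a, b) == P(C|a) for every level b
for a in range(2):
    for b in range(2):
        total = sum(joint[a][b])
        for c in range(2):
            assert abs(joint[a][b][c] / total - P_C_given_A[c][a]) < 1e-12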
A Bayesian Belief Network

[Graph: arcs R -> W, R -> H, S -> H]

R and S are independent until H is observed
A Markov Graph

[Graph: undirected edges R-W, R-H, S-H, and R-S]

The conditional independence structure is found by marrying parents
with common children
Factoring

* The following factorisation is implied

  P(H,R,S,W) = P(W|R) P(R) P(S) P(H|R,S)

* So P(S) can be calculated as follows (although there is little
  point, yet)

  P(S) = P(S) Σ_R P(R) Σ_H P(H|R,S) Σ_W P(W|R)

* If H and W are observed to be in states h and w, then the
  posterior of S can be expressed as follows (where epsilon denotes
  'the evidence')

  P(S|ε) ∝ P(S) Σ_R P(R) P(H=h|R,S) P(W=w|R)
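A minimal sketch of this last expression in plain Python, with
hypothetical numbers for the four tables (level 1 meaning 'true'):

P_R = [0.8, 0.2]                     # P(R)
P_S = [0.9, 0.1]                     # P(S)
P_W_given_R = [[0.8, 0.0],           # P(W=w | R=r), indexed [w][r]
               [0.2, 1.0]]
P_H_given_RS = [[[0.9, 0.1],         # P(H=h | R=r, S=s), indexed [h][r][s]
                 [0.0, 0.0]],
                [[0.1, 0.9],
                 [1.0, 1.0]]]

h, w = 1, 1                          # the observed evidence
unnorm = [P_S[s] * sum(P_R[r] * P_H_given_RS[h][r][s] * P_W_given_R[w][r]
                       for r in range(2))
          for s in range(2)]
posterior_S = [x / sum(unnorm) for x in unnorm]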
Graph Triangulation

[Three panels showing a five-node graph (A, B, C, D, E) being
triangulated by successively adding fill-in edges]
Belief Propagation

[Diagram: cliques (R,H,S) and (W,R) linked by separator R]
Message passing in a Clique Tree

[Diagram: nodes S, (R,S), (R,H,S) and (W,R)]
Message passing in a Directed Junction Tree
A Typical BBN

[Figure: a larger example belief network]
Belief Network Summary

* Inference requires a decomposable graph

* Efficient inference requires a good decomposition

* Inference involves evidence instantiation, table combination and
  variable marginalisation
Statistical Disclosure Control

* Releases of small area population (census) data

* Attribution occurs when a data intruder can make inferences (with
  probability 1) about a member of the population
                      Department
Profession         A     B     C   Row sum
Lawyer            18     4     2     24
Accountant         2     3     0      5
Col sum           20     7     2     29
Negative Attribution - An individual who is an accountant does not
work for Department C

Positive Attribution - An individual who works in Department C is a
lawyer

* Release of the full table is not safe from an attribute disclosure
  perspective (it contains a zero)

* Each of the two marginal tables is safe (neither contains a zero)

* Is the release of the two marginal tables 'jointly' safe?
The Bounds Problem

* Given a set of released tables (relating to the same population),
  what inferences about the counts in the 'full' table can be made?

* Can a data intruder derive an upper bound of zero for any cell
  count?
A non-graphical case

[Graph: triangle on nodes A, B, C]

* All 2 × 2 marginals of a 2 × 2 × 2 table

* A maximal complete subgraph (clique) with no corresponding
  individual table
Marginal table for Var1 (levels A, B) and Var2 (levels C, D):

          A    B
    C     3    9
    D     2    2

Marginal table for Var1 and Var3 (levels E, F):

          A    B
    E     1   10
    F     4    1

Marginal table for Var2 and Var3:

          C    D
    E     8    3
    F     4    1

Full table, Var3 by (Var1, Var2):

        A,C  A,D  B,C  B,D
    E    0    1    8    2
    F    3    1    1    0

* Original cell counts can be recovered from the marginal tables, as
  the sketch below demonstrates
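A minimal sketch verifying this for the tables above: with the three
2 × 2 margins fixed, the 2 × 2 × 2 table has a single free cell
x = n(A,C,E), and non-negativity forces x = 0, recovering the
original counts exactly. The dictionary layout is illustrative only.

m12 = {('C','A'): 3, ('C','B'): 9, ('D','A'): 2, ('D','B'): 2}   # Var2 x Var1
m13 = {('E','A'): 1, ('E','B'): 10, ('F','A'): 4, ('F','B'): 1}  # Var3 x Var1
m23 = {('E','C'): 8, ('E','D'): 3, ('F','C'): 4, ('F','D'): 1}   # Var3 x Var2

solutions = []
for x in range(min(m12[('C','A')], m13[('E','A')], m23[('E','C')]) + 1):
    # every other cell is determined by x and the margins
    n = {('A','C','E'): x,
         ('A','C','F'): m12[('C','A')] - x,
         ('A','D','E'): m13[('E','A')] - x,
         ('B','C','E'): m23[('E','C')] - x}
    n[('A','D','F')] = m12[('D','A')] - n[('A','D','E')]
    n[('B','C','F')] = m12[('C','B')] - n[('B','C','E')]
    n[('B','D','E')] = m13[('E','B')] - n[('B','C','E')]
    n[('B','D','F')] = m12[('D','B')] - n[('B','D','E')]
    if all(v >= 0 for v in n.values()):
        solutions.append(n)

assert len(solutions) == 1   # the original table is recovered exactly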
Table of upper bounds:

                      Department
Profession         A     B     C   Row sum
Lawyer            20     7     2     24
Accountant         5     5     2      5
Col sum           20     7     2     29

Each cell's upper bound is the minimum of its relevant margins
(Dobra and Fienberg)
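A minimal sketch of that rule for the table above, where the relevant
margins are just the row and column totals:

row_sums = [24, 5]              # Lawyer, Accountant
col_sums = [20, 7, 2]           # Departments A, B, C

upper = [[min(r, c) for c in col_sums] for r in row_sums]
# upper == [[20, 7, 2], [5, 5, 2]], matching the bounds table above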
SDC Summary

* A set of released tables relating to a given population

* If the set of released tables is graphical and the resulting graph
  is decomposable, then the upper bounds can be derived efficiently
Common aspects

* Graphical representations
  graphs / cliques / nodes / trees

* Combination of tables
  pointwise operations

* BBNs
  pointwise multiplication

* SDC
  pointwise minimum, and
  pointwise addition     }  for calculating
  pointwise subtraction  }  exact lower bounds
Coercing Numeric built-ins

* A table is a numeric array with an associated list of variables

* Marginalisation is trivial, using the built-in
  Numeric.add.reduce() function and removing the relevant variable
  from the list

* Conditioning is easily achieved using a Numeric.take() slice,
  appropriately reshaping the array with Numeric.reshape() and
  removing the variable from the list (both are sketched below)
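A minimal sketch of both operations, using the modern NumPy
equivalents of the Numeric calls named above (Numeric itself is long
obsolete); pairing each array with a variable list is an assumption
about the described design:

import numpy

table = numpy.array([[18, 4, 2],
                     [ 2, 3, 0]])
variables = ['profession', 'department']

# marginalise out 'profession' (axis 0) and drop it from the list
dept = numpy.add.reduce(table, axis=0)       # array([20, 7, 2])
dept_vars = [v for v in variables if v != 'profession']

# condition on department == 'A': take the slice, reshape, drop the variable
cond = numpy.take(table, [0], axis=1)        # array([[18], [ 2]])
cond = numpy.reshape(cond, (2,))             # array([18,  2])
cond_vars = [v for v in variables if v != 'department']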

Pointwise multiplication

* Numeric.multiply() generates the appropriate table IF the two
  tables have identical ranks and variable lists

* This is ensured by adding new axes (Numeric.NewAxis) for the
  'missing' axes and transposing one of the tables
  (Numeric.transpose()) so that the variable lists match
array([24, 5])   ['profession']   (2,)

array([20, 7, 2])   ['department']   (3,)

(after preprocessing)

array([[24],
       [ 5]])   ['profession', 'department']   (2, 1)

array([[20, 7, 2]])   ['profession', 'department']   (1, 3)

>>> prof * dept
array([[480, 168,  48],
       [100,  35,  10]])
['profession', 'department']

>>> (prof * dept).normalise(29)
array([[ 16.551,  5.793,  1.655],
       [  3.448,  1.206,  0.344]])
['profession', 'department']
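A minimal sketch of the preprocessing shown above, again with NumPy
standing in for Numeric (numpy.newaxis playing the role of
Numeric.NewAxis):

import numpy

prof = numpy.array([24, 5])          # ['profession']
dept = numpy.array([20, 7, 2])       # ['department']

# insert a length-1 axis for each 'missing' variable so both
# variable lists become ['profession', 'department']
prof2 = prof[:, numpy.newaxis]       # shape (2, 1)
dept2 = dept[numpy.newaxis, :]       # shape (1, 3)

product = prof2 * dept2              # shape (2, 3), as in the transcript
normalised = product * 29.0 / product.sum()   # scale to the population total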

Pointwise minimum / addition / subtraction

* Numeric.minimum(), Numeric.add() and Numeric.subtract() generate
  the appropriate tables IF the two tables have identical ranks and
  variable lists AND the two tables also have identical shape

* This is ensured by a secondary preprocessing stage where the
  tables from the first preprocessing stage are multiplied by a
  'correctly' shaped table of ones (this is actually quicker than
  using Numeric.concatenate())
array([[24],
       [ 5]])   ['profession', 'department']   (2, 1)

array([[20, 7, 2]])   ['profession', 'department']   (1, 3)

(2nd stage preprocessing)

array([[20, 7, 2],
       [20, 7, 2]])   (2, 3)

>>> prof.minimum(dept)
array([[20, 7, 2],
       [ 5, 5, 2]])
['profession', 'department']
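A minimal sketch of that second preprocessing stage, again in NumPy
terms: multiplying by a correctly shaped table of ones expands both
tables to a common shape before the pointwise minimum is taken:

import numpy

prof2 = numpy.array([[24], [5]])     # (2, 1), ['profession', 'department']
dept2 = numpy.array([[20, 7, 2]])    # (1, 3), ['profession', 'department']

ones = numpy.ones((2, 3), dtype=int)
bounds = numpy.minimum(prof2 * ones, dept2 * ones)
# array([[20,  7,  2],
#        [ 5,  5,  2]])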
Summary

* The Bayesian Belief Network software was originally implemented
  in Python for two reasons:
  1. The author was, at the time, a relatively inexperienced
     programmer
  2. Self-learning (albeit with some help) was the only option

* The SDC software was implemented in Python because:
  1. Python + Numeric turned out to be a wholly appropriate
     solution for BBNs (Python is powerful, Numeric is fast)
  2. Existing code could be reused