LINKÖPINGS UNIVERSITET
Department of Computer and
Information Science
Division of Statistics/ANd
732A45 Statistical Evidence Evaluation
Fall semester 2012
Assignment 1
Below are four tasks for you to solve. Each task involves constructing one or
several Bayesian networks. In your solutions, provide each network you construct
together with its probability tables (except for task 3, where you are to implement a network
that is given in the book). All questions posed should be answered. Present your solutions in a
clear format that is easy to read.
Your solutions should be submitted no later than Friday 30 November.
1. “Simple” example
Assume some rare fibres have been secured from a car seat and they are suspected to originate
from a sweatshirt worn by a suspect. The fibres match fibres of the sweatshirt, but when
they are compared to a database of 3557 fibres, no match is found (the colour is unique).
a) Consider a feature with an unknown population proportion θ. The Bayesian approach
to point estimation of θ is to describe the initial uncertainty about it using a prior
distribution p(θ) and to update this distribution using available data x to the (hopefully) more
accurate posterior distribution q(θ | x). In general the posterior is obtained as
$$ q(\theta \mid x) = \frac{f(x \mid \theta)\, p(\theta)}{\int f(x \mid \lambda)\, p(\lambda)\, d\lambda} $$
where f(x | θ) is the likelihood of θ given the data x.
In many cases, the data consist of a number n of sampled elements (hypergeometric
sampling) or random trials (binomial sampling) in which the absolute frequency x of
elements possessing the feature has been obtained. With binomial sampling (which is a
good approximation to hypergeometric sampling when the population is large) the
likelihood becomes
$$ f(x \mid \theta) = \binom{n}{x}\, \theta^{x} (1 - \theta)^{n - x} $$
By choosing a beta(a, b) distribution as prior, i.e.
$$ p(\theta) = \frac{\theta^{a-1} (1 - \theta)^{b-1}}{B(a, b)} $$
where B(a, b) is the Beta function, it can easily be shown (do it as an exercise) that the
posterior is a beta(a + x, b + n - x) distribution.
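The conjugacy claim can also be checked numerically before it is proved. The following is a minimal sketch, assuming Python 3.8 or later and using only the standard library; the function names and the values of a, b, x and n are illustrative choices, not part of the assignment. It normalises likelihood times prior on a grid and compares the result with the beta(a + x, b + n - x) density.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(theta, a, b):
    """Density of a beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_density = ((a - 1) * math.log(theta)
                   + (b - 1) * math.log(1 - theta)
                   - log_beta(a, b))
    return math.exp(log_density)

def binomial_likelihood(x, n, theta):
    """Binomial likelihood f(x | theta) for x successes in n trials."""
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

def grid_posterior(x, n, a, b, steps=20000):
    """Posterior density on a midpoint grid, normalised numerically."""
    width = 1.0 / steps
    grid = [(i + 0.5) * width for i in range(steps)]
    unnormalised = [binomial_likelihood(x, n, t) * beta_pdf(t, a, b) for t in grid]
    normaliser = sum(unnormalised) * width  # midpoint-rule approximation of the integral
    return grid, [u / normaliser for u in unnormalised]

# Illustrative values only (not the data of the assignment).
a, b, x, n = 2.0, 3.0, 4, 20
grid, posterior = grid_posterior(x, n, a, b)
max_gap = max(abs(p - beta_pdf(t, a + x, b + n - x)) for t, p in zip(grid, posterior))
print(max_gap)  # should be close to zero if the conjugacy result holds
```

A small gap is only a sanity check for one choice of values; it does not replace the algebraic argument asked for in the exercise.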
As a point estimate of θ we may then take the posterior mean (assuming the
loss function to be squared-error loss).
Assume a beta(1,1) prior for θ (i.e. a uniform distribution) and compute a point
estimate using the data given above.
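Since the mean of a beta(α, β) distribution is α / (α + β), the posterior mean under a beta(a, b) prior is (a + x) / (a + b + n). A minimal sketch of this calculation follows; the function name is hypothetical and the counts used are made up for illustration, so they are not the data of the task.

```python
from fractions import Fraction

def beta_posterior_mean(a, b, x, n):
    """Mean of the beta(a + x, b + n - x) posterior, i.e. (a + x) / (a + b + n).

    Integer arguments are assumed so that the result is an exact fraction.
    """
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    return Fraction(a + x, a + b + n)

# Hypothetical data: the feature was seen in 2 of 50 sampled elements,
# combined with a uniform beta(1, 1) prior.
estimate = beta_posterior_mean(1, 1, 2, 50)
print(estimate, float(estimate))
```

Note that the estimate stays strictly positive even when x = 0, which is one reason the posterior mean is convenient when a feature has not been observed in the sample.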
b) Construct a Bayesian network for this scenario and calculate the evidence value of the
match between the fibres and the sweatshirt with respect to the pair of propositions
Hp : “The sweatshirt is the source of the secured fibres”
Hd : “Something else is the source of the secured fibres”
c) Use the network to compute the posterior probability of Hp if the prior odds for Hp are
100 to 1 against (i.e. evaluated as 0.01). Do you think the commissioner has a case
against the suspect?
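The step from prior odds to posterior probability in c) is the odds form of Bayes' theorem: posterior odds = evidence value × prior odds. The sketch below assumes that the evidence value is the likelihood ratio P(E | Hp) / P(E | Hd); the function names and the probability 0.02 are hypothetical and chosen only to show the arithmetic.

```python
def likelihood_ratio(p_e_given_hp, p_e_given_hd):
    """Evidence value, taken here to be P(E | Hp) / P(E | Hd)."""
    return p_e_given_hp / p_e_given_hd

def posterior_probability(prior_odds, lr):
    """Posterior probability of Hp from the prior odds and the likelihood ratio."""
    posterior_odds = prior_odds * lr
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical inputs: a match that is certain under Hp and has
# probability 0.02 under Hd, combined with prior odds of 0.01.
lr = likelihood_ratio(1.0, 0.02)
post = posterior_probability(0.01, lr)
print(lr, post)
```

With these made-up numbers the posterior odds are 0.5, so even evidence that looks strong on its own can leave Hp less probable than Hd when the prior odds are low.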
2. One-trace problem
Assume a shoe mark was found at a crime scene. A suspect has a shoe with a sole pattern
that matches the pattern of the shoe mark. The frequency of this pattern is about 1.2%
among all shoes that would be potential sources of the mark.
Construct a Bayesian network to calculate the evidence value of the match under the
conditions:
a) The floor of the crime scene was cleaned prior to the offense and the suspect has only
one pair of shoes.
b) The floor of the crime scene was cleaned prior to the offense and the suspect has five
pairs of shoes. He wore the shoes with the matching pattern when he was caught.
c) One estimates that four persons other than the offender may have been walking around
at the crime scene after it was cleaned but prior to the offense. The suspect has only
one pair of shoes.
d) One estimates that four persons other than the offender may have been walking around
at the crime scene after it was cleaned but prior to the offense. The suspect has five
pairs of shoes and he did not wear the pair of shoes with the matching pattern when he
was caught.
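When a network contains an intermediate node between the proposition and the observation, the evidence value is obtained by summing over that node. The sketch below uses a hypothetical three-node chain H → S → M, where S is "the suspect's shoe is the source of the mark" and M is "the patterns match". The structure, the parameter names and all numbers are assumptions made for illustration; how the number of pairs of shoes or the other visitors should enter the probability tables is left to the tasks above.

```python
def match_probability(p_source_given_h, p_match_given_source, p_match_given_not_source):
    """P(M = match | H), marginalising over the intermediate source node S."""
    return (p_source_given_h * p_match_given_source
            + (1.0 - p_source_given_h) * p_match_given_not_source)

def evidence_value(p_source_given_hp, p_source_given_hd, pattern_frequency):
    """Likelihood ratio P(M | Hp) / P(M | Hd) for the chain H -> S -> M.

    Assumes a match is certain if the shoe is the source and occurs with
    probability `pattern_frequency` otherwise.
    """
    numerator = match_probability(p_source_given_hp, 1.0, pattern_frequency)
    denominator = match_probability(p_source_given_hd, 1.0, pattern_frequency)
    return numerator / denominator

# Hypothetical values: pattern frequency 5 %, and a 50 % chance under Hp
# that the mark actually comes from the suspect's shoe.
value = evidence_value(0.5, 0.0, 0.05)
print(value)

# If the shoe is certain to be the source under Hp, the value reduces to
# one over the pattern frequency.
print(evidence_value(1.0, 0.0, 0.05))
```

Varying the first argument shows how quickly the evidence value drops once the link between the proposition and the source of the mark becomes uncertain.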
3. Distinct components
Consider the example on pages 107 ff. in the book (section 4.2.5).
a) Build the network the way it is done in the example and run it. Confirm the result
obtained in the book.
b) What impact do the choices of r (relevance term) and w (the conditional probability
of F given G and Hp ? ) have on the evidence value? Try different values and draw your
conclusions.
4. The two-stain problem
At a crime scene, two stains of blood are found with different DNA profiles. The DNA of
the first stain has a relative frequency of 1.2 × 10⁻⁸ and the DNA of the second stain has a
relative frequency of 7.6 × 10⁻⁹ in the relevant population of DNA profiles.
The crime only involved one offender and there is one suspect whose DNA profile matches
that of the first stain.
This is a classical problem which is thoroughly described in the book on pages 117 ff.
The proposition at crime level is Hp : “The suspect was the offender”. However, there is
more than one possibility for the stains to be relevant for the crime and accordingly there
are more than two states for the source node.
a) Try to construct a Bayesian network for this scenario with which you can calculate the
evidence value of the match between the suspect’s DNA and the DNA of the first stain.
Try first before looking at the book pages.
b) Build a parallel network in which there is only one stain with the same DNA profile as
that of the first stain and with a match in DNA between the suspect and the stain.
Compare the evidence value for this match with the evidence value of the match in the
previous scenario when either the first or the second stain is relevant for the offense
with equal probabilities.
c) What would happen if we extend the case to 3 stains with different DNA, 4 stains with
different DNA, etc., but still with only one offender and with one of the stains relevant
for the offense (all equally likely to be that stain)?
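A result obtained from a network for several stains can be cross-checked by direct enumeration over the state of the source node. The sketch below is one possible way to set up that calculation and is not the network from the book. It assumes that exactly one stain comes from the offender, that each stain is equally likely to be that one, that the remaining stains come from unrelated people, and that the calculation conditions on the suspect's profile. The function name and the profile frequencies are hypothetical.

```python
import math

def stain_evidence_value(freqs, matching_index=0):
    """Likelihood ratio for a suspect matching one of len(freqs) distinct stains.

    freqs[i] is the population frequency of the profile of stain i;
    the suspect's profile matches stain `matching_index`.
    """
    k = len(freqs)
    p_hp = 0.0
    for relevant in range(k):
        # If the suspect is the offender and stain `relevant` is the offender's,
        # that stain shows the suspect's profile with certainty. This agrees with
        # the observation only when `relevant` is the matching stain.
        p_relevant_stain = 1.0 if relevant == matching_index else 0.0
        p_other_stains = math.prod(f for i, f in enumerate(freqs) if i != relevant)
        p_hp += (1.0 / k) * p_relevant_stain * p_other_stains
    # Under Hd every stain comes from someone unrelated to the suspect.
    p_hd = math.prod(freqs)
    return p_hp / p_hd

# Hypothetical frequencies, much larger than realistic DNA profile frequencies.
one_stain = stain_evidence_value([0.01])
two_stains = stain_evidence_value([0.01, 0.02])
three_stains = stain_evidence_value([0.01, 0.02, 0.03])
print(one_stain, two_stains, three_stains)
```

Under these assumptions the frequencies of the non-matching stains cancel, so only the number of stains and the frequency of the matching profile affect the value.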