LINKÖPINGS UNIVERSITET
Department of Computer and Information Science
Division of Statistics/ANd
732A45 Statistical Evidence Evaluation
Fall semester 2012

Assignment 1

Below are four tasks that you should try to solve. All tasks involve constructing one or several Bayesian networks. In your solutions, provide each network you construct together with its probability tables (except for task 3, where you are to implement a network that is given in the book). All questions posed should be answered. Prepare your solutions in a clear format that is easy to read. Your solutions should be submitted no later than Friday 30 November.

1. “Simple” example

Assume some rare fibres have been secured from a car seat and are suspected to originate from a sweatshirt worn by a suspect. The fibres match fibres of the sweatshirt, but when they are compared to a database of 3557 fibres, no match is found (the colour is unique).

a) Consider a feature with an unknown population proportion $\theta$. The Bayesian approach to point estimation of $\theta$ is to describe the initial uncertainty about it using a prior distribution $p(\theta)$ and to update this distribution using available data $x$ to the (hopefully) more accurate posterior distribution $q(\theta \mid x)$. In general the posterior is obtained as

$$q(\theta \mid x) = \frac{f(x \mid \theta)\, p(\theta)}{\int f(x \mid \theta)\, p(\theta)\, d\theta}$$

where $f(x \mid \theta)$ is the likelihood of $\theta$ given the data. In many cases, the data consist of a number $n$ of sampled elements (hypergeometric sampling) or random trials (binomial sampling) in which the absolute frequency $x$ of elements possessing the feature has been obtained. With binomial sampling (which is a good approximation to hypergeometric sampling when the population is large) the likelihood becomes

$$f(x \mid \theta) = \binom{n}{x}\, \theta^{x} (1 - \theta)^{n - x}$$

By choosing a beta(a, b) distribution as prior, i.e.

$$p(\theta) = \frac{\theta^{a - 1} (1 - \theta)^{b - 1}}{B(a, b)}$$

where $B(a, b)$ is the Beta function, it can easily be shown (do it as an exercise) that the posterior is a beta(a + x, b + n - x) distribution. As a point estimate of $\theta$ we may then take the posterior mean (assuming a squared-error loss function). Assume a beta(1, 1) prior for $\theta$ (i.e. a uniform distribution) and compute a point estimate using the data given above.

b) Construct a Bayesian network for this scenario and calculate the evidence value of the match between the fibres and the sweatshirt with respect to the pair of propositions
Hp: “The sweatshirt is the source of the secured fibres”
Hd: “Something else is the source of the secured fibres”

c) Use the network to compute the posterior probability of Hp if the prior odds for Hp are 100 to 1 against (i.e. evaluated as 0.01). Do you think the commissioner has a case against the suspect?

2. One-trace problem

Assume a shoe mark was found at a crime scene. A suspect has a shoe with a sole pattern that matches the pattern of the shoe mark. The frequency of this pattern is about 1.2% among all shoes that would be potential sources of the mark. Construct a Bayesian network to calculate the evidence value of the match under each of the following conditions:

a) The floor of the crime scene was cleaned prior to the offense and the suspect has only one pair of shoes.

b) The floor of the crime scene was cleaned prior to the offense and the suspect has five pairs of shoes. He wore the shoes with the matching pattern when he was caught.

c) It is estimated that four persons other than the offender may have been walking around at the crime scene after it was cleaned but prior to the offense. The suspect has only one pair of shoes.

d) It is estimated that four persons other than the offender may have been walking around at the crime scene after it was cleaned but prior to the offense. The suspect has five pairs of shoes and he did not wear the pair of shoes with the matching pattern when he was caught.

3. Distinct components

Consider the example on page 107 ff. in the book (section 4.2.5).

a) Build the network the way it is done in the example and run it. Confirm the result obtained in the book.

b) What impact do the choices of r (the relevance term) and w (the conditional probability of F given G and Hp) have on the evidence value? Try different values and draw your conclusions.

4. The two-stain problem

At a crime scene, two stains of blood are found with different DNA profiles. The DNA of the first stain has a relative frequency of $1.2 \times 10^{-8}$ and the DNA of the second stain has a relative frequency of $7.6 \times 10^{-9}$ in the relevant population of DNA profiles. The crime involved only one offender, and there is one suspect whose DNA profile matches that of the first stain. This is a classical problem which is thoroughly described in the book on pages 117 ff. The proposition at crime level is Hp: “The suspect was the offender”. However, there is more than one possibility for the stains to be relevant for the crime, and accordingly there are more than two states for the source node.

a) Try to construct a Bayesian network for this scenario with which you can calculate the evidence value of the match between the suspect’s DNA and the DNA of the first stain. Try first before looking at the book pages.

b) Build a parallel network in which there is only one stain, with the same DNA profile as that of the first stain and with a match in DNA between the suspect and the stain.
Compare the evidence value for this match with the evidence value of the match in the previous scenario when either the first or the second stain is relevant for the offense, with equal probabilities.

c) What would happen if we extended the case to 3 stains with different DNA, 4 stains with different DNA, etc., but still with only one offender and with one of the stains relevant for the offense (all equally likely to be that stain)?
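
The beta-binomial update described in task 1 a) and the odds form of Bayes' theorem that underlies task 1 c) can be sketched in Python as follows. This is only a minimal illustration and not part of the assignment: the function names are hypothetical, and the sample figures (3 observations in 50 trials, a likelihood ratio of 200) are made up for demonstration and are not the data of any task.

```python
from fractions import Fraction


def beta_binomial_update(a, b, n, x):
    """Posterior parameters for a beta(a, b) prior after observing
    x elements with the feature among n binomial trials."""
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    return a + x, b + n - x


def beta_mean(a, b):
    """Mean of a beta(a, b) distribution, a / (a + b); this is the
    point estimate under squared-error loss."""
    return Fraction(a) / (Fraction(a) + Fraction(b))


def posterior_probability(prior_odds, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds * LR,
    converted back to a probability as odds / (1 + odds)."""
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical data: uniform beta(1, 1) prior, 3 hits in 50 trials.
a_post, b_post = beta_binomial_update(1, 1, n=50, x=3)
print(a_post, b_post)             # 4 48
print(beta_mean(a_post, b_post))  # 1/13

# Hypothetical evidence value of 200 with prior odds of 1 to 100.
print(posterior_probability(Fraction(1, 100), 200))  # 2/3
```

Exact fractions are used here so that the results can be checked by hand; floating-point arithmetic would serve equally well for the network software's output.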