BAYESIAN NETWORK

Submitted by: Faisal Islam, Srinivasan Gopalan, Vaibhav Mittal, Vipin Makhija
Prof. Anita Wasilewska
State University of New York at Stony Brook

References
[1] Jiawei Han: "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers.
[2] Stuart Russell, Peter Norvig: "Artificial Intelligence – A Modern Approach", Pearson Education.
[3] Kandasamy, Thilagavathi, Gunavathi: "Probability, Statistics and Queueing Theory", Sultan Chand Publishers.
[4] D. Heckerman: "A Tutorial on Learning with Bayesian Networks", in "Learning in Graphical Models", ed. M. I. Jordan, The MIT Press, 1998.
[5] http://en.wikipedia.org/wiki/Bayesian_probability
[6] http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf
[7] http://www.murrayc.com/learning/AI/bbn.shtml
[8] http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
[9] http://en.wikipedia.org/wiki/Bayesian_belief_network

CONTENTS
HISTORY
CONDITIONAL PROBABILITY
BAYES THEOREM
NAÏVE BAYES CLASSIFIER
BELIEF NETWORK
APPLICATION OF BAYESIAN NETWORK
PAPER ON CYBER CRIME DETECTION

HISTORY
Bayesian probability is named after Reverend Thomas Bayes (1702-1761), who proved a special case of what is currently known as Bayes theorem. The term "Bayesian" came into use around the 1950s. Pierre-Simon, Marquis de Laplace (1749-1827) independently proved a generalized version of Bayes theorem.
http://en.wikipedia.org/wiki/Bayesian_probability

HISTORY (cont.)
1950s – new knowledge in artificial intelligence
1958 – genetic algorithms by Friedberg (Holland and Goldberg, ~1985)
1965 – fuzzy logic by Zadeh at UC Berkeley
1970 – Bayesian belief networks at Stanford University (Judea Pearl, 1988)
The ideas proposed above were not fully developed until later; Bayesian belief networks became popular in the 1990s.
http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf

HISTORY (cont.)
Current uses of Bayesian networks:
Microsoft's printer troubleshooter
Diagnosing diseases (Mycin)
Predicting oil and stock prices
Controlling the space shuttle
Risk analysis – schedule and cost overruns

CONDITIONAL PROBABILITY
Probability: how likely is it that an event will happen?
Sample space S. An element of S is an elementary event. An event A is a subset of S, with probability P(A), and P(S) = 1.
For events A and B, P(A|B) is the probability that event A occurs given that event B has already occurred.
Example: there are 2 baskets. B1 has 2 red balls and 5 blue balls; B2 has 4 red balls and 3 blue balls. What is the probability of picking a red ball from basket 1?

CONDITIONAL PROBABILITY
The question above asks for P(red ball | basket 1), i.e. the probability of a red ball taken from the sample space of basket 1 only, so the answer is 2/7.
The defining equation is P(A|B) = P(A∩B) / P(B) (product rule).
P(A,B) = P(A) * P(B) if A and B are independent.
How do you solve P(basket 2 | red ball)?

BAYES THEOREM
A special case of Bayes theorem:
P(A∩B) = P(B) * P(A|B)
P(B∩A) = P(A) * P(B|A)
Since P(A∩B) = P(B∩A),
P(B) * P(A|B) = P(A) * P(B|A)
=> P(A|B) = [P(A) * P(B|A)] / P(B)

BAYES THEOREM
Solution to P(basket 2 | red ball):
P(basket 2 | red ball) = [P(b2) * P(r|b2)] / P(r) = [(1/2) * (4/7)] / (6/14) = 2/3 ≈ 0.67
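As a quick numerical check of the two-basket example, here is a minimal Python sketch (not part of the original slides) that computes P(red) by total probability and then P(basket 2 | red) by Bayes theorem:

```python
# Minimal sketch: verifying the two-basket example numerically.
# Basket 1: 2 red, 5 blue; Basket 2: 4 red, 3 blue; a basket is chosen at random.

p_basket = {"b1": 0.5, "b2": 0.5}                # prior P(basket)
p_red_given_basket = {"b1": 2 / 7, "b2": 4 / 7}  # likelihood P(red | basket)

# Total probability of drawing a red ball: P(red) = sum_b P(red | b) * P(b)
p_red = sum(p_red_given_basket[b] * p_basket[b] for b in p_basket)

# Bayes theorem: P(b2 | red) = P(red | b2) * P(b2) / P(red)
p_b2_given_red = p_red_given_basket["b2"] * p_basket["b2"] / p_red

print(p_red)           # 3/7 ≈ 0.4286 (= 6/14 on the slide)
print(p_b2_given_red)  # 2/3 ≈ 0.6667
```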
BAYES THEOREM
Example 2: a medical cancer diagnosis problem.
There are two possible outcomes of a diagnosis: +ve and -ve. We know that 0.8% of the world population has cancer. The test gives a correct +ve result 98% of the time and a correct -ve result 97% of the time.
If a patient's test returns +ve, should we diagnose the patient as having cancer?

BAYES THEOREM
P(cancer) = 0.008        P(-cancer) = 0.992
P(+ve|cancer) = 0.98     P(-ve|cancer) = 0.02
P(+ve|-cancer) = 0.03    P(-ve|-cancer) = 0.97
Using the Bayes formula:
P(cancer|+ve) = P(+ve|cancer) * P(cancer) / P(+ve) = 0.98 * 0.008 / P(+ve) = 0.0078 / P(+ve)
P(-cancer|+ve) = P(+ve|-cancer) * P(-cancer) / P(+ve) = 0.03 * 0.992 / P(+ve) = 0.0298 / P(+ve)
Since 0.0298 > 0.0078, the patient most likely does not have cancer.

BAYES THEOREM
General Bayes theorem: given mutually disjoint events E1, E2, …, En with P(Ei) ≠ 0 (i = 1, 2, …, n),
P(Ei|A) = [P(Ei) * P(A|Ei)] / Σj P(Ej) * P(A|Ej),   i = 1, 2, …, n

BAYES THEOREM
Example: there are 3 boxes. B1 has 2 white, 3 black and 4 red balls; B2 has 3 white, 2 black and 2 red balls; B3 has 4 white, 1 black and 3 red balls. A box is chosen at random and 2 balls are drawn; one is white and the other is red. What is the probability that they came from the first box?

BAYES THEOREM
Let E1, E2, E3 denote the events of choosing B1, B2, B3 respectively, and let A be the event that the 2 balls selected are white and red.
P(E1) = P(E2) = P(E3) = 1/3
P(A|E1) = [2C1 * 4C1] / 9C2 = 2/9
P(A|E2) = [3C1 * 2C1] / 7C2 = 2/7
P(A|E3) = [4C1 * 3C1] / 8C2 = 3/7

BAYES THEOREM
P(E1|A) = [P(E1) * P(A|E1)] / Σj P(Ej) * P(A|Ej) = 0.23727
P(E2|A) = 0.30509
P(E3|A) = 1 – (0.23727 + 0.30509) = 0.45764

BAYESIAN CLASSIFICATION
Why use Bayesian classification?
Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.

BAYESIAN CLASSIFICATION
Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities.
Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured.

NAÏVE BAYES CLASSIFIER
A simplifying assumption: the attributes are conditionally independent given the class. This greatly reduces the computation cost; we only need to count the class distribution.

NAÏVE BAYES CLASSIFIER
The probabilistic model of the naïve Bayes classifier is to find the probability of a class given multiple (assumed) independent events.
The naïve Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and the target function f(x) can take on any value from some finite set V. A set of training examples of the target function is provided, and a new instance is presented, described by the tuple of attribute values <a1, a2, …, an>. The learner is asked to predict the target value, or classification, for this new instance.

NAÏVE BAYES CLASSIFIER
Abstractly, the probability model for a classifier is a conditional model P(C|F1, F2, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1, …, Fn.
Naïve Bayes formula:
P(C|F1, F2, …, Fn) = [P(C) * P(F1|C) * P(F2|C) * … * P(Fn|C)] / P(F1, F2, …, Fn)
and the predicted class is argmax over c of P(C=c) * P(F1|C=c) * … * P(Fn|C=c).
Since P(F1, F2, …, Fn) is common to all classes, we do not need to evaluate the denominator for the comparison.
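The decision rule above can be written as a small generic routine. This is an illustrative sketch only; the function name and dictionary layout are ours, not from the slides:

```python
def naive_bayes_classify(priors, likelihoods, attribute_values):
    """priors: {class: P(class)};
    likelihoods: {class: {attribute_value: P(attribute_value | class)}};
    attribute_values: the observed attribute values of the new instance."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for a in attribute_values:
            score *= likelihoods[c][a]   # class-conditional independence assumption
        scores[c] = score
    # Winning class plus the unnormalized scores (the common denominator is dropped).
    return max(scores, key=scores.get), scores
```

Plugging in the PlayTennis probabilities computed on the slides that follow reproduces the 'no' and 'yes' decisions obtained there.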
NAÏVE BAYES CLASSIFIER
Tennis example. [Table: the 14-example PlayTennis training set with attributes Outlook, Temperature, Humidity, Wind and class PlayTennis; not reproduced here.]

NAÏVE BAYES CLASSIFIER
Problem: use the training data above to classify the following instances:
(a) <Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong>
(b) <Outlook=overcast, Temperature=cool, Humidity=high, Wind=strong>

NAÏVE BAYES CLASSIFIER
Answer to (a):
P(PlayTennis=yes) = 9/14 = 0.64
P(PlayTennis=no) = 5/14 = 0.36
P(Outlook=sunny|yes) = 2/9 = 0.22        P(Outlook=sunny|no) = 3/5 = 0.60
P(Temperature=cool|yes) = 3/9 = 0.33     P(Temperature=cool|no) = 1/5 = 0.20
P(Humidity=high|yes) = 3/9 = 0.33        P(Humidity=high|no) = 4/5 = 0.80
P(Wind=strong|yes) = 3/9 = 0.33          P(Wind=strong|no) = 3/5 = 0.60

NAÏVE BAYES CLASSIFIER
P(yes) * P(sunny|yes) * P(cool|yes) * P(high|yes) * P(strong|yes) = 0.0053
P(no) * P(sunny|no) * P(cool|no) * P(high|no) * P(strong|no) = 0.0206
So the class for this instance is 'no'. We can normalize to get the probability of 'no':
0.0206 / (0.0206 + 0.0053) = 0.795

NAÏVE BAYES CLASSIFIER
Answer to (b):
P(PlayTennis=yes) = 9/14 = 0.64
P(PlayTennis=no) = 5/14 = 0.36
P(Outlook=overcast|yes) = 4/9 = 0.44     P(Outlook=overcast|no) = 0/5 = 0
P(Temperature=cool|yes) = 3/9 = 0.33     P(Temperature=cool|no) = 1/5 = 0.20
P(Humidity=high|yes) = 3/9 = 0.33        P(Humidity=high|no) = 4/5 = 0.80
P(Wind=strong|yes) = 3/9 = 0.33          P(Wind=strong|no) = 3/5 = 0.60

NAÏVE BAYES CLASSIFIER
Estimating probabilities: in the example above, P(overcast|no) = 0, which makes the whole product
P(no) * P(overcast|no) * P(cool|no) * P(high|no) * P(strong|no) = 0.
This causes problems in the comparison because the other probabilities are not considered at all. We can avoid this difficulty by using the m-estimate.

NAÏVE BAYES CLASSIFIER
M-estimate formula: (c + k) / (n + m), where c/n is the raw probability used before, k = 1 and m is the equivalent sample size (here taken as the number of possible values of the attribute, which is what the numbers below use). Using this method, the new probability values are as follows.

NAÏVE BAYES CLASSIFIER
New answer to (b):
P(PlayTennis=yes) = 10/16 = 0.63
P(PlayTennis=no) = 6/16 = 0.37
P(Outlook=overcast|yes) = 5/12 = 0.42    P(Outlook=overcast|no) = 1/8 = 0.13
P(Temperature=cool|yes) = 4/12 = 0.33    P(Temperature=cool|no) = 2/8 = 0.25
P(Humidity=high|yes) = 4/11 = 0.36       P(Humidity=high|no) = 5/7 = 0.71
P(Wind=strong|yes) = 4/11 = 0.36         P(Wind=strong|no) = 4/7 = 0.57

NAÏVE BAYES CLASSIFIER
P(yes) * P(overcast|yes) * P(cool|yes) * P(high|yes) * P(strong|yes) = 0.011
P(no) * P(overcast|no) * P(cool|no) * P(high|no) * P(strong|no) = 0.00486
So the class of this instance is 'yes'.
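A minimal sketch of the m-estimate as applied here, assuming k = 1 and m = the number of possible values of the attribute (the interpretation that matches the numbers on the slide):

```python
def m_estimate(count, n, num_values, k=1):
    """Smoothed probability (count + k) / (n + m), with m = num_values."""
    return (count + k) / (n + num_values)

# Outlook has 3 values (sunny, overcast, rain); 'no' covers 5 training examples,
# none of which has Outlook=overcast; 'yes' covers 9 examples, 4 of them overcast.
print(m_estimate(count=0, n=5, num_values=3))   # P(overcast | no)  = 1/8  ≈ 0.13
print(m_estimate(count=4, n=9, num_values=3))   # P(overcast | yes) = 5/12 ≈ 0.42
```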
NAÏVE BAYES CLASSIFIER
The conditional probability values of all the attributes with respect to the class are pre-computed and stored on disk. This prevents the classifier from recomputing the conditional probabilities every time it runs; the stored data can be reused to reduce the computation cost at classification time.

BAYESIAN BELIEF NETWORK
In the naïve Bayes classifier we make the assumption of class conditional independence: given the class label of a sample, the values of the attributes are conditionally independent of one another. However, there can be dependencies between attribute values. To handle this we use a Bayesian belief network, which provides a joint conditional probability distribution.
A Bayesian network is a form of probabilistic graphical model. Specifically, a Bayesian network is a directed acyclic graph of nodes representing variables and arcs representing dependence relations among the variables.

BAYESIAN BELIEF NETWORK
A Bayesian network is a representation of the joint distribution over all the variables represented by nodes in the graph. Let the variables be X(1), …, X(n), and let parents(A) be the parents of node A. Then the joint distribution for X(1) through X(n) is represented as the product of the probability distributions P(Xi | parents(Xi)) for i = 1 to n. If X has no parents, its probability distribution is said to be unconditional; otherwise it is conditional.

BAYESIAN BELIEF NETWORK
[Figure: the cloudy/sprinkler/rain/wet-grass network with its conditional probability tables; not reproduced here.]

BAYESIAN BELIEF NETWORK
By the chain rule of probability, the joint probability of all the nodes in the graph above is
P(C, S, R, W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)
where W = wet grass, C = cloudy, R = rain, S = sprinkler.
Example: P(W ∩ ¬R ∩ S ∩ C) = P(W|S,¬R) * P(¬R|C) * P(S|C) * P(C) = 0.9 * 0.2 * 0.1 * 0.5 = 0.009

BAYESIAN BELIEF NETWORK
What is the probability of wet grass on a given day, P(W)?
P(W) = P(W|S,R) * P(S) * P(R) + P(W|S,¬R) * P(S) * P(¬R) + P(W|¬S,R) * P(¬S) * P(R) + P(W|¬S,¬R) * P(¬S) * P(¬R)
where P(S) = P(S|C) * P(C) + P(S|¬C) * P(¬C) and P(R) = P(R|C) * P(C) + P(R|¬C) * P(¬C).
This gives P(W) = 0.5985. (Note that this formula treats S and R as independent; since both depend on C, the exact value is obtained by marginalizing over C, S and R jointly, as in the sketch below.)
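For comparison, here is a sketch that computes P(W) by marginalizing the full joint distribution over Cloudy, Sprinkler and Rain. The CPT values are the standard sprinkler-example tables (see reference [8]) and are consistent with the 0.009 and 0.5985 numbers above; since the figure with the tables is not reproduced here, treat them as assumptions:

```python
from itertools import product

p_c = {True: 0.5, False: 0.5}            # P(Cloudy)
p_s_given_c = {True: 0.1, False: 0.5}    # P(Sprinkler=true | Cloudy)
p_r_given_c = {True: 0.8, False: 0.2}    # P(Rain=true | Cloudy)
p_w_given_sr = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.0}   # P(WetGrass=true | S, R)

# P(W) = sum over c, s, r of P(c) * P(s|c) * P(r|c) * P(W|s,r)
p_w = 0.0
for c, s, r in product([True, False], repeat=3):
    p_s = p_s_given_c[c] if s else 1 - p_s_given_c[c]
    p_r = p_r_given_c[c] if r else 1 - p_r_given_c[c]
    p_w += p_c[c] * p_s * p_r * p_w_given_sr[(s, r)]

print(p_w)   # ≈ 0.6471, versus 0.5985 from the factored approximation above
```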
Advantages of the Bayesian Approach
Bayesian networks can readily handle incomplete data sets.
Bayesian networks allow one to learn about causal relationships.
Bayesian networks readily facilitate the use of prior knowledge.

APPLICATIONS OF BAYESIAN NETWORKS
Sources/References:
"Naive Bayes Spam Filtering Using Word-Position-Based Attributes", Johan Hovold, Department of Computer Science, Lund University, Box 118, 221 00 Lund, Sweden. http://www.ceas.cc/papers-2005/144.pdf [E-mail: [email protected]] [Presented at CEAS 2005, Second Conference on Email and Anti-Spam, July 21-22, at Stanford University]
Tom Mitchell: "Machine Learning", Tata McGraw-Hill.
"A Bayesian Approach to Filtering Junk E-Mail", Mehran Sahami, Susan Dumais, David Heckerman, Eric Horvitz; Computer Science Department, Stanford University, and Microsoft Research, Redmond, WA. [Presented at the AAAI Workshop on Learning for Text Categorization, July 1998, Madison, Wisconsin]

Problem
A real-world Bayesian network application: learning to classify text.
Instances are text documents. We might wish to learn the target concept "electronic news articles that I find interesting" or "pages on the World Wide Web that discuss data mining topics". In both cases, if a computer could learn the target concept accurately, it could automatically filter the large volume of online text documents and present only the most relevant documents to the user.

TECHNIQUE
Learning how to classify text, based on the naive Bayes classifier. It is a probabilistic approach and is among the most effective algorithms currently known for learning to classify text documents.
The instance space X consists of all possible text documents. We are given training examples of some unknown target function f(x), which can take on any value from some finite set V. We will consider the target function of classifying documents as interesting or uninteresting to a particular person, using the target values like and dislike to indicate these two classes.

Design issues
How to represent an arbitrary text document in terms of attribute values.
How to estimate the probabilities required by the naive Bayes classifier.

Approach
Our approach to representing arbitrary text documents is disturbingly simple: given a text document, such as this paragraph, we define an attribute for each word position in the document and define the value of that attribute to be the English word found in that position. Thus, the current paragraph would be described by 111 attribute values, corresponding to its 111 word positions. The value of the first attribute is the word "our", the value of the second attribute is the word "approach", and so on. Notice that long text documents will require a larger number of attributes than short documents. As we shall see, this will not cause us any trouble.

Assumptions
Assume we are given a set of 700 training documents that a friend has classified as dislike and another 300 she has classified as like. We are now given a new document and asked to classify it; let us assume the new text document is the preceding paragraph.
We know P(like) = 0.3 and P(dislike) = 0.7 in the current example.
Estimating the class-conditional probabilities P(ai = wk | vj) (here wk denotes the k-th word in the English vocabulary), e.g. P(ai = "our" | dislike), is more problematic, because we must estimate one such probability term for each combination of text position, English word, and target value. With approximately 50,000 distinct words in the English vocabulary, 2 possible target values, and 111 text positions in the current example, we would have to estimate 2 * 111 * 50,000 ≈ 10 million such terms from the training data.
We therefore assume that the probability of encountering a specific word wk (e.g., "chocolate") is independent of the specific word position being considered (e.g., a23 versus a95). That is, we estimate the entire set of probabilities P(a1 = wk | vj), P(a2 = wk | vj), … by the single position-independent probability P(wk | vj). The net effect is that we now require only 2 * 50,000 distinct terms of the form P(wk | vj).
We adopt the m-estimate with uniform priors and with m equal to the size of the word vocabulary:
P(wk | vj) = (nk + 1) / (n + |Vocabulary|)
where n is the total number of word positions in all training examples whose target value is vj, nk is the number of times word wk is found among these n word positions, and |Vocabulary| is the total number of distinct words (and other tokens) found within the training data.

Final Algorithm
LEARN_NAIVE_BAYES_TEXT(Examples, V)
Examples is a set of text documents along with their target values; V is the set of all possible target values. This function learns the probability terms P(wk | vj), describing the probability that a randomly drawn word from a document in class vj will be the English word wk. It also learns the class prior probabilities P(vj).
1. Collect all words, punctuation, and other tokens that occur in Examples:
• Vocabulary ← the set of all distinct words and tokens occurring in any text document from Examples
2. Calculate the required P(vj) and P(wk | vj) probability terms. For each target value vj in V do:
• docsj ← the subset of documents from Examples for which the target value is vj
• P(vj) ← |docsj| / |Examples|
• Textj ← a single document created by concatenating all members of docsj
• n ← the total number of word positions in Textj
• for each word wk in Vocabulary: nk ← the number of times word wk occurs in Textj; P(wk | vj) ← (nk + 1) / (n + |Vocabulary|)

CLASSIFY_NAIVE_BAYES_TEXT(Doc)
Return the estimated target value for the document Doc; ai denotes the word found in the i-th position within Doc.
• positions ← all word positions in Doc that contain tokens found in Vocabulary
• Return vNB, where vNB = argmax over vj in V of P(vj) * Π over i in positions of P(ai | vj)
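A compact Python rendering of the two procedures above, as a sketch rather than the book's or the authors' code; tokenization is deliberately crude, and log-probabilities are used to avoid numerical underflow on long documents:

```python
import math
from collections import Counter

def learn_naive_bayes_text(examples):
    """examples: list of (document_text, target_value) pairs."""
    vocabulary = set()
    doc_counts = {}        # number of documents per class
    words_by_class = {}    # concatenated word list per class (Text_j)
    for doc, v in examples:
        words = doc.split()                     # crude tokenization, for illustration
        vocabulary.update(words)
        doc_counts[v] = doc_counts.get(v, 0) + 1
        words_by_class.setdefault(v, []).extend(words)

    priors, cond_prob = {}, {}
    for v, words in words_by_class.items():
        priors[v] = doc_counts[v] / len(examples)            # P(v_j)
        n, counts = len(words), Counter(words)
        # P(w_k | v_j) = (n_k + 1) / (n + |Vocabulary|)
        cond_prob[v] = {w: (counts[w] + 1) / (n + len(vocabulary)) for w in vocabulary}
    return vocabulary, priors, cond_prob

def classify_naive_bayes_text(doc, vocabulary, priors, cond_prob):
    positions = [w for w in doc.split() if w in vocabulary]  # unknown words are ignored
    scores = {v: math.log(priors[v]) +
                 sum(math.log(cond_prob[v][w]) for w in positions)
              for v in priors}
    return max(scores, key=scores.get)                       # v_NB
```

Called with a list of (text, label) pairs, the first function returns the learned tables; the second returns the most probable class for a new document.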
During learning, the procedure LEARN_NAIVE_BAYES_TEXT examines all training documents to extract the vocabulary of all words and tokens that appear in the text, then counts their frequencies among the different target classes to obtain the necessary probability estimates. Later, given a new document to be classified, the procedure CLASSIFY_NAIVE_BAYES_TEXT uses these probability estimates to calculate vNB according to the equation above. Note that any words appearing in the new document that were not observed in the training set are simply ignored by CLASSIFY_NAIVE_BAYES_TEXT.

Effectiveness of the Algorithm
Problem: classifying Usenet news articles. The target classification for an article is the name of the Usenet newsgroup in which the article appeared.
In the experiment described by Joachims (1996), 20 electronic newsgroups were considered and 1,000 articles were collected from each newsgroup, forming a data set of 20,000 documents. The naive Bayes algorithm was then applied using two-thirds of these 20,000 documents as training examples, and performance was measured over the remaining third. The 100 most frequent words were removed (these include words such as "the" and "of"), and any word occurring fewer than three times was also removed. The resulting vocabulary contained approximately 38,500 words. The accuracy achieved by the program was 89%.
The 20 newsgroups:
comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x, misc.forsale, rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey, alt.atheism, soc.religion.christian, talk.religion.misc, talk.politics.guns, talk.politics.mideast, talk.politics.misc, sci.space, sci.crypt, sci.electronics, sci.med
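The vocabulary pruning described above (drop the 100 most frequent words and any word occurring fewer than three times) can be sketched as follows; the function name and the thresholds-as-parameters are ours, not from the experiment:

```python
from collections import Counter

def build_vocabulary(documents, top_k=100, min_count=3):
    """documents: list of raw text strings. Returns the pruned vocabulary set."""
    counts = Counter(w for doc in documents for w in doc.split())
    most_frequent = {w for w, _ in counts.most_common(top_k)}   # e.g. "the", "of"
    return {w for w, c in counts.items() if c >= min_count and w not in most_frequent}
```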
APPLICATIONS
A newsgroup posting service that learns to assign documents to the appropriate newsgroup.
The NewsWeeder system – a program for reading netnews that allows the user to rate articles as he or she reads them. NewsWeeder then uses these rated articles (i.e., its learned profile of user interests) to suggest the most highly rated new articles each day.
Naive Bayes spam filtering using word-position-based attributes.

Thank you!

Bayesian Learning Networks Approach to Cybercrime Detection
N. S. Abouzakhar, A. Gani and G. Manson
The Centre for Mobile Communications Research (C4MCR), University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK
[email protected]  [email protected]  [email protected]
M. Abuitbel and D. King
The Manchester School of Engineering, University of Manchester, IT Building, Room IT 109, Oxford Road, Manchester M13 9PL, UK
[email protected]  [email protected]

REFERENCES
1. David J. Marchette: "Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint", Springer-Verlag, New York, 2001.
2. Heckerman, D. (1995): "A Tutorial on Learning with Bayesian Networks", Technical Report MSR-TR-95-06, Microsoft Corporation.
3. Michael Berthold and David J. Hand: "Intelligent Data Analysis: An Introduction", Springer, 1999.
4. http://www.ll.mit.edu/IST/ideval/data/data_index.html, accessed on 01/12/2002.
5. http://kdd.ics.uci.edu/, accessed on 01/12/2002.
6. Ian H. Witten and Eibe Frank: "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations", Morgan Kaufmann, 2000.
7. http://www.bayesia.com, accessed on 20/12/2002.

Motivation behind the paper
The growing dependence of modern society on telecommunication and information networks, and the increase in the number of networks interconnected to the Internet, have led to an increase in security threats and cyber crimes.

Structure of the paper
In order to detect distributed network attacks as early as possible, a probabilistic approach based on Bayesian networks, still under research and development, has been proposed.

Where can this model be utilized?
Learning agents which deploy the Bayesian network approach are considered a promising and useful tool in determining suspicious early events of Internet threats.
Before we look at the details given in the paper, let us understand what Bayesian networks are and how they are constructed.

Bayesian Networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
Syntax:
a set of nodes, one per variable
a directed, acyclic graph (a link ≈ "directly influences")
a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.

Some conventions
Variables are depicted as nodes, and arcs represent probabilistic dependence between variables. Conditional probabilities encode the strength of the dependencies; missing arcs imply conditional independence.

Semantics
The full joint distribution is defined as the product of the local conditional distributions:
P(X1, …, Xn) = Π (i = 1 to n) P(Xi | Parents(Xi))
e.g., P(j ∧ m ∧ a ∧ b ∧ e) = P(j|a) * P(m|a) * P(a|b,e) * P(b) * P(e)

Example of Construction of a BN
[Figure not reproduced here.]

Back to the discussion of the paper

Description
This paper shows how a Bayesian network probabilistically detects communication network attacks, allowing for generalization of network intrusion detection systems (NIDSs).

Goal
How well does the model detect or classify attacks and respond to them later on?
The system requires the estimation of two quantities: the probability of detection (PD) and the probability of false alarm (PFA). It is not possible to simultaneously achieve a PD of 1 and a PFA of 0.

Input Dataset
The 2000 DARPA Intrusion Detection Evaluation Program, prepared and managed by MIT Lincoln Labs, provided the necessary dataset.

Sample dataset
[Table not reproduced here.]

Construction of the network
The following figure shows the Bayesian network that has been automatically constructed by the learning algorithms of BayesiaLab. The target variable, activity_type, is directly connected to the variables that contribute most heavily to its knowledge, such as service and protocol_type.

Data Gathering
MIT Lincoln Labs set up an environment to acquire several weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. The generated raw dataset contains a few million connection records.

Mapping the simple Bayesian network that we saw to the one used in the paper

Observation 1:
As shown in the next figure, the most probable activity corresponds to a smurf attack (52.90%), an ecr_i (ECHO_REPLY) service (52.96%) and an icmp protocol (53.21%).

Observation 2:
What would happen if the probability of receiving ICMP protocol packets is increased? Would the probability of having a smurf attack increase? Setting the protocol to its ICMP value increases the probability of having a smurf attack from 52.90% to 99.37%.
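The "what if" reasoning in these observations is propagation: clamp an evidence variable to an observed value and recompute the posterior of the variable of interest. The paper's BayesiaLab network and its tables are not reproduced here, so the sketch below illustrates the same clamp-and-renormalize step on the sprinkler network from the earlier belief-network slides, using the same assumed CPTs:

```python
from itertools import product

p_c = {True: 0.5, False: 0.5}
p_s_given_c = {True: 0.1, False: 0.5}
p_r_given_c = {True: 0.8, False: 0.2}
p_w_given_sr = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.0}

def joint(c, s, r, w):
    """Chain-rule joint P(C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    ps = p_s_given_c[c] if s else 1 - p_s_given_c[c]
    pr = p_r_given_c[c] if r else 1 - p_r_given_c[c]
    pw = p_w_given_sr[(s, r)] if w else 1 - p_w_given_sr[(s, r)]
    return p_c[c] * ps * pr * pw

# Evidence: WetGrass = true. Query: P(Rain = true | WetGrass = true).
num = sum(joint(c, s, True, True) for c, s in product([True, False], repeat=2))
den = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))
print(den)        # P(W=true) ≈ 0.6471
print(num / den)  # P(R=true | W=true) ≈ 0.708 -- observing wet grass raises belief in rain
```

Setting evidence in the intrusion-detection network (e.g. protocol = ICMP, or activity_type = portsweep) updates the posteriors of the connected variables in exactly this way, just over a larger graph.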
Observation 3:
Let us look at the problem from the opposite direction. If we set the probability of a portsweep attack to 100%, then the values of some associated variables inevitably vary. We note from Figure 4 that the probabilities of the TCP protocol and the private service have increased from 38.10% to 97.49% and from 24.71% to 71.45% respectively. We can also notice an increase in the REJ and RSTR flags.

How do the previous examples work? PROPAGATION
[Figure: data/evidence propagating through the network; not reproduced here.]

Benefits of the Bayesian Model
The benefit of using Bayesian IDSs is the ability to adjust the IDS's sensitivity, which allows a trade-off between accuracy and sensitivity.
Furthermore, the automatic detection of network anomalies by learning allows distinguishing normal activities from abnormal ones.
The model also allows network security analysts to see the amount of information contributed by each variable in the detection model to the knowledge of the target node.

Performance evaluation

QUESTIONS OR QUERIES

Thank you!