I529: Machine Learning in Bioinformatics (Spring 2013)
Probabilistic models
Yuzhen Ye, School of Informatics and Computing, Indiana University, Bloomington
Spring 2013

Definitions
- A model is a system that simulates the object under consideration.
- A probabilistic model is one that produces different outcomes with different probabilities (BSA).

Why probabilistic models
- The biological system being analyzed may be stochastic, or noisy, or completely deterministic; but because a number of hidden variables affecting its behavior are unknown, the observed data may be best explained by a probabilistic model.

Probability
- Experiment: a procedure involving chance that leads to different results.
- Outcome: the result of a single trial of an experiment.
- Event: one or more outcomes of an experiment.
- Probability: the measure of how likely an event is, between 0 (will not occur) and 1 (will occur).
- P(A) = (the number of ways event A can occur) / (the total number of possible outcomes).

Example: a fair 6-sided dice
- Outcomes: the possible outcomes of this experiment are 1, 2, 3, 4, 5 and 6.
- Events: "1"; "6"; "even".
- Probability: all outcomes are equally likely to occur, so P(1) = P(6) = 1/6 and P(even) = 3/6 = 1/2.

Random variable
- Random variables are functions that assign a unique number to each possible outcome of an experiment.
- Example: tossing a coin. Outcome space: {heads, tails}. Define X = 1 if heads, X = 0 if tails.
- More exactly, X is a discrete random variable with P(X=1) = 1/2 and P(X=0) = 1/2.

Probability distribution
- A probability distribution is the assignment of a probability P(x) to each outcome x.
- Probability mass function (pmf): a function that gives the probability that a discrete random variable is exactly equal to some value; it is often the primary means of defining a discrete probability distribution.
- For the coin example, the pmf is P(X) = 1/2 for heads, 1/2 for tails, and 0 otherwise.
- A fair dice: outcomes are equally likely to occur, so the distribution over all six outcomes is P(x) = 1/6 for x = 1, 2, ..., 6.
- A loaded dice: outcomes are unequally likely to occur, so the distribution is P(x) = f(x) for x = 1, 2, ..., 6, with ∑ f(x) = 1.

The probability of a DNA sequence
- Event: observing a DNA sequence S = s1 s2 ... sn, with si ∈ {A, C, G, T}.
- Random sequence model (the independent and identically distributed, i.i.d., model): si occurs at random with probability P(si), independent of all other residues in the sequence, so
  P(S) = ∏_{i=1}^{n} P(si).
- This model will be used as a background model (also called a null hypothesis).
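To make the i.i.d. background model concrete, here is a minimal Python sketch; the uniform base frequencies and the example sequence are illustrative assumptions, not values from the slides.

```python
import math

# Hypothetical background base frequencies (illustrative values, not from the slides)
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def iid_log_probability(sequence, probs):
    """Log-probability of a DNA sequence under the i.i.d. background model:
    log P(S) = sum_i log P(s_i)."""
    return sum(math.log(probs[base]) for base in sequence)

seq = "ACGTACGT"
log_p = iid_log_probability(seq, background)
print(f"P(S) = {math.exp(log_p):.3e}")   # (1/4)^8 for the uniform background
```

Working in log space avoids numerical underflow for long sequences, which is why the sketch sums log-probabilities rather than multiplying the P(si) directly.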
Joint probability
- Two experiments (two random variables) X and Y.
- P(X,Y): the joint probability (distribution) of X and Y.
- P(X,Y) = P(X|Y) P(Y) = P(Y|X) P(X).
- If P(X|Y) = P(X), then X and Y are independent.
- Example: experiment 1 (selecting a dice), experiment 2 (rolling the selected dice):
  - P(y): y = D1 or D2;
  - P(i, D1) = P(i|D1) P(D1);
  - P(i|D1) = P(i|D2) if the two experiments are independent events.

Probability density function (pdf)
- Probability density functions are for continuous rather than discrete random variables: f(x).
- A pdf must be integrated over an interval to yield a probability, since P(X = a) = 0 for any single value a:
  P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
- Cumulative distribution function (cdf):
  P(X ≤ x) = ∫_{-∞}^{x} f(t) dt.

Marginal probability
- The distribution of the marginal variables (the marginal distribution) is obtained by marginalizing over the distribution of the variables being discarded (the discarded variables are marginalized out).
- Discrete case: P(X) = ∑_Y P(X|Y) P(Y); continuous case: P(x) = ∫ P(x, y) dy.
- Example: experiment 1 (selecting a dice), experiment 2 (rolling the selected dice):
  - P(y): y = D1 or D2;
  - P(i) = P(i|D1) P(D1) + P(i|D2) P(D2);
  - if P(i|D1) = P(i|D2) (independent events), then P(i) = P(i|D1) (P(D1) + P(D2)) = P(i|D1).

Conditional probability
- Conditioning the joint distribution on a particular observation.
- Conditional probability P(X|Y): the measure of how likely an event X happens under the condition Y:
  P(x|y) ≡ P(x,y) / P(y) = P(x,y) / ∫ P(x,y) dx.
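As a small illustration of marginalizing and conditioning, the sketch below works through the dice-selection example in Python; the selection probabilities and the loaded-dice values are assumptions made for the example.

```python
# Marginal and conditional probability for the dice-selection example
# (the dice and selection probabilities below are illustrative assumptions).
p_dice = {"D1": 0.5, "D2": 0.5}                       # P(y): which dice is selected
p_roll = {
    "D1": {i: 1/6 for i in range(1, 7)},              # fair dice
    "D2": {6: 0.5, **{i: 0.1 for i in range(1, 6)}},  # loaded dice
}

# Marginal: P(i) = sum_y P(i|y) P(y)
p_i = {i: sum(p_roll[y][i] * p_dice[y] for y in p_dice) for i in range(1, 7)}

# Conditional (Bayes): P(D2 | i=6) = P(6|D2) P(D2) / P(6)
p_d2_given_6 = p_roll["D2"][6] * p_dice["D2"] / p_i[6]
print(p_i[6], p_d2_given_6)   # 0.333..., 0.75
```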
Probability models
- A probabilistic model is a system that produces different outcomes with different probabilities.
- It can simulate a class of objects (events), assigning each an associated probability.
- Simple objects (processes) can be modeled with standard probability distributions.
- Example: two dices D1 and D2:
  - P(i|D1): the probability of picking i using dice D1;
  - P(i|D2): the probability of picking i using dice D2.

Typical probability distributions
- Binomial distribution
- Gaussian distribution
- Multinomial distribution
- Poisson distribution
- Dirichlet distribution

Binomial distribution
- An experiment with binary outcomes: 0 or 1.
- Probability distribution of a single experiment: P('1') = p and P('0') = 1 - p.
- Probability distribution of N tries of the same experiment:
  Bi(k '1's out of N tries) = C(N, k) p^k (1 - p)^(N - k).
- More generally, the binomial distribution gives the discrete probability of obtaining exactly n successes out of N Bernoulli trials, where each trial is true with probability p and false with probability q = 1 - p:
  P(n|N) = C(N, n) p^n q^(N - n).
- The probability of obtaining more successes than the n observed in a binomial distribution is
  P = ∑_{k=n+1}^{N} C(N, k) p^k q^(N - k).

Gaussian distribution
- When N → ∞, the binomial distribution approaches the Gaussian distribution.
- The Gaussian (or normal) distribution is a continuous probability distribution with a bell-shaped probability density function:
  f(x; μ, σ²) = (1 / √(2πσ²)) exp(-(x - μ)² / (2σ²)).
- μ is the mean (expectation); σ² is the variance (σ is the standard deviation).
- If we define a new variable u = (x - μ)/σ, we have f(u) = (1/√(2π)) exp(-u²/2), the standard normal distribution (μ = 0 and σ² = 1). (Figure from Wikipedia.)

Multinomial distribution
- An experiment with K independent outcomes, occurring with probabilities θi, i = 1, ..., K, ∑ θi = 1.
- Probability distribution of N tries of the same experiment, getting ni occurrences of outcome i (∑ ni = N, n = {ni}):
  P(n|θ) = M^{-1}(n) ∏_{i=1}^{K} θi^{ni},  where  M(n) = ∏_i ni! / (∑_k nk)! = n1! n2! ··· nK! / (∑_k nk)!.
- Example 1: a fair dice, where the outcomes 1, 2, ..., 6 are equally likely. The probability of rolling it a dozen times (12) and getting each outcome twice is
  12!/(2!)^6 × (1/6)^12 ≈ 3.4×10^-3.
- Example 2: a loaded dice, where the outcomes are unequally likely: P(6) = 0.5, P(1) = P(2) = ... = P(5) = 0.1. The probability of rolling it a dozen times and getting each outcome twice is
  12!/(2!)^6 × (0.5)^2 × (0.1)^10 ≈ 1.87×10^-4.

Poisson distribution
- The Poisson distribution gives the probability of seeing n events over some interval when events occur with a known average rate λ and independently of the time since the last event (i.e., it is a Poisson process):
  p(n) = λ^n e^{-λ} / n!.
- Example: I receive 10 emails every day on average; what is my chance of receiving no email today?
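A quick numerical check of the multinomial dice examples and the Poisson email example, sketched with scipy.stats (assuming SciPy is available):

```python
from scipy.stats import poisson, multinomial

# Poisson: 10 emails/day on average, chance of receiving none today
print(poisson.pmf(0, mu=10))                                  # e^{-10} ≈ 4.5e-05

# Multinomial: roll a dice 12 times and get each outcome exactly twice
fair   = multinomial.pmf([2] * 6, n=12, p=[1/6] * 6)          # ≈ 3.4e-03
loaded = multinomial.pmf([2] * 6, n=12, p=[0.1] * 5 + [0.5])  # ≈ 1.87e-04
print(fair, loaded)
```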
Poisson distribution for sequencing coverage modeling
- Assuming a uniform distribution of reads along the genome:
  - length of the genomic segment: L;
  - number of reads: n;
  - length of each read: l;
  - coverage: λ = n·l / L.
- How much coverage is enough (or what is sufficient oversampling)?
- Lander-Waterman model: the number of reads covering a position is Poisson distributed,
  P(x) = λ^x e^{-λ} / x!,  so the probability that a position is not covered is P(x=0) = e^{-λ}, where λ is the coverage.
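A minimal sketch of the coverage calculation; the segment length, read count, and read length below are illustrative numbers, not data from the slides.

```python
import math

# Lander-Waterman sketch: expected fraction of the genome left uncovered
L = 5_000_000      # length of genomic segment (bp)
n = 500_000        # number of reads
l = 100            # read length (bp)

coverage = n * l / L                   # lambda = n*l / L
p_uncovered = math.exp(-coverage)      # P(x = 0) = e^{-lambda}
print(f"coverage = {coverage:.1f}x, uncovered fraction ≈ {p_uncovered:.2e}")
```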
Gamma distribution
- The gamma distribution has been used to model the rate of evolution at different sites in DNA sequences.
- The gamma distribution is conjugate to the Poisson, which gives the probability of seeing n events over some interval when there is a probability p of an individual event occurring in that interval. Since the number of events in an interval is a rate, the gamma distribution is appropriate for modeling probabilities of rates:
  g(x; α, λ) = λ e^{-λx} (λx)^(α-1) / Γ(α),  for 0 < x, α, λ < ∞.

Dirichlet distribution
- In Bayesian statistics, we need distributions over probability parameters to use as prior distributions. A natural choice for a density over probabilities is the Dirichlet distribution:
  D(θ|α) = Z^{-1}(α) ∏_{i=1}^{K} θi^(αi - 1) δ(∑_{i=1}^{K} θi - 1),
  where α = (α1, α2, ..., αK), with αi > 0, are constants specifying the Dirichlet distribution, and the θi satisfy 0 ≤ θi ≤ 1 and sum to 1.
- Outcomes: θ = (θ1, θ2, ..., θK).
- The normalizing factor Z for the Dirichlet can be expressed in terms of the gamma function:
  Z(α) = ∫ ∏_{i=1}^{K} θi^(αi - 1) δ(∑_i θi - 1) dθ = ∏_i Γ(αi) / Γ(∑_i αi).
- Different α = (α1, α2, ..., αK) give different probability distributions over θ.
- For K = 2, the Dirichlet reduces to the Beta distribution.
- Example: dice factories. Dice factories produce all kinds of dices θ = (θ(1), θ(2), ..., θ(6)). A dice factory distinguishes itself from the others by its parameters α = (α1, α2, α3, α4, α5, α6). The probability of producing a dice θ in the factory α is given by D(θ|α).
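To see how different α shape the distribution over dices, here is a sketch that draws one dice from each of three hypothetical factories; the α values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Dice factory" picture: each factory is a Dirichlet with its own alpha,
# and each sample is one dice theta = (theta_1, ..., theta_6).
factory_uniform = np.ones(6)                      # alpha = (1,...,1): all dices equally likely
factory_fairish = 50 * np.ones(6)                 # large, equal alpha: dices concentrated near fair
factory_sixes   = np.array([1, 1, 1, 1, 1, 10])   # biased toward dices loaded on 6

for alpha in (factory_uniform, factory_fairish, factory_sixes):
    theta = rng.dirichlet(alpha)                  # one dice produced by this factory
    print(np.round(theta, 3))
```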
Probabilistic models and parameter estimation
- Selecting a model:
  - a model can be anything from a simple distribution to a complex stochastic grammar with many implicit probability distributions;
  - probability distributions (Gaussian, binomial, etc.);
  - probabilistic graphical models: Markov models, hidden Markov models (HMMs), Bayesian models, stochastic grammars.
- Estimating the model parameters (learning) from large sets of trusted examples:
  - the parameters of the model have to be inferred from the data;
  - MLE (maximum likelihood estimation) and MAP (maximum a posteriori probability).
- Data → model: learning. Model → data: inference/sampling.

MLE
- Given a set of data D (the training set), find a model with parameters θ that maximizes the likelihood P(D|θ):
  θ_MLE = argmax_θ P(D|θ).
- Example: a loaded dice. To estimate the parameters θ1, θ2, ..., θ6 from N observations D = d1, d2, ..., dN, take θi = ni / N, where ni is the number of occurrences of outcome i (the observed frequencies); this is the maximum likelihood solution (BSA 11.5). We can show that P(n|θ_MLE) > P(n|θ) for any θ ≠ θ_MLE.

When to use MLE
- A drawback of MLE is that it can give poor estimates when the data are scarce.
  - E.g., if you flip a coin twice, you may only get heads; then P(tails) = 0.
- It may be wiser to apply prior knowledge (e.g., we may assume P(tails) is close to 0.5): use MAP instead.

MAP
- Here the prior is used (as in Bayesian statistics): P(θ) is the prior probability and P(θ|D) is the posterior probability,
  P(θ|D) = P(D|θ) P(θ) / P(D) = P(D|θ) P(θ) / ∑_θ P(D|θ) P(θ).
- The MAP estimate is
  θ_MAP = argmax_θ P(θ|D) = argmax_θ P(D|θ) P(θ) / P(D) = argmax_θ P(D|θ) P(θ).

Example: two dices
- Prior probabilities: fair dice 0.99; loaded dice 0.01. Loaded dice: P(6) = 0.5, P(1) = ... = P(5) = 0.1.
- Data: 3 consecutive 6's.
  - P(loaded | 3 '6's) = P(loaded) × [P(3 '6's | loaded) / P(3 '6's)] = 0.01 × (0.5^3 / C);
  - P(fair | 3 '6's) = P(fair) × [P(3 '6's | fair) / P(3 '6's)] = 0.99 × ((1/6)^3 / C).
- Model comparison by the ratio: P(loaded | 3 '6's) / P(fair | 3 '6's) < 1, so the fair dice is more likely to have generated the observation.

Learning from counts: including the prior
- Use prior knowledge when the data are scarce. For the multinomial distribution, we can use the Dirichlet distribution with parameters α as the prior.
- Posterior for the multinomial distribution with observations n:
  P(θ|n) = P(n|θ) D(θ|α) / P(n).
- The posterior is itself a Dirichlet distribution like the prior, but with different parameters. Inserting the multinomial distribution and the expression for D(θ|α) yields
  P(θ|n) = [1 / (P(n) Z(α) M(n))] ∏_i θi^(ni + αi - 1) δ(∑_i θi - 1).
- We do not have to get involved with gamma functions to finish the calculation, because we know that both P(θ|n) and D(θ|n+α) are properly normalized probability distributions over θ. This means that all the prefactors must cancel, so
  P(θ|n) = D(θ|n + α),  and hence  P(n) = Z(n + α) / (Z(α) M(n)).
- We then perform an integral to find the posterior mean estimator:
  θi^PME = ∫ θi D(θ|n + α) dθ = Z(n + α + δi) / Z(n + α) = (ni + αi) / (N + A),  where A = ∑_i αi.
- This is equivalent to adding αi as pseudo-counts to the observations ni (BSA 11.5).
- We can forget about statistics and use pseudo-counts in the parameter estimation!
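The following sketch contrasts the MLE (observed frequencies) with the posterior mean (pseudo-count) estimate for a dice; the counts and the uniform α are made-up illustration values.

```python
import numpy as np

# MLE vs. posterior-mean (pseudo-count) estimates for a dice,
# with made-up counts from a small number of rolls
counts = np.array([1, 0, 2, 0, 0, 3])   # n_i from N = 6 rolls
alpha  = np.ones(6)                      # Dirichlet pseudo-counts (uniform prior)

theta_mle = counts / counts.sum()                       # theta_i = n_i / N (zeros for unseen faces)
theta_pme = (counts + alpha) / (counts + alpha).sum()   # theta_i = (n_i + alpha_i) / (N + A)

print(theta_mle)   # [0.167 0.    0.333 0.    0.    0.5  ]
print(theta_pme)   # [0.167 0.083 0.25  0.083 0.083 0.333]
```

Note how the pseudo-counts keep unseen faces from being assigned zero probability, which is exactly the scarce-data problem MAP-style estimation is meant to address.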
Sampling
- Consider a probabilistic model with parameters θ: P(x|θ) for event x.
- Sampling: generate a large set of events xi with probability P(xi|θ).
- A random number generator (a function such as rand()) picks a number randomly from the interval [0,1) with uniform density.
- Sampling from a probabilistic model amounts to transforming P(xi|θ) into a uniform distribution: for a finite set X (xi ∈ X), find i such that
  P(x1) + ... + P(x_{i-1}) < rand(0,1) < P(x1) + ... + P(x_{i-1}) + P(xi).

Entropy
- For a probability distribution P(xi) over K events:
  H(X) = -∑_i P(xi) log P(xi).
- H is maximized for the uniform distribution P(xi) = 1/K.
- Entropy is a measure of average uncertainty.

Mutual information
- A measure of the dependence of two random variables X and Y.
- If P(X|Y) = P(X), then X and Y are independent, i.e., P(X,Y) / (P(X)P(Y)) = 1.
- M(X;Y) = ∑_{x,y} P(x,y) log [P(x,y) / (P(x)P(y))].
- M(X;Y) = 0 if X and Y are independent.
- In frequency notation, MI_ij = ∑_{x,y} f_{xi,yj} log2 [f_{xi,yj} / (f_{xi} f_{yj})]; for example,
  MI_ij = (2/5) log2[(2/5)/(1 × 2/5)] + (3/5) log2[(3/5)/(1 × 3/5)] = 0.
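A short sketch tying these last slides together: sampling from a discrete distribution by the cumulative-sum method described above, then computing entropy and mutual information; the loaded-dice probabilities and the joint table are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_discrete(p, n):
    """Sample n outcomes from a discrete distribution p by the cumulative-sum
    method: find i such that P(x1)+...+P(x_{i-1}) <= u < P(x1)+...+P(x_i)."""
    cdf = np.cumsum(p)
    u = rng.random(n)
    return np.searchsorted(cdf, u, side="right")

def entropy(p):
    """H = -sum_i p_i log2 p_i (terms with p_i = 0 contribute 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(pxy):
    """M(X;Y) = sum_{x,y} P(x,y) log2[P(x,y) / (P(x)P(y))] for a joint table."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz]))

loaded = [0.1] * 5 + [0.5]
draws = sample_discrete(loaded, 10_000)            # indices 0..5 stand for faces 1..6
print(entropy([1/6] * 6), entropy(loaded))         # uniform maximizes H: log2(6) ≈ 2.585
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))   # independent -> 0
```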