Download Slides - UTSA CS
CS5263 Bioinformatics
Lecture 9: Motif finding
Biological & Statistical background

Roadmap
• Review of last lecture
• Intro to probability and statistics
• Intro to motif finding problems
  – Biological background

Multiple Sequence Alignment

Scoring functions
• Ideally:
  – Maximizes the probability that the sequences evolved from a common ancestor
  [Figure: tree relating sequences x, y, z, w, v to an unknown common ancestor]
• In practice:
  – Sum of Pairs
  Multiple alignment:
    x: AC-GCGG-C
    y: AC-GC-GAG
    z: GCCGC-GAG
  Induced pairwise alignments:
    x: ACGCGG-C     x: AC-GCGG-C     y: AC-GCGAG
    y: ACGC-GAG     z: GCCGC-GAG     z: GCCGCGAG

Algorithms
• MDP
• Progressive alignment
• Iterative refinement
• Restricted DP

MDP
• Similar to pair-wise alignment
  – O(2^N L^N) running time
  – O(L^N) memory
• Recurrence for three sequences:
  F(i,j,k) = max of
    F(i-1,j-1,k-1) + S(x_i, x_j, x_k),
    F(i-1,j-1,k  ) + S(x_i, x_j, -),
    F(i-1,j  ,k-1) + S(x_i, -, x_k),
    F(i  ,j-1,k-1) + S(-, x_j, x_k),
    F(i-1,j  ,k  ) + S(x_i, -, -),
    F(i  ,j-1,k  ) + S(-, x_j, -),
    F(i  ,j  ,k-1) + S(-, -, x_k)
  [Figure: cell (i,j,k) and its seven predecessor cells (i-1,j-1,k-1), (i-1,j-1,k), (i-1,j,k-1), (i,j-1,k-1), (i-1,j,k), (i,j-1,k), (i,j,k-1)]

Progressive alignment
• Most popular multiple alignment algorithm
  – CLUSTALW
• Main idea:
  – Construct a guide tree based on pair-wise alignment scores
  – Align the most similar sequences first
  – Progressively add other sequences
• Pros: fast (O(N L^2))
• Cons: initial bad alignment is frozen

Iterative Refinement
• Basic idea:
  – Do progressive alignment first
  – Iteratively:
    • Remove a sequence, and realign it while keeping the rest fixed
• A note on its convergence guarantee
  – Every time we realign a sequence, the alignment score improves or stays the same
  – Therefore, the algorithm must converge to either a global or a local maximum

Restricted MDP
• Similar to bounded DP in pair-wise alignment
1. Construct progressive multiple alignment m
2. Run MDP, restricted to radius R from m
[Figure: band of radius R around m in the DP space with axes x, y, z]
Running time: O(2^N R^(N-1) L)

Today
• Probability and statistics
• Biology background for motif finding

Probability Basics
• Definition (informal)
  – Probabilities are numbers assigned to events that indicate "how likely" it is that the event will occur when a random experiment is performed
  – A probability law for a random experiment is a rule that assigns probabilities to the events in the experiment
  – The sample space S of a random experiment is the set of all possible outcomes

Example
• 0 ≤ P(A_i) ≤ 1
• P(S) = 1

Random variable
• A random variable is a function from the sample space to the space of possible values of the variable
  – When we toss a coin, the number of times that we see heads is a random variable
  – Can be discrete or continuous
    • The resulting number after rolling a die
    • The weight of an individual

Cumulative distribution function (cdf)
• The cumulative distribution function F_X(x) of a random variable X is defined as the probability of the event {X ≤ x}
  F_X(x) = P(X ≤ x) for −∞ < x < +∞

Probability density function (pdf)
• The probability density function of a continuous random variable X, if it exists, is defined as the derivative of F_X(x):
  f_X(x) = dF_X(x) / dx
• For discrete random variables, the equivalent to the pdf is the probability mass function (pmf):
  f_X(x) = P(X = x)

Probability density function vs probability
• What is the probability for somebody weighing 200lb?
• The figure shows about 0.62, but that is a density value, not a probability
  – What is the probability of 200.00001lb?
• The right question would be:
  – What's the probability for somebody weighing 199-201lb?
• The probability mass function gives a true probability
  – For a fair die, the chance to get any face is 1/6

Some common distributions
• Discrete:
  – Binomial
  – Multinomial
  – Geometric
  – Hypergeometric
  – Poisson
• Continuous:
  – Normal (Gaussian)
  – Uniform
  – Extreme value distribution (EVD)
  – Gamma
  – Beta
  – …

Probabilistic Calculus
• If A, B are mutually exclusive:
  – P(A ∪ B) = P(A) + P(B)
• Thus: P(not(A)) = P(A^c) = 1 - P(A)
[Figure: Venn diagram of events A and B]

Probabilistic Calculus
• In general: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Conditional probability
• The joint probability of two events A and B, P(A ∩ B) or simply P(A, B), is the probability that events A and B both occur.
• The conditional probability P(A | B) is the probability that A occurs given that B occurred.
  P(A | B) = P(A ∩ B) / P(B)

Example
• Roll a die
  – If I tell you the number is less than 4
  – What is the probability of an even number?
• P(d = even | d < 4) = P(d = even ∩ d < 4) / P(d < 4)
  = P(d = 2) / P(d = 1, 2, or 3) = (1/6) / (3/6) = 1/3

Independence
• P(A | B) = P(A ∩ B) / P(B) => P(A ∩ B) = P(B) * P(A | B)
• A, B are independent iff
  – P(A ∩ B) = P(A) * P(B)
  – That is, P(A) = P(A | B)
• Also implies that P(B) = P(B | A)
  – P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A)

Examples
• Are the events d = even and d < 4 independent?
  – P(d = even and d < 4) = 1/6
  – P(d = even) = 1/2
  – P(d < 4) = 1/2
  – 1/2 * 1/2 = 1/4 > 1/6, so they are not independent
• If your die actually has 8 faces, will the events d = even and d < 5 be independent?
• Are "even in first roll" and "even in second roll" independent?
• For a playing card, are the suit and rank independent?

Theorem of total probability
• Let B1, B2, …, BN be mutually exclusive events whose union equals the sample space S. We refer to these sets as a partition of S.
• An event A can be represented as:
  A = A ∩ S = (A ∩ B1) ∪ (A ∩ B2) ∪ … ∪ (A ∩ BN)
• Since B1, B2, …, BN are mutually exclusive,
  P(A) = P(A ∩ B1) + P(A ∩ B2) + … + P(A ∩ BN)
• And therefore
  P(A) = P(A | B1) * P(B1) + P(A | B2) * P(B2) + … + P(A | BN) * P(BN)
       = Σ_i P(A | Bi) * P(Bi)

Example
• Roll a loaded die: 6 comes up 50% of the time, and each of 1 to 5 comes up 10% of the time
• What's the probability to have an even number?
  Prob(even) = Prob(even | d < 6) * Prob(d < 6) + Prob(even | d = 6) * Prob(d = 6)
             = 2/5 * 0.5 + 1 * 0.5 = 0.7

Another example
• We have a box of dice: 99% of them are fair, with probability 1/6 for each face; 1% are loaded so that six comes up 50% of the time. We pick a die at random and roll it. What's the probability we'll have a six?
• P(six) = P(six | fair) * P(fair) + P(six | loaded) * P(loaded)
  – 1/6 * 0.99 + 0.5 * 0.01 = 0.17 > 1/6

Bayes theorem
• P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A)
  => P(B | A) = P(A | B) * P(B) / P(A)
  – P(B | A): posterior probability of B
  – P(A | B): likelihood
  – P(B): prior of B
  – P(A): normalizing constant
• This is known as Bayes Theorem or Bayes Rule, and is (one of) the most useful relations in probability and statistics
• Bayes Theorem is definitely the fundamental relation in Statistical Pattern Recognition

Bayes theorem (cont'd)
• Given B1, B2, …, BN, a partition of the sample space S. Suppose that event A occurs; what is the probability of event Bj?
• P(Bj | A) = P(A | Bj) * P(Bj) / P(A)
            = P(A | Bj) * P(Bj) / Σ_k P(A | Bk) * P(Bk)
• Bj: different models
• Given the observation A, should you choose the model that maximizes P(Bj | A) or P(A | Bj)?
  – It depends on how much you know about Bj!

Example
• Prosecutor's fallacy
  – Some crime happened
  – The perpetrator did not leave any evidence, except some hair
  – The police got DNA from the hair
• Some expert matched the DNA with that of a suspect
  – The expert said that both the false-positive and false-negative rates are 10^-6
• Can this be used as evidence of guilt against the suspect?

Prosecutor's fallacy
• Prob (match | innocent) = 10^-6
• Prob (no match | guilty) = 10^-6
• Prob (match | guilty) = 1 - 10^-6 ~ 1
• Prob (no match | innocent) = 1 - 10^-6 ~ 1
• Prob (guilty | match) = ?

Prosecutor's fallacy
  P(g | m) = P(m | g) * P(g) / P(m) ~ P(g) / P(m)
• P(g): the probability for someone to be guilty with no other evidence
• P(m): the probability for a DNA match
• How to get these two numbers?
  – We don't really care about P(m)
  – We want to compare two models:
    • P(g | m) and P(i | m)

Prosecutor's fallacy
• P(i | m) = P(m | i) * P(i) / P(m) = 10^-6 * P(i) / P(m)
• Therefore P(i | m) / P(g | m) = 10^-6 * P(i) / P(g)
• P(i) + P(g) = 1
• It is clear, therefore, that whether we can conclude the suspect is guilty depends on the prior probability P(i)
• How do you get P(i)?

Prosecutor's fallacy
• How do you get P(i)?
• It depends on what other information you have on the suspect
• Say the suspect has no other connection with the crime, and the overall crime rate is 10^-7
• That's a reasonable prior for P(g)
• P(g) = 10^-7, P(i) ~ 1
• P(i | m) / P(g | m) = 10^-6 * P(i) / P(g) = 10^-6 / 10^-7 = 10

• P(observation | model1) / P(observation | model2): likelihood-ratio (LR) test
• Often take the logarithm: log (P(m | i) / P(m | g))
  – Log likelihood ratio (LLR) score
  – Or log odds ratio (score)
• Bayesian model selection:
  log (P(model1 | observation) / P(model2 | observation)) = LLR + log P(model1) - log P(model2)

Prosecutor's fallacy
• P(i | m) / P(g | m) = 10^-6 / 10^-7 = 10
• Therefore, we would say the suspect is more likely to be innocent than guilty, given only the DNA samples
• We can also explicitly calculate P(i | m):
  P(m) = P(m | i) * P(i) + P(m | g) * P(g) = 10^-6 * 1 + 1 * 10^-7 = 1.1 x 10^-6
  P(i | m) = P(m | i) * P(i) / P(m) = 1 / 1.1 = 0.91

Prosecutor's fallacy
• If you have other evidence, P(g) could be much larger than the average crime rate
• In that case, the DNA test may give you higher confidence
• How to decide the prior?
  – Subjective?
  – Important?
  – There have historically been debates about Bayesian statistics
  – Some strongly support it, some are strongly against it
  – Growing interest in many fields
• However, there is no question about conditional probability
• If all priors are equally probable, decisions based on Bayes inference and on the likelihood test are equivalent
• We use whichever is appropriate

Another example
• A test for a rare disease claims that it will report a positive result for 99.5% of people with the disease, and a negative result for 99.9% of those without.
• The disease is present in the population at 1 in 100,000
• What is P(disease | positive test)?
• What is P(disease | negative test)?

Yet another example
• We've talked about the casino's box of dice
• 99% fair, 1% loaded (50% at six)
• We said that if we randomly pick a die and roll it, we have a 17% chance to get a six
• If we get 3 sixes in a row, what's the chance that the die is loaded?
• How about 5 sixes in a row?
• P(loaded | 3 sixes in a row) = P(3 sixes in a row | loaded) * P(loaded) / P(3 sixes in a row)
  = 0.5^3 * 0.01 / (0.5^3 * 0.01 + (1/6)^3 * 0.99) = 0.21
• P(loaded | 5 sixes in a row) = P(5 sixes in a row | loaded) * P(loaded) / P(5 sixes in a row)
  = 0.5^5 * 0.01 / (0.5^5 * 0.01 + (1/6)^5 * 0.99) = 0.71

Relation to multiple testing problem
• When searching a DNA sequence against a database, you get a high score, with a significant p-value
• P(unrelated | high score) / P(related | high score)
  = [P(high score | unrelated) * P(unrelated)] / [P(high score | related) * P(related)]
  – P(high score | unrelated) / P(high score | related) is the likelihood ratio
• P(high score | unrelated) is much smaller than P(high score | related)
• But your database is huge, and most sequences should be unrelated, so P(unrelated) is much larger than P(related)

Question
• We've seen that given a sequence of observations, and two models, we can test which model is more likely to generate the data
  – Is the die loaded or fair?
  – Either likelihood test or Bayes inference
• Given a set of observations, and a model, can you estimate the parameters?
  – Given the results of rolling a die, how do you infer the probability of each face?

Question
• You are told that there are two dice: one is loaded, with a 50% chance of a six, and one is fair.
• You are given a series of numbers resulting from rolling the two dice
• Assume die switching is rare
• Can you tell which number is generated by which die?

Question
• You are told that there are two dice: one is loaded, one is fair. But you don't know how it is loaded
• You are given a series of numbers resulting from rolling the two dice
• Assume die switching is rare
• Can you tell how the die is loaded and which number is generated by which die?
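
Numerical check (not part of the original slides)
The Bayes-rule examples earlier in this lecture (the box of dice with 99% fair and 1% loaded, and the prosecutor's fallacy) can be checked with a few lines of Python. This is only a minimal sketch: the function names posterior and p_loaded_given_sixes are hypothetical helpers introduced here for illustration, and the code assumes exactly two exhaustive hypotheses, so the normalizing constant comes from the theorem of total probability.

```python
def posterior(prior_h, lik_h, lik_alt):
    """Return P(H | data) for two exhaustive hypotheses H and not-H.

    prior_h : P(H)
    lik_h   : P(data | H)
    lik_alt : P(data | not-H)
    (Hypothetical helper for illustration, not from the slides.)
    """
    # Normalizing constant P(data), by the theorem of total probability.
    evidence = lik_h * prior_h + lik_alt * (1.0 - prior_h)
    return lik_h * prior_h / evidence


def p_loaded_given_sixes(n, prior_loaded=0.01, p_six_loaded=0.5, p_six_fair=1 / 6):
    """P(loaded | n sixes in a row), assuming independent rolls of one die."""
    return posterior(prior_loaded, p_six_loaded ** n, p_six_fair ** n)


if __name__ == "__main__":
    # Box of dice: 1% loaded, loaded die shows six half of the time.
    for n in (3, 5):
        print(n, "sixes in a row:", round(p_loaded_given_sixes(n), 2))

    # Prosecutor's fallacy: P(g) = 1e-7, P(m | i) = 1e-6, P(m | g) = 1 - 1e-6.
    p_innocent = posterior(prior_h=1 - 1e-7, lik_h=1e-6, lik_alt=1 - 1e-6)
    print("P(innocent | match):", round(p_innocent, 2))
```

Under these assumptions the script reproduces the values quoted in the slides: about 0.21 and 0.71 for three and five sixes in a row, and about 0.91 for P(i | m). Changing prior_loaded or prior_h shows how strongly the conclusion depends on the prior.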