Download ppt

Tutorial #3 by Ma’ayan Fishelson This presentation has been cut and slightly edited from Nir Friedman’s full course of 12 lectures which is available at www.cs.huji.ac.il/~pmai. Changes made by Dan Geiger, Ydo Wexler and Ma’ayan Fishelson. Example: Binomial Experiment Head Tail When tossed, it can land in one of two positions: Head or Tail. We denote by  the (unknown) probability P(H). Estimation task: Given a sequence of toss samples x[1], x[2],…,x[M] we want to estimate the probabilities P(H) =  and P(T) = 1 - . Statistical Parameter Fitting Consider instances x[1], x[2], …, x[M] such that: – The set of values x can take is known. – Each one is sampled from the same distribution. i.i.d. – Each one is sampled independently of the rest. samples The task is to find the vector of parameters  that have generated the given data. This parameter vector  can be used to predict future data. The Likelihood Function How good is a particular  ? It depends on how likely it is to generate the observed data: LD ( )  P( D |  )   P( x[m] |  ) m L() The likelihood for the sequence H,T, T, H, H is: LD ( )    (1  )  (1  )   0 0.2 0.4  0.6 0.8 1 Sufficient Statistics To compute the likelihood in the thumbtack example we only require NH and NT (the number of heads and the number of tails). LD ( )   NH  (1   ) NT NH and NT are sufficient statistics for the binomial distribution. Sufficient Statistics A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood. Formally, s(D) is a sufficient statistics if for any two datasets D and D’: s(D) = s(D’ )  LD() = LD’() Datasets Statistics Maximum Likelihood Estimation MLE Principle: Choose parameters that maximize the likelihood function • One of the most commonly used estimators in statistics. • Intuitively appealing. • One usually maximizes the log-likelihood function defined as lD() = logeLD(). Example: MLE in Binomial Data Applying the MLE principle we get NH NT  lD    N H log   NT log 1     1 NH ˆ   N H  NT Example: (NH,NT ) = (3,2) L() (Which coincides with what one would expect) MLE estimate is 3/5 = 0.6 0 0.2 0.4 0.6 0.8 1 From Binomial to Multinomial • For example, suppose that X can have the values 1,2,…,K (for example, a die has 6 faces). • We want to learn the parameters 1, 2. …, K. Sufficient statistics: N1, N2, …, NK - the number of times each outcome is observed. K Nk Likelihood function: LD ( )   k k 1 Nk ˆ MLE: k   N  Example: Multinomial • Let x1,x2,…,xn be a protein sequence. • We want to learn the parameters q1, q2,…,q20 corresponding to the frequencies of the 20 amino-acids. • Let N1, N2,…,N20 be the number of times each amino-acid is observed in the sequence. Likelihood function: 20 LD (q )   qk k 1 MLE: Nk qk  n Nk Is MLE all we need? • Suppose that after 10 observations, – ML estimate is P(H) = 0.7 for the thumbtack. – Would you bet on heads for the next toss? • Suppose now that after 10 observations, – ML estimate is P(H) = 0.7 for a coin. – Would you place the same bet? Solution: The Bayesian approach that incorporates your subjective prior knowledge. E.g., you may know a priori that some amino acids have high frequencies and some have low frequencies. How would one use this information ? Bayes’ rule Bayes’ rule: P x | y   P y  P y | x   P( x) where, P x    P x | y   P y  y It holds because: Px, y   Px | y  P y  P  x    P  x, y    P  x | y   P  y  y y Example: Dishonest Casino • • • A casino uses 2 kind of dice: 99% are fair. 1% is loaded: 6 comes up 50% of the times We pick a die at random and roll it 3 times. We get 3 consecutive sixes.  What is the probability the die is loaded? Dishonest Casino (2) The solution is based on using Bayes rule and the fact that while P(loaded | 3 sixes) is not known, the other three terms in Bayes rule are known, namely: • P(3 sixes | loaded)=(0.5)3 • P(loaded)=0.01 • P(3 sixes) = P(3 sixes|loaded) P(loaded)+P(3 sixes|fair) P(fair) P(3sixes | loaded )  P(loaded ) P(loaded | 3sixes)  P(3sixes) Dishonest Casino (3) P (3sixes | loaded )  P (loaded ) P (loaded | 3sixes)   P (3sixes) P (3sixes | loaded )  P (loaded )   P (3sixes | loaded ) P (loaded )  P (3sixes | fair ) P( fair )  0.53  0.01 3 1 0.5  0.01     0.99 6 3  0.21 Biological Example: Proteins • Extra-cellular proteins have a slightly different amino acid composition than intra-cellular proteins. • From a large enough protein database (SWISS-PROT), we can get the following: qainti  p(ai|int) - the frequency of amino acid ai for intra-cellular proteins qaexti  p(ai|ext) - the frequency of amino acid ai for extra-cellular proteins p(int) - the probability that any new sequence is intra-cellular p(ext) - the probability that any new sequence is extra-cellular Biological Example: Proteins (2) Q: What is the probability that a given new protein sequence x=x1x2….xn is extra-cellular? • A: Assuming that every sequence is either extra-cellular or intra-cellular (but not both), we can write: p(ext)  1  p(int) . • Thus, px  p(ext)  px | ext  p(int)  px | int  Biological Example: Proteins (3) • Using conditional probability we get px | ext    p(xi | ext) , px | int    p( xi | int) i i • By Bayes’ theorem Pext | x   p(ext) p( xi | ext) i p(ext) p( xi | ext)  p(int)  p( xi | int) i i • The probabilities p(int), p(ext) are called the prior probabilities. • The probability P(ext|x) is called the posterior probability.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download ppt