CS567, Lecture 3: Information Theory

Information theory
• Uncertainty
  – Can we measure it?
  – Can we work with it?
• Information
• (Uncertainty == Information)?
• Related concepts
• Surprise, Surprise!

Uncertainty
• Quantum mechanics
  – Heisenberg uncertainty principle
  – Is everything still at absolute zero?
  – What is the temperature of a black hole?
• Mathematical uncertainty
  – Gödel: some propositions are not amenable to mathematical proof
    • Can you guarantee that a given computer program will ever terminate?
  – Turing: the Halting Problem
• Intractable problems
  – NP-complete, NP-hard
• Chaos theory
  – Weather forecasting ("guess casting")

Can we measure/work with uncertainty?
• Quantum mechanics
  – Planck's constant represents the lower bound on uncertainty in quantum mechanics
  – Satisfactory explanation of numerous observations that defy classical physics
• Undecidability
  – Many problems worth deciding upon can still be decided upon
• Computational intractability
  – Important to correctly classify a problem (P, NP, NP-complete, NP-hard)
  – Work with small n
  – Find heuristic and locally optimal solutions
• Chaos theory
  – Still allows prediction over short time (or other parameter) horizons
  – "The weather forecaster makes or breaks viewer ratings"

Information
• Common interpretation
  – Data
• Information as capacity
  – 1 bit for Boolean data, 8 bits for a word
  – 2 bits for a nucleic acid character / 6 bits for a codon / 4.3 bits for an amino acid character
  – 8-bit-wide channel transmission capacity
  – 8-bit ASCII
• Information as information gained
  – "Received" the sequence ATGC: gained 8 bits
  – "Received" the sequence A?GC: gained 6 bits
  – "Received" the sequence "NO CLASS TODAY": gained 112 bits and a bonus surge of joy!
• Information as additional information gained
  – I know she'll be at the party, in a red or blue dress
    • Seen at the party, but too far off to see the color of the dress => no information gained
    • Seen in the red or blue dress => 1 bit of information gained

Information == Uncertainty
• Information as uncertainty
  – The higher the uncertainty, the higher the potential information to be gained
  – A sequence of alphabetical characters implies an uncertainty of 5.7 bits/character
  – A sequence of amino acids implies an uncertainty of 4.3 bits/character
  – (These bits-per-character figures are worked out in the sketch below)
• The higher the noise, the less information (gained)
• The higher the noise, the lower the uncertainty!

Related concepts
• Uncertainty
• Information
• Complexity
  – The more bits needed to specify something, the higher the complexity
• Probability
  – If all messages are equally probable, information (gained) is maximal => the uniform probability distribution has the highest information (is most uncertain)
  – If a particular message is received most of the time, information (gained) is low => a biased distribution carries less information
• Entropy
  – Degree of disorder / number of possible states
• Surprise

Surprise, Surprise!
• Response to information received
• Degrees of surprise
  – "The instructor is in FH302 right now"
    I already know that. Yawn... (Certainty, foregone conclusion)
  – "The instructor is going to leave the room between 1:45 and 2:00 pm"
    That's about usual (Likely)
  – "The instructor's research will change the world and he's getting the award for best teaching this semester"
    Wow! (Unlikely, but not impossible. Probability ≈ 10^(−1000000000000000000))
  – "The instructor is actually a robotic machine that teaches machine learning"
    No way!! (Impossible, Disbelief)
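The capacity figures above are simply log2 of the alphabet size. Below is a minimal Python sketch (illustrative, not part of the lecture) that reproduces them; the alphabet sizes, including the 52-symbol mixed-case alphabet assumed to give the 5.7 bits/character figure, and the helper name bits_per_symbol are assumptions.

```python
import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Capacity of one symbol drawn uniformly from an alphabet: log2(size)."""
    return math.log2(alphabet_size)

# Alphabet sizes are assumptions chosen to match the figures on the slides.
alphabets = {
    "Boolean": 2,              # 1 bit
    "DNA base": 4,             # 2 bits
    "codon (4^3)": 64,         # 6 bits
    "amino acid": 20,          # ~4.32 bits
    "mixed-case letters": 52,  # ~5.70 bits
    "8-bit character": 256,    # 8 bits
}
for name, size in alphabets.items():
    print(f"{name:>18}: {bits_per_symbol(size):.2f} bits/symbol")

# Information gained from "received" sequences: bits/symbol * symbols resolved.
print("ATGC            :", 4 * bits_per_symbol(4), "bits")     # 8 bits
print("A?GC            :", 3 * bits_per_symbol(4), "bits")     # 6 bits; '?' resolves nothing
print("'NO CLASS TODAY':", 14 * bits_per_symbol(256), "bits")  # 14 chars * 8 bits = 112 bits
```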
Measuring surprise
• Possible measures: level of adrenaline, muscular activity, or volume of voice
• The lower P(x_i), the higher the surprise
• Surprise = 1/P(x_i)?
  – Magnitude OK
  – Not defined for impossible events (by the conventional interpretation of surprise)
  – But surprise = 1 for certain events and 2 for half-likely events?
• Surprise = log P(x_i)?
  – 0 for certain events
  – But negative for most events
• Surprise = −log P(x_i)
  – 0 for certain events
  – Positive value
  – Proportional to the degree of surprise
  – If the base-2 logarithm is used, expressed in bits

Information Theory
• Surprise/Surprisal
• Entropy
• Relative entropy
  – Versus differences in information content
  – Versus expectation
• Mutual information
• Conditional entropy

Surprise
• Surprise = −log P(x_i)
• "Average" surprise = Expectation(Surprise) = Σ_i P(x_i) (−log P(x_i)) = −Σ_i P(x_i) log P(x_i) = Uncertainty = Entropy = H(P)
• Uncertainty of a coin toss = 1 bit
• Uncertainty of a double-headed coin toss = 0 bits
• For a uniform distribution, entropy is maximal
• For a distribution where only one particular event occurs and the others never do, entropy is zero
• All other distributions fall between these two extremes

Entropy
• For uniform probability distributions, entropy increases monotonically with the number of possible outcomes
  – Entropy for a coin toss, a nucleic acid base and an amino acid is 1, 2 and 4.3 bits respectively
  – Which is why we win small lucky draws but not the grand sweepstakes
• Can entropy be negative? Zero?

Relative Entropy
• H(P,Q) = Σ_i P(x_i) log (P(x_i)/Q(x_i))
• The Kullback-Leibler 'distance' (relative entropy)
• Difference in entropy between two distributions
  – Early poll ratings of 2 candidates, P: (75%, 25%)
  – Later poll ratings of 2 candidates, Q: (50%, 50%)
  – H(P,Q) = 0.19 bit; H(Q,P) = 0.21 bit (reproduced in the sketch below)
• A "one-way", asymmetric distance along the "axis of uncertainty"

Relative Entropy == Difference in information content?
• Information may be gained or lost between two uncertain states
  – H(Q) − H(P) = 1 − 0.81 = 0.19 bit = H(P,Q)
• The difference in information content equals H(P,Q) only if Q is uniform
  – If Q = (0.4, 0.6), then H(P,Q) ≈ 0.36 bit while H(Q) − H(P) = 0.97 − 0.81 ≈ 0.16 bit ≠ H(P,Q)
• Is information gained always positive? Can it be zero or negative?
• Is relative entropy always positive? Can it be zero or negative?

Relative Entropy == Expectation?
• For sequence alignment scores expressed as log-odds ratios
  – Random variable: how unusual is this amino acid?
  – Distribution P: domain-specific distribution model
    • Biased probabilities of amino acid occurrence
  – Distribution Q: null model
    • Uniform probabilities of amino acid occurrence
• Expected score for the occurrence of an amino acid = Σ_a P(a) log (P(a)/Q(a))
• Generally applicable to all log-odds representations
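A short Python sketch (illustrative, not part of the lecture) that reproduces the numbers above: surprisal, entropy H(P), and relative entropy H(P,Q) in bits, applied to the coin-toss and poll examples. The helper names surprisal, entropy and relative_entropy are assumptions.

```python
import math

def surprisal(p: float) -> float:
    """Surprise of an event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

def entropy(dist) -> float:
    """H(P) = -sum_i P(x_i) log2 P(x_i); zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def relative_entropy(p_dist, q_dist) -> float:
    """H(P,Q) = sum_i P(x_i) log2(P(x_i)/Q(x_i)), the Kullback-Leibler divergence."""
    return sum(p * math.log2(p / q) for p, q in zip(p_dist, q_dist) if p > 0)

P = (0.75, 0.25)  # early poll ratings
Q = (0.50, 0.50)  # later poll ratings

print(surprisal(0.5))                    # 1.0 bit for a half-likely event
print(entropy((0.5, 0.5)))               # 1.0 bit (fair coin toss)
print(entropy((1.0, 0.0)))               # 0 bits (double-headed coin)
print(relative_entropy(P, Q))            # ~0.19 bit
print(relative_entropy(Q, P))            # ~0.21 bit; note the asymmetry
print(relative_entropy(P, (0.4, 0.6)))   # ~0.36 bit
print(entropy(Q) - entropy(P))           # ~0.19 bit: equals H(P,Q) because Q is uniform
```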
Mutual Information
• Related to the independence of variables
  – Does P(a_i, b_i) equal P(a_i) P(b_i)?
• Given knowledge of variable b, does the uncertainty of variable a change?
  – Do I have a better idea of what value a will take, or am I still in the dark?
• Mutual information = relative entropy between P(a_i, b_i) and P(a_i) P(b_i):
  M(a,b) = Σ_i P(a_i, b_i) log ( P(a_i, b_i) / [P(a_i) P(b_i)] )

Conditional Entropy
• Conditional entropy of successive positions in a random sequence = 2 bits for DNA, 4.3 bits for protein
• Conditional entropy of DNA base pairs = 0
• Probability that a student in the room is taking this course, given that the student is present? (Conditional probability in terms of information content: still some residual uncertainty)
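A minimal sketch of mutual information and conditional entropy for discrete joint distributions, applied to the DNA examples above. The joint tables (idealized Watson-Crick base pairing; independent, uniform successive positions) and the function names are illustrative assumptions, not material from the lecture.

```python
import math
from itertools import product

def entropy(probs) -> float:
    """H = -sum p log2 p over nonzero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint) -> float:
    """M(a,b) = sum P(a,b) log2( P(a,b) / (P(a) P(b)) ): the relative entropy
    between the joint distribution and the product of its marginals."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def conditional_entropy(joint) -> float:
    """H(a|b) = H(a,b) - H(b)."""
    pb = {}
    for (_, b), p in joint.items():
        pb[b] = pb.get(b, 0.0) + p
    return entropy(joint.values()) - entropy(pb.values())

bases = "ACGT"
complement = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Base-paired strands: the partner base is fully determined (idealized).
paired = {(b, complement[b]): 0.25 for b in bases}
# Successive positions in a random DNA sequence: independent and uniform.
independent = {(a, b): 1 / 16 for a, b in product(bases, repeat=2)}

print(mutual_information(paired))        # 2.0 bits shared between paired bases
print(conditional_entropy(paired))       # 0.0 bits left once the partner base is known
print(mutual_information(independent))   # 0.0 bits shared between successive positions
print(conditional_entropy(independent))  # 2.0 bits: knowing one position doesn't help
```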