Information theory
• Uncertainty
– Can we measure it?
– Can we work with it?
• Information
• (Uncertainty == Information)?
• Related concepts
• Surprise, Surprise!
Uncertainty
• Quantum mechanics
– Heisenberg principle
– Is everything still at absolute zero?
– What is the temperature of a black hole?
• Mathematical uncertainty – Gödel
– Some propositions are not amenable to mathematical proof
• Can you guarantee that a given computer program will ever terminate? – Turing
– The Halting problem
• Intractable problems
– NP complete, NP hard
• Chaos theory
– Weather forecasting (“guess casting”)
Can we measure/work with uncertainty?
• Quantum mechanics
– Planck’s constant represents the lower bound on uncertainty in quantum
mechanics
– Satisfactory explanation of numerous observations that defy classical physics
• Undecidability
– Many problems worth deciding upon can still be decided upon
• Computational Intractability
– Important to correctly classify a problem (P, NP, NPC, NPH)
– Work with small n
– Find heuristic and locally optimal solutions
• Chaos theory
– Still allows for prediction in short time (or other parameter) domains
– “Weather forecaster makes or breaks viewer rating”
Information
• Common interpretation
– Data
• Information as capacity
– 1 bit for Boolean data, 8 for a word
– 2 bits for nucleic acid character / 6 bits for codon / 4.3 bits for amino acid character
– 8 bit wide channel transmission capacity
– 8 bit ASCII
• Information as information gained
– “received” the sequence ATGC, got 8 bits
– “received” the sequence A?GC, got 6 bits
– “received” the sequence “NO CLASS TODAY”, got 112 bits and a bonus surge
of joy!
• Information as additional information gained
– I know she’ll be at the party, in a red or blue dress
• Seen at party, but too far off to see color of dress => No information gained
• Seen in red or blue dress => 1 bit of information gained
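To make these capacity figures concrete, here is a minimal Python sketch (not from the original slides); the helpers bits_per_symbol and bits_received are invented for illustration, and unknown symbols such as "?" are simply assumed to contribute no information.

```python
import math

def bits_per_symbol(alphabet_size):
    """Capacity of one symbol drawn uniformly from an alphabet: log2(size)."""
    return math.log2(alphabet_size)

def bits_received(sequence, alphabet_size, unknown="?"):
    """Information gained from a received sequence; unknown symbols add nothing."""
    per_symbol = bits_per_symbol(alphabet_size)
    return sum(per_symbol for ch in sequence if ch != unknown)

print(bits_per_symbol(4))                     # 2.0 bits per nucleic acid character
print(bits_per_symbol(20))                    # ~4.32 bits per amino acid character
print(bits_received("ATGC", 4))               # 8.0 bits
print(bits_received("A?GC", 4))               # 6.0 bits
print(bits_received("NO CLASS TODAY", 256))   # 14 characters x 8 bits = 112.0 bits
```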
Information == Uncertainty
• Information as uncertainty
– Higher the uncertainty, higher the potential information to be
gained
– Sequence of alphabetic characters (26 letters) implies uncertainty of about 4.7
bits/character
– Sequence of amino acids implies uncertainty of 4.3 bits/character
• Higher the noise, lesser the information (gained)
• Higher the noise, lesser the uncertainty!
Related concepts
• Uncertainty
• Information
• Complexity
– The more bits needed to specify something, the higher the complexity
• Probability
– If all messages are equally probable, information (gained) is
maximum => Uniform probability distribution has the highest
information (is most uncertain)
– If a particular message is received most of the time, information
(gained) is low => Biased distribution has lower information
• Entropy
– Degree of disorder/Number of possible states
• Surprise
Surprise, Surprise!
• Response to information received
• Degrees of surprise
– “The instructor is in FH302 right now” I already know that.
Yawn…… (Certainty, Foregone conclusion)
– “The instructor is going to leave the room between 1:45 and 2:00
pm” That’s about usual (Likely)
– “The instructor’s research will change the world and he’s getting
the Award for best teaching this semester” Wow! (Unlikely, but not
impossible. Probability = 10^-1000000000000000000)
– “The instructor is actually a robotic machine that teaches machine
learning” No way !! (Impossible, Disbelief)
Measuring surprise
• Measures: Level of adrenaline, muscular activity or volume of voice

• Lower the P(xi), higher the surprise
• Surprise = 1/P (xi)?
– Magnitude OK
– Not defined for impossible events (By conventional interpretation of surprise)
– But Surprise = 1 for certain events, 2 for half likely things?
• Surprise = Log (P(xi))
– 0 for certain events
– But negative for most events
• Surprise = - Log (P(xi))
– 0 for certain events
– Positive value
– Proportional to degree of surprise
– If base 2 logarithm is used, expressed in bits
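A minimal sketch of the surprisal measure just derived (the helper name surprisal is illustrative, not from the slides); impossible events are left undefined, matching the convention above:

```python
import math

def surprisal(p):
    """Surprise of an event with probability p, in bits: -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")  # undefined for impossible events
    return -math.log2(p)

print(surprisal(1.0))   # 0.0 bits: a certain event is no surprise
print(surprisal(0.5))   # 1.0 bit:  a "half likely" event
print(surprisal(0.25))  # 2.0 bits: e.g. one specific nucleic acid base
```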
Information Theory
• Surprise/Surprisal
• Entropy
• Relative entropy
– Versus differences in information content
– Versus expectation
• Mutual Information
• Conditional Entropy
Surprise
• Surprise = - log P(xi)
• “Average” surprise
= Expectation (Surprise)
= Σi P(xi) (- log P(xi))
= - Σi P(xi) log P(xi)
= Uncertainty = Entropy = H (P)
• Uncertainty of a coin toss = 1 bit
• Uncertainty of a double-headed coin toss = 0 bits
• For a uniform distribution, entropy is maximal
• For a distribution where only a particular event occurs and others never do, entropy is zero
• Between these two extremes for all other distributions
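A minimal sketch of entropy as average surprise, - Σi P(xi) log2 P(xi); the helper name entropy is illustrative, and 0·log 0 terms are skipped by the usual convention:

```python
import math

def entropy(probs):
    """Entropy in bits: -sum p*log2(p), skipping p == 0 terms by convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))            # 1.0 bit  (fair coin toss)
print(entropy([1.0, 0.0]))            # 0.0 bits (double-headed coin)
print(entropy([0.25] * 4))            # 2.0 bits (uniform over 4 outcomes: maximal)
print(entropy([0.7, 0.1, 0.1, 0.1]))  # ~1.36 bits, between the two extremes
```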
Entropy
• For uniform probability distributions, entropy
increases monotonically with number of possible
outcomes
– Entropy for coin toss, nucleic acid base, amino acid is 1,
2 and 4.3 bits respectively
– Which is why we win small lucky draws but not the
grand sweepstakes
• Can entropy be negative? Zero?
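For a uniform distribution over n outcomes the entropy reduces to log2(n), which is where the figures above come from; a quick illustrative check:

```python
import math

# Entropy of a uniform distribution over n equally likely outcomes is log2(n).
for name, n in [("coin toss", 2), ("nucleic acid base", 4), ("amino acid", 20)]:
    print(f"{name}: {math.log2(n):.2f} bits")   # 1.00, 2.00, 4.32 bits
```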
Relative Entropy
• H (P,Q) = Σi P(xi) log (P(xi)/Q(xi))
• Kullback-Leibler ‘distance’ (KL divergence)
• A measure of the difference between two probability distributions
– Early poll ratings of 2 candidates P: (75%,25%)
– Later poll ratings of 2 candidates Q: (50%,50%)
– H (P,Q) = 0.19 bit; H (Q,P) = 0.21 bit
• “One-way” asymmetric distance along “axis of
uncertainty”
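A sketch of relative entropy applied to the poll example above; the function name relative_entropy is illustrative, and terms with P(xi) = 0 are skipped by convention:

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler 'distance' H(P,Q) = sum P log2(P/Q), in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.75, 0.25]  # early poll ratings
Q = [0.50, 0.50]  # later poll ratings

print(round(relative_entropy(P, Q), 2))  # 0.19 bit
print(round(relative_entropy(Q, P), 2))  # 0.21 bit -- asymmetric, a "one-way" distance
```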
Relative Entropy == Difference in
information content?
• Information may be gained or lost between two uncertain
states
H (Q) – H (P) = 1 – 0.81 ≈ 0.19 bit = H (P,Q)
• Difference in information content equals H (P,Q) only if Q
is uniform
If Q = (0.4,0.6) then H (P,Q) = 0.36 bit and
H (Q) – H (P) = 0.97 – 0.81 ≈ 0.16 bit ≠ H (P,Q) (checked numerically in the sketch below)
• Is information gained always positive? Can it be zero or
negative?
• Is relative entropy always positive? Can it be zero or
negative?
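Checking the claim numerically with a self-contained illustrative sketch (same helpers as before): the entropy difference matches H(P,Q) for the uniform Q but not for Q = (0.4, 0.6).

```python
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def relative_entropy(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.75, 0.25]
for Q in ([0.5, 0.5], [0.4, 0.6]):
    print(Q,
          "H(Q)-H(P) =", round(entropy(Q) - entropy(P), 2),
          "H(P,Q) =", round(relative_entropy(P, Q), 2))
# [0.5, 0.5]: H(Q)-H(P) = 0.19 = H(P,Q)        (Q uniform)
# [0.4, 0.6]: H(Q)-H(P) = 0.16, H(P,Q) = 0.36  (not equal)
```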
Relative Entropy == Expectation?
• For sequence alignment scores expressed as log-odds ratios
– Random variable: How unusual is this amino acid?
– Distribution P: Domain specific distribution model
• Biased probability of amino acid occurrences
– Distribution Q: Null model
• Uniform probability of amino acid occurrences
• Expected score for occurrence of amino acid
= Σa P(a) log (P(a)/Q(a))
• Generally applicable to all log-odds representations
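A sketch of the expected log-odds score Σa P(a) log2(P(a)/Q(a)); the four-letter alphabet and frequencies below are hypothetical, chosen only to illustrate a biased model against a uniform null:

```python
import math

def expected_score(p, q):
    """Expected log-odds score sum_a P(a) log2(P(a)/Q(a)) -- a relative entropy."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p)

# Hypothetical frequencies over a toy 4-letter alphabet (illustration only).
P = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}      # biased, domain-specific model
Q = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}  # uniform null model

print(round(expected_score(P, Q), 3))  # positive: scores under P beat the null on average
```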
Mutual Information
• Related to the independence of variables
– Does P(ai,bi) equal P(ai)P(bi)?
• Does knowing the value of variable b change the uncertainty of variable a?
– Do I have a better idea of what value a will take, or am I
still in the dark?
• Mutual Information = Relative entropy
between P(ai,bi) and P(ai)P(bi)
= M (a,b) = Σi P(ai,bi) log (P(ai,bi) / [P(ai)P(bi)])
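A sketch of mutual information computed directly from a joint distribution; the joint tables below are hypothetical, chosen to contrast an independent pair of variables with a dependent one:

```python
import math

def mutual_information(joint):
    """M(a,b) = sum P(a,b) log2( P(a,b) / (P(a)P(b)) ) over a dict {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
dependent   = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

print(mutual_information(independent))          # 0.0 bits: P(a,b) == P(a)P(b)
print(round(mutual_information(dependent), 2))  # ~0.53 bits: knowing b reduces uncertainty about a
```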
Conditional Entropy
• Conditional Entropy of successive positions in a random sequence = 2 bits for
DNA, 4.3 bits for protein (positions are independent, so knowing one tells us nothing about the next)
• Conditional Entropy of DNA base pairs = 0 (knowing one base of a pair determines its complement)
• Probability that a student is present in the room and
is taking this course, given that the student is
present? (Conditional probability in terms of
information content: Still some residual uncertainty)
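A sketch of conditional entropy via the identity H(A|B) = H(A,B) - H(B), applied to an invented joint distribution for the "present in the room" / "taking this course" example; the probabilities are hypothetical and only illustrate that some residual uncertainty remains:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution over (present?, taking this course?).
joint = {("present", "taking"): 0.60,
         ("present", "not taking"): 0.10,
         ("absent", "taking"): 0.25,
         ("absent", "not taking"): 0.05}

# Marginal distribution of "present?" by summing the joint over the other variable.
p_present = {}
for (present, _), p in joint.items():
    p_present[present] = p_present.get(present, 0) + p

# H(course | present?) = H(course, present?) - H(present?)
print(round(entropy(joint.values()) - entropy(p_present.values()), 2))  # ~0.61 bits of residual uncertainty
```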