ENGS 4 - Lecture 8
Technology of Cyberspace
Winter 2004
Thayer School of Engineering
Dartmouth College
Instructor: George Cybenko, x6-3843
[email protected]
Assistant: Sharon Cooper (“Shay”), x6-3546
Course webpage:
www.whoopis.com/engs4
Today’s Class
• Midterm take home – Wednesday Feb 4
• Assignment 2 – this weekend, due Feb 10
• Examples of state-based prediction from
Lecture 7
• Discussion
• Break
• Mini-lecture on SPAM
• Probability and Markovian Prediction
• Project Discussion
State-based Prediction
• What are examples of state-based prediction?
• Weather – http://www.ecmwf.int/
• Astronomy – http://science.nasa.gov/RealTime/jtrack/Spacecraft.html
• Chemistry – http://polymer.bu.edu/java/java/movie/index.html
• Biology – http://arieldolan.com/ofiles/JavaFloys.aspx
• Physics – http://otrc93.ce.utexas.edu/~waveroom/Applet/WaveKinematics/WaveKinematics.html
• Medicine – http://www.esg.montana.edu/meg/notebook/example1.html
• Others?
Limitations of State-Based Prediction
• Chaos –
http://math.bu.edu/DYSYS/applets/nonlinear-web.html
http://math.bu.edu/DYSYS/applets/
• Complexity –
http://www.hut.fi/%7Ejblomqvi/langton/index.html
Why is society/politics so hard to predict?
• Randomness –
http://www-stat.stanford.edu/~susan/surprise/Birthday.html
The “Monty Hall” Problem
A card trick
Markovian Systems
A system generates “events” from a “sample space” S.
– e1 , e2 , e3 , …
– 0 ≤ Prob(e) ≤ 1, where Prob(e) is the probability of event e
– Prob(S) = 1
– “Probability” is interpreted as a frequency: in a very large sequence of
independent trials {e1 , e2 , e3 , … , eN }, the relative frequency of an event
is approximately its probability (the Law of Large Numbers)
Example – Coin Tossing
• S = { Heads, Tails } = { H , T }
• Generate a sample sequence:
e1 e2 e3 e4 e5 …
• The fraction of heads will approach 1/2
• These are independent trials: the next coin toss does not depend on the
previous coin toss
• Prob(next toss = heads | previous toss = heads) = Prob(next toss = heads) = 1/2,
as simulated in the sketch below
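A minimal simulation sketch (Python, standard library only; an illustration added here, not from the slides) of the frequency interpretation: the overall fraction of heads, and the fraction of heads following a head, both approach 1/2.

```python
import random

N = 100_000
tosses = [random.choice("HT") for _ in range(N)]   # independent fair-coin tosses

# Overall frequency of heads approaches 1/2 (Law of Large Numbers).
freq_heads = tosses.count("H") / N

# Frequency of heads given that the previous toss was a head: also about 1/2,
# because the tosses are independent.
after_head = [curr for prev, curr in zip(tosses, tosses[1:]) if prev == "H"]
freq_heads_after_head = after_head.count("H") / len(after_head)

print(f"P(H)          ~ {freq_heads:.3f}")
print(f"P(H | prev H) ~ {freq_heads_after_head:.3f}")
```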
Independence
• Now we are going to roll two dice – a red one and
a blue one
• The sum of the two dice on a roll is
“independent” of the previous roll but not
independent of the number showing on the red
die.
• Example: What is the probability distribution of the sum given that the red
die is showing 4? What is the distribution if we have no information about
the red die? (See the sketch below.)
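A short sketch (Python; not part of the original slides) that tabulates the two distributions asked about above: the sum given that the red die shows 4, and the sum with no information about the red die.

```python
from collections import Counter
from fractions import Fraction

faces = range(1, 7)

# Unconditional distribution of the sum of the red and blue dice.
uncond = {s: Fraction(c, 36)
          for s, c in Counter(r + b for r in faces for b in faces).items()}

# Conditional distribution given the red die shows 4: the sum is 4 + blue,
# so it is uniform on 5..10.
cond = {4 + b: Fraction(1, 6) for b in faces}

print("P(sum):          ", {s: str(p) for s, p in sorted(uncond.items())})
print("P(sum | red = 4):", {s: str(p) for s, p in sorted(cond.items())})
```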
Coin Toss
• An event is two consecutive tosses of a coin (the events overlap: the event
at time t is tosses t and t+1)
• So a sequence of tosses HHTTHHTHT produces events HH, HT, TT, TH, etc.
• Prob(any particular event) = 1/4; for example, Prob(HH) = 1/4
• Prob(HH | previous event was TH) = 1/2
• Prob(HH | previous event was HT) = 0
• Prob(HH | previous two events) = Prob(HH | previous event)
• Prob(current event | all past events) =
Prob( current event | immediately previous event)
• This is a “Markov Process”
• Probability of the current state depends only on the
previous state
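A simulation sketch (Python; an illustration added here, not from the slides) of the overlapping two-toss events above, estimating the two conditional probabilities empirically:

```python
import random

N = 200_000
tosses = "".join(random.choice("HT") for _ in range(N))

# Overlapping two-toss events: the event at time t is tosses t and t+1.
events = [tosses[t:t + 2] for t in range(N - 1)]

def cond_prob(current, previous):
    """Estimate Prob(event = current | previous event = previous) from the sample."""
    outcomes = [events[t + 1] == current
                for t in range(len(events) - 1) if events[t] == previous]
    return sum(outcomes) / len(outcomes)

print("P(HH | previous = TH) ~", round(cond_prob("HH", "TH"), 3))   # about 0.5
print("P(HH | previous = HT) ~", round(cond_prob("HH", "HT"), 3))   # exactly 0
```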
Discussion
• What kinds of “prediction” technology are
used in your field of study?
• Is it technically correct or just convenient?
• Why is that approach used?
Break
Examples of Markovian Prediction in Cyberspace
Exponential backoff in packet-based networking
– Computers on a broadcast LAN typically put packets
into the network without concern or knowledge of
whether other computers are simultaneously putting
packets out as well.
– The protocol does not avoid collisions, but it can
detect collisions
– When a collision is detected, each computer rolls a die (in software, of
course) and waits that random amount of time before attempting to retransmit
– If another collision is detected, it rolls again with longer delays (a
multiple, m, of the previous delays), as sketched below
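A sketch of the retry logic described above (Python; the packet transmission and collision detection are hypothetical stand-ins, since they depend on the actual network interface):

```python
import random
import time

BASE_DELAY = 0.001   # seconds; illustrative value, not taken from any protocol spec
MULTIPLIER = 2       # m: each collision multiplies the delay window by this factor
MAX_RETRIES = 10

def send_with_backoff(send_packet, collision_detected):
    """Retry a transmission, widening the random delay window after each collision."""
    max_delay = BASE_DELAY
    for _ in range(MAX_RETRIES):
        send_packet()
        if not collision_detected():
            return True                       # the packet got through
        # Collision: "roll a die" -- wait a random time within the current window.
        time.sleep(random.uniform(0, max_delay))
        max_delay *= MULTIPLIER               # longer delays on the next attempt
    return False                              # give up after MAX_RETRIES attempts
```

In real Ethernet (CSMA/CD) the wait is a random whole number of slot times and the window roughly doubles after each collision up to a cap; the sketch only captures the multiplicative-growth idea.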
Examples of Markovian Prediction in Cyberspace
Data Compression
– The system consists of a sender and a receiver
– The sender sends symbols e1 e2 e3 e4 e5 …
– The symbol at time t is et; this is the “event”
– If the next symbol being sent is completely predictable based on the
previous symbols sent, then we don’t need to transmit it!!
– this is the foundation for much of data compression:
• if Prob( next symbol | previous symbol ) = 0 or 1 the system
is deterministic
• if Prob( next symbol | previous symbol ) = 1/2 (two symbols)
then the system is maximally random and unpredictable
• if Prob( next symbol = 1 | previous symbol = 0 ) = 0.7, then we can guess
that the next symbol will be 1 whenever we see a 0, and we will be right
about 70% of the time (see the sketch below)
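A sketch of that last bullet (Python; the 0.7 transition probability matches the bullet, while the behaviour after a 1 is an assumed value just to complete a made-up binary source): always guessing 1 after a 0 is right about 70% of the time.

```python
import random

P_ONE_AFTER_ZERO = 0.7   # Prob(next symbol = 1 | previous symbol = 0), as in the bullet
P_ONE_AFTER_ONE = 0.5    # assumed, just to complete the hypothetical source

def generate(n):
    """Generate n bits from the two-state Markov source defined above."""
    bits, prev = [], 0
    for _ in range(n):
        p = P_ONE_AFTER_ZERO if prev == 0 else P_ONE_AFTER_ONE
        prev = 1 if random.random() < p else 0
        bits.append(prev)
    return bits

bits = generate(100_000)

# Every time we see a 0, guess that the next bit is 1 and record whether we were right.
guesses = [nxt == 1 for prev, nxt in zip(bits, bits[1:]) if prev == 0]
print("guessing accuracy ~", round(sum(guesses) / len(guesses), 3))   # about 0.7
```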
Elements of Shannon’s Information Theory
1948, Bell Labs, Claude Shannon (later at MIT)
[Block diagram: Source (a process) → “Channel” → Receiver (a “decoder”)]
1. How much “information” is there in the source? (Source coding)
2. What is the information-carrying capacity of the channel? (Channel coding)
Model of a Source
The source produces messages consisting of sequences
of symbols.
The symbols come from a finite alphabet: A = { a , b , ... }
The symbols occur with probabilities determined by a
Markov probability distribution:
Prob ( s(t) = x | s(t-1) = y , s(t-2) = z,...)
= Prob ( s(t) = x | s(t-1) = y) = p(x|y)
Simplest case: Prob( s(t) = x | s(t-1) = y ) = p(x)
Independent, identically distributed discrete random variables
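A sketch of this source model (Python; an illustration added here) for a tiny two-symbol alphabet with assumed transition probabilities p(x|y); the i.i.d. special case is simply a table whose rows are identical.

```python
import random

# Hypothetical transition probabilities p(x | y): key = previous symbol y,
# value = distribution over the next symbol x.
P = {
    "a": {"a": 0.9, "b": 0.1},
    "b": {"a": 0.5, "b": 0.5},
}

def generate(n, start="a"):
    """Generate a message of n symbols from the Markov source defined by P."""
    out, prev = [], start
    for _ in range(n):
        symbols, weights = zip(*P[prev].items())
        prev = random.choices(symbols, weights=weights)[0]
        out.append(prev)
    return "".join(out)

print(generate(40))   # long runs of "a", occasionally broken by "b"
```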
Example
Alphabet is { 1, 2 , 3, 4, 5, 6}
The messages are n rolls of a fair die.
EG: 534261555242436
P(x|y) = P(x) = 1/6 for all symbols x.
This is a uniformly random source...each symbol has equal
probability of appearing in a position in a message.
http://storm.prohosting.com/~glyph/crypto/freq-en.shtml
Another example
The messages are :
S1 - “The sun will rise today.”
S2 - “The sun will not rise today.”
Symbols: English sentences with fewer than 10 words and
using the above words.
Prob(S1) = 1, Prob(S2) = 0
Completely deterministic!!!
Efficient binary coding
A code is a mapping of symbols to 0/1 strings so that
the resulting encoded messages are uniquely decodable.
E.g.:

symbol   code   p
1        111    1/6
2        100    1/6
3        101    1/6
4        011    1/6
5        010    1/6
6        000    1/6

Average bits per symbol = 6 × (1/6 × 3) = 3
Can we do better?
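A quick check of the arithmetic (Python): every codeword is 3 bits long and every symbol has probability 1/6, so the average is 3 bits per symbol.

```python
from fractions import Fraction

code = {1: "111", 2: "100", 3: "101", 4: "011", 5: "010", 6: "000"}
p = {s: Fraction(1, 6) for s in code}

avg_bits = sum(p[s] * len(code[s]) for s in code)
print(avg_bits)   # 3
```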
Huffman coding
[Huffman tree: pair the six 1/6 leaves into three 1/3 nodes; two of these
combine into a 2/3 node, which joins the remaining 1/3 node at the root
(probability 1)]

symbol   p     code
1        1/6   000
2        1/6   001
3        1/6   010
4        1/6   011
5        1/6   10
6        1/6   11

Average bits per symbol = 4 × (1/6 × 3) + 2 × (1/6 × 2) = 2 2/3 < 3
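A sketch of Huffman’s construction (Python, using the standard heapq module; added here as an illustration). For six equally likely symbols it produces four 3-bit codewords and two 2-bit codewords; the exact bit patterns may differ from the slide, but the codeword lengths and the average agree.

```python
import heapq
from fractions import Fraction

def huffman(probs):
    """Build a Huffman code by repeatedly merging the two least probable subtrees."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {s: Fraction(1, 6) for s in "123456"}
code = huffman(probs)
avg = sum(probs[s] * len(code[s]) for s in code)
print(code)
print("average bits per symbol =", avg)   # 8/3 = 2 2/3
```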
How about the other example?
S1 → 0
S2 → 1
Average number of bits per symbol = 1.
Is this the best possible? No.
Entropy of a source
A source emits symbols from the alphabet { s1 , s2 , s3 , ..., sn } with
Prob(sk) = pk.

Entropy: H = − Σk pk log2(pk)

Shannon’s Source Coding Theorem:
H ≤ average number of bits per symbol used by any decodable code.
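A direct transcription of the entropy formula as a small Python function (an illustration, not from the slides):

```python
from math import log2

def entropy(probs):
    """H = -sum over k of p_k * log2(p_k), with the convention 0 * log2(0) = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit per symbol for a fair coin
```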
Examples
H(die example) = −6 × (1/6) × log2(1/6) = log2 6 ≈ 2.58 bits
H(sunrise example) = −(1 × log2 1 + 0 × log2 0) = 0 (taking 0 log2 0 = 0)
How can we achieve the entropy bound?
Block codes can do better than pure Huffman coding.
“Universal codes”, e.g., Lempel-Ziv.
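The two numbers worked out in Python and compared against the codes above (a quick check, assuming the fair-die and sunrise sources defined earlier):

```python
from math import log2

H_die = log2(6)   # = -6 * (1/6) * log2(1/6), about 2.585 bits per symbol
H_sun = 0.0       # = -(1 * log2(1) + 0 * log2(0)), taking 0 * log2(0) = 0

print(f"fair-die source: H ~ {H_die:.3f}  <=  2 2/3 (Huffman)  <=  3 (fixed-length)")
print(f"sunrise source : H = {H_sun}  (nothing needs to be transmitted at all)")
```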
Project Discussion