CMSC 121-03 Intro to CS
Fall 2016
For this exercise, you may work in groups. Put all names on the paper and hand in only one copy.
Name(s): __________________________________________________________
1) From a popular Dr. Seuss book, we have the following text:
One fish two fish red fish blue fish.
black fish blue fish old fish new fish.
a) How many bytes would it take to store this text?
b) Note that the word fish appears many times. Create a dictionary where fish is replaced by a
single character. (What are your options for which character to choose?)
c) How many bytes does it take to store this dictionary?
d) What does the compressed text look like using this dictionary?
e) How many bytes does it take to store the text now? (Include the size of the dictionary and the
compressed text.)
f) What is the compression ratio for this?
g) Find a simpler ratio that is close to your answer for (f) but better for marketing purposes.
(E.g., a ratio of 66 : 35 is close to 2 : 1.)
h) What is the “space savings” for this? (Space savings = 1 – compressed size / original size.)
Be sure to count the dictionary as part of the compressed size.
i) Another possible replacement is to replace the letters bl with a single character. Will this
make the stored text smaller? (Remember that the dictionary takes up space as well.) Explain your
answer.
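The bookkeeping in parts (a) through (i) can be checked with a short script. The following is only a sketch under stated assumptions, not the expected method: it assumes ASCII storage (one byte per character), counts the line break between the two lines as one byte, and charges each dictionary entry the bytes of its symbol plus the bytes of the phrase it replaces, with no separators. The symbol # and all function names are illustrative choices.

```python
# Sketch only. Assumptions: ASCII text (1 byte per character), and a
# dictionary stored as bare symbol + phrase pairs with no separators.

def byte_size(text: str) -> int:
    """Bytes needed to store text as ASCII."""
    return len(text.encode("ascii"))


def dictionary_size(dictionary: dict[str, str]) -> int:
    """Bytes needed to store every symbol together with its phrase."""
    return sum(byte_size(s) + byte_size(p) for s, p in dictionary.items())


def compress(text: str, dictionary: dict[str, str]) -> str:
    """Replace each phrase with its symbol, in dictionary order."""
    for symbol in dictionary:
        if symbol in text:
            # A symbol already present in the text could not be decoded.
            raise ValueError(f"symbol {symbol!r} already appears in the text")
    for symbol, phrase in dictionary.items():
        text = text.replace(phrase, symbol)
    return text


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Original size divided by compressed size."""
    return original_bytes / compressed_bytes


def space_savings(original_bytes: int, compressed_bytes: int) -> float:
    """1 - compressed size / original size."""
    return 1 - compressed_bytes / original_bytes


text = ("One fish two fish red fish blue fish.\n"
        "black fish blue fish old fish new fish.")
dictionary = {"#": "fish"}  # "#" is an arbitrary symbol not used in the text

compressed = compress(text, dictionary)
original_bytes = byte_size(text)
# The stored size includes the dictionary as well as the compressed text.
total_bytes = byte_size(compressed) + dictionary_size(dictionary)

print(compressed)
print("original bytes:", original_bytes)
print("compressed bytes (with dictionary):", total_bytes)
print("compression ratio:", compression_ratio(original_bytes, total_bytes))
print("space savings:", space_savings(original_bytes, total_bytes))
```

To explore part (i), add a second entry such as "@": "bl" to the dictionary and compare the new total with the old one. If you count line breaks or dictionary separators differently, the numbers will shift accordingly.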
2) From Dr. Seuss’s “The Foot Book” we have the following text:
Up feet
Down feet
Here come clown feet.
How many bytes would it take to store this text?
Consider two possible ways of compressing:
a) Replace feet with a single character and replace own with a different single character.
b) Replace own feet with a single character.
Which of these two compressions is better? Explain your answer with
actual numbers, including the size of each dictionary.
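One way to compare the two options with actual numbers is to compute the total stored size for each. This is a minimal sketch under the same kind of assumptions as before: ASCII storage, each line break counted as one byte, and each dictionary entry costing its symbol plus its phrase with no separators. The symbols # and @ and the helper names are hypothetical.

```python
# Sketch only. Assumptions: 1 byte per character, line breaks count as
# 1 byte each, dictionary entries cost len(symbol) + len(phrase).

def byte_size(text: str) -> int:
    """Bytes needed to store text as ASCII."""
    return len(text.encode("ascii"))


def stored_size(text: str, dictionary: dict[str, str]) -> int:
    """Bytes for the compressed text plus the dictionary itself."""
    compressed = text
    for symbol, phrase in dictionary.items():
        compressed = compressed.replace(phrase, symbol)
    dictionary_bytes = sum(byte_size(s) + byte_size(p)
                           for s, p in dictionary.items())
    return byte_size(compressed) + dictionary_bytes


text = "Up feet\nDown feet\nHere come clown feet."

option_a = {"#": "feet", "@": "own"}  # two short phrases
option_b = {"#": "own feet"}          # one longer phrase

print("original bytes:", byte_size(text))
print("option a bytes:", stored_size(text, option_a))
print("option b bytes:", stored_size(text, option_b))
```

Note that option (b) leaves the first line untouched, since "Up feet" does not contain "own feet"; the comparison depends on how often each phrase occurs as well as on how long it is.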