ELE 301, Fall 2013
Designing Real Systems
LAB 5: Data Compression
Contributed by Jen-Tang Lu
Nov. 18, 2013
In class, you learned the lossless compression algorithms LZ77, LZ78, and LZW. In
this lab, you will test those algorithms, along with other data compression algorithms,
on various information sources.
1. Prelab
• Review class notes on the Lempel-Ziv algorithms.
• To get familiar with the Bzip2 and RAR compression algorithms:
  Read Data Compression: The Complete Reference, written by David Salomon
  https://svn.princeton.edu/ele301_f2013/public/lab5_materials/David_Salomon_Data_Compression_The_Complete_Ref.pdf
  Sections 1.2 (Run-Length Encoding, RLE), 1.5 (Move-to-Front Coding, MTF), 8.1 (The
  Burrows-Wheeler Method), and 2.18 (Prediction with Partial Match, PPM)
  Read Introduction to Data Compression, written by Khalid Sayood
  http://proquest.safaribooksonline.com/book/information-technology-and-softwaredevelopment/9780124157965
  Sections 6.3 (Prediction with Partial Match, PPM) and 6.4 (Burrows-Wheeler
  Transform, BWT)
2. Performance Testing
In this lab, you will compare the performance of different compression algorithms. You
do not need to implement the algorithms yourself. To investigate their performance
and limitations, you can measure the compression rate, which is defined as:
$$\text{Compression Rate} \equiv \frac{\text{Compressed Size}}{\text{Uncompressed Size}}$$
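The definition above can be turned into a small helper. The following is only a sketch: the class name CompressionRate and its methods are hypothetical and not part of the lab materials, and the sizes in main are made up for illustration.

```java
import java.io.File;

// Hypothetical helper class; the name and methods are not part of the lab materials.
class CompressionRate {
    // Compression rate = compressed size / uncompressed size (smaller means better compression).
    static double rate(long compressedBytes, long uncompressedBytes) {
        if (uncompressedBytes <= 0) {
            throw new IllegalArgumentException("uncompressed size must be positive");
        }
        return (double) compressedBytes / uncompressedBytes;
    }

    // Convenience overload that reads both sizes from files on disk.
    static double rate(File compressed, File original) {
        return rate(compressed.length(), original.length());
    }

    public static void main(String[] args) {
        // Made-up sizes: a 1,000,000-byte file that compresses to 250,000 bytes.
        System.out.println(rate(250_000L, 1_000_000L)); // prints 0.25
    }
}
```

With real files, a call such as rate(new File("deflated.txt"), new File("input.txt")) would give one entry of the table.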
Your goal is to fill out the table below.
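                   | Deflate | Bzip2 | LZMA | LZW | RAR
-------------------+---------+-------+------+-----+-----
Text (.txt)        |         |       |      |     |
Bitmap (.bmp)      |         |       |      |     |
Postscript (.ps)   |         |       |      |     |
Application (.exe) |         |       |      |     |
PDF file           |         |       |      |     |
Random raw file    |         |       |      |     |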
To fill out the table, you can choose any English texts, bitmaps, PostScript files, and
programs (.exe), and then measure the compression rate of each. For the measurements to
be meaningful, the files should not be too small.
For text files, also try different file sizes (by simply truncating the files), and plot
how the compression rate varies with the size of the file. In addition, create
a random raw file and measure its compression rate. (Hint: Math.random() in Java.)
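One possible way to prepare these test inputs is sketched below. It follows the Math.random() hint; the class name RandomFileMaker, the output file name random.raw, and the 1 MiB size are illustrative assumptions, not requirements of the lab.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper; class name, file name, and size are illustrative choices.
class RandomFileMaker {
    // Returns `size` bytes, each drawn with Math.random() as suggested in the hint.
    static byte[] randomBytes(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) (Math.random() * 256); // value in 0..255, stored as a byte
        }
        return data;
    }

    // Returns the first `length` bytes of `data`, e.g. to truncate a text file in memory.
    static byte[] truncate(byte[] data, int length) {
        return Arrays.copyOf(data, Math.min(length, data.length));
    }

    public static void main(String[] args) throws IOException {
        // Write 1 MiB of random bytes to random.raw in the working directory.
        try (FileOutputStream out = new FileOutputStream("random.raw")) {
            out.write(randomBytes(1 << 20));
        }
    }
}
```

The truncate method can be applied to the bytes of a text file to produce the shorter versions needed for the size-versus-rate plot.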
In this lab, you don't need to run programs on the Nexus. Just create a Java project and
run it on your computer.
The following are the methods for running the different compression algorithms in Java:
(1) Deflate: The java.util.zip package provides DeflaterOutputStream and InflaterInputStream,
which make it easy to compress (deflate) and uncompress (inflate) files. To read and write a
file, you can use FileInputStream and FileOutputStream.
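import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class DeflateExample {  // class name chosen for illustration
    public static void main(String[] args) throws Exception {
        // Compress input.txt into deflated.txt.
        FileInputStream fis = new FileInputStream("input.txt");
        FileOutputStream fos = new FileOutputStream("deflated.txt");
        DeflaterOutputStream dos = new DeflaterOutputStream(fos);
        copy(fis, dos);
        // Decompress deflated.txt into inflated.txt.
        FileInputStream fis2 = new FileInputStream("deflated.txt");
        InflaterInputStream iis = new InflaterInputStream(fis2);
        FileOutputStream fos2 = new FileOutputStream("inflated.txt");
        copy(iis, fos2);
    }

    // Copies all bytes from is to os, then closes both streams.
    // Closing the DeflaterOutputStream also writes out any pending compressed data.
    public static void copy(InputStream is, OutputStream os) throws Exception {
        int oneByte;
        while ((oneByte = is.read()) != -1) {
            os.write(oneByte);
        }
        os.close();
        is.close();
    }
}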
You should find that deflated.txt is smaller than input.txt. Also, inflated.txt
should be exactly the same as input.txt.
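For the plot of compression rate against file size, it may be convenient to deflate prefixes of a file in memory instead of writing many truncated files. The sketch below is one way to do that under stated assumptions: the class name DeflateSweep, the doubling step, and the starting size of 1024 bytes are illustrative choices, and input.txt is the same example file name used above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper; not part of the lab materials.
class DeflateSweep {
    // Deflates the first `length` bytes of `data` in memory and returns the compressed size.
    static int deflatedSize(byte[] data, int length) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(sink)) {
            dos.write(data, 0, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so all compressed bytes are in the sink.
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = Files.readAllBytes(Paths.get("input.txt"));
        // Print "size <tab> rate" for prefixes of 1 KiB, 2 KiB, 4 KiB, ...
        for (int n = 1024; n <= text.length; n *= 2) {
            double rate = (double) deflatedSize(text, n) / n;
            System.out.println(n + "\t" + rate);
        }
    }
}
```

The tab-separated output can be pasted into any plotting tool. The same loop structure should work for the other algorithms once their output streams are substituted.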
(2) Bzip2: You can use the source code provided by the Apache Software Foundation:
https://svn.princeton.edu/ele301_f2013/public/lab5_materials/Bzip2/
Proceed as in the Deflate part, but use CBZip2OutputStream and
CBZip2InputStream for bzip2 compression and decompression, respectively.
(3) LZMA: You can use the 7-Zip LZMA source code to perform LZMA compression and
decompression:
https://svn.princeton.edu/ele301_f2013/public/lab5_materials/LZMA/
Use the SevenZip package to create a new Java project.
Follow the user guide:
https://svn.princeton.edu/ele301_f2013/public/lab5_materials/LZMA/Readme.txt
and modify main(String[] args) in LzmaAlone.java to run compression
and decompression. (Hint: the arguments in String[] args should look like
"e input.txt input.lzma" for compression.)
(4) LZW: To run LZW, you can use the following code:
https://svn.princeton.edu/ele301_f2013/public/lab5_materials/LZW/
You might find that LZW does not work well in some cases: sometimes the file size
grows after LZW compression. What is a possible reason? (Postlab)
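To explore this question experimentally, it can help to count how many codes an LZW encoder emits for different inputs. The sketch below is an illustrative textbook-style encoder, not the lab's LZW code: the class name LzwSketch, the fixed 12-bit code width, and the policy of freezing the dictionary once it is full are all assumptions, and the provided implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative LZW encoder; NOT the lab's code. The 12-bit code width is an assumption.
class LzwSketch {
    static final int CODE_BITS = 12;
    static final int MAX_ENTRIES = 1 << CODE_BITS;

    // Returns the number of codes LZW emits for `input`.
    static int countCodes(byte[] input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dict.put(String.valueOf((char) i), i); // one entry per single byte value
        }
        int codes = 0;
        String w = "";
        for (byte b : input) {
            char c = (char) (b & 0xFF);
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc;                            // keep extending the current phrase
            } else {
                codes++;                           // emit the code for w
                if (dict.size() < MAX_ENTRIES) {
                    dict.put(wc, dict.size());     // learn the new phrase
                }
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) {
            codes++;                               // emit the code for the final phrase
        }
        return codes;
    }

    // Output size in bytes if every code is written with CODE_BITS bits.
    static int estimatedBytes(byte[] input) {
        return (countCodes(input) * CODE_BITS + 7) / 8;
    }

    public static void main(String[] args) {
        byte[] repeated = new byte[1000];          // 1000 identical (zero) bytes
        byte[] distinct = new byte[256];           // every byte value exactly once
        for (int i = 0; i < 256; i++) {
            distinct[i] = (byte) i;
        }
        System.out.println("repeated: " + estimatedBytes(repeated) + " bytes from 1000");
        System.out.println("distinct: " + estimatedBytes(distinct) + " bytes from 256");
    }
}
```

Comparing the two estimates with the input lengths, and then repeating the experiment with your own test files, gives concrete numbers to reason from when answering the postlab question.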
(5) RAR: Since RAR is not open source, you can use the WinRAR application instead:
http://www.rarlab.com/download.htm
3. Postlab
• According to your experimental results, which file types are easier to
  compress? Is the random raw file easy to compress? Why or why not?
• From your results, which algorithm performs better?
• Plot how the compression rate varies with the size of the file.
• Why does the compression rate decrease with file length? And why does it seem
  to converge to a limit?
• Why does LZW compression fail to compress some files?