Download Huffman Compression (continued)

CIS265/506

Storage Basics
- Hard disks come in several interfaces and formats.
- Storage capacity is measured in gigabytes.
- Bandwidth determines how fast data can be moved to or from storage. It is measured in MB/sec, with both sustained and burst rates for reads and writes.
- Access time is measured in ms and consists of seek time (the head moving across the platter), rotational latency (the time it takes for the drive to rotate to the correct position), and block transfer time (the time to read or write a block). In general, higher RPM, smaller platter size, and more numerous platters all make for faster access.
- Mean Time Between Failures (MTBF) is usually the number of hours of operation before a drive will fail (on average).
- Interface is the protocol that the drive uses to communicate with the PC.

Terminology
- Heads: the read/write "needles" that can access your drive; in general, 2 per platter.
- Spindle: what the drive platters spin on.
- Platter: a magnetically coated disk that resembles a record and stores numerous 0s and 1s. A drive may have multiple platters stacked on top of one another (typically 20 GB per platter for IDE and 18 GB per platter for SCSI).
- Tracks and cylinders (multi-platter tracks): positional descriptors assigned to each "ring" of a disk.
- Sector: another positional descriptor of the disk; a pie-shaped slice of the disk that contains many blocks.
- Blocks are the combined position of sector and track numbers and typically store 512 to 4096 bytes each. Blocks are separated by inter-block gaps, which serve as "speed bumps" so that the drive knows where blocks begin and end. Blocks can be combined into contiguous, logically addressable units called clusters.
- Hardware address: consists of block, sector, and track numbers.

Why do we care?
- Hard drive performance is measured in milliseconds (ms), while your computer processes information in nanoseconds (ns).
- Hard drives are usually thousands of times slower than your CPU.
- Any speedup in hard drive access yields a serious speedup in machine performance.

From "Data Structures for Java", William H. Ford and William R. Topp, Chapter 23: File Compression

Binary Files
- Files are either text files or binary files. Java deals with files by creating a byte stream that connects the file and the application.
- Binary files can be handled with the DataInputStream and DataOutputStream classes.
- A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way.
- A data output stream lets an application write primitive Java data types to an output stream in a portable way. An application can then use a data input stream to read the data back in.

File Compression
- Lossless compression loses no data and is used for data backup.
- Lossy compression is used for applications like sound and video compression and causes minor loss of data.
- The compression ratio is the ratio of the number of bits in the original data to the number of bits in the compressed image. For instance, if a data file contains 500,000 bytes and the compressed data contains 100,000 bytes, the compression ratio is 5:1.

Huffman Compression
- Huffman compression relies on counting the number of occurrences of each 8-bit byte in the data and generating a sequence of optimal binary codes called prefix codes.
- The Huffman algorithm is an example of a greedy algorithm. A greedy algorithm makes an optimal choice at each local step in the hope of creating an optimal solution to the entire problem.
- The algorithm generates a table that contains the frequency of occurrence of each byte in the file. Using these frequencies, the algorithm assigns each byte a string of bits known as its bit code and writes the bit code to the compressed image in place of the original byte.
- Compression occurs when the 8-bit chars in a file are, on average, replaced by shorter bit sequences.
- Use a binary tree to represent bit codes. A left edge is a 0 and a right edge is a 1. Each interior node specifies a frequency count, and each leaf node holds a character and its frequency.
- Each data byte occurs only in a leaf node, so no bit code is the prefix of another. Such codes are called prefix codes.
- A full binary tree is one in which each interior node has two children.
- By converting the tree to a full tree, we can generate better bit codes for our example.
- To compress a file, replace each char by its prefix code. To uncompress, follow the bit code bit by bit from the root of the tree to the corresponding character, then write the character to the uncompressed file.
- Good compression involves choosing an optimal tree. It can be shown that the optimal bit codes for a file are always represented by a full tree.
- A Huffman tree generates optimal prefix codes: no other prefix code produces fewer bits in the compressed image.

Building a Huffman Tree
- For each of the n distinct bytes in a file, assign the byte and its frequency to a tree node, and insert the node into a minimum priority queue ordered by frequency.
- Remove two elements, x and y, from the priority queue, and attach them as children of a node whose frequency is the sum of the frequencies of its children. Insert the resulting node into the priority queue.
- In a loop, perform this action n-1 times. Each loop iteration creates one of the n-1 interior nodes of the full tree.
- With a minimum priority queue, the least frequently occurring characters have longer bit codes, and the more frequently occurring chars have shorter bit codes.
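The steps above (frequency table, minimum priority queue, n-1 merges, 0 for a left edge and 1 for a right edge) can be sketched in Java as follows. This is a minimal illustration, not the textbook's code: the class `HuffmanSketch`, its `Node` type, and all method names are hypothetical, and the "compressed image" is kept as a `String` of '0' and '1' characters for readability instead of being packed into real bits.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative sketch only: class and method names are hypothetical, not the textbook's.
class HuffmanSketch {

    // Leaves hold a byte value; interior nodes hold only the summed frequency.
    static final class Node implements Comparable<Node> {
        final int symbol;        // 0..255 for a leaf, -1 for an interior node
        final long freq;
        final Node left, right;

        Node(int symbol, long freq, Node left, Node right) {
            this.symbol = symbol;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        @Override
        public int compareTo(Node other) { return Long.compare(freq, other.freq); }
    }

    // Frequency table: number of occurrences of each 8-bit byte.
    static long[] countFrequencies(byte[] data) {
        long[] freq = new long[256];
        for (byte b : data) freq[b & 0xFF]++;
        return freq;
    }

    // Greedy construction with a minimum priority queue ordered by frequency.
    static Node buildTree(long[] freq) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (int s = 0; s < freq.length; s++) {
            if (freq[s] > 0) queue.add(new Node(s, freq[s], null, null));
        }
        while (queue.size() > 1) {               // runs n-1 times for n distinct bytes
            Node x = queue.poll();
            Node y = queue.poll();
            queue.add(new Node(-1, x.freq + y.freq, x, y));
        }
        return queue.poll();                     // null when the input is empty
    }

    // A left edge appends 0, a right edge appends 1; codes are read off at the leaves.
    static void assignCodes(Node node, String prefix, Map<Integer, String> codes) {
        if (node == null) return;
        if (node.isLeaf()) {
            // Assumption: a file with a single distinct byte still gets a 1-bit code.
            codes.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assignCodes(node.left, prefix + "0", codes);
        assignCodes(node.right, prefix + "1", codes);
    }

    // Replace each byte by its bit code.
    static String encode(byte[] data, Map<Integer, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (byte b : data) bits.append(codes.get(b & 0xFF));
        return bits.toString();
    }

    // Follow the bits from the root; on reaching a leaf, emit its byte and restart.
    static byte[] decode(Node root, String bits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Node node = root;
        for (int i = 0; i < bits.length(); i++) {
            if (!root.isLeaf()) node = (bits.charAt(i) == '0') ? node.left : node.right;
            if (node.isLeaf()) {
                out.write(node.symbol);
                node = root;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "aaaaaaaabbbbccd".getBytes(StandardCharsets.US_ASCII);
        Node root = buildTree(countFrequencies(data));
        Map<Integer, String> codes = new HashMap<>();
        assignCodes(root, "", codes);
        String bits = encode(data, codes);
        System.out.println("original bits:   " + data.length * 8);
        System.out.println("compressed bits: " + bits.length());
        System.out.println("round trip ok:   " + Arrays.equals(decode(root, bits), data));
    }
}
```

For the sample input, the frequencies are a=8, b=4, c=2, d=1, which yield code lengths of 1, 2, 3, and 3 bits. The 120 original bits shrink to 25, a compression ratio of 4.8:1. That figure ignores the space a real compressed file needs to store the tree or frequency table.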
Huffman Tree
- Review pages 415-422 in your text for code and additional information.

Serialization
- A persistent object can exist apart from the executing program and can be stored in a file.
- Serialization involves storing and retrieving objects from an external file.
- The classes ObjectOutputStream and ObjectInputStream are used for serialization.
- Assume anObject is an instance of a class that implements the Serializable interface.
- Deserializing an object.
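The slides' original serialization code is not preserved here, so the following is only a sketch of how writing and reading back a Serializable object with ObjectOutputStream and ObjectInputStream might look. The class `SerializationSketch`, the `ByteCount` type, and the `save` and `load` helpers are hypothetical names chosen for illustration; only the `java.io` classes are real.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative sketch only: names other than the java.io classes are hypothetical.
class SerializationSketch {

    // A small persistent class; implementing Serializable is what makes it storable.
    static final class ByteCount implements Serializable {
        private static final long serialVersionUID = 1L;
        final int symbol;
        final long count;

        ByteCount(int symbol, long count) {
            this.symbol = symbol;
            this.count = count;
        }
    }

    // Serialize: write anObject to an external file.
    static void save(Serializable anObject, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(anObject);
        }
    }

    // Deserialize: read the object back; the caller casts it to the expected type.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("bytecount", ".ser");
        file.deleteOnExit();
        save(new ByteCount('a', 8), file);
        ByteCount restored = (ByteCount) load(file);
        System.out.println((char) restored.symbol + " occurs " + restored.count + " times");
    }
}
```

The try-with-resources blocks close both streams even when an exception is thrown, and the cast after `load` is needed because `readObject` returns a plain `Object`.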
