Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CSE331 – Lecture 21 – Heaps, Binary Files, and Bit Sets Chapter 14 File Structure Complete Binary Tree Example Direct File Access Maximum and Minimum Heaps bitVector Class Heap Insertion Lossless Compression pushHeap() popHeap() Lossy Compression Adjusting popHeap() Example of Building Huffman Tree Heap Sort Summary Slides Heapifying 1 Main Index Contents Complete Binary Tree 5 v[0] 1 3 v[2] v[1] 6 9 v[4] v[3] 7 v[7] 2 0 v[8] 2 v[5] 8 v[9] Main Index Contents 4 v[6] Indexing & Other Heap Properties Node[ j ] has ... Node[ 2*j+1 ] as left child Node[ 2*j+2 ] as right child Node[ (j-1)/2 ] as parent 3 MAX Heap : parent > both children MIN Heap: parent < both children Main Index Contents Maximum and Minimum Heaps 55 40 52 50 20 11 10 25 10 5 22 (A) M aximum Heap (9 nodes) (B) M aximum Heap (4 nodes) 10 5 11 15 50 10 25 20 52 30 40 55 22 (C) M inimum Heap (9 nodes) 4 30 15 (D) M inimum Heap (4 nodes) Main Index Contents Heap Before and After Insertion of 50 63 63 v[0] v[0] 30 40 30 v[2] v[1] 25 10 v[4] v[3] 5 3 18 v[7] v[8] v[9] 8 v[5] v[2] v[1] 38 25 10 v[6] v[4] v[3] 38 v[5] v[6] 3 18 50 v[7] v[8] v[9] v[10] (b) Main Index 8 5 (a) 5 40 Contents Example of tree reordering in pushHeap() 63 63 63 v[0] ... 30 ... v[0] v[1] ... 50 v[9] 30 25 v[10] 18 v[9] ... v[1] 30 v[4] 25 v[10] Step 2 Compare 50 and 30 (Exchange v[4] and v[1]) Main Index ... 50 v[4] Step 1 Compare 50 and 25 (Exchange v[10] and v[4]) 6 v[0] v[1] v[4] 18 ... 50 Contents 18 25 v[9] v[10] Step 3 Compare 50 and 63 (50 in correct location) pushHeap() – new value added at end & bubbles upward template <typename T, typename Compare> void pushHeap(vector<T>& v, int last, Compare comp) { // new item v[last-1] & v[0] to v[last-2] are in heap order int currentPos = last-1; // current node int parentPos = (currentPos-1)/2; // it’s parent T target = v[last-1]; // new value being pushed while (currentPos != 0) // not at root yet { if (comp(target,v[parentPos])) // out of order { // copy parent down & move up a level in tree v[currentPos] = v[parentPos]; currentPos = parentPos; parentPos = (currentPos-1)/2; } else // heap condition is ok. break break; } // copy target to correct location v[currentPos] = target; } 7 Main Index Contents Exchanging first & last elements in popHeap() 63 18 v[0] v[0] 30 40 30 40 v[1] v[2] v[1] v[2] 25 10 v[4] v[3] 5 3 18 v[7] v[8] v[9] 8 38 10 v[5] v[6] v[3] v[4] 5 3 63 v[7] v[8] v[9] 8 38 v[5] v[6] After exchanging the root and last element in the heap Before a deletion 8 25 Main Index Contents Adjusting the semi-heap in popHeap() ... 40 40 v[0] v[0] ... 18 38 v[2] v[2] 8 38 8 v[5] v[6] v[5] Main Index v[6] Step 2: Exchange 18 and 38 Step 1: Exchange 18 and 40 9 18 Contents PopHeap() template <typename T, typename Compare> void popHeap(vector<T>& v, int last, Compare comp) { T temp; // exchange first and last elements in heap temp = v[0]; v[0] = v[last-1]; v[last-1] = temp; // trickle down root over the range [0, last-1) adjustHeap(v, 0, last-1, comp); } 10 Main Index Contents adjustHeap() – converts semi-heap to heap : range [first,last) template <typename T, typename Compare> void adjustHeap(vector<T>& v, int first, int last, Compare comp) { int currentPos = first; // start at v[first] T target = v[first]; // save value temporarily int childPos = 2 * currentPos + 1; // select left child while (childPos <= last-1) // more nodes to check { if ((childPos+1 <= last-1) && comp(v[childPos+1], v[childPos])) childPos = childPos + 1; // select right child if (comp(v[childPos],target)) { v[currentPos] = v[childPos]; // copy selected child up currentPos = childPos; // move down childPos = 2 * currentPos + 1; // left child } else // target belongs at currentPos break; } v[currentPos] = target; // copy target to currentPos } 11 Main Index Contents Heap Sort Two steps (1) Build a heap (Heapify the vector) (2) For each Index from v.sixe()-1 to 1 do – – 12 Swap v[0] and v[Index] : This places a single item in the sorted portion of the vector (following the heap) Adjust semi-heap (vector within range [0,Index) ) To sort In Order : Use MAX Heap To sort In Reverse Oreder : Use MIN Heap Main Index Contents Building a Heap (Heapifying a Vector) 9 17 12 30 20 50 65 4 60 19 Initial Vector 9 17 12 30 65 50 4 20 60 19 adjustHeap () at 4 causes no changes (A) 13 Main Index Contents Example of Heapifying a Vector (Cont…) 9 9 65 30 20 50 4 60 12 17 12 65 60 30 19 4 19 65 9 12 4 20 30 17 12 19 adjustHeap() at 1 moves 12 down two levels (D) 14 60 50 60 65 50 17 adjustHeap () at 2 moves 17 down (C) adjustHeap () at 3 moves 30 down (B) 30 20 50 Main Index 19 4 20 17 9 adjustHeap() at 0 moves 9 down three levels (E) Contents Making a Heap: Heapifying using makeHeap() template <typename T, typename Compare> void makeHeap(vector<T>& v, Compare comp) { int heapPos, lastPos; // compute the size of the heap and the index // of the last parent lastPos = v.size(); heapPos = (lastPos - 2)/2; // filter down every parent in order from last parent // down to root while (heapPos >= 0) { adjustHeap(v,heapPos, lastPos, comp); heapPos--; } } 15 Main Index Contents Example of heap sort int arr[] = {50, 20, 75, 35, 25}; vector<int> v(arr, 5); 75 35 50 25 20 Heapified Tree 16 Main Index Contents Example of Implementing heap sort (Cont….) Calling popHeap() with last = 4 deletes 50 and stores it in h[3] Calling popHeap() with last = 5 deletes 75 and stores it in h[4] 50 35 1 2 1 20 25 75 20 17 20 35 25 75 50 Main Index 25 Calling popHeap() with last = 2 deletes 25 and stores it in h[1] 1 25 50 2 75 50 Calling popHeap() with last = 3 deletes 35 and stores it in h[2] 20 35 Contents 35 75 heapSort() - in “d_sort.h” template <typename T, typename Compare> void heapSort (vector<T>& v, Compare comp) { // "heapify" the vector v makeHeap(v, comp); int i, n = v.size(); // iteration that determines elements v[n-1] ... v[1] for(i = n; i > 1;i--) { // call popHeap() to move next largest to v[n-1] popHeap(v, i, comp); } } 18 Main Index Contents Program using heapSort() #include <iostream> #include <vector> #include <functional> #include "d_sort.h“ #include "d_util.h“ #include "d_random.h“ using namespace std; // // // // for for for for greater<T>() operator heapSort() writeVector() randomNumber int main() { vector<int> v; randomNumber rnd; int i; // create a vector with 15 random integers for (i = 0; i < 15; i++) v.push_back(rnd.random(100)); cout << "Sort in ascending order" << endl << " "; heapSort(v,greater<int>()); // also callable with less<int>() writeVector(v); cout << endl; return 0; } 19 Main Index Contents File Structure A text file contains ASCII characters with a newline sequence separating lines. A binary file consists of data objects that vary from a single character (byte) to more complex structures that include integers, floating point values, programmergenerated class objects, and arrays. each data object in a file a record File as a direct access structure 20 R0 R1 R2 R3 R4 Ri Rn-2 Rn-1 0 1 2 3 4 currentPos n-2 n-1 Main Index Contents offset beg 21 Direct File Access The functions seekg() and seekp() allow the application to reposition the current file pointers. The seek functions take an offset argument that measures the number of bytes from the beginning (beg), ending (end), or current position (cur) in the file. If a file is used for both input and output, use the seek functions tellg() and seekg(). expands offset file offset offset cur Main Index end Contents x Implementing the bitVector Class bitMask() returns an unsigned character value containing a 1 in the bit position representing i. x x x 0 x x x member[vectorIndex(i)] Set bit i 0 0 0 0 1 0 0 0 x x x bitMask(i) | member[vectorIndex(i)] x x x x 1 x x x ~bitMask(i ) & member[vectorIndex(i)] x x x x 0 x x x bitMask(i) x x x x 1 member[vectorIndex(i)] Clear bit i 1 1 1 1 0 1 1 1 ~bitMask(i) 22 Main Index Contents Lossless Compression data compression loses no information original data can be recovered exactly from the compressed data. normally apply to "discrete" data, such as text, word processing files, computer applications, and so forth This paper ... ... ... ... ... Submitted by J. Q. Student 23 Compress 1001011101010111001010 ... ... ... 1010010110111010110011 Main Index Contents Reconstruct This paper ... ... ... ... ... Submitted by J. Q. Student Lossy Compression loses some information during compression and the data cannot be recovered exactly shrink the data further than lossless compression techniques. Sound files often use this type of compression, 10110101011101 ... 10111011101110 10110111 11011010110111 ... 11010110110101 11011011011011 24 Compress 1101011010101110100010 ... ... ... 0100100100000101101001 Main Index Contents Reconstruct 10110101011101 ... 10111011101110 11011 ... 11010110110101 11011011011011 Huffman Tree – data codes are from edges (root to leaf) 25 Main Index Contents 26 Huffman Codes Variable length binary codes Shortest codes for most frequently occurring data value Longest codes for least frequently occurring data values Codes have unique prefixes Codes are determined from binary tree built from priority queue of data value/frequency pairs Tree is built bottom-up (always balanced) Main Index Contents Building Huffman Tree d:6 a:16 b:4 e:20 f:3 c:8 Priority Queue 27 Main Index Contents Building Huffman Tree (Cont…) d:6 a:16 7 e:20 f:3 c:8 Priority Queue 28 Main Index Contents b:4 Building Huffman Tree (Cont…) 29 Main Index Contents Building Huffman Tree (Cont…) 30 Main Index Contents Summary Slide 1 §- Heap - an array-based tree that has heap order - maximum heap: if v[i] is a parent, then v[i] v[2i+1] and v[i] v[2i+2] (a parent is its children) - root, v[0], is the maximum value in the vector - minimum heap: the parent is its children. - v[0] is the minimum value - Insertion: place the new value at the back of the heap and filtering it up the tree. 31 Main Index Contents Summary Slide 2 §- Heap (Cont…) - Deletion: exchanging its value with the back of the heap and then filtering the new root down the tree, which now has one less element. - Insert and delete running time: O(log2 n) - heapifying: apply the filter-down operation to the interior nodes, from the last interior node in the tree down to the root - running time: O(n) - The O(n log2 n) heapsort algorithm heapifies a vector and erases repeatedly from the heap, locating each deleted value in its final position. 32 Main Index Contents Summary Slide 3 §- Binary File - a sequence of 8-bit characters without the requirement that a character be printable and with no concern for a newline sequence that terminates lines - often organized as a sequence of records: record 0, record 1, record 2, ..., record n-1. - uses for both input and output, and the C++ file <fstream> contains the operations to support these types of files. - the open() function must use the attribute ios::binary 33 Main Index Contents Summary Slide 4 §- Binary File (Cont…) - For direct access to a file record, use the function seekg(), which moves the file pointer to a file record. - accepts an argument that specifies motion from the beginning of the file (ios::beg), from the current position of the file pointer (ios::cur), and from the end of the file (ios::end) - use read() function to inputs a sequence of bytes from the file into block of memory and write() function to output from a block of memory to a binary file 34 Main Index Contents Summary Slide 5 §- Bit Manipulation Operators - | (OR), & (AND), ^ (XOR), ~ (NOT), << (shift left), and >> (shift right) - use to perform operations on specific bits within a character or integer value. - The class, bitVector, use operator overloading - treat a sequence of bits as an array, with bit 0 the left-most bit of the sequence - bit(), set(), and clear() allow access to specific bits - the class has I/O operations for binary files and the stream operator << that outputs a bit vector as an ASCII sequence of 0 and 1 values. 35 Main Index Contents Summary Slide 6 §- File Compression Algorithm - encodes a file as sequence of characters that consume less disk space than the original file. - Two types of compression algorithms: 1) lossless compression – restores the original file. – approach: count the frequency of occurrence of each character in the file and assign a prefix bit code to each character - file size: the sum of the products of each bit-code length and the frequency of occurrence of the corresponding character. 36 Main Index Contents Summary Slide 7 §- File Compression Algorithm (Cont…) 2) lossy compression – loses some information during compression and the data cannot be recovered exactly – normally used with sound and video files - The Huffman compression algorithm builds optimal prefix codes by constructing a full tree with the most frequently occurring characters and shorter bit codes near the top of the tree. The less frequently occurring characters occur near the bottom of the tree and have longer bit codes. 37 Main Index Contents Summary Slide 8 §- File Compression Algorithm (Cont…) - If the file contains n distinct characters, the loop concludes after n-1 iterations, having built the Huffman Tree. - implementation requires the use of a heap, bit operations, and binary files - The use of the bitVector class simplifies the construction of the classes hCompress and hDecompress, which perform Huffman compression and decompression. - works better with textfiles; they tend to have fewer unique characters than binary files. 38 Main Index Contents