Basic Data Structures
• Stack and Queue
• List
• Bucket and Hash

Memorize the Data
• Memorization is a basic function of computers
• How efficiently the data can be used depends on the way (structure) of memorization: the cost of memorizing, the cost of searching, …
  - write the data to a book as they come
  - make shelves of books
• We should choose the way according to the objective for which the data are stored
• A way of memorization is called a “data structure”

The Way of Memory
• A unit of memory can keep one value
  - write the data to a book as they come: allocate memory of some quantity, and write the data into the memory units starting from the smallest index (array, stack, queue, …)
  - make shelves of books: structure the memory units by linking them with indices/pointers (list, heap, binary tree, bucket, hash, …)
• Let us see them one by one

Memory by Array
• Question: we want to memorize data that come one by one. How do we memorize them?
• Answer: write the latest value next to the current end
• However, memory has no function for keeping the time at which a value was written
• Even if it had, finding the last one would not be easy; there is a huge number of memory units
• Thus, we use one memory unit as a variable that keeps the position of the unit written last

Stack
• See an example
(Figure: an array holding values V, and a counter that points at the last written position)
• The structure given by a pair of an “array” and a “counter (for the last position)” is called a stack (the counter is called a “stack pointer”)

Delete a Value
• Next, we think about reading and deleting the values written to the memory
  - read an arbitrary value, and delete it: read the last one (at the stack pointer), and decrease the stack pointer
  - delete the “xxx”-th value: copy the value at the last position to position “xxx”, and decrease the stack pointer
  - delete the values equal to “xxx”: scanning the array is needed
(Figure: an array of five values V, with the stack pointer at 5)

A Stack Subroutine
• Implementation of a stack
• Both subroutines are called with a “stack” and a “value”

    typedef struct {
      int *h;   // array for data
      int end;  // size of array
      int t;    // counter (stack pointer)
    } STACK;

    int STACK_push ( STACK *S, int a ){
      if ( S->t == S->end ) return (1);  // overflow error
      S->h[S->t] = a;
      S->t++;
      return (0);
    }

    int STACK_pop ( STACK *S, int *a ){
      if ( S->t == 0 ) return (1);  // underflow error
      S->t--;
      *a = S->h[S->t];
      return (0);
    }

(Figure: array h[] of size end, holding values V in positions 0 to t-1)

Examples of Usage
• Reverse the given string ABCDEFGH
• “undo” function for word processors
• …

Column: Stack without Overflow
• A stack has a limit given by the array size
• In some cases, we don’t know the amount of data to be stacked (for example, reading a file and memorizing all the numbers in it; this is no problem if we are allowed to scan the file once before the execution)
• When an overflow occurs, we make a new stack of a larger size, and copy the old one to the new one
• However, if we increase the size by one, an overflow occurs at every insertion, which is wasteful

Column: Stack without Overflow (2)
• When we make a new stack, doubling the size is efficient
• Once an overflow has occurred, the number of stack cells existing in the memory at the same time is bounded by three times the number of values
• The total cost of copying is bounded by twice the current number of values, so there is no loss in the sense of time complexity

Column: Stack without Overflow (3)
• The code is written like this

    #include <stdlib.h>  // malloc, free

    void STACK_push ( STACK *S, int a ){
      if ( S->t == S->end ){  // overflow: double the array (assumes end >= 1)
        int i, *h = malloc ( sizeof(int) * S->end * 2 );  // using realloc is easier
        for ( i=0 ; i<S->t ; i++ ) h[i] = S->h[i];  // copy the old values
        free ( S->h );
        S->h = h;
        S->end *= 2;  // record the new size
      }
      S->h[S->t] = a;
      S->t++;
    }

FILO
• A stack reads and deletes one value, and that value is always the last one
• It is used, for example, when a user puts ★’s on the display and the computer deletes all of them when a button is pushed
  - The value written last is read first
  - Such a data structure is called FILO (First In Last Out)
(Figure: an array of five values V, with the counter at 5)

Queue; FIFO
• In some cases, we want to read first the value written first (FIFO; First In First Out)
• For example, deleting the ★’s in the order in which they were put; counter services follow this rule (customer = value)
• Such a data structure is called a “queue”
(Figure: an array of five values V, with the counter at 5)
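The “Stack without Overflow” column and the string-reversal usage can be sketched together as follows. This is a minimal sketch, not the slides' own code: it takes the realloc route that the column mentions as the easier one, and the names STACK_push_grow and reverse_string are illustrative assumptions. The stack is assumed to start empty with h = NULL and end = 0.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *h;   /* array for data */
    int end;  /* size of array */
    int t;    /* counter (stack pointer) */
} STACK;

/* Push that doubles the array on overflow (hypothetical name).
   Returns 1 only when memory cannot be allocated. */
int STACK_push_grow(STACK *S, int a) {
    if (S->t == S->end) {
        int new_end = (S->end > 0) ? S->end * 2 : 1;
        int *h = realloc(S->h, sizeof(int) * (size_t)new_end);
        if (h == NULL) return 1;   /* the old array is still valid */
        S->h = h;
        S->end = new_end;
    }
    S->h[S->t] = a;
    S->t++;
    return 0;
}

/* Pop as in the slides: returns 1 on underflow. */
int STACK_pop(STACK *S, int *a) {
    if (S->t == 0) return 1;
    S->t--;
    *a = S->h[S->t];
    return 0;
}

/* Reverse src into dst using the FILO order of the stack (hypothetical helper).
   dst must have room for strlen(src)+1 bytes. Returns 1 on allocation failure. */
int reverse_string(const char *src, char *dst) {
    STACK S = { NULL, 0, 0 };
    size_t i, k = 0;
    int c;
    assert(src != NULL && dst != NULL);
    for (i = 0; src[i] != '\0'; i++) {
        if (STACK_push_grow(&S, src[i])) { free(S.h); return 1; }
    }
    while (STACK_pop(&S, &c) == 0) dst[k++] = (char)c;  /* last in, first out */
    dst[k] = '\0';
    free(S.h);
    return 0;
}
```

Calling reverse_string("ABCDEFGH", out) with a buffer of at least 9 bytes leaves "HGFEDCBA" in out. Because the size doubles, pushing the 8 letters triggers only 4 reallocations (sizes 1, 2, 4, 8).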
Counters for Queue
• A queue needs a pointer that indicates “the place at which the value was written first”
• So, we need two counters (pointers): one for the position to be read, and one for the position to be written
• The position to be read is called the “head”, and the position to be written is called the “tail”
(Figure: an array of values V, with head at 2 and tail at 7)

Overflow
• A stack overflows when more values come than its size
• A queue of size n runs out of room at the (n+1)-th insertion, even if we have deleted many values in the meantime
• When the tail reaches the end of the array, set it back to the beginning of the array; do the same for the head
• A real overflow occurs when the tail passes the head
(Figure: an array of size 10, with head at 5 and the tail wrapping from 10 back to 0)

Adjustment for Passing
• When the tail catches up with the head, every cell holds a value
• However, head = tail is also the state of the empty queue, so we cannot distinguish the two
• Ways out of this are
  + prepare a flag (one bit) to distinguish them
  + never write the last free cell (the capacity of the queue becomes n-1)
(Figure: an array filled with ten values V, with head and tail both at 5)

A Subroutine for Queue
• An implementation of a queue is the following
• The subroutines take a queue structure and the value to be written (or the place to store the value read)

    typedef struct {
      int *h;   // array for data
      int end;  // size of array
      int s;    // counter for head
      int t;    // counter for tail
    } QUEUE;

    int QUEUE_ins ( QUEUE *Q, int a ){
      if ( ( Q->t + 1 ) % Q->end == Q->s ) return (1);  // overflow error
      Q->h[Q->t] = a;
      Q->t = ( Q->t + 1 ) % Q->end;
      return (0);
    }

    int QUEUE_ext ( QUEUE *Q, int *a ){
      if ( Q->s == Q->t ) return (1);  // underflow error
      *a = Q->h[Q->s];
      Q->s = ( Q->s + 1 ) % Q->end;
      return (0);
    }

(Figure: array h[] of size end, holding values V between head s and tail t)

Example of Usages
• Input numbers one by one, and output five of them at once, at some points
• Draw the trajectory of the mouse cursor with a fixed length (store the locations of the mouse cursor in a queue, e.g., 30 locations per second, and delete the ones older than the specified period)

List: Ins/Del with Keeping the Order
• Arrays are simple and useful, but keeping the ordering under insertion and deletion is costly
• Can we gain an advantage for the ordering, possibly at the price of losing some advantages in other functions?
• For example, random access (accessing the k-th element in constant time for any k) may be lost
  + customers in the line of a counter service, where canceling and breaking into the line are allowed
  + editing a document: insert/delete/move words, sentences, sections, and even pictures, in the sequence of letters (and objects)

Idea: Simulate a Chain
• A (real) chain is useful in such a situation; however, finding the k-th ring is not light
• We simulate this structure in the computer
• In a chain, the neighboring relations are fixed, but the places are not. Thus, each ring (cell) of the chain can be located at any place in the memory, as long as the adjacency is kept
• Each cell has to store its neighbors (previous and next); thus, each cell has three values
  + the value
  + the previous cell (position, or pointer)
  + the next cell (position, or pointer)
(Figure: a chain of cells holding the values 1, 5, 7, 3)

Strategy for Insertion/deletion
• When we detach or insert a ring of a chain, we cut or make the relations to the neighbors of the ring; for a cell, we change the “adjacency relation” of its “neighbors”
• For insertion, change the adjacency relation of the cells at the place of insertion (the inserted cell becomes their new neighbor)
• For deletion, directly connect the two neighbors of the removed cell to each other
• Both can be done in constant time when the target cell is given, and neither changes the order of the other cells
(Figure: the chain 1, 5, 7, 3)

Structure Using Pointer
• Define this structure and allocate one block for each request; there is no limit on the size (length), while arrays have one
• The head and the tail of a list have to be kept in memory, otherwise the list will be lost in the large memory space (a LIST structure can point to the head/tail)
• Note: the definition of LIST needs LIST itself, so the following naive definition does not compile

    typedef struct {
      LIST *prv;  // pointer (LIST is not yet defined here)
      LIST *nxt;  // pointer
      int h;      // value
    } LIST;

• Thus we need the trick of using the tag _LIST_

    typedef struct _LIST_ {
      struct _LIST_ *prv;  // pointer to the previous cell
      struct _LIST_ *nxt;  // pointer to the next cell
      int h;               // value
    } LIST;

Code (Initialization)
• For initialization, prepare a LIST structure ● as the root of the list, and set its nxt and prv to itself, to represent an empty list

    void LIST_init ( LIST *L ){
      L->prv = L;
      L->nxt = L;
    }

• After several cells have been inserted into this empty list, the nxt/prv of ● point to the head/tail
(Figure: the root ● followed by the cells 1, 5, 7, 3)

Insertion
• Insertion is done by giving (the pointers to) the cell l to be inserted and the cell p just before the place of insertion, and changing the pointers of the cells at that place

    void LIST_ins ( LIST *l, LIST *p ){
      p->nxt->prv = l;
      l->nxt = p->nxt;
      p->nxt = l;
      l->prv = p;
    }

(Figure: the root ● followed by the cells 1, 5, 7, 3)
• Notice that the order of changing the pointers is crucial; in some bad orderings (for example, overwriting p->nxt before it is copied to l->nxt), the operation will not be done correctly

Deletion
• Deletion of a given cell is done by connecting the previous cell and the next cell by pointers

    void LIST_del ( LIST *l ){
      l->nxt->prv = l->prv;
      l->prv->nxt = l->nxt;
    }

(Figure: the root ● followed by the cells 1, 5, 7, 3)
• The pointers of “l” need not be modified, since l is out of the list (further, if we want to recover the cell into the list in the future, we can immediately identify the place by looking at the unmodified pointers)

In Usual Textbooks
• Generally, the ends of the list are supposed to point to “NULL”; if nxt/prv is NULL, that cell is an end of the list
(Figure: the cells 1, 5, 7, 3 with NULL at both ends)
• Theoretically beautiful, but bothersome for programming
• Insertion/deletion at the ends needs exceptions: “insert before NULL” and “set the prv of NULL to X” are impossible, so we have to avoid them by several if statements that consider the place to be operated on

Loop along a List
• Tracing a list can be done by going to the nxt repeatedly, starting from ● (written root in the code)

    LIST *p; int e;
    for ( p=root->nxt ; p!=root ; p=p->nxt ){
      e = p->h;
      …
    }

• The opposite direction is traced by using prv
(Figure: the root ● followed by the cells 1, 5, 7, 3)

Recover a Cell
• A cell that has just been removed (its neighbors have not been operated on since) can be recovered by inserting it at the position at which the cell was

    void LIST_recov ( LIST *l ){
      LIST_ins ( l, l->prv );
    }

(Figure: the root ● followed by the cells 1, 5, 7, 3)
• The position is stored in prv/nxt
• In this way, we can recover all removed cells in the opposite order of the removal
• Removed cells can be marked by setting prv := NULL; nxt still indicates the position

Usages of List
• Insert 1 to n into a list one by one so that i is inserted at the position next to j, where j is randomly chosen from 1…i-1 (a random permutation is generated in linear time)
• Jobs on a time scale: new jobs come, and some jobs are canceled

A Sophisticated Usage
• We have n pairs of values: (x_1, y_1), …, (x_n, y_n)
• We want to know the (nk’/m)-th largest x value among the pairs whose y has rank at most k, for each k and each k’ (= 1, …, m)
• A straightforward method spends O(n^2 log n) time
• A sophisticated algorithm using a list spends only O(n(m + log n)) time

A Sophisticated Usage (2)
• Make a list of the pairs sorted by the values of x
• Store pointers to the (nk/m)-th largest values for all k
• Make a list of the pairs sorted by the values of y, and trace the list from the largest; at the same time, remove the currently visited pair from the first list
• Update the positions of the ranks from (1/m)-th to (m/m)-th; this can be done by shifting the positions to the right or left according to the value of the removed cell (we know the numbers of cells on the left side and on the right side)
• Intuitively, the complexity goes from O(n^2 log n) to O(n(m + log n)), thus about n times faster when m is small

Single Link List
• If we are always given the previous cell for insertion/deletion, we do not have to keep pointers to the previous cell (only to the next cell)
(Figure: the root ● followed by the cells 1, 5, 7, 3, linked in one direction)
• The operations will be limited, but memory, program, and speed will be more efficient
  + Making slides?
  + Merging sorted sequences of numbers

List Realized by Array
• A list can be realized by arrays of cells, instead of bothering with pointers (memory allocation, segmentation faults, …)
• Advantage: cells have indices, so we can easily attach a weight or an extra value to every cell just by allocating another array
• Disadvantage: an array needs cost to be resized
• Many applications in the real world need only a fixed number of cells, and then there is no disadvantage
• In this case, all cells are stored in one structure

    typedef struct {
      int *prv;  // array: index of the previous cell
      int *nxt;  // array: index of the next cell
      int *h;    // array: value
    } ALIST;

Example of Array List
• The i-th cells of the arrays h, prv, and nxt are the h, prv, and nxt of cell i
• Use the first or the last cell as the root of the list (here the last cell, 4, is the root ●)

    index   0   1   2   3   4 (●)
    h       V   V   V   V   -
    prv     4   0   3   1   2
    nxt     1   3   4   2   0

• Following nxt from the root visits the cells in the order 0, 1, 3, 2

Bucket
• Queues and lists are useful, but not so much for search; ex) find all values of 1 digit
• Some structure would make the search efficient
• A simple case is classification by values, since we usually want to find values near the given key

Idea of Bucket
• We prepare one structure (array, list, etc.) for each value, and classify the values into them
• Ex) we are given numbers from 0 to 99, and classify them according to their digits in the tens place
• Each structure is realized by a list, since we don’t know the sizes of the structures until all the numbers have been input
(Figure: ten buckets labeled 0 to 9)

Usage of Bucket
• Sorting numbers by their digits in the tens place
• Transposition of a sparse matrix, given row by row as lists of column indices:
    A: 1, 9, 5
    B: 1, 2
    C: 1, 6, 7
    D: 4, 5
(Figure: ten buckets labeled 0 to 9, one for each column)

Application: Radix Sort
• Buckets can sort numbers by one digit in linear time
• Using this, we repeatedly sort the numbers from the lower-order digits to the higher-order digits, keeping the past order among ties
• We need two sets of buckets, but one is enough; we insert all numbers into the first set of buckets, keeping the ordering, and re-sort by scanning the buckets
(Figure: ten buckets labeled 0 to 9)

Hash
• Buckets are useful when the values are classified finely; however, we then need many buckets, which takes huge memory, and scanning the buckets takes a long time
• Then, is there any trade-off, such as high (but not perfect) accuracy of classification with not-so-large memory?
• Further, we can restrict ourselves to queries of the form “find this value”, excluding “larger than this” and “between XXX and YYY”

Idea of Hash
• Consider the case of string data
• The question “Is string S inside this bucket?” will be answered quickly if we prepare buckets for all possible strings, but this needs much memory space
• However, we can usually assume that the strings are not many
• So, let’s use the first two letters for the bucket classification
• Two strings will be in the same bucket even if they have different third letters; this reduces the memory for buckets
• If a bucket is empty, checking its inside is light; however, if it contains many strings, the check involves a long scan, so the operation will be heavy

Bucket for Strings
• “Doesn’t a bucket accept only numbers?”
• Yes!
Thus we have to convert each string to a number (index), and classify the strings according to the index
    1: ABCABC
    2: ABBBBB
    3: CCCBBB
• The “first two letters” are converted to a number; for example, when the alphabet size is three, suppose that A=0, B=1, C=2 and regard the two letters as a base-3 number: AB → 1, CC → 8
(Figure: buckets labeled 0 to 9)

Bias of Distribution
• Sometimes, the first two letters are not uniformly distributed in the data; e.g., English words (“st”, “th”, and “re” are frequent)
• Then some buckets will have many values, and the others will have few
• Then, can we use some good mapping function instead of the “first two letters”? (such a function is called a hash function, and the functional value of a data is called its hash key, or hash value)
• Further, considering real-world applications, similar values should have (much) different hash values
• Ex) for a data x_1, x_2, x_3, …, take (x_1)^1 + (x_2)^2 + (x_3)^3 + … or ((x_1 + 1) x_2 + 1) x_3 …, modulo (#buckets)

Determine the Size
• How can we determine the size of the hash (#buckets)?
• Basically, any bucket should have few values, in particular 1 or 2
• We have n values to begin with, thus O(n) is acceptable for #buckets
• Then, we can set #buckets to a constant multiple of n; in particular, there is no loss in space complexity
• In the same way as for stacks, we double the size when the hash overflows; there is no loss in time complexity

Summary
• Stack and queue: a combination of an array and counters that accommodates sequentially coming data
• List: stores the adjacency relation between data, so that the order of the data is kept efficiently under insertion and deletion
• Bucket: makes the search easy by classifying the data by their values
• Hash: buckets with hash keys, keeping both classification accuracy and memory efficiency high
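The hash slides (strings mapped to bucket indices, few values per bucket, doubling on overflow) can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the lecture's implementation: each bucket is a NULL-terminated single link list, the hash function is the ((x_1 + 1) x_2 + 1) x_3 … example taken modulo #buckets, “overflow” is taken to mean that the number of stored strings has reached #buckets, and the names HCELL, HASH, HASH_key, HASH_init, HASH_find, HASH_grow, and HASH_ins are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct HCELL {
    struct HCELL *nxt;  /* next cell in the same bucket (single link list) */
    char *key;          /* private copy of the stored string */
} HCELL;

typedef struct {
    HCELL **b;  /* array of buckets */
    int end;    /* number of buckets */
    int n;      /* number of stored strings */
} HASH;

/* Toy hash function from the slides: ((x1+1)*x2+1)*x3 ... mod #buckets.
   Unsigned arithmetic wraps around, which is acceptable for a sketch. */
unsigned HASH_key(const char *s, int end) {
    unsigned v = 0;
    for (; *s != '\0'; s++) v = (v + 1u) * (unsigned char)*s;
    return v % (unsigned)end;
}

int HASH_init(HASH *H, int end) {
    H->b = calloc((size_t)end, sizeof(HCELL *));  /* all buckets empty */
    if (H->b == NULL) return 1;
    H->end = end;
    H->n = 0;
    return 0;
}

/* Returns 1 if s is stored, 0 otherwise: scans only one bucket. */
int HASH_find(const HASH *H, const char *s) {
    HCELL *c;
    for (c = H->b[HASH_key(s, H->end)]; c != NULL; c = c->nxt)
        if (strcmp(c->key, s) == 0) return 1;
    return 0;
}

/* Double the number of buckets and re-classify every cell. */
int HASH_grow(HASH *H) {
    int i, end = H->end * 2;
    HCELL **b = calloc((size_t)end, sizeof(HCELL *));
    if (b == NULL) return 1;
    for (i = 0; i < H->end; i++) {
        HCELL *c = H->b[i];
        while (c != NULL) {
            HCELL *nxt = c->nxt;
            unsigned k = HASH_key(c->key, end);
            c->nxt = b[k];  /* insert at the front of the new bucket */
            b[k] = c;
            c = nxt;
        }
    }
    free(H->b);
    H->b = b;
    H->end = end;
    return 0;
}

/* Insert s if it is not stored yet; returns 1 only on allocation failure. */
int HASH_ins(HASH *H, const char *s) {
    HCELL *c;
    unsigned k;
    if (HASH_find(H, s)) return 0;
    if (H->n >= H->end && HASH_grow(H)) return 1;
    c = malloc(sizeof(HCELL));
    if (c == NULL) return 1;
    c->key = malloc(strlen(s) + 1);
    if (c->key == NULL) { free(c); return 1; }
    strcpy(c->key, s);
    k = HASH_key(s, H->end);
    c->nxt = H->b[k];
    H->b[k] = c;
    H->n++;
    return 0;
}
```

Starting from HASH_init(&H, 2) and inserting the three strings ABCABC, ABBBBB, and CCCBBB doubles the table once, to 4 buckets, so the average bucket length stays below 1. Releasing the cells is omitted to keep the sketch short, and a real application would likely want a hash function with better mixing than this toy example.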