Basic Data Structures
• Stack and Queue
• List
• Bucket and Hash
Memorize the Data
• Storing (memorizing) data is a basic function of computers
• How efficiently the data can be used depends on the way (structure)
of memorization
→ cost for memorization, cost for search…
- write the data to a book as they come
- make shelves of books
• We should choose the way according to the purpose for which we
store the data
A way of memorizing data is called a “data structure”
The Way of Memory
• Each unit (cell) of memory can keep one value
- write the data to a book as they come
→ allocate some quantity of memory, and write the data into the
cells starting from the smallest index
→ array, stack, queue…
- make shelves of books
→ structure the memory units by linking them with
indices/pointers
→ list, heap, binary tree, bucket, hash…
We see them one by one
Memory by Array
Question: we want to memorize data that come one by one.
How do we memorize them?
→ Answer: write the latest value next to the current end
• However, memory cells do not record the time at which a
value was written
• Even if they did, it would not be easy to find the last one; there
is a huge number of memory units
• …thus, we use one more variable to keep the position of
the unit written last
Stack
• See an example
[Figure: an array receiving values V one by one, with a counter advancing 0, 1, 2]
• The structure consisting of a pair of “array” and “counter (for the last
position)” is called a stack
(the counter is called a “stack pointer”)
Delete a Value
• Next, we think about reading and deleting the values written to
the memory
- read an arbitrary value, and delete it
→ read the last one (at the stack pointer), and decrease the stack pointer
- delete the ”xxx”-th value
→ copy the last value to position “xxx”, and decrease the stack pointer
(the order of the values is not kept)
- delete the values equal to “xxx”
→ scanning the array is needed
[Figure: array holding five values V; stack pointer = 5]
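The second case (delete the ”xxx”-th value) can be sketched as follows. The function name ARRAY_del_kth and the 0-based position k are assumptions made for this illustration; they do not appear in the slides.

```c
#include <assert.h>

/* Hypothetical helper: delete the k-th value (0-based) of an array h
   that currently holds *t values, by overwriting it with the last one.
   Returns 1 if k is out of range, 0 otherwise.
   The order of the remaining values is NOT preserved. */
int ARRAY_del_kth(int *h, int *t, int k) {
    assert(h != 0 && t != 0);
    if (k < 0 || k >= *t) return 1;
    h[k] = h[*t - 1];   /* copy the last value into position k */
    (*t)--;             /* decrease the stack pointer */
    return 0;
}
```

The deletion takes constant time, at the price of moving the last value into the middle of the array.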
A Stack Subroutine
• Implementation of stack
• Used by giving a “stack” and a “value”
typedef struct {
  int *h;   // array for data
  int end;  // size of array
  int t;    // counter (stack pointer)
} STACK;
int STACK_push ( STACK *S, int a ){
  if ( S->t == S->end ) return (1); // overflow error
  S->h[S->t] = a;
  S->t ++;
  return (0);
}
int STACK_pop ( STACK *S, int *a ){
  if ( S->t == 0 ) return (1); // underflow error
  S->t --;
  *a = S->h[S->t];
  return (0);
}
[Figure: array h[] of size end holding values V, with t marking the next free position]
Examples of Usage
• Reverse the given string (ex. ABCDEFGH → HGFEDCBA)
• “undo” function for word processors
…
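As a small illustration of the first usage, the following sketch reverses a string with an array and a counter. The function name reverse_with_stack and the fixed capacity of 256 characters are assumptions for this example only.

```c
#include <assert.h>
#include <string.h>

/* Reverse the string `in` into `out` (which must have room for
   strlen(in)+1 chars) by pushing every character onto an array
   stack and popping them back. `buf` plays the role of h[],
   `t` the role of the stack pointer. */
void reverse_with_stack(const char *in, char *out) {
    size_t n = strlen(in), t = 0, i;
    char buf[256];               /* assumed capacity for this sketch */
    assert(n < sizeof buf);      /* otherwise the stack would overflow */
    for (i = 0; i < n; i++) buf[t++] = in[i];   /* push */
    for (i = 0; t > 0; i++) out[i] = buf[--t];  /* pop: last in, first out */
    out[n] = '\0';
}
```

Calling reverse_with_stack("ABCDEFGH", out) fills out with the characters in the opposite order, since the character pushed last is popped first.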
Column: Stack without Overflow
• A stack has a limit given by the array size
• In some cases, we don’t know the amount of data to be stacked
(such as reading a file and memorizing all the numbers in it;
it’s OK if we are allowed to scan the file before the execution…)
• When an overflow occurs, we make a new stack of a larger size,
and copy the old one to the new one
• However, if we increase the size by one, an overflow occurs at
every insertion, which is wasteful
Column: Stack without Overflow (2)
• When we make a new stack, doubling the size is efficient
→ Once an overflow occurs, the number of cells of the stacks existing
in memory at the same time is bounded by three times the number of
values
• The total cost for copying is also bounded by twice the current
number of values
→ no loss in the sense of time complexity
Column: Stack without Overflow (3)
• The code can be written as follows (malloc/free need stdlib.h)
void STACK_push ( STACK *S, int a ){
  if ( S->t == S->end ){ // overflow: double the array
    int i, *h = malloc ( sizeof(int) * S->end * 2 ); // using realloc is easier
    for ( i=0 ; i<S->t ; i++ ) h[i] = S->h[i];
    free ( S->h );
    S->h = h;
    S->end = S->end * 2; // remember the new size
  }
  S->h[S->t] = a;
  S->t ++;
}
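The bound on the copying cost under doubling can be checked by a short calculation. As an assumption for this sketch, the stack starts with one cell, values are only pushed, and the size has been doubled k times, so the current size is 2^k and the current number of values n is larger than 2^(k-1). The j-th doubling copies 2^(j-1) values, hence

```latex
\underbrace{1 + 2 + 4 + \cdots + 2^{k-1}}_{\text{values copied over all doublings}}
  \;=\; 2^{k} - 1 \;<\; 2 \cdot 2^{k-1} \;<\; 2n .
```

So the copying adds at most a constant amount of work per pushed value on average.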
FILO
• Read and delete an arbitrary value (= the last one)
→ used, for example, when a user puts ★’s on the display
and the computer deletes all of them when a button is pushed
- The value written last is read first
- Such a data structure is called FILO (First In Last Out)
[Figure: array holding five values V; counter = 5]
Queue; FIFO
• In some cases, we want to read first the value written first
(FIFO; First In First Out)
→ For example, delete ★’s in the order in which they were put
→ counter services follow this rule (customer = value)
• Such a data structure is called a ”queue”
[Figure: array holding five values V; counter = 5]
Counters for Queue
• A queue needs a pointer to indicate “the place of the value
written first”
→ so, we need two counters (pointers): one for the position to be
read, and one for the position to be written
→ the position to be read is called “head”, and
that to be written is called “tail”
[Figure: array holding five values V; head = 2, tail = 7]
Overflow
• A stack overflows when more values come than its size
• A queue (on an array of size n) overflows after n+1 insertions, even if we
have deleted many values
→ Set the tail back to the beginning of the array when it reaches the end
→ the same for the head
• When the tail passes the head, a real overflow occurs
[Figure: array of size 10 holding five values V; head = 5, tail wraps from 10 to 0]
Adjustment for Passing
• When the tail catches up with the head, every cell holds
a value (the queue is full)
→ but head = tail is also the state of the empty queue!
→ we cannot distinguish them
• Ways out of this are
+ prepare a flag to distinguish them (one bit)
+ never write the last cell (the size of the queue will be n-1)
[Figure: array whose ten cells all hold values V; head = 5, tail = 5]
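The first way out (a one-bit flag) could look like the sketch below. The names CQUEUE, CQ_SIZE and the field full are hypothetical and chosen only for this illustration; the capacity of 4 is an arbitrary small value.

```c
#include <assert.h>

#define CQ_SIZE 4   /* assumed small capacity, for illustration */

typedef struct {
    int h[CQ_SIZE]; /* array for data */
    int s;          /* head: position to be read */
    int t;          /* tail: position to be written */
    int full;       /* the one-bit flag: 1 if head==tail means "full" */
} CQUEUE;

void CQUEUE_init(CQUEUE *Q) { Q->s = Q->t = 0; Q->full = 0; }

int CQUEUE_ins(CQUEUE *Q, int a) {
    assert(Q->t >= 0 && Q->t < CQ_SIZE);    /* invariant of the tail */
    if (Q->full) return 1;                  /* overflow error */
    Q->h[Q->t] = a;
    Q->t = (Q->t + 1) % CQ_SIZE;
    if (Q->t == Q->s) Q->full = 1;          /* tail caught up with head */
    return 0;
}

int CQUEUE_ext(CQUEUE *Q, int *a) {
    if (Q->s == Q->t && !Q->full) return 1; /* underflow error */
    *a = Q->h[Q->s];
    Q->s = (Q->s + 1) % CQ_SIZE;
    Q->full = 0;                            /* at least one cell is free now */
    return 0;
}
```

With the flag, all CQ_SIZE cells can hold values; without it, one cell has to stay unused.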
A Subroutine for Queue
• An implementation of queue is the following
• The functions take a queue structure and the value to be written
(or the place to store the value read)
typedef struct {
  int *h;   // array for data
  int end;  // size of array
  int s;    // counter for head
  int t;    // counter for tail
} QUEUE;
int QUEUE_ins ( QUEUE *Q, int a ){
  if (( Q->t +1 ) % Q->end == Q->s ) return (1); // overflow error
  Q->h[Q->t] = a;
  Q->t = ( Q->t +1 ) % Q->end;
  return (0);
}
int QUEUE_ext ( QUEUE *Q, int *a ){
  if ( Q->s == Q->t ) return (1); // underflow error
  *a = Q->h[Q->s];
  Q->s = ( Q->s +1 ) % Q->end;
  return (0);
}
[Figure: array h[] of size end holding values V between head s and tail t]
Example of Usages
• Input numbers one by one, and output five of them at once at
some points
• Draw the trajectory of the mouse cursor with a fixed length
(store the locations of the mouse cursor in a queue (ex., 30 locations
per second), and delete the ones older than the specified
period)
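The trajectory example can be sketched with a circular queue that drops its oldest location whenever a new one arrives and the trail is already at full length. POINT, TRAIL and TRAIL_LEN are names invented for this sketch, and the length 3 is only for illustration.

```c
#include <assert.h>

#define TRAIL_LEN 3  /* assumed trajectory length (number of kept locations) */

typedef struct { int x, y; } POINT;   /* hypothetical location type */

typedef struct {
    POINT h[TRAIL_LEN + 1]; /* one cell stays unused to tell full from empty */
    int s;                  /* head */
    int t;                  /* tail */
} TRAIL;

void TRAIL_init(TRAIL *Q) { Q->s = Q->t = 0; }

/* Number of locations currently stored. */
int TRAIL_count(const TRAIL *Q) {
    return (Q->t - Q->s + TRAIL_LEN + 1) % (TRAIL_LEN + 1);
}

/* Record a new location; if the trail is already TRAIL_LEN long,
   the oldest location is deleted first (FIFO). */
void TRAIL_add(TRAIL *Q, POINT p) {
    if (TRAIL_count(Q) == TRAIL_LEN)
        Q->s = (Q->s + 1) % (TRAIL_LEN + 1);   /* drop the oldest */
    Q->h[Q->t] = p;
    Q->t = (Q->t + 1) % (TRAIL_LEN + 1);
    assert(TRAIL_count(Q) <= TRAIL_LEN);
}

/* The i-th oldest stored location (0 = oldest). */
POINT TRAIL_get(const TRAIL *Q, int i) {
    return Q->h[(Q->s + i) % (TRAIL_LEN + 1)];
}
```

Drawing the trajectory then amounts to reading TRAIL_get for i = 0 up to TRAIL_count - 1.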
List: Ins/Del with Keeping the Order
• Arrays are simple and useful, but keeping the order under
insertion/deletion is costly
• Can we handle the ordering efficiently, possibly at the cost of
some advantages in other functions?
→ For example, random access may be lost
(in an array, we can access the k-th element in constant time for any k)
+ customers in the line of a counter service, allowing
cancellation and breaking into the line
+ editing a document; insert/delete/move words, sentences,
and sections, even pictures, in the sequence of letters (and
objects)
Idea: Simulate a Chain
• A (real) chain is useful in such a situation;
however, finding the k-th ring is not light
→ simulate this structure in the computer
• In a chain, the neighboring relations are fixed, but the places are
not. Thus, each ring (cell) of the chain can be located at any
place in the memory, as long as the adjacency is kept
• Each cell has to store its neighbors (previous and next);
thus, each cell has three values
+ the value
+ the previous cell (position, or pointer)
+ the next cell (position, or pointer)
[Figure: a chain of cells holding 1, 5, 7, 3]
Strategy for Insertion/deletion
• When we detach/insert a ring of a chain, we
→ cut the relation to the neighbors (of the ring)
→ for a cell, change the “adjacency relation” of its “neighbors”
• For insertion, change the adjacency relations of the cells at the
place of insertion (the inserted cell becomes a new neighbor)
• For deletion, directly connect the neighbors of the removed
cell to each other
• Both can be done in constant time when the target cell is given,
and neither changes the order of the other cells
[Figure: a chain of cells holding 1, 5, 7, 3]
Structure Using Pointer
• Define this structure and allocate one block for each request
→ no limit on the size (length), while arrays have one
• The head and tail of a list have to be
kept in memory, otherwise the list
will be lost in the large memory space
(a LIST structure can point to the head/tail)
Note: the definition of LIST needs LIST itself,
thus we need the trick of using _LIST_
(does not compile: LIST is not yet known inside its own definition)
typedef struct {
  LIST *prv; // pointer
  LIST *nxt; // pointer
  int h; // value
} LIST;
(compiles: the tag _LIST_ can be used inside the definition)
typedef struct _LIST_ {
  struct _LIST_ *prv; // pointer
  struct _LIST_ *nxt; // pointer
  int h; // value
} LIST;
Code (Initialization)
• For initialization, prepare a LIST structure ● as the root of the
list, and set its nxt and prv to itself, to represent an empty list
void LIST_init ( LIST *L ){
  L->prv = L;
  L->nxt = L;
}
• After inserting several cells into this empty list, the nxt/prv of ●
point to the head / tail
[Figure: root ● linked in a cycle with cells 1, 5, 7, 3]
Insertion
• Insertion is done by giving (the pointers to) the cell to be
inserted, and the cell just before the place of insertion
→ change the pointers of the cells at that place
void LIST_ins ( LIST *l, LIST *p ){ // insert l just after p
  p->nxt->prv = l;
  l->nxt = p->nxt;
  p->nxt = l;
  l->prv = p;
}
[Figure: root ● linked in a cycle with cells 1, 5, 7, 3]
• Notice that the order of changing the pointers is crucial
(ex., if p->nxt = l is executed first, the old p->nxt is lost).
In some bad orderings, the operation will not be done correctly
Deletion
• Deletion of a given cell is done by connecting the previous cell
and the next cell by pointers
void LIST_del ( LIST *l ){
  l->nxt->prv = l->prv;
  l->prv->nxt = l->nxt;
}
[Figure: root ● linked in a cycle with cells 1, 5, 7, 3]
• The pointers of “l” need not be modified, since it is out of the list
(further, if we want to recover the cell into the list in the future, we
can immediately identify the place to be recovered by looking
at its unmodified pointers)
In Usual Textbooks
• Generally, the ends of a list are supposed to point to “NULL”
→ if nxt/prv is NULL, then that is the end
[Figure: list of cells 1, 5, 7, 3 whose two ends point to NULL]
• Theoretically beautiful, but bothersome for programming
→ Insertion/deletion at the ends needs exceptions
→ “Insert before NULL” or “set the prv of NULL to X” is impossible,
so we have to avoid them by several if statements that
check the place to be operated on
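To make the exceptions concrete, here is a sketch of a NULL-terminated list. The names NLIST, NLIST_ENDS, NLIST_ins and NLIST_del are hypothetical; the point is only to show the extra if statements that the root-cell version does not need.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _NLIST_ {
    struct _NLIST_ *prv;
    struct _NLIST_ *nxt;
    int h;
} NLIST;

/* Hypothetical textbook-style list: head/tail are kept separately and
   the ends are marked by NULL. */
typedef struct { NLIST *head, *tail; } NLIST_ENDS;

/* Insert cell l just after cell p; p == NULL means "insert at the head". */
void NLIST_ins(NLIST_ENDS *L, NLIST *l, NLIST *p) {
    assert(L != NULL && l != NULL);
    l->prv = p;
    if (p == NULL) {                 /* exception 1: l becomes the new head */
        l->nxt = L->head;
        L->head = l;
    } else {
        l->nxt = p->nxt;
        p->nxt = l;
    }
    if (l->nxt == NULL) L->tail = l; /* exception 2: l becomes the new tail */
    else l->nxt->prv = l;
}

void NLIST_del(NLIST_ENDS *L, NLIST *l) {
    if (l->prv == NULL) L->head = l->nxt; else l->prv->nxt = l->nxt;
    if (l->nxt == NULL) L->tail = l->prv; else l->nxt->prv = l->prv;
}
```

Each function needs two branches for the ends, while LIST_ins and LIST_del with a root cell need none.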
Loop along a List
• Tracing a list can be done by going to the nxt repeatedly,
starting from ● (written L in the code)
LIST *p;
int e;
for ( p=L->nxt ; p!=L ; p=p->nxt ){
  e = p->h;
  …
}
• The opposite direction is done by using prv
[Figure: root ● linked in a cycle with cells 1, 5, 7, 3]
Recover a Cell
• A cell just removed (its neighbors have not been changed since) can be
recovered by inserting it at the position where it was
void LIST_recov ( LIST *l ){
  LIST_ins ( l, l->prv );
}
[Figure: root ● linked in a cycle with cells 1, 5, 7, 3]
• The position is stored in prv/nxt
• In this way, we can recover all removed cells in the opposite
order of the removal
• Removed cells can be marked by setting prv := NULL;
nxt still indicates the position
[Figure: cells 1, 5, 7, 3 with a removed cell still pointing to its old neighbors]
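The following sketch puts the list routines together and checks that cells removed in some order come back when recovered in the opposite order. LIST_values is a helper added only for this illustration, and the assert in LIST_recov is an extra check (not in the slides) that the neighbors are still in the state just after the removal.

```c
#include <assert.h>

typedef struct _LIST_ {
    struct _LIST_ *prv;
    struct _LIST_ *nxt;
    int h;
} LIST;

void LIST_init(LIST *L) { L->prv = L; L->nxt = L; }

void LIST_ins(LIST *l, LIST *p) {   /* insert l just after p */
    p->nxt->prv = l;
    l->nxt = p->nxt;
    p->nxt = l;
    l->prv = p;
}

void LIST_del(LIST *l) {            /* l->prv and l->nxt are left untouched */
    l->nxt->prv = l->prv;
    l->prv->nxt = l->nxt;
}

void LIST_recov(LIST *l) {
    assert(l->prv->nxt == l->nxt);  /* old neighbors must still be adjacent */
    LIST_ins(l, l->prv);
}

/* Hypothetical helper: copy the values of the list into out[]
   (front to back); returns the length. */
int LIST_values(LIST *root, int *out) {
    int n = 0;
    LIST *p;
    for (p = root->nxt; p != root; p = p->nxt) out[n++] = p->h;
    return n;
}
```

For the list 1, 5, 7, 3, deleting 5 and then 7 leaves 1, 3; recovering 7 and then 5 restores 1, 5, 7, 3.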
Usages of List
• Insert 1 to n into a list one by one so that i is inserted at the
position next to j, which is randomly chosen from 1…i-1
(a random permutation is generated in linear time)
• Jobs on a time scale; new jobs come, and some jobs will be
canceled
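The first usage can be sketched as below. As assumptions of this sketch, the root cell (index 0) is also allowed as the randomly chosen j, so that a value can land at the front, and rand() % i is used as the random choice although it is slightly biased. The cells are kept in one array so that the cell of j is found in constant time.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct _LIST_ { struct _LIST_ *prv, *nxt; int h; } LIST;

static void LIST_ins(LIST *l, LIST *p) {   /* insert l just after p */
    p->nxt->prv = l;
    l->nxt = p->nxt;
    p->nxt = l;
    l->prv = p;
}

/* Fill out[0..n-1] with a permutation of 1..n; returns 1 on memory error.
   cell[0] is the root; cell[i] holds value i. */
int random_permutation(int n, int *out) {
    int i, k = 0;
    LIST *p, *cell;
    assert(n >= 0);
    cell = malloc(sizeof(LIST) * ((size_t)n + 1));
    if (cell == NULL) return 1;
    cell[0].prv = cell[0].nxt = &cell[0];        /* empty list */
    for (i = 1; i <= n; i++) {
        int j = rand() % i;                      /* 0 (root) .. i-1 */
        cell[i].h = i;
        LIST_ins(&cell[i], &cell[j]);            /* constant time */
    }
    for (p = cell[0].nxt; p != &cell[0]; p = p->nxt) out[k++] = p->h;
    assert(k == n);
    free(cell);
    return 0;
}
```

Each of the n insertions takes constant time, so the whole generation is linear in n.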
A Sophisticated Usage
• We have n pairs of values: (x1,y1), …, (xn,yn)
• For each k and k’ (= 1,…,m), we want to know the (nk’/m)-th largest
x value among the pairs whose y has rank at most k
• A straightforward method spends O(n^2 log n) time
• A sophisticated algorithm using a
list spends only O(n(m + log n)) time
A Sophisticated Usage (2)
• Make a list of the pairs sorted by the values of x
• Store pointers to the (nk/m)-th largest values for all k
• Make a list of the pairs sorted by the values of y, and trace the list
from the largest; at the same time, the currently visited pair is
removed from the first list
• Update the positions of the ranks from 1/m to m/m.
This can be done by shifting the positions to the right or left,
according to the value of the removed cell (we know the number
of cells on the left side and on the right side)
• Intuitively, the complexity changes as O(n^2 log n) → O(n(m + log n)),
thus it is about n times faster when m is small
Single Link List
• If we are always given the previous cell for insertion/deletion, we
do not need pointers to the previous cell
(only to the next cell)
[Figure: root ● and cells 1, 5, 7, 3 linked by nxt only]
• Operations will be limited, but memory/program/speed will be more efficient
+ Making slides?
+ Merging sorted sequences
of numbers
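Merging sorted sequences is a case where the previous cell is always at hand, so nxt alone suffices. The sketch below assumes NULL-terminated chains (instead of the circular root used in these slides) and uses a temporary dummy root; SLIST and SLIST_merge are names chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _SLIST_ {
    struct _SLIST_ *nxt;  /* pointer to the next cell only */
    int h;                /* value */
} SLIST;

/* Merge two increasingly sorted NULL-terminated chains a and b into
   one sorted chain, reusing the cells; returns the first cell. */
SLIST *SLIST_merge(SLIST *a, SLIST *b) {
    SLIST root;            /* dummy cell: avoids a special case for the head */
    SLIST *last = &root;   /* always the previous cell of the insertion place */
    root.nxt = NULL;
    while (a != NULL && b != NULL) {
        assert(a->nxt == NULL || a->h <= a->nxt->h);  /* input must be sorted */
        assert(b->nxt == NULL || b->h <= b->nxt->h);
        if (a->h <= b->h) { last->nxt = a; a = a->nxt; }
        else              { last->nxt = b; b = b->nxt; }
        last = last->nxt;
    }
    last->nxt = (a != NULL) ? a : b;   /* append the rest */
    return root.nxt;
}
```

No cell is allocated or copied; only the nxt pointers are rewritten, so the merge is linear in the total length.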
List Realized by Array
• A list is realizable by arrays of cells instead of bothersome pointers
(memory allocation, segmentation faults,…)
Advantage: cells have indices, thus we can easily attach a
weight/extra value to every cell by just allocating one more array
Disadvantage: arrays need cost to re-size
• Many applications in the real world need a fixed number of
cells, thus there is no disadvantage
• In this case, all cells are stored
in one structure
typedef struct {
  int *prv; // index to previous
  int *nxt; // index to next
  int *h; // value
} ALIST;
Example of Array List
• The i-th cells of arrays h, prv, and nxt are the h, prv, and nxt of cell i
• Consider the first/last cell as the root of the list
index  0  1  2  3  4 (●)
h      V  V  V  V
prv    4  0  3  1  2
nxt    1  3  4  2  0
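A possible set of routines for the array list is sketched below. The field end (number of ordinary cells, with index end used as the root) and the function names are assumptions of this sketch; the slides only give the three arrays.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *prv;  /* index of previous cell */
    int *nxt;  /* index of next cell */
    int *h;    /* value */
    int end;   /* number of ordinary cells; index `end` is the root (assumed) */
} ALIST;

int ALIST_init(ALIST *L, int n) {
    L->prv = malloc(sizeof(int) * ((size_t)n + 1));
    L->nxt = malloc(sizeof(int) * ((size_t)n + 1));
    L->h   = malloc(sizeof(int) * ((size_t)n + 1));
    if (!L->prv || !L->nxt || !L->h) return 1;
    L->end = n;
    L->prv[n] = L->nxt[n] = n;      /* empty list: the root points to itself */
    return 0;
}

void ALIST_ins(ALIST *L, int l, int p) {  /* insert cell l just after cell p */
    assert(0 <= l && l < L->end && 0 <= p && p <= L->end);
    L->prv[L->nxt[p]] = l;
    L->nxt[l] = L->nxt[p];
    L->nxt[p] = l;
    L->prv[l] = p;
}

void ALIST_del(ALIST *L, int l) {
    L->prv[L->nxt[l]] = L->prv[l];
    L->nxt[L->prv[l]] = L->nxt[l];
}

void ALIST_free(ALIST *L) { free(L->prv); free(L->nxt); free(L->h); }
```

With n = 4, inserting the cells in the order 0, 1, 3, 2 (each after the previously inserted one, starting after the root 4) gives prv = 4, 0, 3, 1, 2 and nxt = 1, 3, 4, 2, 0.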
Bucket
• Queues and lists are useful, but not so for searching
→ ex) find all the 1-digit values
• Some structure would make the search efficient
• A simple case is classification by values, since we usually want to
find values near the given key
Idea of Bucket
• We prepare one structure (array, list, etc.) for each value (or range of
values); then the values are classified
Ex) we are given numbers from 0 to 99, and classify them
according to their digits in the tens place
• Each structure is realized by a list, since we don’t know the sizes
of the structures until all the numbers are input
[Figure: ten buckets labeled 0 to 9]
Usage of Bucket
• Sorting numbers by their digits in the tens place
• Transposition of a sparse matrix
A: 1,9,5
B: 1,2
C: 1,6,7
D: 4,5
[Figure: ten buckets labeled 0 to 9]
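The transposition example can be sketched with ten buckets, each a single-link list stored in arrays. It is assumed here that A to D are rows listing their column indices; BUCKETS, BUCKETS_add and the bound MAXE are names and values invented for this sketch.

```c
#include <assert.h>

#define NCOL 10   /* buckets 0..9, as in the figure */
#define MAXE 64   /* assumed upper bound on the number of entries */

typedef struct {
    int first[NCOL]; /* first entry of each bucket, -1 if empty */
    int last[NCOL];  /* last entry of each bucket */
    int nxt[MAXE];   /* next entry in the same bucket, -1 at the end */
    int row[MAXE];   /* row stored in the entry */
    int n;           /* number of entries used */
} BUCKETS;

void BUCKETS_init(BUCKETS *B) {
    int c;
    for (c = 0; c < NCOL; c++) B->first[c] = B->last[c] = -1;
    B->n = 0;
}

/* Append `row` to the bucket of column `col`. */
void BUCKETS_add(BUCKETS *B, int col, int row) {
    int e = B->n;
    assert(0 <= col && col < NCOL && e < MAXE);
    B->n++;
    B->row[e] = row;
    B->nxt[e] = -1;
    if (B->first[col] < 0) B->first[col] = e;   /* bucket was empty */
    else B->nxt[B->last[col]] = e;              /* link after the old last */
    B->last[col] = e;
}
```

Scanning the rows A, B, C, D and adding each (column, row) pair fills bucket 1 with A, B, C and bucket 5 with A, D; reading the buckets from 0 to 9 gives the transposed matrix.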
Application: Radix Sort
• Buckets can sort the numbers by one digit in linear time
• Using this, we repeatedly sort the numbers from the lowest digit
to the highest digit, keeping the previous order among ties
• We seem to need two sets of buckets, but one suffices;
we insert all numbers into the buckets, keeping the ordering,
and re-sort by scanning the buckets
[Figure: ten buckets labeled 0 to 9]
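A compact sketch of radix sort on decimal digits follows. As a simplification of this sketch, the ten buckets of each pass are not lists but consecutive regions of a temporary array, whose starting positions are found by counting; the function name radix_sort and the restriction to non-negative ints are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sort n non-negative ints, digit by digit from the lowest decimal digit.
   Each pass is a stable bucket sort on one digit. Returns 1 on memory error. */
int radix_sort(int *a, int n) {
    int *tmp, i, exp, max = 0;
    if (n <= 0) return 0;
    tmp = malloc(sizeof(int) * (size_t)n);
    if (tmp == NULL) return 1;
    for (i = 0; i < n; i++) { assert(a[i] >= 0); if (a[i] > max) max = a[i]; }
    for (exp = 1; ; exp *= 10) {
        int start[10] = {0}, d, pos = 0;
        for (i = 0; i < n; i++) start[(a[i] / exp) % 10]++;   /* bucket sizes */
        for (d = 0; d < 10; d++) { int c = start[d]; start[d] = pos; pos += c; }
        for (i = 0; i < n; i++)                /* scan in order: ties keep order */
            tmp[start[(a[i] / exp) % 10]++] = a[i];
        memcpy(a, tmp, sizeof(int) * (size_t)n);
        if (max / exp < 10) break;             /* highest digit has been sorted */
    }
    free(tmp);
    return 0;
}
```

Each pass is linear in n, and the number of passes equals the number of digits of the largest value.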
Hash
• Buckets are useful when the values are classified finely;
however, we then need many buckets
→ we need huge memory;
moreover, scanning the buckets takes a long time
• … then, is there any trade-off, such as high (but not perfect)
accuracy with not-so-large memory?
• Further, we can restrict ourselves to just “find this value”,
neither “larger than this” nor “between XXX and YYY”
Idea of Hash
• Consider the case of string data
• The question ”Is string S in this bucket?” will be answered quickly
if we prepare buckets for all possible strings
→ needs much memory space
• However, we can usually assume that the strings are not many
• So, let’s use the first two letters for the bucket classification
→ Two strings will be in the same bucket even if they have different
third letters; this reduces the memory for buckets
→ if a bucket is empty, checking its inside is light;
however, if it contains many strings, the check involves a long scan, so the
operation will be heavy
Bucket for Strings
• ”Doesn’t a bucket accept only numbers?”
• Yes! Thus we have to convert a string to a number (index), and
classify the strings according to the index
1: ABCABC
2: ABBBBB
3: CCCBBB
• The ”first two letters” are converted to a number; for example, when the
alphabet size is three, suppose that A=0, B=1, C=2 and regard the two
letters as a 2-digit number in base 3
AB → 1,
CC → 8
[Figure: ten buckets labeled 0 to 9]
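The conversion of the first two letters into a bucket index can be written in a few lines. The function name bucket_index is an assumption of this sketch, and only the letters A, B, C are accepted.

```c
#include <assert.h>

/* Index of the bucket for a string over the alphabet {A, B, C},
   using A=0, B=1, C=2 and reading the first two letters as a
   two-digit number in base 3 (so the index is in 0..8). */
int bucket_index(const char *s) {
    assert(s[0] >= 'A' && s[0] <= 'C' && s[1] >= 'A' && s[1] <= 'C');
    return (s[0] - 'A') * 3 + (s[1] - 'A');
}
```

ABCABC and ABBBBB both start with AB and fall into bucket 0*3+1 = 1, while CCCBBB falls into bucket 2*3+2 = 8.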
Bias of Distribution
• Sometimes, the first two letters are not uniformly distributed in the data
ex., English words (“st”, “th”, and “re” are frequent)
→ some buckets will have many values, and the others will have few
• … then, can we use some good mapping function instead of “first
two letters”?
(such a function is called a hash function; the functional value of a
data item is called its hash key, or hash value)
→ further, considering real-world applications, similar values
should have (very) different hash values
Ex) For x1,x2,x3,…, the value of (x1)^1+(x2)^2+(x3)^3+… or
((x1+1)x2+1)x3… modulo (#buckets)
Determine the Size
• How can we determine the size of the hash (#buckets)?
• Basically, every bucket should have few values, in particular 1 or 2
→ originally, we have n values, thus O(n) is acceptable for #buckets
• Then, we can set #buckets to a constant multiple of n;
in particular, we have no loss in space complexity
• In the same way as for stacks, we double the size when the hash overflows
→ no loss in time complexity
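A hash with doubling can be sketched as follows, under several assumptions that are not in the slides: the hash function is a common polynomial one with multiplier 31 (a swapped-in choice, not the example formula above), each bucket is a single-link list, keys are not copied, and “overflow” is taken to mean that the number of keys reaches #buckets. All names (HASH, HCELL, HASH_ins, …) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct _HCELL_ { struct _HCELL_ *nxt; const char *key; } HCELL;

typedef struct {
    HCELL **b;   /* array of buckets (each a single-link list) */
    int end;     /* number of buckets */
    int n;       /* number of stored keys */
} HASH;

/* Assumed hash function: polynomial, reduced modulo #buckets. */
static unsigned hash_key(const char *s, int end) {
    unsigned v = 0;
    while (*s) v = v * 31u + (unsigned char)*s++;
    return v % (unsigned)end;
}

int HASH_init(HASH *H, int end) {
    assert(end > 0);
    H->b = calloc((size_t)end, sizeof(HCELL *));
    H->end = end; H->n = 0;
    return H->b == NULL;
}

int HASH_find(const HASH *H, const char *key) {
    HCELL *c;
    for (c = H->b[hash_key(key, H->end)]; c != NULL; c = c->nxt)
        if (strcmp(c->key, key) == 0) return 1;
    return 0;
}

/* Double the number of buckets and move every cell to its new bucket. */
static int HASH_grow(HASH *H) {
    int i, end = H->end * 2;
    HCELL **b = calloc((size_t)end, sizeof(HCELL *));
    if (b == NULL) return 1;
    for (i = 0; i < H->end; i++) {
        HCELL *c = H->b[i];
        while (c != NULL) {
            HCELL *next = c->nxt;
            unsigned k = hash_key(c->key, end);
            c->nxt = b[k]; b[k] = c;
            c = next;
        }
    }
    free(H->b);
    H->b = b; H->end = end;
    return 0;
}

/* Insert key if not yet stored; returns 1 on memory error. */
int HASH_ins(HASH *H, const char *key) {
    HCELL *c;
    unsigned k;
    if (HASH_find(H, key)) return 0;
    if (H->n >= H->end && HASH_grow(H)) return 1;
    c = malloc(sizeof(HCELL));
    if (c == NULL) return 1;
    k = hash_key(key, H->end);
    c->key = key; c->nxt = H->b[k]; H->b[k] = c;
    H->n++;
    return 0;
}

void HASH_free(HASH *H) {
    int i;
    for (i = 0; i < H->end; i++)
        while (H->b[i] != NULL) { HCELL *c = H->b[i]; H->b[i] = c->nxt; free(c); }
    free(H->b);
}
```

Starting from 2 buckets and inserting 5 distinct strings, the table doubles twice and ends with 8 buckets, so #buckets stays within a constant multiple of the number of keys.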
Summary
Stack and queue: combinations of an array and counters that handle
sequentially arriving data
List: stores the adjacency relation between data so that the order of
the data is kept efficiently under insertion and deletion
Bucket: makes the search easy by classifying the data by their values
Hash: buckets with hash keys for keeping both classification
accuracy and memory efficiency high