Data Structures
Alan, Tam Siu Lung
[email protected]
96397999
99967891
Prerequisite
• Familiarity with Pascal/C/C++
• Asymptotic Complexity
• Techniques learnt
– Recursion
– Divide and Conquer
– Exhaustion
– Greedy
– [Dynamic Programming exempted]
• Algorithms learnt
– Bubble / Insertion / Selection / Shell / Merge / Quick /
Bucket / Radix Sorting
– Linear / Binary / Interpolation Searching
What our Programming Language provides?
• Built-in Data Types
– Character/String (length limit?)
– Integral (signed/unsigned 8 [?], 16, 32, 64 [?] bit)
– Floating Point (signed/unsigned 32, 64, 80 [?] bit)
– Fixed Point [?]
– Complex [?]
– Pointer/Reference
– Function Pointer/Reference
What our Programming Language provides?
• Aggregate Data Types
– Array [base-definable?]
• Multiple Values of same type
• Access by numeric index
– Record/Struct/Class
• Multiple Values of different types
• Function Aggregation + Inheritance + Polymorphism
[?]
– Unions [?]
What our Programming Language provides?
• Built-in Language Constructs
– Branching (If, Else)
– Loops (For, While, Until)
– Function/Procedure Calling
• In C++’s view, statements and operators are functions as well
• a = b → int &operator=(int &a, const int &b)
• a > b → bool operator>(const int &a, const int &b)
• *a → int &operator*(int *a)
• a[b] → string &operator[](string a[], int b)
– Recursion
– Even more for more sophisticated languages!
For most of the remaining time
• We concentrate on
– Pointer
– Array
– Record
– and how they interact
• We will use a C++-like notation
– array<int> means an array of integers
– int* is shorthand for pointer<int>
– Records are written as: struct<int, int, string>
– Capitalized type names are “variables”, which means they can be replaced by any type
Formal Definition: Pointer
• Concept:
– pointer<Type> p; (Type *p) [^p in Pascal]
• Operations:
– *p → Type &operator*(Type *p) [p^ in Pascal]
• Returns the pointed value
• Error if p is null/nil
– &y → Type *operator&(Type &y) [@y in Pascal]
• Returns the address of a value
– p = x → Type *operator=(Type *p, Type *x)
• Pointer assignment
Formal Definition: Pointer
• More Operators
– p < q → bool operator<(Type *p, Type *q)
• Returns if pointer p is smaller
– ++p → Type *operator++(Type *p) [inc(p) in Turbo Pascal]
• Point to next element (in an array)
– --p → Type *operator--(Type *p) [dec(p) in Turbo Pascal]
• Point to previous element (in an array)
– p + n → Type *operator+(Type *p, int n) [not in Turbo Pascal]
• Point to nth next element (in an array)
Programming Syntax: Pointer
int main() {
int a[10];
int *b = &a[1];
*b = 1;
b = new int(2);
delete b;
b = 0;
}
var
  a : array[1..10] of integer;
  b : ^integer;
begin
  b := @a[2];
  b^ := 1;
  new(b);
  b^ := 2;
  dispose(b);
  b := nil;
end.
Array
• Concept
– array<Type, Size : int>
– array<Type, Lower : int, Upper : int>
• Operations
– Type &operator[](Type a[], int index)
• Requires 0 <= index < Size
• Requires Lower <= index <= Upper
• Analysis
– a[x] is equivalent to *(a + x)
– which is equivalent to *(Type *)(@a + x * sizeof(Type)), with the address counted in bytes
– It is sometimes slower than necessary!
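The equivalence above can be checked with a minimal sketch. The helper names element_at and element_at_ptr are hypothetical and exist only for this illustration; the point is that the byte offset is the index times the size of one element.

```cpp
#include <cstddef>

// Hypothetical helper: reads a[x] by explicit address arithmetic.
// The byte offset is x * sizeof(int), the size of ONE element.
int element_at(const int *a, std::size_t x) {
    const char *base = reinterpret_cast<const char *>(a);
    return *reinterpret_cast<const int *>(base + x * sizeof(int));
}

// The same access written with plain pointer arithmetic.
int element_at_ptr(const int *a, std::size_t x) {
    return *(a + x);
}
```

Both helpers return the same value as a[x]; a compiler normally generates the multiplication for every indexed access, which is why walking a pointer with ++p can be cheaper inside a loop.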
Example: Prime Finding
• primes[] stores all primes found
primes[0] = 2;
for each i
  for each v in primes[]
    if (v * v > i) then begin
      primes.add(i);
      break;
    end;
    if (i mod v = 0) then break;
Solution
#include <iostream>
using namespace std;
int main() {
int primes[100], *last = primes;
cout << (*last++ = 2) << endl;
for (int i = 3; i < 100; ++i) {
int *j = primes;
do {
if (*j * *j > i) {
cout << (*last++ = i) << endl;
break;
}
if (i % *j == 0) break;
} while (++j < last);
}
}
var
  primes : array[1..100] of integer;
  i : integer; last, j : ^integer;
begin
  last := @primes;
  last^ := 2; inc(last);
  writeln(2);
  for i := 3 to 100 do begin
    j := @primes;
    repeat
      if j^ * j^ > i then begin
        last^ := i; inc(last);
        writeln(i); break;
      end;
      if i mod j^ = 0 then break;
      inc(j);
    until j >= last;
  end;
end.
Record
• Like Arrays
• Identified by names instead of index
• Each name is associated with a type
• Pair is a special record with 2 elements, Key and Value
– Keys are unique (i.e. keys identify records)
– Keys are comparable (i.e. sort-able) [sometimes]
– Since Value can itself be a record, all records with a unique portion can be represented as a pair
Programming Syntax: Record
struct Point {
double x, y;
};
struct Rect {
Point tl, br;
int color;
};
int main() {
Rect rect;
rect.color = 255;
rect.tl.x = 0.0;
}
type
Point = record x, y : real; end;
Rect = record
tl, br : Point;
color : integer;
end;
var
rect : Rect;
begin
rect.color := 255;
rect.tl.x := 0.0;
with rect do begin
color := 255;
tl.x := 0.0;
end;
end.
Linked List
• Combining Pointer and Record
• linkedlist<string>:
type
pNode = ^Node;
Node = record
value : string;
next : pNode;
end;
var
head: pNode;
Linked List
• Operations
– void Add(linkedlist<Type> p, Type &v)
• Add an element to the Linked List
– Node *Search(linkedlist<Type> p, Type &v)
• Returns null/nil if not found
– void InsertAfter(Node node, Type &v)
• Insert an element after another
– void Remove(Node node)
• How to implement?
• C++: x->y == (*x).y
Linked List Implementation
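struct Node {
    int value;
    Node *next;
};
Node *list = 0;
// Add v at the front of the list
void Add(int v) {
    Node *old = list;
    list = new Node();
    list->next = old;
    list->value = v;
}
// Return the first node holding v, or 0 if not found
Node *Search(int v) {
    for (Node *p = list; p; p = p->next)
        if (p->value == v) return p;
    return 0;
}
// Insert v right after node n and return the new node
Node *InsertAfter(Node *n, int v) {
    Node *old = n->next;
    n->next = new Node();
    n->next->next = old;
    n->next->value = v;
    return n->next;
}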
var
  list : pNode;
procedure Add(v : integer);
var old : pNode;
begin
  old := list;
  new(list);
  list^.next := old;
  list^.value := v;
end;
function Search(v : integer) : pNode;
var n : pNode;
begin
  n := list;
  while (n <> nil) and (n^.value <> v)
    do n := n^.next;
  Search := n;
end;
{ InsertAfter is similar to Add }
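Remove is listed among the operations but not implemented on the slide. A minimal sketch follows, assuming a singly linked Node with an int value; the helpers RemoveAfter and RemoveFront are hypothetical names, chosen because unlinking in O(1) needs the node before the one being removed.

```cpp
struct Node {
    int value;
    Node *next;
};

// Hypothetical helper: unlinks and frees the node that follows n.
// Does nothing if n is the last node. Runs in O(1).
void RemoveAfter(Node *n) {
    Node *victim = n->next;
    if (victim == nullptr) return;
    n->next = victim->next;
    delete victim;
}

// Hypothetical helper: removes the first node of the list and
// returns the new head. Returns nullptr for an empty list.
Node *RemoveFront(Node *head) {
    if (head == nullptr) return nullptr;
    Node *rest = head->next;
    delete head;
    return rest;
}
```

Removing a node given only a pointer to that node would need either a doubly linked list or a scan from the head to find its predecessor.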
Array Implementation
[Figure: linked list stored in arrays, drawn as tables with columns K, N, V and rows 1 to 5, showing the state before and after "Add 2" and "Add 3"]
Array Implementation
[Figure: the same K, N, V tables with rows 1 to 5, showing the state before and after "Remove 2" and "Remove 3"]
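One way to store a linked list in arrays is sketched below. It is an illustration under assumptions, not a transcription of the figures: the array names value_of and next_of, the NIL marker and the free list of unused slots are all choices made for this sketch.

```cpp
const int CAPACITY = 5;
const int NIL = -1;           // plays the role of null/nil

int value_of[CAPACITY];       // payload stored in each slot
int next_of[CAPACITY];        // index of the following slot, or NIL
int head = NIL;               // first used slot
int free_head = NIL;          // first unused slot

// Chains every slot into the free list and empties the list.
void Init() {
    for (int i = 0; i < CAPACITY; ++i)
        next_of[i] = (i + 1 < CAPACITY) ? i + 1 : NIL;
    free_head = 0;
    head = NIL;
}

// Adds v at the front; returns false when no slot is free.
bool Add(int v) {
    if (free_head == NIL) return false;
    int slot = free_head;
    free_head = next_of[slot];
    value_of[slot] = v;
    next_of[slot] = head;
    head = slot;
    return true;
}

// Removes the first element equal to v; returns false if absent.
bool Remove(int v) {
    int prev = NIL;
    for (int cur = head; cur != NIL; prev = cur, cur = next_of[cur]) {
        if (value_of[cur] != v) continue;
        if (prev == NIL) head = next_of[cur];
        else next_of[prev] = next_of[cur];
        next_of[cur] = free_head;   // hand the slot back to the free list
        free_head = cur;
        return true;
    }
    return false;
}
```

Indices replace pointers, so no dynamic allocation is needed, at the price of a fixed capacity. Note that this Remove searches by value and is therefore O(n); unlinking a slot whose predecessor is already known is O(1).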
Abstraction
• Both of the implementations feature the same complexity
– O(1) Addition
– O(n) Searching
– O(1) Insertion
– O(1) Removal
• Sometimes we don’t care how it gets implemented
– We only want a data structure which provides the operations we
want.
• We define Abstract Data Types (ADTs) to mean a collection of
Data Structures providing certain operations
– Plane
– Polynomial
– Graph
• We don’t even care how fast the operations in an ADT are,
though practically we do
Dictionary (Map, Associative Array)
• A Dictionary is an unordered container of key-value pairs
• map<Key, Value>
– void Insert(map<Key, Value> &c, Key &key, Value &value)
– int Size(map<Key, Value> &c)
– Value &Search(map<Key, Value> &c, Key &key)
– void Delete(map<Key, Value> &c, Key &key)
List ADT
• A List ADT is an ordered container of key-value pairs
• list<Key, Value>
– void Insert(list<Key, Value> &c, int pos, Type &value)
– Type &Find-ith(list<Key, Value> &c, int pos)
– void Delete-ith(list<Key, Value> &c, int pos)
– int Size(list<Key, Value> &c)
– Type &Search(list<Key, Value> &c, Key &key)
– void Delete(list<Key, Value> &c, Key &key)
– …
• A List can be implemented by array (Vector/Table), linked list (LinkedList), etc.
• A List is also a Dictionary
Time Complexity
Average Case    Add     Remove  Search
Array           O(1)    O(n)    O(n)
Sorted Array    O(n)    O(n)    O(lg n)
Linked List     O(1)    O(n)    O(n)
• We seldom remove anyway
• There is no way to make both Add/Search fast
• In general, it is difficult if we do not depend on
features of the Key
Direct Addressing Implementation
[Figure: table indexed by key, with Ant in slot 0, Boy in slot 5 and Car in slot 99]
• Use the Vector ADT
• The key is the location
• Efficient: O(1) for all operations
• Infeasible: if the key can range from 1 to 20000000000, if the key is not numeric ...
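Direct addressing can be sketched in a few lines. The key range 0..99, the string values and the extra used[] array that marks occupied slots are assumptions made for this illustration.

```cpp
#include <string>

const int KEY_RANGE = 100;          // keys are assumed to lie in 0..99

std::string table[KEY_RANGE];       // slot k holds the value for key k
bool used[KEY_RANGE] = {false};     // marks which slots are occupied

// Stores value under key; the key itself is the array index.
void Insert(int key, const std::string &value) {
    table[key] = value;
    used[key] = true;
}

// Returns a pointer to the stored value, or nullptr if the key is absent.
const std::string *Search(int key) {
    return used[key] ? &table[key] : nullptr;
}

// Marks the slot as empty again.
void Delete(int key) {
    used[key] = false;
}
```

Every operation is a single array access, but the table needs one slot per possible key, which is what makes large or non-numeric key ranges infeasible.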
Hash Function
• Hash Function: h_m(k)
• Maps all keys “by calculation” into an integer domain, e.g. 0 to m - 1
• E.g. CRC32 hashes strings into a 32-bit integer (i.e. m = 2^32)
– Alan: 1598313570
– Max: 3452409927
– Man: 943766770
– On: 2246271074
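A much simpler hash function than CRC32 is enough to show the idea of mapping keys "by calculation". The sketch below is a polynomial fold, not CRC32, so it will not reproduce the values listed above; the name Hash and the multiplier 31 are arbitrary illustrative choices.

```cpp
#include <string>

// Hypothetical hash function h_m(k): folds the characters of k into
// an integer in the range 0 .. m - 1. Requires m >= 1.
unsigned Hash(const std::string &k, unsigned m) {
    unsigned long long h = 0;   // wide enough that h * 31 + c cannot overflow
    for (unsigned char c : k)
        h = (h * 31 + c) % m;
    return static_cast<unsigned>(h);
}
```

The same key always yields the same slot, and every result is below m, which are the two properties a table of size m relies on.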
Hash Table Implementation
• Use a Table<int, Value> ADT of size m
• Use h_m(Key) as the key
• All operations can be done like using Table
• Solved except
– Collision: what to do if two different k have the same h(k)
– How to find a suitable hash function
• If good hash functions are used, hash tables provide near O(1) insertion, searching and removal
– But it is difficult to get it right
– And it is not easy to code
– C++: hash_map<Key, Value, hash_func>
• Read 2003 Advanced Notes on Hash Table if you are motivated enough
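One common answer to the collision question is chaining: each table slot keeps a short list of all pairs whose keys hash to it. The sketch below is a minimal illustration under assumptions (string keys, int values, an illustrative polynomial hash rather than CRC32); the type ChainedTable and its member names are hypothetical.

```cpp
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained hash table mapping string keys to int values.
struct ChainedTable {
    std::vector<std::list<std::pair<std::string, int> > > buckets;

    // m is the number of slots; requires m >= 1.
    explicit ChainedTable(unsigned m) : buckets(m) {}

    // Illustrative polynomial hash into 0 .. m - 1; not CRC32.
    unsigned Slot(const std::string &k) const {
        unsigned long long h = 0;
        for (unsigned char c : k) h = (h * 31 + c) % buckets.size();
        return static_cast<unsigned>(h);
    }

    // Inserts the pair, or overwrites the value if the key exists.
    void Insert(const std::string &k, int v) {
        for (auto &kv : buckets[Slot(k)])
            if (kv.first == k) { kv.second = v; return; }
        buckets[Slot(k)].push_back(std::make_pair(k, v));
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    int *Search(const std::string &k) {
        for (auto &kv : buckets[Slot(k)])
            if (kv.first == k) return &kv.second;
        return nullptr;
    }

    // Removes the pair with key k, if present.
    void Delete(const std::string &k) {
        buckets[Slot(k)].remove_if(
            [&k](const std::pair<std::string, int> &kv) { return kv.first == k; });
    }
};
```

With a table of size 1 every key collides and the structure degrades to a linked list, which shows why the near O(1) claim depends on a good hash function and a large enough m.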
Binary Search Tree Implementation
• Sorted Array is fast for searching
– But it is slow when inserted at front
• Idea
– Store separate arrays
– If value < v, insert to left array
– If value >= v, insert to right array
• Now we have a Data Structure which is
– Worst Case N / 2 + 1 insertion (N in the past)
– lg(N) + 1 searching
Binary Search Tree Implementation
• Now we have a Data Structure which is
– N / 2 + 1 insertion (N in the past)
– lg(N) + 1 searching
• If we store “N / 2” elements in this DS
– N / 4 + 1 insertion
– lg(N) searching
• If both of left and right arrays use this DS [Recursion]
– N / 4 + 2 insertion
– lg(N) + 1 searching
• Continue this process lg(N) times
– lg(N) + 2 insertion
– lg(N) + 1 searching
– What will it look like?
Binary Search Tree Implementation
struct Node {
    Node *left, *right;
    int value;
};
[Figure: binary search tree holding the values 6, 3, 1, 8, 4, 7, 9 and 7.5]
type
  pNode = ^Node;
  Node = record
    left, right : pNode;
    value : integer;
  end;
Introduction to Tree
• Node
• Root
• Leaf / Internal
• Parent / Children
• [Proper] Ancestors / Descendants
• Siblings
Binary Search Tree Implementation
• Operations
– Searching
• If target < current, go to left
• If target > current, go to right
– Insertion
• Search
• Insert it there
– Removal
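• If it is a leaf, just remove it.
• If it has only one child, replace it with that child.
• Otherwise, replace its value with the smallest value larger than it (the leftmost node of its right subtree). That node has at most one child, so it is easy to remove.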
• Worst Case
– If input is sorted, the tree will become …
– What can we do?
– C++: map<Key, Value, comparator>
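Searching and insertion as described above can be sketched as follows, assuming int values and the Node layout with left and right pointers. The Height helper is a hypothetical addition used only to observe the shape of the tree.

```cpp
struct Node {
    Node *left, *right;
    int value;
};

// Returns the node holding target, or nullptr if it is absent.
Node *Search(Node *root, int target) {
    Node *cur = root;
    while (cur != nullptr && cur->value != target)
        cur = (target < cur->value) ? cur->left : cur->right;
    return cur;
}

// Inserts v and returns the (possibly new) root. Values equal to a
// node go to its right, matching the "value >= v goes right" rule.
Node *Insert(Node *root, int v) {
    if (root == nullptr) return new Node{nullptr, nullptr, v};
    if (v < root->value) root->left = Insert(root->left, v);
    else root->right = Insert(root->right, v);
    return root;
}

// Hypothetical helper: number of nodes on the longest root-to-leaf
// path (an empty tree has height 0).
int Height(const Node *root) {
    if (root == nullptr) return 0;
    int l = Height(root->left), r = Height(root->right);
    return 1 + (l > r ? l : r);
}
```

Inserting 6, 3, 8, 1, 4, 7, 9 in that order gives a tree of height 3, while inserting 1 to 7 in sorted order gives height 7 with this sketch, because every new value becomes the right child of the previous one.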
Recess
Have a break!
Stack ADT
• Something your compiler has implemented
for you.
int pow(int x, int n) {
    if (n == 0) return 1;
    int v = pow(x, n / 2);
    if (n % 2 == 0) return v * v;
    return x * v * v;
}
• pow(3, 5)→pow(3, 2)→pow(3, 1)→pow(3, 0)
Stack ADT
• But
– It mandates what to be put in stack
– It couples control flow with data flow
• So we will still implement our own stack
• Last-in-first-out
– When do we need this behavior?
• Array?
– Fast, but fixed size
– C++: stack<Type>
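One situation that needs last-in-first-out behavior is checking that brackets are properly nested: the most recently opened bracket must be the first one closed. The sketch below uses std::stack; the function name Balanced is hypothetical.

```cpp
#include <stack>
#include <string>

// Returns true if every bracket in s is closed by the matching kind
// in last-opened, first-closed order. Other characters are ignored.
bool Balanced(const std::string &s) {
    std::stack<char> open;
    for (char c : s) {
        if (c == '(' || c == '[' || c == '{') {
            open.push(c);
        } else if (c == ')' || c == ']' || c == '}') {
            if (open.empty()) return false;
            char o = open.top();
            open.pop();
            if ((c == ')' && o != '(') || (c == ']' && o != '[') ||
                (c == '}' && o != '{'))
                return false;
        }
    }
    return open.empty();
}
```

Here the stack holds only data (the open brackets), independent of the call stack, which is the decoupling of data flow from control flow mentioned above.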
Array Implementation of Stack
int stack[100];
int top = 0;
void push(int v) {
stack[top++] = v;
}
int pop() {
return stack[--top];
}
var
stack : array[1..100] of integer;
top : integer;
procedure push(v : integer);
begin
inc(top);
stack[top] := v;
end;
function pop : integer;
begin
pop := stack[top];
dec(top);
end;
Queue ADT
• First-in-first-out
– When do we need this behavior?
– Major use is Breadth First Search in Graph
• Array?
– Fast, but fixed size
– Circular?
– C++: queue<Type>
Array Implementation of Queue
int queue[100];
int head = 0, tail = 0;
void enqueue(int v) {
queue[tail++] = v;
}
int dequeue() {
return queue[head++];
}
var
  queue : array[1..100] of integer;
  head, tail : integer;
procedure enqueue(v : integer);
begin
  inc(tail);
  queue[tail] := v;
end;
function dequeue : integer;
begin
  inc(head);
  dequeue := queue[head];
end;
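The straightforward array queue above can never reuse slots once head and tail have moved past them, so it stops working after 100 enqueues in total. A circular variant is sketched below; the names queue_data and stored, and the choice to report full and empty through a bool return value, are assumptions of this sketch.

```cpp
const int CAPACITY = 100;

int queue_data[CAPACITY];
int head = 0;      // index of the oldest element
int stored = 0;    // number of elements currently in the queue

// Appends v; returns false when the queue is full.
bool enqueue(int v) {
    if (stored == CAPACITY) return false;
    queue_data[(head + stored) % CAPACITY] = v;   // wraps past the end
    ++stored;
    return true;
}

// Removes the oldest element into out; returns false when empty.
bool dequeue(int &out) {
    if (stored == 0) return false;
    out = queue_data[head];
    head = (head + 1) % CAPACITY;                 // wraps past the end
    --stored;
    return true;
}
```

The modulo arithmetic lets the indices wrap around to the start of the array, so the queue can handle any number of operations as long as no more than CAPACITY elements are stored at once.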
Priority Queue ADT
• PriorityQueue<Priority, Value>
– void Push(Priority &p, Value& v)
• Add an element
– Value &Top()
• Returns the element with maximum priority
– void Pop()
• Remove the element with maximum priority
• Again both Array and Linked List can do it
suboptimally. A maximum heap can finish
Push and Pop in O(lg n) and Top in O(1).
• C++: priority_queue<Type, comparator>
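A short usage sketch of std::priority_queue follows. Storing (priority, value) pairs is one way to get the PriorityQueue<Priority, Value> interface, since pairs compare by their first member first; the typedef Job and the helper DrainByPriority are hypothetical names.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Pairs compare by priority first, so the largest priority is on top.
typedef std::pair<int, std::string> Job;   // (priority, value)

// Hypothetical helper: pops every job and returns the values in the
// order the queue hands them out.
std::vector<std::string> DrainByPriority(std::priority_queue<Job> &pq) {
    std::vector<std::string> order;
    while (!pq.empty()) {
        order.push_back(pq.top().second);   // Top: read the maximum
        pq.pop();                           // Pop: remove it
    }
    return order;
}
```

Pushing (2, "b"), (9, "z") and (5, "m") and then draining the queue yields "z", "m", "b".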
Heap
• In an array with N elements
– We can obtain maximum value of an array in O(1) time
if every Add() updates this value.
– But removal of it destroys all knowledge and requires
N – 1 operations to recalculate.
• If we have 2 arrays of N / 2 elements
– We only need N / 2 time because only the array with
maximum extracted is recalculated.
[Figure: two arrays, 2 6 3 4 2 5 3 with maximum 6 and 3 1 5 7 8 5 4 with maximum 8]
Heap
[Figures: the array is split recursively into halves, each labelled with its own maximum, until it forms a binary tree with the overall maximum 8 at the root; the subsequent slides step through removing the maximum from this tree]
Heap
• Left Complete Binary Tree
• 1 2 3 4 5 6 7 8 9 10 11 12 13 14
• [8, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1, 4]
• [4, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1] 8
• [7, 4, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1] 8
• [7, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3, 1] 8
• [1, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3] 7, 8
• [6, 5, 1, 4, 4, 5, 5, 2, 3, 2, 3, 3] 7, 8
• [6, 5, 5, 4, 4, 1, 5, 2, 3, 2, 3, 3] 7, 8
• [6, 5, 5, 4, 4, 3, 5, 2, 3, 2, 3, 1] 7, 8
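The array trace above (swap the root with the last element, then let the new root sink) can be sketched as follows. The slide numbers positions from 1, while this sketch uses 0-based indices, so the children of position i are at 2i + 1 and 2i + 2; the names SiftDown and PopMax are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restores the max-heap property below position i (0-based) for the
// first n elements of a. Children of i sit at 2i + 1 and 2i + 2.
void SiftDown(std::vector<int> &a, std::size_t i, std::size_t n) {
    for (;;) {
        std::size_t child = 2 * i + 1;
        if (child >= n) return;                       // i is a leaf
        if (child + 1 < n && a[child + 1] > a[child]) ++child;
        if (a[i] >= a[child]) return;                 // heap property holds
        std::swap(a[i], a[child]);                    // sink one level
        i = child;
    }
}

// Moves the maximum of the first n elements to position n - 1 and
// re-heapifies the remaining n - 1 elements. Requires n >= 1.
void PopMax(std::vector<int> &a, std::size_t n) {
    std::swap(a[0], a[n - 1]);
    SiftDown(a, 0, n - 1);
}
```

Starting from [8, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1, 4], PopMax(a, 14) followed by PopMax(a, 13) reproduces the last line of the trace, with 7 and 8 parked at the end of the array. Each pop walks at most one root-to-leaf path, which is where the O(lg n) bound comes from.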