Data Structure
Che-Rung Lee
5/3/2017
CS135601 Introduction to Information Engineering
Data abstraction
• Main memory is organized as a sequence
of addressable cells, but the data we want
to model is usually not.
• Data structures let us “model” the organization we want and “simulate” it on top of main memory.
Pointers
• What is a pointer?
– A special kind of data that records a memory address
• Example in C
int a = 3;
int *p = NULL;
p = &a;
*p = 5;
variable   address   value
a          0x03      3 (becomes 5 after *p = 5)
p          0x04      0 (becomes 0x03 after p = &a)
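The four statements on the pointer slide can be wrapped in a small function to see their combined effect. This is only a sketch; the function name pointer_demo is made up for illustration.

```c
#include <stddef.h>

/* Replays the slide's statements and returns the final value of a. */
int pointer_demo(void) {
    int a = 3;
    int *p = NULL;  /* p records no address yet */
    p = &a;         /* p now records the address of a */
    *p = 5;         /* writes 5 into the cell p points to, i.e. into a */
    return a;
}
```

Calling pointer_demo() returns 5, because the assignment through p changed a itself.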
Outline
• Customized data type
• Array and list
• Stack and queue
• Trees
• Hash table
Customized Data Type
How to model a warrior?
• Class
• Skills
• Equipment
• Life points
• Magic points
• Money
• …
(Example: Diablo III)
But computers only have primitive data types: integer, real, character, and Boolean.
User-defined data types
• A conglomerate of primitive data types collected under a single name
• Example in C: struct
typedef struct {
    char class[10];  // Barbarian, Witch, Wizard or Monk
    int lifePoint;   // min is 0, max is 100
    int level;       // min is 1, max is 72
    …
} Warrior;           // Warrior is the user-defined data type
Warrior player1;     // player1 is an instance of type Warrior
player1.lifePoint = 100;
Abstract data type
• A full model of an abstract data type should also include the operations of the model
– Like + - * /, input, and output for primitive data types
• Example in C++: class
class Warrior {
    char className[10]; // Barbarian, Witch, Wizard or Monk (class is a reserved word in C++)
    …
    void fight(….); // function that defines the action “fight”
};
– An instance of such a class is called an object, which we will discuss further in the programming language lesson.
Heterogeneous array
• Storage that contains different types of data is called a heterogeneous array
– struct and class are heterogeneous arrays
– The items are called components.
– Storage that contains the same type of data is called a homogeneous array
• Example
struct {
    char Name[25];
    int Age;
    float SkillRating;  // holds a fractional value such as 6.2
} Employee;
Storage of heterogeneous array
• Static method:
– components are stored one after the other in a contiguous block
– Ex: [ Meredith W Linsmeyer | 23 | 6.2 ]
• Dynamic method:
– components are stored in separate locations identified by pointers
– Ex: a block of pointers, each pointing to one of Meredith W Linsmeyer, 23, and 6.2
Array and List
When to use arrays?
• Stock prices, student names, temperature readings
– One-dimensional array
• Matrices, images, the grades of a class, train schedules
– Two-dimensional array
• Computed tomography (CT scan)
– Three-dimensional array
Storing arrays
• Use a variable to denote the address of the first element
– Ex: int Readings[24];
• The relative address of an element is called its “index” (0, 1, 2, 3, …)
• In C, the index starts from 0
Two dimensional array
• A two-dimensional array is stored in one-dimensional memory cells.
• Two ways to order the data (example: an array with 4 rows and 3 columns)
a11 a12 a13
a21 a22 a23
a31 a32 a33
a41 a42 a43
– Row major order: a11 a12 a13 a21 a22 a23 a31 a32 a33 a41 a42 a43
– Column major order: a11 a21 a31 a41 a12 a22 a32 a42 a13 a23 a33 a43
– What is the memory location of A[2][3] in row (column) major order?
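The question above can be answered with two small offset formulas. The sketch below assumes zero-based row and column indices, so the element written a23 on the slide is (i, j) = (1, 2); the function names are made up for illustration.

```c
#include <stddef.h>

/* Offset (in elements) of entry (i, j) when rows are stored one after another. */
size_t row_major_offset(size_t i, size_t j, size_t cols) {
    return i * cols + j;
}

/* Offset (in elements) of entry (i, j) when columns are stored one after another. */
size_t col_major_offset(size_t i, size_t j, size_t rows) {
    return j * rows + i;
}
```

For the 4 by 3 example, a23 sits at offset 5 in row major order and at offset 9 in column major order, which matches counting along the two sequences listed on the slide. The memory address is the address of the first element plus the offset times the element size.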
High dimensional array
• Consider the three-dimensional array A[m][n][k]
– What is the size of the array?
– What is the memory location of A[1][2][3] in the row major order?
• In the row major order, the last index changes first
– What is the memory location of A[1][2][3] in the column major order?
• In the column major order, the first index changes first
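The same idea extends to three dimensions. The following is a sketch under the assumption of zero-based indices (i, j, l) into an array of shape m by n by k; the function names are hypothetical.

```c
#include <stddef.h>

/* Total number of elements in A[m][n][k]. */
size_t array3d_size(size_t m, size_t n, size_t k) {
    return m * n * k;
}

/* Row major order: the last index l changes first. */
size_t row_major_offset3(size_t i, size_t j, size_t l, size_t n, size_t k) {
    return (i * n + j) * k + l;
}

/* Column major order: the first index i changes first. */
size_t col_major_offset3(size_t i, size_t j, size_t l, size_t m, size_t n) {
    return i + m * (j + n * l);
}
```

For example, with m = 2, n = 3, k = 4 the array has 24 elements, and entry (1, 0, 2) sits at offset 14 in row major order and at offset 13 in column major order.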
When to use list?
• A list is a collection of data arranged sequentially.
– A one-dimensional array is a list of elements
– A two-dimensional array can be viewed as a list of rows/columns
– A string is a list of characters
– Music is a list of sounds
– Stacks and queues can be implemented using lists
• We will talk about those later
Contiguous list
• The list is stored in a contiguous block of memory cells (an array)
– Ex: a list of names, where each name occupies 8 bytes.
Linked list
• A list in which the entries are linked by pointers
– Head pointer: pointer to the first entry in the list
– NIL pointer: a “non-pointer” value used to indicate the end of the list
• Use a customized data type to define each entry
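One possible customized data type for a list entry is sketched below. The names Node, value, next, and list_length are assumptions for illustration, and C's NULL plays the role of the NIL pointer.

```c
#include <stddef.h>

/* One entry of a linked list: the data plus a pointer to the next entry. */
typedef struct Node {
    int value;
    struct Node *next;  /* NULL (NIL) marks the end of the list */
} Node;

/* Follows the pointers from the head until NIL and counts the entries. */
size_t list_length(const Node *head) {
    size_t n = 0;
    for (const Node *p = head; p != NULL; p = p->next) {
        n++;
    }
    return n;
}
```

An empty list is simply a head pointer equal to NULL, for which list_length returns 0.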
Static vs. dynamic data structures
• Static data structures:
– Size and shape do not change
– Ex: contiguous list
– Easy to locate elements; no need to store addresses.
• Dynamic data structures:
– Size and shape can change
– Ex: linked list
– Easy to delete/insert elements
Linked list: delete/insert element
• Delete
• Insert
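Both operations only redirect pointers; no other entries move. The sketch below assumes a singly linked list of integers and operates on the entry that follows a given one; the names Node, insert_after, and delete_after are hypothetical.

```c
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Inserts a new entry holding value right after prev.
   Returns the new entry, or NULL if memory could not be allocated. */
Node *insert_after(Node *prev, int value) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        return NULL;
    }
    n->value = value;
    n->next = prev->next;  /* new entry points to the old successor */
    prev->next = n;        /* predecessor now points to the new entry */
    return n;
}

/* Deletes the entry that follows prev, if there is one.
   The deleted entry must have been created by insert_after. */
void delete_after(Node *prev) {
    Node *victim = prev->next;
    if (victim == NULL) {
        return;
    }
    prev->next = victim->next;  /* bypass the deleted entry */
    free(victim);
}
```

Inserting or deleting at the very front of the list needs the head pointer itself to be updated, which this sketch leaves out.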
Stack and Queue
What is a stack?
• A list in which entries are removed and
inserted only at the head
– Top: The head of stack
– Bottom or base: The tail of stack
– Push: To insert an entry at the top
– Pop: To remove the entry at the top
– LIFO: Last-in-first-out
When to use stacks?
• When the algorithm needs data in LIFO order
– EX1: reverse a word, ABC → CBA
• Push A
• Push B
• Push C
• Pop C
• Pop B
• Pop A
– EX2: check matching parentheses in (3*[(1+1)*2]
• Push “(”
• Push “[”
• Push “(”
• Find “)”, pop “(”, matched
• Find “]”, pop “[”, matched
• No more “)”, but still one “(” in the stack: not matched
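The parenthesis check in EX2 can be sketched as follows. This version handles only round and square brackets and uses a fixed-size array as the stack; the name brackets_match and the limit MAX_DEPTH are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 256

/* Returns true if every ( and [ in s is closed by the matching ) or ]. */
bool brackets_match(const char *s) {
    char stack[MAX_DEPTH];
    size_t top = 0;  /* number of open brackets currently on the stack */
    for (; *s != '\0'; s++) {
        if (*s == '(' || *s == '[') {
            if (top == MAX_DEPTH) {
                return false;  /* nesting too deep for this sketch */
            }
            stack[top++] = *s;  /* push */
        } else if (*s == ')' || *s == ']') {
            char open = (*s == ')') ? '(' : '[';
            if (top == 0 || stack[--top] != open) {  /* pop and compare */
                return false;
            }
        }
    }
    return top == 0;  /* leftover open brackets mean a mismatch */
}
```

For the slide's input (3*[(1+1)*2] the function returns false, because one “(” is still on the stack when the input ends.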
Stack implementation
• Using a list + a pointer (head)
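A minimal sketch of this idea follows, assuming the list is a fixed-size array of characters and the pointer is an index that marks the top; the names Stack, stack_push, and stack_pop are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 64

typedef struct {
    char items[STACK_CAPACITY];
    size_t top;  /* number of entries; items[top - 1] is the top entry */
} Stack;

void stack_init(Stack *s) {
    s->top = 0;
}

/* Inserts c at the top. Returns false if the stack is full. */
bool stack_push(Stack *s, char c) {
    if (s->top == STACK_CAPACITY) {
        return false;
    }
    s->items[s->top++] = c;
    return true;
}

/* Removes the top entry into *out. Returns false if the stack is empty. */
bool stack_pop(Stack *s, char *out) {
    if (s->top == 0) {
        return false;
    }
    *out = s->items[--s->top];
    return true;
}
```

Pushing A, B, C and then popping three times yields C, B, A, which is the word reversal from the earlier example. A linked list with a head pointer would work equally well and would not need a fixed capacity.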
Queue
• A list in which entries are removed at the
head and are inserted at the tail.
– Enqueue: insert an entry at the tail
– Dequeue: remove an entry at the head
– FIFO: First-in-first-out
Examples of using queues
• Ex1: the job queues in an operating system
• Ex2: simulation of the Josephus problem (people 1 to 6 in the queue)
– Dequeue 1
– Enqueue 1
– Dequeue 2
– Dequeue 3
– Enqueue 3
• Operation count ≈ 2n
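The dequeue/enqueue pattern above can be turned into a small simulation. The sketch assumes that every second person is eliminated (1 is moved to the back, 2 leaves, 3 is moved to the back, and so on), which is what the listed operations suggest; the function name josephus_survivor is made up.

```c
#include <stdlib.h>

/* Simulates the Josephus problem for people numbered 1..n with a queue.
   Returns the number of the last person left, or 0 on bad input or
   allocation failure. */
int josephus_survivor(int n) {
    if (n <= 0) {
        return 0;
    }
    size_t cap = (size_t)n;
    int *queue = malloc(cap * sizeof *queue);
    if (queue == NULL) {
        return 0;
    }
    for (size_t i = 0; i < cap; i++) {
        queue[i] = (int)i + 1;
    }
    size_t head = 0;   /* index of the entry at the head */
    size_t tail = 0;   /* index of the next free slot (the queue starts full) */
    size_t count = cap;
    while (count > 1) {
        int skipped = queue[head];   /* dequeue the person who is skipped */
        head = (head + 1) % cap;
        queue[tail] = skipped;       /* enqueue them again at the tail */
        tail = (tail + 1) % cap;
        head = (head + 1) % cap;     /* dequeue the next person: eliminated */
        count--;
    }
    int survivor = queue[head];
    free(queue);
    return survivor;
}
```

With six people the survivor is person 5. Each elimination costs two dequeues and one enqueue, so the total number of queue operations grows linearly with n.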
Queue implementation
• A list + 2 pointers (head + tail)
– Enqueue A, B, C
– Dequeue A, enqueue D
– Dequeue B, enqueue E
• If a static list is used, the queue crawls through memory as entries are inserted and removed.
Circular queue
• A technique that uses a fixed region of memory to implement a queue; the head and tail pointers wrap around to the start of the region when they reach its end.
– Enqueue A, B, C
– Dequeue A, enqueue D
– Dequeue B, enqueue E
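A circular queue can be sketched with an array, a head index, and an entry count, from which the tail position is computed. The capacity of 4 and the names CircularQueue, enqueue, and dequeue are assumptions chosen so that the slide's sequence wraps around.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4

typedef struct {
    char items[QUEUE_CAPACITY];
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of entries currently stored */
} CircularQueue;

void queue_init(CircularQueue *q) {
    q->head = 0;
    q->count = 0;
}

/* Inserts c at the tail. Returns false if the queue is full. */
bool enqueue(CircularQueue *q, char c) {
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    size_t tail = (q->head + q->count) % QUEUE_CAPACITY;  /* wraps around */
    q->items[tail] = c;
    q->count++;
    return true;
}

/* Removes the entry at the head into *out. Returns false if the queue is empty. */
bool dequeue(CircularQueue *q, char *out) {
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;  /* wraps around */
    q->count--;
    return true;
}
```

After enqueueing A, B, C, dequeueing A, enqueueing D, and dequeueing B, the next enqueue of E reuses slot 0, the cell that A originally occupied, instead of crawling further through memory.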
Trees
What is a tree?
• A collection of nodes linked in a hierarchical structure, in which every node except the root has exactly one parent.
– Node: an entry in a tree
– Parent: the node immediately above a specified node
– Root: the node at the top
– Terminal or leaf node: a node at the bottom
Hierarchical relations
(Figure: example tree with root A and nodes B through L.)
• Parent: The node immediately above a node
– The parent of F is B
• Child: A node immediately below a node
– The children of C are G and H.
• Ancestor: Parent, parent of parent, etc.
– The ancestors of K are F, B, and A.
• Descendant: Child, child of child, etc.
– The descendants of B are E, F, K, and L.
• Siblings: Nodes sharing a common parent
– The siblings of C are B and D.
Depth and height
• Textbook’s definition
– The depth of a tree is the length of the longest path from the root to a leaf node
• The length of a path is the number of nodes on the path
• Ex: the depth of the tree is 4
• Conventional definition
– Use the word “height” instead of depth
– The length of a path is the number of links on the path
• Ex: The height of the tree is 3 (= 4 – 1)
What are trees used for?
• Representing hierarchical data
– Organization chart
• Searching data
– Game tree
Binary tree
• A tree in which each parent has at most two children
– Left child: the root of the left subtree
– Right child: the root of the right subtree
Storing a binary tree in a list
• The nodes are stored level by level, starting with the root at index 1.
• This layout is called a heap in some applications.
Advantages of using heap
• Easy to find the index of the parent & children
– Parent(B) = [index of B] / 2 = 1 (integer division)
– LeftChild(B) = [index of B]*2 = 4
– RightChild(B) = [index of B]*2 + 1 = 5
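The three index rules translate directly into code. The sketch assumes the root is stored at index 1, so that B, the left child of the root, sits at index 2; the function names are hypothetical.

```c
#include <stddef.h>

/* Index of the parent of the node at index i (valid for i >= 2). */
size_t heap_parent(size_t i) {
    return i / 2;  /* integer division drops the remainder */
}

/* Index of the left child of the node at index i. */
size_t heap_left_child(size_t i) {
    return 2 * i;
}

/* Index of the right child of the node at index i. */
size_t heap_right_child(size_t i) {
    return 2 * i + 1;
}
```

No pointers are stored at all: the position of a node in the list encodes its place in the tree.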
Problems for heap
• A heap is inefficient for storing a binary tree that is sparse and unbalanced
– Sparse: most nodes have one or zero children
– Unbalanced: the right subtree is much larger than the left subtree, or vice versa
Storing a binary tree using pointers
• Each node holds its data plus a left child pointer and a right child pointer
• Use a customized data type to define each node
Recursive structure
• A tree is a recursive structure
– The subtrees of a tree are trees
• A recursive algorithm for a binary tree may look like this
procedure some_operation (root)
  if (root is not NULL) then
    ( call some_operation(root.left_child)
      do some operations on root
      call some_operation(root.right_child) )
– This is a depth-first, in-order traversal of the tree
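The pseudocode can be made concrete with a pointer-based node type. In the sketch below, the operation on each node is simply to append its value to an output array; the names TreeNode and in_order, and the choice of integer data, are assumptions for illustration.

```c
#include <stddef.h>

/* A binary tree node: the data plus pointers to the two children. */
typedef struct TreeNode {
    int value;
    struct TreeNode *left_child;
    struct TreeNode *right_child;
} TreeNode;

/* Visits the left subtree, then the root, then the right subtree.
   Each visited value is appended to out while there is room.
   count is the number of values already stored; the new count is returned. */
size_t in_order(const TreeNode *root, int *out, size_t count, size_t capacity) {
    if (root == NULL) {
        return count;  /* an empty subtree contributes nothing */
    }
    count = in_order(root->left_child, out, count, capacity);
    if (count < capacity) {
        out[count++] = root->value;
    }
    return in_order(root->right_child, out, count, capacity);
}
```

For a root holding 2 with a left child 1 and a right child 3, the traversal produces 1, 2, 3.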
Hash Table
Search
• Search is a common task in daily life
– Phone book: given a name, find the phone number
– Dictionary: given a word, find its definition
– Map: given an address, find the location or route
– DNS: given a host name, find its IP address
• Trees can be used to speed up searches.
– How? And what is the operation count?
Constant time search
• Some things can be found in constant time
– EX: What is the fifth element of the array A? A[4]
• An array is like a lookup table: one can use the index to query and get the value
• Can we use this idea to organize data so that searches can be done in constant time?
– Hash table (or hash map)
Hash table
• Each record of data has a key field
– The key is like the index of an array.
– A unique identification of the data (ideally)
• The storage space is divided into buckets
– Each bucket is like an array cell.
– Each record is stored in the bucket corresponding to its key, so it can be retrieved in constant time
How to define the mapping?
• The unique identification of a record is usually too large to be used as the storage index
– For example, the ASCII code for a string
• We do not want to create such a large array!
Hash function
• A hash function computes a bucket
number for each key value
– EX: suppose there are only 41 buckets.
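One simple way to get a bucket number from a string key is to read its character codes as one large number and take the remainder after dividing by the number of buckets. The sketch below does this one character at a time so the intermediate value never grows large. It is one possible choice, not necessarily the function used on the slide, and the name bucket_of is made up.

```c
#define NUM_BUCKETS 41u

/* Maps a string key to a bucket number in the range 0..40.
   The key is treated as a base-256 number built from its character
   codes, reduced modulo NUM_BUCKETS after every character. */
unsigned bucket_of(const char *key) {
    unsigned h = 0;
    for (; *key != '\0'; key++) {
        h = (h * 256u + (unsigned char)*key) % NUM_BUCKETS;
    }
    return h;
}
```

For example, the key "A" has character code 65, and 65 mod 41 = 24, so it lands in bucket 24.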
Problem
• Collision: the case of two or more keys hashing to the same bucket
– Becomes a major problem when the table is over 75% full
Solutions
• Use linked lists to store the colliding data
– The search time becomes linear in the number of colliding entries
• Increase the number of buckets and rehash all data
– Time/space tradeoff
• Design a better hash function/algorithm
– It’s a research problem
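The first solution, a linked list per bucket, can be sketched as follows. The table maps string keys to integers, uses 41 buckets, and reuses a simple remainder-based hash function; all names here (Entry, HashTable, table_put, table_get, bucket_of) are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 41u

typedef struct Entry {
    const char *key;     /* not copied: the caller keeps the string alive */
    int value;
    struct Entry *next;  /* next entry that hashed to the same bucket */
} Entry;

typedef struct {
    Entry *buckets[NUM_BUCKETS];  /* head pointer of each bucket's list */
} HashTable;

static unsigned bucket_of(const char *key) {
    unsigned h = 0;
    for (; *key != '\0'; key++) {
        h = (h * 256u + (unsigned char)*key) % NUM_BUCKETS;
    }
    return h;
}

/* Stores value under key. Returns 1 on success, 0 if allocation fails. */
int table_put(HashTable *t, const char *key, int value) {
    unsigned b = bucket_of(key);
    for (Entry *e = t->buckets[b]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;  /* key already present: overwrite */
            return 1;
        }
    }
    Entry *e = malloc(sizeof *e);
    if (e == NULL) {
        return 0;
    }
    e->key = key;
    e->value = value;
    e->next = t->buckets[b];  /* insert at the head of the bucket's list */
    t->buckets[b] = e;
    return 1;
}

/* Returns a pointer to the stored value, or NULL if key is absent. */
const int *table_get(const HashTable *t, const char *key) {
    for (const Entry *e = t->buckets[bucket_of(key)]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            return &e->value;
        }
    }
    return NULL;
}
```

With this hash function the keys "A" and "j" both land in bucket 24, so looking up either one walks a two-entry list. That walk is the linear cost mentioned above. A complete version would also free the entries when the table is no longer needed.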
References
• Textbook 8.1, 8.2, 8.3, 8.5, 9.5
• Wikipedia
• Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, “Introduction to Algorithms”
Related courses
• Data structures, algorithms, programming languages