Data Structures
• A data structure is a building block of programming
• It defines how data is organized and consequently influences how data is allocated in computer memory
• The importance of data structures for algorithm efficiency cannot be overemphasized
  - The efficiency of most algorithms is directly linked to the choice of data structure
  - Some algorithms are basically a data structure definition (plus the operations associated with the structure)
  - For the same amount of data, different data structures can take more or less space in computer memory
© Ronaldo Menezes, Florida Tech
Data Structures
• The previous slide mentioned the operations associated with the structure. What are these?
• A data structure is not passive
  - It consists of data and operations to manipulate the data
• Data structures are implementations of ADTs (Abstract Data Types)

Linear vs. Nonlinear
• For a structure to be linear, all of the following have to be true
  - There is a unique first element
  - There is a unique last element
  - Every component has a unique predecessor (except the first)
  - Every component has a unique successor (except the last)
• If at least one of the above is false, the structure is nonlinear

Direct vs. Sequential Access
• In any linear data structure we have two methods of accessing the data stored
  - Sequential access structures are such that, to access the Nth element, we must first access all elements preceding it.
    - This means that all elements from 1 to N-1 have to be accessed first.
    - Think of how a specific song is reached on an iPod shuffle (with the shuffle off!)
  - Direct access structures are such that any element of the structure can be accessed directly.
    - There is no need to access any other element besides the one required.
    - Think CD player here.

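The two access methods can be sketched as follows. This is only an illustration: it assumes a Java array as the direct access structure and a hand-built chain of nodes as the sequential one, and the names AccessDemo, Node, nthSequential, nthDirect and steps are hypothetical, not taken from the slides.

```java
// Illustrative sketch; every name here is an assumption, not from the slides.
class AccessDemo {
    // A node of a sequential structure: a value plus a link to its successor.
    static class Node {
        int val;
        Node next;
        Node(int v, Node n) { val = v; next = n; }
    }

    static int steps; // links followed by the most recent sequential access

    // Sequential access: reaching position n (counting from 0) requires
    // passing through every element that precedes it.
    static int nthSequential(Node head, int n) {
        steps = 0;
        Node cur = head;
        for (int i = 0; i < n; i++) {
            cur = cur.next;
            steps++;
        }
        return cur.val;
    }

    // Direct access: the index alone identifies the element.
    static int nthDirect(int[] a, int n) {
        return a[n];
    }

    public static void main(String[] args) {
        int[] array = {10, 20, 30, 40};
        Node list = new Node(10, new Node(20, new Node(30, new Node(40, null))));
        System.out.println(nthDirect(array, 3));     // no other element is touched
        System.out.println(nthSequential(list, 3));  // three links are followed first
        System.out.println("links followed: " + steps);
    }
}
```

Both calls return the same value; the difference is the amount of work, which grows with the position for the sequential version and stays constant for the direct one.
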
Elementary Data Structures
[Figure: classification tree]
Elementary Data Structures
  Linear
    Sequential Access
      General: List
      LIFO: Stack
      FIFO: Queue
    Direct Access
      Homogeneous Components: Array
      Heterogeneous Components: Record
  Nonlinear
    Set

Arrays
• One of the most common types of data structures
  - Normally pre-defined in most programming languages
• Has the advantage of providing direct access to elements
• But also disadvantages:
  - Fixed size
  - Homogeneous elements
• Normally implemented by using contiguous allocation of memory cells
  - This is not, however, required at the ADT definition level.
  - The array implementation may give the impression of contiguousness.

Arrays as ADTs
• Domain
  - A collection of a fixed number of components of the same type
  - A set of indexes used to access the data stored in the array.
    - There is a one-to-one relation between indexes and objects stored.
• Operations
  - valueAt(i): Index i is used to access the value stored in the corresponding position of the array
    - Most languages use [i] as syntax instead of valueAt.
  - store(i,v): Stores the value v into the array position i
    - Most languages use the = operator

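A minimal sketch of the array ADT above, assuming int components. The method names valueAt and store follow the slide; the wrapper class FixedArray and its size() method are hypothetical additions for illustration.

```java
// Illustrative sketch; FixedArray and size() are assumptions, not from the slides.
class FixedArray {
    private final int[] cells;   // a fixed number of components of one type

    FixedArray(int size) {
        cells = new int[size];
    }

    // valueAt(i): what most languages write as a[i]
    int valueAt(int i) {
        return cells[i];
    }

    // store(i, v): what most languages write as a[i] = v
    void store(int i, int v) {
        cells[i] = v;
    }

    int size() {
        return cells.length;
    }

    public static void main(String[] args) {
        FixedArray a = new FixedArray(3);
        a.store(0, 7);
        a.store(2, 9);
        // position 1 was never stored to, so it still holds Java's default 0
        System.out.println(a.valueAt(0) + " " + a.valueAt(1) + " " + a.valueAt(2));
    }
}
```
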
Prime Testing
• Given a sequence of numbers 2..N, where N is given as input, write an algorithm that finds all prime numbers between 2 and N.
  - How do we do this?
• If I allow you to use an array of size N, can you make use of the direct access that exists in the array to improve your solution?

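Sieve of Eratosthenes
(Prime Testing)
public class Sieve {
    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);
        // numbers[i] == true means i is still considered prime
        boolean[] numbers = new boolean[n + 1];
        for (int i = 2; i <= n; i++) {
            numbers[i] = true;
        }
        for (int i = 2; i <= n; i++) {
            if (numbers[i]) {
                // cross out the multiples of i, starting at i*i; smaller
                // multiples were already crossed out by smaller primes.
                // The product is computed as a long so it cannot overflow.
                for (int j = i; (long) j * i <= n; j++) {
                    numbers[j * i] = false;
                }
            }
        }
        // whatever is still marked true is prime
        for (int i = 2; i <= n; i++) {
            if (numbers[i]) {
                System.out.println(i);
            }
        }
    }
}
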
Bernoulli Trials
• Named after the Swiss mathematician Jacob Bernoulli
• The term refers to an experiment on a random process that can have only 2 possible outcomes
  - Success or failure
  - Trials are independent
    - Did the gambler win or lose?
    - Is the child a boy or a girl?
    - If I flip a coin, does it land heads or tails?
• A Bernoulli process is a series of executions of Bernoulli trials

Coin Flipping Simulation
(simulation of Bernoulli trials)
public class CoinFlippingSimulation {
    // one Bernoulli trial: true (heads) with probability 0.5
    private static boolean heads() {
        return (Math.random() < 0.5);
    }

    public static void main(String[] args) {
        int cnt = 0, j;
        int n = Integer.parseInt(args[0]);  // flips per execution
        int m = Integer.parseInt(args[1]);  // number of executions
        // result[k] counts the executions that produced exactly k heads
        int[] result = new int[n + 1];
        for (int i = 0; i < m; i++) {
            cnt = 0;
            for (j = 0; j < n; j++) {
                if (heads()) cnt++;
            }
            result[cnt]++;
        }
        // histogram: one row per head count, one * per 10 executions (rounded up)
        for (j = 0; j <= n; j++) {
            if (result[j] == 0) {
                System.out.print(".");
            }
            for (int i = 0; i < result[j]; i += 10) {
                System.out.print("*");
            }
            System.out.println();
        }
    }
}

Histogram of Execution
[Figure: histogram printed by the simulation for 16 flips and 1000 executions]

Circular Lists
• Circular linked lists are just a variation of singly linked lists
  - They have the same "internal" structure (internal nodes).
  - The difference lies in the fact that the link member of the last node points to the first node instead of pointing to null
• Circular lists are useful when we're not interested in which node is the first or the last in the list
  - In this structure the head can point to any node and it is not used to keep nodes from being lost in the list.
• Some problems are better described using circular lists
  - e.g. the Josephus election
• Algorithms for dealing with circular linked lists are simpler than for singly linked lists because we don't need to test for special cases
• If the last node needs to be identified, we need to test differently (from other linked lists)
  - The node whose link points to the same location as the head is the last one.

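The "last node" test described above can be sketched as follows, assuming a node with an int value and a single link. The names CircularDemo, build, last and sum are hypothetical and chosen only for this illustration.

```java
// Illustrative sketch; all names are assumptions, not from the slides.
class CircularDemo {
    static class Node {
        int val;
        Node next;
        Node(int v) { val = v; }
    }

    // Builds a circular list from at least one value and returns its head.
    static Node build(int... vals) {
        Node head = new Node(vals[0]);
        Node last = head;
        for (int i = 1; i < vals.length; i++) {
            last.next = new Node(vals[i]);
            last = last.next;
        }
        last.next = head;   // the last node links back to the first, not to null
        return head;
    }

    // The last node is the one whose link points to the same node as head.
    static Node last(Node head) {
        Node cur = head;
        while (cur.next != head) {
            cur = cur.next;
        }
        return cur;
    }

    // Visits every node exactly once, starting from any node of the circle.
    static int sum(Node start) {
        int total = 0;
        Node cur = start;
        do {
            total += cur.val;
            cur = cur.next;
        } while (cur != start);
        return total;
    }

    public static void main(String[] args) {
        Node head = build(5, 12, 9);
        System.out.println(last(head).val);   // 9
        System.out.println(sum(head.next));   // 26, regardless of the starting node
    }
}
```

Note that the traversal compares against the starting node rather than against null, which is the "different test" the slide refers to.
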
Pictorial View of Circular Linked Lists
[Figure: head → 5 → 12 → 9, with the link of the last node (9) pointing back to the first node (5)]

Josephus Election
class Josephus {
    static class Node {
        int val;
        Node next;
        Node(int v) { val = v; }
    }

    public static void main(String[] args) {
        int N = Integer.parseInt(args[0]);  // number of nodes in the circle
        int M = Integer.parseInt(args[1]);  // every Mth node is eliminated
        // creation of circular list 1..N; x ends at the last node
        Node t = new Node(1);
        Node x = t;
        for (int i = 2; i <= N; i++) {
            x = (x.next = new Node(i));
        }
        x.next = t;
        // repeat until a single node remains (it points to itself)
        while (x != x.next) {
            // finds element to eliminate: stop at its predecessor
            for (int i = 1; i < M; i++) {
                x = x.next;
            }
            // eliminate element by linking around it
            x.next = x.next.next;
        }
        System.out.println("The elected is " + x.val);
    }
}

Generalized Josephus Election
(Take Home Quiz)
• Implement the “cat” version of the Josephus election, where members have multiple lives.
  - In this version you should also be allowed to choose where on the list you’d like to start
• Later compare your implementation with the standard Josephus problem given in the previous slide
  - The generalized solution with the number of lives set to 1 should give the same result as the standard version.
• Due on Monday
  - Hand a printout of the program to me.

What is a List?
• It is a linear structure that provides only sequential access to its elements
• It normally has two "special" named elements called head and tail.
  - They point to the first and last elements of the list, respectively
• Lists differ from arrays because they don't have a fixed size but, like arrays, they can only store elements of the same type
  - A homogeneous structure
• A well-defined list can be used as the basis for the implementation of several other data structures, such as queues and stacks.
• The main advantage over arrays is easy insertion and deletion of nodes
  - When implemented using dynamic allocation

Lists as ADTs
• Domain
  - A collection of elements of the same type
  - A list cursor which enables us to walk the list from position 1 to n
• Operations
  - create(): creates a new list. Should be done using constructors
  - isEmpty() & isFull(): return true if the list is empty or full, respectively
  - insertEnd(v) & insertBeginning(v): insert the value v at the end or at the beginning of the list, respectively
  - delete(v): deletes the first occurrence of the value v
  - deleteAll(v): deletes all occurrences of v in the list
  - reset(): deletes all elements in the list.

Implementation via Arrays
create()
[Figure: empty array with cells indexed 0 to 7]
MAX_SIZE = 8
size = 0
head = -1
tail = 0

Implementation via Arrays
insertEnd(40)
[Figure sequence: the array holds 12, 3, 33, 7, 2 in cells 0 to 4; the value v = 40 is stored in cell 5, the first free cell]
if list is not full
    list[tail++] = v
    size++

Implementation via Arrays
insertBeginning(22)
[Figure sequence: starting from 12, 3, 33, 7, 2, 40 in cells 0 to 5, each element is copied one cell to the right, beginning with the last (40) and ending with the first (12); v = 22 is then stored in cell 0, giving 22, 12, 3, 33, 7, 2, 40]
if list is not full
    for i = (tail-1) down to 0
        list[i+1] = list[i]
    list[0] = v
    tail++
    size++

Implementation via Arrays
delete(33)
[Figure sequence: starting from 22, 12, 3, 33, 7, 2, 40 in cells 0 to 6, each cell is compared with 33 ("is this 33?") until the match is found in cell 3; the elements after it (7, 2, 40) are then copied one cell to the left, giving 22, 12, 3, 7, 2, 40 in cells 0 to 5]
if list is not empty
    i = head+1;
    while (i < tail and v != list[i])
        i++;
    if (i < tail)
        for j=i to (tail-2)
            list[j] = list[j+1];
        list[tail-1] = null;
        size--;
        tail--;

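The three operations traced above can be put together in one runnable sketch. It assumes int elements, so the freed cell is simply left behind tail instead of being set to null. The field names MAX_SIZE, size and tail follow the slides; the class name ArrayIntList and the boolean return values are hypothetical choices for this illustration.

```java
// Illustrative sketch; ArrayIntList and the boolean results are assumptions.
class ArrayIntList {
    static final int MAX_SIZE = 8;
    private final int[] list = new int[MAX_SIZE];
    private int size = 0;
    private int tail = 0;   // index of the first free cell

    boolean isEmpty() { return size == 0; }
    boolean isFull()  { return size == MAX_SIZE; }

    // Stores v in the first free cell; returns false when the list is full.
    boolean insertEnd(int v) {
        if (isFull()) return false;
        list[tail++] = v;
        size++;
        return true;
    }

    // Shifts every element one cell to the right, last element first,
    // then stores v in cell 0; returns false when the list is full.
    boolean insertBeginning(int v) {
        if (isFull()) return false;
        for (int i = tail - 1; i >= 0; i--) {
            list[i + 1] = list[i];
        }
        list[0] = v;
        tail++;
        size++;
        return true;
    }

    // Deletes the first occurrence of v by shifting its successors one cell
    // to the left; returns false when v is not in the list.
    boolean delete(int v) {
        int i = 0;
        while (i < tail && list[i] != v) {
            i++;
        }
        if (i == tail) return false;
        for (int j = i; j <= tail - 2; j++) {
            list[j] = list[j + 1];
        }
        tail--;
        size--;
        return true;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < tail; i++) {
            if (i > 0) sb.append(", ");
            sb.append(list[i]);
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        ArrayIntList l = new ArrayIntList();
        for (int v : new int[] {12, 3, 33, 7, 2}) l.insertEnd(v);
        l.insertEnd(40);
        l.insertBeginning(22);
        l.delete(33);
        System.out.println(l);   // [22, 12, 3, 7, 2, 40]
    }
}
```

Only insertEnd avoids moving elements; insertBeginning and delete both shift a number of cells proportional to the size of the list.
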
Linked Lists
• Another (and better) alternative for implementing lists is the use of dynamic allocation (aka linked lists).
• A linked list consists of a collection of records called nodes, each containing at least one member that gives the location of the next node in the list.
  - Note that in the array-based lists discussed before, the location of the “next” node was implicit
• In the simplest case a node contains a data member representing the value and a link member representing the location of the successor of this node.
• Because we have the link member, we can store nodes anywhere in memory, not necessarily contiguously.
• The standard pictorial view of a linked list is:
[Figure: head → 5 → 12 → 9 → /]

Advantages and Disadvantages
• Advantages of Dynamic Lists (or Linked Lists)
  - Fair use of memory
  - Size of the structure does not need to be declared in advance
  - Common operations (insert, delete, etc.) are cheaper
• Disadvantages
  - Algorithms are more complex, harder to read and harder to debug
  - Allocation and de-allocation of memory space impose some overhead on the performance of the algorithm

Inserting a New Node
[Figure sequence: the list starts as head → 5 → 12 → /; a new node holding 13 is allocated and referenced by temp, then linked into the list, after which temp is no longer needed]

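One way to read the figure sequence is as an insertion at the front of the list, which is what the sketch below assumes. The names FrontInsertDemo, insertBeginning and render are hypothetical; temp follows the label used in the figures.

```java
// Illustrative sketch; assumes the new node is linked in at the front.
class FrontInsertDemo {
    static class Node {
        int val;
        Node next;   // null marks the end of the list (the "/" in the figures)
        Node(int v) { val = v; }
    }

    // Links a new node holding v in front of head and returns the new head.
    static Node insertBeginning(Node head, int v) {
        Node temp = new Node(v);  // 1. allocate the new node
        temp.next = head;         // 2. the new node points at the old first node
        return temp;              // 3. the new node becomes the head
    }

    // Renders the list in the arrow notation of the figures.
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node cur = head; cur != null; cur = cur.next) {
            sb.append(cur.val).append(" -> ");
        }
        return sb.append("/").toString();
    }

    public static void main(String[] args) {
        Node head = null;                 // empty list
        head = insertBeginning(head, 12);
        head = insertBeginning(head, 5);
        head = insertBeginning(head, 13);
        System.out.println(render(head)); // 13 -> 5 -> 12 -> /
    }
}
```

The order of steps 2 and 3 matters: updating the head before setting temp.next would lose the only reference to the rest of the list.
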
Inserting at index 2
[Figure sequence: the list starts as head → 13 → 5 → 12 → /; a reference named index walks the list to the insertion point, a new node holding 7 is allocated and referenced by temp, and it is linked in so the list becomes head → 13 → 5 → 7 → 12 → /]

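A sketch of insertion at a given position, assuming 0-based positions and an index between 0 and the length of the list. The names IndexInsertDemo, insertAt, of and render are hypothetical; temp follows the figures.

```java
// Illustrative sketch; assumes 0 <= index <= length of the list.
class IndexInsertDemo {
    static class Node {
        int val;
        Node next;
        Node(int v) { val = v; }
    }

    // Inserts v so that it ends up at the given position; returns the head.
    static Node insertAt(Node head, int index, int v) {
        Node temp = new Node(v);
        if (index == 0) {            // special case: the new node becomes first
            temp.next = head;
            return temp;
        }
        Node cur = head;             // walk to the node just before the position
        for (int i = 0; i < index - 1; i++) {
            cur = cur.next;
        }
        temp.next = cur.next;        // new node points at the old occupant
        cur.next = temp;             // predecessor points at the new node
        return head;
    }

    // Builds a list from the given values, in order.
    static Node of(int... vals) {
        Node head = null;
        for (int i = vals.length - 1; i >= 0; i--) {
            Node n = new Node(vals[i]);
            n.next = head;
            head = n;
        }
        return head;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node cur = head; cur != null; cur = cur.next) {
            sb.append(cur.val).append(" -> ");
        }
        return sb.append("/").toString();
    }

    public static void main(String[] args) {
        Node head = of(13, 5, 12);
        head = insertAt(head, 2, 7);
        System.out.println(render(head)); // 13 -> 5 -> 7 -> 12 -> /
    }
}
```
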
Insertion Checks
• When examining a list, we most often have to consider three separate cases
  - The node is the first node of the list
  - The node is an “interior” node
  - The node is the last node of the list
• These cases are the reason why some programs choose to use sentinels in their linked lists.

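A sketch of how a sentinel can remove the first-node case, assuming a single dummy node placed before the real first node. The class SentinelList and its methods are hypothetical and are not the course's implementation.

```java
// Illustrative sketch; assumes one dummy header node and 0-based positions.
class SentinelList {
    static class Node {
        int val;
        Node next;
        Node(int v) { val = v; }
    }

    // Dummy node that is always present; its value is never read.
    private final Node sentinel = new Node(0);

    // Every position, including the first, now has a predecessor,
    // so there is no separate branch for inserting at the front.
    // Assumes 0 <= index <= current length.
    void insertAt(int index, int v) {
        Node prev = sentinel;
        for (int i = 0; i < index; i++) {
            prev = prev.next;
        }
        Node temp = new Node(v);
        temp.next = prev.next;
        prev.next = temp;
    }

    // Deletes the first occurrence of v; the same code handles first,
    // interior and last nodes. Returns false when v is not found.
    boolean delete(int v) {
        Node prev = sentinel;
        while (prev.next != null && prev.next.val != v) {
            prev = prev.next;
        }
        if (prev.next == null) return false;
        prev.next = prev.next.next;
        return true;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (Node cur = sentinel.next; cur != null; cur = cur.next) {
            sb.append(cur.val);
            if (cur.next != null) sb.append(", ");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        SentinelList l = new SentinelList();
        l.insertAt(0, 12);   // insert into an empty list
        l.insertAt(0, 5);    // insert before the first node
        l.insertAt(2, 9);    // insert after the last node
        l.delete(5);         // delete the first node
        System.out.println(l);   // [12, 9]
    }
}
```

The price of the simpler code is one extra node per list that never holds data.
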
Deleting element 7
[Figure sequence: the list starts as head → 13 → 5 → 7 → 12 → /; the references current and temp are used to locate the node holding 7 and unlink it, leaving head → 13 → 5 → 12 → /]

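A sketch of deletion by value without a sentinel. The roles given to current (the predecessor of the node to remove) and temp (the node being removed) are an assumption about the figures; DeleteDemo, delete, of and render are hypothetical names.

```java
// Illustrative sketch; the roles of current and temp are assumed.
class DeleteDemo {
    static class Node {
        int val;
        Node next;
        Node(int v) { val = v; }
    }

    // Removes the first node holding v and returns the (possibly new) head.
    static Node delete(Node head, int v) {
        if (head == null) return null;          // empty list: nothing to do
        if (head.val == v) return head.next;    // first node: the head moves on
        Node current = head;
        // stop at the predecessor of the node to delete, or at the last node
        while (current.next != null && current.next.val != v) {
            current = current.next;
        }
        if (current.next != null) {
            Node temp = current.next;   // the node being removed
            current.next = temp.next;   // link around it
        }
        return head;
    }

    // Builds a list from the given values, in order.
    static Node of(int... vals) {
        Node head = null;
        for (int i = vals.length - 1; i >= 0; i--) {
            Node n = new Node(vals[i]);
            n.next = head;
            head = n;
        }
        return head;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node cur = head; cur != null; cur = cur.next) {
            sb.append(cur.val).append(" -> ");
        }
        return sb.append("/").toString();
    }

    public static void main(String[] args) {
        Node head = of(13, 5, 7, 12);
        head = delete(head, 7);
        System.out.println(render(head)); // 13 -> 5 -> 12 -> /
    }
}
```

In Java the unlinked node is reclaimed by the garbage collector; in a language with manual memory management, temp would be the reference used to free it.
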
Array Implementation of Linked Lists
• It is common to associate linked lists with dynamic data, but the two concepts are independent.
  - Arrays can be used to implement linked lists
• Linked lists are very much related to the machine memory
  - So what if we simulate what is going on in memory?
• The standard way to do it is to have
  - An array of “nodes” which will simulate the memory
  - An array of booleans which keeps track of the free positions

Pictorial View
public class Node {
    char data;
    int next;
    ...
}
Node[] list = new Node[100];
int head;
boolean[] free = new boolean[100];

[Figure: head = 5, with the list and free arrays shown cell by cell]
index   data   next   free
[0]     G      2      FALSE
[1]                   TRUE
[2]     F      -1     FALSE
[3]                   TRUE
[4]                   TRUE
[5]     P      0      FALSE
...
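A sketch of how the two arrays can simulate allocation and linking. The declarations follow the slide (Node with data and next, list, head, free); the class CursorList, the first-fit allocate() helper, the use of -1 as the "null" index and the other method names are assumptions made for this illustration.

```java
// Illustrative sketch; CursorList, allocate() and the -1 convention are assumptions.
class CursorList {
    static class Node {
        char data;
        int next;   // index of the successor, or -1 for "no successor"
    }

    private final Node[] list = new Node[100];        // the simulated memory
    private final boolean[] free = new boolean[100];  // true = cell is available
    private int head = -1;                            // -1 means the list is empty

    CursorList() {
        for (int i = 0; i < list.length; i++) {
            list[i] = new Node();
            free[i] = true;
        }
    }

    // Simulated "new": returns the index of the first free cell, or -1 if none.
    private int allocate() {
        for (int i = 0; i < free.length; i++) {
            if (free[i]) {
                free[i] = false;
                return i;
            }
        }
        return -1;
    }

    // Links a new cell holding c in front of the list; false if memory is full.
    boolean insertBeginning(char c) {
        int cell = allocate();
        if (cell == -1) return false;
        list[cell].data = c;
        list[cell].next = head;
        head = cell;
        return true;
    }

    // Unlinks the first cell and marks it free again; false if the list is empty.
    boolean deleteFirst() {
        if (head == -1) return false;
        free[head] = true;
        head = list[head].next;
        return true;
    }

    // Follows the next indexes from head and collects the data members.
    String contents() {
        StringBuilder sb = new StringBuilder();
        for (int i = head; i != -1; i = list[i].next) {
            sb.append(list[i].data);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CursorList l = new CursorList();
        l.insertBeginning('F');
        l.insertBeginning('G');
        l.insertBeginning('P');
        System.out.println(l.contents());  // PGF
        l.deleteFirst();
        l.insertBeginning('X');            // reuses the cell that P occupied
        System.out.println(l.contents());  // XGF
    }
}
```

The logical order of the list is given entirely by the next indexes, so it need not match the order of the cells in the array.
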