Lecture 6: Intro to Data Structures and the Standard Template Library

COMS W3101: Programming Languages (C++)
Instructor: Austin Reiter
Lecture 6
Outline for Today (Last Lecture)
• Intro to Data Structures
• Standard Template Library (STL)
Last Homework
• HW 4 questions?
• Show example robot output
DISCLAIMER
• Today we are condensing what is usually a semester-long course into two hours!
• Take it with a grain of salt
– I’m just trying to introduce the tools, what’s out there, and
hope you play with them on your own
• We’ve spent the entire course on the “rules and
practices” of C++
– STL is an entire other area of study of C++
– I wish we did an entire 6-week course on STL alone!
– Now that you know templates and the rules of objects,
hopefully you can appreciate the powers of the library.
Data Structures
• Up to now we’ve studied fixed-size data
structures (arrays)
• More useful are dynamically-sized data
structures: grow and shrink during execution (size
unknown during compile time)
• Also, the data structures are arranged
(conceptually) differently from arrays
– Ex: the data doesn’t need to be arranged contiguously
in memory. This often helps speed up certain
processes (sorting, searching, reordering, etc)
Data Structures
• These data structures are implemented
independent of type
– Templates!
• The concept of how the data is arranged is
independent of what is being stored
– However, as usual, you must consider the
operations being done to your data in the storage
container.
• Ex: many containers store things as sorted in some way.
So your structure must have a concept of “less than”
Data Structures
• Vector: just like an array, but can grow and shrink dynamically
• Linked List: collection of data items logically “lined up in a row”
– We can insert and remove anywhere in the list
• Stack: list of items arranged in a last-in, first-out ordering.
– Insertions and removals are only made at the top of the stack
– Very important for compilers and operating systems
• Think about memory allocations: Stack-vs-Heap
• Queue: opposite of stacks; arranged in a first-in, first-out ordering.
– Insertions are made at the back and removals are made at the front
– Like a “waiting line”
Data Structures
• Binary Tree: useful for high-speed searching
and sorting of data.
– Often useful for representation of file directories
• In the data structures we present today, we
use classes, class templates, inheritance and
many other concepts we’ve already learned to
create and package reusable and maintainable
data structures!
STL
• This prepares us for using the Standard
Template Library (STL), which is a major part
of the C++ Standard Library.
• Once we understand the structures and
concepts they represent, we can make more
informed decisions about which are best for
our applications
• They are all implemented as templates
Self-Referential Classes
• A self-referential class contains a pointer
member to a class object of the same class
type:
class Node
{
public:
    Node( int );                // constructor
    void setData( int );        // set data member
    int getData() const;        // get data member
    void setNextPtr( Node * );  // set pointer to next Node
    Node* getNextPtr() const;   // get pointer to next Node
private:
    int data;                   // data stored in this Node
    Node* nextPtr;              // pointer to another object of same type
};
Self-Referential Classes
• The member nextPtr is a link. It can “tie”
an object of type Node to another object of
the same type.
• These types of objects can be linked together
to form useful data structures such as lists,
queues, stacks and trees

[Figure: two self-referential class objects (holding 15 and 10) linked together to form a list. The nextPtr of the last node is NULL, which usually represents the end of a data structure.]
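A minimal sketch of how two Node objects could be linked. The slides only show the declarations, so the inline member definitions here are an assumed, straightforward implementation, and sumList is a hypothetical helper that follows the links until it reaches NULL.

```cpp
#include <cstddef>  // NULL

class Node
{
public:
    explicit Node( int value ) : data( value ), nextPtr( NULL ) {}
    void setData( int value ) { data = value; }
    int getData() const { return data; }
    void setNextPtr( Node *next ) { nextPtr = next; }
    Node* getNextPtr() const { return nextPtr; }
private:
    int data;       // data stored in this Node
    Node* nextPtr;  // pointer to another object of same type
};

// Hypothetical helper: walk the links from 'first' until NULL, adding up the data.
int sumList( const Node *first )
{
    int total = 0;
    for ( const Node *current = first; current != NULL; current = current->getNextPtr() )
        total += current->getData();
    return total;
}
```

Linking is then a single call such as first.setNextPtr( &second ); the NULL stored in the last node is what stops the loop.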
Pointers
• This should start to answer how pointers are
useful beyond simple memory allocation and
data passing
Memory Allocation
• Dynamic data structures means dynamic memory
allocations (both larger and smaller) which enable
programs to hold different amounts of memory during
run-time.
• The data structure must maintain how many elements
it currently has and how to best re-allocate to reduce
calls to new and delete
– For example, many STL implementations grow a vector’s capacity by a
constant factor (often 2x) when it needs more memory, thereby
reducing (over time) the number of times it needs to reallocate
• However, this can be wasteful when it gets to larger and larger
sizes!
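One way to observe this growth policy is to count how often a std::vector changes its capacity while elements are appended. This is only a sketch: countReallocations is a hypothetical helper, and the exact growth factor is implementation-defined, so the count varies between compilers.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the capacity changes while n ints are appended.
// With geometric growth this stays far below n.
std::size_t countReallocations( std::size_t n )
{
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for ( std::size_t i = 0; i < n; ++i )
    {
        v.push_back( static_cast<int>( i ) );
        if ( v.capacity() != lastCapacity )
        {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}
```

Appending ten thousand elements typically triggers only a few dozen reallocations at most, which is the saving the slide describes.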
Linked Lists
• A linear collection of self-referential class
objects, called nodes, connected by pointer
links (hence the term “linked list”)
• A linked list is accessed via a pointer to that
list’s first node
– Each subsequent node is accessed via the link-pointer member stored in the previous node
– The link pointer in the last node is set to NULL, indicating
the end of the list
Linked Lists
• They are dynamic in the sense that new nodes
are created as needed
• A node can contain any type of data
• Linked lists, along with stacks and queues, are linear
data structures, whereas trees are nonlinear
data structures
– More on these in a bit
Linked Lists
• Linked lists are advantageous over arrays when
the number of data elements to be
represented at one time is unpredictable
– The length of the list can increase/decrease as
necessary
– C++ array lengths are fixed at compile time, and
can become “full”
– Linked lists only become full if the system runs out
of memory
Linked Lists
• However, the data in a linked list is not stored
contiguously
– This means accessing arbitrary elements from a
list is not as efficient as in a vector or array
– They are accessed via pointers from the previous
element (i.e., no indices)
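• Logically, however, the nodes appear to be contiguous
[Figure: a linked list in which firstPtr points to the first node (H), followed by D, …, and lastPtr points to the last node (Q).]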
Linked Lists
• Usually we provide functions to add elements to the
front or to the back as well as remove from the front or
back
• We provide pointers (referred to as iterators) to the
beginning and end of the list and we can go through
the nodes one-by-one
• This is called a singly linked list
– Each node contains a pointer to the next node “in
sequence”
• We can also construct a circular, singly linked list
– The last node pointer is not NULL. It points back to the
first element
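The operations described above can be sketched as a small class template. This is an illustrative implementation, not the course's code: the names SinglyLinkedList, pushFront, pushBack, popFront and toVector are assumptions, and copying is disabled to keep the sketch short.

```cpp
#include <cstddef>
#include <vector>

template <typename T>
class SinglyLinkedList
{
public:
    SinglyLinkedList() : firstPtr( NULL ), lastPtr( NULL ) {}
    ~SinglyLinkedList() { while ( firstPtr != NULL ) popFront(); }

    void pushFront( const T &value )
    {
        firstPtr = new ListNode( value, firstPtr );
        if ( lastPtr == NULL ) lastPtr = firstPtr;  // list was empty
    }

    void pushBack( const T &value )
    {
        ListNode *node = new ListNode( value, NULL );
        if ( lastPtr == NULL ) firstPtr = node;     // list was empty
        else lastPtr->next = node;
        lastPtr = node;
    }

    void popFront()
    {
        if ( firstPtr == NULL ) return;             // nothing to remove
        ListNode *old = firstPtr;
        firstPtr = firstPtr->next;
        if ( firstPtr == NULL ) lastPtr = NULL;     // list became empty
        delete old;
    }

    // Walk the nodes one-by-one and copy the data out in list order.
    std::vector<T> toVector() const
    {
        std::vector<T> out;
        for ( const ListNode *p = firstPtr; p != NULL; p = p->next )
            out.push_back( p->data );
        return out;
    }

private:
    struct ListNode
    {
        ListNode( const T &d, ListNode *n ) : data( d ), next( n ) {}
        T data;
        ListNode *next;  // NULL in the last node
    };
    ListNode *firstPtr;
    ListNode *lastPtr;

    // Copying is disabled in this sketch (declared but not defined).
    SinglyLinkedList( const SinglyLinkedList & );
    SinglyLinkedList &operator=( const SinglyLinkedList & );
};
```

Keeping both firstPtr and lastPtr makes insertion at either end a constant-time operation; removing from the back of a singly linked list would still require a walk from the front.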
Linked Lists
• A doubly linked list allows traversal both forwards and
backwards
– Each node has a pointer to both the “next” and “previous”
nodes, separately
• And finally, we can construct a circular, doubly linked
list
– Same as a doubly linked list but the forward pointer of the
last node points to the first node and the backward pointer
of the first node points to the last node

[Figure: a linked list holding the values 12, 7, …, 5, with firstPtr and lastPtr marking its two ends.]
Stacks
• We previously implemented a fixed-size stack
using an array
• We can also do it using a pointer-based linked-list
implementation
• A stack allows nodes to be added and removed
only from the top. It is referred to as a LIFO data
structure, for last-in first-out.
• It can be thought of as a constrained version of a
linked-list
– The link member in the last node of the stack is set to
NULL to indicate the bottom of the stack
Stacks
• The push() method inserts a new node at the
top
• The pop() method removes a node from the
top
• By using a linked-list as the implementation:
– A push inserts data at the front of the list
– A pop removes an element from the front of the list
– Nothing else changes
– Reusability!
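The reuse idea can be sketched by wrapping an existing list. Here std::list stands in for the hand-written linked list, and the class name ListStack is an assumption made for illustration.

```cpp
#include <cassert>
#include <list>

template <typename T>
class ListStack
{
public:
    void push( const T &value ) { items.push_front( value ); }  // insert at the front
    void pop()                                                   // remove from the front
    {
        assert( !items.empty() );
        items.pop_front();
    }
    const T &top() const
    {
        assert( !items.empty() );
        return items.front();
    }
    bool empty() const { return items.empty(); }
private:
    std::list<T> items;  // the list does all of the node management
};
```

A queue follows the same pattern with insertion at the back (push_back) and removal at the front (pop_front).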
Queues
• Similar to a stack, a queue is like a checkout line
from a supermarket. The first person on the line
is the first person processed
• Queue nodes are removed from the head (front)
of the queue and are inserted at the tail (back) of
the queue
• It is referred to as a FIFO data structure, for first-in, first-out
ordering
• The insert operation is often referred to as
enqueue. The remove operation is often referred
to as dequeue.
Queues
• We can use a linked-list to implement a queue
also:
– The enqueue inserts elements at the back of the
list
– The dequeue removes elements from the front
of the list
– Nothing else changes
– Reusability!
Linear Data Structures
• Vectors are fairly straightforward, as they are
simply resizable arrays
– We’ll show some concrete examples in STL
• Let’s look at a non-linear data structure…
Trees
• A two-dimensional nonlinear data structure,
tree nodes contain 2 or more links
• In a binary tree, all nodes contain two links
– None, one or both of which may be NULL

[Figure: a binary tree. The root node pointer refers to node B; the left subtree of B contains A, and the right subtree of B contains D and C.]
Trees
• Node B is the root of the tree
• Each link in the root node refers to a child
(nodes A and D)
– The children of the same node are called siblings

Binary Search Tree
• A binary search tree (BST) has the characteristic that the
values in any left subtree of a node are less than the value
in its parent.
– Similarly, all values in any right subtree of a node are greater
than the value in its parent
• The shape of a BST can vary depending on the order that
the data is inserted into the tree!
[Figure: an example binary search tree containing the values 47, 25, 77, 11, 43, 65, 31, 44 and 68.]
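A minimal sketch of BST insertion and lookup for int values. The names TreeNode, bstInsert, bstContains, bstHeight and bstDestroy are hypothetical, and duplicates are simply ignored. The height function makes the dependence on insertion order visible.

```cpp
#include <algorithm>  // std::max
#include <cstddef>    // NULL

struct TreeNode
{
    explicit TreeNode( int v ) : value( v ), left( NULL ), right( NULL ) {}
    int value;
    TreeNode *left;   // subtree with smaller values
    TreeNode *right;  // subtree with larger values
};

// Inserts 'value' below 'root' and returns the (possibly new) subtree root.
TreeNode *bstInsert( TreeNode *root, int value )
{
    if ( root == NULL ) return new TreeNode( value );
    if ( value < root->value ) root->left = bstInsert( root->left, value );
    else if ( value > root->value ) root->right = bstInsert( root->right, value );
    return root;  // equal values are ignored
}

// Each comparison discards one whole subtree.
bool bstContains( const TreeNode *root, int value )
{
    while ( root != NULL )
    {
        if ( value == root->value ) return true;
        root = ( value < root->value ) ? root->left : root->right;
    }
    return false;
}

// Number of nodes on the longest path from the root down to a leaf.
int bstHeight( const TreeNode *root )
{
    if ( root == NULL ) return 0;
    return 1 + std::max( bstHeight( root->left ), bstHeight( root->right ) );
}

void bstDestroy( TreeNode *root )
{
    if ( root == NULL ) return;
    bstDestroy( root->left );
    bstDestroy( root->right );
    delete root;
}
```

Inserting 47, 25, 77, 11, 43, 65, 31, 44, 68 in that order gives a tree of height 4, while inserting the same nine values in ascending order gives height 9, which is effectively a linked list.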
Binary Search Trees
• We could spend a few lectures on BSTs.
• They are very important for efficient searching of
values
• A balanced BST finds an element in O(log n) comparisons,
which matches the lower bound for comparison-based
searching; an unbalanced tree can degrade to O(n)!
• There are different ways to traverse a tree to
achieve different goals, which we won’t go into
here.
• TAKE THE DATA STRUCTURES COURSE (taught in
Java)
STL
• We’ve repeatedly (hopefully!) reiterated the
importance of software reuse
• STL defines powerful, template-based
reusable software components that
implement common data structures and
algorithms
• Developed by Alexander Stepanov and Meng
Lee at Hewlett Packard and is based on
research in generic programming
STL
• There are 3 main components to STL:
– Containers: popular templatized data structures
– Iterators: like pointers
– Algorithms
STL
• Let’s define a few terms:
– A container is a holder which stores a collection of
elements. They are implemented in STL as class
templates.
– An iterator is how we reference individual elements in
containers, and they are similar (in concept) to
pointers
• However they are just another class with overloaded
operators!
• STL algorithms work on iterators, however standard arrays
can be manipulated by STL algorithms by using pointers as
iterators
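As a small illustration of the last point, a raw array can be handed to std::sort by using pointers as the begin and end iterators. The function name sortArray is just for this sketch.

```cpp
#include <algorithm>  // std::sort
#include <cstddef>

// Sorts a raw array in place. The pointers 'values' and 'values + count'
// play the roles of begin() and end().
void sortArray( int *values, std::size_t count )
{
    std::sort( values, values + count );
}
```

The one-past-the-end pointer values + count is never dereferenced, just like a container's end() iterator.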
STL Algorithms
• Functions that perform common data
manipulations, such as:
– Searching
– Sorting
– Comparing Elements (or entire containers)
• There are approximately 70 algorithms
available
– Most of them use iterators
Containers
Standard Library container class: Description
SEQUENCE CONTAINERS
• vector: Rapid insertions and deletions at back. Direct access to any element.
• deque: Rapid insertions and deletions at front or back. Direct access to any element.
• list: Doubly-linked list, rapid insertion and deletion anywhere.
Containers
Standard Library container class: Description
ASSOCIATIVE CONTAINERS
• set: Rapid lookup, no duplicates allowed.
• multiset: Rapid lookup, duplicates allowed.
• map: One-to-one mapping, no duplicates allowed, rapid key-based lookup.
• multimap: One-to-many mapping, duplicates allowed, rapid key-based lookup.
Containers
Standard Library container class: Description
CONTAINER ADAPTORS
• stack: Last-in, first-out (LIFO).
• queue: First-in, first-out (FIFO).
• priority_queue: Highest priority element is always the first element out.
Containers Overview
• Sequence Containers: represent linear data
structures, such as vectors and linked lists.
• Associative Containers: nonlinear containers that
typically can locate elements stored in the
containers quickly.
– These usually store sets of key/value pairs
• Container Adaptors: constrained versions of
sequential containers. STL implements these
using the sequence containers, but more
constrained in use.
Web Reference
• Good reference for STL online:
– Containers:
http://www.cplusplus.com/reference/stl/
– Algorithms:
http://www.cplusplus.com/reference/algorithm/
Common Functions
• Most STL containers provide common functionality
• Many generic operations, such as:
– size() - how many elements in the container
– Constructors – can create empty containers or copies of
containers (copy constructor)
– empty()
– insert() – add an item to the container (behavior
changes according to data structure)
– Assignment (=)
– Comparison (<, <=, >, >=, ==, !=)
– swap() – swap the elements of two containers
STL Headers
<vector>
<list>
<deque>
<queue>
<stack>
<map>
<set>
Considerations
• When an element is inserted into a container, a
copy of that element is made!
– One of the biggest mistakes is not realizing this fact!!
– The element should provide its own copy constructor
and assignment operator
• Many associative containers require overloading
of comparison operators (==, <)
– Example: set orders elements using a binary tree. It
must be able to say one object is “less-than” another
object
– Similar for the std::sort() function
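Both considerations can be seen in a short sketch. The Student type and the two helper functions are hypothetical; the point is that the container holds its own copy, and that std::set relies on operator< to order user-defined objects.

```cpp
#include <set>
#include <string>
#include <vector>

struct Student
{
    Student( const std::string &n, int i ) : name( n ), id( i ) {}
    std::string name;
    int id;
};

// std::set<Student> uses operator< by default to keep its elements ordered.
bool operator<( const Student &a, const Student &b ) { return a.id < b.id; }

// Returns the name held by the container after the original object is edited.
std::string storedNameAfterEdit()
{
    Student s( "Ada", 1 );
    std::vector<Student> roster;
    roster.push_back( s );  // the vector stores a copy of s
    s.name = "Changed";     // modifies only the original
    return roster[0].name;
}

// Returns the ids in the order in which the set keeps its elements.
std::vector<int> idsInSetOrder()
{
    std::set<Student> students;
    students.insert( Student( "Cy", 3 ) );
    students.insert( Student( "Ada", 1 ) );
    students.insert( Student( "Bo", 2 ) );
    std::vector<int> ids;
    for ( std::set<Student>::const_iterator i = students.begin(); i != students.end(); ++i )
        ids.push_back( i->id );
    return ids;
}
```

Student gets by with the compiler-generated copy constructor and assignment operator because its members copy themselves correctly; a class that owns raw pointers would need to write its own.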
Iterators
• Many features in common with pointers
• Hold state information sensitive to particular
containers on which they operate
– Therefore, iterators are implemented appropriately to
each container type
• Certain iterator operations are uniform across
containers
– Example: the dereferencing operator (*) dereferences
an iterator like a pointer. Also the ++ operator moves
it to the next element (again, this is specific to the
container type).
• Also the -> operator is overloaded
Iterators
• STL containers usually provide begin() and
end() member functions
– begin() – returns an iterator pointing to the first
element of the container
– end() – returns an iterator pointing to the first
element past the end of the container (i.e., an
element that doesn’t exist)
• If iterator i points to a particular element, then
++i points to the “next element” in the
container
– Also, *i refers to the element pointed to by i.
Iterators
• The end() iterator is used to determine
when you’ve reached the end of the container.
– For example, to loop through the elements of a
container, you’d like to do it just like an array, but
not all containers access elements like this. So the
analog is:
for (std::map<K, V>::iterator i = myMap.begin(); i != myMap.end(); ++i)
{
    // process elements of myMap, a std::map<K, V>
    // (K and V stand for its key and value types)
}
Iterators
• There are two types of iterators:
– We use an object of type iterator to refer to a
container element that can be modified (read-write)
– We use an object of type const_iterator to
refer to a container element that cannot be
modified (read-only)
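A short sketch of both iterator kinds on a std::map. The typedef ScoreMap and the two functions are made up for illustration.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> ScoreMap;

// Read-only traversal: a const_iterator cannot modify the elements.
int totalScore( const ScoreMap &scores )
{
    int total = 0;
    for ( ScoreMap::const_iterator i = scores.begin(); i != scores.end(); ++i )
        total += i->second;  // i->first is the key, i->second is the value
    return total;
}

// Read-write traversal: an iterator allows the mapped values to be changed.
void addBonus( ScoreMap &scores, int bonus )
{
    for ( ScoreMap::iterator i = scores.begin(); i != scores.end(); ++i )
        i->second += bonus;
}
```

Even through a plain iterator the key (i->first) of a map element stays const, because changing it would break the container's ordering.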
Introduction to Algorithms
• STL algorithms are used generically across a
variety of STL containers
• Some examples include: inserting, deleting,
searching, sorting
– The algorithms operate on container elements
indirectly through iterators
• STL algorithms often return iterators that indicate
the results of the algorithms
– Example: std::find() locates an element and
returns an iterator to that element, or the end iterator
to indicate the element wasn’t found in the container
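The "compare against end()" convention looks like this in practice. The helper indexOf is hypothetical; it turns the iterator returned by std::find into an index, or -1 when the value is absent.

```cpp
#include <algorithm>  // std::find
#include <vector>

// Returns the index of the first element equal to 'value', or -1 if absent.
int indexOf( const std::vector<int> &v, int value )
{
    std::vector<int>::const_iterator pos = std::find( v.begin(), v.end(), value );
    if ( pos == v.end() ) return -1;            // end() signals "not found"
    return static_cast<int>( pos - v.begin() ); // distance from the start
}
```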
Algorithms
• Some common mutating-sequence algorithms, meaning
algorithms that result in modifications of the containers to
which the algorithms are applied:
– Copy – copy elements, element-by-element, from one container to
another (the destination must already have room for them)
– Remove – move the elements that match a value out of the way; call the container’s erase() afterwards to actually shrink it
– Fill – fill all elements of the container with a single “value”
– Swap – swap elements of two containers of the same type
– Find – search for an element in a container (strictly speaking, a non-modifying algorithm)
– Many, many, many more…
• You usually don’t have to think about memory allocations or sizes.
The overloaded operators all work themselves out.
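One nuance worth a sketch is that std::remove works only through iterators, so it cannot change a container's size by itself. The usual pattern pairs it with the container's erase(); the function name removeAll is an assumption for this example.

```cpp
#include <algorithm>  // std::remove
#include <vector>

// std::remove shifts the kept elements forward and returns the new logical end;
// erase() then discards the leftover tail so that size() actually shrinks.
void removeAll( std::vector<int> &v, int value )
{
    v.erase( std::remove( v.begin(), v.end(), value ), v.end() );
}
```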
Sequence Containers
• vector, list and deque
– vector and deque based on arrays
• Vector is one of the most popular containers in STL
– Changes size dynamically
– Can be assigned to one another (unlike “raw” arrays)
– Insertion at the back is efficient, but expensive in the
middle
• Applications that require frequent insertions and
deletions at both ends normally use deque instead of
vector (more efficient)
– Frequent insertions/deletions in the middle use a list
Sequence Containers
• The front() method returns a reference to
the first element (not an iterator)
• The back() method returns a reference to
the last element (not an iterator, and not one
past the last element)
• The push_back() method adds an element
to the back of the container
• The pop_back() removes the last element
of the container
vector
• See example code of basic operations
– Size = the number of elements currently stored in the
vector
– Capacity = the number of elements that can be stored
in the container without allocating more memory
(usually double capacity when more memory is
needed)
• resize() and reserve() let you control
this growth manually
– See example code of element manipulation functions
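The difference between size and capacity can be sketched as follows. Both function names are hypothetical; reserve() changes only the capacity, while resize() changes the number of elements.

```cpp
#include <cstddef>
#include <vector>

// Returns true if appending n elements after reserve(n) never changed the capacity.
bool reserveAvoidsReallocation( std::size_t n )
{
    std::vector<int> v;
    v.reserve( n );  // capacity is now at least n, size is still 0
    const std::size_t capacityBefore = v.capacity();
    for ( std::size_t i = 0; i < n; ++i )
        v.push_back( static_cast<int>( i ) );
    return v.size() == n && v.capacity() == capacityBefore;
}

// resize() adds (or removes) elements; new ints are zero-initialized.
std::vector<int> resizedExample()
{
    std::vector<int> v( 2, 7 );  // two elements, both 7
    v.resize( 4 );               // now four elements: 7 7 0 0
    return v;
}
```

Calling reserve() up front is a common choice when the final size is known in advance, since it replaces many incremental reallocations with one.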
list
• Let’s look at some list code
– sort arranges elements in ascending order
(different from std::sort()). You can supply a
binary predicate function to sort user-defined objects
– splice removes elements from one container and
places them into the other container before the
iterator position specified as the first argument
– merge removes all elements from one container and
inserts them in sorted order into the other container
(both lists must be sorted in the same order before
this operation is performed!)
• You can probably imagine this algorithm: it’s pretty
straightforward
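A brief sketch of the three list member functions. The wrapper functions mergeSorted and spliceToFront are hypothetical and take their arguments by value so the callers' lists are left alone.

```cpp
#include <list>
#include <vector>

// Sorts both lists, then merges b into a. Afterwards b is empty.
std::vector<int> mergeSorted( std::list<int> a, std::list<int> b )
{
    a.sort();      // member sort; std::sort needs random-access iterators
    b.sort();
    a.merge( b );  // both lists must already be sorted in the same order
    return std::vector<int>( a.begin(), a.end() );
}

// Moves every element of b in front of a's first element by relinking nodes.
std::vector<int> spliceToFront( std::list<int> a, std::list<int> b )
{
    a.splice( a.begin(), b );  // b is empty afterwards
    return std::vector<int>( a.begin(), a.end() );
}
```

std::list has its own sort() because its iterators are bidirectional only, so the generic std::sort cannot be applied to it.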
deque
• Let’s look at some deque code
– Provides many of the benefits of vector and list in one
container
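A tiny sketch of what that combination means in practice: efficient insertion at both ends, plus indexed access. The function buildDeque is made up for illustration.

```cpp
#include <deque>

std::deque<int> buildDeque()
{
    std::deque<int> d;
    d.push_back( 2 );
    d.push_back( 3 );
    d.push_front( 1 );  // efficient at the front, unlike vector
    d[1] = 20;          // direct access by index, unlike list
    return d;
}
```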
Associative Containers
• STL’s associative containers provide direct access to store
and retrieve elements via keys
• The four associative containers are:
– set
– multiset
– map
– multimap
• Each container maintains keys in sorted order
– set and multiset use the values as the keys (the object
must have the comparison operator< overloaded)
– map and multimap store std::pair<const Key, Value> elements,
sorted by key
• Here, the key type must have the operator< overloaded
Associative Containers
• Let’s look at some set code
• Let’s look at some map code
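Since the lecture's example files are not part of this transcript, here is a stand-in sketch of typical set and map usage. The function names countDistinct and countWords are assumptions.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// A set silently ignores duplicate insertions, so its size is the number
// of distinct values.
std::size_t countDistinct( const std::vector<int> &values )
{
    std::set<int> unique( values.begin(), values.end() );
    return unique.size();
}

// operator[] creates an entry with value 0 the first time a key is used.
std::map<std::string, int> countWords( const std::vector<std::string> &words )
{
    std::map<std::string, int> counts;
    for ( std::size_t i = 0; i < words.size(); ++i )
        ++counts[ words[i] ];
    return counts;
}
```

Iterating over either container visits the keys in sorted order, because both keep their elements ordered by operator<.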
Container Adaptors
• stack – implemented with a deque underneath, by default
– push() inserts elements at the top of the stack (calls
push_back() of deque)
– pop() removes elements from top of the stack (calls pop_back()
of deque)
– top() gets a reference to the element at the top of the stack (calls
back() of deque)
– empty()
– size()
• Can also choose a list or vector as the implementation:
std::stack<int> s1; // stack using deque as implementation
std::stack<int, std::vector<int> > s2; // uses vector as implementation
std::stack<int, std::list<int> > s3; // uses list as implementation
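As a usage sketch, a stack backed by a vector can check whether parentheses are balanced. The function isBalanced is a hypothetical example and not part of the STL.

```cpp
#include <cstddef>
#include <stack>
#include <string>
#include <vector>

// Every '(' is pushed; every ')' must find a matching '(' to pop.
bool isBalanced( const std::string &text )
{
    std::stack<char, std::vector<char> > open;  // vector as the implementation
    for ( std::size_t i = 0; i < text.size(); ++i )
    {
        if ( text[i] == '(' )
            open.push( text[i] );
        else if ( text[i] == ')' )
        {
            if ( open.empty() ) return false;   // nothing left to match
            open.pop();
        }
    }
    return open.empty();                        // leftovers mean unmatched '('
}
```

Note that pop() returns nothing; when the value is needed, read it with top() before popping.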
Container Adaptors
• queue – implemented with a deque underneath, by default
– push() inserts elements at the back of the queue
– pop() removes elements from the front of the queue
– back() gets a reference to the back of the queue and
front() gets a reference to the front of the queue
• Can also choose a list as the implementation:
std::queue<double> q1; // uses deque as implementation
std::queue<double, std::list<double> > q2; // uses list implementation
Algorithms
• Let’s look at some code for some algorithms
(can’t cover them all)
– fill/generate
• fill, fill_n: set every element in a range of
container elements to a specific value
• generate, generate_n: create values for every
element in a range of container elements
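A possible illustration of fill and generate. The Counter function object and the two wrapper functions are assumptions made for this sketch.

```cpp
#include <algorithm>  // std::fill, std::generate
#include <cstddef>
#include <vector>

// Function object that returns 1, 2, 3, ... on successive calls.
struct Counter
{
    Counter() : next( 1 ) {}
    int operator()() { return next++; }
    int next;
};

// Every element receives the same value.
std::vector<int> filled( std::size_t n, int value )
{
    std::vector<int> v( n );
    std::fill( v.begin(), v.end(), value );
    return v;
}

// Every element receives the result of the next call to the generator.
std::vector<int> counted( std::size_t n )
{
    std::vector<int> v( n );
    std::generate( v.begin(), v.end(), Counter() );
    return v;
}
```

Both algorithms only assign to existing elements, which is why each vector is created with n elements first.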
Algorithms
• Let’s look at some code for some algorithms
(can’t cover them all)
– equal/mismatch
• equal: compares two sequences of values for
equality. If any value is different, false is returned. The
three-iterator form does not compare lengths, so check
size() first. The operator== must be overloaded for
user-defined types.
• mismatch: compares two sequences of values and
returns an std::pair of iterators indicating the
location in each sequence of the mismatched elements.
If all elements match, return the end iterators.
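A sketch of both algorithms on vectors of int. The helpers sameContents and firstDifference are hypothetical, and firstDifference assumes that b has at least as many elements as a, since the three-iterator form of std::mismatch does not check this.

```cpp
#include <algorithm>  // std::equal, std::mismatch
#include <utility>    // std::pair
#include <vector>

bool sameContents( const std::vector<int> &a, const std::vector<int> &b )
{
    // Compare sizes first: std::equal with three iterators only walks a's range.
    return a.size() == b.size() && std::equal( a.begin(), a.end(), b.begin() );
}

// Returns the index of the first position where a and b differ,
// or -1 if all of a matches the start of b.
// Assumption: b.size() >= a.size().
int firstDifference( const std::vector<int> &a, const std::vector<int> &b )
{
    typedef std::vector<int>::const_iterator Iter;
    std::pair<Iter, Iter> where = std::mismatch( a.begin(), a.end(), b.begin() );
    if ( where.first == a.end() ) return -1;
    return static_cast<int>( where.first - a.begin() );
}
```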
Algorithms
• Let’s look at some code for some algorithms (can’t
cover them all)
– Mathematical Algorithms
• random_shuffle: randomly reorders the elements in the
range from v.begin() up to, but not including, v.end() in v
(deprecated in C++14 and removed in C++17; std::shuffle replaces it).
• count, count_if: count the elements with a particular value
in the container. The second variant takes an arbitrary predicate function
that checks each value against a condition (e.g., greater than 9)
• accumulate: sum the values in the container. The third
argument is the initial value of the total.
• for_each: apply a general function to every element of the
container, one-by-one. The function takes a single argument of
the container’s element type (and may also modify it via reference)
• transform: apply a general function to every element in a
container and stores the result in another container
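A short sketch of count_if, accumulate and transform with plain functions. All function names here are made up for illustration. Note that std::accumulate lives in <numeric> rather than <algorithm>.

```cpp
#include <algorithm>  // std::count_if, std::transform
#include <numeric>    // std::accumulate
#include <vector>

bool greaterThan9( int x ) { return x > 9; }
int square( int x ) { return x * x; }

int countGreaterThan9( const std::vector<int> &v )
{
    return static_cast<int>( std::count_if( v.begin(), v.end(), greaterThan9 ) );
}

int sum( const std::vector<int> &v )
{
    return std::accumulate( v.begin(), v.end(), 0 );  // 0 is the initial total
}

std::vector<int> squares( const std::vector<int> &v )
{
    std::vector<int> out( v.size() );  // the destination must already have room
    std::transform( v.begin(), v.end(), out.begin(), square );
    return out;
}
```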
Algorithms
• Let’s look at some code for some algorithms
(can’t cover them all)
– find/sort/binary_search
• find, find_if: locates a particular value in a container
and returns an iterator where it is located, or end if not
found. If multiple copies, returns the first occurrence.
• sort: arranges the elements in a container in ascending
order. You may also supply a binary predicate function
which takes 2 arguments and returns a comparison b/w
them which determines the ordering.
• binary_search: searches for a value in a sorted
sequence (in ascending order). Returns a bool indicating if
value was found.
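A sketch of sort with a binary predicate, and of binary_search on a sorted range. The predicate descending and the wrapper functions are assumptions for this example.

```cpp
#include <algorithm>  // std::sort, std::binary_search
#include <vector>

// Binary predicate: returns true when a should come before b.
bool descending( int a, int b ) { return a > b; }

std::vector<int> sortedDescending( std::vector<int> v )
{
    std::sort( v.begin(), v.end(), descending );
    return v;
}

// binary_search requires a sorted range, so sort a local copy first.
bool containsSorted( std::vector<int> v, int value )
{
    std::sort( v.begin(), v.end() );
    return std::binary_search( v.begin(), v.end(), value );
}
```

If a range was sorted with a custom predicate, the same predicate has to be passed to binary_search as well.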
Algorithms
• Let’s look at some code for some algorithms (can’t
cover them all)
– Function Objects: most STL algorithms allow you to pass a
function pointer into the algorithm to help the algorithm
perform its task.
– STL’s designers allowed for more flexibility by allowing any
algorithm that can receive a function pointer to receive an
object of a class that overloads the parentheses
operator():
• Example: if using the binary_search algorithm the function
object must receive two arguments and return a bool.
– The advantage over function pointers is that they are
implemented as class templates. Also you can have data
members which work within the functor operator().
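To make the data-member advantage concrete, here is a sketch of a function object that carries its own threshold. The class GreaterThan and the helper countAbove are hypothetical names.

```cpp
#include <algorithm>  // std::count_if
#include <vector>

// Function object: the threshold is a data member used inside operator().
class GreaterThan
{
public:
    explicit GreaterThan( int t ) : threshold( t ) {}
    bool operator()( int x ) const { return x > threshold; }
private:
    int threshold;
};

int countAbove( const std::vector<int> &v, int threshold )
{
    return static_cast<int>(
        std::count_if( v.begin(), v.end(), GreaterThan( threshold ) ) );
}
```

A plain function pointer would need a separate function (or a global variable) for every threshold, whereas one class covers them all.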
Algorithms
• Let’s look at some code for some algorithms
(can’t cover them all)
– Function Objects: see code example
STL function objects, by type:
• arithmetic: plus<T>, minus<T>, multiplies<T>, divides<T>, modulus<T>, negate<T>
• relational: equal_to<T>, not_equal_to<T>, greater<T>, greater_equal<T>, less<T>, less_equal<T>
• logical: logical_and<T>, logical_or<T>, logical_not<T>
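These predefined function objects come from the <functional> header and can be passed wherever an algorithm accepts a function. A brief sketch, with hypothetical wrapper names:

```cpp
#include <algorithm>   // std::sort
#include <functional>  // std::greater, std::multiplies
#include <numeric>     // std::accumulate
#include <vector>

// greater<int> reverses the default ascending order.
std::vector<int> sortDescending( std::vector<int> v )
{
    std::sort( v.begin(), v.end(), std::greater<int>() );
    return v;
}

// multiplies<int> turns accumulate into a product; the initial value is 1.
int product( const std::vector<int> &v )
{
    return std::accumulate( v.begin(), v.end(), 1, std::multiplies<int>() );
}
```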
More Algorithms
• There are many more STL algorithms that we can’t
cover.
• Some are mathematical, some manipulate the values
• Good textbook on STL containers and algorithms:
“Effective STL: 50 Specific Ways to Improve Your Use of the
Standard Template Library” by Scott Meyers
• Lots of nuances to the containers and algorithms that
are important to understand
– Most are straightforward, but some use-cases require a
complete understanding of the side-effects of certain
operations.
Course Wrap-Up
• I hope you learned something about C++
• I know we went fast, but I wanted to present a full
treatment of the language.
• Please go off and play and learn more on your own
• Hopefully you can appreciate the thinking that is involved with
programming in C++
– Very low level memory concepts, real-time aspects, interactions
with the OS, etc…
• C++ is a beautiful language, and even after years of using it
you will continue to learn new things about it every day.
– I certainly do!
• Thank you!