Welcome and Introduction
15-211
Fundamental Data Structures and
Algorithms
Margaret Reid-Miller
and Aleks Nanevski
January 13, 2004
What is this course about?
 How to solve computing problems.
Problem analysis, to abstract away details
and divide into smaller subproblems.
Mathematical foundations for precise
formulations of problems and solutions.
Data structures and algorithms to solve
problems correctly and efficiently.
Java programming and modular software
construction for good implementations.
An Example
Generating mazes
 We want to write a program to
generate mazes.
 We want every maze to be solvable.
Generating mazes, cont’d
 Also: We want mazes to be fun, i.e.,
 We want maze solutions to be unique.
 We want every “room” to be reachable.
How should we think about this?
Let’s hack!
2am in the WeH cluster…
No, let’s think!
Thinking about the problem
 Think about a grid of rooms
separated by walls.
 Each room can be given a name.
a b c d
e f g h
i j k l
m n o p
Randomly
knock out
walls until
we get a
good maze.
Mathematical formulation
 A set of rooms:
 {a, b, c, d, e, f, g, h,
i, j, k, l, m, n, o, p}
 Pairs of adjacent
rooms that have
an open wall
between them.
 For example, (a,b)
and (g,k) are pairs.
Graphs
 Abstractly, this is a mathematical
structure called a graph.
Informally, a graph consists of a set of
nodes and a set of edges.
Mazes as graphs
 A maze is a graph where
each node represents a room, and
an edge from node x to node y
indicates that rooms x and y are
adjacent and there is no wall in
between them.
 A part of this course will be about
graph algorithms.
Why math?
 Q: Why is it useful to formulate the
problem so that mazes are graphs?
 A: Data structures are typically defined as
mathematical structures.
 A: Mathematics can be used to reason
about the correctness and efficiency of
data structures and algorithms.
 A: Mathematical structures make it easier
to think — to abstract away from
unnecessary details and avoid “hacking”.
Mazes as graphs
{(a,b), (b,c),
(a,e), (e,i), (i,j),
(f,j), (f,g), (g,h),
(d,h), (g,k),
(m,n), (n,o),
(k,o), (o,p), (l,p)}
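One possible way to hold this edge set in a program is an adjacency list, sketched below. The class name MazeGraph and its method names are illustrative assumptions, not something the slides prescribe, and the code uses generics from more recent Java versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class: one way to store rooms (nodes) and open walls (edges).
final class MazeGraph {
    // Each room maps to the rooms it shares an open wall with.
    private final Map<Character, List<Character>> adjacent = new TreeMap<>();
    private int openWalls = 0;

    void addRoom(char room) {
        adjacent.putIfAbsent(room, new ArrayList<>());
    }

    // An open wall is undirected, so it is recorded in both directions.
    void addOpenWall(char x, char y) {
        addRoom(x);
        addRoom(y);
        adjacent.get(x).add(y);
        adjacent.get(y).add(x);
        openWalls++;
    }

    List<Character> neighbors(char room) {
        return adjacent.getOrDefault(room, new ArrayList<>());
    }

    int roomCount() { return adjacent.size(); }

    int openWallCount() { return openWalls; }

    // Builds the example maze from the edge set listed on the slide.
    static MazeGraph example() {
        String[] pairs = {"ab", "bc", "ae", "ei", "ij", "fj", "fg", "gh",
                          "dh", "gk", "mn", "no", "ko", "op", "lp"};
        MazeGraph g = new MazeGraph();
        for (char r = 'a'; r <= 'p'; r++) g.addRoom(r);
        for (String p : pairs) g.addOpenWall(p.charAt(0), p.charAt(1));
        return g;
    }

    public static void main(String[] args) {
        MazeGraph g = example();
        System.out.println(g.roomCount() + " rooms, " + g.openWallCount() + " open walls");
        System.out.println("neighbors of g: " + g.neighbors('g'));
    }
}
```

The example has 16 rooms and 15 open walls, which is what a tree on 16 nodes requires.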
Unique solutions
 What property must
the graph have for
the maze to have a
solution?
 A path from (a) to (p).
 What property must
it have for the
maze to have a
unique solution?
 Graph must be a tree.
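As a rough illustration of the first property, a depth-first search can check whether a path from a to p exists and report it. This is only a sketch under assumptions: the class and method names are made up for the example, and the edge list is the example maze used in this lecture.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: finds a path between two rooms by depth-first search.
final class MazePath {
    // Open walls of the example maze, written as pairs of room names.
    static final String[] OPEN_WALLS = {"ab", "bc", "ae", "ei", "ij", "fj", "fg", "gh",
                                        "dh", "gk", "mn", "no", "ko", "op", "lp"};

    static Map<Character, List<Character>> buildGraph(String[] walls) {
        Map<Character, List<Character>> adj = new HashMap<>();
        for (String w : walls) {
            char x = w.charAt(0), y = w.charAt(1);
            adj.computeIfAbsent(x, k -> new ArrayList<>()).add(y);
            adj.computeIfAbsent(y, k -> new ArrayList<>()).add(x);
        }
        return adj;
    }

    // Returns the rooms along a path from 'from' to 'to', or "" if no path exists.
    static String findPath(Map<Character, List<Character>> adj, char from, char to) {
        StringBuilder path = new StringBuilder();
        return dfs(adj, from, to, new HashSet<>(), path) ? path.toString() : "";
    }

    private static boolean dfs(Map<Character, List<Character>> adj, char room, char goal,
                               Set<Character> visited, StringBuilder path) {
        visited.add(room);
        path.append(room);
        if (room == goal) return true;
        for (char next : adj.getOrDefault(room, new ArrayList<>())) {
            if (!visited.contains(next) && dfs(adj, next, goal, visited, path)) return true;
        }
        path.deleteCharAt(path.length() - 1); // dead end: back out of this room
        return false;
    }

    public static void main(String[] args) {
        System.out.println(findPath(buildGraph(OPEN_WALLS), 'a', 'p'));
    }
}
```

Because the example graph is a tree, the path the search finds is the only one.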
Mazes as trees
 Informally, a tree
is a graph:
 with a unique root
node, and
 each node having a
unique parent.
 A spanning tree is
a tree that includes
all of the nodes.
 Why is it good to
have a spanning
tree?
Trees have
no cycles!
[Figure: the example maze drawn as a tree]
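A minimal sketch of how one might test the spanning-tree property: a graph on n nodes is a tree exactly when it has n - 1 edges and every node is reachable from any one node. The class name, the room numbering (a = 0, b = 1, ...), and the helper fromPairs are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: checks whether a set of open walls forms a spanning tree.
final class SpanningTreeCheck {
    // Rooms are numbered 0..roomCount-1; each edge is a pair {x, y} of room numbers.
    static boolean isSpanningTree(int roomCount, int[][] edges) {
        // A tree on n nodes has exactly n - 1 edges.
        if (edges.length != roomCount - 1) return false;

        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < roomCount; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }

        // With n - 1 edges, the graph is a tree exactly when it is connected.
        boolean[] seen = new boolean[roomCount];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        seen[0] = true;
        int reached = 1;
        while (!stack.isEmpty()) {
            int room = stack.pop();
            for (int next : adj.get(room)) {
                if (!seen[next]) {
                    seen[next] = true;
                    reached++;
                    stack.push(next);
                }
            }
        }
        return reached == roomCount;
    }

    // Converts pairs such as "ab" into room numbers, with a = 0, b = 1, and so on.
    static int[][] fromPairs(String... pairs) {
        int[][] edges = new int[pairs.length][];
        for (int i = 0; i < pairs.length; i++) {
            edges[i] = new int[] {pairs[i].charAt(0) - 'a', pairs[i].charAt(1) - 'a'};
        }
        return edges;
    }

    public static void main(String[] args) {
        int[][] maze = fromPairs("ab", "bc", "ae", "ei", "ij", "fj", "fg", "gh",
                                 "dh", "gk", "mn", "no", "ko", "op", "lp");
        System.out.println(isSpanningTree(16, maze));
    }
}
```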
What is this course about?
1. Thinking about
problems abstractly.
Algorithm
 Now that we have a data structure in
mind, we can think about the
algorithm.
 Essentially:
Randomly pick a wall and delete it (add
it to the tree) if it won’t create a cycle.
Stop when a spanning tree has been
created.
This is a randomized version of Kruskal’s Algorithm.
Creating a spanning tree
 When adding a wall to the tree, how
do we detect that it won’t create a
cycle?
When adding wall (x,y), we want to
know if there is already a path from x
to y in the tree.
 In fact, there is a fast algorithm for
doing exactly this, called union-find.
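For orientation, here is a minimal union-find sketch. It assumes rooms are numbered 0..n-1 and uses two common optimizations (path halving and union by size) that the slides do not mention; the class and method names are illustrative.

```java
// Hypothetical minimal union-find (disjoint-set) structure over rooms 0..n-1.
final class UnionFind {
    private final int[] parent; // parent[x] == x means x is the representative of its class
    private final int[] size;   // size of the class, valid only at representatives

    UnionFind(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every room starts in its own class
            size[i] = 1;
        }
    }

    // Returns the representative of x's class, shortening the path as it goes.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    // Merges the classes of x and y; returns false if they were already the same class.
    boolean union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false;
        if (size[rx] < size[ry]) { int t = rx; rx = ry; ry = t; }
        parent[ry] = rx; // attach the smaller class under the larger one
        size[rx] += size[ry];
        return true;
    }

    // True when a path between x and y already exists.
    boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}
```

Before knocking out wall (x,y), a caller would check connected(x, y): if it returns true, removing the wall would create a cycle.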
Using the union-find algorithm
 We put rooms into
an equivalence
class if there is a
path connecting
them.
 Before adding an
edge (x,y) to the
tree, make sure
that x and y are
not in the same
equivalence class.
[Figure: partially constructed maze]
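Putting the pieces together, the following is a sketch of the whole generator under some assumptions: rooms are numbered in row-major order, the equivalence classes are kept in a plain parent array, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical maze generator: knock out random walls unless that would create a cycle.
final class MazeGenerator {
    // Finds the representative of x's equivalence class (with path halving).
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Returns the knocked-out walls as pairs {x, y} of room numbers (row-major order).
    static List<int[]> generate(int rows, int cols, Random rng) {
        List<int[]> walls = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int room = r * cols + c;
                if (c + 1 < cols) walls.add(new int[] {room, room + 1});    // wall to the right
                if (r + 1 < rows) walls.add(new int[] {room, room + cols}); // wall below
            }
        }
        Collections.shuffle(walls, rng); // consider the walls in random order

        int[] parent = new int[rows * cols];
        for (int i = 0; i < parent.length; i++) parent[i] = i;

        List<int[]> opened = new ArrayList<>();
        for (int[] w : walls) {
            int rx = find(parent, w[0]), ry = find(parent, w[1]);
            if (rx != ry) {      // different classes: no path yet, so no cycle
                parent[rx] = ry; // merge the two classes
                opened.add(w);
            }
            if (opened.size() == rows * cols - 1) break; // spanning tree complete
        }
        return opened;
    }

    public static void main(String[] args) {
        // Print the open walls of a random 4x4 maze using room names a..p.
        for (int[] w : generate(4, 4, new Random())) {
            System.out.println((char) ('a' + w[0]) + "-" + (char) ('a' + w[1]));
        }
    }
}
```

A 4x4 grid always ends with 15 open walls, one fewer than the number of rooms.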
How fast?
 Is this a fast way to generate mazes?
How much time will it take to generate a
maze?
What do we mean by “fast”?
 In addition to finding the right
algorithms, analyzing the
performance of algorithms will be a
major part of this course.
Performance and Scaling

Suppose we have three algorithms to
choose from.
 Which one to select?

Systematic analysis reveals performance
characteristics.

For a problem of size n (e.g., detecting
cycles among n nodes), the running times are:
1. 100n μsec
2. 7n^2 μsec
3. 2^n μsec
Performance and Scaling
n            100n μsec    7n^2 μsec    2^n μsec
1            100 μs       7 μs         2 μs
5            .5 ms        175 μs       32 μs
10           1 ms         .7 ms        1 ms
45           4.5 ms       14 ms        1 year
100          …            …            …
1,000
10,000
1,000,000
What?! One year?
 2^10 = 1,024
1,024 μsec ≈ 1 ms
 2^45 = 35,184,372,088,832
3.5 × 10^13 μsec = 3.5 × 10^7 sec ≈ 1.1 year
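The arithmetic can be checked with a few lines of Java. This sketch assumes the three running times are measured in microseconds and takes a year to be 365.25 days; the class and method names are illustrative only.

```java
// Hypothetical helper for checking the running-time arithmetic.
final class ScalingDemo {
    static final double MICROS_PER_SECOND = 1e6;
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60; // assumption: Julian year

    // The three running times from the lecture, in microseconds.
    static double linear(long n) { return 100.0 * n; }
    static double quadratic(long n) { return 7.0 * n * n; }
    static double exponential(int n) { return Math.pow(2, n); }

    static double microsToYears(double micros) {
        return micros / MICROS_PER_SECOND / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println("2^45 = " + (1L << 45));
        System.out.println("100n at n = 45: " + linear(45) / 1000 + " ms");
        System.out.println("7n^2 at n = 45: " + quadratic(45) / 1000 + " ms");
        System.out.println("2^n  at n = 45: " + microsToYears(exponential(45)) + " years");
    }
}
```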
Performance and Scaling
n            100n μsec    7n^2 μsec    2^n μsec
1            100 μs       7 μs         2 μs
5            .5 ms        175 μs       32 μs
10           1 ms         .7 ms        1 ms
45           4.5 ms       14 ms        1 year
100          10 ms        70 ms        10^16 year
1,000        100 ms       7 sec        --
10,000       1 sec        12 min       --
1,000,000    1.6 min      .22 year     --
What is this course about?
2. Selecting good data
structures and fast
algorithms.
Modular design
 By thinking about the problem, we
have strong hints about the structure
of our program
Grids, Graphs (with edges and nodes),
Spanning trees, Union-find.
 With disciplined programming, we can
write our program to reflect this
structure.
 Modular programs are usually easier
to get right and easier to understand.
Modular design
[Diagram of modules: Maze, MazeDraw, Grid, Graph, SpanningTree, UnionFind. Label: reusable components.]
Is it correct?
 How will we know if we implemented
our solution correctly?
What do we mean by “correct”?
Will it generate the right answers?
Will it terminate?
 We will spend some time in this
course on techniques for proving the
correctness of programs.
What is this course about?
3. Implementing
programs that are
understandable and
correct.
Instant Quiz
Instant quiz: Two questions
 Given that we are
using trees to
represent mazes,
what would be a
simple algorithm for
solving mazes?
 Besides insisting on
spanning trees, what
other criteria might
one want in order to
improve the “quality”
or “fun factor” of the
mazes?
[Figure: the example maze drawn as a tree]
What is this course about?
The science in CS
 Not “hacking”, but:
Thinking about problems abstractly.
Selecting good data structures and
obtaining correct and fast algorithms.
Implementing programs that are
understandable and correct.
How the Course Works
Course staff
 A complete listing of the course staff
(including instructors and teaching
assistants) will be available on the
course Blackboard site.
 http://www.cmu.edu/blackboard
Prerequisites
 Basic programming skills.
Provided by 15-111 or 15-200.
 Basic discrete math and logic.
Provided by 21-127.
 This course is not about Java
programming, but about how to
solve problems on a computer.
How will 15-211 be different?
 Faster pace.
Lots of reading, thinking, programming.
 Assumption of mature work habits.
Starting early, overcoming obstacles.
 Larger homework assignments.
Typically requiring several days of work.
 More open-ended problems to solve.
Sometimes no single “right” answer.
Textbook
 Mark Allen Weiss, Data Structures &
Algorithm Analysis in Java.
 http://www.cs.fiu.edu/~weiss
 Addison-Wesley, 2002.
Recitations
 Starts tomorrow!
 Questions are answered, new material is
covered, and homeworks might be
graded.
 Some TAs will insist that you go to
recitation in order to get a grade on your
homeworks!
 Switching sections is possible.
 But the sections must be balanced.
 Talk to the TAs – they are in charge.
Recitations and your TA
 Recitation sessions are also the place
for you to get to know your TA…
 …and for your TA to get to know you
 Your TA will be responsible for
assigning almost all of your grades
for this course!
Go to Recitation!!!
Handouts and information
 Everything on Blackboard (Bb) system.
http://www.cmu.edu/blackboard
User id: your Andrew id.
Password: your Andrew password.
The Blackboard (Bb) system
 Check Bb frequently for
Announcements.
Course policies.
 You must read the cheating policy.
Contact information.
Course schedule and calendar.
Discussion bboard.
Requirements
 Participation in lectures and
recitations.
 Reading assignments.
 Homework assignments.
 Online quizzes.
 In-class and final exams.
Grades
 1 “warmup”.
 40 points.
 3 small homeworks.
 50 points each.
 3 large homeworks.
 100 points each.
 3 quizzes.
 20 points each.
 Midterm exam.
 125 points.
 Final exam.
 275 points.
 TA discretion.
 50 points.
Homework assignments
 Goal: Reinforce and apply what is
taught in class.
 Early homeworks will be small
individual programming assignments.
 Later homeworks will be much larger
assignments, working in groups of
two.
What kind of Assignments?
 How do we write our own little search
engine?
 How do we compress a file?
 lossless (winzip)
 lossy (jpeg)
 How do we design and program a game?
 How to analyze the game board?
 How to pick the best move?
Homework administration
 Electronic handout (on Bb), handin,
& handback.
 All due on Mondays 11:59pm.
 Cluster hours on some evenings, to
be announced.
 Up to 24 hours late for -50%.
No handins accepted after 24 hours.
If you need an extension
 DO NOT SEND EMAIL!
All emails on this topic will be ignored.
 You must see an instructor face-to-face
and explain the reasons.
 We will be reasonable.
Homework assignment 0
 HW0 is available now!
Download and get started today.
 HW0 due on Monday at 11:59pm!
Handin of a working assignment will get
the full 40 points credit.
Gets you started with the software tools
used in this course.
Software tools
 We recommend the use of standard
tools.
Sun’s Java development kit (J2SE 1.4.1).
Emacs text editor.
Unix (Linux, Solaris, MacOS X, Cygwin).
Web-based handin system.
Cheating
 There is no need to cheat in this course.
 If you are in trouble, come and talk to us.
 We are here to help you.
 We expect you to know what is useful
collaboration and what is cheating.
 We will be using advanced data-mining
tools to monitor the originality of all
programs that are handed in
Original code
 For programming assignments, you
will be required to hand in your own
original code
 You may discuss your homeworks
with others, but if doing so impairs
your ability to write truly original
code, then you are cheating
 More on this as the semester goes
on…
Consequences of cheating
 If you cheat, you should come to us
immediately for help.
We will try to be lenient.
 If we catch you, you will either
lose at least double credit, or
fail the course.
 Especially bad cases will be officially
reported to the Associate Dean.
Homework discipline
 Start early.
 Most assignments will require more
than 1 evening to complete.
Larger assignments will require several
days of group work.
 Remember:
Don’t start by sitting at a blank screen.
Humans will be trying to understand
your code when grading it.
Summary
 Go to recitation
 Check out the Blackboard site
 And welcome to the course!