Download PYTHON

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
10/21/2010
The Practice of
Computing Using
More Data Structures
PYTHON
William Punch
Richard Enbody
Chapter 8
Dictionaries
and Sets
• We have seen the list data structure and
its uses.
• We will now examine two, more advanced
data structures: the set and the dictionary.
• In particular, the dictionary is an important,
very useful part of Python as well as
generally useful to solve many problems.
1
2
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
What is a Dictionary?
• In data structure terms, a dictionary is
better termed an associative array or
associative list or a map.
• You can think if it as a list of pairs, where
the first element of the pair, the key, is
used to retrieve the second element, the
value.
• Thus we map a key to a value.
Dictionaries
3
4
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Key Value Pairs
Python Dictionary
• The key acts as a “lookup” to find the
associated value.
• Just like a dictionary, you look up a word
by its spelling to find the associated
definition.
• A dictionary can be searched to locate the
value associated with a key.
• Use the { } marker to create a dictionary
• Use the : marker to indicate key:value
pairs:
contacts= {‘bill’: ‘353-1234’,
‘rich’: ‘269-1234’, ‘jane’: ‘3521234’}
print contacts
{‘jane’: ‘352-1234’,
‘bill’: ‘353-1234’,
‘rich’: ‘369-1234’}
5
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
6
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
1
10/21/2010
Keys and Values
• Key must be immutable:
– strings, integers, tuples are fine
– lists are NOT
• Value can be anything.
7
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
8
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Collections but not a Sequence
• Dictionaries are collections, but they are
not sequences like lists, strings or tuples:
– there is no order to the elements of a
dictionary
– in fact, the order (for example, when printed)
might change as elements are added or
deleted.
Access Dictionary Elements
Access requires [ ], but the key is the index!
myDict={}
– an empty dictionary
myDict[‘bill’]=25
– added the pair ‘bill’:25
print myDict[‘bill’]
– prints 25
• So how to access dictionary elements?
9
10
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Dictionaries are Mutable
Again, Common Operators
• Like lists, dictionaries are a mutable data
structure:
– you can change the object via various
operations, such as index assignment
Like others, dictionaries respond to these:
• len(myDict)
– number of key:value pairs in the dictionary
• element in myDict
myDict = {‘bill’:3, ‘rich’:10}
– boolean, is element a key in the dictionary
print myDict[‘bill’] # prints 2
• for key in myDict:
myDict[‘bill’] = 100
– iterates through the keys of a dictionary
print myDict[‘bill’] # prints 100
11
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
12
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
2
10/21/2010
Lots of Methods
•
•
•
•
Dictionaries are Iterable
for key in myDict:
print key
myDict.items() – all the key/value pairs
myDict.keys() – all the keys
myDict.values() – all the values
key in myDict
– prints all the keys
for key,value in myDict.items():
print key,value
does the key exist in the dictionary
• myDict.clear() – empty the dictionary
• myDict.update(yourDict) – for each key in
yourDict, updates myDict with that key/value
pair
– prints all the key/value pairs
for value in myDict.values():
print value
– prints all the values
13
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
14
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Building Dictionaries Faster
Word Frequency
Gettysburg Address
Code Listings 8.2-8.5
• zip creates pairs from two parallel lists:
– zip(“abc”,[1,2,3]) yields
[(‘a’,1),(‘b’,2),(‘c’,3)]
• That’s good for building dictionaries. We
call the dict function which takes a list of
pairs to make a dictionary:
– dict(zip(“abc”,[1,2,3])) yields
– {'a': 1, 'c': 3, 'b': 2}
15
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
16
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
4 Functions
Passing Mutables
• addWord(word, wordDic). Add word to the
dictionary. No return.
• processLine(line, wordDic). Process line
and identify words. Calls addWord. No
return.
• prettyPrint(wordDic). Nice printing of the
dictionary contents. No return.
• main(). Function to start the program.
• Because we are passing a mutable data
structure, such as a dictionary, we do not
have to return the dictionary when the
function ends.
• If all we do is update the dictionary
(change the object) then the argument will
be associated with the changed object.
17
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
18
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
3
10/21/2010
def addWord(w, wcDict):
'''Update the word frequency: word is the key,
frequency is the value.'''
if w in wcDict:
wcDict[w] += 1
else:
wcDict[w] = 1
import string
def processLine(line, wcDict):
'''Process line, lowercase words added to the dictionary.'''
line = line.strip()
wordList = line.split()
for word in wordList:
# ignore the '--' that is in the file
if word != '--':
word = word.lower()
word = word.strip()
# get commas, periods and other punctuation out as well
word = word.strip(string.punctuation)
addWord(word, wcDict)
19
Sorting in prettyPrint
• The only built-in sort method works on
lists, so if we sort we must sort a list.
• For complex elements (like a tuple), the
sort compares the first element of each
complex element:
– (1, 3) < (2, 1)
– (3,0) < (1,2,3)
# True
# False
• A list comprehension (commented out) is
the equivalent of the code below it.
21
20
def prettyPrint(wcDict):
'''Print nicely from highest to lowest frequency.'''
# create a list of tuples, (value, key)
# valKeyList = [(val,key) for key,val in d.items()]
valKeyList=[]
for key,val in wcDict.items():
valKeyList.append((val,key))
# sort on list's first element, here the frequency.
# Reverse to get biggest first
valKeyList.sort(reverse=True)
print '%-10s%10s'%('Word', 'Count')
print '_'*21
for val,key in valKeyList:
print '%-12s %3d'%(key,val)
22
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
def main ():
wcDict={}
fObj = open('gettysburg.txt','r')
for line in fObj:
processLine(line, wcDict)
print 'Length of the dictionary:',len(wcDict)
prettyPrint(wcDict)
Sets
23
24
4
10/21/2010
Sets, as in Mathematical Sets
• In mathematics, a set is a collection of
objects, potentially of many different types.
• In a set, no two elements are identical.
That is, a set consists of elements each of
which is unique compared to the other
elements.
• There is no order to the elements of a set
• A set with no elements is the empty set
Creating a Set
mySet = set(“abcd”)
• The “set” keyword creates a set.
• The single argument that follows must be
iterable, that is, something that can be walked
through one item at a time with a for.
• The result is a set data structure:
print mySet
set(['a', 'c', 'b', 'd'])
25
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
26
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Diverse Elements
No Duplicates
• A set can consist of a mixture of different
types of elements:
mySet = set([‘a’,1,3.14159,True])
• As long as the single argument can be
iterated through, you can make a set of it.
• Duplicates are automatically removed.
mySet = set(“aabbccdd”)
print mySet
set(['a', 'c', 'b', 'd'])
27
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
28
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Common Operators
Set Operators
Most data structures respond to these:
• len(mySet)
– the number of elements in a set
• element in mySet
– boolean indicating whether element is in the
set
• for element in mySet:
– iterate through the elements in mySet
• The set data structure provides some
special operators that correspond to the
operators you learned in middle school.
• These are various combinations of set
contents.
• Unlike most Python names, these are full
names (this changes in Python 3.x).
29
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
30
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
5
10/21/2010
Set Ops, Intersection
Set Ops, Difference
mySet=set(“abcd”); newSet=set(“cdef”)
ab
cd
mySet=set(“abcd”); newSet=set(“cdef”)
ab
ef
mySet.intersection(newSet) returns
set([‘c’,’d’])
cd
ef
mySet.difference(newSet) returns
set([‘a’,’b’])
31
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Set Ops, Union
Set Ops, symmetricDifference
mySet=set(“abcd”); newSet=set(“cdef”)
ab
cd
32
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
mySet=set(“abcd”); newSet=set(“cdef”)
ef
ab
mySet.union(newSet) returns
set([‘a’,’b’,‘c’,’d’,’e’,’f’])
cd
ef
mySet.symmetric_difference(newSet)
returns
set([‘a’,’b’,’e’,’f’])
33
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
34
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Set Ops, super and sub set
Other Set Ops
mySet=set(“abc”); newSet=set(“abcdef”)
• mySet.add(“g”)
– Adds to the set, no effect if item is in set already.
• mSet.clear()
– Empties the set.
abc
• mySet.remove(“g”) versus
mySet.discard(“g”)
def
– remove throws an error if “g” isn’t there. discard
doesn’t care. Both remove “g” from the set.
mySet.issubset(newSet) returns True
newSet.issuperset(mySet) returns True
• mySet.copy()
– Returns a shallow copy of mySet.
35
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
36
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
6
10/21/2010
Copy vs. Assignment
Common/Unique Words
Code Listings 8.9-8.12
mySet=set(“abc”)
myCopy=mySet.copy()
myRefCopy=mySet
mySet.remove(‘b’)
print myCopy
mySet
print myRefCopy
set([‘a’,‘c’])
myRefCopy
set([‘a’,‘b’,‘c’])
myCopy
37
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
38
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Common words in Gettysburg Address and
Declaration of Independence
4 Functions
• Can reuse or only slightly modify much of
the code for document frequency.
• The overall outline remains much the
same.
• For clarity, we will ignore any word that
has three characters or less (typically stop
words).
• addWord(word, wordSet). Add word to the
set (instead of dictionary). No return.
• processLine(line, wordSet). Process line
and identify words. Calls addWord. No
return. (No change except for parameters.)
• prettyPrint(wordSet). Nice printing of the
various set operations. No return.
• main(). Function to start the program.
39
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
40
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
def addWord(w, theSet):
'''A the word to the set. No word smaller than
length 3.'''
if len(w) > 3:
theSet.add(w)
41
import string
def processLine(line, theSet):
'''Process line, lowercase words added to the set.'''
line = line.strip()
wordList = line.split()
for word in wordList:
# ignore the '--' that is in the file
if word != '--':
word = word.lower()
word = word.strip()
# get commas, periods and other punctuation out as well
word = word.strip(string.punctuation)
addWord(word, theSet)
42
7
10/21/2010
def main ():
'''Compare the Gettysburg Address and the
Declaration of Independence.'''
GASet = set()
DoISet = set()
GAFileObj = open('gettysburg.txt')
DoIFileObj = open('declOfInd.txt')
for line in GAFileObj:
processLine(line, GASet)
for line in DoIFileObj:
processLine(line,DoISet)
prettyPrint(GASet, DoISet)
More Complicated prettyPrint
• The prettyPrint function applies the various
set operators to the two resulting sets.
• Prints, in particular, the intersection in a
nice format.
• Should this have been broken up into two
functions??
43
44
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
def prettyPrint(gaSet, doiSet):
# print some stats about the two sets
print 'Count of unique words of length 4 or greater'
print 'Gettysburg Addr: %d, Decl of Ind: %d\n'%\
(len(gaSet),len(doiSet))
print '%15s %15s'%('Operation', 'Count')
print '-'*35
print '%15s %15d'%('Union’, len(gaSet.union(doiSet)))
print '%15s %15d'%('Intersection',\
len(gaSet.intersection(doiSet)))
print '%15s %15d'%('Sym Difference’,\
len(gaSet.symmetric_difference(doiSet)))
print '%15s %15d'%('GA-DoI', len(gaSet.difference(doiSet)))
print '%15s %15d'%('DoI-GA', len(doiSet.difference(gaSet)))
45
# pretty print continued
# list intersection words, 5/line, alphabetical order
intersectionSet = gaSet.intersection(doiSet)
wordList = list(intersectionSet)
wordList.sort()
print '\n Common words to both'
print '-'*20
cnt = 0
for w in wordList:
if cnt % 5 == 0:
print
print '%13s'% (w),
46
cnt += 1
OK, What is a Namespace?
• We’ve had this discussion, but let’s review:
More on Scope
– A namespace is an association of a name and
a value.
– It looks like a dictionary, and for the most part it
is (at least for modules and classes).
47
48
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
8
10/21/2010
Scope
Multiple Scopes
• What namespace you might be using is
part of identifying the scope of the
variables and function you are using.
• By “scope”, we mean the context, the part
of the code, where we can make a
reference to a variable or function.
• Often, there can be multiple scopes that
are candidates for determining a
reference.
• Knowing which one is the right one (or
more importantly, knowing the order of
scope) is important.
49
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
50
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Two Kinds
Unqualified
• Unqualified namespaces: this is what we
have pretty much seen so far - functions,
assignments, etc.
• Qualified namespaces: these are modules
and classes (we’ll talk more about this one
later in the Classes section).
• This is the standard assignment and def
we have seen so far.
• Determining the scope of a reference
identifies what its true ‘value’ is.
51
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
52
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Unqualified Follow the LEGB Rule
• Local - inside the function in which it was
defined.
• If not there, enclosing/encompassing. Is it
defined in an enclosing function?
• If not there, is it defined in the global
namespace?
• Finally, check the built-in, defined as part of
the special builtin scope.
• Else, ERROR.
53
Local
def myFn (myParam = 25):
print myParam
myFn(100)
prints 100
myParam
ERROR
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
54
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
9
10/21/2010
Local and Global
Function Local Values
myVar = 123
• If a reference is assigned in a function,
then that reference is only available within
that function.
• If a reference with the same name is
provided outside the function, the
reference is reassigned.
# global
def myFn(myVar=456): # parameter is local
yourVar = -10
# assignment local
print myVar, yourVar
myFn()
myFn(500)
myVar
yourVar
# prints 456 -10
# prints 500 -10
# prints 123
ERROR
55
56
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Builtin
The Global Statement
• This is just the standard library of Python.
• To see what is there, look at
import __builtin__
dir(__builtin__)
• You can “cheat” by using the global
keyword in a function:
myVar = 100
def myFn():
global myVar
myVar = -1
myFn()
print myVar
# prints -1
57
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
58
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
Nested Functions
• Look up through the nested functions to
find a binding:
def aFunc():
x = 5
def bFunc():
#
x = 10
print x
bFunc()
aFunc()
bFunc()
# prints 5
ERROR, bFunc not defined
59
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved
10