10/21/2010

The Practice of Computing Using Python
William Punch, Richard Enbody
Chapter 8: Dictionaries and Sets

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
The Practice of Computing Using Python, Punch, Enbody, ©2011 Pearson Addison-Wesley. All rights reserved

More Data Structures
• We have seen the list data structure and its uses.
• We will now examine two more advanced data structures: the set and the dictionary.
• In particular, the dictionary is an important, very useful part of Python, and is generally useful for solving many problems.

Dictionaries

What is a Dictionary?
• In data structure terms, a dictionary is better termed an associative array, an associative list, or a map.
• You can think of it as a list of pairs, where the first element of the pair, the key, is used to retrieve the second element, the value.
• Thus we map a key to a value.

Key Value Pairs
• The key acts as a “lookup” to find the associated value.
• Just like a dictionary, you look up a word by its spelling to find the associated definition.
• A dictionary can be searched to locate the value associated with a key.

Python Dictionary
• Use the { } marker to create a dictionary.
• Use the : marker to indicate key:value pairs:

    contacts = {'bill': '353-1234', 'rich': '269-1234', 'jane': '352-1234'}
    print contacts
    {'jane': '352-1234', 'bill': '353-1234', 'rich': '269-1234'}

Keys and Values
• Key must be immutable:
  – strings, integers, tuples are fine
  – lists are NOT
• Value can be anything.
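The rules for keys and values can be tried out directly. The following is a minimal sketch, assuming Python 3 syntax (the slides use Python 2 print statements); the names `office`, `bad_key` and `key_error_message` are made up for illustration. It shows a lookup by key and what happens when a list is used as a key.

```python
# Build a dictionary of key:value pairs (names and numbers as on the slide).
contacts = {'bill': '353-1234', 'rich': '269-1234', 'jane': '352-1234'}

# The key is the "lookup" for its value.
bills_number = contacts['bill']
print(bills_number)

# Immutable objects such as strings, integers and tuples work as keys.
office = {('bill', 'home'): '353-1234', 42: 'an integer key'}

# A list is mutable, so Python rejects it as a key with a TypeError.
key_error_message = ''
try:
    bad_key = {['bill', 'home']: '353-1234'}
except TypeError as err:
    key_error_message = str(err)
    print('a list cannot be a key:', key_error_message)
```

A tuple of strings is accepted where the equivalent list is not, which is why tuples are the usual choice for compound keys.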
Collections but not a Sequence
• Dictionaries are collections, but they are not sequences like lists, strings or tuples:
  – there is no order to the elements of a dictionary
  – in fact, the order (for example, when printed) might change as elements are added or deleted.
• So how to access dictionary elements?

Access Dictionary Elements
Access requires [ ], but the key is the index!

    myDict = {}             # an empty dictionary
    myDict['bill'] = 25     # added the pair 'bill':25
    print myDict['bill']    # prints 25

Dictionaries are Mutable
• Like lists, dictionaries are a mutable data structure:
  – you can change the object via various operations, such as index assignment

    myDict = {'bill':3, 'rich':10}
    print myDict['bill']    # prints 3
    myDict['bill'] = 100
    print myDict['bill']    # prints 100

Again, Common Operators
Like others, dictionaries respond to these:
• len(myDict)
  – number of key:value pairs in the dictionary
• element in myDict
  – boolean, is element a key in the dictionary
• for key in myDict:
  – iterates through the keys of a dictionary
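Index assignment and the common operators can be seen working together in a few lines. This is a small sketch, assuming Python 3 syntax rather than the Python 2 used on the slides; the variable names (`my_dict`, `size_before`, `keys_seen`, and so on) are illustrative only.

```python
my_dict = {}              # an empty dictionary
my_dict['bill'] = 3       # index assignment adds the pair 'bill': 3
my_dict['rich'] = 10

size_before = len(my_dict)        # number of key:value pairs
has_bill = 'bill' in my_dict      # membership tests the keys...
has_three = 3 in my_dict          # ...not the values

my_dict['bill'] = 100     # assigning to an existing key replaces its value

keys_seen = []
for key in my_dict:       # iteration walks through the keys
    keys_seen.append(key)

print(size_before, has_bill, has_three, my_dict['bill'], sorted(keys_seen))
```

Note that `3 in my_dict` is False even though 3 was stored as a value: the `in` operator looks only at keys.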
Lots of Methods
• myDict.items() – all the key/value pairs
• myDict.keys() – all the keys
• myDict.values() – all the values
• key in myDict – does the key exist in the dictionary
• myDict.clear() – empty the dictionary
• myDict.update(yourDict) – for each key in yourDict, updates myDict with that key/value pair

Dictionaries are Iterable

    for key in myDict:
        print key
  – prints all the keys

    for key,value in myDict.items():
        print key,value
  – prints all the key/value pairs

    for value in myDict.values():
        print value
  – prints all the values

Building Dictionaries Faster
• zip creates pairs from two parallel lists:
  – zip("abc",[1,2,3]) yields [('a',1),('b',2),('c',3)]
• That’s good for building dictionaries. We call the dict function, which takes a list of pairs to make a dictionary:
  – dict(zip("abc",[1,2,3])) yields {'a': 1, 'c': 3, 'b': 2}

Word Frequency
Gettysburg Address
Code Listings 8.2-8.5

4 Functions
• addWord(word, wordDic). Add word to the dictionary. No return.
• processLine(line, wordDic). Process line and identify words. Calls addWord. No return.
• prettyPrint(wordDic). Nice printing of the dictionary contents. No return.
• main(). Function to start the program.

Passing Mutables
• Because we are passing a mutable data structure, such as a dictionary, we do not have to return the dictionary when the function ends.
• If all we do is update the dictionary (change the object) then the argument will be associated with the changed object.
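Both ideas, building a dictionary with zip and updating a dictionary inside a function without returning it, can be sketched briefly. The code assumes Python 3, where zip returns an iterator rather than a list; `add_word` is a snake_case stand-in for the addWord function described above, and the sample phrase is only illustrative input.

```python
# zip pairs up items from two parallel sequences.  In Python 3 it returns
# an iterator, so wrap it in list() to see the pairs.
pairs = list(zip("abc", [1, 2, 3]))

# dict() accepts any sequence of (key, value) pairs.
letters = dict(zip("abc", [1, 2, 3]))


def add_word(word, word_counts):
    """Update word_counts in place; nothing is returned."""
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1


counts = {}
for w in "of the people by the people for the people".split():
    add_word(w, counts)   # the caller's dictionary is changed directly

print(pairs)
print(letters)
print(counts)
```

Because `counts` and the parameter `word_counts` refer to the same dictionary object, every update made inside `add_word` is visible to the caller once the call finishes.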

    def addWord(w, wcDict):
        '''Update the word frequency: word is the key, frequency is the value.'''
        if w in wcDict:
            wcDict[w] += 1
        else:
            wcDict[w] = 1

    import string
    def processLine(line, wcDict):
        '''Process line, lowercase words added to the dictionary.'''
        line = line.strip()
        wordList = line.split()
        for word in wordList:
            # ignore the '--' that is in the file
            if word != '--':
                word = word.lower()
                word = word.strip()
                # get commas, periods and other punctuation out as well
                word = word.strip(string.punctuation)
                addWord(word, wcDict)

Sorting in prettyPrint
• The only built-in sort method works on lists, so if we sort we must sort a list.
• For complex elements (like a tuple), the sort compares the first element of each complex element:
  – (1, 3) < (2, 1)      # True
  – (3,0) < (1,2,3)      # False
• A list comprehension (commented out) is the equivalent of the code below it.

    def prettyPrint(wcDict):
        '''Print nicely from highest to lowest frequency.'''
        # create a list of tuples, (value, key)
        # valKeyList = [(val,key) for key,val in wcDict.items()]
        valKeyList=[]
        for key,val in wcDict.items():
            valKeyList.append((val,key))
        # sort on list's first element, here the frequency.
        # Reverse to get biggest first
        valKeyList.sort(reverse=True)
        print '%-10s%10s'%('Word', 'Count')
        print '_'*21
        for val,key in valKeyList:
            print '%-12s %3d'%(key,val)

    def main ():
        wcDict={}
        fObj = open('gettysburg.txt','r')
        for line in fObj:
            processLine(line, wcDict)
        print 'Length of the dictionary:',len(wcDict)
        prettyPrint(wcDict)

Sets

Sets, as in Mathematical Sets
• In mathematics, a set is a collection of objects, potentially of many different types.
• In a set, no two elements are identical. That is, a set consists of elements each of which is unique compared to the other elements.
• There is no order to the elements of a set.
• A set with no elements is the empty set.

Creating a Set

    mySet = set("abcd")

• The set function creates a set.
• The single argument that follows must be iterable, that is, something that can be walked through one item at a time with a for.
• The result is a set data structure:

    print mySet
    set(['a', 'c', 'b', 'd'])

Diverse Elements
• A set can consist of a mixture of different types of elements:

    mySet = set(['a',1,3.14159,True])

• As long as the single argument can be iterated through, you can make a set of it.

No Duplicates
• Duplicates are automatically removed.

    mySet = set("aabbccdd")
    print mySet
    set(['a', 'c', 'b', 'd'])

Common Operators
Most data structures respond to these:
• len(mySet)
  – the number of elements in a set
• element in mySet
  – boolean indicating whether element is in the set
• for element in mySet:
  – iterate through the elements in mySet

Set Operators
• The set data structure provides some special operators that correspond to the operators you learned in middle school.
• These are various combinations of set contents.
• Unlike most Python names, these are full names (this changes in Python 3.x).
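Set creation, duplicate removal and the common operators can be checked in a short sketch. It assumes Python 3 syntax (the slides use Python 2), and the names `mixed`, `size` and `ordered` are illustrative only.

```python
my_set = set("aabbccdd")          # any iterable works; duplicates collapse
mixed = set(['a', 2, 3.14159])    # elements may be of different types

size = len(my_set)                # number of distinct elements
has_a = 'a' in my_set             # membership test
has_z = 'z' in my_set

# A set has no order, so sort it when a predictable listing is wanted.
ordered = sorted(my_set)
print(size, has_a, has_z, ordered)
```

Sorting produces a list, which is why `ordered` can be compared or printed reliably while the set itself cannot be relied on for order.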
Set Ops, Intersection

    mySet=set("abcd"); newSet=set("cdef")
    (Venn diagram: ab | cd | ef)
    mySet.intersection(newSet) returns set(['c','d'])

Set Ops, Difference

    mySet=set("abcd"); newSet=set("cdef")
    (Venn diagram: ab | cd | ef)
    mySet.difference(newSet) returns set(['a','b'])

Set Ops, Union

    mySet=set("abcd"); newSet=set("cdef")
    (Venn diagram: ab | cd | ef)
    mySet.union(newSet) returns set(['a','b','c','d','e','f'])

Set Ops, symmetricDifference

    mySet=set("abcd"); newSet=set("cdef")
    (Venn diagram: ab | cd | ef)
    mySet.symmetric_difference(newSet) returns set(['a','b','e','f'])

Set Ops, super and sub set

    mySet=set("abc"); newSet=set("abcdef")
    (diagram: abc contained in abcdef)
    mySet.issubset(newSet) returns True
    newSet.issuperset(mySet) returns True

Other Set Ops
• mySet.add("g")
  – Adds to the set, no effect if item is in set already.
• mySet.clear()
  – Empties the set.
• mySet.remove("g") versus mySet.discard("g")
  – remove throws an error if "g" isn’t there. discard doesn’t care. Both remove "g" from the set.
• mySet.copy()
  – Returns a shallow copy of mySet.

Copy vs. Assignment

    mySet=set("abc")
    myCopy=mySet.copy()
    myRefCopy=mySet
    mySet.remove('b')
    print myCopy         # set(['a','b','c'])
    print myRefCopy      # set(['a','c'])

    (diagram: mySet and myRefCopy both refer to set(['a','c']); myCopy refers to set(['a','b','c']))

Common/Unique Words
Code Listings 8.9-8.12
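The set operations and the copy-versus-assignment behaviour covered in the preceding slides can be verified with a short sketch. It assumes Python 3 syntax, and names such as `common`, `only_mine` and `remove_failed` are made up for the example.

```python
my_set = set("abcd")
new_set = set("cdef")

common = my_set.intersection(new_set)               # in both sets
only_mine = my_set.difference(new_set)              # in my_set but not new_set
everything = my_set.union(new_set)                  # in either set
not_shared = my_set.symmetric_difference(new_set)   # in exactly one set

small = set("abc")
is_sub = small.issubset(everything)
is_super = everything.issuperset(small)

# remove raises KeyError for a missing element; discard does not.
small.discard("z")
remove_failed = False
try:
    small.remove("z")
except KeyError:
    remove_failed = True

# copy() makes a new set; plain assignment only adds a second name.
my_copy = small.copy()
my_ref = small
small.remove('b')
print(sorted(my_copy), sorted(my_ref))
```

After `small.remove('b')`, the copy still holds 'b' while `my_ref` does not, because `my_ref` and `small` are two names for the same set object.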
Common words in Gettysburg Address and Declaration of Independence
• Can reuse or only slightly modify much of the code for document frequency.
• The overall outline remains much the same.
• For clarity, we will ignore any word that has three characters or less (typically stop words).

4 Functions
• addWord(word, wordSet). Add word to the set (instead of dictionary). No return.
• processLine(line, wordSet). Process line and identify words. Calls addWord. No return. (No change except for parameters.)
• prettyPrint(gaSet, doiSet). Nice printing of the various set operations. No return.
• main(). Function to start the program.

    def addWord(w, theSet):
        '''Add the word to the set. No word of length 3 or less.'''
        if len(w) > 3:
            theSet.add(w)

    import string
    def processLine(line, theSet):
        '''Process line, lowercase words added to the set.'''
        line = line.strip()
        wordList = line.split()
        for word in wordList:
            # ignore the '--' that is in the file
            if word != '--':
                word = word.lower()
                word = word.strip()
                # get commas, periods and other punctuation out as well
                word = word.strip(string.punctuation)
                addWord(word, theSet)

    def main ():
        '''Compare the Gettysburg Address and the Declaration of Independence.'''
        GASet = set()
        DoISet = set()
        GAFileObj = open('gettysburg.txt')
        DoIFileObj = open('declOfInd.txt')
        for line in GAFileObj:
            processLine(line, GASet)
        for line in DoIFileObj:
            processLine(line,DoISet)
        prettyPrint(GASet, DoISet)

More Complicated prettyPrint
• The prettyPrint function applies the various set operators to the two resulting sets.
• Prints, in particular, the intersection in a nice format.
• Should this have been broken up into two functions?

    def prettyPrint(gaSet, doiSet):
        # print some stats about the two sets
        print 'Count of unique words of length 4 or greater'
        print 'Gettysburg Addr: %d, Decl of Ind: %d\n'%\
              (len(gaSet),len(doiSet))
        print '%15s %15s'%('Operation', 'Count')
        print '-'*35
        print '%15s %15d'%('Union', len(gaSet.union(doiSet)))
        print '%15s %15d'%('Intersection',\
              len(gaSet.intersection(doiSet)))
        print '%15s %15d'%('Sym Difference',\
              len(gaSet.symmetric_difference(doiSet)))
        print '%15s %15d'%('GA-DoI', len(gaSet.difference(doiSet)))
        print '%15s %15d'%('DoI-GA', len(doiSet.difference(gaSet)))
        # pretty print continued
        # list intersection words, 5/line, alphabetical order
        intersectionSet = gaSet.intersection(doiSet)
        wordList = list(intersectionSet)
        wordList.sort()
        print '\n Common words to both'
        print '-'*20
        cnt = 0
        for w in wordList:
            if cnt % 5 == 0:
                print
            print '%13s'% (w),
            cnt += 1

More on Scope

OK, What is a Namespace?
• We’ve had this discussion, but let’s review:
  – A namespace is an association of a name and a value.
  – It looks like a dictionary, and for the most part it is (at least for modules and classes).

Scope
• What namespace you might be using is part of identifying the scope of the variables and functions you are using.
• By “scope”, we mean the context, the part of the code, where we can make a reference to a variable or function.

Multiple Scopes
• Often, there can be multiple scopes that are candidates for determining a reference.
• Knowing which one is the right one (or more importantly, knowing the order of scope) is important.
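The remark that a namespace looks like a dictionary, at least for modules and classes, can be inspected with the built-in vars function. This is a small sketch, assuming Python 3; the `Point` class and its `dimensions` attribute are hypothetical, introduced only for the example.

```python
import math

# A module's namespace is a dictionary mapping names to values.
math_names = vars(math)
pi_from_namespace = math_names['pi']


class Point:
    dimensions = 2


# A class namespace is exposed as a read-only, dictionary-like mapping.
dims_from_class = vars(Point)['dimensions']

print(pi_from_namespace == math.pi, 'sqrt' in math_names, dims_from_class)
```

Looking a name up in `vars(math)` gives the same object as the qualified reference `math.pi`, which is the sense in which the namespace is a name-to-value association.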
Two Kinds
• Unqualified namespaces: this is what we have pretty much seen so far - functions, assignments, etc.
• Qualified namespaces: these are modules and classes (we’ll talk more about this one later in the Classes section).

Unqualified
• This is the standard assignment and def we have seen so far.
• Determining the scope of a reference identifies what its true ‘value’ is.

Unqualified Follow the LEGB Rule
• Local - inside the function in which it was defined.
• If not there, enclosing/encompassing. Is it defined in an enclosing function?
• If not there, is it defined in the global namespace?
• Finally, check the built-in, defined as part of the special builtin scope.
• Else, ERROR.

Local

    def myFn (myParam = 25):
        print myParam

    myFn(100)      # prints 100
    myParam        # ERROR

Local and Global

    myVar = 123                 # global
    def myFn(myVar=456):        # parameter is local
        yourVar = -10           # assignment local
        print myVar, yourVar

    myFn()         # prints 456 -10
    myFn(500)      # prints 500 -10
    myVar          # prints 123
    yourVar        # ERROR

Function Local Values
• If a reference is assigned in a function, then that reference is only available within that function.
• If a reference with the same name is provided outside the function, the reference is reassigned.
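The local, global and built-in steps of the LEGB rule can be observed in a few lines. The sketch below assumes Python 3 and is meant to run at module level; functions return values instead of printing so the results can be inspected, and `uses_builtin` and `your_var_visible` are illustrative names.

```python
my_var = 123                     # global


def my_fn(my_var=456):           # the parameter is local and hides the global
    your_var = -10               # assigned in the function, so local
    return my_var, your_var


default_call = my_fn()           # uses the parameter's default value
explicit_call = my_fn(500)       # uses the argument that was passed


def uses_builtin(items):
    # len is not local, enclosing, or global, so the built-in scope supplies it
    return len(items)


# your_var was local to my_fn, so it is unknown out here.
try:
    your_var
except NameError:
    your_var_visible = False
else:
    your_var_visible = True

print(default_call, explicit_call, my_var, uses_builtin("abc"), your_var_visible)
```

The global `my_var` is still 123 after both calls, since the function only ever touched its own local name.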
Builtin
• These are the names that are always available in Python, such as len and range.
• To see what is there, look at

    import __builtin__
    dir(__builtin__)

The Global Statement
• You can “cheat” by using the global keyword in a function:

    myVar = 100
    def myFn():
        global myVar
        myVar = -1

    myFn()
    print myVar      # prints -1

Nested Functions
• Look up through the nested functions to find a binding:

    def aFunc():
        x = 5
        def bFunc():
            # x = 10
            print x
        bFunc()

    aFunc()          # prints 5
    bFunc()          # ERROR, bFunc not defined
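The global statement and the lookup through nested functions can be combined in one runnable sketch. It assumes Python 3 and module-level execution; `rebind_global`, `a_func` and `b_func` are snake_case stand-ins for the slide's myFn, aFunc and bFunc, and the functions return values rather than print them.

```python
my_var = 100


def rebind_global():
    global my_var      # assignments below now target the global name
    my_var = -1


def a_func():
    x = 5

    def b_func():
        # x is not local to b_func, so the enclosing a_func scope supplies it
        return x

    return b_func()


rebind_global()
nested_result = a_func()

# b_func exists only inside a_func, so it cannot be called from here.
try:
    b_func()
except NameError:
    b_func_visible = False
else:
    b_func_visible = True

print(my_var, nested_result, b_func_visible)
```

Without the `global my_var` line, the assignment inside `rebind_global` would create a new local name and the module-level value would stay at 100.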