Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Object-Oriented Programming in Python Goldwasser and Letscher Chapter 12 More Python Containers Terry Scott University of Northern Colorado 2007 Prentice Hall Introduction: What is Covered in Chapter 12 • • • • • • • Lists and Tuples. Dictionaries. Containers of Containers. Sets. Arrays. Python’s internal use of dictionaries. Case Study: a simple search engine. 2 Aspects of Containers • • • • order: ordered sequence (list, tuple, array). mutability: list is mutable. tuple is immutable. associativity: dict (dictionary) heterogeneity: most Python containers allow different types. Some containers in other languages and one that will be introduced in this chapter only allow one type for the individual components. This is called homogeneous • Storage – Python uses referential containers meaning that rather than the elements in the actual container, there are references (addresses) to the actual items. 3 Summary of Aspects of Python Containers list tuple dict set Ordered X Mutable X X Compact storage X X Associative Heterogeneous frozenset Array X X X X X X X X X 4 lists and tuples • Use indexes that are sequential. • Can add or subtract values to make index values start at 0 as these containers require (example months 1 – 12 : subtract 1). • Limitations: – Lack of permanency: employee id’s – what happens when an employee leaves. – Using Social Security Numbers for employees. SSN's are not sequential. – Maybe the index values are not numeric. 5 Dictionaries • Can use non-numeric index values. – director['Star Wars'] 'George Lucas' – director['The Godfather'] 'Francis Ford Coppola' – director['American Graffiti'] 'George Lucas' • Index values are called keys. • Keys must be immutable (int, str, tuple) • Dictionaries point to items such as 'George Lucas' and are called values. • No limits on values. 6 Python’s Dictionary Class director = { } # can also use director = dict() director['Star Wars'] = 'George Lucas' director['The Godfather']='Francis Ford Coppola' director[‘American Graffiti'] = 'George Lucas' director['Princess Bride'] = 'Rob Reiner' #can also do the following director ={'Star Wars': 'George Lucas', 'The Godfather': 'Francis Ford Coppola‘, 'American Graffiti' : 'George Lucas' , 'Princess Bride' : 'Rob Reiner' } 7 Dictionary Behaviors Syntax d[k] Semantics Returns value at key k, error if k not found. d[k] = value Assigns value to d at key value k. k in d True if k is a member of d else False. len(d) Returns the number of items in d. d.clear() Make d empty. d.pop(k) Remove key k and return value at k to caller. 8 Dictionary Behaviors Syntax Semantics d.popitem() Removes and returns arbitary key value pair. d.keys( ) Returns a list of all keys in d. d.values( ) Returns a list of all values in d. d.Items() Returns a list of tuples. for k in d Iterate over all keys short for: for k in d.keys() 9 Iterating Through Entries of a Dictionary #display a dictionary in sorted order on keys titles = director.keys() titles.sort() for movie in titles: print movie, 'was direct by', director[movie] #can streamline this syntax for movie in sorted(director): print movie, 'was directed by', director[movie] #can iterate over both keys and values for movie,person in director.items(): print movie, 'was directed by', person 10 Containers of Containers • list, tuple and dict can have values of any type including containers. • Modeling a two dimensional table (a list of lists): game = [['X','X','O'], ['O','O','X'], ['X','O','X ']] • bottomLeft = game[2][0] X X O O O X X O X 11 Tic-tac-toe Representation 12 Modeling Many-to-Many Relationships • Dictionary is a many-to-one relationship. Many keys may map to the same value. • Modeling many-to-many relationship. For example a single movie may have many actors. cast['The Princess Bride '] = ('Cary Elwes', 'Robin Wright Penn', 'Chris Sarandon', 'Mandy Patinkin', 'Andre the Giant',. . .) 13 Modeling Many-to-Many Relationships (continued) >>> #Tuple used since immutable and cast for >>> #movies is also. >>> 'Andre the Giant' in cast False #must refer to specific key value in cast >>> 'Andre the Giant' in cast.values() False #must refer to results not keys >>> 'Andre the Giant' in cast['The Princess Bride'] True 14 Reverse Dictionary • What if we want to know the keys that are associated with an item? • Can do this one at a time or build a complete reverse dictionary. original = {'A':1, 'B':3, 'C':3, 'D':4, 'E': 1, 'F': 3} reverse = {1: ['A', 'E'], 3: ['C', 'B', 'F'], 4: ['D'] } 15 Reverse Dictionary Code #can build a reverse dictionary def buildReverse(dictionary): reverse = { } for key,value in dictionary.items(): if value in reverse: reverse[value].append(key) else: reverse[value] = [key] return reverse 16 Sets and Frozensets • Originally programmers created a mathematical set class using a list or dictionary. • Python now contains a set class that is mutable and a frozenset class that is immutable. • Elements within the set are in arbitrary order. • Elements added to set must be immutable. • Elements of the set only occur once. • set constructor can be: – myset = set( ) – mysetb = set(container) #container can be any #immutable container 17 Set Accessor Methods: Standard Container Methods Syntax Semantics len(s) Returns the cardinality of set s. v in s Returns True if v is in set s, else False. Obviously does opposite of command above. Iterates over all values in set s in arbitrary order. v not in s for v in s 18 Set Accessor Methods: Comparing Two Sets Syntax Semantics s == t Returns True if s and t have identical elements, else False. Returns True if s is a proper subset of t, else False. Returns True if set s is a subset of t, else False. s<t s <= t s.issubset(t) s >= t Returns True if t is a proper subset of s, else False. s >= t Returns True if set t is a subset of s, else s.issuperset(t) False. 19 Set Accessor Methods: Creating a Third Set on Existing Sets Syntax Semantics s|t s.union(t) s&t s.Intersection(t) Returns a new set of elements in both sets but with no repeats. s–t s.difference(t) Returns a new set of elements that are in set s but not in set t. Returns a new set of elements that are in both sets s and t. s^t Returns a new set of elements s.symmetric_difference(t) that are in either set s or t but not both. s XOR t. 20 Set Mutator Methods Syntax Semantics s.add(v) Adds value v to set s. No effect if v is already in s. s.discard(v) Removes v from set s. No effect if v is not in s. s.remove(v) Removes v from set s if present. Else it raises KeyError. s.pop() Removes and returns arbitrary value from set s. s.clear() Removes all entries from the set s. 21 Set Mutator Methods (continued) Syntax Semantics s |= t, s.update(t) Alters set s making it the union of sets s and t. s &= t, s.intersection_update(t) Alters set s by making it the intersection of sets s and t. Alters set s making it s – t. s -= t s.difference_update(t) s ^= t Alters set s making it the s.symmetric_difference_update(t) XOR of s and t. 22 Set Operation Examples >>> set ([3, 2]) < set([1,2,3]) True >>> set ([3, 2]) < set([2,3]) False >>> set ([7]) < set([1,2,3]) False >>> set ([3, 2]) <= set([2,3]) True 23 Frozensets • A frozenset is immutable. • A frozenset can perform all accessor methods previously listed for a set but none of the mutator methods. • All elements of a set or frozenset must be immutable. Therefore a set can not consist of a set of sets but it could be a set of frozensets. • Dictionary keys recall can only be immutable therefore a frozenset can be a key of a dictionary. 24 Sets and Frozensets • Two sets operated on results in a set. • Two frozensets operated on results in a frozenset. • A set and a frozenset results in the type that occurs first in the operation. 25 Illustrating Set and Frozenset Operations >>> colors = set(['red', 'green', 'blue']) >>> stoplight = frozenset(['green', 'yellow', 'red']) >>> print colors & stoplight set(['green', 'red']) >>> print stoplight & colors frozenset(['green', 'red']) 26 Lists Versus Arrays • A list consists of a list of references to items. • Advantage is that a list can consist of different types. This is called a heterogeneous type. • Disadvantage is slightly slower access since it requires two accesses to retrieve an item (once to get the reference and once to retrieve the value from the reference). 27 Lists Versus Arrays (continued) • In contrast the array type consists of the actual items. • This is necessarily a homogenous data structure. • It of course has slightly faster access than a list since only one access is required to retrieve a value. 28 Illustrating Differences In How Information is Stored in Arrays and Lists 29 Using Arrays • • • • array('i') - stores integers. array('f') – stores floats. Other codes are possible. Array is not part of built-in types in Python so must import it using: from array import array. • Can also initialize array at same time as creating. >>> from array import array >>> yearArray = array('i', [1776, 1789, 1917,1979]) 30 Python's Internal Use Of Dictionaries • Python uses it's dictionary to keep track of namespaces. • Namespaces consist of identifiers in a particular scope. Scope is where a variable is known. • Python has a top-level scope called global. • The namespace inside of a function is called a local scope. 31 Python's Namespaces • globals() – function that displays identifiers in global namespace. • locals() – function that displays identifiers in local namespace. • When in a function the local namespace can be accessed. If an identifier is given that does not exist in the local namespace then the global scope is accessed. 32 Code Showing Global and Local Namespaces def demo(param): x = len(param) print 'local dictionary is', locals() print 'global dictionary is', globals() y = 'hello' demo(y) #output local dictionary is ('x': 10, 'param': 'hello') global dictionary is {(will list built-ins and) 'y': 'hello', 'demo': <function demo at 0x37d270>} 33 Global and Local Scopes • Referring to the previous slide notice that y does not exist in the local scope but we could print it out within the function. • Using variables in the global scope (called a global variable) is considered to be almost always bad programming style. • Assigning to a global variable that is immutable inside a function will not change its value in the global scope. • Assigning to a global variable that is mutable inside a function will change its value in the global scope since it uses a reference value. 34 Built-in Module • Python has a special module __builtins__ • Some words are reserved such as for. These have a predefined meaning and can not be used as an identifier for other situations. • Other words have a predefined meaning but can be used as an identifier in other situations. • These other words such as abs or True are contained in the __builtins__ module. >>> True = 5 >>> #No longer can True be used in the usual way >>> #for the current Python section. • The above is correct because the definition 'True = 5' will be found before checking for the definition 'True' in the __builtins__ module. 35 Importing Modules • One form for importing is: >>> from math import * • This brings in all identifiers from math. This is said to import the namespace. • This may be a problem when some identifier in the imported namespace covers over a wanted identifier. 36 Importing Modules (continue) • It is better to explicitly import the wanted methods shown below >>> from math import sqrt, pi • Another way to do this is >>> import math • To use the sqrt in this case is to do the following >>> math.sqrt(5) 37 Object-Oriented Name Resolution • Except for primitive types such as int, str the state of individual object is stored in an instance-level dictionary. • vars() retrieves identifiers and their values. >>> myTV = Television() >>> print vars(myTV) {'_prevChan': 2, '_volume': 5, '_channel': 2, '_powerOn': False, '_muted': False} 38 Object-Oriented Name Resolution (continued) • Notice it does not contain class level identifiers such as volumeUp( ) • This can be accessed using: >>> vars(Television) • When searching for an identifier Python first looks in the instance namespace, then to the class namespace then if inheritance is involved it moves up the inheritance chain. • This explains why a child class overriding a method in a base class works, since the child class namespace is looked at first before the parent class namespace. 39 A Simple Search Engine • Searching for words in a corpus (multiple documents). • Similar to generating an index of a book. • Can use split( ) to delineate words by white-space. • Still may have extra characters. Want (Text) to reduce to text and "document's" to reduce to document's. – Strip off non-alphabetic characters at start. – Strip off non-alphabetic characters at end. – Make lower case. 40 Code to Reduce to Actual Word def ourStrip(w): first = 0 #find first desirable letter while first < len(w) and not w[first].isalpha(): first += 1 last = len(w) – 1 #find last desirable letter while last > first and not w[last].isalpha(): last -= 1 return w[first:last+1].lower() #lower case letters 41 Code to Create an Index for a Single Document class TextIndex: #manage index of words def __init__(self, contents, sourceLabel): #make new index self._lines = contents.split('\n') self._label= sourceLabel self._wordAt = { } #build a new dictionary: has line numbers for linenum in range(len(self._lines)): #where word occurs words = self._lines[linenum].split() for w in words: # step through words w = ourStrip(w) #use our strip method if w: #w is not null if w not in self._wordAt: #word not in dictionary self._wordAt[w] = [ linenum ] #add to dictionary elif self._wordAt[w][-1] != linenum: #is linenum there self._wordAt[w].append(linenum) #if not add line 42 Code to Create anIndex for a Single Document (continued) def getLabel(self): #return label for document return self._label def getWords(self): #return unordered words list return self._wordAt.keys() def getContext(self, word, maxOccur = 10):#returns words word = ourStrip(word) #occurrences in context output = [ ] if word in self._wordAt: startContext = max(lineNum – 1, 0) stopContext = min(lineNum + 2, len(self._lines)) output.append('-' * 40) output.extend(self._lines[startContext:stopContext]) return'\n'.join(output) 43 Code for Search Engine Corpus class Engine: #supports word occurrences in a collection def __init__(self): #of text documents. self.corpus = { } #maps label to index self._hasWord = { } #maps words to a label def addDocument(self, contents, sourceLabel): if sourceLabel not in self._corpus: newIndex = TextIndex(contents, sourceLabel) self._corpus[sourceLabel] = newIndex for word in newIndex.getWords(): if word in self._hasWord: self._hasWord[word].add(sourceLabel) else: self._hasWord[word] = set([sourceLabel]) 44 Code for Search Engine Corpus (continued) def lookup(self, term): """Return set of labels for documents containing search term""" term = ourStrip(term) if term in self._hasWord: return set (self._hasWord[term]) else: return set() def getContext(term, docLabel, maxOccur = 10): #search for word in a document returning its context return self._corpus[docLabel].getContext(term,maxOccur ) 45 Code for Search Engine Corpus (continued) def makeReport(self, term,maxDocuments=10,maxContext=3): output = [ ] #produce formatted report of occurrences sources = self.lookup(term) #of term in documents. num = min(len(sources), maxDocuments) label = list(sources)[:num] for docLabel in labels: output.append('Document: ' + docLabel) context=self._corpus[docLabel].getContext(term,max Context) output.append(context) output.append(' =' * 40) return '\n'.join(output) 46 Unit Test for Simple Search Engine • See book page 429-430 for unit test for the simple search engine. 47