CS3101 Python
Lecture 3

Agenda
• Scoping
• Documentation, Coding Practices, Pydoc
• Functions
  – Named, optional, and arbitrary arguments
• Generators and Iterators
• Functional programming tools
  – lambda, map, filter, reduce
• Regular expressions
• Homework 3

Extra credit solution to HW1
• Dynamic programming: about 15 lines
  – Determining whether a bill of N dollars is satisfiable resolves to whether you can satisfy a bill of N – J dollars, where J is the cost of an item on your menu
  – Create an empty list (knapsack) with N + 1 entries
  – Base case: we know we can satisfy a bill of 0 dollars
  – For each item on your menu
  – For index = 0 to N
    • If knapsack[index] is empty, and knapsack[index – item's cost] is not:
    • We now know how to satisfy this bill, so append the current item to a solution list which lives at knapsack[index]

Homework 3, Exercise 1
• Requirements:
  – 1. Write a program that uses regular expressions to retrieve the current weather from a website of your choosing. Just the temperature is OK.
  – 2. Use that information to suggest a sport to play.

    ./sport.py
    It's 36 degrees today. You should ski!
http://www.nytimes.com/weather

Homework 3, Exercise 2
• Requirements:
  – a) Write a program which uses regular expressions and urllib to print the address of each image on Columbia's homepage (www.columbia.edu)
  – b) Use regular expressions to print the title of each of the news stories on the main page

    ./news.py
    ./images.py

Scoping
• Local namespaces / local scope
  – A function's parameters, and variables that are bound within the function
• Module scope
  – Variables outside functions at the module level are global
• Hiding
  – Inner scope wins: if a name conflict occurs between a local variable and a global one, the local one takes precedence

Global statement
• Local scope wins by default
• If within a function you must rebind a global variable of the same name, declare it first with the global keyword
  – 'global identifiers', where identifiers contains one or more IDs separated by commas
• Never use global if your function just accesses a global variable, only if it rebinds it
• Global in general is poor style: it breaks encapsulation, but you'll see it out there

Closures and Nested Scope
• Using a def statement within another function's body defines a nested or inner function
• The parent function is referred to as the outer function
• Nested functions may access the outer function's parameters, but not rebind them
• This trick can be used to form closures, as we'll see with lambdas

Closures
• This example is adapted from Python in a Nutshell

    def make_adder(augend):
        def add(addend):
            return addend + augend
        return add

• Calling make_adder(7) returns a function that accepts a single argument and adds seven to it

Namespace resolution
• Name resolution proceeds as follows:
  – Local scope (i.e., this function)
  – Outer scope (i.e., enclosing functions)
  – Module level scope (i.e., global variables)
  – Built-in scope (i.e., predefined Python names such as open)
• A word to the wise: do not choose variable names that may conflict with built-ins or with modules you may import
  – E.g., 'open = 5' is
dangerous if you're using file objects: a later call to open might not resolve where you expect!

Documentation and Pydoc

    def complex(real=0.0, imag=0.0):
        """Form a complex number.

        Keyword arguments:
        real -- the real part (default 0.0)
        imag -- the imaginary part (default 0.0)
        """
        if imag == 0.0 and real == 0.0:
            return complex_zero
        ...

• A docstring is a string literal beginning a method, class, or module:
  – A concise one-sentence summary, followed by a blank line, followed by detail.
• References
  – http://www.python.org/dev/peps/pep-0257/

• Code is read MANY more times than it is written
• Trust me, it's worth it
• First line should be a concise and descriptive statement of purpose
• Self documentation is good, but do not repeat the method name! (e.g., def setToolTip(text) #sets the tool tip)
• Next paragraph should describe the method and any side effects
• Then arguments

Python's thoughts on documentation
• A Foolish Consistency is the Hobgoblin of Little Minds
• http://www.python.org/dev/peps/pep-0008/

Functions, returning multiple values
• Functions can return multiple values (of arbitrary type), just separate them by commas
• Always reminded me of MATLAB

    def foo():
        return [1,2,3], 4, (5,6)

    myList, myInt, myTuple = foo()

A word on mutable arguments
• Be cautious when passing mutable data structures (lists, dictionaries) to methods, especially if they're sourced from modules that are not your own
• When in doubt, either copy them or cast them to tuples

Semantics of argument passing
• Recall that while functions cannot rebind the caller's variables, they can alter mutable arguments
• Positional arguments
• Named arguments
• Special forms * (sequence) and ** (dictionary)
• Order in a call:
  – zero or more positional
  – followed by zero or more named
  – zero or one *
  – zero or one **

Positional arguments

    def myFunction(arg1, arg2, arg3, arg4, arg5, arg6):
        .....
• Potential for typos
• Readability declines
• Maintenance a headache
• Frequent headache in Java / C (I'm sure we can all recall some monster functions written by colleagues / fellow students)
• We can do better

Named arguments
• Syntax: identifier = expression
• Named arguments specified in the function declaration are optional arguments; the expression is their default value if not provided by the caller
• Two forms
  – 1) you may name arguments passed to functions even if they are listed positionally
  – 2) you may name arguments within a function's declaration to supply default values
• Outstanding for self documentation!

Named argument example

    def add(a, b):
        return a + b

• Equivalent calls:

    print add(4,2)
    print add(a=4, b=2)
    print add(b=2, a=4)

Default argument example

    def add(a=4, b=2):
        return a+b

    print add(b=4)
    print add(a=2, b=4)
    print add(4, b=2)

Sequence arguments
• The * form collects any additional positional arguments into a sequence you can iterate over

    def sum(*args): # equivalent to calling the built-in sum(args)
        sum = 0
        for arg in args:
            sum += arg
        return sum

• Valid calls:

    sum(4,2,1,3)
    sum(1)
    sum(1,23,4,423,234)

Sequences of named arguments
• **dct must be a dictionary whose keys are all strings; values of course are arbitrary
• Each item's key is the parameter name, the value is the argument

    # ** collects keyword
    # arguments into a dictionary
    def foo(**args):
        print args

    foo(homer='donut',
        lisa='tofu')
    {'homer': 'donut', 'lisa': 'tofu'}

Optional arguments are everywhere

    # three ways to call the range function
    # up to
    range(5)
    [0, 1, 2, 3, 4]
    # from, up to
    range(-5, 5)
    [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
    # from, up to, step
    range(-5, 5, 2)
    [-5, -3, -1, 1, 3]

Arbitrary arguments example
• We can envision a max function pretty easily

    # idea: max2(1,5,3,1)
    # >>> 5
    # idea: max2('a', 'b', 'c', 'd', 'e')
    # >>> e

    def max2(*args):
        for arg in args…

Arbitrary arguments example

    def max1(*args):
        best = args[0]
        for arg in args[1:]:
            if arg > best:
                best = arg
        return best

    def max2(*args):
        return sorted(args)[-1]

Argument matching rules
• General rule: more complicated to the right
• For both calling and definitional code:
  – All positional arguments must appear first
  – Followed by all keyword arguments
  – Followed by the * form
  – And finally **

Functions as arguments
• Of course we can pass functions as arguments as well

    def myCompare(x, y):
        …

    sorted([5, 3, 1, 9], cmp=myCompare)

Lambdas
• The closer you can get to mathematics, the more elegant your programs become
• In addition to the def statement, Python provides an expression which in-lines a function, similar to LISP
• Instead of assigning a name to a function, lambda just returns the function itself: an anonymous function

When should you use lambda
• Lambda is designed for handling simple functions
  – Conciseness: lambdas can live in places defs cannot (inside a list literal, or a function call itself)
  – Elegance
• Limitations
  – Not as general as a def: limited to a single expression, so there is only so much you can squeeze in without using blocks of code
• Use def for larger tasks
• Do not sacrifice readability
  – More important that your work is a) correct and b) efficient w.r.t.
people hours

Quick examples
• Lambda arguments work just like those of def functions, including defaults, *, and **
• The lambda expression returns a function, so you can assign a name to it if you wish

    foo = (lambda a, b="simpson": a + " " + b)
    foo("lisa")
    lisa simpson
    foo("bart")
    bart simpson

More examples

    # Embedding lambdas in a list
    myList = [(lambda x: x**2), (lambda x: x**3)]
    for func in myList:
        print func(2)
    4
    8

    # Embedding lambdas in a dictionary
    donuts = {'homer' : (lambda x: x * 4), 'lisa' : (lambda x: x * 0)}
    donuts['homer'](2)
    8
    donuts['lisa'](2)
    0

Multiple arguments

    (lambda x, y: x + " likes " + y)('homer', 'donuts')
    'homer likes donuts'

State

    def remember(x):
        return (lambda y: x + y)

    foo = remember(5)
    print foo
    <function <lambda> at 0x01514970>
    foo(2)
    7

Maps
• One of the most common tasks with lists is to apply an operation to each element in the sequence

    # w/o maps
    donuts = [1,2,3,4]
    myDonuts = []
    for d in donuts:
        myDonuts.append(d * 2)
    print myDonuts
    [2, 4, 6, 8]

    # w/ maps
    def more(d): return d * 2
    myDonuts = map(more, donuts)
    print myDonuts
    [2, 4, 6, 8]

Map using Lambdas

    donuts = [1,2,3,4]

    def more(d): return d * 3
    myDonuts = map(more, donuts)
    print myDonuts
    [3, 6, 9, 12]

    myDonuts = map((lambda d: d * 3), donuts)
    print myDonuts
    [3, 6, 9, 12]

More maps

    # map is smart
    # understands functions requiring multiple arguments
    # operates over sequences in parallel
    pow(2, 3)
    8
    map(pow, [2, 4, 6], [1, 2, 3])
    [2, 16, 216]
    map((lambda x,y: x + " likes " + y),\

Functional programming tools: Filter and reduce
• Theme of functional programming
  – apply functions to sequences
• Relatives of map:
  – filter and reduce
• Filter:
  – filters out items relative to a test function
• Reduce:
  – applies functions to pairs of items and running results

Filter

    range(-5, 5)
    [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
    def isEven(x): return x % 2 == 0
    filter(isEven, range(-5,5))
    [-4, -2, 0, 2, 4]
    filter((lambda x: x % 2 == 0),
           range(-5, 5))
    [-4, -2, 0, 2, 4]

Reduce
• A bit more complicated
• By default the first item of the sequence is used to initialize the tally

    def reduce(fn, seq):
        tally = seq[0]
        for next in seq[1:]:
            tally = fn(tally, next)
        return tally

    reduce((lambda x, y: x + y),
           [1,2,3,4])
    10
    import operator
    reduce(operator.add, [1, 2, 3])
    6

• FYI: more functional tools are available

List comprehensions revisited: combining filter and map

    # Say we wanted to collect the squares of the even numbers below 11
    # Using a list comprehension
    [x ** 2 for x in range(11) if x % 2 == 0]
    [0, 4, 16, 36, 64, 100]

    # Using map and filter
    map((lambda x: x ** 2), filter((lambda x: x % 2 == 0), range(11)))
    [0, 4, 16, 36, 64, 100]

    # Easier way, this uses the optional stepping argument in range
    [x ** 2 for x in range(0,11,2)]
    [0, 4, 16, 36, 64, 100]

Reading files with list comprehensions

    # old way
    lines = open('simpsons.csv').readlines()
    ['homer,donut\n', 'lisa,brocolli\n']
    for line in lines:
        line = line.strip()…

    # with a comprehension
    [line.strip() for line in open('simpsons.csv').readlines()]
    ['homer,donut', 'lisa,brocolli']

    # with a lambda
    map((lambda line: line.strip()),
        open('simpsons.csv').readlines())

Generators and Iterators
• Generators are like normal functions in most respects, but they automatically implement the iteration protocol to return a sequence of values over time
• Consider a generator when
  – you need to compute a series of values lazily
• Generators
  – Save you the work of saving state
  – Automatically save theirs when yield is called
• Easy
  – Just use "yield" instead of "return"

Quick example

    def genSquares(n):
        for i in range(n):
            yield i ** 2

    print genSquares(5)
    <generator object at 0x01524BE8>
    for i in genSquares(5):
        print i, "then",
    0 then 1 then 4 then 9 then 16 then

Error handling preview

    def gen():
        i = 0
        while i < 5:
            i += 1
            yield i ** 2

    x = gen()
    x.next()
    >>> 1
    x.next()
    >>> 4
    …
    Traceback (most recent call last):
      File "<pyshell#110>", line 1, in <module>
        x.next()
    StopIteration

    try:
        x.next()
    except StopIteration:
        print "done"

5 Minute Exercise
• Begin writing a generator that produces primes
• Start with 0
• When you find a prime, yield (return) that value
• Write code to call your generator

    def genPrimes():
        ….
        yield prime

    def main():
        g = genPrimes()
        while True:
            print g.next()

Regular Expressions
• A regular expression (RE) is a string that represents a pattern.
• The idea is to check any string against the pattern to see if it matches, and if so, where
• REs may be compiled or used on the fly
• You may use REs to match, search, substitute, or split strings
• Very powerful, but a bit of a bear syntactically

Quick examples: Match vs. Search

    import re
    pattern = re.compile('[a-z]+')
    m = pattern.match('donut')
    print m.group(), m.start(), m.end()
    donut 0 5

    m = pattern.search('12 donuts are better than 1')
    print m.group(), m.span()
    donuts (3, 9)

    m = pattern.match('12 donuts are better than 1')
    if m:
        print m.group()
    else:
        print "no match"
    no match

Quick examples: Multiple hits

    import re
    p = re.compile('\d+\sdonuts')
    print p.findall('homer has 4 donuts, bart has 2 donuts')
    ['4 donuts', '2 donuts']

    import re
    p = re.compile('\d+\sdonuts')
    iterator = p.finditer('99 donuts on the shelf, 98 donuts on the shelf...')
    for match in iterator:
        print match.group(), match.span()
    99 donuts (0, 9)
    98 donuts (24, 33)

Re Patterns 1

Pattern    Matches
.          Matches any character (except a newline, by default)
^          Matches the start of the string
$          Matches the end of the string
*          Matches zero or more cases of the previous RE (greedy: match as many as possible)
+          Matches one or more cases of the previous RE (greedy)
?          Matches zero or one case of the previous RE
*?, +?     Non-greedy versions (match as few as possible)
.
Matches any character

Re Patterns 2

Pattern    Matches
\d, \D     Matches one digit [0-9] or non-digit [^0-9]
\s, \S     Matches whitespace [ \t\n\r\f\v] or non-whitespace
\w, \W     Matches one alphanumeric character (or underscore), or one character that is not (understands Unicode and various locales if set)
\b, \B     \b matches an empty string, but only at the start or end of a word; \B matches an empty string anywhere else
\Z         Matches an empty string at the end of a whole string
\\         Matches one backslash
{m,n}      Matches m to n cases of the previous RE
[…]        Matches any one of a set of characters
|          Matches either the preceding or following expression
(…)        Matches the RE within the parentheses

Gotchas
• RE punctuation is backwards
  – "." matches any character when unescaped, or an actual "." when in the form "\."
  – "+" and "*" carry regular expression meaning unless escaped

Quick examples: .* vs. .+ and \b
• .* vs. .+
  – The pattern 'Homer.*Simpson' will match:
    • HomerSimpson
    • Homer Simpson
    • Homer Jay Simpson
  – The pattern 'Homer.+Simpson' will match:
    • Homer Simpson
    • Homer Jay Simpson
• \b
  – The pattern r'\bHomer\b' will find a hit searching:
    • Homer
    • Homer Simpson
  – The pattern r'\bHomer' will find a hit searching:
    • HomerJaySimpson

Sets of chars: []
• Sets of characters are denoted by listing the characters within brackets
• [abc] will match one of a, b, or c
• Ranges are supported
• [0-9] will match one digit
• You may include special sets within brackets
  – Such as \s for a whitespace character or \d for a digit

    p = re.compile('[HJ]')
    iterator = p.finditer("HomerJaySimpson")
    for match in iterator:
        print match.group(), match.span()
    H (0, 1)
    J (5, 6)

Alternatives: |
• A vertical bar matches the pattern on either side of it

    import re
    p = re.compile('Homer|Simpson')
    iterator = p.finditer("HomerJaySimpson")
    for match in iterator:
        print match.group(), match.span()
    Homer (0, 5)
    Simpson (8, 15)

RE Substitution

    import re
    line = 'Hello World!'
    r = re.compile('world', re.IGNORECASE)
    m = r.search(line)
    print m.group()
    >>> World
    print r.sub('Mars!!', line, 1)
    >>> Hello Mars!!!
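
The effect of the count argument to sub can be sketched as follows. This is a minimal illustration, not part of the original slides: it is written in Python 3 syntax (print as a function) rather than the Python 2 used in the lecture, and the input string is made up.

```python
import re

# Hypothetical input line, chosen only for illustration.
line = "Hello World! Goodbye world!"

# IGNORECASE lets the pattern 'world' match both 'World' and 'world'.
r = re.compile("world", re.IGNORECASE)

# count=1 replaces only the first match.
first_only = r.sub("Mars", line, count=1)

# Omitting count (default 0) replaces every match.
all_hits = r.sub("Mars", line)

print(first_only)
print(all_hits)
```

Passing a count is useful when only the leading occurrence should change; leaving it out is the usual choice for a global search and replace.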
RE Splitting

    import re
    line = 'lots 42 of random 12 digits 77'
    r = re.compile('\d+')
    l = r.split(line)
    print l
    >>> ['lots ', ' of random ', ' digits ', '']

Groups: ()
• Frequently you need to obtain more information than just whether the RE matched or not.
• Regular expressions are often used to dissect strings by writing an RE divided into several subgroups which match different components of interest.

    p = re.compile('(homer\s(jay))\ssimpson')
    m = p.match('homer jay simpson')
    print m.group(0)
    print m.group(1)
    print m.group(2)
    homer jay simpson
    homer jay
    jay

Putting it all together (and optional flags)

    import re
    r = re.compile('simpson', re.IGNORECASE)
    print r.search("HomerJaySimpson").group()
    Simpson

    r = re.compile('([A-Z][a-z]+).*?(\d+$)', re.MULTILINE)
    iterator = r.finditer('Homer is 42\nMaggie is 6\nBart is 12')
    for match in iterator:
        print match.group(1), "was born", match.group(2), "years ago"
    Homer was born 42 years ago
    Maggie was born 6 years ago
    Bart was born 12 years ago

• Discussion: Who can explain how this RE works?

Finding tags within HTML

    import re
    line = '<tag>my eyes! the goggles do nothing.</tag>'
    r = re.compile('<tag>(.*)</tag>')
    m = r.search(line)
    print m.group(1)
    >>> my eyes! the goggles do nothing.

5 Minute Exercise
• Download the Columbia homepage to disk
• Open it with Python
• Use regular expressions to begin extracting the news

    import re
    line = '<tag>my eyes! the goggles do nothing.</tag>'
    r = re.compile('<tag>(.*)</tag>')
    m = r.search(line)
    print m.group(1)
    >>> my eyes! the goggles do nothing.
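
When a page contains more than one matching tag, the greedy (.*) used above may capture more than intended. The sketch below contrasts it with the non-greedy (.*?) form from the pattern table. It is an illustration only: it uses Python 3 syntax rather than the Python 2 of the slides, and the HTML fragment and the h2 tag are hypothetical stand-ins for whatever markup the real page uses.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page.
html = "<h2>Homer eats donut</h2><p>filler</p><h2>Lisa plays sax</h2>"

# Greedy: .* runs to the LAST </h2>, swallowing everything in between.
greedy = re.compile(r"<h2>(.*)</h2>")

# Non-greedy: .*? stops at the FIRST </h2> after each <h2>.
lazy = re.compile(r"<h2>(.*?)</h2>")

greedy_hits = greedy.findall(html)
lazy_hits = lazy.findall(html)

print(greedy_hits)
print(lazy_hits)
```

With findall, the non-greedy pattern yields one entry per headline, which is usually what the news-title exercise calls for.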