Using Python for CGI programming
Guido van Rossum
CNRI (Corporation for National Research Initiatives, Reston, Virginia, USA)
[email protected]
www.python.org
11/12/1999 © 1999 CNRI, Guido van Rossum

Overview
• 1 minute advocacy
• 30 minutes basic Python tutorial
• 30 minutes on Python CGI programming
• 30 minutes CGI case study: FAQ wizard
• Spanish Inquisition

Why Python?
• Have your cake and eat it, too: Productivity and readable code
• VHLLs will gain on system languages (John Ousterhout)
• "Life is better without braces" (Bruce Eckel)

Basic Python tutorial

Tutorial outline
• interactive "shell"
• basic types: numbers, strings
• container types: lists, dictionaries, tuples
• variables
• control structures
• functions & procedures
• classes & instances
• modules & packages
• exceptions
• files & standard library

Interactive “shell”
• Great for learning the language
• Great for experimenting with the library
• Great for testing your own modules
• Type statements or expressions at prompt:
    >>> print "Hello, world"
    Hello, world
    >>> x = 12**2
    >>> x/2
    72
    >>> # this is a comment

Numbers
• The usual suspects
  • 12, 3.14, 0xFF, 0377, (-1+2)*3/4**5, abs(x), 0<x<=5
• C-style shifting & masking
  • 1<<16, x&0xff, x|1, ~x, x^y
• Integer division truncates :-(
  • 1/2 -> 0 # float(1)/2 -> 0.5
• Long (arbitrary precision), complex
  • 2L**100 -> 1267650600228229401496703205376L
  • 1j**2 -> (-1+0j)

Strings
• "hello"+"world" -> "helloworld" # concatenation
• "hello"*3 -> "hellohellohello" # repetition
• "hello"[0] -> "h" # indexing
• "hello"[-1] -> "o" # (from end)
• "hello"[1:4] -> "ell" # slicing
• len("hello") -> 5 # size
• "hello" < "jello" -> 1 # comparison
• "e" in "hello" -> 1 # search
• "escapes: \n etc, \033 etc, \xff etc"
• 'single quotes'
• '''triple quotes'''
• r"raw strings"

Lists
• a = [99, "bottles of beer", ["on", "the", "wall"]]
• Flexible arrays, not Lisp-like linked lists
• Same operators as for strings
  • a+b, a*3, a[0], a[-1], a[1:], len(a)
• Item and slice assignment
  • a[0] = 98
  • a[1:2] = ["bottles", "of", "beer"]
    -> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
  • del a[-1] # -> [98, "bottles", "of", "beer"]

More list operations
    >>> a = range(5)        # [0,1,2,3,4]
    >>> a.append(5)         # [0,1,2,3,4,5]
    >>> a.pop()             # [0,1,2,3,4]
    5
    >>> a.insert(0, 5.5)    # [5.5,0,1,2,3,4]
    >>> a.pop(0)            # [0,1,2,3,4]
    5.5
    >>> a.reverse()         # [4,3,2,1,0]
    >>> a.sort()            # [0,1,2,3,4]

Dictionaries
• Hash tables, "associative arrays"
  • d = {"duck": "eend", "water": "water"}
• Lookup:
  • d["duck"] -> "eend"
  • d["back"] # raises KeyError exception
• Delete, insert, overwrite:
  • del d["water"] # {"duck": "eend"}
  • d["back"] = "rug" # {"duck": "eend", "back": "rug"}
  • d["duck"] = "duik" # {"duck": "duik", "back": "rug"}

More dictionary ops
• Keys, values, items:
  • d.keys() -> ["duck", "back"]
  • d.values() -> ["duik", "rug"]
  • d.items() -> [("duck","duik"), ("back","rug")]
• Presence check:
  • d.has_key("duck") -> 1; d.has_key("spam") -> 0
• Values of any type; keys almost any
  • {"name":"Guido", "age":43, ("hello","world"):1, 42:"yes", "flag": ["red","white","blue"]}

Dictionary details
• Keys must be immutable:
  – numbers, strings, tuples of immutables
    • these cannot be changed after creation
  – reason is hashing (fast lookup technique)
  – not lists or other dictionaries
    • these types of objects can be changed "in place"
  – no restrictions on values
• Keys will be listed in arbitrary order
  – again, because of hashing

Tuples
• key = (lastname, firstname)
• point = x, y, z # parentheses optional
• x, y, z = point
• lastname = key[0]
• singleton = (1,) # trailing comma!
• empty = () # parentheses!
• tuples vs. lists; tuples immutable

Variables
• No need to declare
• Need to assign (initialize)
  • use of uninitialized variable raises exception
• Not typed
    if friendly: greeting = "hello world"
    else: greeting = 12**2
    print greeting
• Everything is a variable:
  • functions, modules, classes

Reference semantics
• Assignment manipulates references
  • x = y does not make a copy of y
  • x = y makes x reference the object y references
• Very useful; but beware!
• Example:
    >>> a = [1, 2, 3]; b = a
    >>> a.append(4); print b
    [1, 2, 3, 4]

Changing a shared list
[Diagram, three steps:]
    a = [1, 2, 3]    # a references the list [1, 2, 3]
    b = a            # a and b reference the same list
    a.append(4)      # the shared list becomes [1, 2, 3, 4]

Changing an integer
[Diagram, three steps:]
    a = 1            # a references the int object 1
    b = a            # a and b reference the same int object 1
    a = a+1          # new int object 2 created by add operator (1+1);
                     # old reference deleted by assignment (a=...); b still references 1

Control structures
    if condition:
        statements
    [elif condition:
        statements] ...
    else:
        statements

    while condition:
        statements

    for var in sequence:
        statements

    break
    continue

Grouping indentation
In Python:
    for i in range(20):
        if i%3 == 0:
            print i
            if i%5 == 0:
                print "Bingo!"
        print "---"

In C:
    for (i = 0; i < 20; i++)
    {
        if (i%3 == 0) {
            printf("%d\n", i);
            if (i%5 == 0) {
                printf("Bingo!\n");
            }
        }
        printf("---\n");
    }

Output (one item per line):
    0 Bingo! --- --- --- 3 --- --- --- 6 --- --- --- 9 --- --- --- 12 --- --- --- 15 Bingo! --- --- --- 18 --- ---

Functions, procedures
    def name(arg1, arg2, ...):
        "documentation"         # optional
        statements

    return                      # from procedure
    return expression           # from function

Example function
    def gcd(a, b):
        "greatest common divisor"
        while a != 0:
            a, b = b%a, a       # parallel assignment
        return b

    >>> gcd.__doc__
    'greatest common divisor'
    >>> gcd(12, 20)
    4

Classes
    class name:
        "documentation"
        statements
-or-
    class name(baseclass1, baseclass2, ...):
        ...
Typically, statements contains method definitions:
    def name(self, arg1, arg2, ...):
        ...
May also contain class variable assignments

Example class
    class Stack:
        "A well-known data structure…"
        def __init__(self):             # constructor
            self.items = []
        def push(self, x):
            self.items.append(x)        # the sky is the limit
        def pop(self):
            x = self.items[-1]          # what happens if it's empty?
            del self.items[-1]
            return x
        def empty(self):
            return len(self.items) == 0 # Boolean result

Using classes
• To create an instance, simply call the class object:
    x = Stack()         # no 'new' operator!
• To use methods of the instance, call using dot notation:
    x.empty()           # -> 1
    x.push(1)           # [1]
    x.empty()           # -> 0
    x.push("hello")     # [1, "hello"]
    x.pop()             # -> "hello"    # [1]
• To inspect instance variables, use dot notation:
    x.items             # -> [1]

Subclassing
    class FancyStack(Stack):
        "stack with added ability to inspect inferior stack items"
        def peek(self, n):
            "peek(0) returns top; peek(1) returns item below that; etc."
            size = len(self.items)
            assert 0 <= n < size        # test precondition
            return self.items[size-1-n]

Subclassing (2)
    class LimitedStack(FancyStack):
        "fancy stack with limit on stack size"
        def __init__(self, limit):
            self.limit = limit
            FancyStack.__init__(self)   # base class constructor
        def push(self, x):
            assert len(self.items) < self.limit
            FancyStack.push(self, x)    # "super" method call

Class & instance variables
    class Connection:
        verbose = 0                     # class variable
        def __init__(self, host):
            self.host = host            # instance variable
        def debug(self, v):
            self.verbose = v            # make instance variable!
        def connect(self):
            if self.verbose:            # class or instance variable?
                print "connecting to", self.host

Instance variable rules
• On use via instance (self.x), search order:
  – (1) instance, (2) class, (3) base classes
  – this also works for method lookup
• On assignment via instance (self.x = ...):
  – always makes an instance variable
• Class variables "default" for instance variables
• But...!
  – mutable class variable: one copy shared by all
  – mutable instance variable: each instance its own

Modules
• Collection of stuff in foo.py file
  – functions, classes, variables
• Importing modules:
  – import string; print string.join(L)
  – from string import join; print join(L)
• Rename after import:
  – import string; s = string; del string

Packages
• Collection of modules in directory
• Must have __init__.py file
• May contain subpackages
• Import syntax:
  – from P.Q.M import foo; print foo()
  – from P.Q import M; print M.foo()
  – import P.Q.M; print P.Q.M.foo()

Catching exceptions
    def foo(x):
        return 1.0/x

    def bar(x):
        try:
            print foo(x)
        except ZeroDivisionError, message:
            print "Can't divide by zero:", message

    bar(0)

Try-finally: cleanup
    f = open(file)
    try:
        process_file(f)
    finally:
        f.close()       # always executed
    print "OK"          # executed on success only

Raising exceptions
• raise IndexError
• raise IndexError("k out of range")
• raise IndexError, "k out of range"
• try:
      something
  except:               # catch everything
      print "Oops"
      raise             # reraise

More on exceptions
• User-defined exceptions
  – subclass Exception or any other standard exception
• Old Python: exceptions can be strings
  – WATCH OUT: compared by object identity, not ==
• Last caught exception info:
  – sys.exc_info() == (exc_type, exc_value, exc_traceback)
• Last uncaught exception (traceback printed):
  – sys.last_type, sys.last_value, sys.last_traceback
• Printing exceptions: traceback module

File objects
• f = open(filename[, mode[, buffersize]])
  – mode can be "r", "w", "a" (like C stdio); default "r"
  – append "b" for binary mode (no text translation)
  – append "+" for read/write open
  – buffersize: 0=unbuffered; 1=line-buffered; >1=buffered (approximate buffer size)
• methods:
  – read([nbytes]), readline(), readlines()
  – write(string), writelines(list)
  – seek(pos[, how]), tell()
  – fileno(), flush(), close()

Standard library
• Core:
  – os, sys, string, getopt, StringIO, struct, pickle, ...
• Regular expressions:
  – re module; Perl-5 style patterns and matching rules
• Internet:
  – socket, rfc822, httplib, htmllib, ftplib, smtplib, ...
• Miscellaneous:
  – pdb (debugger), profile+pstats
  – Tkinter (Tcl/Tk interface), audio, *dbm, ...

Python CGI programming

Outline
• HTML forms
• Basic CGI usage
• Setting up a debugging framework
• Security
• Handling persistent data
• Locking
• Sessions
• Cookies
• File upload
• Generating HTML
• Performance

A typical HTML form
    <form method="POST" action="http://host.com/cgi-bin/test.py">
      <p>Your first name: <input type="text" name="firstname">
      <p>Your last name: <input type="text" name="lastname">
      <p>Click here to submit form: <input type="submit" value="Yeah!">
      <input type="hidden" name="session" value="1f9a2">
    </form>

A typical CGI script
    #!/usr/local/bin/python
    import cgi

    def main():
        print "Content-type: text/html\n"
        form = cgi.FieldStorage()       # parse query
        if form.has_key("firstname") and form["firstname"].value != "":
            print "<h1>Hello", form["firstname"].value, "</h1>"
        else:
            print "<h1>Error! Please enter first name.</h1>"

    main()

CGI script structure
• Check form fields
  – use cgi.FieldStorage class to parse query
    • takes care of decoding, handles GET and POST
    • "foo=ab+cd%21ef&bar=spam" --> {'foo': 'ab cd!ef', 'bar': 'spam'} # (well, actually, ...)
• Perform action
  – this is up to you!
  – database interfaces available
• Generate HTTP + HTML output
  – print statements are simplest
  – template solutions available

Structure refinement
    form = cgi.FieldStorage()
    if not form:
        ...display blank form...
    elif ...valid form...:
        ...perform action, display results (or next form)...
    else:
        ...display error message (maybe repeating form)...

FieldStorage details
• Behaves like a dictionary:
  – .keys(), .has_key() # but not others!
  – dictionary-like object ("mapping")
• Items
  – values are MiniFieldStorage instances
    • .value gives field value!
  – if multiple values: list of MiniFieldStorage instances
    • if type(...) == types.ListType: ...
  – may also be FieldStorage instances
    • used for file upload (test .file attribute)

Other CGI niceties
• cgi.escape(s)
  – translate "<", "&", ">" to "&lt;", "&amp;", "&gt;"
• cgi.parse_qs(string, keep_blank_values=0)
  – parse query string to dictionary {"foo": ["bar"], ...}
• cgi.parse([file], ...)
  – ditto, takes query string from default locations
• urllib.quote(s), urllib.unquote(s)
  – convert between "~" and "%7e" (etc.)
• urllib.urlencode(dict)
  – convert dictionary {"foo": "bar", ...} to query string "foo=bar&..." # note asymmetry with parse_qs() above

Dealing with bugs
• Things go wrong, you get a traceback...
• By default, tracebacks usually go to the server's error_log file...
• Printing a traceback to stdout is tricky
  – could happen before "Content-type" is printed
  – could happen in the middle of HTML markup
  – could contain markup itself
• What's needed is a...

Debugging framework
    import cgi

    def main():
        print "Content-type: text/html\n"   # Do this first
        try:
            import worker       # module that does the real work
        except:
            print "<!-- --><hr><h1>Oops. An error occurred.</h1>"
            cgi.print_exception()   # Prints traceback, safely

    main()

Security notes
• Watch out when passing fields to the shell
  – e.g. os.popen("finger %s" % form["user"].value)
  – what if the value is "; cat /etc/passwd" ...
• Solutions:
  – Quote:
    • user = pipes.quote(form["user"].value)
  – Refuse:
    • if not re.match(r"^\w+$", user): ...error...
  – Sanitize:
    • user = re.sub(r"\W", "", form["user"].value)

Using persistent data
• Store/update data:
  – In plain files (simplest)
    • FAQ wizard uses this
  – In a (g)dbm file (better performance)
    • string keys, string values
  – In a "shelf" (stores objects)
    • avoids parsing/unparsing the values
  – In a real database (if you must)
    • 3rd party database extensions available
    • not my field of expertise

Plain files
    key = ...username, or session key, or whatever...
    try:
        f = open(key, "r")
        data = f.read()         # read previous data
        f.close()
    except IOError:
        data = ""               # no file yet: provide initial data
    data = update(data, form)   # do whatever must be done
    f = open(key, "w")
    f.write(data)               # write new data
    f.close()
    # (could delete the file instead if updated data is empty)

(G)DBM files
    # better performance if there are many records
    import gdbm
    key = ...username, or session key, or whatever...
    db = gdbm.open("DATABASE", "w")     # open for reading+writing
    if db.has_key(key):
        data = db[key]          # read previous data
    else:
        data = ""               # provide initial data
    data = update(data, form)
    db[key] = data              # write new data
    db.close()

Shelves
    # a shelf is a (g)dbm file that stores pickled Python objects
    import shelve
    class UserData: ...
    key = ...username, or session key, or whatever...
    db = shelve.open("DATABASE", "w")   # open for reading+writing
    if db.has_key(key):
        data = db[key]          # an object!
    else:
        data = UserData(key)    # create a new instance
    data.update(form)
    db[key] = data
    db.close()

Locking
• (G)DBM files and shelves are not protected against concurrent updates!
• Multiple readers, single writer usually OK
  – simplest approach: only lock when writing
• Good filesystem-based locking is hard
  – no cross-platform solutions
  – unpleasant facts of life:
    • processes sometimes die without unlocking
    • processes sometimes take longer than expected
    • NFS semantics

A simple lock solution
    import os, time

    class Lock:

        def __init__(self, filename):
            self.filename = filename
            self.locked = 0

        def lock(self):
            assert not self.locked
            while 1:
                try:
                    os.mkdir(self.filename)
                    self.locked = 1
                    return          # or break
                except os.error, err:
                    time.sleep(1)

        def unlock(self):
            assert self.locked
            self.locked = 0
            os.rmdir(self.filename)

        # auto-unlock when lock object is deleted
        def __del__(self):
            if self.locked:
                self.unlock()

    # for a big production with timeouts,
    # see the Mailman source code (LockFile.py);
    # it works on all Unixes and supports NFS;
    # but not on Windows,
    # and the code is very complex...

Sessions
• How to correlate requests from same user?
  – Assign session key on first contact
  – Incorporate session key in form or in URL
  – In form: use hidden input field:
    • <input type="hidden" name="session" value="1f9a2">
  – In URL:
    • http://myhost.com/cgi-bin/myprog.py/1f9a2
    • passed in environment (os.environ[...]):
      – PATH_INFO=/1f9a2
      – PATH_TRANSLATED=<rootdir>/1f9a2

Cookies
• How to correlate sessions from the same user?
  – Store "cookie" in browser
    • controversial, but useful
  – Module: Cookie.py (Tim O'Malley)
    • writes "Set-Cookie" headers
    • parses HTTP_COOKIE environment variable
  – Note: using cookies affects our debug framework
    • cookies must be printed as part of HTTP headers
    • cheapest solution:
      – move printing of blank line into worker module
      – (and into exception handler of debug framework)

Cookie example
    import os, cgi, Cookie

    c = Cookie.Cookie()
    try:
        c.load(os.environ["HTTP_COOKIE"])
    except KeyError:
        pass
    form = cgi.FieldStorage()
    try:
        user = form["user"].value
    except KeyError:
        try:
            user = c["user"].value
        except KeyError:
            user = "nobody"
    c["user"] = user
    print c
    print """
    <form action="/cgi-bin/test.py" method="get">
    <input type="text" name="user" value="%s">
    </form>
    """ % cgi.escape(user)
    # debug: show the cookie header we wrote
    print "<pre>"
    print cgi.escape(str(c))
    print "</pre>"

File upload example
    import cgi
    form = cgi.FieldStorage()
    if not form:
        print """
        <form action="/cgi-bin/test.py" method="POST" enctype="multipart/form-data">
        <input type="file" name="filename">
        <input type="submit">
        </form>
        """
    elif form.has_key("filename"):
        item = form["filename"]
        if item.file:
            data = item.file.read()     # read contents of file
            print cgi.escape(data)      # rather dumb action

Generating HTML
• HTMLgen (Robin Friedrich)
  http://starship.python.net/crew/friedrich/HTMLgen/html/main.html
    >>> print H(1, "Chapter One")
    <H1>Chapter One</H1>
    >>> print A("http://www.python.org/", "Home page")
    <A HREF="http://www.python.org/">Home page</A>
    >>> # etc. (tables, forms, the works)
• HTMLcreate (Laurence Tratt)
  http://www.spods.dcs.kcl.ac.uk/~laurie/comp/python/htmlcreate/
  • not accessible at this time

CGI performance
• What causes slow response?
  – One process per CGI invocation
    • process creation (fork+exec)
    • Python interpreter startup time
    • importing library modules (somewhat fixable)
  – Connecting to a database!
    • this can be the killer if you use a real database
  – Your code?
    • probably not the bottleneck!

Avoiding fork()
• Python in Apache (mod_pyapache)
  • problems: stability; internal design
  • advantage: CGI compatible
  • may work if CGI scripts are simple and trusted
  • doesn't avoid database connection delay
• Use Python as webserver
  • slow for static content (use different port)
  • advantage: total control; session state is easy
• FastCGI, HTTPDAPI etc.
• ZOPE

ZOPE
• Z Object Publishing Environment
  – http://www.zope.org
  – complete dynamic website management tool
    • written in cross-platform Python; Open Source
  – http://host/path/to/object?size=5&type=spam
    • calls path.to.object(size=5, type="spam")
  – DTML: templatized HTML (embedded Python code)
  – ZODB (Z Object DataBase; stores Python objects)
    • transactions, selective undo, etc.
  – etc., etc.
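
Sketch: mapping a URL to a call
The ZOPE slide describes a URL such as http://host/path/to/object?size=5&type=spam being turned into the call path.to.object(size=5, type="spam"). Below is a minimal sketch of that idea, written in present-day Python 3 rather than the Python 1.5 used elsewhere in these slides. It is an illustration only and is not Zope's actual implementation: publish(), Namespace, and describe() are hypothetical names, and query values are passed through as strings without type conversion.

```python
from urllib.parse import urlsplit, parse_qs


def publish(root, url):
    """Hypothetical helper: walk the URL path over attributes of `root`, then call the result."""
    parts = urlsplit(url)
    target = root
    for name in parts.path.strip("/").split("/"):
        # Refuse empty and underscore-prefixed names so internals cannot be reached by URL.
        if not name or name.startswith("_"):
            raise LookupError("refusing to traverse %r" % name)
        target = getattr(target, name)
    # parse_qs maps each field to a list of strings; keep the first value of each field.
    kwargs = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return target(**kwargs)


class Namespace:
    """Hypothetical empty container, used only to build a small object tree."""


def describe(size, type):
    # Stand-in for a published object; it only reports the arguments it received.
    return "size=%s type=%s" % (size, type)


root = Namespace()
root.path = Namespace()
root.path.to = Namespace()
root.path.to.object = describe

print(publish(root, "http://host/path/to/object?size=5&type=spam"))
```

The underscore check is one possible design choice for keeping attribute traversal away from private names; a real publisher needs a much more careful access policy.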
Case study

FAQ wizard
• Tools/faqwiz/faqwiz.py in Python distribution
• http://www.python.org/cgi-bin/faqw.py

faqw.py - bootstrap
    import os, sys
    try:
        FAQDIR = "/usr/people/guido/python/FAQ"
        SRCDIR = "/usr/people/guido/python/src/Tools/faqwiz"
        os.chdir(FAQDIR)
        sys.path.insert(0, SRCDIR)
        import faqwiz
    except SystemExit, n:
        sys.exit(n)
    except:
        t, v, tb = sys.exc_type, sys.exc_value, sys.exc_traceback
        print
        import cgi
        cgi.print_exception(t, v, tb)

faqwiz.py - main code
    class FaqWizard:

        def __init__(self):
            self.ui = UserInput()
            self.dir = FaqDir()

        def go(self):
            print 'Content-type: text/html'
            req = self.ui.req or 'home'
            mname = 'do_%s' % req
            try:
                meth = getattr(self, mname)
            except AttributeError:
                self.error("Bad request type %s." % `req`)
            else:
                try:
                    meth()
                except InvalidFile, exc:
                    self.error("Invalid entry file name %s" % exc.file)
                except NoSuchFile, exc:
                    self.error("No entry with file name %s" % exc.file)
                except NoSuchSection, exc:
                    self.error("No section number %s" % exc.section)
            self.epilogue()

        def do_home(self):
            self.prologue(T_HOME)
            emit(HOME)

        def do_search(self): ...
        def do_index(self): ...
        def do_roulette(self): ...
        def do_show(self): ...
        def do_edit(self): ...
        def do_review(self): ...
        def do_help(self): ...
        ...etc...

Example: do_roulette()
    def do_roulette(self):
        import random
        files = self.dir.list()
        if not files:
            self.error("No entries.")
            return
        file = random.choice(files)
        self.prologue(T_ROULETTE)
        emit(ROULETTE)
        self.dir.show(file)

Persistency
• All data stored in files (faqNN.MMM.htp)
• Backed up by RCS files (RCS/faqNN.MMM.htp,v)
  – RCS logs and diffs viewable
• RCS commands invoked with os.system() or os.popen()
• search implemented by opening and reading each file
• NO LOCKING!
  – infrequent updates expected
    • in practice, one person makes most updates :-)
  – one historic case of two users adding an entry to the same section at the same time; one got an error back
  – not generally recommended

faqconf.py, faqcust.py
• faqconf.py defines named string constants for every bit of output generated by faqwiz.py
  – designed for customization (e.g. i18n)
  – so you can customize your own faq wizard
  – e.g. OWNEREMAIL = "[email protected]"
  – this includes the list of sections in your faq :-(
• faqcust.py defines overrides for faqconf.py
  – so you don't need to edit faqwiz.py
    • to make it easier to upgrade to newer faqwiz version

Webchecker
• Tools/webchecker/webchecker.py in Python distribution
• Not a CGI application but a web client application
  – while still pages to do:
    • request page via http
    • parse html, collecting links
  – pages once requested won't be requested again
  – links outside original tree treated as leaves
    • existence checked but links not followed
  – reports on bad links
    • what the bad URL is
    • on which page(s) it is referenced
  – could extend for other reporting

Reference URLs
• Python websites
  – http://www.python.org (official site)
  – http://starship.python.net (community)
• Python web programming topic guide
  – http://www.python.org/topics/web/
• These slides on the web (soon)
  – http://www.python.org/doc/essays/ppt/sd99east.ppt

Reference books
• http://www.python.org/psa/bookstore/
• 1996
  – Programming Python (Lutz)
  – [Internet Programming with Python (Watters e.a.)]
• 1998
  – Python Pocket Reference (Lutz)
• 1999
  – Learning Python (Lutz, Ascher)
  – Python: Essential Reference (Beazley)
  – Quick Python Book (Harms, McDonald)
• Expected 1999/2000
  – Win 32, Tkinter, teach-yourself-in-24-hrs, annotated archives, ...

Any questions?

Nobody expects the Spanish Inquisition!
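
Sketch: collecting links, webchecker style
The Webchecker slide describes parsing each page, collecting its links, and treating links outside the original tree as leaves. The following is a rough sketch of just that link-collecting step, in present-day Python 3. It is not the code of Tools/webchecker/webchecker.py: LinkCollector and split_links() are hypothetical names, no HTTP requests are made, and "inside the tree" is assumed to mean a simple URL prefix match.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical parser that records the absolute target of every <a href=...>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def split_links(base_url, html_text, root):
    """Return (links inside the tree rooted at `root`, links outside it)."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    internal = [url for url in collector.links if url.startswith(root)]
    external = [url for url in collector.links if not url.startswith(root)]
    return internal, external


page = '<a href="doc/">Docs</a> <a href="http://example.org/">Elsewhere</a>'
internal, external = split_links("http://www.python.org/", page, "http://www.python.org/")
print(internal)  # links that a checker would go on to follow
print(external)  # links that would only be checked for existence
```

A full checker would also keep a set of pages already requested, so that each page is fetched only once, as the slide notes.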