Intro to Python and scripting

Python

Scripting languages
Scripting languages "are computer programming languages that are typically interpreted and can be typed directly from the keyboard. Thus scripts are often distinguished from programs, because programs are converted permanently into binary executable files (i.e., zeros and ones) before they are run." (Wikipedia)

Scripting languages I've used
bash, csh, ksh, zsh, tcsh, sh, ARexx, AppleScript, cmd.exe, COMMAND.COM, Automator, ActionScript, Emacs Lisp, VBScript, PHP, JavaScript, AWK, Perl, sed, Lisp, Ruby, Scheme, Tcl
(1/4 of what is available)

Advantages of scripting languages
• Less overhead to do simple tasks
• Some things are easier to do in a scripting language than in compiled code
• You can provide additional functionality to existing code

Uses
    echo "hello world"
    print STDERR "hello world\n"
    print "hello world"

Uses
• Parsing a log file
• Running numerous jobs with parameter variations
• Inversion using files

Why Python
• Unlike Perl, usually one way to do things
• Easier to understand someone else's code
• Has a strong numerics community (some of the features of MATLAB)
• Help functionality
• Object oriented

Hello world
    #!/usr/bin/env python
    print "Hello world"

Import
    #!/usr/bin/env python
    import sys
    print sys.argv[1]

    ./myprog.py "Hello world"
    Hello world
• An extensive list of modules adds additional functionality
• You use the import command to access the module content

Import
• You can use the pydoc command to print out information about the module
• Close to automatic ... later

    pydoc sys
    Help on built-in module sys:
    NAME
    sys
    FILE
    (built-in)
    DESCRIPTION
    This module provides access to some objects used or maintained by the interpreter and to functions that interact strongly with the interpreter.
    Dynamic objects:
    argv -- command line arguments; argv[0] is the script pathname if known

Variables
• No need to declare
• Need to assign (initialize)
  – use of uninitialized variable raises exception
• Not typed
    if friendly: greeting = "hello world"
    else: greeting = 12**2
    print greeting
• Everything is a "variable":
  – Even functions, classes, modules
Slide ©2001, 2002 Guido van Rossum

Grouping indentation
    #!/usr/bin/env python
    import sys
    n = int(sys.argv[1])    # the number to test is the first argument
    if n%10==0:
        print "I am divisible by 10"
    elif n%5==0:
        print "I am divisible by 5"
    else:
        print "Not a multiple of 5"
How blocks are grouped:
• C: {}
• Fortran: "do/end do"
• Python: indentation

Control Structures
    if condition:
        statements
    [elif condition:
        statements] ...
    else:
        statements

    while condition:
        statements

    for var in sequence:
        statements

    break
    continue

Strings
• "hello"+"world"      "helloworld"        # concatenation
• "hello"*3            "hellohellohello"   # repetition
• "hello"[0]           "h"                 # indexing
• "hello"[-1]          "o"                 # (from end)
• "hello"[1:4]         "ell"               # slicing
• len("hello")         5                   # size
• "hello" < "jello"    1                   # comparison
• "e" in "hello"       1                   # search
• "escapes: \n etc, \033 etc, \if etc"
• 'single quotes' """triple quotes""" r"raw strings"

Lists
• Flexible arrays, not Lisp-like linked lists
  – a = [99, "bottles of beer", ["on", "the", "wall"]]
• Same operators as for strings
  – a+b, a*3, a[0], a[-1], a[1:], len(a)
• Item and slice assignment
  – a[0] = 98
  – a[1:2] = ["bottles", "of", "beer"]
    -> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
  – del a[-1]    # -> [98, "bottles", "of", "beer"]

More List Operations
    >>> a = range(5)        # [0,1,2,3,4]
    >>> a.append(5)         # [0,1,2,3,4,5]
    >>> a.pop()             # [0,1,2,3,4]
    5
    >>> a.insert(0, 42)     # [42,0,1,2,3,4]
    >>> a.pop(0)            # [0,1,2,3,4]
    42
    >>> a.reverse()         # [4,3,2,1,0]
    >>> a.sort()            # [0,1,2,3,4]

Dictionaries
• Hash tables, "associative arrays"
  – d = {"duck": "eend", "water": "water"}
• Lookup:
  – d["duck"] -> "eend"
  – d["back"]    # raises KeyError exception
• Delete, insert, overwrite:
  – del d["water"]      # {"duck": "eend"}
  – d["back"] = "rug"   # {"duck": "eend", "back": "rug"}
  – d["duck"] = "duik"  # {"duck": "duik", "back": "rug"}

More Dictionary Ops
• Keys, values, items:
  – d.keys() -> ["duck", "back"]
  – d.values() -> ["duik", "rug"]
  – d.items() -> [("duck","duik"), ("back","rug")]
• Presence check:
  – d.has_key("duck") -> 1; d.has_key("spam") -> 0
• Values of any type; keys almost any
  – {"name":"Guido", "age":43, ("hello","world"):1, 42:"yes", "flag": ["red","white","blue"]}

Dictionary Details
• Keys must be immutable:
  – numbers, strings, tuples of immutables
    • these cannot be changed after creation
  – reason is hashing (fast lookup technique)
  – not lists or other dictionaries
    • these types of objects can be changed "in place"
  – no restrictions on values
• Keys will be listed in arbitrary order
  – again, because of hashing

Functions, Procedures
    def name(arg1, arg2, ...):
        """documentation"""    # optional doc string
        statements

    return                 # from procedure
    return expression      # from function

Example Function
    def gcd(a, b):
        "greatest common divisor"
        while a != 0:
            a, b = b%a, a    # parallel assignment
        return b

    >>> gcd.__doc__
    'greatest common divisor'
    >>> gcd(12, 20)
    4

Classes
    class name:
        "documentation"
        statements
    -or-
    class name(base1, base2, ...):
        ...
Most statements are method definitions:
    def name(self, arg1, arg2, ...):
        ...
May also be class variable assignments

Example Class
    class Stack:
        "A well-known data structure..."
        def __init__(self):              # constructor
            self.items = []
        def push(self, x):
            self.items.append(x)         # the sky is the limit
        def pop(self):
            x = self.items[-1]           # what happens if it's empty?
            del self.items[-1]
            return x
        def empty(self):
            return len(self.items) == 0  # Boolean result

Using Classes
• To create an instance, simply call the class object:
    x = Stack()        # no 'new' operator!
• To use methods of the instance, call using dot notation:
    x.empty()          # -> 1
    x.push(1)          # [1]
    x.empty()          # -> 0
    x.push("hello")    # [1, "hello"]
    x.pop()            # -> "hello"    # [1]
• To inspect instance variables, use dot notation:
    x.items            # -> [1]

Subclassing
    class FancyStack(Stack):
        "stack with added ability to inspect inferior stack items"
        def peek(self, n):
            "peek(0) returns top; peek(1) returns item below that; etc."
            size = len(self.items)
            assert 0 <= n < size             # test precondition
            return self.items[size-1-n]

Subclassing (2)
    class LimitedStack(FancyStack):
        "fancy stack with limit on stack size"
        def __init__(self, limit):
            self.limit = limit
            FancyStack.__init__(self)        # base class constructor
        def push(self, x):
            assert len(self.items) < self.limit
            FancyStack.push(self, x)         # "super" method call

Class / Instance Variables
    class Connection:
        verbose = 0                      # class variable
        def __init__(self, host):
            self.host = host             # instance variable
        def debug(self, v):
            self.verbose = v             # make instance variable!
        def connect(self):
            if self.verbose:             # class or instance variable?
                print "connecting to", self.host

Modules
• Collection of stuff in foo.py file
  – functions, classes, variables
• Importing modules:
  – import re; print re.match("[a-z]+", s)
  – from re import match; print match("[a-z]+", s)
• Import with rename:
  – import re as regex
  – from re import match as m
  – Before Python 2.0:
    • import re; regex = re; del re

Packages
• Collection of modules in directory
• Must have __init__.py file
• May contain subpackages
• Import syntax:
  – from P.Q.M import foo; print foo()
  – from P.Q import M; print M.foo()
  – import P.Q.M; print P.Q.M.foo()
  – import P.Q.M as M; print M.foo()    # new

Catching Exceptions
    def foo(x):
        return 1/x

    def bar(x):
        try:
            print foo(x)
        except ZeroDivisionError, message:
            print "Can't divide by zero:", message

    bar(0)

Try-finally: Cleanup
    f = open(file)
    try:
        process_file(f)
    finally:
        f.close()    # always executed
    print "OK"       # executed on success only

File Objects
• f = open(filename[, mode[, buffersize]])
  – mode can be "r", "w", "a" (like C stdio); default "r"
  – append "b" for binary mode (no text translation)
  – append "+" for read/write open
  – buffersize: 0=unbuffered; 1=line-buffered; larger values=buffered
• methods:
  – read([nbytes]), readline(), readlines()
  – write(string), writelines(list)
  – seek(pos[, how]), tell()
  – flush(), close()
  – fileno()

Example: Submitting and monitoring jobs
• Run a series of jobs where a single input parameter changes
• Report how many jobs are running and when jobs finish

Parse command line arguments
    #!/usr/bin/env python
    import commands
    import sys
    import time

    def parse_command_line():
        args={}
        if len(sys.argv) !=8:
            print "run_monitor directory queue const_command_line val first number delta"
            sys.exit(1)    # stop here: the arguments read below would be missing
        args["dir"]=sys.argv[1]
        args["queue"]=sys.argv[2]
        args["const"]=sys.argv[3]
        args["val"] =sys.argv[4]
        args["first"]=sys.argv[5]
        args["number"]=sys.argv[6]
        args["delta"]=sys.argv[7]
        return args

What each part of the script above does:
• import commands: run programs
• import sys: get environmental info
• import time: time functions
• def parse_command_line(): define subroutine
• args={}: define a dictionary
• if len(sys.argv) !=8: check the number of command line arguments

Subroutine to create job list
    def create_job_list(args):
        coms={}                                    # dictionary: value -> command
        val=float(args["first"])
        for i in range(int(args["number"])):
            var="%s%f"%(args["val"],val)
            coms[var]="%s %s"%(args["const"],var)
            val+=float(args["delta"])
        return coms

    # Main program
    args=parse_command_line()
    job_list=create_job_list(args)

Run the job
Run a job on the cluster
    def run_job(args,key,val):
        # PBS options
        lines=["#!/bin/tcsh"
              ,"#PBS -q %s"%args["queue"]
              ,"#PBS -V"
              ,"#PBS -N test"
              ,"#PBS -e %s/test.%s.err"%(args["dir"],key)
              ,"#PBS -o %s/test.%s.out"%(args["dir"],key)
              ,"cd %s"%args["dir"],val]
        f=open("run.sh","w")
        for line in lines:
            f.write("%s\n"%line)
        f.close()
        stat,out=commands.getstatusoutput("qsub run.sh")
        time.sleep(.5)
        job_id=out.split(".")[0]                   # get and return job id
        return job_id

Check the status
Check which jobs are running
    def print_status(ids):
        while len(ids)!=0:
            for id in ids[:]:                      # loop over a copy: ids shrinks below
                stat,out=commands.getstatusoutput("qstat %s"%id)    # check job status
                if out.find("Unknown Job")>-1:     # is the job in queue?
                    print "%s is finished"%id
                    ids.remove(id)
            print "%d jobs remaining"%len(ids)
            time.sleep(60)                         # check every minute

    # Main program
    args=parse_command_line()
    job_list=create_job_list(args)
    ids=[]
    for val,com in job_list.items():
        ids.append(run_job(args,val,com))
    print_status(ids)

Example: Parsing a log file
• Read a file that tells who accessed various web pages
• Find the pages that contain a given search string
• Print out each accessed page that contains the search string, sorted by date
• import, string manipulation, dictionaries, classes

Reading the file
    #!/usr/bin/env python
    file=open("access_log")
    for line in file.xreadlines():
        print line
    file.close()

The ugly
    #!/usr/bin/env python
    import re
    decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] "\S+\s+(\S+).+".+(http:\S+)')
    file=open("access_log")
    for line in file.xreadlines():
        print line
    file.close()

A line of access_log:
    221.221.144.28 - - [11/Apr/2007:23:02:54 -0700] "GET /gifs//home.gif HTTP/1.1" 304 - "http://sepwww.stanford.edu/research/reports/old_reports.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"

Look for the string
    #!/usr/bin/env python
    import re
    decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] "\S+\s+(\S+).+".+(http:\S+)')
    file=open("access_log")
    for line in file.xreadlines():
        if decipher.search(line):
            print line
    file.close()

Create a class to store access
    #!/usr/bin/env python
    import re
    class page_access:
        """A class to record page accesses"""
        def __init__(self,page,host,date,ref):
            """Initialize a page access"""
            self.host=host; self.date=date; self.page=page; self.ref=ref

    decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '+
                        r'"\S+\s+(\S+).+".+(http:\S+)')
    file=open("access_log")
    for line in file.xreadlines():
        res=decipher.search(line)
        if res:
            host=res.group(1); date=res.group(2)
            page=res.group(3); ref=res.group(4)
            print "host=%s date=%s page=%s ref=%s"%(host,date,page,ref)
    file.close()

Store all the page accesses
    #!/usr/bin/env python
    import re
    class page_access:
        """A class to record page accesses"""
        def __init__(self,page,host,date,ref):
            """Initialize a page access"""
            self.host=host; self.date=date; self.page=page; self.ref=ref

    decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '+
                        r'"\S+\s+(\S+).+".+(http:\S+)')
    file=open("access_log")
    accesses=[]
    for line in file.xreadlines():
        res=decipher.search(line)
        if res:
            host=res.group(1); date=res.group(2)
            page=res.group(3); ref=res.group(4)
            accesses.append(page_access(page,host,date,ref))
    file.close()

Add ability to print access info
    #!/usr/bin/env python
    import re
    class page_access:
        """A class to record page accesses"""
        def __init__(self,page,host,date,ref):
            """Initialize a page access"""
            self.host=host; self.date=date; self.page=page; self.ref=ref
        def print_info(self):    # "print" is a reserved word in Python 2
            print "file=%s host=%s date=%s ref=%s\n"%(
                self.page,self.host,self.date,self.ref)

    decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '+
                        r'"\S+\s+(\S+).+".+(http:\S+)')
    file=open("access_log")
    accesses=[]
    for line in file.xreadlines():
        res=decipher.search(line)
        if res:
            host=res.group(1); date=res.group(2)
            page=res.group(3); ref=res.group(4)
            accesses.append(page_access(page,host,date,ref))
    file.close()
    for acc in accesses:
        acc.print_info()

Break parts into functions
    #!/usr/bin/env python
    import re
    class page_access:
        """A class to record page accesses"""
        def __init__(self,page,host,date,ref):
            """Initialize a page access"""
            self.host=host; self.date=date; self.page=page; self.ref=ref
        def print_info(self):
            print "file=%s host=%s date=%s ref=%s\n"%(
                self.page,self.host,self.date,self.ref)

    def print_accesses(accesses):
        for acc in accesses:
            acc.print_info()

    def read_file(name):
        decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '+
                            r'"\S+\s+(\S+).+".+(http:\S+)')
        file=open(name)
        accesses=[]
        for line in file.xreadlines():
            res=decipher.search(line)
            if res:
                host=res.group(1); date=res.group(2)
                page=res.group(3); ref=res.group(4)
                accesses.append(page_access(page,host,date,ref))
        file.close()
        return accesses

    accesses=read_file("access_log")
    print_accesses(accesses)

Store by page name
    #!/usr/bin/env python
    import re
    class page_access:
        """A class to record page accesses"""
        def __init__(self,host,date,ref):
            """Initialize a page access"""
            self.host=host; self.date=date; self.ref=ref
        def info(self):
            return "host=%s date=%s ref=%s"%(self.host,self.date,self.ref)

    def print_accesses(links):
        for page, access in links.items():
            print "Page:%s\n %s\n"%(page,access.info())

    def read_file(name):
        decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '+
                            r'"\S+\s+(\S+).+".+(http:\S+)')
        file=open(name)
        links={}
        for line in file.xreadlines():
            res=decipher.search(line)
            if res:
                host=res.group(1); date=res.group(2)
                page=res.group(3); ref=res.group(4)
                links[page]=page_access(host,date,ref)
        file.close()
        return links

    accesses=read_file("access_log")
    print_accesses(accesses)

Handle multiple references per page
    #!/usr/bin/env python
    import re
    class page_access:
        """A class to record page accesses"""
        def __init__(self,host,date,ref):
            """Initialize a page access"""
            self.host=host
            self.date=date; self.ref=ref
        def info(self):    # "print" is a reserved word in Python 2
            return "host=%s date=%s ref=%s"%(self.host,self.date,self.ref)

    def print_accesses(links):
        for page, accesses in links.items():
            print "Page:%s\n"%page
            for acc in accesses:
                print "%s\n"%acc.info()

    def read_file(name):
        decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '+
                            r'"\S+\s+(\S+).+".+(http:\S+)')
        file=open(name)
        links={}
        for line in file.xreadlines():
            res=decipher.search(line)
            if res:
                host=res.group(1); date=res.group(2)
                page=res.group(3); ref=res.group(4)
                if not links.has_key(page):
                    links[page]=[]
                links[page].append(page_access(host,date,ref))
        file.close()
        return links

    accesses=read_file("access_log")
    print_accesses(accesses)

Find a given search term
    #!/usr/bin/env python
    import re
    import sys
    class page_access:
        """A class to record page accesses"""
        def __init__(self,host,date,ref):
            """Initialize a page access"""
            self.host=host; self.date=date; self.ref=ref
        def info(self):
            return "host=%s date=%s ref=%s"%(self.host,self.date,self.ref)

    def print_accesses(links,grep):
        grepre=re.compile(grep)
        for page, accesses in links.items():
            if grepre.search(page):
                print "Page:%s\n"%page
                for acc in accesses:
                    print "%s\n"%acc.info()

    def read_file(name):
        decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '+
                            r'"\S+\s+(\S+).+".+(http:\S+)')
        file=open(name)
        links={}
        for line in file.xreadlines():
            res=decipher.search(line)
            if res:
                host=res.group(1); date=res.group(2)
                page=res.group(3); ref=res.group(4)
                if not links.has_key(page):
                    links[page]=[]
                links[page].append(page_access(host,date,ref))
        file.close()
        return links

    accesses=read_file("access_log")
    print_accesses(accesses,sys.argv[1])

URLs
• http://www.python.org
  – official site
• http://starship.python.net
  – Community
• http://www.python.org/psa/bookstore/
  – (alias for http://www.amk.ca/bookstore/)
  – Python Bookstore
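
The log-parsing scripts in these slides are written for Python 2 (print statements, has_key, xreadlines). For readers on Python 3, the final "Find a given search term" step might look roughly like the sketch below. It is only an illustration under a few assumptions: the names PageAccess, read_lines, matching_pages and SAMPLE are hypothetical and not from the slides, the log layout is taken from the single sample line shown earlier, and the last regex group has been narrowed so the referrer stops before the closing quote.

```python
import re
from collections import defaultdict

# Pattern from the slides, with one change (an assumption about intent):
# the last group stops at a quote or whitespace, so the referrer does not
# include the closing '"' that (http:\S+) would capture.
DECIPHER = re.compile(
    r'^\s*(\S+) - - \[(.+)\] "\S+\s+(\S+).+".+(http:[^"\s]+)'
)


class PageAccess:
    """One access to a page: requesting host, date string, and referrer."""

    def __init__(self, host, date, ref):
        self.host = host
        self.date = date
        self.ref = ref

    def __str__(self):
        return "host=%s date=%s ref=%s" % (self.host, self.date, self.ref)


def read_lines(lines):
    """Group accesses by page name; `lines` is any iterable of log lines."""
    links = defaultdict(list)
    for line in lines:
        res = DECIPHER.search(line)
        if res:  # lines that do not look like the sample are skipped
            host, date, page, ref = res.groups()
            links[page].append(PageAccess(host, date, ref))
    return links


def matching_pages(links, grep):
    """Return {page: accesses} for pages whose name matches the regex `grep`."""
    grep_re = re.compile(grep)
    return {page: accs for page, accs in links.items() if grep_re.search(page)}


# The sample log line from the slides, rejoined onto one logical line.
SAMPLE = (
    '221.221.144.28 - - [11/Apr/2007:23:02:54 -0700] '
    '"GET /gifs//home.gif HTTP/1.1" 304 - '
    '"http://sepwww.stanford.edu/research/reports/old_reports.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"'
)

if __name__ == "__main__":
    links = read_lines([SAMPLE])
    for page, accesses in matching_pages(links, "home").items():
        print("Page:%s" % page)
        for acc in accesses:
            print("  %s" % acc)
```

To run it against a real file, one could pass an open file object instead of the sample list, for example read_lines(open("access_log")), since a file iterates line by line. Using defaultdict(list) replaces the has_key check from the slides, and __str__ replaces the method the slides call print.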
