Intro to Python and scripting
Python
Scripting languages
“are computer programming languages that are typically
interpreted and can be typed directly from the keyboard.
Thus scripts are often distinguished from programs, because
programs are converted permanently into binary executable
files (i.e., zeros and ones) before they are run.”
Wikipedia
Scripting languages I’ve used
bash, csh, ksh, zsh, tcsh, sh, ARexx, AppleScript, cmd.exe,
COMMAND.COM, Automator, ActionScript, Emacs Lisp,
VBScript, PHP, JavaScript, AWK, Perl, sed, Lisp, Ruby,
Scheme, Tcl
(about 1/4 of what is available)
Advantages of scripting languages
• Less overhead to do simple tasks
• Some things are easier to do in a scripting language than in compiled code
• You can provide additional functionality to existing code
Uses
echo "hello world"               (shell)
print STDERR "hello world\n"     (Perl)
print "hello world"              (Python)
Uses
• Parsing a log file
• Running numerous jobs with parameter variations
• Inversion using files
Why Python
• Unlike Perl, there is usually one way to do things
• Easier to understand someone else’s code
• Has a strong numerics community (providing some of the features of MATLAB)
• Built-in help functionality
• Object-oriented
Hello world
#!/usr/bin/env python
print "Hello world"
Import
#!/usr/bin/env python
import sys
print sys.argv[1]
./myprog.py "Hello world"
Hello world
• An extensive set of modules adds additional functionality
• You use the import command to access the module contents
Import
pydoc sys
Help on built-in module sys:
NAME
    sys
FILE
    (built-in)
DESCRIPTION
    This module provides access to some objects used or
    maintained by the interpreter and to functions that
    interact strongly with the interpreter.
    Dynamic objects:
    argv -- command line arguments; argv[0] is the script pathname if known
• You can use the pydoc command to print out information about the module
• Close to automatic ... later
Variables
• No need to declare
• Need to assign (initialize)
  • use of an uninitialized variable raises an exception (NameError)
• Not typed (the same name can refer to objects of different types)
  if friendly: greeting = "hello world"
  else: greeting = 12**2
  print greeting
• Everything is a "variable":
  • Even functions, classes, modules
Slide
©2001, 2002 Guido van Rossum
Grouping indentation
#!/usr/bin/env python
import sys
num = int(sys.argv[1])      # argv[0] is the script name; the number is argv[1]
if num % 10 == 0:
    print "I am divisible by 10"
elif num % 5 == 0:
    print "I am divisible by 5"
else:
    print "Not a multiple of 5"
• C: braces { }
• Fortran: do / end do
• Python: indentation
Control Structures
if condition:
    statements
[elif condition:
    statements] ...
else:
    statements
while condition:
    statements
for var in sequence:
    statements
break
continue
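A small runnable sketch that exercises each construct on the slide. It is written for Python 3 (print() is a function there), and the function names classify, first_even_multiple_of_5 and countdown are hypothetical, chosen only for illustration.

```python
def classify(n):
    """Label n using if/elif/else (same tests as the indentation slide)."""
    if n % 10 == 0:
        return "divisible by 10"
    elif n % 5 == 0:
        return "divisible by 5"
    else:
        return "not a multiple of 5"


def first_even_multiple_of_5(numbers):
    """Return the first even multiple of 5 in numbers, or None."""
    found = None
    for n in numbers:
        if n % 2 == 1:
            continue          # skip odd numbers and go on to the next item
        if n % 5 == 0:
            found = n
            break             # leave the loop at the first match
    return found


def countdown(n):
    """Collect n, n-1, ..., 1 with a while loop."""
    result = []
    while n > 0:
        result.append(n)
        n -= 1
    return result


print(classify(30), classify(25), classify(7), sep=" / ")
print(first_even_multiple_of_5([3, 7, 15, 20, 40]))
print(countdown(3))
```

Note that 15 is skipped by continue before the multiple-of-5 test runs, so the loop stops at 20.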
Strings
• "hello"+"world"     -> "helloworld"       # concatenation
• "hello"*3           -> "hellohellohello"  # repetition
• "hello"[0]          -> "h"                # indexing
• "hello"[-1]         -> "o"                # (from end)
• "hello"[1:4]        -> "ell"              # slicing
• len("hello")        -> 5                  # size
• "hello" < "jello"   -> True               # comparison (1 before Python 2.3)
• "e" in "hello"      -> True               # search (1 before Python 2.3)
• "escapes: \n etc, \033 etc, \if etc"
• 'single quotes' """triple quotes""" r"raw strings"
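The string operations above can be checked directly in a Python 3 interpreter; this sketch also shows how a raw string keeps a backslash sequence literally (the path-like string is just an example).

```python
s = "hello"

print(s + "world")     # concatenation
print(s * 3)           # repetition
print(s[0], s[-1])     # indexing from the front and from the end
print(s[1:4])          # slicing: characters 1, 2 and 3
print(len(s))          # size
print(s < "jello")     # comparison is lexicographic
print("e" in s)        # substring search

cooked = "C:\new"      # \n is one newline character here
raw = r"C:\new"        # raw string: backslash and n stay two characters
print(len(cooked), len(raw))
```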
Lists
• Flexible arrays, not Lisp-like linked lists
• a = [99, "bottles of beer", ["on", "the", "wall"]]
• Same operators as for strings
  • a+b, a*3, a[0], a[-1], a[1:], len(a)
• Item and slice assignment
  • a[0] = 98
  • a[1:2] = ["bottles", "of", "beer"]
    -> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
  • del a[-1]   # -> [98, "bottles", "of", "beer"]
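A runnable Python 3 version of the item and slice assignments above; it shows that a slice assignment may replace one element with several.

```python
a = [99, "bottles of beer", ["on", "the", "wall"]]

a[0] = 98                               # item assignment
a[1:2] = ["bottles", "of", "beer"]      # slice assignment: 1 item replaced by 3
print(a)

del a[-1]                               # delete the last item (the nested list)
print(a)

print(a[1:] + ["!"] * 2)                # slicing, concatenation and repetition
```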
More List Operations
>>> a = range(5)        # [0,1,2,3,4]
>>> a.append(5)         # [0,1,2,3,4,5]
>>> a.pop()             # [0,1,2,3,4]
5
>>> a.insert(0, 42)     # [42,0,1,2,3,4]
>>> a.pop(0)            # [0,1,2,3,4]
42
>>> a.reverse()         # [4,3,2,1,0]
>>> a.sort()            # [0,1,2,3,4]
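The same session as a Python 3 script. One assumption worth flagging: in Python 3, range() returns a lazy range object rather than a list, so it is wrapped in list() before calling list methods.

```python
a = list(range(5))     # Python 3: materialize the range as a list
a.append(5)            # [0, 1, 2, 3, 4, 5]
last = a.pop()         # removes and returns the last item
a.insert(0, 42)        # [42, 0, 1, 2, 3, 4]
first = a.pop(0)       # removes and returns the item at index 0
a.reverse()            # reverses in place
a.sort()               # sorts in place
print(last, first, a)
```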
Dictionaries
• Hash tables, "associative arrays"
  • d = {"duck": "eend", "water": "water"}
• Lookup:
  • d["duck"] -> "eend"
  • d["back"] # raises KeyError exception
• Delete, insert, overwrite:
  • del d["water"] # {"duck": "eend"}
  • d["back"] = "rug" # {"duck": "eend", "back": "rug"}
  • d["duck"] = "duik" # {"duck": "duik", "back": "rug"}
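A Python 3 sketch of the lookup, delete, insert and overwrite steps above, including catching the KeyError for a missing key. The d.get() call is an extra not shown on the slide: it returns a default instead of raising.

```python
d = {"duck": "eend", "water": "water"}

print(d["duck"])              # lookup

try:
    d["back"]                 # missing key
except KeyError as err:
    print("missing key:", err)

print(d.get("back", "?"))     # get() returns the default for a missing key

del d["water"]                # delete
d["back"] = "rug"             # insert
d["duck"] = "duik"            # overwrite
print(d)
```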
More Dictionary Ops
• Keys, values, items:
  • d.keys() -> ["duck", "back"]
  • d.values() -> ["duik", "rug"]
  • d.items() -> [("duck","duik"), ("back","rug")]
• Presence check:
  • d.has_key("duck") -> True; d.has_key("spam") -> False
  • "duck" in d does the same (has_key was removed in Python 3)
• Values of any type; keys almost any
  • {"name":"Guido", "age":43, ("hello","world"):1,
     42:"yes", "flag": ["red","white","blue"]}
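In Python 3, keys(), values() and items() return view objects rather than lists, and has_key() no longer exists. This sketch sorts the views so the printed order does not depend on the dictionary's internal order, and uses the in operator for the presence check.

```python
d = {"duck": "duik", "back": "rug"}

print(sorted(d.keys()))
print(sorted(d.values()))
print(sorted(d.items()))

# Presence check: works in Python 2 and 3
print("duck" in d, "spam" in d)

# Values of any type; keys of almost any (hashable) type
mixed = {"name": "Guido", "age": 43, ("hello", "world"): 1,
         42: "yes", "flag": ["red", "white", "blue"]}
print(mixed[("hello", "world")], mixed[42])
```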
Dictionary Details
• Keys must be immutable:
– numbers, strings, tuples of immutables
• these cannot be changed after creation
– reason is hashing (fast lookup technique)
– not lists or other dictionaries
• these types of objects can be changed "in place"
– no restrictions on values
• Keys will be listed in arbitrary order
  – again, because of hashing (Python 3.7 and later preserve insertion order)
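Strictly speaking, Python requires dictionary keys to be hashable; the immutable built-in types are hashable, and a tuple is hashable only if every item in it is. The helper is_valid_key below is a hypothetical name for a small Python 3 check built on the built-in hash().

```python
def is_valid_key(key):
    """Return True if key is hashable, i.e. usable as a dictionary key."""
    try:
        hash(key)
    except TypeError:
        return False
    return True


d = {}
for candidate in [42, "text", (1, 2), [1, 2], (1, [2]), {"a": 1}]:
    if is_valid_key(candidate):
        d[candidate] = "ok"
        print(repr(candidate), "-> allowed")
    else:
        print(repr(candidate), "-> rejected")
```

The tuple (1, [2]) is rejected because hashing it requires hashing the list inside it.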
Functions, Procedures
def name(arg1, arg2, ...):
    """documentation"""    # optional doc string
    statements
    return                 # from procedure
    return expression      # from function
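A minimal Python 3 illustration of the procedure/function distinction: a bare return (or falling off the end) gives None, while return expression gives a value. The names greet and square are hypothetical.

```python
def greet(name):
    """Print a greeting; a procedure, so the caller gets None."""
    print("hello", name)
    return                    # bare return; optional at the end of a function


def square(x):
    """Return x squared."""
    return x * x


result = greet("world")
print(result)                 # None
print(square(7))
print(square.__doc__)         # the doc string is stored on the function object
```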
Example Function
def gcd(a, b):
    "greatest common divisor"
    while a != 0:
        a, b = b%a, a      # parallel assignment
    return b
>>> gcd.__doc__
'greatest common divisor'
>>> gcd(12, 20)
4
Classes
class name:
    "documentation"
    statements
-or-
class name(base1, base2, ...):
    ...
Most statements are method definitions:
    def name(self, arg1, arg2, ...):
        ...
May also be class variable assignments
Example Class
class Stack:
    "A well-known data structure…"
    def __init__(self):            # constructor
        self.items = []
    def push(self, x):
        self.items.append(x)       # the sky is the limit
    def pop(self):
        x = self.items[-1]         # what happens if it's empty?
        del self.items[-1]
        return x
    def empty(self):
        return len(self.items) == 0    # Boolean result
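On the question in the pop() comment: indexing an empty list with [-1] raises IndexError, so popping an empty Stack raises IndexError. This Python 3 sketch keeps the slide's class but uses list.pop(), which removes and returns the last item in one step and raises the same exception on an empty list.

```python
class Stack:
    "A well-known data structure (Python 3 sketch)"
    def __init__(self):
        self.items = []

    def push(self, x):
        self.items.append(x)

    def pop(self):
        # list.pop() raises IndexError when the list is empty
        return self.items.pop()

    def empty(self):
        return len(self.items) == 0


s = Stack()
s.push(1)
s.push("hello")
top = s.pop()

empty_stack = Stack()
try:
    empty_stack.pop()
    raised = False
except IndexError:
    raised = True            # popping an empty stack raises IndexError
print(top, s.items, raised)
```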
Using Classes
• To create an instance, simply call the class object:
    x = Stack()          # no 'new' operator!
• To use methods of the instance, call using dot notation:
    x.empty()            # -> True
    x.push(1)            # [1]
    x.empty()            # -> False
    x.push("hello")      # [1, "hello"]
    x.pop()              # -> "hello"; stack is now [1]
• To inspect instance variables, use dot notation:
    x.items              # -> [1]
Subclassing
class FancyStack(Stack):
    "stack with added ability to inspect inferior stack items"
    def peek(self, n):
        "peek(0) returns top; peek(1) returns item below that; etc."
        size = len(self.items)
        assert 0 <= n < size           # test precondition
        return self.items[size-1-n]
Subclassing (2)
class LimitedStack(FancyStack):
    "fancy stack with limit on stack size"
    def __init__(self, limit):
        self.limit = limit
        FancyStack.__init__(self)      # base class constructor
    def push(self, x):
        assert len(self.items) < self.limit
        FancyStack.push(self, x)       # "super" method call
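A Python 3 sketch of the two subclasses working together. It uses super(), the Python 3 spelling of the explicit FancyStack.__init__(self) and FancyStack.push(self, x) calls; the instance names are hypothetical. Keep in mind that assert statements are skipped when Python runs with -O, so they check preconditions during development rather than enforce them.

```python
class Stack:
    def __init__(self):
        self.items = []

    def push(self, x):
        self.items.append(x)


class FancyStack(Stack):
    def peek(self, n):
        "peek(0) returns top; peek(1) returns the item below that; etc."
        size = len(self.items)
        assert 0 <= n < size              # test precondition
        return self.items[size - 1 - n]


class LimitedStack(FancyStack):
    def __init__(self, limit):
        self.limit = limit
        super().__init__()                # runs Stack.__init__ (FancyStack has none)

    def push(self, x):
        assert len(self.items) < self.limit
        super().push(x)                   # inherited Stack.push


ls = LimitedStack(2)
ls.push("a")
ls.push("b")
print(ls.peek(0), ls.peek(1))

tiny = LimitedStack(0)
try:
    tiny.push("c")
except AssertionError:
    print("limit reached")
```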
Class / Instance Variables
class Connection:
    verbose = 0                        # class variable
    def __init__(self, host):
        self.host = host               # instance variable
    def debug(self, v):
        self.verbose = v               # make instance variable!
    def connect(self):
        if self.verbose:               # class or instance variable?
            print "connecting to", self.host
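The answer to "class or instance variable?" depends on whether debug() has been called on that instance. A Python 3 sketch of the lookup rule: self.verbose finds an instance variable first and falls back to the class variable otherwise. The host names are made up.

```python
class Connection:
    verbose = 0                      # class variable, shared by all instances

    def __init__(self, host):
        self.host = host             # instance variable

    def debug(self, v):
        self.verbose = v             # creates an instance variable that shadows the class one


a = Connection("alpha")
b = Connection("beta")
a.debug(1)
print(a.verbose, b.verbose, Connection.verbose)
print("verbose" in vars(a), "verbose" in vars(b))

Connection.verbose = 2               # changing the class variable...
print(a.verbose, b.verbose)          # ...is seen only by instances without their own copy
```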
Modules
• Collection of stuff in foo.py file
– functions, classes, variables
• Importing modules:
– import re; print re.match("[a-z]+", s)
– from re import match; print match("[a-z]+", s)
• Import with rename:
– import re as regex
– from re import match as m
– Before Python 2.0:
• import re; regex = re; del re
Packages
• Collection of modules in directory
• Must have __init__.py file
• May contain subpackages
• Import syntax:
– from P.Q.M import foo; print foo()
– from P.Q import M; print M.foo()
– import P.Q.M; print P.Q.M.foo()
– import P.Q.M as M; print M.foo()     # new
Catching Exceptions
def foo(x):
    return 1/x
def bar(x):
    try:
        print foo(x)
    except ZeroDivisionError as message:
        print "Can't divide by zero:", message
bar(0)
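The same idea in Python 3, with an else clause added that runs only when no exception occurred. This is a sketch: bar here returns a message instead of printing it so the result is easy to check. Note that in Python 3 the / operator is true division, so 1/4 is 0.25 rather than 0 as in Python 2.

```python
def foo(x):
    return 1 / x


def bar(x):
    """Describe the outcome of foo(x) instead of letting the error escape."""
    try:
        result = foo(x)
    except ZeroDivisionError as message:     # "as" form, required in Python 3
        return "Can't divide by zero: %s" % message
    else:
        return "result: %s" % result         # only reached if no exception


print(bar(0))
print(bar(4))
```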
Try-finally: Cleanup
f = open(filename)
try:
    process_file(f)
finally:
    f.close()         # always executed
print "OK"            # executed on success only
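Since Python 2.6 the with statement gives the same cleanup guarantee as the try/finally above: the file is closed when the block exits, whether normally or through an exception. A Python 3 sketch; count_lines is a hypothetical helper, and the temporary file only makes the example self-contained.

```python
import os
import tempfile


def count_lines(filename):
    """Count the lines in a file; the with block closes it even on error."""
    with open(filename) as f:          # f.close() runs automatically on exit
        return sum(1 for _ in f)


# Create a throwaway input file
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("one\ntwo\nthree\n")

n = count_lines(path)
print(n)
os.remove(path)
```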
File Objects
• f = open(filename[, mode[, buffersize]])
  – mode can be "r", "w", "a" (like C stdio); default "r"
  – append "b" for binary mode (no text translation)
  – append "+" for read/write open
  – buffersize: 0=unbuffered; 1=line-buffered; other positive values=buffer of about that many bytes; negative or omitted=system default
• methods:
  – read([nbytes]), readline(), readlines()
  – write(string), writelines(list)
  – seek(pos[, how]), tell()
  – flush(), close()
  – fileno()
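A Python 3 sketch that walks through the modes and methods listed above on a scratch file (the file name demo.txt is arbitrary). In Python 3, binary mode returns bytes objects, which is why the read results below are b'...' values.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")   # scratch file

f = open(path, "w")                     # "w": create or truncate for writing
f.write("first line\n")
f.writelines(["second line\n", "third line\n"])
f.close()

f = open(path, "a")                     # "a": append at the end
f.write("fourth line\n")
f.close()

f = open(path, "rb")                    # "b": binary mode, no newline translation
first = f.readline()                    # one line, including its newline
pos = f.tell()                          # byte offset just after that line
rest = f.readlines()                    # remaining lines as a list
f.seek(0)                               # rewind to the start
everything = f.read()                   # whole file
f.close()
print(first, pos, len(rest), len(everything))

os.remove(path)
os.rmdir(os.path.dirname(path))
```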
Example: Submitting and monitoring jobs
• Run a series of jobs where a single input parameter changes
• Report how many jobs are running and when jobs finish
Parse command line arguments
#!/usr/bin/env python
import commands      # Run programs
import sys           # Get environmental info
import time          # Time functions
# Define subroutine
def parse_command_line():
    args={}                      # Define a dictionary
    if len(sys.argv) !=8:        # Check # command line arguments
        print "run_monitor directory queue const_command_line val first number delta"
        sys.exit(1)
    args["dir"]=sys.argv[1]
    args["queue"]=sys.argv[2]
    args["const"]=sys.argv[3]
    args["val"]=sys.argv[4]
    args["first"]=sys.argv[5]
    args["number"]=sys.argv[6]
    args["delta"]=sys.argv[7]
    return args
Subroutine to create job list
def create_job_list(args):
    coms={}                      # Create dictionary: value -> command
    val=float(args["first"])
    for i in range(int(args["number"])):
        var="%s%f"%(args["val"],val)
        coms[var]="%s %s"%(args["const"],var)
        val+=float(args["delta"])
    return coms
# Main program
args=parse_command_line()
job_list=create_job_list(args)
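To see what create_job_list produces, here is the same routine in Python 3 applied to a hypothetical set of arguments (the program name ./model, the parameter vel= and the numbers are made up). All values are strings, as parse_command_line would return them from sys.argv.

```python
def create_job_list(args):
    """Map each parameter setting to the command line that uses it."""
    coms = {}
    val = float(args["first"])
    for i in range(int(args["number"])):
        var = "%s%f" % (args["val"], val)            # e.g. "vel=1.000000"
        coms[var] = "%s %s" % (args["const"], var)
        val += float(args["delta"])
    return coms


# Hypothetical arguments
args = {"const": "./model in=data.H", "val": "vel=",
        "first": "1.0", "number": "3", "delta": "0.5"}
jobs = create_job_list(args)
for key, com in sorted(jobs.items()):
    print(key, "->", com)
```

Adding delta repeatedly can accumulate floating-point rounding error over many jobs; computing first + i*delta inside the loop is one way to avoid that.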
Run the job
Run a job on the cluster
def run_job(args,key,val):
    # PBS options
    lines=["#!/bin/tcsh",
           "#PBS -q %s"%args["queue"],
           "#PBS -V",
           "#PBS -N test",
           "#PBS -e %s/test.%s.err"%(args["dir"],key),
           "#PBS -o %s/test.%s.out"%(args["dir"],key),
           "cd %s"%args["dir"],
           val]
    f=open("run.sh","w")
    for line in lines:
        f.write("%s\n"%line)
    f.close()
    stat,out=commands.getstatusoutput("qsub run.sh")
    time.sleep(.5)
    # Get and return job id
    job_id=out.split(".")[0]
    return job_id
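The commands module used above no longer exists in Python 3; subprocess is its replacement. This sketch splits run_job into two hypothetical helpers: make_pbs_script builds the same script text (easy to inspect without a cluster), and submit hands a script file to qsub. It assumes, as the slide does, that qsub prints a job id of the form number.server; subprocess.run with capture_output and text needs Python 3.7 or later.

```python
import subprocess


def make_pbs_script(args, key, command):
    """Return the text of a PBS job script with the slide's directives."""
    lines = ["#!/bin/tcsh",
             "#PBS -q %s" % args["queue"],
             "#PBS -V",
             "#PBS -N test",
             "#PBS -e %s/test.%s.err" % (args["dir"], key),
             "#PBS -o %s/test.%s.out" % (args["dir"], key),
             "cd %s" % args["dir"],
             command]
    return "\n".join(lines) + "\n"


def submit(script_path):
    """Submit a script with qsub and return the part of the id before the first '.'."""
    out = subprocess.run(["qsub", script_path], capture_output=True,
                         text=True, check=True).stdout
    return out.strip().split(".")[0]


args = {"queue": "default", "dir": "/scratch/jobs"}     # hypothetical values
script = make_pbs_script(args, "vel=1.000000", "./model in=data.H vel=1.000000")
print(script)
```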
Check the status
Check which jobs are running
def print_status(ids):
    while len(ids)!=0:
        for id in ids[:]:                # loop over a copy, since ids shrinks below
            # Check job status: is the job still in the queue?
            stat,out=commands.getstatusoutput("qstat %s"%id)
            if out.find("Unknown Job")>-1:
                print "%s is finished"%id
                ids.remove(id)
        print "%d jobs remaining"%len(ids)
        time.sleep(60)                   # Check every minute
# Main program
args=parse_command_line()
job_list=create_job_list(args)
ids=[]
for val,com in job_list.items():
    ids.append(run_job(args,val,com))
print_status(ids)
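Removing items from a list while a for loop walks over that same list skips the element after each removal, which is why the loop iterates over a copy. Below is a Python 3 sketch of the polling loop with the job check passed in as a function, so it can run without a cluster. The names wait_for_jobs and fake_is_finished and the poll counts are hypothetical; for PBS, the check could run qstat and look for "Unknown Job" as the slide does.

```python
import time


def wait_for_jobs(ids, is_finished, interval=60, sleep=time.sleep):
    """Poll until every job in ids is finished; return the finishing order."""
    remaining = list(ids)                  # the caller's list is left untouched
    finished = []
    while remaining:
        for job_id in remaining[:]:        # iterate over a snapshot so removal is safe
            if is_finished(job_id):
                print("%s is finished" % job_id)
                remaining.remove(job_id)
                finished.append(job_id)
        print("%d jobs remaining" % len(remaining))
        if remaining:
            sleep(interval)                # wait only if something is still running
    return finished


# Fake checker: each job reports finished after a fixed number of polls
polls = {"101": 1, "102": 3, "103": 2}


def fake_is_finished(job_id):
    polls[job_id] -= 1
    return polls[job_id] <= 0


order = wait_for_jobs(["101", "102", "103"], fake_is_finished,
                      sleep=lambda seconds: None)
print(order)
```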
Example: Parsing a log file
• Read a file that records who accessed various web pages
• Find the pages whose names contain a given search string
• Print out each access to those pages, sorted by date
• Uses import, string manipulation, dictionaries, classes
Reading the file
#!/usr/bin/env python
file=open("access_log")
for line in file.xreadlines():
    print line
file.close()
The ugly
#!/usr/bin/env python
import re
# groups: 1=host, 2=date, 3=page, 4=referring URL
decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] "\S+\s+(\S+).+".+(http:\S+)')
file=open("access_log")
for line in file.xreadlines():
    print line
file.close()
A typical line of access_log:
221.221.144.28 - - [11/Apr/2007:23:02:54 -0700] "GET /gifs//home.gif HTTP/1.1" 304 - "http://sepwww.stanford.edu/research/reports/old_reports.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
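Applying the pattern to the sample line shows what each group captures. With the slide's pattern, the last group (http:\S+) also swallows the closing quote, because a quote is not whitespace; this Python 3 sketch therefore tightens that group to stop at a quote or space. That change is an assumption of the sketch, not part of the original script.

```python
import re

# Slide's pattern, except the last group stops before a quote or whitespace
decipher = re.compile(r'^\s*(\S+) - - \[(.+)\] "\S+\s+(\S+).+".+(http:[^"\s]+)')

line = ('221.221.144.28 - - [11/Apr/2007:23:02:54 -0700] '
        '"GET /gifs//home.gif HTTP/1.1" 304 - '
        '"http://sepwww.stanford.edu/research/reports/old_reports.html" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"')

match = decipher.search(line)
if match:
    host, date, page, ref = match.groups()
    print("host=%s date=%s page=%s ref=%s" % (host, date, page, ref))
```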
Look for the string
#!/usr/bin/env python
import re
decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] "\S+\s+(\S+).+".+(http:\S+)')
file=open("access_log")
for line in file.xreadlines():
    if decipher.search(line):
        print line
file.close()
Create a class to store access
#!/usr/bin/env python
import re
class page_access:
    """A class to record page accesses"""
    def __init__(self,page,host,date,ref):
        """Initialize a page access"""
        self.host=host; self.date=date
        self.page=page; self.ref=ref
decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '
                    r'"\S+\s+(\S+).+".+(http:\S+)')
file=open("access_log")
for line in file.xreadlines():
    res=decipher.search(line)
    if res:
        host=res.group(1); date=res.group(2)
        page=res.group(3); ref=res.group(4)
        print "host=%s date=%s page=%s ref=%s"%(host,date,page,ref)
file.close()
Store all the page accesses
#!/usr/bin/env python
import re
class page_access:
    """A class to record page accesses"""
    def __init__(self,page,host,date,ref):
        """Initialize a page access"""
        self.host=host; self.date=date
        self.page=page; self.ref=ref
decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '
                    r'"\S+\s+(\S+).+".+(http:\S+)')
file=open("access_log")
accesses=[]
for line in file.xreadlines():
    res=decipher.search(line)
    if res:
        host=res.group(1); date=res.group(2)
        page=res.group(3); ref=res.group(4)
        accesses.append(page_access(page,host,date,ref))
file.close()
Add ability to print access info
#!/usr/bin/env python
import re
class page_access:
    """A class to record page accesses"""
    def __init__(self,page,host,date,ref):
        """Initialize a page access"""
        self.host=host; self.date=date
        self.page=page; self.ref=ref
    def show(self):            # not "print": print is a reserved word in Python 2
        print "page=%s host=%s date=%s ref=%s\n"%(self.page,self.host,self.date,self.ref)
decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '
                    r'"\S+\s+(\S+).+".+(http:\S+)')
file=open("access_log")
accesses=[]
for line in file.xreadlines():
    res=decipher.search(line)
    if res:
        host=res.group(1); date=res.group(2)
        page=res.group(3); ref=res.group(4)
        accesses.append(page_access(page,host,date,ref))
file.close()
for acc in accesses:
    acc.show()
Break parts into functions
#!/usr/bin/env python
import re
class page_access:
    """A class to record page accesses"""
    def __init__(self,page,host,date,ref):
        """Initialize a page access"""
        self.host=host; self.date=date
        self.page=page; self.ref=ref
    def show(self):
        print "page=%s host=%s date=%s ref=%s\n"%(self.page,self.host,self.date,self.ref)
def print_accesses(accesses):
    for acc in accesses:
        acc.show()
def read_file(filename):
    decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '
                        r'"\S+\s+(\S+).+".+(http:\S+)')
    file=open(filename)
    accesses=[]
    for line in file.xreadlines():
        res=decipher.search(line)
        if res:
            host=res.group(1); date=res.group(2)
            page=res.group(3); ref=res.group(4)
            accesses.append(page_access(page,host,date,ref))
    file.close()
    return accesses
accesses=read_file("access_log")
print_accesses(accesses)
Store by page name
#!/usr/bin/env python
import re
class page_access:
    """A class to record page accesses"""
    def __init__(self,host,date,ref):
        """Initialize a page access"""
        self.host=host; self.date=date; self.ref=ref
    def describe(self):
        return "host=%s date=%s ref=%s"%(self.host,self.date,self.ref)
def print_accesses(links):
    for page, access in links.items():
        print "Page:%s\n %s\n"%(page,access.describe())
def read_file(filename):
    decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '
                        r'"\S+\s+(\S+).+".+(http:\S+)')
    file=open(filename)
    links={}
    for line in file.xreadlines():
        res=decipher.search(line)
        if res:
            host=res.group(1); date=res.group(2)
            page=res.group(3); ref=res.group(4)
            links[page]=page_access(host,date,ref)   # keeps only the last access per page
    file.close()
    return links
accesses=read_file("access_log")
print_accesses(accesses)
Handle multiple references per page
#!/usr/bin/env python
import re
class page_access:
    """A class to record page accesses"""
    def __init__(self,host,date,ref):
        """Initialize a page access"""
        self.host=host; self.date=date; self.ref=ref
    def describe(self):
        return "host=%s date=%s ref=%s"%(self.host,self.date,self.ref)
def print_accesses(links):
    for page, accesses in links.items():
        print "Page:%s"%page
        for acc in accesses:
            print "  %s"%acc.describe()
def read_file(filename):
    decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '
                        r'"\S+\s+(\S+).+".+(http:\S+)')
    file=open(filename)
    links={}
    for line in file.xreadlines():
        res=decipher.search(line)
        if res:
            host=res.group(1); date=res.group(2)
            page=res.group(3); ref=res.group(4)
            if not links.has_key(page): links[page]=[]
            links[page].append(page_access(host,date,ref))
    file.close()
    return links
accesses=read_file("access_log")
print_accesses(accesses)
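In Python 3, has_key() is gone; collections.defaultdict(list) (or links.setdefault(page, []).append(...)) handles the "create an empty list the first time" step. This sketch feeds made-up (page, host, date, ref) tuples to a hypothetical group_by_page helper instead of reading a real log, and PageAccess is a hypothetical Python 3 counterpart of page_access.

```python
from collections import defaultdict


class PageAccess:
    """One access to a page."""
    def __init__(self, host, date, ref):
        self.host, self.date, self.ref = host, date, ref

    def describe(self):
        return "host=%s date=%s ref=%s" % (self.host, self.date, self.ref)


def group_by_page(records):
    """Group (page, host, date, ref) tuples into {page: [PageAccess, ...]}."""
    links = defaultdict(list)        # a page seen for the first time starts with []
    for page, host, date, ref in records:
        links[page].append(PageAccess(host, date, ref))
    return links


# Made-up records standing in for parsed log lines
records = [
    ("/index.html", "10.0.0.1", "11/Apr/2007:23:02:54 -0700", "http://example.com/a"),
    ("/gifs/home.gif", "10.0.0.2", "11/Apr/2007:23:05:10 -0700", "http://example.com/b"),
    ("/index.html", "10.0.0.3", "12/Apr/2007:08:00:00 -0700", "http://example.com/c"),
]
links = group_by_page(records)
for page, accesses in sorted(links.items()):
    print("Page:%s" % page)
    for acc in accesses:
        print("  %s" % acc.describe())
```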
Find a given search term
#!/usr/bin/env python
import re
import sys
class page_access:
    """A class to record page accesses"""
    def __init__(self,host,date,ref):
        """Initialize a page access"""
        self.host=host; self.date=date; self.ref=ref
    def describe(self):
        return "host=%s date=%s ref=%s"%(self.host,self.date,self.ref)
def print_accesses(links,grep):
    grepre=re.compile(grep)
    for page, accesses in links.items():
        if grepre.search(page):
            print "Page:%s"%page
            for acc in accesses:
                print "  %s"%acc.describe()
def read_file(filename):
    decipher=re.compile(r'^\s*(\S+) - - \[(.+)\] '
                        r'"\S+\s+(\S+).+".+(http:\S+)')
    file=open(filename)
    links={}
    for line in file.xreadlines():
        res=decipher.search(line)
        if res:
            host=res.group(1); date=res.group(2)
            page=res.group(3); ref=res.group(4)
            if not links.has_key(page): links[page]=[]
            links[page].append(page_access(host,date,ref))
    file.close()
    return links
accesses=read_file("access_log")
print_accesses(accesses,sys.argv[1])
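The task list asked for accesses sorted by date, which the script above does not yet do. Sorting the raw date strings is wrong because they start with the day of the month. One way to sort properly, sketched in Python 3, is to parse each timestamp with datetime.strptime; %z accepts offsets such as -0700, and %b assumes English month abbreviations (the default C locale). parse_log_date and the sample dates are hypothetical.

```python
from datetime import datetime


def parse_log_date(stamp):
    """Convert an access-log timestamp such as '11/Apr/2007:23:02:54 -0700'."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


dates = ["01/May/2007:09:00:00 -0700",
         "11/Apr/2007:23:02:54 -0700",
         "11/Apr/2007:08:30:10 -0700"]

ordered = sorted(dates, key=parse_log_date)   # chronological order
as_text = sorted(dates)                       # plain string order puts 01/May first
for stamp in ordered:
    print(stamp)
```

Inside print_accesses, the same key could be applied per page, for example accesses.sort(key=lambda acc: parse_log_date(acc.date)).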
URLs
• http://www.python.org
– official site
• http://starship.python.net
– Community
• http://www.python.org/psa/bookstore/
– (alias for http://www.amk.ca/bookstore/)
– Python Bookstore