COMS W3101
Programming Languages: Python
Spring 2011
February 3rd, 2011
Instructor: Ang Cui
Lecture 2
Assignment 1
• Due Sunday @ 11:59pm
• Submit via Courseworks
• Questions?
Announcement
• We have a TA!
• Richard Li
– [email protected]
• Office Hours
– Wednesday 4PM – 6PM
– Location TBA
• Class Mailing List
– [email protected]
Last Week
• Bureaucratic stuff
• Obtaining and installing python
• Language lexical structure
• Basic data types
– Numbers, strings, tuples, lists
• Control flow
– If/else, while, for loops
Review
• x=1
• x = range(0,10)
• len(x)
• for i in x:
      print float(i)
• dir(x)
• type(x)
• help(x)
• import this
• if len(x)>5:
      print 'yatta'
  elif len(x)%3==0:
      print 'yosh!'
  else:
      None
Legal Statements?
• x = (1,2,3,4,5)
• y = [1,2,3,4,5]
• x.pop() ?
• y.pop() ?
• x[-1:-1] ?
• y[0:6] ?
• y[0:0]
• y[0:5:2]
Too legit to quit
x = 10
y = 12
z=3
if True: print 'hi'
if True: print \
    'hi'
What is the output?
x = input() <- With the input '3+1'
Without the quotes?
Python 2.x vs Python 3?
What is the value of x?
x = 10/4
x = 10/float(4)
x = 10/4.0
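One way to check the answers is to evaluate all three side by side. This sketch assumes Python 3, where / on two integers is true division; in Python 2.x, 10/4 truncates to 2 unless from __future__ import division is in effect. The // operator floors in both versions.

```python
# Two int operands: "/" is true division in Python 3, floor division in Python 2.x
a = 10 / 4
# Making either operand a float forces true division in both versions
b = 10 / float(4)
c = 10 / 4.0
# "//" is explicit floor division in Python 2 and Python 3 alike
d = 10 // 4
print([a, b, c, d])   # [2.5, 2.5, 2.5, 2]
```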
This Week
• Assignment 2
• Project proposal
• Scripting
• More data structures
– Sequences, dictionaries, sets, None
• Slicing
• List comprehensions
• Sorting
• Functions
• Imports
• Command line arguments
• File I/O
– CSV files
• Scoping
Assignment 2
• Will be posted tomorrow
• Due by 11:59pm, next Sunday
Project
• Pick a project
– Moderate size, end result should be several hundred to a
thousand lines of code
• Submit a proposal for approval
– Email me
– Due with HW2 (Sunday @ Midnight)
– One or two paragraphs
– Describe what your proposed program is supposed to accomplish (i.e. input, output, algorithms)
• Ultimately…
– Project will be due on the day of the last lecture
– You will need to submit code and separate documentation (i.e. a
user manual)
Python Scripts
• First line is the shebang (“hash bang”)
– A general Unix convention
– The program loader executes the interpreter specified on that line
– The common shebang line #!/usr/bin/env python assumes that the interpreter is in the user's PATH environment variable
– File must begin with #!
• To execute a Python script in Unix, the file must be given executable mode (chmod +x)
• In Windows, .py files are automatically associated with the Python interpreter, so double-clicking a .py file will execute it
• Lines after the shebang line are all Python code, and will be executed in order
• The source encoding can be changed from the default ASCII (e.g. to UTF-8) with a comment such as # -*- coding: utf-8 -*- on the first or second line
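Putting those pieces together, a minimal script skeleton might look like the sketch below. The file name hello.py and the main() function are hypothetical; the code uses print with a single argument so it runs under Python 2 and 3.

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# hello.py (hypothetical name): minimal executable script skeleton
import sys


def main(argv):
    """Greet the name given on the command line (default: world)."""
    name = argv[1] if len(argv) > 1 else "world"
    message = "hello " + name
    print(message)
    return message


# Runs only when the file is executed directly, not when it is imported
if __name__ == "__main__":
    main(sys.argv)
```

On Unix, chmod +x hello.py followed by ./hello.py bob would print "hello bob".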
Performance
• Python is compiled to byte code
– Like Java
• Core language is highly optimized
– Built in data types implemented in C
– Built in methods are well implemented
• Sorting done in ~1200 lines of C in later versions
of Python
• Python used in high performance
environments more commonly than Java
.py vs .pyc
A .pyc is created when a MODULE is imported
for the first time.
import py_compile
py_compile.compile('test.py')
Now, go the other way
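The py_compile call can be tried without touching a real project. This sketch writes a throwaway module (file names are made up) into a temporary directory and compiles it to an explicitly named .pyc via the cfile argument, which Python 2.7 and Python 3 both accept.

```python
import os
import py_compile
import tempfile

# Write a tiny throwaway module into a fresh temporary directory
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "test.py")
with open(source_path, "w") as f:
    f.write("x = 1 + 1\n")

# Compile to byte code; cfile picks the output path and
# doraise=True raises py_compile.PyCompileError on a syntax error
pyc_path = os.path.join(workdir, "test.pyc")
py_compile.compile(source_path, cfile=pyc_path, doraise=True)
print(os.path.exists(pyc_path))   # True
```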
More Data Types
Dictionaries, sets, sequences, None
Sequences
• An ordered container of items
• Indexed by nonnegative integers
• Lists, strings, tuples
• Libraries and extensions provide other kinds
• You can create your own (later)
Iterables
• Python concept which generalizes idea of
sequences
• All sequences are iterables
• Beware of bounded and unbounded iterables
– Sequences are bounded
– Iterables don’t have to be
– Beware when using unbounded iterables: it could
result in an infinite loop or exhaust your memory
Unbounded Iterables
import random

def Unbounded_generator():
    while True:
        yield random.randint(0,1)

# This will not terminate!
t = Unbounded_generator()
for i in t:
    print i
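To consume an unbounded generator safely, take only as many items as you need. A sketch using itertools.islice from the standard library; the generator is the one above rewritten with print-free, Python 3-compatible code.

```python
import itertools
import random


def unbounded_generator():
    """Yield random 0/1 values forever."""
    while True:
        yield random.randint(0, 1)


# islice stops after 10 items, so this terminates
first_ten = list(itertools.islice(unbounded_generator(), 10))
print(first_ten)
```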
Manipulating Sequences
• Concatenation
– Most commonly, concatenate sequences of the same
type using + operator
• Same type refers to the sequences (i.e. lists can only be
concatenated with other lists, you can’t concatenate a list and
a tuple)
• The items in the sequences can usually vary in type
– Less commonly, use the * operator to concatenate a sequence with itself n times (e.g. [0] * 3 gives [0, 0, 0])
• Membership testing
– Use of in keyword
– E.g. ‘x’ in ‘xyz’ for strings
– Returns True / False
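A short sketch of the three operations; all values are illustrative.

```python
# + concatenates two sequences of the same type; item types may differ
numbers = [1, 2] + [3, 4]        # [1, 2, 3, 4]
letters = "ab" + "cd"            # 'abcd'
pair = (1, 'a') + (2.5,)         # (1, 'a', 2.5)

# * concatenates a sequence with itself n times
zeros = [0] * 3                  # [0, 0, 0]
dashes = "-" * 5                 # '-----'

# Mixing a list and a tuple raises TypeError
try:
    [1, 2] + (3, 4)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Membership testing with "in" returns True / False
in_str = 'x' in 'xyz'            # True
in_list = 3 in numbers           # True
not_found = 5 in numbers         # False
```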
Slicing
• Extract subsequences from sequences by slicing
• Given sequence s, syntax is s[x:y] where x, y are
integral indices
– x is the inclusive lower bound
– y is the exclusive upper bound
• If x >= y, then s[x:y] will be empty
• If y > len(s), s[x:y] will return s[x:len(s)]
• If the lower bound is omitted, it is 0 by default
• If the upper bound is omitted, it is len(s) by default
Slicing (cont’d)
• Also possible: s[x:y:n] where n is the stride
– n is the positional difference between successive
items
– s[x:y] is the same as s[x:y:1]
– s[x:y:2] returns every second element of s from [x, y)
– s[::3] returns every third element in all of s
• x, y, n can also be negative
– s[y:x:-1] will return the items at indices y down to x+1, i.e. the items in (x, y] in reverse order (assuming y > x)
– s[-2] will return the second-to-last item
Slicing (cont’d)
• It is also possible to assign to slices
• Assigning to a slicing s[x:y:n] is equivalent
to assigning to the indices of s specified by
the slicing x:y:n
• Exercise:
– How would you create these lists:
• [0, 0, 2, 2, 4, 4, …]
Examples
>>> s = range(10)
>>> s[::-1] # Reverses the entire list
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>>> s = range(10)
>>> s[1::2] = s[::2] #This is cute, but hurts readability. Poor form!
>>> s
[0, 0, 2, 2, 4, 4, 6, 6, 8, 8]
>>> s = range(10)
>>> s[5:0:-1]
[5, 4, 3, 2, 1]
>>> s[0:5:-1]
[]
Negative stride works, but for the sake of style, use s.reverse() instead.
List Comprehensions
• It is common to use a for loop to go through an iterable and build a new list by appending the result of an expression
• Python list comprehensions allow you to do this concisely
• General syntax:
– [expression for target in iterable clauses]
– clauses is a series of zero or more clauses, each of one of the following forms:
• for target in iterable
• if expression
• List comprehensions are expressions
• Underlying implementation is efficient
• The following are equivalent:
– result = []
  for x in sequence:
      result.append(x + 1)
– result = [ x + 1 for x in sequence ]
Exercise: create a list comprehension of all the squares from 1 to 100
Examples
>>> a = ['dad', 'mom', 'son', 'daughter', 'daughter']
>>> b = ['homer', 'marge', 'bart', 'lisa', 'maggie']
>>> [ a[i] + ': ' + b[i] for i in range(len(a)) ]
['dad: homer', 'mom: marge', 'son: bart', 'daughter: lisa', 'daughter: maggie']
>>> [ x + y for x in range(5) for y in range(5, 10) ]
[5, 6, 7, 8, 9, 6, 7, 8, 9, 10, 7, 8, 9, 10, 11, 8, 9, 10, 11, 12, 9, 10, 11, 12, 13]
>>> [ x for x in range(10) if x**2 in range(50, 100) ]
[8, 9]
Tip: Keep list comprehensions SIMPLE!
Sorting
• sorted() vs. list.sort()
– For iterables in general, sorted() can be used
– For lists specifically, there is the .sort() method
• sorted()
– Returns a new sorted list from any iterable
• list.sort()
– Sorts a list in place
– Is extremely fast: uses “timsort”, a “non-recursive
adaptive stable natural mergesort/binary insertion sort
hybrid”
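The difference is easiest to see side by side; the data here is illustrative.

```python
data = [3, 1, 2]

# sorted() accepts any iterable and returns a NEW list
new_list = sorted(data)                  # [1, 2, 3]
unchanged = list(data)                   # snapshot: data is still [3, 1, 2]
from_string = sorted("cab")              # ['a', 'b', 'c']
from_tuple = sorted((5, 4, 6))           # [4, 5, 6]
descending = sorted(data, reverse=True)  # [3, 2, 1]

# list.sort() rearranges the list in place and returns None
result = data.sort()
print(data)      # [1, 2, 3]
print(result)    # None
```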
• you don’t need to know what these terms mean for this class
Sorting (cont’d)
• Standard comparisons understand numbers, strings, etc.
• Built in function cmp(x, y) is the default comparator
– Returns -1 if x < y
– Returns 0 if x == y
– Returns 1 if x > y
• You can create custom comparators by defining your
own function
– Function compares two objects and returns -1, 0, or 1 depending
on whether the first object is to be considered less than, equal to,
or greater than the second object
– Sorted list will always place “lesser” objects before “greater”
ones
– More on functions later
Sorting (cont’d)
• For list L, sort syntax is:
– L.sort(cmp=None, key=None, reverse=False)
– All arguments are optional (defaults listed above)
– cmp is the comparator function; None means the default comparison
– If key is specified (to something other than None), items in the list are compared by key(item) instead of by the item itself
• For sorted, syntax is:
– sorted(iterable, cmp=None, key=None, reverse=False)
Examples
>>> fruits = ['oranges', 'apples', 'pears', 'grapes']
>>> def food_cmp(x, y): # Custom comparator function
...     if x not in fruits or y not in fruits:
...         return 0
...     return cmp(fruits.index(x), fruits.index(y))
...
>>> L = ['grapes', 'grapes', 'apples', 'oranges', 'apples', 'pears']
>>> L.sort(cmp=food_cmp)
>>> L
['oranges', 'apples', 'apples', 'pears', 'grapes', 'grapes']
>>> L = ['grapes', 'grapes', 'apples', 'oranges', 'apples', 'pears']
>>> L.sort(key=fruits.index)
>>> L
['oranges', 'apples', 'apples', 'pears', 'grapes', 'grapes']
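The cmp= argument and the built-in cmp() exist only in Python 2; Python 3 keeps just key= and reverse=. If an old-style comparator is needed under Python 3, functools.cmp_to_key (also in Python 2.7) wraps it into a key function. A sketch reusing the fruit example, with cmp() replaced by a subtraction of indices, which has the same sign:

```python
import functools

fruits = ['oranges', 'apples', 'pears', 'grapes']


def food_cmp(x, y):
    """Old-style comparator: negative, zero or positive, like cmp()."""
    if x not in fruits or y not in fruits:
        return 0
    return fruits.index(x) - fruits.index(y)


L = ['grapes', 'grapes', 'apples', 'oranges', 'apples', 'pears']
by_cmp = sorted(L, key=functools.cmp_to_key(food_cmp))
by_key = sorted(L, key=fruits.index)      # simpler when a key function exists
by_key_desc = sorted(L, key=fruits.index, reverse=True)
print(by_cmp)
```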
Dictionaries
• Dictionaries are Python’s built in mapping type
– A mapping is an arbitrary collection of objects indexed by nearly
arbitrary values called keys
• Mutable, not ordered
• Keys in a dictionary must be hashable
– Dictionaries implemented as a hash map
• Values are arbitrary objects, may be of different types
• Items in a dictionary are a key/value pair
• One of the more optimized types in Python
– When used properly, most operations run in constant time
Specifying Dictionaries
• Use a series of pairs of expressions, separated by commas within
braces
– e.g. {'x': 42, 'y': 3.14, 26: 'z'} will create a dictionary with 3 items, mapping 'x' to 42, 'y' to 3.14, and 26 to 'z'.
• Alternatively, use the dict() function
– Less concise, more readable
– dict(x=42, y=3.14)
– dict([[1,2], [3,4]])
• Creates dictionary with 2 items, mapping 1 to 2, 3 to 4
• Can also be created from keys
– dict.fromkeys('hello', 2) creates {'h':2, 'e':2, …, 'o':2}
– First argument is an iterable whose items become the keys, second
argument is the value
• Dictionaries do not allow duplicate keys
– If a key appears more than once, only one item with that key is kept (the last one specified)
Examples
>>> dict_one = {'x':42, 'y':120, 'z':55}
>>> dict_two = dict(x=42,y=120,z=55)
>>> dict_one == dict_two
True
>>> dict_three = {'x':42, 'y':111, 'z':56}
>>> dict_one == dict_three
False
Idiom
#histogram code
Histogram = {}
for item in somelist:
    try:
        Histogram[item] += 1
    except KeyError, e:
        Histogram[item] = 1
Idiom
#histogram code
Histogram = {}
for item in somelist:
    if item in Histogram:
        Histogram[item] += 1
    else:
        Histogram[item] = 1
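Two standard-library shortcuts build the same histogram without the explicit branch: dict.get() with a default, and collections.Counter (Python 2.7 and later). The somelist contents are made-up sample data.

```python
from collections import Counter

somelist = ['a', 'b', 'a', 'c', 'a', 'b']   # hypothetical sample data

# dict.get(key, default) returns default when key is missing
histogram = {}
for item in somelist:
    histogram[item] = histogram.get(item, 0) + 1

# Counter builds the same mapping in one call
counts = Counter(somelist)

print(histogram)
print(counts.most_common(1))   # [('a', 3)]
```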
Dictionaries (cont’d)
• Suppose x = {'a':1, 'b':2, 'c':3}
– To access a value: x['a']
– To assign values: x['a'] = 10
– You can also build up a dictionary one item at a time in this way
• Common methods
– .keys(), .values(), .items(), .setdefault(…), .pop(…)
• Dictionaries are containers, so functions like len() will work
• To see if a key is in a dictionary, use the in keyword
– E.g. 'a' in x will return True, 'd' in x will return False.
– Attempting to access a key that doesn't exist in a dictionary raises a KeyError
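A sketch of the common access and update methods on the dictionary from the slide; the extra keys are illustrative. dict.get is not in the slide's list but is the usual way to avoid the KeyError.

```python
x = {'a': 1, 'b': 2, 'c': 3}

x['a'] = 10                     # assignment overwrites the value for 'a'
x['d'] = 4                      # ...or adds a brand-new item

# setdefault inserts the default only when the key is missing
x.setdefault('e', 5)            # adds 'e': 5
x.setdefault('a', 99)           # 'a' exists, so it stays 10

# pop removes a key and returns its value; a default avoids KeyError
removed = x.pop('b')            # 2
missing = x.pop('zzz', None)    # None, and no exception

# get returns a default instead of raising for a missing key
safe = x.get('zzz', 0)          # 0

# Plain indexing with a missing key raises KeyError
try:
    value = x['zzz']
except KeyError:
    value = 'no such key'

print(sorted(x.items()))   # [('a', 10), ('c', 3), ('d', 4), ('e', 5)]
```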
More Examples
>>> x = {}
>>> x['dad'] = 'homer'
>>> x['mom'] = 'marge'
>>> x['son'] = 'bart'
>>> x['daughter'] = ['lisa', 'maggie']
>>> print 'Simpson family:', x
Simpson family: {'dad': 'homer', 'daughter': ['lisa', 'maggie'], 'son': 'bart', 'mom': 'marge'}
>>> 'dog' in x
False
>>> x['dog'] = 'Santa\'s Little Helper'
>>> 'dog' in x
True
>>> print 'Family members: ', x.values()
Family members: ['homer', ['lisa', 'maggie'], "Santa's Little Helper", 'bart', 'marge']
>>> x.items()
[('dad', 'homer'), ('daughter', ['lisa', 'maggie']), ('dog', "Santa's Little Helper"), ('son', 'bart'), ('mom', 'marge')]
Sets
• Arbitrarily ordered collections of unique items
• Items may be of different types, but must be
hashable
• Python has two built-in types: set and frozenset
– Instances of type set are mutable and not hashable
– Instances of type frozenset are immutable and
hashable
– Therefore, you may have a set of frozensets, but not
a set of sets
• Sets and frozensets are not ordered
Sets (cont’d)
• To create a set, call the built-in type set() with no
argument (i.e. to create an empty set)
– You can also include an argument that is an iterable, and the
unique items of the iterable will be added into the set
• Common set operations:
– Intersection, union, difference
• Methods (given sets S, T):
– Non-mutating: S.intersection(T), S.issubset(T), S.union(T)
– Mutating: S.add(x), S.clear(), S.discard(x)
• Exercise: count the number of unique items in the list L
generated using the following:
– import random
  L = [ random.randint(1, 50) for x in range(100) ]
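One answer to the exercise, plus the set operations listed above. S and T are illustrative; the try block shows why a set of sets is not allowed.

```python
import random

L = [random.randint(1, 50) for x in range(100)]
unique_count = len(set(L))        # duplicates collapse, so at most 50

S = set([1, 2, 3, 4])
T = frozenset([3, 4, 5])          # immutable and hashable

both = S.intersection(T)          # {3, 4}           same as S & T
either = S.union(T)               # {1, 2, 3, 4, 5}  same as S | T
only_s = S.difference(T)          # {1, 2}           same as S - T
subset = set([3]).issubset(T)     # True

S.add(10)                         # mutating methods exist on set only
S.discard(42)                     # no error when the item is absent

nested = set([frozenset([1, 2]), frozenset([3])])   # set of frozensets: fine
try:
    set([set([1, 2])])            # set of sets: a set is unhashable
    sets_of_sets_ok = True
except TypeError:
    sets_of_sets_ok = False

print(unique_count)
```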
Functions
• Most statements in a typical Python program are
grouped together into functions
• Request to execute a function is a function call
• When you call a function, you can pass in arguments
• A Python function always returns a value
– If nothing is explicitly specified to be returned, None will be
returned
Functions
• Functions are also objects
– May be passed as arguments to other functions
– May be assigned dynamically at runtime
– May return other functions
– May even be keys in a dictionary or items in a sequence!
– Functions can be overwritten at RUNTIME
• Potential Security Problem? (I think so, haven’t really tried it out)
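Each of those properties in a few lines; square, make_adder and apply_twice are illustrative names.

```python
def square(n):
    return n * n


def make_adder(k):
    """Return a new function that adds k to its argument."""
    def add(n):
        return n + k
    return add


def apply_twice(func, value):
    """Call func on value, then on the result (func is an argument)."""
    return func(func(value))


add3 = make_adder(3)                  # a function returned by a function
table = {'sq': square, 'add3': add3}  # functions as dictionary values
pipeline = [square, add3]             # ...and as sequence items

print(apply_twice(square, 3))         # 81
print(table['add3'](1))               # 4
print([f(2) for f in pipeline])       # [4, 5]

# Rebinding the name at runtime; table and pipeline still hold the old function
square = add3
print(square(2))                      # 5
```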
A Note on Functions
• Python libraries are HUGE
• Look before writing a function to perform a
common task
• No need to reinvent the wheel
The def Statement
• The most common way to define a function
• Syntax:
– def function-name(parameters):
      statement(s)
– function-name is an identifier; it is a variable name that gets
bound to the function object when def executes
– parameters is an optional list of identifiers
• Each call to a function supplies arguments
corresponding to the parameters listed in the function
definition
• Example:
– def sum_of_squares(x, y):
      return x**2 + y**2
Libraries Shipped With Python
• Numeric, math modules
• Files and directory handling
• Data persistence
• Data compression and archiving
• Cryptography
• OS hooks
• Interprocess communication and threading
• File formats (e.g. CSV)
• Internet data handling
• Structured markup handling (XML, HTML)
• Internet protocols
• Multimedia services
• Internationalization
• GUIs
• Debugging, profiling
• OS-specific services
Parameters
• Parameters are local variables of the function
• Pass by reference vs. pass by value
– Python always passes a reference to the argument object
– With a mutable object, changes made inside the function are visible to the caller, so it behaves like pass by reference
– An immutable object cannot be changed in place, so it behaves like pass by value
– Discuss: similarity to Java
• Supports default arguments
– def f(x, y=[]):
      y.append(x)
      return y
– Beware: the default value gets computed when def is executed, not when the function is called
– Exercise: what will the above function return?
• Supports optional, arbitrary number of arguments
– def sum_args(*numbers):
      return sum(numbers)
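Running the function shows the effect warned about above: the default list is created once, when def executes, so it keeps growing across calls. The usual fix is a None sentinel; f_fixed is a made-up name for that variant.

```python
def f(x, y=[]):
    y.append(x)
    return y


first = f(1)       # returns the shared default list, currently [1]
second = f(2)      # appends to that SAME list, now [1, 2]
print(first is second)   # True
print(second)            # [1, 2]


def f_fixed(x, y=None):
    """Build a fresh list on every call that does not pass one in."""
    if y is None:
        y = []
    y.append(x)
    return y


print([f_fixed(1), f_fixed(2)])   # [[1], [2]]


def sum_args(*numbers):
    # numbers arrives as a tuple holding every positional argument
    return sum(numbers)


print(sum_args(1, 2, 3))          # 6
```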
Parameters
>>> def a(t, m=None, *args):
... print 'hi'
...
>>>
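To see how a signature like a(t, m=None, *args) binds arguments, here is a hypothetical variant b that returns what each parameter received instead of printing 'hi'.

```python
def b(t, m=None, *args):
    """Like a() above, but return what each parameter was bound to."""
    return t, m, args


print(b(1))             # (1, None, ())      m falls back to its default
print(b(1, 2))          # (1, 2, ())
print(b(1, 2, 3, 4))    # (1, 2, (3, 4))     extra positionals go into args
print(b(1, m=5))        # (1, 5, ())         m given by keyword
print(b(t=0))           # (0, None, ())      t given by keyword

# t has no default, so calling with no arguments fails
try:
    b()
    missing_t = False
except TypeError:
    missing_t = True
```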
Docstrings
• An attribute of functions
• Documentation string (docstring)
• If the first statement in a function is a
string literal, that literal becomes the
docstring
• Usually docstrings span multiple physical
lines, so triple-quotes are used
• For function f, docstring may be accessed
or reassigned using f.__doc__
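A small sketch of reading and reassigning a docstring, reusing sum_of_squares from earlier; no_doc is a made-up function without one.

```python
def sum_of_squares(x, y):
    """Return x**2 + y**2.

    Both arguments may be ints or floats.
    """
    return x**2 + y**2


def no_doc():
    return 0    # first statement is not a string literal


# The first statement was a string literal, so it became __doc__
first_line = sum_of_squares.__doc__.splitlines()[0]
print(first_line)              # Return x**2 + y**2.
print(no_doc.__doc__)          # None

# __doc__ is an ordinary attribute and can be reassigned
sum_of_squares.__doc__ = "Sum of the squares of two numbers."
print(sum_of_squares.__doc__)
```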
Sidenote on convention
• __SOMETHING__
– Have special meaning in python.
– In a class definition,
• __VAR (leading double underscore only) denotes a private variable. (But not really: the name is just mangled to _ClassName__VAR.)
• Truly private variables don't exist in Python.
• __init__, __cmp__, __iter__ etc, etc.
• In the module
– __name__
– __author__
– __init__.py
Idiom
#!/usr/bin/env python
if __name__ == "__main__":
    print 'hello world'
#Standalone program or a module?
The return Statement
• Allowed only inside a function body and
can optionally be followed by an
expression
• None is returned at the end of a function if
return is not specified or no expression is
specified after return
• Point on style: you should never write a
return without an expression
Calling a Function
• A function call is an expression with the syntax:
– function-object(arguments)
– function-object is usually the function name
– Arguments is a series of 0 or more comma-separated
expressions corresponding to the function’s parameters
– When the call is executed, the parameters are bound to the
argument values
• It is also possible to mention a function by omitting the ()
after the function-object
– This does not execute the function
• Exercise: write a function that implements intersection for two iterables
Examples
>>> def intersect1(a, b):
...     result = []
...     for item in a:
...         if item in b:
...             result.append(item)
...     return result
...
>>> intersect1(range(10), range(5, 20))
[5, 6, 7, 8, 9]
>>> def intersect2(a, b): # same function using list comprehensions
...     return [ x for x in a if x in b ]
...
>>> intersect2(range(10), range(5, 20))
[5, 6, 7, 8, 9]
>>> f = {'i1':intersect1, 'i2':intersect2}
>>> f['i1'](range(10), range(5, 15))
[5, 6, 7, 8, 9]
Modules and Imports
A Quick Note
Modules
• From this point on, we’re going to need to
use more and more modules
• A typical Python program consists of
several source files, each corresponding
to a module
• Code and data are grouped together for
reuse
• Modules are normally independent of one
another for ease of reuse
Imports
• Any Python source file can be used as a module by
executing an import statement
• Suppose you wish to import a source file named MyModule.py; the syntax is:
– import MyModule as Alias
– The Alias is optional
• Suppose MyModule.py contains a function called f()
– To access f(), simply execute MyModule.f()
– If Alias was specified, execute Alias.f()
• Stylistically, imports in a script file are usually executed
at the start (after the shebang, etc.)
• Modules will be discussed in greater detail when we talk
about libraries
Imports
• import sys
• from sys import *
• from sys import path
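The first and third forms can be checked directly; from sys import * is left out of the sketch because it dumps every public name into your namespace and, in Python 3, is only allowed at module level. The alias name system is illustrative.

```python
import sys                   # names accessed as sys.path, sys.argv, ...
import sys as system         # the same module object under an alias
from sys import path         # bind just one name into this namespace

print(system is sys)         # True: a module is loaded once, then cached
print(path is sys.path)      # True: both names refer to the same list
print('sys' in sys.modules)  # True: sys.modules is that cache
```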
Command-Line Arguments
• Arguments passed in through the command line
(the console)
– $> ./script.py arg1 arg2 …
– Note the separation by spaces
– An argument containing spaces can usually be wrapped in quotes
• The sys module is required
– import sys
• Command-line arguments are stored in
sys.argv, which is just a list
– Recall argv, argc in C; there is no need for argc here, it is simply len(sys.argv)
Example
#! /usr/bin/env python
# multiply.py
import sys
product = 1
for arg in sys.argv:
    print arg, type(arg) # all arguments stored as strings
for i in range(1, len(sys.argv)):
    product *= int(sys.argv[i])
print product
Command-Line Arguments
• Don’t reinvent the wheel
• Use getopt or argparse
• In fact, you should have your own argument
parsing magic.
• A python program that generates getopt /
argparse code based on a config file?
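A minimal argparse version of multiply.py (argparse ships with Python 2.7 and 3.2+). The -v/--verbose flag is an added illustration. Passing an explicit list to parse_args() lets you try it without a shell; with no argument it reads sys.argv[1:].

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Multiply integers.")
    # nargs="+" collects one or more values; type=int converts each one
    parser.add_argument("numbers", nargs="+", type=int,
                        help="integers to multiply")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="echo the parsed numbers")
    return parser


def multiply(argv=None):
    args = build_parser().parse_args(argv)
    if args.verbose:
        print(args.numbers)
    product = 1
    for n in args.numbers:
        product *= n
    return product


print(multiply(["2", "3", "7"]))   # 42
```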
File I/O
File Objects
• Built in type in Python: file
• You can read / write data to a file
• File can be created using the open function
– E.g. open(filename, mode='r', buffering=-1)
– A file object is returned
– filename is a string containing the path to a file
– mode denotes how the file is to be opened or created
– buffering denotes the buffer size requested for opening the file; a negative value results in the operating system's default being used
– mode and buffering are optional; the default behavior is read-only
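Since Python 2.6, the with statement closes a file automatically, even if an exception occurs. A sketch that writes, appends and reads back a file; the path comes from the tempfile module and is not part of the slides.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# 'w' creates or truncates; the file is closed when the block exits
with open(path, "w") as f:
    f.write("first line\n")
    f.write("second line\n")

# 'a' appends to the end of the existing file
with open(path, "a") as f:
    f.write("third line\n")

# 'r' is the default mode; iterating over a file yields its lines
with open(path) as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
print(f.closed)    # True: closed automatically
```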
Idiom
• lines = open(filename, 'r')
• lines = open(filename, 'r').read()
• lines = open(filename, 'r').readlines()
• lines = map(lambda x: x.strip(),
      open(filename, 'r').readlines())
• lines = map(lambda x: x.strip().split(','),
      open(filename, 'r').readlines())
#The last one is pushing it. Poor form!
File Modes
• Modes are strings, can be:
– 'r': file must exist, opened in read-only mode
– 'w': file opened for write-only, file is overwritten if it already exists, created if not
– 'a': file opened in write-only mode, data is appended if the file exists, file created otherwise
– 'r+': file must exist, opened for reading and writing
– 'w+': file opened for reading and writing, created if it does not exist, overwritten if it does
– 'a+': file opened for reading and writing, data is appended, file is created if it does not exist
• Binary and text modes
– The mode string can also have 'b' or 't' as a suffix for binary or text mode
– Text mode is default
– On Unix, there is no difference between opening in these modes
– On Windows, text mode translates '\r\n' to '\n' when reading and '\n' back to '\r\n' when writing, so binary files must be opened with 'b'
Tip on working with binary files
• The struct module is great!
• http://docs.python.org/library/struct.html
• Read from file, unpack, process, pack, back to file!
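A sketch of that read/unpack/pack cycle. The record layout (an unsigned int, a short and a double) and the file name are made up for illustration; struct.Struct precompiles the format string.

```python
import os
import struct
import tempfile

# "<" = little-endian with no padding, "I" = 4-byte unsigned int,
# "h" = 2-byte signed short, "d" = 8-byte double
RECORD = struct.Struct("<Ihd")

path = os.path.join(tempfile.mkdtemp(), "records.bin")

# pack Python values into bytes and write them in binary mode
with open(path, "wb") as f:
    f.write(RECORD.pack(1, -5, 2.5))
    f.write(RECORD.pack(2, 7, 0.125))

# read fixed-size chunks back and unpack each into a tuple
records = []
with open(path, "rb") as f:
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        records.append(RECORD.unpack(chunk))

print(RECORD.size)   # 14
print(records)       # [(1, -5, 2.5), (2, 7, 0.125)]
```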
File Methods and Attributes
• Assume file object f
• Common methods
– f.close(), f.read(), f.readline(), f.readlines(),
f.write(), f.writelines(), f.seek()
• Common attributes
– f.closed, f.mode, f.name, f.newlines
Reading from File
#! /usr/bin/env python
# fileread.py
f = open('sample.txt', 'r')
# Read entire file into list.
lines = f.readlines()
# Go through file, line by line.
line_number = 1
# It is also acceptable to iterate directly through f
# (try it, but remember to remove f.readlines() first)
for line in lines:
    print line_number, ":", line.strip()
    line_number += 1
# Closing can be done automatically by the garbage collector,
# but it is innocuous to call, and is often cleaner to do so
# explicitly.
f.close()
Writing to File
#! /usr/bin/env python
# filewrite.py
f = open('sample2.txt', 'w')
for i in range(99, -1, -1):
    # Notice how file objects have a readline() method, but not
    # a writeline() method, only a writelines() method which writes
    # lines from list.
    if i > 1 or i == 0:
        f.write('%d bottles of beer on the wall\n' % i)
    else:
        f.write('%d bottle of beer on the wall\n' % i)
f.close()
Exercises
• Try rewriting the code from ‘Reading from
File’ using read() instead of readlines()
– Note: read() returns the empty string when
EOF is reached
• Try rewriting the code from ‘Writing to
Files’ using writelines() instead of write()
CSV
• Comma-separated values
• Common file format, great for spreadsheets
• Basically:
– Each record (think row in spreadsheets) is terminated by a line break,
BUT
– Line breaks may be embedded (and be part of fields)
– Fields separated by commas
– Fields containing commas must be encapsulated by double-quotes
– Fields with embedded double-quotes must replace those with two
double-quotes, and encapsulate the field with double quotes
– Fields with trailing or leading whitespace must be encapsulated in
double quotes
– Fields with line breaks must be encapsulated with double quotes
– First record may be column names of each field
CSV (cont’d)
• Format is conceptually simple
• Sample CSV file:
– Name,Age,GPA
Alice,22,3.7
Bob,24,3.9
Eve,21,4.0
• Discussion: do you think a CSV reader or writer would be easy to implement?
• Exercise: how would you encode the following field:
– Hello, there
"bob"!
Reading a CSV File
• Don’t reinvent the wheel
• http://docs.python.org/library/csv.html
Reading a CSV File
• Use the csv module
• Create a reader:
– csv_reader = csv.reader(csvfile, dialect='excel', delimiter=',', quotechar='"')
– csvfile is an opened CSV file
• E.g. open('sample.csv', 'r')
– Dialect, delimiter, quotechar (format parameters) are
all optional
• Controls various aspects of the CSV file
• E.g. use a different delimiter other than a comma
• The reader is an iterator over the rows of the CSV file; each row is returned as a list of strings
Writing a CSV File
• Again, use the csv module
• Create a writer object
– csv_writer = csv.writer(csvfile, delimiter=',')
– Again, csvfile is a file object
– Format parameters allowed
• Common methods:
– .writerow(), .writerows()
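A round trip through an in-memory buffer shows the quoting rules at work. This sketch assumes Python 3 (Python 2's csv module works on byte strings and files opened with 'b'); io.StringIO stands in for a real file, newline='' follows the Python 3 csv documentation, and the last row is a made-up record containing the exercise field.

```python
import csv
import io

rows = [
    ["Name", "Age", "GPA"],
    ["Alice", "22", "3.7"],
    ["Bob", "24", "3.9"],
    ['Hello, there\n"bob"!', "30", "3.0"],   # comma, line break and quotes
]

buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter=",")
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# The writer quoted the tricky field and doubled its inner quotes;
# the reader undoes both
reader = csv.reader(io.StringIO(text, newline=""))
parsed = list(reader)
print(parsed == rows)   # True
```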
Pretty Config Files
• There is a module for that.
• http://docs.python.org/library/configparser.html
Pretty Config Files
[Section1]
foodir: %(dir)s/whatever
dir=frob
long: this value continues
   in the next line
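Parsing the sample above with the standard library. This sketch assumes Python 3, where the module is named configparser and read_string() exists (3.2+); in Python 2 the module is ConfigParser.

```python
import configparser

SAMPLE = """\
[Section1]
foodir: %(dir)s/whatever
dir=frob
long: this value continues
   in the next line
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# %(dir)s is interpolated from another option in the same section
foodir = config.get("Section1", "foodir")
# Indented lines continue the previous value, joined with a newline
long_value = config.get("Section1", "long")
print(foodir)      # frob/whatever
print(long_value)
```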
JSON
JavaScript Object Notation
http://docs.python.org/library/json.html
• Encoding
• Decoding
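A minimal encode/decode round trip with the json module; the record is made-up data.

```python
import json

record = {"name": "Alice", "age": 22, "gpa": 3.7,
          "courses": ["W3101"], "ta": None}

# Encoding: Python objects -> JSON text (sort_keys gives a stable key order)
text = json.dumps(record, sort_keys=True)
print(text)

# Decoding: JSON text -> Python objects (null <-> None, array <-> list)
decoded = json.loads(text)
print(decoded == record)   # True
```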
YAML
Yet Another Markup Language
http://pyyaml.org/
Good for config files
Can store structured data
Can be used for object persistence
ORM (Object-Relational Mapping) is a better way to go.
Check out: http://www.sqlalchemy.org/
We still have time.
Let's abuse python.
• Interesting modules: dis, gc
• Disassemble python environment at runtime!
• Change whatever you like.
• Now, something more interesting. What else can you overwrite?
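A small taste of both ideas: dis prints the byte code of a function, and because module attributes are ordinary names, one can be rebound at runtime (monkey-patching). The replacement lambda is a made-up stand-in and the original is restored right away.

```python
import dis
import math


def add(a, b):
    return a + b


# Print the byte code instructions the compiler produced for add()
dis.dis(add)

# Rebind math.sqrt at runtime, call it, then restore the original
original_sqrt = math.sqrt
math.sqrt = lambda x: "patched!"
patched_result = math.sqrt(4)
math.sqrt = original_sqrt
restored_result = math.sqrt(4)
print(patched_result)    # patched!
print(restored_result)   # 2.0
```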
Fun with garbage collection
Something that comes up in programming interviews once in a while: the weak reference.
http://docs.python.org/library/weakref.html
Useful for large cache maps. If it’s collected, so be it. If it’s
not collected yet, great!
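A minimal cache sketch using weakref.WeakValueDictionary; BigObject and the "report" key are hypothetical. On CPython the entry disappears as soon as the last strong reference is deleted; the gc.collect() call covers interpreters that do not use reference counting.

```python
import gc
import weakref


class BigObject(object):
    """Stand-in for something expensive to build (hypothetical)."""
    def __init__(self, key):
        self.key = key


# Values are held weakly: the cache alone does not keep them alive
cache = weakref.WeakValueDictionary()

obj = BigObject("report")
cache["report"] = obj
print("report" in cache)    # True while a strong reference exists

del obj                     # drop the last strong reference
gc.collect()                # make sure the object has been collected
print("report" in cache)    # False: the entry vanished on its own
```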