Introduction to Python Programming
Copyright © 2010 by David A. Wallace
revision D of 07-Aug-2010
Contents
1. Where Python is similar to other languages
   a. Identifiers
   b. Operators
   c. Comments
   d. Constants
   e. Modular
2. Where Python is different from other languages
   a. Indentation defines block structure
   b. Interpreted with garbage collection
   c. Variables are not declared and type is contextual
   d. Scope
3. A slice of cheese
   a. What a Python program looks like
   b. Installing a Python development environment
4. Intrinsic types
   a. Integer
   b. Float
   c. Complex
   d. Boolean
   e. String
   f. List
   g. Tuple
   h. Set
   i. Dictionary
5. Language elements
   a. Identifiers
   b. Keywords
   c. Literals
   d. Operators
   e. Qualifiers and Modifiers
   f. Assignment is definition
   g. Expressions
   h. If / elif / else
   i. For
   j. While
   k. Break and continue
   l. Pass
   m. Print
   n. Exec
6. Functions
   a. Def(ining)
   b. Argument passing
   c. Return
   d. Returning values
   e. Scope
   f. Global
   g. Docstring
   h. Built-in functions
7. Objects
   a. Defining classes
   b. Inheritance
   c. Methods
   d. Attributes
   e. Initializer
   f. Scope
   g. Builtin Functions Useful for Classes
   h. Overloading
   i. Pseudo-private Variables
8. Exceptions
   a. The Builtin Exceptions
   b. Catching Exceptions: Try and Except Code Blocks
   c. Catching Exceptions: Else and Finally Code Blocks
   d. Throwing Exceptions: the Raise statement
9. File I/O
   a. The File Class
   b. Attributes and Methods of File
   c. Iteration using File
10. Modules
   a. Import Statement
   b. Namespace Considerations and Aliasing
   c. Use of a Module as a Program
Preface and Disclaimer
The author is still learning the Python language. Since this is about the
fifteenth programming language he has used (counting all assemblers as one),
the programming style used in the examples herein is probably not exactly
what an expert Python programmer would use. Nor does he always express
his algorithms using idiomatic Python. And there are undoubtedly many
subtle features of Python which this introductory class will not cover.
The author's most recent programming experience has been in C++ and Java.
When analogies between Python and some other language are appropriate,
these analogies will be to those two; the assumption being that since these are
the two most widely-used languages at the moment, the audience is most likely
familiar with one or the other. However, expertise in either of these languages is
not a requirement for this course.
All examples and code fragments have, however, been checked for correctness
using the Python 2.6 development environment for Windows.
1. Where Python is similar to other programming languages
Python is a procedural language, like pretty much every computer language since
FORTRAN and COBOL. It consists of statements, usually organized into functions and/or
classes. Statements usually specify operations applied to operands which may be constants
or variables. Python uses data types which would be familiar to anyone proficient in
high-level languages newer than -- say -- Pascal.
Identifiers in Python follow the same syntax as in C-like languages: they consist of a
letter or underscore followed by zero or more letters, digits or underscores. Identifiers are
case-sensitive.
Keywords in Python are reserved identifiers. Like all but a few languages (PL/1 springs to
mind), keywords are reserved in all contexts.
Operators in Python are similar to the operators in C, though not all of the C operators are
present in Python and a couple are spelled differently.
In Python, the constants for integer, float, string and boolean values follow the same syntax
as C++ (except that the string constants can be delimited with either single-quote/apostrophe
or double-quote characters and the boolean constants are spelled True and False).
The usual whitespace characters (space, tab, newline) are recognized and delimit tokens
(except when they appear in string literals). Line continuation in Python is performed by
ending the current line with a backslash character, as in C.
Comments in Python are defined as from the mesh character (#) to the end of the line.
Therefore, one may have right-comments or full-line comments but there is no specific way
to have a block of text commented save by beginning each line with mesh.
Python can be organized into modules and packages which may be included (imported) into
a program in whole or partially. This capability is similar to the capabilities which exist in
Java.
2. Where Python is different from other programming languages
Firstly, Python is very sensitive to the formatting of its lines. In Python, the indentation of
a statement defines block structure. Failure to indent or indenting by an incorrect distance
is an error -- and the error may either be caught as a syntax problem or it may be
syntactically acceptable but cause a logic bug.
Python is an interpreted language, like BASIC. This means that you will only be told about
the first syntax error in your code when you attempt to run it, where "first" is the first one
the interpreter encounters -- not necessarily the first one in statement-number order. If you
are only used to compiled languages (and that includes Java, even though it is technically
interpreted by the JVM), you may find this frustrating. Like Java, Python has a garbage
collector; there is no explicit way to allocate or free storage.
Variables in Python are defined from context. How they are assigned and from what will
determine their type. This includes arguments to functions and values returned from
functions. The one exception to this is the definition of an instance of a class.
The above idiosyncrasy implies a strange scope rule: global variables can be read within
functions, but an assignment to an identifier of the same name creates a new local variable
which hides the global. (There is a keyword which must be used to allow assignment to
globals, though.)
Additionally, class instance attributes have to be qualified even within methods of the class;
this is like having to use this for all attributes in a C++ class.
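Here is a small sketch of how that scope rule plays out in practice: a function can read a global without any declaration, an assignment inside a function creates a local of the same name, and the global keyword redirects the assignment to the global. The names counter, read_only, shadow and update are made up for this illustration, and the code is written so that it should run unchanged under both Python 2.6 and 3.x.

```python
counter = 10              # a global variable

def read_only():
    # Reading a global needs no declaration.
    return counter + 1

def shadow():
    # Assignment creates a new local named counter;
    # the global of the same name is left untouched.
    counter = 99
    return counter

def update():
    # The global keyword makes the assignment apply to the global.
    global counter
    counter = counter + 1
    return counter

# Evaluated left to right: the global is still 10 after shadow()
# and becomes 11 only after update().
results = (read_only(), shadow(), counter, update(), counter)
```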
3. A slice of cheese
Okay, I couldn't resist that. You see, Python (the name, not the language) comes from the
BBC TV comedy show Monty Python's Flying Circus and the official Python package
repository is called The Cheese Shop after one of the more famous sketches from that series.
Here's a simple Python program to print the first ten prime numbers:
primes = []
def is_prime(n):
    """
    Determine if the given value can be evenly divided
    by any of the previously-found prime numbers. If
    so, return False. Otherwise, the given number is
    a prime -- append it to the list of primes and
    return True.
    """
    global primes
    for k in range(len(primes)):
        if (n % primes[k] == 0):
            return False
    else:
        primes.append(n)
        return True
n = 2
while (len(primes) < 10):
    is_prime(n) # ignore return value; we don't need it.
    n = n + 1
for k in range(len(primes)):
    print str(primes[k])
Figure 3-1: Print the first ten prime numbers
This simple program demonstrates many of the charming [sic] features of the Python
language: global and local scope, extensible arrays, enumeration, indefinite looping,
function definition and simple console output with type conversion. The result of
running the above program is:
2
3
5
7
11
13
17
19
23
29
output to the console (stdout).
Okay. Now that we've seen Python in action, the next thing you probably want to do is get
your own Python development environment. The best place to start is probably
http://python.org/download -- this is where you can download Python packages for Windows
and Mac OS X. Just follow the instructions for the particular package you need.
I use the Windows package, myself. This includes an IDE called IDLE which has a
syntax-sensitive editor and a debugger.
Get the version 2.x package rather than the version 3.x one -- the 3.x Python is newer and
not-entirely-compatible with 2.x and most of the Python code currently in circulation is 2.x.
Oh yes, I should mention that the download is free.
4. Intrinsic types
Python has nine intrinsic data types: integer, float, complex, boolean, string, list, tuple, set
and dictionary. Let's examine each one in turn.
Integers are probably boring. They are the familiar twos-complement 32-bit binary
numbers that are the meat-and-potatoes of data. There are no surprises here. Nothing to
see... move along... move along... Oh, wait! Long integers can also be defined and are
limited only by available memory. So go ahead and count all the permutations of the
quantum states of every proton in the universe if you want.
Floats are almost equally boring. The intrinsic floats are represented in binary as
double-precision float values in the underlying implementation (usually the same IEEE
standard as Standard C uses these days). There is no such thing as a single-precision float;
all floats are equivalent to the Standard C double type.
Complex is a built-in type in Python. Complex quantities have both a real and an
imaginary part and both parts are floats. Having this type built into the language is
unusual, but very nice to have if you are working with analog electronic circuit problems, for
example.
Booleans can only have values of True or False. When printed, they have the values "True"
and "False", but are otherwise identical to integers having the values 1 and 0, respectively.
Strings are arrays of characters. They come in two flavors: ASCII and Unicode. Python
has no character data type; a string of length one serves the purpose instead. Strings may
be indexed by character position to retrieve individual characters or by slicing to retrieve
substrings. (Note that these are read-only operations; strings are immutable sequence
objects, so to assign new values to one or more characters within the string requires a
different method.) A string literal is more like a Pascal string than a C string, in that it has
a length attribute as opposed to containing a terminating character value such as '\0'.
Lists are one-dimensional arrays of objects. You can have a homogeneous list where all
elements are the same type of object or a heterogeneous list where the various elements are of
different types. Elements of lists are accessed as in C, using an index or by the slice
expression. The number of elements and even the type of each element in a list can be
changed at will during execution.
Tuples are multi-element objects which are always manipulated as a single entity. A tuple
can only have the number and type of elements it was defined with. Tuples are immutable:
once the tuple is defined you cannot assign a new value to any of its elements, nor can you
change the number of elements.
Sets are unordered collections in which no two elements can have the same value. You
can perform operations like union and intersection on sets. (The set type itself is mutable;
the related frozenset type is immutable.)
Dictionaries are arrays of key / value pairs where key and value can both be any type but the
dictionary will always contain only unique keys. The value is retrieved, set or changed by
addressing the dictionary with a key. Dictionary keys are immutable objects. Dictionary
values are mutable.
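A short sketch of how a few of these types behave may help. All of the variable names here (point, evens, small, ages and so on) are invented for the illustration, and the code avoids print so that it should run under both Python 2.6 and 3.x.

```python
# A tuple rejects assignment to one of its elements.
point = (3, 4)
tuple_is_immutable = False
try:
    point[0] = 5
except TypeError:
    tuple_is_immutable = True

# Sets hold unique values and support union and intersection.
evens = set([0, 2, 4, 6])
small = set([0, 1, 2, 3])
common = evens & small        # intersection
either = evens | small        # union
evens.add(8)                  # a set can grow after it is created

# Dictionary values are set or changed by addressing with a key.
ages = {'Ann': 31}
ages['Bob'] = 27              # new key: a pair is added
ages['Ann'] = 32              # existing key: the value is changed
```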
5. Language elements
Identifiers in Python are character sequences of unlimited length which begin with either an
underscore or a letter and optionally may contain underscores, letters or digits. Identifiers
are case-sensitive. By convention, identifiers which begin and end with underscore are
reserved for the Python implementation and identifiers which begin with two consecutive
underscores are private to the class in which they appear. Your particular Python
implementation may or may not enforce these conventions.
Python reserves a set of identifiers as keywords. These are:
and       continue  except    global    lambda    raise     yield
as        def       exec      if        not       return
assert    del       finally   import    or        try
break     elif      for       in        pass      while
class     else      from      is        print     with
Table 5-1: Python keywords
Because these keywords are reserved, you may not use an identifier of the same spelling for
your own purposes.
Python allows you to define constants for creating values in many of the intrinsic types:
integer, float, string and complex. Here are examples of integer literals:
1
0177
0
123
12345678
1L
32768
0o177
0x7F
0b0110
0x3FFFFFFFFFL
162534173946247485950607867563534363785959676
Float literals can be specified either in float or exponent notation:
1.023
0.4
6.02e23
2.99792e8
8.98755430563e20
1073.2
7.07e-1
3.1415926
.707
0.0
Note that an integer can also be interpreted as a float literal: 12 and 12.0 are the same float
literal when used in a context where a float value is required. (Warning: assignment is not
such a context; x=1 defines x as an integer. But x = 3.7652 / 14 would define x as a
float.)
Imaginary literals are used to define complex numbers. An imaginary literal is a complex
number whose real part is 0.0. Imaginary literals are just float literals with a suffixed j or J:
1.023j
0.4J
String literals come in four flavors. The "vanilla" string literal encodes ASCII-7 character
sequences and follows the backslash escape sequence conventions of C. The other flavors are
formed by preceding the leading string delimiter with a type code prefix. If you are
familiar with standard C, the ASCII-7 string literals are pretty much the same except that
there are four possible delimiters you may choose from: the apostrophe ('), the double-quote
("), three successive apostrophes (''') or three successive double-quotes ("""). Whichever
delimiter you start with determines the one you must end with; embedded characters which
aren't an exact match for the initial delimiter need not be escaped. Here are some examples:
''     # empty string delimited with apostrophe
"'"    # an apostrophe, delimited with double-quotes
"\n"   # newline
""" This string's using three " characters to avoid needing
escapes"""
The backslash escape conventions are nearly the same as Standard C:
\a     # alarm (bell)
\b     # backspace
\f     # form-feed
\n     # newline (line-feed)
\r     # carriage return
\t     # horizontal tab
\v     # vertical tab
\'     # apostrophe
\"     # double-quote
\\     # backslash
\123   # the character whose value is octal 123
       # (use 1 to 3 digits)
\x7B   # the character whose value is hexadecimal 7B
       # (always use 2 digits)
Table 5-2: String literal escape sequences
Unlike Standard C, a backslash followed by an unrecognized character results in the
backslash and the character which follows being left in the string. (I.e.: 'a\pe' results in
a\pe, not ape.)
Unicode string literals are similar to the vanilla string literals except that the character set is
Unicode rather than ASCII. The type code prefix for Unicode strings is the letter u (either
lower- or upper-case). The escape convention in Unicode strings is a bit richer than in
ASCII-7 strings, in that the escape sequence \u1234 encodes the 16-bit Unicode character
whose value is hexadecimal 1234 and \U12345678 encodes the 32-bit character whose
value is hexadecimal 12345678.
Raw string literals are byte sequences. The escape conventions do not apply. The type
prefix for raw strings is the letter r (either lower- or upper-case). The raw string can encode
any character sequence except one which ends in an odd number of backslashes. (IMO, this
is an implementation bug.)
Unicode raw strings are also possible. The type prefix for them is the two letters ur (either
case, but in that order). Again, backslash escapes are not applicable but the character set is
Unicode.
String literals can be implicitly concatenated by juxtaposition or being separated only by
white-space, as in C++.
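To make the string-literal rules a little more concrete, here is a small sketch. The variable names are my own, and the code should behave the same under Python 2.6 and 3.x.

```python
plain = 'a\tb'       # the escape is processed: a, tab, b
raw = r'a\tb'        # raw string: the backslash and the t are both kept
lengths = (len(plain), len(raw))

quote = "'"          # an apostrophe needs no escape inside double-quotes

# Adjacent literals are concatenated, whatever delimiters they use.
joined = 'abc' "def" 'ghi'

# Triple-quoted strings may contain both kinds of quote unescaped.
block = """She said "it's fine" without escapes"""
```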
Python implements several predefined constants. While these are not precisely keywords,
you should probably consider them as such and never assign anything to them:
None
NotImplemented
Ellipsis
True
False
Table 5-3: Predefined constants
Python uses a rich set of operators which are similar to the set used by C-like languages.
They are:
+      add (also string concatenation and unary plus)
-      subtract (and unary minus)
*      multiply
/      divide
**     exponentiate
//     integer quotient (floor) division
%      remainder of division (modulo)
<<     left shift
>>     right shift
&      bitwise AND
|      bitwise OR
^      bitwise XOR
~      unary bitwise complement
<      less than compare
>      greater than compare
<=     less or equal compare
>=     greater or equal compare
==     equal compare
!=     unequal compare

=      assignment
+=     assign by adding ( a += b and a = a + b are equivalent)
-=     assign by subtracting
*=     assign by multiplying
/=     assign by dividing
**=    assign by exponentiation
//=    assign by integer quotient division
%=     assign by modulo
<<=    assign by left shift
>>=    assign by arithmetic right shift
&=     assign by AND
|=     assign by OR
^=     assign by XOR
Table 5-4: Operators
There are also delimiters and qualifiers:
[]     list element indexing and grouping
()     many uses, mostly as in standard C
{}     groups dictionary element list
:      many uses
.      qualifies a reference
,      separates elements in a list of tokens
`      delimiter for test_list
@      "decorator" mark
;      separates multiple statements on the same line
Table 5-5: Delimiters and qualifiers
If you're keeping score, there are two characters left in the set of printable ASCII symbols for
which Python has not defined a use: the question mark and the dollar sign. These two
characters are illegal except within a string literal.
There is no variable declaration syntax in Python. To define a variable, you simply assign
an expression (or literal) to an identifier using the = assignment operator. This assignment
will determine the value which the identifier holds and also its type. If the identifier had
previously been assigned (where "previous" means earlier in the same scope, as we will
emphasize when we get to the topic of scope rules), the old value is replaced by the new one
and the identifier's type may also mutate. In plain English,
a = 1      # a is an integer having a value of 1
a = 1.41   # a is now a float having a value of 1.41
a = 'abc'  # a is now a string having a value of 'abc'
This madness is perfectly implementable because Python is not compiled and therefore
subsequent statements have no preconceptions about what a particular identifier's type is
and therefore can deal with any type for which the statement is still legal syntax. And given
that Python performs garbage collection, the "old" versions of the identifier (which can no
longer be retrieved) will eventually be returned to free memory. So this kind of code, while
it may become confusing and hard to maintain, is at least not terribly untidy.
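If you want to watch the type follow the assignment, a sketch like this one records the type name after each step. The list called history and the variable x are hypothetical names used only here; the code should run under both Python 2.6 and 3.x.

```python
history = []

a = 1
history.append(type(a).__name__)     # integer
a = 1.41
history.append(type(a).__name__)     # now a float
a = 'abc'
history.append(type(a).__name__)     # now a string

# The type comes from the expression on the right-hand side:
# a float operand makes the result of the division a float.
x = 3.7652 / 14
x_is_float = isinstance(x, float)
```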
Expressions in Python are very similar to expressions in Standard C, except that, because of the
"exotic" data types, there are some interesting expression forms whose result evaluates to one
or another of the exotic types. The exotic-form expressions result in list objects, tuples, sets
or dictionaries.
The simplest of these is the one which produces a tuple. This expression consists of a
comma-separated list of expressions surrounded by parentheses. (The strict grammar rule
allows for one expression followed by a comma within the parentheses. This forms a
special-case tuple called a singleton. Another allowed form is an empty pair of parentheses.
This creates a tuple with no elements.) The result is a tuple having as many elements as
there are expressions in the list and having values which are the results of each expression at
the time the tuple is constructed. The + operator is "overloaded" for tuples to implement
tuple concatenation: if a and b are tuples, a+b is a tuple containing all the elements of a
followed by all the elements of b. Individual elements of a tuple may be accessed using the
same syntax that C, C++ or Java uses to access an element of an array:
the_third_element_value = tuple[2]. But since a tuple is an invariant sequence,
it is not legal to attempt to assign a new value to an element of a tuple. In other words, a
reference to a tuple element cannot appear on the left side of an assignment statement.
A list object is produced by an expression consisting of a comma-separated list of expressions
surrounded by square brackets. (Again, the grammar allows for single-element and empty
list objects.) Concatenation of list objects is accomplished with the + operator as well. Like
tuples, an individual element is referenced using C subscripting notation, e.g.:
the_ninth_element_value = list[8]. Unlike tuples, however, the elements of a
list object may be the target of an assignment, e.g.: list1[15] =
new_value_for_element_16. As in C-like languages, the first element of a list has an
index of zero.
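The tuple and list forms described above can be sketched as follows. The names are invented for the example, and the code should run under both Python 2.6 and 3.x.

```python
empty = ()                  # a tuple with no elements
singleton = (42,)           # the trailing comma makes this a tuple
not_a_tuple = (42)          # without the comma this is just the integer 42

joined = (1, 2) + (3, 4)    # tuple concatenation
third = joined[2]           # indexing starts at zero

items = [10, 20] + [30]     # list concatenation
items[0] = 99               # a list element may be assigned to
```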
Elements of list objects (and characters of strings) can also be referenced using a slice
expression. This expression is described by the production:
optional_start_position : optional_end_position
Referencing a list or string this way returns a subset of the object consisting of all elements
from the start_position up to but not including the end_position. If the
start_position expression is omitted, the slice starts with element zero; if the
end_position expression is omitted, the slice extends through the last element of the list
(or the last character of the string).
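A few slices, as a sketch of the rule just described (start position included, end position excluded). The list and string used here are arbitrary examples of my own; the code should run under both Python 2.6 and 3.x.

```python
letters = ['a', 'b', 'c', 'd', 'e']

middle = letters[1:4]    # elements 1, 2 and 3
head = letters[:2]       # start omitted: begins at element zero
tail = letters[3:]       # end omitted: runs through the last element

word = 'cheese'
part = word[2:5]         # slicing works on strings too
```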
Dictionaries are produced by an expression consisting of a comma-separated list of key-value
pairs surrounded by curly braces ('{' and '}'). A key-value pair consists of an expression, a
colon character and a second expression. The first expression is the key; the second is the
value. To obtain the value in a dictionary, you provide the key and the dictionary
auto-magically finds the corresponding value. This means that dictionary keys must be
unique. Again, you can create an empty dictionary with an expression consisting of just the
braces. You can create a dictionary containing a single key-value pair by specifying the
element with or without a trailing comma.
Dictionary values are referenced using C-like subscripting notation where the value in the
brackets is one of the dictionary's keys. Values can be the target of an assignment,
e.g.:
answers['life, the universe and everything'] = 42
If the dictionary does not have a key with the same value as what appears in the brackets, a
new key and value is added to the dictionary; if the key already exists, the corresponding
value is altered. Warning: the key is compared by value, with all the usual type conversion
rules applied. So a key having a value of 2 will be matched with a lookup for 2, 2.0 or even
2+0j. Unlike tuples and lists, concatenation of dictionaries with + is not defined. (But
there is still a way to do concatenation, as we shall see later.)
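Here is a sketch of the key-matching behaviour, using a made-up dictionary called table. The last two lines show one standard way to combine dictionaries, the update method; whether that is the technique the later sections have in mind is my assumption. The code should run under both Python 2.6 and 3.x.

```python
answers = {}
answers['life, the universe and everything'] = 42   # key is new: pair added
answers['life, the universe and everything'] = 43   # key exists: value changed

table = {2: 'two'}
# An integer, a float and a complex of equal value all find the same key.
same = (table[2], table[2.0], table[2 + 0j])

table[2.0] = 'deux'          # matches the existing key 2, so nothing is added
size_after = len(table)

merged = dict(table)         # copy, so that table itself is not modified
merged.update({3: 'three'})  # add the pairs from another dictionary
```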
As in C, multiple assignment is possible: a = b = 42 defines both a and b as integers having
the value 42. Use this syntax with care when the value is a mutable object: while the above
assignment will do what you expect, an assignment of a = b = [42, 43, 44] does not create
separate copies of the three-integer list; a and b both refer to the same list object.
Which means that this code
a = b = [41, 42, 43]
b[1] = 76
c = b[1] - a[1]
does not produce the same result for c as this code
a = [41, 42, 43]
b = [41, 42, 43]
b[1] = 76
c = b[1] - a[1]
In the first fragment a is an "alias" for b, so the assignment to b[1] is also visible through a
and c will be zero; in the second fragment a and b are distinct lists and c will have a value
of 34. Best to avoid multiple assignment of mutable objects unless the aliasing is what you
actually want.
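A short sketch that makes the aliasing visible in the standard interpreter. The is operator reports whether two names refer to the same object, and list(a) is one ordinary way to get an independent copy. The names c_shared, c_separate and same_object are mine; the code should run under both Python 2.6 and 3.x.

```python
# Multiple assignment: both names are bound to one list object.
a = b = [41, 42, 43]
b[1] = 76
c_shared = b[1] - a[1]       # the change is visible through a as well
same_object = a is b

# An explicit copy gives two independent lists.
a = [41, 42, 43]
b = list(a)
b[1] = 76
c_separate = b[1] - a[1]
distinct = a is not b
```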
But programs are more than just calculations. We often need to make the computer
conditionally perform operations. This is often called control-of-flow. The simplest
control-of-flow construct is the if statement: this statement tests some condition and, if the
result of the test is True, executes some block of code. We can get fancy and define another
block of code to be executed instead.
The above paragraph introduces a new concept: a block. A code block is just a series of
statements indented to the next level. The block ends when one encounters a statement
indented to a previous level. Python has no begin / end keywords or operators, it simply
keeps track of the number of leading spaces a statement has.
The Python if statement looks like this:
if x == 0:
    # do something when x is zero
elif x > 5:
    # do something else if x is greater than 5
else:
    # do this when x is neither zero nor greater than 5
The else and elif clauses are optional. All elif clauses must precede the else clause if both are
present.
Nesting of if statements is permitted and because indentation determines the block structure,
the "dangling else" problem which exists in some languages (which if does the else close?) is
not an issue in Python.
if first_name == 'Jim':
    if last_name != "Jones":
        pass
    else:
        drink_koolaid()
elif first_name == "Mark":
    if last_name == """Knopfler""":
        play_guitar()
    elif last_name == "Twain":
        write_humor()
    else:
        pass
else:
    print "I don't know this person.\n"
Figure 5-1: Use of nested if statements
(I know there are several language elements we haven't discussed in the above code fragment.
We'll get to them eventually...)
If you are coming to Python from C or C-like languages, you will probably want to
parenthesize the logical expression. That's okay, but it's not necessary. Even if you use the
logical operators and, or and not in the expression, parentheses are only needed to override the
natural precedence rules of the language: not binds most tightly, then and, then or. So you
can make complex expressions like
if not x == 3 and y <= 5 or z == 0:
    # do what you need if both x is not 3 and
    # y is less than or equal to 5 or if z is zero
# which is not the same as
if not x == 3 and (y <= 5 or z == 0):
    # do what you need if x is not 3 and either
    # y is less than or equal to 5 or z is zero
# which is not the same as
if not (x == 3 and y <= 5) or z == 0:
    # do what you need if z is zero or unless x is 3 and
    # y is less than or equal to 5
# which is not the same as
if not (x == 3 and y <= 5 or z == 0):
    # do what you need only if z is not zero and it is not
    # the case that x is 3 while y is less than or equal to 5
Figure 5-2: Complex tests using logical operators
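If you would rather check the grouping than trust your memory of it, a sketch like the following compares the unparenthesized test with a fully parenthesized version over a handful of sample values. The function names first and explicit and the sample values are my own; the code should run under both Python 2.6 and 3.x.

```python
def first(x, y, z):
    # The unparenthesized form of the first test in Figure 5-2.
    return not x == 3 and y <= 5 or z == 0

def explicit(x, y, z):
    # The same test with the grouping spelled out:
    # not binds tighter than and, which binds tighter than or.
    return ((not (x == 3)) and (y <= 5)) or (z == 0)

# Try every combination of a few representative values.
samples = [(x, y, z) for x in (0, 3) for y in (0, 10) for z in (0, 1)]
agree = all(first(*s) == explicit(*s) for s in samples)
```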
Enough said about that. Using complex logical expressions or using nested if statements to
achieve a multiple test is a matter of taste. Some people like to sprinkle and, or and not
throughout their code; others never use the keywords at all. Most of us split the difference,
choosing whichever strategy seems clearer at the time.
The next control-of-flow statement is for. This is either the most insanely great or greatly
insane feature of Python. If you are used to almost any other language's for statement, this
one is going to look decidedly weird. Formally, the for statement is defined as:
for_statement ::= for target_list in expression_list :
block [else: block]
Perhaps the most common form is the simple definite iteration, which is spelled
for i in range(10):
    # do something 10 times while i increments
    # from zero through nine
The range(10) part uses the built-in range function (which we'll see again later) to create
the iteration -- in this case a list of the integers 0 through 9. So we could also have coded
that statement as
for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]:
though it would have been silly to do so when the interpreter can do a better job of getting
the list right than a human typist could. But this exposes the fact that the for statement's
"in a list" is a lot more powerful than the FORTRAN do or even the C-like languages' for.
Here's a for that's relatively easy to code in Python and would be very tough in any of the
C-like languages:
for i in [-5, -1, 0, 1, 5]:
But wait... There's more! Here's an example from a program that needed to extract data
from a table of constellations. The data structure consisted of a dictionary whose key was a
3-letter name and whose value was a list of coordinates in two dimensions. This was the pair
of for statements that got the coordinate pairs for the purpose of drawing lines:
constellations = { 'And': [ (0, 2, 4, 6),
                            (8, 10, 12, 14) ],
                   'Ori': [ (-1, -2, -3, -4),
                            (0, 0, 0, 0),
                            (5, 6, 7, 8, 9),
                            (10, 11, 12, 13) ]
                 }
# the above is shorter than the actual data structure
# and does not contain actual data values but it is
# correct in terms of structure.
for name, coords in constellations.iteritems():
    for line_seg in coords:
        # draw the line.
Figure 5-3: Draw constellation code fragment
Again, apologies for using a built-in (in this case the dictionary method iteritems) without prior
explanation. Suffice to say iteritems yields each key and value pair from a
dictionary, in an arbitrary order; sort the keys yourself if you need a particular order. The
iteritems method allows you to "walk" the dictionary without knowing how many
key-value pairs it contains or even which keys are defined. The way I use the second for
statement lets me walk a list without knowing its length. (Note also the use of tuples to
ensure that we have a single object which completely describes the line to be drawn.)
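Here is a runnable variation on Figure 5-3 which collects the line segments instead of drawing them. Two things are my own substitutions: the items method is used in place of iteritems so that the sketch runs under both Python 2.6 and 3.x, and sorted is applied so that the keys come out in a predictable order. The segments list is a stand-in for the drawing step.

```python
constellations = {'Ori': [(-1, -2, -3, -4), (0, 0, 0, 0)],
                  'And': [(0, 2, 4, 6), (8, 10, 12, 14)]}

segments = []
# sorted() orders the (key, value) pairs by key; without it the
# order in which the dictionary is walked should not be relied upon.
for name, coords in sorted(constellations.items()):
    for line_seg in coords:
        segments.append((name, line_seg))   # stand-in for "draw the line"
```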
The for statement can take an else clause; this clause is executed when the enumeration list is
exhausted. We saw this usage in Figure 3-1.
The next control-of-flow statement is while. This is like the while statement of C-like
languages:
while x < 12:
    # do something (which had better eventually make x >= 12.)
The test of the while statement is evaluated. So long as the result of the evaluation is True,
the subsequent code block is executed. When the test fails, the loop is broken. The while
statement also can have an else clause. If the test is (or becomes) False, the else clause is
executed.
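A tiny sketch of a while loop with an else clause. The list called steps is a hypothetical name that simply records what happened; the code should run under both Python 2.6 and 3.x.

```python
x = 0
steps = []
while x < 3:
    steps.append(x)        # loop body: runs while the test is True
    x = x + 1
else:
    steps.append('done')   # runs once, when the test becomes False
```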
As in the C-like languages, Python has the break and continue statements. These
statements affect how loops are executed. When a break is encountered in the looping of a
for or while statement, the loop terminates immediately without executing any more
statements from within the loop's code block. Also, an else clause, if present, is not executed
when a loop terminates because of a break. A continue, on the other hand, causes control to
pass immediately back to the test of the while or the enumeration of the for. (The sensible
use of break and/or continue is as part of an if statement contained within the loop's code
block.)
Here's an example. This code examines the elements of a list of (presumably) integers and
prints the odd numbers, stopping at the first element which is zero:
for i in range(len(the_list)):
    if the_list[i] == 0:
        break
    if the_list[i] % 2 == 0:
        continue
    print the_list[i]
Figure 5-4: Use of break and continue
We could have added an else clause to the for statement to report that we had scanned the
whole list without finding a zero value if we wanted to.
(Again, I'm using a built-in function, len, without having previously introduced it. The len
function returns the number of elements contained in a sequence. So
range(len(the_list)) creates an enumeration list that causes the variable i to go
from zero to the maximum legal index for the_list.)
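The variation with an else clause might look something like this sketch. It is wrapped in a function of my own invention, odd_values_before_zero, and collects the odd values in a list rather than printing them so that it should run under both Python 2.6 and 3.x. The second item of the result reports whether the whole list was scanned without finding a zero.

```python
def odd_values_before_zero(the_list):
    odds = []
    for value in the_list:
        if value == 0:
            break                 # stop at the first zero; else is skipped
        if value % 2 == 0:
            continue              # even: go straight to the next element
        odds.append(value)
    else:
        # Reached only when the list is exhausted without a break.
        return odds, True
    return odds, False

no_zero = odd_values_before_zero([1, 2, 3])
with_zero = odd_values_before_zero([1, 2, 0, 5])
```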
We've actually seen the pass statement already in one of the examples. This statement is a
"nop" -- it does nothing. The pass statement is needed whenever a construct requires a code
block but there is nothing which should be done. Typically, this happens when, as in the
example, an if statement should do nothing if the test is True but must do something when
the test is False and re-writing the test to the opposite sense is not desired. The pass
statement may also serve as a place-holder for a code block to be defined as the program
evolves.
In Python, the ability to output (to stdout, by default) is built into the language rather than
provided by a library routine, as in C. Output is performed with type conversion when
necessary so that the result is readable. The keyword that does this magic is predictable:
print. A print statement consists of the keyword followed by a comma-separated list of
expressions. Each expression is evaluated and the result converted to readable form and
output, with the various expressions in the list separated by a single space. A newline is
appended after the final expression unless the expression list ends with a comma.
(If you've been paying attention, you'll notice I've already used a few simple print
statements.)
The exec statement lets you execute an object as Python code, optionally defining its context
(environment). In its simplest form, exec takes a single expression (either a string or an
opened file) and passes this to the Python execution engine. The text is interpreted as
Python code and runs in the current scope (meaning that the executed code has access to the
same set of variables that it would have had if the code had been inserted in the program text
at the same point).
The more advanced forms of the exec statement allow one to map the references within the
executed code to the namespace of the program in which the exec statement appears -- this
means that when the exec runs code referring to x and y, the code may actually be using the
containing program's variables a and b. That allows the exec code to be re-used in multiple
contexts. Here's an example of just such a strategy:
person = 'Sam'
exec """print 'My name is', name""" in {'name' : person}
(There is another form of exec which takes a second dictionary to distinguish global variable
namespace translation from local variable namespace translation. But it's rarely necessary
to use this feature. Separate the second dictionary from the first with a comma.)
Having the exec statement lets one implement a simple Python demonstration environment
in Python: one would write a loop which would read a line of console input and then pass the
line as a string to an exec.
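The namespace-mapping idea can also be sketched with the parenthesized form of exec. As far as I know Python 2 accepts this as an alternative to the in syntax, and it is the only form Python 3 offers. The names namespace and greeting are made up for this example.

```python
person = 'Sam'
namespace = {'name': person}      # the executed code sees this dictionary as its globals

# The code string refers to name, which resolves to our variable person.
exec("greeting = 'My name is ' + name", namespace)

# Anything the executed code assigns lands in the dictionary, not in our scope.
print(namespace['greeting'])
```

Because the executed code's assignments end up in the dictionary, this is also a tidy way to get results back out of an exec.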
6. Functions
Okay, we've seen a lot of code fragments but we haven't seen much that looks like programs
yet. So let's start getting familiar with the parts of Python which make it more than a fancy
calculator. We'll begin by learning to define, call, pass parameters to and get results from
subroutines. Subroutines in Python are called functions and are introduced by the def
(which stands for define) keyword. A function must be in scope when it is called.
(Functions can contain function definitions, so it's possible for a function definition to go out
of scope.)
Because Python does not declare types, a function's definition does not specify the types of
the function's arguments nor the type of the function's return value. As with variables,
these are defined by the context in which the function is called. Formally, a function
definition is:
def funct_name (optional_param_list):
    code_block
The parameter list can be a C-like comma-separated list of identifiers or it may include
C++-like assignments to specify default values for arguments. As in C++, parameters with
default values must come after all mandatory parameters.
When a function is called, the list of argument expressions is evaluated and the values are
matched to the function's parameters by position. This means that all mandatory
parameters must be satisfied and if any parameter with a default value needs to receive a
different value then all parameters to the left of that one must also have been given explicit
values. In other words:
def a_func(arg1, arg2=0, arg3=1, arg4=2):
    print arg1, arg2, arg3, arg4

a_func()           # illegal -- arg1 is mandatory
a_func(7, ,9)      # illegal -- no way to skip an argument
a_func(7)          # valid
a_func(7, 0, 9)    # valid
Figure 6-1: Use of default argument values
However, Python allows you to call functions with keyword argument syntax instead of
simply by positional relationship. In this type of call, your argument list consists of
assignment expressions where the identifier which is the target of the assignment is spelled
the same as the formal parameter which is intended to receive the value. So the
problematical assignment of one default parameter without having to know the default
values of all the parameters to its left is resolved. We can use any of the following calls:
a_func(arg1=7, arg3=9)
a_func(7, arg3=9)
a_func(arg3=9, arg1=7)
to assign 7 to arg1 and 9 to arg3 without having to know the default value of arg2.
(Note that the second call mixes keyword and positional argument syntax; when you do that,
all positional arguments must come before any keyword arguments and the keyword syntax
must not attempt to describe the same argument as the positional syntax corresponds to.
It's probably best not to mix the argument style -- if you need to use keyword arguments at
all, use them for every parameter.)
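Here's a quick sketch that confirms the three calls really are equivalent. I've changed a_func to return its arguments as a tuple instead of printing them (my change, purely so the results can be compared); results is a name I made up.

```python
def a_func(arg1, arg2=0, arg3=1, arg4=2):
    # return instead of print so the calls can be compared
    return (arg1, arg2, arg3, arg4)

results = [
    a_func(arg1=7, arg3=9),     # all keywords
    a_func(7, arg3=9),          # positional first, then keyword
    a_func(arg3=9, arg1=7),     # keyword order does not matter
]
print(results)
```

In each case arg2 and arg4 quietly keep their defaults of 0 and 2.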
Default value expressions are evaluated once when the interpreter encounters the function
definition. Therefore, although the expressions are permitted to refer to other identifiers,
those identifiers must themselves have been assigned values before the def statement is
encountered and changes to the values of those identifiers subsequently will have no effect on
the default values. This makes it all the more urgent to use the keyword form of the
argument list -- even if you know the expression which defined the initial value of an
argument, the expression may not evaluate to the same value at the time of the call as it had
when the function's definition was encountered.
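A minimal sketch of that behavior, using a made-up function clamp and a made-up variable limit: the default for maximum is captured when the def statement runs, so changing limit afterwards has no effect unless the caller passes the new value explicitly.

```python
limit = 10

def clamp(value, maximum=limit):      # the default is fixed at 10 right here
    return min(value, maximum)

limit = 100                            # too late to change the default

frozen = clamp(50)                     # uses the default captured at def time
explicit = clamp(50, maximum=limit)    # keyword form passes the current value
print(frozen)
print(explicit)
```

The first call is clamped to 10 and the second to 100 (so 50 passes through unchanged).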
There are additional forms of the def statement's parameter list which allow for the
equivalent of the ellipsis (unknown number of parameters) found in the C-like languages.
These exotic forms will not be covered in this introductory class. If you find yourself
needing this capability, see the following URL:
http://docs.python.org/tutorial/controlflow.html#arbitrary-argument-lists
Function definitions may be prefixed by "decorators". We won't discuss them in this class,
either. (Since I haven't actually figured out what they’re useful for yet.)
Within the block of the function may be nearly any kind of statement, including function
definitions. Execution of the function terminates when the interpreter "runs off the end of
the block" (which is detected because the indentation level decrements) or when the
interpreter encounters a return statement. All functions return some value, but unless the
function terminates with a return that specifies an expression, the value returned is the
rather-less-than-useful defined constant None. Returning a useful value requires an explicit
return statement which specifies an expression. Example:
def the_answer(question):
    if (question == "life, the universe and everything"):
        return "Forty-two."
    else:
        return "Reply hazy, try again later."
Figure 6-2: Use of return statement
Having now seen that we can create functions, it's now time to introduce the topic of scope in
Python. It's essentially identical to the scope rules in C-like languages, except for the fact
that local identifiers are defined by assignment instead of declaration. A function may read
an identifier defined outside the function, but unlike in C-like languages, an assignment to
that name inside the function defines a new local variable and leaves the outer one untouched.
One can override this rule by using a global statement within the function's block. Any
identifier mentioned in the global statement (if it is defined, of course) becomes read-write
accessible to the function. And any assignment within the function's block makes a change
to the global identifier instead of defining a local variable. For example:
i = 3
def x():
    global i
    print i
    i = 5
    print i
print i
x()
print i
Figure 6-3: An example of the global statement
will result in:
3
3
5
5
as the variable i used within the scope of x() is the global variable i and not some local
variable. Also, unlike C++, an “inner” block (i.e.: a block that is not the body of a function
but rather the body of an if, for, or while) does not create a new scope, so:
def x():
    i = 1
    print i
    while (i == 1):
        i = 2
        print i
    print i
x()
Figure 6-4: Inner blocks do not alter the scope
results in the following output:
1
2
2
Note that references to functions (and, as we shall see, classes) that are defined in a scope
level "above" that of the function will resolve correctly without any global statement. So:
def x():
    i = 1
    print i
def y():
    x()
    i = 2
    print i
x()
y()
Figure 6-5: Scope of functions
results in the following output:
1
1
2
as expected and not an "uninitialized identifier" error.
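The following sketch pulls the scope rules together in one place. All the names (counter, read_it, shadow_it, rebind_it) are hypothetical. It shows three cases: a function that only reads a module-level name, a function whose assignment creates a local of the same name, and a function that uses global to rebind the module-level name itself.

```python
counter = 3

def read_it():
    return counter          # no assignment here, so the module-level name is read

def shadow_it():
    counter = 99            # assignment creates a local; the outer counter is untouched
    return counter

def rebind_it():
    global counter
    counter = 5             # now the module-level counter itself is changed

before = read_it()
local_value = shadow_it()
after_shadow = read_it()
rebind_it()
after_rebind = read_it()
print((before, local_value, after_shadow, after_rebind))
```

Only the call to rebind_it changes what read_it sees afterwards.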
Another feature of functions (and classes, as we shall see later) is that if the first statement in
the block is a string literal, the interpreter automatically assigns it to a special identifier
named __doc__ (recall I said earlier that identifiers that begin and end with underscore are
reserved for the implementation and that variables that begin with two underscores are by
convention private). Should you use the docstring feature, there are tools which will extract
the string and expect it to be the function / class description. And you may also get the
docstring yourself by using a qualified reference:
def x():
    """
    This is the docstring for function x(). Since x()
    doesn't do anything, the docstring of x() is
    pretty boring.
    """
    return
print x.__doc__
Figure 6-6: Obtaining and printing a docstring, qualified names
Notice the use of a qualified name in the print statement. This syntax is the same as in C++
or Java: the leftmost part is the highest-level containing object; the rightmost part is the
desired variable; any intervening parts are sub-containing objects in hierarchical order. The
qualifier symbol is the period character. If there were more functions, you could print each
docstring by just prefixing with the name of each of the functions in turn.
When we get to classes, we're going to wind up using qualified names a lot.
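As a small illustration of printing the docstring of each function in turn, here is a sketch with two made-up functions, area and perimeter. It also uses __name__, another implementation-reserved attribute that I haven't introduced above; it holds the name the function was defined with.

```python
def area(width, height):
    """Return the area of a rectangle."""
    return width * height

def perimeter(width, height):
    """Return the perimeter of a rectangle."""
    return 2 * (width + height)

summaries = []
for func in (area, perimeter):
    # func.__name__ and func.__doc__ are both qualified references
    summaries.append(func.__name__ + ': ' + func.__doc__)
print('\n'.join(summaries))
```

This is, in miniature, what the documentation-extraction tools do.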
Python has a plethora of builtin functions. There is no way I can take the time to cover
them all. But some are so commonly used that I must do more than simply list those. So
I'll first present a list of the builtin functions and then I'll pick a dozen or so to describe in
detail.
Unlike keywords, the names of builtin functions are not reserved. You are permitted to
define your own function named len() if you like, but when you do, the builtin function of
that name will become unavailable in that scope. Since len() is a very commonly-used
builtin, you probably don't really want to override that name. This is why I am going to list
all the builtin functions even though I won't be covering all of them in detail: it's better to
know the entire list of names to avoid. Anyway, here's the Python builtin function zoo:
abs           all           any           basestring
bin           bool          callable      chr
classmethod   cmp           compile       complex
delattr       dict          dir           divmod
enumerate     eval          execfile      file
filter        float         format        frozenset
getattr       globals       hasattr       hash
help          hex           id            input
int           isinstance    issubclass    iter
len           list          locals        long
map           max           min           next
object        oct           open          ord
pow           print         property      range
raw_input     reduce        reload        repr
reversed      round         set           setattr
slice         sorted        staticmethod  str
sum           super         tuple         type
unichr        unicode       vars          xrange
zip           __import__
Table 6-1: The builtin functions
Note: the builtin function print is new with Python 2.6 and you must take special action to
activate it as the print keyword normally "hides" this function’s name.
The abs function takes the absolute value of the single numeric argument it is given. The
numeric argument may be an integer, a long integer, a float or a complex type. If the
argument is a complex, the function returns the magnitude of the complex (square root of
the sum of the squares of the real and imaginary part).
The bool function casts its argument to a boolean value (True or False) according to the
following rules:
1. If no argument is passed, the function returns False.
2. If the argument is numerically equal to zero, the function returns False. If the
argument is non-zero, the function returns True.
3. If the argument is a string (any kind) and is empty, the function returns False. For
non-empty strings, it returns True.
4. If the argument is a complex and its magnitude is zero, the function returns False.
For complex values with non-zero magnitudes, the function returns True.
5. If the argument is a set, tuple, list or dictionary, the function returns False if the
object contains no elements and returns True for an object which is not empty.
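The rules above can be checked with a short sketch. The two lists are my own test values: one "empty" or zero value of each kind, and one non-empty or non-zero counterpart.

```python
empty_values = [0, 0.0, 0j, '', [], (), {}, set()]
full_values = [7, -0.5, 1j, 'no', [0], (0,), {'k': 0}, set([0])]

print(bool())                                   # no argument at all
print([bool(v) for v in empty_values])          # every one of these is False
print([bool(v) for v in full_values])           # every one of these is True
```

Notice that the string 'no' and the list [0] are both True: what matters is that the string is not empty and that the list has an element, not what they contain.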
The chr function returns the ASCII character corresponding to the function's integer
argument. The argument must have a value in the range 0 through 255.
The cmp function takes two arguments and performs a comparison between them. The
function returns an integer whose value is zero if the two arguments are equal, negative if the
first argument is less than the second and positive if the first argument is greater than the
second. (Comparisons of non-numeric values are performed lexicographically; if two lists are
of differing length, for example, and their elements are equal up to the point where the
shorter list is exhausted, the longer list is considered greater than the shorter list.) One
should not expect the integer being returned to make any kind of mathematical sense; use
only the fact that it is negative, zero or positive for program logic.
The complex function casts its arguments to a complex number, where the first argument
becomes the real part and the second becomes the imaginary part. If only one argument is
passed, the result is a complex with a zero imaginary part; if no arguments are passed, the
result is a complex with a magnitude of zero.
The divmod function returns a tuple consisting of the integer quotient of dividing the first
argument by the second and the remainder of the division. For integer arguments, the
result of x = divmod(a, b) is the same as the expression x = (a // b, a % b).
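A short sketch with a few sample pairs of my own choosing, including negative operands, where the floor-division behavior is easiest to get wrong:

```python
pairs = [(17, 5), (-17, 5), (17, -5)]

results = [divmod(a, b) for (a, b) in pairs]
# the same values computed the long way, for comparison
by_hand = [(a // b, a % b) for (a, b) in pairs]

print(results)
print(results == by_hand)
```

With a negative operand the quotient rounds toward negative infinity and the remainder takes the sign of the divisor.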
The enumerate function returns an iterator that yields tuples, each of which consists of the
ordinal and value of an element of the sequence which is the function's argument. That's a
mouthful, so I think a code example is necessary:
fruits = ['apples', 'bananas', 'cantaloupes', 'dates']
for (i, fruit) in enumerate(fruits):
    print i, fruit
Figure 6-7: Fruit salad
The above example will print
0 apples
1 bananas
2 cantaloupes
3 dates
(In Python 2.6, the enumerate built-in function was extended to accept an optional second
argument which specifies the starting index for the enumeration. If your Python
environment doesn't support this feature, you can always get the same effect by using a slice
operation.)
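Here's a sketch of the optional starting index. The fruit names come from the example above; numbered and shifted are names I made up. The second form is one simple way (my own suggestion) to get the same numbering on an interpreter that lacks the second argument.

```python
fruits = ['apples', 'bananas', 'cantaloupes', 'dates']

# Python 2.6 and later: start counting at 1 instead of 0
numbered = list(enumerate(fruits, 1))

# older interpreters: adjust the ordinal by hand
shifted = [(i + 1, fruit) for (i, fruit) in enumerate(fruits)]

print(numbered)
print(numbered == shifted)
```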
The execfile function provides the same capability as the exec statement except that
instead of executing the first argument (a string) as Python code, the function's first
argument is taken to be a pathname to a file which (presumably) contains Python code.
Like the exec statement, execfile can take optional arguments which will be the global
and local variables of the scope context in which the function is invoked.
The float function converts its argument (a string or a number) into a float value. If you
have used the C standard library function atof then the string case will look familiar.
The globals function returns a dictionary containing the global symbols from the symbol
table of the module that contains the call to globals. This function creates an object
which is suitable for use in an exec statement or as the second argument to an execfile
call, though to make use of this dictionary, the Python code in the string (or file) being
executed would have to be using the same names for the global variables as the caller. The
dictionary contains key / value pairs in the form 'name' : value where name is the identifier of
a global variable and value is what that variable was set to when the globals function was
called.
The hex function converts its integer (or long integer) argument to a string representing that
value as hexadecimal. The string begins with '0x' and uses lower-case letters for the
high-value digits.
The id function returns the unique integer value which is the interpreter's symbol table
target corresponding to the identifier which was given as the function's argument.
Remember that issue with multiple assignment? Well, the id function could have shown us
whether or not a = b = [1, 2, 3] was dangerous because we could have simply asked
to print id(a) == id(b). If the result was True, we had an "aliasing" situation; if the
result was False, the multiple assignment was safe.
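The aliasing check looks like this in practice. The variable c and the use of list() to make an independent copy are my additions for contrast.

```python
a = b = [1, 2, 3]
aliased = id(a) == id(b)    # both names refer to the very same list

b.append(4)                 # so a change made through b shows up through a

c = list(a)                 # list() builds a separate list with the same elements
copied = id(a) == id(c)

print(aliased)
print(a)
print(copied)
```

Here the multiple assignment was of the "dangerous" kind: a ends up with four elements even though only b was appended to.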
The input function reads a Python expression from stdin. You may optionally pass a
string as the function's argument. If you do, the string is output as a prompt. (You will
probably want to end the string with a space, as input begins in the column immediately
after the prompt.) This function is somewhat dangerous because whatever the user types is
evaluated as a Python expression. So your program may fail with a syntax error exception
if, for example, you want string input and the user forgets to provide proper string
delimiters. But for quick-and-dirty input from an expert user (like yourself -- you never
make mistakes, do you?) this function will at least allow interaction with a Python program.
But the raw_input builtin function (see below) is a safer choice for obtaining input from
humans.
The int function accepts as its argument a string which it will attempt to convert to an
integer. An optional second argument can specify the radix for conversion (default is
decimal, but any value from 2 through 36 is permitted). If the radix is zero (a special case),
Python determines the radix from the string's prefix, just as it would for an integer literal in
program text (e.g.: a string beginning with '0x' is taken to be hexadecimal).
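A few conversions, with sample strings of my own, show the radix argument at work. In the last one the radix is zero, so the 0x prefix in the string is what selects hexadecimal.

```python
values = [
    int('42'),          # default radix is decimal
    int('101', 2),      # binary
    int('ff', 16),      # hexadecimal
    int('z', 36),       # largest permitted radix: digits run 0-9 then a-z
    int('0x1f', 0),     # radix 0: the prefix decides
]
print(values)
```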
The len function, as we have seen in a couple of examples already, returns an integer which
is the number of elements in a sequence object such as a list or string.
The list function takes as its argument any sequence object and makes a list of its
elements. If given a string, for example, the result will be a list of one-character strings
corresponding (in order) to each character of the given string.
The locals function is similar to the globals function we saw already. The returned
value is a dictionary of the names and values of the currently-defined local variables.
The long function constructs a long integer from its argument(s), either a string or a
number. If the argument is a string, an optional second argument specifying the radix
integer can be given.
The max function returns the largest value of all values given by the function's argument list.
The arguments passed can be either multiple discrete values or a single argument which is a
sequence. (If the latter, each element of the sequence object is evaluated to determine the
largest.)
The min function returns the smallest value of all values given by the function's argument
list. The arguments passed can be either multiple discrete values or a single argument which
is a sequence. (If the latter, each element of the sequence object is evaluated to determine
the smallest.)
The oct function is similar to the hex function we have already seen. It accepts an integer
argument and returns a string that is the representation of that integer in base-8 (octal)
notation. In Python 2 the returned string has a leading "0" prefix (oct(8) returns '010');
Python 3 uses the "0o" prefix instead.
The open function is not entirely unlike the Standard C library's fopen. The function takes
a string which is interpreted as a pathname, a second string which specifies the I/O mode to
be applied to the file and an optional third argument which is an integer specifying the buffer
size. If the file exists (or can be created if the mode argument allows that), the open
function returns a file object. Possible values for the mode string are:
• 'r' - open for read, converting newlines to '\n' if required.
• 'rb' - open for reading binary (no conversion).
• 'r+' - open for reading and updating (allows random access to the file).
• 'r+b' - open for reading and updating in binary.
• 'w' - open for write, create if required else truncate and convert '\n' to the
platform-specific representation for a line ending.
• 'wb' - open for write, creating if required else truncate for binary.
• 'w+' - open for write and updating (truncate but then allow random access as the file
is written).
• 'w+b' - open for write and updating in binary.
• 'a' - open for write-append (does not truncate).
• 'ab' - open for write-append (does not truncate) for binary.
• 'a+' - open for write-append and random access.
• 'a+b' - open for write-append and random access for binary.
The buffer size argument is an integer value with two "magic numbers": A buffer size
of zero specifies unbuffered I/O and a buffer size of one specifies line-by-line I/O. Any other
positive value is the size of the I/O buffer in bytes. The maximum, minimum and default
buffer size values are implementation-specific.
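Here is a minimal sketch of three of the modes in action. It borrows the write, readlines and close methods of the file object, which belong to the File I/O topic, and it uses the standard tempfile and os modules to get a scratch file; the filename demo.txt is made up.

```python
import os
import tempfile

# a scratch file in a fresh temporary directory (the name is made up for the demo)
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

f = open(path, 'w')         # create (or truncate) for writing
f.write('first line\n')
f.close()

f = open(path, 'a')         # append: the existing content is kept
f.write('second line\n')
f.close()

f = open(path, 'r')         # read the text back
lines = f.readlines()
f.close()
print(lines)
```

Had the second open used 'w' instead of 'a', the first line would have been lost to truncation.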
The ord function takes a one-character-length string and returns the integer value (ordinal)
for that character. This is the inverse of what the chr function does.
The pow function raises its first argument to the power of its second argument. This is
similar to the ** exponentiation operator. But pow can take an optional third argument
which results in a calculation equivalent to the expression a ** b % c but the
implementation is much faster.
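A quick sketch with numbers I picked arbitrarily shows that the two forms agree:

```python
base, exponent, modulus = 7, 128, 13

slow = base ** exponent % modulus       # builds the huge power first, then reduces it
fast = pow(base, exponent, modulus)     # three-argument form of pow

print((slow, fast))
```

With an exponent this small you won't notice the speed difference, but with exponents of hundreds of digits (as in cryptographic arithmetic) the first form becomes impractical.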
The range function has been used in several examples already. This function takes a single
integer argument and returns a list containing all the values from zero to one less than the
argument's value. But it can also take two integer arguments. In that case, the list
contains all the values from the first argument's value to the second's minus one. So the list
need not start at zero. But wait, there's more! The range function can take a third
integer argument which specifies the step increment, so the values in the list are not required
to differ by +1. (Of course a step of zero is nonsense, so it's not allowed.) This function's
most important use is to describe the iteration list of a for statement.
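The three forms side by side, with arguments of my own choosing. I've wrapped each call in list() so the sketch also prints actual lists on Python 3, where range no longer returns a list directly; on Python 2 the wrapper is harmless.

```python
counting = list(range(5))           # zero up to one less than the argument
offset = list(range(2, 6))          # starts at 2, stops before 6
stepping = list(range(10, 0, -3))   # a negative step counts down

print(counting)
print(offset)
print(stepping)
```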
The raw_input function reads from the console (stdin) but does not attempt to process the
input as a Python expression. Like input the raw_input function can take an optional
argument which is the prompt string. The function reads from stdin until a newline is
encountered and returns a string value which is all characters up to but not including the
newline.
The repr function returns the string which is the printable representation of the object
which was passed as its argument. This is usually the same as what the user would see if the
print statement were to print the object.
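Strings are the most visible case where repr and a plain conversion with str part company, as this sketch (with a sample string of my own) shows. The repr result includes the quote marks, which is what lets Python read it back in with eval.

```python
text = 'hello'

as_str = str(text)      # just the characters
as_repr = repr(text)    # the characters plus the quotes a literal would need

print(as_str)
print(as_repr)
print(eval(as_repr) == text)
```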
The reversed function accepts a sequence object as its argument and returns an iterator
which yields the elements of the argument in reverse order. Pass that iterator to list or
tuple if you need an actual sequence object; the original sequence is left unchanged.
The round function takes two arguments: a float and an optional positive integer. Its
purpose is to return a float whose precision is specified by the integer and whose value has
been rounded (away from zero) to satisfy that precision. If the precision argument is not
supplied, the function will assume zero. Note that the rounding algorithm for cases where
the value to the right of the lowest retained digit is exactly 1/2 is to always round the
magnitude up, irrespective of whether the digit to be affected is even or odd. This may not
be correct according to the procedure that kids are currently taught in grammar school.
The set function returns a set constructed from its argument, which may be any kind of
sequence object. Recall that a set cannot contain duplicate elements; use of this function
will eliminate duplicates in, for example, a list. So unique_x = list(set(x)) would
create a new list containing only the unique elements of the provided list.
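One caution, illustrated with a sample list of my own: a set has no defined order, so the list you get back from list(set(x)) may not be in the order of the original. If the order matters, sorted is a simple remedy.

```python
x = [3, 1, 3, 2, 1]

unique_x = list(set(x))             # duplicates are gone; order is not guaranteed
ordered_unique = sorted(set(x))     # sorted() gives a predictable order

print(len(unique_x))
print(ordered_unique)
```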
The sorted function takes a sequence object and returns a new list containing its elements
arranged in ascending order. But the function can take three additional,
optional arguments. The first of these, cmp, specifies a custom comparison function (pass
None if the standard comparison works for you). The second, key, specifies a
function that translates each element of the sequence into some value which should be used
instead for comparison purposes (again, None specifies no translation required). The final
argument, reverse, is a boolean value which if set to True causes the function to sort in
descending order instead of the default ascending sort process. Note that using key and/or
reverse is less expensive than using cmp since the cmp function is called multiple times per
element whereas key need only be called once and reverse simply changes how the return value
of the comparison is interpreted.
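A sketch of key and reverse, using a made-up list of words. I've passed them by keyword and left cmp out altogether, partly because of the cost noted above and partly because Python 3 dropped the cmp argument, so this form runs on either version.

```python
words = ['pear', 'Fig', 'banana', 'kiwi']

plain = sorted(words)                       # upper-case letters sort before lower-case
by_length = sorted(words, key=len)          # compare lengths instead of the strings
ignoring_case = sorted(words, key=str.lower, reverse=True)

print(plain)
print(by_length)
print(ignoring_case)
```

The sort is stable, so 'pear' stays ahead of 'kiwi' in by_length: they have equal lengths and 'pear' came first in the original.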
The str function converts its argument into a string. This function usually returns the same thing
as repr would have. But occasionally the returned strings will differ. The repr function
returns a string which Python could evaluate; the str function will return a string which
"looks good", even if it cannot be evaluated.
The sum function takes a sequence of numeric elements and an optional start value (default
is zero) and returns the start value plus the sum of all the elements. The second argument
is not an index; to add up only part of a list, slice the list first.
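A short sketch with sample numbers of my own shows what the optional second argument of sum actually does, and how to total only part of a list:

```python
numbers = [1, 2, 3, 4]

total = sum(numbers)                # 1 + 2 + 3 + 4
offset_total = sum(numbers, 10)     # the same sum added to a starting value of 10
tail_total = sum(numbers[2:])       # slice first to total only the last two elements

print((total, offset_total, tail_total))
```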
The tuple function returns a tuple which is made from the elements of its argument (a
sequence object). The elements of the tuple will be in the same order as the elements of the
argument.
The unichr function returns the Unicode character corresponding to the value of the
function's numeric argument.
The unicode function, in its most basic form (one argument) converts its argument to a
Unicode string in much the same way as str converts its argument into an ASCII string.
The optional second and third arguments allow you to specify the encoding and what to do in
event of errors.
7. Objects
Python is an OOD-capable language. Like C++ or Java, it allows one to define custom
objects called classes. To define a class is to define a new data type and to describe that
type's behaviors and capabilities. Python allows for inheritance and permits one to define
both class variables and instance variables. But Python classes are not very good at data
hiding since Python does not have a concept of privacy -- all class variables and methods are
public.
A class is defined with the class keyword, followed by an identifier, then optionally the
class's inheritance specifier and the statement ends with a colon (which means of course that
the statement introduces a block). The inheritance specifier consists of a pair of parentheses
surrounding a comma-separated list of expressions (usually simply identifiers) which are the
base class(es) from which this class inherits. Yes, Python supports multiple inheritance.
Class definitions closely parallel function definitions in that they introduce a new scope
range. But a class keyword is a marker for an executable statement -- all the statements
within the class but outside of contained function definitions (methods) are evaluated as the
class's namespace is created. This includes the expressions in the inheritance specifier and the
assignments which create the class variables. The class's namespace at this point is the list of
identifiers of methods and class variables. Note that the code of the class's methods is not
executed when the class is defined.
So to define a class variable one places an assignment statement inside the class's block but
outside of the block of any of the class's methods. The assignment defines the variable and
its type and also serves to initialize the variable. A class variable is common to all instances
of the class and can serve as a communication channel between instances, since any instance
can alter the value.
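A class variable is referenced from within the class by using a qualified name. To read it,
you can prefix the name of the class variable with self (if you have followed the convention
for naming the first argument of a method). To change it, however, prefix the name with the
name of the class instead: an assignment through self defines an instance variable which
hides the class variable for that one instance. Example:
class TheClass:
    i = 0
    def a_method(self, x):
        TheClass.i = x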
Figure 7-1: Referencing a class variable
Okay, as usual I got ahead of myself. It's time to explain how to define a class method. As
it turns out, you define a class method the same way you define a function. Except class
methods always have at least one argument which will be set to a reference of the current
instance of the class. Hence, by convention, this first argument is given the name self.
(When a class method is called, this argument is not supplied, however: to the outside world,
a class method has one less argument than its definition demands.)
To invoke a class method, the calling code must also use a qualified name, in this case
prefixing the name of the method with the name of the instance of the class. Example:
class Powerful:
    def cube(self, x):
        return x**3
    def square(self, x):
        return x**2
raised = Powerful()
i = 3
print raised.cube(i), raised.square(i)
Figure 7-2: Invoking a class's method
There is a special method of every class, named __init__ which is the initializer method.
This method is analogous to the constructor in C++: it sets initial values of (in fact
defining) the class's instance variables. The __init__ method automatically runs when an
instance of a class is created. The arguments to __init__ are passed during instantiation.
(In the above example, the only argument that Powerful.__init__ gets is the unstated
self). Since the Powerful class has no instance variables and needs no initialization, we
haven't bothered to define an __init__ method for it.
A typical task in an __init__ method is to run the __init__ method of the base class
from within the descendant class's initializer, like this:
class D (B):
    def __init__(self):
        B.__init__(self)
Figure 7-3: Initializing the base class inside the descendant's __init__
This is one time where one must pass an argument to satisfy the usually unstated self
argument. Since we are attempting to use the __init__ method of the class B without
instantiating B, we must pass an instance of type B to the method. In this case, we can pass
self which is an instance of D because D is a descendant class derived from B and is,
therefore a kind of B.
Naturally, if the initializer of B takes additional arguments, the initializer of D must supply
them, either by making them up as part of the initializer method's code or by passing them
along from the initializer's own argument list.
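Here is a sketch of the pass-it-along case. The parameters name and level, and the sample values, are invented for illustration: B's initializer wants a name, so D's initializer accepts one in its own argument list and hands it on before setting up its own instance variable.

```python
class B(object):
    def __init__(self, name):
        self.name = name

class D(B):
    def __init__(self, name, level=1):
        B.__init__(self, name)      # pass name along to the base class
        self.level = level          # D's own instance variable

d = D('widget', level=3)
print((d.name, d.level))
```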
Let's look at class variables versus instance variables again. This code:
class X (object):
    a = 0
    def __init__(self):
        self.b = X.a
        X.a += 1
        self.a = 0
    def value(self):
        return (X.a, self.a, self.b)
    def increment(self):
        self.a += 1
x = X()
print x.value()
y = X()
x.increment()
print x.value()
print y.value()
Figure 7-4: Use of both instance and class variables
will output
(1, 0, 0)
(2, 1, 0)
(2, 0, 1)
because X.a is a class variable (and therefore in the above code counts instances), whereas
self.a and self.b are instance variables. In particular, x.a is incremented when we
call x.increment() but y.a is unaffected.
I skipped over two built-in functions in the previous topic. These functions are appropriate
for classes (which is why I didn't cover them previously):
isinstance
issubclass
Table 7-1: Builtin functions useful for classes
The isinstance function takes two arguments, an identifier which is an instance of a class
and an identifier which is the name of a class. The function returns True if the instance is
either the specified class or if it is an instance of a descendant of that class.
The issubclass function returns True if the class name given as the first argument is a
descendant of the class name given as the second argument. It also returns True if the two
arguments specify the same class.
class B:
    def __init__(self):
        pass
class D (B):
    def __init__(self):
        pass
d = D()
print isinstance(d, D)
print issubclass(D, B)
print issubclass(D, D)
print isinstance(d, B)
print issubclass(B, D)
Figure 7-5: Use of the isinstance and issubclass builtin functions
The above code will output
True
True
True
True
False
because d is an instance of D which is a subclass of B but B is not a subclass of D.
All class methods are (in C++ terms) virtual. So if your class B implements a method m and
your class D needs a more elaborate (or different) implementation, you simply define a
method D.m and that will override the base class's method. (Your descendent class can still
get to the base class's method, though, by referring to B.m.) Here's an example: a class that
is similar to int except only even numbers are permitted.
class EvenInt (int):
    def __init__(self, value = 0):
        self.data = 2 * (value // 2)
    def __add__(self, value):
        self.data += 2 * (value // 2)
        return self.data
    def __sub__(self, value):
        self.data -= 2 * (value // 2)
        return self.data
    def __str__(self):
        return str(self.data)

i = EvenInt(3)
j = EvenInt(2)
print i, j
i = i + 2
j = j + 1
print i, j
j = j - 2
print j
i = i * 2
print i
i = i / 5
print i
Figure 7-6: Overloading base class methods
The above code is incorrect. The implementation of the EvenInt class is incomplete. As a
result, the program outputs
2 2
4 2
0
8
1
Addition and subtraction appear to perform as expected. Multiplication also works in this
example, but by accident -- you can't get an odd number by integer multiplication if one of
the operands is an even number, which was the case when we did the multiplication. But
division fails because the __div__ method of the int class has not been overridden. The
correct behavior would have rounded the divisor 5 down to 4 and resulted in a quotient of 2
(the result of dividing 8 by 4). Note also that __add__ and __sub__ return self.data, which
is a plain int, so after the first addition i and j are no longer EvenInt instances at all.
Likewise, a bunch of other methods from int need to be overridden for completeness.
Execute help(int) in your Python interpreter to get some idea of what would be needed.
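One way the class might be completed is sketched below. This is not the code from Figure 7-6 but a guess at a fuller design: it assumes the value should be fixed in __new__ (int is immutable, so __init__ is too late), that every operand is first rounded down to an even number, and that each operation should hand back a new EvenInt. It uses // rather than / so that it behaves the same under Python 2.6 and Python 3.

```python
class EvenInt(int):
    """An int that rounds every value down to the nearest even number."""

    def __new__(cls, value=0):
        # int is immutable, so the stored value must be chosen in __new__;
        # __init__ runs too late to change it.
        return super(EvenInt, cls).__new__(cls, 2 * (int(value) // 2))

    def __add__(self, other):
        # int(...) yields plain ints, so the + here is ordinary addition.
        return EvenInt(int(self) + int(EvenInt(other)))

    def __sub__(self, other):
        return EvenInt(int(self) - int(EvenInt(other)))

    def __mul__(self, other):
        return EvenInt(int(self) * int(EvenInt(other)))

    def __floordiv__(self, other):
        # A divisor of 1 rounds down to 0 and raises ZeroDivisionError.
        return EvenInt(int(self) // int(EvenInt(other)))


i = EvenInt(3)        # stored as 2
i = i + 2             # 4, and still an EvenInt
i = i * 2             # 8
quotient = i // 5     # 5 rounds down to 4, so this is 8 // 4
print(quotient)
```

Even this is incomplete: the reflected methods (__radd__ and friends), which handle an expression such as 2 + i, are still inherited from int and return plain ints.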
Python classes have no equivalent of the C++ private keyword. But because data hiding
and namespace isolation are such useful language features, a naming convention has evolved:
If you define an identifier within a class whose initial two characters are underscores but
which does not end in two underscores, the Python interpreter will perform "name mangling"
on the symbol (for instance, in Python 2.6 for Windows, if you have defined __a in class X
then what winds up in the symbol table is _X__a). This handles the namespace separation
issue nicely. But the mangled name is still public so you must abide by the social contract
not to refer to X._X__a or to _X__a in an instance of X. Also, other Python
implementations may mangle the name differently, so breaking into the system by
anticipating how the name may be convolved will result in non-portable code.
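A small sketch of the mangling described above, using the same class name X and attribute __a as the text. The get method is only an illustration, and the mangled spelling shown is what CPython produces.

```python
class X(object):
    def __init__(self):
        self.__a = 42          # stored under the mangled name _X__a

    def get(self):
        return self.__a        # inside the class, the short name works


x = X()
print(x.get())
print(hasattr(x, "__a"))       # the unmangled name does not exist
print(hasattr(x, "_X__a"))     # the mangled name is still public
```

The last line is exactly the kind of peeking the social contract asks you not to do in real code.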
8. Exceptions
Python supports exceptions. Like Java, there are pre-defined exceptions and exceptions
which you get when you include a library module. Of course Python also allows you to
define your own custom exceptions because exceptions are actually classes derived from the
Exception class. The pre-defined (built-in) exceptions are reserved identifiers, just like
the built-in functions are.
The built-in exceptions are:
BaseException
Exception
StandardError
ArithmeticError
EnvironmentError
AttributeError
FloatingPointError
IOError
IndexError
KeyboardInterrupt
NameError
OSError
ReferenceError
StopIteration
SystemError
TypeError
UnicodeError
UnicodeDecodeError
ValueError
WindowsError
LookupError
AssertionError
EOFError
GeneratorExit
ImportError
KeyError
MemoryError
NotImplementedError
OverflowError
RuntimeError
SyntaxError
SystemExit
UnboundLocalError
UnicodeEncodeError
UnicodeTranslateError
VMSError
ZeroDivisionError
Warning
UserWarning
DeprecationWarning
SyntaxWarning
FutureWarning
UnicodeWarning
PendingDeprecationWarning
RuntimeWarning
ImportWarning
Table 8-1: The Built-in Exceptions
Why is the above table broken into groups? Because the exceptions are classes which have a
hierarchical relationship. BaseException is the ancestor class of all the others.
Exception, StandardError, Warning, and UserWarning are also classes from which
many others are descended.
The hierarchy of exceptions is too complex for a simple tabulation. Instead, we need a tree
diagram like this:
BaseException
 +-- SystemExit
 +-- GeneratorExit
 +-- KeyboardInterrupt
 +-- Exception
      +-- StopIteration
      +-- Warning
      |    +-- DeprecationWarning
      |    +-- PendingDeprecationWarning
      |    +-- RuntimeWarning
      |    +-- SyntaxWarning
      |    +-- UserWarning
      |    +-- FutureWarning
      |    +-- ImportWarning
      |    +-- UnicodeWarning
      |    +-- BytesWarning
      +-- StandardError
           +-- BufferError
           +-- AssertionError
           +-- AttributeError
           +-- EOFError
           +-- ImportError
           +-- MemoryError
           +-- ReferenceError
           +-- SystemError
           +-- TypeError
           +-- ArithmeticError
           |    +-- FloatingPointError
           |    +-- OverflowError
           |    +-- ZeroDivisionError
           +-- LookupError
           |    +-- IndexError
           |    +-- KeyError
           +-- NameError
           |    +-- UnboundLocalError
           +-- RuntimeError
           |    +-- NotImplementedError
           +-- EnvironmentError
           |    +-- IOError
           |    +-- OSError
           |         +-- VMSError
           |         +-- WindowsError
           +-- SyntaxError
           |    +-- IndentationError
           |         +-- TabError
           +-- ValueError
                +-- UnicodeError
                     +-- UnicodeDecodeError
                     +-- UnicodeEncodeError
                     +-- UnicodeTranslateError
Figure 8-1: Exception Hierarchy
Knowing that tree lets you decide which built-in exception to use as the base class for your
own. Most likely, you would derive from Exception unless you know of a more
appropriate base. (Do not derive from BaseException.)
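To make the point about the hierarchy concrete, here is a minimal sketch. ProfileError and ProfileSyntaxError are made-up names for one application's exceptions, and classify is only a helper for the demonstration. The sketch sticks to built-in classes that exist in both Python 2.6 and Python 3.

```python
class ProfileError(Exception):
    """Hypothetical base class for one application's errors."""


class ProfileSyntaxError(ProfileError):
    """Hypothetical, more specific error."""


def classify(exc_class):
    # An except block that names a base class also catches every descendent.
    try:
        raise exc_class("demo")
    except ProfileError:
        return "application error"
    except LookupError:
        return "lookup error"
    except Exception:
        return "some other exception"


print(classify(ProfileSyntaxError))   # caught through its base, ProfileError
print(classify(KeyError))             # caught through its base, LookupError
print(classify(ZeroDivisionError))
# KeyboardInterrupt sits beside Exception in the tree, not beneath it.
print(issubclass(KeyboardInterrupt, Exception))
```

That last line shows one reason not to derive from BaseException: a handler written for Exception would never see your exception.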
Using exceptions is not much different in Python than it is in Java. You enclose a block of
code which is likely to raise an exception in a try block and code an except block to handle the
expected exception. The way this works is as follows: the code in the try block is executed
until/unless any exception occurs. If an exception happens, the rest of the try block is
skipped. If the exception is specified by (one of) the except block(s) the corresponding except
block is executed instead. If no except block is defined to handle the particular exception,
the exception is passed to the next outermost block to be handled. This could be the Python
interpreter itself, as all uncaught exceptions will wind up there. If the exception is caught in
an except block, the outer block (including Python) will not be notified, normally.
There may be multiple except blocks to handle different exceptions differently or an except
block may specify a parenthesised, comma-separated list (i.e.: a tuple) of exceptions which
the block would then all handle in the same way. The last except block can be anonymous, in
which case that block must handle all exceptions not already mentioned (which could hide
program bugs if the handling isn't done carefully). Additionally, there may be an else block
(which must come after all except blocks) which executes if no exception is raised. Lastly,
there may be a finally block for code which must be performed whether or not the
exception(s) are raised. The finally block must come after everything else. Here's a little
pseudo-code to make that clear:
try:
    # code which could raise the FoobarException but might
    # also raise the FooException or the BarException
    pass
except FoobarException, e:
    print e    # let the exception explain itself
except (FooException, BarException):
    print """Foo or Bar but not Foobar."""
except:    # catch everything else here (bad strategy!)
    print "we got some exception we never expected"
else:
    # code to do in case no exception is raised
    pass
finally:
    # code to do whether an exception is raised or not
    pass
Figure 8-2: Using exceptions
Notice in the above example that the first except block specifies an instance, e, of the
FoobarException, whereas the second except block only specifies the names of its
exceptions. It's your choice. But you will need to specify an instance if your exception
handling depends on access to the attributes of the exception object.
Exceptions which derive from the Exception class (which is most of them and probably
anything you are likely to define for yourself) define a __str__() method to return a string
which is a human-readable representation of the arguments used to construct the exception
object. You may elect to examine the exception's args attribute (a tuple) directly.
Note: the syntax that uses a comma to name an instance is very similar to the syntax for
multiple exceptions. Be careful to use parentheses when you mean the latter! (Python 2.6
and later also accept the form except FoobarException as e: for naming the instance, and it
is the only form accepted by Python 3.)
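Here is a runnable sketch of the order in which the blocks execute. The function divide and its steps list are invented for the illustration, and it uses the "as" form so that it runs under Python 2.6 and Python 3 alike.

```python
def divide(a, b):
    """Return (result, steps), where steps records which blocks ran."""
    steps = []
    try:
        steps.append("try")
        result = a // b
    except ZeroDivisionError as e:
        # e is the exception instance; its class name is recorded here.
        steps.append("except " + type(e).__name__)
        result = None
    except (TypeError, ValueError):
        # A tuple of exceptions, all handled the same way.
        steps.append("except bad operand")
        result = None
    else:
        steps.append("else")       # runs only if nothing was raised
    finally:
        steps.append("finally")    # runs in every case
    return result, steps


print(divide(7, 2))
print(divide(7, 0))
print(divide(7, "x"))
```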
To raise an exception, your code needs to instantiate the desired exception object and then
use the raise statement to signal it.
if lions and tigers and bears:
    e = OzException("Lions and tigers and bears, oh my!")
    raise e
Figure 8-3: raising an exception
(You could, of course combine instantiation and signaling into one statement by saying
raise SomeException(args) and never have an explicit instance to refer to.)
You can "pass the buck" by inserting a raise statement in your except block. This will pass
the exception along to the outer block's handler (if any). In this case, you simply say
raise without instantiating an exception object and the one your except block caught is the
one passed upwards. This is useful when your except block must do some action in response
to the exception involving objects local to the function containing the try but cannot
completely handle the exception because the calling block needs to be notified as well. The
following code is an example of this:
def read_ini(filename, p):
    f = open(filename, 'r')
    try:
        # attempt to read f line-by-line and assign values in
        # the list p.
        for line in f:
            p.append(line)
    except IOError, e:
        print e
        f.close()    # must still close f -- nobody else can
        raise
    f.close()

def read_profile(p):
    del p[:]    # empty the caller's list in place; "p = []" would
                # only rebind the local name p.
    try:
        read_ini('my_file', p)
    except IOError:
        # if we get an IO error, treat the entire file as
        # worthless and set p back to an empty list.
        del p[:]
Figure 8-4: Using raise to pass handling to outer block
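The same "pass the buck" pattern can be tried without touching the file system. In this sketch the names parse_all and load are made up, and ValueError stands in for IOError; it is a demonstration of the bare raise statement, not a replacement for the figure above.

```python
def parse_all(items, log):
    try:
        return [int(s) for s in items]
    except ValueError:
        log.append("inner: cleaning up")
        raise                      # re-raise the exception just caught


def load(items):
    log = []
    try:
        values = parse_all(items, log)
    except ValueError as e:
        # The outer handler sees the very exception the inner one caught.
        log.append("outer: " + type(e).__name__)
        values = []
    return values, log


print(load(["1", "2"]))
print(load(["1", "x"]))
```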
We'll see more of this when we tackle the topic of File IO, next.
9. File I/O
We have already mentioned the open built-in function. This function returns a file object
(a class called file). We can just consider a file to be a "black box" that provides an interface
to the platform's file system for the purpose of I/O. This interface is embodied in a set of
methods. Each method is simply a function which can perform some action on (or with) a
file. The fact that these functions are methods of a class means that the functions can
operate on instance data related to their tasks, starting with the path to the file being
manipulated and the I/O mode which were passed as arguments to the open function.
To invoke a particular method on the file corresponding to an instance of the file class, you
call that method using a qualified name, in this case qualified by the name of the file variable
which was set by the open call for the specific file. Example:
import os    # needed for os.SEEK_END
foo = open("xyzzy/plugh", "r")
foo.seek(0, os.SEEK_END)
Figure 9-1: Using the qualified name to access a file method
There are some other wrinkles in file I/O, too. For example, if "xyzzy/plugh" is not the
name of a file to which read permission is granted, the open call will fail and throw an
exception. We could catch that exception (rather than just letting the program terminate) to
make the program handle the problem gracefully. Likewise, many of the methods can throw
exceptions which can be handled. We'll discuss which methods throw what exceptions and
later show a robust example where these exceptions are caught.
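As a first taste, a failed open might be caught as sketched here. The function read_text is a made-up helper, and the missing path is built inside a fresh temporary directory so the example is safe to run anywhere. IOError is the name used in this document; in Python 3 it is an alias of OSError, so the same code works there.

```python
import os
import tempfile


def read_text(path):
    """Return the file's contents, or None if it cannot be opened."""
    try:
        f = open(path, "r")
    except IOError as e:
        print("cannot open: " + str(e.strerror))
        return None
    try:
        return f.read()
    finally:
        f.close()                  # close even if read() raises


# A path inside a new, empty directory: the file is never created.
missing = os.path.join(tempfile.mkdtemp(), "plugh")
print(read_text(missing))
```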
Here is a list of all the attributes and methods belonging to the file class:
close()     fileno()    name        readline()    tell()
closed      flush()     newlines    readlines()   truncate()
encoding    isatty()    next()      seek()        write()
errors      mode        read()      softspace     writelines()
Table 9-1: File I/O attributes and methods
Many of these methods are spelled the same and work the same as the C stdio library
functions. Others are unique to Python but turn out to be very useful to know. I'll cover
the stdio-like methods first and then some of the others. Along the way, I'll point out a few
Python-esque tricks that the file class permits you to perform.
The close method closes the file. The file instance still exists, but it is no longer good
for much. The closed attribute for that file object will now be True. (The closed
attribute is read-only.)
The fileno method returns an integer which is the file descriptor value that the underlying
implementation of the I/O system uses. This value is sometimes needed when
accessing low-level I/O functions.
The flush method writes the contents of the file's I/O buffer to the actual file. This
method could actually do nothing if the I/O is unbuffered or if the underlying
implementation does not allow a flush operation.
The read method takes an optional parameter, an integer specifying the maximum number
of bytes to read from the file. Without the parameter, the method reads to EOF. The
bytes are returned as a string. Once EOF is reached, the method returns an empty string.
The readline method takes an optional parameter. This parameter will limit the reading
to that many bytes. Otherwise, the method returns a string consisting of all bytes up to and
including the next newline, if any. The newline is retained but if the file ends with an
incomplete line or if the line read is longer than the optional parameter specified, the string
may not contain a terminal newline.
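A short sketch of read and readline in action. It writes a small scratch file (created with tempfile, so the path is not a real name from this document) and then reads it back in pieces.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "w")
f.write("first line\nsecond line\nno newline at end")
f.close()

f = open(path, "r")
a = f.readline()       # a whole line, newline retained
b = f.readline(6)      # limited to 6 bytes, so no newline
c = f.readline()       # the rest of that same line
d = f.read(5)          # at most 5 bytes
e = f.read()           # everything up to EOF
g = f.read()           # at EOF: an empty string
f.close()
os.remove(path)

print([a, b, c, d, e, g])
```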
The seek method repositions the file's read/write cursor (the place in the file from which the
next I/O operation will start). This method takes a mandatory offset argument and an
optional origin argument. The offset argument is a long integer; the origin argument may be
one of three values: os.SEEK_SET, os.SEEK_CUR and os.SEEK_END (and defaults to
os.SEEK_SET if it is not supplied). If the origin argument is os.SEEK_SET, the cursor is
repositioned relative to the start of the file (and the offset argument cannot be negative). If
the origin argument is os.SEEK_END then the cursor is positioned relative to the end of the
file (and the offset argument is normally zero or negative). In the final case, when the origin
argument is os.SEEK_CUR, the offset can have any value and the cursor is positioned
relative to where it was when the seek method was invoked. (If this looks suspiciously like
the stdio C library function fseek, it should!)
The tell method reports the position of the file's I/O cursor. The value returned is the
offset relative to the origin given by os.SEEK_SET. Note that the value returned may not
make sense on Windows implementations if the file contains odd newline encodings unless
the file was opened in a binary mode. (This function is bug-compatible with stdio's ftell.)
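The three origins can be tried out on a scratch file, as in this sketch. The file is opened in binary mode, which sidesteps the newline caveat just mentioned; the ten-byte content and the temporary path are invented for the example.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "wb")
f.write(b"0123456789")
f.close()

f = open(path, "rb")
f.seek(0, os.SEEK_END)     # jump to the end of the file
size = f.tell()            # the cursor position is now the file size
f.seek(-3, os.SEEK_END)    # three bytes before the end
tail = f.read()
f.seek(2)                  # origin defaults to os.SEEK_SET
f.seek(3, os.SEEK_CUR)     # three bytes forward from there
position = f.tell()
middle = f.read(2)
f.close()
os.remove(path)

print([size, tail, position, middle])
```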
The truncate method takes one optional argument, a size. If the argument is not
supplied, size is set equal to the value that tell() would return. This method sets the
file's size to the specified value. The file cursor is not repositioned, so if you truncate the file
to a size less than where the cursor is pointing, you'd better call seek before attempting
another operation.
The write method writes the bytes of its string argument to the file. It does not append a
newline.
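Here is a sketch combining write, flush and truncate on a scratch file. The path comes from tempfile and the content is invented; binary mode is used so that sizes are exact byte counts.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "w+b")      # read/write, binary
f.write(b"hello world")
f.flush()                  # push the buffer out to the actual file
f.truncate(5)              # keep only the first five bytes
f.seek(0)                  # reposition before the next operation
kept = f.read()
f.close()

size = os.path.getsize(path)
os.remove(path)

print([kept, size])
```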
The mode attribute is the string passed to the open function which specified the file's I/O
mode (e.g.: "r+"). This attribute is read-only and some files may not have a meaningful
mode attribute.
The name attribute is the string passed to the open function that is the name of the file.
(This attribute is read-only and may not be meaningful for all file objects.)
The isatty method returns True if the file is a tty (e.g.: stdout) or a tty-like file.
The encoding attribute is a string naming the encoding which is used to convert Unicode
strings into the file's byte-stream. This attribute may be None if the file uses default
encoding.
The errors attribute names the Unicode error handler used along with the encoding.
The newlines attribute records the newline encodings so far encountered in the file: None if
no newline has yet been encountered, a single string if only one kind has been seen, or a
tuple of strings if several kinds have been seen. This attribute is read-only and is missing if
the implementation is not configured to accept multiple encodings for newline. (The default
configuration accepts '\r', '\n' or '\r\n' as newlines.) The file must be opened for read, of
course.
The softspace attribute is used by the print statement to decide whether or not it must
output a space before the next item. (Note that this is a state variable being used by the
print statement and not a way to affect how print works.)
The next method is the file's iterator. It returns the next line of text from a file opened for
input.
The readlines method takes an optional parameter which, if present, specifies an
approximate limit to the data size (if the implementation chooses to recognize this limit).
The default behavior for this method is to do a readline repeatedly on the file until EOF
is reached, returning a list of strings.
The writelines method writes each element of its argument, an iterable containing
strings (typically a list of strings). Despite its name, the method just writes what it is given;
the strings themselves must supply the terminal newlines. But since readlines reads
lines and doesn't remove the newlines, the writelines method is truly performing the
converse operation.
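The round trip can be checked with a few lines. This is only a sketch: the scratch path comes from tempfile and the three strings are invented.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

lines = ["alpha\n", "beta\n", "gamma"]   # the last line has no newline

f = open(path, "w")
f.writelines(lines)        # writes the strings as given, adding nothing
f.close()

f = open(path, "r")
round_trip = f.readlines() # newlines are retained, so we get the list back
f.close()
os.remove(path)

print(round_trip == lines)
```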
A file object is its own iterator. That means that if you open some file for read, you can then
use the file object itself to step through the data line-by-line. Here's how:
foo = open("foobar_file", "r")
for line in foo:
    print line,    # trailing comma: line already ends in a newline
foo.close()
Figure 9-2: Using a file as an iterator
10. Modules
Module is just Python's term for a file containing Python source. Instead of simply starting
the interpreter and entering Python statements until you have a program (then losing it all
when you exit Python), you use a text editor of some sort and write your code to a file.
(Aside: not all text editors are created equal. The Idle development system which was
installed when you installed Python has a syntax-colored editor with helpful features like
auto-indent.) The file you created then becomes a module that you can tell the Python
interpreter to load and run.
Another use for a module is to contain a library. A library is a collection of code (usually
functions or classes or both) which is inter-related and provides a set of (hopefully) debugged
capabilities that you simply import instead of having to reinvent. Much of this topic is
going to be about how to use the import statement, actually. It's vastly more powerful than
the C++ #include or even Java's import.
A module's filename (minus the ".py" suffix) becomes the name by which it is imported.
Everything within the module can be imported as a block or you can pick and choose. You
can even import a definition and change its name as you do so, providing a cheap way to
avoid naming collisions when you must import multiple modules.
The import statement has many forms. The simplest is simply import followed by the
module's name, e.g.: import foo. This imports the module foo (or throws an exception if
the module cannot be found). None of the names within that module are imported, however
-- you must reference everything using a qualified name, e.g.: foo.bar = 42. Since all
names must be qualified by the module name, name conflicts are not possible. But if you
need to frequently refer to the various components within a particular module, you can say
from foo import * and import everything from that module into your namespace (which
may result in name collisions, since these names will no longer be qualified).
The general form of that last import statement form is
from module import name-list
where name-list is a comma-separated list of all the names you want to import directly or *
which means import all names (except those which begin with underscore).
An even more elaborate form is
from module import name as alias optional-list
where optional-list is zero or more of , name as alias which allows you to import multiple
names each with its own alias. Example:
from math import e as E, pi as PI, tanh, acos
will import the constants for the base of the natural logarithm and the ratio of the
circumference of a circle to its diameter under the alternate names E and PI, respectively,
and will import the hyperbolic tangent and inverse cosine functions under their standard
names. Important: the other names from the math module are not imported and the module
itself is not imported, making even a qualified reference for those names such as
math.sin() undefined. So if you're going to pick and choose, you need to know what you
will need from the module.
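You can confirm which names that statement creates by running it in an empty namespace, as in this sketch. The dictionary called namespace is only a device for the demonstration; exec with a dictionary argument works in Python 2.6 and Python 3.

```python
namespace = {}
exec("from math import e as E, pi as PI, tanh, acos", namespace)

# Ignore the housekeeping entry (__builtins__) that exec adds.
imported = sorted(n for n in namespace if not n.startswith("__"))
print(imported)
print("math" in namespace)     # the module itself was not imported
```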
But one way to manage the name collision problem is to take advantage of Python's "ad hoc
variable declaration" behavior and simply assign a qualified name to a global variable. Like
this:
import math
PI = math.pi
E = math.e
tanh = math.tanh
acos = math.acos
print acos(PI / 8), tanh(E / 4), math.sin(PI)
Figure 10-1: import your cake and reference it, too
By using this technique, you can have simple names for commonly-needed components from
a module and still be able to reference any of the other components.
Modules may contain executable statements for the purpose of initialization and declaring
data. These statements get executed once -- when the module is first imported. (Since one
module may import another, it's possible for you to import a given module more than once
without realizing it.) This is how math.e and math.pi get created -- there are a couple of
assignment statements in the math module.
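The "executed once" behavior can be observed with a throwaway module. Everything named here is invented for the sketch: the module demo_once_module, the environment variable DEMO_IMPORT_COUNT that its body increments, and the temporary directory it is written to.

```python
import importlib
import os
import sys
import tempfile

os.environ.pop("DEMO_IMPORT_COUNT", None)

# The module body bumps a counter every time it is executed.
source = (
    "import os\n"
    "count = int(os.environ.get('DEMO_IMPORT_COUNT', '0')) + 1\n"
    "os.environ['DEMO_IMPORT_COUNT'] = str(count)\n"
)
module_dir = tempfile.mkdtemp()
f = open(os.path.join(module_dir, "demo_once_module.py"), "w")
f.write(source)
f.close()
sys.path.insert(0, module_dir)

first = importlib.import_module("demo_once_module")
second = importlib.import_module("demo_once_module")

print(os.environ["DEMO_IMPORT_COUNT"])   # the body ran a single time
print(first is second)                   # both imports give the same module
```

The second import finds the module already recorded in sys.modules and simply hands it back.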
When you invoke the Python interpreter as a command-line, you can cause a module to run
as a script. (I dislike this term -- Python modules run from the command-line are programs.
To call them scripts diminishes their importance, as if Python wasn't "real" code.) The
command-line syntax would be:
python module-filename arguments-if-any
When invoked in this manner, the module's __name__ variable is set to "__main__" instead of
the name taken from the filename, as it would be if the module were imported. This is a useful fact to know
because you can then put a few statements in the module which are to be executed when
__name__ == "__main__". This code would then allow you to both run the module as a
program and to import it. Here's one way:
# since this name begins with _, it is
# unlikely that this will be imported
# by accident.
def _main(args):
    """ Main function of module. Will run when module
        is invoked as a program.
    """
    # code of main function.

if __name__ == "__main__":
    import sys    # ensure we have the sys module.
    _main(sys.argv)
Figure 10-2: Turning a module into a program
(sys.argv is a list of strings constituting the command-line arguments.)
Even if your module is not normally intended to run as a program, you might use this
technique to run code which tests the module. That lets you bundle the module with the
code which was used to certify it.
That's all very nice, but what if somebody else wrote a module you want to import? Do you
have to read the whole module to see what functions or classes it provides? As it turns out,
no. The built-in function dir if called with no argument returns a list of strings which is all
the names you have defined. But if you call dir with an argument which is the name of a
module, then the list returned is that of the names defined in the module. And if the
programmer has done a professional job, you can then find out what each function does by
printing the function's __doc__ string.
Here's a quick-and-dirty way to get a module's documentation:
import math
for name in dir(math):
    cmd = "print math." + name + ".__doc__"
    print "math." + name + ":"
    exec cmd
Figure 10-3: Code to act like help(module)
The above code works fine on modules which consist only of functions and where those
functions all contain doc strings. The math module almost meets this requirement -- the
constants e and pi don't have doc strings so you get something strange for those: the doc
string for the float data type.
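One possible refinement, sketched below, uses the built-in functions getattr and callable instead of exec, and skips anything that is not callable so that constants such as e and pi no longer show the float doc string. The helper name function_docs is made up, and keeping only the first line of each doc string is just a choice made to keep the output short.

```python
import math


def function_docs(module):
    """Map each callable name in module to the first line of its doc string."""
    docs = {}
    for name in dir(module):
        obj = getattr(module, name)
        if not callable(obj):
            continue               # skip constants such as pi and e
        text = getattr(obj, "__doc__", None) or ""
        lines = text.strip().splitlines()
        docs[name] = lines[0] if lines else ""
    return docs


docs = function_docs(math)
for name in sorted(docs):
    print("math." + name + ": " + docs[name])
```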