Introduction to Python Programming
Copyright 2010 by David A. Wallace
revision D of 07-Aug-2010

Contents

1. Where Python is similar to other languages
   a. Identifiers
   b. Operators
   c. Comments
   d. Constants
   e. Modular
2. Where Python is different from other languages
   a. Indentation defines block structure
   b. Interpreted with garbage collection
   c. Variables are not declared and type is contextual
   d. Scope
3. A slice of cheese
   a. What a Python program looks like
   b. Installing a Python development environment
4. Intrinsic types
   a. Integer
   b. Float
   c. Complex
   d. Boolean
   e. String
   f. List
   g. Tuple
   h. Set
   i. Dictionary
5. Language elements
   a. Identifiers
   b. Keywords
   c. Literals
   d. Operators
   e. Qualifiers and Modifiers
   f. Assignment is definition
   g. Expressions
   h. If / elif / else
   i. For
   j. While
   k. Break and continue
   l. Pass
   m. Print
   n. Exec
6. Functions
   a. Def(ining)
   b. Argument passing
   c. Return
   d. Returning values
   e. Scope
   f. Global
   g. Docstring
   h. Built-in functions
7. Objects
   a. Defining classes
   b. Inheritance
   c. Methods
   d. Attributes
   e. Initializer
   f. Scope
   g. Builtin Functions Useful for Classes
   h. Overloading
   i. Pseudo-private Variables
8. Exceptions
   a. The Builtin Exceptions
   b. Catching Exceptions: Try and Except Code Blocks
   c. Catching Exceptions: Else and Finally Code Blocks
   d. Throwing Exceptions: the Raise statement
9. File I/O
   a. The File Class
   b. Attributes and Methods of File
   c. Iteration using File
10. Modules
   a. Import Statement
   b. Namespace Considerations and Aliasing
   c. Use of a Module as a Program

Preface and Disclaimer

The author is still learning the Python language. Since this is about the fifteenth programming language he has used (counting all assemblers as one), the programming style used in the examples herein is probably not exactly what an expert Python programmer would use. Nor does he always express his algorithms using idiomatic Python. And there are undoubtedly many subtle features of Python which this introductory class will not cover.
The author's most recent programming experience has been in C++ and Java. When analogies between Python and some other language are appropriate, these analogies will be to those two, on the assumption that, since these are the two most widely used languages at the moment, the audience is most likely familiar with one or the other. However, expertise in either of these languages is not a requirement for this course. All examples and code fragments have been checked for correctness using the Python 2.6 development environment for Windows.

1. Where Python is similar to other programming languages

Python is a procedural language, like pretty much every computer language since FORTRAN and COBOL. It consists of statements, usually organized into functions and/or classes. Statements usually specify operations applied to operands, which may be constants or variables. Python uses data types which would be familiar to anyone proficient in high-level languages newer than -- say -- Pascal.

Identifiers in Python follow the same syntax as in C-like languages: they consist of one or more letters or underscores followed by zero or more letters, digits or underscores. Identifiers are case-sensitive.

Keywords in Python are reserved identifiers. As in all but a few languages (PL/1 springs to mind), keywords are reserved in all contexts.

Operators in Python are similar to the operators in C, though not all of the C operators are present in Python and a couple are spelled differently.

In Python, the constants for integer, float, string and boolean values follow the same syntax as in C++, except that string constants can be delimited with either single-quote (apostrophe) or double-quote characters and the boolean constants are spelled True and False.

The usual whitespace characters (space, tab, newline) are recognized and delimit tokens (except when they appear in string literals). Line continuation in Python is performed by ending the current line with a backslash character, as in C.
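Besides the trailing backslash, Python also continues a statement implicitly while a parenthesis, bracket or brace is still open. The following is a minimal sketch of both styles; the names total, primes and describe are invented for illustration, and the code is written so that it runs under Python 2.6 as well as Python 3.

```python
# Explicit continuation: the backslash must be the last character on the line.
total = 1 + 2 + \
        3 + 4

# Implicit continuation: no backslash is needed while a bracket is open.
primes = [2, 3, 5,
          7, 11, 13]

def describe(value):
    # The open parenthesis lets the expression span two lines.
    return ("value = " +
            str(value))
```

Many Python programmers prefer the bracketed form, because a space accidentally left after a continuation backslash turns it into a syntax error.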
Comments in Python run from the mesh character (#) to the end of the line. Therefore, one may have right-comments or full-line comments, but there is no specific way to comment out a block of text save by beginning each line with a mesh.

Python code can be organized into modules and packages which may be included (imported) into a program in whole or in part. This capability is similar to the one which exists in Java.

2. Where Python is different from other programming languages

Firstly, Python is very sensitive to the formatting of its lines. In Python, the indentation of a statement defines block structure. Failure to indent, or indenting by an incorrect distance, is an error -- and the error may either be caught as a syntax problem or it may be syntactically acceptable but cause a logic bug.

Python is an interpreted language, like BASIC. Strictly speaking, the interpreter first compiles each source file to an internal bytecode, and it reports a syntax error at that stage, before any statement in the file runs; only the first syntax error it finds is reported, so you end up fixing them one run at a time. Other mistakes, such as a misspelled variable name or an operation applied to the wrong type of value, are not reported until the offending statement actually executes, so an error in a rarely executed branch can go unnoticed for a long time. If you are only used to compiled languages (and that includes Java, even though its bytecode is technically interpreted by the JVM), you may find this frustrating.

Like Java, Python has a garbage collector; there is no explicit way to allocate or free storage.

Variables in Python are defined from context: how and from what they are assigned determines their type. This includes arguments to functions and values returned from functions. The one exception to this is the definition of an instance of a class.

This idiosyncrasy implies a strange scope rule: a global variable can be read within a function, but an assignment to the same name inside the function creates a new local variable which hides the global one. To assign to the global variable from inside a function, you must name it with the global keyword.
Additionally, class instance attributes have to be qualified even within methods of the class; this is like having to write this-> for every attribute reference in a C++ class. (By convention, the qualifier is spelled self in Python.)

3. A slice of cheese

Okay, I couldn't resist that. You see, Python (the name, not the language) comes from the BBC TV comedy show Monty Python's Flying Circus, and the official Python package repository is called The Cheese Shop after one of the more famous sketches from that series.

Here's a simple Python program to print the first ten prime numbers:

    primes = []

    def is_prime(n):
        """
        Determine if the given value can be evenly divided by any of the
        previously-found prime numbers. If so, return False. Otherwise, the
        given number is a prime -- append it to the list of primes and
        return True.
        """
        global primes
        for k in range(len(primes)):
            if (n % primes[k] == 0):
                return False
        else:
            primes.append(n)
            return True

    n = 2
    while (len(primes) < 10):
        is_prime(n)    # ignore return value; we don't need it.
        n = n + 1

    for k in range(len(primes)):
        print str(primes[k])

Figure 3-1: Print the first ten prime numbers

This simple program demonstrates many of the charming [sic] features of the Python language: global and local scope, extensible arrays, enumeration, indefinite looping, function definition and simple console output with type conversion. Note that the else belongs to the for statement, not to the if: it runs only when the loop finishes without returning early, that is, when no previously found prime divides n.

The result of running the above program is the numbers 2, 3, 5, 7, 11, 13, 17, 19, 23 and 29, each on its own line, output to the console (stdout).

Okay. Now that we've seen Python in action, the next thing you probably want to do is get your own Python development environment. The best place to start is probably http://python.org/download, where you can download Python packages for Windows and Mac OS X. Just follow the instructions for the particular package you need. I use the Windows package myself; it includes an IDE called IDLE which has a syntax-sensitive editor and a debugger.
Get the version 2.x package rather than the version 3.x one: the 3.x Python is newer and not entirely compatible with 2.x, and most of the Python code currently in circulation is 2.x. Oh yes, I should mention that the download is free.

4. Intrinsic types

Python has nine intrinsic data types: integer, float, complex, boolean, string, list, tuple, set and dictionary. Let's examine each one in turn.

Integers are probably boring. They are the familiar twos-complement binary numbers that are the meat-and-potatoes of data; their size is that of a C long on the underlying platform (32 bits on Windows and on 32-bit systems, usually 64 bits on 64-bit Unix-like systems). There are no surprises here. Nothing to see... move along... move along... Oh, wait! Long integers can also be defined and are limited only by available memory, and an integer result that would overflow is automatically promoted to a long instead of wrapping around. So go ahead and count all the permutations of the quantum states of every proton in the universe if you want.

Floats are almost equally boring. The intrinsic floats are represented as double-precision values in the underlying implementation (usually the same IEEE standard that Standard C uses these days). There is no such thing as a single-precision float; all floats are equivalent to the Standard C double type.

Complex is a built-in type in Python. Complex quantities have both a real and an imaginary part, and both parts are floats. Having this type built into the language is unusual, but very nice if you are working with analog electronic circuit problems, for example.

Booleans can only have the values True or False. When printed, they appear as "True" and "False", but otherwise they behave exactly like the integers 1 and 0, respectively.

Strings are arrays of characters. They come in two flavors: ordinary strings, whose characters are 8-bit bytes (usually ASCII text), and Unicode strings. Python has no character data type; a string of length one serves the purpose instead. Strings may be indexed by character position to retrieve individual characters or sliced to retrieve substrings.
(Note that these are read-only operations; strings are immutable sequence objects, so changing one or more characters requires building a new string.) A string is more like a Pascal string than a C string, in that its length is stored with it rather than being marked by a terminating character value such as '\0'.

Lists are one-dimensional arrays of objects. You can have a homogeneous list, where all elements are the same type of object, or a heterogeneous list, where the various elements are of different types. Elements of lists are accessed as in C, using an index, or with a slice expression. The number of elements, and even the type of each element, can be changed at will during execution.

Tuples are multi-element objects which are always manipulated as a single entity. Unlike a list, a tuple is immutable: once it has been created, you can change neither the number of its elements nor the value of any element. (If an element is itself a mutable object, such as a list, that object can still be modified in place.)

Sets are unordered collections in which no two elements can have the same value. You can perform operations like union and intersection on sets. The ordinary set type is mutable (elements can be added and removed); Python also provides an immutable variant called frozenset.

Dictionaries are collections of key / value pairs, and a dictionary always contains only unique keys. A value can be of any type; a key can be of any immutable type (such as a number, a string, or a tuple containing only immutable values). The value is retrieved, set or changed by addressing the dictionary with a key.

5. Language elements

Identifiers in Python are character sequences of unlimited length which begin with either an underscore or a letter and optionally continue with underscores, letters or digits. Identifiers are case-sensitive. By convention, identifiers which begin and end with two underscores (such as __init__) are reserved for special meanings defined by the Python implementation, and identifiers which begin with two consecutive underscores (but do not end with two) are private to the class in which they appear.
Python does not strictly enforce either convention: names of the form __name__ are not checked at all, and class-private names are merely "mangled" (renamed by prefixing the class name) so that they are awkward, but not impossible, to reach from outside the class.

Python reserves a set of identifiers as keywords. In Python 2.6 these are:

    and      continue  except   global   lambda   raise    yield
    as       def       exec     if       not      return
    assert   del       finally  import   or       try
    break    elif      for      in       pass     while
    class    else      from     is       print    with

Table 5-1: Python keywords

Because these keywords are reserved, you may not use an identifier of the same spelling for your own purposes.

Python lets you write literal constants for values of many of the intrinsic types: integer, float, string and complex. Here are examples of integer literals:

    1   0177   0   123   12345678   1L   32768   0o177   0x7F   0b0110   0x3FFFFFFFFFL
    162534173946247485950607867563534363785959676

(A leading 0 or the prefix 0o means octal, 0x means hexadecimal, 0b means binary, and a trailing L makes the literal a long integer.)

Float literals can be specified in either decimal-point or exponent notation:

    1.023   0.4   6.02e23   2.99792e8   8.98755430563e20   1073.2   7.07e-1   3.1415926   .707   0.0

Note that an integer value is also acceptable where a float is required: 12 and 12.0 yield the same float value in such a context. (Warning: assignment is not such a context; x = 1 defines x as an integer. But x = 3.7652 / 14 would define x as a float, because one operand is a float.)

Imaginary literals are used to define complex numbers. An imaginary literal is a complex number whose real part is 0.0. Imaginary literals are just float literals with a suffixed j or J: 1.023j   0.4J

String literals come in four flavors. The "vanilla" string literal encodes ASCII-7 character sequences and follows the backslash escape sequence conventions of C. The other flavors are formed by preceding the leading string delimiter with a type-code prefix. If you are familiar with Standard C, the ASCII-7 string literals are pretty much the same, except that there are four possible delimiters you may choose from: the apostrophe ('), the double-quote ("), three successive apostrophes (''') or three successive double-quotes (""").
Whichever delimiter you start with determines the one you must end with; embedded characters which aren't an exact match for the initial delimiter need not be escaped. (Strings delimited with three quote characters may also span several lines.) Here are some examples:

    ''       # empty string delimited with apostrophes
    "'"      # an apostrophe, delimited with double-quotes
    "\n"     # newline
    """ This string's using three " characters to avoid needing escapes"""

The backslash escape conventions are nearly the same as in Standard C:

    \a      alarm (bell)
    \b      backspace
    \f      form-feed
    \n      newline (line-feed)
    \r      carriage return
    \t      horizontal tab
    \v      vertical tab
    \'      apostrophe
    \"      double-quote
    \\      backslash
    \123    the character whose value is octal 123 (use 1 to 3 digits)
    \x7B    the character whose value is hexadecimal 7B (always use 2 digits)

Table 5-2: String literal escape sequences

Unlike Standard C, a backslash followed by an unrecognized character leaves both the backslash and that character in the string. (I.e.: 'a\pe' results in a\pe, not ape.)

Unicode string literals are similar to the vanilla string literals except that the character set is Unicode rather than ASCII. The type-code prefix for Unicode strings is the letter u (either lower- or upper-case). The escape conventions in Unicode strings are a bit richer than in ASCII-7 strings: \u followed by exactly four hexadecimal digits (e.g. \u1234) encodes the 16-bit Unicode character with that value, and \U followed by exactly eight hexadecimal digits (e.g. \U0001D11E) encodes a 32-bit character value.

In raw string literals, the backslash is not treated as an escape character: each backslash stays in the string along with the character that follows it. The type prefix for raw strings is the letter r (either lower- or upper-case). A raw string can encode any character sequence except one which ends in an odd number of backslashes, because a backslash still prevents the quote after it from closing the literal. (IMO, this is a design flaw, even though it is documented behavior.)

Unicode raw strings are also possible. The type prefix for them is the two letters ur (either case, but in that order). Again, backslash escapes are not applied (except that Python 2 still processes the \u and \U escapes), and the character set is Unicode.
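The following sketch contrasts ordinary and raw literals using a few of the escapes from Table 5-2; the variable names and the Windows-style path are made up for illustration, and the code runs under Python 2.6 as well as Python 3.

```python
newline = "\n"           # escape sequence: a single line-feed character
raw_newline = r"\n"      # raw string: a backslash followed by the letter n
octal_s = "\123"         # octal 123 is decimal 83, the letter 'S'
hex_brace = "\x7B"       # hexadecimal 7B is decimal 123, a left brace
mixed = "It's"           # an apostrophe needs no escape inside double quotes
path = r"C:\temp\new"    # raw strings keep Windows-style paths readable
```

Without the r prefix, path would contain a tab (\t) and a newline (\n) instead of two backslashes.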
String literals can be implicitly concatenated by juxtaposition (being separated only by white space), as in C++.

Python implements several predefined constants. While these are not precisely keywords, you should treat them as such and never assign anything to them (Python 2.6 actually forbids assignment only to None):

    None   NotImplemented   Ellipsis   True   False

Table 5-3: Predefined constants

Python uses a rich set of operators, similar to the set used by C-like languages. They are:

    +      add (also string concatenation and unary plus)
    -      subtract (and unary minus)
    *      multiply
    /      divide
    **     exponentiate
    //     integer quotient (floor) division
    %      remainder of division (modulo)
    <<     left shift
    >>     right shift
    &      bitwise AND
    |      bitwise OR
    ^      bitwise XOR
    ~      unary bitwise complement
    <      less than compare
    >      greater than compare
    <=     less or equal compare
    >=     greater or equal compare
    ==     equal compare
    !=     unequal compare
    =      assignment
    +=     assign by adding (a += b has the same effect as a = a + b for numbers and strings)
    -=     assign by subtracting
    *=     assign by multiplying
    /=     assign by dividing
    **=    assign by exponentiation
    //=    assign by integer quotient division
    %=     assign by modulo
    <<=    assign by left shift
    >>=    assign by arithmetic right shift
    &=     assign by AND
    |=     assign by OR
    ^=     assign by XOR

Table 5-4: Operators

Note that in Python 2, when both operands of / are integers, the result is the floor of the quotient, exactly as with // (7 / 2 is 3); if either operand is a float, / performs true division (7 / 2.0 is 3.5).

There are also delimiters and qualifiers:

    [ ]    list element indexing and grouping
    ( )    many uses, mostly as in Standard C
    { }    groups dictionary element list
    :      many uses
    .      qualifies a reference
    ,      separates elements in a list of tokens
    ` `    converts the enclosed expression to its string representation (`x` means repr(x))
    @      "decorator" mark
    ;      separates multiple statements on the same line

Table 5-5: Delimiters and qualifiers

If you're keeping score, there are two characters left in the set of printable ASCII symbols for which Python has not defined a use: the question mark and the dollar sign. These two characters are illegal except within a string literal or a comment.

There is no variable declaration syntax in Python. To define a variable, you simply assign an expression (or literal) to an identifier using the = assignment operator.
This assignment determines both the value which the identifier holds and its type. If the identifier had previously been assigned (where "previous" means earlier in the same scope, as we will emphasize when we get to the topic of scope rules), the old value is replaced by the new one, and the new value may well be of a different type. In plain English:

    a = 1        # a is an integer having a value of 1
    a = 1.41     # a is now a float having a value of 1.41
    a = 'abc'    # a is now a string having a value of 'abc'

This madness works because in Python a type belongs to a value, not to an identifier: an identifier is simply a name bound to whatever object was last assigned to it, and each statement checks, when it executes, whether its operation is legal for the type of object it finds. And given that Python performs garbage collection, the old values (which can no longer be retrieved) will eventually be returned to free memory. So this kind of code, while it may become confusing and hard to maintain, is at least not terribly untidy.

Expressions in Python are very similar to expressions in Standard C, except that, because of the "exotic" data types, there are some interesting expression forms whose results are list objects, tuples, sets or dictionaries.

The simplest of these is the one which produces a tuple. This expression consists of a comma-separated list of expressions, usually surrounded by parentheses. (Strictly, it is the commas which make the tuple; the parentheses are required only where the expression would otherwise be ambiguous. The grammar also allows one expression followed by a comma, which forms a special-case one-element tuple called a singleton, and an empty pair of parentheses, which creates a tuple with no elements.) The result is a tuple having as many elements as there are expressions in the list and having values which are the results of each expression at the time the tuple is constructed.
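A few tuple-building expressions, as a minimal sketch (all names are invented for illustration); note especially the trailing comma which distinguishes a singleton from a value that merely sits in grouping parentheses:

```python
point = (3, 4)          # two-element tuple
single = (42,)          # singleton: the trailing comma makes it a tuple
not_a_tuple = (42)      # just the integer 42 in grouping parentheses
empty = ()              # tuple with no elements
bare = 1, 2, 3          # the commas, not the parentheses, build the tuple

x = 10
snapshot = (x, x * 2)   # element values are computed when the tuple is built
x = 99                  # rebinding x afterwards does not change snapshot
```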
The + operator is "overloaded" for tuples to implement tuple concatenation: if a and b are tuples, a+b is a tuple containing all the elements of a followed by all the elements of b. Individual elements of a tuple may be accessed using the same syntax that C, C++ or Java uses to access an element of an array: the_third_element_value = a_tuple[2]. But since a tuple is an immutable sequence, it is not legal to assign a new value to an element of a tuple. In other words, a reference to a tuple element cannot appear on the left side of an assignment statement.

A list object is produced by an expression consisting of a comma-separated list of expressions surrounded by square brackets. (Again, the grammar allows for single-element and empty list objects.) Lists are also concatenated with the + operator. As with tuples, an individual element is referenced using C subscripting notation, e.g.: the_ninth_element_value = a_list[8]. Unlike tuples, however, the elements of a list object may be the target of an assignment, e.g.: list1[15] = new_value_for_element_16. As in C-like languages, the first element of a list has an index of zero.

Elements of list objects and tuples (and characters of strings) can also be referenced using a slice expression, written inside the square brackets and described by the production:

    optional_start_position : optional_end_position

Referencing a list or string this way returns a subset of the object consisting of all elements from the start_position up to but not including the end_position. If the start_position expression is omitted, the slice starts with element zero; if the end_position expression is omitted, the slice extends through the last element of the list (or the last character of the string). For example, if s is 'Python', then s[1:4] is 'yth', s[:2] is 'Py' and s[4:] is 'on'.

Dictionaries are produced by an expression consisting of a comma-separated list of key-value pairs surrounded by curly braces ('{' and '}'). A key-value pair consists of an expression, a colon character and a second expression.
The first expression is the key; the second is the value. To obtain a value from a dictionary, you provide the key and the dictionary auto-magically finds the corresponding value; this is why dictionary keys must be unique. Again, you can create an empty dictionary with an expression consisting of just the braces. You can create a dictionary containing a single key-value pair by specifying the element with or without a trailing comma.

Dictionary values are referenced using C-like subscripting notation, where the expression in the brackets is one of the dictionary's keys. Values can be the target of an assignment, e.g.:

    answers['life, the universe and everything'] = 42

If the dictionary does not have a key equal to the one in the brackets, the assignment adds a new key and value to the dictionary; if the key already exists, the corresponding value is altered. (Reading the value for a key which is not present, on the other hand, raises a KeyError exception.) Warning: keys are compared by value, with the usual numeric type conversion rules applied. So a key having a value of 2 will be matched by a lookup for 2, 2.0 or even 2+0j.

Unlike tuples and lists, dictionaries cannot be concatenated with +. (But there is still a way to combine them, as we shall see later.)

As in C, multiple assignment is possible: a = b = 42 makes both a and b integers having the value 42. Be careful with this syntax when the value is a mutable object such as a list, though: a = b = [42, 43, 44] does not create separate copies of the three-integer list. It creates one list and binds both a and b to that same object. This means that this code

    a = b = [41, 42, 43]
    b[1] = 76
    c = b[1] - a[1]

does not produce the same result for c as this code

    a = [41, 42, 43]
    b = [41, 42, 43]
    b[1] = 76
    c = b[1] - a[1]

Because a is just another name for the list that b refers to, the change made through b is also seen through a, so c is zero in the first case; in the second case c is 34. This aliasing is defined behavior, not an implementation choice, but it surprises many newcomers. Use multiple assignment only with immutable values such as numbers and strings, unless you really do want the names to share one object.

But programs are more than just calculations. We often need to make the computer perform operations conditionally; this is often called control of flow. The simplest control-of-flow construct is the if statement: it tests some condition and, if the result of the test is True, executes a block of code. We can get fancy and define another block of code to be executed instead when the test is False.

The above paragraph introduces a new concept: a block. A code block is just a series of statements indented to the next level. The block ends when one encounters a statement indented to a previous level. Python has no begin / end keywords or operators; it simply keeps track of the number of leading spaces each statement has.

The Python if statement looks like this:

    if x == 0:
        pass    # do something when x is zero
    elif x > 5:
        pass    # do something else if x is greater than 5
    else:
        pass    # do this when x is neither zero nor greater than 5

(The pass statements, described later, are just placeholders for real code.) The else and elif clauses are optional, and there may be any number of elif clauses. All elif clauses must precede the else clause if both are present. Nesting of if statements is permitted, and because indentation determines the block structure, the "dangling else" problem which exists in some languages (which if does the else close?) is not an issue in Python.
    if first_name == 'Jim':
        if last_name != "Jones":
            pass
        else:
            drink_koolaid()
    elif first_name == "Mark":
        if last_name == """Knopfler""":
            play_guitar()
        elif last_name == "Twain":
            write_humor()
        else:
            pass
    else:
        print "I don't know this person.\n"

Figure 5-1: Use of nested if statements

(I know there are several language elements we haven't discussed in the above code fragment. We'll get to them eventually...)

If you are coming to Python from C or C-like languages, you will probably want to parenthesize the logical expression. That's okay, but it's not necessary. Even if you use the logical operators and, or and not in the expression, parentheses are needed only to override the normal precedence rules of the language: comparisons bind more tightly than not, not binds more tightly than and, and and binds more tightly than or. So you can make complex expressions like

    if not x == 3 and y <= 5 or z == 0:
        pass    # do what you need if both x is not 3 and
                # y is less than or equal to 5, or if z is zero

    # which is not the same as

    if not x == 3 and (y <= 5 or z == 0):
        pass    # do what you need if x is not 3 and either
                # y is less than or equal to 5 or z is zero

    # which is not the same as

    if not (x == 3 and y <= 5) or z == 0:
        pass    # do what you need if z is zero, or unless both x is 3 and
                # y is less than or equal to 5

    # which is not the same as

    if not (x == 3 and y <= 5 or z == 0):
        pass    # do what you need only if z is not zero and it is not the
                # case that x is 3 and y is less than or equal to 5

Figure 5-2: Complex tests using logical operators

Enough said about that. Whether you use complex logical expressions or nested if statements to achieve a multiple test is a matter of taste. Some people like to sprinkle and, or and not throughout their code; others never use the keywords at all. Most of us split the difference, choosing whichever strategy seems clearer at the time.

The next control-of-flow statement is for. This is either the most insanely great or greatly insane feature of Python. If you are used to almost any other language's for statement, this one is going to look decidedly weird.
Formally, the for statement is defined as:

    for_statement ::= "for" target_list "in" expression_list ":" block ["else" ":" block]

Perhaps the most common form is the simple definite iteration, which is spelled

    for i in range(10):
        pass    # do something 10 times while i increments
                # from zero through nine

The range(10) part uses the built-in range function (which we'll see again later) to create the sequence to iterate over, in this case a list of the integers 0 through 9. So we could also have coded that statement as

    for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]:

though it would have been silly to do so when the interpreter can get the list right more reliably than a human typist could. But this exposes the fact that the for statement's "in a list" is a lot more powerful than the FORTRAN DO or even the C-like languages' for. Here's a for that's easy to code in Python and would be awkward in any of the C-like languages:

    for i in [-5, -1, 0, 1, 5]:

But wait... there's more! Here's an example from a program that needed to extract data from a table of constellations. The data structure consisted of a dictionary whose key was a 3-letter name and whose value was a list of coordinates in two dimensions. This was the pair of for statements that got the coordinate pairs for the purpose of drawing lines:

    constellations = {
        'And': [ (0, 2, 4, 6), (8, 10, 12, 14) ],
        'Ori': [ (-1, -2, -3, -4), (0, 0, 0, 0), (5, 6, 7, 8, 9), (10, 11, 12, 13) ]
    }
    # the above is shorter than the actual data structure
    # and does not contain actual data values but it is
    # correct in terms of structure.

    for name, coords in constellations.iteritems():
        for line_seg in coords:
            pass    # draw the line.

Figure 5-3: Draw constellation code fragment

Again, apologies for using a built-in method (in this case iteritems) without prior explanation. Suffice it to say that iteritems returns an iterator which yields each key and its value from the dictionary as a (key, value) tuple. The pairs come out in an arbitrary order which depends on how the dictionary stores its entries internally; the order is not sorted, and you should not rely on it.
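Because dictionary iteration order is arbitrary, a program that needs a predictable order can sort the (key, value) pairs first. This sketch assumes a small hypothetical constellations table (the 'Cas' entry and the function names are invented for illustration) and uses items(), which exists in both Python 2 and Python 3, whereas iteritems is Python 2 only:

```python
constellations = {
    'Ori': [(-1, -2, -3, -4), (0, 0, 0, 0)],
    'And': [(0, 2, 4, 6), (8, 10, 12, 14)],
    'Cas': [(1, 1, 2, 2)],
}

def names_in_order(table):
    # items() yields (key, value) pairs in arbitrary order; sorted() orders
    # them by key, because tuples compare element by element and keys are unique.
    return [name for name, coords in sorted(table.items())]

def count_segments(table):
    # Order does not matter when every entry is visited anyway.
    total = 0
    for name, coords in table.items():
        for line_seg in coords:
            total += 1
    return total
```

names_in_order(constellations) returns ['And', 'Cas', 'Ori'] no matter how the dictionary happens to store its entries, while count_segments returns 5 and does not depend on order at all.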
The iteritems method allows you to "walk" the dictionary without knowing how many key-value pairs it contains or even which keys are defined. The way I use the second for statement lets me walk a list without knowing its length. (Note also the use of tuples to ensure that we have a single object which completely describes the line to be drawn.)

The for statement can take an else clause; this clause is executed when the enumeration list is exhausted, but not when the loop is ended early by a break (or by a return). We saw this usage in Figure 3-1, where the else runs only if no return statement ended the loop.

The next control-of-flow statement is while. This is like the while statement of C-like languages:

    while x < 12:
        pass    # do something (which had better eventually make x >= 12)

The test of the while statement is evaluated. So long as the result of the evaluation is True, the subsequent code block is executed and the test is evaluated again. When the test fails, the loop ends. The while statement can also have an else clause, which is executed when the test is (or becomes) False.

As in the C-like languages, Python has the break and continue statements. These statements affect how loops are executed. When a break is encountered in the looping of a for or while statement, the loop terminates immediately without executing any more statements from within the loop's code block. Also, an else clause, if present, is not executed when a loop terminates because of a break. A continue, on the other hand, skips the rest of the loop's code block and passes control immediately back to the test of the while or to the next step of the for's enumeration. (The sensible use of break and/or continue is as part of an if statement contained within the loop's code block.)

Here's an example.
This code examines the elements of a list of (presumably) integers and prints the odd numbers it finds, stopping as soon as it reaches an element which is zero:

    for i in range(len(the_list)):
        if the_list[i] == 0:
            break
        if the_list[i] % 2 == 0:
            continue
        print the_list[i]

Figure 5-4: Use of break and continue

We could have added an else clause to the for statement to report that we had scanned the whole list without finding a zero value, if we wanted to. (Again, I'm using a built-in function, len, without having previously introduced it. The len function returns the number of elements contained in a sequence. So range(len(the_list)) creates an enumeration list that causes the variable i to go from zero to the maximum legal index for the_list.)

We've actually seen the pass statement already in one of the examples. This statement is a "nop": it does nothing. The pass statement is needed whenever a construct requires a code block but there is nothing which should be done. Typically, this happens when, as in Figure 5-1, an if statement should do nothing if the test is True but must do something when the test is False, and rewriting the test in the opposite sense is not desired. The pass statement may also serve as a placeholder for a code block to be defined as the program evolves.

In Python, the ability to output (to stdout, by default) is built into the language rather than provided by a library routine, as in C. Output is performed with type conversion when necessary so that the result is readable. The keyword that does this magic is predictable: print. A print statement consists of the keyword followed by a comma-separated list of expressions. Each expression is evaluated, converted to readable form (as if by the built-in str function) and output, with a single space between consecutive items. A newline is written after the final expression unless the expression list ends with a comma, in which case the newline is suppressed and the output of the next print statement continues on the same line.
(If you've been paying attention, you'll notice I've already used a few simple print statements.) The exec statement lets you execute an object as Python code, optionally defining its context (environment). In its simplest form, exec takes a single expression (either a string or an open file) and passes this to the Python execution engine. The text is interpreted as Python code and runs in the current scope. This means that the executed code has access to the same set of variables that it would have had if the code had been inserted in the program text at the same point. The more advanced forms of the exec statement let you supply a dictionary to serve as the namespace of the executed code. The dictionary's keys are the names the executed code uses, and its values can come from any of the containing program's variables. So when the exec runs code referring to x and y, the values it sees may actually be those of the containing program's variables a and b. (Assignments made by the executed code are stored in the dictionary; they do not change a and b.) That allows the exec code to be re-used in multiple contexts. Here's an example of just such a strategy:

    person = 'Sam'
    exec """print 'My name is', name""" in {'name' : person}

(There is another form of exec which takes a second dictionary to distinguish the global namespace from the local namespace. But it's rarely necessary to use this feature. Separate the second dictionary from the first with a comma.) Having the exec statement lets one implement a simple Python demonstration environment in Python: one would write a loop which would read a line of console input and then pass the line as a string to an exec.

6. Functions

Okay, we've seen a lot of code fragments but we haven't seen much that looks like programs yet. So let's start getting familiar with the parts of Python which make it more than a fancy calculator. We'll begin by learning to define, call, pass parameters to and get results from subroutines. Subroutines in Python are called functions and are introduced by the def (which stands for define) keyword. A function must be defined (that is, its def statement must already have been executed) and be in scope when it is called.
(Functions can contain function definitions, so it's possible for a function definition to go out of scope.) Because Python does not declare types, a function's definition specifies neither the types of the function's arguments nor the type of the function's return value. As with variables, these are determined by the context in which the function is called. Formally, a function definition is:

    def funct_name(optional_param_list):
        code_block

The parameter list can be a C-like comma-separated list of identifiers, or it may include C++-like assignments to specify default values for parameters. As in C++, parameters with default values must come after all mandatory parameters. When a function is called, the list of argument expressions is evaluated and the values are matched to the function's parameters by position. This means that all mandatory parameters must be satisfied. It also means that if a parameter with a default value needs to receive a different value, then all parameters to its left must also be given explicit values. In other words:

    def a_func(arg1, arg2=0, arg3=1, arg4=2):
        print arg1, arg2, arg3, arg4

    a_func()          # illegal -- arg1 is mandatory
    a_func(7, ,9)     # illegal -- no way to skip an argument
    a_func(7)         # valid
    a_func(7, 0, 9)   # valid

Figure 6-1: Use of default argument values

However, Python allows you to call functions with keyword argument syntax instead of simply by position. In this type of call, each argument is written as name=value, where the name is spelled the same as the formal parameter which is intended to receive the value. This solves the problem of giving a value to one defaulted parameter without knowing the default values of all the parameters to its left. We can use any of the following calls:

    a_func(arg1=7, arg3=9)
    a_func(7, arg3=9)
    a_func(arg3=9, arg1=7)

to assign 7 to arg1 and 9 to arg3 without having to know the default value of arg2.
(Note that the second call mixes keyword and positional argument syntax. When you do that, all positional arguments must come before any keyword arguments, and no keyword argument may name a parameter that has already received a positional value. It's probably best not to mix the argument styles -- if you need to use keyword arguments at all, use them for every parameter.) Default value expressions are evaluated once, when the interpreter executes the function definition. The expressions may refer to other identifiers, but those identifiers must already have been assigned values before the def statement is executed. Later changes to the values of those identifiers have no effect on the default values. This makes it all the more advisable to use the keyword form of the argument list. Even if you know the expression which defined the initial value of an argument, that expression may not evaluate to the same value at the time of the call as it had when the function's definition was encountered. There are additional forms of the def statement's parameter list which allow for the equivalent of the ellipsis (unknown number of parameters) found in the C-like languages. These exotic forms will not be covered in this introductory class. If you find yourself needing this capability, see the following URL: http://docs.python.org/tutorial/controlflow.html#arbitrary-argument-lists Function definitions may be prefixed by "decorators". We won't discuss them in this class either (since I haven't actually figured out what they're useful for yet). Within the block of the function may be nearly any kind of statement, including function definitions. Execution of the function terminates when the interpreter "runs off the end of the block" (which is detected because the indentation level decreases) or when the interpreter encounters a return statement.
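Here is a short sketch of the argument-passing rules described above. a_func has the same signature as Figure 6-1 but returns its arguments so the calls can be compared. clamp, append_to and append_to_fresh are hypothetical helpers showing that default expressions are evaluated only once, when the def statement runs. The last two illustrate a consequence of that rule: a mutable default value (such as a list) is a single object shared by every call that relies on the default.

```python
def a_func(arg1, arg2=0, arg3=1, arg4=2):
    # Same signature as Figure 6-1; returns the values instead of printing them.
    return (arg1, arg2, arg3, arg4)

positional = a_func(7, 0, 9)          # arg2 must be spelled out to reach arg3
by_keyword = a_func(arg3=9, arg1=7)   # order of keyword arguments does not matter
mixed = a_func(7, arg3=9)             # positional first, then keywords
# All three calls produce (7, 0, 9, 2).

# Default expressions are evaluated once, when the def statement runs.
limit = 10
def clamp(x, upper=limit):            # upper's default is fixed at 10 here
    return min(x, upper)
limit = 99                            # too late to affect clamp's default
clamped = clamp(50)                   # 10, not 50

# Consequence: a mutable default is one object shared by every call.
def append_to(item, bucket=[]):
    bucket.append(item)
    return bucket

first = append_to(1)
second = append_to(2)                 # same list object as first, now [1, 2]

def append_to_fresh(item, bucket=None):
    # Common idiom: use None as the default and build a new list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```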
All functions return some value, but unless the function terminates with a return that specifies an expression, the value returned is the rather-less-than-useful defined constant None. Returning a useful value requires an explicit return statement which specifies an expression. Example:

    def the_answer(question):
        if (question == "life, the universe and everything"):
            return "Forty-two."
        else:
            return "Reply hazy, try again later."

Figure 6-2: Use of return statement

Having seen that we can create functions, it's time to introduce the topic of scope in Python. It's essentially identical to the scope rules in C-like languages, except for the fact that local identifiers are defined by assignment instead of declaration. Identifiers defined outside the function (at module level) can be read from within the function. However, if the function's block assigns to an identifier anywhere, that identifier is local to the whole function. The assignment creates a new local variable rather than changing the outer one, and reading the identifier before the assignment raises an UnboundLocalError. One can override this rule by using a global statement within the function's block. Any identifier named in the global statement refers to the module-level identifier. Any assignment to it within the function's block then changes that global identifier instead of defining a local variable. For example:

    i = 3
    def x():
        global i
        print i
        i = 5
        print i

    print i
    x()
    print i

Figure 6-3: An example of the global statement

will result in:

    3
    3
    5
    5

as the variable i used within the scope of x() is the global variable i and not some local variable. Also, unlike C++, an "inner" block (i.e.: a block that is not the body of a function but rather the body of an if, for, or while) does not create a new scope, so:

    def x():
        i = 1
        print i
        while (i == 1):
            i = 2
            print i
        print i

    x()

Figure 6-4: Inner blocks do not alter the scope

results in the following output:

    1
    2
    2

The same lookup rule means that references to functions (and, as we shall see, classes) defined in a scope level "above" that of the function will resolve correctly.
So:

    def x():
        i = 1
        print i

    def y():
        x()
        i = 2
        print i

    x()
    y()

Figure 6-5: Scope of functions

results in the following output:

    1
    1
    2

as expected, and not an "uninitialized identifier" error. Another feature of functions (and classes, as we shall see later) is that if the first statement in the block is a string literal, the interpreter automatically assigns it to a special attribute named __doc__. (Recall I said earlier that identifiers that begin and end with double underscores are reserved for the implementation and that variables that begin with two underscores are by convention private.) Should you use this docstring feature, there are tools which will extract the string and expect it to be the function / class description. You may also get the docstring yourself by using a qualified reference:

    def x():
        """ This is the docstring for function x().
        Since x() doesn't do anything, the docstring of x() is pretty boring. """
        return

    print x.__doc__

Figure 6-6: Obtaining and printing a docstring, qualified names

Notice the use of a qualified name in the print statement. This syntax is the same as in C++ or Java: the leftmost part is the highest-level containing object; the rightmost part is the desired variable; any intervening parts are sub-containing objects in hierarchical order. The qualifier symbol is the period character. If there were more functions, you could print each docstring by prefixing __doc__ with the name of each function in turn. When we get to classes, we're going to wind up using qualified names a lot. Python has a plethora of builtin functions. There is no way I can take the time to cover them all. But some are so commonly used that I must do more than simply list them. So I'll first present a list of the builtin functions and then I'll pick a dozen or so to describe in detail. Unlike keywords, the names of builtin functions are not reserved.
You are permitted to define your own function named len() if you like, but when you do, the builtin function of that name will become unavailable in that scope. Since len() is a very commonly-used builtin, you probably don't really want to override that name. This is why I am going to list all the builtin functions even though I won't be covering all of them in detail: it's better to know the entire list of names to avoid. Anyway, here's the Python builtin function zoo:

    abs            all            any            basestring
    bin            bool           callable       chr
    classmethod    cmp            compile        complex
    delattr        dict           dir            divmod
    enumerate      eval           execfile       file
    filter         float          format         frozenset
    getattr        globals        hasattr        hash
    help           hex            id             input
    int            isinstance     issubclass     iter
    len            list           locals         long
    map            max            min            next
    object         oct            open           ord
    pow            print          property       range
    raw_input      reduce         reload         repr
    reversed       round          set            setattr
    slice          sorted         staticmethod   str
    sum            super          tuple          type
    unichr         unicode        vars           xrange
    zip            __import__

Table 6-1: The builtin functions

Note: the builtin function print is new with Python 2.6. You must take special action to activate it (put the statement from __future__ import print_function at the top of the module), because the print keyword normally "hides" this function's name. The abs function takes the absolute value of the single numeric argument it is given. The numeric argument may be an integer, a long integer, a float or a complex type. If the argument is a complex, the function returns the magnitude of the complex (the square root of the sum of the squares of the real and imaginary parts). The bool function converts its argument to a boolean value (True or False) according to the following rules:

1. If no argument is passed, the function returns False.
2. If the argument is numerically equal to zero, the function returns False. If the argument is non-zero, the function returns True.
3. If the argument is a string (any kind) and is empty, the function returns False. For non-empty strings, it returns True.
4. If the argument is a complex and its magnitude is zero, the function returns False.
   For complex values with non-zero magnitudes, the function returns True.
5. If the argument is a set, tuple, list or dictionary, the function returns False if the object contains no elements and returns True for an object which is not empty.

The chr function returns the one-character string whose character code is the function's integer argument (for values below 128, this is the corresponding ASCII character). The argument must have a value in the range 0 through 255. The cmp function takes two arguments and compares them. The function returns an integer whose value is zero if the two arguments are equal, negative if the first argument is less than the second and positive if the first argument is greater than the second. (Sequences such as strings, lists and tuples are compared lexicographically. If two lists are of differing length, for example, and their elements are equal up to the point where the shorter list is exhausted, the longer list is considered greater than the shorter list.) (One should not expect the integer being returned to make any kind of mathematical sense; use only the fact that it is negative, zero or positive for program logic.) The complex function converts its arguments to a complex number, where the first argument becomes the real part and the second becomes the imaginary part. If only one argument is passed, the result is a complex with a zero imaginary part; if no arguments are passed, the result is a complex with a magnitude of zero. The divmod function returns a tuple consisting of the quotient of dividing the first argument by the second (rounded down, toward minus infinity) and the remainder of the division. For integer arguments, the result of x = divmod(a, b) is the same as the expression x = (a // b, a % b). The enumerate function returns an iterator that yields tuples, each of which consists of the ordinal (index) and the value of an element of the sequence which is the function's argument.
That's a mouthful, so I think a code example is necessary:

    fruits = ['apples', 'bananas', 'cantaloupes', 'dates']
    for (i, fruit) in enumerate(fruits):
        print i, fruit

Figure 6-7: Fruit salad

The above example will print

    0 apples
    1 bananas
    2 cantaloupes
    3 dates

(In Python 2.6, the enumerate built-in function was extended to accept an optional second argument which specifies the starting index for the enumeration. If your Python environment doesn't support this feature, you can get the same effect by adding the starting value to the index yourself.) The execfile function provides the same capability as the exec statement, except that the function's first argument is not a string of Python code. Instead, it is taken to be the pathname of a file which (presumably) contains Python code. Like the exec statement, execfile can take optional dictionaries to serve as the global and local namespaces in which the file's code is executed. The float function converts its argument (a string or a number) into a float value. If you have used the C standard library function atof then this will look familiar. The globals function returns a dictionary containing the global symbols from the symbol table of the module that contains the call to globals. This dictionary is suitable for use in an exec statement or as the second argument to an execfile call. To make use of it, though, the Python code in the string (or file) being executed would have to use the same names for the global variables as the caller. The dictionary contains key / value pairs in the form 'name' : value, where name is the identifier of a global variable and value is the value bound to that variable. The hex function converts its integer (or long integer) argument to a string representing that value in hexadecimal. The string begins with '0x' and uses lower-case letters for the digits a through f.
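Several of the builtin descriptions above can be checked directly in the interpreter. The following sketch exercises:

- the bool rules;
- abs on a complex;
- the divmod identity, including a negative dividend, where // rounds toward minus infinity;
- enumerate's optional start argument next to the add-the-offset-yourself alternative;
- hex.

The variable names are arbitrary, and the code avoids print so it should run under Python 2.6 or later and Python 3.

```python
# bool: zero and empty values are False; the other values listed are True
falsy = [bool(), bool(0), bool(0.0), bool(0j), bool(''),
         bool([]), bool(()), bool({}), bool(set())]
truthy = [bool(-1), bool(0.5), bool(1j), bool('0'),
          bool([0]), bool((None,)), bool({'k': 0})]

# abs of a complex is its magnitude: sqrt(3**2 + 4**2)
magnitude = abs(3 + 4j)

# divmod(a, b) gives the same pair as (a // b, a % b)
cases = [(7, 2), (-7, 2), (7, -2)]
divmod_matches = [divmod(a, b) == (a // b, a % b) for (a, b) in cases]

# enumerate with a start value (Python 2.6+), and the do-it-yourself equivalent
fruits = ['apples', 'bananas', 'cantaloupes', 'dates']
numbered = list(enumerate(fruits, 1))
offset_by_hand = [(i + 1, fruit) for (i, fruit) in enumerate(fruits)]

hex_text = hex(255)
```

Note that bool('0') is True: the string is non-empty, even though its text looks like zero.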
The id function returns an integer that uniquely identifies the object its argument refers to: no two objects that exist at the same time have the same id (in CPython it is the object's memory address). Remember that issue with multiple assignment? Well, the id function could have shown us whether or not a = b = [1, 2, 3] was dangerous, because we could simply have asked to print id(a) == id(b). If the result was True, we had an "aliasing" situation; if the result was False, the multiple assignment was safe. The input function reads a Python expression from stdin. You may optionally pass a string as the function's argument. If you do, the string is output as a prompt. (You will probably want to end the string with a space, as input begins in the column immediately after the prompt.) This function is somewhat dangerous because whatever the user types is evaluated as a Python expression. So your program may fail with an exception (typically a NameError or SyntaxError) if, for example, you want string input and the user forgets to provide proper string delimiters. But for quick-and-dirty input from an expert user (like yourself -- you never make mistakes, do you?) this function will at least allow interaction with a Python program. But the raw_input builtin function (see below) is a safer choice for obtaining input from humans. The int function accepts as its argument a string (or a number) which it will attempt to convert to an integer. An optional second argument can specify the radix for conversion (default is decimal, but any value from 2 through 36 is permitted). If the radix is zero (a special case), Python determines the radix from the string's prefix, just as it would for an integer literal in source code. A prefix of '0x' means hexadecimal, '0o' (or, in Python 2, a plain leading '0') means octal, '0b' means binary, and no prefix means decimal. The len function, as we have seen in a couple of examples already, returns an integer which is the number of elements in a sequence object such as a list or string. The list function takes as its argument any sequence object and makes a list of its elements.
If given a string, for example, the result will be a list of one-character strings corresponding (in order) to each character of the given string. The locals function is similar to the globals function we saw already. The returned value is a dictionary of the names and values of the currently-defined local variables. The long function constructs a long integer from its argument(s), either a string or a number. If the argument is a string, an optional second argument specifying the radix can be given. The max function returns the largest of the values given by the function's argument list. The arguments passed can be either multiple discrete values or a single argument which is a sequence. (If the latter, each element of the sequence object is evaluated to determine the largest.) The min function returns the smallest of the values given by the function's argument list. The arguments passed can be either multiple discrete values or a single argument which is a sequence. (If the latter, each element of the sequence object is evaluated to determine the smallest.) The oct function is similar to the hex function we have already seen. It accepts an integer argument and returns a string that is the representation of that integer in base-8 (octal) notation. In Python 2 the returned string begins with a leading 0 (oct(8) returns '010'); Python 3 uses the "0o" prefix instead. The open function is not entirely unlike the Standard C library's fopen. The function takes three arguments:

- a string which is interpreted as a pathname;
- a second string which specifies the I/O mode to be applied to the file;
- an optional third argument, an integer specifying the buffer size.

If the file exists (or can be created, if the mode argument allows that), the open function returns a file object; otherwise it raises an IOError. Possible values for the mode string are:

    'r'   - open for read, converting newlines to '\n' if required.
    'rb'  - open for reading binary (no conversion).
    'r+'  - open for reading and updating (allows random access to the file).
    'r+b' - open for reading and updating in binary.
    'w'   - open for write, creating the file if required, else truncating it; '\n' is converted to the platform-specific line ending.
    'wb'  - open for write in binary, creating the file if required, else truncating it.
    'w+'  - open for write and updating (truncate, but then allow random access as the file is written).
    'w+b' - open for write and updating in binary.
    'a'   - open for write-append (does not truncate).
    'ab'  - open for write-append in binary (does not truncate).
    'a+'  - open for reading and appending (reads may be done anywhere in the file; writes always go to the end).
    'a+b' - open for reading and appending in binary.

The buffer size argument is an integer with two "magic numbers". A buffer size of zero specifies unbuffered I/O, and a buffer size of one specifies line-by-line I/O. Any other positive value is (approximately) the size of the I/O buffer in bytes, and a negative value selects the system default. The maximum, minimum and default buffer size values are implementation-specific. The ord function takes a one-character string and returns the integer value (ordinal) for that character. This is the inverse of what the chr function does. The pow function raises its first argument to the power of its second argument. This is similar to the ** exponentiation operator. But pow can take an optional third argument, giving a result equal to the expression (a ** b) % c but computed much faster. The range function has been used in several examples already. This function takes a single integer argument and returns a list containing all the values from zero to one less than the argument's value. But it can also take two integer arguments. In that case, the list contains all the values from the first argument's value to the second's minus one. So the list need not start at zero. But wait, there's more! The range function can take a third integer argument which specifies the step increment, so the values in the list are not required to differ by +1.
(Of course a step of zero is nonsense, so it's not allowed.) This function's most important use is to describe the iteration list of a for statement. The raw_input function reads from the console (stdin) but does not attempt to process the input as a Python expression. Like input, the raw_input function can take an optional argument which is the prompt string. The function reads from stdin until a newline is encountered and returns a string value which is all characters up to but not including the newline. The repr function returns a string which is the printable representation of the object which was passed as its argument. For many objects this is the same as what the print statement would display (strings are a notable exception: repr includes the quotes). The reversed function accepts a sequence object as its argument and returns an iterator which yields the sequence's elements in reverse order. It does not copy the sequence; wrapping the call in list() is one way to get a reversed list. The round function takes two arguments: a float and an optional integer. It returns a float rounded to the number of digits after the decimal point given by the integer. If the precision argument is not supplied, the function will assume zero. Note that, in Python 2, when the value to the right of the lowest retained digit is exactly 1/2, the rounding algorithm always rounds the magnitude up (away from zero), irrespective of whether the digit to be affected is even or odd. So round(2.5) is 3.0 and round(-2.5) is -3.0. This may not be correct according to the procedure that kids are currently taught in grammar school, and Python 3 changed it to round such halves to the even neighbor. The set function returns a set constructed from its argument, which may be any kind of sequence object. Recall that a set cannot contain duplicate elements; use of this function will eliminate duplicates in, for example, a list. So unique_x = list(set(x)) would create a new list containing only the unique elements of the provided list (though not necessarily in their original order).
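Here is a similar sketch for the builtins described since then:

- int with an explicit or a zero radix;
- id as an aliasing test;
- the 'w', 'a' and 'r' open modes;
- pow with a modulus;
- range with a step;
- reversed;
- set for removing duplicates.

The open example uses a temporary directory from the standard tempfile module so nothing in your own directories is touched. The file name demo.txt and the variable names are arbitrary. The code avoids print so it should run under Python 2.6 or later and Python 3.

```python
import os
import tempfile

# int with an explicit radix, and radix 0 (radix taken from the prefix)
parsed = [int('ff', 16), int('101', 2), int('z', 36),
          int('0x1f', 0), int('0b101', 0), int('42', 0)]

# id as an aliasing test: a = b = [...] binds both names to ONE list
a = b = [1, 2, 3]
c = [1, 2, 3]                      # equal contents, but a separate object
aliased = id(a) == id(b)
distinct = id(a) != id(c)

# open modes: 'w' creates/truncates, 'a' appends, 'r' reads
work_dir = tempfile.mkdtemp()
path = os.path.join(work_dir, 'demo.txt')
with open(path, 'w') as f:
    f.write('first\n')
with open(path, 'a') as f:
    f.write('second\n')
with open(path, 'r') as f:
    lines = f.readlines()
os.remove(path)                    # tidy up the temporary file and directory
os.rmdir(work_dir)

# pow with a modulus gives the same result as (a ** b) % c
modular = pow(3, 200, 7)

steps = list(range(2, 10, 3))             # start 2, stop before 10, step 3
backwards = list(reversed([1, 2, 3]))     # reversed returns an iterator
unique = sorted(set([3, 1, 3, 2, 1]))     # sorted, because set order is arbitrary
```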
The sorted function takes a sequence object and returns a new list containing its elements arranged in ascending order. But the function can take three additional, optional arguments:

- cmp specifies a custom comparison function (pass None if the standard comparison works for you).
- key specifies a function that translates each element of the sequence into some value which should be used instead for comparison purposes (again, None specifies no translation required).
- reverse is a boolean value which, if set to True, causes the function to sort in descending order instead of the default ascending order.

Note that using key and/or reverse is less expensive than using cmp. The cmp function is called multiple times per element, whereas key need only be called once per element and reverse simply changes how the result of the comparison is interpreted. The str function converts its argument into a string. This function usually returns the same thing as repr would have. But occasionally the returned strings will differ. For many types, repr returns a string which Python could evaluate to recreate the object; str returns a string which "looks good", even if it cannot be evaluated. The sum function takes a sequence of numbers and an optional start value (default is zero) and returns the start value plus the sum of all the elements; so sum([1, 2, 3], 10) is 16. The tuple function returns a tuple which is made from the elements of its argument (a sequence object). The elements of the tuple will be in the same order as the elements of the argument. The unichr function returns the Unicode character corresponding to the value of the function's numeric argument. The unicode function, in its most basic form (one argument), converts its argument to a Unicode string in much the same way as str converts its argument into an ordinary (byte) string.
The optional second and third arguments allow you to specify the encoding and what to do in the event of errors.

7. Objects

Python is an object-oriented language. Like C++ or Java, it allows one to define custom types called classes. To define a class is to define a new data type and to describe that type's behaviors and capabilities. Python allows for inheritance and permits one to define both class variables and instance variables. But Python classes are not very good at data hiding since Python does not have a concept of privacy -- all class variables and methods are public. A class is defined with the class keyword, followed by an identifier, then optionally the class's inheritance specifier. The statement ends with a colon (which means, of course, that the statement introduces a block). The inheritance specifier consists of a pair of parentheses surrounding a comma-separated list of expressions (usually simply identifiers) which are the base class(es) from which this class inherits. Yes, Python supports multiple inheritance. Class definitions closely parallel function definitions in that they introduce a new scope. But the class keyword marks an executable statement -- all the statements within the class but outside of contained function definitions (methods) are executed as the class's namespace is created. This includes the expressions in the inheritance specifier and the assignments which create the class variables. The class's namespace at this point is the list of identifiers of methods and class variables. Note that the code of the class's methods is not executed when the class is defined. So to define a class variable one places an assignment statement inside the class's block but outside of the block of any of the class's methods. The assignment defines the variable and its type and also serves to initialize the variable.
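A small sketch shows that a class statement is executed while method bodies are not. The class Demo and the list trace are hypothetical names. The class body appends to trace once, when the class statement runs, while bump appends only when called. bump also updates the class variable count by qualifying it with the class name. The code avoids print so it should run under Python 2 and Python 3.

```python
trace = []

class Demo(object):
    # Statements in the class block run once, when the class statement executes.
    trace.append('class body')
    count = 0                       # class variable: defined and initialized here

    def bump(self):
        # A method's code runs only when the method is called.
        trace.append('bump')
        Demo.count += 1             # qualify with the class name to change it
        return Demo.count

after_definition = list(trace)      # snapshot before any instance exists
first, second = Demo(), Demo()
first.bump()
second.bump()
```

Both instances see the shared value: first.count and second.count are found on the class, since neither instance has a count of its own.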
A class variable is common to all instances of the class and can serve as a communication channel between instances, since any instance can alter the value. Within a method, a class variable can be read through the instance: if you have followed the convention for naming the first argument of a method, that means prefixing the name of the class variable with self. But assigning to self.i does not change the class variable. Instead, it creates an instance variable of the same name which hides the class variable for that instance only. To change the class variable itself, qualify it with the name of the class. Example:

    class TheClass:
        i = 0
        def a_method(self, x):
            TheClass.i = x

Figure 7-1: Referencing a class variable

Okay, as usual I got ahead of myself. It's time to explain how to define a method. As it turns out, you define a method the same way you define a function, except that methods always have at least one parameter, which will be set to a reference to the current instance of the class. Hence, by convention, this first parameter is given the name self. (When a method is called, this argument is not supplied explicitly, however: to the outside world, a method has one less argument than its definition demands.) To invoke a method, the calling code must also use a qualified name, in this case prefixing the name of the method with the name of the instance of the class. Example:

    class Powerful:
        def cube(self, x):
            return x**3
        def square(self, x):
            return x**2

    raised = Powerful()
    i = 3
    print raised.cube(i), raised.square(i)

Figure 7-2: Invoking a class's method

There is a special method, named __init__, which is the initializer method. This method is analogous to the constructor in C++: it sets the initial values of (in fact, defines) the class's instance variables. The __init__ method automatically runs when an instance of a class is created. The arguments to __init__ are passed during instantiation. (In the above example, the only argument that Powerful.__init__ would get is the unstated self.) Since the Powerful class has no instance variables and needs no initialization, we haven't bothered to define an __init__ method for it.
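Here is a sketch of an initializer that takes arguments, and of the difference between class and instance variables. Counter and shadow are hypothetical names. Note in particular that assigning through self creates a new instance variable rather than changing the shared class variable. The code avoids print so it should run under Python 2 and Python 3.

```python
class Counter(object):
    total = 0                      # class variable shared by every instance

    def __init__(self, start=0):
        # __init__ receives the arguments given at instantiation, e.g. Counter(5)
        self.value = start         # instance variable, one per object
        Counter.total += 1         # rebinding through the class updates the shared value

    def shadow(self):
        # Assigning through self creates an INSTANCE variable named total
        # that hides the class variable for this object only.
        self.total = -1

c1 = Counter(5)
c2 = Counter()
c2.shadow()
```

After these lines, Counter.total and c1.total are both 2, while c2.total is -1: c2 now has its own total.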
A typical task in an __init__ method is to run the __init__ method of the base class from within the descendant class's initializer, like this:

    class D (B):
        def __init__(self):
            B.__init__(self)

Figure 7-3: Initializing the base class inside the descendant's __init__

This is one time where one must pass an argument to satisfy the usually unstated self argument. Since we are calling the __init__ method of the class B through the class rather than through an instance, we must pass an instance of type B to the method. In this case, we can pass self. It is an instance of D, and because D is a descendant class derived from B it is, therefore, a kind of B. Naturally, if the initializer of B takes additional arguments, the initializer of D must supply them, either by making them up as part of the initializer method's code or by passing them along from the initializer's own argument list. Let's look at class variables versus instance variables again. This code:

    class X (object):
        a = 0
        def __init__(self):
            self.b = X.a
            X.a += 1
            self.a = 0
        def value(self):
            return (X.a, self.a, self.b)
        def increment(self):
            self.a += 1

    x = X()
    print x.value()
    y = X()
    x.increment()
    print x.value()
    print y.value()

Figure 7-4: Use of both instance and class variables

will output

    (1, 0, 0)
    (2, 1, 0)
    (2, 0, 1)

because X.a is a class variable (and therefore in the above code counts instances), whereas self.a and self.b are instance variables. In particular, x.a is incremented when we call x.increment() but y.a is unaffected. I skipped over two built-in functions in the previous topic. These functions are appropriate for classes (which is why I didn't cover them previously):

    isinstance
    issubclass

Table 7-1: Builtin functions useful for classes

The isinstance function takes two arguments, an object and a class. The function returns True if the object is an instance of the specified class or of a descendant of that class.
The issubclass function returns True if the class given as the first argument is a descendant of the class given as the second argument. It also returns True if the two arguments specify the same class.

    class B:
        def __init__(self):
            pass

    class D (B):
        def __init__(self):
            pass

    d = D()
    print isinstance(d, D)
    print issubclass(D, B)
    print issubclass(D, D)
    print isinstance(d, B)
    print issubclass(B, D)

Figure 7-5: Use of the isinstance and issubclass builtin functions

The above code will output

    True
    True
    True
    True
    False

because d is an instance of D which is a subclass of B, but B is not a subclass of D. All methods are (in C++ terms) virtual. So if your class B implements a method m and your class D needs a more elaborate (or different) implementation, you simply define a method D.m and that will override the base class's method. (Your descendant class can still get to the base class's method, though, by referring to B.m.) Here's an example: a class that is similar to int except only even numbers are permitted.

    class EvenInt (int):
        def __init__(self, value = 0):
            self.data = 2 * (value // 2)
        def __add__(self, value):
            self.data += 2 * (value // 2)
            return self.data
        def __sub__(self, value):
            self.data -= 2 * (value // 2)
            return self.data
        def __str__(self):
            return str(self.data)

    i = EvenInt(3)
    j = EvenInt(2)
    print i, j
    i = i + 2
    j = j + 1
    print i, j
    j = j - 2
    print j
    i = i * 2
    print i
    i = i / 5
    print i

Figure 7-6: Overloading base class methods

The above code is incorrect. The implementation of the EvenInt class is incomplete. As a result, the program outputs

    2 2
    4 2
    0
    8
    1

Addition and subtraction appear to perform as expected. Multiplication also works in this example, but by accident -- you can't get an odd number by integer multiplication if one of the operands is an even number, which was the case when we did the multiplication. But division fails because the __div__ method of the int class has not been overloaded. There is a second problem as well. __add__ and __sub__ modify the original object and then return self.data, which is a plain int. So after the first addition, i and j are no longer EvenInt objects at all, and every later operation (including the subtraction) is carried out by int's own methods.
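One possible way to complete the class is sketched below. It is offered as an illustration, not as the author's intended solution. Because int objects are immutable, the even value is fixed in __new__ rather than in __init__. Each operator returns a new EvenInt, so results do not decay to plain int. The helper _even and the choice to treat / as integer division (as Python 2 does for int) are assumptions. Reflected operations such as 2 + i (__radd__ and friends) are not handled here and would still return a plain int. The code avoids print so it should run under Python 2 and Python 3.

```python
def _even(n):
    # Reduce n to an even integer the way Figure 7-6 does: 2 * (n // 2)
    return 2 * (int(n) // 2)

class EvenInt(int):
    def __new__(cls, value=0):
        # int is immutable, so the stored value must be chosen in __new__;
        # setting self.data in __init__ leaves the int value itself unchanged.
        return super(EvenInt, cls).__new__(cls, _even(value))

    # Each operator evens its operand (as Figure 7-6 intends) and returns a
    # NEW EvenInt instead of modifying self or returning a plain int.
    def __add__(self, other):
        return EvenInt(int(self) + _even(other))

    def __sub__(self, other):
        return EvenInt(int(self) - _even(other))

    def __mul__(self, other):
        return EvenInt(int(self) * _even(other))

    def __floordiv__(self, other):
        return EvenInt(int(self) // _even(other))

    # Assumption: '/' behaves as integer division, as it does for Python 2 ints.
    __div__ = __floordiv__       # used by '/' in Python 2
    __truediv__ = __floordiv__   # used by '/' in Python 3

# The same sequence of operations as Figure 7-6
i = EvenInt(3)
j = EvenInt(2)
i = i + 2
j = j + 1
j = j - 2
i = i * 2
i = i / 5
```

With this version, i ends up as 2 (8 divided by 5 reduced to 4) and j as 0, and both are still EvenInt objects.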
The correct behavior would have resulted in a quotient of 2 (8 divided by 4, the divisor 5 rounded down to an even number). There is a deeper problem, too: __add__ and __sub__ return self.data, which is a plain int, so after the first addition i and j are no longer EvenInt objects at all, and the later subtraction, multiplication and division are ordinary int arithmetic. A complete implementation would return new EvenInt instances from its arithmetic methods, and it would need to override a bunch of other methods of int as well. Execute help(int) in your Python interpreter to get some idea of what would be needed.

Python classes have no equivalent of the C++ private keyword. But because data hiding and namespace isolation are such useful language features, a naming convention has evolved: if you define an identifier within a class whose name begins with two underscores but does not end with two underscores, the Python interpreter will perform "name mangling" on the symbol (for instance, in Python 2.6 for Windows, if you have defined __a in class X then what winds up in the symbol table is _X__a). This handles the namespace separation issue nicely. But the mangled name is still public, so you must abide by the social contract not to refer to X._X__a or to _X__a in an instance of X. Also, because the mangled name is built from the class name, code that breaks into the system by spelling it out is fragile: it stops working if the class is renamed.

8. Exceptions

Python supports exceptions. As in Java, there are predefined exceptions and exceptions which you get when you import a library module. Of course, Python also allows you to define your own custom exceptions, because exceptions are actually classes derived from the Exception class. The predefined (built-in) exceptions are built-in names, available without any import, just like the built-in functions.
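Since exceptions are just classes, defining your own is a matter of subclassing. The sketch below is one possible shape, assuming a hypothetical OzException that carries an extra attribute besides its message; it uses the except ... as form, which Python 2.6 and later accept.

```python
class OzException(Exception):
    """Hypothetical custom exception, derived from Exception."""

    def __init__(self, message, creature):
        # Let Exception store the message so str() and .args work as usual.
        Exception.__init__(self, message)
        # Extra attribute that a handler can inspect.
        self.creature = creature


try:
    raise OzException("Lions and tigers and bears, oh my!", "lion")
except OzException as e:
    caught = e    # keep a reference for use after the handler

print(str(caught))       # Lions and tigers and bears, oh my!
print(caught.creature)   # lion
```

Deriving from Exception rather than BaseException keeps the custom class out of the way of SystemExit and KeyboardInterrupt.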
The built-in exceptions are:

    BaseException          Exception
    StandardError          ArithmeticError
    EnvironmentError       LookupError

    AssertionError         AttributeError
    EOFError               FloatingPointError
    GeneratorExit          IOError
    ImportError            IndexError
    KeyError               KeyboardInterrupt
    MemoryError            NameError
    NotImplementedError    OSError
    OverflowError          ReferenceError
    RuntimeError           StopIteration
    SyntaxError            SystemError
    SystemExit             TypeError
    UnboundLocalError      UnicodeError
    UnicodeEncodeError     UnicodeDecodeError
    UnicodeTranslateError  ValueError
    VMSError               WindowsError
    ZeroDivisionError

    Warning                UserWarning
    DeprecationWarning     PendingDeprecationWarning
    SyntaxWarning          RuntimeWarning
    FutureWarning          ImportWarning
    UnicodeWarning

Table 8-1: The Built-in Exceptions

Why is the above table broken into groups? Because the exceptions are classes which have a hierarchical relationship. BaseException is the ancestor class of all the others. Exception, StandardError, Warning, ArithmeticError, EnvironmentError and LookupError are also classes from which others are descended. The hierarchy of exceptions is too complex for a simple tabulation.
Instead, we need a tree diagram like this:

    BaseException
     +-- SystemExit
     +-- GeneratorExit
     +-- KeyboardInterrupt
     +-- Exception
          +-- StopIteration
          +-- Warning
          |    +-- DeprecationWarning
          |    +-- PendingDeprecationWarning
          |    +-- RuntimeWarning
          |    +-- SyntaxWarning
          |    +-- UserWarning
          |    +-- FutureWarning
          |    +-- ImportWarning
          |    +-- UnicodeWarning
          |    +-- BytesWarning
          +-- StandardError
               +-- BufferError
               +-- AssertionError
               +-- AttributeError
               +-- EOFError
               +-- ImportError
               +-- MemoryError
               +-- ReferenceError
               +-- SystemError
               +-- TypeError
               +-- ArithmeticError
               |    +-- FloatingPointError
               |    +-- OverflowError
               |    +-- ZeroDivisionError
               +-- LookupError
               |    +-- IndexError
               |    +-- KeyError
               +-- NameError
               |    +-- UnboundLocalError
               +-- RuntimeError
               |    +-- NotImplementedError
               +-- EnvironmentError
               |    +-- IOError
               |    +-- OSError
               |         +-- VMSError
               |         +-- WindowsError
               +-- SyntaxError
               |    +-- IndentationError
               |         +-- TabError
               +-- ValueError
                    +-- UnicodeError
                         +-- UnicodeDecodeError
                         +-- UnicodeEncodeError
                         +-- UnicodeTranslateError

Figure 8-1: Exception Hierarchy

Knowing that tree lets you decide which built-in exception to use as the base class for your own. Most likely, you would derive from Exception unless you know of a more appropriate base. (Do not derive from BaseException.)

Using exceptions is not much different in Python than it is in Java. You enclose a block of code which is likely to raise an exception in a try block and write an except block to handle the expected exception. This works as follows: the code in the try block is executed until/unless an exception occurs. If an exception happens, the rest of the try block is skipped. If the exception is named by one of the except blocks (or is a descendant of a class it names), that except block is executed instead. If no except block handles the particular exception, the exception is passed outward to the next enclosing try statement, which may be in a calling function. This could ultimately be the Python interpreter itself, as all uncaught exceptions wind up there.
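The propagation rule just described, together with the hierarchy in Figure 8-1, can be seen in a small sketch. The functions divide and safe_divide are hypothetical: divide has no handler, so its ZeroDivisionError travels up to safe_divide, whose except clause names the base class ArithmeticError.

```python
log = []


def divide(a, b):
    # No try block here, so a ZeroDivisionError propagates out of
    # this function to the nearest enclosing handler in a caller.
    return a // b


def safe_divide(a, b):
    try:
        result = divide(a, b)
        log.append("division done")    # skipped if divide() raises
    except ArithmeticError as e:
        # ZeroDivisionError descends from ArithmeticError in the
        # hierarchy, so naming the base class catches it too.
        log.append("caught " + type(e).__name__)
        result = None
    return result


print(safe_divide(7, 2))   # 3
print(safe_divide(7, 0))   # None
print(log)                 # ['division done', 'caught ZeroDivisionError']
```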
If the exception is caught in an except block, the outer block (including Python) will not normally be notified.

There may be multiple except blocks to handle different exceptions differently, or an except block may specify a parenthesised, comma-separated list (i.e. a tuple) of exceptions which that block then handles in the same way. The last except block can be bare (naming no exception), in which case it handles all exceptions not already mentioned (which could hide program bugs if the handling isn't done carefully). Additionally, there may be an else block (which must come after all except blocks) which executes only if no exception is raised. Lastly, there may be a finally block for code which must be performed whether or not an exception is raised. The finally block must come after everything else. Here's a little pseudo-code to make that clear:

    try:
        # code which could raise the FoobarException but might
        # also raise the FooException or the BarException
        pass
    except FoobarException, e:
        print e    # let the exception explain itself
    except (FooException, BarException):
        print """Foo or Bar but not Foobar."""
    except:
        # catch everything else here (bad strategy!)
        print "we got some exception we never expected"
    else:
        # code to do in case no exception is raised
        pass
    finally:
        # code to do whether an exception is raised or not
        pass

Figure 8-2: Using exceptions

Notice in the above example that the first except block binds an instance, e, of the FoobarException, whereas the second except block only names its exceptions. It's your choice, but you will need an instance if your exception handling depends on access to the attributes of the exception object. Exceptions which derive from the Exception class (which is most of them, and probably anything you are likely to define for yourself) define a __str__() method that returns a human-readable representation of the arguments used to construct the exception object.
You may also examine the exception's args attribute (a tuple) directly.

Note: the comma syntax used to bind an instance looks very similar to the syntax for catching multiple exceptions. Be careful to use parentheses when you mean the latter! Writing except FooException, BarException: catches only FooException and binds it to the name BarException. (Python 2.6 also accepts the clearer form except FoobarException as e:, and Python 3.0 requires it.)

To raise an exception, your code needs to instantiate the desired exception object and then use the raise statement to signal it:

    if lions and tigers and bears:
        e = OzException("Lions and tigers and bears, oh my!")
        raise e

Figure 8-3: Raising an exception

(You could, of course, combine instantiation and signaling into one statement by saying raise SomeException(args) and never have an explicit instance to refer to.)

You can "pass the buck" by inserting a raise statement in your except block. This passes the exception along to the outer block's handler (if any). In this case, you simply say raise without instantiating an exception object, and the one your except block caught is the one passed upwards. This is useful when your except block must perform some action involving objects local to the function containing the try, but cannot completely handle the exception because the calling block needs to be notified as well. The following code is an example of this:

    def read_ini(filename, p):
        f = open(filename, 'r')
        try:
            # attempt to read f line-by-line and append each line
            # to the list p.
            for line in f:
                p.append(line)
        except IOError, e:
            print e
            f.close()    # must still close f -- nobody else can
            raise
        f.close()

    def read_profile(p):
        del p[:]    # empty the caller's list in place; "p = []" would
                    # only rebind the local name p
        try:
            read_ini('my_file', p)
        except IOError:
            # if we get an IO error, treat the entire file as
            # worthless and empty p again.
            del p[:]

Figure 8-4: Using raise to pass handling to outer block

We'll see more of this when we tackle the topic of File I/O, next.

9. File I/O

We have already mentioned the open built-in function.
This function returns a file object (an instance of a class called file). We can consider a file to be a "black box" that provides an interface to the platform's file system for the purpose of I/O. This interface is embodied in a set of methods. Each method is simply a function which can perform some action on (or with) a file. Because these functions are methods of a class, they can operate on instance data related to their tasks, starting with the path to the file being manipulated and the I/O mode, both of which were passed as arguments to the open function.

To invoke a particular method on a file, you call that method using a qualified name, in this case qualified by the name of the file variable which was set by the open call for that specific file. Example:

    import os    # for the os.SEEK_END constant

    foo = open("xyzzy/plugh", "r")
    foo.seek(0, os.SEEK_END)

Figure 9-1: Using the qualified name to access a file method

There are some other wrinkles in file I/O, too. For example, if "xyzzy/plugh" is not the name of a file to which read permission is granted, the open call will fail and raise an exception (an IOError). We could catch that exception (rather than just letting the program terminate) to make the program handle the problem gracefully. Likewise, many of the methods can raise exceptions which can be handled. We'll discuss which methods raise what exceptions and later show a robust example where these exceptions are caught.

Here is a list of all the attributes and methods belonging to the file class:

    close()     fileno()    name        readline()    tell()
    closed      flush()     newlines    readlines()   truncate()
    encoding    isatty()    next()      seek()        write()
    errors      mode        read()      softspace     writelines()

Table 9-1: File I/O attributes and methods

Many of these methods are named after, and work like, the C stdio library functions. Others are unique to Python but turn out to be very useful to know. I'll cover the stdio-like methods first and then some of the others.
Along the way, I'll point out a few Python-esque tricks that the file class permits you to perform.

The close method closes the file. The file instance still exists, but it is no longer good for much. The closed attribute for that file object will now be True. (The closed attribute is read-only.)

The fileno method returns an integer which is the file descriptor that the underlying implementation of the I/O system uses. This value is sometimes needed when calling low-level I/O functions (such as those in the os module).

The flush method writes the contents of the file's I/O buffer to the actual file. This method may do nothing if the I/O is unbuffered or if the underlying implementation does not support a flush operation.

The read method takes an optional parameter, an integer specifying the maximum number of bytes to read from the file. Without the parameter, the method reads to EOF. The bytes are returned as a string. Once EOF is reached, the method returns an empty string.

The readline method takes an optional parameter which limits the reading to that many bytes. Otherwise, the method returns a string consisting of all bytes up to and including the next newline, if any. The newline is retained, but if the file ends with an incomplete line or if the line is longer than the optional size limit, the string may not end with a newline.

The seek method repositions the file's read/write cursor (the place in the file at which the next I/O operation will start). This method takes a mandatory offset argument and an optional origin argument. The offset argument is a long integer; the origin argument may be one of three values: os.SEEK_SET, os.SEEK_CUR and os.SEEK_END (it defaults to os.SEEK_SET if not supplied). If the origin argument is os.SEEK_SET, the cursor is repositioned relative to the start of the file (and the offset argument cannot be negative).
If the origin argument is os.SEEK_END, the cursor is positioned relative to the end of the file (so the offset is normally zero or negative). In the final case, when the origin argument is os.SEEK_CUR, the offset can be positive or negative and the cursor is positioned relative to where it was when the seek method was invoked. (If this looks suspiciously like the stdio C library function fseek, it should!)

The tell method reports the position of the file's I/O cursor. The value returned is the offset relative to the start of the file (the origin given by os.SEEK_SET). Note that on Windows the value returned may not make sense if the file contains unusual newline encodings, unless the file was opened in binary mode. (This method is bug-compatible with stdio's ftell.)

The truncate method takes one optional argument, a size. If the argument is not supplied, size is set equal to the value that tell() would return. This method sets the file's size to the specified value. The file cursor is not repositioned, so if you truncate the file to a size less than where the cursor is pointing, you'd better call seek before attempting another operation.

The write method writes the bytes of its string argument to the file.

The mode attribute is the string which was passed to the open function to specify the file's I/O mode (e.g. "r+"). This attribute is read-only, and some file objects may not have a meaningful mode attribute.

The name attribute is the string passed to the open function that is the name of the file. (This attribute is read-only and may not be meaningful for all file objects.)

The isatty method returns True if the file is connected to a tty (a terminal, as stdout often is) or a tty-like device.

The encoding attribute is a string naming the encoding used to convert Unicode strings into bytes when they are written to the file. This attribute may be None, in which case the file uses the system default encoding.

The errors attribute names the Unicode error handler used along with that encoding.
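The stdio-like methods described above can be exercised together in a short sketch. It assumes a scratch file created with the standard tempfile module and opens it in binary mode ("w+b"), so that tell and seek deal in plain byte offsets on every platform; the b"..." literals should work in Python 2.6 as well as Python 3.

```python
import os
import tempfile

# Scratch file for the demonstration (created by tempfile, then reopened).
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "w+b")                # binary read/write: offsets are byte counts
f.write(b"first line\nsecond line\n")
end_pos = f.tell()                   # 23: just past the last byte written

f.seek(0, os.SEEK_SET)               # back to the start of the file
first = f.readline()                 # b"first line\n": newline is retained
f.seek(-5, os.SEEK_END)              # 5 bytes before the end
tail = f.read()                      # no size given, so read to EOF: b"line\n"
at_eof = f.read()                    # already at EOF: empty string

f.truncate(5)                        # file is now 5 bytes; cursor NOT moved
f.seek(0, os.SEEK_SET)               # so reposition before the next read
short = f.read()                     # b"first"
f.close()

print(end_pos)                       # 23
print(f.closed)                      # True
os.remove(path)
```

Note the seek after truncate: without it, the cursor would still point past the new end of the file.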
The newlines attribute records the newline conventions encountered so far in a file opened for reading in universal-newline mode (mode "U"): a single string if only one kind has been seen, or a tuple of strings if several have. This attribute is read-only and may be None if no newline has yet been encountered, or missing if the implementation was not built to support universal newlines. (The default configuration recognizes '\r', '\n' or '\r\n' as newlines.)

The softspace attribute is used by the print statement to decide whether or not it must output a space before the next item. (Note that this is a state variable used by the print statement and not a way to affect how print works.)

The next method is what makes a file its own iterator: it returns the next line of text from a file opened for input.

The readlines method takes an optional parameter which, if present, specifies an approximate limit to the amount of data read (if the implementation chooses to honor this limit). By default, this method calls readline repeatedly until EOF is reached, returning a list of strings.

The writelines method writes each element of its argument, an iterable containing strings (typically a list of strings). Despite its name, the method just writes what it is given; the strings themselves must supply any terminal newlines. But since readlines reads lines without removing the newlines, writelines truly performs the converse operation.

A file object is its own iterator. That means that if you open some file for reading, you can then use the file object itself to step through the data line-by-line. Here's how:

    foo = open("foobar_file", "r")
    for line in foo:
        print line,    # trailing comma: line already ends with a newline
    foo.close()

Figure 9-2: Using a file as an iterator

10. Modules

A module is simply Python's term for a file containing Python source.
Instead of simply starting the interpreter and entering Python statements until you have a program (then losing it all when you exit Python), you use a text editor of some sort and write your code to a file. (Aside: not all text editors are created equal. The IDLE development environment which was installed when you installed Python has a syntax-colored editor with helpful features like auto-indent.) The file you created then becomes a module that you can tell the Python interpreter to load and run.

Another use for a module is to contain a library. A library is a collection of inter-related code (usually functions or classes or both) which provides a set of (hopefully) debugged capabilities that you simply import instead of having to reinvent.

Much of this topic is actually going to be about how to use the import statement. It's vastly more powerful than the C++ #include or even Java's import. A module's filename (minus the ".py" suffix) becomes the name by which it is imported. Everything within the module can be imported as a block, or you can pick and choose. You can even import a definition and change its name as you do so, providing a cheap way to avoid naming collisions when you must import multiple modules.

The import statement has many forms. The simplest is import followed by the module's name, e.g. import foo. This imports the module foo (or raises an ImportError if the module cannot be found). None of the names within that module are imported into your namespace, however -- you must reference everything using a qualified name, e.g. foo.bar = 42. Since all names must be qualified by the module name, name conflicts are not possible. But if you need to refer frequently to the various components within a particular module, you can say from foo import * and import everything from that module into your namespace (which may result in name collisions, since these names will no longer be qualified).
The general form of that last import statement is

    from module import name-list

where name-list is either a comma-separated list of the names you want to import directly or *, which means import all names (except those which begin with an underscore; if the module defines a list called __all__, * imports exactly the names listed there). An even more elaborate form is

    from module import name as alias optional-list

where optional-list is zero or more further items of the form , name or , name as alias, which allows you to import multiple names, each with its own alias if you wish. Example:

    from math import e as E, pi as PI, tanh, acos

will import the constants for the base of the natural logarithm and the ratio of the circumference of a circle to its diameter under the alternate names E and PI, respectively, and will import the hyperbolic tangent and inverse cosine functions under their standard names. Important: the other names from the math module are not imported, and the name math itself is not bound either, so even a qualified reference to one of those other names, such as math.sin(), is undefined. So if you're going to pick and choose, you need to know what you will need from the module.

One way to manage the name collision problem is to take advantage of Python's "ad hoc variable declaration" behavior and simply assign a qualified name to a global variable, like this:

    import math
    PI = math.pi
    E = math.e
    tanh = math.tanh
    acos = math.acos
    print acos(PI / 8), tanh(E / 4), math.sin(PI)

Figure 10-1: import your cake and reference it, too

By using this technique, you can have simple names for commonly-needed components from a module and still be able to reference any of the other components.

Modules may contain executable statements for the purpose of initialization and declaring data. These statements get executed once -- when the module is first imported. (Since one module may import another, it's possible for you to import a given module more than once without realizing it; later imports simply reuse the module that is already loaded.) This is the idea behind constants such as math.e and math.pi, which are created when the math module is initialized. (In CPython the math module is written in C, so they are set by its C initialization code rather than by Python assignment statements.)
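The claim that a module's top-level statements run only once can be checked with a throwaway module. This sketch writes a hypothetical module named once_demo into a temporary directory; its top-level code bumps a counter stored on the sys module (an arbitrary place chosen only so the count survives outside once_demo itself), and then the module is imported three different ways.

```python
import os
import shutil
import sys
import tempfile

# Write the hypothetical module "once_demo"; its top-level code counts runs.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "once_demo.py"), "w") as src:
    src.write("import sys\n")
    src.write("sys.once_demo_runs = getattr(sys, 'once_demo_runs', 0) + 1\n")
    src.write("ANSWER = 42\n")
sys.path.insert(0, tmpdir)

import once_demo                     # first import: top-level code runs
import once_demo as again            # already in sys.modules: nothing re-runs
from once_demo import ANSWER as A    # binds one name under an alias; no re-run

print(sys.once_demo_runs)            # 1
print(again is once_demo)            # True
print(A)                             # 42

# Tidy up the temporary directory (the loaded module stays in memory).
sys.path.remove(tmpdir)
shutil.rmtree(tmpdir)
```

All three import statements end up referring to the same module object, which is why its initialization code ran only once.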
When you invoke the Python interpreter from the command line, you can cause a module to run as a script. (I dislike this term -- Python modules run from the command line are programs. To call them scripts diminishes their importance, as if Python weren't "real" code.) The command-line syntax would be:

    python module-file arguments-if-any

When invoked in this manner, the module's __name__ is "__main__" instead of the name derived from its filename, as it would be if the module were imported. This is a useful fact to know, because you can then put a few statements in the module which are executed only when __name__ == "__main__". This code then lets you both run the module as a program and import it. Here's one way:

    def _main(args):    # since this name begins with _, it is not
                        # imported by "from module import *".
        """Main function of module.

        Will run when module is invoked as a program.
        """
        # code of main function.
        pass

    if __name__ == "__main__":
        import sys    # ensure we have the sys module.
        _main(sys.argv)

Figure 10-2: Turning a module into a program

(sys.argv is a list of strings constituting the command-line arguments; sys.argv[0] is the name of the program file itself.)

Even if your module is not normally intended to run as a program, you might use this technique to run code which tests the module. That lets you bundle the module with the code which was used to certify it.

That's all very nice, but what if somebody else wrote a module you want to import? Do you have to read the whole module to see what functions or classes it provides? As it turns out, no. The built-in function dir, called with no argument, returns a list of strings naming everything defined in the current scope. But if you call dir with a module as its argument, the list returned contains the names defined in that module. And if the programmer has done a professional job, you can then find out what each function does by printing the function's __doc__ string.
Here's a quick-and-dirty way to get a module's documentation:

    import math
    for name in dir(math):
        cmd = "print math." + name + ".__doc__"
        print "math." + name + ":"
        exec cmd

Figure 10-3: Code to act like help(module)

The above code works fine on modules which consist only of functions, where those functions all contain doc strings. The math module almost meets this requirement: the constants e and pi don't have doc strings of their own, so for those you get something strange, the doc string for the float data type.
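A somewhat tidier variant of Figure 10-3 fetches each attribute with the built-in getattr instead of building a source string for exec, skips underscore names such as __doc__, and reports non-callable data (like math.pi) by value rather than by its type's doc string. The helper name module_docs is hypothetical; this is a sketch, not a replacement for help().

```python
import math


def module_docs(module):
    """Return (name, summary) pairs for a module's public names."""
    pairs = []
    for name in dir(module):
        if name.startswith("_"):
            continue                      # skip __doc__, __name__, etc.
        obj = getattr(module, name)       # no exec needed
        if callable(obj):
            # First line of the doc string, if there is one.
            lines = (obj.__doc__ or "").strip().splitlines()
            summary = lines[0] if lines else "(no doc string)"
        else:
            # Data such as math.pi has no doc string of its own, so
            # report its value instead of the doc string of its type.
            summary = "constant, value %r" % (obj,)
        pairs.append((name, summary))
    return pairs


for name, summary in module_docs(math):
    print("math.%s: %s" % (name, summary))
```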