M02 Notes: Introduction to Python (extended)

Genomic Data Manipulation
BIO508 Spring 2015

Python 00

And Now For Something Completely Different

The Meaning of Life

Whoever's reading this, meet Python. Python, meet another enthusiastic student ready to plunge into a world filled with wonderment and random punctuation[1]. Since you probably haven't met before, we'll be spending a while getting to know each other.

Python is a programming language full of features that are now considered modern, and it's found homes in contexts ranging from directing the artificial intelligence for computer-controlled game players[2] to animating the physically improbable explosions of the starship Enterprise[3]. But all of this came about following a development process that began in the late 1980s, allowing Python to age gracefully and mature over the course of several decades. We'll learn to love it for its simple-yet-elegant design, its forgiving-yet-clean syntax, its refreshing aftertaste, and its light aroma of crushed leaves and spicy apples. In truth, Python's core feature set, syntax, and, most of all, philosophy haven't changed much since its inception.

It's a scripting language, which means (among other things) that Python programs are just plain text files. To make them go (that is, to do something other than sit there containing text), you have to use a special program called an interpreter. Think of a Python program as a list of instructions (because that's really what it is) that doesn't do much on its own - your parents might tell you to clean your room, take out the trash, do the dishes, and shampoo the goldfish, but those jobs won't get done unless you buckle down and put in some elbow grease. The Python interpreter is the thing that actually does the dirty work of turning your programs into real work on the computer[4]. The main Python interpreter (a program called, creatively enough, python) is a black box: you throw a program in one end, turn a crank, and all of your results pop out of a chute.
This is usually what you expect a computer to do, if you think about it; when you double click on Your Favorite Game[5], it runs and does stuff and doesn't stop to tell you things like, "I'm going to draw an Angry Bird now," or, "I'm trying to access the network now." When you're writing a program yourself rather than running one, though, some visibility is nice - so when the Python interpreter is started without a complete program to run, it becomes an interactive interpreter instead. Interactive Python lets you run exactly one command at a time to see what it does and whether it works quite the way you think it should. It's available through at least three different interfaces, and since I don't know whether you'll be using Windows, Linux, or a Mac to do your work, I'll refer to them all interchangeably as "interactive Python":

- Running the program python from a command prompt with no arguments, on any computer, will enter an interactive Python session denoted by the >>> prompt. This is the most common method of performing quick-and-dirty interactive Python sessions, and the one I use myself.
- The program IDLE is a semi-graphical interface to Python most often used on Windows, but also available for Linux and MacOS. It's not much more sophisticated than running interactive Python in a window instead of at a command prompt, but it's a step up.
- ipython is a powerful replacement for interactive Python most commonly used on Linux, but also available (with some tinkering) for Windows and MacOS. Use at your own risk, but it's a fun tool to try out[6].

[1] Most programming languages are overly fond of punctuation; in Python's case the issues tend to be spaces and tabs instead...
[2] Most famously the Civilization series, but also Battlefield, EVE Online, and several Disney masterpieces such as everyone's favorite, Little Mermaid Pinball.
[3] Python is one of the languages used by Industrial Light and Magic to orchestrate their visual effects animation processes.
[4] Now if only I could make it clean my house...
[5] Not to be confused with My Favorite Game, which is a Cardigans song.

Anyhow, none of this matters yet, because you have to learn how to write Python programs before you can run them! Let's dive in to the good stuff.

Results

Ok, well, not quite yet. One last note on notation. If we agree that a Python interpreter is something that turns commands into shampooed goldfish (i.e. results), we need a way of notating instructions (the stuff that you tell Python to do) and output (the stuff that Python gives you back).

Most Python instructions produce a result in the form of a return value. You can think of this as the shampooed goldfish - the result that's actually generated when Python carries out an instruction. If you tell Python to add 1 and 2, the number 3 should (hopefully) be the return value. Return values will sometimes be displayed on your screen (reliably so only when you're using Python interactively), but even when they're not, they're "visible" to the program itself; they can be used as inputs to further commands. For example, you might tell Python to add 1 and 2, then multiply the result by 3. Even if you don't see an intermediate 3, it's there, and your final return value should (again, hopefully) be 9.

When I, your humble scribe, am painstakingly constructing these notes from wild electrons captured by my computer's transistors from the ether, I'll use the → symbol to indicate that the thing on the left evaluates to the return value on the right. For example (and yes, the parts to the left of the arrows are actually valid Python code):

    1 + 2 → 3
    ( 1 + 2 ) * 3 → 9

We'll deal more with return values later - you can't program without them! - but for now, consider this a heads up as to what those funny little arrows mean.
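To see that an intermediate return value really is "there" even when nothing displays it, here's a minimal sketch of the add-then-multiply example as a tiny script, assuming Python 3. It stores the intermediate result in a variable (variables show up properly a little later in these notes; the names iIntermediate and iFinal are made up for this illustration). Outside interactive Python, you only see a return value when you print it.

```python
# Outside interactive Python, return values are only shown if you print them.
iIntermediate = 1 + 2           # the return value 3 is kept, but not displayed
iFinal = iIntermediate * 3      # ...and used as the input to another instruction
print( iFinal )                 # prints 9

# The same computation as a single instruction; the intermediate 3 still exists,
# it just never gets a name.
print( ( 1 + 2 ) * 3 )          # prints 9
```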
Whitespace

I'd like to say something important about Python:



That's right; what you just read is very important in the Python language! Unlike almost every other programming language out there, Python makes whitespace (spaces, tabs, and newlines) part of its core syntax. There are many places where the exact amount of whitespace doesn't matter: in between words, before and after parentheses, and at the end of or in between lines. All of the following are equivalent:

    ( 1 + 2 ) * 3 → 9
    (1+2)*3 → 9
    (1 + 2) * 3 → 9
    ( 1+ 2 ) *3 → 9

But Python cares very much about whitespace at the beginning of lines, referred to as indentation. Python uses indentation to denote semi-independent blocks of code, these being sections of your programs that are nested inside (or outside) of each other for organizational reasons. We'll see more of them later, so for now, remember that Python is quite the martinet regarding indentation and whitespace, and keep the following rules in mind:

- Use one tab or four spaces for each indentation level. (Python will technically accept any consistent amount, but four spaces is the near-universal convention.)
- Each program must use only spaces or only tabs at the beginning of each line. Do not mix them! You will produce ladybugs.
- Unless you are starting a new block or ending the current one, each line must be indented identically[7] to the previous one.
- The start of a new block is typically denoted by a line that ends with a colon (:), and the following line should increase by exactly one indentation level.
- The end of a block is denoted only by unindentation, no other characters or symbols; multiple blocks can end simultaneously, in which case a single line unindents by multiple levels.
- If one line of your code is so long that it needs to be wrapped with a newline simply for readability, place a backslash (\) at the end and continue on the next line. A backslash used in this way negates Python's usual indentation rules for the continued line. (Inside parentheses, square brackets, or curly braces, Python lets you break a line even without the backslash.)

[6] And is made to be reminiscent of Matlab or R, for those of you familiar with those tools.
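Here's a minimal sketch of the whitespace rules in action, assuming Python 3. It borrows a few things we haven't covered yet (if, try/except, and the built-in compile, which checks a string of code for syntax errors without running it), and the variable names, including the f prefix for a True/False "flag", are my own inventions for this example.

```python
# Spacing *within* a line doesn't matter: all three names end up holding 9.
iSpaced = ( 1 + 2 ) * 3
iCramped = (1+2)*3
iLopsided = ( 1+ 2 ) *3

# A backslash at the end of a line continues it on the next line, and the
# indentation of the continued part is ignored.
iTotal = 1 + 2 + \
                3

# Indentation at the *start* of a line does matter. Here the body of the if
# is not indented, so compile() refuses the code with an IndentationError.
strBadCode = "if True:\nprint( 'oops' )\n"
try:
    compile( strBadCode, "example", "exec" )
    fIndentationOk = True
except IndentationError:
    fIndentationOk = False      # this branch is the one that runs

print( iSpaced, iCramped, iLopsided, iTotal, fIndentationOk )   # prints 9 9 9 6 False
```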
Putting all of these together to provide a sneak preview of things to come, the following is a fairly robust Python program, for which we'll introduce all of the necessary syntax and semantics over the course of the next several scintillating pages of notes:

    for i in range( 2, 10 ):
        print( "Multiples of: " + \
               str(i) )
        for j in range( 10 ):
            print( i * j )
        print( "Done with: " + str(i) )
    print( "That's all, folks!" )

References

One of the features that makes Python surprisingly unusual for a scripting language is that it is 100% certified organic. No pesticides were used during its cultivation, it contains no genetically modified crops, and it was hand-picked by fairly paid indigenous Luxembourgian laborers. It also happens to be roughly 95% object oriented, which means that almost every piece of data in Python is an object. We'll get to the nitty-gritty definition of objects later, but for now, just think of them as things, blobs, chunks-o-stuff, piles of data and bits and bytes and all that sitting together in the computer's memory. Objects can be very complicated, but since everything in Python is an object, something as simple as a number or a word is an object in and of itself.

To keep track of all of these objects, Python uses references. If it isn't too much of an anachronism, think about a library with one of those old-fashioned paper card catalogs[8]. If every book is an object (a chunk of data that lives somewhere in the library's memory), every card is a reference. It's a small indicator (a sign or pointer, if you will) that tells you where to find the real object. When you program in Python, all of the data you have to manipulate is accessed through references[9].

[7] Say that one three times fast...
[8] In a few years, I'll teach this course again and nobody will know what I'm talking about. I don't think I've used or even seen a non-electronic card catalog in ages...
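To make the card-catalog picture a bit more concrete, here's a minimal sketch, assuming Python 3. It borrows list syntax and the list method append (both covered later), and the variable names are hypothetical. Copying a name copies the card, not the book, so two names can end up pointing at one list; numbers behave differently because they can't be changed in place.

```python
# Two names, one list: assigning one name to another copies the reference
# (the card), not the object (the book).
aiShelf = [1, 2, 3]
aiAlsoShelf = aiShelf
aiAlsoShelf.append( 4 )          # change the list through the second name...
print( aiShelf )                 # ...and the first name sees it: [1, 2, 3, 4]
print( aiShelf is aiAlsoShelf )  # True: both names refer to the same object

# Numbers can't be changed in place, so "changing" one just points the name
# at a brand new number; the other name still refers to the old one.
iCount = 5
iAlsoCount = iCount
iAlsoCount = iAlsoCount + 1
print( iCount )                  # 5
print( iAlsoCount )              # 6
```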
[9] Because, to continue the metaphor, it's easier for the computer to manipulate a little card whenever possible; it only goes and digs out the big heavy books when it absolutely has to.

A few of the most basic data types are exempt from this everything-is-a-reference rule, and the ones we care about are numbers and strings[10]. Don't get me wrong - numbers and strings are still objects in Python - but because they can never be changed in place (they're immutable), you don't have to worry about references when dealing with them. In practice, you can manipulate these data types as if Python handled them by value rather than by reference. Think of them like magazines; they're not big, heavy books, so it doesn't make sense to create special reference cards just for them. Numbers and strings are often referred to collectively as scalar data types, so if you see me use that word by accident, you'll know what I mean[11].

To manipulate any objects in Python - numbers, strings, or otherwise - we have to give them names. Variables are the boxes in which we can carry around data when programming. In Python, there are very few restrictions on variable names: they must begin with a letter[12], can contain letters, numbers, and underscores (_), and are case sensitive. That is, abc is different from aBc is different from ABC, and they're all valid Python variable names. I'll use a convention called CamelCase[13] in which we avoid underscores and instead capitalize separate words within a variable name. I'll also generally start names with a lower case letter, and as a convention (not a requirement), I usually start with a group of letters indicating what type of data they contain. For example:

    strIAmAString = "string"
    iIAmAnInteger = 1234
    dIAmAFloat = 1.234

Don't worry about the specifics of that little example yet, except to notice that...

- I'll try to be consistent about putting Python snippets in the Courier New font. Hope I don't mess up too much!
- Each variable starts with a few lower case letters that represent its type followed by an arbitrary name.
  The lower case letters are optional; they're how you tell other programmers (or yourself!) reading your code what type of data the variable is intended for[14]. "str" stands for a string, "i" for an integer, and "d" for a floating point value[15].

Once you've created a variable, you can use it just like you would the original data. Anywhere you want to use the string "string", you could use our strIAmAString variable instead. Variables give a name to a container for data, and we've put the data "string" into the strIAmAString container.

[10] The other major ones are weird Python-specific constructs like tuples and frozen sets, which are essentially immutable ordered and unordered bags of values, respectively. Needless to say, I don't think we'll be saying much about these beasties...
[11] I say "by accident" because it's really not proper to call them scalars in Python - they're objects just like everything else. However, I'm old fashioned enough that most of the languages I know aren't fully object oriented and treat scalar data types differently. This is what happens when you're in computer science; you're ancient and decrepit after only a few short years.
[12] Or, technically, an underscore, but this has a special meaning that we'll discuss later.
[13] ForObviousReasonsInLongNames, as in ThisAliceTheCamelHasSevenHumps.

Numbers

Python knows about two important kinds of numbers: integers (with no fractional part) and floats (real values with stuff after the decimal point). Any plain old integer can just be written as itself:

    0
    123
    -987654321

These are all valid Python integers - and don't forget that integers aren't floats! Python represents floating point decimal numbers as an integer followed by a . character followed by a non-negative integer, or using scientific notation with an "e" for exponentiation:
    1.0
    0.98765
    -3.1415926535897932384626
    6.02e23

No surprises so far, right?

[14] This is a relaxed form of something called Hungarian Notation developed by Charles Simonyi, a founding member of Microsoft who came up with these mnemonics to help make sense of the old Microsoft Office code. He was Hungarian (hence the name!) and created oodles and oodles of very specific prefixes to use for variable names in different programming languages. We'll use only a few of them here, and not strictly in the way they were intended; they were originally meant for C, which has a fairly different idea of typing than Python does.
[15] This stands for "double", the more precise of the two floating point representations available in C/C++. Python stores all of its floats internally as doubles, so d is a natural fit.

Strings

A Python string consists of zero or more characters between single or double quotation marks. Zero characters represent the empty string, and more than zero characters represent, well, a nonempty string. Strings are just bunches of text, and they can range from a character or two that you dump right into your program up to pages and pages of text that you read in from a file. Python is very good at handling strings - large strings, small strings, strings containing weird binary data, anything you can think of. But for our purposes, you can just think of them as characters plunked together in sequence, represented programmatically by quotation marks. This means that the following are all valid Python strings:

    "I am a string"
    ''
    "12345"
    'Niagara falls, the Hoover dam...'

Unlike many other languages, Python treats single quoted strings (') and double quoted strings (") identically! I prefer to use double quotes (") because this matches the convention of most other programming languages, but the choice is really up to you in Python.
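A minimal sketch of the number and string rules above, assuming Python 3 (the built-in type reports what kind of object a value is; the variable names are made up for this example). It shows that an integer and a float can be equal in value yet still be different types, and that the two quoting styles produce identical strings.

```python
# Integers and floats are different types, even when they're equal in value.
print( type( 123 ) )                # shows the int type
print( type( 1.0 ) )                # shows the float type
print( type( 6.02e23 ) )            # float: scientific notation always makes a float
print( 1 == 1.0 )                   # True: same value...
print( type( 1 ) == type( 1.0 ) )   # False: ...but not the same type

# Single and double quotes make exactly the same kind of string.
strDoubleQuoted = "Niagara falls, the Hoover dam..."
strSingleQuoted = 'Niagara falls, the Hoover dam...'
print( strDoubleQuoted == strSingleQuoted )   # True
print( '' == "" )                   # True: both are the empty string
```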
Both ' and " create ordinary strings, in which certain characters (particularly the backslash, which we'll get to imminently) are interpreted specially. Python provides quite the toolkit for manipulating these interpreted strings, and as such, you'll probably never need anything fancier for this class. If you do, two alternatives exist. Raw strings are written with an r just before the opening quote, r"like this", and in them each character inside the quotes is treated as (almost) exactly what it looks like; backslashes lose their special meaning. Triple quoted strings are written with three single quotes '''like this''' or three double quotes """like this""", and they can span multiple lines and contain quote characters without any fuss, although backslashes inside them are still special. The special treatment of backslashes in ordinary strings can occasionally produce unintended consequences, but these are generally unlikely, and all of my examples will use double quoted strings; all of your code should use them, too. It's just good to know that alternatives exist.

The astute observer might immediately ask the question, "How does one create a string containing a double quote character?"[16] Python provides a collection of escape sequences for indicating that something in a string should be processed specially. Escape sequences are preceded by a backslash character (\) and indicate that the following character doesn't mean what it normally does. For example:

    "The " is a double quote"
    "The \" is a double quote"

That first thing isn't a valid Python string at all - it's not a sequence of characters wrapped in double quotes. The second thing is, however, because the escaped double quote doesn't mean what it normally does! It's interpreted as a literal double quote character, not an end-of-string marker.

Python lets you insert all sorts of funky characters (and other things) into strings this way[17], but only a handful of these escape sequences are useful more than once in a long while. These include...

- \' and \" and \\, which insert a single quote, a double quote, or a backslash, respectively.
- \t and \n, which insert a tab and a newline, respectively. Useful for nicely formatted output!

[16] If it wasn't too confusing, I would have written that as, "How does one create a string containing a "?" just to illustrate the problem...
[17] This actually isn't far off, since there's a way to use an escape to insert any Unicode character into a string, but we won't worry about that...

Booleans

I actually sort of half-lied-but-not-really when I told you that only numbers and strings were special in Python. Most programming languages have special ways of representing the concepts of "true" and "false", and Python is no exception. The special symbols True and False are Boolean values meaning, well, exactly that - true and false. Watch the caps - those big T and F are critical! Furthermore, the special symbol None means essentially "nothing at all"[18], and it's also a false value. So is any number equal to zero, or any collection or container[19] that's empty - like the string "". Everything else is true. Let me repeat that:

- The special values False and None are false.
- Any number that represents zero is false: 0, 0.0, 0e0, etc.
- Any string or collection that's empty is false: "", '', [], {}, etc.
- The special value True and absolutely anything else at all is true.

So anything that isn't False, None, zero, or empty is true! The number 123 is true, the number 1 is true[20], the string "true" is true, the string "false" is true[21], even the string "0" is true. Of course, whenever possible, you should just use True for true and False for false. Because that, like, makes sense and stuff.

Lists

When you mash a bunch of references together, side by side, you basically get a list, which is Python's name for what most programming languages call an array. In Python, a list is an ordered collection of zero or more references stored one after the other, and there's lots of notation associated with them.
You can refer to a whole list at the same time just like you can refer to any other variable, and I'll typically prefix their names with a lower case a[22]:

    aIAmAList = ["one", 2, strThree]

This is also a nice demonstration of the second piece of list syntax: you can represent a list value by wrapping a bunch of other values with square brackets and separating them with commas. Lists can be empty, if you don't put anything in them:

    aIAmAnEmptyList = []

or they can contain any mixture of other values, written out explicitly as strings and numbers and whatnot or by giving their variable names (here, strThree and iOneHundred are assumed to be variables you've already defined):

    aiIAmAListOfIntegers = [1, 10, iOneHundred, 1000]

Since most Python lists end up containing an odd mishmash of different types of data, you can just prefix your list name with "a" and be done with it. Or, if you know that a particular list will only contain one thing (like just integers or just strings), you can prefix its name with "a" and the appropriate more specific prefix. So astrStrings would be a list of strings, and aiIAmAListOfIntegers just above is a list of integers.

If you want to reference a particular element of a list, you access it by a subscript placed between square brackets. You'll get back a reference to the object at that location in the list. This is often a number, string, or other simple type, but lists can contain anything at all - even other lists! List indices start at zero[23], so in the examples above...

    aIAmAList[0] → "one"
    aiIAmAListOfIntegers[3] → 1000
    aIAmAList[3] → 🐞

where 🐞 is an indicator that something will go very wrong[24]. Notice a few things going on here:

- Accesses to locations outside of a list's contents result in an error. This can be useful sometimes, because it helps to operate on the principle that bad code should fail sooner rather than later. If your program accesses an index that's beyond the end of a list, it's probably unintentional, and generating an error like this will immediately (and vocally) inform you of the problem.
- For those of you used to C/C++ or Java where arrays are statically sized, this is not the case in Python. That's part of why they're more properly called lists - Python lists allow you to easily insert new items at the beginning, middle, or end of an existing list. So aIAmAList[3] might be undefined now, but if I add one more element to the end (which we'll see how to do a bit later), it'll magically take that spot.

I'm actually going to jump ahead a little and give you a taste of more advanced Python syntax, since it's impossibly boring to have bunches of things (i.e. lists) without knowing how many things are in the bunch[25]! If you need to ask a list how many elements it contains, just wrap it with len() - that is, parentheses and the word "len". This means...

    len( aIAmAnEmptyList ) → 0
    len( aIAmAList ) → 3
    len( aiIAmAListOfIntegers ) → 4

[18] For those of you familiar with Java or C/C++ or any of those, None is Python's version of null/NULL/etc.
[19] We'll talk about them imminently - on this very page, in fact! Look one paragraph down...
[20] Note that this follows the same pattern as C/C++ and Perl (modulo Perl's bizarre type conversion rules), but not Java or Ruby.
[21] Although don't ever let me catch you using "false" as a true value intentionally! Very naughty programming practice.
[22] Because sane people call these arrays, not lists, but there's some history and details to this that we'll avoid here. Python gets religious sometimes.
[23] As well they should! Everything should start with zero.

Dictionaries

Just like lists are collections of references indexed by number, Python dictionaries are collections of references indexed by other references (often strings). A list pairs sequential, increasing numbers with values; a dictionary pairs arbitrary keys with values in a data structure known as a hash or hash table.
You "look up" the value of a particular entry by accessing it with its key, just like you look up values in a list by accessing them with integers. That being said, a lot of the notation for dictionaries is very similar to the notation for lists, but with different funny punctuation. A whole dictionary can be constructed by wrapping pairs of values with curly braces[26]:

    hashIAmADictionary = {"first key" : "first value", "second key" : "second value"}

Lists just contain individual elements, so they're constructed by wrapping values with square brackets. Dictionaries contain pairs of values, so they're constructed by wrapping pairs of values with curly braces; the pairs are separated by commas, and the key and value in each pair are separated by a colon (:). Dictionaries can also be empty, if you don't put anything in them, and they can also contain a mixture of data types: values can be of any type at all, and keys can be of any type that can't be changed in place, such as numbers and strings.

    hashSchedule = {"morning" : 9, "afternoon" : 12.5, "evening" : 19}

The prefix notation falls apart even more for dictionaries. From the examples, you can see that dictionaries are prefixed with the characters "hash", but since they can contain any sort of weird combination of references, there's not much more you can say about them.

You can access the contents of a dictionary by wrapping a key with [] (square brackets, just like accessing a list element), and again like lists, keys with undefined values generate a ladybug:

    hashIAmADictionary["first key"] → "first value"
    hashSchedule[19] → 🐞
    hashSchedule["evening"] → 19

Remember that you put in keys to get values - putting in values will just get you a 🐞 (unless there happens to be a key with the same value as a value, which is a situation that can legitimately occur, assuming you understood that little tangle of jargon in the first place).

We'll see more about this later, but since I tempted your programmatic taste buds with the more advanced len function for lists, I'll demonstrate a similar appetizer for dictionary access. "How," you may ask, "can you tell if you're skirting the precipice of abject failure by attempting to access an undefined key in a dictionary? To what lengths must one go in order to prevent the Python interpreter from spewing forth the ladybug of improper actions?" Just as the len function operates on a list to return its length, the get function operates on a dictionary and a key: it returns the corresponding value if the key is present, and None if it isn't. The get function also uses an additional piece of punctuation and is written hash.get( key ) - that is, your dictionary variable name followed by a period (.), get, parentheses, and the key to be tested:

    hashIAmADictionary.get( "first key" ) → "first value"
    hashSchedule.get( ) → 🐞
    hashSchedule.get( 19 ) → None
    hashSchedule.get( "evening" ) → 19
    hashIAmADictionary.get( hashSchedule ) → 🐞
    len( hashIAmADictionary ) → 2

Lessons to be learned?

- It's hard to generate a ladybug using get, unless you do something completely beyond the pale like not even providing it with a key to look up, or using something changeable (like a list or another dictionary) as the key.
- If you get a key that is in a dictionary, the return value is the same as accessing that key's value using square brackets.
- If you get a key that isn't in a dictionary, the return value is always None.
- A bonus: len works on any collection, including dictionaries!

[24] And will be of significant amusement value to nobody but me, since it dates back to the first version of this course, which was taught in Scheme using a software environment for programming called DrScheme. DrScheme signified bugs in your code using a ladybug icon not unlike this one: 🐞.
[25] Come Mr. Tally Man, tally me list elements... if the scansion weren't so poor, I might try to parody it correctly. It always just makes me think of Beetlejuice, anyhow.
[26] Often referred to using the technical term "squiggly brackets," which I think offers a more strikingly graphical contrast to the less hip "square brackets."
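Here's a minimal sketch of dictionary access, assuming Python 3 and reusing hashSchedule from above. Two things in it go slightly beyond the notes: get accepts an optional second argument that it returns instead of None (a standard part of Python's dictionaries), and try/except lets a program catch a ladybug rather than stop; the strOutcome names are made up for illustration.

```python
hashSchedule = {"morning" : 9, "afternoon" : 12.5, "evening" : 19}

print( hashSchedule["evening"] )          # 19
print( hashSchedule.get( "evening" ) )    # 19: same as using square brackets
print( hashSchedule.get( 19 ) )           # None: 19 is a value, not a key
print( hashSchedule.get( "night", 0 ) )   # 0: the optional default replaces None
print( len( hashSchedule ) )              # 3

# Square brackets with a missing key raise a KeyError (one kind of ladybug).
try:
    hashSchedule[19]
    strOutcome = "found it"
except KeyError:
    strOutcome = "no key 19"
print( strOutcome )                       # no key 19

# A changeable object, such as a list, can't be a key, even when using get.
try:
    hashSchedule.get( [19] )
    strListKeyOutcome = "looked it up"
except TypeError:
    strListKeyOutcome = "unhashable key"
print( strListKeyOutcome )                # unhashable key
```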
I wouldn't try get with a list, though, unless you have aphids that need to be eaten...

Keywords

Based on the principle that it can only get easier from here, we'll start with the exceptions. In Python, "keywords" basically means "anything that doesn't follow the normal rules". Think of these like irregular verbs[27]; they have unique meanings, and although they act roughly like everything else we'll see in Python, they're all a little special in various ways. Python, being such a well-designed little language, has surprisingly few of these.

[27] be/is/was, eat/ate, do/did/done, etc. etc. etc. There aren't actually very many truly irregular verbs in English, really - it's just that there are multiple standards. The "canonical" English inflection is stuff like -s, -ed, -ing, etc., so something like arise/arose/arisen can be considered irregular but still follow a pattern that holds for many verbs. It pretty much depends where the word originally came from. Anyhow...

#

The pound sign, hash mark, or whatever you'd like to call it, indicates a Python comment. A comment is something that the Python interpreter ignores; it's just there to make a statement about the code, to document what you're doing, or to fill up space. Comments aren't evaluated, don't have a return value, and have no bearing on the program at all. They're optional; any Python program will run identically if you remove all of the comments. But that's not true for Python programmers! Comments are meant to help people reading your code, whether it's someone else or your future self[28]. Use them liberally, whenever you write a piece of code that wouldn't be immediately obvious to someone else.

    1 + 2 # Lalalalalalala Oohoohoo → 3
    1 + 2 Notice that there's no hash here → 🐞

pass

Python's "null statement" is written pass, and it does exactly nothing. No return value, no changes, no side effects, no modifications, no refunds, no turning back.
There are very few cases in which you need to use pass (specifically when you're in a syntactically mandatory block that doesn't actually need to do anything), but if you happen to be writing a program that does nothing, it's very useful.

[28] Reading your own code several months later can be tricky!

del

There are various ways to remove objects from collections such as lists and dictionaries in Python, but this particular programming language feels the need to go one step further. Python provides a blanket statement for destroying things in the form of the del keyword. del followed by a simple variable name, a list element, or a dictionary entry will irrevocably remove the offending item from existence:

    iLiad = "dactylic hexameter"
    aiLias = [0, 0, 7]
    hashIsh = {"California" : True, "Amsterdam" : True, "Elsewhere" : False}
    aiLias[0] → 0
    iLiad → "dactylic hexameter"
    del iLiad
    del aiLias[1]
    del hashIsh["California"]
    aiLias → [0, 7]
    hashIsh → {"Amsterdam" : True, "Elsewhere" : False}
    iLiad → 🐞

Aside from allowing some Python poetry with respect to Prop 19, I find del to be useful only rarely, since it's unusual to undefine a variable once it's been given a value; you can typically just ignore it if you don't need it any more[29]. Removing elements from lists or dictionaries is a lot more useful, but for some reason just doesn't seem to happen very often in real code. Something to keep in mind when you're feeling destructive, though.

[29] I feel like there's some commentary to be made here on the morality of variable deletion and the ethics of putting unwanted variable names out of their misery, but I fear for my employment status if I make two politically sensitive jokes on the same page.

if/elif/else

Normally, Python pretty much just does what you tell it to[30]. if is something called a conditional - it allows one Python statement to do two (or more!) different things. Not at the same time, but Python gets to pick. Let's look at an example:

    if "you feel like it":
        "say this"
    else:
        "say this instead"
    → "say this"

[30] Exactly what you tell it to do, whether that's what you actually want or not...

if has at least two parts: a test or condition that is evaluated for truth or falsehood, and a body that is executed if the condition is true. It can optionally have an else clause that is executed instead if the condition is false. The rules are...

- You can follow if with any one Python expression, itself followed by a colon (:). This is the condition, and it's always exactly one expression followed by a colon.
- You can put as many Python statements as you'd like after the if, indented by exactly one level. This is the body of the if, which is executed if the condition is true. You must terminate the body by unindenting, which lets Python know where the if ends and the rest of the program continues.
- You can optionally include one or more elif keywords, each followed by exactly one test expression, a colon, and an indented body. elifs are tested in order, and the body corresponding to the first test condition that succeeds (if any) is executed.
- You can optionally include the else keyword, followed by a colon and an indented body containing as many statements as you'd like. If you include an else, it must be terminated by unindenting.

Anyhow, back to if. Here's how Python carries one out. Let's take an if statement of the form if x: y else: z.

- First, Python evaluates x.
- If x is true, Python runs the statements in y.
- Otherwise, x is false, and Python runs the statements in z.
- Notice that I've crammed these all onto one line; normally, each would be separated by a newline and properly indented.

Strictly speaking, an if statement doesn't return a value of its own. Interactive Python does display the value of each bare expression in whichever body it runs, though, so in the examples that follow, the arrow shows the last value you'd see displayed (or nothing at all, if no body runs).
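To tie the if/elif/else rules above to the truth rules from the Booleans section, here's a minimal sketch, assuming Python 3. It jumps ahead to def, return, a for loop, and the is operator, none of which the notes have covered yet, and the helper name describe is hypothetical. Each branch returns a string so the outcome can be checked.

```python
def describe( value ):
    """Hypothetical helper: report which branch of an if/elif/else runs."""
    if value is None:
        return "nothing at all"
    elif value:                  # any non-zero, non-empty value is true
        return "true"
    else:                        # False, zero, and empty collections land here
        return "false"

# Prints one line per value; e.g. the line for "0" reads: '0' -> true
for value in [None, 0, 0.0, "", [], {}, "0", "false", 123]:
    print( repr( value ), "->", describe( value ) )

# The one-line conditional expression, unlike the if statement, has a value.
strRhyme = "two" if None else "buckle my shoe"
print( strRhyme )                # buckle my shoe
```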
Let's look at some examples:

    if "true": "one"
    ⇒ "one"
    "two" if None else "buckle my shoe"
    ⇒ "buckle my shoe"
    if False: 3
    ⇒ (nothing at all; the body never runs)
    if 4:
        "shut"
        "the"
    else:
        "door"
    ⇒ "shut", "the"
    if aSomeList:
        5
    else:
        6e6
    ⇒ 5
    aEmptyDict = {}
    if aEmptyDict.get( 0 ):
        "pick up"
    else:
        "sticks"
    ⇒ "sticks"

Almost every value in Python is true, but you have to be careful! You usually won't see a False unless you really mean it, but Nones, zeros, and empty collections can creep in from many sources, and they're false! Also look closely at that second example. It's a special one-line form called a conditional expression: the value you want when the test is true comes first, then the test, then the value for when it's false. That is, as long as each body is a single value or expression (not, say, an assignment), the following two pick the same value, and the second one also hands it back so you can use it elsewhere:

    if x:
        y
    else:
        z

    y if x else z

Anyhow, back to if yet again. We mentioned a fourth piece of optional trickery that you can use in an if statement. Suppose your first condition fails, but instead of defaulting to some else clause, you want to test a second condition. Or a third. Or a fifteenth. You could write this:

    if "first test":
        "first body"
    else:
        if "second test":
            "second body"
        else:
            if "third test":
                "third body"
            else:
                "else clause"

but that's fairly strange looking. The indentation is a mess, the unindentation is confusing31 (especially at the end, as it were), and if I hadn't labeled things as first, second, and third, it would be really hard to tell which happened when32. Instead, Python provides the elif keyword, short for "else if", which behaves in exactly the same way, but without the confusion:

    if "first test":
        "first body"
    elif "second test":
        "second body"
    elif "third test":
        "third body"
    else:
        "else clause"

Remember three things:
- elif is just shorthand for nested if/else statements. It doesn't do anything on its own; it just lets you be lazy.
- I haven't been emphasizing this nearly enough... Pay attention to indentation! Python forces you to indent exactly once for each body and to unindent afterwards. It is possible to get this wrong, though, and have code that doesn't do what you meant solely because a line sits at the wrong indentation level. Most Python editors will automatically indent and unindent for you, but even they don't always get it right. Good indentation can make code surprisingly much more readable; imagine reading your favorite book with no spaces between the words!
- It's elif, not else if or elseif or elsif! If I had a nickel for every time I've gotten syntax errors for mistyping this one, Harvard wouldn't have to pay me to work here33.

31 Although it has nothing on curly braces in C/C++ or parentheses in Scheme and Lisp...
32 Especially if you mess up your indentation.

while

The if/else construct is called a conditional, a statement that allows a program to do different things in different circumstances. It's not too exciting if you don't have, say, random numbers or user input, though; testing if True: is going to be true every single time, so the whole shebang is pretty pointless. But there are other uses for conditionals that are kind of nice even just for the saving-time-and-space laziness factor. For the sake of an example, I'm now going to reveal four deep, dark secrets. Python has a lot of weird stuff built into it, and some of it can be fairly shocking sometimes. I don't want this to frighten you - goodness knows how many good programmers have been lost to syntax-induced heart attacks - so buckle down and prepare yourself. In rapid succession...
- Python has an assignment operator, =. You've seen it before, up above; it evaluates the expression on the right and assigns its result to the variable (always a reference to an object, remember) on the left.
- Python has an addition operator, +. It evaluates the expressions on the left and right and, if they're numbers, returns their sum.
- Python has a comparison operator, <.
  It evaluates the expressions on the left and right (numbers, for our purposes) and returns True if and only if the left-hand one is less than the right-hand one.
- Python has a way to display output, print. If you write the statement print( "printer" ), you'll get the output "printer". This isn't a return value! It's something that's just displayed on your screen, not something that's accessible to your program34. I'll use a little arrow → to denote text that's produced on the screen for display (and not as a return value).

I'm sneaking these dark truths in here so that I can come up with a plausible example; you'll get the details on why and how they really work a bit later. For now, believe me that they exist, and treasure your brief preview of the nutty world of Python functions and operators. So, moving right along...

Suppose you want to print out the first ten primes. You know that they're numerical values, and you can put any of them into a plain old variable. You could put all of them into a list. You can print any one of them with print, and you could just write ten different print statements to get the job done. But what if you wanted to print out the first hundred primes instead? Or every multiple of three below a million35? My laziness factor kicks in below ten, let alone a million, so we're going to have to work out some sort of deal here. Fortunately, Python provides a construct called the while loop (although I should say up front that it's not really a good solution to our problem; I just cooked up the simplest example I could think of to illustrate it). To solve our prime printing problem36, check out the following:

    aiPrimes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    iPrime = 0
    while iPrime < len( aiPrimes ):
        print( aiPrimes[ iPrime ] )
        iPrime = iPrime + 1

33 I mean, not that I'm complaining. Being paid is nice; it allows me to buy life-preserving substances such as "food" and "housing" and "clothing".
Trust me, you'd like me a lot less if I couldn't buy food. Or clothes.
34 Actually, the behavior of print was one of the major changes between Python 2 and 3, oddly enough, so look very closely at which version you're using before you believe anything detailed about print.
35 The largest of which is, of course, 999999. Not to be confused with the largest prime less than a million, which is 999983.
36 Peter Piper printed a peck of paired primes. How many paired primes would a prime prior prior pair if a prime prior prior paired paired primes?

There's a fairly large bucket of new information in there, so let's refine our palates on a few of the finer specimens, shall we?
- First of all, this is actually a decent Python program - several separate statements, each on their own line. Python executes them in order, top to bottom, just like you'd expect.
- There are examples of how to use several of the things we've seen in, well, useful ways: variables, lists, numbers, len of a list, and user-visible output.
- And, of course, the while itself is the new guy.

The rules for while are...
- Let's take a while statement of the form while x: y.
- First, Python evaluates x.
- If x is true, Python runs y (a block of indented statements terminated by an unindent) and starts over again with x.
- Otherwise, x is false, and the while loop stops.

You can have while loops that run forever (which is bad):

    while True:
        print( "This is the song that never ends..." )

You can have while loops that don't run at all (which is fine, under the right circumstances):

    while False:
        print( "This loop has self esteem issues." )

And as with any other Python construct, you can combine while loops with the other pieces of syntax you've already seen (or with ones you haven't yet, for that matter):

    aiUninteresting = [1, 7, 2, 9]
    iIndex = 0
    while iIndex < len( aiUninteresting ):
        if aiUninteresting[iIndex] < 5:
            print( "small" )
        else:
            print( "big" )
        iIndex = iIndex + 1
    → "small", "big", "small", "big"

for/in

I warned you that the while loop was a poor way to achieve our nefarious purposes in that last example, and in fact, Python programmers rarely use it to walk through a list. Loops like this are used extensively in other languages, though, so they're worth knowing! But if you want to be loopy in a Pythonic sort of way, you'll probably be using something like for. The general form of a for loop is as follows:

    for var in aSomeCollection:
        do some stuff

A while loop runs its body statements over and over again while some condition is true. A for loop runs over some specific collection (typically a list) and executes its body statements once per element in the collection. As an added convenience, a for loop takes some variable and binds it to (i.e. gives it the value of) each list element in turn. This variable can then be used in the body of the for loop. This might make more sense if I use a for to rewrite our example above the right way:

    aiUninteresting = [1, 7, 2, 9]
    for iVal in aiUninteresting:
        if iVal < 5:
            print( "small" )
        else:
            print( "big" )
    → "small", "big", "small", "big"

We follow the loop variable with the special keyword in, which marks where the variable ends and the collection starts. The variable name on the left is typically used only in the body of the for (which is terminated as usual with an unindent). It's usually bad practice to keep using it outside of the for body, although it is still defined and retains the value it had during the loop's last iteration. The body of this for loop runs exactly four times.
The first time, iVal takes the value 1, and we get the output "small". The second time, iVal takes the value 7, producing "big". Then iVal is 2, then iVal is 9, we've seen each value, and we're done - we've looped over each value in the list. When you think about it this way, you can see how this example (and any for loop, for that matter) is roughly equivalent to (yet vastly shorter than):

    iVal = 1
    if iVal < 5:
        print( "small" )
    else:
        print( "big" )
    iVal = 7
    if iVal < 5:
        print( "small" )
    else:
        print( "big" )
    iVal = 2
    if iVal < 5:
        print( "small" )
    else:
        print( "big" )
    ...

range

Ok, so range isn't a Python keyword at all, but it's so useful in combination with for that I'm going to describe it here anyhow. Technically, range is a function and behaves very much like len as described above: by providing it with some inputs wrapped in parentheses, it transforms them into an output return value that we can use in surrounding Python code. Specifically, len is a function that takes one collection and returns one number representing the number of elements it contains. range is a function that takes one (or two or three; see below) numbers and returns a list of the values counting up from zero to, but not including, that number. In other words:

    range( 0 ) ⇒ []
    range( 1 ) ⇒ [0]
    range( 5 ) ⇒ [0, 1, 2, 3, 4]

(That's Python 2's behavior. In Python 3, range instead returns a compact range object that produces the same numbers on demand; wrap it in list( ) if you want to see them all at once.)

This looks pretty boring, but it becomes a pleasantly effective tool in combination with for loops, allowing one to iterate over each index in a list rather than each element. That is, rather than taking the values in the list, the loop variable takes the values of their indices (starting at zero). Just writing this:

    aiUninteresting = [1, 7, 2, 9]
    for iIndex in range( len( aiUninteresting ) ):
        if iIndex < 5:
            print( "small" )
        else:
            print( "big" )
    → "small", "small", "small", "small"

doesn't do what it did before, because now iIndex is taking the values 0, 1, 2, and 3.
To use range to act like the original for, we need to use brackets to access elements of the list:

    aiUninteresting = [1, 7, 2, 9]
    for iIndex in range( len( aiUninteresting ) ):
        if aiUninteresting[iIndex] < 5:
            print( "small" )
        else:
            print( "big" )
    → "small", "big", "small", "big"

You'll probably end up using for more often without range (or at least I do), but it's useful to have both floating around in your head somewhere. You never know when being able to access a list by index rather than value might save your life!

range goes above and beyond the call of duty to provide other tools for counting and looping, however. It doesn't have to remain tied to lists or len at all, and you can provide it with two numbers instead of one as inputs in order to count from an initial value to a final value:

    for iVar in range( iFirstNumber, iSecondNumber ):
        do stuff here

This loop will run (iSecondNumber - iFirstNumber) times, binding iVar to each number from iFirstNumber up to, but not including, iSecondNumber. If iFirstNumber isn't smaller than iSecondNumber, the range is empty and the loop doesn't run at all. We could rewrite our uninteresting example in yet another equivalent (but fairly yukky) way:

    aiUninteresting = [1, 7, 2, 9]
    for iIndex in range( 0, len( aiUninteresting ) ):
        if aiUninteresting[iIndex] < 5:
            print( "small" )
        else:
            print( "big" )
    → "small", "big", "small", "big"

Finally, range can stretch to consume three numbers as inputs, the third representing the step or increment by which it counts.
This step is typically one (note that in all of the examples above, our ranges only know how to count up by one at a time), but it can be made larger or negative:

    range( 2, 32, 8 ) ⇒ [2, 10, 18, 26]
    range( 8, 4, -2 ) ⇒ [8, 6]
    range( 3, -1, -1 ) ⇒ [3, 2, 1, 0]

Note that negative steps are often ill-advised, since they can get $@!!# confusing (I mean, come on - what's up with that last example?), but they do allow us to make loops that count down rather than up:

    for i in range( 3, -4, -1 ):
        print( i )
    → 3, 2, 1, 0, -1, -2, -3

Keep in mind that all of these prints are just producing output. Loops in Python are statements, not expressions, so they don't hand back any value at all; anything you want to keep from a loop has to be stored in a variable along the way.

In general, if you want to run a loop some specific number of times, use range with one argument. If you want to count from somewhere to somewhere else, use two arguments. If you want to loop over each item in a collection, use for/in by itself.

continue/break

So what happens when you're looping along37, somewhere in the middle, and you get bored? What if you have some occasional weird exceptions that allow you to skip the body of the loop? Or what if you... oh, I don't know. Say you have some random collection of numbers, and you just want to print out the ones greater than or equal to ten. You might do something like this:

    for iPrime in aiPrimes:
        if iPrime < 10:
            continue
        print( iPrime )

Of course, there are a million other ways to do it (hint: there's a greater than or equal to operator), but this is just an example of the continue keyword. continue acts as a special command inside of a loop that means basically, "skip the rest of the loop body and immediately start on the next iteration." In a while loop, this means going straight to the test; in a for, the body starts again with the next element from the collection.
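A small sketch comparing continue in a for loop and in a while loop, using the same ten primes as the while example. It collects results in lists with append (which we haven't formally met yet) instead of printing them, and the list names are made up for illustration. The while version shows one trap worth knowing: because continue jumps straight back to the test, the index has to be advanced before the continue, or the test never changes and the loop never ends.

```python
aiPrimes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# for version: continue moves on to the next element of aiPrimes.
aiBigFor = []
for iPrime in aiPrimes:
    if iPrime < 10:
        continue          # skip the rest of the body for small primes
    aiBigFor.append( iPrime )

# while version: continue jumps straight back to the test at the top.
aiBigWhile = []
iIndex = 0
while iIndex < len( aiPrimes ):
    iPrime = aiPrimes[iIndex]
    iIndex = iIndex + 1   # advance BEFORE continue, or the test never changes
    if iPrime < 10:
        continue
    aiBigWhile.append( iPrime )

print( aiBigFor )    # prints [11, 13, 17, 19, 23, 29]
print( aiBigWhile )  # prints [11, 13, 17, 19, 23, 29]
```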
So continue provides a way for you to bail out of the current iteration of a loop, and if you really want out, you can skip out on the whole thing. Suppose you're somewhere in our prime production again, and you want to stop completely as soon as you see any prime less than ten. Then you can say:

    for iPrime in aiPrimes:
        if iPrime < 10:
            break

The break keyword bails out of the whole loop as soon as it's executed; nothing else from the loop body or test is executed. Python goes straight on to whatever comes after the loop. Needless to say, if you tell Python to continue or break outside of a loop, you'll make it unhappy (it's a syntax error). Oh, and if you continue or break inside of nested loops, it only affects the innermost loop (although there are ways around that, you usually don't want to use them). For example:

    for i in range( 3 ):
        for j in range( i + 1, 6 ):
            if j < 4:
                continue
            print( str(i) + ", " + str(j) )
    → "0, 4", "0, 5", "1, 4", "1, 5", "2, 4", "2, 5"

37 When you're loopy, as it were.

There's a little bit of funny syntax in that print there, but you can probably figure out what's happening if I tell you that str converts a number (or anything, really) to a string and + concatenates strings. We'll get to them soon!

raise/Exception

You should sense a theme here. continue bails out of the current iteration. break bails out of the current loop. Well, the obvious next step is to shuffle off this mortal coil entirely. When your Python program is ready to give up the ghost, to push up daisies, to cease to be, to be an ex-program, you tell it to raise. Which doesn't make a whole lot of sense, really; the only lame metaphor I can think of is that it's waving around a flag of surrender or something. Perl uses the much more evocative die function for the same purpose. The thing that is "raised" is called an exception, which is a moderately ubiquitous programming term referring to an object that contains information about an unusual condition.
Just like a variable can contain a number with some value, or a string with some value, an exception contains some value that explains the situation that caused it (typically some kind of error). When we raise exceptions (which requires a capital E, as in raise Exception), we'll provide them with a string that describes why we took such drastic steps. In particular, since raise causes the entire program to exit immediately (unless something catches the exception, which we haven't covered), it's quite appropriate for debugging: you can tell raise to print out why the program went poof. Suppose we're absolutely never supposed to see a prime less than ten. Then we might say:

    for iPrime in aiPrimes:
        if iPrime < 10:
            raise Exception( str(iPrime) + " is the wrong input, you insensitive clod!" )
        print( iPrime )

This seems fairly innocuous, but it means that you, the programmer, get to know what the value of iPrime was at the time of death. This can be invaluable in more complicated situations; if your program breaks because of bad input (especially when reading from a file, for example), you want to know what the bad input is. This one is fun to try out in practice, though - not that the others aren't, but it's fun to find ways to make a program die38! So I'll make this the last example and let you get to work trying things out.

Making It All Happen

So you know what Python looks like. How do you make Python know what you look like? Or in other words, how do you make it go? If you're not using the interactive shell (which I don't think we'll ever need to), Python runs in essentially two parts: you write a script using your Text Editor Of Choice (TM), and the Python interpreter reads in and executes that script. Because of the vagaries of the command environment and the Python interpreter itself, the process is a little convoluted. It goes thusly:

1. Fire up a text editor: vi, emacs, pico, joe, nedit, BBEdit, Notepad, you name it.
2. Make the first line "#!/usr/bin/python".
   This tells the command environment that you're writing something that should be run by the Python interpreter. (Strictly speaking, this line only matters if you run the script directly by name on MacOS or Linux; when you type python something.py as in step 5, Python simply treats it as a comment.)
3. Skip a line, and then write some Python! Put all of your code in this space.
4. Save the file as "something.py", where the "py" suffix stands for Python.
5. From the console, go to the directory where your file is and run "python something.py".
   a. If you've never used a console/command line/terminal before, here's your chance! We'll discuss this in more detail later, and many details are available elsewhere; your options are:
      i. On Windows, run Command Prompt from the Start menu (it's located under Programs/Accessories).
      ii. On MacOS, run Terminal (it's located under Applications/Utilities).
      iii. On Linux, you wouldn't be running the OS if you didn't already know how to launch a command prompt.
   b. Change directories using the cd command. For example, if you saved your script on the desktop (which you generally should), you should type "cd Desktop" (without the quotes, of course) to get there.
6. Enjoy!

38 That's really pretty much what programming is about, after all.

Python Keyword (Mostly) Quick Reference

#
    Syntax:   # I am a comment
    Meaning:  Makes the interpreter ignore the rest of the line.
    Example:  iFour = 4 # Comment!

pass
    Syntax:   pass
    Meaning:  Null statement; fills in empty blocks.
    Example:
        if False:
            pass
        else:
            print( True )

del
    Syntax:   del i
              del list[i]
              del hash[k]
    Meaning:  Removes an element from a collection (list or dictionary) or undefines a variable.
    Example:
        ai = [1, 2, 3]
        del ai[1]
        del ai

if/elif/else
    Syntax:
        if test1:
            body1
        elif test2:
            body2
        else:
            body3
    Meaning:  If test1 evaluates to true, run body1. Otherwise, if test2 evaluates to true, run body2. Otherwise, run body3. You can have zero or more elifs, and zero or one elses.
    Example:
        if i < 10:
            print( "tall" )
        elif i < 20:
            print( "grande" )
        else:
            print( "venti" )

while
    Syntax:
        while test:
            body
    Meaning:  Evaluate test. If it's true, run body and repeat.
    Example:
        while i < 10:
            print( i )
            i = i + 1

for/in
    Syntax:
        for var in list:
            body
    Meaning:  For each element in list, set var to that value and run body.
    Example:
        for i in [1, 3, 7]:
            print( i )

range
    Syntax:   range( end )
              range( begin, end )
              range( begin, end, step )
    Meaning:  With one argument, return a list from 0 to end-1. With two, a list from begin to end-1. With three, count from begin toward end (exclusive) in increments of step. (In Python 3, you get a range object rather than a list.)
    Example:
        aBle = [1, 3, 7]
        for i in range( len( aBle ) ):
            print( aBle[i] )

continue
    Syntax:   continue
    Meaning:  Terminate the current loop iteration and start the next one.
    Example:
        for i in ai:
            if i < 10:
                continue
            print( i )

break
    Syntax:   break
    Meaning:  Terminate the current loop completely.
    Example:
        for i in ai:
            if i < 10:
                break

raise
    Syntax:   raise Exception( message )
    Meaning:  Terminate program execution and print message.
    Example:  raise Exception( "the dead" )
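To see several entries from this table working together, here's a sketch of the sort of something.py you might write in step 3 of the recipe above. The primes, the dictionary name hashSizes, and the small/medium/big cutoffs are all invented for illustration; sorted is a built-in we haven't covered, used here only so the output comes out in a predictable order.

```python
#!/usr/bin/python
# something.py: a hypothetical practice script using keywords from the table.

aiPrimes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
hashSizes = {}

for iPrime in aiPrimes:
    if iPrime == 2:
        continue              # skip the only even prime
    if iPrime > 20:
        break                 # stop for good once the primes get big
    if iPrime < 10:
        hashSizes[iPrime] = "small"
    elif iPrime < 15:
        hashSizes[iPrime] = "medium"
    else:
        hashSizes[iPrime] = "big"

del hashSizes[3]              # throw away one entry we decided we don't want

if len( hashSizes ) == 0:
    # Never happens with the list above, but this is how you'd bail out.
    raise Exception( "no primes survived the filters" )

# sorted() hands back the keys in numerical order so the output is predictable.
for iPrime in sorted( hashSizes ):
    print( str(iPrime) + " is " + hashSizes[iPrime] )
```

Running it should print six lines, from "5 is small" through "19 is big": 2 is skipped by continue, 3 is removed by del, and 23 and 29 are never reached because of the break.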
