STA312 Python Introduction
Craig Burkett, Dan Zingaro
January 6, 2015

Python History
- Late 1970s: programming language called ABC
  - High-level, intended for teaching
  - Only five data types
  - Programs are supposedly one-quarter the size of the equivalent BASIC or Pascal program
  - Not a successful project
  - More ABC information: http://homepages.cwi.nl/~steven/abc/

Python History...
- 1983: Guido van Rossum joined the ABC team
- Late 1980s: Guido started working on a new project, in which a scripting language would be helpful
- Based Python on ABC, removed warts (e.g. ABC wasn’t extensible)
- Python, after Monty Python
- Guido: Benevolent Dictator for Life (BDFL)... but he’s retiring!
- http://www.artima.com/intv/ (search for Guido)

Why Python for Big Data?
- Readable, uniform code structure
- No compilation step; Python is interpreted
- Supports object-oriented programming (OOP) features
- Batteries included: Python’s standard library comes with tools for a variety of problem domains
- Additional modules are available for download: data mining, language processing...

Dynamic Typing
- Biggest conceptual change compared to C, Java etc.
- Variables do not have types. Objects have types

    >>> a = 5
    >>> type(a)
    <type 'int'>
    >>> a = 'hello'
    >>> type(a)
    <type 'str'>
    >>> a = [4, 1, 6]
    >>> type(a)
    <type 'list'>

Built-in Types
- We’ll look at the core five object types that are built-in to Python
  - Numbers
  - Strings
  - Lists
  - Dictionaries
  - Files
- They’re extremely powerful and save us from writing tons of low-level code

Built-in Types: Numbers
- Create numbers by using numeric literals
- If you include no fractional component, it’s an integer; otherwise it’s a float
- We have all of the standard mathematical operators, and even ** for exponent
- Make integers as small or large as you like; they can’t go out of bounds

Built-in Types: Strings
- A string is a sequence of characters
- To indicate that something is a string, we place single- or double-quotes around it
- We can use + to concatenate strings
- This is an example of overloading: + is used to add numbers too; it knows what to do based on context
- What happens if we try to use + with a string and a number?
  - Error: + doesn’t know what to do!
  - e.g. is '3' + 4 supposed to be the string '34' or the number 7?
  - Design philosophy: Python tries never to guess at what you mean

Strings...
- The * operator is overloaded, too
  - Applied to a string and an integer i, it duplicates the string i times
  - If i ≤ 0, the result is the empty string
- Can also use relational operators such as < or > to alphabetically compare strings

Looping Through Strings

    for char in s:
        <do something with char>

- We’ll see this pattern again and again for each Python type
- It’s like PHP’s foreach or Java’s for-with-the-colon
- Let’s write a function that counts the number of vowels in a string
- A function is a named piece of code that carries out some task

Possible Solution: How Many Vowels? (num_vowels.py)

    def num_vowels(s):
        '''Return the number of vowels in string s.
        The letter "y" is not treated as a vowel.'''
        count = 0
        for char in s:
            if char in "aAeEiIoOuU":
                count += 1
        return count

String Methods
- Strings are objects and have tons of methods
- Use dot-notation to access methods
- Use dir(str) to get a list of methods, and help(str.methodname) for help on any method
- Useful ones: find, lower, count, replace...
- Strings are immutable (cannot be modified): all we can do is create new strings

Indexing and Slicing Strings
- Assume s is a string
- Then, s[i] for i ≥ 0 extracts character i from the left (0 is the leftmost character)
- We can also use a negative index i to extract a character beginning from the right (-1 is the rightmost character)
- Slice notation: s[i:j] extracts characters beginning at s[i] and ending at the character one to the left of s[j]
  - If we leave out the first index, Python defaults to using index 0 to begin the slice
  - Similarly, if we leave out the second index, Python defaults to using index len(s) to end the slice

Built-in Types: Lists
Lists are like arrays in other languages, but much more powerful.

                            Strings       Lists
    Sequences of?           Characters    Any object types
    Immutable?              Yes           No
    Can be heterogeneous?   No            Yes
    Can index and slice?    Yes           Yes
    Can use for-loop?       Yes           Yes
    Created like?           'hi'          [4, 1, 6]

List Methods
- As with strings, there are lots of methods; use dir(list) or help(list.method) for help
- append is used to add an object to the end of a list
- extend is used to append the objects of another list
- insert(index, object) inserts object before index
- sort() sorts a list
- remove(value) removes the first occurrence of value from the list

Exercise: Length of Strings
- Write a function that takes a list of strings, and prints out the length of each string in the list
- e.g. if the list is ['abc', 'q', ''], the output would be as follows

    3
    1
    0

Built-in Types: Dictionaries
Dictionaries are like associative arrays or maps in other languages.

                            Lists                  Dictionaries
    Stores?                 Sequences of objects   Key-value pairs
    Immutable?              No                     No
    Can be heterogeneous?   Yes                    Yes
    Can index and slice?    Yes                    No
    Can use for-loop?       Yes                    Yes
    Created like?           [4, 1]                 {'a': 1, 'b': 2}

Dictionaries vs. Lists
- Compared to using “parallel lists”, dictionaries make an explicit connection between a key and a value
- But unlike lists, dictionaries do not guarantee any ordering of the elements
  - If you use for k in d, for a dictionary d, you get the keys back in arbitrary order

    bird_dict = {
        'peregrine falcon': 1,
        'harrier falcon': 5,
        'red-tailed hawk': 2,
        'osprey': 11}

Adding to Dictionaries
- Dictionary keys must be of immutable types (no lists!), but values can be anything
- We can use d[k] = v to add key k with value v to dictionary d
- We can use the update method to dump another dictionary’s key-value pairs into our dictionary
- We can use d[k] to obtain the value associated with key k of dictionary d
  - If k does not exist, we get an error
  - The get method is similar, except it returns None instead of giving an error when the key does not exist

Built-in Types: Files
- We’ll use files whenever we read external data (websites, spreadsheets, etc.)
- To open a file in Python, we use the open function
- Syntax: open(filename, mode)
- mode is the string 'r' to open the file for reading, 'w' to open the file for writing, or 'a' to open the file for appending. If no mode is given, it defaults to 'r'
- open gives us a file object that we can use to read or write the file

Reading Files with Methods
To read the next line from a file:
- readline: reads and returns next line; returns empty string at end-of-file
There are other methods, but try not to use these because they read the entire file into memory:
- read: reads the entire file into one string
- readlines: reads the entire file into a list of strings
All of these leave a trailing '\n' character at the end of each line.

Reading Files with Loops
A file is a sequence of lines:

    f = open('songs.txt')
    for line in f:
        print(line.strip())

... or using a while-loop:

    f = open('songs.txt')
    line = f.readline()
    while line:
        print(line.strip())
        line = f.readline()

Skipping Headers
Suppose we have a file of this format:

    header
    # comment text
    # comment text
    # ...
    ... actual data ...

Let’s write a function that skips the header of such a file and returns the first line of actual data.

Multi-Field Records
- So far, we have been reading entire lines from our file
- But, our lines are actually records containing three fields: game name, song name, and rating
- Let’s write a function to read this data into three lists
- The critical string method here is split
  - With no parameters, it splits around any whitespace
  - With a string parameter, it splits around that string

Further Python Resources
- http://www.rmi.net/~lutz
  - Mark Lutz’ Python books
  - Constantly-updated to keep up with Python releases
- http://docs.python.org/tutorial
  - Free online Python tutorial
- https://mcs.utm.utoronto.ca/~108
  - Dan’s intro CS course in Python
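
Sketch: Skipping Headers and Multi-Field Records
The two "let’s write a function" exercises above (skipping a file header, then reading three-field records into three lists) could be approached as in the sketch below. This is only one possible solution, not the course’s official one. The function names skip_header and read_records are made up for illustration, the comma separator is an assumption (the slides do not say how the fields are delimited), and the sample data is invented so the example can run without songs.txt.

```python
import io


def skip_header(f):
    '''Skip the header of open file f and return the first line of data.

    Assumes one header line, followed by zero or more comment lines
    that start with "#", followed by the actual data.'''
    f.readline()                  # discard the header line
    line = f.readline()
    while line.startswith('#'):   # '' at end-of-file ends the loop too
        line = f.readline()
    return line


def read_records(f, sep=','):
    '''Read three-field records from open file f into three lists.

    sep is an assumed field separator. Ratings are kept as strings.'''
    games, songs, ratings = [], [], []
    line = skip_header(f)
    while line:
        fields = line.strip().split(sep)
        if len(fields) == 3:      # ignore blank or malformed lines
            games.append(fields[0])
            songs.append(fields[1])
            ratings.append(fields[2])
        line = f.readline()
    return games, songs, ratings


# Invented sample data standing in for a real file.
sample = io.StringIO(
    'header\n'
    '# comment text\n'
    '# comment text\n'
    'Game A,Song A,4\n'
    'Game B,Song B,5\n')
games, songs, ratings = read_records(sample)
print(games)    # ['Game A', 'Game B']
print(ratings)  # ['4', '5']
```

Both functions use only readline, so they read one line at a time instead of loading the whole file into memory. With a real file, pass the result of open('songs.txt') in place of the StringIO object.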