Introduction to Python
Session 1: The Basics and a Little More
Jeremy Chen
National University of Singapore Business School

Objectives
• Learn basic Python
• Learn to use Python for Data Analysis

This Session’s Agenda
• Getting Started
• Data types
• Using Modules and Packages
• Functions and Flow Control
• Objects

Why Use Python
• Writing readable code is easy
  – Natural syntax to commands
  – Indentation-consciousness forces readability
  – “Everything I like about Perl, and everything I like about MATLAB.” - Someone
• Modules for everything
  – The drudgery (csv, JSON, …)
  – Image Manipulation and Plotting
  – Scientific Computing
  – More: https://wiki.python.org/moin/UsefulModules

About Me
• Credibility Destroyer:
  – I’m not really a Python user.
  – I use/have used Python for:
    • Scientific computing (Optimization, Statistics)
    • Server maintenance (Database/file system clean up/data acquisition)
    • … may be using Django (a Python Web Framework) to build an interesting web app… once the design and architecture are figured out.
    • … but I suppose I’ve done my fair bit of data wrangling and analysis

Setting Up
• Vanilla Python: http://www.python.org/getit/
  – Windows/Mac: Pick a binary
  – Linux: (You should know what to do)
  – We will use a 2.7.x build. Install this now.
    • Some existing third-party software is not yet compatible with Python 3; if you need to use such software, you can download Python 2.7.x instead.
    • I use Python 2.6.6 “on server” and Python 2.7.5 elsewhere.
• Distributions with Almost Everything You Need:
  – Enthought Canopy
  – Python(x,y)
  – WinPython
  Start downloading one of these now.

Starting Up
• Start Interpreter: IDLE or /usr/bin/python
• Basics:
  – CTRL-D or exit() to exit
  – Comments begin with #
>>> x = y = z = 1  # Multiple assignment
>>> x += 1  # This is not C: x++ doesn’t work
>>> x
2
>>> some_list = [1,2.0,"hi"]  # Can contain multiple "types"
>>> some_list[1]  # Zero-based indexing: Stuff starts at 0
2.0
>>> some_list[1:2]  # Slice from index 1 up to (not including) index 2, i.e. the (1+1)-th to 2-nd items: Weird?
[2.0]

Basic Data Types
• Strings
>>> some_string = "this is a string"
>>> some_string[5:]  # Element 5 to end
'is a string'
>>> some_string[:5]  # Element 0 to 5-1
'this '
• Integers
>>> a = 1; b = 2  # Another "multiple assignment"
>>> a/b  # "Truncation" is about to happen (integer division in Python 2)
0
• Floats
>>> fl = 1.0; b = 2; fl/b  # Another multiple assignment and... float division
0.5

Container Data Types
• Lists
>>> some_list = [1,2.0,"hi"]  # Can contain multiple "types"
>>> x,y,z = [1,2,3]; y  # Unpacking assignment
2
>>> some_list.append(5); some_list  # Append to end
[1, 2.0, 'hi', 5]
>>> el = some_list.pop(); el; some_list  # Extract last element
5
[1, 2.0, 'hi']
>>> el = some_list.pop(1); el; some_list  # Get second this time
2.0
[1, 'hi']
>>> some_list[0]=range(5); some_list  # Change the first element
[[0, 1, 2, 3, 4], 'hi']
>>> some_list[0][3]="too much"; some_list  # Be slightly abusive
[[0, 1, 2, 'too much', 4], 'hi']
>>> del some_list[0][2]; some_list  # Delete 3rd element (index 2) of the list in the 1st element
[[0, 1, 'too much', 4], 'hi']

Collections
• Lists
>>> anotherlist = [1,2,3,4]; anotherlist
[1, 2, 3, 4]
>>> anotherlist += range(5,7); anotherlist  # Concatenating lists: adds to end; range excludes its end point
[1, 2, 3, 4, 5, 6]
>>> anotherlist[-3]  # Negative index: Get 3rd last element
4
>>> anotherlist[0:4]  # Slicing: Get 0th to 3rd element (not 4th)
[1, 2, 3, 4]
>>> anotherlist[:-3]  # Slicing: Get elements until before 3rd last element
[1, 2, 3]
>>> anotherlist * 2  # What happens?
[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
>>> anotherlist += 1  # What happens?
Traceback (most recent call last):
  File "<pyshell#78>", line 1, in <module>
    anotherlist += 1
TypeError: 'int' object is not iterable

Collections
• Sets (A collection of unique items)
>>> one_to_three = {1,2,3}; one_to_three
set([1, 2, 3])
>>> {1,2,3,1}  # Sets are collections of unique items
set([1, 2, 3])
>>> one_to_ten = set(range(1,11)); one_to_ten  # Note again that range excludes its end point
set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> five_to_eleven = set(range(5,12))
>>> len(one_to_three), len(one_to_ten)  # Cardinality
(3, 10)
>>> 3 in one_to_three, 3 in five_to_eleven  # Membership
(True, False)
>>> one_to_three.issubset(one_to_ten)  # Containment
True
>>> one_to_three.union(five_to_eleven)  # Union
set([1, 2, 3, 5, 6, 7, 8, 9, 10, 11])
>>> one_to_ten.intersection(five_to_eleven)  # Intersection
set([5, 6, 7, 8, 9, 10])
>>> # There are also issuperset, difference, symmetric_difference

Collections
• Dictionaries (An indexed container)
>>> daysOfWeek = {1: "Mon", 2: "Tue", 3: "Wed", 4: "Thu", 5: "Fri"}
>>> daysOfWeek[5]  # Looked up by key: No more zero-based indexing
'Fri'
>>> daysOfWeek[6]="Sat"; daysOfWeek[7]="Sun"  # Adding Entries
>>> daysOfWeek
{1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat', 7: 'Sun'}
>>> daysOfWeekInv = {"Mon":1, "Tue":2, "Wed":3, "Thu":4, "Fri":5}
>>> daysOfWeekInv["Mon"]
1
>>> daysOfWeekInv.keys()  # Get list of keys; order is not guaranteed, so your mileage may vary
['Fri', 'Thu', 'Wed', 'Mon', 'Tue']
>>> daysOfWeek[8]="ExtraDay"; daysOfWeek
{1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat', 7: 'Sun', 8: 'ExtraDay'}
>>> del daysOfWeek[8]; daysOfWeek  # Delete key-value pair
{1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat', 7: 'Sun'}

Collections (Not really…)
• Tuples
>>> tup = (1,2.0,"hi"); tup  # Looks just like a list
(1, 2.0, 'hi')
>>> tup[1] = 0  # But tuples are immutable (You can't change them)
Traceback (most recent call last):
  File "<pyshell#54>", line 1, in <module>
    tup[1] = 0
TypeError: 'tuple' object does not support item assignment
• Tuples vs. Lists
  – Tuples are not constant lists
  – Lists are meant to be homogeneous sequences
  – Tuples are meant to be heterogeneous data structures
    • e.g.: thisCustomer = (<customerId>,<address>,<DOB>,...)
    • Lightweight classes

Using Modules and Packages
• Modules/Packages are useful collections of code (functions, classes, constants) that one may use.
>>> import math
>>> math.sqrt(4)
2.0
>>> import math as m
>>> m.e
2.718281828459045
>>> from math import pi as PIE  # Omit "as PIE" and the name will be "pi"
>>> PIE
3.141592653589793
>>> import os  # This is useful
>>> os.getcwd()
'C:\\Python27\\lib\\site-packages\\xy'
>>> os.chdir(r'C:\Python27')  # Backslashes in raw strings don't need to be escaped. (But a raw string can’t end in a single "\".)
>>> os.getcwd()
'C:\\Python27'
• Difference: Modules are single .py files (with stuff) while packages come in directories.

Comprehensions
• List comprehensions
>>> squares = [x**2 for x in range(10)]; squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> import math; [math.sqrt(x) for x in squares]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
>>> import random; [random.randint(0,9) for x in range(10)]  # Random: your output will differ
[5, 8, 3, 6, 1, 1, 6, 6, 0, 2]
• Set comprehensions
>>> {x for x in range(10) if x >= 5}
set([8, 9, 5, 6, 7])
>>> set(x for x in range(10) if x >= 5)  # For sad pandas using Python 2.6.x (no set comprehensions)
set([8, 9, 5, 6, 7])
• Dictionary comprehensions
>>> {x:x**2 for x in range(5)}
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
>>> dict((x,x**2) for x in range(5))  # For sad pandas using Python 2.6.x (no dictionary comprehensions)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Optional Exercise
• Compute daysOfWeekInv from daysOfWeek. (See slide introducing dictionaries.)
• Form a set of all the weekend and Tuesday dates from 1 Dec 2012 to 1 Mar 2013, excluding January dates.
  – import datetime; startdate = datetime.date(2012,12,1)
  – one_day = datetime.timedelta(days=1)
  – (startdate + 2*one_day).isoweekday()  # Should be 1 (Monday); 7 for Sunday
• Form a list of multiples of 3 above 30 but below 100 in descending order.
• Do the same for a list of multiples of x above L but below H.

Flow Control
• The if statement (Note the indents)
>>> x = int(raw_input("Please enter an integer: "))
Please enter an integer: 8
>>> if x > 10:  # This block executes if the condition is True
...     print "x > 10"
... elif x == 8:  # Optional case block
...     print "x == 8"
... else:  # Optional catch-all block
...     print "x <= 10 and x is not 8"
...
x == 8

Flow Control
• The for loop “loops” over an iterable (here, the list produced by range)
>>> for n in range(2, 10, 2):
...     print n,
...
2 4 6 8
• The break statement and else block
>>> for n in range(15, 22):
...     for c in range(2, n):
...         if n % c == 0:  # remainder from division (commonly known as mod)
...             print "%d is not prime; " % n,
...             break
...     else:  # Runs only if the for loop finishes without a break
...         print "%d is prime" % n  # Note the different print statements
...
15 is not prime; 16 is not prime; 17 is prime
18 is not prime; 19 is prime
20 is not prime; 21 is not prime;

Flow Control
• The continue statement
>>> for n in range(15, 22):
...     if n % 3 == 0:  # remainder from division (commonly known as mod)
...         continue
...     print "%d is not a multiple of 3" % n
...
16 is not a multiple of 3
17 is not a multiple of 3
19 is not a multiple of 3
20 is not a multiple of 3

Functions
• An example:
>>> def fn1(x):
...     return x * x
...
>>> def fn2(x,y):
...     z = x + y
...     return z
...
>>> fn2(1,fn1(2))
5
• Function declarations:
  – Start with def…
  – … are followed by a function name
  – … then arguments in parentheses
• Output is passed back with return
• Indentation defines the function body

Functions
• Default arguments and named arguments
>>> def fn(x,y=1):
...     z = x + 2*y
...     return z
...
>>> fn(1)  # Default argument used
3
>>> fn(1,5)
11
>>> fn(y=5,x=1)  # Named arguments used
11
>>> fn = lambda x,y,z : x+y+z  # Lambda Expressions
>>> fn(1,2,3)
6
• Warning: A default argument is created once, when the function is defined. If it is a mutable object like a list, changing it inside the function results in a different default argument in the next call.

Generator Functions
• Generator functions create iterators
>>> def gen(start=1,max=10,step=1):
...     x = start
...     while x <= max:
...         yield x
...         x += step
...
>>> print list(gen(2,10,2))
[2, 4, 6, 8, 10]
>>> y = 0
>>> for k in gen(1,10,4):
...     y += 1
...     print (y, k)
...
(1, 1)
(2, 5)
(3, 9)
• yield hands back an item and pauses the function; computation continues from that point if another item is requested.

Classes
• Like “mutable tuples with behavior” (or not)
• Contain data that is transformed in well-defined ways
class SimpleFactorizer:  # Edit in IDLE; save as xxx.py, then Run (F5)
    def __init__(self):  # Constructor
        self.__last_integer = 2  # Initialization of data
        self.__primes = [2]  # Initialization of data
        # __x variables are Python standard practice
        # (“culture”) for labeling “private” data

    def prime_list(self):
        return list(self.__primes)  # duplicate list

    def compute_primes_to(self, u):
        for c in range(self.__last_integer+1, u+1):
            if self.get_prime_factor(c) == 1:  # No known prime divides c
                self.__primes.append(c)
        self.__last_integer = u
    # Continued on next slide

Objects
    # ... continued from last slide
    def get_prime_factor(self, v):
        factor = 1  # Returned if no known prime divides v
        for c in self.__primes:
            if v % c == 0:
                factor = c
                break
        return factor

    def get_prime_factors(self, v):
        factors = []
        remainder = int(v)  # Cast to integer
        if remainder > self.__last_integer:
            self.compute_primes_to(remainder)
        while remainder > 1:
            thisFactor = self.get_prime_factor(remainder)
            factors += [thisFactor]
            remainder //= thisFactor  # Integer division
        return factors
    # Continued on next slide...

Objects
# ... continued from last slide
# Test it out
df = SimpleFactorizer()
print df.get_prime_factors(2*2*3*5*7)
print df.get_prime_factors(2*2*3*5*7*13)
print df.prime_list()[:min(50, len(df.prime_list()))]  # Print first 50
# Actually [:50] works even if list length < 50
• Output:
[2, 2, 3, 5, 7]
[2, 2, 3, 5, 7, 13]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229]
• This can be done in the interpreter too.

Some Things I Missed Out
• Better String formatting. All you need is:
>>> "Position 0: {0}; Position 1: {1}; Position 0 again: {0}".format('a', 1, 7)
'Position 0: a; Position 1: 1; Position 0 again: a'
>>> r"C:\BlahBlah\output_p{param}_s{num_samples}".format(param=2, num_samples=10000)
'C:\\BlahBlah\\output_p2_s10000'
>>> "% Affected (Q={param:.3}): {outcome:.1%}".format(param=1.234567, outcome=0.23454)
'% Affected (Q=1.23): 23.5%'
• Inheritance, Polymorphism
  – Standard Object Oriented Programming
• Handling “unplanned events” with exceptions
  – “It is easier to ask for forgiveness than permission.”
• Testing
  – (This is not a software engineering course.)
  – For more info: doctest, unittest
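
The warning on the Functions slide about mutable default arguments is easier to see in a small sketch. The function names append_bad and append_good are made up for this illustration and are not from the session; the None-default pattern shown is one common workaround, not the only one.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares the same list object.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Use None as a placeholder and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_bad(1)
second = append_bad(2)    # same list object as `first`, now holding both items
third = append_good(1)
fourth = append_good(2)   # a new list each time
```

Here first and second end up as the same two-element list, while third and fourth are separate one-element lists.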
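
The “easier to ask for forgiveness than permission” bullet can be illustrated with a short sketch: instead of checking every precondition up front, just try the operation and handle the failure. The helper day_name and its return strings are hypothetical, chosen only to reuse the days-of-week dictionary idea from the earlier slide.

```python
days_of_week = {1: "Mon", 2: "Tue", 3: "Wed", 4: "Thu", 5: "Fri"}


def day_name(day_number):
    """Hypothetical helper: look up a day name, tolerating bad input."""
    try:
        # Just try it: int() may raise ValueError or TypeError,
        # and the dictionary lookup may raise KeyError.
        return days_of_week[int(day_number)]
    except (ValueError, TypeError):
        return "not a number"
    except KeyError:
        return "no such day"
```

With this sketch, day_name("3") gives 'Wed', day_name(9) gives 'no such day', and day_name("abc") gives 'not a number'.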
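
For the doctest pointer under Testing, a minimal sketch might look like the following. The function squares_up_to is an assumption made for illustration (it mirrors the list comprehension from the Comprehensions slide); the examples in its docstring double as tests.

```python
def squares_up_to(n):
    """Return the squares of 0, 1, ..., n-1 as a list.

    The interpreter-style examples below are what doctest checks.

    >>> squares_up_to(4)
    [0, 1, 4, 9]
    >>> squares_up_to(0)
    []
    """
    return [x ** 2 for x in range(n)]
```

If this is saved as, say, xxx.py, running python -m doctest -v xxx.py re-executes each >>> line in the docstring and reports any output that differs from what is written there.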