Introduction to Python
Session 1: The Basics and a Little More
Jeremy Chen
Objectives
• Learn basic Python
• Learn to use Python for Data Analysis
National University of Singapore Business School
This Session’s Agenda
• Getting Started
• Data types
• Using Modules and Packages
• Functions and Flow Control
• Objects
Why Use Python
• Writing readable code is easy
– Commands have a natural syntax
– Indentation-consciousness forces readability
– “Everything I like about Perl, and everything I
like about MATLAB.” - Someone
• Modules for everything
– The drudgery (csv, JSON, …)
– Image Manipulation and Plotting
– Scientific Computing
– More: https://wiki.python.org/moin/UsefulModules
About Me
• Credibility Destroyer:
– I’m not really a Python user.
– I use/have used Python for:
• Scientific computing (Optimization, Statistics)
• Server maintenance (database/file system cleanup, data acquisition)
• … may be using Django (a Python web framework) to build an interesting web app… once the design and architecture are figured out.
• … but I suppose I’ve done my fair share of data wrangling and analysis.
Setting Up
• Vanilla Python: http://www.python.org/getit/
– Windows/Mac: Pick a binary
– Linux: (You should know what to do)
– We will use a 2.7.x build. Install this now.
• Some existing third-party software is not yet compatible
with Python 3; if you need to use such software, you can
download Python 2.7.x instead.
• I use Python 2.6.6 “on server” and Python 2.7.5 elsewhere.
• Distributions with Almost Everything You Need (start downloading one of these now):
– Enthought Canopy
– Python(x,y)
– WinPython
Starting Up
• Start Interpreter: IDLE or /usr/bin/python
• Basics:
– CTRL-D (CTRL-Z then Enter in a Windows console) or exit() to exit
– Comments begin with #
>>> x = y = z = 1 # Multiple Assignments
>>> x += 1 # This is not C: x++ doesn’t work
>>> x
2
>>> some_list = [1,2.0,"hi"] # Can contain multiple "types"
>>> some_list[1] # Zero-based indexing: Stuff starts at 0
2.0
>>> some_list[1:2] # Items from index 1 up to, but not including, index 2: Weird?
[2.0]
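Slices are half-open: `s[i:j]` runs from index `i` up to but not including `j`, so it holds `j - i` items and `s[:k] + s[k:]` gives back `s`. A minimal sketch using the list from the session above (the split point `k` is arbitrary); it runs under both Python 2.7 and Python 3.

```python
# The list from the interactive session above.
some_list = [1, 2.0, "hi"]

# s[i:j] covers indices i, i+1, ..., j-1, so [1:2] picks out index 1 only.
middle = some_list[1:2]

# A slice s[i:j] with bounds inside the list has j - i items.
first_two = some_list[0:2]

# Splitting at any k and concatenating the halves rebuilds the list.
k = 2
rebuilt = some_list[:k] + some_list[k:]

# Slice bounds past the end are clipped rather than raising IndexError.
tail = some_list[1:100]
```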
Basic Data Types
• Strings
>>> some_string = "this is a string"
>>> some_string[5:] # Element 5 to end
'is a string'
>>> some_string[:5] # Element 0 to 5-1
'this '
• Integers
>>> a = 1; b = 2 # Two assignments on one line, separated by ;
>>> a/b # "Truncation" is about to happen
0
• Floats
>>> fl = 1.0; b = 2; fl/b # With one float operand, nothing is truncated
0.5
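The truncation above is Python 2 behaviour for `int / int`. Python 3 made `/` always return a float, and Python 2.7 can opt in with `from __future__ import division` at the top of a file. Code that must behave the same on both versions can state its intent explicitly, as in this short sketch:

```python
a = 1
b = 2

# Converting one operand with float() forces true division on Python 2.7 and 3.
true_quotient = float(a) / b

# // is explicit floor division on both versions.
floor_quotient = a // b

# Floor division rounds toward negative infinity, not toward zero ...
negative_floor = -7 // 2

# ... and % is defined to match it: (x // y) * y + x % y == x.
negative_mod = -7 % 2
```

Here `true_quotient` is `0.5`, `floor_quotient` is `0`, and `-7 // 2` is `-4` rather than `-3`.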
Container Data Types
• Lists
>>> some_list = [1,2.0,"hi"] # Can contain multiple "types"
>>> x,y,z = [1,2,3]; y # Unpacking assignment
2
>>> some_list.append(5); some_list # Append to end
[1, 2.0, 'hi', 5]
>>> el = some_list.pop(); el; some_list # Extract last element
5
[1, 2.0, 'hi']
>>> el = some_list.pop(1); el; some_list # Get second this time
2.0
[1, 'hi']
>>> some_list[0]=range(5); some_list # Change the first element
[[0, 1, 2, 3, 4], 'hi']
>>> some_list[0][3]="too much"; some_list # Be slightly abusive
[[0, 1, 2, 'too much', 4], 'hi']
>>> del some_list[0][2]; some_list # Delete the element at index 2 of the list at index 0
[[0, 1, 'too much', 4], 'hi']
Collections
• Lists
>>> anotherlist = [1,2,3,4]; anotherlist # A fresh list
[1, 2, 3, 4]
>>> anotherlist += range(5,7); anotherlist # Concatenate onto the end; range(5,7) stops before 7
[1, 2, 3, 4, 5, 6]
>>> anotherlist[-3] # Negative index: get the 3rd-last element
4
>>> anotherlist[0:4] # Slicing: Get 0th to 3rd element (not 4th)
[1, 2, 3, 4]
>>> anotherlist[:-3] # Slicing: Get elements until before 3rd last element
[1, 2, 3]
>>> anotherlist * 2 # What happens?
[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
>>> anotherlist += 1 # What happens?
Traceback (most recent call last):
  File "<pyshell#78>", line 1, in <module>
    anotherlist += 1
TypeError: 'int' object is not iterable
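The message mentions an *int* not being iterable because, on a list, `+=` works like `extend()`: it accepts any iterable and adds its items in place. Plain `+` is stricter and only joins a list to another list. A small sketch contrasting the two (runs on Python 2.7 and 3):

```python
anotherlist = [1, 2, 3, 4, 5, 6]

# += extends in place with the items of any iterable, here a tuple.
anotherlist += (7, 8)

# + builds a new list and insists that both operands are lists.
try:
    anotherlist + (9,)
    plus_accepts_tuple = True
except TypeError:
    plus_accepts_tuple = False

# To add a single number, wrap it in a list or call append().
anotherlist += [9]
anotherlist.append(10)
```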
Collections
• Sets (A collection of unique items)
>>> one_to_three = {1,2,3}; one_to_three
set([1, 2, 3])
>>> {1,2,3,1} # Sets are collections of unique items
set([1, 2, 3])
>>> one_to_ten = set(range(1,11)); one_to_ten # Note that range stops before 11
set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> five_to_eleven = set(range(5,12))
>>> len(one_to_three), len(one_to_ten) # Cardinality
(3, 10)
>>> 3 in one_to_three, 3 in five_to_eleven # Membership
(True, False)
>>> one_to_three.issubset(one_to_ten) # Containment
True
>>> one_to_three.union(five_to_eleven) # Union
set([1, 2, 3, 5, 6, 7, 8, 9, 10, 11])
>>> one_to_ten.intersection(five_to_eleven) # Intersection
set([5, 6, 7, 8, 9, 10])
>>> # There is also issuperset, difference, symmetric_difference
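The remaining set methods named in that comment, applied to the same sets. Each also has an operator form. A short sketch (runs on Python 2.7 and 3):

```python
one_to_three = {1, 2, 3}
one_to_ten = set(range(1, 11))
five_to_eleven = set(range(5, 12))

# issuperset mirrors issubset: does one_to_ten contain every item of one_to_three?
contains_small = one_to_ten.issuperset(one_to_three)                # True, same as >=

# difference: items in one_to_ten but not in five_to_eleven.
only_in_first = one_to_ten.difference(five_to_eleven)               # {1, 2, 3, 4}, same as -

# symmetric_difference: items in exactly one of the two sets.
in_exactly_one = one_to_ten.symmetric_difference(five_to_eleven)    # {1, 2, 3, 4, 11}, same as ^
```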
Collections
• Dictionaries (An indexed container)
>>> daysOfWeek = {1: "Mon", 2: "Tue", 3: "Wed", 4: "Thu", 5: "Fri"}
>>> daysOfWeek[5] # Look up by key, not by position: no zero-based indexing here
'Fri'
>>> daysOfWeek[6]="Sat"; daysOfWeek[7]="Sun" # Adding Entries
>>> daysOfWeek
{1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat', 7: 'Sun'}
>>> daysOfWeekInv = {"Mon":1, "Tue":2, "Wed":3, "Thu":4, "Fri":5}
>>> daysOfWeekInv["Mon"]
1
>>> daysOfWeekInv.keys() # Get list of keys; the order is arbitrary, so yours may differ
['Fri', 'Thu', 'Wed', 'Mon', 'Tue']
>>> daysOfWeek[8]="ExtraDay"; daysOfWeek
{1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat', 7: 'Sun',
8: 'ExtraDay'}
>>> del daysOfWeek[8]; daysOfWeek # Delete key-value pair
{1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat', 7: 'Sun'}
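Since the key order of a Python 2.7 dictionary is arbitrary (dictionaries only became insertion-ordered as of Python 3.7), sort the keys when order matters. `get()` is the lookup that returns a fallback value instead of raising `KeyError`. A sketch with the slide's `daysOfWeekInv` (runs on Python 2.7 and 3):

```python
daysOfWeekInv = {"Mon": 1, "Tue": 2, "Wed": 3, "Thu": 4, "Fri": 5}

# Sort the keys by their values to get a predictable weekday order.
ordered_days = sorted(daysOfWeekInv, key=daysOfWeekInv.get)   # ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']

# get() returns the second argument for a missing key instead of raising KeyError.
saturday = daysOfWeekInv.get("Sat", 0)                        # 0

# "in" tests the keys, not the values.
has_monday = "Mon" in daysOfWeekInv                           # True
has_one = 1 in daysOfWeekInv                                  # False: 1 is a value, not a key
```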
Collections (Not really…)
• Tuples
>>> tup = (1,2.0,"hi"); tup # Looks just like a list
(1, 2.0, 'hi')
>>> tup[1] = 0 # But tuples are immutable (You can't change them)
Traceback (most recent call last):
  File "<pyshell#54>", line 1, in <module>
    tup[1] = 0
TypeError: 'tuple' object does not support item assignment
• Tuples vs. Lists
– Tuples are not constant lists
– Lists are meant to be homogeneous sequences
– Tuples are meant to be heterogeneous data structures
• e.g.: thisCustomer = (<customerId>,<address>,<DOB>,...)
• Lightweight classes
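For the “lightweight class” use, the standard library's `collections.namedtuple` (available since Python 2.6) names each position while keeping the tuple immutable. A sketch; the `Customer` type and its field values are made up for illustration:

```python
from collections import namedtuple

# Hypothetical record type, mirroring thisCustomer = (<customerId>, <address>, <DOB>, ...)
Customer = namedtuple("Customer", ["customer_id", "address", "dob"])

this_customer = Customer(customer_id=42, address="1 Example Road", dob="1990-01-01")

# Fields can be read by name or by position, as with a plain tuple.
same_field = this_customer.customer_id == this_customer[0]

# Still immutable: _replace() returns a modified copy and leaves the original alone.
moved = this_customer._replace(address="2 Example Road")
```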
Using Modules and Packages
• Modules/Packages are useful collections of code
(functions, classes, constants) that one may use.
>>> import math
>>> math.sqrt(4)
2.0
>>> import math as m
>>> m.e
2.718281828459045
>>> from math import pi as PIE # Omit "as PIE" and the name will be "pi"
>>> PIE
3.141592653589793
>>> import os # This is useful
>>> os.getcwd()
'C:\\Python27\\lib\\site-packages\\xy'
>>> os.chdir(r'C:\Python27') # In a raw string, backslashes need no escaping (but it can’t end in a single "\")
>>> os.getcwd()
'C:\\Python27'
• Difference: A module is a single .py file, while a package is a directory of modules (marked by an __init__.py file).
Comprehensions
• List comprehensions
>>> squares = [x**2 for x in range(10)]; squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> import math; [math.sqrt(x) for x in squares]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
>>> import random; [random.randint(0,9) for x in range(10)]
[5, 8, 3, 6, 1, 1, 6, 6, 0, 2]
• Set comprehensions
>>> {x for x in range(10) if x >= 5}
set([8, 9, 5, 6, 7])
>>> set(x for x in range(10) if x >= 5) # Sad pandas using Python 2.6.x
set([8, 9, 5, 6, 7])
• Dictionary comprehensions
>>> {x:x**2 for x in range(5)}
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
>>> dict((x,x**2) for x in range(5)) # Sad pandas using Python 2.6.x
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
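Comprehensions can also chain several `for` clauses, which run like nested loops with the leftmost as the outer loop, and can transform items with a conditional expression. A small sketch with arbitrary data (runs on Python 2.7 and 3):

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Two for clauses act like nested loops: "for row" is the outer loop.
flattened = [x for row in matrix for x in row]                     # [1, 2, 3, 4, 5, 6]

# A trailing if filters items ...
evens = [x for x in flattened if x % 2 == 0]                       # [2, 4, 6]

# ... while a conditional expression in front transforms every item.
parity = ["even" if x % 2 == 0 else "odd" for x in flattened[:3]]  # ['odd', 'even', 'odd']
```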
Optional Exercise
• Compute daysOfWeekInv from daysOfWeek. (See
slide introducing dictionaries.)
• Form a set of all the weekend and Tuesday dates from 1 Dec 2012 to 1 Mar 2013, excluding January dates.
– import datetime; startdate = datetime.date(2012,12,1)
– one_day = datetime.timedelta(days=1)
– (startdate + 2*one_day).isoweekday() # Should be 1 (Monday); 7 for Sunday
• Form a list of multiples of 3 above 30 but below
100 in descending order.
• Do the same for a list of multiples of x above L
but below H.
Flow Control
• The if statement (Note the indents)
>>> x = int(raw_input("Please enter an integer: "))
Please enter an integer: 8
>>> if x > 10: # This block executes if the condition is True
    print "x > 10"
elif x == 8: # Optional case block
    print "x == 8"
else: # Optional catch-all block
    print "x <= 10 and x is not 8"
x == 8
Flow Control
• The for loop “loops” over an “iterator”
>>> for n in range(2, 10, 2):
    print n,
2 4 6 8
• The break statement and else block
>>> for n in range(15, 22):
    for c in range(2, n):
        if n % c == 0: # remainder from division (commonly known as mod)
            print "%d is not prime; " % n,
            break
    else: # Evaluates if the inner for loop doesn’t break
        print "%d is prime" % n # Note the different print statements
15 is not prime; 16 is not prime; 17 is prime
18 is not prime; 19 is prime
20 is not prime; 21 is not prime;
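The same for/else pattern can live in a function that collects primes instead of printing them. A sketch; `primes_in_range` is a name introduced here, and it keeps the slide's trial division by every smaller number, which is simple rather than fast (runs on Python 2.7 and 3):

```python
def primes_in_range(lo, hi):
    """Return the primes n with lo <= n < hi, using trial division."""
    primes = []
    for n in range(max(lo, 2), hi):   # 0 and 1 are not prime
        for c in range(2, n):
            if n % c == 0:
                break                 # found a divisor: n is not prime
        else:
            primes.append(n)          # inner loop finished without break: n is prime
    return primes
```

For example, `primes_in_range(15, 22)` returns `[17, 19]`, matching the primes reported above.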
Flow Control
• The continue statement
>>> for n in range(15, 22):
    if n % 3 == 0: # remainder from division (commonly known as mod)
        continue
    print "%d is not a multiple of 3" % n
16 is not a multiple of 3
17 is not a multiple of 3
19 is not a multiple of 3
20 is not a multiple of 3
Functions
• An example:
>>> def fn1(x): return x * x
>>> def fn2(x,y):
    z = x + y
    return z
>>> fn2(1,fn1(2))
5
• Function declarations:
– Start with def…
– … are followed by a function name
– … then arguments in parentheses
• Output is passed back with return
• Indentation defines the function body
Functions
• Default arguments and named arguments
>>> def fn(x,y=1):
    z = x + 2*y
    return z
>>> fn(1) # Default argument used
3
>>> fn(1,5)
11
>>> fn(y=5,x=1) # Named arguments used
11
>>> fn = lambda x,y,z : x+y+z # Lambda Expressions
>>> fn(1,2,3)
6
• Warning: Default values are evaluated once, when def runs. If a default argument is a mutable object like a list, changes made to it carry over as the default in later calls.
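A sketch of that pitfall and the usual fix, a `None` default that builds a fresh list on every call. The function names are hypothetical; runs on Python 2.7 and 3.

```python
def append_shared(item, bucket=[]):
    # This default list is created once, when def runs, and reused by every call.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # None is a sentinel: a new list is created on each call that omits bucket.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first = append_shared(1)
second = append_shared(2)        # the same list object as first, now [1, 2]
fresh = append_fresh(1)          # [1]
fresh_again = append_fresh(2)    # [2]
```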
Generator Functions
• Generator functions create iterators
>>> def gen(start=1,max=10,step=1):
    x = start
    while x <= max:
        yield x
        x += step
>>> print list(gen(2,10,2))
[2, 4, 6, 8, 10]
>>> y = 0
>>> for k in gen(1,10,4):
    y += 1
    print (y, k)
(1, 1)
(2, 5)
(3, 9)
• yield hands back one item and pauses the function; computation resumes where it left off when another item is requested.
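To watch the pausing directly, the built-in `next()` (Python 2.6 and later) requests one item at a time, and the generator's body only runs as far as the next `yield`. A sketch reusing the slide's `gen` (runs on Python 2.7 and 3):

```python
def gen(start=1, max=10, step=1):
    x = start
    while x <= max:
        yield x       # hand back x and pause here
        x += step     # resume here when the next item is requested

g = gen(1, 10, 4)
first = next(g)       # 1
second = next(g)      # 5
rest = list(g)        # [9]: only the items not yet consumed

# Once exhausted, further next() calls raise StopIteration.
try:
    next(g)
    exhausted = False
except StopIteration:
    exhausted = True
```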
Classes
• Like “mutable tuples with behavior” (or not)
• Contain data that transform in well-defined ways
class SimpleFactorizer: # Edit in IDLE; save as xxx.py, then Run (F5)
    def __init__(self): # Constructor
        self.__last_integer = 2 # Initialization of data
        self.__primes = [2] # Initialization of data
        # __x names are name-mangled; Python “culture” uses
        # them to label “private” data
    def prime_list(self):
        return list(self.__primes) # duplicate list
    def compute_primes_to(self, u):
        for c in range(self.__last_integer+1, u+1):
            if self.get_prime_factor(c) == 1: # no smaller prime divides c
                self.__primes.append(c)
        self.__last_integer = u
# Continued on next slide
Objects
    # ... continued from last slide
    def get_prime_factor(self, v):
        factor = 1 # 1 means "no prime factor found"
        for c in self.__primes:
            if v % c == 0:
                factor = c
                break
        return factor
    def get_prime_factors(self, v):
        factors = []
        remainder = int(v) # Cast to integer
        if remainder > self.__last_integer:
            self.compute_primes_to(remainder)
        while remainder > 1:
            thisFactor = self.get_prime_factor(remainder)
            factors += [thisFactor]
            remainder //= thisFactor # floor division keeps remainder an int
        return factors
# Continued on next slide...
Objects
# ... continued from last slide
# Test it out
df = SimpleFactorizer()
print df.get_prime_factors(2*2*3*5*7)
print df.get_prime_factors(2*2*3*5*7*13)
print df.prime_list()[:min(50, len(df.prime_list()))] # Print first 50
# Actually [:50] works even if list length < 50
• Output:
[2, 2, 3, 5, 7]
[2, 2, 3, 5, 7, 13]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163,
167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229]
• This can be done in the interpreter too.
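What the double underscore does in practice: inside a class body, a name like `self.__primes` is rewritten (“name-mangled”) to `self._SimpleFactorizer__primes`. That keeps it out of casual reach and avoids clashes in subclasses, but it is not enforced privacy. A sketch with a hypothetical `Tally` class (runs on Python 2.7 and 3):

```python
class Tally(object):
    def __init__(self):
        self.__count = 0        # actually stored as _Tally__count

    def bump(self):
        self.__count += 1       # mangled the same way inside the class
        return self.__count

t = Tally()
t.bump()

# Outside the class no mangling happens, so the plain name is not found ...
has_plain_name = hasattr(t, "__count")    # False

# ... but the mangled name is still reachable.
mangled_value = t._Tally__count           # 1
```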
Some Things I Missed Out
• Better String formatting. All you need is:
>>> "Position 0: {0}; Position 1: {1}; Position 0 again: {0}".format('a', 1, 7)
'Position 0: a; Position 1: 1; Position 0 again: a'
>>> r"C:\BlahBlah\output_p{param}_s{num_samples}".format(param=2, num_samples=10000)
'C:\\BlahBlah\\output_p2_s10000'
>>> "% Affected (Q={param:.3}): {outcome:.1%}".format(param=1.234567, outcome=0.23454)
'% Affected (Q=1.23): 23.5%'
• Inheritance, Polymorphism
– Standard Object Oriented Programming
• Handling “unplanned events” with exceptions
– “It is easier to ask for forgiveness than permission.”
• Testing
– (This is not a software engineering course.)
– For more info: doctest, unittest
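As a taste of the exception-handling style quoted above (often called EAFP), this sketch simply tries each conversion and handles the failure, instead of checking every string beforehand. `parse_ints` is a hypothetical helper; runs on Python 2.7 and 3.

```python
def parse_ints(tokens):
    """Convert each token to int; collect the tokens that fail."""
    numbers, rejected = [], []
    for token in tokens:
        try:
            numbers.append(int(token))   # just try the conversion ...
        except ValueError:
            rejected.append(token)       # ... and deal with the failure if it happens
    return numbers, rejected

numbers, rejected = parse_ints(["3", "x", "42", "4.5"])
# numbers is [3, 42]; rejected is ['x', '4.5'] because int("4.5") raises ValueError
```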
☺