Introduction to Python
LING 5200
Computational Corpus Linguistics
Nianwen Xue
What's a programming language?

A way of converting a text file into instructions for the machine
LING 5200, 2006
BASED on Kevin Cohen’s LING 5200
What's a programming language?

lexicon

syntax
Vs. natural languages: no ambiguity
What does a program do?

Take in data (input)

Do something with it (processing)

Produce output
What does a program do?



Input
egrep '^[0-9]+\/' epw.cd

your regex

one or more files

switches
Processing: for each line in each file, determine whether or not it matches your regex
Output: tell you about it
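The three stages can be sketched in a few lines of Python. This is only an illustration, not how egrep is implemented: the function name matching_lines and the sample lines are assumptions made for the example, and it uses Python 3 syntax (print is a function), whereas the slides use Python 2.

```python
import re

def matching_lines(pattern, lines):
    """Return the lines that contain a match for the regular expression."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical sample data standing in for the lines of a file.
sample_lines = ["12/some entry", "abc/other entry", "7/last entry"]

# Input: a regex and some lines. Processing: test each line. Output: print the matches.
matches = matching_lines(r"^[0-9]+/", sample_lines)
for line in matches:
    print(line)
```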
Producing output in Python
print "hello, world"
print: verb
"hello, world": noun (object)
Producing output

Filename: helloWorld.py

What do the file's permissions need to be?
Producing output
babel>./helloWorld.py
./helloWorld.py: line 1: print: command not found
Producing output
#!/usr/local/bin/python
print "hello, world"
“The magic line”: the first line, starting with #!, tells the shell which interpreter to run the file with
Producing output
babel>./helloWorld.py
hello, world
babel>
Producing output
#!/usr/local/bin/python
print "hello, world\n"
"escape" character: the backslash in \n (print already ends the line, so this \n adds a blank line)
Producing output

\t tab

\n "newline"
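A short sketch of the two escape sequences in use. The sample text is made up for illustration, and the code uses Python 3 syntax (print is a function), whereas the slides use Python 2.

```python
# \t inserts a tab and \n starts a new line inside a string.
table = "name\tphone\nKinder\t555"
print(table)

# Each escape sequence counts as a single character.
tab_length = len("\t")
newline_length = len("\n")
print(tab_length, newline_length)

# A raw string (r"...") keeps the backslash as an ordinary character.
raw_length = len(r"\n")
print(raw_length)
```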
Comments
#!/usr/local/bin/python
# the purpose of this program
# is to print "hello, world" to
# the screen.
# author:[email protected]
# 303-735-5383
# do the actual printing
print "hello, world\n"
Slide callouts: “Not comments” and "Commenting" your code
Comments

#1 use for comments: adding notes to yourself/other programmers
# the purpose of this program
# is to print "hello, world" to
# the screen.
# author: [email protected]
# 303-735-5383
Comments

Other use: causing Python to ignore a line
"Commenting out" a line of code
print "goodbye, cruel world\n"
# print "hello, world\n"
Comments

Own-line or end-of-line formats
# print it
print "hello, world\n"
print "hello, world\n" # print it
Comments



Start comments with #: the rest of the line is ignored.
Can include a “documentation string” as the first line of any new function or class that you define.
The development environment, debugger, and other tools use it: it’s good style to include one.
def my_function(x, y):
    """This is the docstring. This
    function does blah blah blah."""
    # The code would go here...
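As a small sketch of how tools get at a docstring: Python stores it on the function object as the __doc__ attribute. The function body here is an assumption made for illustration, and the code uses Python 3 syntax, whereas the slides use Python 2.

```python
def my_function(x, y):
    """Return the sum of x and y."""
    # The body is an assumption made for illustration only.
    return x + y

# The docstring is stored on the function object itself.
print(my_function.__doc__)
```

The built-in help(my_function) displays the same text in an interactive session.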
Whitespace

Whitespace is meaningful in Python, especially indentation and the placement of newlines.

Use a newline to end a line of code.
(Not a semicolon like in C++ or Java.)
(Use \ when a statement must continue on the next line.)
No braces { } to mark blocks of code in Python…
Use consistent indentation instead. The first line with less indentation is outside of the block.
A colon appears at the end of the line that starts a new block.
(We’ll see this later for function and class definitions.)
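A minimal sketch of these rules, using a for loop as the block (loops are not covered on this slide, so treat the loop itself as an assumption of the example). It uses Python 3 syntax, whereas the slides use Python 2.

```python
total = 0
for number in [1, 2, 3]:
    # These two indented lines form the body of the loop.
    doubled = number * 2
    total = total + doubled
# This line is back at the outer indentation, so it runs once, after the loop.
print(total)

# A backslash continues a statement on the next line.
long_sum = 1 + 2 + \
    3 + 4
print(long_sum)
```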
Getting input


From the user

input = raw_input('Your name please:\n')

print input
From a file

input_file = open('phone-numbers.txt', 'r')

Will learn what to do with a file later
Producing output

I'd like to print something different every once in a while…
#!/usr/local/bin/python
# my first python program
print "hello world\n"
name = raw_input("Your name please:")
print name
Variables

Name

Contents

Location in memory
Variables

Name (name)

Contents (Kinder)

Location in memory (13025)
name = "Kinder"
Good and bad names

1stnumber = 32

Print = 32

Large-number = 123456789

Dir:subdir = "/home/corpora"
Naming Rules

Names are case sensitive and cannot start with a
number. They can contain letters, numbers, and
underscores.
bob   Bob   _bob   _2_bob_   bob_2   BoB
There are some reserved words:
and, assert, break, class, continue, def,
del, elif, else, except, exec, finally,
for, from, global, if, import, in, is,
lambda, not, or, pass, print, raise,
return, try, while
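One way to check a candidate name against these rules is sketched below. The helper is_usable_name and the list of candidates are assumptions made for the example. The code uses Python 3, where the reserved-word list differs slightly from the one above (for instance, print and exec are no longer reserved).

```python
import keyword

candidates = ["bob", "_2_bob_", "1stnumber", "Large-number", "Dir:subdir", "lambda"]

def is_usable_name(name):
    """True if name follows the naming rules and is not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)

results = {name: is_usable_name(name) for name in candidates}
for name, ok in results.items():
    print(name, ok)
```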
Accessing Non-existent Name

If you try to access a name before it’s been properly
created (by placing it on the left side of an
assignment), you’ll get an error.
>>> y
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in -toplevel-
    y
NameError: name 'y' is not defined
>>> y = 3
>>> y
3
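The same behavior can be observed inside a script by catching the error. This is a sketch in Python 3 syntax (the slides use Python 2); the name undefined_name is deliberately never assigned.

```python
try:
    print(undefined_name)  # this name was never assigned
except NameError as error:
    message = str(error)
    print("NameError:", message)

y = 3  # the assignment creates the name
print(y)
```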
Names and References 1



Python has no pointers like C or C++. Instead, it has “names” and “references”. (Works a lot like Lisp or Java.)
You create a name the first time it appears on the left side of an assignment statement:
x = 3
Names store “references”, which are like pointers to locations in memory that store a constant or some object.

Python determines the type of the referenced object automatically, based on what data is assigned to the name.
It also decides when to delete the object via garbage collection, once no name refers to it any longer.
Names and References 2

There is a lot going on when we type:
x = 3



First, an integer 3 is created and stored in memory.
A name x is created.
A reference to the memory location storing the 3 is then assigned to the name x.
name list: Name: x, Ref: <address1>
memory: Type: Integer, Data: 3
Names and References 3


The data 3 we created is of type integer. In
Python, the basic data types integer, float, and
string are “immutable.”
This doesn’t mean we can’t change the value of
x… For example, we could increment x.
>>> x = 3
>>> x = x + 1
>>> print x
4
Names and References 4

If we increment x, then what’s really happening is:

The reference of name x is looked up.
The value at that reference is retrieved.
The 3+1 calculation occurs, producing a new data element 4 which is assigned to a fresh memory location with a new reference.
The name x is changed to point to this new reference.
The old data 3 is garbage collected if no name still refers to it.

Before: Name: x, Ref: <address1> -> Type: Integer, Data: 3
After: Name: x, Ref: <address2> -> Type: Integer, Data: 4
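The rebinding can be observed with the built-in id(), which returns the identity of the object a name refers to (in CPython this happens to be its memory address). A minimal sketch in Python 3 syntax, whereas the slides use Python 2; the variable names address_before and address_after are chosen for the example.

```python
x = 3
address_before = id(x)   # identity of the object that x refers to

x = x + 1                # builds a new integer object and rebinds x to it
address_after = id(x)

print(x)
print(address_before != address_after)
```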
BASED on Kevin Cohen’s LING 5200
Assignment 1

So, for simple built-in datatypes (integers, floats, strings), assignment behaves as you would expect:
>>> x = 3      # Creates 3, name x refers to 3
>>> y = x      # Creates name y, refers to 3.
>>> y = 4      # Creates ref for 4. Changes y.
>>> print x    # No effect on x, still ref 3.
3
Name: x, Ref: <address1> -> Type: Integer, Data: 3
Name: y, Ref: <address2> -> Type: Integer, Data: 4
Assignment 2

But we’ll see that for other, more complex data types assignment seems to work differently.

We’re talking about: lists, dictionaries, user-defined classes.

We will learn details about all of these types later.
The important thing is that they are “mutable.”
This means we can make changes to their data without having to copy it into a new memory reference address each time.
immutable          mutable
>>> x = 3          x = some mutable object
>>> y = x          y = x
>>> y = 4          make a change to y
>>> print x        look at x
3                  x will be changed as well
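The two columns can be made concrete with an integer and a list. This is a sketch in Python 3 syntax (the slides use Python 2); the list is just one example of a mutable object.

```python
# Immutable case: rebinding y leaves x alone.
x = 3
y = x
y = 4
print(x)

# Mutable case: a and b are two names for one list.
a = [1, 2, 3]
b = a
b.append(4)      # changes the shared list in place
print(a)
print(a is b)

# An explicit copy gives an independent list.
c = list(a)
c.append(5)
print(a)
```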
Assignment 3
Assume we have a name x that refers to a mutable object of some
user-defined class. This class has a “set” and a “get” function for some
value.
>>> x.getSomeValue()
4
We now create a new name y and set y=x.
>>> y = x
This creates a new name y which points to the same memory reference
as the name x. Now, if we make some change to y, then x will be
affected as well.
>>> y.setSomeValue(3)
>>> y.getSomeValue()
3
>>> x.getSomeValue()
3
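A runnable version of this scenario might look as follows. The class name Holder and its implementation are hypothetical; only the method names getSomeValue and setSomeValue come from the slide. The code uses Python 3 syntax, whereas the slides use Python 2.

```python
class Holder:
    """Hypothetical class with a "set" and a "get" method for one value."""

    def __init__(self, value):
        self.value = value

    def getSomeValue(self):
        return self.value

    def setSomeValue(self, value):
        self.value = value

x = Holder(4)
y = x                      # y is a second name for the same object
y.setSomeValue(3)
print(y.getSomeValue())
print(x.getSomeValue())
```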
Assignment 4


Because mutable data types can be changed in place without producing a new reference every time there is a modification, changes made through one name for a reference will seem to affect all the other names for that same reference. This leads to the behavior on the previous slide.
Passing Parameters to Functions:

When passing parameters, immutable data types appear to be “call by value” while mutable data types appear to be “call by reference.”
In both cases Python passes a reference to the object; the difference is only whether the object can be changed in place.
(Mutable data can be changed inside a function to which it is passed as a parameter. Immutable data seems unaffected when passed to functions.)
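The difference shows up in a small experiment. The function names add_one and append_one are made up for this sketch, which uses Python 3 syntax (the slides use Python 2).

```python
def add_one(number):
    number = number + 1     # rebinds the local name only
    return number

def append_one(items):
    items.append(1)         # modifies the caller's list in place

count = 3
add_one(count)
print(count)

values = []
append_one(values)
print(values)
```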
Multiple Assignment

You can also assign to multiple names at the same
time.
>>> x, y = 2, 3
>>> x
2
>>> y
3
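One common use of multiple assignment is swapping two values without a temporary variable. A brief sketch in Python 3 syntax (the slides use Python 2):

```python
x, y = 2, 3
x, y = y, x      # the right side is evaluated first, so this swaps the values
print(x, y)
```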
Basic Datatypes

Integers (default for numbers)
z = 5 / 2    # Answer is 2, integer division.
Floats
x = 3.456
Strings
'the movie "gladiator"'
Can use "" or '' to specify. "abc" 'abc' (Same thing.)
Unmatched ones can occur within the string. "matt's"
Use triple double-quotes for multi-line strings or strings that contain both ' and " inside of them: """a'b"c"""
Python and Types
Python determines the data types in a program automatically. “Dynamic Typing”
But Python’s not casual about types, it enforces them after it figures them out. “Strong Typing”
So, for example, you can’t just append an integer to a string. You must first convert the integer to a string itself.
x = "the answer is "  # Decides x is string.
y = 23                # Decides y is integer.
print x + y           # Python will complain about this.
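The complaint is a TypeError, and str() performs the conversion. A minimal sketch in Python 3 syntax (the slides use Python 2); the variable result is introduced for the example.

```python
x = "the answer is "
y = 23

try:
    result = x + y          # str + int is not allowed
except TypeError:
    result = x + str(y)     # convert the integer to a string first

print(result)
```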
Numerical Operations
Numerical operations



Integer and float additions
  4 + 6
  4 + 6.0
Will the results be the same?
  5 / 3
  5 / 3.0
Python first converts the operands up to the type of the most complicated operand, and then performs the math on the same-type operands.
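A sketch of these cases in Python 3. Note one difference from the Python 2 used in the slides: in Python 2, 5 / 3 on two integers gives 1, while in Python 3 the / operator always produces a float and // is the integer (floor) division operator, so the example uses // for the integer case.

```python
int_sum = 4 + 6          # both operands are integers
mixed_sum = 4 + 6.0      # the integer is converted to a float first
print(int_sum, type(int_sum).__name__)
print(mixed_sum, type(mixed_sum).__name__)

floor_quotient = 5 // 3  # integer (floor) division
true_quotient = 5 / 3.0  # float division
print(floor_quotient, true_quotient)
```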
Operator precedence

Numerical operator precedence


*, /, //, % (evaluated first), then +, -
Use parentheses to override precedence

(3 + 5) * 6
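A quick check of how precedence changes a result, written in Python 3 syntax (the slides use Python 2). The variable names are chosen for the example.

```python
without_parentheses = 3 + 5 * 6     # multiplication happens first
with_parentheses = (3 + 5) * 6      # parentheses force the addition first
remainder_first = 10 - 7 % 4        # % binds tighter than -

print(without_parentheses, with_parentheses, remainder_first)
```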
String Operations
Using +
name = "John Doe"
print "hello, " + name
+ means concatenation when applied to strings
Using *
name = "John Doe"
print "hello, " * 3 + name
* means repetition when applied to a string
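Both string operators side by side, as a sketch in Python 3 syntax (the slides use Python 2). Because * binds tighter than +, the repetition happens before the concatenation.

```python
name = "John Doe"
greeting = "hello, " + name          # concatenation
repeated = "hello, " * 3 + name      # repetition first, then concatenation

print(greeting)
print(repeated)
```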
Index and slice

str = "counterterrorism"

str[4]

str[-4]

str[0:7]

str[7:13]

str[13:16]
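The expressions above evaluate as follows. This sketch uses Python 3 syntax (the slides use Python 2) and the variable name word instead of str, an assumption made so that the built-in str type is not shadowed.

```python
word = "counterterrorism"

fifth_character = word[4]        # indexing starts at 0
fourth_from_end = word[-4]       # negative indices count from the end
prefix = word[0:7]               # a slice stops just before the end position
middle = word[7:13]
suffix = word[13:16]

print(fifth_character, fourth_from_end, prefix, middle, suffix)
```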
Index and slice
position:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
character:  c  o  u  n  t  e  r  t  e  r  r  o  r  i  s  m
negative:                                         -3 -2 -1
Position 16 is the end position for slices, one past the last character.
len and replace

len(str)

str.replace('ism', 'ist')

str.replace('ism', '')

What is the value of str?

str1 = str.replace('ism', '')

replace('ism', 'ist'), what happens?

str.len(str), what happens?
replace is a method of the “str” object, len is a built-in function
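The questions above can be explored with a short experiment. This sketch uses Python 3 syntax (the slides use Python 2) and the variable name word instead of str, an assumption made so that the built-in str type is not shadowed.

```python
word = "counterterrorism"

length = len(word)                       # len is a built-in function
replaced = word.replace("ism", "ist")    # replace is a string method
stripped = word.replace("ism", "")

# Strings are immutable: replace returns a new string and leaves word alone.
print(word, replaced, stripped, length)

# Strings have no len method, so word.len(word) would raise an AttributeError.
has_len_method = hasattr(word, "len")
print(has_len_method)
```

Calling replace('ism', 'ist') on its own, without a string in front of the dot, raises a NameError because there is no built-in function of that name.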
String Operations

We can use some methods built in to the string data type to perform some formatting operations on strings:
>>> "hello".upper()
'HELLO'

There are many other handy string operations available. Check the Python documentation for more.
String Formatting Operator: %

The operator % allows us to build a string out of many data items in a “fill in the blanks” fashion.

It also allows us to control how the final string output will appear.
For example, we could force a number to display with a specific number of digits after the decimal point.
It is very similar to the sprintf function in C.
Formatting Strings with %
>>> x = "abc"
>>> y = 34
>>> "%s xyz %d" % (x, y)
'abc xyz 34'

The tuple following the % operator is used to fill in the blanks in the original string marked with %s or %d.

Check the Python documentation for whether to use %s, %d, or some other formatting code inside the string.
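A few formatting codes in action, including one that fixes the number of digits after the decimal point. This is a sketch in Python 3 syntax (the slides use Python 2); the % operator itself works the same way in both, and the variable names are chosen for the example.

```python
x = "abc"
y = 34

filled = "%s xyz %d" % (x, y)        # %s for a string, %d for an integer
two_decimals = "%.2f" % 3.14159      # two digits after the decimal point
padded = "%5d" % 42                  # right-aligned in a field of width 5

print(filled)
print(two_decimals)
print(padded)
```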
Printing with Python


You can print a string to the screen using “print.”
Using the % string operator in combination with the print command, we can format our output text.
>>> print "%s xyz %d" % ("abc", 34)
abc xyz 34
“print” automatically adds a newline to the end of the string. If you give it several strings separated by commas, it prints them with a space between them.
>>> print "abc"
abc
>>> print "abc", "def"
abc def
Getting more practice: Python for Linguists

http://verbs.colorado.edu/~xuen/teaching/ling5200/PythonForLinguists/Python1.pdf
More Python resources

http://docs.python.org/tut/tut.html