Introduction to Python
LING 5200 Computational Corpus Linguistics
Nianwen Xue
LING 5200, 2006. Based on Kevin Cohen's LING 5200.

What's a programming language?
A way of converting a text file to instructions for the machine.
It has a lexicon and a syntax.
Vs. natural languages: no ambiguity.

What does a program do?
Take in data (input)
Do something with it (processing)
Produce output

What does a program do? An example:
    egrep '^[0-9]+\/' epw.cd
Input: your regex, one or more files, switches
Processing: for each line in each file, determine whether or not it matches your regex
Output: tell you about it

Producing output in Python
    print "hello, world"
print is the verb; "hello, world" is the noun (object).

Producing output
Filename: helloWorld.py
What do the file's permissions need to be?
    babel>./helloWorld.py
    ./helloWorld.py: line 1: print: command not found
Without an interpreter line, the shell tries to run the file itself. Add "the magic line":
    #!/usr/local/bin/python
    print "hello, world"
Now the program runs:
    babel>./helloWorld.py
    hello, world
    babel>

Escape characters
    #!/usr/local/bin/python
    print "hello, world\n"
The backslash is the "escape" character:
    \t    tab
    \n    "newline"
print already ends its output with a newline, so the explicit \n here adds a blank line after the text.

Comments
Callout on the slide: "Not comments"
    #!/usr/local/bin/python
    #
    # the purpose of this program is to print "hello, world" to the screen.
    # author:[email protected]
    # 303-735-5383

    # do the actual printing
    print "hello, world\n"
Callout on the slide: "Commenting" your code

The #1 use for comments: adding notes to yourself/other programmers.
    #
    # the purpose of this program is to print "hello, world" to the screen.
    # author: [email protected]
    # 303-735-5383

Other use: causing Python to ignore a line ("commenting out" a line of code).
    print "goodbye, cruel world\n"
    # print "hello, world\n"

Own-line or end-of-line formats:
    # print it
    print "hello, world\n"

    print "hello, world\n"    # print it

Start comments with #; the rest of the line is ignored.
You can include a "documentation string" as the first line of any new function or class that you define.
The development environment, debugger, and other tools use it, so it's good style to include one.
    def my_function(x, y):
        """This is the docstring. This function
        does blah blah blah."""
        # The code would go here...

Whitespace
Whitespace is meaningful in Python, especially indentation and placement of newlines.
Use a newline to end a line of code. (Not a semicolon like in C++ or Java.)
(Use \ when a statement must continue on the next line.)
No braces { } to mark blocks of code in Python. Use consistent indentation instead.
The first line with less indentation is outside of the block.
Often a colon appears at the start of a new block. (We'll see this later for function and class definitions.)
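The whitespace rules above can be tried out with a small sketch. It is not from the slides: the function name describe and the variable total are hypothetical, and the code is written for Python 3, so it uses print(...) with parentheses, whereas the slides use the Python 2 print statement.

```python
# Hypothetical helper, not from the slides.
def describe(n):
    """Return a short label saying whether the integer n is even or odd."""
    if n % 2 == 0:          # the colon opens a new block
        label = "even"      # more indentation: inside the if block
    else:
        label = "odd"       # inside the else block
    # Less indentation: outside the if/else, but still inside the function.
    return "%d is %s" % (n, label)

# A backslash continues one statement onto the next line.
total = 1 + 2 + \
        3 + 4

print(describe(total))
```

Changing the indentation of the return line (for example, moving it under the else) changes which block it belongs to, which is the point the slide makes about consistent indentation.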

Getting input
From the user:
    input = raw_input('Your name please:\n')
    print input
From a file:
    input_file = open('phone-numbers.txt', "r")
We will learn what to do with a file later.

Producing output
I'd like to print something different every once in a while...
    #!/usr/local/bin/python
    # my first python program
    print "hello world\n"
    name = raw_input("Your name please:")
    print name

Variables
A variable has:
Name
Contents
Location in memory

For example:
Name (name)
Contents (Kinder)
Location in memory (13025)

Good and bad names
    1stnumber = 32
    Print = 32
    Large-number = 123456789
    Dir:subdir = "/home/corpora"

Naming Rules
Names are case sensitive and cannot start with a number. They can contain letters, numbers, and underscores.
    bob  Bob  _bob  _2_bob_  bob_2  BoB
There are some reserved words:
    and, assert, break, class, continue, def, del, elif, else, except, exec, finally, for, from, global, if, import, in, is, lambda, not, or, pass, print, raise, return, try, while

Accessing Non-existent Name
If you try to access a name before it's been properly created (by placing it on the left side of an assignment), you'll get an error.
    >>> y
    Traceback (most recent call last):
      File "<pyshell#16>", line 1, in -toplevel-
        y
    NameError: name 'y' is not defined
    >>> y = 3
    >>> y
    3

Names and References 1
Python has no pointers like C or C++. Instead, it has "names" and "references". (Works a lot like Lisp or Java.)
You create a name the first time it appears on the left side of an assignment statement:
    x = 3
Names store "references", which are like pointers to locations in memory that store a constant or some object.
Python determines the type of the reference automatically based on what data is assigned to it.
It also decides when to delete it via garbage collection after any names for the reference have passed out of scope.

Names and References 2
There is a lot going on when we type:
    x = 3
First, an integer 3 is created and stored in memory.
A name x is created.
A reference to the memory location storing the 3 is then assigned to the name x.
    name list                            memory
    Name: x  Ref: <address1>   ---->     Type: Integer  Data: 3

Names and References 3
The data 3 we created is of type integer.
In Python, the basic data types integer, float, and string are "immutable."
This doesn't mean we can't change the value of x. For example, we could increment x.
    >>> x = 3
    >>> x = x + 1
    >>> print x
    4

Names and References 4
If we increment x, then what's really happening is:
The reference of name x is looked up.
The value at that reference is retrieved.
The 3+1 calculation occurs, producing a new data element 4, which is assigned to a fresh memory location with a new reference.
The name x is changed to point to this new reference.
The old data 3 is garbage collected if no name still refers to it.
Diagram, before the increment:
    Name: x  Ref: <address1>   ---->     Type: Integer  Data: 3
Diagram, after the increment:
    Name: x  Ref: <address2>   ---->     Type: Integer  Data: 4

Assignment 1
So, for simple built-in datatypes (integers, floats, strings), assignment behaves as you would expect:
    >>> x = 3        # Creates 3, name x refers to 3.
    >>> y = x        # Creates name y, refers to 3.
    >>> y = 4        # Creates ref for 4. Changes y.
    >>> print x      # No effect on x, still ref 3.
    3
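The Assignment 1 behavior can be checked directly with a minimal sketch. It assumes Python 3 (print with parentheses) and uses the is operator, which the slides do not introduce, to test whether two names refer to the same object; the names same_before and same_after are hypothetical.

```python
x = 3
y = x                  # y now refers to the same integer object as x
same_before = y is x   # True: two names, one object

y = 4                  # rebinds y to a different object; x is untouched
same_after = y is x    # False: the names now refer to different objects

print(x)
print(y)
print(same_before, same_after)
```

Because integers are immutable, the only way to "change" y is to rebind it, and that never reaches back to x.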
Diagram for Assignment 1, after y = 4:
    Name: x  Ref: <address1>   ---->     Type: Integer  Data: 3
    Name: y  Ref: <address2>   ---->     Type: Integer  Data: 4

Assignment 2
But we'll see that for other, more complex data types assignment seems to work differently.
We're talking about lists, dictionaries, and user-defined classes.
We will learn details about all of these types later.
The important thing is that they are "mutable."
This means we can make changes to their data without having to copy it into a new memory reference address each time.
    immutable            mutable
    >>> x = 3            x = some mutable object
    >>> y = x            y = x
    >>> y = 4            make a change to y
    >>> print x          look at x
    3                    x will be changed as well

Assignment 3
Assume we have a name x that refers to a mutable object of some user-defined class. This class has a "set" and a "get" function for some value.
    >>> x.getSomeValue()
    4
We now create a new name y and set y = x.
    >>> y = x
This creates a new name y which points to the same memory reference as the name x.
Now, if we make some change to y, then x will be affected as well.
    >>> y.setSomeValue(3)
    >>> y.getSomeValue()
    3
    >>> x.getSomeValue()
    3

Assignment 4
Because mutable data types can be changed in place without producing a new reference every time there is a modification, changes made through one name for a reference will seem to affect all the other names for that same reference. This leads to the behavior on the previous slide.
Passing Parameters to Functions:
When passing parameters, immutable data types appear to be "call by value" while mutable data types are "call by reference."
(Mutable data can be changed inside a function to which they are passed as a parameter. Immutable data seems unaffected when passed to functions.)

Multiple Assignment
You can also assign to multiple names at the same time.
    >>> x, y = 2, 3
    >>> x
    2
    >>> y
    3

Basic Datatypes
Integers (default for numbers)
    z = 5 / 2    # Answer is 2, integer division.
Floats
    x = 3.456
Strings
    'the movie "gladiator"'
Can use "" or '' to specify.
    "abc"  'abc'  (Same thing.)
Unmatched ones can occur within the string.
    "matt's"
Use triple double-quotes for multi-line strings or strings that contain both ' and " inside of them:
    """a'b"c"""

Python and Types
Python determines the data types in a program automatically. ("Dynamic Typing")
But Python's not casual about types; it enforces them after it figures them out. ("Strong Typing")
So, for example, you can't just append an integer to a string. You must first convert the integer to a string itself.
    x = "the answer is "    # Decides x is string.
    y = 23                  # Decides y is integer.
    print x + y             # Python will complain about this.

Numerical Operations
Integer and float additions
    4 + 6
    4 + 6.0
Will the results be the same?
    5 / 3
    5 / 3.0
Python will first convert the operands up to the type of the most complicated operand, and then perform the math on the same-type operands.

Operator precedence
Numerical operator precedence: *, /, //, % are applied before +, -
Use parentheses to break precedence:
    (3 + 5) * 6

String Operations

Using +
    name = "John Doe"
    print "hello, " + name
+ means concatenation when applied to a string.

Using *
    name = "John Doe"
    print "hello, " * 3 + name
* means repetition when applied to a string.

Index and slice
    str = "counterterrorism"
    str[4]
    str[-4]
    str[0:7]
    str[7:13]
    str[13:16]
Slice positions fall between characters, counting from 0 on the left; negative indices count back from the right end.
    position:    0   1   2   3   4   5   ...   15   16
    character:     c   o   u   n   t   ...        m
    negative:    i = -3,  s = -2,  m = -1

len and replace
    len(str)
    str.replace('ism', 'ist')
    str.replace('ism', '')
What is the value of str?
    str1 = str.replace('ism', '')
    replace('ism', 'ist'), what happens?
    str.len(str), what happens?
replace is a method of the "str" object; len is a built-in function.

String Operations
We can use some methods built in to the string data type to perform some formatting operations on strings:
    >>> "hello".upper()
    'HELLO'
There are many other handy string operations available. Check the Python documentation for more.

String Formatting Operator: %
The operator % allows us to build a string out of many data items in a "fill in the blanks" fashion.
It also allows us to control how the final string output will appear. For example, we could force a number to display with a specific number of digits after the decimal point.
It is very similar to the sprintf command of C.

Formatting Strings with %
    >>> x = "abc"
    >>> y = 34
    >>> "%s xyz %d" % (x, y)
    'abc xyz 34'
The tuple following the % operator is used to fill in the blanks in the original string marked with %s or %d.
Check the Python documentation for whether to use %s, %d, or some other formatting code inside the string.

Printing with Python
You can print a string to the screen using "print."
Using the % string operator in combination with the print command, we can format our output text.
    >>> print "%s xyz %d" % ("abc", 34)
    abc xyz 34
"Print" automatically adds a newline to the end of the string. If you include a list of strings, it will concatenate them with a space between them.
    >>> print "abc"
    abc
    >>> print "abc", "def"
    abc def

Getting more practice: Python for Linguists
http://verbs.colorado.edu/~xuen/teaching/ling5200/PythonForLinguists/Python1.pdf

More Python resources
http://docs.python.org/tut/tut.html
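
As a way to practice the string operations covered in these slides (indexing, slicing, len, replace, and % formatting), here is a minimal sketch. It assumes Python 3, so print takes parentheses, and it uses the hypothetical names word, changed, and line; the slides name the variable str, which would shadow the built-in string type.

```python
word = "counterterrorism"   # the slides call this variable str

print(word[4])              # one character, counting from 0
print(word[-4])             # counting back from the right end
print(word[0:7])            # slice: positions 0 up to (not including) 7
print(word[7:13])
print(word[13:16])

# replace returns a new string; the original is left unchanged
changed = word.replace("ism", "ist")
print(changed)
print(word)

# len is a built-in function, not a string method
print(len(word))

# % fills the blanks in the format string from the tuple on the right
line = "%s has %d letters" % (word, len(word))
print(line)
```

Printing word after the call to replace answers the slide's question "What is the value of str?": strings are immutable, so the original value is still there.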