ICL LAB ZERO
Introduction to Python
James Curran

Contact Info

Ben Hachey
job: Contract Research Staff, Institute for Communicating and Collaborative Systems
office: B02 BP6 (Basement, Room 02, 6 Buccleuch Place)
office phone number: 0131 650 4656
email: [email protected]

James Curran
job: 3rd Year PhD student, Institute for Communicating and Collaborative Systems
office: 3R14 BP2 (3rd Floor Right, Room 14, 2 Buccleuch Place)
office phone number: 0131 650 4431
web: http://www.cogsci.ed.ac.uk/~jamesc/
email: [email protected]

It is best to contact the tutors by email first if you have questions.

What is Python?

Python is a scripting language developed by Guido van Rossum in the early 1990s, initially at CWI in the Netherlands and later at CNRI (Corporation for National Research Initiatives). The language is named after the BBC show “Monty Python’s Flying Circus”. You will find there are frequent (and gratuitous) references to Monty Python skits in the Python documentation (but thankfully not in these notes).

The Python Language Reference Manual sums up the language features nicely:

    Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as for use as a scripting or glue language to connect existing components together. Python’s simple, easy to learn syntax emphasises readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.

I will not explain what all this means here, but hopefully by the end of the course you will understand many of the important properties of Python that are alluded to in this description.

Why use Python?

Python makes the development of short to medium length programs easy. Python syntax is very simple (in fact many people call it executable pseudo-code), which helps make the programs easy to read, debug and maintain. Variables don’t require declaration in Python, which makes the code shorter and often clearer. Since the scripts are interpreted, they don’t take time to be compiled (which for a large system can be very time consuming) and the code itself can be manipulated as the program runs.

Further, in Python experimentation is made easy by the fact that code can be typed directly into the interpreter and run as it is entered. All types of values in Python can be printed, which makes debugging and tracing easier. Thus Python is great for the newbie, the curious and the experimenter.

Python scripts can be run without change on any system which has a Python interpreter installed, which makes the scripts fully portable. The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python web site, http://www.python.org, and can be freely distributed.

Python’s free documentation has long been considered to be excellent (particularly for a free programming language), and there is a large, active and very helpful community of Python programmers on the web. There are also many Python tutorials available on the web.

Python has built-in support for many standard data structures for which compiled languages typically lack built-in support. This means more convenient syntax can be used for common operations. Python also has a comprehensive standard library supporting text and HTML/XML processing, network access and operating system services. Members of the Python user community often distribute their own work in the form of Python modules that collect common functionality together in one place.
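
To make the point about built-in data structures and the standard library a little more concrete, here is a small sketch. The variable names and values are made up for illustration and do not come from the lab's example files. It is written so that it should run unchanged under Python 2.2 and under later versions of Python, which is why print is given its argument in parentheses.

```python
import string  # a standard library module of string constants and helpers

# A list (an ordered sequence) and a dictionary (a lookup table), both
# written directly in the program text. The values are illustrative only.
words = ['spam', 'eggs', 'spam']
counts = {'spam': 2, 'eggs': 1}

words.append('ham')        # lists grow as needed
counts['ham'] = 1          # add a new entry to the dictionary

print(len(words))          # number of items now in the list
print(counts['spam'])      # look up a value by its key
print(string.digits)       # a constant provided by the string module
```

Neither the list nor the dictionary needed a declaration or a size before it was used.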

The Python website contains pointers to many free third party Python modules, programs and tools, and additional documentation. Examples of these modules include Graphical User Interface (GUI) components, matrix/vector support and the Natural Language Toolkit (NLTK), which we will be using for this course.

Starting Python

One of the nicest things about Python (which is reminiscent of the glory days of BASIC on the Commodore 64, VIC 20 or BBC Micro) is the ability to type a program straight into the Python interpreter and have it run as you go. This is one of the reasons that Python is so great to learn. If you want to try something, just run the Python interpreter by typing python and then pressing Enter at the command (or shell) prompt in the terminal window.

Example 1

These notes will use my DICE shell prompt, which consists of the machine I am running on (tarski) and my user id (s0090160). Your shell prompt will have a different machine name in square brackets followed by s, and then your matriculation number.

    [tarski]s0090160: python2.2
    Python 2.2 (#1, Aug 23 2002, 15:36:47)
    [GCC 2.96 20000731 (Red Hat Linux 7.1 2.96-85)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print 'Hello World'
    Hello World
    >>>

In examples where you have to type things into the shell or the interpreter directly, what you are required to type is in bold teletype font, and the shell’s or interpreter’s responses are in normal teletype.

The Python interpreter prints its version and copyright information when it is first started for interactive use. The >>> is the prompt, which means python is waiting for a statement to be entered. Typing in print 'Hello World' and pressing Enter causes the interpreter to read and execute the statement. The result is the Hello World printed on the next line. After executing the statement, python returns to waiting for another statement, and hence there is another >>> prompt.
And before you know it, you have written your first Python program, the ubiquitous ‘Hello World’.

In order to exit from the Python interpreter, enter the key sequence Ctrl-D (i.e., while holding the control key down, press the key d).

Most of the time I will provide each example in a separate file, because it saves typing them out every time you want to run them and I need to type them out to check them anyway. These files are named after the example number in the notes. However, for this week we will do everything interactively so you get the idea of typing things directly into Python.

NB: On many Unix systems, Python version 2.2 (which we will require to run NLTK properly) is not the default version of Python, so typing python rather than python2.2 (or on some systems python2) may give you the wrong version. For instance, on the DICE machines and almost all current versions of Linux, python will give you version 1.5.2, which will not run NLTK.

Python as a Calculator

To further stress the value of typing code directly into the Python interpreter for experimentation purposes, you should have a go at using Python as a calculator. I fire up the Python interpreter all the time, since it is more convenient than having to run the standard graphical calculator program:

    >>> 123*34 + 3
    4185
    >>> 1834.34/34.5 - 4
    49.169275362318835
    >>> 1/2
    0
    >>> 1/2.0
    0.5

Notice that Python prints the result of each command or statement on the following line and then waits for more statements. The second thing to notice is that multiplication uses * rather than ×. The final, and perhaps most mysterious, thing to notice is the difference between the results of 1/2 and 1/2.0. We will get back to why this is the case in the next lab session.

Hello, Who are you?

The previous ‘Hello World’ program is a bit too simple (and impersonal). Programs that do not accept information from the user (or the outside world) can only do so many interesting things.
As a first step, we will ask the user their name and greet them personally. Also, this program is quite a bit bigger, so it is worth typing it into a file and making it run like an independent program.

Example 2

This example asks the user for their name and then greets them personally. It shows two different ways of achieving this in Python. The first step is to open your favourite text editor (probably Emacs), type in the program below, and then save it as example1.py

    #!/usr/bin/python2.2
    # tute0/example1.py

    name = raw_input('Enter your name? ')   # prompt and read user's name
    print 'Hello ' + name                    # string concatenation
    print 'Hello %s' % name                  # format string substitution

    import sys
    print 'Enter your name?'                 # prompt for user's name
    name = sys.stdin.readline()              # read user's name
    print 'Hello %s' % name
    print 'Hello "%s"' % name                # double quotes within single quotes
    name = name[:-1]                         # remove the newline character
    print "Hello '%s'" % name                # single quotes within double quotes

Running this example now gives the following output (remember to answer the questions, the bits in bold, otherwise it will just sit there):

    [tarski]s0090160: python2.2 example1.py
    Enter your name? James
    Hello James
    Hello James
    Enter your name?
    James
    Hello James

    Hello "James
    "
    Hello 'James'

This script can also be run without having to invoke the Python interpreter:

    [tarski]s0090160: ./example1.py

This example shows one way of making a Python script appear stand-alone (that is, you don’t have to type python before the name of the program). To make this happen, there are two steps to the process.

First, make the first line of the script tell Unix how to run the script by specifying which program to run to interpret the rest of the file. #!, pronounced hash-bang, tells the operating system (only in Linux and Unix that is) that the file is a script that must be interpreted by the following program.
The /usr/bin/python2.2 is the full path (location in the directory hierarchy) of the interpreter, in our case the Python 2.2 interpreter.

Second, mark the file as executable using the chmod(1) (change mode) command [1]. The command

    chmod +rx example1.py

marks the file example1.py as readable and executable for all users on the system. Doing this is called changing the file permissions to executable and needs to be done to the file only once.

[1] The bracketed number following a Unix command is customary. It refers to the section of the manual pages that describes the command. To see this, try typing man chmod at the shell prompt: the manual entry for chmod comes from section 1 of the manual pages (the section is shown in the top left and right corners of the manual page).

This program shows two different ways of reading information from the user. The simplest and most direct approach is to call the raw_input function, which needs to be given a string which it prompts the user with. The next chapter fully describes creating and using functions, but for now, a function is a piece of code with a name that does a particular task. The raw_input function waits for the user to type in a string and press Enter.

Once we have retrieved the name from the user (which is returned to us in a string), we need to store it somewhere. This ‘somewhere’ is called a variable. A variable is like a mailbox with a name: you can store things in the mailbox and look at the contents of the mailbox. Internally the variable is just a piece of memory with a number, but Python makes it easier for us to remember the piece of memory by giving it a name rather than a number. Another way of thinking about variables is that they are like pronumerals in algebra: they are designed to hold and give a name to changeable bits of information.

Variables are created in Python by assignment, which is the process of setting the value of a variable (or putting something in the mailbox). Assignment is signified by the = operator.
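
As a minimal sketch of assignment, the lines below can be typed into the interpreter or saved in a file. The variable names are illustrative only and are not part of the lab's example files. The parentheses around the argument to print are optional in the Python 2.2 used in these notes and are there only so that the sketch also runs under later versions of Python.

```python
# Illustrative names only; none of these come from the lab's example files.
name = 'James'              # create the variable name and store a string in it
greeting = 'Hello ' + name  # the right-hand side is calculated first, then stored

count = 1                   # store the number 1 in count
count = count + 1           # take the current value (1), add 1, store the result back

print(greeting)
print(count)
```

Running this prints Hello James and then 2. The line count = count + 1 would make no sense as a statement in mathematics, but as an instruction to update the contents of the mailbox it is perfectly reasonable.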

Be careful not to confuse assignment with equality in mathematics. To avoid confusion, always think about assignment as taking the value calculated on the right of the equals sign and placing it into the variable on the left.

The next two lines of the program print out the message with the contents of the variable, first using string concatenation (the + operator) and then using a format string. A format string is like a template for making new strings by substituting values into the template. The %s in the format string 'Hello %s' is replaced by a string value (the s in %s stands for string) which must be placed after the % that follows the format string. More will be said about this below when we describe strings in detail.

The second chunk of Python shows a more general way of performing the same task. It involves reading the name from the user using file operations.

Python in Action

The following example gives you a taste of what you can do very simply with Python. This program will extract all of the text from the main Informatics web page:

    #!/usr/bin/python2.2

    import urllib
    import re

    URL = 'http://www.informatics.ed.ac.uk'
    TAGS = re.compile('<[^>]+>')     # matches an HTML tag such as <p> or </a>
    WS = re.compile('\w+')           # matches a run of word characters

    url = urllib.urlopen(URL)        # open the web page
    html = url.read()                # read the whole page into a string
    text = TAGS.sub('', html)        # delete every tag, leaving the text
    words = WS.findall(text)         # list of all the words in the text
    for word in words:
        print word

Type this program in, save it as example2.py and then make it executable. Here are some things you can try:

  - make the URL selectable by the user
  - convert the words to lowercase
  - count the number of times each word appears
  - extract the URLs from the page rather than the words

Python Resources

http://www.python.org/ - main Python portal
    Python binaries (i.e. the interpreter), tutorials, reference manuals, and interesting Python modules. I recommend reading the tutorial, but it isn’t really designed for total beginners.

http://greenteapress.com/thinkpython.html - How to think like a Computer Scientist
    Free online book that teaches programming (Python version).
    Hard copy of this is available from the ITO.

http://www.onlamp.com/python/ - O’Reilly Python portal
    Contains parts of the Learning Python and Programming Python reference books online, and source code, discussions etc.

http://www.cogsci.ed.ac.uk/~jamesc/icl/ccss2002.pdf - James Curran’s CCSS notes

http://www.cogsci.ed.ac.uk/~jamesc/icl/ccss2002.tgz - James Curran’s CCSS examples