Introduction to Python
Building a Web Crawler in Python
CSE, HKUST
Feb 13

Why Python?
• Easy to learn, yet powerful
• Emphasizes readability
• Great as both a scripting/glue language and for full-blown application development
• "Scales with the ability of the programmer"
• …
Reference: http://www.python.org/doc/essays/blurb/

Introduction to Python
• For Mac OS and most Linux and Unix users:
  • Python is already installed
• For Windows users:
  • Follow the instructions here: http://www.howtogeek.com/197947/how-to-install-python-on-windows/
  • Download Python here: https://www.python.org/downloads/windows/

Running Python
There are many ways Python can be used:
• Interactively
  • Run the python program with no arguments to get an interactive prompt:
    • Type "python" at your command line, then press ENTER
    • Write your code (e.g., print "hello world")
    • Type "exit()" to exit
  • Useful for succinct tests, debugging, and demonstrations
• Non-interactively
  • Write a script (a text file) and run it at the command line with python MYSCRIPT.py, optionally adding arguments and other options.
  • Example:
    • Write a script and name it helloworld.py
    • Run it at the command line (from the directory where you put helloworld.py) using the command "python helloworld.py"
• Using an IDE (Integrated Development Environment)
  • A hybrid of the above: save work in script files, but maintain an interactive session for running and debugging them
  • Python comes with its own IDE, named IDLE.
  • Other IDEs: https://wiki.python.org/moin/IntegratedDevelopmentEnvironments
    • PyDev with Eclipse
    • PyCharm
    • Wing IDE
    • Komodo IDE
    • Vim
    • Sublime Text
    • …

Using Python Interactively
• First, some explanation of terminology and notation:
• Text written in the Python language (or any language) is generically referred to as source code.
• >>> means the interpreter is waiting for input (more code).
• So does ..., specifically as a continuation of the previous code.
• IDLE's interactive interpreter just indents instead of using the ... notation.
• Indentation, even TABs vs. SPACEs, matters! (More on that later.)

Comments in Python
Python code can be sprinkled with comments that are ignored by the interpreter.
Single-line comments in Python:
• begin with the # character,
• can be on lines by themselves or follow code on the same line,
• take effect until the end of the line.
Multi-line comments in Python:
• begin with ''' and end with ''' (strictly, this is a triple-quoted string that is not assigned to anything, so the interpreter evaluates it and discards it).

Numbers
• The interpreter prints whatever your code evaluates to.
• Since Python has all sorts of numeric types, it can be used as a simple calculator.
• The operators +, -, * and / work as in most other languages.
• Parentheses can be used for grouping.
>>> 2+2
4
>>> (50-5*6)/4
5
>>> 7/3 # integer division returns the floor:
2
>>> 2**3 # exponentiation
8

Numbers
Python has a float type for floating point numbers:
>>> 3 * 3.75 / 1.5
7.5
>>> 42 * 1.234e-2
0.51828
Operators with mixed-type operands convert the integer operand to floating point:
>>> 7.0 / 2
3.5
>>> 7 / 2.0
3.5

Numbers
• Python also has a long type for arbitrary-length integers; conversion is usually automatic when necessary:
>>> 2**70
1180591620717411303424L
• Python also supports octal and hex bases, complex numbers, etc.
• Python.org documentation on numeric types is at: http://docs.python.org/library/stdtypes.html
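
The calculator behaviour described in the Numbers slides can also be sketched as a small script. This is only an illustration under a few assumptions: the variable names are made up for this example, and the code uses the // operator and the print_function import so that it should give the same results on the Python 2 interpreter shown in the slides and on Python 3.

```python
from __future__ import print_function  # makes print a function on Python 2; no effect on Python 3

# Floor division: // rounds down to the nearest whole number.
floor_quotient = 7 // 3            # 2

# Mixed operands: the integer is converted to floating point first.
mixed_quotient = 7 / 2.0           # 3.5

# Exponentiation.
power = 2 ** 3                     # 8

# Integers grow to arbitrary length automatically.
big = 2 ** 70

# Hexadecimal literal and complex arithmetic.
hex_value = 0xff                   # 255
complex_product = (1 + 2j) * (3 - 1j)

print(floor_quotient, mixed_quotient, power)
print(big)
print(hex_value, complex_product)
```

Saved as a script (for example under the hypothetical name numbers_demo.py), it can be run non-interactively with "python numbers_demo.py".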
Variable Assignment
• The equal sign (=) is used to assign a value to a variable (it is often more appropriate to read this as "gets" rather than "equals"):
>>> width = 20
>>> height = 5*9
>>> width * height
900
• Python developers often like to use assignment shorthand:
>>> x = y = 0
>>> x
0
>>> y
0
>>> x, y = 4, 2
>>> x
4
>>> y
2

Variables and Types
• Note that we never declared the type of the variable. Python uses dynamic typing and dynamic binding:
>>> x = 1
>>> type(x) #this returns the type of x rather than its value
<type 'int'>
>>> x = 1.2
>>> type(x)
<type 'float'>
• Variables aren't defined until you give them a value:
>>> n
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'n' is not defined

Strings
• Python strings hold text data:
• Strings can be enclosed in single or double quotes (unlike in bash and Perl, there is no significant difference).
• The \ escape character can be used to embed quotes within strings.
• The print statement nicely prints strings to the screen.
>>> print 'spam eggs'
spam eggs
>>> print 'doesn\'t'
doesn't
>>> print "doesn't"
doesn't
>>> print '"Yes," he said.'
"Yes," he said.
>>> print "\"Yes,\" he said."
"Yes," he said.
>>> print '"Isn\'t," she said.'
"Isn't," she said.
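
The assignment, typing, and quoting rules above can be collected into one short script. This is a minimal sketch rather than part of the original slides: the names value, undefined_name, and message are hypothetical, and print is used as a function (via the __future__ import) so the script should run on both Python 2 and Python 3.

```python
from __future__ import print_function  # print as a function on Python 2 as well

# Chained assignment and tuple assignment.
x = y = 0
x, y = 4, 2

# Dynamic typing: the same name can be rebound to values of different types.
value = 1
first_type = type(value).__name__      # 'int'
value = 1.2
second_type = type(value).__name__     # 'float'

# Using a name before giving it a value raises NameError.
try:
    undefined_name
except NameError as error:
    message = str(error)

# Single and double quotes produce the same string.
same_text = 'doesn\'t' == "doesn't"

print(x, y, first_type, second_type)
print(message)
print(same_text)
```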
Strings
The \ is also used to write the newline (\n), tab (\t), and other special characters:
>>> print 'line one\nline two\nline three'
line one
line two
line three
>>> print 'topic\n\tsub1\n\tsub2' #(actual tabs would work here, too)
topic
        sub1
        sub2
Prefix a string literal with r (for raw) so that no characters are treated as special (this will come in very handy later with regular expressions):
>>> print r'topic\n\tsub1\n\tsub2'
topic\n\tsub1\n\tsub2

String Operations
The + operator works on strings, too (this is called concatenation), as does *:
>>> word = 'Help' + 'A'
>>> word
'HelpA'
>>> '<' + word*5 + '>'
'<HelpAHelpAHelpAHelpAHelpA>'
Note that, unlike some other languages, there is no separate type for a single character; all text is a string:
>>> type('c')
<type 'str'>

String Operations
Strings can be subscripted (indexed) and sliced. Python indexing starts at 0.
>>> word[4]
'A'
>>> word[0:2]
'He'
>>> word[:2] # The first index defaults to zero
'He'
>>> word[2:4]
'lp'
>>> word[2:] # The last index defaults to the end of the string
'lpA'
Negative indices count backwards from the end:
>>> word[-1] # The last character
'A'
>>> word[:-2] # Everything except the last two characters
'Hel'

String Methods
Strings are objects with methods. We'll define this later, but here are some examples:
>>> print 'a' + ' foo '.strip() + 'z'
afooz
>>> first, last = 'George Washington'.split()
>>> first
'George'
>>> last
'Washington'
>>> 'abcdefghijklmnopqrstuvwxyz'.find('m')
12
See these for everything you want to know about Python strings:
• http://docs.python.org/library/string.html
• http://docs.python.org/library/stdtypes.html#string-methods

Control Structures – if/else
Of course, a bare sequence of value manipulations is rarely useful; we need control structures to build logic into a program. Perhaps the most well-known control structure is the if statement. Here's Python's:
>>> x = 5
>>> if x < 0:
...     print 'negative'
... elif x == 0:
...     print 'zero'
... else:
...     print 'positive'
...
positive
This introduces several concepts, including comparison operators (<, >, == (equal to; don't confuse it with assignment), <=, >=, and != (not equal to)) and Boolean values:
>>> x < 0
False
>>> x > 0
True
and indentation.

Indentation
Unlike in most other programming languages, indentation matters in Python: it is Python's way of grouping statements (instead of the curly braces used in, for example, C).
• The body of a control structure must be uniformly indented (IDLE, and many other syntax-aware editors, indent automatically).
• When a compound statement is entered interactively, it must be followed by a blank line to indicate completion (since the parser cannot guess when you have typed the last line).
• TABs vs. SPACEs matters!
>>> if True:
...         print 'x' #leading space is a TAB
...     print 'y' #leading space is four SPACEs
  File "<stdin>", line 3
    print 'y' #leading space is four SPACEs
                                          ^
IndentationError: unindent does not match any outer indentation level

Control Structures – while
Here's another control structure, this time a loop:
>>> # Fibonacci series:
... # the sum of two elements defines the next
... a, b = 0, 1
>>> while b < 10:
...     print b
...     a, b = b, a+b
...
1
1
2
3
5
8
It re-executes its body until its condition is False (if b were not updated, the result would be an infinite loop).

Sequence - Lists
In addition to numbers and strings, Python has several compound data types, used to group together other values.
The most versatile is the list, written as comma-separated values (items) between square brackets (list items need not all have the same type):
>>> a = ['spam', 'eggs', 100, 1234]
>>> a
['spam', 'eggs', 100, 1234]
Lists are a type of sequence object; strings are sequences, too, and like strings, lists can be indexed, sliced, concatenated, etc.:
>>> a[0]
'spam'
>>> a[1:3]
['eggs', 100]
>>> a[:2] + ['bacon', 2*2]
['spam', 'eggs', 'bacon', 4]

Sequence - Lists
Lists are objects with methods, too (more on that later):
>>> a.append(9.87)
>>> a
['spam', 'eggs', 100, 1234, 9.87]
>>> a.pop()
9.87
>>> a
['spam', 'eggs', 100, 1234]

Sequence - Tuples
It is possible to change individual elements of a list:
>>> a
['spam', 'eggs', 100, 1234]
>>> a[2] = a[2] + 99
>>> a
['spam', 'eggs', 199, 1234]
We say that lists are mutable (strings are not mutable; you must make new strings).
Python has a close cousin to the list called the tuple. Tuples are like lists, but the syntax uses parentheses instead of square brackets, and they are not mutable:
>>> b = ('spam', 'eggs', 100, 1234)
>>> b[2] = b[2] + 99
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

Sequence - Tuples
Lists, tuples, and strings, like all sequences, have a length that can be determined using a built-in function:
>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34

Control Structures - misc
There are several other keywords used to control the flow of a program:
• for: loops over anything iterable
• break: breaks out of the smallest enclosing for or while loop
• continue: continues with the next iteration of the loop (i.e. ends the current iteration prematurely)
• pass: does nothing (used as a placeholder)

Other Materials for Python Beginners
Useful introductory material for Python beginners:
• Harvard Python Workshop: https://software.rc.fas.harvard.edu/training/workshop_intro_python/latest/index.psp#(01)
• The Python Tutorial (official): https://docs.python.org/3/tutorial/
• Python for Beginners (official): https://www.python.org/about/gettingstarted/
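
As a closing illustration, the for, break, continue, and pass keywords from the "Control Structures - misc" slide can be sketched together in one loop over a list. The list contents and the name collected are assumptions chosen for this example, and print is imported as a function so the script should behave the same on Python 2 and Python 3.

```python
from __future__ import print_function  # print as a function on Python 2 as well

items = ['spam', 'eggs', 100, 1234, 'bacon']
collected = []

for item in items:                 # for loops over anything iterable
    if item == 'eggs':
        continue                   # skip the rest of this iteration
    if item == 1234:
        break                      # leave the loop entirely; 'bacon' is never reached
    if isinstance(item, str):
        pass                       # placeholder: nothing special is done for strings yet
    collected.append(item)

print(collected)
print(len(collected))
```

Here 'eggs' is skipped by continue and the loop stops at 1234, so only the items seen before the break and not skipped end up in collected.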