RUHR-UNIVERSITY BOCHUM, Chair for System Security
Operating System Security, Exercise No. 2, WS 09/10

Exercise 2: A Brief Introduction Into the Python Programming Language

Duration: 45min
Maximum Points: 10

Introduction

As the practical assignments require coding work in Python, this exercise aims to give a very short introduction to the necessary parts of Python. For a complete reference, see the Python language reference [3]. Wikipedia [1] describes Python's design philosophy as follows: it "emphasizes programmer productivity and code readability. Python's core syntax and semantics are minimalistic, while the standard library is large and comprehensive". As the syntax is minimalistic, it should be easy enough to learn, even if you have no prior programming experience. For a more complete introduction to the Python programming language, see the Python tutorial [2].

Contents

0.1 Python Basics
    0.1.1 Python's Integrated Documentation
    0.1.2 Syntax and Layout
    0.1.3 Basic Commands and Constructs
0.2 File And String Handling
    0.2.1 File handling
    0.2.2 Basic String Handling
0.3 Regular Expressions
    0.3.1 General Syntax of a Regular Expression
    0.3.2 Literals
    0.3.3 Quantifier
    0.3.4 Grouping
    0.3.5 Alternation
    0.3.6 Greedy and Non Greedy Matching
    0.3.7 Backreferences
    0.3.8 Regular Expressions in Python
0.4 Python Modules
0.5 How to Write and Execute Python Scripts
1 Homework (5 Points)
2 Practical Exercises (5 points)
3 Bonus Exercises (+3 Points)

0.1 Python Basics

0.1.1 Python's Integrated Documentation

In Python, every built-in and module function is (more or less) documented. You can add a documentation string to your own function by placing a string enclosed in three double quotes directly below the def line. You can access the string via function.__doc__ or, formatted, with the help(function) function. Example:

    #!/usr/bin/python

    import os

    help(os.fork)

gives:

    fork() -> pid

    Fork a child process.
    Return 0 to child process and PID of child to parent process.

Furthermore, you can browse the available methods of an object (or a module) with the dir() function. This is mostly useful in the Python shell. For example, all available methods of a string object can be listed with >>> dir("string") in the Python shell.

0.1.2 Syntax and Layout

Preamble: Every Python script should let the system know which interpreter should be used to execute the script. While on Windows (and DOS) this is traditionally done by the file extension, on UNIX systems it is done by the first line of the file. If the file is executable and its first line contains a "she-bang" (the combination #!) followed by a path to the interpreter, the operating system will start the interpreter with the script as a parameter. A typical Python preamble would be:

    #!/usr/bin/python

Statements: Python code usually has one statement per line.
A statement may be terminated by a semicolon, but a simple newline is sufficient and the semicolon is usually omitted.

Comments: In Python, everything on a line following a # is treated as a comment. Thus, the preamble from the last paragraph is treated as a comment by the interpreter, too. As described before, the operating system does not ignore it. Example:

    #!/usr/bin/python

    # this is a comment

    print "huhu" # this is a comment too, but not the print!

Blocks: Unlike most other languages, Python does not use curly brackets to mark where a block starts and ends. Blocks are defined entirely by indentation (the usual coding convention is 4 spaces per level), so indentation is very important in Python. Consider the following example:

    #!/usr/bin/python

    parameter1 = "oink, oink"

    def func(parameter1, parameter2 = "test"):
        print parameter2
        print parameter1

    func("operating system security rocks!")

It produces:

    test
    operating system security rocks!

While this:

    #!/usr/bin/python

    parameter1 = "oink, oink"

    def func(parameter1, parameter2 = "test"):
        print parameter2
    print parameter1 # note that this statement is no longer indented

    func("operating system security rocks!")

gives:

    oink, oink
    test

In the second script, print parameter1 is no longer part of the function. It runs once at the top level, where it prints the global variable, before func is called.

0.1.3 Basic Commands and Constructs

print: The built-in print statement prints to the standard output (stdout), followed by a newline. It was already used in the previous section: the comment example there prints "huhu" to the console.
formatted print: Together with the string formatting operator %, print offers functionality similar to printf. The format string is followed by a % character and a tuple of parameters. An example should illustrate this:

    #!/usr/bin/python

    temperature = 10
    conditions = "fine"
    print "Weather today: it is %d degrees celsius, the conditions are %s" % (temperature, conditions)

import: The import statement is used to include other "modules", comparable to libraries. After importing a module, its functions are accessed with the <modulename>. prefix. Important library functions for the assignments are described in section 0.4. Here is an example for import:

    #!/usr/bin/python

    import sys # this imports the "sys" module

    sys.stdout.write("huhu\n") # does the same as the print statement above

Variables and Data Types: Though Python is a strongly typed language (you cannot apply operations to incompatible types), it uses dynamic typing. Variables need not be declared; they are created by assignment and take the type of the assigned value. Assignments are done with the = operator. Example:

    #!/usr/bin/python

    var = "have you mooed today?" # this is a string object
    var2 = 42 # integer object
    var3 = Exception() # new object of type "Exception"
    var2 = var # var2 is now a string object, too

Operators on Variables: In Python, every data type is a class, thus every variable (and even every literal value) represents an object. The operators are defined by special methods of that object. For example, var = 1 + 2 is the same as var = (1).__add__(2); the parentheses are needed because 1.__add__ would be parsed as the start of a floating-point number. As you can see, the meaning of an operator can differ depending on the type of the object you use it with.
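
The correspondence between operators and special methods can be tried out directly. The following is a minimal sketch, not part of the original exercise: the class Vec2 and all variable names are hypothetical and exist only for illustration. It calls print with a single parenthesised argument so that it should run on both Python 2 and Python 3.

```python
class Vec2(object):
    """Hypothetical 2D vector, used only to illustrate __add__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for "self + other"; returns a new Vec2.
        return Vec2(self.x + other.x, self.y + other.y)


int_sum = 1 + 2
int_sum_explicit = (1).__add__(2)   # same operation, written as a method call
str_sum = "foo" + "bar"             # for strings, + concatenates
list_sum = [1, 2] + [3]             # for lists, + concatenates as well
vec_sum = Vec2(1, 2) + Vec2(3, 4)   # for Vec2, + calls Vec2.__add__

print(str(int_sum) + " " + str(int_sum_explicit))
print(str_sum)
print(str(list_sum))
print(str(vec_sum.x) + " " + str(vec_sum.y))
```

Running it prints 3 3, foobar, [1, 2, 3] and 4 6, one per line.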
For standard number objects, + works as expected; for strings (and lists, tuples, etc.) the + operator does concatenation. You can redefine the meaning of an operator ("operator overloading") by creating a new class with its own __add__ method. This could be useful, for example, if you want to implement a matrix class in which + is matrix addition, * is matrix multiplication and / is multiplication with the inverse matrix.

The types of the objects to which an operation is applied must be compatible: it is not obvious what should happen if you add a string to an integer. However, sometimes you want to append the string representation of an integer to a string. This isn't possible directly (as the object types aren't compatible). The non-string object must be converted to its string representation first. This can be done with the constructor of the string class, str(), which implicitly calls the __str__ method of the object. Consider this example:

    #!/usr/bin/python

    var1 = "have you mooed today?\nyes, "
    var2 = "times"
    var3 = 5

    print var1 + str(var3) + var2
    # is equivalent to
    # print var1 + var3.__str__() + var2

    print var1 + (7).__str__() + var2
    # is equivalent to
    # print var1 + str(7) + var2

Standard Classes:

Integer: An integer number.

Float: A floating-point number.

String: A sequence of characters. It can be constructed implicitly by enclosing the characters in single or double quotes, or explicitly as an object of the str class. A single character of the string st can be accessed with st[5], a substring with st[5:8], and the complete string of course with st.

Tuple: A tuple is an ordered collection of objects (components), bound to a single name.
The collection is enclosed in parentheses and its components are separated by commas. The components can be accessed with brackets; the first element has index 0. Example:

    #!/usr/bin/python

    i = 5
    st = "the lazy dog jumps over the ..."
    pi = 3.1415927

    t = (i, st, pi)

    print t[2] # prints out pi

Once assigned, the components of a tuple can no longer be changed. This is a major restriction, but it allows tuples to be handled more efficiently than lists.

List: A list is an ordered collection of objects (components) bound to a single name. It is enclosed in brackets and its components are separated by commas. Components can be accessed like tuple components and, unlike tuple components, they can be reassigned. Example:

    #!/usr/bin/python

    i = 5
    st = "the lazy dog jumps over the ..."
    pi = 3.1415927
    roundpi = 3

    l = [i, st, pi]

    print l[2]
    l[2] = roundpi
    print l[2]

Dictionary: A dictionary is a set of key-value pairs. It maps each key to its value. It is written in the format key: value, ... and is enclosed in curly braces. Components are accessed by their key. Example:

    #!/usr/bin/python

    dictionary = {"pi": 3.1415927, "st": "the lazy dog jumps over the ...", "i": 5}

    print dictionary["pi"]

Defining and Calling Functions and Methods: Functions and methods are defined by two parts: a signature and a function block. The signature consists of the def keyword, followed by the function name, a parameter list (with optional default values) enclosed in parentheses, and a colon. It is followed by the function block, which is an indented set of statements. A function is called by its name, followed by the arguments, separated by commas and enclosed in parentheses. Even if no arguments are given, the parentheses are still required!
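
The rules for defining and calling functions can be sketched as follows. This is only an illustration under assumptions: the function names greet and motto are made up, the documentation strings follow the description in 0.1.1, and the return statement used here is the one covered under "Returning Values from Functions". The sketch is written so that it should run on both Python 2 and Python 3.

```python
def greet(name, greeting="Hello"):
    """Build a greeting; the second parameter has a default value."""
    return greeting + ", " + name + "!"


def motto():
    """A function without parameters."""
    return "Have you mooed today?"


first = greet("Alice")                 # greeting falls back to "Hello"
second = greet("Bob", "Good morning")  # the default value is overridden
third = motto()                        # the parentheses are required for a call
not_called = motto                     # without them you get the function object

print(first)
print(second)
print(third)
print(greet.__doc__)
```

The last line prints the documentation string of greet, which is the same text that help(greet) would show in formatted form.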
Returning Values from Functions: A function returns an object. This is done with the return statement. After the return statement, the program flow leaves the function and continues after the calling statement. Example:

    #!/usr/bin/python

    def func():
        print "i am a function!"
        return "Have you mooed today?"

    x = func()
    print x

gives

    i am a function!
    Have you mooed today?

Conditional Code: Sometimes code should only be executed under certain conditions. The most important construct for this is the if statement, optionally followed by one or more elif (shorthand for else if) branches and/or an else branch. The exact syntax is best shown in an example:

    #!/usr/bin/python

    x = 1

    if x == 1:
        print "first block:"
        print "x = 1"
    elif x > 1:
        print "second block:"
        print "x > 1"
    else:
        print "third block:"
        print "x < 1"

Loops: Python has two different types of loops:

for-in loop: The for-in loop iterates over the elements of an object. Important iterable objects are strings, lists, tuples and dictionaries. An example of a for-in loop is:

    #!/usr/bin/python

    x = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0) # tuple with 10 numbers

    for i in x: # iterates over every component of the tuple
        print "looping..."
        print i

while loop: The Python while loop is a classical while loop. It works as follows:

    #!/usr/bin/python

    x = 100

    while x > 0:
        print x
        x -= 1 # equal to x = x - 1

range([start], stop, [step]), xrange([start], stop, [step]): The range and xrange functions are built-in functions that generate iterable objects containing ranges of integers. Calling them is straightforward.
start is the first number in the range (0 if omitted), stop is the first number no longer in the range (cannot be omitted), and step defines the interval between the numbers (1 if omitted, must be an integer). The difference between range and xrange is that range returns a list, while xrange returns an xrange object that produces its entries on demand. Thus, if you call range(10000000000), the list takes quite some time to generate and uses a large amount of memory, while xrange(10000000000) returns immediately and generates the entries only as they are needed. Despite this difference, both behave the same in for loops. Example:

    #!/usr/bin/python

    x = range(0, 4, 2)
    for i in x:
        print i

gives

    0
    2

0.2 File And String Handling

0.2.1 File handling

open(filename, [mode, [buffering]]) opens the file filename and returns a file object. mode determines the file mode, which may be 'r' (read only), 'w' (write only) or 'a' (append). If mode is omitted, 'r' is assumed. For further file modes, refer to the documentation of the open() function in Python's library reference [4]. The optional buffering argument can be 0, which means unbuffered, 1, which means line buffered, or a larger number that specifies the size of the buffer in bytes. The file object can be iterated, returning one line per iteration. Example:

    #!/usr/bin/python

    F = open("/etc/motd.tail")
    for i in F:
        print i

gives

    The programs included with the Ubuntu system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
    applicable law.

    To access official Ubuntu documentation, please visit:
    http://help.ubuntu.com/

on the author's system.

F.close() (where F is a file object) closes the file referenced by the file object.

F.read([n]) reads at most n bytes from the file and returns them as a string. If n is negative or omitted, it reads until the end of the file (EOF) is reached.

F.readline([n]) reads the next line from the file (or the rest of the file if called on the last line) and returns it as a string. The trailing newline is retained. The optional parameter n limits the number of bytes returned (an incomplete line may be returned then). It returns an empty string at EOF.

F.readlines([n]) calls readline() repeatedly and returns a list of the lines read. The optional size argument n, if given, is an approximate bound on the total number of bytes in the lines returned. Example:

    #!/usr/bin/python

    F = file("/etc/motd.tail").readlines()
    print F.__class__ # this prints the class this is an object of
    for i in F:
        print i

gives

    <type 'list'>

    The programs included with the Ubuntu system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
    applicable law.

    To access official Ubuntu documentation, please visit:
    http://help.ubuntu.com/

F.write(S) writes a string S to the file F.
Note that due to buffering, flush() or close() may be needed before the file on disk reflects the data written.

dir(F) (where F is a file object created by file()) gives a complete list of the available methods. Please use the documentation (as explained above) to find out how to use them.

0.2.2 Basic String Handling

Since string handling is a requirement for nearly all programming tasks, Python string objects have basic built-in methods for string handling and manipulation. Remember that each string in Python is an object. Thus, access to these basic functions is quite easy:

S.replace(s, t) (where S is a string object) returns a copy of the string in which every occurrence of s is replaced by t. Example:

    #!/usr/bin/python
    S1 = "Operating system security is my least favorite lecture"
    S2 = S1.replace("least ", "")
    print S2

gives:

    Operating system security is my favorite lecture

S.split(d) splits the string into substrings at the delimiter d and returns them as a list. The delimiter is removed. If d is not given or None, runs of whitespace are used as the delimiter. Example:

    #!/usr/bin/python
    S = "Operating system security"
    l = S.split(" ")
    for i in l:
        print i

gives:

    Operating
    system
    security

S.find(s) returns the lowest index at which the substring s is found, or -1 if s does not occur. Example:

    #!/usr/bin/python
    S = "The operating system should provide operating system security"
    print S.find("system")

gives 14, the start of the first "system".

S.strip([chars]) returns a copy of the string S with leading and trailing whitespace removed. If the parameter chars is given and not None, the characters in chars are removed instead.

dir(S) (where S is a string object) gives a complete list of the available methods. Please use the documentation (as explained above) to find out how to use them.

0.3 Regular Expressions

One topic of the following exercises is "Logging and Auditing". Log files usually contain large amounts of plain text, of which you will most often need only parts. Extracting the relevant parts from a text is normally done with the help of "regular expressions" (regex). A regular expression is a generic pattern that describes the format of a text string. Classic Unix tools that support regular expressions are, for instance, grep and sed. The Perl scripting language is well known for its strong support of regular expressions and is a popular choice for text processing. To spare you learning another scripting language, we will use Python's implementation of regular expressions. Since regular expressions can get very complex very fast, this subsection only tries to give a good overview of the basics. For a more complete reference, see for example [7].

0.3.1 General Syntax of a Regular Expression

Regular expressions consist of literals and control characters for quantification, alternation or grouping. Literals are in general case sensitive (though most tools provide an option to change this behaviour).

0.3.2 Literals

Letters and digits are simple literals. This means that a matches an a and b matches a b. Most special characters are not simple literals: to match them literally, they must be escaped with the \ (backslash) character. A \ must itself be escaped by a \, thus the pattern for a single \ is \\. Example: c:\\Windows\\System matches c:\Windows\System.

Sets of literals can also be matched. This is done by enclosing multiple literals in square brackets. Example: [abcd]x matches ax, bx, cx or dx. You can also give ranges like [a-z] or [0-9].

Special Literals

.      matches any character (in most tools, any character except a newline)
^      matches the beginning of a line
$      matches the end of a line
[^a]   matches any character that is not a, where a can be replaced by any other character
\t     matches the TAB character
\n     matches a newline (be careful with newlines, as this can get confusing with multiline matches or other unusual options that are often implemented)

Examples:
• ^$ matches an empty line.
• ^.$ matches a line that contains exactly one character.
• ^foobar matches any line that starts with the word foobar.

0.3.3 Quantifier

Often it is useful to match more than one occurrence of a (set of) literal(s). In this case, you must specify how often the literal may or must appear. This is done with the following quantifiers:

?      matches zero or one occurrence
*      matches zero or more occurrences
+      matches one or more occurrences
{m,n}  matches at least m and at most n occurrences. You can omit either m or n: if you omit m, this means zero to n occurrences; if you omit n, this means at least m occurrences.

Examples:
• .* matches anything, including the empty string ("")
• .+ matches anything but the empty string
• ␣{,5} matches up to 5 spaces (␣ stands for a space character).

0.3.4 Grouping

Sometimes it is useful to group a regular expression into sub-expressions. This is done with parentheses. In most tools, you can access the text matched by these groups directly.

0.3.5 Alternation

In some cases, you need alternatives that cannot be expressed by a set of literals (for example, if you want to allow two different words). In this case, you need alternation. Alternation is done with the | character. It usually appears in combination with grouping. Example: this is a (test|task) matches this is a test as well as this is a task.

0.3.6 Greedy and Non Greedy Matching

By default, a regular expression matches greedily.
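
Greedy and non-greedy behaviour can be observed directly with Python's re module, which is introduced in 0.3.8. The following minimal sketch is an addition for illustration, with hypothetical variable names; it uses the sentence and the two patterns th.*is and th.*?is that this subsection discusses, and should run on both Python 2 and Python 3.

```python
import re

text = "this is a string that is some string"

# Greedy: .* takes as much as possible, so the match ends at the last "is".
greedy = re.search(r"th.*is", text).group(0)

# Non-greedy: .*? takes as little as possible, so the match ends at the first "is".
non_greedy = re.search(r"th.*?is", text).group(0)

print(greedy)
print(non_greedy)
```

The first print shows this is a string that is, the second only this.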
Greedy means that a quantifier matches as much text as possible: if you have the string "this is a string that is some string" and you match it with the regular expression th.*is, you will match "this is a string that is". If this behaviour is unwanted, it can be changed by appending a question mark to the quantifier. Thus, th.*?is is the non-greedy regular expression and will only match this from the string above. Note that the question mark is also a quantifier itself, so the two uses are easily mixed up.

0.3.7 Backreferences

Sometimes you need to match a string that contains the same pattern more than once. Let's say you want to match HTML tags. This is no problem if you know exactly which tag to match. You could, for example, match a DIV section with <div.*?>.+?</div>. If you want to match DIV and SPAN tags, you might want to use <(div|span).*?>.+?</(div|span)>. The problem is that this would not work with DIV and SPAN nested, because the closing group may match a different tag than the opening one. The solution is to use a backreference. If you use a ()-pair somewhere in your pattern, the text matched between the parentheses is saved. It can later be recalled with \1 ... \9. So a better regular expression for HTML tags would look like <(div|span).*?>.+?</\1>.

0.3.8 Regular Expressions in Python

If your task is analyzing large text files, you will notice that basic string functions are no longer sufficient and that more powerful tools like regular expressions are needed. In Python, access to regular expressions is provided by the re module.

re.compile(regex) compiles the regular expression regex into a regular expression object and returns the object. Example:

    #!/usr/bin/python
    import re

    regex = re.compile("^foobar.*baz$")

R.search(S) scans the string S for the first location where the regular expression object R matches. It returns an SRE_Match object (M) if a match is found, or None if not. Example:

    #!/usr/bin/python

    import re

    R = re.compile("[Tt]he ")

    f = file("/etc/motd.tail")
    for i in f:
        if R.search(i):
            print i.strip()

gives every line that contains "the " or "The " (only the case of the first character may vary):

    The programs included with the Ubuntu system are free software;
    the exact distribution terms for each program are described in the
    Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by

R.findall(S) searches the string S with the regular expression object R (like R.search()), but returns a list of all non-overlapping matches, or an empty list if there is none. Example:

    #!/usr/bin/python

    import re

    R = re.compile("[Tt]he ")

    f = file("/etc/motd.tail")
    for i in f:
        for x in R.findall(i):
            print x

gives every instance of either the or The:

    The
    the
    the
    the

M.groups() (where M is an SRE_Match object) is useful if you used grouping in your regular expression. It returns the strings matched by the groups of the regular expression as a tuple, in which the element with index zero contains the string matched by your first group, the element with index one the string matched by the second group, and so on. Example: Assume you have a file called /home/labor/contact.txt which, besides a lot of other text, contains a line with the string email contact always via trust+lab@rub.de and another line with jabber contact always via nonexistent@jabber.trust.rub.de. Then

    #!/usr/bin/python
    import re

    R = re.compile("(.+) contact always via ([a-zA-Z0-9+-]+@([a-zA-Z0-9+-]+\.)+[a-zA-Z]{2,5})")
    f = file("/home/labor/contact.txt")
    for i in f:
        M = R.search(i)
        if M:
            groups = M.groups()
            print "contact method: %s, address: %s" % (groups[0], groups[1])

gives you:

    contact method: email, address: trust+lab@rub.de
    contact method: jabber, address: nonexistent@jabber.trust.rub.de

More information on the re module in general, or the SRE_Match object in particular, can be found in [8].

0.4 Python Modules

This part introduces some additional functions that should enable you to complete your assignments. Additional information on these functions is given in the appendix. Complete documentation of the standard library modules is available in the Python library reference [4].

os: The os module provides the interface to the operating system. It provides wrappers for syscalls (like fork) and other advanced operating system functions (like directory walking).

os.system - system(param-string): The system call starts the default shell and lets it execute the parameter string. An example should make this clear:

    #!/usr/bin/python

    import os

    os.system("echo this is a test > file")
    os.system("cat file")
    os.system("rm file")

It does the following: first it creates a file named "file" and echoes "this is a test" into it, then it prints the contents of the file using the Unix cat command, and finally it removes the file with the Unix rm command.

0.5 How to Write and Execute Python Scripts

Python scripts can either be written in a text editor or tested directly in the Python shell.

Write scripts with a text editor: Open your favorite text editor (for example kate [5]) and write the script. Make sure not to forget the preamble (see 0.1.2). After saving the script, you must mark it executable by executing chmod +x <filename>.
Now you can execute it with ./<filename> in the directory in which you saved the script.

Test in the Python shell: Type python on the shell. It should give you something like:

    Python 2.5.2 (r252:60911, Oct  7 2008, 12:45:49)
    [GCC 4.3.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

You can type your statements into the console. Single statements are evaluated after pressing enter, blocks are evaluated after the block is finished. You can exit the shell with the exit() function. Example:

    immo@wok ~ $ python
    Python 2.5.2 (r252:60911, Oct  7 2008, 12:45:49)
    [GCC 4.3.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> for i in range(1,5): #range-function returns list [1,2,3,4]
    ...     print i
    ...
    1
    2
    3
    4
    >>> exit()
    immo@wok ~ $

1 Homework (5 Points)

You should answer all of these questions before starting the practical exercises. Use ␣ to disambiguate the indentation!

1. What's the difference between the Python data types "tuple" and "list"?

2. What's the difference between the Python data types "list" and "dictionary"?

3. Implement (on paper) a small Python script that prints out your name!

4. Implement (on paper) a small Python script that counts from 0 to 10. Use a for loop, conditional code and the xrange function! The output should be the following:

       0 is 0 mod 3
       1
       2
       3 is 0 mod 3
       4
       5
       6 is 0 mod 3
       7
       8
       9 is 0 mod 3
       10

5. Let r ∈_R Z be an integer variable containing a random number. Give two possible print statements to print "our random number is <<insert content of the random number here>>".

6. Give a regular expression that matches
   a) a date
   b) a valid e-mail address

2 Practical Exercises (5 points)

To get used to Python, the exercise is to write a program that can be used as a telephone book. Provided is a small telephone book in the following format:

    Name1 ; Address1 ; Number1 ; E-Mail1 ; Birthday1
    Name2 ; Address2 ; Number2 ; E-Mail2 ; Birthday2
    ...

1. Write a Python script that reads the file into a list of lists, in which each item is one field of the file. The data structure should look like the following:

       [[Name1, Address1, Number1, E-Mail1, Birthday1],
        [Name2, Address2, Number2, E-Mail2, Birthday2],
        ...]

   Print the list(s) to verify your solution! Hint: S.split(separator) splits the string S at every occurrence of separator (see 0.2.2).

2. Convert the lists to nested dictionaries, like this:

       {Name1: {name: Name1, address: Address1, email: E-Mail1, birthday: Birthday1},
        Name2: {name: Name2, address: Address2, email: E-Mail2, birthday: Birthday2},
        ...}

   Print the dictionary(s) to verify your solution!

3. Verify each birthday and e-mail address using regular expressions; warn if you find an invalid format!

3 Bonus Exercises (+3 Points)

1. Suppose you have an HTML file and want to extract all links from it. (A use case would be an HTTP downloader that gets web pages recursively.) At first sight this might seem to be quite an easy task: a little regex and everything's fine. The problem is that HTML code out in the wild is most often badly written. Take a look at the following code:

       ...
       <a href="http://www.test.de/" alt='link to www.test.de' class='link'>www.test.de</a>
       <a href='http://www.google.com' class='link' alt="link to google">google</a>
       <a href=http://www.rub.de>Ruhr-Universitaet Bochum</a>
       ...

   We have different kinds of quotation marks; in one <a> tag they're even left out. The attributes of the tag are in no specific order. (It might even be possible that we have some strange situations where there is another tag inside the <a> tag.) Write a little Python script that extracts the value of the href attribute and the text between the start and end tag of the hyperlink (the description)! The script should print out a list which gives description : link for each link.

References

[1] Python (programming language), http://en.wikipedia.org/wiki/Python_(programming_language)
[2] Python Tutorial, http://docs.python.org/tutorial/
[3] Python Language Reference, http://docs.python.org/reference/
[4] Python Standard Library, http://docs.python.org/library/
[5] The Kate Text Editor, http://kate-editor.org/
[6] The wikipedia malloc-page, http://en.wikipedia.org/wiki/Malloc
[7] perlre - Perl Regular Expressions, http://perldoc.perl.org/perlre.html
[8] Python Regular Expression Howto, http://www.amk.ca/python/howto/regex/