RUHR-UNIVERSITY BOCHUM
Operating System Security
Exercise No. 2, WS 09/10
Chair for
System Security
Exercise 2: A Brief Introduction Into the
Python Programming Language
Duration: 45min
Maximum Points: 10
Introduction
As the practical assignments require coding work in Python, this exercise aims to give
a very short introduction to the necessary parts of Python. For a complete reference,
see the Python language reference [3].
Wikipedia [1] describes Python's design philosophy as follows: It “emphasizes programmer
productivity and code readability. Python’s core syntax and semantics are minimalistic,
while the standard library is large and comprehensive”. As the syntax is minimalistic,
it should be easy enough to learn, even if you have no prior programming experience. For
a more complete introduction to the Python programming language, see the Python
tutorial [2].
Contents
0.1 Python Basics
    0.1.1 Python's Integrated Documentation
    0.1.2 Syntax and Layout
    0.1.3 Basic Commands and Constructs
0.2 File And String Handling
    0.2.1 File handling
    0.2.2 Basic String Handling
0.3 Regular Expressions
    0.3.1 General Syntax of a Regular Expression
    0.3.2 Literals
    0.3.3 Quantifier
    0.3.4 Grouping
    0.3.5 Alternation
    0.3.6 Greedy and Non Greedy Matching
    0.3.7 Backreferences
    0.3.8 Regular Expressions in Python
0.4 Python Modules
0.5 How to Write and Execute Python Scripts
1 Homework (5 Points)
2 Practical Exercises (5 points)
3 Bonus Exercises (+3 Points)
0.1 Python Basics
0.1.1 Python's Integrated Documentation
In Python, every built-in and module function is (more or less) documented. You can
add a documentation string to your own function by placing a string enclosed in three double
quotes directly below the def line. The string is accessible via function.__doc__ or,
formatted, with the help(function) function.
Example:

    #!/usr/bin/python

    import os

    help(os.fork)

gives:

    fork() -> pid

    Fork a child process.
    Return 0 to child process and PID of child to parent process.

Furthermore, you can browse the available attributes and methods of an object (or a module) with the
dir() function. This is most useful in the Python shell. For example, all available
methods of a string object can be listed by typing >>> dir("string") in the Python shell.
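The documentation mechanism described above can be tried with a small sketch. The function greet is a made-up example, and print is called as a function so that the snippet should also run under Python 3:

```python
def greet(name):
    """Return a short greeting for name."""
    return "Hello, " + name + "!"

# The docstring is stored on the function object itself.
doc = greet.__doc__
print(doc)

# dir() lists the attribute and method names of an object as strings.
methods = dir("string")
print("upper" in methods)
```

Calling help(greet) in the Python shell would show the same docstring in formatted form.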
0.1.2 Syntax and Layout
Preamble: Every Python script should let the system know which interpreter should be
used to execute the script. While on Windows (and DOS) this is traditionally done by
the file extension, on UNIX systems it is done by the first line of the file. If the file is
executable and its first line contains a “shebang” (the character combination #!) followed
by the path to the interpreter, the operating system will start the interpreter with the
script as a parameter.
A typical Python preamble would be:

    #!/usr/bin/python

Statements: Python code usually has one statement per line. A statement may be
terminated by a semicolon, but a simple newline is sufficient and the semicolon is usually
omitted.
Comments: In Python, everything in a line following a # is treated as a comment.
Thus, the preamble from the previous paragraph is interpreted as a comment by the interpreter,
too. As described before, however, it is not ignored by the operating system.
Example:

    #!/usr/bin/python

    # this is a comment

    print "huhu"  # this is a comment too, but not the print!

Blocks: Unlike most other languages, Python does not use curly brackets to mark the start
and end of a block. Blocks are defined entirely by indentation (the usual coding convention
is 4 spaces per level), so indentation is very important in Python. Consider the following example:

    #!/usr/bin/python

    parameter1 = "oink, oink"

    def func(parameter1, parameter2="test"):
        print parameter2
        print parameter1

    func("operating system security rocks!")

It produces:

    test
    operating system security rocks!

While this:

    #!/usr/bin/python

    parameter1 = "oink, oink"

    def func(parameter1, parameter2="test"):
        print parameter2

    print parameter1  # note that this statement is no longer indented

    func("operating system security rocks!")

gives:

    oink, oink
    test

In the second script, print parameter1 is no longer part of the function block. It is
executed at module level and prints the global variable, while the function itself only
prints parameter2.
0.1.3 Basic Commands and Constructs
print: The built-in print statement writes its argument to the standard
output (stdout), followed by a newline. It was already used in the previous section, where
the comment example prints “huhu” to the console.
formatted print: Combined with the string formatting operator %, print offers functionality
similar to printf. The format string is followed by a % character and a tuple of parameters.
An example should illustrate this:

    #!/usr/bin/python

    temperature = 10
    conditions = "fine"

    print "Weather today: it is %d degrees celsius, the conditions are %s" % (temperature, conditions)

import: The import statement is used to include other “modules”, comparable to libraries.
Functions of the module can be accessed with the prefix <modulename>. after importing the
module. Important library functions for the assignments are described in section 0.4. Here
is an example for import:

    #!/usr/bin/python

    import sys  # this imports the "sys" module

    sys.stdout.write("huhu\n")
    # does the same as the print statement above

Variables and Data Types: Although Python is a strongly typed language (you cannot
apply operations to incompatible types), it uses dynamic typing. Variables do not have to be
declared; they are created by assignment and take the type of the assigned value. Assignments
are done with the = operator. Example:

    #!/usr/bin/python

    var = "have you mooed today?"  # this is a string object
    var2 = 42                      # integer object
    var3 = Exception()             # new object of type "Exception"
    var2 = var                     # var2 is now a string object, too

Operators on Variables: In Python, every data type is a class, thus every variable (and
even every literal value) represents an object. The operators are defined by special methods
of that object, for example: var = 1 + 2 is the same as var = (1).__add__(2). The
parentheses are needed because 1.__add__ would be read as the start of a floating-point
literal. As you can see, the meaning of an operator can differ depending on the object type
you use it with. For standard number objects, this works as expected; for strings (and
lists, tuples, etc.) the + operator does concatenation. You can redefine the meaning of
an operator (“operator overloading”) by creating a new class with its own
__add__ method. This could be useful, for example, if you want to implement a matrix
class in which the + operator is the matrix addition, the * operator is the
matrix multiplication and the / operator is the multiplication with the inverse
matrix.
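As a minimal sketch of operator overloading, the following uses a hypothetical two-dimensional Vector class instead of a full matrix class. The class and its attribute names are assumptions made for illustration; print is called as a function so that the snippet should also run under Python 3:

```python
class Vector(object):
    """Hypothetical 2D vector, used only to illustrate operator overloading."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for "self + other"; returns a new Vector.
        return Vector(self.x + other.x, self.y + other.y)

    def __str__(self):
        # Called by str() and by print.
        return "Vector(%d, %d)" % (self.x, self.y)

total = Vector(1, 2) + Vector(3, 4)
print(str(total))

# The built-in integer type uses the same mechanism.
print((1).__add__(2))
```

A matrix class could follow the same pattern with __mul__ for the * operator.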
Obviously, the types of the objects to which an operation is applied must
be compatible. It is not obvious, for example, what should happen if you add a string to an
integer. However, sometimes you want to append the string representation of an integer to
a string. This isn’t possible directly (as the object types aren’t compatible). The
non-string object must be converted to its string representation first. This can be done with
the constructor of the string class, str(), which implicitly calls the __str__ method of
the object. Consider this example:

    #!/usr/bin/python

    var1 = "have you mooed today?\nyes, "
    var2 = " times"
    var3 = 5

    print var1 + str(var3) + var2
    # is equivalent to
    # print var1 + var3.__str__() + var2

    print var1 + (7).__str__() + var2
    # is equivalent to
    # print var1 + str(7) + var2

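What happens without the conversion can be seen in the following small sketch; the variable names are made up, and print is called as a function so that the snippet should also run under Python 3:

```python
count = 5
try:
    # Adding an integer to a string is not defined and raises TypeError.
    message = "you mooed " + count
except TypeError:
    # Converting explicitly with str() makes the types compatible.
    message = "you mooed " + str(count) + " times"
print(message)
```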
Standard Classes:
Integer An integer number.
Float A floating-point number.
String A sequence of characters. It can be constructed implicitly by enclosing
the characters in single or double quotes, or explicitly by calling the str class.
Single characters of a string st can be accessed with st[5], substrings with st[5:8]
(the end index is excluded) and the complete string of course with st.
Tuple A tuple is an ordered sequence of objects (components) bound to a single name. The
sequence is enclosed in parentheses and its components are separated by commas. The
components are accessed by an index in square brackets; the first element has index 0.
Example:

    #!/usr/bin/python

    i = 5
    st = "the lazy dog jumps over the ..."
    pi = 3.1415927

    tup = (i, st, pi)

    print tup[2]  # prints out pi

Once a tuple has been created, its components can no longer be changed (tuples are
immutable). This is a major restriction, but it can give a performance benefit.
List A list is an ordered sequence of objects (components) bound to a single name. The
sequence is enclosed in square brackets and its components are separated by commas.
Components are accessed like tuple components, but unlike tuple components they can be
changed after the list has been created.
Example:

    #!/usr/bin/python

    i = 5
    st = "the lazy dog jumps over the ..."
    pi = 3.1415927
    roundpi = 3

    l = [i, st, pi]

    print l[2]
    l[2] = roundpi
    print l[2]

Dictionary A dictionary is a collection of key-value pairs: it maps each key to a value.
It is written in the format key: value, ... and is enclosed in curly braces. Components
are accessed by their key.
Example:

    #!/usr/bin/python

    dictionary = {"pi": 3.1415927, "st": "the lazy dog jumps over the ...", "i": 5}

    print dictionary["pi"]

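The three container classes above can be compared side by side. The following is a small sketch with made-up values; print is called as a function so that the snippet should also run under Python 3:

```python
point = (1, 2)                  # tuple: fixed after creation
try:
    point[0] = 10               # raises TypeError, tuples are immutable
    tuple_changed = True
except TypeError:
    tuple_changed = False

numbers = [1, 2, 3]             # list: components can be replaced or added
numbers[0] = 10
numbers.append(4)

ages = {"alice": 30}            # dictionary: values are looked up by key
ages["bob"] = 25                # adds a new key-value pair
ages["alice"] = 31              # replaces the value of an existing key

print(tuple_changed)
print(numbers)
print(sorted(ages.items()))
```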
Defining and Calling Functions and Methods: A function or method definition consists of
two parts: a signature and a function block. The signature consists of the def keyword,
followed by the function name, a parameter list (with optional default values)
enclosed in parentheses, and a colon.
It is followed by the function block, which is an indented set of statements. A function is
called by its name, followed by the arguments, which are separated by commas and enclosed in
parentheses. Even if no arguments are given, the parentheses are still required!
Returning Values from Functions: A function returns an object. This is done with the
return statement. After the return statement, the program flow leaves the function and
continues after the calling statement. Example:

    #!/usr/bin/python

    def func():
        print "i am a function!"
        return "Have you mooed today?"

    x = func()
    print x

gives

    i am a function!
    Have you mooed today?

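Default values in the parameter list can be illustrated with a short sketch. The function describe and its parameters are hypothetical; print is called as a function so that the snippet should also run under Python 3:

```python
def describe(animal, sound="moo", times=1):
    """Hypothetical helper: build a sentence from the arguments."""
    return "%s says %s" % (animal, " ".join([sound] * times))

a = describe("cow")             # both defaults are used
b = describe("pig", "oink")     # second argument given by position
c = describe("cow", times=2)    # argument given by name

print(a)
print(b)
print(c)
```

Parameters with default values have to come after the parameters without defaults in the signature.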
Conditional Code: Sometimes code should only be executed under certain conditions. The most
important construct for this is the if statement, optionally followed by one or more elif
branches (shorthand for else if) and/or an else branch. The exact syntax is best shown in
an example:

    #!/usr/bin/python

    x = 1

    if x == 1:
        print "first block:"
        print "x = 1"
    elif x > 1:
        print "second block:"
        print "x > 1"
    else:
        print "third block:"
        print "x < 1"

Loops: Python has two different types of loops:
for-in loop: The for-in loop iterates over the elements of an iterable object. Important
iterable objects are strings, lists, tuples and dictionaries. An example of a for-in loop is:

    #!/usr/bin/python

    x = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)  # tuple with 10 numbers

    for i in x:  # iterates over every component of the tuple
        print "looping..."
        print i

while loop: The Python while loop is a classical while loop: the block is repeated as long
as the condition holds. It works as follows:

    #!/usr/bin/python

    x = 100

    while x > 0:
        print x
        x -= 1  # equal to x = x - 1

range([start], stop, [step]), xrange([start], stop, [step]): The range and xrange functions
are built-in functions that generate iterable objects containing ranges of integers.
start is the first number in the range (0 if omitted), stop is the first number that is
no longer in the range (cannot be omitted), and step defines the interval between the
numbers (1 if omitted, must be an integer value). The difference between range and xrange
is that range returns a list, while xrange returns an iterable object that generates its
entries on demand. Thus, if you call range(10000000000), the list takes quite some time to
generate and uses large amounts of memory, while xrange(10000000000) returns immediately
and the entries are generated on demand. Despite this difference, both behave the same in
for loops.
Example:

    #!/usr/bin/python

    x = range(0, 4, 2)
    for i in x:
        print i

gives

    0
    2

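The three parameters can be explored with the sketch below. It is written so that it should run under Python 2 and Python 3; as an assumption beyond this text, note that Python 3 has no xrange and its range generates entries on demand, which is why list() is used to make the values visible:

```python
# start, stop and step: stop itself is never part of the result.
evens = list(range(0, 10, 2))

# A negative step counts downwards.
countdown = list(range(5, 0, -1))

# With a single argument, start defaults to 0 and step to 1.
first_three = list(range(3))

print(evens)
print(countdown)
print(first_three)
```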
0.2 File And String Handling
0.2.1 File handling
open(filename, [mode, [buffering]]) opens the file filename and returns a file object.
mode determines the file mode, which may be 'r' (read only), 'w' (write only; an existing
file is overwritten) or 'a' (append). If mode is omitted, 'r' is assumed. For further file
modes refer to the documentation of the open() function in Python's library reference [4].
The optional parameter buffering can be 0, which means unbuffered, 1, which means line
buffered, or a larger number to specify the size of the buffer in bytes.
The file object can be iterated, returning one line per iteration. Example:

    #!/usr/bin/python

    F = open("/etc/motd.tail")
    for i in F:
        print i

gives

    The programs included with the Ubuntu system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
    applicable law.

    To access official Ubuntu documentation, please visit:
    http://help.ubuntu.com/

on the author's system.
F.close() (where F is a file object) closes the file referenced by the file object.
F.read([n]) reads at most n bytes from the file referenced by the file object and returns
them as a string. If n is negative or omitted, it reads until the end of file (EOF) is reached.
F.readline([n]) reads the next line from the file (or the rest of the file if called on the
last line) and returns it as a string. The trailing newline is retained. The optional
parameter n limits the number of bytes that are returned (an incomplete line may be
returned in that case). Returns an empty string at EOF.
F.readlines([n]) returns a list of strings, each a line from the file. It calls readline()
repeatedly and returns a list of the lines read. The optional size argument n, if given, is
an approximate bound on the total number of bytes in the lines returned. Example:

    #!/usr/bin/python

    F = open("/etc/motd.tail").readlines()
    print F.__class__  # this prints the class: F is an object of type list
    for i in F:
        print i

gives

    <type 'list'>

    The programs included with the Ubuntu system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
    applicable law.

    To access official Ubuntu documentation, please visit:
    http://help.ubuntu.com/

F.write(S) writes a string S to the file F. Note that due to buffering, flush() or close()
may be needed before the file on disk reflects the data written.
dir(F) (where F is a file object created by open()) gives a complete list of the available
methods. Please use the integrated documentation (as explained in section 0.1.1) to find
out how to use them.
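The file methods above can be tried without depending on /etc/motd.tail. The following sketch writes a scratch file first; the use of the tempfile module and the file contents are assumptions made for this example, and print is called as a function so that the snippet should also run under Python 3:

```python
import os
import tempfile

# Create a scratch file so the example does not depend on an existing file.
handle, path = tempfile.mkstemp(suffix=".txt")
os.close(handle)

F = open(path, "w")
F.write("first line\nsecond line\n")
F.close()                       # make sure the data reaches the disk

F = open(path)                  # mode 'r' is the default
first = F.readline()            # keeps the trailing newline
rest = F.readlines()            # remaining lines as a list
at_eof = F.readline()           # empty string at EOF
F.close()

os.remove(path)                 # clean up the scratch file
print(repr(first))
print(rest)
```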
0.2.2 Basic String Handling
Since string handling is a requirement for nearly all programming tasks, Python string
objects have built-in methods for basic string handling and manipulation. Remember
that each string in Python is an object. Thus, access to these basic functions is quite
easy:
S.replace(s, t) (where S is a string object) returns a copy of the string in which every
occurrence of s is replaced by t. Example:

    #!/usr/bin/python
    S1 = "Operating system security is my least favorite lecture"
    S2 = S1.replace("least ", "")
    print S2

gives:

    Operating system security is my favorite lecture

S.split(d) splits the string into substrings at the delimiter d and returns them as a list.
The delimiter is removed. If d is not given or None, any run of whitespace characters is
used as delimiter.

    #!/usr/bin/python
    S = "Operating system security"
    l = S.split(" ")
    for i in l:
        print i

gives:

    Operating
    system
    security

S.find(s) returns the lowest index at which the substring s is found, or -1 if s does not
occur in S. Example:

    #!/usr/bin/python
    S = "The operating system should provide operating system security"
    print S.find("system")

gives 14, the start of the first “system”.
S.strip([chars]) returns a copy of the string S with leading and trailing whitespace
removed. If the parameter chars is given and not None, the characters in chars are removed
instead.
dir(S) (where S is a string object) gives a complete list of the available methods. Please
use the integrated documentation (as explained in section 0.1.1) to find out how to use them.
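The string methods of this section can be combined, for example to take apart a line of a log file. The log line below is invented for this sketch; print is called as a function so that the snippet should also run under Python 3:

```python
line = "  Oct 7 12:45:49 host sshd: login failed  \n"

clean = line.strip()            # drop surrounding whitespace and the newline
fields = clean.split()          # no delimiter given: split at whitespace
position = clean.find("sshd")   # index of the first occurrence
missing = clean.find("ftp")     # -1, because "ftp" does not occur
changed = clean.replace("failed", "ok")

print(fields)
print(position)
print(changed)
```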
0.3 Regular Expressions
One topic of the following exercises is “Logging and Auditing”. Log files usually contain
large amounts of plain text, of which you will most often need only parts. Extracting
the relevant parts from a text is normally done with the help of “regular expressions” (regex).
A regular expression is a generic pattern that describes the format of a text string.
Classic UNIX tools that support regular expressions are, for instance, grep and sed.
The Perl scripting language is well known for its great support of regular expressions
and is probably the best way to go when dealing with text. To spare you learning another
scripting language, we will use Python’s implementation of regular expressions.
Since regular expressions can get very complex very fast, this subsection tries to give
a good overview of the basics. For a more complete reference, see for example [7].
0.3.1 General Syntax of a Regular Expression
Regular expressions consist of literals and control strings for quantification,
alternation or grouping. Literals are in general case sensitive (though most tools
provide an option to change this behaviour).
0.3.2 Literals
Letters and digits are simple literals. This means that a matches an a and b matches
a b. Most special characters are not simple literals; to match them literally, they must be
escaped with the \ (backslash) character. A \ must also be escaped by a \, thus the pattern
for a single \ is \\.
Example: c:\\Windows\\System matches c:\Windows\System
Sets of literals can also be matched. This is done by enclosing multiple literals
in square brackets. Example: [abcd]x matches ax, bx, cx or dx. You can also
give ranges like [a-z] or [0-9].
Special Literals
. matches any character
^ matches the beginning of a line
$ matches the end of a line
[^a] matches any character that is not a, where a is replaceable by any other character
\t matches the TAB character
\n matches a newline (be careful with newlines, as this can get confusing with multiline
matches or other options that are often implemented)
Examples:
• ^$ matches an empty line.
• ^.$ matches a line that contains exactly one character.
• ^foobar matches any line that starts with the word foobar.
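The patterns above can be checked with Python's re module, which is introduced in section 0.3.8. The sample strings are made up for this sketch, and raw strings (r"...") are used so that backslashes reach the regular expression unchanged:

```python
import re

samples = ["", "x", "foobar baz", "c:\\Windows\\System", "bx"]

# Keep the samples in which the pattern is found.
empty = [s for s in samples if re.search(r"^$", s)]
single = [s for s in samples if re.search(r"^.$", s)]
starts = [s for s in samples if re.search(r"^foobar", s)]
path = [s for s in samples if re.search(r"c:\\Windows\\System", s)]
sets = [s for s in samples if re.search(r"^[abcd]x$", s)]

print(empty)
print(single)
print(starts)
```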
0.3.3 Quantifier
Often it is useful to match more than one occurrence of a literal (or a set of literals). In
this case, you must specify how often the literal may or must appear. This is done with the
following quantifiers, which are placed directly after the literal:
? matches zero or one occurrence
* matches zero or more occurrences
+ matches one or more occurrences
{m,n} matches at least m and at most n occurrences. You can omit either m or n. If you
omit m, this means zero to n occurrences; if you omit n, this means at least m
occurrences.
Examples:
• .* matches anything, including the empty string (“”)
• .+ matches anything but the empty string
• ␣{,5} matches up to 5 spaces.
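The quantifiers can be tried out with Python's re module (see section 0.3.8). The test strings in this sketch are invented for illustration:

```python
import re

# "u?" makes the u optional: zero or one occurrence.
optional = re.findall(r"colou?r", "color colour colouur")

# "a*" accepts the empty string, "a+" does not.
star_matches_empty = re.search(r"^a*$", "") is not None
plus_matches_empty = re.search(r"^a+$", "") is not None

# "{2,4}" takes two to four digits, as many as possible.
bounded = re.findall(r"[0-9]{2,4}", "1 22 333 55555")

# "{,5}" takes up to five spaces.
leading = re.search(r"^ {,5}", "   x").group()

print(optional)
print(bounded)
```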
0.3.4 Grouping
Sometimes it is useful to divide a regular expression into sub-expressions. This is
done with parentheses. In most tools, you can access the text matched by each group directly.
0.3.5 Alternation
In some cases, you need alternatives that cannot be expressed by matching a set of
literals (for example if you want to allow two different words). In this case, you need
alternation. Alternation is done with the | character. It usually appears in combination
with grouping.
Example: the regular expression this is a (test|task) matches this is a test as well as
this is a task.
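Grouping and alternation can be combined as in the sketch below, which uses Python's re module (see section 0.3.8). The third test string is an invented counterexample:

```python
import re

pattern = re.compile(r"this is a (test|task)")

words = []
for text in ["this is a test", "this is a task", "this is a trap"]:
    match = pattern.search(text)
    if match:
        # group(1) is the text matched by the first pair of parentheses.
        words.append(match.group(1))

print(words)
```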
0.3.6 Greedy and Non Greedy Matching
By default, a regular expression matches greedily, i.e. each quantifier matches as much as
possible. This means, if you have the string “this is a string that is some string” and you
try to match it with the regular expression th.*is, you will match “this is a string that
is”. If this behaviour is unwanted, it can be changed by appending a question mark to the
quantifier. Thus, th.*?is is the non-greedy regular expression and will only match “this”
in the string above. Note that the question mark is also a quantifier itself, so the two
meanings are easily mixed up.
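The difference can be reproduced with the string from the text, using Python's re module (see section 0.3.8):

```python
import re

text = "this is a string that is some string"

# Greedy: ".*" takes as much as possible before the last "is".
greedy = re.search(r"th.*is", text).group()

# Non-greedy: ".*?" stops at the first possible "is".
lazy = re.search(r"th.*?is", text).group()

print(greedy)
print(lazy)
```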
0.3.7 Backreferences
Sometimes you need to match a string that contains the same pattern more than once.
Let’s say you want to match HTML tags. This is no problem if you know exactly which
tag to match. You could, for example, match a DIV section with <div.*?>.+?</div>. If
you want to match DIV and SPAN tags, you might want to use <(div|span).*?>.+?</(div|span)>.
The problem is that this would not work with nested DIV and SPAN tags, because the
closing tag is allowed to differ from the opening tag. The solution is
to use a backreference. If you use a pair of parentheses somewhere in your pattern, the
text matched between the parentheses is saved. It can be recalled later in the pattern
with \1 . . . \9. So a better regular expression for HTML tags would look like
<(div|span).*?>.+?</\1>.
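The effect of the backreference shows up as soon as the tags are nested. The HTML fragment in this sketch is made up, and Python's re module (see section 0.3.8) is used for matching:

```python
import re

html = "<div><span>text</span></div>"

# Without a backreference, the first closing tag of either kind ends the match.
naive = re.search(r"<(div|span).*?>.+?</(div|span)>", html).group()

# With \1, the closing tag must repeat the name matched by the first group.
paired = re.search(r"<(div|span).*?>.+?</\1>", html).group()

print(naive)
print(paired)
```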
0.3.8 Regular Expressions in Python
If your task is analyzing large text files, you will notice that basic string functions are
no longer sufficient, and more powerful tools like regular expressions are needed.
In Python, access to regular expressions is provided by the re module.
re.compile(regex) compiles the regular expression regex into a regular expression object
and returns the object. Example:

    #!/usr/bin/python
    import re

    regex = re.compile("^foobar.*baz$")

R.search(S) searches the string S for a match of the regular expression object R. It
returns a SRE_Match object (M) if a match is found or None if not. Example:

    #!/usr/bin/python
    import re

    R = re.compile("[Tt]he ")
    f = open("/etc/motd.tail")
    for i in f:
        if R.search(i):
            print i.strip()

gives every line containing the word “the” followed by a space, where the first character
may be upper or lower case:

    The programs included with the Ubuntu system are free software;
    the exact distribution terms for each program are described in the

R.findall(S) searches the string S for matches of the regular expression object R (like
R.search()). It returns a list of all non-overlapping matches, or an empty list if there is
no match. Example:

    #!/usr/bin/python
    import re

    R = re.compile("[Tt]he ")
    f = open("/etc/motd.tail")
    for i in f:
        x = R.findall(i)
        if x:
            print x[0]

gives the first instance of either “the” or “The” in every matching line:

    The
    the
    the
    the

M.groups() (where M is a SRE_Match object) is useful if you used grouping in your
regular expression. It returns the strings matched by the groups of the regular
expression as a tuple, where the element with index zero contains the string that matched
your first group, the one with index one the second group and so on.
Example: Assume you have a file called /home/labor/contact.txt which, besides a lot of
other text, contains a line with the string email contact always via trust+lab@rub.de
and another line with jabber contact always via nonexistent@jabber.trust.rub.de.
Then

    #!/usr/bin/python
    import re

    R = re.compile(r"(.+) contact always via ([a-zA-Z0-9+-]+@([a-zA-Z0-9+-]+\.)+[a-zA-Z]{2,5})")
    f = open("/home/labor/contact.txt")
    for i in f:
        M = R.search(i)
        if M:
            groups = M.groups()
            print "contact method: %s, address: %s" % (groups[0], groups[1])

gives you:

    contact method: email, address: trust+lab@rub.de
    contact method: jabber, address: nonexistent@jabber.trust.rub.de

More information on the re module in general or the SRE_Match object in particular
can be found in [8].
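The same technique can be tried without a file on disk. In the following sketch the input lines and the addresses are invented, and the lines are kept in a list instead of being read from /home/labor/contact.txt; print is called as a function so that the snippet should also run under Python 3:

```python
import re

# Invented sample input standing in for the lines of a contact file.
lines = [
    "email contact always via alice@example.org",
    "some unrelated text",
    "jabber contact always via bob@chat.example.org",
]

R = re.compile(r"(.+) contact always via "
               r"([a-zA-Z0-9+-]+@([a-zA-Z0-9+-]+\.)+[a-zA-Z]{2,5})")

contacts = []
for line in lines:
    M = R.search(line)
    if M:
        # groups() has one entry per pair of parentheses; only the
        # first two (method and full address) are needed here.
        method, address = M.groups()[:2]
        contacts.append((method, address))

for method, address in contacts:
    print("contact method: %s, address: %s" % (method, address))
```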
0.4 Python Modules
This part introduces some additional functions that should enable you to complete your
assignments. Additional information on these functions is given in the appendix. The
complete documentation of the modules in the standard library is the Python library
reference [4].
os: The os module provides the interface to the operating system. It provides wrappers
for system calls (like fork) and other advanced operating system functions (like directory
walking).
os.system - system(param-string): The system function starts the default shell and lets
it execute the parameter string. An example should make this clear:

    #!/usr/bin/python

    import os

    os.system("echo this is a test > file")
    os.system("cat file")
    os.system("rm file")

It does the following: it first creates a file named “file” and writes “this is a test” into
it, then prints the contents of the file using the UNIX cat command and finally removes
the file with the UNIX rm command.
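The directory walking mentioned above is available as os.walk. The following sketch builds a small scratch directory first; the directory layout and file names are assumptions made for this example, and print is called as a function so that the snippet should also run under Python 3:

```python
import os
import shutil
import tempfile

# Build a small scratch directory tree to walk through.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "logs"))
open(os.path.join(root, "logs", "auth.log"), "w").close()
open(os.path.join(root, "readme.txt"), "w").close()

found = []
# os.walk yields one (directory, subdirectories, files) tuple per directory.
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        full = os.path.join(dirpath, name)
        found.append(os.path.relpath(full, root))
found.sort()

shutil.rmtree(root)             # remove the scratch directory again
print(found)
```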
0.5 How to Write and Execute Python Scripts
Python scripts can either be written in a text editor or tested directly in the Python
shell.
Write scripts with a text editor: Open your favorite text editor (for example Kate [5])
and write the script. Make sure not to forget the preamble (see section 0.1.2). After saving
the script, you must mark it as executable by running chmod +x <filename>. Now you can
execute it with ./<filename> in the directory where you saved the script.
Test in the Python shell: Type python on the shell. It should give you something like:

    Python 2.5.2 (r252:60911, Oct 7 2008, 12:45:49)
    [GCC 4.3.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

You can type your statements into the console. Single statements are evaluated after
pressing enter, blocks are evaluated after the block has been completed with an empty line.
You can exit the shell with exit(). Example:

    immo@wok ~ $ python
    Python 2.5.2 (r252:60911, Oct 7 2008, 12:45:49)
    [GCC 4.3.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> for i in range(1, 5):  # range function returns list [1,2,3,4]
    ...     print i
    ...
    1
    2
    3
    4
    >>> exit()
    immo@wok ~ $

1 Homework (5 Points)
You should answer all of these questions before starting the practical exercises. Use ␣ to
disambiguate the indentation!
1. What's the difference between the Python data types “tuple” and “list”?
2. What's the difference between the Python data types “list” and “dictionary”?
3. Implement (on paper) a small Python script that prints out your name!
4. Implement (on paper) a small Python script that counts from 0 to 10. Use a
for loop, conditional code and the xrange function! The output should be the
following:

    0 is 0 mod 3
    1
    2
    3 is 0 mod 3
    4
    5
    6 is 0 mod 3
    7
    8
    9 is 0 mod 3
    10

5. Let r be an integer variable containing a random number (r ∈_R Z). Give two possible
print statements to print “our random number is <<insert content of the random
number here>>”
6. Give a regular expression that matches a) a date b) a valid e-mail address.
2 Practical Exercises (5 points)
To get used to Python, the exercise is to write a program that can be used as a telephone
book. Provided is a small telephone book in the following format:
Name1 ; Address1 ; Number1 ; E-Mail1 ; Birthday1
Name2 ; Address2 ; Number2 ; E-Mail2 ; Birthday2
...
1. Write a Python script that reads the file into a list of lists; each item of an inner
list should be one field of the file.
The data structure should look like the following:
[[Name1, Address1, Number1, E-Mail1, Birthday1], [Name2, Address2, Number2,
E-Mail2, Birthday2], ...]
Print the list(s) to verify your solution!
Hint: string.split(separatorstring) splits the string string at the separatorstring (see
section 0.2.2).
2. Convert the lists to nested dictionaries (like this):
{Name1: {name: Name1, address: Address1, email: E-Mail1, birthday:
Birthday1}, Name2: {name: Name2, address: Address2, email: E-Mail2,
birthday: Birthday2}, ...}
Print the dictionary(s) to verify your solution!
3. Verify each birthday and e-mail address using regular expressions, and warn if you find
an invalid format!
3 Bonus Exercises (+3 Points)
1. Suppose you have an HTML file and want to extract all links from it. (A use case
would be an HTTP downloader that gets web pages recursively.) At first sight this might
seem to be quite an easy task: a little regex and everything’s fine.
The problem is that HTML code out in the wild is most often badly written. Take
a look at the following code:

    ...
    <a href="http://www.test.de/" alt='link to www.test.de' class='link'>www.test.de</a>
    <a href='http://www.google.com' class='link' alt="link to google">google</a>
    <a href=http://www.rub.de>Ruhr-Universitaet Bochum</a>
    ...

We have different kinds of quotation marks; in one <a> tag they’re even left out.
The attributes of the tag are in no specific order.
(It might even be possible that we have some strange situations where
another tag is nested inside the <a> tag.)
Write a little Python script that extracts the value of the href attribute and the
text between the start and end tag of the hyperlink (the description)! The script should
print out a list which gives description : link for each link.
References
[1] Python (programming language) http://en.wikipedia.org/wiki/Python_(programming_language)
[2] Python Tutorial http://docs.python.org/tutorial/
[3] Python Language Reference http://docs.python.org/reference/
[4] Python Standard Library http://docs.python.org/library/
[5] The Kate Text Editor http://kate-editor.org/
[6] The wikipedia malloc-page http://en.wikipedia.org/wiki/Malloc
[7] perlre - Perl Regular Expressions http://perldoc.perl.org/perlre.html
[8] Python Regular Expression Howto http://www.amk.ca/python/howto/regex/