Download When all else fails: do-it-yourself programming

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
When all else fails:
do-it-yourself programming
Konrad Hinsen
[email protected]
Laboratoire Léon Brillouin (CEA-CNRS)
CEA Saclay, 91191 Gif sur Yvette Cedex
France
Programming – p.1/45
Biomolecular simulations
Standard tasks:
• Run an MD simulation
• Calculate atomic fluctuations
• Visualize trajectory
⇒ standard programs
Nonstandard tasks:
• Automatize a multistep operation
• Extract data from a file
• File format conversion
• Implement new algorithms
⇒ do-it-yourself programming
Programming – p.2/45
Programming options
Shell scripts:
• quick to write
• work on any (Unix) system
• very limited possibilities
• difficult to read
• very slow execution
Suitable for: automatization of simple operations
Programming – p.3/45
Programming options
Low-level languages (C, C++, Fortran, . . .):
• fast execution
• good development tools
• require significant experience
• long development times
• can be difficult to read
Suitable for: time-critical algorithms
Programming – p.4/45
Programming options
High-level languages (Perl, Python, Ruby, . . .):
• complete programming languages
• easy to learn
• fast development
• easy to read
• interface to low-level languages
• slow execution
Suitable for: everything that is not time-critical
Programming – p.5/45
Programming options
Empirical rule: 90% of a program is not time critical
Therefore:
• Everyone should know and use a high-level
language.
• Developers of numerical methods must also know
a low-level language.
Programming – p.6/45
Python
Features:
• Clean syntax
• Object-oriented
(→ problem-oriented data structures)
• Good low-level language interface
(C, C++, Fortran)
• Free and portable implementation
• Well established as a scripting language for
computational science
Programming – p.7/45
Python
Examples:
•
Sum up numbers in the third column of a text file
sum = 0.
for line in file(’my_data’):
sum = sum + float(line.split()[2])
print sum
Programming – p.8/45
Python
•
Calculate the radius of gyration of a protein
from MMTK.Proteins import Protein
import Numeric
protein = Protein(’protein.pdb’)
center = protein.centerOfMass()
r_sq = 0.
for atom in protein.atomList():
mass = atom.mass()
distance = atom.position()-center
r_sq = r_sq + mass*distance*distance
r_sq = r_sq/protein.mass()
print Numeric.sqrt(r_sq)
Programming – p.9/45
Modules
Python code is structured by modules. Modules can
contain functions, variables, classes, and other
modules.
To use the contents of a module, they must be
imported, either individually:
from Module import function1, function2
or collectively:
from Module import *
Programming – p.10/45
Basic math
Integer
Real
0, 1, 2, 3, -1, -2, -3
0., 3.1415926, -2.05e30, 1e-4
(must contain dot or exponent)
Imaginary/Complex 1j, -2.5j, 3+4j
(the last one is a sum)
Addition
Subtraction
Multiplication
Division
Power
3+4, 42.+3, 1+0j
2-5, 3.-1, 3j-7.5
4*3, 2*3.14, 1j*3j
1/3, 1./3., 5/3j
1.5**3, 2j**2, 2**-0.5
Programming – p.11/45
Mathematical functions
The standard mathematical functions (sqrt, log,
log10, exp, sin, cos, tan, arcsin, arccos,
arctan, sin, cosh), as well as the constants pi
and e, are not part of the basic language, but
contained in a module called Numeric:
from Numeric import sin, pi
print sin(pi/3.)
Programming – p.12/45
Text strings
Two forms: ’abc’ or "abc"
Line breaks are indicated by \n: "abc\ndef"
Concatenation: "abc"+’def’ gives "abcdef"
Repetition: 3*"ab" gives /"ababab"/
Triple quotes permit multi-line strings:
comment = ’’’This string
is split over
three lines.
’’’
Programming – p.13/45
Lists
Lists are very important in Python. They can store any
kind of element:
some_prime_numbers = [2, 3, 5, 7, 11, 13]
names = [’Smith’, ’Jones’]
a_mixed_list = [3, [2, ’x’], ’’]
Concatenation [0,1]+[’a’,’b’]
→ [0,1,’a’,’b’]
Append
names.append(’Python’)
Sort
names.sort()
Reverse
names.reverse()
Programming – p.14/45
Vectors
from Scientific.Geometry import Vector
x = Vector(1,0,0)
print x[0], x[1], x[2]
Addtition
Subtraction
Mult. by scalar
Div. by scalar
Dot product
Cross product
Length
Normalization
Angle
Vector(1,0,0)+Vector(0,-1,3)
Vector(,1,0)-Vector(1.5,4,0)
3.5*Vector(1,1,0)
Vector(1,1,0)/2
Vector(1,2.5,0)*Vector(0,-1,3.1)
Vector(1,2.5,0).cross(Vector(0,-1,3.1))
Vector(2.5, 3.4, 1.).length()
Vector(1.,4.,2.).normal()
Vector(1,2.5,0).angle(Vector(0,-1,3.1))
Programming – p.15/45
Objects
An object is a collection of data. The type of an
object determines what operations are available for it.
Type
Integer
List
Vector
Operations
+-*/
+ append sort . . .
+ - * / cross . . .
Object-oriented programming: Programming by
creating problem-specific object types
Programming – p.16/45
Variables
Fortran, C:
A variable stands for a memory area holding data.
Python:
A variable refers to a memory area holding data.
Two variables can refer to the same memory area.
a = [1, 2, 3]
b = a
a.append(4)
print b
prints [1, 2, 3, 4]
Programming – p.17/45
Indexing
Sequence objects (lists, strings, vectors, . . .) permit
indexing:
text = ’abc’
print text[1]
print text[-1]
and slicing:
print text[0:2]
print text[1:-1]
Note: The first element always has index 0.
Programming – p.18/45
Indexing
Mutable sequence objects (e.g. lists) permit indexed
assignment:
names = [’Smith’, ’Jones’]
names[1] = ’Python’
some_prime_numbers = [2, 3, 5, 7, 11, 13]
some_prime_numbers[3:] = [17, 19, 23, 29]
Programming – p.19/45
Loops
for item in sequence:
print item
Counting loop (“do loop”):
for i in range(5):
print i
Note: range(5) yields [0, 1, 2, 3, 4]
Programming – p.20/45
File philosophy
Fortran: A file is a sequence of records, which can
be “formatted” (text) or “unformatted” (binary).
C/Python: A file is a sequence of bytes. Text files use
control characters for formatting (line breaks etc.)
Programming – p.21/45
Reading files
Read whole file into a string object:
contents = file(’filename’).read()
Read whole file into a list of lines:
contents = file(’filename’).readlines()
Read line by line:
for line in file(’filename’):
print line
Programming – p.22/45
Writing files
output_file = file(’filename’, ’w’)
output_file.write(’line1\n’)
output_file.write(’line2\n’)
output_file.close()
Replace ’w’ by ’a’ to append to an existing file.
Programming – p.23/45
Example
This program counts the lines and words in a file.
lines = 0
words = 0
for line in file(’some_file’):
lines = lines + 1
words = words + len(line.split())
print lines, " lines, ", words, " words."
Programming – p.24/45
Conditions
equal
not equal
greater than
less than
greater than or equal
less than or equal
a
a
a
a
a
a
== b
!= b
> b
< b
>= b
<= b
The result of a comparison is 0 (false) or 1 (true).
Logical operators: not, and, or
Programming – p.25/45
Conditions
if a > 0:
print ’greater’
elif a == 0:
print ’equal’
else:
print ’smaller’
while n > 0:
n = n - 1
Programming – p.26/45
Text processing
’ab cd e’.split()
’1, 2, 3’.split(’,’)
’,’.join([’1’,’2’])
’ a b c ’.strip()
’text’.find(’ex’)
’Abc’.upper()
’Abc’.lower()
[’ab’,’cd’,’e’]
[’1’, ’ 2’, ’ 3’]
’1,2’
’a b c’
1
’ABC’
’abc’
Convert to numbers: int(’2’), float(’2.1’)
Any data as text: str(3), str([1, 2, 3])
Programming – p.27/45
Dictionaries
A dictionary stores a mapping from (almost) arbitrary
keys to arbitrary values.
dictionary = {’a’: 1, ’b’: 2, ’c’: 3}
dictionary[’z’] = 26
print dictionary[’c’]
print dictionary.get(’h’, ’not found’)
print dictionary.keys()
print dictionary.values()
print dictionary.items()
Programming – p.28/45
Example
This program reads a file that describes an atomic
system. Each line has a chemical element name,
followed by three coordinates. The program
calculates the center of mass.
from Scientific.Geometry import Vector
masses = {’h’: 1, ’c’: 12, ’o’: 16}
tot_mass = 0.
sum = Vector(0., 0., 0.)
for line in file(’atoms’):
element, x, y, z = line.split()
position = Vector(float(x), float(y), float(z))
mass = masses[element.lower()]
tot_mass = tot_mass + mass
sum = sum + mass*position
print ’Center of mass: ’, sum/tot_mass
Programming – p.29/45
Functions
Like C, but unlike Fortran, Python makes no
difference between functions and subroutines.
Functions can have zero, one, or more return values.
from Scientific.Geometry import Vector
def distance(r1, r2):
return (r1-r2).length()
print distance(Vector(1., 0., 0.),
Vector(3., -2., 1.))
Programming – p.30/45
Functions
Multiple return values:
import Numeric
def cartesianToPolar(x, y):
r = Numeric.sqrt(x**2+y**2)
phi = Numeric.arccos(x/r)
if y < 0.:
phi = 2.*Numeric.pi-phi
return r, phi
radius, angle = cartesianToPolar(1., 1.)
Programming – p.31/45
Error handling
def inverse(x):
return 1./x
print inverse(0)
Output:
Traceback (innermost last):
File "demo.py", line 3, in ?
print inverse(0)
File "demo.py", line 2, in inverse
return 1./x
ZeroDivisionError: float division
Programming – p.32/45
Error handling
Intercepting an error:
try:
print inverse(0)
except ZeroDivisionError:
print ’Division by Zero’
Programming – p.33/45
Object-oriented programming
Simple idea: Define data types that fit your problem.
Advantages:
• Data types are easier to isolate than parts of
algorithms.
• Data types are more stable than algorithms as
code evolves.
• Data types encapsulate implementation details.
Programming – p.34/45
Example: complex numbers
class Complex:
def __init__(self, real, imag):
self.real = real
self.imag = imag
def __add__(self, other):
return Complex(self.real+other.real,
self.imag+other.imag)
def conjugate(self):
return Complex(self.real, -self.imag)
def __str__(self):
return ’Complex(%f, %f)’ % (self.real, self.imag)
Programming – p.35/45
Example: complex numbers
a = Complex(1., 2.)
b = Complex(-2., 3.)
print a.conjugate() + b
Programming – p.36/45
Object-oriented design
Central question: What are the “right” object types
in a non-trivial code?
•
•
•
Requires practice.
Trial and error.
Don’t hesitate to rewrite code for better design, it
always pays off in the long run.
Programming – p.37/45
Object-oriented design
Candidates for object types:
1. Mathematical objects
• numbers
• matrices
• vectors
2. Physical objects
• particles (atoms, electrons, ...)
• wave functions
• pseudo-potentials
3. Abstract objects
• algorithms (e.g. integrators)
• object factories
Programming – p.38/45
Polymorphism
Idea: Different object types can implement similar
operations using the same names. Algorithms can
then be implemented once for any of these types.
Programming – p.39/45
Example: atoms
class Atom:
def __init__(self, element, position):
self.element = element
self.position = position
def translateBy(self, vector):
self.position += vector
def centerOfMass(self):
return self.position
def mass(self):
return masses[self.element.lower()]
Programming – p.40/45
Example: molecules
class Molecule:
def __init__(self, atom_list):
self.atoms = atom_list
def translateBy(self, vector):
for atom in self.atoms:
atom.translateBy(vector)
def centerOfMass(self):
sum1 = 0.
sum2 = Vector(0., 0., 0.)
for atom in self.atoms:
sum1 += atom.mass()
sum2 += atom.mass()*atom.position
return sum2/sum1
Programming – p.41/45
Inheritance
Idea: Often one object type is a specialization of another one. It can inherit the general operations and
add its own ones.
Programming – p.42/45
Example: peptide chains
class PeptideChain(Molecule):
def __init__(self, residue_list):
self.residues = residue_list
atoms = []
for r in residue_list:
atoms = atoms + r.atoms
Molecule.__init__(self, atoms)
def __getitem__(self, index):
return self.residues[index]
Programming – p.43/45
Profiting from existing code
Python standard library:
• string and text processing
• object serialization: write any object to a file
• thread support
• database interfaces
• profiling and debugging
• most internet protocols
• complete Web server
• access to common data and file formats,
including HTML and XML
Programming – p.44/45
Profiting from existing code
Other free libraries and packages:
• Numerical Python, Scientific Python
• MMTK (molecular simulations)
• plotting
• signal processing
• interfaces to scientific data formats
• distributed objects (CORBA etc.)
• Zope (Web server management system)
Programming – p.45/45