When all else fails: do-it-yourself programming

Konrad Hinsen
[email protected]
Laboratoire Léon Brillouin (CEA-CNRS)
CEA Saclay, 91191 Gif sur Yvette Cedex
France

Programming – p.1/45

Biomolecular simulations

Standard tasks:
• Run an MD simulation
• Calculate atomic fluctuations
• Visualize trajectory
⇒ standard programs

Nonstandard tasks:
• Automate a multistep operation
• Extract data from a file
• File format conversion
• Implement new algorithms
⇒ do-it-yourself programming

Programming options

Shell scripts:
• quick to write
• work on any (Unix) system
• very limited possibilities
• difficult to read
• very slow execution
Suitable for: automation of simple operations

Low-level languages (C, C++, Fortran, ...):
• fast execution
• good development tools
• require significant experience
• long development times
• can be difficult to read
Suitable for: time-critical algorithms

High-level languages (Perl, Python, Ruby, ...):
• complete programming languages
• easy to learn
• fast development
• easy to read
• interface to low-level languages
• slow execution
Suitable for: everything that is not time-critical

Empirical rule: 90% of a program is not time-critical
Therefore:
• Everyone should know and use a high-level language.
• Developers of numerical methods must also know a low-level language.

Python

Features:
• Clean syntax
• Object-oriented (→ problem-oriented data structures)
• Good low-level language interface (C, C++, Fortran)
• Free and portable implementation
• Well established as a scripting language for computational science

Examples:
• Sum up numbers in the third column of a text file

    sum = 0.
    for line in file('my_data'):
        sum = sum + float(line.split()[2])
    print sum

• Calculate the radius of gyration of a protein

    from MMTK.Proteins import Protein
    import Numeric
    protein = Protein('protein.pdb')
    center = protein.centerOfMass()
    r_sq = 0.
    for atom in protein.atomList():
        mass = atom.mass()
        distance = atom.position()-center
        r_sq = r_sq + mass*distance*distance
    r_sq = r_sq/protein.mass()
    print Numeric.sqrt(r_sq)

Modules

Python code is structured by modules. Modules can contain functions, variables, classes, and other modules.
The contents of a module must be imported before they can be used, either individually:

    from Module import function1, function2

or collectively:

    from Module import *

Basic math

Integer             0, 1, 2, 3, -1, -2, -3
Real                0., 3.1415926, -2.05e30, 1e-4 (must contain dot or exponent)
Imaginary/Complex   1j, -2.5j, 3+4j (the last one is a sum)

Addition            3+4, 42.+3, 1+0j
Subtraction         2-5, 3.-1, 3j-7.5
Multiplication      4*3, 2*3.14, 1j*3j
Division            1/3, 1./3., 5/3j
Power               1.5**3, 2j**2, 2**-0.5

Mathematical functions

The standard mathematical functions (sqrt, log, log10, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh), as well as the constants pi and e, are not part of the basic language, but are contained in a module called Numeric:

    from Numeric import sin, pi
    print sin(pi/3.)

Text strings

Two forms: 'abc' or "abc"
Line breaks are indicated by \n: "abc\ndef"
Concatenation: "abc"+'def' gives "abcdef"
Repetition: 3*"ab" gives "ababab"
Triple quotes permit multi-line strings:

    comment = '''This string
    is split over
    three lines.
    '''

Lists

Lists are very important in Python.
They can store any kind of element:

    some_prime_numbers = [2, 3, 5, 7, 11, 13]
    names = ['Smith', 'Jones']
    a_mixed_list = [3, [2, 'x'], '']

Concatenation   [0,1]+['a','b'] → [0,1,'a','b']
Append          names.append('Python')
Sort            names.sort()
Reverse         names.reverse()

Vectors

    from Scientific.Geometry import Vector
    x = Vector(1,0,0)
    print x[0], x[1], x[2]

Addition          Vector(1,0,0)+Vector(0,-1,3)
Subtraction       Vector(,1,0)-Vector(1.5,4,0)
Mult. by scalar   3.5*Vector(1,1,0)
Div. by scalar    Vector(1,1,0)/2
Dot product       Vector(1,2.5,0)*Vector(0,-1,3.1)
Cross product     Vector(1,2.5,0).cross(Vector(0,-1,3.1))
Length            Vector(2.5, 3.4, 1.).length()
Normalization     Vector(1.,4.,2.).normal()
Angle             Vector(1,2.5,0).angle(Vector(0,-1,3.1))

Objects

An object is a collection of data. The type of an object determines what operations are available for it.

Type      Operations
Integer   + - * /
List      + append sort ...
Vector    + - * / cross ...

Object-oriented programming: Programming by creating problem-specific object types

Variables

Fortran, C: A variable stands for a memory area holding data.
Python: A variable refers to a memory area holding data. Two variables can refer to the same memory area.

    a = [1, 2, 3]
    b = a
    a.append(4)
    print b

prints [1, 2, 3, 4]

Indexing

Sequence objects (lists, strings, vectors, ...) permit indexing:

    text = 'abc'
    print text[1]
    print text[-1]

and slicing:

    print text[0:2]
    print text[1:-1]

Note: The first element always has index 0.

Mutable sequence objects (e.g.
lists) permit indexed assignment:

    names = ['Smith', 'Jones']
    names[1] = 'Python'
    some_prime_numbers = [2, 3, 5, 7, 11, 13]
    some_prime_numbers[3:] = [17, 19, 23, 29]

Loops

    for item in sequence:
        print item

Counting loop ("do loop"):

    for i in range(5):
        print i

Note: range(5) yields [0, 1, 2, 3, 4]

File philosophy

Fortran: A file is a sequence of records, which can be "formatted" (text) or "unformatted" (binary).
C/Python: A file is a sequence of bytes. Text files use control characters for formatting (line breaks etc.)

Reading files

Read whole file into a string object:

    contents = file('filename').read()

Read whole file into a list of lines:

    contents = file('filename').readlines()

Read line by line:

    for line in file('filename'):
        print line

Writing files

    output_file = file('filename', 'w')
    output_file.write('line1\n')
    output_file.write('line2\n')
    output_file.close()

Replace 'w' by 'a' to append to an existing file.

Example

This program counts the lines and words in a file.

    lines = 0
    words = 0
    for line in file('some_file'):
        lines = lines + 1
        words = words + len(line.split())
    print lines, " lines, ", words, " words."

Conditions

equal                   a == b
not equal               a != b
greater than            a > b
less than               a < b
greater than or equal   a >= b
less than or equal      a <= b

The result of a comparison is 0 (false) or 1 (true).
Logical operators: not, and, or

    if a > 0:
        print 'greater'
    elif a == 0:
        print 'equal'
    else:
        print 'smaller'

    while n > 0:
        n = n - 1

Text processing

'ab cd e'.split()       ['ab','cd','e']
'1, 2, 3'.split(',')    ['1', ' 2', ' 3']
','.join(['1','2'])     '1,2'
' a b c '.strip()       'a b c'
'text'.find('ex')       1
'Abc'.upper()           'ABC'
'Abc'.lower()           'abc'

Convert to numbers: int('2'), float('2.1')
Any data as text: str(3), str([1, 2, 3])

Dictionaries

A dictionary stores a mapping from (almost) arbitrary keys to arbitrary values.

    dictionary = {'a': 1, 'b': 2, 'c': 3}
    dictionary['z'] = 26
    print dictionary['c']
    print dictionary.get('h', 'not found')
    print dictionary.keys()
    print dictionary.values()
    print dictionary.items()

Example

This program reads a file that describes an atomic system. Each line has a chemical element name, followed by three coordinates. The program calculates the center of mass.

    from Scientific.Geometry import Vector
    masses = {'h': 1, 'c': 12, 'o': 16}
    tot_mass = 0.
    sum = Vector(0., 0., 0.)
    for line in file('atoms'):
        element, x, y, z = line.split()
        position = Vector(float(x), float(y), float(z))
        mass = masses[element.lower()]
        tot_mass = tot_mass + mass
        sum = sum + mass*position
    print 'Center of mass: ', sum/tot_mass

Functions

Like C, but unlike Fortran, Python makes no distinction between functions and subroutines. Functions can have zero, one, or more return values.

    from Scientific.Geometry import Vector

    def distance(r1, r2):
        return (r1-r2).length()

    print distance(Vector(1., 0., 0.), Vector(3., -2., 1.))

Multiple return values:

    import Numeric

    def cartesianToPolar(x, y):
        r = Numeric.sqrt(x**2+y**2)
        phi = Numeric.arccos(x/r)
        if y < 0.:
            phi = 2.*Numeric.pi-phi
        return r, phi

    radius, angle = cartesianToPolar(1., 1.)
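A hedged variant of the conversion above, written for present-day Python 3 with only the standard math module (an assumption; the slides use Python 2 and Numeric). It swaps the arccos construction for math.atan2, which also covers the origin, where x/r would divide by zero. The name cartesian_to_polar is chosen here for illustration only.

```python
import math


def cartesian_to_polar(x, y):
    """Return (r, phi) for the point (x, y), with phi mapped to 0..2*pi."""
    r = math.hypot(x, y)          # sqrt(x**2 + y**2)
    phi = math.atan2(y, x)        # angle in (-pi, pi]
    phi = phi % (2.0 * math.pi)   # shift negative angles up by 2*pi
    return r, phi                 # two return values, packed as a tuple


# The caller unpacks the tuple into two variables.
radius, angle = cartesian_to_polar(1.0, 1.0)
print(radius, angle)
```

As in the slide, the function returns a single tuple, and the assignment on the calling side unpacks it into two names.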
Error handling

    def inverse(x):
        return 1./x
    print inverse(0)

Output:

    Traceback (innermost last):
      File "demo.py", line 3, in ?
        print inverse(0)
      File "demo.py", line 2, in inverse
        return 1./x
    ZeroDivisionError: float division

Intercepting an error:

    try:
        print inverse(0)
    except ZeroDivisionError:
        print 'Division by Zero'

Object-oriented programming

Simple idea: Define data types that fit your problem.

Advantages:
• Data types are easier to isolate than parts of algorithms.
• Data types are more stable than algorithms as code evolves.
• Data types encapsulate implementation details.

Example: complex numbers

    class Complex:

        def __init__(self, real, imag):
            self.real = real
            self.imag = imag

        def __add__(self, other):
            return Complex(self.real+other.real,
                           self.imag+other.imag)

        def conjugate(self):
            return Complex(self.real, -self.imag)

        def __str__(self):
            return 'Complex(%f, %f)' % (self.real, self.imag)

    a = Complex(1., 2.)
    b = Complex(-2., 3.)
    print a.conjugate() + b

Object-oriented design

Central question: What are the "right" object types in a non-trivial code?
• Requires practice.
• Trial and error.
• Don't hesitate to rewrite code for better design; it always pays off in the long run.

Candidates for object types:
1. Mathematical objects
   • numbers
   • matrices
   • vectors
2. Physical objects
   • particles (atoms, electrons, ...)
   • wave functions
   • pseudo-potentials
3. Abstract objects
   • algorithms (e.g. integrators)
   • object factories

Polymorphism

Idea: Different object types can implement similar operations using the same names. Algorithms can then be implemented once for any of these types.
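The idea can be sketched in a few lines. What follows is a minimal illustration in present-day Python 3 (an assumption; the slides use Python 2). The function total and the class Vec2 are hypothetical names introduced here and are not part of any library mentioned in the talk. Because total relies only on the + operation, it works for every type that implements it.

```python
class Vec2:
    """Hypothetical 2D vector type that implements '+'."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Vec2(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return "Vec2(%r, %r)" % (self.x, self.y)


def total(items, start):
    """Add up items of any type that supports '+', beginning at start."""
    result = start
    for item in items:
        result = result + item
    return result


# One algorithm, three object types.
print(total([1, 2, 3], 0))
print(total([1 + 2j, 3 - 1j], 0j))
print(total([Vec2(1.0, 0.0), Vec2(0.0, 2.0)], Vec2(0.0, 0.0)))
```

The caller supplies the starting value, so total never needs to know which type it is adding.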
Example: atoms

    class Atom:

        def __init__(self, element, position):
            self.element = element
            self.position = position

        def translateBy(self, vector):
            self.position += vector

        def centerOfMass(self):
            return self.position

        def mass(self):
            return masses[self.element.lower()]

Example: molecules

    class Molecule:

        def __init__(self, atom_list):
            self.atoms = atom_list

        def translateBy(self, vector):
            for atom in self.atoms:
                atom.translateBy(vector)

        def centerOfMass(self):
            sum1 = 0.
            sum2 = Vector(0., 0., 0.)
            for atom in self.atoms:
                sum1 += atom.mass()
                sum2 += atom.mass()*atom.position
            return sum2/sum1

Inheritance

Idea: Often one object type is a specialization of another one. It can inherit the general operations and add its own ones.

Example: peptide chains

    class PeptideChain(Molecule):

        def __init__(self, residue_list):
            self.residues = residue_list
            atoms = []
            for r in residue_list:
                atoms = atoms + r.atoms
            Molecule.__init__(self, atoms)

        def __getitem__(self, index):
            return self.residues[index]

Profiting from existing code

Python standard library:
• string and text processing
• object serialization: write any object to a file
• thread support
• database interfaces
• profiling and debugging
• most internet protocols
• complete Web server
• access to common data and file formats, including HTML and XML

Other free libraries and packages:
• Numerical Python, Scientific Python
• MMTK (molecular simulations)
• plotting
• signal processing
• interfaces to scientific data formats
• distributed objects (CORBA etc.)
• Zope (Web server management system)
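As a hedged illustration of the "object serialization" item in the standard-library list, the sketch below writes a data structure to a file and reads it back with the pickle module. It uses present-day Python 3 syntax (an assumption; the slides use Python 2). The helper names save_object, load_object and round_trip, and the sample data, are made up for this example.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Write a picklable object to the file at path."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Read back an object written by save_object."""
    with open(path, "rb") as f:
        return pickle.load(f)


def round_trip(obj):
    """Save obj to a temporary file, load it again, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".pickle")
    os.close(fd)
    try:
        save_object(obj, path)
        return load_object(path)
    finally:
        os.remove(path)


# Illustrative data: element names with (x, y, z) coordinates.
system = {
    "name": "water",
    "atoms": [("O", (0.0, 0.0, 0.0)),
              ("H", (0.96, 0.0, 0.0)),
              ("H", (-0.24, 0.93, 0.0))],
}
restored = round_trip(system)
print(restored == system)
```

Built-in containers are used here to keep the sketch short; instances of user-defined classes can be pickled in the same way, provided the class definition is importable when the file is read back.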