Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Observatoire Astronomique de Strasbourg March 28, 2011 – One Day Session http://astro.unistra.fr/~morgan/python/M11.pptx Morgan Fouesneau [email protected] “Meals are not provided!” why python? and good references What, Why? In astronomy Should I throw away my C, Fortran codes? Language power Starting points when stuck in python Language Application Not to scale… “Hello World!” … A programmer can be 5-10 times more productive than others Java: Fortran (90): Python: As easy as IDL (or Yorick): public class HelloWorld{ public static void main(string[] args){ System.out.println(“Hello World!”); } } PROGRAM HelloWorld PRINT *, "Hello World!" END PROGRAM HelloWorld print “Hello World!” print, “Hello World!” Easy and Quick coding Already a reference in astronomy Faster than IDL (especially for data access) More flexible than any other language Between fully compiled and pure script program Free Quantitative study based on 80 programmers implementing the same program http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprtTR.pdf no reason to replace them (yet), rather cooperate Working horses of science… zillions of codes existing optimized for runtime performance no reason to replace them (at this time) but... rather cooperate (f2py, swig) not really general purpose relatively primitive datatypes manual memory management slow edit/compile/test cycle Extremely popular these days Interactive, great visualization, good libraries but... Not really general purpose Not really ready for large scale programs Not free, even expensive We can still call them from Python but you will never do that unless you have a gun over your head Perl very strong at text manipulation Ruby seems to be popular in scientific community in Japan Shell scripts... you will not want to do shell scripting any more once you know Python (unless forced)! Easy to learn but yet powerful Fast and easy solutions for everyday problems Allows development of large and sophisticated software packages Several astronomical program packages implemented using python Excellent tool for scientists Easy parallel programming official Python website www.python.org Simple interactive example: >>> t array([ # print all t-values (assumed to be already defined) 0. , 0.8975979 , 1.7951958, 2.6927937, 3.5903916, 4.48798951, 5.38558741, 6.28318531 ]) >>> t[2:4] # get sub-array with the elements at the indexes 2,3 array([ 1.7951958, 2.6927937]) >>> p = 1.8 # set our point >>> abs(t-p) # how much do the t[:]-values differ from p? array([ 1.8 , 0.9024021 , 0.0048042 , 0.8927937, 1.7903916, 2.68798951, 3.58558741, .48318531 ]) >>> abs(t-p) # how much do the t[:]-values differ from p? array([ 1.8 , 0.9024021 , 0.0048042 , 0.8927937, 1.7903916, 2.68798951, 3.58558741, .48318531 ]) <...> >>> # how large is the 4-th largest absolute distance between the t[:]-values and p >>> dt_m = sort(abs(t-p))[3] >>> # where are the four elements of t[:]closest to p ? >>> abs(t-p) <= dt_m # corresponds to a “where” command in IDL array([False, True, True, True, True, False, False, False]) >>> # construct the sub-arrays; (1) get the 4 t[:]-values >>> t_p = t[abs(t-p)<=dt_m] >>> t_p array([ 0.8975979, 1.7951958, 2.6927937, 3.5903916]) Programs and libraries implemented (mainly in Python): BoA (Bolometer Analysis Package) APECS (APEX control software) Astro-WISE (widefield imaging system) PyFITS (access to FITS files) SAMPy (Samp communications) ... Wrappers for existing programs and libraries: PYGILDAS (GILDAS) ParselTongue (AIPS) PyRAF (IRAF) PyMIDAS (MIDAS) PyDS9 (DS9 via XPA messaging system) … Keep it simple Embed “basic” languages were really needed (e.g. brut/intensive comp.) C/C++ • Jython Java Fortran • Pyrex • Weave • Cython • f2py • pyfort Python Communicate with almost any application Python reference book Python from novice to pro Available online available in multiple languages. http://diveintopython.org Websites Python http://www.python.org Ipython http://ipython.scipy.org SciPy http://www.scipy.org NumPy http://numpy.scipy.org Matplotlib http://matplotlib.sourceforge.net Astropython http://www.astropython.org AstroPy http://www.astro.washington.edu/users/rowen/AstroPy.html Mathesaurus http://mathesaurus.sourceforge.net quick reference for switching F2py http://www.scipy.org/F2py c / f77 / f90 glue (is part of NumPy) PyTables http://www.pytables.org/ all around hdf5 format PyFITS http://www.stsci.edu/resources/software_hardware/pyfits/ FITS SAMPy http://cosmos.iasf-milano.inaf.it/pandora/sampy SAMP in python Mayavi http:// code.enthought.com/projects/mayavi/ SAMP in python “Daily Specials” Who, where, what, how? Audience Observatoire: Toddler, grown up, elderly… anyone curious Must have at least basic experience in one or more computer languages Content: Give you what you need for everyday problems asap First half of course: Basic language elements and basic visualization some modules from standard library, elements of NumPy, SciPy and matplotlib Second half of course: Advanced topics and installing packages Complex scripting, exceptions, classes… finding packages and how to install them… Avoid any personal installation SSH connection to `ASTROMASTER` Username: `guestpython` Password: `python2011` This account will be deleted within the next few days Create your own folder in this user’s home directory Installed packages numpy, matplotlib, pyfits, python-tables, setuptools, tkinter, ipython IDE: Interactive Development Environment i.e.: Very assisted Examples: Eclipse (ext. Python), Spyder… Spyder* Editor, Console, History Log, Variable explorer, Project manager, online help * Scientific PYthon Development EnviRonment http://code.google.com/p/spyderlib/ “comprehensive environment for interactive and exploratory computing.” http://ipython.scipy.org/ “comprehensive environment for interactive and exploratory computing.” In [ ]: myVariable? prints documentation of the variable In [ ]: myFunction?? prints the source code (if accessible) “Classic stuffs and some specials.” Numbers, variables, math and while Python's numerical types Operators for numerical types Variables, variable names, and assignment to variables The math module from the Standard Library Loops with the while statement >>> 2, 0 >>> -4711 >>> 07, 022 # Octal literals >>> 0x9, 0xa, 0XF # Hex literals >>> 17 + 4 # expression >>> 0xa - 2 >>> 23 ** (2+3) # 23⁵ >>> 7 / 2, 7 / -2 # Int division >>> 2.3 >>> -4. >>> 0.1, .1 >>> 2.99E10, 6.62607e-27, -1e10 >>> 1.7 + .4 >>> 17. + 4 >>> 7./2., 7./2., 7./2. >>> 1 + .4 >>> 2 - 3 >>> 25 * -17 >>> 2.**100 >>> 2/3, 2/-3 # Int division >>> 2./3 # Float division >>> 2.//3, 2.//3 # Floor division >>> 7 % 3, 7 % -3 # Int Modulus >>> 7. % 3, 7.2 % 2.5 # Modulus Nb: (n//m)*m + (n%m) = n >>> 17 & 5 # Bitwise AND >>> 17 | 5 # Bitwise OR >>> 17 ^ 5 # Bitwise EXOR >>> ~17 # Bitwise complement >>> 17 << 3 # Bitshift left >>> 0x11 >> 2 # Bitshift right >>> 2.+3j, 2-3J # complex literals >>> j # will not work >>> 1J # but this will >>> complex(1,2) >>> 1+1j*2, (1+1j)*2 >>> (2.+3j).real, (2+3j).imag >>> x = 2 # Assign variable >>> x # Display >>> x + 3 # Use variable >>> y = x + 3 # New variable >>> x = x + 1 # Assign new value >>> x += 1 # Shorthand; no x++ >>> x, y = 2, 3 >>> x, y = y, x >>> x = y = z = 0 # !! WARNING !! Case sensitive >>> xy, Xy = 2, 3 Must begin w. letter >>> 9x = 2 >>> x9 = 2 >>> _x = 2 # not working # ok # ok, but special must not be keyword >>> if = 2 and is try while with yield pass in finally return or import exec continue def not if except class raise None global else break lambda from elif assert del as for print >>> abs(-2.) >>> abs(1+1j) >>> max(1,2,3,4) >>> min(1,2,3,4) >>> hex(17), oct(-5) >>> round(1.23456, 2) >>> x = 2 # Assignment >>> x == 2 # Testing equality >>> x != 0 # True >>> x < 1 # False >>> (x >= 2.) == False # False >>> 1 < x <= 3 >>> 3 < x <= 1 >>> 1 < x >= 0 # (1<x) and (x<=3) >>> x = 0 >>> while x <= 10: ... print x, x**2 ... x += 1 ... <RETURN> # Bool. expr. # Indentation >>> print x # Unindented again Top level must not be indented It does not matter how much blanks you use, but: Uniform within each block Most people use 4 blanks per level but tabs work too. http://www.python.org/doc/essays/styleguide.html How much is the smallest / largest positive float? Help: Use a while loop, compare to 0 and float(‘inf’) >>> x = 1. >>> while x > 0: ... x /= 2. ... print x >>> x = 2.** 32 >>> while x != float(‘inf’): ... x *= 2. ... print x How precise is your installation? Help: Add smaller and smaller numbers to 1 while comparing the sum to 1 >>> x = 1. >>> while 1. + x > 1.: ... x /= 2. ... print x How much is the smallest / largest positive float? Help: Use a while loop, compare to 0 and float(‘inf’) >>> x = 1. >>> while x > 0: ... x /= 2. ... print x >>> x = 2.** 32 >>> while x != float(‘inf’): ... x *= 2. ... print x 4.94065645841e-324 How precise is your installation? Help: Add smaller and smaller numbers to 1 while comparing the sum to 1 >>> x = 1. >>> while 1. + x > 1.: ... x /= 2. ... print x How much is the smallest / largest positive float? Help: Use a while loop, compare to 0 and float(‘inf’) >>> x = 1. >>> while x > 0: ... x /= 2. ... print x >>> x = 2.** 32 >>> while x != float(‘inf’): ... x *= 2. ... print x 4.94065645841e-324 8.98846567431e+307 How precise is your installation? Help: Add smaller and smaller numbers to 1 while comparing the sum to 1 >>> x = 1. >>> while 1. + x > 1.: ... x /= 2. ... print x 1.11022302463e-16 “A complex system that works is invariably found to have evolved from a simple system that worked” – John Gall Large scale program Python's package notion (‘import’) Module, Package Differences between import and execfile or %run Documentation Run script from Unix shell with $> python myscript.py Alternative: Make script executable $> chmod u+x myscript.py Add “ShiBang” line at the top of the file: Execute: $> ./myscript.py Or run script from within Python >>> execfile(“myscript.py”) Or within IPython: In [1]: %run myscript[.py] #!/usr/bin/python Multiline codes (functions, classes, …) or interactive command list Documentation Usually in the __doc__ function within the “module” (file) Ex: def __doc__: ... “”” this is my documentation for this module. usage: >helloWorld() ... “”” Directly in each block: def helloWorld(): ... “”” This is a dummy function “”” ... print “hello World!” Usage: >>> import myscript In[1]: %run myscript >>> help(myscript) In[2]: helloWorld? >>> help(myscript.helloWorld) In[3]: helloWorld?? __doc__ : introduce documentation In general, any __<name>__ is a special function/variable Select instructions that are executed only when running the script and not when import is called everything in the “__main__” block __main__ : Top-level script environment if __name__ == “__main__”: <...> Reading a photometric dataset to generate HR diagram Python's package notion (‘import’) Mandatory packages for doing science: numpy, matplotlib File access: reading from / writing to files Lists, array and indexes, slices For loop statements: while, for Functions: classical and lambda functions Basic visualization: plot, scatter, contours, colormaps First python scripts Read the file Manipulate the data Build the figures File description: data.ascii made of a header (lines start with #) 6424 lines into 9 columns (space as separator) First solution: elbow-grease (like in C) f = open('data.ascii', 'r') # 'r' by default Reading: f.read(), f.readline(), f.readlines() f.close() Reading is done by simply looping over the lines Long solution, you rarely need to be at this low-level. Learning interest… #include <stdio> #include <stdlib> If you had to do it with C code… int main(void) { int ncols = 9, nlines = 6424, i, j; string filename = “data.ascii”; FILE *file; float *data = (float *) malloc( ncols * nlines * sizeof(float) ); file = fopen(filename, "r"); for ( i = 0; i < nlines; i++ ){ for ( j=0; j < ncols; j++ ){ fscanf( file, "%f ", data[j + ncols*i] ); } } fclose(file); ... free(data); } …I am not sure this works though >>> >>> >>> >>> >>> >>> import numpy # low-level does not mean very-low ncol = 9 nrow = 6424 data = numpy.array(ncol, nrow) # common array (one type of data) l = '#' while l[0] == '#': #Skip the comments ... l = f.readline() """Warning: at the end of the loop, l contains the first data row """ >>> for i in range(nrow): # <=> i in [0, ..., 6423] ... """ We should check whether ‘\n’ is present at the end of l ... and cut it out. In any other language, we may have something like: ... if (l[len(l)-1] == '\n') { l = l[0:len(l)-2] } ... But not in python: """ ... if l[-1] == '\n': ... l = l[:-1] ... l = l.split() # split according to spaces ... for k in range(ncol): # Now fill data array ... """Python only reads strings that have to be converted then """ ... data[k,i] = float(l[k]) ... l = f.readline() # go on to the next line >>> f.close() A bit more in a reptile way: >>> data = [] # declare an empty list >>> for l in f: #loop over each line of f (don’t care about #lines) ... if l[0] != '#': ... # Magical `inline` for loop, faster than a classic `for` ... cl = [ float(k) for k in l[:-1].split()] ... data.append( cl ) # add processed line to data >>> f.close() """ data constains n-cells of m-values. In order to access the data by columns, we either transpose the "matrix", or create a sub-list:""" >>> d = numpy.transpose(data) >>> bflux = d[0] >>> blux = [ k[0] for k in data ] Finally in python: (think functions) >>> def readAscii(fichier, transpose = False): """ Function that will read the file """ ... f = open('data.ascii') ... # lambda function <=> math function, x-->f(x) ... convertToFloat = lambda x: [ float(k) for k in x ] ... data = [ convertToFloat( l[:-1].split() ) \ for l in f if l[0] != '#' ] ... f.close() ... if transpose: ... return numpy.transpose(data) ... else: ... return data >>> data = readAscii(f) >>> d = readAscii(f, transpose = True) # pédagoqique ;) BUT: Someone already coded that for you Either numpy or scipy or matplotlib or pylab >>> data = numpy.loadtxt('data.ascii') Variations: >>> data = numpy.loadtxt('data.ascii', comments='#', skiprows=0, delimiter=None ) >>> bflux, bmag, bmerr, sep, vflux, vmag, vmerr, x, y = numpy.loadtxt('data.ascii', unpack=True) loadtxt() is usually fine But sometimes can be limited for big files Other options are available e.g. ATpy, asciitable, … http://astropython.org/resources for other references ATpy is also able to read and write fits, votable, sql … Calibrate the dataset: We apply operations to the whole arrays >>> bmag = bmag - 1.3 >>> vmag -= 2.5 # classic # better for memory space sereval possibilities for visualization 2D: matplotlib/pylab (other exists) 3D, matplotlib (still young), mayavi2 (openGL), … We will use matplotlib/pylab in this workshop >>> from numpy import loadtxt # only what we need >>> import pylab as plt # import & alias to plt >>> bflux, bmag, bmerr, sep, vflux, vmag, vmerr, x, y = loadtxt('data.ascii', unpack=True) >>> bmag += 1.3; vmag += 2.5 # zeropoint correction Plot star positions: >>> plt.figure() # declare a new figure >>> plt.plot(x, y, ',', color='0.0') # do the plot # do not hesitate with help cmds: `plt.plot?` >>> plt.xlabel('X (in pixels)') # set labels >>> plt.ylabel('Y (in pixels)') >>> plt.grid() >>> plt.show() # display (when not auto) Plot star positions: >>> plt.figure() # declare a new figure >>> plt.plot(x, y, ',', color='0.0') # do the plot # do not hesitate with help cmds: `plt.plot?` >>> plt.xlabel('X (in pixels)') # set labels >>> plt.ylabel('Y (in pixels)') >>> plt.grid() >>> plt.show() # display (when not auto) Build HR diagram of the stars: >>> plt.figure() >>> plt.scatter(bmag-vmag, vmag) # `plt.plot` would also work ;) >>> plt.xlabel(r'B$-$V') >>> plt.ylabel(r'V') >>> plt.xlim(1.9,3.4) >>> plt.ylim(6.7,15) >>> plt.ylim(plt.ylim()[::-1]) # reverse y-axis, or also `plt.ylim(15, 6.7)` >>> plt.savefig(‘myfigure.png’) # can export into eps,ps,pdf,png,svg… >>> plt.show() Build HR diagram of the stars: >>> plt.figure() >>> plt.scatter(bmag-vmag, vmag) # `plt.plot` would also work ;) >>> plt.xlabel(r'B$-$V') >>> plt.ylabel(r'V') >>> plt.xlim(1.9,3.4) >>> plt.ylim(6.7,15) >>> plt.ylim(plt.ylim()[::-1]) # reverse y-axis, or also `plt.ylim(15, 6.7)` >>> plt.savefig(‘myfigure.png’) # can export into eps,ps,pdf,png,svg… >>> plt.show() Now explore matplotlib/pylab Generate a density plot of this HR diagram numpy.where, pylab.histogram2d pylab.contour and/or pylab.contourf pylab.figure and/or pylab.subplot import numpy, pylab # get the data bflux, bmag, bmerr, sep, vflux, vmag, vmerr, x, y = numpy.loadtxt('data.ascii', unpack=True) bmag += 1.3 vmag += 2.5 # Select space region of interest ind = numpy.where( ( (bmag-vmag) > -0.5 ) & ( (bmag-vmag) < 2. ) & ( vmag > 10. ) & ( vmag < 22. ) ) #do the histogram h = pylab.histogram2d( bmag[ind]-vmag[ind], vmag[ind], bins=100 ) image = h[0] extent=[ h[1][0], h[1][-1], h[2][0], h[2][-1] ] #axes limits #display part ax = pylab.subplot(121) #Create a subplot, 1 row 2 cols, #1 ax.imshow( image.transpose(), extent=extent, origin='lower', aspect='auto‘, cmap=pylab.cm.Blues) ax.set_xlim(-0.5,1.0) ; ax.set_ylim(20, 11) ax.set_ylabel(r'V‘) ; ax.set_xlabel(r'B$-$V') ax.set_title('histogram2d') ax = pylab.subplot(122) #Create a subplot, 1 row 2 cols, #2 levels = [0.1, 3., 6., 9., 12., 15., 18., 21.] ax.contourf( image.transpose(), extent=extent, levels=levels,cmap=pylab.cm.Blues ) ax.set_xlim(-0.5,1.0); ax.set_ylim(20, 11) ax.set_ylabel(r'V') ; ax.set_xlabel(r'B$-$V') ax.set_title('contourf') pylab.savefig('m3_ex2.png') ax = pylab.gcf().get_axes() # get subplots ax1 = ax[1] # second subplot (contourf) ax1.set_ytitle('') # delete tick labels on the y-axis pylab.setp(ax1.get_yticklabels(), visible=False) # shrink space between the plots pylab.subplots_adjust(wspace=0.001) Use some more magic and add minor ticks, change fonts, ... (battery included in figure.py) “It’s a trap!” Data: values, pointers and references Location, affectation in C Location, affectation in Python Differences between equality and identity int x, y; Memory location x type: int Memory location y type: int ? ? Memory location x type: int Memory location y type: int int x, y; ? ? x = 2; 2 ? Memory location x type: int Memory location y type: int int x, y; ? ? x = 2; 2 ? x = 3; 3 ? Memory location x type: int Memory location y type: int int x, y; ? ? x = 2; 2 ? x = 3; 3 ? y = x; 3 3 Value copied Variables Objects (names) (data) >>> planet = “Pluto” >> Planet “Pluto”, type: str Variables Objects (names) (data) >>> planet = “Pluto” >>> planet = “Tiamat” Planet “Pluto”, type: str “Tiamat”, type: str Variables Objects (names) >>> planet = “Pluto” (data) >>> planet = “Tiamat” Planet >>> legend = planet legend “Tiamat”, type: str Variables Objects (names) >>> planet = "Pluto“ (data) >>> planet = “Tiamat“ Planet >>> legend = planet legend >>> del planet “Tiamat”, type: str Variables Objects (names) >>> planet = "Pluto“ (data) >>> planet = “Tiamat“ >>> legend = planet legend >>> del planet >>> del legend “Tiamat”, type: str >>> l = [1,2,3] >>> m = [1,2,3] # l and m different objects >>> l == m # Equality (of values) tested with == >>> l is m # Identity (of objects) tested with is >>> l[0] = 42 >>> print l >>> print m # m[0] = ? >>> l = [1,2,3]; m = l # l == m ? l is m ? >>> l = numpy.array([1,2,3]) >>> m = l.copy() # l and m different objects >>> l == m # Equality (of values) tested with == >>> l is m # Identity (of objects) tested with is >>> l[0] = 42 >>> print l >>> print m # m[0] = ? Particular variables in Python: tuples, lists, arrays, & dictionaries tuples, lists, arrays dictionaries Looping philosophy Use case: Find out which the 10 most frequent words in a text are. “It feels like somebody is using your head for origami, but it doesn't bend that way…” Tuples and lists are ordered sequences of values of different types Tuples are immutable: does not support item assignment Lists are mutable In [1]: t = (1,2,3); u = [1,2,3] In [2]: t[0], t[-1], t[1:], u[0], u[-1], u[1:] Out[2]: (1, 3, (2, 3), 1, 3, [2, 3]) In [3]: t[0] = 10 TypeError : 'tuple' object does not support item assignment In [4]: u[0] = 10; u.append(4) #ok In [5]: v = list(t) In [6]: del u[0] # v is now a list of t-values # delete one element “It feels like somebody is using your head for origami, but it doesn't bend that way…” In [1]: t = ([1,2,3],[4,5]) In [2]: t[0] = 42 # list is immutable TypeError : ‘tuple’ object does not support item assignment In [3]: t[0][0] = 42; print t Out[3]: ([42,2,3],[4,5]) In [4]: t+t # same result with lists Out[4]: ([42,2,3],[4,5],[42,2,3],[4,5]) In [4]: t*2 # duplicate!! (also with lists) Out[4]: ([42,2,3],[4,5],[42,2,3],[4,5]) Kind of named lists or like c-structures Dictionaries are unordered, mutable set of (key, value) pairs In [1]: d = {'spam': 2, 'eggs': 3} Order is not preserved Use of keys instead of indexes to access data (kind of named array) In [2]: d[‘spam’] Out[2]: 2 In [3]: d[‘spam’] = 10000 In [4]: d[‘bacon’] KeyError: ... Kind of named lists or like c-structures In [2]: d[‘spam’] Out[2]: 2 In [3]: d[‘spam’] = 10000 In [4]: d[‘bacon’] KeyError: ... Either check in advance: >>> d.has_key('bacon') Or use get method with def. val. >>> d.get('bacon', 42) Adding items: >>> d['bacon'] = 13 Removing items: >>> del d['spam'] Keys may be of any immutable type but unique (value are any) [1,2] (1,2) ‘str’, val… To be or not to be iterable… Loops use while or for constructions while constructions are “classical” For construction aims at iterating over an object Lists, tuples and dictionaries are iterable objects Any object with a __iter__ function is iterable >>> for i in range(10): … # range(10) = [0,…,9] >>> d = {'spam': 2, 'eggs': 3, 'bacon': 5} >>> for k in d: ... print k, d[k] >>> for k in d.keys(): # Clearer equivalent version ... print k, d[k] >>> for k in sorted(d.keys()): # Sorted loop ... print k, d[k] Special for-loop construction >>> d = {'spam': 2, 'eggs': 3, 'bacon': 5} >>> val = []: >>> for k in d.keys(): ... if d[k] > 2.5: ... val.append( (k,d[k]) ) >>> v = [ (k,d[k]) for k in d.keys() if d[k] > 2.5 ] Automatic and optimized allocation, hence faster! Find out which the 10 most frequent words in a text are. “The First Men in the Moon” by H. G. Wells (Herbert Georges) http://www.gutenberg.org/ebooks/1013 (UTF-8 plain text format) Help: use dictionaries or lists or tuples, and for-loops Useful help: import re – this let you use regular expressions (see in particular: re.sub) Open the text file, read the full text into a variable Filter for `\r\n` sequences Be careful of case sensitivity, punctuation and so on… during word counts Some commands (not all needed): ▪ <str>.split, <str>.join, <str>.upper, <str>.lower, <str>.replace … ▪ <list>.count, re.sub, numpy.argsort, numpy.unique ... codesources/textCount.py Subplots & linked axes <…>.subplot(…,sharex=ax) Power Law !!! codesources/textCount.py def count(): f = open('firstMenOnTheMoon.txt') #open file # =============== Get all lines and set to lower case ================ txt = [ l.lower() for l in f if l != '\r\n'] f.close() #file can be closed now txt = ''.join(txt) # ================== filter & replace punctuation ==================== txt = txt.replace('\r\n', ' ').replace('--', ' ') w = re.sub('[.,:\'"!?;()]',' ', txt) w = w.split() # ========================== Do the stats ============================ #words = set(w) # also works, generate a sequence of unique words words = numpy.unique(w) counts = [ w.count(kwd) for kwd in words ] sind = numpy.argsort(counts)[::-1] sc = numpy.array(counts)[sind] sw = numpy.array(words)[sind] print “Statistics:\ncontains %d words\n“ % (len(w) for k in range(10): print "Word '%s' used '%d' times" % (sw[k], sc[k]) return sw,sc Fitting a black body model on the Sun’s spectrum Fitting data with arbitrary functions Using scipy: scipy.optimize, scipy.optimize.leastsq reading FITS files with pyfits Lambda functions We will fit a blackbody model onto the solar spectrum. First, we need to code the blackbody model Planck constant: speed of light: Boltzmann constant: h = 6.626069e-34 (J.s) c = 2.99792458e8 (m.s) k = 1.3807e-23 (J/K) 1 parameter: T Read the Solar spectrum: sun.fits Optimize using a least square method import numpy from numpy import exp # h c k Module global variables = 6.626069e-34 # Planck (J.s) = 2.99792458e8 # speed of light (m.s) = 1.3807e-23 # Boltzmann (J/K) def blackbody(wave, temp): """ wave in angstroms, """ BBCONST = h*c/(k * 1.e-10) exparg = BBCONST/(numpy.array(wave)*temp) return 2*h*c**2*1e50/( wave**5 * (numpy.exp(exparg)-1.) ) from scipy import optimize import pylab Def fakeData(): #Pure Black body with noise dx = numpy.linspace(1000,20000, 1000) dy = blackbody(dx, 7000) dy /= dy.sum() dy += numpy.random.normal(0,1e-4, len(dy)) return dx, dy # The target function fitfunc = lambda p, x: p[0]*blackbody(x,p[1]) # Distance to the target function errfunc = lambda p, x, y: fitfunc(p,x)-y dx, dy = fakeData() # Test case p0 = [1., 6000] # Initial guess for the parameters p1, success = optimize.leastsq(errfunc, p0[:], args=(dx, dy) ) pylab.figure() pylab.plot(dx,dy, ',') pylab.plot(dx,fitfunc(p1,dx), color='#ff0000') pylab.xlabel(r'Wavelength ($\rm{\AA}$)') pylab.ylabel(r'Flux (Arbitrary Units)') A short digression using pyfits In [1]: h = pyfits.open('sun.fits') In [2]: h Out[2]: [<pyfits.core.PrimaryHDU object at 0xae03bcc>, <pyfits.core.BinTableHDU object at 0xb9f57ec>] In [3]: print h[1].header XTENSION= 'BINTABLE' / binary table extension <...> In [4]: h[1].data.field('flux') Out[4]: array([ 6.19000000e-05, 5.61400000e-04, 4.90100000e-03, ..., 2.95000000e-10, 1.01000000e-10, 3.38000000e-12]) In [5]: x = h[1].data.field('wavelength') In [6]: y = h[1].data.field('flux') Using pyfits def getSunData(): import pyfits t = pyfits.open(‘sun.fits’) x = h[1].data.field('wavelength') y = h[1].data.field('flux') t.close() return x, y Now we can go back to our problem and fit the Solar spectrum Adapt the previous method to the new dataset Plot the result Look at the statistics What is the official temperature of the Sun? x, y = getSunData() # Sun spectrum Case ind = numpy.where((x>1e3) & (y< 2.0e4)) p1, success = optimize.leastsq(errfunc, p0[:], args=(x[ind], y[ind])) e1 = errfunc(p1, x[ind], y[ind]) e1 *= e1 print " Norm = %g, Temperature = %f, error=%f " % (p1[0],p1[1], numpy.sqrt(e1.sum()) ) pylab.figure() pylab.plot(x[ind], y[ind]) pylab.plot(x[ind], fitfunc(p1, x[ind]), color='#ff0000') pylab.xlabel(r'Wavelength ($\rm{\AA}$)') pylab.ylabel(r'Flux (Arbitrary Units)') Norm = 8.91261e-14, Temperature = 5466.278346, error=4.390300 Norm = 6.65493e-14, Temperature = 5800.000000, error=5.359789 Glue It All Together With Python With SAMPy Simple example: (requires mytables, mysamp and dependencies) In [1]: import mysamp, mytables In [2]: c = mysamp.Client() [SAMP] Warning (2011-03-21T11:08:53.322297): Another SAMP Hub is already running. Start-up process aborted. <=> Aladin & Topcat launched before already providing a hub In [3]: t = mytables.load('toto.csv') Auto-detected input type: csv In [4]: c['t'] = t Auto-detected input type: fits Overwrite existing file 't.fits'. Data exported into t.fits Auto-detected input type: fits Objects oriented codes, graphical interface, Fortran-C interaction Everything is object in python int, float, …, variables, functions, classes, modules … “a class is a construct that is used as a blueprint to create objects of that class.” – Hum… A class is a definition of a set of variables associated with functions Equivalent to c/fortran structures + functions An object is an instance of one class Example: a class “Vector” A vector of 3 coordinates Function/”methods”: addition, scalar product, projection … Coding a Vector class in Python: class Vector(object): # “object” is the top-lvl type def __init__(self, x,y,z): # constructor self.x, self.y, self.z = x,y,z def sum(self, p): # addition: p1+p2 return Vector(self.x+p.x, self.y+p.y, self.z+p.z) def scalarProd(self,p): # scalar product p1*p2 return self.x*p.x + self.y*p.y + self.z*p.z Instanciate Vector objects: >>> >>> >>> >>> a = Vector(1,0,0) b = Vector(0,1,0) c = a.scalarProd(b) print c # ??? What about print a? Print is equivalent to a.__str__ (“convert a to string”) Hence you need to define (overload) a __str__ method Example: def __str__(self): return “Vect(%f,%f,%f)” % (self.x, self.y, self.z) Sometimes it is better to use ‘+’ and ‘*’ Reading and usage are easier You need to define operator functions: __add__, __sub__, __mul__, __div__, __pow__ … More at http://docs.python.org/reference/datamodel.html Now comes the good part: exercise Try to implement a complete “Vector class” of your own Manage a string representation of the vectors Manage the operators ‘+’, ‘– ’ as common sense Implement a scalar product solution based on ‘*’ Implement vector product based on ‘**’ (power operator) class Vector(object): def __init__(self, x, y=None, z=None): # Class constructor: from list/tuple or 3 values if hasattr(x,'__len__'): assert len(x) == 3 self.x, self.y, self.z = x else: assert((y != None) & (z != None)) self.x, self.y, self.z = x,y,z def __repr__(self): # String repr, called when >>> Vector([1,2,3]) <CR> s = self.__str__()+", " s += object.__repr__(self) return s def __str__(self): # String Conversion return "Vect(%s)" % [self.x,self.y,self.z] def __add__(self, p): if isinstance(p, Vector): # if p Vector, sum of p1*p2 return Vector(self.x+p.x, self.y+p.y, self.z+p.z) else: # if not p Vector return Vector(self.x+p, self.y+p, self.z+p) def __mul__(self,p): if isinstance(p, Vector): # scalar product p1*p2 return self.x*p.x + self.y*p.y + self.z*p.z else: return Vector(self.x*p, self.y*p, self.z*p) def __pow__(self,p): if isinstance(p, Vector): # vector product p1**p2 x = self.y*p.z-self.z*p.y y = self.z*p.x-self.x*p.z z = self.x*p.y-self.y*p.x return Vector(x,y,z) else: return Vector(self.x**p, self.y**p, self.z**p) In [1]: In [2]: In [3]: In [4]: Out[4]: import vector a = vector.Vector(1,0,0) b = vector.Vector([0,1,0]) a+1 Vect([2, 1, 1]), <vector.Vector object at 0x9544bcc> In [5]: print a+b Vect([1, 1, 0]) In [6]: print (a+1)**2 Vect([4, 1, 1]) In [7]: print a**b, b**a Vect([0, 0, 1]) Vect([0, 0, -1]) Graphical User Interface from python btw JK, may I be cannibal? ;) Many classical libraries are available Tcl/TK, QT, Wx, WxGTK … An example is better than words… Example taken from Joachim K. homeworks “Applet” to show the physics of a 2 body gravitational interaction One body is fixed and the second “orbits” around Trajectory of the second body is plotted ▪ Depending on its initial position and velocity Source in `OneBody.py` C/C++, Fortran interaction Python presents many advantages However, too slow for highly intensive numerical computations You can interface Python with C or Fortran codes Best of all worlds Pre- and post- processing in Python Intensive part in Fortran (or in C) F2py is one solution (many others exists) Requires NumPy (> = 1.3) and Scipy_distutils (>= 0.2.2) A few lines in your fortran source code Interface file (F2py is able to auto-generate simple interface that you can edit) Example (F90/95): subroutine norme(a,b,c) real(8), intent(in) :: a,b Real(8), intent(out) :: c c = sqrt( a*a + b*b ) end subroutine norme Generate the interface with f2py $> f2py -c norme.f90 -m vect \ --fcompiler=gfortran --f90flags=-03 <norme.f90> From a python shell: >>> import vect >>> c = vect.norme(3,4) >>> print c 5.0 Documentation is also generated by f2py: >>> print vect.norme.__doc__ norme – Function signature: c = norme(a,b) Required arguments: a : input float b : input float Return objects: c: float Works with lists and numpy.arrays subroutine norme(a,c,n) integer :: n, i real(8), dimension(n), intent(in) :: a !f2py optional, depend(a)::n=len(a) real(8), intent(out) :: c real(8) :: sumc sumc = 0. do i=1,n sumc = sumc + a(i)*a(i) end do c = sqrt( sumc ) end subroutine norme From a python shell: >>> from vect import * >>> a = [2,3,4] # a given list >>> type(a) <type ‘list’> >>> norme(a) 5.3851648071345037 >>> from numpy import array >>> a = array([2,3,4]) # an equivalent numpy.array >>> type(a) <type ‘numpy.ndarray’> >>> norme(a) 5.3851648071345037 You can do much more Callbacks are available (call python from fortran) Encapsulate fortran within python objects Compile fortran code inside python scripts … Visit http://www.scipy.org/F2py C/C++ wrapper for python and many more (19 languages!) Please don’t shoot… I haven’t use it yet! Just trying to answer some of your questions http://www.swig.org SWIG is an interface compiler connects programs written in C and C++ with scripting languages such as Perl, Python … works by taking the declarations found in C/C++ header files generates the wrapper code that scripting languages need to access the underlying C/C++ code. SWIG currently generates wrapper code for nineteen different target languages: Allegro CL, C#, CFFI, CLISP, Chicken, D, Go, Guile, Java, Lua, Modula-3, Mzscheme, OCAML, Octave, Perl, PHP, Python, R, Ruby, Tcl, UFFI So you want to get going in a hurry? To illustrate the use of SWIG, suppose you have some C functions you want added to Python /* File : example.c */ #include <time.h> double My_variable = 3.0; int fact(int n) { if (n <= 1) return 1; else return n*fact(n-1); } int my_mod(int x, int y) { return (x%y); } char *get_time() { time_t ltime; time(<ime); return ctime(<ime); } Interface file Now, in order to add these files to your favorite language, you need to write an "interface file" which is the input to SWIG. An interface file for these C functions might look like this : /* example.i */ %module example %{ /* Put header files here or function declarations */ extern double My_variable; extern int fact(int n); extern int my_mod(int x, int y); extern char *get_time(); %} extern double My_variable; extern int fact(int n); extern int my_mod(int x, int y); extern char *get_time(); Building a Python module Turning C code into a Python module is also easy. Simply do the following: $> swig -python example.i $> gcc -c example.c example_wrap.c \ -I/usr/local/include/python2.1 $> ld -shared example.o example_wrap.o -o _example.so We can now use the Python module as follows : >>> import example >>> example.fact(5) 120 >>> example.my_mod(7,3) 1 >>> example.get_time() 'Sun Feb 11 23:01:07 1996’ More at http://www.swig.org Mayavi2, Blender A short example from the official website: def getData(): import numpy as np # Create data, z Gaussian function of x and y. np.random.seed(12345) x = 4*(np.random.random(500) – 0.5) y = 4*(np.random.random(500) – 0.5) f = lambda (x, y): np.exp(-(x**2 + y**2)) z = f(x, y) return (x,y,z) from enthought.mayavi import mlab mlab.figure(1, fgcolor=(0, 0, 0), bgcolor=(1, 1, 1)) # get Dataset x, y, z = getData() # Visualize the points pts = mlab.points3d(x, y, z, z, scale_mode=‘none’, scale_factor=0.2) # Create and visualize the mesh mesh = mlab.pipeline.delaunay2d(pts) surf = mlab.pipeline.surface(mesh) mlab.view(47, 57, 8.2, (0.1, 0.15, 0.14)) mlab.show() This example: how you can use Mayavi interactively to visualize numpy arrays while doing numerical work with scipy. It assumes that you are now familiar with numerical Python tools. Use-case: study the trajectories of a particle in a potential. very common problem: orbits, particles… visualization of the potential and the trajectories The potential: a periodic lattice, immersed in a parabolic confinement. We will shake this potential and see how the particle jumps from a hole of the lattice to another. The parabolic confinement is there to limit the excursions of the particle More at http://code.enthought.com/projects/mayavi Code in the appendixes Who said science is not fun? Blender is a free open-source 3D content creation suite Movies are being made with it (e.g. Sintel, available on youtube) A suite like 3D Studio Max: 3D scenes, movies, animations Full integration of python: scripting scenes and movies. 2.5x version: Full UI in python (calculations might be in C) And you can even fly-in !! Non scientific model of M31 (Andromeda Galaxy) https://bitbucket.org/quiark/galaxy/.../galaxy2.blend Just imagine… N-body simulation of the Antennae Galaxy from Florent Renaud Python to build the scene Blender to fly-in Setuptools mainly! (easy_install, pip) How to install python? Apt-get install python / fink install python / python-x.xx.msi(ish?) About the “packages”: First install `setuptools` (or `pip` equiv., python package index) ▪ Apt-get / fink / windows binary Then easy_install [opts] <package name> ▪ E.g. easy_install numpy, easy_install pyfits… ▪ Is also able to update packages (-U option) Quid if I am not root of my computer? VirtualEnv is the fix. http://pypi.python.org/pypi/virtualenv ▪ Isolated python envir. (bottle) You got out of the egg ... You have now the materials to go on by yourself… Soon you will dance to the flute songs One day is quite short to show everything… This example: how you can use Mayavi interactively to visualize numpy arrays while doing numerical work with scipy. It assumes that you are now familiar with numerical Python tools. Use-case: study the trajectories of a particle in a potential. very common problem: orbits, particles… visualization of the potential and the trajectories The potential: a periodic lattice, immersed in a parabolic confinement. We will shake this potential and see how the particle jumps from a hole of the lattice to another. The parabolic confinement is there to limit the excursions of the particle More at http://code.enthought.com/projects/mayavi Use the mlab module to interactively visualize the data. For this it is best to use an interactive Python shell Mayavi2 application has a built-in interactive shell also Let us visualize the 3D isosurfaces of the potential: >>> from enthought.mayavi import mlab >>> mlab.contour3d(X, Y, Z, V) We can interact with the visualization Rotation, zoom, … In particular vary the iso-surfaces (maya-tree) We want to study the motion of a particle in this potential We derive the corresponding forces from the gradient of the potential. We create a gradient function def gradient(f, x, y, z, d=0.01): """ Return the gradient of f in (x, y, z). """ fx = f(x+d, y, z); fx_ = f(x-d, y, z) fy = f(x, y+d, z); fy_ = f(x, y-d, z) fz = f(x, y, z+d); fz_ = f(x, y, z-d) return ( (fx-fx_)/(2*d), (fy-fy_)/(2*d), (fz-fz_)/(2*d) ) Let us visualize the corresponding vector field. only along a cut for X=50, and every three points on our grid Vx, Vy, Vz = gradient( V, X[50,::3,::3], Y[50,::3,::3], Z[50,::3,::3] ) mlab.quiver3d( X[50,::3,::3], Y[50,::3,::3], Z[50,::3,::3], Vx, Vy, Vz, scale_factor=-0.2, color=(1,1,1) ) $> ipython -wthread import numpy as np def V(x, y, z): """ Potential: A 3D sinusoidal lattice with a parabolic confinement. """ return np.cos(10*x) + np.cos(10*y) + p.cos(10*z) + 2*(x**2 + y**2 + z**2) We would like to see what it looks like in 3D. To do this we create a 3D grid of points, and sample it on these later X, Y, Z = np.mgrid[-2:2:100j, -2:2:100j, -2:2:100j] A good view of the potential: turning off auto contours choosing -0.5 and 15 We can also change the colors used by selecting a different LUT (Look Up Table). select a paired as it separates well levels. To get a better view of the potential: display more contours, but closed contours hide their interior. One solution: a cut plane. Right-click on the IsoSurface node and add a ScalarCutPlane. You can move the cut plane by clicking on it and dragging. Add a colorbar: mlab.colorbar(title=“Potential”) We can use scipy to integrate the trajectories. first define a dynamical flow of the system The flow is used by every ODE (ordinary differential equation) solver The dynamics is made of the force deriving from the potential def flow(r, t): """ The dynamical flow of the system """ x, y, z, vx, vy, vz = r fx, fy, fz = gradient(V, x-.2*np.sin(6*t), y-.2*np.sin(6*t+1), z-.2*np.sin(6*t+2)) return np.array( (vx, vy, vz, -fx - 0.3*vx, -fy - 0.3*vy, -fz - 0.3*vz) ) Now we can integrate the trajectory: from scipy.integrate import odeint R0 = (0, 0, 0, 0, 0, 0) # Initial conditions t = np.linspace(0, 50, 500) # Time Sampling R = odeint(flow, R0, t) Finally we can plot the trajectories: x, y, z, vx, vy, vz = R.T trajectory = mlab.plot3d(x, y, z, t, colormap=“hot”,tube_radius=None) mlab.colorbar(trajectory, title=“Time”)