Scientific Computing using Python Two-day PET KY884 tutorial at HEAT Center, Aberdeen, MD Mon-Tue, July 20-21, 2009, 8:30am - 4:30pm Dr. Craig Rasmussen and Dr. Sameer Shende [email protected] http://www.paratools.com/arl09 1 Tutorial Outline • Basic Python • IPython : Interactive Python • Advanced Python • NumPy : High performance arrays for Python • Matplotlib : Basic plotting tools for Python • MPI4py : Parallel programming with Python • F2py and SWIG : Language interoperability • Extra Credit – SciPy and SAGE : mathematical and scientific computing – Traits : Typing system for Python – Dune : A Python CCA-compliant component framework • Portable performance evaluation using TAU • Labs 2 Tutorial Goals • This tutorial is intended to introduce Python as a tool for high-productivity scientific software development. • Today you should leave here with a better understanding of… – The basics of Python, particularly for scientific and numerical computing. – Toolkits and packages relevant to specific numerical tasks. – How Python is similar to tools like MATLAB or GNU Octave. – How Python might be used as a component architecture. • …And most importantly, – Python makes scientific programming easy, quick, and fairly painless, leaving more time to think about science and not programming. 3 SECTION 1 INTRODUCTION 4 What Is Python? Python is an interpreted language that allows you to accomplish what you would with a compiled language, but without the complexity.
• Interpreted and interactive • Truly modular • Easy to learn and use • Automatic garbage collection • Object-oriented and functional • Fun • Free and portable • Scientific packages: NumPy, PySparse, FFTW, plotting, MPI4py, Co-Array Python 5 Running Python $$ ipython Python 2.5.1 (… Feb 6 2009 …) IPython 0.9.1 -- An enhanced … # the IPython prompt In [1]: # the Python prompt, when the native # Python interpreter is run >>> '''a comment line …''' # another comment style # import a module >>> import math # what is math >>> type(math) <type 'module'> # what is in math >>> dir(math) ['__doc__', …, 'cos', …, 'pi', …] >>> cos(pi) NameError: name 'cos' is not defined # import into global namespace >>> from math import * >>> cos(pi) -1.0 6 Interactive Calculator # adding two values >>> 3 + 4 7 # setting a variable >>> a = 3 >>> a 3 # checking a variable's type >>> type(a) <type 'int'> # an arbitrarily long integer >>> a = 1204386828483L >>> type(a) <type 'long'> # real numbers >>> b = 2.4/2 >>> print b 1.2 >>> type(b) <type 'float'> # complex numbers >>> c = 2 + 1.5j >>> c (2+1.5j) # multiplication >>> a = 3 >>> a*c (6+4.5j) 7 Online Python Documentation # command line documentation $$ pydoc math Help on module math: # interactive documentation >>> math.__doc__ …mathematical functions defined… >>> help(math) Help on module math: >>> dir(math) ['__doc__', …] >>> type(math) <type 'module'> # ipython documentation In [3]: math.<TAB> …math.pi math.sin math.sqrt In [4]: math? Type: module Base Class: <type 'module'> In [5]: import numpy In [6]: numpy?? Source: """ NumPy ========= … 8 Labs! Lab: Explore and Calculate 9 Strings # creating strings >>> s1 = "Hello " >>> s2 = 'world!' # string operations >>> s = s1 + s2 >>> print s Hello world! >>> 3*s1 'Hello Hello Hello ' >>> len(s) 12 # the string module >>> import string # split space delimited words >>> word_list = string.split(s) >>> print word_list ['Hello', 'world!'] >>> string.join(word_list) 'Hello world!' >>> string.replace(s,'world', 'class') 'Hello class!' 10 Labs!
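Before the lab: the string examples above use the Python 2 `string` module, whose functions have since become methods on `str` itself. A minimal sketch of the same operations in modern Python:

```python
# Modern equivalents of the slide's string.split/join/replace calls
# (string-module functions are now str methods).
s1 = "Hello "
s2 = "world!"
s = s1 + s2

words = s.split()                      # split on whitespace
joined = " ".join(words)               # join with a space
swapped = s.replace("world", "class")  # substring replacement

print(3 * s1)    # 'Hello Hello Hello '
print(len(s))    # 12
print(words)     # ['Hello', 'world!']
print(joined)    # Hello world!
print(swapped)   # Hello class!
```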
Lab: Strings 11 Tuples and Lists: sequence objects # a tuple is a collection of objects >>> t = (44,) # a tuple of length one >>> t = (1,2,3) >>> print t (1, 2, 3) # accessing elements >>> t[0] 1 >>> t[1] = 22 TypeError: 'tuple' object does not support item assignment # a list is a mutable collection >>> l = [1,22,3,3,4,5] >>> l [1,22,3,3,4,5] >>> l[1] = 2 >>> l [1,2,3,3,4,5] >>> del l[2] >>> l [1,2,3,4,5] >>> len(l) 5 # in or not in >>> 4 in l True >>> 4 not in l False 12 More on Lists # negative indices count # backward from the end of # the list >>> l [1,2,3,4,5] >>> l[-1] 5 >>> l[-2] 4 >>> dir(list) ['__add__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'] # what does count do? >>> list.count <method 'count' of 'list'…> >>> help(list.count) L.count(value) -> integer -- return number of occurrences of value 13 Slicing var[lower:upper] Slices extract a portion of a sequence (e.g., a list or a NumPy array). Mathematically the range is [lower, upper). >>> print l [1,2,3,4,5] # some ways to return the entire sequence >>> l[0:5] >>> l[0:] >>> l[:5] >>> l[:] [1,2,3,4,5] # middle three elements >>> l[1:4] >>> l[1:-1] >>> l[-4:-1] [2,3,4] # last two elements >>> l[3:] >>> l[-2:] [4,5] 14 Dictionaries: key/value pairs Dictionaries store key/value pairs. Indexing a dictionary by a key returns the value associated with it. # create data >>> pos = [1.0, 2.0, 3.0, 4.0, 5.0] >>> T = [9.9, 8.8, 7.7, 6.6, 5.5] # store data in a dictionary >>> data_dict = {'position': pos, 'temperature': T} # access elements >>> data_dict['position'] [1.0, 2.0, 3.0, 4.0, 5.0] 15 Labs! Lab: Sequence Objects 16 If Statements and Loops # if/elif/else example >>> print l [1,2,3,4,5] >>> if 3 in l: … print 'yes' … elif 3 not in l: … print 'no' … else: … print 'impossible!' … yes
< hit return > # for loop examples >>> for i in range(1,3): print i … < hit return > 1 2 >>> for x in l: print x … < hit return > 1 … # while loop example >>> i = 1 >>> while i < 3: print i; i += 1 … < hit return > 1 2 17 Functions # create a function in funcs.py def Celcius_to_F(T_C): T_F = (9./5.)*T_C + 32. return T_F ''' Note: indentation is used for scoping, no braces {} ''' # run from command line and # start up with created file $ python -i funcs.py >>> dir() ['Celcius_to_F', '__builtins__',… ' >>> Celsius_to_F = Celcius_to_F >>> Celsius_to_F <function Celsius_to_F at …> >>> Celsius_to_F(0) 32.0 >>> C = 100. >>> F = Celsius_to_F(C) >>> print F 212.0 18 Labs! Lab: Functions 19 Classes # create a class in Complex.py class Complex: '''A simple Complex class''' def __init__(self, real, imag): '''Create and initialize''' self.real = real self.imag = imag def norm(self): '''Return the L2 Norm''' import math d = math.hypot(self.real,self.imag) return d #end class Complex # run from command line $ python -i Complex.py # help will display comments >>> help(Complex) Help on class Complex in module … # create a Complex object >>> c = Complex(3.0, -4.0) # print Complex attributes >>> c.real 3.0 >>> c.imag -4.0 # execute a Complex method >>> c.norm() 5.0 20 Labs! Lab: Classes 21 SECTION 2 Interactive Python IPython 22 IPython Summary • An enhanced interactive Python shell • An architecture for interactive parallel computing • IPython contains – – – – Object introspection System shell access Special interactive commands Efficient environment for Python code development • Embeddable interpreter for your own programs • Inspired by Matlab • Interactive testing of threaded graphical toolkits 23 Running IPython $$ ipython -pylab IPython 0.9.1 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object'. ?object also works, ?? 
Prints # %fun_name are magic commands # get function info In [1]: %history? Print input history (_i<n> variables), with most recent last. In [2]: %history 1: #?%history 2: _ip.magic("history ") 24 More IPython Commands # some shell commands are available In [27]: ls 01-Lab-Explore.ppt* 04-Lab-Functions.ppt* # TAB completion for more information about objects In [28]: %<TAB> %alias %autocall %autoindent %automagic %bg %bookmark %cd %clear %color_info %colors %cpaste %debug %dhist %dirs %doctest_mode # retrieve Out[] values In [29]: 4/2 Out[29]: 2 In [30]: b = Out[29] In [31]: print b 2 25 More IPython Commands # %run runs a Python script and loads its data into interactive # namespace; useful for programming In [32]: %run hello_script Hello # ! gives access to shell commands In [33]: !date Tue Jul 7 23:04:37 MDT 2009 # look at logfile (see %logstart and %logstop) In [34]: !cat ipython_log.py #log# Automatic Logger file. *** THIS MUST BE THE FIRST LINE *** #log# DO NOT CHANGE THIS LINE OR THE TWO BELOW #log# opts = Struct({'__allownew': True, 'logfile': 'ipython_log.py'}) #log# args = [] #log# It is safe to make manual edits below here. #log#----------------------------------------------------------------------- _ip.magic("run hello") 26 Interactive Shell Recap – Object introspection (? and ??)
– Searching in the local namespace ('TAB') – Numbered input/output prompts with command history – User-extensible 'magic' commands ('%') – Alias facility for defining your own system aliases – Complete system shell access – Background execution of Python commands in a separate thread – Expand Python variables when calling the system shell – Filesystem navigation via a magic ('%cd') command – Bookmark with ('%bookmark') – A lightweight persistence framework via the ('%store') command – Automatic indentation (optional) – Macro system for quickly re-executing multiple lines of previous input – Session logging and restoring – Auto-parentheses ('sin 3') – Easy debugger access (%run -d) – Profiler support (%prun and %run -p) 27 Labs! Lab: IPython Try out ipython commands as time allows 28 SECTION 3 Advanced Python 29 Regular Expressions # The re module provides regular expression tools for advanced # string processing. >>> import re # Get a refresher on regular expressions >>> help(re) >>> help(re.findall) >>> help(re.sub) >>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest') ['foot', 'fell', 'fastest'] >>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat') 'cat in the hat' 30 Labs!
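Before the lab: the two `re` examples above can also be written with precompiled pattern objects, which is convenient when a pattern is reused many times. A small sketch:

```python
import re

# Precompiled versions of the slide's patterns.
f_words = re.compile(r'\bf[a-z]*')      # words starting with 'f'
doubled = re.compile(r'(\b[a-z]+) \1')  # a word immediately repeated

print(f_words.findall('which foot or hand fell fastest'))
# ['foot', 'fell', 'fastest']
print(doubled.sub(r'\1', 'cat in the the hat'))
# cat in the hat
```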
Lab: Regular Expressions Try out the re module as time allows 31 Fun With Functions # a filter returns those items # for which the given function returns True >>> def f(x): return x < 3 >>> filter(f, [0,1,2,3,4,5,6,7]) [0, 1, 2] # map applies the given function to each item in a sequence >>> def square(x): return x*x >>> map(square, range(7)) [0, 1, 4, 9, 16, 25, 36] # lambda functions are small functions with no name (anonymous) >>> map(lambda x: x*x, range(7)) [0, 1, 4, 9, 16, 25, 36] 32 More Fun With Functions # reduce returns a single value by applying a binary function >>> reduce(lambda x,y: x+y, [0,1,2,3]) 6 # list comprehensions provide an easy way to create lists # [an expression followed by for then zero or more for or if] >>> vec = [2, 4, 6] >>> [3*x for x in vec] [6, 12, 18] >>> [3*x for x in vec if x > 3] [12, 18] >>> [x*y for x in vec for y in [3, 2, -1]] [6, 4, -2, 12, 8, -4, 18, 12, -6] 33 Labs! Lab: Fun with Functions 34 Input/Output # dir(str) shows methods on str object # a string representation of a number >>> x = 3.25 >>> 'number is' + repr(x) 'number is3.25' # pad with zeros >>> '12'.zfill(5) '00012' # explicit formatting (Python 2.6) >>> 'The value of {0} is approximately {1:.3f}.'.format('PI', math.pi) The value of PI is approximately 3.142. 35 File I/O # file objects need to be opened # some modes - 'w' (write), 'r' (read), 'a' (append) # - 'r+' (read+write), 'rb', (read binary) >>> f = open('/tmp/workfile', 'w') >>> print f <open file '/tmp/workfile', mode 'w' at 80a0960> >>> help(f) >>> f.write('I want my binky!') >>> f.close() >>> f = open('/tmp/workfile', 'r+') >>> f.readline() 'I want my binky!' 36 Search and Replace # file substitute.py import re fin = open('fadd.f90', 'r') p = re.compile('(subroutine)') try: while True: s = fin.readline() if s == "": break sout = p.sub('SUBROUTINE', s) print sout.replace('\n', "") # sys.stdout.write simpler except: print "Finished reading, file" # is this line reached? 
fin.close() 37 Iterators over Containers Iterators require two methods: next() and __iter__() Fibonacci: f[n] = f[n-1] + f[n-2]; with f[0] = f[1] = 1 class fibnum: def __init__(self): self.fn1 = 1 # f[n-1] self.fn2 = 1 # f[n-2] def next(self): # next() is the heart of any iterator oldfn2 = self.fn2 self.fn2 = self.fn1 self.fn1 = self.fn1 + oldfn2 return oldfn2 def __iter__(self): return self 38 Iterators… # use Fibonacci iterator class >>> from fibnum import * # construct a member of the class >>> f = fibnum() >>> l = [] >>> for i in f: l.append(i) if i > 20: break >>> l [1, 1, 2, 3, 5, 8, 13, 21] # thanks to (and for more information on iterators): # http://heather.cs.ucdavis.edu/~matloff/Python/PyIterGen.pdf 39 Binary I/O Anticipating the next module, NumPy (numerical arrays), you may want to look at the file PVReadBin.py to see how binary I/O is done in a practical application. 40 Labs! Lab: Input/Output Try out file I/O as time allows 41 SECTION 4 NUMERICAL PYTHON 42 NumPy • Offers Matlab-like capabilities within Python • Information – http://numpy.scipy.org/ • Download – http://sourceforge.net/projects/numpy/files/ • Numeric developers (initial coding Jim Hugunin) – Paul Dubois – Travis Oliphant – Konrad Hinsen – Charles Waldman 43 Creating Arrays: Basics >>> from numpy import * >>> a = array([1.1, 2.2, 3.3]) >>> print a [ 1.1 2.2 3.3] # two-dimensional array >>> b = array(([1,2,3],[4,5,6])) >>> print b [[1 2 3] [4 5 6]] >>> print ones((2,3), float) [[1. 1. 1.] [1. 1.
1.]] >>> print resize(b,(2,6)) [[1 2 3 4 5 6] [1 2 3 4 5 6]] >>> print reshape(b,(3,2)) [[1 2] [3 4] [5 6]] >>> b.shape (2, 3) 44 Creating Arrays: Strategies # use reshape with range >>> a = reshape(range(12),(2,6)) >>> print a [[0 1 2 3 4 5] [6 7 8 9 10 11]] # set an entire row (or column) >>> a[0,:] = range(1,12,2) >>> print a [[1 3 5 7 9 11] [6 7 8 9 10 11]] # loop to set individual values >>> for i in range(50): … for j in range(100): … a[i,j] = i + j # call user function set(x,y) >>> shape = (50,100) >>> a = fromfunction(set, shape) # use scipy.io module to read # values from a file into an # array >>> a = zeros([50,100]) 45 Simple Array Operations >>> a = arange(1,4); print a [1 2 3] # addition (element wise) >>> print 3 + a [4 5 6] # multiplication (element wise) >>> print 3*a [3 6 9] # it really is element wise >>> print a*a [1 4 9] # power: a**b -> power(a,b) >>> print a**a [1 4 27] # functions: sin(x), log(x), … >>> print sqrt(a*a) [1. 2. 3.] # comparison: ==, >, and, … >>> print a < a [False False False] # reductions >>> add.reduce(a) 6 46 Slicing Arrays >>> a = reshape(range(9),(3,3)) >>> print a [[0 1 2] [3 4 5] [6 7 8]] # second column >>> print a[:,1] [1 4 7] # last row >>> print a[-1,:] [6 7 8] # slices are references to # original memory, true for # all array/sequence assignment # work on the first row of a >>> b = a[0,:] >>> b[0] = 99 ; print b [99 1 2] # what is a[0,:] now? >>> print a[0,:] [99 1 2] 47 Array Temporaries and ufuncs >>> a = arange(10) >>> b = arange(10,20) # What will the following do? >>> a = a + b # Does "a" reference old or new # memory? Answer, new memory! # Watch out for array # temporaries with large arrays! # Is the following different? >>> c = a + b >>> a = c # Universal functions, ufuncs >>> type(add) <type 'numpy.ufunc'> # add is a binary operator # in place operation
>>> a = add(a,b) >>> add(a,b,a) 48 Array Functions >>> a = arange(1,11); print a [1 2 3 4 5 6 7 8 9 10] # create an index array >>> ind = [0, 5, 8] # take values from the array >>> print take(a,ind) [1 6 9] >>> print a[ind] [1 6 9] # put values to the array >>> put(a,ind,[0,0,0]); print a [0 2 3 4 5 0 7 8 0 10] >>> a[ind] = (0,0,0); print a [0 2 3 4 5 0 7 8 0 10] >>> a = reshape(range(9),(3,3)) >>> b = transpose(a); print b [[0 3 6] [1 4 7] [2 5 8]] >>> print diagonal(b) [0 4 8] >>> print trace(b) 12 >>> print where(b >= 3, 9, 0) [[0 9 9] [0 9 9] [0 9 9]] 49 Labs! Lab: NumPy Basics 50 Linear Algebra >>> import numpy.linalg as la >>> dir(la) ['Heigenvalues', 'Heigenvectors', 'LinAlgError', 'ScipyTest', '__builtins__', '__doc__', '__file__', '__name__', '__path__', 'cholesky', 'cholesky_decomposition', 'det', 'determinant', 'eig', 'eigenvalues', 'eigenvectors', 'eigh', 'eigvals', 'eigvalsh', 'generalized_inverse', 'info', 'inv', 'inverse', 'lapack_lite', 'linalg', 'linear_least_squares', 'lstsq', 'pinv', 'singular_value_decomposition', 'solve', 'solve_linear_equations', 'svd', 'test'] 51 Linear Algebra: Eigenvalues # assume a exists already # a multiple-valued function >>> val,vec = la.eigenvectors(a) >>> print a [[ 1. 0. 0. 0. ] [ 0. 2. 0. 0.01] [ 0. 0. 5. 0. ] [ 0. 0.01 0. 2.5 ]] # eigenvalues >>> print val [ 2.50019992 1.99980008 1. 5. ] # eigenvectors >>> print vec [[0. 0.01998801 0. 0.99980022] [0. 0.99980022 0. -0.01998801] [1. 0. 0. 0. ] [0. 0. 1. 0. ]] >>> la.determinant(a) 24.999500000000001 52 Linear Algebra: solve linear equations # assume a, q exist already # a variable can ref. a function >>> solv = la.solve_linear_equations >>> print a [[ 1. 0. 0. 0. ] [ 0. 2. 0. 0.01] [ 0. 0. 5. 0. ] [ 0. 0.01 0. 2.5 ]] >>> print q [ 1. 4.04 15. 10.02] # solve linear system, a*b = q >>> b = solv(a,q) >>> print b [1. 2. 3. 4.] >>> q_new = matrixmultiply(a,b) >>> print q_new [ 1. 4.04 15.
10.02] >>> print q_new == q [True True True True] 53 Jacobi Iteration T = zeros((50,100), float) # set top boundary condition T[0,:] = 1 # iterate 10 times for t in range(10): T[1:-1,1:-1] = ( T[0:-2,1:-1] + T[2:,1:-1] + T[1:-1,0:-2] + T[1:-1,2:] ) / 4 # dump binary output to file (Numarray only) T.tofile('jacobi.out') 54 Labs! Lab: Linear Algebra 55 SECTION 5 Visualization and Imaging with Python 56 Section Overview • In this section we will cover two related topics: image processing and basic visualization. • Image processing tasks include loading, creating, and manipulating images. • Basic visualization will cover everyday plotting activities, both 2D and 3D. 57 Plotting tools • Many plotting packages available – Python Computer Graphics Kit (RenderMan) – Tkinter – Tk – Turtle graphics – Stand-alone GNUplot interface available – Python bindings to VTK, OpenGL, etc… • In this tutorial, we focus on the Matplotlib package • Unlike some of the other packages available, Matplotlib is available for nearly every platform. – Comes with http://www.scipy.org/ (Enthought) • http://matplotlib.sourceforge.net/ 58 Getting started • A simple example # easiest to run ipython with -pylab option $$ ipython -pylab In [1]: plot([1,2,3]) In [2]: ylabel('some numbers') In [3]: show() # not needed with interactive # output 59 Getting Started 60 Matplotlib with numpy • The matplotlib package is compatible with numpy arrays.
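As a side note, the data half of the sine-plot example that follows can be built and sanity-checked without any plotting; only NumPy is assumed here, and the plotting calls then consume `t` and `s` directly:

```python
import numpy as np

# The same arrays the plotting example builds: t from 0 to 2 s in
# 0.01 s steps, and a 1 Hz sine sampled at those times.
t = np.arange(0.0, 2.0, 0.01)
s = np.sin(2 * np.pi * t)

print(t.shape)           # (200,)
print(s.max(), s.min())  # close to +1.0 and -1.0
```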
# create data using numpy t = arange(0.0, 2.0, 0.01) s = sin(2*pi*t) # create the plot plot(t, s, linewidth=1.0) # decorate the plot xlabel('time (s)') ylabel('voltage (mV)') title('About as simple as it gets, folks') grid(True) show() 61 Simple Plot 62 Improving the axis settings # get axis settings >>> axis() (0.0, 2.0, -1.0, 1.0) # changes should show up immediately >>> axis([0.0, 2.0, -1.5, 1.5]) # a plot can be saved from the menu bar 63 Better axes 64 Colorful background subplot(111, axisbg='darkslategray') t = arange(0.0, 2.0, 0.01) # first plot plot(t, sin(2*pi*t), 'y') # second plot t = arange(0.0, 2.0, 0.05) plot(t, sin(pi*t), 'ro') 65 Colorful background 66 Fill demo # data t = arange(0.0, 1.01, 0.01) s = sin(2*2*np.pi*t) # graph fill(t, s*np.exp(-5*t), 'r') grid(True) 67 Fill demo 68 Subplot demo def f(t): s1 = cos(2*pi*t); e1 = exp(-t) return multiply(s1,e1) t1 = arange(0.0, 5.0, 0.1) t2 = arange(0.0, 5.0, 0.02) t3 = arange(0.0, 2.0, 0.01) subplot(211) plot(t1, f(t1), 'bo', t2, f(t2), 'k--', markerfacecolor='green') grid(True) title('A tale of 2 subplots') ylabel('Damped oscillation') subplot(212) plot(t3, cos(2*pi*t3), 'r.') grid(True) xlabel('time (s)') ylabel('Undamped') 69 Subplot demo 70 A basic 3D plot example • Matplotlib can do polar plots, contours, …, and can even plot mathematical symbols using LaTeX • 3D graphics? – not so great • Matplotlib has simple 3D graphics but is limited relative to packages based on OpenGL like VTK. • Note: mplot3d module may not be loaded on your system. 71 3D example from mpl_toolkits.mplot3d import Axes3D from matplotlib import cm import random fig = figure() ax = Axes3D(fig) X = arange(-5, 5, 0.25) Y = arange(-5, 5, 0.25) X, Y = meshgrid(X, Y) R = sqrt(X**2 + Y**2) Z = sin(R) ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet) 72 3D example 73 More visualization tools • Matplotlib is pretty good for simple plots.
There are other tools out there that are quite nice: – MayaVI : http://mayavi.sourceforge.net/ – VTK : http://www.vtk.org/ – SciPy/plt : http://www.scipy.org/ – Python Computer Graphics Kit based on Pixar's RenderMan: http://cgkit.sourceforge.net/ 74 Image Processing • A commonly used package for image processing in Python is the Python Imaging Library (PIL). • http://www.pythonware.com/products/pil/ 75 Getting started • How to load the package – import Image, ImageOps, … • Image module contains main class to load and represent images. • PIL comes with many additional modules for specialized operations 76 Additional PIL Modules • ImageDraw : Basic 2D graphics for Image objects • ImageEnhance : Image enhancement operations • ImageFile : File operations, including parser • ImageFilter : A set of pre-defined filter operations • ImageOps : A set of pre-defined common operations • ImagePath : Express vector graphics, usable with ImageDraw • ImageSequence : Implements iterator for image sequences or frames. • ImageStat : Various statistical operations for Images 77 Loading an image • Loading an image is simple, no need to explicitly specify format. import Image im = Image.open("image.jpg") 78 Supported Image Formats • Most image formats people wish to use are available. – JPEG – GIF – BMP – TGA, TIFF – PNG – XBM, XPM – PDF, EPS – And many other formats that aren't as commonly used – CUR, DCX, FLI, FLC, FPX, GBR, GD, ICO, IM, IMT, MIC, MCIDAS, PCD, PCX, PPM, PSD, SGI, SUN • Not all are fully read/write capable - check the latest docs for status. 79 Image representation • Images are represented with the PIL Image class. • Often we will want to write algorithms that treat the image as a NumPy array of grayscale or RGB values. • It is simple to convert images to and from Image objects and numpy arrays. 80 Converting the image to a NumPy array def PIL2NUMARRAY(im): if im.mode not in ("L", "F"): raise ValueError, "image must be single-layer."
ar = array(im.getdata()) ar.shape = im.size[0], im.size[1] return ar Note: This works for mode "L", or monochrome, images. RGB would require more work - similar concept though. 81 Converting a NumPy array back to an Image def NUMARRAY2PIL(ar,size): im = Image.new("L",size) im.putdata(reshape(ar,(size[0]*size[1],))) return im Notice that we need to flatten the 2D array into a 1D array for the PIL structure. Size need not be explicitly passed in - one can query ar for the shape and size. 82 Saving an image • Much like reading, writing images is also very simple. • Many formats available. – Either explicitly specify output format, or let PIL infer it from the filename extension. outfname = "somefile.jpg" imgout = NUMARRAY2PIL(workarray,size) imgout.save(outfname,"JPEG") 83 Labs! Lab: Graphics 84 SECTION 6 Parallel programming with Python: MPI4Py and Co-Array Python 85 IPython Parallelism • IPython supports many styles of parallelism – Single program, multiple data (SPMD) parallelism – Multiple program, multiple data (MPMD) parallelism – Message passing using MPI • Getting Started with Parallel IPython – Starting ipcluster – Using FURLs – Using a Multi-Engine Client (MEC) – %px • First we look at using MPI with mpi4py 86 Parallel Computing with mpi4py mpi4py is primarily run from a script # file par_hello.py from mpi4py import MPI # communication in MPI is through a communicator comm = MPI.COMM_WORLD rank = comm.Get_rank() size = comm.Get_size() print "Hello, rank", rank, "of", size 87 Running an MPI Script mpiexec runs python on multiple processors concurrently $$ python par_hello.py Hello, rank 0 of 1 $$ mpiexec -n 4 python par_hello.py Hello, rank 2 of 4 Hello, rank 3 of 4 Hello, rank 1 of 4 Hello, rank 0 of 4 # notice that execution by rank is not ordered 88 Passing Information in a Ring # file ring.py from mpi4py import MPI import numpy as np # Create message buffers message_in = np.zeros(3, dtype=np.int) message_out = np.zeros(3, dtype=np.int) comm =
MPI.COMM_WORLD rank = comm.Get_rank() size = comm.Get_size() # Calc the rank of the previous and next process in the ring next = (rank + 1) % size; prev = (rank + size - 1) % size; 89 More ring.py # Let message be (prev,rank,next) message_out[:] = (prev,rank,next) # Must break symmetry by one sending and others receiving if rank == 0: comm.Send([message_out, MPI.INT], dest=next, tag=11) else: comm.Recv([message_in, MPI.INT], source=prev, tag=11) # Reverse order if rank == 0: comm.Recv([message_in, MPI.INT], source=prev, tag=11) else: comm.Send([message_out, MPI.INT], dest=next, tag=11) print rank, ':', message_in 90 Running ring.py $$ python ring.py 0 : [0 0 0] $$ mpiexec -n 4 python ring.py 1 : [3 0 1] 2 : [0 1 2] 3 : [1 2 3] 0 : [2 3 0] 91 Interactive Parallel Computing First start server processes on remote (or local) cluster: $$ ipcluster -n 2 & Starting controller: Controller PID: 5351 Starting engines: Engines PIDs: [5353, 5354] Log files: /home/rasmussn/.ipython/log/ipcluster-5351-* Your cluster is up and running. For interactive use, you can make a MultiEngineClient with: from IPython.kernel import client mec = client.MultiEngineClient() You can then cleanly stop the cluster from IPython using: mec.kill(controller=True) You can also hit Ctrl-C to stop it, or use from the cmd line: kill -INT 5350 92 Local IPython Client On local client: In [1]: from IPython.kernel import client In [2]: mec = client.MultiEngineClient() In [3]: mec.get_ids() Out[3]: [0,1,2,3] In [4]: %px? Executes the given python command on the active IPython Controller. To activate a Controller in IPython, first create it and then call the activate() method. In [5]: mec.activate() 93 More Parallel IPython In [6]: %px a=3 Parallel execution on engines: all Out[6]: <Results List> [0] In [1]: a=3 [1] In [1]: a=3 In [7]: %px print a Parallel execution on engines: all Out[7]: <Results List> [0] In [2]: print a [0] Out[2]: 3 [1] In [2]: print a [1] Out[2]: 3 94 Result method >>> %result?
Print the result of command i on all engines of the active controller >>> result 1 <Results List> [0] In [1]: a=3 [1] In [1]: a=3 95 What Can I Do in Parallel? • What can you imagine doing with multiple Python engines? – Execute code? – mec.execute # execute a function on a set of nodes – mec.map # map a function and distribute data to nodes – mec.run # run code from a file on engines – Exchange data? – mec.scatter # distribute a sequence to nodes – mec.gather # gather a sequence from nodes – mec.push # push python objects to nodes • Targets parameter in many of the mec methods selects the particular set of engines 96 Labs! Lab: Parallel IPython Try out parallel ipython as time permits 97 Why Co-Array Python • Scientists like Python – Powerful scripting language – Numerous extension modules – NumPy, PySparse, … – Gives an environment like MatLab • But, scientists often need parallel computers • MPI4Py (and others) was developed • But let's try something besides explicit message passing • Co-Array Python borrows from Co-Array Fortran 98 Co-Array Programming Model • SPMD model • All processors run Python interpreter via PyMPI • Local view of array data – local, not global indexing • Adds another array dimension for remote memory access – the co-dimension • Uses ARMCI for communication – portable Cray shmem library 99 Co-Array Python Syntax # # put to remote processor number 1 # T(1)[3,3] = T[3,3] # # get from remote processor number 8 # T[4,5] = T(8)[4,5] 100 Co-Array Python Example • Jacobi problem on 2-dimensional grid • Dirichlet boundary conditions • Average of four nearest neighbors 101 Computational Domain [Figure: each process 'me' holds a block of rows, with ghost boundary cells shared with its neighbors 'up' and 'dn'] 102 Initialization from CoArray import * nProcs = mpi.size me = mpi.rank M = 200; N = M/nProcs T = coarray((N+2, M+2), Numeric.Float) up = me - 1 dn = me + 1 if me == 0: up = None if me == nProcs - 1: dn = None 103 Jacobi Update (inner loop): I # # update interior values (no communication) # T[1:-1,1:-1]
= ( T[0:-2,1:-1] + T[2:,1:-1] + T[1:-1,0:-2] + T[1:-1,2:] ) / 4.0 104 Jacobi Update (inner loop): II [Figure: each process exchanges its edge rows with the ghost boundary rows of its neighbors up and dn] # # exchange boundary conditions # mpi.barrier() if up != None: T(up)[-1:,:] = T[ 1,:] if dn != None: T(dn)[ 0:,:] = T[-2,:] mpi.barrier() 105 Timing Data

Table 1. Timing data for Co-Array Python (CoP), MPI (PyMPI) and C MPI (C) versions (times in seconds):
Size        CoPcomm  CoPtotal  PyMPIcomm  PyMPItotal  Ccomm  Ctotal
128x128     0.017    0.33      0.07       0.38        0.013  0.05
256x256     0.023    1.28      0.13       1.41        0.015  0.14
512x512     0.041    6.28      0.28       6.47        0.020  0.55
1024x1024   0.068    28.4      0.52       28.78       0.032  2.49
2048x2048   0.089    113.5     -          -           0.047  10.13

• Most of time spent in computation (Python 1/10 C performance) • Co-Array Python communication rivals C (Python 1/2 C performance) – Co-Array Python communication much faster than PyMPI – better data marshalling – ARMCI 106 Conclusions • Co-Arrays allow direct addressing of remote memory – e.g. T(remote)[local] • Explicit parallelism • Parallel programming made easy • Fun • Explore new programming models (Co-Arrays) • Looking at Chapel – implicit parallelism – global view of memory (for indexing) 107 Status • Not entirely finished – reason a research note, not a full paper – but available to "play" with – [email protected] • Hope to finish soon and put on Scientific Python web site – http://www.scipy.org/ 108 SECTION 7 Language Interoperability 109 Language Interoperability • Python features many tools to make binding Python to languages like C/C++ and Fortran 77/95 easy. • We will cover: – F2py: Fortran to Python wrapper generator – SWIG: the Simplified Wrapper and Interface Generator • For Fortran, we also consider: – Fortran interoperability standard – Fortran Transformational Tools (FTT) project 110 Fortran Example: fadd.f90 • Consider the following simple Fortran subroutine to add two arrays subroutine fadd(A, B, C, N) real, dimension(N) :: A, B, C integer :: N ! do j = 1, N ! C(j) = A(j) + B(j) !
end do C = A + B end subroutine fadd 111 Annotate for F2py • F2py works better if you let it know what the variables are doing (intents) ! file fadd.f90 ! subroutine fadd(A, B, C, N) real, dimension(N) :: A, B, C integer :: N !F2PY intent(out) :: C !F2PY intent(hide) :: N !F2PY real, dimension(N) :: A, B, C C = A + B end subroutine fadd 112 Running F2py • Once you have annotated the source file, run f2py to generate the Python bindings $$ f2py -c -m fadd fadd.f90 $$ ls fadd.f90 fadd.so 113 Try out the new module • Run the new fadd module from ipython In [1]: from fadd import * In [2]: fadd? Docstring: fadd - Function signature: c = fadd(a,b) Required arguments: a : input rank-1 array('f') with bounds (n) b : input rank-1 array('f') with bounds (n) Return objects: c : rank-1 array('f') with bounds (n) In [3]: fadd([1,2,3,4,5], [5,4,3,2,1]) Out[3]: array([ 6., 6., 6., 6., 6.], dtype=float32) 114 Fortran Interoperability Standard • Fortran 2003 provides a standard mechanism for interoperability with C – This could be used to reduce the need for annotations – But improved tools support needed interface subroutine fadd(A, B, C, N) BIND(C, name="fadd") use, intrinsic :: ISO_C_BINDING real(C_FLOAT), intent(in), dimension(N) :: A, B real(C_FLOAT), intent(out), dimension(N) :: C integer(C_INT), value :: N end subroutine fadd end interface 115 SWIG: example.c /* File : example.c */ double My_variable = 3.0; /* Compute factorial of n */ int fact(int n) { if (n <= 1) return 1; else return n*fact(n-1); } /* Compute n mod m */ int my_mod(int n, int m) { return(n % m); } 116 SWIG: example.i /* File : example.i */ %module example %{ /* Put headers and other declarations here */ %} extern double My_variable; extern int fact(int); extern int my_mod(int n, int m); 117 Data Dictionary • Share Fortran arrays with Python by "name" • Fortran subroutine get_arrays(dict) integer :: dict integer, save :: A(3,4) integer :: rank = 2, type = INTEGER_TYPE integer :: shape(2) = (/3,4/) call
put_array(dict, "A", A, rank, shape, type)
end subroutine
• Python A = dict['A'] 118 Running SWIG • Once you have created the .i file, run swig to generate the Python bindings unix > swig -python example.i unix > ls example.c example.i example.py example_wrap.c 119 SWIG: build module • Build the example module – create setup.py – execute setup.py
unix > cat setup.py
from distutils.core import setup, Extension
setup(name="_example", version="1.0",
      ext_modules=[
          Extension("_example",
                    ["example.c", "example_wrap.c"],
          ),
      ])
unix > python setup.py config
unix > python setup.py build
120 SWIG: build module • Run the code – where is _example.so (set path) >>> from _example import * >>> # try factorial function >>> fact(5) 120 >>> # try mod function >>> my_mod(3,4) 3 >>> 3 % 4 3 121 NumPy and Fortran Arrays • Chasm provides a bridge between Fortran and Python arrays • The only way to use Fortran assumed-shape arguments with Python • Call the following routine from Python
subroutine F90_multiply(a, b, c)
  integer, pointer :: a(:,:), b(:,:), c(:,:)
  c = MatMul(a,b)   ! Fortran intrinsic
end subroutine F90_multiply
122 Labs!
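A pure-NumPy reference implementation is handy for checking the wrapped Fortran and SWIG modules in the labs below. This sketch assumes NumPy is installed; the function name simply mirrors the slides' fadd, and it computes the same C = A + B that the f2py-generated module returns:

```python
import numpy as np

def fadd(a, b):
    """Pure-NumPy reference for the Fortran fadd: C = A + B.

    Accepts anything array-like and returns a float32 array, matching
    the real(C_FLOAT)/rank-1 array('f') result of the wrapped routine.
    """
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return a + b

# Same call as the f2py-generated module on the slides:
c = fadd([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(c)  # -> [6. 6. 6. 6. 6.]
```

Comparing this against the compiled fadd.so is a quick sanity check that the intent annotations were picked up correctly.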
Lab: Language Interoperability Try out f2py and swig as time allows 123 Extra Credit: SciPy and SAGE 124 SciPy • Open-source software for mathematics, science, and engineering • Information – http://docs.scipy.org/ • Download – http://scipy.org/Download 125 scipy >>> import scipy; help(scipy)
odr                        --- Orthogonal Distance Regression
sparse.linalg.eigen.arpack --- Eigenvalue solver using iterative
fftpack                    --- Discrete Fourier Transform
sparse.linalg.eigen.lobpcg --- Locally Optimal Block Preconditioned
lib.blas                   --- Wrappers to BLAS library
sparse.linalg.eigen        --- Sparse Eigenvalue Solvers
stats                      --- Statistical Functions
lib.lapack                 --- Wrappers to LAPACK library
maxentropy                 --- Routines for fitting maximum entropy
integrate                  --- Integration routines
linalg                     --- Linear algebra routines
interpolate                --- Interpolation Tools
optimize                   --- Optimization Tools
cluster                    --- Vector Quantization / Kmeans
signal                     --- Signal Processing Tools
sparse                     --- Sparse Matrices
126 FFT Example >>> from scipy import * # create input values >>> v = zeros(1000) >>> v[:100] = 1 # take FFT >>> y = fft(v) # plot results (rearranged so zero frequency is at center) >>> x = arange(-500,500,1) >>> plot(x, abs(concatenate((y[500:],y[:500])))) 127 FFT Results Zoom 128 FFT Results Expanded 129 Optimization Example >>> from scipy import optimize as op # create function >>> def square(x): return x*x >>> op.fmin(square, -5) Optimization terminated successfully. Current function value: 0.000000 Iterations: 20 Function evaluations: 40 array([ 0.]) >>> op.anneal(square, -5) Warning: Cooled to 4.977261 at 2.23097753984 but this is not the smallest point found. (-0.068887616435477916, 5) 130 SAGE Functionality http://showmedo.com/videotutorials/ search for sage 131 Labs! Lab: SciPy Try out scipy as time allows 132 Extra Credit: Traits 133 What are traits? • Traits add typing-like facilities to Python. – Python by default has no explicit typing. • Traits are bound to fields of classes.
• Traits allow classes to dictate the types for their fields. • Furthermore, they can specify ranges! • Traits also can be inherited. Thanks to scipy.org for the original Traits slides. 134 An example
class Person(HasTraits):
    name = Str                                     # String value, default is ''
    age = Trait(35, TraitRange(1,120))
    weight = Trait(160.0, TraitRange(75.0,500.0))

# Create someone, default age is 35, 160.0 lbs weight
>>> someone = Person()
>>> someone.name = 'Bill'
>>> print '%s: %s' % (someone.name, someone.age)
Bill: 35
>>> someone.age = 75        # OK
>>> someone.weight = 'fat'  # Error, not a number.
135 Another example: Enumerated traits
class InventoryItem(HasTraits):
    name = Str                               # String value, default is ''
    stock = Trait(None, 0, 1, 2, 3, 'many')  # Enumerated list, default value is 'None'

>>> hats = InventoryItem()
>>> hats.name = 'Stetson'
>>> print '%s: %s' % (hats.name, hats.stock)
Stetson: None
>>> hats.stock = 2       # OK
>>> hats.stock = 'many'  # OK
>>> hats.stock = 4       # Error, value is not in permitted list
>>> hats.stock = None    # Error, value is not in permitted list
136 Why traits? Validation • It's nice to let the author of a class enforce checking not only of types but also of values
class Amplifier(HasTraits):
    volume = Range(0.0, 11.0, default=5.0)

# This one goes to eleven...
>>> spinal_tap = Amplifier()
>>> spinal_tap.volume
5.0
>>> spinal_tap.volume = 11.0  # OK
>>> spinal_tap.volume = 12.0  # Error, value is out of range
137 Notification (Events) • You can also use notification to trigger actions when traits change.
class Amplifier(HasTraits):
    volume = Range(0.0, 11.0, default=5.0)
    def _volume_changed(self, old, new):
        if new == 11.0:
            print "This one goes to eleven"

# This one goes to eleven...
>>> spinal_tap = Amplifier()
>>> spinal_tap.volume = 11.0
This one goes to eleven
138 Notification (Events) • You can even set up notification for classes with traits later, from the caller or class instantiator.
class Amplifier(HasTraits):
    volume = Range(0.0, 11.0, default=5.0)

# This one goes to eleven...
>>> def volume_changed(self, old, new):
...     if new == 11.0:
...         print "This one goes to eleven"
>>> spinal_tap = Amplifier()
>>> spinal_tap.on_trait_change(volume_changed, 'volume')
>>> spinal_tap.volume = 11.0
This one goes to eleven
139 Delegation model • Traits can be delegated
class Company(HasTraits):
    address = Str

class Employee(HasTraits):
    __traits__ = {
        'name': '',
        'employer': Company,
        'address': TraitDelegate('employer')
    }
• By default, the employee has the same address as the employer. • However, you can assign a new address to the employee if a different address is necessary. 140 More about Traits • Traits originally came from the GUI world – A trait may define the range for a slider widget, for example. • Clever use of traits can enforce correct units in computations. – You can check traits when two classes interact to ensure that their units match! – NASA lost a spacecraft due to this sort of issue, so it's definitely important! NASA Mars Climate Orbiter: units victim 141 Dune A Python-CCA, Rapid Prototyping Framework Craig E. Rasmussen, Matthew J. Sottile, Christopher D.
Rickett, Sung-Eun Choi, 142 Scientific Software Life Cycle: A need for two software environments (Research and Production) (diagram: Concept, Exploration, Research, Production, Porting, Maintenance and Refinement, Reuse) The challenge is to mix a rapid-prototyping environment with a production environment 143 Rapid Prototyping Framework: An Advection-Diffusion-Reaction component-application example (diagram: Dune, a Python-CCA framework for component assembly and language interoperability, connects Driver (main), Time Integrator, and Multiphysics components to Advection, Diffusion, and Reaction components) 144 A "Python" Research Component (Python, Fortran, or C/C++ under a Python cap) • A Research Component can be: – A pure Python component for rapid prototyping – Or a Fortran or C/C++ module, wrapped for reuse of production "components" 145 A Production Component (Fortran or C++ under a Python cap) • Remove the Python "cap" and the Fortran or C++ component can be linked and run in a traditional scientific application. 146 Minimal Code to be a Python-CCA Component • The requirement to be a Python CCA component is minimal (five lines of Python code)
# ----------------------------------------------------------
# Register ports with a framework services object.
#
def setServices(self, services):
    self.services = services
    ''' Provide an integrator port '''
    services.addProvidesPort(self, "integrator", "adr.integrator")
    ''' Register uses ports '''
    services.registerUsesPort("multiphysics", "adr.multiphysics")
147 Conclusions • Stable, well-designed interfaces are key to supporting the two modes of scientific computing, Research and Production, and to the sharing of components between the two environments.
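The setServices contract above is easiest to exercise against a toy stand-in for the framework's services object. The Services class below is a hypothetical mock, not the real Python-CCA API; only the addProvidesPort and registerUsesPort calls mirror the slide:

```python
class Services:
    """Hypothetical mock of a CCA framework services object."""
    def __init__(self):
        self.provides = {}  # port name -> (component, port type)
        self.uses = {}      # port name -> port type

    def addProvidesPort(self, component, name, port_type):
        self.provides[name] = (component, port_type)

    def registerUsesPort(self, name, port_type):
        self.uses[name] = port_type

class Driver:
    # The minimal component contract from the slide: one method
    # that registers what this component provides and uses.
    def setServices(self, services):
        self.services = services
        services.addProvidesPort(self, "integrator", "adr.integrator")
        services.registerUsesPort("multiphysics", "adr.multiphysics")

svc = Services()
Driver().setServices(svc)
print(sorted(svc.provides), sorted(svc.uses))
```

A real framework would then match each "uses" port against another component's "provides" port when wiring the application together.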
Fortran or C++ Python 148 Python for High Productivity Computing July 2009 Tutorial 149 Overview of packages • Python : http://www.python.org/ • SciPy : http://www.scipy.org/ • NumPy : http://numpy.scipy.org/ • FFTW : http://www.fftw.org/ • MPI4py : http://mpi4py.scipy.org/ • PySparse : http://pysparse.sourceforge.net/ • SAGE : http://www.sagemath.org/ • Traits : http://code.enthought.com/projects/traits 150 Thanks To • Eric Jones, … – Enthought • Also many others for ideas – python.org – scipy.org – Jose Unpingco – https://www.osc.edu/cms/sip/ – http://showmedo.com/videotutorials/ipython 151 Portable Performance Evaluation of Python Programs using TAU 152 Performance Evaluation of Python • Introduction to TAU – Python Instrumentation – Measurement – Analysis • Lab Session: Python and TAU 153 What is TAU? • TAU is a performance evaluation tool • It supports parallel profiling and tracing • Profiling shows you how much (total) time was spent in each routine • Tracing shows you when the events take place in each process along a timeline • TAU uses a package called PDT for automatic instrumentation of the source code • Profiling and tracing can measure time as well as hardware performance counters from your CPU • TAU can automatically instrument your source code (routines, loops, I/O, memory, phases, etc.) 
• TAU runs on all HPC platforms and it is free (BSD style license) • TAU has instrumentation, measurement and analysis tools – paraprof is TAU's 3D profile browser • To use TAU, you need to set a couple of environment variables and substitute the name of your compiler with a TAU shell script 154 Tutorial Goals • This tutorial is intended to introduce the TAU performance system as a portable performance evaluation tool for Python and Python programmers • Today you should leave here with a better understanding of… – How to instrument your Python program with TAU – Automatic instrumentation at the routine level – Manual instrumentation at the loop/statement level – Environment variables used for generating performance data – How to use the TAU profile browser, ParaProf – General familiarity with TAU's use for Fortran, C++, C, and MPI for mixed language programming with Python 155 TAU • Good References: – "The TAU Parallel Performance System," Sameer Shende and Allen D. Malony, Intl. Journal of High Performance Computing Applications, ACTS Special Issue, Spring 2006. – TAU Users Guide. – Both available from http://tau.uoregon.edu 156 Performance Evaluation • Profiling – Presents summary statistics of performance metrics – number of times a routine was invoked – exclusive, inclusive time/hpm counts spent executing it – number of instrumented child routines invoked, etc. – structure of invocations (calltrees/callgraphs) – memory, message communication sizes also tracked • Tracing – Presents when and where events took place along a global timeline – timestamped log of events – message communication events (sends/receives) are tracked – shows when and where messages were sent – large volume of performance data generated leads to more perturbation in the program • Most performance tools support either profiling or tracing - TAU supports both!
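TAU aside, the flat-profile idea (call counts plus exclusive vs. inclusive time per routine) can be illustrated with Python's standard cProfile module; this is a stdlib sketch, not TAU output:

```python
import cProfile
import pstats
import io

def inner():
    # Exclusive time accumulates here.
    return sum(i * i for i in range(10000))

def outer():
    # Inclusive (cumulative) time of outer includes its 50 calls to inner.
    return [inner() for _ in range(50)]

prof = cProfile.Profile()
prof.enable()
outer()
prof.disable()

buf = io.StringIO()
stats = pstats.Stats(prof, stream=buf).sort_stats("cumulative")
stats.print_stats()
report = buf.getvalue()
# The report's columns map onto the profiling terms above:
# ncalls = invocation count, tottime = exclusive, cumtime = inclusive.
print(report)
```

TAU reports the same kinds of summary statistics, but per process/thread and for compiled languages and MPI as well.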
157 TAU Parallel Performance System Goals • Multi-level performance instrumentation – Multi-language automatic source instrumentation • Flexible and configurable performance measurement • Widely-ported parallel performance profiling system – Computer system architectures and operating systems – Different programming languages and compilers • Support for multiple parallel programming paradigms – Multi-threading, message passing, mixed-mode, hybrid • Integration in complex software, systems, applications 158 TAU Performance System Architecture (diagram: event selection) 159 TAU Performance System Architecture (diagram) 160 Program Database Toolkit (PDT) (diagram: an application or library is fed to C/C++ and Fortran parsers; their intermediate language (IL) output is processed by IL analyzers into Program Database Files; DUCTAPE then drives PDBhtml for program documentation, SILOON for application component glue, CHASM for C++/F90/95 interoperability, and TAU_instr for automatic source instrumentation) 161 Automatic Source-Level Instrumentation in TAU using Program Database Toolkit (PDT) (diagram: application source is parsed by the TAU source analyzer; tau_instrumentor combines the parsed program with an instrumentation specification file to produce instrumented source) 162 Steps of Performance Evaluation • Collect basic routine-level timing profile to determine where most time is being spent • Collect routine-level hardware counter data to determine types of performance problems • Collect callpath profiles to determine sequence of events causing performance problems • Conduct finer-grained profiling and/or tracing to pinpoint performance bottlenecks – Loop-level profiling with hardware counters – Tracing of communication operations 163 Using TAU: A brief Introduction • TAU supports several measurement options (profiling, tracing, profiling with hardware counters, etc.)
• Each measurement configuration of TAU corresponds to a unique stub makefile that is generated when you configure it • To instrument source code using PDT – Choose an appropriate TAU stub makefile in <taudir>/<arch>/lib dir (or $TAU): % setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt % setenv TAU_OPTIONS '-optVerbose …' (see tau_compiler.sh -help) And use tau_f90.sh, tau_cxx.sh or tau_cc.sh as Fortran, C++ or C compilers: % mpif90 foo.f90 changes to % tau_f90.sh foo.f90 • Execute application and analyze performance data: % pprof (for text based profile display) % paraprof (for GUI) 164 TAU Measurement Configuration % cd $TAU; ls Makefile.* Makefile.tau-pdt Makefile.tau-mpi-pdt Makefile.tau-opari-openmp-mpi-pdt Makefile.tau-mpi-scalasca-epilog-pdt Makefile.tau-mpi-vampirtrace-pdt Makefile.tau-mpi-papi-pdt Makefile.tau-papi-mpi-openmp-opari-pdt Makefile.tau-pthread-pdt… • For an MPI+F90 application, you may want to start with: Makefile.tau-mpi-pdt – Supports MPI instrumentation & PDT for automatic source instrumentation – % setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt – % tau_f90.sh matrix.f90 -o matrix 165 Usage Scenarios: Routine Level Profile • Goal: What routines account for the most time? How much? • Flat profile with wallclock time: 166 Solution: Generating a flat profile with MPI % setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt % set path=(/usr/local/packages/tau/i386_linux/bin $path) OR % source $PET_HOME/src/tau.cshrc [ or tau.bashrc] on DSRC systems % make F90=tau_f90.sh (Or edit Makefile and change F90=tau_f90.sh) % mpirun -np 4 ./a.out % paraprof --pack app.ppk Move the app.ppk file to your desktop. % paraprof app.ppk 167 Usage Scenarios: Loop Level Instrumentation • Goal: What loops account for the most time? How much?
• Flat profile with wallclock time with loop instrumentation: 168 Solution: Generating a loop level profile % setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt % setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optVerbose' % cat select.tau BEGIN_INSTRUMENT_SECTION loops routine="#" END_INSTRUMENT_SECTION % set path=(/usr/local/packages/tau/i386_linux/bin $path) % make F90=tau_f90.sh (Or edit Makefile and change F90=tau_f90.sh) % mpirun -np 4 ./a.out % paraprof --pack app.ppk Move the app.ppk file to your desktop. % paraprof app.ppk 169 Usage Scenarios: Compiler-based Instrumentation • Goal: Easily generate routine level performance data using the compiler instead of PDT for parsing the source code 170 Use Compiler-Based Instrumentation % setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi % setenv TAU_OPTIONS '-optCompInst -optVerbose' % set path=(/usr/local/packages/tau/i386_linux/bin $path) % make F90=tau_f90.sh (Or edit Makefile and change F90=tau_f90.sh) % qsub run.job % paraprof --pack app.ppk Move the app.ppk file to your desktop. % paraprof app.ppk 171 Usage Scenarios: Calculate mflops in Loops • Goal: What MFlops am I getting in all loops? • Flat profile with PAPI_FP_INS/OPS and time (-multiplecounters) with loop instrumentation: 172 Generate a PAPI profile with 2 or more counters % setenv TAU_MAKEFILE $TAU/Makefile.tau-papi-mpi-pdt % setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optVerbose' % cat select.tau BEGIN_INSTRUMENT_SECTION loops routine="#" END_INSTRUMENT_SECTION % set path=(/usr/local/packages/tau/i386_linux/bin $path) % make F90=tau_f90.sh (Or edit Makefile and change F90=tau_f90.sh) % setenv TAU_METRICS TIME:PAPI_FP_INS:PAPI_L1_DCM % qsub run.job % paraprof --pack app.ppk Move the app.ppk file to your desktop. % paraprof app.ppk Choose Options -> Show Derived Panel -> Arg 1 = PAPI_FP_INS, Arg 2 = GET_TIME_OF_DAY, Operation = Divide -> Apply, choose.
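The derived-metric step above (PAPI_FP_INS divided by time gives MFLOPS) is plain element-wise arithmetic over two profiles. A small sketch with made-up per-routine counter values (the routine names and numbers are hypothetical, not real TAU output):

```python
# Hypothetical per-routine measurements: (routine, PAPI_FP_INS, time in seconds)
measurements = [
    ("jacobi_loop", 4.0e9, 2.0),
    ("exchange",    1.0e6, 0.5),
]

def mflops(fp_ins, seconds):
    # MFLOPS = floating-point instructions / time / 1e6,
    # the same Divide operation ParaProf's Derived Panel applies.
    return fp_ins / seconds / 1.0e6

derived = {name: mflops(fp, t) for name, fp, t in measurements}
print(derived)  # jacobi_loop -> 2000.0 MFLOPS, exchange -> 2.0 MFLOPS
```

ParaProf does this per thread and per event automatically once both metrics are collected in the same run.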
173 Derived Metrics in ParaProf 174 Usage Scenarios: Generating Callpath Profile • Goal: Who calls my MPI_Barrier()? Where? • Callpath profile for a given callpath depth: 175 Callpath Profile • Generates program callgraph 176 Generate a Callpath Profile % setenv TAU_MAKEFILE $TAU/Makefile.tau-callpath-mpi-pdt % set path=(/usr/local/packages/tau/i386_linux/bin $path) % make F90=tau_f90.sh (Or edit Makefile and change F90=tau_f90.sh) % setenv TAU_CALLPATH_DEPTH 100 % mpirun -np 4 ./a.out % paraprof --pack app.ppk Move the app.ppk file to your desktop. % paraprof app.ppk (Windows -> Thread -> Call Graph) NOTE: In TAU v2.18.1+, you may choose to just set: % setenv TAU_CALLPATH 1 instead of recompiling your code with the above stub makefile. Any TAU instrumented executable can generate callpath profiles. 177 Usage Scenario: Detect Memory Leaks 178 Detect Memory Leaks % setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt % setenv TAU_OPTIONS '-optDetectMemoryLeaks -optVerbose' % set path=(/usr/local/packages/tau/i386_linux/bin $path) % make F90=tau_f90.sh (Or edit Makefile and change F90=tau_f90.sh) % setenv TAU_CALLPATH_DEPTH 100 % mpirun -np 4 ./a.out % paraprof --pack app.ppk Move the app.ppk file to your desktop. % paraprof app.ppk (Windows -> Thread -> Context Event Window -> Select thread -> select...
expand tree) (Windows -> Thread -> User Event Bar Chart -> right click LEAK -> Show User Event Bar Chart) 179 Usage Scenarios: Instrument a Python program • Goal: Generate a flat profile for a Python program 180 Usage Scenarios: Instrument a Python program Original code: Create a wrapper: 181 Generate a Python Profile % setenv TAU_MAKEFILE $TAU/Makefile.tau-python-pdt % set path=(/usr/local/packages/tau/i386_linux/bin $path)
% cat wrapper.py
import tau
def OurMain():
    import foo
tau.run('OurMain()')
Uninstrumented: % ./foo.py Instrumented: % setenv PYTHONPATH $TAU/bindings-python-pdt (same options string as TAU_MAKEFILE) % setenv LD_LIBRARY_PATH $TAU/bindings-python-pdt\:$LD_LIBRARY_PATH % ./wrapper.py Wrapper invokes foo and generates performance data % pprof/paraprof 182 Usage Scenarios: Mixed Python+F90+C+pyMPI • Goal: Generate multi-level instrumentation for Python+MPI+C+F90+C++ ... 183 Generate a Multi-Language Profile w/ Python % setenv TAU_MAKEFILE $TAU/Makefile.tau-python-mpi-pdt % set path=(/usr/local/packages/tau/i386_linux/bin $path) % setenv TAU_OPTIONS '-optShared -optVerbose…' (Python needs shared object based TAU library) % make F90=tau_f90.sh CXX=tau_cxx.sh CC=tau_cc.sh (build libs, pyMPI w/TAU)
% cat wrapper.py
import tau
def OurMain():
    import App
tau.run('OurMain()')
Uninstrumented: % mpirun.lsf $PET_HOME/.unsupported/pyMPI-2.5b0/bin/pyMPI ./App.py Instrumented: % setenv PYTHONPATH $TAU/bindings-python-mpi-pdt (same options string as TAU_MAKEFILE) % setenv LD_LIBRARY_PATH $TAU/bindings-python-mpi-pdt\:$LD_LIBRARY_PATH % mpirun -np 4 /usr/local/packages/pyMPI-TAU/bin/pyMPI ./wrapper.py (Instrumented pyMPI with wrapper.py) 184 Usage Scenarios: Generating a Trace File • Goal: Identify the temporal aspect of performance. What happens in my code at a given time? When?
• Event trace visualized in Vampir/Jumpshot 185 VNG Process Timeline with PAPI Counters 186 Vampir Counter Timeline Showing I/O BW 187 Generate a Trace File % setenv TAU_MAKEFILE $TAU/lib/Makefile.tau-mpi-pdt-trace or setenv TAU_TRACE 1 (in TAU v2.18.2+) % set path=(/usr/local/packages/tau/i386_linux/bin $path) % make F90=tau_f90.sh (Or edit Makefile and change F90=tau_f90.sh) % qsub run.job % tau_treemerge.pl (merges binary traces to create tau.trc and tau.edf files) JUMPSHOT: % tau2slog2 tau.trc tau.edf -o app.slog2 % jumpshot app.slog2 OR VAMPIR: % tau2otf tau.trc tau.edf app.otf -n 4 -z (4 streams, compressed output trace) % vampir app.otf (or vng client with vngd server). 188 Usage Scenarios: Evaluate Scalability • Goal: How does my application scale? What bottlenecks occur at what core counts? • Load profiles in PerfDMF database and examine with PerfExplorer 189 Usage Scenarios: Evaluate Scalability 190 Performance Regression Testing 191 Evaluate Scalability using PerfExplorer Charts % setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt % set path=(/usr/local/packages/tau/i386_linux/bin $path) % make F90=tau_f90.sh (Or edit Makefile and change F90=tau_f90.sh) % mpirun -np 1 ./a.out % paraprof --pack 1p.ppk % mpirun -np 2 ./a.out … % paraprof --pack 2p.ppk … and so on. On your client: % perfdmf_configure --create-default (Chooses derby, blank user/passwd, yes to save passwd, defaults) % perfexplorer_configure (Yes to load schema, defaults) % paraprof (load each trial: DB -> Add Trial -> Type (Paraprof Packed Profile) -> OK) OR use perfdmf_loadtrial Then, % perfexplorer (Select experiment, Menu: Charts -> Speedup) 192 Communication Matrix Display • Goal: What is the volume of inter-process communication? Along which calling path?
193 Generate a Communication Matrix % setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt % set path=(/usr/local/packages/tau/i386_linux/bin $path) % make F90=tau_f90.sh (Or edit Makefile and change F90=tau_f90.sh) % setenv TAU_COMM_MATRIX 1 % setenv TAU_CALLPATH_DEPTH 10 % mpirun -np 4 ./a.out (setting the environment variables) % paraprof (Windows -> Communication Matrix) 194 TAU Instrumentation Approach • Support for standard program events – Routines – Classes and templates – Statement-level blocks • Support for user-defined events – Begin/End events ("user-defined timers") – Atomic events (e.g., size of memory allocated/freed) – Selection of event statistics • Support definition of "semantic" entities for mapping • Support for event groups • Instrumentation optimization (eliminate instrumentation in lightweight routines) 195 TAU Instrumentation • Flexible instrumentation mechanisms at multiple levels – Source code – manual (TAU API, TAU Component API) – automatic – C, C++, F77/90/95 (Program Database Toolkit (PDT)) – OpenMP (directive rewriting (Opari), POMP spec) – Object code – pre-instrumented libraries (e.g., MPI using PMPI) – statically-linked and dynamically-linked – Executable code – dynamic instrumentation (pre-execution) (DynInstAPI) – virtual machine instrumentation (e.g., Java using JVMPI) – Proxy Components 196 Using TAU • Configuration • Instrumentation – Manual – MPI – Wrapper interposition library – PDT – Source rewriting for C, C++, F77/90/95 – OpenMP – Directive rewriting – Component based instrumentation – Proxy components – Binary Instrumentation – DyninstAPI – Runtime Instrumentation/Rewriting binary – Java – Runtime instrumentation – Python – Runtime instrumentation • Measurement • Performance Analysis 197 TAU Measurement System Configuration • configure [OPTIONS]
  {-c++=<CC>, -cc=<cc>}   Specify C++ and C compilers
  {-pthread, -sproc}      Use pthread or SGI sproc threads
  -openmp                 Use OpenMP threads
  -jdk=<dir>
Specify Java instrumentation (JDK)
  -opari=<dir>            Specify location of Opari OpenMP tool
  -papi=<dir>             Specify location of PAPI
  -pdt=<dir>              Specify location of PDT
  -dyninst=<dir>          Specify location of DynInst Package
  -mpi[inc/lib]=<dir>     Specify MPI library instrumentation
  -shmem[inc/lib]=<dir>   Specify PSHMEM library instrumentation
  -python[inc/lib]=<dir>  Specify Python instrumentation
  -epilog=<dir>           Specify location of EPILOG
  -slog2[=<dir>]          Specify location of SLOG2/Jumpshot
  -vtf=<dir>              Specify location of VTF3 trace package
  -arch=<architecture>    Specify architecture explicitly (bgp, craycnl, ibm64, ibm64linux…)
198 TAU Measurement System Configuration • configure [OPTIONS]
  -TRACE                  Generate binary TAU traces
  -PROFILE (default)      Generate profiles (summary)
  -PROFILECALLPATH        Generate call path profiles
  -PROFILEPHASE           Generate phase based profiles
  -PROFILEMEMORY          Track heap memory for each routine
  -PROFILEHEADROOM        Track memory headroom to grow
  -MULTIPLECOUNTERS       Use hardware counters + time
  -COMPENSATE             Compensate timer overhead
  -CPUTIME                Use usertime+system time
  -PAPIWALLCLOCK          Use PAPI's wallclock time
  -PAPIVIRTUAL            Use PAPI's process virtual time
  -SGITIMERS              Use fast IRIX timers
  -LINUXTIMERS            Use fast x86 Linux timers
199 TAU Measurement Configuration – Examples • ./configure -pythoninc=/usr/include/python2.5 – Configure using Python instrumentation • ./configure -papi=/usr/local/packages/papi -pythoninc=/usr/include/python2.5 -pdt=/usr/local/pdtoolkit-3.14.1 -mpiinc=/usr/local/include -mpilib=/usr/local/lib – Use PAPI counters (one or more) with C/C++/F90/Python automatic instrumentation. Also instrument the MPI library. • Typically configure multiple measurement libraries • Each configuration creates a unique <arch>/lib/Makefile.tau-<options> stub makefile (set TAU_MAKEFILE environment variable) that corresponds to the configuration options specified.
– /usr/local/packages/tau/i386_linux/lib/Makefile.tau-mpi-python-pdt – /usr/local/packages/tau/i386_linux/lib/Makefile.tau-papi-mpi-python-pdt and bindings directory (add to PYTHONPATH & LD_LIBRARY_PATH): – /usr/local/packages/tau/i386_linux/lib/bindings-papi-mpi-python-pdt 200 TAU_SETUP: A GUI for Installing TAU 201 Tau_[cxx,cc,f90].sh – Improves Integration in Makefiles
# set TAU_MAKEFILE and TAU_OPTIONS env vars
CXX = tau_cxx.sh
F90 = tau_f90.sh
CFLAGS =
LIBS = -lm
OBJS = f1.o f2.o f3.o … fn.o

app: $(OBJS)
	$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

.cpp.o:
	$(CXX) $(CFLAGS) -c $<
202 Using TAU with Python Applications Step I: Configure TAU with Python % configure -pythoninc=/usr/include/python2.5 % make clean; make install Builds <taudir>/<arch>/lib/<bindings>/pytau.py and tau.py packages for manual and automatic instrumentation respectively % setenv PYTHONPATH $PYTHONPATH\:<taudir>/<arch>/lib/[<dir>] 203 Python Automatic Instrumentation Example
#!/usr/bin/env python
import tau
from time import sleep

def f2():
    print " In f2: Sleeping for 2 seconds "
    sleep(2)

def f1():
    print " In f1: Sleeping for 3 seconds "
    sleep(3)

def OurMain():
    f1()

tau.run('OurMain()')
Running: % setenv PYTHONPATH <tau>/<arch>/lib % ./auto.py Instruments OurMain, f1, f2, print… 204 Optimization of Program Instrumentation • Need to eliminate instrumentation in frequently executing lightweight routines • Throttling of events at runtime (default): % setenv TAU_THROTTLE 1 Turns off instrumentation in routines that execute over 100000 times (TAU_THROTTLE_NUMCALLS) and take less than 10 microseconds of inclusive time per call (TAU_THROTTLE_PERCALL) • Selective instrumentation file to filter events % tau_instrumentor [options] -f <file> • Compensation of local instrumentation overhead % configure -COMPENSATE 205 Performance Analysis • paraprof profile browser (GUI) • pprof (text based profile browser) • TAU traces can be exported to many different tools – Vampir/VNG [T.U.
Dresden] (formerly Intel (R) Trace Analyzer) – Jumpshot (bundled with TAU) [Argonne National Lab] ... 206 Building Bridges to Other Tools: TAU 207 ParaProf 208 ParaProf - SciPy Callpath Profile 209 ParaProf - Callpath Thread Relations Window 210 ParaProf 211 ParaProf - SciPy Callgraph 212 PerfDMF: Performance Data Mgmt. Framework 213 Labs! Lab: Python and TAU 214 Labs! Lab: Explore http://www.scipy.org/ 215 Labs! Lab: Explore and Calculate 216 Lab Instructions • Explore the Python web site – http://python.org/ – Browse the Documentation – Check out Topic Guides • Try the math package – Convert Celsius to Fahrenheit (F = 9/5 C + 32) – What does math.hypot do? – How is math.pi different from math.sqrt? – Remember import, dir, and help 217 Labs! Lab: Strings 218 Lab Instructions • Explore the string module – import string – dir(string) – help(string) • Try some of the string functions – string.find – … 219 Labs! Lab: Sequence Objects 220 Lab Instructions • Become familiar with lists [] – Create a list of integers and assign to variable l – Try various slices of your list – Assign list to another variable, (ll = l) – Change an element of l – Print ll, what happened? – Try list methods such as append, dir(list) • Try creating a dictionary, d = {} – Print a dictionary element using [] – Try methods, d.keys() and d.values() 221 Labs! Lab: Functions 222 Lab Instructions • In an editor, create file funcs.py • Create a function, mean(), that returns the mean of the elements in a list object – You will need to use the len function – Use for i in range(): • Test your function in Python • Modify mean() – Use for x in list: • Retest mean() 223 Labs!
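One possible solution sketch for the functions lab above, showing both loop styles the lab asks for:

```python
def mean(values):
    """Mean of a list, using the for i in range(...) style."""
    total = 0.0
    for i in range(len(values)):
        total += values[i]
    return total / len(values)

def mean2(values):
    """Same result, using the more idiomatic for x in list style."""
    total = 0.0
    for x in values:
        total += x
    return total / len(values)

print(mean([1, 2, 3, 4]), mean2([1, 2, 3, 4]))  # -> 2.5 2.5
```

The second form is preferred in Python: it iterates over the elements directly instead of over indices.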
Lab: Classes 224 Lab Instructions • Create SimpleStat class in SimpleStat.py – Create constructor that takes a list object – Add attribute, list_obj, to contain the list object – Create method, mean() – Returns the mean of the contained list object – Create method, greater_than_mean() – Returns number of elements greater than the mean – Test your class from the Python interpreter – What does type(SimpleStat) return? – Did you use import or from SimpleStat import *? 225 Labs! Lab: Numerical Array Basics 226 Lab Instructions • Import numpy – Try dir(numpy) – Browse the documentation, help(numpy) – Create and initialize arrays in different ways – How is arange() different from range()? – Try ones(), resize() and reshape() – Become friendly with slices – Try addition and multiplication with arrays – Try sum, add, diagonal, trace, transpose 227 Labs! Lab: Linear Algebra 228 Lab Instructions • Goal: Investigate a college basketball rating system – Can be applied to any sport – Multivariate linear regression to find team ratings • Copy ratings.py and games.py from disk • $ python -i games.py • >>> ratings = numpy.linalg.solve(ah, bh) – print team_names, ratings – sort ratings – ask instructor about the arrays ah and bh 229 PET Computational Environment (CE) Tool Environment • PET CE FAPOC David Cronk, [email protected] • Ptoolsrte, PAPI, PerfSuite, TAU, and KOJAK installed in $PET_HOME/pkgs on machines at DoD DSRCs • Contact [email protected] with – questions – problems – suggestions – success stories! 230 Acknowledgements "This publication was made possible through support provided by DoD HPCMP PET activities through Mississippi State University under the terms of Agreement No. #GSO4TO1BFC0060.
The opinions expressed herein are those of the author(s) and do not necessarily reflect the views of the DoD or Mississippi State University." 231 Acknowledgements • HPCMP DoD PET Program • Department of Energy • National Science Foundation • University of Tennessee – David Cronk – Joseph Thomas • University of Oregon – A. D. Malony, A. Morris, M. Sottile, W. Spear • Los Alamos National Laboratory • TU Dresden – Holger Brunst – Wolfgang Nagel • Research Centre Juelich, Germany – Bernd Mohr – Felix Wolf 232