Scientific Computing using
Python
Two day PET KY884 tutorial at HEAT Center, Aberdeen, MD
Mon-Tue, July 20-21, 2009, 8:30am - 4:30pm
Dr. Craig Rasmussen, and Dr. Sameer Shende
[email protected]
http://www.paratools.com/arl09
1
Tutorial Outline
• Basic Python
• IPython : Interactive Python
• Advanced Python
• NumPy : High performance arrays for Python
• Matplotlib : Basic plotting tools for Python
• MPI4py : Parallel programming with Python
• F2py and SWIG : Language interoperability
• Extra Credit
  – SciPy and SAGE : mathematical and scientific computing
  – Traits : Typing system for Python
  – Dune : A Python CCA-compliant component framework
• Portable performance evaluation using TAU
• Labs
2
Tutorial Goals
• This tutorial is intended to introduce Python as a tool for high
productivity scientific software development.
• Today you should leave here with a better understanding of…
– The basics of Python, particularly for scientific and numerical computing.
– Toolkits and packages relevant to specific numerical tasks.
– How Python is similar to tools like MATLAB or GNU Octave.
– How Python might be used as a component architecture.
• …And most importantly,
– Python makes scientific programming easy, quick, and fairly painless,
leaving more time to think about science rather than programming.
3
SECTION 1
INTRODUCTION
4
What Is Python?
Python is an interpreted language that allows you to
accomplish what you would with a compiled language,
but without the complexity.
• Interpreted and interactive
• Truly modular
• Easy to learn and use
• Automatic garbage collection
• Object-oriented and functional
• Fun
• Free and portable
• Extension modules, for example:
  – NumPy
  – PySparse
  – FFTW
  – Plotting
  – MPI4py
  – Co-Array Python
5
Running Python
$ ipython
Python 2.5.1 (… Feb 6 2009 …)
IPython 0.9.1 – An enhanced …

# the IPython prompt
In [1]:

# the Python prompt, when the native
# Python interpreter is run
>>>

# two comment styles
'''a comment line …'''
# another comment style

# import a module
>>> import math

# what is math
>>> type(math)
<type 'module'>

# what is in math
>>> dir(math)
['__doc__', …, 'cos', …, 'pi', …]

>>> cos(pi)
NameError: name 'cos' is not defined

# import into the global namespace
>>> from math import *
>>> cos(pi)
-1.0
6
Interactive Calculator
# adding two values
>>> 3 + 4
7
# setting a variable
>>> a = 3
>>> a
3
# checking a variable's type
>>> type(a)
<type 'int'>
# an arbitrarily long integer
>>> a = 1204386828483L
>>> type(a)
<type 'long'>
# real numbers
>>> b = 2.4/2
>>> print b
1.2
>>> type(b)
<type 'float'>
# complex numbers
>>> c = 2 + 1.5j
>>> c
(2+1.5j)
# multiplication
>>> a = 3
>>> a*c
(6+4.5j)
7
Online Python Documentation
# command line documentation
$ pydoc math
Help on module math:

>>> dir(math)
['__doc__', …]
>>> math.__doc__
'…mathematical functions defined…'
>>> help(math)
Help on module math:
>>> type(math)
<type 'module'>

# ipython documentation
In [3]: math.<TAB>
… math.pi   math.sin   math.sqrt …
In [4]: math?
Type:       module
Base Class: <type 'module'>
In [5]: import numpy
In [6]: numpy??
Source:
"""\
NumPy
=====
8
Labs!
Lab: Explore and Calculate
9
Strings
# creating strings
>>> s1 = "Hello "
>>> s2 = 'world!'
# string operations
>>> s = s1 + s2
>>> print s
Hello world!
>>> 3*s1
'Hello Hello Hello '
>>> len(s)
12
# the string module
>>> import string
# split space delimited words
>>> word_list = string.split(s)
>>> print word_list
['Hello', 'world!']
>>> string.join(word_list)
'Hello world!'
>>> string.replace(s, 'world', 'class')
'Hello class!'
10
Labs!
Lab: Strings
11
Tuples and Lists: sequence objects
# a tuple is a collection of obj
>>> t = (44,) # length of one
>>> t = (1,2,3)
>>> print t
(1,2,3)
# accessing elements
>>> t[0]
1
>>> t[1] = 22
TypeError: 'tuple' object does
not support item assignment
# a list is a mutable collection
>>> l = [1,22,3,3,4,5]
>>> l
[1,22,3,3,4,5]
>>> l[1] = 2
>>> l
[1,2,3,3,4,5]
>>> del l[2]
>>> l
[1,2,3,4,5]
>>> len(l)
5
# in or not in
>>> 4 in l
True
>>> 4 not in l
False
12
More on Lists
# negative indices count
# backward from the end of
# the list
>>> l
[1,2,3,4,5]
>>> l[-1]
5
>>> l[-2]
4
>>> dir(list)
['__add__', 'append', 'count',
 'extend', 'index', 'insert',
 'pop', 'remove', 'reverse',
 'sort']
# what does count do?
>>> list.count
<method 'count' of 'list'…>
>>> help(list.count)
L.count(value) -> integer -- return
number of occurrences of value
13
Slicing
var[lower:upper]
Slices extract a portion of a sequence (e.g., a list or a
NumPy array). Mathematically the range is [lower, upper).
>>> print l
[1,2,3,4,5]
# some ways to return the
# entire sequence
>>> l[0:5]
>>> l[0:]
>>> l[:5]
>>> l[:]
[1,2,3,4,5]
# middle three elements
>>> l[1:4]
>>> l[1:-1]
>>> l[-4:-1]
[2,3,4]
# last two elements
>>> l[3:]
>>> l[-2:]
[4,5]
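Slices also accept an optional step, var[lower:upper:step]; a small sketch using the same list (standard Python behavior, not shown on the slide):

```python
l = [1, 2, 3, 4, 5]

# every other element, starting from index 0
print(l[::2])    # [1, 3, 5]

# a negative step walks the list backward
print(l[::-1])   # [5, 4, 3, 2, 1]
```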
14
Dictionaries: key/value pairs
Dictionaries store key/value pairs. Indexing a dictionary by
a key returns the value associated with it.
# create data
>>> pos = [1.0, 2.0, 3.0, 4.0, 5.0]
>>> T = [9.9, 8.8, 7.7, 6.6, 5.5]
# store data in a dictionary
>>> data_dict = {'position': pos, 'temperature': T}
# access elements
>>> data_dict['position']
[1.0, 2.0, 3.0, 4.0, 5.0]
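Dictionaries also support membership tests and iteration over keys; a brief sketch (plain Python, the data mirrors the slide):

```python
data_dict = {'position': [1.0, 2.0, 3.0, 4.0, 5.0],
             'temperature': [9.9, 8.8, 7.7, 6.6, 5.5]}

# membership test is on the keys
print('position' in data_dict)   # True

# iterate over keys in sorted order
for key in sorted(data_dict):
    print(key)                   # position, then temperature
```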
15
Labs!
Lab: Sequence Objects
16
If Statements and Loops
# if/elif/else example
>>> print l
[1,2,3,4,5]
>>> if 3 in l:
...     print 'yes'
... elif 3 not in l:
...     print 'no'
... else:
...     print 'impossible!'
... < hit return >
yes

# for loop examples
>>> for i in range(1,3): print i
...
< hit return >
1
2
>>> for x in l: print x
...
< hit return >
1 …

# while loop example
>>> i = 1
>>> while i < 3: print i; i += 1
...
< hit return >
1
2
17
Functions
# create a function in funcs.py
def Celcius_to_F(T_C):
T_F = (9./5.)*T_C + 32.
return T_F
'''
Note: indentation is used for
scoping, no braces {}
'''
# run from command line and
# start up with created file
$ python -i funcs.py
>>> dir()
['Celcius_to_F', '__builtins__', …]
# fix the spelling with an alias
>>> Celsius_to_F = Celcius_to_F
>>> Celsius_to_F
<function Celcius_to_F at …>
>>> Celsius_to_F(0)
32.0
>>> C = 100.
>>> F = Celsius_to_F(C)
>>> print F
212.0
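Python functions can also take default and keyword arguments, which the slide does not show; a minimal sketch (the function name and defaults are illustrative):

```python
def to_fahrenheit(T, scale=9.0/5.0, offset=32.0):
    """Convert a temperature, Celsius to Fahrenheit by default."""
    return scale*T + offset

print(to_fahrenheit(100.0))              # 212.0
print(to_fahrenheit(100.0, offset=0.0))  # 180.0
```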
18
Labs!
Lab: Functions
19
Classes
# create a class in Complex.py
class Complex:
'''A simple Complex class'''
def __init__(self, real, imag):
'''Create and initialize'''
self.real = real
self.imag = imag
def norm(self):
'''Return the L2 Norm'''
import math
d = math.hypot(self.real,self.imag)
return d
#end class Complex
# run from command line
$ python -i Complex.py
# help will display comments
>>> help(Complex)
Help on class Complex in module …
# create a Complex object
>>> c = Complex(3.0, -4.0)
# print Complex attributes
>>> c.real
3.0
>>> c.imag
-4.0
# execute a Complex method
>>> c.norm()
5.0
20
Labs!
Lab: Classes
21
SECTION 2
Interactive Python
IPython
22
IPython Summary
• An enhanced interactive Python shell
• An architecture for interactive parallel computing
• IPython contains
  – Object introspection
  – System shell access
  – Special interactive commands
  – An efficient environment for Python code development
• Embeddable interpreter for your own programs
• Inspired by Matlab
• Interactive testing of threaded graphical toolkits
23
Running IPython
$ ipython -pylab
IPython 0.9.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

# %fun_name are magic commands
# get function info
In [1]: %history?
Print input history (_i<n> variables), with most recent last.
In [2]: %history
1: #?%history
2: _ip.magic("history ")
24
More IPython Commands
# some shell commands are available
In [27]: ls
01-Lab-Explore.ppt*
04-Lab-Functions.ppt*
# TAB completion for more information about objects
In [28]: %<TAB>
%alias        %autocall   %autoindent    %automagic
%bg           %bookmark   %cd            %clear
%color_info   %colors     %cpaste        %debug
%dhist        %dirs       %doctest_mode  …
# retrieve Out[] values
In [29]: 4/2
Out[29]: 2
In [30]: b = Out[29]
In [31]: print b
2
25
More IPython Commands
# %run runs a Python script and loads its data into interactive
# namespace; useful for programming
In [32]: %run hello_script
Hello
# ! gives access to shell commands
In [33]: !date
Tue Jul 7 23:04:37 MDT 2009
# look at logfile (see %logstart and %logstop)
In [34]: !cat ipython_log.py
#log# Automatic Logger file. *** THIS MUST BE THE FIRST LINE ***
#log# DO NOT CHANGE THIS LINE OR THE TWO BELOW
#log# opts = Struct({'__allownew': True, 'logfile': 'ipython_log.py'})
#log# args = []
#log# It is safe to make manual edits below here.
#log#----------------------------------------------------------------------
_ip.magic("run hello")
26
Interactive Shell Recap
– Object introspection (? and ??)
– Searching in the local namespace (TAB)
– Numbered input/output prompts with command history
– User-extensible 'magic' commands (%)
– Alias facility for defining your own system aliases
– Complete system shell access
– Background execution of Python commands in a separate thread
– Expansion of Python variables when calling the system shell
– Filesystem navigation via the %cd magic command
– Bookmarks (%bookmark)
– A lightweight persistence framework via the %store command
– Automatic indentation (optional)
– Macro system for quickly re-executing multiple lines of previous input
– Session logging and restoring
– Auto-parentheses ('sin 3')
– Easy debugger access (%run -d)
– Profiler support (%prun and %run -p)
27
Labs!
Lab: IPython
Try out ipython commands as time allows
28
SECTION 3
Advanced Python
29
Regular Expressions
# The re module provides regular expression tools for advanced
# string processing.
>>> import re
# Get a refresher on regular expressions
>>> help(re)
>>> help(re.findall)
>>> help(re.sub)
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'
30
Labs!
Lab: Regular Expressions
Try out the re module as time allows
31
Fun With Functions
# a filter returns those items
# for which the given function returns True
>>> def f(x): return x < 3
>>> filter(f, [0,1,2,3,4,5,6,7])
[0, 1, 2]
# map applies the given function to each item in a sequence
>>> def square(x): return x*x
>>> map(square, range(7))
[0, 1, 4, 9, 16, 25, 36]
# lambda functions are small functions with no name (anonymous)
>>> map(lambda x: x*x, range(7))
[0, 1, 4, 9, 16, 25, 36]
32
More Fun With Functions
# reduce returns a single value by applying a binary function
>>> reduce(lambda x,y: x+y, [0,1,2,3])
6
# list comprehensions provide an easy way to create lists
# [an expression followed by for then zero or more for or if]
>>> vec = [2, 4, 6]
>>> [3*x for x in vec]
[6, 12, 18]
>>> [3*x for x in vec if x > 3]
[12, 18]
>>> [x*y for x in vec for y in [3, 2, -1]]
[6, 4, -2, 12, 8, -4, 18, 12, -6]
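One portability note: in Python 3, reduce moved into the functools module (the import below also works on Python 2.6+); a sketch:

```python
from functools import reduce

# same reduction as above, now with the explicit import
print(reduce(lambda x, y: x + y, [0, 1, 2, 3]))  # 6

# for sums, the built-in sum() is the idiomatic spelling
print(sum([0, 1, 2, 3]))                         # 6
```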
33
Labs!
Lab: Fun with Functions
34
Input/Output
# dir(str) shows methods on str object
# a string representation of a number
>>> x = 3.25
>>> 'number is' + repr(x)
'number is3.25'
# pad with zeros
>>> '12'.zfill(5)
'00012'
# explicit formatting (Python 2.6)
>>> 'The value of {0} is approximately {1:.3f}.'.format('PI', math.pi)
'The value of PI is approximately 3.142.'
35
File I/O
# file objects need to be opened
# some modes - 'w' (write), 'r' (read), 'a' (append)
#            - 'r+' (read+write), 'rb' (read binary)
>>> f = open('/tmp/workfile', 'w')
>>> print f
<open file '/tmp/workfile', mode 'w' at 80a0960>
>>> help(f)
>>> f.write('I want my binky!')
>>> f.close()
>>> f = open('/tmp/workfile', 'r+')
>>> f.readline()
'I want my binky!'
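Since Python 2.5, the with statement closes the file automatically even if an exception occurs; a sketch of the same workfile example (a temporary path is used instead of /tmp/workfile):

```python
import os
import tempfile

# a throwaway path standing in for /tmp/workfile
path = os.path.join(tempfile.mkdtemp(), 'workfile')

# the file is closed when the block exits
with open(path, 'w') as f:
    f.write('I want my binky!')

with open(path, 'r') as f:
    print(f.readline())   # I want my binky!
```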
36
Search and Replace
# file substitute.py
import re
fin = open('fadd.f90', 'r')
p = re.compile('(subroutine)')
try:
    while True:
        s = fin.readline()
        if s == "": break
        sout = p.sub('SUBROUTINE', s)
        print sout.replace('\n', "")  # sys.stdout.write is simpler
except:
    print "Finished reading file"
# is this line reached?
fin.close()
37
Iterators over Containers
Iterators require two methods: next() and __iter__()
Fibonacci: f[n] = f[n-1] + f[n-2]; with f[0] = f[1] = 1
class fibnum:
    def __init__(self):
        self.fn1 = 1   # f[n-1]
        self.fn2 = 1   # f[n-2]
    def next(self):
        # next() is the heart of any iterator
        oldfn2   = self.fn2
        self.fn2 = self.fn1
        self.fn1 = self.fn1 + oldfn2
        return oldfn2
    def __iter__(self):
        return self
38
Iterators…
# use Fibonacci iterator class
>>> from fibnum import *
# construct a member of the class
>>> f = fibnum()
>>> l = []
>>> for i in f:
...     l.append(i)
...     if i > 20: break
>>> l
[1, 1, 2, 3, 5, 8, 13, 21]
# thanks to (and for more information on iterators):
# http://heather.cs.ucdavis.edu/~matloff/Python/PyIterGen.pdf
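The same sequence can be produced with a generator function, where yield implements the iterator protocol for you (a sketch; note that Python 3 renames the class's next() method to __next__()):

```python
def fib():
    # each yield hands back the next Fibonacci number
    fn1, fn2 = 1, 1
    while True:
        yield fn2
        fn1, fn2 = fn1 + fn2, fn1

l = []
for i in fib():
    l.append(i)
    if i > 20:
        break
print(l)   # [1, 1, 2, 3, 5, 8, 13, 21]
```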
39
Binary I/O
Anticipating the next module NumPy (numerical arrays),
you may want to look at the file PVReadBin.py to see
how binary I/O is done in a practical application.
40
Labs!
Lab: Input/Output
Try out file I/O as time allows
41
SECTION 4
NUMERICAL PYTHON
42
NumPy
• Offers MATLAB-like capabilities within Python
• Information
– http://numpy.scipy.org/
• Download
– http://sourceforge.net/projects/numpy/files/
• Numeric developers (initial coding by Jim Hugunin)
  – Paul Dubois
  – Travis Oliphant
  – Konrad Hinsen
  – Charles Waldman
43
Creating Array: Basics
>>> from numpy import *
>>> a = array([1.1, 2.2, 3.3])
>>> print a
[ 1.1 2.2 3.3]
# two-dimension array
>>> b = array(([1,2,3],[4,5,6]))
>>> print b
[[1 2 3]
[4 5 6]]
>>> print ones((2,3), float)
[[1. 1. 1.]
[1. 1. 1.]]
>>> print resize(b,(2,6))
[[1 2 3 4 5 6]
[1 2 3 4 5 6]]
>>> print reshape(b,(3,2))
[[1 2]
 [3 4]
 [5 6]]
>>> b.shape
(2, 3)
44
Creating Arrays: Strategies
# use reshape with range
>>> a = reshape(range(12),(2,6))
>>> print a
[[0 1 2 3 4 5]
 [6 7 8 9 10 11]]

# set an entire row (or column)
>>> a[0,:] = range(1,12,2)
>>> print a
[[1 3 5 7 9 11]
 [6 7 8 9 10 11]]

# loop to set individual values
>>> a = zeros([50,100])
>>> for i in range(50):
...     for j in range(100):
...         a[i,j] = i + j

# call user function set(x,y)
>>> shape = (50,100)
>>> a = fromfunction(set, shape)

# use the scipy.io module to read
# values from a file into an array
45
Simple Array Operations
>>> a = arange(1,4); print a
[1 2 3]
# addition (element wise)
>>> print 3 + a
[4 5 6]
# multiplication (element wise)
>>> print 3*a
[3 6 9]
# it really is element wise
>>> print a*a
[1 4 9]
# power: a**b -> power(a,b)
>>> print a**a
[1 4 27]
# functions: sin(x), log(x), …
>>> print sqrt(a*a)
[1. 2. 3.]
# comparison: ==, >, and, …
>>> print a < a
[False False False]
# reductions
>>> add.reduce(a)
6
46
Slicing Arrays
>>> a = reshape(range(9),(3,3))
>>> print a
[[0 1 2]
 [3 4 5]
 [6 7 8]]
# second column
>>> print a[:,1]
[1 4 7]
# last row
>>> print a[-1,:]
[6 7 8]
# slices are references to
# original memory, true for
# all array/sequence assignment
# work on the first row of a
>>> b = a[0,:]
>>> b[0] = 99 ; print b
[99 1 2]
# what is a[0,:] now?
>>> print a[0,:]
[99 1 2]
47
Array Temporaries and ufuncs
>>> a = arange(10)
>>> b = arange(10,20)

# What will the following do?
>>> a = a + b
# Is the following different?
>>> c = a + b
>>> a = c
# Does "a" reference old or new
# memory? Answer: new memory!
# Watch out for array
# temporaries with large arrays!

# Universal functions, ufuncs
>>> type(add)
<type 'numpy.ufunc'>
# add is a binary operator
>>> a = add(a,b)
# in-place operation
>>> add(a,b,a)
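Object identity makes the temporary-vs-in-place distinction visible; a small sketch (assumes numpy is installed and imported as np, unlike the slide's from numpy import * style):

```python
import numpy as np

a = np.arange(10)
b = np.arange(10, 20)

old = a
a = a + b           # builds a temporary; a is rebound to new memory
print(a is old)     # False

old = a
np.add(a, b, a)     # in-place ufunc call: result written into a itself
print(a is old)     # True
```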
48
Array Functions
>>> a = arange(1,11); print a
[1 2 3 4 5 6 7 8 9 10]
# create an index array
>>> ind = [0, 5, 8]
# take values from the array
>>> print take(a,ind)
>>> print a[ind]
[1 6 9]
# put values to the array
>>> put(a,ind,[0,0,0]); print a
>>> a[ind] = (0,0,0); print a
[0 2 3 4 5 0 7 8 0 10]

>>> a = reshape(range(9),(3,3))
>>> b = transpose(a); print b
[[0 3 6]
 [1 4 7]
 [2 5 8]]
>>> print diagonal(b)
[0 4 8]
>>> print trace(b)
12
>>> print where(b >= 3, 9, 0)
[[0 9 9]
 [0 9 9]
 [0 9 9]]
49
Labs!
Lab: NumPy Basics
50
Linear Algebra
>>> import numpy.linalg as la
>>> dir(la)
['Heigenvalues', 'Heigenvectors', 'LinAlgError', 'ScipyTest',
'__builtins__', '__doc__', '__file__', '__name__', '__path__',
'cholesky', 'cholesky_decomposition', 'det', 'determinant',
'eig', 'eigenvalues', 'eigenvectors', 'eigh', 'eigvals',
'eigvalsh', 'generalized_inverse', 'info', 'inv', 'inverse',
'lapack_lite', 'linalg', 'linear_least_squares', 'lstsq',
'pinv', 'singular_value_decomposition', 'solve',
'solve_linear_equations', 'svd', 'test']
51
Linear Algebra: Eigenvalues
# assume a exists already
>>> print a
[[ 1.    0.    0.    0.  ]
 [ 0.    2.    0.    0.01]
 [ 0.    0.    5.    0.  ]
 [ 0.    0.01  0.    2.5 ]]

# a multiple-valued function
>>> val,vec = la.eigenvectors(a)

# eigenvalues
>>> print val
[ 2.50019992  1.99980008  1.          5.        ]

# eigenvectors
>>> print vec
[[ 0.          0.01998801  0.          0.99980022]
 [ 0.          0.99980022  0.         -0.01998801]
 [ 1.          0.          0.          0.        ]
 [ 0.          0.          1.          0.        ]]

>>> la.determinant(a)
24.999500000000001
52
Linear Algebra: solve linear equations
# assume a and q exist already
>>> print a
[[ 1.    0.    0.    0.  ]
 [ 0.    2.    0.    0.01]
 [ 0.    0.    5.    0.  ]
 [ 0.    0.01  0.    2.5 ]]
>>> print q
[  1.     4.04  15.    10.02]

# a variable can ref. a function
>>> solv = la.solve_linear_equations

# solve linear system, a*b = q
>>> b = solv(a,q)
>>> print b
[ 1.  2.  3.  4.]
>>> q_new = matrixmultiply(a,b)
>>> print q_new
[  1.     4.04  15.    10.02]
>>> print q_new == q
[True True True True]
53
Jacobi Iteration
T = zeros((50,100), float)
# set top boundary condition
T[0,:] = 1
# iterate 10 times
for t in range(10):
T[1:-1,1:-1] = ( T[0:-2,1:-1] + T[2:,1:-1] +
T[1:-1,0:-2] + T[1:-1,2:] ) / 4
# dump binary output to file (Numarray only)
T.tofile('jacobi.out')
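The sliced update can be cross-checked against explicit loops on a tiny grid; a sketch (assumes numpy; the 5x5 size is illustrative):

```python
import numpy as np

T = np.zeros((5, 5), float)
T[0, :] = 1                      # top boundary condition

# one sweep with slices, as on the slide (the RHS is evaluated into a
# temporary, so every read sees the old values)
S = T.copy()
S[1:-1, 1:-1] = (T[0:-2, 1:-1] + T[2:, 1:-1] +
                 T[1:-1, 0:-2] + T[1:-1, 2:]) / 4

# the same sweep written with explicit loops
L = T.copy()
for i in range(1, 4):
    for j in range(1, 4):
        L[i, j] = (T[i-1, j] + T[i+1, j] + T[i, j-1] + T[i, j+1]) / 4

print(np.allclose(S, L))   # True
```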
54
Labs!
Lab: Linear Algebra
55
SECTION 5
Visualization and Imaging with Python
56
Section Overview
• In this section we will cover two related topics: image
processing and basic visualization.
• Image processing tasks include loading, creating, and
manipulating images.
• Basic visualization will cover everyday plotting activities,
both 2D and 3D.
57
Plotting tools
• Many plotting packages available
– Python Computer Graphics Kit (RenderMan)
– Tkinter
– Tk – Turtle graphics
– Stand-alone GNUplot interface available
– Python bindings to VTK, OpenGL, etc…
• In this tutorial, we focus on the Matplotlib package
• Unlike some of the other packages available, Matplotlib is available
for nearly every platform.
– Comes with http://www.scipy.org/ (Enthought)
• http://matplotlib.sourceforge.net/
58
Getting started
• A simple example
# easiest to run ipython with the -pylab option
$ ipython -pylab
In [1]: plot([1,2,3])
In [2]: ylabel('some numbers')
In [3]: show()   # not needed with interactive output
59
Getting Started
60
Matplotlib with numpy
• The matplotlib package is compatible with numpy arrays.
# create data using numpy
t = arange(0.0, 2.0, 0.01)
s = sin(2*pi*t)
# create the plot
plot(t, s, linewidth=1.0)
# decorate the plot
xlabel('time (s)')
ylabel('voltage (mV)')
title('About as simple as it gets, folks')
grid(True)
show()
61
Simple Plot
62
Improving the axis settings
# get axis settings
>>> axis()
(0.0, 2.0, -1.0, 1.0)
# changes should show up immediately
>>> axis([0.0, 2.0, -1.5, 1.5])
# a plot can be saved from the menu bar
63
Better axes
64
Colorful background
subplot(111, axisbg='darkslategray')
t = arange(0.0, 2.0, 0.01)
# first plot
plot(t, sin(2*pi*t), 'y')
# second plot
t = arange(0.0, 2.0, 0.05)
plot(t, sin(pi*t), 'ro')
65
Colorful background
66
Fill demo
# data
t = arange(0.0, 1.01, 0.01)
s = sin(2*2*np.pi*t)
# graph
fill(t, s*np.exp(-5*t), 'r')
grid(True)
67
Fill demo
68
Subplot demo
def f(t):
s1 = cos(2*pi*t); e1 = exp(-t)
return multiply(s1,e1)
t1 = arange(0.0, 5.0, 0.1)
t2 = arange(0.0, 5.0, 0.02)
t3 = arange(0.0, 2.0, 0.01)
subplot(211)
plot(t1, f(t1), 'bo', t2, f(t2), 'k--', markerfacecolor='green')
grid(True)
title('A tale of 2 subplots')
ylabel('Damped oscillation')
subplot(212)
plot(t3, cos(2*pi*t3), 'r.')
grid(True)
xlabel('time (s)')
ylabel('Undamped')
69
Subplot demo
70
A basic 3D plot example
• Matplotlib can do polar plots, contours, …, and can even
plot mathematical symbols using LaTeX
• 3D graphics?
– not so great
• Matplotlib has simple 3D graphics but is limited relative to
packages based on OpenGL like VTK.
• Note: mplot3d module may not be loaded on your system.
71
3D example
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import random
fig = figure()
ax = Axes3D(fig)
X = arange(-5, 5, 0.25)
Y = arange(-5, 5, 0.25)
X, Y = meshgrid(X, Y)
R = sqrt(X**2 + Y**2)
Z = sin(R)
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet)
72
3D example
73
More visualization tools
• Matplotlib is pretty good for simple plots. There are other
tools out there that are quite nice:
– MayaVi : http://mayavi.sourceforge.net/
– VTK : http://www.vtk.org/
– SciPy/plt : http://www.scipy.org/
– Python Computer Graphics Kit, based on Pixar's RenderMan:
  http://cgkit.sourceforge.net/
74
Image Processing
• A commonly used package for image processing in
Python is the Python Imaging Library (PIL).
– http://www.pythonware.com/products/pil/
75
Getting started
• How to load the package
– import Image, ImageOps, …
• Image module contains main class to load and represent
images.
• PIL comes with many additional modules for specialized
operations
76
Additional PIL Modules
• ImageDraw : Basic 2D graphics for Image objects
• ImageEnhance : Image enhancement operations
• ImageFile : File operations, including parser
• ImageFilter : A set of pre-defined filter operations
• ImageOps : A set of pre-defined common operations
• ImagePath : Express vector graphics, usable with ImageDraw
• ImageSequence : Implements iterator for image sequences or
frames.
• ImageStat : Various statistical operations for Images
77
Loading an image
• Loading an image is simple, no need to explicitly specify
format.
import Image
im = Image.open("image.jpg")
78
Supported Image Formats
• Most image formats people wish to use are available.
– JPEG
– GIF
– BMP
– TGA, TIFF
– PNG
– XBM, XPM
– PDF, EPS
– And many other formats that aren't as commonly used:
  CUR, DCX, FLI, FLC, FPX, GBR, GD, ICO, IM, IMT, MIC, MCIDAS, PCD,
  PCX, PPM, PSD, SGI, SUN
• Not all are fully read/write capable - check the latest docs for status.
79
Image representation
• Images are represented with the PIL Image class.
• Often we will want to write algorithms that treat the image
as a NumPy array of grayscale or RGB values.
• It is simple to convert images to and from Image objects
and numpy arrays.
80
Converting the image to a NumPy array
def PIL2NUMARRAY(im):
    if im.mode not in ("L", "F"):
        raise ValueError, "image must be single-layer."
    ar = array(im.getdata())
    # im.size is (width, height); the pixel data is row-major
    ar.shape = im.size[1], im.size[0]
    return ar
Note: This works for mode “L”, or monochrome, images.
RGB would require more work - similar concept though.
81
Converting a NumPy array back to an Image
def NUMARRAY2PIL(ar, size):
    im = Image.new("L", size)
    im.putdata(reshape(ar, (size[0]*size[1],)))
    return im
Notice that we need to flatten the 2D array into a
1D array for the PIL structure. Size need not be
explicitly passed in - one can query ar for the
shape and size.
82
Saving an image
• Much like reading, writing images is also very simple.
• Many formats available.
– Either explicitly specify output format, or let PIL infer it from the
filename extension.
outfname = "somefile.jpg"
imgout = NUMARRAY2PIL(workarray,size)
imgout.save(outfname,"JPEG")
83
Labs!
Lab: Graphics
84
SECTION 6
Parallel programming with Python:
MPI4Py and Co-Array Python
85
IPython Parallelism
• IPython supports many styles of parallelism
– Single program, multiple data (SPMD) parallelism
– Multiple program, multiple data (MPMD) parallelism
– Message passing using MPI
• Getting started with parallel IPython
  – Starting ipcluster
  – Using FURLs
  – Using a Multi-Engine Client (MEC)
  – %px
• First we look at using MPI with mpi4py
86
Parallel Computing with mpi4py
mpi4py is primarily run from a script
# file par_hello.py
from mpi4py import MPI
# communication in MPI is through a communicator
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
print "Hello, rank", rank, "of", size
87
Running an MPI Script
mpiexec runs python on multiple processors concurrently
$ python par_hello.py
Hello, rank 0 of 1
$ mpiexec -n 4 python par_hello.py
Hello, rank 2 of 4
Hello, rank 3 of 4
Hello, rank 1 of 4
Hello, rank 0 of 4
# notice that execution by rank is not ordered
88
Passing Information in a Ring
# file ring.py
from mpi4py import MPI
import numpy as np
# Create message buffers
message_in = np.zeros(3, dtype=np.int)
message_out = np.zeros(3, dtype=np.int)
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
#Calc the rank of the previous and next process in the ring
next = (rank + 1) % size;
prev = (rank + size - 1) % size;
89
More ring.py
# Let message be (prev,rank,next)
message_out[:] = (prev,rank,next)
# Must break symmetry by one rank sending while the others receive
if rank == 0:
    comm.Send([message_out, MPI.INT], dest=next, tag=11)
else:
    comm.Recv([message_in, MPI.INT], source=prev, tag=11)

# Reverse order
if rank == 0:
    comm.Recv([message_in, MPI.INT], source=prev, tag=11)
else:
    comm.Send([message_out, MPI.INT], dest=next, tag=11)

print rank, ':', message_in
90
Running ring.py
$ python ring.py
0 : [0 0 0]
$ mpiexec -n 4 python ring.py
1 : [3 0 1]
2 : [0 1 2]
3 : [1 2 3]
0 : [2 3 0]
91
Interactive Parallel Computing
First start server processes on remote (or local) cluster:
$ ipcluster -n 2 &
Starting controller: Controller PID: 5351
Starting engines:    Engine PIDs: [5353, 5354]
Log files: /home/rasmussn/.ipython/log/ipcluster-5351-*
Your cluster is up and running.
For interactive use, you can make a MultiEngineClient with:
from IPython.kernel import client
mec = client.MultiEngineClient()
You can then cleanly stop the cluster from IPython using:
mec.kill(controller=True)
You can also hit Ctrl-C to stop it, or use from the cmd line:
kill -INT 5350
92
Local IPython Client
On local client:
In [1]: from IPython.kernel import client
In [2]: mec = client.MultiEngineClient()
In [3]: mec.get_ids()
Out[3]: [0,1,2,3]
In [4]: %px?
Executes the given python command on the active IPython Controller.
To activate a Controller in IPython, first create it and then call
the activate() method.
In [5]: mec.activate()
93
More Parallel IPython
In [6]: %px a=3
Parallel execution on engines :all
Out[6]:
<Results List>
[0] In [1]: a=3
[1] In [1]: a=3
In [7]: %px print a
Parallel execution on engines: all
Out[7]:
<Results List>
[0] In [2]: print a
[0] Out[2]: 3
[1] In [2]: print a
[1] Out[2]: 3
94
Result method
>>> %result?
Print the result of command i on all engines of the active controller
>>> result 1
<Results List>
[0] In [1]: a=3
[1] In [1]: a=3
95
What Can I Do in Parallel?
• What can you imagine doing with multiple Python
engines?
– Execute code?
  – mec.execute   # execute a function on a set of nodes
  – mec.map       # map a function and distribute data to nodes
  – mec.run       # run code from a file on engines
– Exchange data?
  – mec.scatter   # distribute a sequence to nodes
  – mec.gather    # gather a sequence from nodes
  – mec.push      # push python objects to nodes
• Targets parameter in many of the mec methods selects
the particular set of engines
96
Labs!
Lab: Parallel IPython
Try out parallel ipython as time permits
97
Why Co-Array Python
• Scientists like Python
– Powerful scripting language
– Numerous extension modules
– NumPy, PySparse, …
– Gives an environment like MATLAB
• But, scientists often need parallel computers
• MPI4Py (and others) was developed
• But let’s try something besides explicit message passing
• Co-Array Python borrows from Co-Array Fortran
98
Co-Array Programming Model
• SPMD model
• All processors run the Python interpreter via PyMPI
• Local view of array data
– local, not global indexing
• Adds another array dimension for remote memory access
– the co-dimension
• Uses ARMCI for communication
  – a portable library similar to Cray shmem
99
Co-Array Python Syntax
#
# put to remote processor number 1
#
T(1)[3,3] = T[3,3]
#
# get from remote processor number 8
#
T[4,5] = T(8)[4,5]
100
Co-Array Python Example
• Jacobi problem on 2 dimensional grid
• Derichlet boundary conditions
• Average of four nearest neighbors
101
Computational Domain
[Diagram: each rank's local block ("me") carries ghost boundary cells
shared with the neighboring blocks above ("up") and below ("dn").]
102
Initialization
from CoArray import *
nProcs = mpi.size
me = mpi.rank
M = 200; N = M/nProcs
T = coarray((N+2, M+2), Numeric.Float)
up = me - 1
dn = me + 1
if me == 0:
    up = None
if me == nProcs - 1:
    dn = None
103
Jacobi Update (inner loop): I
#
# update interior values (no communication)
#
T[1:-1,1:-1] = ( T[0:-2,1:-1] + T[2:,1:-1] +
T[1:-1,0:-2] + T[1:-1,2:] ) / 4.0
104
Jacobi Update (inner loop): II
[Diagram: "me" writes its edge rows into the boundary rows of "up" and "dn".]
#
# exchange boundary conditions
#
mpi.barrier()
if up != None: T(up)[-1:,:] = T[ 1,:]
if dn != None: T(dn)[ 0:,:] = T[-2,:]
mpi.barrier()
105
Timing Data
Size        CoPcomm  CoPtotal  PyMPIcomm  PyMPItotal  Ccomm  Ctotal
128x128     0.017    0.33      0.07       0.38        0.013  0.05
256x256     0.023    1.28      0.13       1.41        0.015  0.14
512x512     0.041    6.28      0.28       6.47        0.020  0.55
1024x1024   0.068    28.4      0.52       28.78       0.032  2.49
2048x2048   0.089    113.5     -          -           0.047  10.13
Table 1. Timing data for Co-Array Python (CoP), MPI (PyMPI) and C
MPI (C) versions
• Most of time spent in computation (Python 1/10 C performance)
• Co-Array Python communication rivals C (Python 1/2 C performance)
– Co-Array Python communication much faster than PyMPI
– better data marshalling
– ARMCI
106
Conclusions
• Co-Arrays allow direct addressing of remote memory
  – e.g. T(remote)[local]
• Explicit parallelism
• Parallel programming made easy
• Fun
• Explore new programming models (Co-Arrays)
• Looking at Chapel
– implicit parallelism
– global view of memory (for indexing)
107
Status
• Not entirely finished
– reason a research note, not a full paper
– but available to “play” with
– [email protected]
• Hope to finish soon and put on Scientific Python web site
– http://www.scipy.org/
108
SECTION 7
Language Interoperability
109
Language Interoperability
• Python features many tools to make binding Python to
languages like C/C++ and Fortran 77/95 easy.
• We will cover:
– F2py: Fortran to Python wrapper generator
– SWIG: The Simple Wrapper Interface Generator
• For Fortran, we also consider:
– Fortran interoperability standard
– Fortran Transformational Tools (FTT) project
110
Fortran Example: fadd.f90
• Consider the following simple Fortran subroutine to add
two arrays
subroutine fadd(A, B, C, N)
  real, dimension(N) :: A, B, C
  integer :: N
  ! do j = 1, N
  !   C(j) = A(j) + B(j)
  ! end do
  C = A + B
end subroutine fadd
111
Annotate for F2py
• F2py works better if you let it know what the variables are
doing (intents)
! file fadd.f90
!
subroutine fadd(A, B, C, N)
  real, dimension(N) :: A, B, C
  integer :: N
  !F2PY intent(out) :: C
  !F2PY intent(hide) :: N
  !F2PY real, dimension(N) :: A, B, C
  C = A + B
end subroutine fadd
112
Running F2py
• Once you have annotated the source file, run f2py to
generate the Python bindings
$ f2py -c -m fadd fadd.f90
$ ls
fadd.f90  fadd.so
113
Try out the new module
• Run the new fadd module from ipython
In [1]: from fadd import *
In [2]: fadd?
Docstring:
fadd - Function signature:
c = fadd(a,b)
Required arguments:
a : input rank-1 array('f') with bounds (n)
b : input rank-1 array('f') with bounds (n)
Return objects:
c : rank-1 array('f') with bounds (n)
In [3]: fadd([1,2,3,4,5], [5,4,3,2,1])
Out[3]: array([ 6., 6., 6., 6., 6.], dtype=float32)
114
Fortran Interoperability Standard
• Fortran 2003 provides a standard mechanism for
interoperability with C
– This could be used to reduce the need for annotations
– But improved tools support needed
interface
  subroutine fadd(A, B, C, N) BIND(C, name="fadd")
    use, intrinsic :: ISO_C_BINDING
    real(C_FLOAT), intent(in), dimension(N) :: A, B
    real(C_FLOAT), intent(out), dimension(N) :: C
    integer(C_INT), value :: N
  end subroutine fadd
end interface
115
SWIG: example.c
/* File : example.c */
double My_variable = 3.0;

/* Compute factorial of n */
int fact(int n)
{
    if (n <= 1) return 1;
    else return n*fact(n-1);
}

/* Compute n mod m */
int my_mod(int n, int m) { return(n % m); }
116
SWIG: example.i
/* File : example.i */
%module example
%{
/* Put headers and other declarations here */
%}
extern double My_variable;
extern int    fact(int);
extern int    my_mod(int n, int m);
117
Data Dictionary
• Share Fortran arrays with Python by “name”
• Fortran
subroutine get_arrays(dict)
integer       :: dict
integer, save :: A(3,4)
integer       :: rank = 2, type = INTEGER_TYPE
integer       :: shape(2) = (/3,4/)
call put_array(dict, "A", A, rank, shape, type)
end subroutine
• Python
A = dict['A']
118
Running SWIG
• Once you have created the .i file, run swig to generate the
Python bindings
unix > swig -python example.i
unix > ls
example.c
example.i
example.py
example_wrap.c
119
SWIG: build module
• Build the example module
– create setup.py
– execute setup.py
unix > cat setup.py
from distutils.core import setup, Extension
setup(name="_example", version="1.0",
      ext_modules=[
          Extension("_example",
                    ["example.c", "example_wrap.c"],
                    ),
          ])
unix > python setup.py config
unix > python setup.py build
120
SWIG: build module
• Run the code
– where is _example.so (set path)
>>> from _example import *
>>> # try factorial function
>>> fact(5)
120
>>> # try mod function
>>> my_mod(3,4)
3
>>> 3 % 4
3
121
NumPy and Fortran Arrays
• Chasm provides a bridge between Fortran and Python
arrays
• The only way to use Fortran assumed-shape arguments
with Python
• Call the following routine from Python
subroutine F90_multiply(a, b, c)
integer, pointer :: a(:,:), b(:,:), c(:,:)
c = MatMul(a,b) ! Fortran intrinsic
end subroutine F90_multiply
122
Labs!
Lab: Language Interoperability
Try out f2py and swig as time allows
123
Extra Credit: SciPy and SAGE
SciPy and SAGE
124
SciPy
• Open-source software for mathematics, science, and
engineering
• Information
– http://docs.scipy.org/
• Download
– http://scipy.org/Download
125
scipy
>>> import scipy; help(scipy)
odr                        --- Orthogonal Distance Regression
sparse.linalg.eigen.arpack --- Eigenvalue solver using iterative
fftpack                    --- Discrete Fourier Transform
sparse.linalg.eigen.lobpcg --- Locally Optimal Block Preconditioned
lib.blas                   --- Wrappers to BLAS library
sparse.linalg.eigen        --- Sparse Eigenvalue Solvers
stats                      --- Statistical Functions
lib.lapack                 --- Wrappers to LAPACK library
maxentropy                 --- Routines for fitting maximum entropy
integrate                  --- Integration routines
linalg                     --- Linear algebra routines
interpolate                --- Interpolation Tools
optimize                   --- Optimization Tools
cluster                    --- Vector Quantization / Kmeans
signal                     --- Signal Processing Tools
sparse                     --- Sparse Matrices
126
FFT Example
>>> from scipy import *
# create input values
>>> v = zeros(1000)
>>> v[:100] = 1
# take FFT
>>> y = fft(v)
# plot results (rearranged so zero frequency is at center)
>>> x = arange(-500,500,1)
>>> plot(x, abs(concatenate((y[500:],y[:500]))))
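The same example can be run with NumPy alone; np.fft.fftshift performs the concatenate() rearrangement above, and the zero-frequency bin equals the sum of the input (a minimal sketch, omitting the plotting):

```python
import numpy as np

# Rectangular pulse: 1000 samples, first 100 set to 1
v = np.zeros(1000)
v[:100] = 1

# Take the FFT
y = np.fft.fft(v)

# Rearrange so zero frequency is at the center; equivalent to
# concatenate((y[500:], y[:500])) in the slide above
centered = np.fft.fftshift(y)

# The zero-frequency bin is the sum of the input samples
print(abs(centered[500]))  # 100.0
```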
127
FFT Results
Zoom
128
FFT Results Expanded
129
Optimization Example
>>> from scipy import optimize as op
# create function
>>> def square(x): return x*x
>>> op.fmin(square, -5)
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 20
Function evaluations: 40
array([ 0.])
>>> op.anneal(square, -5)
Warning: Cooled to 4.977261 at 2.23097753984 but this is not the smallest
point found.
(-0.068887616435477916, 5)
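fmin above uses a derivative-free simplex search. As a rough illustration of the idea (a toy bracket-shrinking sketch, not SciPy's Nelder-Mead algorithm):

```python
def square(x):
    return x * x

def minimize(f, x0, step=1.0, tol=1e-8):
    # Walk toward lower function values; when neither neighbor
    # improves, shrink the step until it is below the tolerance.
    x = x0
    while step > tol:
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
        else:
            step /= 2.0
    return x

print(minimize(square, -5.0))  # converges to (very near) 0.0
```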
130
SAGE Functionality
http://showmedo.com/videotutorials/ search for sage
131
Labs!
Lab: SciPy
Try out scipy as time allows
132
Extra Credit
Traits
133
What are traits?
• Traits add typing-like facilities to Python.
– Python by default has no explicit typing.
• Traits are bound to fields of classes.
• Traits allow classes to dictate the types for their fields.
• Furthermore, they can specify ranges!
• Traits also can be inherited.
Thanks to scipy.org for the original Traits slides.
134
An example
class Person(HasTraits):
    name = Str                                    # String value, default is ''
    age = Trait(35, TraitRange(1, 120))
    weight = Trait(160.0, TraitRange(75.0, 500.0))
# Create someone; default age is 35, weight is 160.0 lbs
>>> someone = Person()
>>> someone.name = 'Bill'
>>> print '%s: %s' % (someone.name, someone.age)
Bill: 35
>>> someone.age = 75      # OK
>>> someone.weight = 'fat' # Error, not a number.
135
Another example: Enumerated traits
class InventoryItem(HasTraits):
    name = Str                               # String value, default is ''
    stock = Trait(None, 0, 1, 2, 3, 'many')  # Enumerated list, default value is 'None'

>>> hats = InventoryItem()
>>> hats.name = 'Stetson'
>>> print '%s: %s' % (hats.name, hats.stock)
Stetson: None
>>> hats.stock = 2      # OK
>>> hats.stock = 'many' # OK
>>> hats.stock = 4      # Error, value is not in permitted list
>>> hats.stock = None   # Error, value is not in permitted list
136
Why traits? Validation
• It’s nice to let the author of a class be able to enforce
checking not only of types, but values
class Amplifier(HasTraits):
    volume = Range(0.0, 11.0, default=5.0)

# This one goes to eleven...
>>> spinal_tap = Amplifier()
>>> spinal_tap.volume
5.0
>>> spinal_tap.volume = 11.0 # OK
>>> spinal_tap.volume = 12.0 # Error, value is out of range
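Plain Python can approximate this kind of range validation with a property; the sketch below hints at what Traits' Range does under the hood (hypothetical stand-in, not the Traits API):

```python
class Amplifier(object):
    def __init__(self):
        self._volume = 5.0  # default, like Range(..., default=5.0)

    @property
    def volume(self):
        return self._volume

    @volume.setter
    def volume(self, value):
        # Reject assignments outside the permitted range
        if not (0.0 <= value <= 11.0):
            raise ValueError("volume must be in [0.0, 11.0]")
        self._volume = value

amp = Amplifier()
amp.volume = 11.0       # OK
try:
    amp.volume = 12.0   # out of range
except ValueError as e:
    print(e)
```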
137
Notification (Events)
• You can also use notification to trigger actions when traits
change.
class Amplifier(HasTraits):
    volume = Range(0.0, 11.0, default=5.0)
    def _volume_changed(self, old, new):
        if new == 11.0:
            print "This one goes to eleven"

>>> spinal_tap = Amplifier()
>>> spinal_tap.volume = 11.0
This one goes to eleven
138
Notification (Events)
• You can even set up notification for classes with traits
later, from the caller or class instantiator.
class Amplifier(HasTraits):
    volume = Range(0.0, 11.0, default=5.0)

>>> def volume_changed(self, old, new):
...     if new == 11.0:
...         print "This one goes to eleven"
>>> spinal_tap = Amplifier()
>>> spinal_tap.on_trait_change(volume_changed, 'volume')
>>> spinal_tap.volume = 11.0
This one goes to eleven
139
Delegation model
• Traits can be delegated
class Company(HasTraits):
    address = Str

class Employee(HasTraits):
    __traits__ = {
        'name': '',
        'employer': Company,
        'address': TraitDelegate('employer')
    }
• By default, employee has same address as their employer.
• However, you can assign a new address to the employee if a different
address is necessary.
140
More about Traits
• Traits originally came from the GUI world
– A trait may be the ranges for a slider widget for example.
• Clever use of traits can enforce correct units in
computations.
– You can check traits when two classes interact to ensure that
their units match!
– NASA lost a satellite due to this sort of issue, so it’s definitely
important!
NASA Mars Climate Orbiter: units victim
141
Dune
A Python-CCA, Rapid
Prototyping Framework
Craig E Rasmussen, Matthew J. Sottile
Christopher D. Rickett, Sung-Eun Choi,
142
Scientific Software Life Cycle: A need for two
software environments (Research and Production)
[Figure: life-cycle stages: Concept, Exploration, Research, Porting, Production, Maintenance and Refinement, Reuse]
The challenge is to mix a rapid-prototyping
environment with a production environment
143
Rapid Prototyping Framework: An Advection-Diffusion-Reaction component-application example
Dune: a Python-CCA framework for component assembly and language interoperability
[Figure: component assembly: Driver (main), Time Integrator, Multiphysics, Advection, Diffusion, Reaction]
144
A “Python” Research Component
[Figure: a component with a Python "cap" over a body written in Python, Fortran, or C/C++]
• A Research Component can be:
– A pure Python component for rapid prototyping
– Or a Fortran or C/C++ module, wrapped for reuse of production
“components”
145
A Production Component
[Figure: a component with a Python "cap" over a Fortran or C++ body]
• Remove the Python “cap” and the Fortran or C++
component can be linked and run in a traditional scientific
application.
146
Minimal Code to be a Python-CCA Component
• Requirement to be a Python CCA component is minimal (five lines of Python code)
# ----------------------------------------------------------
# Register ports with a framework services object.
#
def setServices(self, services):
    self.services = services
    ''' Provide an integrator port '''
    services.addProvidesPort(self, "integrator", "adr.integrator")
    ''' Register uses ports '''
    services.registerUsesPort("multiphysics", "adr.multiphysics")
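To see how such a setServices method gets exercised, the framework side can be mocked with a few lines of Python; the Services and Integrator classes below are hypothetical stand-ins for illustration, not the real Dune API:

```python
class Services(object):
    """Toy stand-in for a CCA framework services object."""
    def __init__(self):
        self.provides = {}
        self.uses = {}

    def addProvidesPort(self, component, name, port_type):
        # Record that `component` offers a port under `name`
        self.provides[name] = (component, port_type)

    def registerUsesPort(self, name, port_type):
        # Record that the component expects a port under `name`
        self.uses[name] = port_type

class Integrator(object):
    def setServices(self, services):
        self.services = services
        services.addProvidesPort(self, "integrator", "adr.integrator")
        services.registerUsesPort("multiphysics", "adr.multiphysics")

svc = Services()
Integrator().setServices(svc)
print(sorted(svc.provides))  # ['integrator']
```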
147
Conclusions
• Stable, well-designed interfaces are key to supporting the two modes of
scientific computing (Research and Production) and to sharing components
between the two environments.
[Figure: a Fortran or C++ component with a removable Python "cap"]
148
Python for High Productivity
Computing
July 2009 Tutorial
149
Overview of packages
• Python : http://www.python.org/
• SciPy : http://www.scipy.org/
• NumPy : http://numpy.scipy.org/
• FFTW : http://www.fftw.org/
• MPI4py : http://mpi4py.scipy.org/
• PySparse : http://pysparse.sourceforge.net/
• SAGE : http://www.sagemath.org/
• Traits : http://code.enthought.com/projects/traits
150
Thanks To
• Eric Jones, …
– Enthought
• Also many others for ideas
– python.org
– scipy.org
– Jose Unpingco
– https://www.osc.edu/cms/sip/
– http://showmedo.com/videotutorials/ipython
151
Portable Performance Evaluation of Python
Programs using TAU
152
Performance Evaluation of Python
• Introduction to TAU
– Python Instrumentation
– Measurement
– Analysis
• Lab Session: Python and TAU
153
What is TAU?
• TAU is a performance evaluation tool
• It supports parallel profiling and tracing
• Profiling shows you how much (total) time was spent in each routine
• Tracing shows you when the events take place in each process along a timeline
• TAU uses a package called PDT for automatic instrumentation of the source code
• Profiling and tracing can measure time as well as hardware performance counters from your CPU
• TAU can automatically instrument your source code (routines, loops, I/O, memory, phases, etc.)
• TAU runs on all HPC platforms and it is free (BSD style license)
• TAU has instrumentation, measurement and analysis tools
– paraprof is TAU's 3D profile browser
• To use TAU, you need to set a couple of environment variables and substitute the name of your compiler with a TAU shell script
154
Tutorial Goals
• This tutorial is intended to introduce the TAU performance system as a portable performance evaluation tool for Python and Python programmers
• Today you should leave here with a better understanding of…
– How to instrument your Python program with TAU
– Automatic instrumentation at the routine level
– Manual instrumentation at the loop/statement level
– Environment variables used for generating performance data
– How to use the TAU profile browser, ParaProf
– General familiarity with TAU's use with Fortran, C++, C, and MPI for mixed-language programming with Python
155
TAU
• Good References:
– “The TAU Parallel Performance System,” Sameer Shende and
Allen D. Malony, Intl. Journal of High Performance Computing
Applications, ACTS Special Issue, Spring 2006.
– TAU Users Guide.
– Both available from http://tau.uoregon.edu
156
Performance Evaluation
• Profiling
– Presents summary statistics of performance metrics
– number of times a routine was invoked
– exclusive, inclusive time/hpm counts spent executing it
– number of instrumented child routines invoked, etc.
– structure of invocations (calltrees/callgraphs)
– memory, message communication sizes also tracked
• Tracing
– Presents when and where events took place along a global timeline
– timestamped log of events
– message communication events (sends/receives) are tracked
– shows when and where messages were sent
– large volume of performance data generated leads to more perturbation in the program
• Most performance tools support either profiling or tracing - TAU supports both!
157
TAU Parallel Performance System Goals
• Multi-level performance instrumentation
– Multi-language automatic source instrumentation
• Flexible and configurable performance measurement
• Widely-ported parallel performance profiling system
– Computer system architectures and operating systems
– Different programming languages and compilers
• Support for multiple parallel programming paradigms
– Multi-threading, message passing, mixed-mode, hybrid
• Integration in complex software, systems, applications
158
TAU Performance System Architecture
159
TAU Performance System Architecture
160
Program Database Toolkit (PDT)
[Figure: PDT toolflow: C/C++ and Fortran (F77/90/95) parsers produce IL from the application/library; IL analyzers populate Program Database Files; DUCTAPE feeds PDBhtml (program documentation), SILOON (application component glue), CHASM (C++/F90/95 interoperability), and TAU_instr (automatic source instrumentation)]
161
Automatic Source-Level Instrumentation in TAU
using Program Database Toolkit (PDT)
[Figure: application source goes to the TAU source analyzer; the parsed program plus an instrumentation specification file feed tau_instrumentor, which produces instrumented source]
162
Steps of Performance Evaluation
• Collect basic routine-level timing profile to determine
where most time is being spent
• Collect routine-level hardware counter data to determine
types of performance problems
• Collect callpath profiles to determine sequence of events
causing performance problems
• Conduct finer-grained profiling and/or tracing to pinpoint
performance bottlenecks
– Loop-level profiling with hardware counters
– Tracing of communication operations
163
Using TAU: A brief Introduction
• TAU supports several measurement options (profiling, tracing, profiling with hardware counters, etc.)
• Each measurement configuration of TAU corresponds to a unique stub makefile that is generated when you configure it
• To instrument source code using PDT
– Choose an appropriate TAU stub makefile in <taudir>/<arch>/lib dir (or $TAU):
% setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt
% setenv TAU_OPTIONS '-optVerbose …' (see tau_compiler.sh -help)
And use tau_f90.sh, tau_cxx.sh or tau_cc.sh as Fortran, C++ or C compilers:
% mpif90 foo.f90
changes to
% tau_f90.sh foo.f90
• Execute application and analyze performance data:
% pprof (for text based profile display)
% paraprof (for GUI)
164
TAU Measurement Configuration
% cd $TAU; ls Makefile.*
Makefile.tau-pdt
Makefile.tau-mpi-pdt
Makefile.tau-opari-openmp-mpi-pdt
Makefile.tau-mpi-scalasca-epilog-pdt
Makefile.tau-mpi-vampirtrace-pdt
Makefile.tau-mpi-papi-pdt
Makefile.tau-papi-mpi-openmp-opari-pdt
Makefile.tau-pthread-pdt…
• For an MPI+F90 application, you may want to start with:
Makefile.tau-mpi-pdt
– Supports MPI instrumentation & PDT for automatic source instrumentation
– % setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt
– % tau_f90.sh matrix.f90 -o matrix
165
Usage Scenarios: Routine Level Profile
• Goal: What routines account for the most time? How much?
• Flat profile with wallclock time:
166
Solution: Generating a flat profile with MPI
% setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
OR
% source $PET_HOME/src/tau.cshrc [ or tau.bashrc] on DSRC systems
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% mpirun -np 4 ./a.out
% paraprof --pack app.ppk
Move the app.ppk file to your desktop.
% paraprof app.ppk
167
Usage Scenarios: Loop Level Instrumentation
• Goal: What loops account for the most time? How much?
• Flat profile with wallclock time with loop instrumentation:
168
Solution: Generating a loop level profile
% setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt
% setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optVerbose'
% cat select.tau
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% mpirun -np 4 ./a.out
% paraprof --pack app.ppk
Move the app.ppk file to your desktop.
% paraprof app.ppk
169
Usage Scenarios: Compiler-based Instrumentation
• Goal: Easily generate routine level performance data using the compiler instead of PDT for parsing the source code
170
Use Compiler-Based Instrumentation
% setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi
% setenv TAU_OPTIONS '-optCompInst -optVerbose'
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% qsub run.job
% paraprof --pack app.ppk
Move the app.ppk file to your desktop.
% paraprof app.ppk
171
Usage Scenarios: Calculate mflops in Loops
• Goal: What MFlops am I getting in all loops?
• Flat profile with PAPI_FP_INS/OPS and time (-multiplecounters) with loop instrumentation:
172
Generate a PAPI profile with 2 or more counters
% setenv TAU_MAKEFILE $TAU/Makefile.tau-papi-mpi-pdt
% setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optVerbose'
% cat select.tau
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% setenv TAU_METRICS TIME:PAPI_FP_INS:PAPI_L1_DCM
% qsub run.job
% paraprof --pack app.ppk
Move the app.ppk file to your desktop.
% paraprof app.ppk
Choose Options -> Show Derived Panel -> Arg 1 = PAPI_FP_INS,
Arg 2 = GET_TIME_OF_DAY, Operation = Divide -> Apply, choose.
173
Derived Metrics in ParaProf
174
Usage Scenarios: Generating Callpath Profile
• Goal: Who calls my MPI_Barrier()? Where?
• Callpath profile for a given callpath depth:
175
Callpath Profile
• Generates program callgraph
176
Generate a Callpath Profile
% setenv TAU_MAKEFILE $TAU/Makefile.tau-callpath-mpi-pdt
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% setenv TAU_CALLPATH_DEPTH 100
% mpirun -np 4 ./a.out
% paraprof --pack app.ppk
Move the app.ppk file to your desktop.
% paraprof app.ppk
(Windows -> Thread -> Call Graph)
NOTE: In TAU v2.18.1+, you may choose to just set:
% setenv TAU_CALLPATH 1
instead of recompiling your code with the above stub makefile.
Any TAU instrumented executable can generate callpath profiles.
177
Usage Scenario: Detect Memory Leaks
178
Detect Memory Leaks
% setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt
% setenv TAU_OPTIONS '-optDetectMemoryLeaks -optVerbose'
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% setenv TAU_CALLPATH_DEPTH 100
% mpirun -np 4 ./a.out
% paraprof --pack app.ppk
Move the app.ppk file to your desktop.
% paraprof app.ppk
(Windows -> Thread -> Context Event Window -> Select thread -> select... expand tree)
(Windows -> Thread -> User Event Bar Chart -> right click LEAK
-> Show User Event Bar Chart)
179
Usage Scenarios: Instrument a Python program
• Goal: Generate a flat profile for a Python program
180
Usage Scenarios: Instrument a Python program
[Figure: the original Python code alongside the wrapper created for it]
181
Generate a Python Profile
% setenv TAU_MAKEFILE $TAU/Makefile.tau-python-pdt
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
% cat wrapper.py
import tau
def OurMain():
    import foo
tau.run('OurMain()')
Uninstrumented:
% ./foo.py
Instrumented:
% setenv PYTHONPATH $TAU/bindings-python-pdt
(same options string as TAU_MAKEFILE)
% setenv LD_LIBRARY_PATH $TAU/bindings-python-pdt\:$LD_LIBRARY_PATH
% ./wrapper.py
Wrapper invokes foo and generates performance data
% pprof/paraprof
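Conceptually, tau.run('OurMain()') resembles the standard library's cProfile: it executes a callable under a profiler and records per-routine timings. A stdlib-only sketch of that pattern (not TAU itself, and using Python 3 syntax):

```python
import cProfile
import pstats
import io

def work():
    # Some representative computation to profile
    return sum(i * i for i in range(10000))

def OurMain():
    work()

# Run OurMain under the profiler, as tau.run does with its wrapper
profiler = cProfile.Profile()
profiler.runcall(OurMain)

# Render the per-routine statistics, most expensive first
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats('cumulative').print_stats(5)
print('OurMain' in buf.getvalue())  # True
```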
182
Usage Scenarios: Mixed Python+F90+C+pyMPI
• Goal: Generate multi-level instrumentation for Python+MPI+C+F90+C++ ...
183
Generate a Multi-Language Profile w/ Python
% setenv TAU_MAKEFILE $TAU/Makefile.tau-python-mpi-pdt
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
% setenv TAU_OPTIONS '-optShared -optVerbose…'
(Python needs shared object based TAU library)
% make F90=tau_f90.sh CXX=tau_cxx.sh CC=tau_cc.sh (build libs, pyMPI w/TAU)
% cat wrapper.py
import tau
def OurMain():
    import App
tau.run('OurMain()')
Uninstrumented:
% mpirun.lsf $PET_HOME/.unsupported/pyMPI-2.5b0/bin/pyMPI ./App.py
Instrumented:
% setenv PYTHONPATH $TAU/bindings-python-mpi-pdt
(same options string as TAU_MAKEFILE)
% setenv LD_LIBRARY_PATH $TAU/bindings-python-mpi-pdt\:$LD_LIBRARY_PATH
% mpirun -np 4 /usr/local/packages/pyMPI-TAU/bin/pyMPI ./wrapper.py
(Instrumented pyMPI with wrapper.py)
184
Usage Scenarios: Generating a Trace File
• Goal: Identify the temporal aspect of performance. What happens in my code at a given time? When?
• Event trace visualized in Vampir/Jumpshot
185
VNG Process Timeline with PAPI Counters
186
Vampir Counter Timeline Showing I/O BW
187
Generate a Trace File
% setenv TAU_MAKEFILE $TAU/lib/Makefile.tau-mpi-pdt-trace
or setenv TAU_TRACE 1 (in TAU v2.18.2+)
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% qsub run.job
% tau_treemerge.pl
(merges binary traces to create tau.trc and tau.edf files)
JUMPSHOT:
% tau2slog2 tau.trc tau.edf –o app.slog2
% jumpshot app.slog2
OR
VAMPIR:
% tau2otf tau.trc tau.edf app.otf -n 4 -z
(4 streams, compressed output trace)
% vampir app.otf
(or vng client with vngd server).
188
Usage Scenarios: Evaluate Scalability
• Goal: How does my application scale? What bottlenecks occur at what core counts?
• Load profiles in PerfDMF database and examine with PerfExplorer
189
Usage Scenarios: Evaluate Scalability
190
Performance Regression Testing
191
Evaluate Scalability using PerfExplorer Charts
% setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% mpirun -np 1 ./a.out
% paraprof --pack 1p.ppk
% mpirun -np 2 ./a.out …
% paraprof --pack 2p.ppk … and so on.
On your client:
% perfdmf_configure --create-default
(Chooses derby, blank user/passwd, yes to save passwd, defaults)
% perfexplorer_configure
(Yes to load schema, defaults)
% paraprof
(load each trial: DB -> Add Trial -> Type (Paraprof Packed Profile) -> OK) OR use perfdmf_loadtrial
Then,
% perfexplorer
(Select experiment, Menu: Charts -> Speedup)
192
Communication Matrix Display
• Goal: What is the volume of inter-process communication? Along which calling path?
193
Generate a Communication Matrix
% setenv TAU_MAKEFILE $TAU/Makefile.tau-mpi-pdt
% set path=(/usr/local/packages/tau/i386_linux/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% setenv TAU_COMM_MATRIX 1
% setenv TAU_CALLPATH_DEPTH 10
% mpirun -np 4 ./a.out (setting the environment variables)
% paraprof
(Windows -> Communication Matrix)
194
TAU Instrumentation Approach
• Support for standard program events
– Routines
– Classes and templates
– Statement-level blocks
• Support for user-defined events
– Begin/End events ("user-defined timers")
– Atomic events (e.g., size of memory allocated/freed)
– Selection of event statistics
• Support definition of "semantic" entities for mapping
• Support for event groups
• Instrumentation optimization (eliminate instrumentation in lightweight routines)
195
TAU Instrumentation
• Flexible instrumentation mechanisms at multiple levels
– Source code
– manual (TAU API, TAU Component API)
– automatic
– C, C++, F77/90/95 (Program Database Toolkit (PDT))
– OpenMP (directive rewriting (Opari), POMP spec)
– Object code
– pre-instrumented libraries (e.g., MPI using PMPI)
– statically-linked and dynamically-linked
– Executable code
– dynamic instrumentation (pre-execution) (DynInstAPI)
– virtual machine instrumentation (e.g., Java using JVMPI)
– Proxy Components
196
Using TAU
• Configuration
• Instrumentation
– Manual
– MPI – Wrapper interposition library
– PDT – Source rewriting for C, C++, F77/90/95
– OpenMP – Directive rewriting
– Component based instrumentation – Proxy components
– Binary Instrumentation
– DyninstAPI – Runtime Instrumentation/Rewriting binary
– Java – Runtime instrumentation
– Python – Runtime instrumentation
• Measurement
• Performance Analysis
197
TAU Measurement System Configuration
• configure [OPTIONS]
– {-c++=<CC>, -cc=<cc>}   Specify C++ and C compilers
– {-pthread, -sproc}      Use pthread or SGI sproc threads
– -openmp                 Use OpenMP threads
– -jdk=<dir>              Specify Java instrumentation (JDK)
– -opari=<dir>            Specify location of Opari OpenMP tool
– -papi=<dir>             Specify location of PAPI
– -pdt=<dir>              Specify location of PDT
– -dyninst=<dir>          Specify location of DynInst Package
– -mpi[inc/lib]=<dir>     Specify MPI library instrumentation
– -shmem[inc/lib]=<dir>   Specify PSHMEM library instrumentation
– -python[inc/lib]=<dir>  Specify Python instrumentation
– -epilog=<dir>           Specify location of EPILOG
– -slog2[=<dir>]          Specify location of SLOG2/Jumpshot
– -vtf=<dir>              Specify location of VTF3 trace package
– -arch=<architecture>    Specify architecture explicitly (bgp, craycnl, ibm64, ibm64linux…)
198
TAU Measurement System Configuration
• configure [OPTIONS]
– -TRACE              Generate binary TAU traces
– -PROFILE (default)  Generate profiles (summary)
– -PROFILECALLPATH    Generate call path profiles
– -PROFILEPHASE       Generate phase based profiles
– -PROFILEMEMORY      Track heap memory for each routine
– -PROFILEHEADROOM    Track memory headroom to grow
– -MULTIPLECOUNTERS   Use hardware counters + time
– -COMPENSATE         Compensate timer overhead
– -CPUTIME            Use usertime+system time
– -PAPIWALLCLOCK      Use PAPI's wallclock time
– -PAPIVIRTUAL        Use PAPI's process virtual time
– -SGITIMERS          Use fast IRIX timers
– -LINUXTIMERS        Use fast x86 Linux timers
199
TAU Measurement Configuration – Examples
• ./configure -pythoninc=/usr/include/python2.5
– Configure using Python instrumentation
• ./configure -papi=/usr/local/packages/papi -pythoninc=/usr/include/python2.5
-pdt=/usr/local/pdtoolkit-3.14.1 -mpiinc=/usr/local/include -mpilib=/usr/local/lib
– Use PAPI counters (one or more) with C/C++/F90/Python automatic instrumentation. Also instrument the MPI library.
• Typically configure multiple measurement libraries
• Each configuration creates a unique <arch>/lib/Makefile.tau-<options> stub makefile (set the TAU_MAKEFILE environment variable) that corresponds to the configuration options specified:
– /usr/local/packages/tau/i386_linux/lib/Makefile.tau-mpi-python-pdt
– /usr/local/packages/tau/i386_linux/lib/Makefile.tau-papi-mpi-python-pdt
and a bindings directory (add to PYTHONPATH & LD_LIBRARY_PATH):
– /usr/local/packages/tau/i386_linux/lib/bindings-papi-mpi-python-pdt
200
TAU_SETUP: A GUI for Installing TAU
201
Tau_[cxx,cc,f90].sh – Improves Integration in Makefiles
# set TAU_MAKEFILE and TAU_OPTIONS env vars
CXX = tau_cxx.sh
F90 = tau_f90.sh
CFLAGS =
LIBS = -lm
OBJS = f1.o f2.o f3.o … fn.o
app: $(OBJS)
$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
.cpp.o:
$(CXX) $(CFLAGS) -c $<
202
Using TAU with Python Applications
Step I: Configure TAU with Python
% configure -pythoninc=/usr/include/python2.5
% make clean; make install
Builds <taudir>/<arch>/lib/<bindings>/pytau.py and tau.py packages
for manual and automatic instrumentation respectively
% setenv PYTHONPATH $PYTHONPATH\:<taudir>/<arch>/lib/[<dir>]
203
Python Automatic Instrumentation Example
#!/usr/bin/env python
import tau
from time import sleep

def f2():
    print " In f2: Sleeping for 2 seconds "
    sleep(2)

def f1():
    print " In f1: Sleeping for 3 seconds "
    sleep(3)

def OurMain():
    f1()

tau.run('OurMain()')

Running:
% setenv PYTHONPATH <tau>/<arch>/lib
% ./auto.py
Instruments OurMain, f1, f2, print…
204
Optimization of Program Instrumentation
• Need to eliminate instrumentation in frequently executing
lightweight routines
• Throttling of events at runtime (default):
% setenv TAU_THROTTLE 1
Turns off instrumentation in routines that execute over 100000 times
(TAU_THROTTLE_NUMCALLS) and take less than 10
microseconds of inclusive time per call
(TAU_THROTTLE_PERCALL)
• Selective instrumentation file to filter events
% tau_instrumentor [options] -f <file>
• Compensation of local instrumentation overhead
% configure -COMPENSATE
205
Performance Analysis
• paraprof profile browser (GUI)
• pprof (text based profile browser)
• TAU traces can be exported to many different tools
– Vampir/VNG [T.U. Dresden] (formerly Intel (R) Trace Analyzer)
– Jumpshot (bundled with TAU) [Argonne National Lab] ...
206
Building Bridges to Other Tools: TAU
207
ParaProf
208
ParaProf - SciPy Callpath Profile
209
ParaProf - Callpath Thread Relations Window
210
ParaProf
211
ParaProf - SciPy Callgraph
212
PerfDMF: Performance Data Mgmt. Framework
213
Labs!
Lab: Python and TAU
214
Labs!
Lab: Explore http://www.scipy.org/
215
Labs!
Lab: Explore and Calculate
216
Lab Instructions
• Explore the Python web site
– http://python.org/
– Browse the Documentation
– Check out Topic Guides
• Try the math package
– Convert Celsius to Fahrenheit (F = 9/5 C + 32)
– What does math.hypot do?
– How is math.pi different from math.sqrt?
– Remember import, dir, and help
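One possible way to work the conversion and explore math (shown in Python 3 syntax; note the floating-point division so 9/5 is not truncated):

```python
import math

def c_to_f(c):
    # F = 9/5 C + 32; write 9.0 / 5 so older Pythons do not truncate
    return 9.0 / 5 * c + 32

print(c_to_f(100))        # 212.0
print(math.hypot(3, 4))   # 5.0 -- the Euclidean norm sqrt(3**2 + 4**2)
print(math.pi)            # a float constant (3.14159...), unlike the
                          # function math.sqrt, which must be called
```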
217
Labs!
Lab: Strings
218
Lab Instructions
• Explore the string module
– import string
– dir(string)
– help(string)
• Try some of the string functions
– string.find
– …
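A note for newer Pythons: string.find from the 2.x string module was removed in Python 3; the same operations live on str objects themselves (sketch in Python 3 syntax):

```python
import string

s = "scientific computing"
print(s.find("comp"))              # 11 -- index of the first match
print(s.upper())                   # SCIENTIFIC COMPUTING
print(string.ascii_lowercase[:5])  # abcde -- constants still live in string
```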
219
Labs!
Lab: Sequence Objects
220
Lab Instructions
• Become familiar with lists []
– Create a list of integers and assign to variable l
– Try various slices of your list
– Assign list to another variable, (ll = l)
– Change an element of l
– Print ll, what happened?
– Try list methods such as append, dir(list)
• Try creating a dictionary, d = {}
– Print a dictionary element using []
– Try methods, d.keys() and d.values()
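The "what happened?" step above is about aliasing: assignment binds a second name to the same list object, so mutations show through both names. A sketch of the lab's experiments:

```python
l = [1, 2, 3, 4, 5]
print(l[1:3])       # [2, 3] -- a slice is a NEW list

ll = l              # ll is another name for the SAME list object
l[0] = 99
print(ll)           # [99, 2, 3, 4, 5] -- the change shows through ll

l.append(6)         # list methods mutate in place

d = {'a': 1, 'b': 2}
print(d['a'])             # 1
print(sorted(d.keys()))   # ['a', 'b']
```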
221
Labs!
Lab: Functions
222
Lab Instructions
• In an editor, create file funcs.py
• Create a function, mean(), that returns the mean of the
elements in a list object
– You will need to use the len function
– Use for i in range():
• Test your function in Python
• Modify mean()
– Use for x in list:
• Retest mean()
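One possible mean() in the "for x in list" style the lab asks for (a sketch of a lab solution, not the only valid one):

```python
def mean(values):
    # Sum the elements, then divide by the count (len)
    total = 0.0
    for x in values:
        total += x
    return total / len(values)

print(mean([1, 2, 3, 4]))  # 2.5
```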
223
Labs!
Lab: Classes
224
Lab Instructions
• Create SimpleStat class in SimpleStat.py
– Create constructor that takes a list object
– Add attribute, list_obj to contain list object
– Create method, mean()
– Returns the mean of the contained list object
– Create method, greater_than_mean()
– Returns number of elements greater than the mean
– Test your class from Python interpreter
– What does type(SimpleStat) return?
– Did you import or from SimpleStat import *
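A minimal sketch of a SimpleStat meeting the requirements above (one possible lab solution):

```python
class SimpleStat(object):
    def __init__(self, list_obj):
        # Keep the list object as an attribute, per the lab
        self.list_obj = list_obj

    def mean(self):
        return float(sum(self.list_obj)) / len(self.list_obj)

    def greater_than_mean(self):
        # Count elements strictly greater than the mean
        m = self.mean()
        return len([x for x in self.list_obj if x > m])

s = SimpleStat([1, 2, 3, 10])
print(s.mean())               # 4.0
print(s.greater_than_mean())  # 1
```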
225
Labs!
Lab: Numerical Array Basics
226
Lab Instructions
• Import numpy
– Try dir(numpy)
– Browse the documentation, help(numpy)
– Create and initialize arrays in different ways
– How is arange() different from range()?
– Try ones(), resize() and reshape()
– Become friendly with slices
– Try addition and multiplication with arrays
– Try sum, add, diagonal, trace, transpose
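A few of the lab's experiments worked out; arange() returns an ndarray rather than a Python sequence, and arithmetic is elementwise:

```python
import numpy as np

a = np.arange(12)          # like range(), but an ndarray you can do math on
print(type(a))             # <class 'numpy.ndarray'>

m = a.reshape(3, 4)        # same data viewed as a 3x4 array
print(m[1, 2])             # 6

print(np.ones(3) + np.ones(3))  # elementwise addition: [2. 2. 2.]
print(m.trace())           # 0 + 5 + 10 = 15
```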
227
Labs!
Lab: Linear Algebra
228
Lab Instructions
• Goal: Investigate a college basketball rating system
– Can be applied to any sport
– Multivariate linear regression to find team ratings
• Copy ratings.py games.py from disk
• $ python -i games.py
• >>> ratings = numpy.linalg.solve(ah, bh)
– print team_names, ratings
– sort ratings
– ask instructor about the arrays ah and bh
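The solve call above works on whatever system games.py builds; as a tiny stand-in (the 2x2 ah and bh below are made-up illustration data, not the arrays from games.py):

```python
import numpy as np

# Hypothetical 2-team system: each row is one linear equation
ah = np.array([[2.0, 1.0],
               [1.0, 3.0]])
bh = np.array([5.0, 10.0])

# Solve ah @ ratings == bh
ratings = np.linalg.solve(ah, bh)
print(ratings)   # [1. 3.]
```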
229
PET Computational Environment (CE) Tool
Environment
• PET CE FAPOC David Cronk, [email protected]
• Ptoolsrte, PAPI, PerfSuite, TAU, and KOJAK installed in $PET_HOME/pkgs on machines at DoD DSRCs
• Contact [email protected] with
– questions
– problems
– suggestions
– success stories!
230
Acknowledgements
“This publication was made possible through support
provided by DoD HPCMP PET activities through
Mississippi State University under the terms of
Agreement No. #GSO4TO1BFC0060. The opinions
expressed herein are those of the author(s) and do not
necessarily reflect the views of the DoD or Mississippi
State University.”
231
Acknowledgements
• HPCMP DoD PET Program
• Department of Energy
• National Science Foundation
• University of Tennessee
– David Cronk
– Joseph Thomas
• University of Oregon
– A. D. Malony, A. Morris, M. Sottile, W. Spear
• Los Alamos National Laboratory
• TU Dresden
– Holger Brunst
– Wolfgang Nagel
• Research Centre Juelich, Germany
– Bernd Mohr
– Felix Wolf
232