Download Scientific Computing Beyond Matlab

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Scientific Computing Beyond
Matlab
Nov 19, 2012
Jason Su
Motivation
• I’m interested in (re-)coding a general solver for
sc/mcDESPOT relaxometry mapping
– Open source
– Extensibility to new/add’l sequences with better sensitivity
to certain parameters, e.g. B0 and MWF
– Better parallelization
• But:
– Large-scale code development in Matlab is cumbersome
– Matlab is slow
– C is hard (to write, read, debug)
• Creates large barrier for others to contribute
Matlab
Pros
• Ubiquitous, code is crossplatform
• Can be fast with vectorized
code
• Data visualization
• Quick development time
• Great IDE for general research
– Poor for large projects
• Many useful native
libraries/toolboxes
• Built-in profiling tools
Cons
• Requires license, not free
(though there is Octave)
• Vectorized code is often
non-intuitive to write and
hard to read
• Slow for general
computations
• Limited parallel computing
and GPU support
C/C++
Pros
• Fast
• Great IDEs for large coding
projects
– Not as great for general
science work
• Strong parallel computer
support and CUDA
• Community libraries for
scientific computing
• Profiling dependent on IDE
Cons
• High learning curve and
development time
• No data visualization
• Compiled code is platform
specific
• Compiler is not generally
installed with OSX and
Windows
Python
Pros
• Preinstalled with OSX and
Linux-based systems
• Readability is a core tenet
(“pythonic”)
• Quick development time
• Native parallel computing
support and community GPU
modules
• Extensive community support
– Including neuroimagingspecific: NiPype, NiBabel
• Built-in profiling module and
some IDE tools
Cons
• Slow for general
computation
• Mixed bag of IDEs, some are
great for coding, others for
research
• Out of the box it’s a poor
alternative: no linear
algebra or data visualization
Python & Friends
Cons
Solutions
• Slow for general computation
• Mixed bag of IDEs, some are great
for coding, others for research
• Cython, JIT compilers like PyPy
• There are a few good options out
there that I’ve found:
– Eclipse + PyDev, NetBeanz
– Spyder – closest to MATLAB
– Sage Math Notebook, IPython – like
Mathematica
– It may come down to preference.
• Out of the box it’s a poor
alternative: no linear algebra or
data visualization
• NumPy + SciPy + Matplotlib =
PyLab
– Sage Math includes these as well as
other capabilities like symbolic math
and graph theory
Pythonic?
• A term of praise used by the community to refer to clean code that
is readable, intuitive, explicit, and takes advantage of coding idioms
• Python
people = [‘John Doe’, ’Jane Doe’, ’John Smith’]
smith_family = []
for name in people:
if ‘Smith’ in name:
smith_family.append(name)
smith_family = [name for name in people if ‘Smith’ in name]
• Matlab
people = {‘John Doe’, ’Jane Doe’, ’John Smith’};
smith_family = {}
for name = people
if strfind(name{1},’Smith’)
smith_family = [smith_family name];
end
end
Installation
• On any OS:
– Sage Math (http://www.sagemath.org/), easy unzip
installation but many “extraneous” packages (500MB)
• Some issues on OSX with matplotlib
• On OSX:
– Use MacPorts to install Python (2.7), SciPy, matplotlib,
and Cython
• Requires gcc compiler available through Apple Developer
NumPy + SciPy vs Matlab
• Same core libraries: LAPACK
• Equivalent syntax but not trying to be similar
• http://www.scipy.org/
NumPy_for_Matlab_Users
• Key differences:
– Python uses 0 (zero) based indexing. The initial
element of a sequence is found using [0].
– In NumPy arrays have pass-by-reference
semantics. Slice operations are views into an array.
Syntax
Matlab
• a\b
• max(a(:))
• a(end-4:end)
• [0:9]
NumPy
• linalg.lstsq(a,b)
• a.max()
• a[-5:]
• arange(10.) or
r_[:10.]
Cython
• Requires a C compiler
• Cython is Python with C data types.
– Dynamic typing of Python has overhead, slow for computation
• Allows seamless coding of Python and embedded C-speed routines
• Python values and C values can be freely intermixed, with
conversions occurring automatically wherever possible
– This means for debugging C-level code, we can use all the plotting
tools available in Python
• Process is sort of like EPIC
1. Write a .pyx source file
2. Run the Cython compiler to generate a C file
3. Run a C compiler to generate a compiled library
4. Run the Python interpreter and ask it to import the module
Code Comparison – Matlab
• Let’s try a really basic speed comparison test
s = 0
tic
for i = 1:1e8
s = s + i;
end
toc
tic
x = 1:1e8;
sum(x)
toc
Code Comparison – C
#include <time.h>
#include <stdio.h>
int main()
{
long long unsigned int sum = 0;
long long unsigned int i = 0;
long long unsigned int max = 100000000;
clock_t tic =
for (i = 0; i
sum = sum
}
clock_t toc =
clock();
<= max; i++) {
+ i;
clock();
printf("%15lld, Elapsed: %f seconds\n", sum, (double)(toc - tic) /
CLOCKS_PER_SEC);
return 0;
}
Code Comparison – Python
import time
from numpy import *
s = 0
t = time.time()
for i in xrange(100000001):
s += i
print time.time() - t
t = time.time()
x = arange(100000001)
sum(x)
print time.time() - t
Code Comparison – Cython
• addCy.pyx
import time
cdef long long int n = 100000000
cdef long long int s = 0
cdef long long int i = 0
t = time.time()
for i in xrange(n+1):
s += i
print time.time() – t
• runCy.py
import pyximport; pyximport.install()
import addCy
Speed Comparison
Language/Implementation
Time (sec)
Matlab/For loop
Matlab/Vector sum
Python/For loop
Python/NumPy sum
0.547
0.817 (0.036 for sum only!)
15.944
0.648 (0.135 for sum only)
C/For loop
Cython/For loop
0.222
0.068 (!)
Summary
• Python
– Full featured programming language with an emphasis on
“pythonic” readability
• NumPy/SciPy
– Core libraries for linear algebra and computation (fft,
optimization)
• Cython
– Allows as much optimization as you want, degrading
gracefully from high-level Python to low-level C
– Profile, don’t over optimize too early!