Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Survey

Document related concepts

no text concepts found

Transcript

Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su Motivation • I’m interested in (re-)coding a general solver for sc/mcDESPOT relaxometry mapping – Open source – Extensibility to new/add’l sequences with better sensitivity to certain parameters, e.g. B0 and MWF – Better parallelization • But: – Large-scale code development in Matlab is cumbersome – Matlab is slow – C is hard (to write, read, debug) • Creates large barrier for others to contribute Matlab Pros • Ubiquitous, code is crossplatform • Can be fast with vectorized code • Data visualization • Quick development time • Great IDE for general research – Poor for large projects • Many useful native libraries/toolboxes • Built-in profiling tools Cons • Requires license, not free (though there is Octave) • Vectorized code is often non-intuitive to write and hard to read • Slow for general computations • Limited parallel computing and GPU support C/C++ Pros • Fast • Great IDEs for large coding projects – Not as great for general science work • Strong parallel computer support and CUDA • Community libraries for scientific computing • Profiling dependent on IDE Cons • High learning curve and development time • No data visualization • Compiled code is platform specific • Compiler is not generally installed with OSX and Windows Python Pros • Preinstalled with OSX and Linux-based systems • Readability is a core tenet (“pythonic”) • Quick development time • Native parallel computing support and community GPU modules • Extensive community support – Including neuroimagingspecific: NiPype, NiBabel • Built-in profiling module and some IDE tools Cons • Slow for general computation • Mixed bag of IDEs, some are great for coding, others for research • Out of the box it’s a poor alternative: no linear algebra or data visualization Python & Friends Cons Solutions • Slow for general computation • Mixed bag of IDEs, some are great for coding, others for research • Cython, JIT compilers like PyPy • There are a few good options out there that I’ve found: – Eclipse + PyDev, NetBeanz – Spyder – closest to MATLAB – Sage Math Notebook, IPython – like Mathematica – It may come down to preference. • Out of the box it’s a poor alternative: no linear algebra or data visualization • NumPy + SciPy + Matplotlib = PyLab – Sage Math includes these as well as other capabilities like symbolic math and graph theory Pythonic? • A term of praise used by the community to refer to clean code that is readable, intuitive, explicit, and takes advantage of coding idioms • Python people = [‘John Doe’, ’Jane Doe’, ’John Smith’] smith_family = [] for name in people: if ‘Smith’ in name: smith_family.append(name) smith_family = [name for name in people if ‘Smith’ in name] • Matlab people = {‘John Doe’, ’Jane Doe’, ’John Smith’}; smith_family = {} for name = people if strfind(name{1},’Smith’) smith_family = [smith_family name]; end end Installation • On any OS: – Sage Math (http://www.sagemath.org/), easy unzip installation but many “extraneous” packages (500MB) • Some issues on OSX with matplotlib • On OSX: – Use MacPorts to install Python (2.7), SciPy, matplotlib, and Cython • Requires gcc compiler available through Apple Developer NumPy + SciPy vs Matlab • Same core libraries: LAPACK • Equivalent syntax but not trying to be similar • http://www.scipy.org/ NumPy_for_Matlab_Users • Key differences: – Python uses 0 (zero) based indexing. The initial element of a sequence is found using [0]. – In NumPy arrays have pass-by-reference semantics. Slice operations are views into an array. Syntax Matlab • a\b • max(a(:)) • a(end-4:end) • [0:9] NumPy • linalg.lstsq(a,b) • a.max() • a[-5:] • arange(10.) or r_[:10.] Cython • Requires a C compiler • Cython is Python with C data types. – Dynamic typing of Python has overhead, slow for computation • Allows seamless coding of Python and embedded C-speed routines • Python values and C values can be freely intermixed, with conversions occurring automatically wherever possible – This means for debugging C-level code, we can use all the plotting tools available in Python • Process is sort of like EPIC 1. Write a .pyx source file 2. Run the Cython compiler to generate a C file 3. Run a C compiler to generate a compiled library 4. Run the Python interpreter and ask it to import the module Code Comparison – Matlab • Let’s try a really basic speed comparison test s = 0 tic for i = 1:1e8 s = s + i; end toc tic x = 1:1e8; sum(x) toc Code Comparison – C #include <time.h> #include <stdio.h> int main() { long long unsigned int sum = 0; long long unsigned int i = 0; long long unsigned int max = 100000000; clock_t tic = for (i = 0; i sum = sum } clock_t toc = clock(); <= max; i++) { + i; clock(); printf("%15lld, Elapsed: %f seconds\n", sum, (double)(toc - tic) / CLOCKS_PER_SEC); return 0; } Code Comparison – Python import time from numpy import * s = 0 t = time.time() for i in xrange(100000001): s += i print time.time() - t t = time.time() x = arange(100000001) sum(x) print time.time() - t Code Comparison – Cython • addCy.pyx import time cdef long long int n = 100000000 cdef long long int s = 0 cdef long long int i = 0 t = time.time() for i in xrange(n+1): s += i print time.time() – t • runCy.py import pyximport; pyximport.install() import addCy Speed Comparison Language/Implementation Time (sec) Matlab/For loop Matlab/Vector sum Python/For loop Python/NumPy sum 0.547 0.817 (0.036 for sum only!) 15.944 0.648 (0.135 for sum only) C/For loop Cython/For loop 0.222 0.068 (!) Summary • Python – Full featured programming language with an emphasis on “pythonic” readability • NumPy/SciPy – Core libraries for linear algebra and computation (fft, optimization) • Cython – Allows as much optimization as you want, degrading gracefully from high-level Python to low-level C – Profile, don’t over optimize too early!