Download Introduction to Python - Jodrell Bank Centre for Astrophysics

DEVELOPMENT IN AFRICA WITH RADIO ASTRONOMY
Introduction to Python
Original created by M. Henson* & adapted by J. Radcliffe (University of Manchester)
1. WHAT IS PYTHON?
Python is a scripting language that has been specifically designed to be readable and easy
to use. It’s widely used in astronomy, for data analysis, visualising data and for running
simulations. Python is a good alternative to IDL and Matlab as it offers the same (or better)
functionality and is free. It’s generally not as efficient as languages such as C and Fortran, but
it’s usually much quicker to write a program in python.
This tutorial refers to versions 2.6.x/2.7.x of python only. Python 3.x differs in syntax (for
example, print is a function rather than a statement) and will not be covered.
Most of the machines have python 2.7 and 3 installed. To check, open a terminal and type
python to check for python 3.x, or python2.7 to check that python 2.7.x is installed. If it is
installed on the machine then an interactive python session will begin. If it is not installed on
the machine, then see Appendix A, which details how to install python.
You shouldn’t need any prior programming experience to work through this document. There
are a few exercises throughout for you to practice your skills. If you are already familiar with
python feel free to skip to the end, where there is a summary of the key exercises.
* http://www.jb.man.ac.uk/~henson/
2. GETTING STARTED: COMMAND LINE INTERFACE
To get started with python, open a terminal window and type python.
Python has four main variable types that we’ll consider here: integer, float, string and boolean.
Try the following:
a=3.
b=5
a=True
a='Hello'
b="World"
To check the type of a variable, write
type(<variable name>)
Notice that floats are indicated by a decimal point (3. or 3.0 rather than 3).
Now consider operations. The main operators are +, -, *, /, % and **. Note the output of the
following:
3*5
3./5.
3/5.
3/5
2*5
2**5
17%3
Note that the inclusion of a float in an expression leads to a result that is a float.
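The difference between integer and float division is easy to miss, so the following is a small illustrative sketch rather than part of the original exercise. In python 2.7, 3/5 between two integers gives 0. The // operator used here makes that floor division explicit and gives the same answer in python 3. The variable names are chosen purely for illustration.

```python
# Floor division of two integers discards the fractional part.
int_result = 3 // 5        # the value python 2.7 returns for 3/5
# Including a float in the expression gives a float result.
float_result = 3 / 5.
power = 2 ** 5             # exponentiation: 2 to the power of 5
remainder = 17 % 3         # remainder after dividing 17 by 3

print(int_result)
print(float_result)
print(power)
print(remainder)
```

If a float result is wanted from two integers, converting one of them with float() before dividing is a common approach.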
Python tends to work quite intuitively. Some of these operations also work on strings. For
example, try
a+b
assuming a='Hello' and b="World" from above. Similarly, try
a*4
3. COMMENTS
A comment is a piece of code that is ignored when the code is run. Comments in python begin
with #.
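As a brief illustration (the variable names and values here are only examples), a comment can occupy a whole line or follow a statement on the same line. A # that appears inside quotes is part of the string and does not start a comment.

```python
# This whole line is a comment and is ignored when the code runs.
speed_of_light = 299792458   # metres per second; the text after # is ignored
label = "# inside quotes is part of the string"

print(speed_of_light)
print(label)
```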
4. LISTS, TUPLES AND INDEXING
Python has several different ways of storing arrays of data. In native python, the main structures
are lists and tuples. Try the following:
a=(1,2,3,4,5)
type(a)
b=['Hello',2,True,4,5]
type(b)
b[0]
b[0]*b[1]
b[-2]
b[::-1]
We see that square brackets indicate a list, whereas round brackets (parentheses) indicate a tuple. Lists and
tuples have a key difference - tuples are immutable, which means they can’t be changed. To
illustrate this, try the following:
b[1]+=2
b
a[1]+=2    #Should return an error
a[1]       #a remains unchanged
Try the following and note the output:
a='Hello'
a[0]
a[-1]
a[:3]
a[2:]
We see that strings can be indexed and sliced in the same way as lists. Like tuples, however, strings are immutable.
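The indexing and slicing rules above can be summarised in a short sketch. It reuses the example list and string from this section under new, purely illustrative variable names.

```python
data = ['Hello', 2, True, 4, 5]   # a list holding mixed types
word = 'Hello'                    # a string

second_last = data[-2]        # negative indices count from the end
middle = data[1:3]            # items at positions 1 and 2 (the stop index is excluded)
reversed_data = data[::-1]    # a step of -1 walks through the list backwards

first_three = word[:3]        # slicing works the same way on strings
from_third = word[2:]
reversed_word = word[::-1]

print(second_last)
print(middle)
print(reversed_data)
print(first_three)
print(from_third)
print(reversed_word)
```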
One quick way of generating a list is the range function. Try the following lines to see what
they do:
range(10)
range(-5,-20)
range(0,10,3)
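As a hedged aside: in python 2.7 range returns a list directly, whereas in python 3 it returns a lazy range object. Wrapping the call in list(), as in this sketch, gives a list under either version. The variable names are illustrative only.

```python
counting = list(range(10))              # 0 to 9; the stop value is excluded
stepped = list(range(0, 10, 3))         # every third number starting from 0
empty = list(range(-5, -20))            # empty: the default step of +1 never reaches -20
descending = list(range(-5, -20, -5))   # a negative step counts downwards

print(counting)
print(stepped)
print(empty)
print(descending)
```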
There are a couple of other useful operations for lists: in and len. What do you think they do? Try
the following and note what they do:
'e' in a
'i' in a
len(a)
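Other useful list operations are remove, pop, append and extend. These are object functions
(methods), which means they are used as
<variable>.<function>()
These functions are only available for lists, so they will not work on the string a defined above.
For example, if a is first converted to a list with a=list('Hello'), then to remove the first item
in the list a whose value is 'e', type
a.remove('e')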
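The four list functions named above can be seen together in this small sketch. The list contents are made up for illustration.

```python
items = ['a', 'e', 'b', 'e']   # hypothetical example list

items.remove('e')              # removes only the first 'e'
after_remove = list(items)     # copy the current state for inspection

last = items.pop()             # removes and returns the last item
items.append('c')              # adds a single item to the end
items.extend(['d', 'f'])       # adds every item of another list to the end

print(after_remove)
print(last)
print(items)
```

Note that these functions change the list in place rather than returning a new list.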
5. GETTING HELP
When using python interactively, there is a useful built-in help function. If you want to know how to use a
function just type
help(<function>)
To see the functions available for a variable, type
help(<variable>)
6. SCRIPTING
So far, we’ve been running python interactively. To run a non-interactive python script, type
your commands into a file (with the extension .py) and run with
python <filename>.py
Alternatively, to avoid having to type python every time you can make the script executable. To
do this, write
#!/usr/bin/env python
on the first line of the python file. This tells the shell what to interpret the script with. Then, in
your terminal type
chmod +x <filename>.py
which makes the file executable. The script can then be run with
./<filename>.py
Try running some of the statements from above in a script. To print the output to the screen,
you will need to use the print statement. For example try running,
print 'Hello',5,True,'World'
7. CONDITIONAL STATEMENTS
Python has four main control-flow statements: if, for, try and while. Only if is strictly
conditional; for and while are loops, and try handles errors. while is not needed for this tutorial
and will not be covered here.
Try the following and note what happens:
a='Jodrell'
if 'Jod' in a:
    a+=' Bank'
elif 'Hub' in a:
    a+=' Space Telescope'
else:
    print "I hadn't planned for this"
Notice the whitespace - python relies on whitespace to indicate where a code block begins
and ends. If you remove the whitespace from the above snippet it won’t work.
**Tip** try to use spaces rather than tabs for whitespace. Different text editors interpret tabs as
different numbers of spaces, and using tabs can lead to your code failing on different platforms.
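To test whether two values are equal, use the equality operator ==:
a='Jodrell'
if a == 'Jodrell':
    a+=' Bank'
elif a == 'Hubble':
    a+=' Space Telescope'
else:
    print "I hadn't planned for this"
Avoid using is for this purpose. The is operator tests whether two names refer to the same
object, not whether their values are equal, so it is unreliable for comparing strings.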
Now try the following to see what the for loop does:
for k in range(0,5):
    if k<2:
        print k
    else:
        print k,'is bigger than or equal to 2.'
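A for loop can step through the items of any list, not only through a range of numbers. The sketch below combines a loop with an if/else block. The list of names is example data, and print is written with brackets and a single argument so that the code runs under both python 2.7 and python 3.

```python
names = ['Lovell', 'Mark II', 'e-MERLIN']   # hypothetical example data
long_names = []

for name in names:
    if len(name) > 6:
        # The indented block runs once for each item that passes the test.
        long_names.append(name)
    else:
        print('%s has 6 characters or fewer' % name)

print(long_names)
```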
Exercises:
1. Write a script to sum all odd numbers up to 1000.
2. Write a script to print all of the letters in a string in reverse order, with the position of the
letter in the string next to it. For example, for the string 'jodcast', it would print
t 7
s 6
a 5
c 4
d 3
o 2
j 1
Finally, the try statement. Implement the following (make sure x is undefined) and see what
happens:
print x
try:
    print x
except NameError:
    print 'x has not yet been defined.'
print 'Continuing...'
The try statement attempts to perform an operation. If that operation fails, it prevents the
code from crashing and instead implements the except statement.
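The same pattern can be wrapped in a function. The sketch below uses a hypothetical helper, safe_divide, and names the specific error it expects (ZeroDivisionError) so that unrelated mistakes are not silently hidden. That is a suggestion, not a requirement of the try statement. It is written to run under both python 2.7 and python 3.

```python
def safe_divide(numerator, denominator):
    """Hypothetical helper: divide two numbers, returning None on division by zero."""
    try:
        return numerator / float(denominator)
    except ZeroDivisionError:
        # Only reached if the division above fails.
        print('Cannot divide by zero; returning None instead.')
        return None

ok = safe_divide(3, 4)
failed = safe_divide(3, 0)
print(ok)
print(failed)
print('Continuing...')
```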
8. EXTERNAL MODULES
The most useful python functions can be found in the various python packages available.
Packages are collections of functions that you can use in your programs. To make use of these
external functions from a given package you first have to import the package. To do this, write
import <package name>
in your terminal or at the beginning of your script. If you want to give the package a shorthand
name, use the syntax
import <package name> as <nickname>
For example, the package numpy is commonly imported as
import numpy as np
If numpy is imported using import numpy as np, then a numpy function is used as follows:
np.<function name>(<function arguments>)
Alternatively, when importing a module, all the functions can be added to the namespace. For
example, if numpy was imported using the following syntax:
from numpy import *
then all numpy functions could be used with the following syntax:
<function name>(<function arguments>)
This approach should be used with caution. Some modules may have functions with similar
names and if you import all functions from multiple modules then you may end up redefining
a function with a given name. This makes it confusing when using the functions, as you can’t
be sure which one you’re using.
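The name clash described above can be demonstrated with two standard-library modules, math and cmath, which both provide a function called sqrt. This sketch imports the one name explicitly rather than using *, but the effect on that name is the same. The variable names are illustrative.

```python
import math

from math import sqrt       # sqrt now refers to math.sqrt
real_sqrt = sqrt            # keep a reference to the first version

from cmath import sqrt      # the same name is silently rebound to cmath.sqrt
complex_sqrt = sqrt

print(real_sqrt(4))         # float result from math.sqrt
print(sqrt(4))              # complex result: sqrt is now cmath.sqrt
print(math.sqrt(4))         # the module prefix makes the choice explicit
```

Keeping the module name (or a nickname such as np) in front of each function avoids this ambiguity.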
8.1. NUMPY
We’ve already met one way of storing arrays of data in python in lists and tuples. The numpy
package provides another way - in numpy arrays. Numpy arrays can be multi-dimensional
(unlike lists), and the numpy package provides functions which can act on whole arrays at once.
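To illustrate what acting on a whole array at once means, this minimal sketch contrasts a list with a numpy array. It assumes numpy is installed and imported as np. The values and variable names are examples only.

```python
import numpy as np

values = [1, 2, 3, 4]
repeated = 2 * values          # for a list, * repeats the contents
arr = np.array(values)         # convert the list to a numpy array
doubled = 2 * arr              # for an array, * acts on every element

grid = np.zeros((2, 3))        # a two-dimensional array of zeros
grid_shape = grid.shape        # (rows, columns)

print(repeated)
print(doubled)
print(grid_shape)
```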
Numpy is very well documented, with the documentation available online.
Exercises: Use the numpy documentation to complete the following exercises
1. Create an array of length 10, where all but the fifth value are equal to 0. The fifth value is
equal to 1.
2. Create a 3x3x3 array with random values.
3. Reshape the 3x3x3 array to a 1D array. Find the mean and standard deviation of that
array using the numpy functions.
4. Create two 2x3 arrays of random values and divide one by the other elementwise. Save
the resulting array using numpy.savetxt in a text file.
8.1.1. MATHEMATICAL FUNCTIONS
In python, most mathematical functions (e.g. trigonometric functions like sin and cos) are
provided in the math package (use import math). However, most of these functions are also
available in numpy, and using the numpy versions allows for them to be used on whole arrays
at once.
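The difference between the two packages can be sketched as follows, assuming numpy is installed. math.sin takes one number at a time, whereas np.sin accepts a whole array. The array of angles is example data.

```python
import math
import numpy as np

angles = np.array([0.0, math.pi / 2, math.pi])   # example angles in radians

single = math.sin(math.pi / 2)   # math works on one value at a time
sines = np.sin(angles)           # numpy applies sin to every element

print(single)
print(sines)
```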
Exercise:
1. Create an array of evenly spaced particle positions in (r,θ ,φ), where 0 ≤ r ≤ 1, 0 ≤ θ ≤ π
and −π ≤ φ ≤ π. Convert these positions to Cartesian coordinates.
8.2. MATPLOTLIB
Matplotlib is a python package for visualizing data. It allows for easy creation of line plots,
scatter diagrams, histograms, surface plots and 3D plots.
matplotlib.pyplot is a set of functions which make matplotlib work like MATLAB. It is
commonly imported as
import matplotlib.pyplot as plt
Each pyplot (or plt) function changes a figure. To make a simple plot, try the following:
x=np.arange(10)                      #Creating data for plotting
plt.plot(x,x**2.,label='Function')   #Plot the data
plt.xlabel('x')                      #Label the x axis
plt.ylabel('y')                      #Label the y axis
plt.legend()                         #Show a legend on the figure
plt.show()                           #Show the figure
In the second line, the label argument creates the line label on the legend.
Figures are saved using the savefig command. Try placing the following before show()
in the previous example:
plt.savefig('my-first-figure.png')
Matplotlib recognises the file extension at the end of the filename, and uses this to set the
filetype for the image. Matplotlib recognises PNG, SVG, PDF and more.
Now try placing the savefig() command after show(). What happens?
This demonstrates that when you close the pop up window created by show(), this closes the
figure in the program so that it can no longer be worked on. To avoid this, always save figures
before calling show(). Since we can call show() after savefig, we see that savefig() does
not close the figure. We can manually close a figure by using the command close().
Exercise:
1. Write this in a script as you will need to edit it in a follow up exercise
a) Load the data from the file halo-spins.txt using numpy. These data are the spins of
dark matter halos from simulations of galaxy clusters.
b) Take the natural logarithm of the spins and use this as your dataset.
c) Find the mean and standard deviation of the data
d) Plot a histogram of the halo spins
e) Plot the best fit Gaussian over the histogram and label it with the mean and standard deviation.
f) Spins are denoted by the parameter λ. Label the axes accordingly.
g) Save the figure as a PNG file.
Matplotlib also has an object-oriented interface. This approach is useful for creating multiple
plots at once, whether they are side-by-side or independent. In this approach, the user uses
pyplot to create figures and then explicitly creates axes in those figures for plotting. To create a
figure, try the following:
fig=plt.figure()                     #Creates an empty figure
ax=fig.add_subplot(1,1,1)            #Adds a set of axes to the figure
x=np.arange(10)                      #Creating data for plotting
ax.plot(x,x**2.,label=r'$y=x^2$')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
fig.savefig('first-figure.png')
ax.cla()                             #Clears the axes
fig.clf()                            #Clears the figure
This illustrates the slight difference in syntax in the object oriented interface. As well as using
the object oriented interface, we’ve also changed one other thing about this figure. What tool
have we used to label the legend?
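The object-oriented interface is most useful when a figure holds more than one set of axes. The following is a minimal sketch of two side-by-side panels, assuming numpy and matplotlib are installed. The non-interactive Agg backend, the figure size and the output filename two-panels.png are all assumptions made so that the example runs without opening a window.

```python
import os
import matplotlib
matplotlib.use('Agg')                 # assumption: draw to file only, no pop up window
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)                     # Creating data for plotting
fig = plt.figure(figsize=(8, 4))      # Creates an empty figure
left = fig.add_subplot(1, 2, 1)       # 1 row, 2 columns, first panel
right = fig.add_subplot(1, 2, 2)      # 1 row, 2 columns, second panel

left.plot(x, x**2., label=r'$y=x^2$')
right.plot(x, x**3., label=r'$y=x^3$')
for ax in (left, right):              # Each set of axes is labelled separately
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()

n_axes = len(fig.axes)                # Number of axes attached to the figure
fig.savefig('two-panels.png')         # hypothetical output filename
saved = os.path.exists('two-panels.png')
plt.close(fig)                        # Closes the figure
print(n_axes)
print(saved)
```

The three arguments of add_subplot are the number of rows, the number of columns and the position of the panel, counted from 1.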
Exercise:
1. Edit the script from the previous exercise so that the resulting figure has the following
properties.
a) The figure is of a size 12x10 (in inches)
b) The limits on the x and y axes are sensible.
c) The fitted line has a line width of 3.
d) All text has a fontsize of 18.
e) The dpi of the saved figure is 256.
f) The histogram is white with teal hatching.
g) The axes major tick sizes are 10pt and the minor tick sizes are 5pt.
h) The axes have line widths of 4.
i) When saving the file, if there is an existing file with that name in the save location,
give the output file a different name (requires another external module).
9. USEFUL MODULES AND SOFTWARE
The following are python modules that are of particular use in astronomy and astrophysics:
• scipy
Scientific functions, including correlation analysis, signal processing and curve fitting.
• astropy
Combines lots of useful astronomy modules and packages into one.
• APLpy
For making nice FITS images.
• YT
Useful for visualizing 2D and 3D data, particularly for simulations.
• pyfits
FITS file manipulation. Most of this functionality is included in astropy.
• mpi4py
A message passing interface for python which enables for parallelisation across
There are also various pieces of python-based software available in astronomy that use a similar
syntax. For example, CASA and Parseltongue are useful for radio observers.
10. TIPS AND TRICKS
10.1. START-UP FILES
If you often use python interactively, you may find that you are always initially loading similar
modules (e.g. numpy or matplotlib). Fortunately, there is a way to avoid having to do this
every time you start a new session: start-up files. To do this, first create a blank python script.
It can be called anything you like, for example, .pythonstartup.py. In the script, write any
python commands that you would like executed at the beginning of all future interactive
python sessions. For example, you may include something like the following:
import numpy as np
import h5py
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams.update({'font.size':18,'text.latex.preamble':[r"\usepackage{amsmath}"]})
This start-up file will cause the numpy, h5py and matplotlib.pyplot modules to be loaded
for each interactive session. It also changes the default fontsize for text in matplotlib figures to
18pt. In a similar way, you can design your start-up file to set default figure sizes, line thickness
and colour schemes. If you need additional LaTeX packages when writing your axis labels and
figure legends, you can include them here (or alternatively at the beginning of any python
script).
Save this python script in a sensible place where it won’t be deleted. To have this script executed
whenever an interactive python session begins, the environment variable PYTHONSTARTUP
needs to be directed at this file. This involves editing the .bashrc file or .cshrc file for
bash and c-shell respectively. For example, if the file .pythonstartup.py is saved in the
/home/henson/ directory, then the following line needs to be added to the .bashrc file:
export PYTHONSTARTUP=/home/henson/.pythonstartup.py
if you are running bash. If you are running c-shell, the following needs to be added to the
.cshrc file:
setenv PYTHONSTARTUP /home/henson/.pythonstartup.py
In either case, the relevant file should be located in the home directory.
The name of the python start-up file does not have to begin with ‘.’, but it is commonly chosen
since a filename beginning with a full stop indicates a hidden file on Linux/UNIX systems.
10.2. MATPLOTLIB: DEFAULT CYCLING THROUGH LINE STYLES
By default, matplotlib automatically cycles through different colours when you plot multiple
lines on the same graph. By default it doesn’t cycle through different linestyles. If you have
write access for your matplotlib package (you will if you’re using anaconda), then this can be
changed. Navigate to the matplotlib install location. If you’re not sure where this is, open an
interactive session of python and try the following:
import matplotlib
matplotlib.__file__
This will give you the location of the __init__.pyc file. Navigate to the directory where this
file is located. To find the file that needs editing, open a terminal in this directory and type the
following:
grep -r "itertools.cycle(clist)" .
This will find the file in this directory that contains the string itertools.cycle(clist).
Open this file in a text editor, and find the line that says
self.color_cycle = itertools.cycle(clist)
Under this, add the following line:
self.line_cycle = itertools.cycle(["-","--",":","-.",])
The order of this list determines the order in which matplotlib will cycle through linestyles.
Now, look for the line:
kwargs['color']=kw['color']=six.next(self.color_cycle)
and add the following line underneath
kwargs['linestyle']=kw['linestyle']=six.next(self.line_cycle)
Don’t worry if the syntax in the file is slightly different - it may vary in different versions of
matplotlib. Just ensure you follow the same syntax as is already present for the colour cycling,
but replace color with line or linestyle where appropriate.
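As a hedged alternative that does not involve editing the installed package: matplotlib 1.5 and later (an assumption about your installed version) expose the property cycle through the axes.prop_cycle setting, which can combine colours and linestyles. This is a different technique from the file edit described above. The particular colours chosen here are only examples, and the Agg backend is an assumption made so that no window is opened.

```python
import matplotlib
matplotlib.use('Agg')                 # assumption: no pop up window is needed
import matplotlib.pyplot as plt
from cycler import cycler             # installed alongside matplotlib 1.5 and later

# Pair each colour with a linestyle; both lists must have the same length.
style_cycle = (cycler('color', ['b', 'g', 'r', 'c']) +
               cycler('linestyle', ['-', '--', ':', '-.']))
plt.rc('axes', prop_cycle=style_cycle)   # applies to axes created from now on

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for k in range(4):
    ax.plot([0, 1], [k, k + 1])       # each line takes the next colour and linestyle
plt.close(fig)
```

The same two lines could be placed in a start-up file so that the setting applies to every interactive session.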
A. INSTALLING PYTHON
One of the easiest ways to install python is to use the package Anaconda. This is not the only
way to install python, but it is the only method that will be covered here. You do not need
administrator/root permissions to install Anaconda, provided you choose an install location
that you have permission to write to.
To install Anaconda, go to the Anaconda website and download the recommended installer for your
operating system. Follow the steps specific to your system. During the install you will have to
choose a location to store Anaconda. It’s better to store this in your own folder in the scratch
area.
On a linux device you may need an additional step if you use C shell rather than bash. To check
which shell you are using, type echo $SHELL in a terminal window. If you’re using C Shell, you
will need to append the following line to your .cshrc file located in your /home/<username>/
area:
set path = (<location of anaconda>/anaconda/bin $path)
This adds anaconda to your path. You can always check what’s in your path by typing echo
$PATH into a new terminal window.
B. USING ANACONDA
Anaconda is not just an easy way of installing python - it also makes updating python and
installing new packages much easier. To update Anaconda itself open a terminal and type
conda update conda
conda update anaconda
To install a new package type
conda install <package name>
To update a package type
conda update <package name>
To delete anaconda, just delete the anaconda directory and its contents.
C. SUMMARY OF EXERCISES
Exercises A: From Section 7
1. Write a script to sum all odd numbers up to 1000.
2. Write a script to print all of the letters in a string in reverse order, with the position of the
letter in the string next to it. For example, for the string 'jodcast', it would print
t 7
s 6
a 5
c 4
d 3
o 2
j 1
Exercises B: From Section 8.1 Use the numpy documentation to complete the following exercises
1. Create an array of length 10, where all but the fifth value are equal to 0. The fifth value is
equal to 1.
2. Create a 3x3x3 array with random values.
3. Reshape the 3x3x3 array to a 1D array. Find the mean and standard deviation of that
array using the numpy functions.
4. Create two 2x3 arrays of random values and divide one by the other elementwise. Save
the resulting array using numpy.savetxt in a text file.
Exercise C: From Section 8.1.1
1. Create an array of evenly spaced particle positions in (r,θ ,φ), where 0 ≤ r ≤ 1, 0 ≤ θ ≤ π
and −π ≤ φ ≤ π. Convert these positions to Cartesian coordinates.
Exercises D: From Section 8.2
1. Write this in a script as you will need to edit it in a follow up exercise
a) Load the data from the file halo-spins.txt using numpy. These data are the spins of
dark matter halos from simulations of galaxy clusters.
b) Take the natural logarithm of the spins and use this as your dataset.
c) Find the mean and standard deviation of the data
d) Plot a histogram of the halo spins
e) Plot the best fit Gaussian over the histogram and label it with the mean and standard deviation.
f) Spins are denoted by the parameter λ. Label the axes accordingly.
g) Save the figure as a PNG file.
Exercises E: From Section 8.2
1. Edit the script from the previous exercise so that the resulting figure has the following
properties.
a) The figure is of a size 12x10 (in inches)
b) The limits on the x and y axes are sensible.
c) The fitted line has a line width of 3.
d) All text has a fontsize of 18.
e) The dpi of the saved figure is 256.
f) The histogram is white with teal hatching.
g) The axes major tick sizes are 10pt and the minor tick sizes are 5pt.
h) The axes have line widths of 4.
i) When saving the file, if there is an existing file with that name in the save location,
give the output file a different name (requires another external module).
If you’d like further exercises to practise your programming skills, then Project Euler is a great
place to start. It is not specific to Python, but it has a range of mathematical programming
challenges.