DEVELOPMENT IN AFRICA WITH RADIO ASTRONOMY

Introduction to Python

Original created by M. Henson* and adapted by J. Radcliffe (University of Manchester)

* http://www.jb.man.ac.uk/~henson/

1. WHAT IS PYTHON?

Python is a scripting language that has been specifically designed to be readable and easy to use. It is widely used in astronomy for data analysis, visualising data and running simulations. Python is a good alternative to IDL and Matlab, as it offers the same (or better) functionality and is free. Programs written in python generally run more slowly than equivalent programs in languages such as C and Fortran, but they are usually much quicker to write.

This tutorial refers to versions 2.6.x/2.7.x of python only. Python 3.x differs in syntax in places (for example, print is a function rather than a statement) and will not be covered. Most of the machines have python 2.7 and 3 installed. To check, open a terminal and type python to check for python 3.x, and type python2.7 to check that python 2.7.x is installed. If it is installed on the machine, then an interactive python session will begin. If it is not installed on the machine, then see Appendix A, which details how to install python.

You shouldn't need any prior programming experience to work through this document. There are a few exercises throughout for you to practise your skills. If you are already familiar with python, feel free to skip to the end, where there is a summary of the key exercises.

2. GETTING STARTED: COMMAND LINE INTERFACE

To get started with python, open a terminal window and type python (or python2.7 if python starts version 3.x). Python has four main variable types that we'll consider here: integer, float, string and boolean. Try the following:

    a=3.
    b=5
    a=True
    a='Hello'
    b="World"

To check the type of a variable, write

    type(<variable name>)

Notice that floats are indicated by a decimal point (3. or 3.0 rather than 3).

Now consider operations. The main operators are +, -, *, /, % and **. Note the output of the following:

    3*5
    3./5.
    3/5.
    3/5
    2*5
    2**5
    17%3

Note that the inclusion of a float in an expression leads to a result that is a float. Python tends to work quite intuitively. Some of these operations also work on strings. For example, try

    a+b

assuming a='Hello' and b="World" from above. Similarly, try

    a*4

3. COMMENTS

A comment is a piece of text in a program that is ignored when the code is run. Comments in python begin with #.

4. LISTS, TUPLES AND INDEXING

Python has several different ways of storing arrays of data. In native python, the main structures are lists and tuples. Try the following:

    a=(1,2,3,4,5)
    type(a)
    b=['Hello',2,True,4,5]
    type(b)
    b[0]
    b[0]*b[1]
    b[-2]
    b[::-1]

We see that square brackets indicate a list, whereas round brackets indicate a tuple. Lists and tuples have a key difference: tuples are immutable, which means they can't be changed. To illustrate this, try the following:

    b[1]+=2
    b
    a[1]+=2    #Should return an error
    a[1]       #a remains unchanged

Try the following and note the output:

    a='Hello'
    a[0]
    a[-1]
    a[:3]
    a[2:]

We see that strings can be indexed and sliced in the same way as lists (although, like tuples, they are immutable). One quick way of generating a list is the range function. Try the following lines to see what they do:

    range(10)
    range(-5,-20)
    range(0,10,3)

There are a couple of other useful operations for lists: in and len. What do you think they do? Try the following and note what they do:

    'e' in a
    'i' in a
    len(a)

Other useful list operations are remove, pop, append and extend. These are object functions, which means they are used as

    <variable>.<function>()

For example, if a is a list such as a=list('Hello'), then to remove the first item in a whose value is 'e', type

    a.remove('e')

5. GETTING HELP

When using python interactively, there is a useful help function. If you want to know how to use a function, just type

    help(<function>)

To see the functions available for a variable, type

    help(<variable>)

6. SCRIPTING

So far, we've been running python interactively.
To run a non-interactive python script, type your commands into a file (with the extension .py) and run it with

    python <filename>.py

Alternatively, to avoid having to type python every time, you can make the script executable. To do this, write

    #!/usr/bin/env python

on the first line of the python file. This tells the shell which interpreter to run the script with. Then, in your terminal type

    chmod +x <filename>.py

which makes the file executable, so that it can be run as ./<filename>.py.

Try running some of the statements from above in a script. To print the output to the screen, you will need to use the print statement. For example, try running

    print 'Hello',5,True,'World'

7. CONDITIONAL STATEMENTS

There are four control-flow statements in python: if, for, try and while. while is rarely used and will not be covered here. Try the following and note what happens:

    a='Jodrell'
    if 'Jod' in a:
        a+=' Bank'
    elif 'Hub' in a:
        a+=' Space Telescope'
    else:
        print "I hadn't planned for this"

Notice the whitespace: python relies on whitespace to indicate where a code block begins and ends. If you remove the whitespace from the above snippet, it won't work.

**Tip** try to use spaces rather than tabs for whitespace. Different text editors interpret tabs as different numbers of spaces, and using tabs can lead to your code failing on different platforms.

To test whether two values are equal, rather than whether one contains the other, use ==:

    a='Jodrell'
    if a == 'Jodrell':
        a+=' Bank'
    elif a == 'Hubble':
        a+=' Space Telescope'
    else:
        print "I hadn't planned for this"

Python also has the more readable-looking is operator, but it tests whether two names refer to the same object rather than whether their values are equal, so it should not be used to compare strings.

Now try the following to see what the for routine does:

    for k in range(0,5):
        if k<2:
            print k
        else:
            print k,'is bigger than or equal to 2.'

Exercises:

1. Write a script to sum all odd numbers up to 1000.
2. Write a script to print all of the letters in a string in reverse order, with the position of the letter in the string next to it. For example, for the string 'jodcast', it would print

    t 7
    s 6
    a 5
    c 4
    d 3
    o 2
    j 1

Finally, the try statement.
Implement the following (make sure x is undefined) and see what happens:

    print x

    try:
        print x
    except:
        print 'x has not yet been defined.'
    print 'Continuing...'

The try statement attempts to perform an operation. If that operation fails, it prevents the code from crashing and instead runs the code under the except statement.

8. EXTERNAL MODULES

The most useful python functions can be found in the various python packages available. Packages are collections of functions that you can use in your programs. To make use of the functions from a given package, you first have to import the package. To do this, write

    import <package name>

in your terminal or at the beginning of your script. If you want to give the package a shorthand name, use the syntax

    import <package name> as <nickname>

For example, the package numpy is commonly imported as

    import numpy as np

If numpy is imported using import numpy as np, then a numpy function is used as follows:

    np.<function name>(<function arguments>)

Alternatively, when importing a module, all the functions can be added to the namespace. For example, if numpy was imported using the following syntax:

    from numpy import *

then all numpy functions could be used with the following syntax:

    <function name>(<function arguments>)

This approach should be used with caution. Some modules have functions with the same name, and if you import all functions from multiple modules then a later import may silently replace a function imported earlier. This makes it confusing when using the functions, as you can't be sure which one you're using.

8.1. NUMPY

We've already met one way of storing arrays of data in python, in lists and tuples. The numpy package provides another way: numpy arrays. Numpy arrays can be multi-dimensional (lists can only be nested inside one another), and the numpy package provides functions which can act on whole arrays at once.

Numpy is very well documented, with the documentation available here.
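To make the idea of functions acting on whole arrays at once more concrete, here is a minimal sketch. The array values and variable names are illustrative assumptions rather than part of the original tutorial, and the code is written so that it should run under both python 2.7 and python 3.

```python
import numpy as np

# Illustrative data (an assumption): a 2x3 array built from a nested list
grid = np.array([[1., 2., 3.],
                 [4., 5., 6.]])

# Arithmetic is applied to every element at once, with no explicit loop
doubled = grid * 2.

# numpy functions also act on whole arrays
roots = np.sqrt(grid)

# Arrays know their own shape and can be summed along a chosen axis
column_sums = grid.sum(axis=0)

print(grid.shape)
print(doubled)
print(column_sums)
```

Doing the same with nested lists would need a loop over rows and columns, which is the main practical reason to prefer numpy arrays for numerical data.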
Exercises: Use the numpy documentation to complete the following exercises.

1. Create an array of length 10, where all but the fifth value are equal to 0. The fifth value is equal to 1.
2. Create a 3x3x3 array with random values.
3. Reshape the 3x3x3 array to a 1D array. Find the mean and standard deviation of that array using the numpy functions.
4. Create two 2x3 arrays of random values and divide one by the other elementwise. Save the resulting array in a text file using numpy.savetxt.

8.1.1. MATHEMATICAL FUNCTIONS

In python, most mathematical functions (e.g. trigonometric functions like sin and cos) are provided in the math package (use import math). However, most of these functions are also available in numpy, and the numpy versions can be used on whole arrays at once.

Exercise:

1. Create an array of evenly spaced particle positions in (r,θ,φ), where 0 ≤ r ≤ 1, 0 ≤ θ ≤ π and −π ≤ φ ≤ π. Convert these positions to Cartesian coordinates.

8.2. MATPLOTLIB

Matplotlib is a python package for visualizing data. It allows for easy creation of line plots, scatter diagrams, histograms, surface plots and 3D plots. matplotlib.pyplot is a set of functions which make matplotlib work like MATLAB. It is commonly imported as

    import matplotlib.pyplot as plt

Each pyplot (or plt) function changes a figure. To make a simple plot, try the following:

    x=np.arange(10)                      #Creating data for plotting
    plt.plot(x,x**2.,label='Function')   #Plot the data
    plt.xlabel('x')                      #Label the x axis
    plt.ylabel('y')                      #Label the y axis
    plt.legend()                         #Show a legend on the figure
    plt.show()                           #Show the figure

In the second line, the label argument creates the line label in the legend. Figures are saved using the savefig command. Try placing the following before show() in the previous example:

    plt.savefig('my-first-figure.png')

Matplotlib recognises the file extension at the end of the filename, and uses this to set the filetype for the image.
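The extension-driven behaviour described above can be sketched as follows. This is a minimal illustration and not part of the original tutorial: the Agg backend, the temporary directory and the output file names are assumptions chosen so that the example runs without opening a window, and the code is written so that it should run under both python 2.7 and python 3.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
plt.plot(x, x**2., label='Function')
plt.legend()

# Hypothetical output location: a fresh temporary directory
outdir = tempfile.mkdtemp()
png_name = os.path.join(outdir, 'my-first-figure.png')
pdf_name = os.path.join(outdir, 'my-first-figure.pdf')

# The same figure is written twice; only the file extension differs
plt.savefig(png_name)
plt.savefig(pdf_name)
plt.close()

# Read back the first few bytes of each file to see the format written
with open(png_name, 'rb') as f:
    png_header = f.read(4)
with open(pdf_name, 'rb') as f:
    pdf_header = f.read(4)
print(repr(png_header))
print(repr(pdf_header))
```

Reading back the first few bytes shows that the two files are in different formats, even though the plotting commands were identical.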
Matplotlib recognises PNG, SVG, PDF and more.

Now try placing the savefig() command after show(). What happens? This demonstrates that when you close the pop-up window created by show(), the figure is also closed in the program, so that it can no longer be worked on. To avoid this, always save figures before calling show(). Since we can call show() after savefig(), we see that savefig() does not close the figure. We can manually close a figure by using the command plt.close().

Exercise:

1. Write this in a script, as you will need to edit it in a follow-up exercise.
   a) Load the data from the file halo-spins.txt using numpy. These data are the spins of dark matter halos from simulations of galaxy clusters.
   b) Take the natural logarithm of the spins and use this as your dataset.
   c) Find the mean and standard deviation of the data.
   d) Plot a histogram of the halo spins.
   e) Plot the best fit Gaussian over the histogram and label it with the mean and standard deviation.
   f) Spins are denoted by the parameter λ. Label the axes accordingly.
   g) Save the figure as a PNG file.

Matplotlib also has an object-oriented interface. This approach is useful for creating multiple plots at once, whether they are side-by-side or independent. In this approach, the user uses pyplot to create figures and then explicitly creates axes in those figures for plotting. To create a figure, try the following:

    fig=plt.figure()                      #Creates an empty figure
    ax=fig.add_subplot(1,1,1)             #Adds a set of axes to the figure
    x=np.arange(10)                       #Creating data for plotting
    ax.plot(x,x**2.,label=r'$y=x^2$')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()
    fig.savefig('first-figure.png')
    ax.cla()                              #Clears the axes
    fig.clf()                             #Clears the figure

This illustrates the slight difference in syntax in the object-oriented interface. As well as using the object-oriented interface, we've also changed one other thing about this figure. What tool have we used to label the legend?

Exercise:

1. Edit the script from the previous exercise so that the resulting figure has the following properties.
   a) The figure is of a size 12x10 (in inches).
   b) The limits on the x and y axes are sensible.
   c) The fitted line has a line width of 3.
   d) All text has a fontsize of 18.
   e) The dpi of the saved figure is 256.
   f) The histogram is white with teal hatching.
   g) The axes major tick sizes are 10pt and the minor tick sizes are 5pt.
   h) The axes have line widths of 4.
   i) When saving the file, if there is an existing file with that name in the save location, give the output file a different name (requires another external module).

9. USEFUL MODULES AND SOFTWARE

The following are python modules that are of particular use in astronomy and astrophysics:

• scipy: Scientific functions, including correlation analysis, signal processing and curve fitting.
• astropy: Combines lots of useful astronomy modules and packages into one.
• APLpy: For making nice FITS images.
• YT: Useful for visualizing 2D and 3D data, particularly for simulations.
• pyfits: FITS file manipulation. Most of this functionality is included in astropy.
• mpi4py: A message passing interface for python which enables parallelisation across

There are also various pieces of Python-based software available in astronomy that use a similar syntax. For example, CASA and Parseltongue are useful for radio observers.

10. TIPS AND TRICKS

10.1. START-UP FILES

If you often use python interactively, you may find that you are always loading the same modules at the start (e.g. numpy or matplotlib). Fortunately, there is a way to avoid having to do this every time you start a new session: start-up files. To do this, first create a blank python script. It can be called anything you like, for example, .pythonstartup.py. In the script, write any python commands that you would like executed at the beginning of all future interactive python sessions.
For example, you may include something like the following:

    import numpy as np
    import h5py
    import matplotlib.pyplot as plt
    from matplotlib import rcParams
    rcParams.update({'font.size':18,'text.latex.preamble':[r"\usepackage{amsmath}"]})

This start-up file will cause the numpy, h5py and matplotlib.pyplot modules to be loaded for each interactive session. It also changes the default fontsize for text in matplotlib figures to 18pt. In a similar way, you can design your start-up file to set default figure sizes, line thickness and colour schemes. If you need additional LaTeX packages when writing your axis labels and figure legends, you can include them here (or alternatively at the beginning of any python script).

Save this python script in a sensible place where it won't be deleted. To have this script executed whenever an interactive python session begins, the environment variable PYTHONSTARTUP needs to point at this file. This involves editing the .bashrc file or .cshrc file for bash and c-shell respectively. For example, if the file .pythonstartup.py is saved in the /home/henson/ directory, then the following line needs to be added to the .bashrc file:

    export PYTHONSTARTUP=/home/henson/.pythonstartup.py

if you are running bash. If you are running c-shell, the following needs to be added to the .cshrc file:

    setenv PYTHONSTARTUP /home/henson/.pythonstartup.py

In either case, the relevant shell configuration file (.bashrc or .cshrc) should be located in the home directory. The name of the python start-up file does not have to begin with '.', but it is commonly chosen since a filename beginning with a full stop indicates a hidden file on Linux/UNIX systems.

10.2. MATPLOTLIB: DEFAULT CYCLING THROUGH LINE STYLES

By default, matplotlib automatically cycles through different colours when you plot multiple lines on the same graph, but it doesn't cycle through different linestyles. If you have write access for your matplotlib package (you will if you're using anaconda), then this can be changed.
Navigate to the matplotlib install location. If you're not sure where this is, open an interactive session of python and try the following:

    import matplotlib
    matplotlib.__file__

This will give you the location of the __init__.pyc file. Navigate to the directory where this file is located. To find the file that needs editing, open a terminal in this directory and type the following:

    grep -r "itertools.cycle(clist)" .

This will find the file in this directory that contains the string itertools.cycle(clist). Open this file in a text editor, and find the line that says

    self.color_cycle = itertools.cycle(clist)

Under this, add the following line:

    self.line_cycle = itertools.cycle(["-","--",":","-.",])

The order of this list determines the order in which matplotlib will cycle through linestyles. Now, look for the line:

    kwargs['color']=kw['color']=six.next(self.color_cycle)

and add the following line underneath:

    kwargs['linestyle']=kw['linestyle']=six.next(self.line_cycle)

Don't worry if the syntax in the file is slightly different, as it may vary between versions of matplotlib. Just ensure you follow the same syntax as is already present for the colour cycling, but replace color with line or linestyle where appropriate.

A. INSTALLING PYTHON

One of the easiest ways to install python is to use the package Anaconda. This is not the only way to install python, but it is the only method that will be covered here. You do not need administrator/root permissions to install Anaconda, provided you choose an install location that you have permission to write to.

To install Anaconda, go to this website and download the recommended installer for your operating system. Follow the steps specific to your system. During the install you will have to choose a location to store Anaconda. It's better to store this in your own folder in the scratch area.

On a Linux device you may need an additional step if you use C shell rather than bash.
To check which shell you are using, type

    echo $SHELL

in a terminal window. If you're using C shell, you will need to append the following line to your .cshrc file, located in your /home/<username>/ area:

    set path = (<location of anaconda>/anaconda/bin $path)

This adds anaconda to your path. You can always check what's in your path by typing

    echo $PATH

into a new terminal window.

B. USING ANACONDA

Anaconda is not just an easy way of installing python: it also makes updating python and installing new packages much easier. To update Anaconda itself, open a terminal and type

    conda update conda
    conda update anaconda

To install a new package, type

    conda install <package name>

To update a package, type

    conda update <package name>

To delete anaconda, just delete the anaconda directory and its contents.

C. SUMMARY OF EXERCISES

Exercises A: From Section 7

1. Write a script to sum all odd numbers up to 1000.
2. Write a script to print all of the letters in a string in reverse order, with the position of the letter in the string next to it. For example, for the string 'jodcast', it would print

    t 7
    s 6
    a 5
    c 4
    d 3
    o 2
    j 1

Exercises B: From Section 8.1

Use the numpy documentation to complete the following exercises.

1. Create an array of length 10, where all but the fifth value are equal to 0. The fifth value is equal to 1.
2. Create a 3x3x3 array with random values.
3. Reshape the 3x3x3 array to a 1D array. Find the mean and standard deviation of that array using the numpy functions.
4. Create two 2x3 arrays of random values and divide one by the other elementwise. Save the resulting array in a text file using numpy.savetxt.

Exercise C: From Section 8.1.1

1. Create an array of evenly spaced particle positions in (r,θ,φ), where 0 ≤ r ≤ 1, 0 ≤ θ ≤ π and −π ≤ φ ≤ π. Convert these positions to Cartesian coordinates.

Exercises D: From Section 8.2

1. Write this in a script, as you will need to edit it in a follow-up exercise.
   a) Load the data from the file halo-spins.txt using numpy. These data are the spins of dark matter halos from simulations of galaxy clusters.
   b) Take the natural logarithm of the spins and use this as your dataset.
   c) Find the mean and standard deviation of the data.
   d) Plot a histogram of the halo spins.
   e) Plot the best fit Gaussian over the histogram and label it with the mean and standard deviation.
   f) Spins are denoted by the parameter λ. Label the axes accordingly.
   g) Save the figure as a PNG file.

Exercises E: From Section 8.2

1. Edit the script from the previous exercise so that the resulting figure has the following properties.
   a) The figure is of a size 12x10 (in inches).
   b) The limits on the x and y axes are sensible.
   c) The fitted line has a line width of 3.
   d) All text has a fontsize of 18.
   e) The dpi of the saved figure is 256.
   f) The histogram is white with teal hatching.
   g) The axes major tick sizes are 10pt and the minor tick sizes are 5pt.
   h) The axes have line widths of 4.
   i) When saving the file, if there is an existing file with that name in the save location, give the output file a different name (requires another external module).

If you'd like further exercises to practise your programming skills, then Project Euler is a great place to start. It is not specific to Python, but it has a range of mathematical programming challenges.