Python Crash Course
File I/O
Sterrenkundig Practicum 2
V1.0
dd 07-01-2015
Hour 5
File I/O
• Types of input/output available
  – Interactive
    • Keyboard
    • Screen
  – Files
    • ASCII/text
      – txt
      – csv
    • Binary
    • Structured
      – FITS > PyFITS, astropy.io.fits
    • URL
    • Pipes
Interactive I/O, fancy output
>>> s = 'Hello, world.'
>>> str(s)
'Hello, world.'
>>> repr(s)
"'Hello, world.'"
>>> str(1.0/7.0)
'0.142857142857'
>>> repr(1.0/7.0)
'0.14285714285714285'
>>> x = 10 * 3.25
>>> y = 200 * 200
>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
>>> print s
The value of x is 32.5, and y is 40000...
>>> # The repr() of a string adds string quotes and backslashes:
... hello = 'hello, world\n'
>>> hellos = repr(hello)
>>> print hellos
'hello, world\n'
>>> # The argument to repr() may be any Python object:
... repr((x, y, ('spam', 'eggs')))
"(32.5, 40000, ('spam', 'eggs'))"
Interactive I/O, fancy output
Old string formatting
>>> import math
>>> print 'The value of PI is approximately %5.3f.' % math.pi
The value of PI is approximately 3.142.
New string formatting
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
>>> for name, phone in table.items():
...     print '{0:10} ==> {1:10d}'.format(name, phone)
...
Jack       ==>       4098
Dcab       ==>       7678
Sjoerd     ==>       4127
Formatting I/O
A conversion specifier contains two or more characters and has the following
components, which must occur in this order:
• The "%" character, which marks the start of the specifier.
• Mapping key (optional), consisting of a parenthesised sequence of characters (for example,
(somename)).
• Conversion flags (optional), which affect the result of some conversion types.
• Minimum field width (optional). If specified as an "*" (asterisk), the actual width is read from the next
element of the tuple in values, and the object to convert comes after the minimum field width and
optional precision.
• Precision (optional), given as a "." (dot) followed by the precision. If specified as "*" (an asterisk), the
actual precision is read from the next element of the tuple in values, and the value to convert comes after
the precision.
• Length modifier (optional).
• Conversion type.
>>> print '%(language)s has %(#)03d quote types.' % \
...       {'language': "Python", "#": 2}
Python has 002 quote types.
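The components listed above can be tried out one at a time. The following is a minimal sketch written for Python 3 (an assumption; the slides use Python 2, where the % operator behaves the same way). The key name 'count' and the sample values are illustrative only.

```python
import math

# Mapping key: values are looked up by name in a dictionary.
by_name = '%(language)s has %(count)03d quote types.' % {'language': 'Python', 'count': 2}

# Conversion flags and minimum field width:
# '-' left-justifies, '+' forces a sign, 8 is the minimum width.
flagged = '[%-8d][%+8d]' % (42, 42)

# '*' reads the width and then the precision from the tuple,
# and the value to convert comes after them.
starred = '%*.*f' % (10, 3, math.pi)

print(by_name)
print(flagged)
print(starred)
```

Note that with "*" the tuple must supply the width and the precision before the value itself.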
Formatting I/O
The conversion types are:
Conversion  Meaning
d           Signed integer decimal.
i           Signed integer decimal.
o           Unsigned octal.
u           Unsigned decimal (obsolete; identical to "d").
x           Unsigned hexadecimal (lowercase).
X           Unsigned hexadecimal (uppercase).
e           Floating point exponential format (lowercase).
E           Floating point exponential format (uppercase).
f           Floating point decimal format.
F           Floating point decimal format.
g           Same as "e" if exponent is less than -4 or not less than precision, "f" otherwise.
G           Same as "E" if exponent is less than -4 or not less than precision, "F" otherwise.
c           Single character (accepts integer or single character string).
r           String (converts any python object using repr()).
s           String (converts any python object using str()).
%           No argument is converted, results in a "%" character in the result.
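A quick way to get a feel for the table is to apply several conversion types to sample values. This is a small sketch for Python 3 (an assumption; the slides use Python 2), and the sample values are arbitrary.

```python
# (specifier, value) pairs covering the most common conversion types.
examples = [
    ('%d', 42), ('%o', 8), ('%x', 255), ('%X', 255),
    ('%e', 12345.678), ('%f', 2.5),
    ('%g', 0.0001), ('%g', 0.00001), ('%g', 1234567.0),
    ('%c', 65), ('%r', 'spam'), ('%s', 'spam'),
]

# Apply each specifier to its value.
results = [(spec, spec % value) for spec, value in examples]

# A literal percent sign is written as '%%'.
percent = '%d%%' % 50

for spec, text in results:
    print(spec, '->', text)
print(percent)
```

The three "%g" lines show the switch between decimal and exponential notation: 0.0001 stays decimal, while 0.00001 (exponent below -4) and 1234567.0 (exponent not less than the default precision of 6) are printed in exponential form.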
Interactive I/O
>>> print "Python is great,", "isn't it?"
Python is great, isn't it?
>>> text = raw_input("Enter your input: ")
Enter your input: Hello Python
>>> print "Received input is:", text
Received input is: Hello Python
>>> text = input("Enter your input: ")  # input() evaluates what is typed as a Python expression
Enter your input: [x*5 for x in range(2,10,2)]
>>> print "Received input is:", text
Received input is: [10, 20, 30, 40]
If the readline module has been loaded, raw_input() will use it to provide elaborate line editing and
history features.
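In Python 2, input() evaluates whatever is typed as a Python expression, which is convenient but will run arbitrary code. One possible alternative, sketched here for Python 3 (an assumption; there input() returns plain text like raw_input()), is to parse the typed text with ast.literal_eval, which accepts only literals. The helper name parse_literal is hypothetical.

```python
import ast


def parse_literal(text):
    """Turn typed text into a Python value without executing code."""
    return ast.literal_eval(text.strip())


# A literal list is accepted ...
values = parse_literal(' [10, 20, 30, 40] ')

# ... but an expression such as a list comprehension is rejected.
try:
    parse_literal('[x*5 for x in range(2,10,2)]')
    rejected = False
except ValueError:
    rejected = True

print(values, rejected)
```

In an interactive script the argument would come from input("Enter your input: ").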
File I/O
>>> fname = 'myfile.dat'
>>> f = file(fname)
>>> lines = f.readlines()
>>> f.close()
>>> f = file(fname)
>>> firstline = f.readline()
>>> secondline = f.readline()
>>> f = file(fname)
>>> for l in f:
...     print l.split()[1]
...
>>> f.close()
>>> outfname = 'myoutput'
>>> outf = file(outfname, 'w') # second argument denotes writable
>>> outf.write('My very own file\n')
>>> outf.close()
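The same read and write pattern can be sketched for Python 3, where the file() builtin no longer exists and open() is used instead (this is an assumption about the reader's interpreter; the slides use Python 2). The file contents and the temporary directory are made up for illustration.

```python
import os
import tempfile

# Write a small two-column text file into a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'myfile.dat')
with open(path, 'w') as outf:  # 'w' opens the file for writing
    outf.write('a 1.0\n')
    outf.write('b 2.5\n')

# Read all lines at once; the with block closes the file afterwards.
with open(path) as f:
    lines = f.readlines()

# Iterate line by line and keep the second whitespace-separated column.
with open(path) as f:
    second_column = [line.split()[1] for line in f]

print(lines)
print(second_column)
```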
Read File I/O
>>> f = open("test.txt")
>>> # Read everything into single string:
>>> content = f.read()
>>> len(content)
>>> print content
>>> f.read() # At End Of File: returns an empty string
>>> f.close()
>>> # f.read(20) reads (at most) 20 bytes
Using with block:
>>> with open('test.txt', 'r') as f:
...     content = f.read()
...
>>> f.closed
True
CSV file:
>>> import csv
>>> ifile = open('photoz.csv', "r")
>>> reader = csv.reader(ifile)
>>> for row in reader:
...     print row,
...
>>> ifile.close()
Read and write text file
>>> from numpy import *
>>> data = loadtxt("myfile.txt") # myfile.txt contains 4 columns of numbers
>>> t,z = data[:,0], data[:,3] # data is a 2D numpy array, t is 1st col, z is 4th col
>>> t,x,y,z = loadtxt("myfile.txt", unpack=True) # to automatically unpack all columns
>>> t,z = loadtxt("myfile.txt", usecols = (0,3), unpack=True) # to select just a few columns
>>> data = loadtxt("myfile.txt", skiprows = 7) # to skip 7 rows from top of file
>>> data = loadtxt("myfile.txt", comments = '!') # use '!' as comment char instead of '#'
>>> data = loadtxt("myfile.txt", delimiter=';') # use ';' as column separator instead of whitespace
>>> data = loadtxt("myfile.txt", dtype = int) # file contains integers instead of floats
>>> from numpy import *
>>> savetxt("myfile.txt", data) # data is 2D array
>>> savetxt("myfile.txt", x) # if x is 1D array then get 1 column in file.
>>> savetxt("myfile.txt", (x,y)) # x,y are 1D arrays. 2 rows in file.
>>> savetxt("myfile.txt", transpose((x,y))) # x,y are 1D arrays. 2 columns in file.
>>> savetxt("myfile.txt", transpose((x,y)), fmt='%6.3f') # use new format instead of '%.18e'
>>> savetxt("myfile.txt", data, delimiter = ';') # use ';' to separate columns instead of space
String formatting for output
>>> sigma = 6.76/2.354
>>> print('sigma is %5.3f metres'%sigma)
sigma is 2.872 metres
>>> d = {'bob': 1.87, 'fred': 1.768}
>>> for name, height in d.items():
...     print('%s is %.2f metres tall'%(name.capitalize(), height))
...
Bob is 1.87 metres tall
Fred is 1.77 metres tall
>>> nsweets = range(100)
>>> calories = [i * 2.345 for i in nsweets]
>>> fout = file('sweetinfo.txt', 'w')
>>> for i in range(len(nsweets)):  # loop over the indices of the list
...     fout.write('%5i %8.3f\n'%(nsweets[i], calories[i]))
...
>>> fout.close()
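The loop that writes the table can also be written without index bookkeeping by pairing the two sequences with zip. This is a sketch for Python 3 (an assumption), and it writes into an in-memory io.StringIO buffer instead of sweetinfo.txt so it leaves no file behind; only five rows are generated to keep the output short.

```python
import io

nsweets = range(5)
calories = [n * 2.345 for n in nsweets]

# An in-memory text buffer stands in for the output file.
fout = io.StringIO()
for n, cal in zip(nsweets, calories):
    # %5i: integer in a field of width 5; %8.3f: float, width 8, 3 decimals.
    fout.write('%5i %8.3f\n' % (n, cal))

table = fout.getvalue()
print(table)
```

Replacing io.StringIO() with open('sweetinfo.txt', 'w') inside a with block would write the same text to disk.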
File I/O, CSV files
• CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases.
• Functions
  – csv.reader
  – csv.writer
  – csv.register_dialect
  – csv.unregister_dialect
  – csv.get_dialect
  – csv.list_dialects
  – csv.field_size_limit
File I/O, CSV files
• Reading CSV files
import csv                    # imports the csv module
f = open('data1.csv', 'rb')   # opens the csv file
try:
    reader = csv.reader(f)    # creates the reader object
    for row in reader:        # iterates the rows of the file in order
        print row             # prints each row
finally:
    f.close()                 # closing
• Writing CSV files
import csv
ifile = open('test.csv', "rb")
reader = csv.reader(ifile)
ofile = open('ttest.csv', "wb")
writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
for row in reader:
    writer.writerow(row)
ifile.close()
ofile.close()
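The read-then-rewrite pattern above can be sketched for Python 3 as follows (an assumption; the slides use Python 2). In Python 3 the csv module works on text, so files are opened with open(name, newline='') rather than in 'rb'/'wb' mode. Here in-memory io.StringIO buffers stand in for test.csv and ttest.csv, and the two data rows are made up.

```python
import csv
import io

# Stand-ins for the input and output files (hypothetical contents).
source = io.StringIO('name,vmag\r\nNGC1001,11.1\r\nNGC1002,12.3\r\n', newline='')
target = io.StringIO(newline='')

reader = csv.reader(source)
# Tab-separated output with every field quoted.
writer = csv.writer(target, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

rows = []
for row in reader:
    rows.append(row)      # each row is a list of strings
    writer.writerow(row)

converted = target.getvalue()
print(rows)
print(converted)
```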
File I/O, CSV files
• The csv module contains the following quoting options:
• csv.QUOTE_ALL
Quote everything, regardless of type.
• csv.QUOTE_MINIMAL
Quote fields with special characters
• csv.QUOTE_NONNUMERIC
Quote all fields that are not integers or floats
• csv.QUOTE_NONE
Do not quote anything on output
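The effect of each quoting option is easiest to see by writing the same row four times. This is a sketch for Python 3 (an assumption); the helper write_row and the sample row are hypothetical. An escapechar is supplied because csv.QUOTE_NONE needs one when a field contains the delimiter.

```python
import csv
import io


def write_row(row, quoting):
    """Write one row with the given quoting option and return the text."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer, quoting=quoting, escapechar='\\')
    writer.writerow(row)
    return buffer.getvalue().rstrip('\r\n')


# One string, one float, and one string containing the delimiter.
row = ['NGC 891', 10.1, 'a,b']

names = ['QUOTE_ALL', 'QUOTE_MINIMAL', 'QUOTE_NONNUMERIC', 'QUOTE_NONE']
quoted = {name: write_row(row, getattr(csv, name)) for name in names}

for name in names:
    print('%-17s %s' % (name, quoted[name]))
```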
Handling FITS files - PyFITS
http://www.stsci.edu/resources/software_hardware/pyfits
Read, write and manipulate all aspects of FITS files
extensions
headers
images
tables
Low-level interface for details
High-level functions for quick and easy use
PyFITS - reading
>>> import pyfits
>>> imgname = "testimage.fits"
>>> img = pyfits.getdata(imgname)
>>> img
array([[2408, 2408, 1863, ..., 3660, 3660, 4749],
       [2952, 2408, 1863, ..., 3660, 3115, 4204],
       [2748, 2748, 2204, ..., 4000, 3455, 4000],
       ...,
       [2629, 2901, 2357, ..., 2261, 2806, 2261],
       [2629, 2901, 3446, ..., 1717, 2261, 1717],
       [2425, 2697, 3242, ..., 2942, 2125, 1581]], dtype=int16)
>>> img.mean()
4958.4371977768678
>>> img[img > 2099].mean()
4975.1730909593043
>>> import numpy
>>> numpy.median(img)
4244.0
PyFITS – reading FITS images
>>> x = 348; y = 97
>>> delta = 5
>>> print img[y-delta:y+delta+1,
...           x-delta:x+delta+1].astype(numpy.int)
[[5473 5473 3567 3023 3295 3295 3839 4384 4282 4282 3737]
 [3295 4384 3567 3023 3295 3295 3295 3839 3737 3737 4282]
 [2478 3567 4112 3023 3295 3295 3295 3295 3397 4486 4486]
 [3023 3023 3023 3023 2750 2750 3839 3839 3397 4486 3941]
 [3295 3295 3295 3295 3295 3295 3839 3839 3397 3941 3397]
 [3295 3295 2750 2750 3295 3295 2750 2750 2852 3397 4486]
 [2887 2887 2887 2887 3976 3431 3159 2614 3125 3669 4758]
 [2887 2887 3431 3431 3976 3431 3159 2614 3669 4214 4214]
 [3159 3703 3159 3703 3431 2887 3703 3159 3941 4486 3669]
 [3703 3159 2614 3159 3431 2887 3703 3159 3397 3941 3669]
 [3431 3431 2887 2887 3159 3703 3431 2887 3125 3669 3669]]
row = y = first index
column = x = second index
numbering runs as normal (e.g. in ds9) BUT zero indexed!
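The indexing convention can be illustrated without a FITS file. The sketch below uses a plain nested list as a stand-in for the image array (an assumption made to keep the example free of dependencies; with a numpy array the same cutout is written img[y-delta:y+delta+1, x-delta:x+delta+1] as above). The tiny 4 x 6 "image" is made up so that each pixel value encodes its own position.

```python
# Hypothetical image: 4 rows (y) of 6 columns (x); pixel value = 10*y + x.
img = [[10 * y + x for x in range(6)] for y in range(4)]

x, y = 4, 2
delta = 1

# Row (y) is the first index, column (x) the second, both counted from zero.
pixel = img[y][x]

# Cutout of (2*delta + 1) rows and columns centred on (x, y).
cutout = [row[x - delta:x + delta + 1] for row in img[y - delta:y + delta + 1]]

# A one-based viewer such as ds9 would label the same pixel (x + 1, y + 1).
one_based = (x + 1, y + 1)

print(pixel, cutout, one_based)
```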
PyFITS – reading FITS tables
>>> tblname = 'data/N891PNdata.fits'
>>> d = pyfits.getdata(tblname)
>>> d.names
('x0', 'y0', 'rah', 'ram', 'ras', 'decd', 'decm', 'decs', 'wvl', 'vel',
'vhel', 'dvel', 'dvel2', 'xL', 'yL', 'xR', 'yR', 'ID', 'radeg', 'decdeg',
'x', 'y')
>>> d.x0
array([ 928.7199707 ,  532.61999512,  968.14001465,  519.38000488,…
       1838.18994141, 1888.26000977, 1516.2199707 ], dtype=float32)
>>> d.field('x0')  # case-insensitive
array([ 928.7199707 ,  532.61999512,  968.14001465,  519.38000488,…
       1838.18994141, 1888.26000977, 1516.2199707 ], dtype=float32)
>>> select = d.x0 < 200
>>> dsel = d[select]  # can select rows all together
>>> print dsel.x0
[ 183.05000305 165.55000305 138.47999573 158.02999878 140.96000671
  192.58000183 157.02999878 160.1499939  161.1000061  136.58999634
  175.19000244]
PyFITS – reading FITS headers
>>> h = pyfits.getheader(imgname)
>>> print h
SIMPLE  = T                    /FITS header
BITPIX  = 16                   /No.Bits per pixel
NAXIS   = 2                    /No.dimensions
NAXIS1  = 1059                 /Length X axis
NAXIS2  = 1059                 /Length Y axis
EXTEND  = T                    /
DATE    = '05/01/11'           /Date of FITS file creation
ORIGIN  = 'CASB -- STScI'      /Origin of FITS image
PLTLABEL= 'E30'                /Observatory plate label
PLATEID = '06UL'               /GSSS Plate ID
REGION  = 'XE295'              /GSSS Region Name
DATE-OBS= '22/12/49'           /UT date of Observation
UT      = '03:09:00.00'        /UT time of observation
EPOCH   = 2.0499729003906E+03  /Epoch of plate
PLTRAH  = 1                    /Plate center RA
PLTRAM  = 26                   /
PLTRAS  = 5.4441800000000E+00  /
PLTDECSN= '+'                  /Plate center Dec
PLTDECD = 30                   /
PLTDECM = 45                   /
>>> h['KMAGZP']
>>> h['REGION']
'XE295'
# Use h.items() to iterate through all header entries
PyFITS – writing FITS images
>>> newimg = sqrt((sky+img)/gain + rd_noise**2) * gain
>>> newimg[(sky+img) < 0.0] = 1e10
>>> hdr = h.copy() # copy header from original image
>>> hdr.add_comment('Calculated noise image')
>>> filename = 'sigma.fits'
>>> pyfits.writeto(filename, newimg, hdr) # create new file
>>> pyfits.append(imgname, newimg, hdr) # add a new FITS extension
>>> pyfits.update(filename, newimg, hdr, ext) # update a file
# specifying a header is optional,
# if omitted automatically adds minimum header
PyFITS – writing FITS tables
>>> import pyfits
>>> import numpy as np
>>> # create data
>>> a1 = np.array(['NGC1001', 'NGC1002', 'NGC1003'])
>>> a2 = np.array([11.1, 12.3, 15.2])
>>> # make list of pyfits Columns
>>> cols = []
>>> cols.append(pyfits.Column(name='target', format='20A',
...                           array=a1))
>>> cols.append(pyfits.Column(name='V_mag', format='E', array=a2))
>>> # create HDU and write to file
>>> tbhdu = pyfits.new_table(cols)
>>> tbhdu.writeto('table.fits')
# these examples are for a simple FITS file containing just one
# table or image but with a couple more steps can create a file
# with any combination of extensions (see the PyFITS manual online)
URL
URLs can be used for reading
>>> import urllib2
>>> url = 'http://python4astronomers.github.com/_downloads/data.txt'
>>> response = urllib2.urlopen(url)
>>> data = response.read()
>>> print data
RAJ        DEJ                          Jmag   e_Jmag
2000 (deg) 2000 (deg) 2MASS             (mag)  (mag)
---------- ---------- ----------------- ------ ------
010.684737 +41.269035 00424433+4116085   9.453  0.052
010.683469 +41.268585 00424403+4116069   9.321  0.022
010.685657 +41.269550 00424455+4116103  10.773  0.069
010.686026 +41.269226 00424464+4116092   9.299  0.063
010.683465 +41.269676 00424403+4116108  11.507  0.056
010.686015 +41.269630 00424464+4116106   9.399  0.045
010.685270 +41.267124 00424446+4116016  12.070  0.035
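Once the text has been downloaded it still has to be turned into numbers. The following is a minimal sketch for Python 3 (an assumption; there the download itself would be urllib.request.urlopen(url).read().decode() instead of urllib2). To stay runnable offline it parses a short sample copied from the listing above; the function name parse_catalogue and the choice of three header lines are assumptions based on that listing.

```python
# First rows of the catalogue shown above, used instead of a live download.
sample = """\
RAJ        DEJ                          Jmag   e_Jmag
2000 (deg) 2000 (deg) 2MASS             (mag)  (mag)
---------- ---------- ----------------- ------ ------
010.684737 +41.269035 00424433+4116085   9.453  0.052
010.683469 +41.268585 00424403+4116069   9.321  0.022
"""


def parse_catalogue(text, header_lines=3):
    """Split whitespace-separated rows into (ra, dec, name, jmag, e_jmag)."""
    rows = []
    for line in text.splitlines()[header_lines:]:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        ra, dec, name, jmag, e_jmag = fields
        rows.append((float(ra), float(dec), name, float(jmag), float(e_jmag)))
    return rows


rows = parse_catalogue(sample)
print(rows)
```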
URL
URLs sometimes need input data, such as POST data for a form
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
URL
And for GET type of parameter passing:
>>> import urllib2
>>> import urllib
>>> data = {}
>>> data['name'] = 'Somebody Here'
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
>>> url_values = urllib.urlencode(data)
>>> print url_values # The order may differ.
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
>>> handler = urllib2.urlopen(full_url)
Note that the full URL is created by adding a ? to the URL, followed by the encoded
values.
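The POST and GET examples above can be sketched for Python 3 (an assumption; there urllib and urllib2 were reorganised into urllib.parse and urllib.request). The sketch only builds the request objects and does not contact any server; the URL is the placeholder already used above.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://www.example.com/example.cgi'
values = {'name': 'Somebody Here', 'location': 'Northampton', 'language': 'Python'}

# Encode the parameters as key=value pairs joined by '&'.
query = urlencode(values)

# GET: the encoded values are appended to the URL after a '?'.
get_request = Request(url + '?' + query)

# POST: the encoded values are sent as the request body (bytes in Python 3).
post_request = Request(url, data=query.encode('ascii'))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

Passing either object to urllib.request.urlopen() would perform the actual request.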
Introduction to language
End