Download Lab 3 key - The Huttenhower Lab

BIO508: Lab Session 3 Key
Announcements
• Homework 3 is due at 11:55pm next Monday, 2/24. Get started early!
• As homeworks get more complicated, I will get less able to assign partial credit; make sure your
functions give the output requested. Testing is an important part of coding.
• Common homework mistakes:
– Naming functions
∗ There is a reason we ask for specific function names in the homework, and it is because the
grading script relies on these names to find your functions.
∗ Function names are case sensitive (Mean is not the same as mean).
– Printing vs. returning in a function
– Make sure your triple-quoted documentation string is in the correct format!
– Don’t leave the if __name__ == "__main__" block empty; this produces an error.
– Returning Booleans vs. strings
– You can call one function from another (for example, you can call mean in stdev).
– float(iN)/iM is not the same as float(iN/iM)
• Another useful Python resource is Learn Python The Hard Way (learnpythonthehardway.org/book/)
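The float(iN)/iM mistake in the list above is easiest to see with numbers. The sketch below is an illustration, not part of the homework. It uses the // operator to reproduce what Python 2 does when / is applied to two integers, so it gives the same results under Python 2 and Python 3. The values 7 and 2 are arbitrary.

```python
# In Python 2, dividing two integers with / throws away the remainder.
# The // operator does the same thing in both Python 2 and Python 3.
iN = 7
iM = 2

# Convert first, then divide: the fractional part is kept.
dConvertFirst = float(iN) / iM

# Divide first (integer division), then convert: the fraction is already gone.
dDivideFirst = float(iN // iM)

print(dConvertFirst)  # 3.5
print(dDivideFirst)   # 3.0
```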
This lab is intended to offer several exercises to help you go over homework mistakes and become familiar
with the reading/writing concepts discussed in class. If you feel proficient in these concepts, then you are
welcome to start on the homework or work on the iPython minilab we started yesterday. Many of these
examples, especially at the beginning, are intended for you to practice writing scripts as well as solidifying
homework concepts. Also, many have portions that ask you to make your own copy of a script based on
what is written here. For your own sake, don’t copy and paste; as tempting as it may be, you need to type
these examples out yourself in order to practice (a) writing your own code and (b) paying attention to the
details that differentiate code that runs correctly, code that runs but does the wrong thing and code that
gives errors.
Reviewing Some Common Homework Mistakes
Naming functions
In a new text file called lab03_ex1.py, type the following exactly:
#!/usr/bin/env python
def my_func():
    return 5
if __name__ == "__main__":
    print my_func()
    print My_func()
Run this script on the command line using python lab03_ex1.py.
You should see:
sph182-159:Lab 03 emmaschwager$ python lab03_ex1.py
5
Traceback (most recent call last):
  File "lab03_ex1.py", line 10, in <module>
    print My_func()
NameError: name 'My_func' is not defined
When Python gives you an error, the line beginning with File (here, File "lab03_ex1.py", line 10, in <module>)
gives you the line number where the error occurred, the next line gives a copy of that line and the last line
tells you something (sometimes cryptic) about what the error might be.
1. In English, what is the error telling you?
That Python can’t find a definition for the function My_func.
2. Fix the script so that it runs without an error.
#!/usr/bin/env python
def my_func():
    return 5
if __name__ == "__main__":
    print my_func()
#    print My_func()
Printing vs. Returning in a Function
In a new text file called lab03_ex2.py, type the following exactly:
#!/usr/bin/env python
def add_print(a,b):
    print "ADDING",a,"+",b
    print a + b
def add_return(a,b):
    print "ADDING",a,"+",b
    return a+b
if __name__ == "__main__":
    num1 = 3
    num2 = 5
    c = add_print(num1,num2)
    d = add_return(num1,num2)
    print "c = ",c
    print "d = ",d
    if c==(num1+num2):
        print "add_print works!"
    else:
        print "add_print doesn't work."
    if d==(num1+num2):
        print "add_return works!"
    else:
        print "add_return doesn't work."
Run this on a terminal using python lab03_ex2.py.
You should see:
sph182-159:Lab 03 emmaschwager$ python lab03_ex2.py
ADDING 3 + 5
8
ADDING 3 + 5
c = None
d = 8
add_print doesn't work.
add_return works!
1. Which function “works”? Why do you think that is?
add_return works because the return statement hands the value of a+b back to the caller, where it is
assigned to the variable d. add_print only prints the sum and returns nothing, so c is assigned None.
2. Does it help to change the values of num1 and num2? Why or why not?
It does not help to change the values because the problem is in the function itself and
not with the inputs.
3. Try adding the line print "extra line" on the line after the return statement in add_return so
that add_return looks like:
def add_return(a,b):
    print "ADDING",a,"+",b
    return a+b
    print "extra line"
Save and run the script again; do you see "extra line" in your output? Why do you think that might
be?
You don’t see "extra line" in your output because the function ends with the return
statement and doesn’t execute any line thereafter.
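One practical consequence of the answers above is that only a function that returns its result can be reused by another function, such as calling mean from stdev. The sketch below is an illustration and not the homework solution. The function bodies, the sample data and the choice of the sample (n - 1) standard deviation are assumptions made for the example.

```python
import math

def mean(adValues):
    """Return the arithmetic mean of a list of numbers."""
    return float(sum(adValues)) / len(adValues)

def stdev(adValues):
    """Return the sample standard deviation, reusing mean()."""
    dMean = mean(adValues)  # works only because mean() returns its result
    dSumSquares = sum((d - dMean) ** 2 for d in adValues)
    return math.sqrt(dSumSquares / (len(adValues) - 1))

adData = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(adData))   # 5.0
print(stdev(adData))
```

If mean printed its result instead of returning it, dMean would be None and the subtraction inside stdev would fail.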
Leaving the if __name__ == "__main__" Block Empty
In a new text file called lab03_ex3.py, type the following:
#!/usr/bin/env python
if __name__ == "__main__":
Run the file using python lab03_ex3.py.
You should see:
sph182-159:Lab 03 emmaschwager$ python lab03_ex3.py
File "lab03_ex3.py", line 6
^
IndentationError: expected an indented block
Note that an indentation error usually means you forgot to indent something following a colon.
1. In English, what does this error mean?
It means that Python expected an indent following the definition of the if __name__=="__main__"
block, but you hadn’t indented anything.
2. Fix the script so that it runs without an error (note that there are an infinite number of ways to do
this; you can choose to do it simply or complicatedly.)
#!/usr/bin/env python
if __name__ == "__main__":
    print "Hi!"
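Another simple fix for the error above, useful when you have not yet decided what a block should do, is the pass statement. The sketch below is one possible layout; the function name main is a common convention and is not something this lab requires.

```python
def main():
    # pass is a do-nothing statement that satisfies the "indented block" rule.
    pass

if __name__ == "__main__":
    main()
```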
Booleans versus Strings
Open a Python interpreter by opening a terminal (command line) and typing python . Type the following
commands:
bTrue = True
bFalse = False
sTrue = "True"
sFalse = "False"
if bTrue: print "Truth!"
if not(bFalse): print "Not falsehood!"
if sTrue: print "Truth!"
if not(sFalse): print "Not falsehood!"
You should see:
>>> bTrue = True
>>> bFalse = False
>>> sTrue = "True"
>>> sFalse = "False"
>>> if bTrue: print "Truth!"
...
Truth!
>>> if not(bFalse): print "Not falsehood!"
...
Not falsehood!
>>> if sTrue: print "Truth!"
...
Truth!
>>> if not(sFalse): print "Not falsehood!"
...
>>>
1. Why do you think not(sFalse) doesn’t return True?
Because sFalse is a non-empty string, and any non-empty string counts as True in an if statement
(only the empty string "" counts as False); not(sFalse) therefore evaluates to False.
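The truth value of a string depends only on whether it is empty, not on what it spells. The sketch below shows this with bool(), and adds a hypothetical helper, str_to_bool, for the common case where the text "True" or "False" has been read from a file and needs to become a real Boolean. The helper is made up for illustration and is not a built-in.

```python
def str_to_bool(strValue):
    """Hypothetical helper: turn the text "True"/"False" into a Boolean."""
    if strValue == "True":
        return True
    if strValue == "False":
        return False
    raise ValueError("expected 'True' or 'False', got %r" % strValue)

print(bool("False"))         # True: the string is not empty
print(bool(""))              # False: only the empty string is false
print(str_to_bool("False"))  # False
```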
References
Remember that in Python, unit types such as integer, Boolean, float and string behave as though they are
stored by value (they cannot be changed in place), while collection types such as lists and dictionaries are
stored by reference and can be changed in place. This can lead to some counter-intuitive behavior. We will
first explore some of the common bugs that can arise.
Open the Python interpreter by going to the command line and typing python . Type the following:
aList1 = [1,2,3]
aList2 = aList1
aList1[0] = 12
1. What is the value of aList1? Of aList2? Are they the same or different? Why?
aList1 = [12,2,3] and aList2 = [12,2,3], which are the same. They are the same because
the = operator assigned the reference of aList1 to aList2, meaning that any alteration to
one alters the other as well.
2. Can you think of some ways to circumvent this problem?
You could simply use aList2 = [1,2,3] instead; or you could use the deepcopy function
from the copy module.
Now back in the interpreter, type the following:
hDict1 = {'a':13,'b':18}
hDict2 = hDict1
hDict2['c'] = 12
1. What do you think is the value of hDict1? Double check your answer.
hDict1 = {'a':13,'b':18,'c':12}
2. Can you think of some ways to circumvent this problem?
You could simply use hDict2 = {'a':13,'b':18} instead; or you could use the deepcopy
function from the copy module.
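The copying fixes suggested in the answers above can be sketched as follows. list(), dict() and slicing make a shallow copy, which is enough when the collection holds only numbers or strings; copy.deepcopy also copies nested collections. The variable names extend the ones used in this section, and aaNested is made up for the last part of the example.

```python
import copy

aList1 = [1, 2, 3]
aList2 = list(aList1)   # shallow copy; aList1[:] does the same
aList1[0] = 12
print(aList1)  # [12, 2, 3]
print(aList2)  # [1, 2, 3]

hDict1 = {'a': 13, 'b': 18}
hDict2 = dict(hDict1)   # shallow copy of the dictionary
hDict2['c'] = 12
print('c' in hDict1)  # False

# A shallow copy still shares nested lists; deepcopy does not.
aaNested = [[1, 2], [3, 4]]
aaShallow = list(aaNested)
aaDeep = copy.deepcopy(aaNested)
aaNested[0][0] = 99
print(aaShallow[0][0])  # 99
print(aaDeep[0][0])     # 1
```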
Again going back to the interpreter, type the following:
aList3 = [12,7,15,2,10]
sorted(aList3)
aList3
aList3.sort()
aList3
1. What happens to aList3 when you use the sorted() function?
The function returns a sorted copy of aList3, but aList3 remains unchanged.
2. What happens to aList3 when you use the .sort() function?
The function changes aList3 to a sorted version of itself.
3. In your own words, how would you describe the difference between these two functions?
sorted() returns a new, sorted list and leaves the original alone, while .sort() changes the list itself.
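A related detail that often causes bugs is that .sort() returns None, so writing aList3 = aList3.sort() throws the list away. The short sketch below shows both calls side by side; the names aSorted and result are made up for the example.

```python
aList3 = [12, 7, 15, 2, 10]

aSorted = sorted(aList3)   # new list; aList3 is untouched
print(aSorted)  # [2, 7, 10, 12, 15]
print(aList3)   # [12, 7, 15, 2, 10]

result = aList3.sort()     # sorts in place and returns None
print(result)   # None
print(aList3)   # [2, 7, 10, 12, 15]
```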
You can quit the interpreter by typing quit() . This will take you back to the command line.
Modules
In this class (and in life!), there are several modules we will frequently need to use, including:
Module Name   Tasks                                                   Example Functions/objects             Where to learn more
sys           Interacting with the system/command line                sys.argv, sys.stdin, sys.stdout       http://docs.python.org/2/library/sys.html
random        Doing things that involve stochasticity                 random.random(), random.randrange()   http://docs.python.org/2/library/random.html
bisect        Doing things to a list while maintaining sorted order   bisect.bisect_left()                  http://docs.python.org/2/library/bisect.html
re            Using regular expressions                               re.search(), re.split()               http://docs.python.org/2/library/re.html
To bring in a module, use an import statement (for example, import sys) under the docstring in your script.
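As a rough illustration of the modules listed above, the sketch below imports all four and calls one function from each. The sample values are made up, and the print() form is used so the block also runs under Python 3.

```python
"""Demonstrate importing modules; the imports go right under this docstring."""
import bisect
import random
import re
import sys

# bisect: find where 8 would go in an already sorted list.
aSorted = [2, 7, 10, 12, 15]
iPosition = bisect.bisect_left(aSorted, 8)
print(iPosition)  # 2

# random: seed the generator so the "random" numbers are repeatable.
random.seed(0)
dRandom = random.random()        # float in [0.0, 1.0)
iRandom = random.randrange(10)   # integer from 0 to 9

# re: split on one or more whitespace characters.
astrFields = re.split(r"\s+", "Subj  Sequence1\tSequence2")
print(astrFields)  # ['Subj', 'Sequence1', 'Sequence2']

# sys: write directly to standard output (no newline is added for you).
sys.stdout.write("done\n")
```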
Working with files
Some Useful String Functions
Anything that comes into Python from a file or from sys.stdin comes in as a string, so it is useful to
know functions for dealing with strings. We will learn these functions by experience. Start by opening the
Python interpreter (open the command line and type python ). A useful reference for working with strings
is http://docs.python.org/2/library/string.html.
.strip()
Enter the following into the interpreter:
strString = "\tSubj\tSequence1\tSequence2\n"
print strString
strString.strip()
print strString
You should see:
>>> strString = "\tSubj\tSequence1\tSequence2\n"
>>> print strString
        Subj    Sequence1       Sequence2

>>> strString.strip()
'Subj\tSequence1\tSequence2'
>>> print strString
        Subj    Sequence1       Sequence2

>>>
1. What does .strip() do?
It removes leading and trailing whitespace characters from a string.
2. Is .strip() the kind of function that modifies the value in place, or that creates a copy and returns
it?
It creates a copy and returns it.
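Because .strip() returns a new string, the result has to be assigned to a name to be kept. The sketch below shows that, along with .rstrip("\n"), which removes only the trailing newline and keeps leading tabs; that matters when an empty first column is meaningful. The names strStripped and strNoNewline are made up for the example.

```python
strString = "\tSubj\tSequence1\tSequence2\n"

strString.strip()                   # result is thrown away
print(repr(strString))              # unchanged: '\tSubj\tSequence1\tSequence2\n'

strStripped = strString.strip()     # keep the copy by assigning it
print(repr(strStripped))            # 'Subj\tSequence1\tSequence2'

strNoNewline = strString.rstrip("\n")   # drop only the trailing newline
print(repr(strNoNewline))               # '\tSubj\tSequence1\tSequence2'
```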
.split()
Now enter the following into the interpreter:
strString2 = ",Subj,Sequence1,Sequence2\n"
strString.split("\t")
strString.split(" ")
strString2.split(",")
strString.split("\tSubj")
You should see:
>>> strString2 = ',Subj,Sequence1,Sequence2\n'
>>> strString.split('\t')
['', 'Subj', 'Sequence1', 'Sequence2\n']
>>> strString.split(' ')
['\tSubj\tSequence1\tSequence2\n']
>>> strString2.split(',')
['', 'Subj', 'Sequence1', 'Sequence2\n']
>>> strString.split('\tSubj')
['', '\tSequence1\tSequence2\n']
>>>
1. What does .split(<string>) do?
It splits a string into a list of substrings, cutting at every instance of <string> and leaving
<string> itself out of the result.
2. Is .split() a function that modifies the value in place, or that creates a copy and returns it?
It creates a new list and returns it; the original string is unchanged.
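In practice .split() is usually combined with removing the newline first, so that the last field does not end in "\n". The sketch below shows that pattern plus two behaviors worth knowing: split() with no argument, and what adjacent separators produce. The sample strings are made up.

```python
strLine = "Subj\tSequence1\tSequence2\n"

# Typical pattern for one line of a tab-delimited file:
astrFields = strLine.rstrip("\n").split("\t")
print(astrFields)        # ['Subj', 'Sequence1', 'Sequence2']

# With no argument, split() cuts on any run of whitespace and
# ignores leading/trailing whitespace.
print(strLine.split())   # ['Subj', 'Sequence1', 'Sequence2']

# Adjacent separators produce empty strings, which is how a missing
# entry shows up.
astrMissing = "a\t\tc".split("\t")
print(astrMissing)       # ['a', '', 'c']
```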
.join()
Now enter the following in the interpreter:
astrList = ["Subj","Sequence1","Sequence2"]
"\t".join(astrList)
" ".join(astrList)
"\t".join(["Subj"])
"\t".join("Subj")
" a string ".join(astrList)
"".join(["a",1])
You should see:
>>> astrList = ["Subj","Sequence1","Sequence2"]
>>> "\t".join(astrList)
'Subj\tSequence1\tSequence2'
>>> " ".join(astrList)
'Subj Sequence1 Sequence2'
>>> "\t".join(["Subj"])
'Subj'
>>> "\t".join("Subj")
'S\tu\tb\tj'
>>> " a string ".join(astrList)
'Subj a string Sequence1 a string Sequence2'
>>> "".join(["a",1])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sequence item 1: expected string, int found
>>>
1. What does <string>.join(<list>) do?
It joins the elements of <list> into one string by adding <string> in between each two
elements. It requires that all elements of <list> are strings.
2. Is .join() a function that modifies the value in place, or that creates a copy and returns it?
It creates a new string and returns it.
3. What causes the error when you try "".join(["a",1])?
<string>.join(<list>) expects that <list> is composed of strings, but 1 is an integer, and
so Python gives us an error.
4. How could you fix the error in question 3 so that the output is "a1"?
One possible way is "".join(["a",str(1)])
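When a whole list of numbers needs to be written out, the str() fix from question 4 can be applied to every element at once. The sketch below is one way to do that, using a list comprehension; the list aiCounts is made up for the example.

```python
aiCounts = [3, 0, 12]

# Convert every element to a string first, then join.
strRow = "\t".join([str(iCount) for iCount in aiCounts])
print(repr(strRow))   # '3\t0\t12'

# split and join undo each other for the same separator.
astrFields = ["Subj", "Sequence1", "Sequence2"]
strJoined = ",".join(astrFields)
print(strJoined.split(",") == astrFields)  # True
```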
You can quit the interpreter by using quit()
Command Line Caller and Python Receiver
Note that there is a connection between the way you set up your program to be run on the command line
and which input/output streams you use in your program:
Command Line                                Python Receiver
python script.py < input.txt                sys.stdin
python script.py input.txt                  open(sys.argv[1])
python script.py < input.txt > output.txt   sys.stdin and sys.stdout
python script.py input.txt output.txt       open(sys.argv[1]) and open(sys.argv[2],'w')
We will practice some of these next.
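The command line styles above can also be combined in one script by checking how many arguments were given. The sketch below is one possible arrangement, not something the homework asks for; choose_streams is a hypothetical helper name.

```python
import sys

def choose_streams(astrArgs):
    """Hypothetical helper: pick input/output streams from a sys.argv-style list.

    astrArgs[0] is the script name, so real arguments start at index 1.
    """
    fIn = open(astrArgs[1]) if len(astrArgs) > 1 else sys.stdin
    fOut = open(astrArgs[2], 'w') if len(astrArgs) > 2 else sys.stdout
    return fIn, fOut

# With no file names given, the streams fall back to standard I/O.
fIn, fOut = choose_streams(["script.py"])
print(fIn is sys.stdin)    # True
print(fOut is sys.stdout)  # True
```

In a real script you would call choose_streams(sys.argv) and close any files it opened when you are done with them.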
Reading and Writing Files Using Standard I/O
Download the practice file HMP_trunc.txt from the course website. We will use this file for practicing input/output.
Now, in a new text file called lab03_ex4.py, type the following:
#!/usr/bin/env python
import sys
if __name__ == "__main__":
    for strLine in sys.stdin:
        sys.stdout.write(strLine.split('\t')[0])
Run this file using HMP_trunc.txt with the command python lab03_ex4.py < HMP_trunc.txt.
You should see:
Emma-Schwager:Lab 03 emmaschwager$ python lab03_ex4.py < HMP_trunc.txt
sidk__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Sutterellac
eaek__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhodospirillales|f__Acetobact
eraceaek__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhizobiales|f__Bradyrhizo
biaceaek__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Coriobacteriales|f__Coriobacte
riaceae|g__Atopobium|s__Atopobium_vaginaek__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__
Bacteroidales|f__Porphyromonadaceaek__Bacteria|p__Thermi|c__Deinococci|o__Deinococcales|
f__Deinococcaceae|g__Deinococcus|s__Deinococcus_unclassifiedk__Bacteria|p__Firmicutes|c_
_Clostridia|o__Clostridiales|f__Clostridiales_Family_XI_Incertae_Sedis|g__Peptoniphilus|
s__Peptoniphilus_unclassifiedk__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomy
cetales|f__Micrococcaceae|g__Arthrobacter|s__Arthrobacter_unclassifiedk__Bacteria|p__Spi
rochaetes|c__Spirochaetesk__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkhold
eriales|f__Comamonadaceae|g__Delftia|s__Delftia_acidovoransk__Bacteria|p__Proteobacteria
|c__Gammaproteobacteria|o__Pseudomonadales|f__Pseudomonadaceae|g__Pseudomonas|s__Pseudom
onas_unclassifiedk__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__C
orynebacteriaceae|g__Corynebacterium|s__Corynebacterium_kroppenstedtiik__Bacteria|p__Fir
micutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Ruminococcusk__Bacteria|p__
Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Prevotellaceae|g__Prevotella|s__Prevote
lla_biviak__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__E
ubacterium|s__Eubacterium_saburreumk__Bacteria|p__Thermi|c__Deinococci|o__Deinococcalesk
__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pasteurellales|f__Pasteurellaceae
|g__Haemophilus|s__Haemophilus_parasuisk__Bacteria|p__Firmicutes|c__Negativicutes|o__Sel
enomonadales|f__Veillonellaceae|g__Veillonella|s__Veillonella_atypicak__Bacteria|p__Firm
icutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Butyrivibrio|s__Butyrivibrio
_crossotusk__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae
|g__Bacteroides|s__Bacteroides_fragilisk__Archaea|p__Euryarchaeota|c__Methanobacteria|o_
_Methanobacterialesk__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhizobiales|f
__Rhizobiaceae|g__Agrobacteriumk__Archaea|p__Euryarchaeota|c__Halobacteria|o__Halobacter
iales|f__Halobacteriales_unclassifiedk__Bacteria|p__Proteobacteria|c__Deltaproteobacteri
a|o__DesulfovibrionalesEmma-Schwager:Lab 03 emmaschwager$
1. What does this script do?
It prints the first column of a tab-delimited text file to standard output.
2. Why are there no newlines between the lines?
Because sys.stdout.write() doesn’t add a newline.
3. Alter the script so that each entry is on its own line (instead of a concatenated jumble).
#!/usr/bin/env python
import sys
if __name__ == "__main__":
    for strLine in sys.stdin:
        sys.stdout.write(strLine.split('\t')[0] + "\n")
4. Now run your altered script on HMP_trunc.txt but redirect the output to HMP_ex4.txt using >.
python lab03_ex4.py < HMP_trunc.txt > HMP_ex4.txt
5. What do you think would happen if you tried to run your script using python lab03_ex4.py HMP_trunc.txt?
Try it and see! (Hint: CTRL-D should tell the script that you are done inputting things.)
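sph182-159:Lab 03 emmaschwager$ python lab03_ex4.py HMP_trunc.txt
The script prints nothing and appears to hang: it ignores the file name on the command line and waits
for input on sys.stdin (the keyboard) until you press CTRL-D.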
Reading and Writing Files within Python
sys.argv
The first task for this is to get comfortable with sys.argv. In a new text file called lab03_ex5.py, write
the following:
#!/usr/bin/env python
import sys
if __name__ == "__main__":
    print sys.argv
This will print the arguments that Python sees from sys.argv. Now, run this script three different times:
python lab03_ex5.py
python lab03_ex5.py input.txt
python lab03_ex5.py input.txt output.txt
You should see:
Emma-Schwager:Lab 03 emmaschwager$ python lab03_ex5.py
['lab03_ex5.py']
Emma-Schwager:Lab 03 emmaschwager$ python lab03_ex5.py input.txt
['lab03_ex5.py', 'input.txt']
Emma-Schwager:Lab 03 emmaschwager$ python lab03_ex5.py input.txt output.txt
['lab03_ex5.py', 'input.txt', 'output.txt']
1. What kind of object is sys.argv (integer, dictionary, etc.)?
It is a list of strings.
2. Change your script so that you only print the arguments to the script but not the script name.
Change print sys.argv to print sys.argv[1:]
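Since sys.argv is an ordinary list, its length can be checked before indexing into it, which gives a friendlier message than an IndexError when an argument is missing. The sketch below assumes a script that needs exactly two file names; parse_args is a hypothetical helper name and the usage text is made up.

```python
def parse_args(astrArgs):
    """Hypothetical helper: return (input name, output name) or raise an error."""
    # astrArgs is a sys.argv-style list: script name first, then the arguments.
    if len(astrArgs) != 3:
        raise ValueError("Usage: python %s input.txt output.txt" % astrArgs[0])
    return astrArgs[1], astrArgs[2]

strIn, strOut = parse_args(["lab03_ex5.py", "input.txt", "output.txt"])
print(strIn)   # input.txt
print(strOut)  # output.txt
```

In a real script the call would be parse_args(sys.argv), after import sys.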
Using sys.argv and open
In a new text file called lab03_ex6.py type the following:
#!/usr/bin/env python
import sys
if __name__ == "__main__":
    fInFile = open(sys.argv[1])
    fOutFile = open(sys.argv[2], 'w')
    for strLine in fInFile:
        fOutFile.write(strLine.split('\t')[0])
    fInFile.close()
    fOutFile.close()
The input file for this script will be HMP_trunc.txt and the output file will be HMP_ex6.txt. When you
run this correctly, you will not see any output on the screen.
1. What does this script do?
It reads in a tab-delimited file and outputs the first column of each line to the output
file.
2. What is the correct command to run this file?
python lab03_ex6.py HMP_trunc.txt HMP_ex6.txt
3. What does your output file look like? How could you change your code to make it more readable?
It looks like the output from the original lab03_ex4.py. You could change
fOutFile.write(strLine.split('\t')[0]) to fOutFile.write(strLine.split('\t')[0] + '\n')
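An alternative to calling .close() by hand is the with statement, which closes the file automatically when the block ends. The sketch below wraps the same first-column logic in a hypothetical function, first_column, and runs it on a tiny made-up file written to a temporary directory, since HMP_trunc.txt may not be available where you try this.

```python
import os
import tempfile

def first_column(strInPath, strOutPath):
    """Write the first tab-delimited column of strInPath to strOutPath."""
    # The with statement closes both files automatically, even on errors.
    with open(strInPath) as fInFile:
        with open(strOutPath, 'w') as fOutFile:
            for strLine in fInFile:
                fOutFile.write(strLine.rstrip('\n').split('\t')[0] + '\n')

# Small stand-in for HMP_trunc.txt (made-up contents, for illustration only).
strDir = tempfile.mkdtemp()
strIn = os.path.join(strDir, "toy_in.txt")
strOut = os.path.join(strDir, "toy_out.txt")
with open(strIn, 'w') as fToy:
    fToy.write("sid\ts1\ts2\ntaxonA\t0.1\t0.2\n")

first_column(strIn, strOut)
with open(strOut) as fResult:
    strResult = fResult.read()
print(strResult)
```

Removing the newline before splitting also keeps the output tidy for lines that have only one column.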
Exercises
1. Write a script called txt2csv.py that converts a tab-delimited (txt) file into a comma-separated value
(csv) file. Choose an I/O style between:
(a) python txt2csv.py < input.txt > output.csv
(b) python txt2csv.py input.txt output.csv
Use this script to convert HMP_trunc.txt to HMP_trunc.csv.
If using I/O style 1a
#!/usr/bin/env python
import sys
if __name__ == "__main__":
    for strLine in sys.stdin:
        sys.stdout.write(strLine.replace('\t',','))
If using I/O style 1b
#!/usr/bin/env python
import sys
if __name__ == "__main__":
    fInFile = open(sys.argv[1])
    fOutFile = open(sys.argv[2], 'w')
    for strLine in fInFile:
        fOutFile.write(strLine.replace('\t',','))
    fInFile.close()
    fOutFile.close()
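Replacing tabs with commas is all this exercise needs, but it would produce a misaligned file if any field already contained a comma. As an aside, the standard csv module quotes such fields for you. The sketch below assumes Python 3, where io.StringIO can stand in for real files; the helper name tsv_to_csv and the sample text are made up for illustration.

```python
import csv
import io

def tsv_to_csv(fIn, fOut):
    """Copy tab-delimited rows from fIn to fOut as comma-separated rows."""
    reader = csv.reader(fIn, delimiter='\t')
    writer = csv.writer(fOut, lineterminator='\n')
    for astrRow in reader:
        writer.writerow(astrRow)

# Made-up two-line input; the second field of the last row contains a comma.
fIn = io.StringIO(u"sid\tnote\ntaxonA\tlow, variable\n")
fOut = io.StringIO()
tsv_to_csv(fIn, fOut)
print(fOut.getvalue())
```

The field containing a comma comes out wrapped in double quotes, so a spreadsheet program still reads it as one column.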
2. Write a script called transpose.py that takes an input file and outputs its transpose. You can assume
that your input file is tab-delimited and that any missing entries are designated by a tab. Choose an
I/O style between:
(a) python transpose.py < input.txt > output.txt
(b) python transpose.py input.txt output.txt
Use this script to convert HMP_trunc.txt to HMP_trunc_transpose.txt.
If using I/O style 2a
#!/usr/bin/env python
import sys
if __name__ == "__main__":
    aaFile = []
    for strLine in sys.stdin:
        aaFile.append(strLine.replace('\n','').split('\t'))
    aaTranspose = []
    for iColIdx in range(len(aaFile[0])):
        aaTranspose.append([])
        for iRowIdx in range(len(aaFile)):
            aaTranspose[iColIdx].append(aaFile[iRowIdx][iColIdx])
    astrTranspose = ['\t'.join(aTransposeLine) for aTransposeLine in aaTranspose]
    for strLine in astrTranspose:
        sys.stdout.write(strLine + '\n')
If using I/O style 2b
#!/usr/bin/env python
import sys
if __name__ == "__main__":
    fInFile = open(sys.argv[1])
    fOutFile = open(sys.argv[2],'w')
    aaFile = []
    for strLine in fInFile:
        aaFile.append(strLine.replace('\n','').split('\t'))
    aaTranspose = []
    for iColIdx in range(len(aaFile[0])):
        aaTranspose.append([])
        for iRowIdx in range(len(aaFile)):
            aaTranspose[iColIdx].append(aaFile[iRowIdx][iColIdx])
    astrTranspose = ['\t'.join(aTransposeLine) for aTransposeLine in aaTranspose]
    for strLine in astrTranspose:
        fOutFile.write(strLine + '\n')
    fInFile.close()
    fOutFile.close()
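The nested loops in the transpose solutions can also be written with the built-in zip function. The sketch below is an alternative, not the method the key uses; the function name transpose and the sample rows are made up for illustration.

```python
def transpose(aaRows):
    """Return the transpose of a rectangular list of lists."""
    # zip(*aaRows) pairs up the first items of every row, then the second, ...
    return [list(aColumn) for aColumn in zip(*aaRows)]

aaFile = [["sid", "s1", "s2"],
          ["taxonA", "0.1", "0.2"]]
aaTranspose = transpose(aaFile)
print(aaTranspose)  # [['sid', 'taxonA'], ['s1', '0.1'], ['s2', '0.2']]
```

One difference to keep in mind: zip silently stops at the shortest row, whereas the loop version raises an IndexError if a later row is shorter than the first one.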