BIO508: Lab Session 3

Key Announcements
• Homework 3 is due at 11:55pm next Monday, 2/24. Get started early!
• As homeworks get more complicated, I will be less able to assign partial credit; make sure your functions give the output requested. Testing is an important part of coding.
• Common homework mistakes:
  – Naming functions
    ∗ There is a reason we ask for specific function names in the homework: the grading script relies on these names to find your functions.
    ∗ Function names are case sensitive (Mean is not the same as mean).
  – Printing vs. returning in a function
  – Make sure your triple-quoted documentation string is in the correct format!
  – Don't leave the if __name__ == "__main__": block empty; this produces an error.
  – Returning Booleans vs. strings
  – You can call one function from another (for example, you can call mean in stdev).
  – float(iN)/iM is not the same as float(iN/iM): if iN and iM are both integers, Python 2 performs integer division inside float(iN/iM), so the fractional part is lost before float() is applied.
• Another useful Python resource is Learn Python The Hard Way (learnpythonthehardway.org/book/).

This lab is intended to offer several exercises to help you go over homework mistakes and become familiar with the reading/writing concepts discussed in class. If you feel proficient in these concepts, you are welcome to start on the homework or work on the IPython minilab we started yesterday.

Many of these examples, especially at the beginning, are intended for you to practice writing scripts as well as to solidify homework concepts. Many also have portions that ask you to make your own copy of a script based on what is written here. For your own sake, don't copy and paste; as tempting as it may be, you need to type these examples out yourself in order to practice (a) writing your own code and (b) paying attention to the details that differentiate code that runs correctly, code that runs but does the wrong thing, and code that gives errors.
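Several of the announcements above (returning instead of printing, calling one function from another, and where float() goes) fit in one short sketch. It is illustrative, not the homework solution: the names mean and stdev come from the example above, but the use of the sample (n - 1) standard deviation is an assumption, so follow whatever definition your assignment gives. Because / between two integers behaves like // in Python 2 (but not in Python 3), the sketch spells out // so the contrast looks the same in either version.

```python
def mean(adValues):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # Convert to float *before* dividing so integer inputs are not truncated.
    return float(sum(adValues)) / len(adValues)


def stdev(adValues):
    """Return the sample standard deviation (n - 1 denominator; an assumption)."""
    # Reuse mean() instead of re-implementing it here.
    dMean = mean(adValues)
    dSumSq = sum((d - dMean) ** 2 for d in adValues)
    return (dSumSq / (len(adValues) - 1)) ** 0.5


iN, iM = 7, 2
dConvertFirst = float(iN) / iM    # 3.5: iN becomes a float, then true division
dDivideFirst = float(iN // iM)    # 3.0: integer division happens first, then float()
```

Both functions return their result and leave printing to whoever calls them. Note that stdev() needs at least two values; with one value the n - 1 denominator would be zero.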
Reviewing Some Common Homework Mistakes

Naming functions

In a new text file called lab03_ex1.py, type the following exactly:

    #!/usr/bin/env python

    def my_func():
        return 5

    if __name__ == "__main__":
        print my_func()
        print My_func()

Run this script on the command line using python lab03_ex1.py. You should see:

    sph182-159:Lab 03 emmaschwager$ python lab03_ex1.py
    5
    Traceback (most recent call last):
      File "lab03_ex1.py", line 10, in <module>
        print My_func()
    NameError: name 'My_func' is not defined

When Python gives you an error, the line beginning with File (File "lab03_ex1.py", line 10, in <module>) gives you the line number where the error occurred, the next line shows a copy of that line, and the last line tells you something (sometimes cryptic) about what the error might be.

1. In English, what is the error telling you?
That Python can't find a definition for the function My_func.

2. Fix the script so that it runs without an error.

    #!/usr/bin/env python

    def my_func():
        return 5

    if __name__ == "__main__":
        print my_func()
        print my_func()    # was My_func(); function names are case sensitive

(Deleting or commenting out the print My_func() line also works.)

Printing vs. Returning in a Function

In a new text file called lab03_ex2.py, type the following exactly:

    #!/usr/bin/env python

    def add_print(a,b):
        print "ADDING",a,"+",b
        print a + b

    def add_return(a,b):
        print "ADDING",a,"+",b
        return a+b

    if __name__ == "__main__":
        num1 = 3
        num2 = 5
        c = add_print(num1,num2)
        d = add_return(num1,num2)
        print "c = ",c
        print "d = ",d
        if c==(num1+num2):
            print "add_print works!"
        else:
            print "add_print doesn't work."
        if d==(num1+num2):
            print "add_return works!"
        else:
            print "add_return doesn't work."

Run this on a terminal using python lab03_ex2.py. You should see:

    sph182-159:Lab 03 emmaschwager$ python lab03_ex2.py
    ADDING 3 + 5
    8
    ADDING 3 + 5
    c =  None
    d =  8
    add_print doesn't work.
    add_return works!

1. Which function "works"? Why do you think that is?
add_return works because the return statement sends the value of a+b back to the caller, where it is assigned to the variable d. add_print only displays the sum on the screen; a function without a return statement returns None, so c is None.

2. Does it help to change the values of num1 and num2? Why or why not?
It does not help to change the values, because the problem is in the function itself, not in the inputs.

3. Try adding the line print "extra line" on the line after the return statement in add_return, so that add_return looks like:

    def add_return(a,b):
        print "ADDING",a,"+",b
        return a+b
        print "extra line"

Save and run the script again; do you see "extra line" in your output? Why do you think that might be?
You don't see "extra line" in your output because a function stops executing as soon as it reaches a return statement; no line after it runs.

Leaving the if __name__ == "__main__": block empty

In a new text file called lab03_ex3.py, type the following:

    #!/usr/bin/env python

    if __name__ == "__main__":

Run the file using python lab03_ex3.py. You should see:

    sph182-159:Lab 03 emmaschwager$ python lab03_ex3.py
      File "lab03_ex3.py", line 6

        ^
    IndentationError: expected an indented block

Note that when you see an indentation error, you usually forgot to indent something following a colon.

1. In English, what does this error mean?
It means that Python expected an indented block after the if __name__ == "__main__": line, but nothing was indented.

2. Fix the script so that it runs without an error (note that there are infinitely many ways to do this; you can choose to do it simply or in a more complicated way).

    #!/usr/bin/env python

    if __name__ == "__main__":
        print "Hi!"

Booleans versus Strings

Open a Python interpreter by opening a terminal (command line) and typing python. Type the following commands:

    bTrue = True
    bFalse = False
    sTrue = "True"
    sFalse = "False"
    if bTrue: print "Truth!"
    if not(bFalse): print "Not falsehood!"
    if sTrue: print "Truth!"
    if not(sFalse): print "Not falsehood!"
You should see:

    >>> bTrue = True
    >>> bFalse = False
    >>> sTrue = "True"
    >>> sFalse = "False"
    >>> if bTrue: print "Truth!"
    ...
    Truth!
    >>> if not(bFalse): print "Not falsehood!"
    ...
    Not falsehood!
    >>> if sTrue: print "Truth!"
    ...
    Truth!
    >>> if not(sFalse): print "Not falsehood!"
    ...
    >>>

1. Why do you think not(sFalse) doesn't return True?
Because sFalse is a non-empty string, and every non-empty string counts as True in a Boolean context (only the empty string "" counts as False). The word "False" inside the string is irrelevant, so not(sFalse) evaluates to False.

References

Remember that in Python, unit types such as integer, Boolean, float and string behave as though they are stored by value (they cannot be changed in place), but collection types such as lists and dictionaries are stored by reference. This can lead to some counter-intuitive behavior. We will first explore some of the common bugs that can arise.

Open the Python interpreter by going to the command line and typing python. Type the following:

    aList1 = [1,2,3]
    aList2 = aList1
    aList1[0] = 12

1. What is the value of aList1? Of aList2? Are they the same or different? Why?
aList1 = [12,2,3] and aList2 = [12,2,3]; they are the same. The = operator made aList2 refer to the same list object as aList1, so any change made through one name is visible through the other.

2. Can you think of some ways to circumvent this problem?
You could simply use aList2 = [1,2,3] instead, make a copy with aList2 = aList1[:] or aList2 = list(aList1), or use the deepcopy function from the copy module.

Now back in the interpreter, type the following:

    hDict1 = {'a':13,'b':18}
    hDict2 = hDict1
    hDict2['c'] = 12

1. What do you think is the value of hDict1? Double-check your answer.
hDict1 = {'a':13,'b':18,'c':12}

2. Can you think of some ways to circumvent this problem?
You could simply use hDict2 = {'a':13,'b':18} instead, make a copy with hDict2 = dict(hDict1), or use the deepcopy function from the copy module.

Again going back to the interpreter, type the following:

    aList3 = [12,7,15,2,10]
    sorted(aList3)
    aList3
    aList3.sort()
    aList3

1. What happens to aList3 when you use the sorted() function?
The function returns a sorted copy of aList3, but aList3 itself remains unchanged.

2. What happens to aList3 when you use the .sort() function?
The function changes aList3 into a sorted version of itself (and returns None).

3. In your own words, how would you describe the difference between these two functions?
sorted() returns a new list, while .sort() changes the list itself.

You can quit the interpreter by typing quit(). This will take you back to the command line.

Modules

In this class (and in life!), there are several modules we will frequently need to use, including:

    Module   Tasks                                              Example functions/objects            Where to learn more
    sys      Interacting with the system/command line           sys.argv, sys.stdin, sys.stdout      http://docs.python.org/2/library/sys.html
    random   Doing things that involve stochasticity            random.random(), random.randrange()  http://docs.python.org/2/library/random.html
    bisect   Working with lists while maintaining sorted order  bisect.bisect_left()                 http://docs.python.org/2/library/bisect.html
    re       Using regular expressions                          re.search(), re.split()              http://docs.python.org/2/library/re.html

To bring in a module, use the import command (for example, import sys) under the docstring in your script.

Working with files

Some Useful String Functions

Anything that comes into Python from a file or from sys.stdin comes in as a string, so it is useful to know functions for dealing with strings. We will learn these functions by experience. Start by opening the Python interpreter (open the command line and type python). A useful reference for working with strings is http://docs.python.org/2/library/string.html.

.strip()

Enter the following into the interpreter:

    strString = "\tSubj\tSequence1\tSequence2\n"
    print strString
    strString.strip()
    print strString

You should see:

    >>> strString = "\tSubj\tSequence1\tSequence2\n"
    >>> print strString
            Subj    Sequence1       Sequence2

    >>> strString.strip()
    'Subj\tSequence1\tSequence2'
    >>> print strString
            Subj    Sequence1       Sequence2

    >>>

1. What does .strip() do?
It removes leading and trailing whitespace characters (spaces, tabs, newlines) from a string.

2. Is .strip() the kind of function that modifies the value in place, or that creates a copy and returns it?
It creates a copy and returns it, which is why the second print strString still shows the leading tab and the trailing newline.

.split()

Now enter the following into the interpreter:

    strString2 = ",Subj,Sequence1,Sequence2\n"
    strString.split("\t")
    strString.split(" ")
    strString2.split(",")
    strString.split("\tSubj")

You should see:

    >>> strString2 = ",Subj,Sequence1,Sequence2\n"
    >>> strString.split("\t")
    ['', 'Subj', 'Sequence1', 'Sequence2\n']
    >>> strString.split(" ")
    ['\tSubj\tSequence1\tSequence2\n']
    >>> strString2.split(",")
    ['', 'Subj', 'Sequence1', 'Sequence2\n']
    >>> strString.split("\tSubj")
    ['', '\tSequence1\tSequence2\n']
    >>>

1. What does .split(<string>) do?
It splits a string into a list of pieces, cutting at (and removing) every occurrence of <string>. If the string starts with <string>, the first piece is the empty string ''.

2. Is .split() a function that modifies the value in place, or that creates a copy and returns it?
It creates a new list and returns it; the original string is unchanged.

.join()

Now enter the following in the interpreter:

    astrList = ["Subj","Sequence1","Sequence2"]
    "\t".join(astrList)
    " ".join(astrList)
    "\t".join(["Subj"])
    "\t".join("Subj")
    " a string ".join(astrList)
    "".join(["a",1])

You should see:

    >>> astrList = ["Subj","Sequence1","Sequence2"]
    >>> "\t".join(astrList)
    'Subj\tSequence1\tSequence2'
    >>> " ".join(astrList)
    'Subj Sequence1 Sequence2'
    >>> "\t".join(["Subj"])
    'Subj'
    >>> "\t".join("Subj")
    'S\tu\tb\tj'
    >>> " a string ".join(astrList)
    'Subj a string Sequence1 a string Sequence2'
    >>> "".join(["a",1])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: sequence item 1: expected string, int found
    >>>

1. What does <string>.join(<list>) do?
It joins the elements of <list> into one string, placing <string> between each pair of neighboring elements. All elements of <list> must be strings. (Passing a string instead of a list, as in "\t".join("Subj"), treats each character as an element.)

2. Is .join() a function that modifies the value in place, or that creates a copy and returns it?
It creates a new string and returns it.

3. What causes the error when you try "".join(["a",1])?
<string>.join(<list>) expects <list> to be composed of strings, but 1 is an integer, so Python raises a TypeError.

4. How could you fix the error in question 3 so that the output is "a1"?
One possible way is "".join(["a",str(1)])

You can quit the interpreter by using quit().

Command Line Caller and Python Receiver

Note that there is a connection between the way you set up your program to be run on the command line and which input/output streams you use in your program:

    Command line                                Python receiver
    python script.py < input.txt                sys.stdin
    python script.py input.txt                  open(sys.argv[1])
    python script.py < input.txt > output.txt   sys.stdin and sys.stdout
    python script.py input.txt output.txt       open(sys.argv[1]) and open(sys.argv[2],'w')

We will practice some of these next.

Reading and Writing Files Using Standard I/O

Download the practice file HMP_trunc.txt from the course website. We will use this file for practicing input/output. Now, in a new text file called lab03_ex4.py, type the following:

    #!/usr/bin/env python

    import sys

    if __name__ == "__main__":
        for strLine in sys.stdin:
            sys.stdout.write(strLine.split('\t')[0])

Run this file on HMP_trunc.txt with the command python lab03_ex4.py < HMP_trunc.txt.
You should see the following (the output is one long, unbroken line, and even the next prompt is glued onto its end):

    Emma-Schwager:Lab 03 emmaschwager$ python lab03_ex4.py < HMP_trunc.txt
    sidk__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Sutterellaceaek__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhodospirillales|f__Acetobacteraceaek__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhizobiales|f__Bradyrhizobiaceaek__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Coriobacteriales|f__Coriobacteriaceae|g__Atopobium|s__Atopobium_vaginaek__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceaek__Bacteria|p__Thermi|c__Deinococci|o__Deinococcales|f__Deinococcaceae|g__Deinococcus|s__Deinococcus_unclassifiedk__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Clostridiales_Family_XI_Incertae_Sedis|g__Peptoniphilus|s__Peptoniphilus_unclassifiedk__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Micrococcaceae|g__Arthrobacter|s__Arthrobacter_unclassifiedk__Bacteria|p__Spirochaetes|c__Spirochaetesk__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Comamonadaceae|g__Delftia|s__Delftia_acidovoransk__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Pseudomonadaceae|g__Pseudomonas|s__Pseudomonas_unclassifiedk__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Corynebacteriaceae|g__Corynebacterium|s__Corynebacterium_kroppenstedtiik__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Ruminococcusk__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Prevotellaceae|g__Prevotella|s__Prevotella_biviak__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_saburreumk__Bacteria|p__Thermi|c__Deinococci|o__Deinococcalesk__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pasteurellales|f__Pasteurellaceae|g__Haemophilus|s__Haemophilus_parasuisk__Bacteria|p__Firmicutes|c__Negativicutes|o__Selenomonadales|f__Veillonellaceae|g__Veillonella|s__Veillonella_atypicak__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Butyrivibrio|s__Butyrivibrio_crossotusk__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_fragilisk__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacterialesk__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhizobiales|f__Rhizobiaceae|g__Agrobacteriumk__Archaea|p__Euryarchaeota|c__Halobacteria|o__Halobacteriales|f__Halobacteriales_unclassifiedk__Bacteria|p__Proteobacteria|c__Deltaproteobacteria|o__DesulfovibrionalesEmma-Schwager:Lab 03 emmaschwager$

1. What does this script do?
It writes the first column of each line of a tab-delimited text file to standard out.

2. Why are there no newlines between the lines?
Because sys.stdout.write() doesn't add a newline (unlike print), and split('\t')[0] keeps only the first field, so each line's newline is thrown away with the last field.

3. Alter the script so that each entry is on its own line (instead of a concatenated jumble).

    #!/usr/bin/env python

    import sys

    if __name__ == "__main__":
        for strLine in sys.stdin:
            sys.stdout.write(strLine.split('\t')[0] + "\n")

4. Now run your altered script on HMP_trunc.txt but redirect the output to HMP_ex4.txt using >.

    python lab03_ex4.py < HMP_trunc.txt > HMP_ex4.txt

5. What do you think would happen if you tried to run your script using python lab03_ex4.py HMP_trunc.txt? Try it and see! (Hint: CTRL-D should tell the script that you are done inputting things.)

    sph182-159:Lab 03 emmaschwager$ python lab03_ex4.py HMP_trunc.txt

The script never looks at sys.argv, so it ignores the file name and waits for you to type input on sys.stdin; it processes whatever you type until you press CTRL-D.

Reading and Writing Files within Python

sys.argv

The first task is to get comfortable with sys.argv. In a new text file called lab03_ex5.py, write the following:

    #!/usr/bin/env python

    import sys

    if __name__ == "__main__":
        print sys.argv

This will print the arguments that Python sees from sys.argv.
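sys.argv is also what lets one script handle more than one row of the Command Line Caller and Python Receiver table: it can check how many arguments it received and fall back to sys.stdin or sys.stdout when a file name is missing. A minimal sketch, assuming the hypothetical helper names open_input, open_output and first_column (they are not part of the sys module or of this lab's required code):

```python
import sys


def open_input(astrArgs):
    """Return a readable file: the path in astrArgs[1] if given, else sys.stdin."""
    if len(astrArgs) > 1:
        return open(astrArgs[1])
    return sys.stdin


def open_output(astrArgs):
    """Return a writable file: the path in astrArgs[2] if given, else sys.stdout."""
    if len(astrArgs) > 2:
        return open(astrArgs[2], 'w')
    return sys.stdout


def first_column(strLine):
    """Return the first tab-delimited field of one line, without its newline."""
    # Strip the newline first so a one-column line does not keep it.
    return strLine.rstrip('\n').split('\t')[0]
```

Inside an if __name__ == "__main__": block you could then set fIn = open_input(sys.argv) and fOut = open_output(sys.argv), loop over fIn writing first_column(strLine) + '\n' to fOut, and close only the files you opened yourself (not sys.stdin or sys.stdout). A script built this way accepts both python script.py < input.txt > output.txt and python script.py input.txt output.txt.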
Now, run lab03_ex5.py three different times:

    python lab03_ex5.py
    python lab03_ex5.py input.txt
    python lab03_ex5.py input.txt output.txt

You should see:

    Emma-Schwager:Lab 03 emmaschwager$ python lab03_ex5.py
    ['lab03_ex5.py']
    Emma-Schwager:Lab 03 emmaschwager$ python lab03_ex5.py input.txt
    ['lab03_ex5.py', 'input.txt']
    Emma-Schwager:Lab 03 emmaschwager$ python lab03_ex5.py input.txt output.txt
    ['lab03_ex5.py', 'input.txt', 'output.txt']

1. What kind of object is sys.argv (integer, dictionary, etc.)?
It is a list of strings; the first element is the name of the script.

2. Change your script so that you only print the arguments to the script but not the script name.
Change print sys.argv to print sys.argv[1:]

Using sys.argv and open

In a new text file called lab03_ex6.py, type the following:

    #!/usr/bin/env python

    import sys

    if __name__ == "__main__":
        fInFile = open(sys.argv[1])
        fOutFile = open(sys.argv[2], 'w')
        for strLine in fInFile:
            fOutFile.write(strLine.split('\t')[0])
        fInFile.close()
        fOutFile.close()

The input file for this script will be HMP_trunc.txt and the output file will be HMP_ex6.txt. When you run this correctly, you will not see any output on the screen.

1. What does this script do?
It reads in a tab-delimited file and writes the first column of each line to the output file.

2. What is the correct command to run this file?
python lab03_ex6.py HMP_trunc.txt HMP_ex6.txt

3. What does your output file look like? How could you change your code to make it more readable?
It looks like the output from the original lab03_ex4.py: every entry is run together on one line. You could change fOutFile.write(strLine.split('\t')[0]) to fOutFile.write(strLine.split('\t')[0] + '\n')

Exercises

1. Write a script called txt2csv.py that converts a tab-delimited (txt) file into a comma-separated value (csv) file. Choose an I/O style between:
   (a) python txt2csv.py < input.txt > output.csv
   (b) python txt2csv.py input.txt output.csv
   Use this script to convert HMP_trunc.txt to HMP_trunc.csv.
If using I/O style 1a:

    #!/usr/bin/env python

    import sys

    if __name__ == "__main__":
        for strLine in sys.stdin:
            # Assumes no field contains a comma; otherwise the csv module's quoting is needed
            sys.stdout.write(strLine.replace('\t',','))

If using I/O style 1b:

    #!/usr/bin/env python

    import sys

    if __name__ == "__main__":
        fInFile = open(sys.argv[1])
        fOutFile = open(sys.argv[2], 'w')
        for strLine in fInFile:
            # Assumes no field contains a comma; otherwise the csv module's quoting is needed
            fOutFile.write(strLine.replace('\t',','))
        fInFile.close()
        fOutFile.close()

2. Write a script called transpose.py that takes an input file and outputs its transpose. You can assume that your input file is tab-delimited and that any missing entries are still designated by a tab (so every line has the same number of fields). Choose an I/O style between:
   (a) python transpose.py < input.txt > output.txt
   (b) python transpose.py input.txt output.txt
   Use this script to convert HMP_trunc.txt to HMP_trunc_transpose.txt.

If using I/O style 2a:

    #!/usr/bin/env python

    import sys

    if __name__ == "__main__":
        # Read the whole file as a list of rows, each row a list of fields
        aaFile = []
        for strLine in sys.stdin:
            aaFile.append(strLine.replace('\n','').split('\t'))
        # Column iColIdx of the input becomes row iColIdx of the output
        aaTranspose = []
        for iColIdx in range(len(aaFile[0])):
            aaTranspose.append([])
            for iRowIdx in range(len(aaFile)):
                aaTranspose[iColIdx].append(aaFile[iRowIdx][iColIdx])
        astrTranspose = ['\t'.join(aTransposeLine) for aTransposeLine in aaTranspose]
        for strLine in astrTranspose:
            sys.stdout.write(strLine + '\n')

If using I/O style 2b:

    #!/usr/bin/env python

    import sys

    if __name__ == "__main__":
        fInFile = open(sys.argv[1])
        fOutFile = open(sys.argv[2],'w')
        # Read the whole file as a list of rows, each row a list of fields
        aaFile = []
        for strLine in fInFile:
            aaFile.append(strLine.replace('\n','').split('\t'))
        # Column iColIdx of the input becomes row iColIdx of the output
        aaTranspose = []
        for iColIdx in range(len(aaFile[0])):
            aaTranspose.append([])
            for iRowIdx in range(len(aaFile)):
                aaTranspose[iColIdx].append(aaFile[iRowIdx][iColIdx])
        astrTranspose = ['\t'.join(aTransposeLine) for aTransposeLine in aaTranspose]
        for strLine in astrTranspose:
            fOutFile.write(strLine + '\n')
        fInFile.close()
        fOutFile.close()
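The nested loops in the transpose solutions above can also be written with Python's built-in zip, which groups the i-th field of every row together. A sketch, assuming a hypothetical helper called transpose_lines that accepts any sequence of lines (a list, sys.stdin, or an open file) and runs under both Python 2 and Python 3:

```python
def transpose_lines(astrLines):
    """Return the transpose of tab-delimited lines as a list of tab-joined strings."""
    # One list of fields per input row; drop the trailing newline first.
    aaRows = [strLine.rstrip('\n').split('\t') for strLine in astrLines]
    # zip(*aaRows) pairs up the i-th field of every row, i.e. it yields the columns.
    # zip stops at the shortest row, so all rows should have the same number of fields.
    return ['\t'.join(aColumn) for aColumn in zip(*aaRows)]
```

For I/O style 2a, the main block would reduce to writing each string from transpose_lines(sys.stdin) followed by '\n'. One design difference is worth knowing: the loops above take the number of columns from the first row and raise an IndexError if a later row is shorter, whereas zip silently stops at the shortest row. The exercise's guarantee that missing entries are still marked by a tab keeps every row the same length, so both approaches give the same result here.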