Computational Approaches for Life Scientists
Faculty of Biology, Technion, Spring 2014
Amir Rubinstein

Python basics 2

Lecture 1 highlights
• A reminder about some programming concepts
• Python: installation, sources
• Getting started with Python:
  • Functions, loops, conditionals, types, operations
  • Input and output, built-in functions
  • Sequences: strings, lists

Lecture 2 - Outline
• More containers in Python
  • str
  • list
  • range
  • dictionary
  • set
  • tuple
• Files
• import modules
  • random, math, time, …

Sequences (reminder)
• Sequences are highly important in our biological context.
• We have mentioned 2 kinds of sequences so far:
  - Strings (sequences of characters): "GCTTA" or 'GCTTA'
    multi-line strings:
        """GTAT
        GGTA
        TCCG"""
  - Lists (sequences of any instances): [1, 2, 3]
There are some common properties to strings and lists (and other sequences). We'll see these first, and then some differences.

Strings and Lists commonalities
- Some built-in Python functions are applicable to both str and list: len(seq), min(seq), max(seq)
- indexing: seq[i], where 0 ≤ i < len(seq)
- slicing: seq[a:b:c]
- concatenation (+): seq1 + seq2
- duplication (*): seq * int
- comparison (==, !=): seq1 == seq2, seq1 != seq2 (the result can be True/False)
- membership test: if element in seq
- looping: for element in seq:

Strings and Lists commonalities (2)
• loops: we can iterate over sequences

    for name in sequence :
    <-tab-> statement
            statement
            …
    statement #out of loop
    …

  name is assigned the sequence's elements one by one.
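The shared operations listed above can be tried on a string and on a list side by side. This is only a minimal sketch: the helper name describe and the sample values are made up for illustration and do not come from the lecture.

```python
# The same operations applied to a string and to a list.
dna = "GCTTA"
nums = [1, 2, 3]

def describe(seq):
    """Collect a few properties using only operations shared by str and list."""
    return {
        "length": len(seq),
        "smallest": min(seq),
        "largest": max(seq),
        "first": seq[0],
        "last": seq[len(seq) - 1],
        "every_second": seq[::2],   # slicing with a step of 2
        "doubled": seq * 2,         # duplication
    }

dna_info = describe(dna)
nums_info = describe(nums)

print(dna_info["every_second"])   # GTA
print(nums_info["doubled"])       # [1, 2, 3, 1, 2, 3]
print("T" in dna, 5 in nums)      # True False
print(dna + "CC", nums + [4])     # GCTTACC [1, 2, 3, 4]
```

Because describe never checks the type of its argument, the same function body works for both kinds of sequence.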
    >>> lst = [1, 2, 3.7, "abc"]
    >>> for elem in lst:
            print(elem*2)
    2
    4
    7.4
    abcabc
    >>> st = "gctt"
    >>> for b in st:
            print(b*2)
    gg
    cc
    tt
    tt

Loops: for and while

    for name in sequence :
    <-tab-> statement
            statement
            …
    statement #out of loop
    …

    def countA(dna):
        cntA = 0
        for base in dna:
            if base == "A":
                cntA += 1
        return cntA

    while condition :
    <-tab-> statement
            statement
            …
    statement #out of loop
    …

    def countA(dna):
        cntA = 0
        i = 0
        while i < len(dna):
            if dna[i] == "A":
                cntA += 1
            i += 1
        return cntA

In a while loop, one of the statements must affect the condition (what would happen if not?)

Strings and Lists differences
• Strings consist only of characters. Lists can contain anything.
• Lists are mutable (we can mutate their inner elements); strings are immutable.

    >>> dna = "AGGACGATTAGACG"
    >>> dna[4] = "G"
    Traceback (most recent call last):
      File "<pyshell#43>", line 1, in <module>
        dna[4] = "G"
    TypeError: 'str' object does not support item assignment
    >>> Glycine = ["GGU", "GGC", "GGA", "GGg"]
    >>> Glycine[3][2] = "G"
    Traceback (most recent call last):
      File "<pyshell#45>", line 1, in <module>
        Glycine[3][2] = "G"
    TypeError: 'str' object does not support item assignment
    >>> Glycine[3] = "GGG"
    >>> Glycine
    ['GGU', 'GGC', 'GGA', 'GGG']

Strings and Lists differences (2)
• Each of these types has its own methods. You can get a list of a type's methods using:
  - the help command on that type
  - manuals and sources on the web
  - or simply by typing the name of a class (or an instance of it) followed by a dot (.), and then tab. A list of functions bound to that class will open.
• Some methods are in both classes: count(…) and index(…)

Conversion between lists and strings
• Converting str to list is easy:

    >>> list("TATAAA")
    ['T', 'A', 'T', 'A', 'A', 'A']

• What about the opposite?

    >>> str(['T', 'A', 'T', 'A', 'A', 'A'])
    "['T', 'A', 'T', 'A', 'A', 'A']"

  Not exactly what we expected.
  Solution: the str class has a function join:
      separator_string.join(list_of_characters)

    >>> "w".join(['T', 'A', 'T', 'A', 'A', 'A'])
    'TwAwTwAwAwA'
    >>> "".join(['T', 'A', 'T', 'A', 'A', 'A'])
    'TATAAA'
    >>> str.join("",['T', 'A', 'T', 'A', 'A', 'A']) #same same
    'TATAAA'

Class exercise
Generalize the GC_content function. Instead of getting 1 parameter and returning the GC%, it should get two parameters:
- seq: a string representing DNA / protein or any other sequence
- items: a list containing the characters we want to count.
The function should return the total % of all characters in seq which belong to items.

    >>> count("TATTA", ["T"])
    60.0
    >>> count("TATTA", ["t", "T"])
    60.0
    >>> count("TATTA", ["t", "a", "g", "c"])
    0.0
    >>> count("RPKPQQFFGLM", ["Q", "F"])
    36.36363636363637

More containers in Python
• Python has several other built-in collection types.
  • very useful for various needs
  • but one needs to choose the appropriate collection that fits the problem
• Here's a summary:

                 Ordered (sequences)              Unordered
  Mutable        list    [1, 2, "hello"]          set         {1, 2, "hello"}  (immutable elements)
                 (can mutate inner elements)      dictionary  {1:3, 2:7.34, "hello":"bye"}  (immutable keys)
  Immutable      string  "TGAAC"
                 tuple   (1, 2, "hello")
                 range   range(a,b,c)

  Immutable: can only change the whole container to another location in memory.

tuple
• ()
• Tuples are much like lists, but immutable

    >>> t = (1,2,3)
    >>> type(t)
    <class 'tuple'>
    >>> t[2]
    3
    >>> t[2]=7
    Traceback (most recent call last):
      File "<pyshell#27>", line 1, in <module>
        t[2]=7
    TypeError: 'tuple' object does not support item assignment

range
• range(a,b): a, a+1, a+2, a+3, … < b (last number = b-1)
• range(a,b,c): a, a+c, a+2c, a+3c, … < b
• defaults: a=0, c=1

    >>> for i in range(2,4):
            print(i**2)
    4
    9
    >>> for i in range(4):
            print(i**2)
    0
    1
    4
    9
    >>> for i in range(0,4,2):
            print(i**2)
    0
    4

range (cont.)
• How can we tell Python to "do something 100 times"?

    for i in range(100):
        ...
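To see which numbers a range actually produces, it can be converted to a list. The sketch below also uses a step of 3 to walk over a DNA string codon by codon; the helper names range_values and codons are hypothetical and not part of the lecture.

```python
def range_values(a, b, c=1):
    """Return the numbers produced by range(a, b, c) as a list, for inspection."""
    return list(range(a, b, c))

def codons(dna):
    """Split dna into consecutive 3-letter chunks using range with a step of 3."""
    result = []
    for i in range(0, len(dna), 3):
        result.append(dna[i:i+3])
    return result

print(range_values(2, 4))       # [2, 3]
print(list(range(4)))           # [0, 1, 2, 3]
print(range_values(0, 4, 2))    # [0, 2]
print(codons("ATGGCTTAA"))      # ['ATG', 'GCT', 'TAA']

# "Do something 100 times": the loop variable i is not used in the body.
repeats = 0
for i in range(100):
    repeats += 1
print(repeats)                  # 100
```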
• Two easy ways to go over a sequence:

    lst = [1, 2, 3.7, "abc"]
    for elem in lst:
        print(elem*2)

    2
    4
    7.4
    abcabc

    lst = [1, 2, 3.7, "abc"]
    for i in range(len(lst)) :
        print(i, lst[i]*2)

    0 2
    1 4
    2 7.4
    3 abcabc

  The second way is for when we need access to positions, not only elements.

set
• {}
• unordered (no indexing)
• no repetitions
• elements must be immutable (e.g. str, tuple, int, float)
• support operations such as union(), intersection(), etc.

(check these yourselves)

    >>> s = {"a", "b", 5}
    >>> type(s)
    <class 'set'>
    >>> s[0]
    Traceback (most recent call last):
      File "<pyshell#31>", line 1, in <module>
        s[0]
    TypeError: 'set' object does not support indexing
    >>> s.union({4,5,6})
    {'a', 'b', 4, 5, 6}
    >>> s2 = {"a", "b", "c"}
    >>> s3 = s.intersection(s2)
    >>> s3
    {'a', 'b'}
    >>> s
    {'a', 'b', 5}

Converting to a set is an easy way to remove repetitions:

    >>> set([1,2,3,1,1,2,3,2,1])
    {1, 2, 3}

dict (dictionary)
• like sets, but contain pairs of key:value
• unordered
• no repeating keys (values may repeat)
• keys must be immutable

Dictionaries map keys to values:

    standard = {
    'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
    'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
    'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
    'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',
    'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
    'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
    'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
    'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',
    'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
    'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
    'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
    'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',
    'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
    'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
    'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
    'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G'
    }

dict (cont.)
• [] is used to map key → value
• support operations such as keys(), values(), etc.
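The dictionary operations above can be put to work in a small translation routine. This is a sketch under stated assumptions, not the course's own code: codon_table is a hypothetical four-codon subset of the standard table, translate is a made-up helper name, and marking unknown codons with '?' is an arbitrary choice.

```python
# A small, hypothetical subset of the codon table, just for illustration.
codon_table = {"atg": "M", "gct": "A", "tat": "Y", "taa": "*"}

def translate(dna, table):
    """Translate dna codon by codon; codons missing from table become '?'."""
    protein = ""
    dna = dna.lower()                 # the table uses lower-case keys
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = dna[i:i+3]
        if codon in table:            # membership test on the keys
            protein += table[codon]   # [] maps key -> value
        else:
            protein += "?"
    return protein

print(codon_table["tat"])                        # Y
print("ggg" in codon_table)                      # False
print(sorted(codon_table.keys()))                # ['atg', 'gct', 'taa', 'tat']
print(sorted(codon_table.values()))              # ['*', 'A', 'M', 'Y']
print(translate("ATGGCTTATTAA", codon_table))    # MAY*
```

The keys and values are sorted before printing so that the output does not depend on the order in which the dictionary stores its pairs.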
    standard = {
    'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
    'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
    'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
    ...

    >>> standard['tat']          #get the value of key='tat'
    'Y'
    >>> standard['amir']
    Traceback (most recent call last):
      File "<pyshell#1>", line 1, in <module>
        standard['amir']
    KeyError: 'amir'
    >>> 'tat' in standard
    True
    >>> standard['tat'] = ['Y','y']
    >>> standard['tat']
    ['Y', 'y']

Dictionaries - molecular weight
• Another example: protein molecular weight

    def molWeight(protseq):
        protweight = {"A":89,"V":117,"L":131,"I":131,"P":115,"F":165,
                      "W":204,"M":149,"G":75,"S":105,"C":121,"T":119,
                      "Y":181,"N":132,"Q":146,"D":133,"E":147,
                      "K":146,"R":174,"H":155 }
        totalW = 0
        protseq = protseq.upper() #so proteins in lower case also OK
        for aa in protseq:
            if aa in protweight:
                totalW = totalW + protweight[aa]
        totalW = totalW-(18*(len(protseq)-1))
        return totalW

Molecular weight - better style

    def molWeight(protseq):
        protweight = {"A":89,"V":117,"L":131,"I":131,"P":115,"F":165,
                      "W":204,"M":149,"G":75,"S":105,"C":121,"T":119,
                      "Y":181,"N":132,"Q":146,"D":133,"E":147,
                      "K":146,"R":174,"H":155 }
        totalW = 0
        protseq = protseq.upper() #so proteins in lower case also OK
        for aa in protseq:
            if aa in protweight:
                totalW = totalW + protweight[aa]
        H2O = 18 #weight of water molecule
        totalW = totalW-(H2O*(len(protseq)-1)) #reduce weight for peptide bonds
        return totalW

Files
• Why do we want to know how to work with files?
  • easy way to save and read data
  • especially when data is too large to visualize on screen
• basic file operations in Python: open, readline, read, close

    my_file = open("...")          #path to file
    my_file = open("...", 'w')     #open to write

  tip: .\ for current directory, ..\ for containing directory, tab to see what's in the directory

    seq = my_file.readline()       #read next line (until '\n') as str
    seq = my_file.read()           #read ALL lines of file
    my_file.close()

  read() returns the whole file, which might be huge!
  Use caution before printing the result of read().

import
• Some functionalities of Python require declaring the use of existing "libraries", or modules.
• for example: the modules random, time, math, …

    >>> import random
    >>> random.choice("ATCG")
    'A'
    >>> random.choice("ATCG")
    'G'
    >>> seq = ""
    >>> for i in range(10):
            seq += random.choice("ATCG")
    >>> seq
    'AGCCACCCGA'
    >>> random.random() #random number from [0,1)
    0.7601425767073609
    >>> lst = [1,2,3,4]
    >>> random.shuffle(lst)
    >>> lst
    [2, 1, 3, 4]

What to do after this class - reminder
One can go on and develop expertise in Python. This is not our goal in this course. But make sure you feel comfortable with what we've learned.

Next time…
A biological (metabolic) network
Metabolic and amino acid biosynthesis pathways of yeast
Schryer et al. BMC Systems Biology, no. 1. p. 81 (2011)
A graph (CS notion)
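As a recap of the file operations and the random module from this lecture, here is a minimal sketch that writes a few random DNA sequences to a file and reads them back line by line. The helper name random_dna and the file name sequences.txt are hypothetical, and the file is placed in a temporary directory so that nothing existing is overwritten. The write() and strip() methods were not shown in the slides, but both are standard Python.

```python
import os
import random
import tempfile

def random_dna(length):
    """Build a random DNA string by repeated calls to random.choice."""
    seq = ""
    for i in range(length):
        seq += random.choice("ATCG")
    return seq

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sequences.txt")

# Write three sequences, one per line.
out_file = open(path, "w")
for i in range(3):
    out_file.write(random_dna(10) + "\n")
out_file.close()

# Read them back one line at a time with readline().
in_file = open(path)
sequences = []
line = in_file.readline()
while line != "":                    # readline() returns "" at the end of the file
    sequences.append(line.strip())   # strip() removes the trailing '\n'
    line = in_file.readline()
in_file.close()

print(len(sequences))                # 3
```

Reading with readline() inside a loop keeps only one line in memory at a time, which matters when the file is too large to load with read().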