Computational Approaches for Life Scientists
Faculty of Biology, Technion, Spring 2013
Amir Rubinstein

Python basics

Outline
• Python basics:
  • types in Python
  • input and output
  • operators
  • functions
  • conditions
  • loops: while, for
  • sequences:
    - strings
    - lists

Getting started with Python

Let's see some real Python first

dna = input("please enter a dna sequence\n")
A = dna.count("A")
T = dna.count("T")
C = dna.count("C")
G = dna.count("G")
print(100*(C+G)/(A+T+C+G))

def ambiguous_bases_count(dna):
    amb = 0    # bases other than a, t, g, c
    for base in dna:
        if base not in "atgc":
            amb += 1
    return amb

def check_dna(dna, alphabet="atgc"):
    for base in dna:
        if base not in alphabet:
            return False
    return True

Let's see some real Python first

standard = {
    'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
    'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
    'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
    'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',
    'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
    'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
    'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
    'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',
    'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
    'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
    'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
    'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',
    'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
    'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
    'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
    'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G'
}

def dna_translate(cdna, code=standard):
    """ translate a cDNA sequence to a protein """
    prot = ""
    for i in range(0, len(cdna), 3):
        # codons missing from the table (e.g. an incomplete final codon) become "?"
        prot += code.get(cdna[i:i+3], "?")
    return prot

Let's run (some code)!
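As a first thing to run, the translation function above can be tried on a short made-up sequence. To keep the sketch self-contained it assumes a reduced codon table, `small_code`, which is a hypothetical subset of the `standard` dictionary covering only the codons used here. Any codon not found in the table, including a leftover 1-2 base fragment at the end, comes out as "?".

```python
# A hypothetical, reduced subset of the standard genetic code,
# used only to keep this example short.
small_code = {
    'atg': 'M', 'ttt': 'F', 'ggc': 'G', 'taa': '*',
}

def dna_translate(cdna, code=small_code):
    """translate a cDNA sequence to a protein"""
    prot = ""
    # step through the sequence one codon (3 bases) at a time
    for i in range(0, len(cdna), 3):
        # unknown codons, or a final fragment shorter than 3 bases, become "?"
        prot += code.get(cdna[i:i+3], "?")
    return prot

print(dna_translate("atgtttggctaa"))   # MFG*
print(dna_translate("atgcccgg"))       # M??  ('ccc' is not in small_code, 'gg' is incomplete)
```

Passing the full `standard` table from the slide as `code` lets the same loop translate all 64 codons.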
• You can find all these pieces of code on the website, in the file basics.py
• Opening .py files for editing:
  • right click → Edit with IDLE
  • double-clicking will not open the file for editing; it just runs the program
• Running .py files:
  • after opening, Run → Run Module (or press F5)

Interactive (shell) vs. Script mode
• When we open IDLE we get the shell mode, also called interactive mode. This is a "ping-pong" mode: we run one command at a time. It is very convenient for short calculations.
• Script mode lets us write the whole program first, save it in a .py file, and only then run it (Python executes it line by line, from top to bottom).
  To work in script mode:
  1. File → New Window
  2. write your program
  3. Run → Run Module (F5)
  You will be asked to save your program. Make sure the file name ends with .py

Basic types in Python
• Python has classes, or types of objects. An instance is a specific object of a class. Instances of the same class share some common features.

type      meaning                            examples of instances                     example variable
int       Integers                           …, -3, -2, -1, 0, 1, 2, 3, …              x
float     Real numbers                       3.14, -0.001, …
bool      Boolean                            True, False                               flag
str       String (sequence of characters)    "amir", "!", "45", "3.14", …              msg
list      Sequence of elements               [1, "hello", -0.25, False, ["a", "b"]]    grades
function  Function                           print, input, len, …

• Variables are names given to objects:
x = 5
flag = True
msg = "hello"
grades = [90, 85, 99, 76, 31]

Basic types in Python
• To check the type of an object, we can use the function type(…).
>>> type(6)
<class 'int'>
>>> x = 5.7
>>> type(x)
<class 'float'>
>>> type("biology")
<class 'str'>
>>> type([4, 5, "DNA", x])
<class 'list'>
• In Python, the type of a variable can change, not only its value: assigning an object of another type simply rebinds the name to it.
>>> x = 5.7
>>> type(x)
<class 'float'>
>>> x = "drosophila"
>>> type(x)
<class 'str'>

Type conversions
• Python allows any "reasonable" type conversion.
  This is done using conversion functions such as int(), float(), str(), list(), …
>>> int("6")
6
>>> int(6.7)    # drops the fractional part (truncates toward zero)
6
>>> float("3.14")
3.14
>>> str(45)
'45'
>>> int("a")
Traceback (most recent call last):
  File "<pyshell#64>", line 1, in <module>
    int("a")
ValueError: invalid literal for int() with base 10: 'a'

The functions input(…) and print(…)
>>> print("Hello")
Hello
>>> print("Hello", "class!")
Hello class!
>>> course = "'Comp. App. for Life Scientists'"
>>> print("Hello", course, "students")
Hello 'Comp. App. for Life Scientists' students
>>> name = input("What's your name? ")
What's your name? Amir
>>> print(name, type(name))
Amir <class 'str'>
>>> grade = int(input("What's your grade? "))
What's your grade? 90
>>> new = min(100, round(1.05*grade))
>>> print("Factored grade is", new, "!")
Factored grade is 94 !
• 1.05*90 is 94.5, and round(…) rounds an exact tie to the nearest even number, hence 94.
• input(…) always returns a string, so numerical input must be converted (here with int(…)).

Numerical operators
operator   meaning                     examples
-          unary minus                 -5
+          plus                        x+7
-          binary minus                6-y
*          multiplication              x*y
/          float division              28/10 = 2.8
//         integer (floor) division    28//10 = 2
%          modulo                      10%2 = 0
**         power                       2**10 = 1024

Let's run (some code)!
• Write a program (in script mode) that gets a temperature in Celsius and prints the temperature in Fahrenheit. The formula: ºF = ºC*1.8 + 32
• See the next slide on functions. Now turn your program into a function, run* the code and call your function.
* After pressing F5, a shell window should open, but no commands should be executed until you actually call your function.

Functions
def func_name(parameters):
<-tab-> statement
        statement
        …
• A function executes until the first return statement it reaches.

dna = input("please enter a dna sequence\n")
A = dna.count("A")
T = dna.count("T")
C = dna.count("C")
G = dna.count("G")
print(100*(C+G)/(A+T+C+G))
• This program calls the built-in functions input, count and print.
• Make sure you still remember and understand why using functions is good.
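Applying the function syntax above to the Celsius exercise, one possible sketch looks like this. The name `celsius_to_fahrenheit` is our own choice, and the function returns the result rather than printing it, so the caller decides what to do with the value.

```python
def celsius_to_fahrenheit(celsius):
    """convert a temperature from Celsius to Fahrenheit: F = C*1.8 + 32"""
    return celsius * 1.8 + 32

# nothing is computed until the function is actually called
print(celsius_to_fahrenheit(100))   # 212.0
print(celsius_to_fahrenheit(-40))   # -40.0
```

In script mode the value could come from input(…), converted first with float(…), e.g. celsius_to_fahrenheit(float(input("Celsius? "))).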
The GC-content program above, written as a user-defined function (which itself calls the built-in function count):
def GC_content(dna):
    """calculate GC% in string dna"""
    A = dna.count("A")
    T = dna.count("T")
    C = dna.count("C")
    G = dna.count("G")
    return 100*(C+G)/(A+T+C+G)

Function classification
• Functions can be divided into these categories:
  • user-defined: def func_name(…): …
  • built-in: come "free" with Python, e.g. print, input, type, int, float, …
    Some built-in functions are bound to a class. Example: in dna.count(…), the function count(…) belongs to class str.
  • imported: we will see these soon. They require an import statement (like #include in C).

Conditionals
if condition:
<-tab-> statement
        statement
        …
elif condition:    # 0 or more
<-tab-> statement
        …
else:              # optional
<-tab-> statement
        …
• We can use these operators in the conditions:
  comparison operators: ==, !=, >, <, >=, <=
  logical operators: and, or, not
• Exercise: write a function that gets the room temperature and three state-transition temperatures of a polymer: Tg (glass transition), Tm (melting), Tv (vaporization). The function will print the state of the polymer: solid, liquid or gas. If it is solid, also print its glassy state: glassy (below Tg) or rubbery.

A word about tabs
• In Python, indentation (tabs) is very important!
• It determines the scope: which statements belong to a function, a loop or a condition.
• Tip: take care when copying and pasting from other sources. Sometimes the code will not work because the indentation was ruined.

Sequences
• Sequences are highly important in our biological context.
• We have mentioned 2 kinds of sequences so far:
  - Strings (sequences of characters): "GCTTA" or 'GCTTA'
    multi-line strings use triple quotes:
    """GTAT
    GGTA
    TCCG"""
  - Lists (sequences of any instances): [1, 2, 3]
• Strings and lists (and other sequences) share some common properties. We'll see these first, and then some differences.
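The polymer exercise from the conditionals slide can be sketched with one if/elif/else chain. Assumptions (ours, not the slide's): Tg < Tm < Tv, a temperature exactly at a transition point counts as the higher-temperature state, and at or above Tv the polymer is reported as gas. The helper `polymer_state` returns the text so it can be checked; `print_polymer_state` prints it, as the exercise asks. The temperatures in the call are illustrative values, not data for a real polymer.

```python
def polymer_state(room_temp, tg, tm, tv):
    """return the state of a polymer at room_temp, given its glass
    transition (tg), melting (tm) and vaporization (tv) temperatures.
    Assumes tg < tm < tv."""
    if room_temp < tg:
        return "solid (glassy)"
    elif room_temp < tm:
        return "solid (rubbery)"
    elif room_temp < tv:
        return "liquid"
    else:
        return "gas"

def print_polymer_state(room_temp, tg, tm, tv):
    print(polymer_state(room_temp, tg, tm, tv))

print_polymer_state(25, 100, 240, 400)   # solid (glassy)
```

Because elif branches are checked in order, each condition only needs an upper bound: reaching `room_temp < tm` already implies the temperature is at least Tg.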
Strings and Lists commonalities
• Some built-in Python functions are applicable to both str and list:
  len(seq)   min(seq)   max(seq)
• indexing: seq[i], where 0 ≤ i < len(seq) (negative indices count from the end)
  including higher-dimension indexing: seq[i][j][…]
• slicing: seq[a:b:c] (from index a up to, but not including, b, in steps of c)
• membership test: element in seq
• concatenation (+): seq1 + seq2
• duplication (*): seq * int
• comparison (==, !=): seq1 == seq2, seq1 != seq2

Strings and Lists commonalities (2)
• loops: we can iterate over a sequence using a for loop.
for name in sequence:
<-tab-> statement
        statement
        …
statement    # out of loop
…
• name is assigned the sequence's elements one by one.
def countA(dna):
    cntA = 0
    for base in dna:
        if base == "A":
            cntA += 1
    return cntA

Loops: for and while
for name in sequence:
<-tab-> statement
        statement
        …
statement    # out of loop
…
def countA(dna):
    cntA = 0
    for base in dna:
        if base == "A":
            cntA += 1
    return cntA

while condition:
<-tab-> statement
        statement
        …
statement    # out of loop
…
def countA(dna):
    cntA = 0
    i = 0
    while i < len(dna):
        if dna[i] == "A":
            cntA += 1
        i += 1
    return cntA
• One of the statements in the loop body must eventually make the condition false (what would happen otherwise?)

Strings and Lists differences
• Strings consist only of characters. Lists can contain elements of any type.
• Lists are mutable (we can change their elements); strings are immutable.
>>> dna = "AGGACGATTAGACG"
>>> dna[4] = "G"
Traceback (most recent call last):
  File "<pyshell#43>", line 1, in <module>
    dna[4] = "G"
TypeError: 'str' object does not support item assignment
>>> Glycine = ["GGU", "GGC", "GGA", "GGg"]
>>> Glycine[3][2] = "G"
Traceback (most recent call last):
  File "<pyshell#45>", line 1, in <module>
    Glycine[3][2] = "G"
TypeError: 'str' object does not support item assignment
>>> Glycine[3] = "GGG"
>>> Glycine
['GGU', 'GGC', 'GGA', 'GGG']
• Glycine[3] is itself a string, so its characters cannot be changed; replacing the whole list element works, because the list is mutable.

Strings and Lists differences (2)
• Each of these types has its own functions that are bound to it.
• You can get a list of a type's available functions using:
  - the help command on that type, e.g. help(str)
  - manuals and sources on the web
  - or simply by typing the name of a class (or an instance of it) followed by a dot (.), and then Tab. A list of the functions bound to that class will open.
• However, some functions exist in both classes, e.g. count(…) and index(…)

Conversion between lists and strings
• Converting str to list is easy:
>>> list("TATAAA")
['T', 'A', 'T', 'A', 'A', 'A']
• What about the opposite?
>>> str(['T', 'A', 'T', 'A', 'A', 'A'])
"['T', 'A', 'T', 'A', 'A', 'A']"
Not exactly what we expected. Solution: the str class has a function join:
separator_string.join(list_of_strings)
>>> "w".join(['T', 'A', 'T', 'A', 'A', 'A'])
'TwAwTwAwAwA'
>>> "".join(['T', 'A', 'T', 'A', 'A', 'A'])
'TATAAA'

Exercise
Generalize the GC_content function. Instead of getting 1 parameter and returning the GC%, it should get two parameters:
- seq: a string representing DNA, a protein or any other sequence
- items: a list containing the characters we want to count
The function should return the total percentage of all the characters in the list.
>>> count("TATTA", ["T"])
60.0
>>> count("TATTA", ["t", "T"])
60.0
>>> count("TATTA", ["t", "a", "g", "c"])
0.0
>>> count("RPKPQQFFGLM", ["Q", "F"])
36.36363636363637

What's left (Python basics)?
- files
- dictionaries
- libraries and import
One can go on and develop expertise in Python. This is not our goal in this course. But make sure you feel comfortable with what we've learned.

What to do after this class - reminder
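A possible sketch for the exercise, using the name `count` from the examples above. It adds up str.count(…) for every item in the list and divides by the length of the sequence. Returning 0.0 for an empty sequence, instead of dividing by zero, is our own assumption. Counting is case-sensitive, which is why ["t", "a", "g", "c"] gives 0.0 for "TATTA".

```python
def count(seq, items):
    """return the percentage of characters in seq that appear in the list items"""
    if len(seq) == 0:
        return 0.0    # assumption: an empty sequence contains 0% of anything
    total = 0
    for item in items:
        total += seq.count(item)    # str.count is case-sensitive
    return 100 * total / len(seq)

print(count("TATTA", ["T"]))                  # 60.0
print(count("TATTA", ["t", "a", "g", "c"]))   # 0.0
```

For a sequence made only of A, C, G and T, count(dna, ["G", "C"]) gives the same value as GC_content(dna).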