
Computational Approaches
for Life Scientists
Faculty of Biology
Technion
Spring 2013
Amir Rubinstein
Python basics 2
Outline
• Python basics:
• types in Python
• input and output
• operators
• functions
• conditions
• loops – while, for
• sequences:
- strings
- lists
Getting started with Python
Let's see some real Python first
dna = input("please enter a dna sequence\n")
A = dna.count("A")
T = dna.count("T")
G = dna.count("G")
C = dna.count("C")
print ( 100*(C+G)/(A+T+C+G) )

def ambiguous_bases_count(dna):
    amb = 0 #bases != a,t,g,c
    for base in dna:
        if base not in "atgc":
            amb += 1
    return amb

def check_dna(dna, alphabet="atgc"):
    for base in dna:
        if base not in alphabet:
            return False
    return True
Let's see some real Python first
standard = {
    'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
    'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
    'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
    'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',
    'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
    'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
    'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
    'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',
    'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
    'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
    'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
    'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',
    'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
    'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
    'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
    'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G'
}

def dna_translate(cdna, code=standard):
    """ translate a cDNA sequence to a protein """
    prot = ""
    for i in range(0,len(cdna),3):
        prot += code.get(cdna[i:i+3], "?")
    return prot
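To see how dna_translate treats codons it cannot find, here is a minimal sketch. To stay short it uses a tiny hypothetical table called mini_code (an assumption for illustration, not part of the course files) instead of the full standard dictionary, and it passes the table explicitly.

```python
# Hypothetical four-entry codon table, for illustration only;
# the real `standard` dictionary holds all 64 codons.
mini_code = {'atg': 'M', 'gct': 'A', 'aaa': 'K', 'taa': '*'}

def dna_translate(cdna, code):
    """Translate a cDNA sequence to a protein, codon by codon."""
    prot = ""
    for i in range(0, len(cdna), 3):
        # code.get returns "?" when the codon is not a key in the table
        prot += code.get(cdna[i:i+3], "?")
    return prot

print(dna_translate("atggctaaataa", mini_code))  # MAK*
print(dna_translate("atgnnngc", mini_code))      # M??
```

In the second call, both the unknown codon "nnn" and the incomplete last chunk "gc" are missing from the table, so each becomes "?".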
Let’s run (some code)!
• You can find all these pieces of code on the website, in the file basics.py
• Opening .py files for editing:
  • right click → Edit with IDLE
  • double click will not open the file, just run the program
• Running .py files:
  • after opening, Run → Run Module (or press F5)
Interactive (shell) vs. Script mode
• When we open IDLE we get the shell mode, also called interactive mode.
  This is a "ping-pong" mode: we can run one command at a time.
  Very convenient for short calculations.
• Script mode enables writing the whole program first, saving it in a .py file, and only then running it line by line.
  To work in script mode:
  1. File → New Window
  2. write your program
  3. Run → Run Module (F5)
  You will be asked to save your program.
  Make sure the file name ends with .py
Basic types in python
• Python has classes, or types of objects.
  An instance is a specific object of a class. Instances of the same class share some common features.

type    meaning                           examples of instances
int     Integers                          …, -3, -2, -1, 0, 1, 2, 3, …
float   Real numbers                      3.14, -0.001, …
bool    Boolean                           True, False
str     String (sequence of characters)   "amir", "!", "45", "3.14", …
list    Sequence of elements              [1, "hello", -0.25, False, ["a", "b"]]
func    Function                          print, input, len, …

• Variables are names given to objects.
x = 5
flag = True
msg = "hello"
grades = [90, 85, 99, 76, 31]
Basic types in python
• To check the type of an object, we can use the function type(…).
>>> type(6)
<class 'int'>
>>> x=5.7
>>> type(x)
<class 'float'>
>>> type("biology")
<class 'str'>
>>> type([4,5,"DNA", x])
<class 'list'>
• In Python, the type of a variable can change (not only its value)!
>>> x=5.7
>>> type(x)
<class 'float'>
>>> x="drosophila"
>>> type(x)
<class 'str'>
Type conversions
• Python allows any “reasonable” type conversion, using conversion functions such as:
  int(), float(), str(), list(), …
>>> int("6")
6
>>> int(6.7) #drops the fractional part (truncates toward zero)
6
>>> float("3.14")
3.14
>>> str(45)
'45'
>>> int("a")
Traceback (most recent call last):
File "<pyshell#64>", line 1, in <module>
int("a")
ValueError: invalid literal for int() with base 10: 'a'
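A short sketch of how these conversions behave in practice. The variable length_text is a hypothetical value chosen for illustration.

```python
# int() drops the fractional part, i.e. it truncates toward zero
print(int(6.7))     # 6
print(int(-6.7))    # -6 (not -7)

# round() goes to the nearest integer instead
print(round(6.7))   # 7

# A numeric string must be converted before doing arithmetic with it
length_text = "150"              # hypothetical value, e.g. typed by a user
print(int(length_text) + 1)      # 151
print(length_text + "1")         # 1501, because + on strings concatenates
print(list("ATG"))               # ['A', 'T', 'G']
```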
The functions input(…) and print(…)
>>> print("Hello")
Hello
>>> print("Hello", "class!")
Hello class!
>>> course = "'Comp. App. for Life Scientists'"
>>> print("Hello", course, "students")
Hello 'Comp. App. for Life Scientists' students
>>> name = input("What's your name? ")
What's your name? Amir
>>> print(name, type(name))
Amir <class 'str'>
>>> grade = int(input("What's your grade? "))
What's your grade? 90
>>> new = min(100, round(1.05*grade))
>>> print("Factorized grade is", new, "!")
Factorized grade is 95 !
Note: input(…) always returns a string.
Numerical operators
operator   meaning          examples
-          unary minus      -5
+          plus             x+7
-          binary minus     6-y
*          multiplication   x*y
/          float division   28/10 = 2.8
//         int division     28//10 = 2
%          modulo           10%2 = 0
**         power            2**10 = 1024
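As a small biological illustration of // and %, the sketch below splits a sequence length into whole codons and leftover bases. The value of seq_length is a hypothetical number picked for the example.

```python
seq_length = 28   # hypothetical sequence length, in bases

full_codons = seq_length // 3   # int division: how many whole codons fit
leftover = seq_length % 3       # modulo: bases left after the last whole codon

print(full_codons, leftover)    # 9 1
print(seq_length / 3)           # float division always returns a float
print(2 ** 10)                  # 1024
```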
Let’s run (some code)!
Write a program (in script mode) that gets a temperature in Celsius and prints the temperature in Fahrenheit.
The formula: ºF = ºC*1.8 + 32
See the next slide on functions.
Now turn your program into a function, run* the code and call your function.
* After pressing F5, a shell window should open, but no commands should be executed until you actually call your function.
Functions
def func_name (parameters) :
<-tab-> statement
        statement
        …
A function executes until the first return statement.

A program that calls built-in functions (input, count, print):

dna = input("please enter a dna sequence\n")
A = dna.count("A")
T = dna.count("T")
G = dna.count("G")
C = dna.count("C")
print ( 100*(C+G)/(A+T+C+G) )

A user-defined function (which calls the built-in function count):

def GC_content(dna):
    """calculate GC% in string dna"""
    A = dna.count("A")
    T = dna.count("T")
    G = dna.count("G")
    C = dna.count("C")
    return 100*(C+G)/(A+T+C+G)

Make sure you still remember and understand why using functions is good.
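One benefit of wrapping the calculation in a function is that it can be reused on many sequences. The sketch below repeats GC_content so that it runs on its own; the sequences are made up for illustration, and the call to upper() is an added assumption for handling lower-case input.

```python
def GC_content(dna):
    """Calculate GC% in string dna (assumes upper-case A, T, G, C)."""
    A = dna.count("A")
    T = dna.count("T")
    G = dna.count("G")
    C = dna.count("C")
    return 100 * (G + C) / (A + T + G + C)

# The same function can be called on any number of sequences
print(GC_content("ATGC"))            # 50.0
print(GC_content("GGGCCCAT"))        # 75.0

# Hypothetical sequence typed in lower case: convert before counting
print(GC_content("ggat".upper()))    # 50.0
```

Note that a sequence containing none of the four letters would make the denominator zero, so this sketch assumes a non-empty DNA string.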
Function classification
• Functions can be divided into these categories:
  • user-defined
      def func_name (parameters) :
          …
  • built-in
      come "free" with Python: print, input, type, int, float, …
      Some built-in functions are bound to some class.
      Example: dna.count(…) is the function count(…) of class str.
  • imported
      will see this soon.
      requires an import statement (like #include in C).
Conditionals
if condition :
<-tab-> statement
        statement
        …
elif condition :    # 0 or more
<-tab-> statement
        …
else:               # optional
<-tab-> statement
        …

Can use these operators in the conditions:
comparison operators: ==, !=, >, <, >=, <=
logical operators: and, or, not
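A minimal sketch of an if / elif / else chain that combines comparison and logical operators. The function ph_category and its thresholds are a hypothetical example, not taken from the course material.

```python
def ph_category(ph):
    """Classify a pH value (hypothetical example of if / elif / else)."""
    if ph < 0 or ph > 14:
        # logical operator "or" combines two comparisons
        return "outside the usual 0-14 scale"
    elif ph < 7:
        return "acidic"
    elif ph == 7:
        return "neutral"
    else:
        return "basic"

print(ph_category(2.5))   # acidic
print(ph_category(7))     # neutral
print(ph_category(8.1))   # basic
```

Only the first condition that is true gets executed; the else branch runs when none of them is.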
Write a function that gets the room temperature and three state transition temperatures of a polymer:
Tg (glass transition), Tm (melting), Tv (vaporization).
The function will print the state of the polymer: solid or liquid.
If in solid state, also print the glassy state: glassy (< Tg) or rubbery.
A word about tabs
• In Python, tabs (indentation) are very important!
• They determine the scope.
• Tip: take care when copying-pasting from other sources.
  Sometimes the code will not work because the tabs were ruined.
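The sketch below illustrates how indentation alone decides whether a statement belongs to a loop. The counters inside and outside are hypothetical names used only for this demonstration.

```python
inside = 0
outside = 0

for base in "ATG":
    inside += 1      # indented: part of the loop, runs once per base

for base in "ATG":
    last = base      # only this line is in the loop
outside += 1         # not indented: runs once, after the loop ends

print(inside, outside)   # 3 1
```

The two counters are updated by the same kind of statement; the only difference between them is the indentation.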
Sequences
• Sequences are highly important in our biological context.
• We mentioned so far 2 kinds of sequences:
  - Strings (sequences of characters): "GCTTA" or 'GCTTA'
    multi-line strings:
    """GTAT
    GGTA
    TCCG"""
  - Lists (sequences of any instances): [1, 2, 3]
• There are some common properties to strings and lists (and other sequences).
  We’ll see these first, and then some differences.
Strings and Lists commonalities
- Some built-in Python functions are applicable to both str and list:
  len(seq), min(seq), max(seq)
- indexing: seq[i] for 0≤i<len(seq)
  including higher dimension indexing: seq[i][j][…]
- slicing: seq[a:b:c]
- membership test: element in seq
- concatenation (+): seq1 + seq2
- duplication (*): seq * int
- comparison (==, !=): seq1 == seq2, seq1 != seq2
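The operations listed above can be tried on a string and a list side by side. This is a small sketch; the names dna and codons and their values are made up for illustration.

```python
dna = "ATGCCGTA"                      # a string
codons = ["ATG", "CCG", "TAA"]        # a list of strings (hypothetical data)

print(len(dna), len(codons))          # 8 3
print(min(dna), max(codons))          # A TAA
print(dna[0], codons[0])              # A ATG
print(codons[1][2])                   # G  (two-level indexing)
print(dna[0:3], codons[0:2])          # ATG ['ATG', 'CCG']
print(dna[::2])                       # AGCT (every second character)
print("G" in dna, "GGG" in codons)    # True False
print(dna + "TT")                     # ATGCCGTATT
print(codons + ["GGG"])               # ['ATG', 'CCG', 'TAA', 'GGG']
print("AT" * 3, [0] * 3)              # ATATAT [0, 0, 0]
print(dna == "ATGCCGTA", codons == ["ATG"])   # True False
```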
Strings and Lists commonalities (2)
• loops: we can iterate over a sequence using a for loop.

for name in sequence :
<-tab-> statement
        statement
        …
statement #out of loop
…

name is assigned the sequence's elements, one by one.

def countA(dna):
    cntA = 0
    for base in dna:
        if base == "A":
            cntA += 1
    return cntA
Loops: for and while
for name in sequence :
<-tab-> statement
        statement
        …
statement #out of loop
…

def countA(dna):
    cntA = 0
    for base in dna:
        if base == "A":
            cntA += 1
    return cntA

while condition :
<-tab-> statement
        statement
        …
statement #out of loop
…

def countA(dna):
    cntA = 0
    i = 0
    while i < len(dna):
        if dna[i] == "A":
            cntA += 1
        i += 1
    return cntA

One of the statements must affect the condition (what would happen if not?)
Strings and Lists differences
• Strings consist only of characters. Lists can contain anything.
• Lists are mutable (we can mutate their inner elements).
  Strings are immutable.
>>> dna = "AGGACGATTAGACG"
>>> dna[4] = "G"
Traceback (most recent call last):
File "<pyshell#43>", line 1, in <module>
dna[4] = "G"
TypeError: 'str' object does not support item assignment
>>> Glycine = ["GGU", "GGC", "GGA", "GGg"]
>>> Glycine[3][2] = "G"
Traceback (most recent call last):
File "<pyshell#45>", line 1, in <module>
Glycine[3][2] = "G"
TypeError: 'str' object does not support item assignment
>>> Glycine[3] = "GGG"
>>> Glycine
['GGU', 'GGC', 'GGA', 'GGG']
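Since a string cannot be changed in place, one common way to get the intended result is to build a new string from slices and assign it to the same name. A minimal sketch, reusing the values from the session above:

```python
dna = "AGGACGATTAGACG"

# Strings are immutable, so we build a NEW string from slices
# around position 4 and assign it to the same name
dna = dna[:4] + "G" + dna[5:]
print(dna)        # AGGAGGATTAGACG

# The same idea fixes the last character of a string stored inside a list
Glycine = ["GGU", "GGC", "GGA", "GGg"]
Glycine[3] = Glycine[3][:2] + "G"   # replacing a list element is allowed
print(Glycine)    # ['GGU', 'GGC', 'GGA', 'GGG']
```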
Strings and Lists differences (2)
• Each of these types has its own functions that are bound to it.
  You can get the list of a type's available functions using:
  - the help command on that type
  - manuals and sources on the web
  - or simply by typing the name of a class (or an instance of it) followed by a dot (.), and then tab.
    A list of functions bound to that class will open.
• However, there are functions in both classes: count(…) and index(…)
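A short sketch showing count and index working the same way on a string and on a list. The built-in dir(), used at the end to check which functions each class has, is an addition not mentioned on the slide.

```python
dna = "TATAAA"
bases = list(dna)

# count and index exist for both str and list
print(dna.count("A"), bases.count("A"))    # 4 4
print(dna.index("A"), bases.index("A"))    # 1 1

# dir(...) returns the names bound to a class:
# only strings have upper(), only lists have append()
print("upper" in dir(str), "upper" in dir(list))     # True False
print("append" in dir(list), "append" in dir(str))   # True False
```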
Conversion between lists and strings
• Converting str to list is easy:
>>> list("TATAAA")
['T', 'A', 'T', 'A', 'A', 'A']
• What about the opposite?
>>> str(['T', 'A', 'T', 'A', 'A', 'A'])
"['T', 'A', 'T', 'A', 'A', 'A']"
Not exactly what we expected.
Solution: the str class has a function join:
separator_string.join(list_of_characters)
>>> "w".join(['T', 'A', 'T', 'A', 'A', 'A'])
'TwAwTwAwAwA'
>>> "".join(['T', 'A', 'T', 'A', 'A', 'A'])
'TATAAA'
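Combining list(…) and join gives a way to "edit" a string through a list. This is a sketch under the assumption that we want to change a single base; the position and the new base are arbitrary choices.

```python
dna = "TATAAA"

bases = list(dna)          # ['T', 'A', 'T', 'A', 'A', 'A']
bases[2] = "G"             # allowed: lists are mutable
mutated = "".join(bases)   # back to a string, with an empty separator
print(mutated)             # TAGAAA

# join works with any separator string
print("-".join(["ATG", "GCC", "TAA"]))   # ATG-GCC-TAA
```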
Exercise
Generalize the GC_content function.
Instead of getting 1 parameter and returning the GC%, it should get two parameters:
- seq: a string representing DNA / protein or any other sequence
- items: a list containing the characters we want to count.
The function should return the total % of all the characters in the list.
>>> count("TATTA", ["T"])
60.0
>>> count("TATTA", ["t", "T"])
60.0
>>> count("TATTA", ["t", "a", "g", "c"])
0.0
>>> count("RPKPQQFFGLM", ["Q", "F"])
36.36363636363637
What's left (Python basics)?
- files
- dictionaries
- libraries and import
One can go on and develop expertise in Python.
This is not our goal in this course.
But make sure you feel comfortable with what we've learned.
What to do after this class - reminder