Download Files - Computational Approaches for Life Scientists

Computational Approaches
for Life Scientists
Faculty of Biology
Technion
Spring 2014
Amir Rubinstein
Python basics 2
Lecture 1 highlights
• A reminder about some programming concepts
• Python – installation, sources
• Getting started with Python:
• Functions, loops, conditionals, types, operations
• input and output, built-in functions
• Sequences: strings, lists
Lecture 2 - Outline
• More containers in python
• str
• list
• range
• dictionary
• set
• tuple
• Files
• import modules
• random, math, time,…
Sequences (reminder)
• Sequences are highly important in our biological context.
• We mentioned so far 2 kinds of sequences:
- Strings (sequences of characters): "GCTTA" or 'GCTTA'
  multi-line strings:
  """GTAT
  GGTA
  TCCG"""
- Lists (sequences of any instances): [1, 2, 3]
There are some common properties to strings and lists (and other sequences).
We’ll see these first, and then some differences.
Strings and Lists commonalities
- Some built-in Python functions are applicable to both str and list:
  len(seq)
  min(seq)
  max(seq)
- indexing: seq[i]   0≤i<len(seq)
- slicing: seq[a:b:c]
- concatenation (+): seq1 + seq2
- duplication (*): seq * int
- comparison (==): seq1 == seq2, seq1 != seq2 (can be True/False)
- membership test: if element in seq (can be True/False)
- looping: for element in seq:
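The shared operations listed above can be tried side by side on one string and one list. The sketch below uses example values (dna, nums) chosen here only for illustration; it relies on nothing beyond the built-in operations named on the slide.

```python
# Example values, chosen here for illustration only
dna = "GCTTA"
nums = [3, 1, 2]

# Built-in functions work on both types
print(len(dna), len(nums))        # 5 3
print(min(dna), max(dna))         # A T  (characters are compared alphabetically)
print(min(nums), max(nums))       # 1 3

# Indexing and slicing
print(dna[0], nums[0])            # G 3
print(dna[1:4], nums[1:3])        # CTT [1, 2]
print(dna[::-1], nums[::-1])      # ATTCG [2, 1, 3]

# Concatenation and duplication
print(dna + "GG", nums + [4])     # GCTTAGG [3, 1, 2, 4]
print(dna * 2, nums * 2)          # GCTTAGCTTA [3, 1, 2, 3, 1, 2]

# Comparison and membership tests give True/False
print(dna == "GCTTA", nums != [3, 1, 2])   # True False
print("T" in dna, 7 in nums)               # True False
```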
Strings and Lists commonalities (2)
• loops: we can iterate over sequences
for name in sequence :
<-tab-> statement
        statement
        …
statement #out of loop
…
name is assigned the sequence's elements one by one.
>>> lst = [1, 2, 3.7, "abc"]
>>> for elem in lst:
        print(elem*2)
2
4
7.4
abcabc
>>> st = "gctt"
>>> for b in st:
        print(b*2)
gg
cc
tt
tt
Loops: for and while
for name in sequence :
<-tab-> statement
        statement
        …
statement #out of loop
…
def countA(dna):
    cntA = 0
    for base in dna:
        if base == "A":
            cntA += 1
    return cntA

while condition :
<-tab-> statement
        statement
        …
statement #out of loop
…
def countA(dna):
    cntA = 0
    i = 0
    while i < len(dna):
        if dna[i] == "A":
            cntA += 1
        i += 1
    return cntA
One of the statements must affect the condition
(what would happen if not?)
Strings and Lists differences
•
Strings consist only of characters. Lists consist of anything.
•
lists are mutable (we can mutate their inner elements)
strings are immutable.
>>> dna = "AGGACGATTAGACG"
>>> dna[4] = "G"
Traceback (most recent call last):
File "<pyshell#43>", line 1, in <module>
dna[4] = "G"
TypeError: 'str' object does not support item assignment
>>> Glycine = ["GGU", "GGC", "GGA", "GGg"]
>>> Glycine[3][2] = "G"
Traceback (most recent call last):
File "<pyshell#45>", line 1, in <module>
Glycine[3][2] = "G"
TypeError: 'str' object does not support item assignment
>>> Glycine[3] = "GGG"
>>> Glycine
['GGU', 'GGC', 'GGA', 'GGG']
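Since strings are immutable, "changing" one base really means building a new string. The following is a minimal sketch of one common way to do that with slicing, next to an in-place change of a list; the variable names (mutated, glycine) are illustrative and not part of the original slides.

```python
dna = "AGGACGATTAGACG"

# Slicing and concatenation create a NEW string with position 4 replaced
mutated = dna[:4] + "G" + dna[5:]
print(mutated)    # AGGAGGATTAGACG
print(dna)        # AGGACGATTAGACG  (the original string is untouched)

# A list, in contrast, can be changed in place
glycine = ["GGU", "GGC", "GGA", "GGg"]
glycine[3] = glycine[3].upper()   # upper() returns a new string, stored in slot 3
print(glycine)    # ['GGU', 'GGC', 'GGA', 'GGG']
```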
Strings and Lists differences (2)
• Each of these types has its own methods.
You can get the list of a type's methods using:
- the help command on that type
- manuals and sources on the web
- or simply by typing the name of a class (or an instance of it) followed by a dot (.), and then tab. A list of functions bound to that class will open.
• Some methods are in both classes: count(…) and index(…)
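As a small illustration of the two shared methods, the sketch below calls count and index on a string and on a list. The example values are made up here; dir() is a built-in that returns the attribute names of a type, which is one programmatic way to list its methods.

```python
dna = "TATAAA"
codons = ["GGU", "GGC", "GGA", "GGU"]

# count: how many times an item appears
print(dna.count("A"))        # 4
print(codons.count("GGU"))   # 2

# index: position of the FIRST appearance (raises ValueError if absent)
print(dna.index("A"))        # 1
print(codons.index("GGA"))   # 2

# dir() returns all attribute names of a type; skip the ones starting with "_"
str_methods = [name for name in dir(str) if not name.startswith("_")]
list_methods = [name for name in dir(list) if not name.startswith("_")]
print("count" in str_methods, "count" in list_methods)   # True True
print("index" in str_methods, "index" in list_methods)   # True True
```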
Conversion between lists and strings
•
Converting str to list is easy:
>>> list("TATAAA")
['T', 'A', 'T', 'A', 'A', 'A']
•
What about the opposite?
>>> str(['T', 'A', 'T', 'A', 'A', 'A'])
"['T', 'A', 'T', 'A', 'A', 'A']"
Not exactly what we expected.
Solution: the str class has a method join:
separator_string.join(list_of_characters)
>>> "w".join(['T', 'A', 'T', 'A', 'A', 'A'])
'TwAwTwAwAwA'
>>> "".join(['T', 'A', 'T', 'A', 'A', 'A'])
'TATAAA'
>>> str.join("",['T', 'A', 'T', 'A', 'A', 'A']) #same result
'TATAAA'
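A typical use of this round trip is: convert a string to a list, change the list, and join it back into a string. The sketch below shows this with illustrative values, and also shows that join expects every item to be a string, so other types need converting with str() first.

```python
# str -> list -> change -> str
bases = list("TATAAA")
bases[0] = "C"
new_seq = "".join(bases)
print(new_seq)                            # CATAAA

# Any string can serve as the separator
print("-".join(["GGU", "GGC", "GGA"]))    # GGU-GGC-GGA

# join works on strings only; convert numbers with str() first
nums = [1, 2, 3]
print(",".join(str(n) for n in nums))     # 1,2,3
```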
Class exercise
Generalize the GC_content function.
Instead of getting 1 parameter and returning the GC%, it should get two parameters:
- seq: a string representing DNA / protein or any other sequence
- items: a list containing characters we want to count.
The function should return the total % of all characters in seq which belong to items.
>>> count("TATTA", ["T"])
60.0
>>> count("TATTA", ["t", "T"])
60.0
>>> count("TATTA", ["t", "a", "g", "c"])
0.0
>>> count("RPKPQQFFGLM", ["Q", "F"])
36.36363636363637
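One possible solution is sketched below; it is not necessarily the one intended in class. The function is named percent_of here (a hypothetical name) to avoid confusion with the str.count method, and returning 0.0 for an empty sequence is an assumption made to avoid division by zero.

```python
def percent_of(seq, items):
    """Return the percentage of characters in seq that appear in items."""
    if len(seq) == 0:
        return 0.0                  # assumption: an empty sequence gives 0.0
    cnt = 0
    for ch in seq:
        if ch in items:             # membership test on the list of items
            cnt += 1
    return 100 * cnt / len(seq)

print(percent_of("TATTA", ["T"]))                  # 60.0
print(percent_of("TATTA", ["t", "T"]))             # 60.0
print(percent_of("TATTA", ["t", "a", "g", "c"]))   # 0.0  (the test is case sensitive)
print(percent_of("RPKPQQFFGLM", ["Q", "F"]))       # 4 of 11 characters, about 36.36
```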
More containers in Python
• Python has several other built-in collection types.
  • very useful for various needs
  • but one needs to choose the appropriate collection that fits the problem
Here's a summary:

                          Ordered (sequences)         Unordered
Mutable                   list: [1, 2, "hello"]       set: {1, 2, "hello"}
(can mutate inner                                       (immutable elements)
elements)                                             dictionary: {1:3, 2:7.34, "hello":"bye"}
                                                        (immutable keys)

Immutable                 string: "TGAAC"
(can only change the      tuple: (1, 2, "hello")
whole container to        range: range(a,b,c)
another location
in memory)
tuple
• ()
• Tuples are much like lists, but immutable
>>> t = (1,2,3)
>>> type(t)
<class 'tuple'>
>>> t[2]
3
>>> t[2]=7
Traceback (most recent call last):
File "<pyshell#27>", line 1, in <module>
t[2]=7
TypeError: 'tuple' object does not support item assignment
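A few further tuple behaviours, sketched with made-up values: unpacking into separate names, conversion to and from lists, and the fact that an immutable tuple can be an element of a set while a mutable list cannot.

```python
codon = ("A", "U", "G")

# Unpacking: one name per element
first, second, third = codon
print(first, third)                    # A G

# Conversion between tuples and lists
print(tuple([1, 2, 3]))                # (1, 2, 3)
print(list((1, 2, 3)))                 # [1, 2, 3]

# Tuples are immutable, so they may be elements of a set
pairs = {("A", "T"), ("G", "C")}
print(("A", "T") in pairs)             # True

# Lists are mutable, so they may not
try:
    bad = {["A", "T"]}
except TypeError as err:
    print("TypeError:", err)           # TypeError: unhashable type: 'list'
```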
range

• range(a,b): a, a+1, a+2, a+3, … < b (last number = b-1)
• range(a,b,c): a, a+c, a+2c, a+3c, … < b
• defaults: a=0, c=1
>>> for i in range(2,4):
        print(i**2)
4
9
>>> for i in range(4):
        print(i**2)
0
1
4
9
>>> for i in range(0,4,2):
        print(i**2)
0
4
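A convenient way to inspect what a range contains is to convert it to a list. The sketch below does this for each form of range; the negative step in the fourth line goes slightly beyond the slide and is included only as an illustration.

```python
print(list(range(4)))            # [0, 1, 2, 3]      a defaults to 0
print(list(range(2, 4)))         # [2, 3]            b itself is excluded
print(list(range(0, 10, 3)))     # [0, 3, 6, 9]      step c = 3
print(list(range(10, 0, -3)))    # [10, 7, 4, 1]     a negative step counts down
print(len(range(100)))           # 100
```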
range (cont.)
• How can we tell Python to "do something 100 times"?
for i in range(100):
    ...
• Two easy ways to go over a sequence:
lst = [1, 2, 3.7, "abc"]
for elem in lst:
    print(elem*2)
2
4
7.4
abcabc

lst = [1, 2, 3.7, "abc"]
for i in range(len(lst)) :
    print(i, lst[i]*2)
0 2
1 4
2 7.4
3 abcabc
(the second way is for when we need access to positions, not only elements)
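Looping over positions with range(len(...)) is useful, for example, for reporting where a base occurs rather than only how often. The following is a minimal sketch; the function name positions_of is hypothetical.

```python
def positions_of(seq, target):
    """Return a list of all indices i for which seq[i] equals target."""
    found = []
    for i in range(len(seq)):      # i runs over 0, 1, ..., len(seq)-1
        if seq[i] == target:
            found.append(i)
    return found

print(positions_of("AGGACGATTAGACG", "A"))    # [0, 3, 6, 9, 11]
print(positions_of("AGGACGATTAGACG", "U"))    # []
```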
set
• {}
• unordered (no indexing)
• no repetitions
• elements must be immutable (e.g. str, tuple, int, float)
• support operations such as union(), intersection(), etc. (check these yourselves)
>>> s = {"a", "b", 5}
>>> type(s)
<class 'set'>
>>> s[0]
Traceback (most recent call last):
File "<pyshell#31>", line 1, in <module>
s[0]
TypeError: 'set' object does not support indexing
>>> s.union({4,5,6})
{'a', 'b', 4, 5, 6}
>>> s2 = {"a", "b", "c"}
>>> s3 = s.intersection(s2)
>>> s3
{'a', 'b'}
>>> s
{'a', 'b', 5}
Converting to a set is an easy way to remove repetitions:
>>> set([1,2,3,1,1,2,3,2,1])
{1, 2, 3}
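In a biological setting, sets give a quick answer to questions such as "which letters appear in this sequence?". The sketch below uses two made-up sequences; sorted() is applied before printing because the order of a set's elements is not defined.

```python
seq1 = "AGGACGATT"    # example DNA fragment
seq2 = "AAUUAA"       # example RNA fragment

bases1 = set(seq1)    # repetitions are removed
bases2 = set(seq2)

print(sorted(bases1))                         # ['A', 'C', 'G', 'T']
print(sorted(bases2))                         # ['A', 'U']
print(sorted(bases1.intersection(bases2)))    # ['A']
print(sorted(bases1.union(bases2)))           # ['A', 'C', 'G', 'T', 'U']
print("U" in bases1)                          # False
```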
dict (dictionary)
• like sets, but contain pairs of key:value
• unordered
• no repeating keys (values may repeat)
• keys must be immutable
Dictionaries map keys to values:
standard = {
'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',
'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',
'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',
'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G'
}
dict (cont.)
• [] is used to map key → value
• support operations such as keys(), values(), etc.
standard = {
'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
...
>>> standard['tat'] #get the value of key='tat'
'Y'
>>> standard['amir']
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
standard['amir']
KeyError: 'amir'
>>> 'tat' in standard
True
>>> standard['tat'] = ['Y','y']
>>> standard['tat']
['Y', 'y']
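A natural use of such a codon dictionary is translating DNA into protein. The sketch below is only an illustration: codon_table holds just five entries copied from the standard table, translate is a hypothetical function name, and marking unknown codons with "?" via the dictionary method get is a choice made here to avoid a KeyError.

```python
# A small subset of the standard table, for illustration only
codon_table = {'atg': 'M', 'ttt': 'F', 'tat': 'Y', 'tgg': 'W', 'taa': '*'}

def translate(dna):
    """Translate dna codon by codon; codons missing from the table become '?'."""
    dna = dna.lower()                       # the table uses lower-case keys
    protein = ""
    for i in range(0, len(dna) - 2, 3):     # start positions of complete codons
        codon = dna[i:i+3]
        protein += codon_table.get(codon, "?")   # get() returns "?" if key is absent
    return protein

print(translate("ATGTTTTATTGGTAA"))    # MFYW*
print(translate("ATGCCC"))             # M?   ('ccc' is not in the small table)

# keys() and values() give access to the two sides of the mapping
print(sorted(codon_table.keys()))      # ['atg', 'taa', 'tat', 'tgg', 'ttt']
print(sorted(codon_table.values()))    # ['*', 'F', 'M', 'W', 'Y']
```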
Dictionaries - molecular weight
• Another example: protein molecular weight
def molWeight(protseq):
    protweight = {"A":89,"V":117,"L":131,"I":131,"P":115,"F":165,
                  "W":204,"M":149,"G":75,"S":105,"C":121,"T":119,
                  "Y":181,"N":132,"Q":146,"D":133,"E":147,
                  "K":146,"R":174,"H":155 }
    totalW = 0
    protseq = protseq.upper() #so proteins in lower case also OK
    for aa in protseq:
        if aa in protweight:
            totalW = totalW + protweight[aa]
    totalW = totalW-(18*(len(protseq)-1))
    return totalW
Molecular weight – better style
• Another example: protein molecular weight
def molWeight(protseq):
    protweight = {"A":89,"V":117,"L":131,"I":131,"P":115,"F":165,
                  "W":204,"M":149,"G":75,"S":105,"C":121,"T":119,
                  "Y":181,"N":132,"Q":146,"D":133,"E":147,
                  "K":146,"R":174,"H":155 }
    totalW = 0
    protseq = protseq.upper() #so proteins in lower case also OK
    for aa in protseq:
        if aa in protweight:
            totalW = totalW + protweight[aa]
    H2O = 18 #weight of water molecule
    totalW = totalW-(H2O*(len(protseq)-1)) #reduce weight for peptide bonds
    return totalW
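The function above subtracts one water molecule per bond based on len(protseq), so characters that are not in the dictionary (a gap symbol, for instance) still reduce the total. The variant below is a sketch under a different assumption, namely that only recognised residues should count; the names mol_weight, PROT_WEIGHT and known are introduced here for illustration, and the weights are the ones from the slide.

```python
PROT_WEIGHT = {"A": 89, "V": 117, "L": 131, "I": 131, "P": 115, "F": 165,
               "W": 204, "M": 149, "G": 75, "S": 105, "C": 121, "T": 119,
               "Y": 181, "N": 132, "Q": 146, "D": 133, "E": 147,
               "K": 146, "R": 174, "H": 155}
H2O = 18   # weight of a water molecule

def mol_weight(protseq):
    """Sum residue weights, minus one water per bond between known residues."""
    # Assumption: characters missing from PROT_WEIGHT are ignored entirely
    known = [aa for aa in protseq.upper() if aa in PROT_WEIGHT]
    total = sum(PROT_WEIGHT[aa] for aa in known)
    bonds = max(len(known) - 1, 0)     # no bonds for an empty or 1-residue input
    return total - H2O * bonds

print(mol_weight("GA"))     # 146  = 75 + 89 - 18
print(mol_weight("ga"))     # 146  (lower case is accepted)
print(mol_weight("G-A"))    # 146  (the '-' is ignored)
print(mol_weight("G"))      # 75
```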
Files
• Why do we want to know how to work with files?
  • easy way to save and read data
  • especially when data is too large to visualize on screen
• basic file operations in python: open, readline, read, close
my_file = open("...") #"..." is the path to the file
my_file = open("...", 'w') #open to write
tip:
.\ for current directory
..\ for containing directory
tab to see what's in the directory
seq = my_file.readline() #read next line (until '\n') as str
seq = my_file.read() #read ALL lines of file (might be huge! caution before printing)
my_file.close()
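The file operations above can be tried end to end without touching any real data. The sketch below writes a small file into a temporary directory and reads it back; the file name example_seq.txt is hypothetical, and the with statement (not shown on the slide) is a standard Python construct that closes the file automatically.

```python
import os
import tempfile

# A hypothetical file inside a fresh temporary directory
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example_seq.txt")

# 'w' opens the file for writing
with open(path, "w") as out_file:
    out_file.write("GTAT\n")
    out_file.write("GGTA\n")
    out_file.write("TCCG\n")

# readline() returns the next line, read() returns everything that is left
with open(path) as in_file:
    first = in_file.readline()
    rest = in_file.read()
print(repr(first))    # 'GTAT\n'
print(repr(rest))     # 'GGTA\nTCCG\n'

# Looping over the file gives one line at a time, which suits large files
seq = ""
with open(path) as in_file:
    for line in in_file:
        seq += line.strip()    # strip() removes the trailing '\n'
print(seq)            # GTATGGTATCCG

# Clean up the temporary file and directory
os.remove(path)
os.rmdir(tmp_dir)
```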
import
• Some functionalities of Python require declaring the use of existing "libraries", or modules.
• for example: the modules random, time, math,…
>>> import random
>>> random.choice("ATCG")
'A'
>>> random.choice("ATCG")
'G'
>>> seq = ""
>>> for i in range(10):
        seq += random.choice("ATCG")
>>> seq
'AGCCACCCGA'
>>> random.random() #random number from [0,1)
0.7601425767073609
>>> lst = [1,2,3,4]
>>> random.shuffle(lst)
>>> lst
[2, 1, 3, 4]
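The other two modules named above work the same way: import them, then call their functions with the module name as a prefix. The sketch below touches one or two functions from each; the seed value 2014 is arbitrary, and no output is shown for the random sequence because it depends on the Python version.

```python
import math
import random
import time

# math: numeric functions and constants
print(math.sqrt(16))      # 4.0
print(math.floor(3.7))    # 3

# random: fixing the seed makes the "random" results repeatable
random.seed(2014)
seq = ""
for i in range(10):
    seq += random.choice("ATCG")
print(seq, len(seq))

random.seed(2014)         # same seed again gives the same sequence
seq_again = ""
for i in range(10):
    seq_again += random.choice("ATCG")
print(seq == seq_again)   # True

# time: measure how long a piece of code takes
start = time.time()
total = 0
for i in range(100000):
    total += i
elapsed = time.time() - start
print(total, "computed in", elapsed, "seconds")
```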
What to do after this class - reminder
One can go on and develop expertise in Python.
This is not our goal in this course.
But make sure you feel comfortable with what we've learned.
Next time…
A biological (metabolic) network
Metabolic and amino acid biosynthesis pathways of yeast
Schryer et al. BMC Systems Biology, no. 1. p. 81 (2011)
A graph (CS notion)