Computational and Systems Biology
Introduction to Programming
David Koes
TechBio
5/22/2013
University of Pittsburgh
Computational and Systems Biology
Department of Computational Biology
Why learn programming?
...to be a programmer...
...to solve hard problems that require extensive computation...
...to learn to think computationally.
"It has often been said that a person does not really understand something until he teaches it to someone else. Actually a person does not really understand something until after teaching it to a computer, i.e., express it as an algorithm."
Donald Knuth
Sequencing the Human Genome
Whole genome shotgun sequencing
Celera Genomics
3 years (1998 - 2001)
300 million dollars
Hierarchical shotgun sequencing
Human Genome Project
13 years (1990 - 2003)
~3 billion dollars
Steve Jobs - $100,000
An Introduction to Python
How to Think Like a Computer Scientist
http://www.openbookproject.net/thinkcs/python/english2e/
http://docs.python.org/2.7/tutorial/
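Interpreted Language
print "Hi"
The python interpreter runs the source directly.
Compiled Language (C)
#include <stdio.h>
int main() {
    printf("Hi");
    return 0;
}
gcc compiles the source into an executable (*.exe), which then runs on Windows.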
Variables and Basic Types
>>> message = "What's up, Doc?"   # string
>>> reply = 'Not much'            # string
>>> n = 17                        # integer
>>> pi = 3.14159                  # floating point
>>> decide = True                 # Boolean
>>> empty = None                  # NoneType
Variable names must start with a letter or '_' and may contain
letters, numbers and '_'
>>> $n = 17                              # SyntaxError
>>> 1n = 17                              # SyntaxError
>>> this is not a variable name = 17     # SyntaxError
>>> this_is_a_long_variable_name = 17    # valid
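The basic types above can be inspected with the built-in type() function. The following is a minimal sketch, not part of the original slides, written with single-argument print calls so that it should run under both Python 2.7 and Python 3.

```python
# Each literal below creates a value of a different built-in type.
message = "What's up, Doc?"   # str
n = 17                        # int
pi = 3.14159                  # float
decide = True                 # bool
empty = None                  # NoneType

# type() reports the type of the value a name currently refers to.
for value in (message, n, pi, decide, empty):
    print(type(value).__name__)

# Names are not tied to a type: n can be rebound to a string.
n = "seventeen"
print(type(n).__name__)
```

The loop prints str, int, float, bool and NoneType, and the final line prints str because the name n now refers to a string.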
Operators
>>> print (2*6+3)/5-2                  # arithmetic
>>> print 2**2,2**3,2**4               # exponentiation
>>> print 5 % 2                        # modulus
>>> print "hello" + "sir"              # string concatenation
Comparison and Logical
>>> print not (3 >= 0)
>>> print (3 > 0) and (3 < 0)
>>> print (3 == 0) or (3 < 0)
printf-style String Formatting
>>> print "%i of the %s are at %.2f%%" % (5, "widgets", 99.9)
%i: insert integer
%s: insert string
%.2f%%: insert float with 2 decimal places followed by a percent
(5, "widgets", 99.9): tuple of values
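To see what the operators above evaluate to, here is a small sketch that stores each result in a variable before printing it. It is an illustration rather than part of the slides, and it makes one substitution: it uses // instead of /, on the assumption that integer division is intended, because / on integers gives an integer in Python 2 but a float in Python 3.

```python
# Arithmetic; // is integer (floor) division in both Python 2 and 3.
arithmetic = (2 * 6 + 3) // 5 - 2
powers = (2 ** 2, 2 ** 3, 2 ** 4)      # exponentiation
remainder = 5 % 2                      # modulus
joined = "hello" + "sir"               # string concatenation (no space added)

# Comparison and logical operators produce Booleans.
logic = (not (3 >= 0), (3 > 0) and (3 < 0), (3 == 0) or (3 < 0))

# printf-style formatting: the tuple on the right fills the % placeholders.
formatted = "%i of the %s are at %.2f%%" % (5, "widgets", 99.9)

print(arithmetic)   # 1
print(powers)       # (4, 8, 16)
print(remainder)    # 1
print(joined)       # hellosir
print(logic)        # (False, False, False)
print(formatted)    # 5 of the widgets are at 99.90%
```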
Conditionals
>>> if x < 0:
...     print('Negative')
... elif x == 0:
...     print('Zero')
... else:
...     if x == 1:
...         print('Single')
...     else:
...         print('More')
Whitespace is significant!
Indentation must match; no mixing spaces and tabs
IndentationError: unindent does not match any outer indentation level
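The nested conditional above can be tried on several inputs by wrapping it in a function. This is a sketch only; the function name classify is hypothetical and does not appear in the slides.

```python
def classify(x):
    """Return the label that the nested conditional would print for x."""
    if x < 0:
        return 'Negative'
    elif x == 0:
        return 'Zero'
    else:
        # The inner if/else is indented one level deeper than the outer else.
        if x == 1:
            return 'Single'
        else:
            return 'More'

for value in (-5, 0, 1, 7):
    print(classify(value))
```

The loop prints Negative, Zero, Single and More, one per line. Changing the indentation of the inner if/else so it no longer lines up should trigger the IndentationError mentioned above.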
Tuples and Lists
Tuples
>>> t = 12345, 54321, 'hello!'
>>> t = (12345, 54321, 'hello!')
>>> print t[0]
>>> # Tuples are immutable:
... t[0] = 88888
TypeError
Lists
>>> squares = [1, 4, 9, 16, 25]
>>> print squares[0] # indexing returns the item
>>> print squares[-1]
>>> squares[0] = "not a square"
>>> print squares
Built-in methods
>>> squares.append(36)
>>> squares.sort()
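The difference between immutable tuples and mutable lists can be checked directly. The sketch below is illustrative only. It assigns an integer rather than a string to squares[0], because sorting a list that mixes strings and integers works in Python 2 but raises TypeError in Python 3.

```python
t = (12345, 54321, 'hello!')

# Assigning to a tuple element raises TypeError because tuples are immutable.
try:
    t[0] = 88888
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)            # False

squares = [1, 4, 9, 16, 25]
first, last = squares[0], squares[-1]   # index -1 is the last item
squares.append(36)              # lists grow in place
squares[0] = 49                 # and items can be replaced
squares.sort()                  # sort() reorders the list in place
print(squares)                  # [4, 9, 16, 25, 36, 49]
```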
Dictionaries
Mapping type
>>> eng2sp = {}
>>> eng2sp['one'] = 'uno'
>>> eng2sp['two'] = 'dos'
>>> print eng2sp
>>> del eng2sp['two']
>>> print eng2sp
Built-in methods
>>> eng2sp = {'one': 'uno', 'two': 'dos'}
>>> print eng2sp.keys()
>>> print eng2sp.values()
>>> print eng2sp.items()
>>> print eng2sp.has_key('three')   # deprecated; prefer: 'three' in eng2sp
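A short sketch of the same dictionary operations, written to run under both Python 2.7 and Python 3. It uses the in operator instead of has_key, which Python 3 no longer provides. The entry for 'three' and the use of get() with a default are additions for illustration.

```python
eng2sp = {'one': 'uno', 'two': 'dos'}
eng2sp['three'] = 'tres'        # add a new key/value pair
del eng2sp['two']               # remove an existing key

has_three = 'three' in eng2sp   # membership test on keys
has_two = 'two' in eng2sp

# Sort for a predictable display order.
keys = sorted(eng2sp.keys())
pairs = sorted(eng2sp.items())

# get() returns a default instead of raising KeyError for a missing key.
missing = eng2sp.get('four', 'unknown')

print(has_three)   # True
print(has_two)     # False
print(keys)        # ['one', 'three']
print(pairs)       # [('one', 'uno'), ('three', 'tres')]
print(missing)     # unknown
```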
Iteration
For Loop
>>> words = ['cat', 'window', 'defenestrate']
>>> for w in words:        # sequence type: list, tuple, string...
...     print w
While Loop
>>> i = 0
>>> while i < 100:
...     print i
...     i = i+1
For Loop Revisited
>>> for i in range(100):
...     print i
Range
>>> print range(0, 10, 3)
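The three loop forms above can be compared by having them compute values instead of printing every iteration. This is a minimal sketch and not part of the slides. list() is wrapped around range() so the result displays as a list in Python 3 as well, where range() returns a lazy sequence.

```python
words = ['cat', 'window', 'defenestrate']

# A for loop visits each item of a sequence in order.
lengths = []
for w in words:
    lengths.append(len(w))
print(lengths)                  # [3, 6, 12]

# A while loop repeats as long as its condition holds.
total_while = 0
i = 0
while i < 100:
    total_while = total_while + i
    i = i + 1

# The same sum written as a for loop over range(100), i.e. 0..99.
total_for = 0
for i in range(100):
    total_for = total_for + i
print(total_while == total_for) # True

# range(start, stop, step) stops before reaching stop.
steps = list(range(0, 10, 3))
print(steps)                    # [0, 3, 6, 9]
```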
Functions
Function Definition
>>> def info(name, color="blue", age=34):
...     print name, color, age
>>> def distance(x1, y1, x2, y2):
...     dx = x2 - x1
...     dy = y2 - y1
...     return (dx**2 + dy**2)**0.5
Function Invocation
>>> info("Spunky")
>>> info("Spunky", "red")
>>> info("Spunky", age=17)
>>> info(name="Spunky", age=17, color="purple")
>>> info(age=17, color="purple")    # TypeError: the required argument name is missing
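The invocations above can be run as a script. In this sketch info returns a formatted string instead of printing, an adaptation made here so the result is easy to inspect and the code runs under both Python 2.7 and Python 3.

```python
def info(name, color="blue", age=34):
    """name is required; color and age fall back to their defaults."""
    return "%s %s %i" % (name, color, age)

def distance(x1, y1, x2, y2):
    """Euclidean distance between the points (x1, y1) and (x2, y2)."""
    dx = x2 - x1
    dy = y2 - y1
    return (dx**2 + dy**2)**0.5

print(info("Spunky"))                               # Spunky blue 34
print(info("Spunky", "red"))                        # Spunky red 34
print(info("Spunky", age=17))                       # Spunky blue 17
print(info(name="Spunky", age=17, color="purple"))  # Spunky purple 17

# Omitting the required argument raises TypeError.
try:
    info(age=17, color="purple")
    call_failed = False
except TypeError:
    call_failed = True
print(call_failed)                                  # True

print(distance(0, 0, 3, 4))                         # 5.0
```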
Built-in Functions
http://docs.python.org/2/library/functions.html
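A few of the functions listed on that page, shown on a small list of sample values. The selection here is an arbitrary illustration and is not taken from the slides.

```python
values = [3, -1, 4, 1, -5]

count = len(values)         # number of items
total = sum(values)         # sum of the items
smallest = min(values)
largest = max(values)
ordered = sorted(values)    # returns a new sorted list; values is unchanged
magnitude = abs(-5)         # absolute value
number = int("42")          # convert a string to an integer

print(count)      # 5
print(total)      # 2
print(smallest)   # -5
print(largest)    # 4
print(ordered)    # [-5, -1, 1, 3, 4]
print(magnitude)  # 5
print(number)     # 42
```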
Classes
class Bag:
    def __init__(self):       # constructor - initialize data members
        self.data = []
    def add(self, x):         # methods
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)
x = Bag()                     # class instantiation - creates object, calls __init__
x.add(3)                      # method invocation - x passed as self
x.addtwice(4)
print x.data
Bag.addtwice = Bag.add        # class definition is dynamic
x.addtwice(5)
print x.data
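The Bag example can be run as a script to see the effect of rebinding addtwice. The sketch below copies x.data before and after the reassignment; the names before and after are introduced here for illustration.

```python
class Bag:
    def __init__(self):
        # Each instance gets its own list.
        self.data = []

    def add(self, x):
        self.data.append(x)

    def addtwice(self, x):
        self.add(x)
        self.add(x)

x = Bag()
x.add(3)
x.addtwice(4)
before = list(x.data)       # copy of the contents so far
print(before)               # [3, 4, 4]

# Rebinding the class attribute changes behavior for existing instances too.
Bag.addtwice = Bag.add
x.addtwice(5)               # now appends only once
after = list(x.data)
print(after)                # [3, 4, 4, 5]
```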
Modules
import sys, os
print sys.argv
os.remove("bad_data.txt")
import math
print math.pi
import re
m = re.match(r"(\w+) (\w+)", "Isaac Newton")
import numpy
numpy.linalg.eigvals(M)
import pymol
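A short sketch using only standard-library modules from the list above, so it runs without installing numpy or pymol. The circle-area calculation and the variable names are illustrative additions.

```python
import math
import re
import sys

# math provides constants and functions: area of a circle of radius 2.
area = math.pi * 2.0 ** 2
print("%.4f" % area)        # 12.5664

# re.match returns a match object; group(n) gives the n-th parenthesized group.
m = re.match(r"(\w+) (\w+)", "Isaac Newton")
first, last = m.group(1), m.group(2)
print(first)                # Isaac
print(last)                 # Newton

# sys.argv is the list of command-line arguments; item 0 is the script name.
argument_count = len(sys.argv)
```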
PyMOL
Open-source molecular visualization
Fully scriptable with Python
Project
Count and identify the hydrogen bonds at a protein-protein interface
simple h-bond definition: O and N within 4Å
pymol commands:
cmd.get_model()    Returns model in pymol viewer
model.atom         List of all the atoms of the model
atom.symbol        Atomic symbol of atom
atom.chain         Protein chain of atom
atom.coord         x,y,z sequence of atom coordinates
atom.index         index of atom
cmd.distance(name, sel1, sel2)
ex: cmd.distance("bonds", "index 1", "index 2")
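One possible way to approach the counting step is sketched below. It is not the course's reference solution. So that it runs outside PyMOL, it uses a hypothetical Atom namedtuple as a stand-in that carries only the attributes listed above (symbol, chain, coord, index). It also makes two assumptions: a bond requires one O and one N on different chains, and "within 4Å" includes a distance of exactly 4Å. The function name find_hbonds and the sample coordinates are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a PyMOL atom; only the listed attributes are used.
Atom = namedtuple("Atom", ["symbol", "chain", "coord", "index"])

def find_hbonds(atoms, cutoff=4.0):
    """Return (index, index) pairs of O/N atoms on different chains
    whose distance is at most cutoff (in angstroms)."""
    polar = [a for a in atoms if a.symbol in ("N", "O")]
    bonds = []
    for i, a in enumerate(polar):
        for b in polar[i + 1:]:
            if a.chain == b.chain:
                continue        # only pairs across the interface
            if a.symbol == b.symbol:
                continue        # assumed definition: one O and one N
            # Compare squared distances to avoid taking a square root.
            d2 = sum((p - q) ** 2 for p, q in zip(a.coord, b.coord))
            if d2 <= cutoff ** 2:
                bonds.append((a.index, b.index))
    return bonds

# Made-up sample atoms, not real structure data.
atoms = [
    Atom("N", "A", (0.0, 0.0, 0.0), 1),
    Atom("O", "B", (3.0, 0.0, 0.0), 2),   # 3 A from atom 1, other chain
    Atom("O", "B", (9.0, 0.0, 0.0), 3),   # too far from atom 1
    Atom("O", "A", (1.0, 0.0, 0.0), 4),   # same chain as atom 1
    Atom("C", "B", (1.0, 1.0, 0.0), 5),   # neither O nor N
]
bonds = find_hbonds(atoms)
print(len(bonds))   # 1
print(bonds)        # [(1, 2)]
```

Inside PyMOL, the list passed to find_hbonds would presumably be cmd.get_model().atom, and each returned pair (i, j) could be drawn following the example above, e.g. cmd.distance("bonds", "index %d" % i, "index %d" % j). That part has not been tested here.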