Roadmap
The topics:
• basic concepts of molecular biology
• Elements of Python
  • Python's data types
  • Python functions
  • Python Control of Flow
  • Python regex
• overview of the field
• biological databases and database searching
• sequence alignments
• phylogenetics
• structure prediction
• microarray & next gen

Where to get Python?
If you want to run your Python programs on your own machine, download a Python interpreter from one of these places:
http://www.activestate.com/activepython/
or
http://www.python.org/download/

Running Python
Python is one of the best scripting languages. It is often used from a text-based command shell.
You can find the links at the class website.

Download IEP as your IDE
There are many good Python IDEs. I have links to some of them at the class website. I'll use IEP in class.
Download IEP at:
http://www.iep-project.org/downloads.html
I downloaded:
"iep-3.2.win32.exe - Windows installer"

Use IEP
You'll see an icon like the following on your desktop.
Start your IEP.
• Ctrl S: save
• Ctrl E: execution
Drawback: you won't be able to pass arguments to the script.

Programming Python for Bioinformatics
Part I

A Taste of Python: at prompt
Type statements or expressions at prompt:
>>> print "Hello, world"
Hello, world
>>> x = 12**2
>>> x/2
72
>>> # this is a comment

A Taste of Python: print a message
demo1.py: Greet the entire world.

#!/usr/bin/python             - command interpretation header (where to find python)
#greet the entire world       - a comment
x = 7e9;                      - variable assignment statement
print "Hello world!";         - function calls (output statements)
print "All", x, "of you!";

A Taste of Python: scripting
demo2.py: parsing email addresses

Overview
• Assignment & Names
• Data types
• Sequence types: Lists, Tuples, and Strings
• Mutability
• Understanding Reference Semantics in Python

Assignment
Indentation matters to the meaning of the code
• Block structure indicated by indentation
The first assignment to a variable creates it
• Dynamic typing: no declarations, names don't have types, objects do
Assignment uses = and comparison uses ==
For numbers + - * / % are as expected.
• Use of + for string concatenation.
• Use of % for string formatting (like printf in C)
Logical operators are words (and, or, not), not symbols
The basic printing command is print
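
The points on the Assignment slide can be tried together in a few lines. This is only a sketch: the variable names and the sample values are made up for illustration, and it calls print as a function so that it runs on Python 3, whereas the slides use the Python 2 print statement.

```python
# The first assignment creates the name; the object, not the name, has a type.
x = 12 ** 2               # x refers to an int
x = "twelve squared"      # the same name may later refer to a str

# + concatenates strings, % formats them (like printf in C).
greeting = "Hello, " + "world"
message = "%s has %d bases" % ("seq1", 33)

# Logical operators are words, and indentation marks each block.
n = 7
if n > 0 and not n % 2 == 0:
    kind = "positive odd"
else:
    kind = "other"

print(greeting)
print(message)
print(kind)
```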

Naming Rules
Names are case sensitive and cannot start with a number. They can contain letters, numbers, and underscores.
bob   Bob   _bob   _2_bob_   bob_2   BoB
There are some reserved words:
and, assert, break, class, continue, def, del, elif, else, except, exec,
finally, for, from, global, if, import, in, is, lambda, not, or, pass, print,
raise, return, try, while

Naming conventions
The Python community has these recommended naming conventions
• joined_lower for functions, methods, and attributes
• joined_lower or ALL_CAPS for constants
• StudlyCaps for classes
• camelCase only to conform to pre-existing conventions
• Attributes: interface, _internal, __private

Whitespace
Whitespace is meaningful in Python, especially indentation and placement of newlines
• Use a newline to end a line of code
  Use \ when you must go to the next line prematurely
• No braces { } to mark blocks of code, use consistent indentation instead
  • First line with less indentation is outside of the block
  • First line with more indentation starts a nested block
• Colons start a new block in many constructs, e.g. function definitions, then clauses
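
The whitespace rules above can be sketched as follows. The function classify and its 100-base cutoff are hypothetical, chosen only to show a colon opening a block, a nested block, and a backslash continuation.

```python
def classify(lengths):
    """Label each sequence length as short or long (illustrative helper)."""
    labels = []
    for n in lengths:                  # the colon starts the loop block
        if n < 100:                    # more indentation: a nested block
            labels.append("short")
        else:
            labels.append("long")
    return labels                      # less indentation: outside the loop


# A backslash continues a statement on the next line.
total = 10 + 20 + \
        30

print(classify([50, 150]))
print(total)
```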

Comments
• Start comments with #, rest of line is ignored
• Can include a "documentation string" as the first line of a new function or class you define
• Development environments, debugger, and other tools use it: it's good style to include one
def fact(n):
    """fact(n) assumes n is a positive
    integer and returns factorial of n."""
    assert(n>=0)
    return 1 if n==0 else n*fact(n-1)

Python's data types
Python's built-in type hierarchy

Basic Datatypes
Integers (default for numbers)
z = 5 / 2    # Answer 2, integer division
Floats
x = 3.456
Strings
• Can use "…" or '…' to specify, "foo" == 'foo'
• Unmatched can occur within the string: "John's" or 'John said "foo!".'
• Use triple double-quotes for multi-line strings or strings that contain both ' and " inside of them: """a'b"c"""

Numbers
• Normal Integers – represent whole numbers
  Ex: 3, -7, 123, 76
• Long Integers – unlimited size
  Ex: 9999999999999999999999L
• Floating-point – represent numbers with decimal places
  Ex: 1.2, 3.14159, 3.14e-10
• Octal and hexadecimal numbers
  Ex: 0177, 0x9ff, 0xff
• Complex numbers
  Ex: 3+4j, 3.0+4.0j, 3J

Python Basics – arithmetic operations
Operator  Meaning                Example (y=5; z=3)  Result
+         add                    x=y+z               x=8
-         subtract               x=y-z               x=2
*         multiply               x=y*z               x=15
/         divide                 x=y/z               x=1
%         modulus/remainder      x=y%z               x=2
<<        shift left             x = y << 1          x = 10, y=5
>>        shift right            x = y >> 2          x=1
|         bitwise or             x=y|z               x=7
^         bitwise exclusive or   x=y^z               x=6
&         bitwise and            x=y&z               x=1
**        raise to power         x = y ** z          x = 125

Python Basics – Relational Operators
Operator  Meaning
==        equal
!=, <>    not equal
>         greater than
>=        greater than or equal
<         less than
<=        less than or equal

Python Basics – Relational and Logical Operators
Relational operators: ==, !=, <>, >, >=, <, <=
Logical operators: and, or, not
Assume x = 1, y = 4, z = 14
Expression       Value  Interpretation
x<y+z            1      True
y == 2 * x + 3   0      False
z <= x + y       0      False
z>x              1      True
x != y           1      True

Python Basics – Logical Operators
Assume x = 1, y = 4, z = 14
Expression           Value  Interpretation
x<=1 and y==3        0      False
x<= 1 or y==3        1      True
not (x > 1)          1      True
(not x) > 1          0      False
not (x<=1 or y==3)   0      False
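
The operator tables can be reproduced by evaluating each expression. The sketch below is written for Python 3, which is an assumption about the reader's interpreter: there `/` on two integers gives a float, so `//` is used to get the integer result (1) that the slides show for Python 2, and the `<>` spelling of "not equal" is left out because Python 3 no longer accepts it.

```python
# Arithmetic and bitwise operators with y=5, z=3, as in the table.
y, z = 5, 3
arithmetic = {
    "y + z": y + z,
    "y - z": y - z,
    "y * z": y * z,
    "y // z": y // z,     # floor division; Python 2's / on ints behaves this way
    "y % z": y % z,
    "y << 1": y << 1,
    "y >> 2": y >> 2,
    "y | z": y | z,
    "y ^ z": y ^ z,
    "y & z": y & z,
    "y ** z": y ** z,
}

# Relational and logical operators with x=1, y=4, z=14.
x, y, z = 1, 4, 14
comparisons = {
    "x < y + z": x < y + z,
    "y == 2 * x + 3": y == 2 * x + 3,
    "z <= x + y": z <= x + y,
    "z > x": z > x,
    "x != y": x != y,
    "x <= 1 and y == 3": x <= 1 and y == 3,
    "x <= 1 or y == 3": x <= 1 or y == 3,
    "not (x > 1)": not (x > 1),
    "(not x) > 1": (not x) > 1,
    "not (x <= 1 or y == 3)": not (x <= 1 or y == 3),
}

for table in (arithmetic, comparisons):
    for expression, value in table.items():
        print("%-24s %s" % (expression, value))
```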

Sequence Types
Three sequence types: Tuples, Lists, and Strings
• Sequences are containers that hold objects
• Finite, ordered, indexed by integers
• Tuple: (1, "a", [100], "foo")
  • An immutable ordered sequence of items
  • Items can be of mixed types, including collection types
• String: "foo bar"
  • An immutable ordered sequence of chars
  • Conceptually very much like a tuple
• List: ["one", "two", 3]
  • A mutable ordered sequence of items of mixed types

Similar Syntax
All three sequence types (tuples, strings, and lists) share much of the same syntax and functionality.
Key difference:
• Tuples and strings are immutable
• Lists are mutable
The operations shown in this section can be applied to all sequence types
• most examples will just show the operation performed on one

Sequence Types - 1
Define tuples using parentheses and commas
>>> tu = (23, 'abc', 4.56, (2,3), 'def')
Define lists using square brackets and commas
>>> li = ["abc", 34, 4.34, 23]
Define strings using quotes (", ', or """).
>>> st = "Hello World"
>>> st = 'Hello World'
>>> st = """This is a multi-line
string that uses triple quotes."""

Sequence Types - 2
Access individual members of a tuple, list, or string using square bracket "array" notation
Note that all are 0 based…
>>> tu = (23, 'abc', 4.56, (2,3), 'def')
>>> tu[1]     # Second item in the tuple.
'abc'
>>> li = ["abc", 34, 4.34, 23]
>>> li[1]     # Second item in the list.
34
>>> st = "Hello World"
>>> st[1]     # Second character in string.
'e'

Positive and negative indices
>>> t = (23, 'abc', 4.56, (2,3), 'def')
Positive index: count from the left, starting with 0
>>> t[1]
'abc'
Negative index: count from right, starting with -1
>>> t[-3]
4.56

Slicing: Return Copy of a Subset
>>> t = (23, 'abc', 4.56, (2,3), 'def')
Returns copy of container with subset of original members. Start copying at first index, and stop copying before the second index
>>> t[1:4]
('abc', 4.56, (2,3))
You can also use negative indices
>>> t[1:-1]
('abc', 4.56, (2,3))
Omit first index to make a copy starting from the beginning of container
>>> t[:2]
(23, 'abc')
Omit second index to make a copy starting at 1st index and going to end of the container
>>> t[2:]
(4.56, (2,3), 'def')
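
Indexing and slicing work the same way on strings, which is the common case for sequence data. A small sketch, assuming a made-up 12-base string; the list comprehension that collects codons is an extra illustration, not something from the slides.

```python
dna = "ATGACGGATCAG"

first_codon = dna[:3]     # omit the first index: start at the beginning
rest = dna[3:]            # omit the second index: run to the end
middle = dna[3:-3]        # a negative index counts from the right
last_base = dna[-1]

# Step through the string three bases at a time to collect codons.
codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]

print(first_codon, rest, middle, last_base)
print(codons)
```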

Copying the Whole Sequence
[ : ] makes a copy of an entire sequence
>>> t[:]
(23, 'abc', 4.56, (2,3), 'def')
Note the difference between these two lines for mutable sequences
>>> l2 = l1      # Both refer to same ref,
                 # changing one affects both
>>> l2 = l1[:]   # Independent copies, 2 refs

The 'in' Operator
Boolean test whether a value is inside a container:
>>> t = [1, 2, 4, 5]
>>> 3 in t
False
>>> 4 in t
True
>>> 4 not in t
False
For strings, tests for substrings
>>> 'TATA' in 'TATATATATATATATATATATATA'
True
>>> 'ATG' in 'TATATATATATATATATATATATA'
False
>>> 'AA' not in 'TATATATATATATATATATATATA'
True
Careful: the in keyword is also used in the syntax of for loops and list comprehensions

+ Operator is Concatenation
The + operator produces a new tuple, list, or string
whose value is the concatenation of its arguments.
>>> (1, 2, 3) + (4, 5, 6)
(1, 2, 3, 4, 5, 6)
>>> [1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]
>>> 'ACCTGAGAGCT' + 8*'A'
'ACCTGAGAGCTAAAAAAAA'
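
The difference between `l2 = l1` and `l2 = l1[:]` noted under Copying the Whole Sequence shows up as soon as the original list changes. A minimal sketch; the names alias, clone and joined are illustrative.

```python
l1 = [1, 2, 3]
alias = l1          # a second name for the same list object
clone = l1[:]       # a new list holding the same items

l1.append(4)        # change the original in place

# + also builds a new list and leaves its arguments untouched.
joined = l1 + [5]

print(alias)        # follows the change made through l1
print(clone)        # unaffected by it
print(joined)
```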

Other String operations
Expression            Value           Purpose
len(mystring)         11              number of characters in mystring
"hello"+"world"       "helloworld"    Concatenate strings
"%s world"%"hello"    "hello world"   Format strings (like sprintf)
"world" == "hello"    0 or False      Test for equality
"world" == 'world'    1 or True
"a" < "b"             1 or True       Alphabetical ordering
"b" < "a"             0 or False

Exercise
count.py: GC content?
dna = "ATGaCGgaTCAGCCGcAAtACataCACTgttca"

dna = "ATGaCGgaTCAGCCGcAAtACataCACTgttca"
dna1 = dna.upper()
# float() avoids integer division, which would give 0 here
float(dna1.count('G') + dna1.count('C')) / len(dna1)

Many useful built-in functions
>>> mystring = 'ACCTGAGAGCT'
>>> mystring.upper()
'ACCTGAGAGCT'
>>> mystring.replace('GC', 'CG')
'ACCTGAGACGT'
>>> set(mystring)
set(['A', 'C', 'T', 'G'])
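
The GC-content exercise can be wrapped in a small function so it can be reused on other sequences. The name gc_content and the choice to return 0.0 for an empty string are assumptions made for this sketch.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (illustrative helper)."""
    seq = seq.upper()                 # count upper and lower case alike
    if not seq:
        return 0.0                    # assumed convention for an empty sequence
    # float() keeps the division exact on Python 2 as well as Python 3.
    return float(seq.count("G") + seq.count("C")) / len(seq)


dna = "ATGaCGgaTCAGCCGcAAtACataCACTgttca"
print(round(gc_content(dna), 3))
```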

Exercise
transcribe.py:
dna = "ATGaCGgaTCAGCCGcAAtACataCACTgttca"
rna = ???;

dna = "ATGaCGgaTCAGCCGcAAtACataCACTgttca"
rna = dna.upper()
rna1 = rna.replace('A', 'a')
rna = rna1.replace('T', 'A')
rna1 = rna.replace('C', 'c')
rna = rna1.replace('G', 'C')
rna1 = rna.replace('a', 'U')
rna = rna1.replace('c', 'G')
rna[::-1]    # reverse rna
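
The chain of replace calls needs lower-case placeholders so that a base already swapped is not swapped again. One alternative, sketched here with a dictionary lookup per base, avoids the placeholders. The names COMPLEMENT and complement_rna are hypothetical, and the mapping is simply the one the replace chain implements (A to U, T to A, C to G, G to C).

```python
# Base-by-base mapping implemented by the replace chain in transcribe.py.
COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}


def complement_rna(dna):
    """Return the RNA complement of dna, base by base (illustrative helper)."""
    return "".join(COMPLEMENT[base] for base in dna.upper())


dna = "ATGaCGgaTCAGCCGcAAtACataCACTgttca"
rna = complement_rna(dna)
print(rna)
print(rna[::-1])      # reversed, as in the last line of the exercise
```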

Mutability: Tuples vs. Lists
Lists are mutable
>>> li = ['abc', 23, 4.34, 23]
>>> li[1] = 45
>>> li
['abc', 45, 4.34, 23]
• We can change lists in place.
• Name li still points to the same memory reference when we're done.

Tuples are immutable
>>> t = (23, 'abc', 4.56, (2,3), 'def')
>>> t[2] = 3.14
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
• You can't change a tuple.
• You can make a fresh tuple and assign its reference to a previously used name.
>>> t = (23, 'abc', 3.14, (2,3), 'def')
• Immutability of tuples means they are faster than lists
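
The contrast between the two slides can be checked in a script by catching the error instead of letting it stop the program. A minimal sketch; error_message is an illustrative name.

```python
li = ["abc", 23, 4.34, 23]
li[1] = 45                       # lists can be changed in place

t = (23, "abc", 4.56, (2, 3), "def")
try:
    t[2] = 3.14                  # tuples cannot
    error_message = None
except TypeError as err:
    error_message = str(err)

# Build a fresh tuple and rebind the old name to it.
t = t[:2] + (3.14,) + t[3:]

print(li)
print(error_message)
print(t)
```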

Tuple details
• The comma is the tuple creation operator, not parens
>>> 1,
(1,)
• Python shows parens for clarity (best practice)
>>> (1,)
(1,)
• Don't forget the comma for singletons!
>>> (1)
1
• Empty tuples have a special syntactic form
>>> ()
()
>>> tuple()
()

Tuples vs. Lists
• Lists slower but more powerful than tuples
  • Lists can be modified and they have many handy operations and methods
  • Tuples are immutable & have fewer features
• Sometimes an immutable collection is required (e.g., as a hash key)
• Tuples used for multiple return values and parallel assignments
x,y,z = 100,200,300
old,new = new,old
• Convert tuples and lists using list() and tuple():
mylst = list(mytup); mytup = tuple(mylst)
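
Multiple return values and parallel assignment both rest on tuple packing and unpacking. A short sketch; min_max is a hypothetical helper and the numbers are arbitrary.

```python
def min_max(values):
    """Return the smallest and largest item as one tuple (illustrative helper)."""
    return min(values), max(values)      # the comma builds the tuple


low, high = min_max([3, 1, 4, 1, 5])     # unpack into two names
low, high = high, low                    # swap without a temporary variable

singleton = (1,)                         # a one-item tuple
not_a_tuple = (1)                        # just the integer 1 in parentheses

as_list = list(singleton)                # convert between the two types
as_tuple = tuple(as_list)

print(low, high, singleton, not_a_tuple)
```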

Built-in functions vs. methods
• Operations can be functions or methods
  • Remember that (almost) everything is an object
• You just have to learn (and remember or lookup) which operations are functions, which are methods
len() is a function on collections that returns the number of things they contain
>>> len(['a', 'b', 'c'])
3
>>> len(('a','b','c'))
3
>>> len("abc")
3
index() is a method on collections that returns the index of the 1st occurrence of its arg
>>> ['a','b','c'].index('a')
0
>>> ('a','b','c').index('b')
1
>>> "abc".index('c')
2

List methods
• Lists have many methods, including index, count, append, remove, reverse, sort, etc.
• Many of these modify the list
>>> l = [1,3,4]
>>> l.append(0)       # adds a new element to the end of the list
>>> l
[1, 3, 4, 0]
>>> l.insert(1,200)   # insert 200 just before index position 1
>>> l
[1, 200, 3, 4, 0]
>>> l.reverse()       # reverse the list in place
>>> l
[0, 4, 3, 200, 1]
>>> l.sort()          # sort the elements. Optional arguments can give
>>> l                 # the sorting function and direction
[0, 1, 3, 4, 200]
>>> l.remove(3)       # remove first occurrence of element from list
>>> l
[0, 1, 4, 200]

Exercise
A valid DNA sequence?
dna = "ATGaCGgaTDCUAGCCPGcAAtACataCACTngttca"
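
One possible approach to this exercise compares the set of characters in the sequence with the allowed alphabet. This is a sketch, not the course's solution: the helper name invalid_bases is made up, and treating only A, C, G and T as valid (case-insensitively) is an assumption.

```python
def invalid_bases(seq, alphabet="ACGT"):
    """Return the sorted characters of seq that are not in alphabet."""
    return sorted(set(seq.upper()) - set(alphabet))


dna = "ATGaCGgaTDCUAGCCPGcAAtACataCACTngttca"
bad = invalid_bases(dna)
is_valid = not bad            # an empty list means every base was allowed

print(bad)
print(is_valid)
```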

Python dicts and sets

Overview
• Python doesn't have traditional vectors and arrays!
• Instead, Python makes heavy use of the dict datatype (a hashtable) which can serve as a sparse array
  • Efficient traditional arrays are available as modules that interface to C
• A Python set is derived from a dict

Dictionaries: A Mapping type
• Dictionaries store a mapping between a set of keys and a set of values
  • Keys can be any immutable type.
  • Values can be any type
  • A single dictionary can store values of different types
• You can define, modify, view, lookup or delete the key-value pairs in the dictionary
• Python's dictionaries are also known as hash tables and associative arrays

Creating & accessing dictionaries
>>> d = {'user':'bozo', 'pswd':1234}
>>> d['user']
'bozo'
>>> d['pswd']
1234
>>> d['bozo']
Traceback (innermost last):
  File '<interactive input>' line 1, in ?
KeyError: bozo

Updating Dictionaries
>>> d = {'user':'bozo', 'pswd':1234}
>>> d['user'] = 'clown'
>>> d
{'user':'clown', 'pswd':1234}
• Keys must be unique
• Assigning to an existing key replaces its value
>>> d['id'] = 45
>>> d
{'user':'clown', 'id':45, 'pswd':1234}
• Dictionaries are unordered
  • New entries can appear anywhere in output
• Dictionaries work by hashing

Removing dictionary entries
>>> d = {'user':'bozo', 'p':1234, 'i':34}
>>> del d['user']     # Remove one.
>>> d
{'p':1234, 'i':34}
>>> d.clear()         # Remove all.
>>> d
{}
>>> a=[1,2]
>>> del a[1]          # del works on lists, too
>>> a
[1]

Useful Accessor Methods
>>> d = {'user':'bozo', 'p':1234, 'i':34}
>>> d.keys()      # List of keys, VERY useful
['user', 'p', 'i']
>>> d.values()    # List of values
['bozo', 1234, 34]
>>> d.items()     # List of item tuples
[('user','bozo'), ('p',1234), ('i',34)]
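
The same dictionary operations apply to biological lookups. The sketch below uses a deliberately tiny codon table (three entries of the standard genetic code, with "*" standing for a stop codon) purely as an illustration; the variable names are made up.

```python
# A tiny, illustrative subset of the standard codon table (not complete).
codon_table = {"ATG": "M", "TGG": "W", "TAA": "*"}

codon_table["TAG"] = "*"        # new key: adds an entry
codon_table["ATG"] = "Met"      # existing key: replaces the value

# Looking up a key that is not present raises KeyError.
try:
    codon_table["NNN"]
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

known = sorted(codon_table.keys())   # sorted() gives a predictable order
del codon_table["TGG"]               # remove one entry

print(known)
print(missing_key_raised)
print("TGG" in codon_table)
```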

A Dictionary Example
Problem: count the frequency of each word in text read from the standard input, print results
Six versions of increasing complexity
• wf1.py is a simple start
• wf2.py uses a common idiom for default values
• wf3.py sorts the output alphabetically
• wf4.py downcase and strip punctuation from words and ignore stop words
• wf5.py sort output by frequency
• wf6.py add command line options: -n, -t, -h

Dictionary example: wf1.py
#!/usr/bin/python
import sys
freq = {}    # frequency of words in text
for line in sys.stdin:
    for word in line.split():
        if word in freq:
            freq[word] = 1 + freq[word]
        else:
            freq[word] = 1
print freq
The if/else that either increments or initializes freq[word] is a common pattern.

Dictionary example wf2.py
#!/usr/bin/python
import sys
freq = {}    # frequency of words in text
for line in sys.stdin:
    for word in line.split():
        freq[word] = 1 + freq.get(word, 0)
print freq
In freq.get(word, 0), the first argument is the key and the second is the default value if the key is not found.

Dictionary example wf3.py
#!/usr/bin/python
import sys
freq = {}    # frequency of words in text
for line in sys.stdin:
    for word in line.split():
        freq[word] = 1 + freq.get(word,0)
for w in sorted(freq.keys()):
    print w, freq[w]

Dictionary example wf4.py
#!/usr/bin/python
import sys
punctuation = """'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'"""
freq = {}    # frequency of words in text
stop_words = set()
for line in open("stop_words.txt"):
    stop_words.add(line.strip())
for line in sys.stdin:
    for word in line.split():
        word = word.strip(punctuation).lower()
        if word not in stop_words:
            freq[word] = freq.get(word,0)+1
# print sorted words and their frequencies
for w in sorted(freq.keys()):
    print w, freq[w]

Dictionary example wf5.py
#!/usr/bin/python
import sys
from operator import itemgetter
…
words = sorted(freq.items(), key=itemgetter(1), reverse=True)
for (w,f) in words:
    print w, f
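
The steps of wf2.py through wf5.py can be combined into one function that is easy to test without standard input. This is a sketch for Python 3, not the course's scripts: word_freq is a hypothetical name, it takes any iterable of lines instead of reading sys.stdin, it uses string.punctuation from the standard library in place of the hand-written constant, and the check that skips words left empty after stripping is an addition.

```python
import string
from operator import itemgetter


def word_freq(lines, stop_words=()):
    """Return (word, count) pairs sorted by decreasing count."""
    freq = {}
    for line in lines:
        for word in line.split():
            # Downcase and strip punctuation, as in wf4.py.
            word = word.strip(string.punctuation).lower()
            if word and word not in stop_words:
                freq[word] = freq.get(word, 0) + 1   # default-value idiom
    # Sort by the count in each pair, largest first, as in wf5.py.
    return sorted(freq.items(), key=itemgetter(1), reverse=True)


text = ["The gene, the protein.", "A gene!"]
ranked = word_freq(text, stop_words={"a", "the"})
for word, count in ranked:
    print(word, count)
```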

Dictionary example wf6.py
from optparse import OptionParser
# read command line arguments and process
parser = OptionParser()
parser.add_option('-n', '--number', type="int",
                  default=-1, help='number of words to report')
parser.add_option("-t", "--threshold", type="int",
                  default=0, help="print if frequency > threshold")
(options, args) = parser.parse_args()
...
# print the top options.number words but only those
# with freq > options.threshold
for (word, freq) in words[:options.number]:
    if freq > options.threshold:
        print freq, word
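
For readers on a newer Python, the same two options can be declared with argparse, the standard library module that superseded optparse. This is a swapped-in alternative rather than the slides' code. The function names are hypothetical, and treating a negative --number as "no limit" is an assumption about the intent, since a slice such as words[:-1] would otherwise drop the last word.

```python
import argparse


def parse_options(argv):
    """Parse the -n and -t options from a list of argument strings."""
    parser = argparse.ArgumentParser(description="word frequency report")
    parser.add_argument("-n", "--number", type=int, default=-1,
                        help="number of words to report")
    parser.add_argument("-t", "--threshold", type=int, default=0,
                        help="print if frequency > threshold")
    return parser.parse_args(argv)


def select_words(words, number, threshold):
    """Keep the top `number` pairs whose count exceeds `threshold`."""
    top = words if number < 0 else words[:number]   # negative: no limit (assumed)
    return [(word, freq) for (word, freq) in top if freq > threshold]


options = parse_options(["-n", "2", "-t", "1"])
words = [("gene", 3), ("protein", 2), ("cell", 1)]
for word, freq in select_words(words, options.number, options.threshold):
    print(freq, word)
```

In a script, passing sys.argv[1:] to parse_options would read the real command line.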

Why must keys be immutable?
The keys used in a dictionary must be immutable objects.
>>> name1, name2 = 'john', ['bob', 'marley']
>>> fav = name2
>>> d = {name1: 'alive', name2: 'dead'}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
Why is this?
• Suppose we could index a value for name2
• and then did fav[0] = "Bobby"
• Could we find d[name2] or d[fav] or …?

Project 1

defaultdict
>>> from collections import defaultdict
>>> kids = defaultdict(list, {'alice': ['mary', 'nick'], 'bob': ['oscar', 'peggy']})
>>> kids['bob']
['oscar', 'peggy']
>>> kids['carol']
[]
>>> age = defaultdict(int)
>>> age['alice'] = 30
>>> age['bob']
0
>>> age['bob'] += 1
>>> age
defaultdict(<type 'int'>, {'bob': 1, 'alice': 30})
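
defaultdict also removes the need for the get(word, 0) idiom when counting. A short sketch with a made-up sequence: defaultdict(int) counts overlapping two-base substrings, and defaultdict(list) records the positions of each base.

```python
from collections import defaultdict

seq = "TATATA"

# Missing keys start at int() == 0, so += works on first use.
dimer_counts = defaultdict(int)
for i in range(len(seq) - 1):
    dimer_counts[seq[i:i + 2]] += 1

# Missing keys start as an empty list, so append works on first use.
positions = defaultdict(list)
for i, base in enumerate(seq):
    positions[base].append(i)

print(dict(dimer_counts))
print(dict(positions))
```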