Introduction to Python
Python – Essential characteristics
think Monty, not snakes!
Key Advantages:
• Open source & free (thank you Guido van Rossum!)
• Portable – works on Unix, Linux, Win32 & 64, MacOS etc.
• Easy to learn and logically consistent
• Lends itself to rapid development
• So, good for “quick and dirty” solutions & prototypes
• But also suitable for full fledged applications
• Hides many low-level aspects of computer architecture
• Elegant support of object-orientation and data structures
• Extensive library support – a strong standard library
• Dynamic “duck typing” paradigm is very flexible
• Language is minimalistic, only 31 keywords
Python – Essential characteristics
Some Disadvantages:
• It's not very fast (but often better than PERL!)
• Relatively inefficient for number crunching
• Can have high memory overhead
• Being “far from the metal” has disadvantages – systems
or kernel programming is impractical
• Dynamic typing can be both a blessing and a curse
• Some key libraries are still developing (e.g. BioPython)
• Version 3 breaks compatibility with prior versions
• Some find the whitespace conventions annoying
• Tends towards minimalism at the expense of expressiveness
Becoming a Pythonista
Windows and MacOS X installers available at:
www.python.org/getit
Note that BNFO602 will be using version 2.7.3, not
more recent 3.x distributions
Even if your machine supports 64-bit, a 32-bit
install is generally a safer choice for compatibility
Linux users may possibly need to download a
source tarball and compile themselves
A Python IDE for BNFO602
Windows, MacOS X, and Linux installers at:
www.jetbrains.com/pycharm
We are using the Free community edition
An IDE is an Integrated Development Environment
While not strictly required, IDEs ease and facilitate
the creation and management of larger programs.
IDLE is the built-in IDE and is another option
Python can also be run interactively.
Documents for Python
For version 2.X, official documentation and
tutorials are here:
docs.python.org/2
While a notable weakness of Python in the past,
the online documentation and tutorials for Python
are now quite good!
StackOverflow.com also has good information:
stackoverflow.com/tags/python/info
The Building Blocks of Python
Hello World!
print "Hello World"
In Python 2.7 print is a statement (it becomes a function in Python 3)
"Hello World" is its argument
No semicolon!
Keywords
Python 2.7 has only 31 keywords in the language.
It is minimalistic.
Hello World!
if True:
    print "Hello"
    print "World"
Does NOT use curly brackets to delimit statement blocks!
Use a colon after the conditional statement
The indented lines form a statement block
If statements are the sentences of Python, then
statement blocks are analogous to paragraphs.
Unlike PERL, Python is somewhat fussy about how we use
whitespace (spaces, tabs, line breaks)...
Statement blocks are nested using whitespace
#Demo of nested blocks                 (comments begin with #)
print "Outer level"
if True:
    print "\tMiddle level #1"          # \t is the escape sequence for “tab”
    if True:                           # (but no variable interpolation as w/ PERL)
        print "\t\tInner level"
    print "\tMiddle level #2"
    pass                               # dummy statement
print "Outer level #2"
Whitespace delimits statement blocks!
Preferred practice is to use exactly four spaces
Don't use tabs unless your editor maps these to spaces!
Statement blocks can be nested
Output
Outer level
        Middle level #1
                Inner level
        Middle level #2
Outer level #2
Yes, this is a trivial example. Note: scoping within
these simple blocks is a little different than PERL as
there is no “my” statement for local variables
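A minimal sketch of the scoping remark above. The names inner_value and set_local are made up for illustration. print is written with parentheses around a single value so the sketch should run unchanged on 2.7 and 3.x.

```python
# Blocks introduced by if/while/for do NOT create a new scope in Python.
if True:
    inner_value = "set inside the block"   # hypothetical name

# Still visible here: there is no "my" to make it local to the block.
print(inner_value)


def set_local():
    # Names assigned inside a function ARE local to that function.
    function_value = "local to set_local"
    return function_value


result = set_local()
# function_value is not defined at this level; only result is.
print(result)
```

So if/while/for blocks share the scope that encloses them, and a function body is what creates a new local scope.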
Data Types in Python
Some basic data types
"Hello World!"    String (the quotes are the string delimiters)
42                Integer
3.1459            Floating point
2+6j              Complex
False, True       Boolean
None              Null
Some types, like strings, cannot be changed in place
once they are created! They are “immutable”
Data Types in Python
Some compound data types (note the different delimiters)
["A", "C", "G", "T"]                    list
("A", "C", "G", "T")                    tuple
{"A":"T", "C":"G", "G":"C", "T":"A"}    dict
A tuple is essentially an immutable list
whereas a dict is like a PERL hash
Variables in Python
Variables in Python are NOT associated with a type
They are just identifiers that name some object
Identifiers begin with a letter or underscore
dna_sequence = "AGCTAGC"
seq_len = 7
symbols = ["A", "G", "C", "T"]
empty_dict = {}
symbols = {"A":"Adenine"}
Declaration and definition are usually coincident
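A small sketch of what "not associated with a type" means in practice. It reuses the symbols identifier from the slide. first_type and second_type are names added here for illustration.

```python
# The same identifier can name a list first and a dict later.
symbols = ["A", "G", "C", "T"]
first_type = type(symbols).__name__     # "list"

symbols = {"A": "Adenine"}              # rebinding: the old list is simply dropped
second_type = type(symbols).__name__    # "dict"

# The type belongs to the object, not to the name that refers to it.
print(first_type + " -> " + second_type)
```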
Data Types and identifiers
A = [42, 32, 64]
print A
print "The answer is", A[0]
Index notation always uses square brackets, even for a tuple or a dict
Output
[42, 32, 64]
The answer is 42
Data types are actually implemented as classes that
know how to print their own instance objects. Later
we'll see how to make our own classes and types
Operators, Operands & Expressions
var = 12 * 10
operands: 12 and 10
operators: = and *
subexpression: 12 * 10
expression: the whole of var = 12 * 10
Expressions consist of valid combinations of
operands and operators, and a sub-expression can
act as an operand in an expression
Very similar to PERL, but some operators vary, especially for the
logical operators. Also string concatenation uses "+", not "."
Expressions
Expressions can use the result of a function
(or the result of a method of a class)
as an operand
foo = somefunction(foo2)
foo = somefunc(foo2) * foo3
foo = somefunc(foo2) + somefunc2(foo3)
foo = somefunc(somefunc2(foo2))
All of the above are possibly legal Python
expressions depending on the functions
Some Python Operators
Common operators
+    addition          4 + 2 = 6
-    subtraction       4 - 2 = 2
/    division          4 / 2 = 2
*    multiplication    4 * 2 = 8
+    concatenation     "4" + "2" = "42"
=    assignment        Does NOT denote equivalence
                       Use == for testing equivalence!
Operators follow a strict order of operations: e.g. 2 + 7 * 2 = 16
See documentation for complete details
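A short sketch of the order-of-operations point, plus one related caveat. The variable names are illustrative only. The // and float forms are used because they behave the same on 2.7 and 3.x, whereas a plain / between two integers truncates in 2.7.

```python
# Multiplication binds tighter than addition.
without_parens = 2 + 7 * 2      # 2 + 14
with_parens = (2 + 7) * 2       # 9 * 2

# Division caveat: in Python 2.7, 7 / 2 gives 3 because both operands are integers.
floor_div = 7 // 2              # explicit floor division
true_div = 7 / 2.0              # a float operand forces floating point division

print(without_parens)
print(with_parens)
```

When in doubt, parentheses make the intended grouping explicit.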
The Assignment Operator
Unlike in algebra, does not imply that both sides
of the equation are equal!
The following is a valid Python statement:
var = var + 1
This says “take the current value of var and add
one to it, then store the result back in var”
This also does the same thing:
var += 1
*=, -=, /=, all work the same way.
Incrementing and Decrementing
The following are functionally equivalent statements:
var = var + 1
var += 1
Increment by shown amount
Similarly:
var = var - 1
var -= 1
But NOT:
var++, ++var or var--, --var
No PERL style autoincrement/decrement!
The Equivalence Operator
Python does have an equivalence operator
print "Is 2 equal to 4:", 2 == 4      # == is the equivalence operator
print "Is 2 equal to 2:", 2 == 2
Output:
Is 2 equal to 4: False
Is 2 equal to 2: True
Python has a built-in Boolean type!
0, Boolean False, None, empty lists, null strings,
and empty dicts are all evaluated as false
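A quick way to check the truth rules above is the built-in bool. The lists below are illustrative. Note that the string "0" is non-empty, so Python treats it as true (PERL treats "0" as false).

```python
# Values listed on the slide as logically false.
falsy = [0, False, None, [], "", {}]

# A few values that are logically true, including the non-empty string "0".
truthy = [1, "ATG", ["A"], {"A": "T"}, "0"]

falsy_results = [bool(value) for value in falsy]
truthy_results = [bool(value) for value in truthy]

print(falsy_results)
print(truthy_results)
```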
Comparison Operators
The equivalence operator is just one of
the comparison operators
==          equal to
<           less than
>           greater than
<=          less than or equal to
>=          greater than or equal to
!= or <>    not equal to
The same comparison operators are used for all types; there are no separate string operators as in PERL (eq, ne, etc.)
Use caution when testing floating point numbers, especially
for exact equivalence!
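A minimal sketch of why exact floating point comparison needs caution. The tolerance of 1e-9 is an arbitrary assumption made for this example, not a recommended value.

```python
total = 0.1 + 0.2

# Exact comparison fails because 0.1 and 0.2 have no exact binary representation.
exact_match = (total == 0.3)

# Compare against a small tolerance instead (1e-9 is an assumed value).
tolerance = 1e-9
close_enough = abs(total - 0.3) < tolerance

print(exact_match)
print(close_enough)
```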
Flow Control – if, else and conditional expressions
Comparison operators enable program flow control
dna = "GATCTCTT"
dna2 = "GATCTCCC"
if dna == dna2:                          # conditional expression; note the colon
    print "Sequences identical:", dna
else:
    print "Sequences different"
Output:
Sequences different
Flow Control – if, else and conditional expressions
Comparison operators at work #2
dna = "ATGCATC"
if dna:
    print "Sequence defined"
else:
    print "Sequence not defined"
Output:
Sequence defined
non-None, non-zero, non-False, & non-empty results are logically “true”
Flow Control – if, else and conditional expressions
Comparison operators at work
dna = ""
if dna == "ATG":
    print "Sequence is ATG start codon"
else:
    print "Sequence not defined"
Output:
Sequence not defined
Remember, empty lists and null strings are logically equivalent to “false”
Multi-way branching using elif
dna = "ATG"
if dna == "GGG":
    print "All Gs"
elif dna == "AAA":                 # several elif blocks in a row is OK!
    print "All As"
elif dna == "TTT":
    print "All Ts"
elif dna == "CCC":
    print "All Cs"
else:
    print "Something else:", dna
Output:
Something else: ATG
Loops with the while statement
dna = "ATGCATC"
while dna == "ATGCATC":                     # conditional expression
    print "The sequence is still", dna
while statements will execute their statement block forever unless the
conditional expression becomes false. Therefore the variable tested in the
conditional expression is normally manipulated within the statement block.
Output:
The sequence is still ATGCATC
The sequence is still ATGCATC
The sequence is still ATGCATC
etc…
Loops with the while statement
dna = "ATGCATGC"
while len(dna):                       # conditional expression; len returns the length of a string
    print "The sequence is:", dna
    dna = dna[0:-1]
print "done"
More on “slice notation” later when discussing lists. Here we remove the
last character of a string
Output:
The sequence is: ATGCATGC
The sequence is: ATGCATG
The sequence is: ATGCAT
The sequence is: ATGCA
The sequence is: ATGC
The sequence is: ATG
The sequence is: AT
The sequence is: A
done
Use break to simulate PERL until
dna = "A"
while True:
    if len(dna) > 3:                  # len is one of several built-in functions
        break
    print "The sequence is:", dna
    dna += "A"                        # string concatenation and assignment
print "done"
Output:
The sequence is: A
The sequence is: AA
The sequence is: AAA
done
There is no native “do-while” or “until” in Python
Python is minimalistic
Loops with the for statement
nt_list = ("A", "C", "G", "T")
for nt in nt_list:
    print "The nt is:", nt
Output:
The nt is: A
The nt is: C
The nt is: G
The nt is: T
for loops iterate over list-like (“iterable”) data types
and are similar to PERL foreach, not the PERL or C for
Loops with the for statement
dna = ("A", "C", "G", "T")
for index in range(len(dna)):
    print "The nt is:", dna[index]
Caution! range in 2.x instantiates an actual list.
Use xrange if iteration is big
Output:
The nt is: A
The nt is: C
The nt is: G
The nt is: T
for loops can have a definite number of iterations
typically using the range or xrange built-in function
Try this example with a string instead of a list!
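One possible take on the "try it with a string" suggestion, as a sketch only. by_index and direct are names invented here. Both loops visit the same characters, since a string is itself iterable.

```python
dna = "ACGT"

# Index-based loop, as on the slide, but over a string.
by_index = []
for index in range(len(dna)):
    by_index.append(dna[index])

# Direct iteration: no index bookkeeping needed.
direct = []
for nt in dna:
    direct.append(nt)

print(by_index == direct)
```

Direct iteration is usually the more readable choice when the index itself is not needed.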
Data Types in Python Strings
Strings are list-like iterables with a rich
collection of methods for their manipulation
Some useful methods are:
join, split, strip, upper, lower, count
dna = "ACGT"
dna2 = dna.lower()    # will give "acgt"
This is “attribute” notation! These are methods specific to the string
type, not of general utility like built-ins
dna = "AACGTA"
print dna.count("A")    # will give 3
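A brief sketch of the other listed string methods (strip, split, join, upper). The input string raw is made up for illustration.

```python
raw = "  acgt tgca  "            # hypothetical input with stray whitespace

cleaned = raw.strip()            # remove leading and trailing whitespace
pieces = cleaned.split()         # split on whitespace into a list of strings
joined = "-".join(pieces)        # join is called on the separator string
shouted = joined.upper()         # returns a new string; the original is unchanged

print(shouted)
```

Each method returns a new object rather than modifying the string in place, which is what immutability implies.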
Data Types in Python Lists
A list is simply a sequence of objects
enclosed in square brackets that we can iterate
through and access by index. They are array-like.
["A","G","C","T"]
Unlike PERL, pretty much anything can be put
into a list, including other lists!! Mirabile dictu!
[42,"groovy", dna, 3.14, var1-var2, ["A", "G", "C", "T"]]
Try printing item 5 from the above list….how does this differ from the
result you would get in PERL?
Data Types in Python lists
A list is a powerful type for manipulating sequences of objects:
bases = ["A","G","C","T"]
No “@” token to distinguish list variables!!
list elements can be accessed by an index:
index = 2
print bases[0], bases[index]
Output: A C
Note that first element is index 0
Assigning to a non-existent element raises an exception (IndexError)
There is no PERL-style “autovivification” (although we can fake this)
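A sketch of what happens when assigning past the end of a list, and the usual way to grow a list instead. The outcome variable exists only to record which branch ran.

```python
bases = ["A", "G", "C", "T"]

# Index 4 does not exist yet, so assignment raises IndexError.
try:
    bases[4] = "N"
    outcome = "assigned"
except IndexError:
    outcome = "IndexError"

# Growing the list explicitly works fine.
bases.append("N")

print(outcome)
print(bases)
```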
Data Types in Python Lists
Lists also have a rich collection of methods
Some useful methods are:
len, sort, reverse, in, max, min, count
Note that some are built-in functions while others
use attribute notation
pi = 3.14
my_list = ["ACGT", 0, pi]
print min(my_list)    # min and max are built-ins; will print 0
my_list = ["A", "C", "G", "T"]
my_list.reverse()    # attribute notation
print my_list        # will print ["T", "G", "C", "A"]
my_list = ["A"] * 4 #init with 4 "A"s
print my_list.count("A") # prints 4
my_list.append("C")
if "C" in my_list:
    print 'The list contained "C"\n'
Testing for inclusion with in is a common operation with all iterable types
Lists and slice notation
Slices allow us to specify subarrays
bases = ["A","G","C","T"]
size = len(bases) # will be equal to four
var1, var2, var3, var4 = bases   #var1="A" & var2="G", etc.
Slice indices refer to the space between elements!
subarray = bases[0:2] #subarray = ["A","G"]
subarray = bases[0:-1] #subarray = ["A","G","C"]
subarray = bases[1:] #subarray = ["G","C","T"]
subarray = bases[1:len(bases)] #subarray = ["G","C","T"]
Array “slices” can be assigned to a subarray
Lists modification and methods
Some useful list methods are:
append, insert, del, sort, remove, count,
reverse, etc.
bases = ["A","G","C"]
bases.append("T") # bases = ["A","G","C","T"]
bases.sort()                 # bases = ["A","C","G","T"]
num_of_As = bases.count("A") # num_of_As = 1
bases[:0] = ["a","g","c","t"]
Slice notation can be used to modify a list!
Try this on the previously defined bases list and see what
happens
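A sketch of slice assignment, starting from the sorted bases list used above. The three steps are illustrative choices, not the only uses.

```python
bases = ["A", "C", "G", "T"]

# An empty slice at position 0 inserts the new items at the front.
bases[:0] = ["a", "g", "c", "t"]
after_insert = list(bases)       # copy so the snapshot is not affected later

# Assigning an empty list to a slice deletes those elements.
bases[0:4] = []
after_delete = list(bases)

# A slice can be replaced by a list of a different length.
bases[1:3] = ["N"]
after_replace = list(bases)

print(after_insert)
print(after_delete)
print(after_replace)
```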
Data Types in Python dictionaries a.k.a. dicts
dicts are associative arrays similar to PERL hashes:
complement = {"A" : "T",
              "C" : "G",
              "G" : "C",
              "T" : "A"}
no PERL “%” token to distinguish hash identifiers!!
The left hand is the dict key and must be unique, “hashable”,
and “immutable” (this will become clearer later)
On right hand is the associated value. It can be almost
ANY type of object! Nice.
Working with Dicts
dicts are a preferred data type in Python
#A dict for complementing a DNA nucleotide
comp = {"A" : "T",
        "C" : "G", "G" : "C",
        "T" : "A"}
print "complement of A is:", comp["A"]
print "complement of C is:", comp["C"]
Output:
complement of A is: T
complement of C is: G
It’s easy to add new pairs to the hash:
comp["g"] = "c"
Or to delete pairs in the hash:
del comp["g"]
Other dict methods
Some useful dict methods are:
keys, values, items, del, in,
copy, etc.
#A hash for complementing a DNA nucleotide
comp = {"A" : "T",
"C" : "G", "G" : "C",
"T" : "A"}
print comp.keys()    # might return ["A","C","G","T"]
No assertion is made as to order of key/value pairs!
Dicts are iterable
#Iterating over hashes
comp = {"A" : "T",
        "C" : "G",
        "G" : "C",
        "T" : "A"}
for k, v in comp.items():
    print 'complement of', k, 'is', v
.items() returns (key, value) two-element tuples; each is “unpacked” here
into k and v, so we iterate over both keys and values together!
Output could be:
complement of C is G
complement of A is T
complement of T is A
complement of G is C
Or output could be:
complement of A is T
complement of C is G
complement of G is C
complement of T is A
The point is that dicts are unordered, and no guarantees are made!!
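When a predictable order matters, one common approach is to iterate over the sorted keys. This is a sketch. The lines list is introduced here only to collect the output.

```python
comp = {"A": "T", "C": "G", "G": "C", "T": "A"}

lines = []
# sorted() returns the keys in alphabetical order, whatever the dict's internal order.
for k in sorted(comp):
    lines.append("complement of %s is %s" % (k, comp[k]))

for line in lines:
    print(line)
```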
Tuples are essentially immutable lists
In most read-only contexts, they work just like lists
you just can't change their value
nucleotides = ("A", "C","G", "T")      # tuples are delimited by ()
for NT in nucleotides:
    print NT , "is a nucleotide symbol"
Packing and unpacking:
(one, two, three) = (1, 2, 3)
print one      # prints 1
Why Tuples?
The immutable nature of tuples means they do not need to
support all list operations. They can therefore be implemented
differently, and are consequently more efficient for certain operations.
And only immutable objects can serve as hash keys
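A sketch of the hash key point: a tuple works as a dict key, while a list is rejected. codon_counts and list_key_result are illustrative names.

```python
codon_counts = {}

# A tuple is immutable and hashable, so it is accepted as a key.
codon_counts[("A", "T", "G")] = 1

# A list is mutable and unhashable, so using it as a key raises TypeError.
try:
    codon_counts[["A", "T", "G"]] = 1
    list_key_result = "accepted"
except TypeError:
    list_key_result = "TypeError"

print(list_key_result)
```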
Sparse matrices
An example of tuples as dict keys
 3   0  -2   0
 0   9   0   0
 0   7   0   0
 0   0   0  -5
Standard multidimensional array:
matrix = [ [3,0,-2,0], [0,9,0,0], [0,7,0,0], [0,0,0,-5] ]
print matrix[0][2]
# This will print -2
# Not very memory efficient if there are many zero valued
# elements in a very large matrix!!!
Sparse matrix representation:
matrix = { (0,0): 3, (0,2): -2, (1,1): 9, (2,1):7, (3,3):-5 }
print matrix.get( (0,2), 0) # prints -2
# The get method here returns 0 if the key is undefined
# Much more memory efficient, since zero values not stored
Functions
Q: Why do we need Functions?
A: Because we are lazy! Functions are the
foundation of reusable code
Repeatedly typing out the code for a chore
that is used over and over again (or even
only a few times) would be a waste of time
and space, and makes the code hard to read
Functions in Python are akin to subroutines in PERL
as well as procedures in some other languages
Functions
Defining a function
Minimally, all we need is a statement block of Python
code that we have named
def I_dont_do_much():    # capital letters OK
    #any code you like!!
    pass
    return
A return value is optional; None is the default if no value is specified
or if there is no explicit final return statement
Once defined, a function is called (“invoked”) just by
stating its name, and passing any required arguments:
I_dont_do_much()
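A small sketch of the default return value. no_return and with_return are hypothetical functions written for this example.

```python
def no_return():
    # No return statement at all, so the call evaluates to None.
    pass


def with_return():
    # An explicit return hands a value back to the caller.
    return 42


result_a = no_return()
result_b = with_return()

print(result_a is None)
print(result_b)
```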
Functions
Python has several flexible ways to pass arguments to
a function. This example is just the most basic way!
Warning! Python passes objects to functions by reference, never by copy.
Changes to mutable objects in the function change the starting object!!
def expand_name(amino_acid):         # no messing with @_ weirdness like in PERL
    convert = {"R" : "Arg",          # convert is local to the function
               "A" : "Ala", etc.}    # (i.e. in lexical scope)
    if amino_acid in convert:
        three_letter = convert[amino_acid]
    else:
        three_letter = "Ukn"
    return three_letter
print expand_name("R")
Note indentation – this line is not part of the function
definition, but rather is an invocation of the function
Output: Arg
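The warning about mutable arguments can be sketched as follows. Both functions and the codon list are invented for illustration. Copying with list() is one common way to leave the caller's object alone.

```python
def add_stop_codon(codons):
    # Mutates the very list object the caller passed in.
    codons.append("TAA")


def add_stop_codon_copy(codons):
    # Works on a copy, so the caller's list is left untouched.
    new_codons = list(codons)
    new_codons.append("TAA")
    return new_codons


original = ["ATG", "GGC"]

copied = add_stop_codon_copy(original)
unchanged = list(original)      # snapshot: still two codons here

add_stop_codon(original)        # now the original itself has changed

print(unchanged)
print(original)
```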
Using external functions
External functions can come from Python's many useful libraries,
or from code that you have written
In Python it's easy to use functions (or indeed other variables or objects)
that are defined in some other file…
Option 1:
import module_name
# use the module name when calling the function..
# i.e. module_name.function(arg)
Option 2:
from module_name import name1, name2, name3
# imports just the names you want
# no need to refer to module name when calling
Option 3:
from module_name import *
# imports all of the public names in a module
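A sketch of Options 1 and 2, using the standard library math module as the example module. Any module would do. root_a and root_b are illustrative names.

```python
# Option 1: import the module, then qualify each name with the module name.
import math
root_a = math.sqrt(16)

# Option 2: import selected names directly into the current namespace.
from math import sqrt, pi
root_b = sqrt(16)

print(root_a == root_b)
print(pi)
```

Option 3 (import *) is left out of the sketch because it makes it harder to see where a name came from.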
Putting it all together An in-class challenge
Get Python up and running, try “Hello
world!” then…
Write a program that:
Defines a function that generates random DNA sequences
of some specified length given a dict describing the probability
distribution of A, C, G, T -- should be familiar from BNFO601
You’ll need the random function from the random module!!
This is a real-world chore that is frequently
encountered in bioinformatics
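One possible approach to the challenge, offered as a sketch rather than the expected solution. The function name random_dna, its parameters, and the uniform example distribution are all assumptions made here. It assumes the probabilities in the dict sum to 1.

```python
import random


def random_dna(length, probabilities):
    """Return a random DNA string; probabilities maps each symbol to its probability."""
    symbols = sorted(probabilities)       # fixed order for the cumulative sum
    sequence = []
    for _ in range(length):
        draw = random.random()            # uniform float in [0.0, 1.0)
        cumulative = 0.0
        chosen = symbols[-1]              # fallback guards against rounding error
        for symbol in symbols:
            cumulative += probabilities[symbol]
            if draw < cumulative:
                chosen = symbol
                break
        sequence.append(chosen)
    return "".join(sequence)


# Hypothetical uniform distribution over the four nucleotides.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
example = random_dna(20, uniform)
print(example)
```

The idea is to walk the cumulative distribution until it passes the random draw. A skewed dict such as one favouring G and C would bias the output accordingly.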