Introduction to Biopython
- "The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems." (Biopython paper)
- Available for Linux, Windows and Mac OS X, either as a download or via a package manager (e.g. apt-get install python-biopython on Ubuntu)
- An excellent tutorial is available on the Biopython website
The Biopython Seq object
- A nucleotide or amino acid sequence in Biopython is modelled using the Bio.Seq.Seq class: Bio.Seq is the module name and Seq is the class name
- The optional alphabet parameter constrains the operations that can be performed on the sequence to ones appropriate for the sequence type
- In some respects the Seq object behaves like a string and offers some string-like methods
seq1.py
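import Bio.Seq
from Bio.Alphabet import IUPAC

# build a DNA sequence with an explicit alphabet
myseq = Bio.Seq.Seq('GATTACA', alphabet=IUPAC.unambiguous_dna)
subseq = myseq[:3]           # slicing works as for strings
print(subseq)                # prints GAT
rc_seq = myseq.reverse_complement()
print(rc_seq)                # prints TGTAATC
print(rc_seq.lower())        # prints tgtaatc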
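The slicing and reverse-complement steps in seq1.py can also be followed with plain Python strings, which may help when Biopython is not at hand. This is only a sketch: the COMPLEMENT table and the reverse_complement helper are names assumed for illustration, they cover unambiguous DNA only, and they are not Biopython's implementation.

```python
# Illustrative sketch only: plain-string stand-ins for the Seq operations above.
# COMPLEMENT and reverse_complement are hypothetical helpers, not Biopython code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the result."""
    return dna.translate(COMPLEMENT)[::-1]


myseq = "GATTACA"
subseq = myseq[:3]                  # first three bases, as with myseq[:3] on a Seq
rc_seq = reverse_complement(myseq)

print(subseq)                       # GAT
print(rc_seq)                       # TGTAATC
print(rc_seq.lower())               # tgtaatc
```

Reversing after complementing (rather than complementing alone) gives the opposite strand read in its own 5' to 3' direction, which is what the reverse_complement() method name refers to.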
More on Seq objects
- Like strings, Bio.Seq.Seq objects are immutable (so each method returns a new object)
- Operators and methods:
  mystring in myseq             True if mystring is contained in myseq (the query can be either a Seq object or a string)
  myseq.count(subseq)           Returns the number of occurrences of subseq in myseq
  myseq.complement()            Returns the complementary DNA (or RNA) sequence
  myseq.reverse_complement()    Returns the reverse complement of the sequence
Bio.Seq.Seq operators and methods (cont)
  myseq.transcribe()            Returns an RNA sequence equivalent to the DNA sequence
  myseq.back_transcribe()       Returns a DNA sequence equivalent to the RNA sequence
  myseq.translate()             Translates DNA or RNA into protein
  myseq.ungap()                 Returns a Seq object with gap characters removed
See the documentation for details.
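As a rough picture of what these methods compute, the following sketch reproduces them on plain strings. All function names and the CODON_TABLE dictionary are assumptions made for illustration, not Biopython code. The table is the standard genetic code with '*' marking stop codons, and this simplified translate ignores any trailing partial codon.

```python
# Illustrative sketch only: plain-string versions of the methods listed above.
# Every name here is a hypothetical helper, not Biopython's implementation.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codons enumerated in T, C, A, G order; '*' is a stop.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(dna):
    """DNA to RNA: replace T with U."""
    return dna.replace("T", "U")


def back_transcribe(rna):
    """RNA to DNA: replace U with T."""
    return rna.replace("U", "T")


def translate(seq):
    """Translate DNA or RNA codon by codon (trailing partial codon ignored)."""
    dna = back_transcribe(seq)
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def ungap(seq, gap="-"):
    """Remove gap characters."""
    return seq.replace(gap, "")


print(transcribe("GATTACA"))        # GAUUACA
print(back_transcribe("GAUUACA"))   # GATTACA
print(translate("GATACCAGC"))       # DTS
print(ungap("GAT-TA-CA"))           # GATTACA
```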
Sequences and Alphabets
- Defining a sequence's alphabet allows for sanity checking operations on sequences, e.g.
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
dna = Seq('GATACCAGC', IUPAC.unambiguous_dna)
protein = Seq('DTS', IUPAC.protein)
seq = dna + protein
- This will result in a TypeError exception
- This sanity checking has limits though, and the following code will work:
seq = Seq('ThisIsNotANucleotideSeq', IUPAC.unambiguous_dna)
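Because the letters of a sequence are not checked against its alphabet, a caller who wants that guarantee has to check it explicitly. The sketch below shows one way to do so on plain strings. The check_letters helper and the UNAMBIGUOUS_DNA_LETTERS constant are assumptions for illustration and are not part of Biopython.

```python
# Illustrative sketch only: an explicit letter check of the kind the
# alphabet mechanism does not perform. check_letters is a hypothetical helper.
UNAMBIGUOUS_DNA_LETTERS = "GATC"


def check_letters(sequence, letters):
    """Return sequence unchanged, or raise ValueError on letters outside `letters`."""
    invalid = sorted(set(sequence.upper()) - set(letters))
    if invalid:
        raise ValueError("letters not in alphabet: " + "".join(invalid))
    return sequence


print(check_letters("GATACCAGC", UNAMBIGUOUS_DNA_LETTERS))   # GATACCAGC

try:
    check_letters("ThisIsNotANucleotideSeq", UNAMBIGUOUS_DNA_LETTERS)
except ValueError as err:
    print("rejected:", err)
```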
More about Alphabets
- The Alphabet class in Bio.Alphabet and its sub-classes set up the types of alphabet in Biopython
- Standard alphabets are defined in Bio.Alphabet.IUPAC and include:
  IUPAC.unambiguous_dna     DNA as A,C,T,G
  IUPAC.unambiguous_rna     RNA as A,C,U,G
  IUPAC.ambiguous_dna       DNA including all ambiguous base characters: R,Y,W,S,M,K,H,B,V,D and N
  IUPAC.ambiguous_rna       RNA including the ambiguous base characters
  IUPAC.extended_dna        DNA including the non-standard bases B, D, S and W
  IUPAC.protein             Standard amino acid alphabet
  IUPAC.extended_protein    Amino acids including rare or nonstandard ones
Even more on Alphabets
- Each alphabet object has a letters attribute containing its list of letters:
from Bio.Alphabet import IUPAC
alpha = IUPAC.unambiguous_dna
print(alpha.letters)
- We can use a Seq object's alphabet attribute to get or set its alphabet:
import Bio.Seq
myseq = Bio.Seq.Seq('GATAAC', alphabet=IUPAC.unambiguous_dna)
myalpha = myseq.alphabet
myseq.alphabet = IUPAC.protein
- We can create a gapped alphabet from any ungapped alphabet object:
import Bio.Alphabet
alpha = IUPAC.protein
gapped_alpha = Bio.Alphabet.Gapped(alpha)
print(gapped_alpha.gap_char)
A note on comparing Seq objects
- You cannot use the == operator to compare Bio.Seq.Seq objects
- This is because == would compare to see if the two variables refer to the same sequence object
- Compare the sequence strings instead:
myseq1 = Seq('GAT')
myseq2 = Seq('GAT')
# don't do this:
if myseq1 == myseq2:
    print("The same sequence!")
# do this
if str(myseq1) == str(myseq2):
    print("The same sequence!")
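The identity-versus-content distinction described above can be reproduced with any class that does not define its own equality. In this sketch PlainSeq is a hypothetical stand-in and not a Biopython class. It only illustrates why comparing the string forms is the reliable test under the behaviour the slide describes.

```python
# Illustrative sketch only: PlainSeq is a hypothetical minimal wrapper that,
# like the behaviour described above, does not define its own __eq__.
class PlainSeq:
    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data


myseq1 = PlainSeq("GAT")
myseq2 = PlainSeq("GAT")

same_object = (myseq1 == myseq2)             # default == falls back to identity
same_letters = (str(myseq1) == str(myseq2))  # compares the actual letters

print(same_object)     # False
print(same_letters)    # True
```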
MutableSeq objects
- Bio.Seq.MutableSeq objects are like Bio.Seq.Seq objects, except they are mutable
- Because they are mutable, methods that operate on them (e.g. complement()) operate "in place" and do not return a value
- The tomutable() method of the Bio.Seq.Seq class returns a mutable sequence, and similarly the toseq() method of the Bio.Seq.MutableSeq class returns an immutable sequence
from Bio.Seq import Seq, MutableSeq
seq1 = Seq('GATAACA')
mut_seq = seq1.tomutable()
mut_seq[3] = 'T'                      # item assignment works on a MutableSeq
rc_seq = seq1.reverse_complement()    # returns a new Seq
mut_seq.reverse_complement()          # in place transform
mut_seq2 = MutableSeq('GATAACA')
seq2 = mut_seq2.toseq()
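The contrast between returning a new object and transforming in place can be seen with Python's built-in types, using an immutable str and a mutable list of bases. This is a sketch under that analogy only. The reverse_complement_in_place helper and the COMPLEMENT table are hypothetical names, not Biopython code.

```python
# Illustrative sketch only: str stands in for an immutable sequence and
# list for a mutable one. The helper below is hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

immutable = "GATAACA"
rc_new = immutable.translate(COMPLEMENT)[::-1]   # a new string; original unchanged

mutable = list("GATAACA")
mutable[3] = "T"                                  # item assignment, now GATTACA


def reverse_complement_in_place(bases):
    """Reverse and complement the list itself; returns None, like in-place methods."""
    bases.reverse()
    for i, base in enumerate(bases):
        bases[i] = base.translate(COMPLEMENT)


result = reverse_complement_in_place(mutable)

print(immutable)            # GATAACA
print(rc_new)               # TGTTATC
print(result)               # None
print("".join(mutable))     # TGTAATC
```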