Introduction to Biopython

- "The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems." (Biopython paper)
- Available for Linux, Windows and Mac OS X, either as a download or via a package manager (e.g. apt-get install python-biopython on Ubuntu)
- Excellent tutorial on the Biopython website

The Biopython Seq object

- A nucleotide or amino acid sequence in Biopython is modelled using the Bio.Seq.Seq class: Bio.Seq is the module name and Seq is the class name
- The optional alphabet parameter constrains the operations that can be performed on the sequence to ones appropriate for the sequence type
- In some respects a Seq object behaves like a string and provides several string-like methods

seq1.py:

    import Bio.Seq
    from Bio.Alphabet import IUPAC  # Bio.Alphabet was removed in Biopython 1.78

    myseq = Bio.Seq.Seq('GATTACA', alphabet=IUPAC.unambiguous_dna)
    subseq = myseq[:3]
    print(subseq)                        # prints GAT
    rc_seq = myseq.reverse_complement()
    print(rc_seq)                        # prints TGTAATC
    print(rc_seq.lower())                # prints tgtaatc

More on Seq objects

- Like strings, Bio.Seq.Seq objects are immutable, so each method returns a new object
- Operators and methods:
  - mystring in myseq: True if mystring is contained in myseq (the query can be either a Seq object or a string)
  - myseq.count(subseq): returns the number of non-overlapping occurrences of subseq in myseq
  - myseq.complement(): returns the complementary DNA (or RNA) sequence
  - myseq.reverse_complement(): returns the reverse complement of the sequence

Bio.Seq.Seq operators and methods (cont.)

  - myseq.transcribe(): returns an RNA sequence equivalent to the DNA sequence
  - myseq.back_transcribe(): returns a DNA sequence equivalent to the RNA sequence
  - myseq.translate(): translates DNA or RNA into protein
  - myseq.ungap(): returns a Seq object with gap characters removed (see the documentation for details)

Sequences and Alphabets

- Defining a sequence's alphabet allows for sanity checking of operations on sequences, for example:

    from Bio.Seq import Seq
    from Bio.Alphabet import IUPAC

    dna = Seq('GATACCAGC', IUPAC.unambiguous_dna)
    protein = Seq('DTS', IUPAC.protein)
    seq = dna + protein                  # raises TypeError

- This will result in a TypeError exception
- This sanity checking has limits, though: the following code runs without error, even though the letters are not valid DNA:

    seq = Seq('ThisIsNotANucleotideSeq', IUPAC.unambiguous_dna)

More about Alphabets

- The Bio.Alphabet.Alphabet class and its subclasses define the types of alphabet in Biopython
- Standard alphabets are defined in Bio.Alphabet.IUPAC and include:
  - IUPAC.unambiguous_dna: DNA as A, C, T, G
  - IUPAC.unambiguous_rna: RNA as A, C, U, G
  - IUPAC.ambiguous_dna: DNA including all ambiguous base characters: R, Y, W, S, M, K, H, B, V, D and N
  - IUPAC.ambiguous_rna: RNA including the ambiguous base characters
  - IUPAC.extended_dna: DNA including the non-standard bases B, D, S and W (here these letters denote modified bases, not ambiguity codes)
  - IUPAC.protein: the standard amino acid alphabet
  - IUPAC.extended_protein: amino acids including rare or non-standard ones

Even more on Alphabets

- Each alphabet object has a letters attribute containing its letters:

    from Bio.Alphabet import IUPAC

    alpha = IUPAC.unambiguous_dna
    print(alpha.letters)                 # prints GATC

- We can use a Seq object's alphabet attribute to get or set its alphabet:

    import Bio.Seq
    from Bio.Alphabet import IUPAC

    myseq = Bio.Seq.Seq('GATAAC', alphabet=IUPAC.unambiguous_dna)
    myalpha = myseq.alphabet
    myseq.alphabet = IUPAC.protein

- We can create a gapped alphabet from any ungapped alphabet object:

    import Bio.Alphabet
    from Bio.Alphabet import IUPAC

    alpha = IUPAC.protein
    gapped_alpha = Bio.Alphabet.Gapped(alpha)
    print(gapped_alpha.gap_char)         # prints -

A note on comparing Seq objects

- In older Biopython releases you cannot use the == operator to compare Bio.Seq.Seq objects
- This is because == checks whether the two variables refer to the same object, not whether they hold the same sequence
- Compare the sequence strings instead, which works in every version:

    from Bio.Seq import Seq

    myseq1 = Seq('GAT')
    myseq2 = Seq('GAT')

    # don't do this (compares object identity in older releases):
    if myseq1 == myseq2:
        print("The same sequence!")

    # do this:
    if str(myseq1) == str(myseq2):
        print("The same sequence!")
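The alphabet sanity check described on the "Sequences and Alphabets" slides can be modelled in a few lines. This is a hypothetical illustration, not Biopython's code: Alphabet, TypedSeq, gapped() and invalid_letters() are invented names, and the compatibility rule used here (same molecule type) is a simplification of Biopython's real rules. It reproduces both halves of the slide: adding DNA to protein raises TypeError, while letters outside the alphabet pass silently unless you check for them explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alphabet:
    """Hypothetical alphabet: a name, a molecule type and its letters."""
    name: str
    kind: str            # "DNA", "RNA" or "protein"
    letters: str
    gap_char: str = ""   # empty string means ungapped


def gapped(alpha: Alphabet, gap_char: str = "-") -> Alphabet:
    """Return a gapped copy of an ungapped alphabet (cf. Bio.Alphabet.Gapped)."""
    return Alphabet(alpha.name + "+gap", alpha.kind, alpha.letters + gap_char, gap_char)


UNAMBIGUOUS_DNA = Alphabet("unambiguous_dna", "DNA", "GATC")
PROTEIN = Alphabet("protein", "protein", "ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class TypedSeq:
    """Hypothetical immutable sequence that carries an alphabet."""
    data: str
    alphabet: Alphabet

    def __add__(self, other: "TypedSeq") -> "TypedSeq":
        # The sanity check: refuse to join different molecule types.
        if self.alphabet.kind != other.alphabet.kind:
            raise TypeError(
                f"incompatible alphabets {self.alphabet.name!r} "
                f"and {other.alphabet.name!r}"
            )
        return TypedSeq(self.data + other.data, self.alphabet)

    def invalid_letters(self) -> set:
        """Letters not in the alphabet; nothing calls this automatically."""
        return set(self.data) - set(self.alphabet.letters)


if __name__ == "__main__":
    dna = TypedSeq("GATACCAGC", UNAMBIGUOUS_DNA)
    protein = TypedSeq("DTS", PROTEIN)
    try:
        dna + protein
    except TypeError as err:
        print("TypeError:", err)
    # Construction does not validate letters, mirroring the slide's caveat.
    bad = TypedSeq("ThisIsNotANucleotideSeq", UNAMBIGUOUS_DNA)
    print(sorted(bad.invalid_letters()))
```

Making the letter check an explicit, opt-in method keeps construction cheap, which is the trade-off the slide's "this sanity checking has limits" point describes.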
MutableSeq objects

- Bio.Seq.MutableSeq objects are like Bio.Seq.Seq objects, except they are mutable
- Because they are mutable, methods that operate on them (e.g. complement()) work "in place" and do not return a value
- The tomutable() method of the Bio.Seq.Seq class returns a mutable sequence; similarly, the toseq() method of the Bio.Seq.MutableSeq class returns an immutable sequence

    from Bio.Seq import Seq, MutableSeq

    seq1 = Seq('GATAACA')
    mut_seq = seq1.tomutable()
    mut_seq[3] = 'T'                     # item assignment works on MutableSeq
    rc_seq = seq1.reverse_complement()   # Seq: returns a new object
    mut_seq.reverse_complement()         # MutableSeq: in-place transform

    mut_seq2 = MutableSeq('GATAACA')
    seq2 = mut_seq2.toseq()
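The split between "return a new object" and "modify in place, return nothing" is the same one Python draws between str and list. A minimal sketch, assuming a hypothetical MiniMutableSeq class backed by a list of single-character bases (unambiguous uppercase DNA only); it illustrates the idea and is not Biopython's MutableSeq.

```python
class MiniMutableSeq:
    """Hypothetical mutable DNA sequence; in-place methods return None."""

    _COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

    def __init__(self, data: str) -> None:
        self._data = list(data)  # a list, unlike str, supports item assignment

    def __setitem__(self, index: int, base: str) -> None:
        self._data[index] = base

    def __str__(self) -> str:
        return "".join(self._data)

    def complement(self) -> None:
        """Replace every base with its complement, in place."""
        for i, base in enumerate(self._data):
            self._data[i] = self._COMPLEMENT[base]

    def reverse_complement(self) -> None:
        """Complement, then reverse the underlying list, in place."""
        self.complement()
        self._data.reverse()

    def toseq(self) -> str:
        """Return an immutable snapshot (a plain str in this sketch)."""
        return str(self)


if __name__ == "__main__":
    seq1 = "GATAACA"                # immutable: seq1[3] = "T" would raise TypeError
    mut_seq = MiniMutableSeq(seq1)
    mut_seq[3] = "T"
    print(mut_seq)                  # GATTACA
    print(mut_seq.reverse_complement())  # None: the object itself changed
    print(mut_seq)                  # TGTAATC
    print(seq1)                     # GATAACA, the original is untouched
```

Because the snapshot returned by toseq() is a separate immutable value, later in-place changes to the mutable object do not affect it.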