The Bio::Seq Class
• The Bio::Seq class allows for efficient manipulation and storage of nucleotide and protein sequences
• Handles tasks such as
  – Manipulating the actual sequence
  – Creating new Bio::Seq subsequence objects
  – Manipulating features for the sequence
  – Manipulating other data associated with the sequence (e.g. accession number, species, etc.)

Interfaces that Bio::Seq Implements
• Bio::SeqI and Bio::PrimarySeqI
  – Used to access data such as the type of sequence, its accession number and id, and its description
• Bio::IdentifiableI
  – Handles aspects regarding the source of the sequence and the sequence's id and version
• Bio::DescribableI
  – Methods to access human-readable descriptions of the sequence object
• Bio::AnnotatableI
  – Methods to access the annotation of the sequence
• Bio::FeatureHolderI
  – Methods to get features present in this sequence

UML Diagram of Bio::Seq
[Figure: Bio::Seq and the interfaces it implements, with the methods of each]
• Bio::Seq: new, seq, validate_seq, length, subseq, display_id, accession_number, desc, primary_id, can_call_new, alphabet, object_id, version, authority, namespace, display_name, description, annotation, get_SeqFeatures, get_all_SeqFeatures, feature_count, add_SeqFeature, remove_SeqFeature, revcom, trunc, id, primary_seq, species
• Bio::SeqI: top_SeqFeatures, all_SeqFeatures, seq, write_GFF, annotation, feature_count, species, primary_seq
• Bio::IdentifiableI: object_id, version, authority, namespace
• Bio::AnnotatableI: annotation
• Bio::FeatureHolderI: get_SeqFeatures, feature_count, get_all_SeqFeatures
• Bio::DescribableI: display_id, description

Common Bio::Seq methods
• new
  – Constructs a Bio::Seq object
  – e.g.
    $seq = Bio::Seq->new(-seq => 'ACGTCGAC', -display_id => 'foo');
• seq
  – Gets or sets the string of nucleotides or amino acids in this Bio::Seq object
  – e.g.
    print $seq->seq;
• length
  – Gets the length of a Bio::Seq object
  – e.g.
    print $seq->length;
• accession_number
  – Gets or sets the accession number of a Bio::Seq object
  – e.g.
    $seq->accession_number('AC12345');

Common Bio::Seq methods (continued)
• subseq
  – Returns, as a string, the subsequence from the first position to the second position, inclusive
  – e.g.
    print $seq->subseq(3, 9);
• trunc
  – Returns a new Bio::Seq object that is a truncation from the first position to the second position, inclusive
  – e.g.
    $trunc_seq = $seq->trunc(3, 9);
• revcom
  – Returns a new Bio::Seq object that is the reverse complement of this Bio::Seq object
  – e.g.
    $revcom = $seq->revcom;

Obtaining Bio::Seq objects
• Bio::Seq objects can be constructed from a variety of flat file formats, from Internet databases, or from other Bio::Seq objects
  – The Bio::SeqIO class allows for sequential input or output of sequences from or to flat files
  – Bio::DB classes such as Bio::DB::GenBank or Bio::DB::Fasta allow for random access retrieval of sequences
  – Calling the trunc method of a Bio::Seq object returns a new Bio::Seq object

Bio::SeqIO
• Sequential access to a flat file of such types as fasta, gff, embl, swissprot, etc.
• The new method's -file argument requires the path to a file of a supported type
  – To write to a file, put '>' before the file's path
    Bio::SeqIO->new(-file => '>path/of/file/to/write', -format => 'embl');
  – To append to a file, put '>>' before the file's path
    Bio::SeqIO->new(-file => '>>path/of/file/to/append', -format => 'gff');
  – To read a file, give just its path
    Bio::SeqIO->new(-file => 'path/of/file/to/read', -format => 'fasta');

Input With Bio::SeqIO
• Allows for construction of Bio::Seq objects in sequential order
• Use the next_seq method to get the next sequence from the file, if one exists
• To read all the sequences in a file and print their names to the screen:
    use Bio::SeqIO;
    my $seqio = Bio::SeqIO->new(-file => 'foo.fasta', -format => 'fasta');
    # next_seq returns a Bio::Seq object, or undef once the file is exhausted
    while (my $seq = $seqio->next_seq) {
        print $seq->display_id;
        print "\n";
    }

Output With Bio::SeqIO
• Whether writing or appending to a file, the new method creates the file if it does not exist
• Writing overwrites
  all data already in the file, while appending adds sequences to the end
• The following writes a sequence to a file:
    use Bio::SeqIO;
    # $seq has previously been defined as a Bio::Seq object
    my $seqio = Bio::SeqIO->new(-file => '>foo.swissprot', -format => 'swissprot');
    $seqio->write_seq($seq);

Bio::DB::*
• Bio::DB::* is a collection of similar classes, where * varies
• * may be
  – GenBank
  – A flat file format, e.g. Fasta, EMBL, SwissProt, GFF, etc.
• Modules of this form allow for random access retrieval from
  – A specific file or a directory of flat files of type *
    • Uses indexing to allow for quick retrieval of sequence information
  – The GenBank database at NCBI, if * is GenBank

Random Access Retrieval from Flat Files
• The new method requires the path to either a file containing sequences or a directory containing many sequence files
  – Indexes the files on the first run, or whenever the -reindex argument is given a value of 1
• The following retrieves the sequence with a given accession from a Fasta file:
    use Bio::DB::Fasta;
    # $accession is assumed to hold a valid accession number
    my $db = Bio::DB::Fasta->new('path/to/file/or/directory', -reindex => 1);
    my $seq = $db->get_Seq_by_acc($accession);

Retrieval from GenBank
• Constructing a Bio::Seq object with Bio::DB::GenBank requires an Internet connection to reach the GenBank database
• The following constructs a Bio::Seq object from GenBank:
    use Bio::DB::GenBank;
    # $accession is assumed to hold a valid accession number
    my $gb = Bio::DB::GenBank->new();
    my $seq = $gb->get_Seq_by_acc($accession);
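Putting it together
The slides above show each piece separately. The script below is a minimal sketch that chains them, assuming BioPerl (Bio::Seq, Bio::SeqIO, Bio::DB::Fasta) is installed: it builds a Bio::Seq object, applies subseq, trunc and revcom, writes the results with Bio::SeqIO, reads them back with next_seq, and then fetches one record through a Bio::DB::Fasta index. The ids 'foo' and 'foo_rc', the residues, and the temporary file demo.fasta are made up for illustration only.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::Fasta;
use File::Spec;
use File::Temp qw(tempdir);

# 1. Construct a Bio::Seq object (ids and residues are illustrative only).
my $seq = Bio::Seq->new(
    -seq        => 'ACGTCGAC',
    -display_id => 'foo',
    -alphabet   => 'dna',
);
$seq->accession_number('AC12345');

# 2. Manipulate it. Coordinates are 1-based and inclusive,
#    so positions 3..6 of ACGTCGAC are GTCG.
my $sub    = $seq->subseq(3, 6);    # returns a plain string
my $trunc  = $seq->trunc(3, 6);     # returns a new sequence object
my $revcom = $seq->revcom;          # returns a new sequence object
$revcom->display_id('foo_rc');      # hypothetical id, so the two records differ

# 3. Write both sequences to a FASTA file in a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'demo.fasta');
my $out  = Bio::SeqIO->new(-file => ">$path", -format => 'fasta');
$out->write_seq($seq, $revcom);
$out->close;

# 4. Read them back sequentially with next_seq.
my $in = Bio::SeqIO->new(-file => $path, -format => 'fasta');
my @records;
while (my $s = $in->next_seq) {
    push @records, [ $s->display_id, $s->seq ];
}
$in->close;

# 5. Fetch one record by id through an index instead of a linear scan.
my $db      = Bio::DB::Fasta->new($path, -reindex => 1);
my $fetched = $db->get_Seq_by_acc('foo_rc');

print "$_->[0]\t$_->[1]\n" for @records;
print 'indexed lookup: ', $fetched->seq, "\n";
```

Reading with next_seq walks the file from front to back, which suits processing every record once; Bio::DB::Fasta instead builds an index file alongside the FASTA file, so later lookups by id do not need to scan the whole file.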