Slides Exercise3_2015

Exercise 3
Inspecting the primary structure of a gene
Pevsner 2009
J Mol Biol 228, 1124-1136
Nature 409, 860-921
HMMgene
• Sequence identifier
• Program name
• Prediction (see table below for the meaning)
• Beginning
• End
• Score between 0 and 1
• Strand: + for direct and - for complementary
• Frame (for exons it is the position of the donor in the frame)
• Group to which the prediction belongs. If several CDSs are found they
will be called cds_1, cds_2, etc. "bestparse:" is there because
alternative predictions will also be available.
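The nine fields listed above can be read into a small record type. The following is a minimal sketch, assuming the fields appear on one line separated by whitespace in the order given; the class name `Prediction`, the function `parse_line`, and the example line are all made up for illustration and are not taken from real HMMgene output.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    seq_id: str    # sequence identifier
    program: str   # program name
    kind: str      # prediction type, e.g. firstex, exon_1, CDS
    begin: int     # beginning of the feature
    end: int       # end of the feature
    score: float   # score between 0 and 1
    strand: str    # "+" for direct, "-" for complementary
    frame: str     # kept as text: assumed it may be non-numeric for some features
    group: str     # e.g. "bestparse:cds_1"


def parse_line(line: str) -> Prediction:
    """Split one whitespace-separated output line into its nine fields."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    seq_id, program, kind, begin, end, score, strand, frame, group = fields
    return Prediction(seq_id, program, kind, int(begin), int(end),
                      float(score), strand, frame, group)


# Hypothetical example line (values invented for illustration).
example = "seq1 HMMgene firstex 192 265 0.578 + 0 bestparse:cds_1"
pred = parse_line(example)
print(pred.kind, pred.begin, pred.end, pred.group)
```

Keeping the frame as text is a cautious choice here, since the sketch does not assume that every feature type reports a numeric frame.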
HMMgene
• firstex - The coding part of the first coding exon, starting with the
first base of the start codon.
• exon_N - The Nth predicted internal coding exon.
• lastex - The coding part of the last coding exon, ending with the last
base of the stop codon.
• singleex - The coding part of an exon in a gene with only one coding exon.
• CDS - Coding region composed of the exon predictions prior to this line.
• START - Predicted start codon with position of first and last base
(only with signals option).
• STOP - Predicted stop codon with position of first and last base
(only with signals option).
• DON - Predicted donor site with position of the base before and after
the splice site (only with signals option).
• ACC - Predicted acceptor site with position of the base before and
after the splice site (only with signals option).
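Because a CDS line summarizes the exon predictions that precede it, the exons of each predicted gene can be collected by reading the records in order. The sketch below assumes each record has already been reduced to a `(type, begin, end)` tuple with 1-based inclusive coordinates; the function names and the example coordinates are hypothetical.

```python
EXON_TYPES = {"firstex", "lastex", "singleex"}


def is_exon(kind):
    """True for firstex, lastex, singleex and internal exons (exon_1, exon_2, ...)."""
    return kind in EXON_TYPES or kind.startswith("exon_")


def group_exons(records):
    """Attach every exon to the next CDS line that follows it."""
    genes = []
    pending = []
    for kind, begin, end in records:
        if is_exon(kind):
            pending.append((begin, end))
        elif kind == "CDS":
            genes.append({"span": (begin, end), "exons": pending})
            pending = []
        # Signal records (START, STOP, DON, ACC) are ignored in this sketch.
    return genes


def coding_length(exons):
    """Total number of coding bases, assuming inclusive coordinates."""
    return sum(end - begin + 1 for begin, end in exons)


# Invented coordinates for one three-exon gene.
records = [
    ("firstex", 100, 150),
    ("exon_1", 300, 389),
    ("lastex", 500, 562),
    ("CDS", 100, 562),
]
genes = group_exons(records)
total = coding_length(genes[0]["exons"])
print(len(genes), total, total % 3)
```

A coding length that is a multiple of three is a quick sanity check that the exons of a gene join into a whole number of codons.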
Fgenesh
• G - predicted gene number, starting from start of sequence;
• Str - DNA strand (+ for direct or - for complementary);
• Feature - type of coding sequence: CDSf - first (starting with start
codon), CDSi - internal (internal exon), CDSl - last coding segment
(ending with stop codon);
• TSS - Position of transcription start (TATA-box position and score);
• Start and End - Position of the Feature;
• Score - Log likelihood*10 score for the feature;
• ORF - start/end positions where the first complete codon starts
and the last codon ends.
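The relation between the Start/End columns and the ORF columns can be illustrated with a little arithmetic. This is a sketch only, assuming 1-based inclusive coordinates on the direct strand; the function name `orf_offsets` and the example numbers are hypothetical.

```python
def orf_offsets(start, end, orf_start, orf_end):
    """Return (bases before the first complete codon,
    bases after the last complete codon,
    number of complete codons) for one coding feature."""
    orf_len = orf_end - orf_start + 1
    if orf_len % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    lead = orf_start - start   # partial-codon bases at the feature start
    trail = end - orf_end      # partial-codon bases at the feature end
    return lead, trail, orf_len // 3


# Invented internal exon: one base belongs to a codon begun in the previous exon.
lead, trail, codons = orf_offsets(1001, 1100, 1002, 1100)
print(lead, trail, codons)
```

For a feature on the complementary strand the roles of the two ends would be swapped; that case is left out of this sketch.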