Sequence-based manual annotation, as carried out at TIGR
Genome sequence
Find coding genes: predicted protein-coding genes
RNA finding (tRNAscan, RFAM, homology searches): predicted RNA genes
Collect any literature for the gene product
Translation
Sequence-based searches:
Blast-type pairwise alignments;
HMM searches (Pfam, TIGRFAM, etc.);
InterPro; TMHMM; SignalP; TargetP;
COGs; paralogous families; and more.
Evaluate evidence presented in paper
Evaluation of evidence (pairwise alignments; family/domain based evidence; motif predictors)
Get candidate GO terms:
- from match proteins
- from matching families/domains/motifs
- from EC number mapping, InterPro2GO, other mappings, etc.
Search for GO terms if no candidates present themselves:
- GO search/browse tool AmiGO
- many other tools (e.g. Manatee, QuickGO, etc.)
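Collecting candidate terms from mapping files can be sketched roughly as below. This is a minimal illustration, assuming the mappings are stored in the plain-text "external2go" style used for files such as ec2go and interpro2go. The sample lines, the function names `parse_mapping` and `candidate_go_terms`, and the regular expression are all illustrative assumptions, not part of the TIGR pipeline.

```python
import re

# Illustrative lines in the external2go style (assumed sample data; check
# the current mapping files before relying on any specific entry).
SAMPLE_MAPPING = """\
! comment lines start with an exclamation mark
EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022
InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677
InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:nucleus ; GO:0005634
"""

# database:accession [free text] > GO:term name ; GO:nnnnnnn
LINE_RE = re.compile(
    r"^(?P<db>[^:\s]+):(?P<acc>\S+).*?>\s*GO:(?P<name>.+?)\s*;\s*(?P<go_id>GO:\d{7})\s*$"
)


def parse_mapping(text):
    """Return {external id: [(GO id, GO term name), ...]} from mapping text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blanks and comments
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        key = f"{m['db']}:{m['acc']}"
        mapping.setdefault(key, []).append((m["go_id"], m["name"]))
    return mapping


def candidate_go_terms(evidence_ids, mapping):
    """Collect candidate GO terms for a gene's evidence ids, without duplicates."""
    seen = []
    for evidence_id in evidence_ids:
        for term in mapping.get(evidence_id, []):
            if term not in seen:
                seen.append(term)
    return seen


mapping = parse_mapping(SAMPLE_MAPPING)
for go_id, name in candidate_go_terms(["EC:1.1.1.1", "InterPro:IPR000003"], mapping):
    print(go_id, name)
```

The terms returned this way are only candidates; they still need the evaluation described next before being assigned.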
Evaluate GO terms:
Check that the quality of evidence
supports candidate GO terms at a
particular level of specificity. Read the
literature relevant to the experimental
characterization of any match proteins
used as evidence. Check that any GO
terms that may be assigned to the match
protein are correct. Check GO trees and
definitions to make sure the term makes
sense for your organism.
Generally it is safer to make function GO
annotations than process ones based on
sequence similarity to single proteins.
See IGC chart for more on process
annotations based on sequence.
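The function-versus-process caution can be expressed as a small triage step. The sketch below is only an illustration: the class `CandidateTerm`, the function `triage_single_match_terms`, and the choice to send anything that is not a function term to manual review are assumptions, and the two example terms are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTerm:
    go_id: str
    name: str
    aspect: str  # "function", "process" or "component"


def triage_single_match_terms(terms):
    """Split candidates supported only by similarity to a single protein.

    Function terms are treated as the safer transfers; everything else is
    set aside for further evidence (an assumed, conservative policy).
    """
    safer, needs_more_evidence = [], []
    for term in terms:
        if term.aspect == "function":
            safer.append(term)
        else:
            needs_more_evidence.append(term)
    return safer, needs_more_evidence


candidates = [
    CandidateTerm("GO:0004022", "alcohol dehydrogenase (NAD+) activity", "function"),
    CandidateTerm("GO:0006066", "alcohol metabolic process", "process"),
]
safer, needs_more_evidence = triage_single_match_terms(candidates)
print([t.go_id for t in safer], [t.go_id for t in needs_more_evidence])
```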
pairwise alignments:
Visually inspect alignments, look for conserved
active sites, look for (generally) at least 35%
identity across the full lengths of both proteins.
If matches are not full length, look to see if there
are recognized functional domains in the area
where the match occurs.
Decide how much information can be transferred
from the match protein to the query. In order to
assert that the query has the exact same function
as the match protein, the match protein must
be experimentally characterized. If any doubt
about specificity of the function exists, back up to
a more general level of annotation.
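The numeric part of these checks can be sketched as follows. This is a rough illustration, not a substitute for visual inspection: the function names, the 90% coverage cutoff used to approximate "full length", and the convention of dividing identities by the number of alignment columns are all assumptions. Only the 35% identity guideline comes from the text above.

```python
def alignment_stats(aligned_query, aligned_match, query_len, match_len):
    """Identity and coverage for one gapped pairwise alignment ('-' = gap)."""
    if len(aligned_query) != len(aligned_match):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = query_residues = match_residues = 0
    for q, m in zip(aligned_query, aligned_match):
        if q != "-":
            query_residues += 1
        if m != "-":
            match_residues += 1
        if q != "-" and q == m:
            identical += 1
    return {
        # identities over alignment columns (one of several conventions)
        "identity": identical / len(aligned_query),
        "query_coverage": query_residues / query_len,
        "match_coverage": match_residues / match_len,
    }


def transfer_level(stats, match_is_characterized,
                   min_identity=0.35, min_coverage=0.9):
    """Suggest how much annotation to transfer from the match protein."""
    if stats["identity"] < min_identity:
        return "insufficient similarity"
    full_length = (stats["query_coverage"] >= min_coverage
                   and stats["match_coverage"] >= min_coverage)
    if not full_length:
        return "partial match: check for functional domains in the aligned region"
    if match_is_characterized:
        return "specific function may be transferred"
    return "back up to a more general annotation"


stats = alignment_stats("MKT-AYLL", "MKTGAYIL", query_len=7, match_len=8)
print(stats["identity"], transfer_level(stats, match_is_characterized=True))
```

A result of "specific function may be transferred" still assumes that conserved active sites have been checked by eye, which this sketch does not attempt.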
family/domain based evidence:
Review search results (InterPro, HMM). Look to
see specificity of the family in question. Can a
specific function be assigned based on
membership in the family, or is the family broad
in functional scope? If so, can a general function
such as “kinase” or “oxidoreductase” be given?
If not, can a name be given based on family
membership even if function is unknown?
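The order of these questions amounts to a simple fallback chain, sketched below. The function `annotation_from_family`, its arguments, and the "X family protein" naming pattern are hypothetical choices made for illustration; the judgment of whether a family is specific or broad remains with the annotator, who supplies the inputs.

```python
def annotation_from_family(family_name, specific_function=None, general_function=None):
    """Pick the most specific annotation the family evidence supports.

    specific_function: set only if family membership implies one exact function.
    general_function: set if the family is broad but shares a general activity.
    """
    if specific_function:
        return ("specific function", specific_function)
    if general_function:
        return ("general function", general_function)
    # Function unknown: fall back to a name based on family membership.
    return ("family name only", f"{family_name} family protein")


# Hypothetical family names, used only to show the three outcomes.
print(annotation_from_family("ExampleA", specific_function="alcohol dehydrogenase"))
print(annotation_from_family("ExampleB", general_function="kinase"))
print(annotation_from_family("ExampleC"))
```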
motif predictors:
Look to see what the presence of membrane
spans, signal peptides, etc. is telling you about
the protein in light of other information coming
from other search results - is it all consistent,
does it add up to a particular cellular location
or function? If all you have is a motif, perhaps
you can still make some annotations (e.g. “integral
membrane protein” based on, for example,
multiple TMHMM regions).
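A motif-only fallback of this kind might look like the sketch below. It assumes the predictor outputs have already been reduced to a count of predicted membrane spans and a yes/no signal peptide call; the function name, the threshold of two spans for "multiple", and the wording of the review flag are illustrative assumptions.

```python
def annotate_from_motifs(tm_helix_count, has_signal_peptide, min_tm_helices=2):
    """Suggest a motif-based annotation plus flags for manual review."""
    result = {"annotation": None, "flags": []}
    if tm_helix_count >= min_tm_helices:
        result["annotation"] = "integral membrane protein"
    elif tm_helix_count == 1 and has_signal_peptide:
        # A lone predicted span near the N-terminus can overlap the signal
        # peptide, so this case is left to the annotator.
        result["flags"].append("single span with signal peptide: check consistency")
    elif has_signal_peptide:
        result["flags"].append("signal peptide only: consider cellular location")
    return result


print(annotate_from_motifs(tm_helix_count=6, has_signal_peptide=False))
print(annotate_from_motifs(tm_helix_count=1, has_signal_peptide=True))
```

Any annotation suggested this way should still be checked against the other search results for consistency.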