InterPro/prosite
UCSC Genome Browser
Exercise 3
Turning information into knowledge
• The outcome of a sequencing project is masses of raw data
• The challenge is to turn this raw data into biological knowledge
• A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be passed
From sequence to function
• Nature tends to innovate rather than invent
• Proteins are composed of functional elements: domains and motifs
  - Domains are structural units that carry out a certain function
  - The same domains are shared between different proteins
  - Motifs are shorter sequences with certain biological activity
InterPro
http://www.ebi.ac.uk/interpro/
• An integrated documentation resource for protein families, domains and sites
• Groups signatures describing the same protein family or domain
• Combines a number of databases that use different methodologies to derive protein signatures:
  - UniProt: UniProtKB (Swiss-Prot, TrEMBL), UniRef, UniParc
  - prosite: a documented DB of domains, families and functional sites
  - Pfam: a DB of protein families represented by multiple sequence alignments (MSAs)
Member databases
• Sequence-motif methods: protein signature DBs with different focus
• Sequence-cluster methods: hierarchically clustered sequence/structure DBs
InterPro search
prosite
http://www.expasy.ch/prosite/
• A method for determining the function of uncharacterized translated protein sequences
• Consists of a DB of annotated, biologically important sites, patterns, motifs, signatures and fingerprints
prosite
• Entries are represented with patterns or profiles

pattern: [AC]-A-[GC]-T-[TC]-[GC]

profile:
       1      2      3      4      5
  A    0.66   1      0      0      .
  T    0      0      0      1      .
  C    0.33   0      0.66   0      .
  G    0      0      0.33   0      .

Profiles are used in prosite when the motif is relatively divergent and difficult to represent as a pattern
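The profile table above can be read as per-position letter frequencies computed from a set of aligned sites. The sketch below rebuilds the first four columns from three toy sequences chosen to match the slide's numbers (the sequences themselves are an assumption, not taken from the slides) and scores a candidate window by multiplying the frequencies. It is a simplified illustration of the idea on the slide, not the profile format prosite actually stores.

```python
from collections import Counter

def frequency_profile(aligned, alphabet="ATCG"):
    """Per-column letter frequencies for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = {letter: [] for letter in alphabet}
    for column in zip(*aligned):
        counts = Counter(column)
        for letter in alphabet:
            profile[letter].append(counts[letter] / len(aligned))
    return profile

def score(profile, window):
    """Product of the profile frequencies of each letter in the window."""
    result = 1.0
    for position, letter in enumerate(window):
        result *= profile[letter][position]
    return result

# Hypothetical aligned sites that reproduce columns 1-4 of the slide's table.
sites = ["AACT", "AACT", "CAGT"]
profile = frequency_profile(sites)

for letter, row in profile.items():
    print(letter, " ".join(f"{value:.2f}" for value in row))

print(score(profile, "AACT"))  # a window that follows the majority letters
print(score(profile, "TACT"))  # T never occurs in column 1, so the score is 0
```

A window scores zero as soon as it contains a letter never seen at that position, which is one reason real profile methods add pseudocounts or use log-odds weights instead of raw frequencies.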
Scanning prosite
• Query: sequence. Result: all patterns found in the sequence
• Query: pattern. Result: all sequences which adhere to this pattern
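Both query directions can be sketched by translating a pattern into a regular expression. The helper below covers the common pattern syntax (`-` separators, `[..]` allowed residues, `{..}` forbidden residues, `x` for any residue, `(n)` or `(n,m)` repeats, `<` and `>` anchors). The function names and the toy data are assumptions made for illustration; this is not the code behind the prosite scanning service.

```python
import re

# One pattern element: optional '<' anchor, a residue spec, an optional
# repeat count "(n)" or "(n,m)", and an optional '>' anchor.
ELEMENT = re.compile(
    r"^(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)$"
)

def prosite_to_regex(pattern):
    """Translate a pattern such as C-x(2,4)-{P}-H into a Python regex."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # forbidden residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(pieces)

def find_pattern(sequence, pattern):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

def patterns_in_sequence(sequence, patterns):
    """Sequence query: names of all patterns found in one sequence."""
    return [name for name, p in patterns.items() if find_pattern(sequence, p)]

def sequences_with_pattern(pattern, sequences):
    """Pattern query: names of all sequences that contain the pattern."""
    return [name for name, s in sequences.items() if find_pattern(s, pattern)]

# Toy data, invented for illustration only.
patterns = {"slide_example": "[AC]-A-[GC]-T-[TC]-[GC]", "toy_spacer": "G-x(2)-G-{A}"}
sequences = {"seq1": "GGAAGTTCAACTCG", "seq2": "TTTTTTTT"}

print(find_pattern(sequences["seq1"], patterns["slide_example"]))
print(patterns_in_sequence(sequences["seq1"], patterns))
print(sequences_with_pattern(patterns["slide_example"], sequences))
```

The lookahead wrapper in `find_pattern` is a design choice that lets overlapping hits be reported; a plain `finditer` would skip any hit that starts inside a previous one.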
Patterns with a high probability of occurrence
• Entries describing commonly found post-translational modifications or compositionally biased regions
• Found in the majority of known protein sequences
• High probability of occurrence
prosite sequence query
prosite pattern query
UCSC Genome Browser
UCSC Genome Browser Gateway
Reset all settings of previous user
UCSC Genome Browser query results
UCSC Genome Browser: annotation tracks
[Figure: browser view with labeled annotation tracks: base position, UCSC Genes, RefSeq, mRNA (GenBank), vertebrate conservation (single species compared), SNPs, repeats]
[Figure: a UCSC gene, with UTR, intron, exon and gene direction labeled]
UCSC Genome Browser - movement
[Figure labels: Zoom x3 +; Center]
UCSC Genome Browser – Base view
Annotation track options
• dense
• squish
• pack
• full
Another option to toggle between ‘pack’ and ‘dense’ view is to click on the track title
[Figure: maps of sickle-cell anemia distribution and malaria distribution]
BLAT
• BLAT = BLAST-Like Alignment Tool
• BLAT is designed to find sequences with >95% similarity for DNA and >80% for protein
• Rapid search by indexing the entire genome
Good for:
1. Finding the genomic coordinates of a cDNA
2. Determining exons/introns
3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
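To illustrate why indexing the genome makes the search fast, here is a toy sketch: short words (k-mers) of the genome are stored in a lookup table once, and each word of the query is then looked up directly instead of being compared against every genome position. Hits that share the same offset (genome position minus query position) point to one candidate region. The word length, the toy sequences and the function names are assumptions for illustration; the real BLAT program is considerably more elaborate.

```python
from collections import Counter, defaultdict

def build_index(genome, k):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for position in range(0, len(genome) - k + 1, k):
        index[genome[position:position + k]].append(position)
    return index

def seed_hits(query, index, k):
    """Count index hits per diagonal (genome position minus query position)."""
    diagonals = Counter()
    for query_pos in range(len(query) - k + 1):
        for genome_pos in index.get(query[query_pos:query_pos + k], ()):
            diagonals[genome_pos - query_pos] += 1
    return diagonals

# Toy "genome" with the query embedded at 0-based offset 8.
K = 4
genome = "CCCCCCCC" + "ACGTTGCAGATC" + "GGGGGGGG"
query = "ACGTTGCAGATC"

index = build_index(genome, K)          # built once, reused for every query
hits = seed_hits(query, index, K)
best_offset, support = hits.most_common(1)[0]
print(best_offset, support)
```

In a full aligner the best diagonals would then be extended into alignments; for a cDNA query, gaps between aligned blocks on the genome correspond to introns.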
BLAT on UCSC Genome Browser
BLAT Results
[Figure legend: match; non-match (mismatch/indel); indel boundaries]
BLAT Results on the browser
Getting DNA sequence of region