Homology and Information Gathering and Domain Annotation for Proteins
Outline
• Homology
• Information Gathering for Proteins
• Domain Annotation for Proteins
• Examples and exercises
The concept of homology
“The same organ in different animals under every variety of form and function.”
– Richard Owen, 1843
http://bytesizebio.net/wp-content/uploads/2009/07/homology-limbs
Homologous forelimbs
Homology – Alikeness because of common ancestry
• Homology: The relationship of any two characters that have descended with divergence from a common ancestral character (common ancestry)
• Analogy: The relationship of any two characters that have descended convergently from unrelated ancestors (convergent evolution)
• Characters are at very different levels of biological organization, ranging from entire organs over genes and domains to single nucleotides
• Homology is a concept of quality (all-or-none)
• Homology is not precisely defined
[Figure: forelimbs of pterosaur, bat, and bird]
http://upload.wikimedia.org/wikipedia/commons/3/38/Homology.jpg
Steven M. Carr, 2009
http://www.mun.ca/biology/scarr/Molecular_Homology_&_Analogy.html
Subtypes of homology
• Three disjoint subtypes
– Orthology: Two homologous characters separated by a speciation event
– Paralogy: Two homologous characters arising from a duplication event
– Xenology: Two homologous characters whose history involves interspecies (horizontal) transfer of genetic material
[Figure labels: Speciation, Duplication, Horizontal transfer]
Walter M. Fitch, Trends in Genetics, 2000
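The three definitions above can be read as a simple decision rule. The following Python sketch is only an illustration of that rule; the names `Event` and `homology_subtype` are hypothetical and not taken from any library, and the sketch assumes the evolutionary history of the two characters is already known.

```python
from enum import Enum


class Event(Enum):
    """Event at which two homologous characters diverged (hypothetical helper)."""
    SPECIATION = "speciation"
    DUPLICATION = "duplication"


def homology_subtype(separating_event: Event,
                     involves_horizontal_transfer: bool = False) -> str:
    """Return the homology subtype following the slide's definitions.

    Assumption: any horizontal transfer in the shared history makes the
    pair xenologs, which keeps the three subtypes disjoint.
    """
    if involves_horizontal_transfer:
        return "xenology"
    if separating_event is Event.SPECIATION:
        return "orthology"
    return "paralogy"


print(homology_subtype(Event.SPECIATION))
print(homology_subtype(Event.DUPLICATION))
print(homology_subtype(Event.SPECIATION, involves_horizontal_transfer=True))
```

In practice the hard part is inferring the history itself (for example from gene trees), which this sketch does not attempt.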
Protein domain is a basic evolutionary module and an important unit of homology
• Definition: A polypeptide chain capable of autonomous folding
• Many proteins are multi-domain proteins
• Many domains are found in different contexts – domain shuffling
• Exons in eukaryotic genomes often correspond to domains
• Therefore, protein classification schemes build on domains not on entire proteins
Soding & Lupas, Bioessays, 2003
Assessment of homology in proteins
• Assessed by comparing their sequence, structure, and function
• Sequence similarity is the primary marker of homology
• Due to the relatively small size of protein structure space, similar structures are more likely to originate by convergence
• However, structure diverges more slowly and therefore allows for the recognition of more distant relationships
• Functional residues within an active site are often the most highly conserved positions in a protein sequence
[Figure labels: Sequence, Structure, Function]
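As a rough illustration of "sequence similarity", the sketch below computes percent identity between two sequences that are assumed to be already aligned. The function name `percent_identity` and the two toy sequences are hypothetical. Identity alone is only a crude proxy: tools such as BLAST additionally use substitution scoring and statistical significance, which this sketch leaves out.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over aligned, non-gap columns.

    Assumption: both strings come from the same pairwise alignment,
    so they have equal length and use `gap` for gap positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where neither sequence has a gap
    matches = 0   # compared columns with identical residues
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (made-up sequences, for illustration only).
identity = percent_identity("MKT-AYIA", "MKTQAYLA")
print(f"{identity:.1f}% identity")
```

Here seven columns are compared (one gap column is skipped) and six of them match.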
Information gathering and domain annotation for proteins
• Databases and servers
• Domain annotation
A variety of databases enable information gathering about your protein of interest
• Run by different research institutions
• Allow for free information retrieval for academic purposes
• The spectrum ranges from broad all-around databases (UniProt or NCBI) to databases that specialize in particular aspects (e.g. hierarchical structural classification)
The National Center for Biotechnology Information (NCBI) at the National Institutes of Health in the US
• “The NCBI advances science and health by providing access to biomedical and genomic information”
• Contains numerous popular resources
– PubMed (life science literature)
– Sequences (whole genomes to individual proteins)
– Gene expression data
– Taxonomy
– Numerous tools, most importantly BLAST for homology detection
→ A good starting point for an analysis
Protein classifications generate order among their tremendous diversity
• Sequence-based domain classifications (grouping is based on homology inferred by detectable sequence similarity):
– SMART: emphasizes signaling domains, fast
– Pfam: a comprehensive database to classify newly found domains into domain families
• Structure-based classification schemes:
– CATH: Class – Architecture – Topology – Homology
– SCOP: Structural Classification of Proteins; Class – Fold – Superfamily – Family
→ Homology is not a criterion on all levels of classification. In contrast to cellular life, proteins are polyphyletic
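To make the point about classification levels concrete, the sketch below compares two domains along the SCOP hierarchy and reports the deepest level they share. The labels in `domain_x` and `domain_y` are invented placeholders, not real SCOP entries, and the rule that only the superfamily and family levels imply common ancestry is stated here as an assumption about how SCOP is usually described.

```python
# SCOP levels from most general to most specific.
SCOP_LEVELS = ("class", "fold", "superfamily", "family")

# Assumption: shared superfamily or family suggests homology,
# whereas shared class or fold reflects structural similarity only.
HOMOLOGY_LEVELS = ("superfamily", "family")


def deepest_shared_level(a: dict, b: dict):
    """Return the most specific level at which both classifications agree."""
    shared = None
    for level in SCOP_LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared


def implies_homology(level) -> bool:
    """True if sharing this level is taken to indicate common ancestry."""
    return level in HOMOLOGY_LEVELS


# Placeholder classifications (hypothetical labels, for illustration only).
domain_x = {"class": "c1", "fold": "f1", "superfamily": "s1", "family": "m1"}
domain_y = {"class": "c1", "fold": "f1", "superfamily": "s2", "family": "m7"}

level = deepest_shared_level(domain_x, domain_y)
print(level, implies_homology(level))
```

In this toy case the two domains share a fold but not a superfamily, so the shared classification alone would not support homology.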
Example 1:
Annotate domains in LRRK2 (Human)
• Obtain sequence in FASTA (1) format from the NCBI (2)
• Enter name of the protein (LRRK2) in UniProt (3) and see all the information one can retrieve there
• Put the sequence into domain databases like SMART (4) or Pfam (5) and mark the identified domains in your log file
1) FASTA: a widely used plain text file format for sequence data
2) NCBI: google “ncbi” or http://www.ncbi.nlm.nih.gov/
3) UniProt: google “uniprot” or http://www.uniprot.org/
4) SMART: google “embl smart” or http://smart.embl-heidelberg.de/
5) Pfam: google “pfam” or http://pfam.sanger.ac.uk/
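For readers unfamiliar with the FASTA format mentioned in this example, the following is a minimal Python sketch of how such a file can be read: each record starts with a ">" header line, followed by one or more lines of sequence. The function name `parse_fasta` and the two toy records are hypothetical; the sequences are made up and are not LRRK2.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Toy input (made-up records, for illustration only).
example = """>toy_protein_1 hypothetical example record
MKTAYIAKQR
QISFVKSHFS
>toy_protein_2 another hypothetical record
MADEEKLPPG
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

The sequence string obtained this way is what would be pasted into the SMART or Pfam search forms.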
Example 2:
Annotate domains in NarX (E. coli)
• …