Homology and Homologs
Homology just means sequence similarity by virtue of a common
evolutionary ancestor.
>gi|24640218|ref|NP_572350.2|
CG3126-PA, isoform A [Drosophila melanogaster]
Length=1571
Score = 427 bits (1098), Expect = 6e-118
Identities = 223/415 (53%), Positives = 297/415 (71%), Gaps = 19/415 (4%)
Frame = +2
Query 1901 SLVDHNEIMAKLTLKQEGDDGPDVRGGSGDILLVHATETDRKDLVLYFEAFLTTYRTFIT 2080
           ++++   I   L LK+  +DGP+V+GG  D L+VHA+   +     + EAF+TT+RTFI
Sbjct 1151 NMLEEVNITRYLILKKREEDGPEVKGGYIDALIVHASRVQKVADNAFCEAFITTFRTFIQ 1210

Query 2081 PEELIQKLQYRYERF-CHFQDTFKQRVSKNTFFVLVRVVDELCLVEMTDEILKLLMELVF 2257
           P ++I+KL +RY  F C  QD  KQ+ +K TF +LVRVV++L   ++T ++L LL+E V+
Sbjct 1211 PIDVIEKLTHRYTYFFCQVQDN-KQKAAKETFALLVRVVNDLTSTDLTSQLLSLLVEFVY 1269

Query 2258 RLVCKGELSLARILRKNILEKV---ENKRMLHHANS--ALKPLAARGVAARPG------- 2401
           +LVC G+L LA++LR   +EKV   +  ++            +   G+A   G
Sbjct 1270 QLVCSGQLYLAKLLRNKFVEKVTLYKEPKVYGFVGELGGAGSVGGAGIAGSGGCSGTAGG 1329

Query 2402 ----TLHDFHSLEIAEQLTLLDAELFYKIEIPEVLLWAKEQNEEKSPNLTQFTEHFNNMS 2569
               +L D  SLEIAEQ+TLLDAELF KIEIPEVLL+AK+Q EEKSPNL +FTEHFN MS
Sbjct 1330 GNQPSLLDLKSLEIAEQMTLLDAELFTKIEIPEVLLFAKDQCEEKSPNLNKFTEHFNKMS 1389

Query 2570 YWVRSIIMLQEKAQDRERLLLKFIKIMKHLRKLNNFNSYLAILSALDSAPIRRLEWQKQT 2749
           YW RS I+  + A++RE+ + KFIKIMKHLRK+NN+NSYLA+LSALDS PIRRLEWQK
Sbjct 1390 YWARSKILRLQDAKEREKHVNKFIKIMKHLRKMNNYNSYLALLSALDSGPIRRLEWQKGI 1449

Query 2750 SEGLAEYCTLIDSSSSFRAYRAALAEVEPPCIPYLGLILQDLTFVHLGNPDHID-GKVNF 2926
           +E +  +C LIDSSSSFRAYR ALAE  PPCIPY+GLILQDLTFVH+GN D++  G +NF
Sbjct 1450 TEEVRSFCALIDSSSSFRAYRQALAETNPPCIPYIGLILQDLTFVHVGNQDYLSKGVINF 1509

Query 2927 SKRWQQFNILDSMRRFQQVHYEIRRNDEIISFFNDFSDHLAEEALWELSLKIKPR 3091
           SKRWQQ+NI+D+M+RF++  Y  RRN+ II FF++F D + EE +W++S KIKPR
Sbjct 1510 SKRWQQYNIIDNMKRFKKCAYPFRRNERIIRFFDNFKDFMGEEEMWQISEKIKPR 1564
These two sequences, my Xenopus query sequence and the matching Drosophila
sequence, show strong (though variable) homology along their length. But even
if we knew the function of the Drosophila gene, that might not tell us much
about the function of the Xenopus gene.
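The Identities and Gaps figures in a BLAST report are column counts over the aligned rows. As a minimal sketch (the function name alignment_stats is an illustrative assumption, not part of any BLAST tool), the counts for one 60-column block of the alignment above can be reproduced as follows. The Positives figure is not computed here, because it also needs a substitution matrix.

```python
def alignment_stats(query_row, subject_row):
    """Return (columns, identities, gaps) for two aligned rows of equal length."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row, subject_row))
    # A gap column has '-' in either row.
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    # An identity is the same residue in both rows (never two gap characters).
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    return len(pairs), identities, gaps


# Fourth block of the Xenopus/Drosophila alignment (Query 2402-2569).
query = "----TLHDFHSLEIAEQLTLLDAELFYKIEIPEVLLWAKEQNEEKSPNLTQFTEHFNNMS"
sbjct = "GNQPSLLDLKSLEIAEQMTLLDAELFTKIEIPEVLLFAKDQCEEKSPNLNKFTEHFNKMS"

columns, identities, gaps = alignment_stats(query, sbjct)
print(f"Identities = {identities}/{columns}, Gaps = {gaps}/{columns}")
```

Summing the same counts over every block gives the totals quoted in the report header.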
Genes and Evolution - I
Gene duplication through speciation
The two copies of Gene A will now evolve independently, but will continue to
have the same function.
They are ORTHOLOGS
Genes and Evolution - II
Gene duplication through internal genome duplication
The two copies of Gene A will now evolve independently, but will probably not
continue to have exactly the same function.
They are PARALOGS
Homologs, orthologs & paralogs
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html
Mutation and Evolution
Translated part of mRNA sequence (DNA on the left, encoded protein on the right)
Ancestral sequence
ATGAAGGCTGCCTACGACTGCCGTGCCAGAATGCTGAGG   MKAAYDCRARMLR
In species A
ATGAAGGCTGCCTATGACTGCCGTGCCAGAATGCTGAGG   MKAAYDCRARMLR
ATGAATGCTGCCTATGACTGCCGTGCCAGAATGCTGAGG   MNAAYDCRARMLR
ATGAATGCTGCCTATGACTGCCGTGCCAGAATGCTAAGG   MNAAYDCRARMLR
ATGAATGCTGCCTATGACTGCCGTG---GAATGCTAAGG   MNAAYDCR-GMLR
ATGAATGCAGCCTATGACTGCCGTG---GAATGCTAAGG   MNAAYDCR-GMLR
ATGAATGCAGCCTATGATTGCCGTG---GAATGCTAAGG   MNAAYDCR-GMLR
ATGAATGCAGCCTATGATTGCCGAG---GAATGCTAAGG   MNAAYDCR-GMLR
In species B
ATGAAGGCTGCCTACGACTGCCGTGCCATAATGCTGAGG   MKAAYDCRAIMLR
ATGAAGGCCGCCTACGACTGCCGTGCCATAATGCTGAGG   MKAAYDCRAIMLR
ATGAAGGCCGCCTACGACTGTCGTGCCATAATGCTGAGG   MKAAYDCRAIMLR
ATGAAGGCCGCCTACGACTGTCGTGCCATAATGCTGAGA   MKAAYDCRAIMLR
ATGAAGGCCGCCTACGACTGTCGTGCCATAATCCTGAGA   MKAAYDCRAIILR
ATGAAGGCCGCATACGACTGTCGTGCCATAATCCTGAGA   MKAAYDCRAIILR
Final sequences compared (species A above, species B below)
ATGAATGCAGCCTATGATTGCCGAG---GAATGCTAAGG   MNAAYDCR-GMLR
||||| || || || || || || |    ||| || ||    | ||||||  +||
ATGAAGGCCGCATACGACTGTCGTGCCATAATCCTGAGA   MKAAYDCRAIILR
Searching for Similarity
DNA comparison
ATGAATGCAGCCTATGATTGCCGAG---GAATGCTAAGG
||||| || || || || || || |    ||| || ||
ATGAAGGCCGCATACGACTGTCGTGCCATAATCCTGAGA
amino acid comparison
MNAAYDCR-GMLR
| ||||||  +||
MKAAYDCRAIILR
The DNA sequence can change while the amino acid sequence
stays the same, so always look for similarities by comparing amino
acid sequences.
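A short sketch of why this holds, using two of the species B sequences from the Mutation and Evolution slide. The helper names (translate, count_differences) are illustrative assumptions. The codon table is the standard genetic code, built from the usual TCAG ordering.

```python
# Standard genetic code, laid out in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an ungapped coding sequence, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# First and fourth species B sequences from the slide.
earlier = "ATGAAGGCTGCCTACGACTGCCGTGCCATAATGCTGAGG"
later = "ATGAAGGCCGCCTACGACTGTCGTGCCATAATGCTGAGA"

print(count_differences(earlier, later), "nucleotide differences")
print(count_differences(translate(earlier), translate(later)), "amino acid differences")
```

All of the nucleotide changes between these two sequences fall on third codon positions that leave the encoded residue unchanged, so the protein comparison sees no difference where the DNA comparison does.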
We note that evolution causes sequences to change by substitution,
insertion or deletion, but not usually by small-scale re-ordering.
So we need a tool which will find the ‘alignment’ between the two
sequences which shows the greatest degree of similarity while
introducing as few gaps as possible.
The Downside of Gaps
Take two random sequences, with no ‘real’ similarity:
GACACTAGGTCGATGCGTGGTGGCGAGA
ACGCATCCGGATGTGCACCGTGGAACTG
And allow cost free gaps:
GAC--ACT----AGGTCGATGC---GTGG---TGGCGAGA
 ||  | |    |  | | |||   ||||   ||
 ACGCA-TCCGGA--T-G-TGCACCGTGGAACTG
Although the alignment has no mismatches, it is clearly not biologically
meaningful!
The introduction of gaps into alignments should ideally reflect biological
possibilities, but this is rather difficult. So the tendency is to make gaps
‘expensive’, so that one is introduced only when the long-range matching it
restores outweighs its cost, e.g.
 TTCCCAACTCTCCTCTTTCACCATGAAGCTCAAGGACAGATTCCACTCGCCCCAAAATCAAGCTCACCCCGTCCAAGAA
 | ||       |   || |||||||||||||||||||| ||||||||| ||| |||   |      |||   |||  |
TTCCCACCTCTCCTCTTTGCACCATGAAGCTCAAGGACAAATTCCACTCCCCCAAAATCAAGCGCACCCCGTCCCAGAA
TTCCCAACTCTCCTCTTT=CACCATGAAGCTCAAGGACAGATTCCACTCGCCCCAAAATCAAGCTCACCCCGTCCAAGAA
|||||| ||||||||||| |||||||||||||||||||| ||||||||| |||||||||||||| |||||||||| ||||
TTCCCACCTCTCCTCTTTGCACCATGAAGCTCAAGGACAAATTCCACTC=CCCCAAAATCAAGCGCACCCCGTCCCAGAA
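The effect of gap cost can be sketched with a small global-alignment scorer (Needleman-Wunsch style dynamic programming). The scoring values here (match +1, mismatch -1, a fixed cost per gap position) are illustrative assumptions and not the scheme used by BLAST or any particular tool.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b under a linear gap cost."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned against a gap
            gap_in_a = cur[j - 1] + gap   # b[j-1] aligned against a gap
            cur.append(max(diagonal, gap_in_b, gap_in_a))
        prev = cur
    return prev[-1]


# The two 'random' sequences from the slide.
random_a = "GACACTAGGTCGATGCGTGGTGGCGAGA"
random_b = "ACGCATCCGGATGTGCACCGTGGAACTG"

free_gaps = global_alignment_score(random_a, random_b, gap=0)
costly_gaps = global_alignment_score(random_a, random_b, gap=-2)
print("score with free gaps:  ", free_gaps)
print("score with costly gaps:", costly_gaps)
```

With free gaps the scorer can always dodge a mismatch by opening a gap, so unrelated sequences still collect a large score. Once each gap position costs something, the same pair scores much lower, while a genuinely related pair such as the two long sequences above keeps most of its score because it needs only two gaps.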
The Essential Task
Basically, what we are trying to do is to see whether we can work out the
function of an unknown gene by comparing its sequence with those of
genes in other species where we already know the function.
We can do this because the sequence of most genes is conserved to some
extent during the evolution of different species.
The problem is that while a gene's function is probably related to both the
overall three-dimensional structure of its product and small regions of
specific linear sequence, our only serious tool for discerning similarity
between proteins is based firmly on long-range linear sequence similarity.
And there is no obvious requirement on genes to conserve sequence in
order to conserve function – it’s just easier that way…
But it seems clear that we can only expect this to be
effective if we are looking at true ORTHOLOGS.
Finding Orthologs
So how do we find orthologs, and can we know when we have?
The simplest method is Reciprocal Best BLAST, but it implicitly relies on
having all the protein sequences of your own organism and of the one you wish
to find an ortholog in.
[Figure: frog protein -> database of human proteins -> best match -> human protein -> database of frog proteins -> x]
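The reciprocal test itself is simple once the two sets of best hits exist. In the sketch below the two dictionaries stand in for the results of the two searches (frog proteins against the human database, and human proteins against the frog database). The protein identifiers are hypothetical, and running the searches is outside the scope of the example.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Keep the pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {
        a: b
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    }


# Hypothetical best-hit tables, one entry per query protein.
frog_to_human = {"frogA": "humanA", "frogB": "humanC", "frogD": "humanD"}
human_to_frog = {"humanA": "frogA", "humanC": "frogX", "humanD": "frogD"}

pairs = reciprocal_best_hits(frog_to_human, human_to_frog)
for frog, human in sorted(pairs.items()):
    print(frog, "<->", human)
```

Here frogB is dropped because its best human match points back to a different frog protein, which is the situation that suggests a paralog rather than an ortholog. The result is only as good as the two protein sets: if either is incomplete, the true best match may simply be missing.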