Comparing Sequences and Multiple Sequence Alignment
Compare your "query" DNA, RNA, or amino acid sequence to a known sequence
Create an alignment of two or more sequences that shows where they match
Pairwise Comparison
137 AGACCAACCTGGCCAACATGGTGAAATCCCATCTCTAC.AAAAATACAAA 185
    |||||| |||||||||||||||||||  ||||||||||  ||||||||||
  1 AGACCAGCCTGGCCAACATGGTGAAACTCCATCTCTACTGAAAATACAAA 50
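The match line above can be rebuilt column by column. The Python sketch below assumes the two rows are already aligned to the same length and that "." marks a gap; `match_line` and `count_columns` are illustrative helpers, not GCG functions.

```python
def match_line(top: str, bottom: str, gap: str = ".") -> str:
    """Return a match line: '|' for identical residues, ' ' for mismatches and gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    return "".join("|" if a == b and a != gap else " " for a, b in zip(top, bottom))


def count_columns(top: str, bottom: str, gap: str = "."):
    """Count (matches, mismatches, gap columns) in a pairwise alignment."""
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


top = "AGACCAACCTGGCCAACATGGTGAAATCCCATCTCTAC.AAAAATACAAA"
bottom = "AGACCAGCCTGGCCAACATGGTGAAACTCCATCTCTACTGAAAATACAAA"
print(match_line(top, bottom))
print(count_columns(top, bottom))  # -> (45, 4, 1)
```

Mismatches and gaps both leave a blank in the match line; only the "." in the sequence row tells them apart.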
Multiple Comparison/Alignment
        1                                                   50
S11448  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MTFD GAIGIDLGTT YSCVGVWQNE
S06443  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MTFD GAIGIDLGTT YSCVGVWQNE
A25398  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MTYE GAIGIDLGTT YSCVGVWQNE
S06158  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MTYE GAIGIDLGTT YSCVGVWQNE
S42164  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~MS KAVGIDLGTT YSCVAHFAND
S20139  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~MS KAVGIDLGTT YSCVAHFSND
B36590  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~MS KAVGIDLGTT YSCVAHFAND
A25089  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~MAKSEG PAIGIDLGTT YSCVGLWQHD
S03250  ~~~~~~~~~~ ~~~~~~~~~~ ~~~MAGKGEG PAIGIDLGTT YSCVGVWQHD
A27077  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MSKG PAVGIDLGTT YSCVGVFQHG
S07197  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MSKG PAVGIDLGTT YSCVGVFQHG
Pairwise Comparison
Local Alignment vs Global Alignment
Local alignment compares regions within two sequences and can return several matches
BestFit, BLAST, FASTA
Global alignment compares entire sequences
GAP
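A minimal Python sketch of global alignment in the Needleman-Wunsch style follows. The scores (match +1, mismatch -1, gap -2 per position) are illustrative assumptions, not GAP's defaults, and `global_align` is a hypothetical helper that returns one optimal alignment.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] against leading gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, so both sequences are used in full.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("GATTACA", "GATACA"))
```

Because the traceback always ends at the bottom-right cell, every residue of both inputs appears in the result; `global_align("GATTACA", "GATACA")` scores 4 (six matches, one gap).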
Pairwise Comparison
1. BestFit:
Make an optimal alignment of the best segment of similarity between two
sequences, inserting gaps to maximize the number of matches, using the local
homology algorithm of Smith and Waterman.
2. Compare:
Compare two protein or nucleic acid sequences.
3. DotPlot:
Make a dot plot from the output file of Compare.
4. Gap:
Align two complete sequences so that the number of matches is maximized and
the number of gaps is minimized, using the algorithm of Needleman and Wunsch.
5. GapShow:
Display a graphic of an alignment (run Gap or BestFit first).
6. FrameAlign:
Create an optimal alignment between a protein sequence and the codons in the
three reading frames of a nucleotide sequence.
7. ProfileGap:
Make an optimal alignment between a profile and one or more sequences.
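The local variant (Smith-Waterman style) differs from the global one mainly by clamping cell scores at zero and reporting the best cell anywhere in the matrix. This sketch returns only the score and end coordinates; the scoring values and the name `local_align_score` are assumptions for illustration, not BestFit's parameters.

```python
def local_align_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    Returns (best_score, end_i, end_j); the best segment ends at
    a[end_i - 1] and b[end_j - 1] (1-based end coordinates).
    """
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a new local alignment start at any cell.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j
    return best, best_i, best_j


print(local_align_score("TTTTGATTACATTTT", "CCGATTACACC"))
```

With these scores the shared GATTACA segment scores 14 and the unrelated flanking bases are simply left out, which is what "compares regions within two sequences" means in practice.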
Pairwise Comparison
Nucleotide sequence alignments
Key: match = |, mismatch = blank column, gap = .
137 AGACCAACCTGGCCAACATGGTGAAATCCCATCTCTAC.AAAAATACAAA 185
    |||||| |||||||||||||||||||  ||||||||||  ||||||||||
  1 AGACCAGCCTGGCCAACATGGTGAAACTCCATCTCTACTGAAAATACAAA 50
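To see how matches, mismatches and gaps combine into a single alignment score, the sketch below sums column scores with an affine gap penalty (a gap-opening cost plus a per-position extension cost). The numeric values and the helper name `score_alignment` are hypothetical; GCG programs use their own scoring tables and gap penalties.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = -1,
                    gap_open: int = 3, gap_extend: int = 1, gap_chars: str = ".-") -> int:
    """Score an existing pairwise alignment with an affine gap penalty.

    A gap run of length L in either row costs gap_open + gap_extend * L.
    """
    total = 0
    in_gap_top = in_gap_bottom = False
    for a, b in zip(top, bottom):
        a_gap, b_gap = a in gap_chars, b in gap_chars
        if a_gap or b_gap:
            if a_gap and not in_gap_top:
                total -= gap_open  # first column of a gap run in the top row
            if b_gap and not in_gap_bottom:
                total -= gap_open  # first column of a gap run in the bottom row
            total -= gap_extend
        else:
            total += match if a == b else mismatch
        in_gap_top, in_gap_bottom = a_gap, b_gap
    return total


NT_TOP = "AGACCAACCTGGCCAACATGGTGAAATCCCATCTCTAC.AAAAATACAAA"
NT_BOTTOM = "AGACCAGCCTGGCCAACATGGTGAAACTCCATCTCTACTGAAAATACAAA"
print(score_alignment(NT_TOP, NT_BOTTOM))  # -> 37 (45 matches, 4 mismatches, one 1-base gap)
```

Charging the opening cost once per gap run means one long gap is penalized less than several short ones of the same total length.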
Protein sequence alignments
Conserved substitutions are marked ":"
                    10        20        30        40        50        60
ggamma.pep  MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
            |||||||||||||||||:|||::|||||:|||||:|||||||||||||||||||||||||
HGCZG       MGHFTEEDKATITSLWGHVNVDEAGGETIGRLLVLYPWTQRFFDSFGNLSSASAIMGNPK
Residues with shared chemical properties (size, charge, hydrophobicity, polarity) can substitute for each other
A conservative substitution scores less than an identical match, but better than a mismatch
Conservative changes score better than non-conservative changes
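The ":" marks in the protein alignment can be approximated by grouping residues with similar properties. The groups in this Python sketch are a hand-picked simplification chosen for illustration; they are not the substitution table GCG uses, although they do reproduce the marks in the ggamma.pep/HGCZG example.

```python
# Hypothetical similarity groups (illustration only, not a GCG scoring table).
GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def is_conservative(x: str, y: str) -> bool:
    """True if two different residues fall in the same group."""
    return x != y and any(x in g and y in g for g in GROUPS)


def similarity_line(top: str, bottom: str) -> str:
    """'|' for identity, ':' for a conservative substitution, ' ' otherwise."""
    marks = []
    for a, b in zip(top, bottom):
        if a == b:
            marks.append("|")
        elif is_conservative(a, b):
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


GGAMMA = "MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK"
HGCZG = "MGHFTEEDKATITSLWGHVNVDEAGGETIGRLLVLYPWTQRFFDSFGNLSSASAIMGNPK"
print(similarity_line(GGAMMA, HGCZG))
```

The five ":" positions are K/H, E/D, D/E, L/I and V/L, each a swap within one of the groups above.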
Pairwise Comparison
FrameAlign creates an optimal alignment of the best segment of
similarity (local alignment) between a protein sequence and the
codons in all possible reading frames on a single strand of a
nucleotide sequence. Optimal alignments may include reading
frame shifts.
Query: nucleotide sequence
Against: protein sequence

   3 GAAATCAAGAAGGCCATCAAGGAGGAATCTGAAGGCAAAATGAAGGGAAT 52
     |||||||||||||||||||||||||||||||||||||||:::||||||||
 261 GluIleLysLysAlaIleLysGluGluSerGluGlyLysLeuLysGlyIl 277

  53 TTTGGGATACTCTGAGGATGATGTTGTGTCTACCGACTTTGTTGGTGACA 102
     ||||||||||...|||||||||||||||||||||||||||||||||||||
 278 eLeuGlyTyrThrGluAspAspValValSerThrAspPheValGlyAspA 294

 103 ACAGGTCAAGCATTTTCGATGCCAAGGCTGGATTGCATTGCATTGAGCGA 152
     ||||||||||||||||||||||||||||||||    ||||||||||||||
 295 snArgSerSerIlePheAspAlaLysAlaGly....IleAlaLeuSerAs 309
FrameAlign always finds an alignment for any protein and nucleotide sequences
you compare, even if there is no significant similarity between them. You must
evaluate the results critically to decide whether the segment shown is a real
match or just a random region of relative similarity.
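FrameAlign compares the protein against the codons of the nucleotide sequence read in each frame. The translation step alone can be sketched in Python with the standard genetic code; `translate` and `three_frames` are illustrative names, and this sketch does not perform FrameAlign's alignment or frameshift handling.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame (0, 1 or 2) of a single strand.

    Incomplete trailing codons are dropped; codons containing other
    characters (e.g. N) become 'X'.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def three_frames(seq: str) -> list:
    """Translations of the three forward reading frames of one strand."""
    return [translate(seq, frame) for frame in range(3)]


segment = "GAAATCAAGAAGGCCATCAAGGAGGAATCTGAAGGCAAAATGAAGGGAAT"
print(three_frames(segment)[0])  # -> EIKKAIKEESEGKMKG
```

Frame 0 of the first nucleotide segment in the FrameAlign example differs from the aligned protein (EIKKAIKEESEGKLKG) only at Met versus Leu, the position marked ":::".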
Pairwise Comparison
BestFit    Percent Similarity: 94.251
GAP        Percent Identity: 89.22
Identity, Similarity and Homology
Identity and similarity are measurable properties
Homology implies evolutionary relatedness (shared ancestry), which is inferred from similarity rather than measured
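Identity and similarity can be computed directly from an alignment. In this Python sketch, gap columns are excluded from the denominator and similarity counts identities plus pairs from hand-picked residue groups. Both choices are assumptions; BestFit and GAP apply their own conventions, so these hypothetical functions will not necessarily reproduce the percentages above.

```python
GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]  # hypothetical similarity groups
GAPS = ".-~"


def aligned_pairs(top: str, bottom: str):
    """Column pairs in which neither row has a gap."""
    return [(a, b) for a, b in zip(top, bottom) if a not in GAPS and b not in GAPS]


def percent_identity(top: str, bottom: str) -> float:
    pairs = aligned_pairs(top, bottom)
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def percent_similarity(top: str, bottom: str) -> float:
    pairs = aligned_pairs(top, bottom)
    if not pairs:
        return 0.0
    similar = sum(a == b or any(a in g and b in g for g in GROUPS) for a, b in pairs)
    return 100.0 * similar / len(pairs)


NT_TOP = "AGACCAACCTGGCCAACATGGTGAAATCCCATCTCTAC.AAAAATACAAA"
NT_BOTTOM = "AGACCAGCCTGGCCAACATGGTGAAACTCCATCTCTACTGAAAATACAAA"
GGAMMA = "MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK"
HGCZG = "MGHFTEEDKATITSLWGHVNVDEAGGETIGRLLVLYPWTQRFFDSFGNLSSASAIMGNPK"
print(round(percent_identity(NT_TOP, NT_BOTTOM), 2))
print(round(percent_identity(GGAMMA, HGCZG), 2), percent_similarity(GGAMMA, HGCZG))
```

Similarity is always at least as high as identity, because every identical pair also counts as similar.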
Multiple Sequence Alignment
Compare three or more sequences to each other.
Uses
Identify conserved regions and motifs
Identify gene families
Generate a consensus sequence
First step in the study of phylogenetic relationships
Programs trade sensitivity and alignment quality for computational speed
Use of more than one program is advised
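A consensus sequence summarizes each column of an alignment. The sketch below uses a simple majority rule (a residue must fill more than half of the rows); the threshold, the "." placeholder and the name `consensus` are assumptions, and Pretty's own consensus rules may differ.

```python
from collections import Counter


def consensus(rows, gap_chars="~.-", threshold=0.5):
    """Majority-rule consensus of equal-length aligned rows.

    A column gets its most common residue if that residue fills more than
    `threshold` of the rows; otherwise (including all-gap columns) '.'.
    """
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        residues = [c for c in column if c not in gap_chars]
        if not residues:
            out.append(".")
            continue
        residue, count = Counter(residues).most_common(1)[0]
        out.append(residue if count / len(column) > threshold else ".")
    return "".join(out)


block = ["GAIGIDLGTT"] * 4 + ["KAVGIDLGTT"] * 3 + ["PAIGIDLGTT"] * 2 + ["PAVGIDLGTT"] * 2
print(consensus(block))  # -> .AIGIDLGTT
```

Applied to the GAIGIDLGTT block of the multiple alignment shown earlier, column 1 (G, K or P) has no majority residue and becomes ".", while column 3 keeps I because 6 of the 11 rows have it.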
Multiple Sequence Alignment
1. MEME:
Find conserved motifs in a group of unaligned sequences.
2. NoOverlap:
Identify the places where a group of nucleotide sequences do not share any common
subsequences.
3. OldDistances:
Make a table of the pairwise similarities within a group of aligned sequences.
4. Overlap:
Compare two sets of DNA sequences to each other in both orientations.
5. PileUp:
Create a multiple sequence alignment from a group of related sequences.
6. PlotSimilarity:
Plot the running average of the similarity among the sequences in a multiple sequence alignment.
7. Pretty:
Display multiple sequence alignments and calculate a consensus sequence.
8. PrettyBox:
Display multiple sequence alignments in PostScript format.
9. ProfileGap:
Make an optimal alignment between a profile and one or more sequences.
10. ProfileMake:
Create a position-specific scoring table, called a profile.
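A profile is a table of position-specific scores. This Python sketch builds a simplified log-odds table (column frequencies with pseudocounts, compared with a uniform background) and scores an ungapped segment against it. It illustrates the idea under those assumptions and is not ProfileMake's algorithm; `make_profile` and `score_segment` are hypothetical names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_profile(rows, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Simplified position-specific log-odds table, one dict per alignment column.

    Assumptions: uniform background frequencies, characters outside the
    alphabet (gaps) ignored, natural logarithm.
    """
    background = 1.0 / len(alphabet)
    profile = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in alphabet
        })
    return profile


def score_segment(profile, segment):
    """Sum of profile scores for an ungapped segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(profile, segment))


rows = ["GAIGIDLGTT"] * 4 + ["KAVGIDLGTT"] * 3 + ["PAIGIDLGTT"] * 2 + ["PAVGIDLGTT"] * 2
profile = make_profile(rows)
print(round(score_segment(profile, "GAIGIDLGTT"), 2), round(score_segment(profile, "WWWWWWWWWW"), 2))
```

Residues that are common in a column get positive scores and absent residues get negative ones, so a segment resembling the aligned family scores far higher than an unrelated one.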
PILEUP
PileUp creates a multiple sequence alignment from a group of related sequences by using a simplification of the progressive alignment method of Feng and Doolittle.
        1                                                   50
S11448  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MTFD GAIGIDLGTT YSCVGVWQNE
S06443  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MTFD GAIGIDLGTT YSCVGVWQNE
A25398  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MTYE GAIGIDLGTT YSCVGVWQNE
S06158  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MTYE GAIGIDLGTT YSCVGVWQNE
S42164  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~MS KAVGIDLGTT YSCVAHFAND
S20139  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~MS KAVGIDLGTT YSCVAHFSND
B36590  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~MS KAVGIDLGTT YSCVAHFAND
A25089  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~MAKSEG PAIGIDLGTT YSCVGLWQHD
S03250  ~~~~~~~~~~ ~~~~~~~~~~ ~~~MAGKGEG PAIGIDLGTT YSCVGVWQHD
A27077  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MSKG PAVGIDLGTT YSCVGVFQHG
S07197  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MSKG PAVGIDLGTT YSCVGVFQHG
A25646  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~MSGKG PAIGIDLGTT YSCVGVFQHG
S10859  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~MSARG PAIGIDLGTT YSCVGVFQHG
A29160  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MAKA AAVGIDLGTT YSCVGVFQHG
JH0095  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MAKN TAIGIDLGTT YSCVGVFQHG
A03310  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~MATKG VAVGIDLGTT YSCVGVFQHG
JT0285  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~MSKH NAVGIDLGTT YSCVGVFMHG
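Progressive methods align the most similar sequences first and then add more distant sequences or groups in order of similarity. The clustering step that fixes this order can be sketched as below, assuming equal-length ungapped sequences (distance = fraction of differing positions) and average-linkage (UPGMA-style) clustering. PileUp's own pairwise scoring and tree building are not reproduced here; `guide_tree` is a hypothetical helper.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of positions that differ between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def guide_tree(seqs: dict):
    """Average-linkage (UPGMA-style) clustering of named sequences.

    Returns a nested tuple of names; inner pairs are joined first, which is
    the order a progressive aligner would follow.
    """
    dist = {
        frozenset((x, y)): p_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }
    # Map each cluster (a name or a nested tuple) to its member names.
    clusters = {name: [name] for name in seqs}

    def cluster_dist(c1, c2) -> float:
        m1, m2 = clusters[c1], clusters[c2]
        total = sum(dist[frozenset((x, y))] for x in m1 for y in m2)
        return total / (len(m1) * len(m2))

    while len(clusters) > 1:
        # Join the closest pair of clusters into one new cluster.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))


toy = {"a": "GAIGIDLGTT", "b": "GAIGIDLGTS", "c": "KAVGIDLSND", "d": "KAVGIDLSNE"}
print(guide_tree(toy))  # -> (('a', 'b'), ('c', 'd'))
```

In a progressive aligner, a and b would be aligned first, then c and d, and finally the two group alignments would be aligned to each other.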
Sequence Files for PILEUP
gcg 1% pileup
gcg 2% Pileup of what sequences ?
Answer: (1) Use wildcards
Example: mouse.psq, rat.psq, human.psq, chicken.psq  ->  *.psq
Example: pkc.mouse, pkc.rat, pkc.human, pkc.chicken  ->  pkc.*
(2) Use list files: @heatshock.list
 This is a test list file
 ..
 hspmouse.naq
 /dir/HSP/hsprabbit.naq
 gb_in:m25181
 gb_ov:xlhsp Begin:486 End:2426 Strand:+
 \\ End of list
 Useless.dna
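Both ways of naming input sequences can be mimicked in Python. The list-file rules below are inferred from the example only: lines up to ".." are treated as comments, each later line is a sequence specification optionally followed by Attribute:value pairs, and a line starting with a backslash is assumed to end the list (so Useless.dna is not read). `parse_list_file` is a hypothetical helper, and `fnmatchcase` merely stands in for the shell-style wildcard matching.

```python
from fnmatch import fnmatchcase


def parse_list_file(text: str):
    """Return (sequence_spec, attributes) pairs from a list file.

    Assumed rules (inferred from the example, not from GCG documentation):
    lines up to '..' are comments; a line starting with a backslash ends the list.
    """
    lines = [line.strip() for line in text.splitlines()]
    start = lines.index("..") + 1 if ".." in lines else 0
    entries = []
    for line in lines[start:]:
        if not line:
            continue
        if line.startswith("\\"):
            break
        spec, *fields = line.split()
        attributes = dict(field.split(":", 1) for field in fields if ":" in field)
        entries.append((spec, attributes))
    return entries


EXAMPLE = r"""This is a test list file
..
hspmouse.naq
/dir/HSP/hsprabbit.naq
gb_in:m25181
gb_ov:xlhsp Begin:486 End:2426 Strand:+
\\ End of list
Useless.dna
"""

for spec, attributes in parse_list_file(EXAMPLE):
    print(spec, attributes)

# Wildcard selection, as in "pileup pkc.*"
names = ["pkc.mouse", "pkc.rat", "pkc.human", "pkc.chicken", "mouse.psq"]
print([name for name in names if fnmatchcase(name, "pkc.*")])
```

A list file is the better choice when the sequences do not share a common naming pattern or come from different databases and directories, as in the heat-shock example.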
Preparing an Alignment as a Figure
SeqWEB
Save in HTML format
GCG Unix
Use PrettyBox to build a PostScript file
Transfer it to a PC
Open it with graphics software
Done by hand with a word processor
Transfer *.pair or *.msf files to a PC
Set the font to Courier or another fixed-width font
Use shaded boxes to highlight important domains
Use color sparingly; reserve red for the most important feature
EXERCISE 1
BestFit and GAP
"fetch" the following sequences:
genbank:k02938 (Xenopus 5S RNA gene transcription factor TFIIIA mRNA)
genbank:x15785 (Xenopus TFIIIA gene 5' region)
Run:
(A) bestfit: call the output display file best.pair
(B) gap: call the output display file gap.pair
-->cat best.pair
-->cat gap.pair
-->Compare the results
EXERCISE 2
PileUP
"fetch" the following sequences:
sw:capb_chick
sw:capb_mouse
sw:capb_human
sw:capb_caeel
-->Run pileup capb_*.*
-->call the output display file fetch.msf
(3) Create a list file on your PC as follows:
sw:capb_chick
sw:capb_mouse
sw:capb_human
sw:capb_caeel
-->and save it as capb.txt
-->use FTP to transfer the file to your GCG account
-->pileup @capb.txt
-->call the output display file list.msf
-->Compare list.msf and fetch.msf
EXERCISE 3
Pretty and Prettybox
(A) Use "Pretty" to display *.msf files
-->pretty fetch.msf{*}
-->call the output display file fetch.pretty
-->cat fetch.pretty
(B) Use "Prettybox" to display the alignment as a PostScript file
-->prettybox fetch.msf{*}
-->call the output display file fetch.ps
-->use FTP to transfer the file to your PC
-->Use Photoshop, CorelDraw, Paint Shop Pro or Ghostview to open the file
-->Download gsv27550.exe (ftp://163.25.92.42)