TDR-HAT Bioinformatics Course
Etienne de Villiers
Sonal Patel
ILRI - Kenya
Outline
1. Introduction
2. Nucleic acid sequence analysis
3. Protein sequence analysis
4. Accessing Completed Genomes
5. Homology Searching
6. Multiple Sequence Alignments
7. Comparative Genomics
A gene codes for a protein
Gene/DNA
CCTGAGCCAACTATTGATGAA
transcription
mRNA
CCUGAGCCAACUAUUGAUGAA
translation
Protein
PEPTIDE
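The two steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and a coding strand that is already in frame; the function names `transcribe` and `translate` are made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an in-frame mRNA, stopping at the first stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":  # stop codon
            break
        protein.append(amino)
    return "".join(protein)


mrna = transcribe("CCTGAGCCAACTATTGATGAA")
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```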
Eukaryotes have spliced genes…
Promises of genomics and
bioinformatics
• Medicine
– Knowledge of protein structure facilitates drug
design
– Understanding of genomic variation allows the
tailoring of medical treatment to the individual’s
genetic make-up
– Genome analysis allows the targeting of genetic
diseases
– The effect of a disease or of a therapeutic on RNA
and protein levels can be elucidated
• The same techniques can be applied to
biotechnology, crop and livestock
improvement, etc...
What is bioinformatics?
• Application of information technology to the
storage, management and analysis of
biological information
• Facilitated by the use of computers
What is bioinformatics?
• Sequence analysis
– Geneticists/ molecular biologists analyse genome sequence
information to understand disease processes
• Molecular modeling
– Crystallographers/ biochemists design drugs using computer-aided
tools
• Phylogeny/evolution
– Geneticists obtain information about the evolution of organisms by
looking for similarities in gene sequences
• Ecology and population studies
– Bioinformatics is used to handle large amounts of data obtained in
population studies
Sequence analysis: overview
[Flowchart]
• Sequence entry (sequencing project management, sequence database browsing, manual sequence entry) produces a nucleotide sequence file or a protein sequence file.
• Nucleotide sequence analysis: search for protein coding regions (coding vs. non-coding); search databases for similar sequences; sequence comparison; search for known motifs; RNA structure prediction; design further experiments (restriction mapping, PCR planning); translate into protein.
• Protein sequence analysis: search databases for similar sequences; sequence comparison; search for known motifs; predict secondary structure; predict tertiary structure; protein family analysis.
• Multiple sequence analysis: create a multiple sequence alignment; edit the alignment; format the alignment for publication; molecular phylogeny.
Gene Sequencing
Automated chemical sequencing methods allow rapid generation of
large data banks of gene sequences
Database similarity searching
The BLAST program was written to allow rapid comparison of a new gene
sequence with the hundreds of thousands of gene sequences in databases.
Sequences producing significant alignments:                          Score (bits)  E value
gnl|PID|e252316 (Z74911) ORF YOR003w [Saccharomyces cerevisiae]      112           7e-26
gi|603258 (U18795) Prb1p: vacuolar protease B [Saccharomyces ce...   106           5e-24
gnl|PID|e264388 (X59720) YCR045c, len:491 [Saccharomyces cerevi...   69            7e-13
gnl|PID|e239708 (Z71514) ORF YNL238w [Saccharomyces cerevisiae]      30            0.66
gnl|PID|e239572 (Z71603) ORF YNL327w [Saccharomyces cerevisiae]      29            1.1
gnl|PID|e239737 (Z71554) ORF YNL278w [Saccharomyces cerevisiae]      29            1.5

gnl|PID|e252316 (Z74911) ORF YOR003w [Saccharomyces cerevisiae]
Length = 478
Score = 112 bits (278), Expect = 7e-26
Identities = 85/259 (32%), Positives = 117/259 (44%), Gaps = 32/259 (12%)

Query: 2   QSVPWGISRVQAPAAHNRG---------LTGSGVKVAVLDTGIST-HPDLNIRGG-ASFV 50
           +  PWG+ RV        G           G GV   VLDTGI T H D   R    + +
Sbjct: 174 EEAPWGLHRVSHREKPKYGQDLEYLYEDAAGKGVTSYVLDTGIDTEHEDFEGRAEWGAVI 233

Query: 51  PGEPSTQDGNGHGTHVAGTIAALNNSIGVLGVAPSAELYXXXXXXXXXXXXXXXXXQGLE 110
           P      D NGHGTH AG I + +      GVA + ++                  +G+E
Sbjct: 234 PANDEASDLNGHGTHCAGIIGSKH-----FGVAKNTKIVAVKVLRSNGEGTVSDVIKGIE 288
Sequence comparison
Gene sequences can be aligned to see similarities between genes from
different sources
768 TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG 813
    ||     ||   ||  |  | ||| |  |||| |||||    |||  |||
87  TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG 135

814 AGTGTTTGAGCCTCTGTTTGTGTGTAATTGAGTGTGCATGTGTGGGAGTG 863
    |  | |              |  |||||| |   ||||  | || |   |
136 AAGGATC.............TCAGTAATTAATCATGCACCTATGTGGCGG 172

864 AAATTGTGGAATGTGTATGCTCATAGCACTGAGTGAAAATAAAAGATTGT 913
    ||| | ||| || || |||   |   |||||||||  ||   |||||| |
173 AAA.TATGGGATATGCATGTCGA...CACTGAGTG..AAGGCAAGATTAT 216
Restriction mapping
Genes can be analyzed to detect gene sequences that can
be cleaved with restriction enzymes
[Restriction map figure; axis ticks at 50, 100, 150, 200 and 250. The enzymes, number of sites and recognition sequences from the figure are tabulated below.]
Enzyme     Sites  Recognition sequence
AceIII     1      CAGCTCnnnnnnn’nnn...
AluI       2      AG’CT
AlwI       1      GGATCnnnn’n_
ApoI       2      r’AATT_y
BanII      1      G_rGCy’C
BfaI       2      C’TA_G
BfiI       1      ACTGGG
BsaXI      1      ACnnnnnCTCC
BsgI       1      GTGCAGnnnnnnnnnnn...
BsiHKAI    1      G_wGCw’C
Bsp1286I   1      G_dGCh’C
BsrI       2      ACTG_Gn’
BsrFI      1      r’CCGG_y
CjeI       2      CCAnnnnnnGTnnnnnn...
CviJI      4      rG’Cy
CviRI      1      TG’CA
DdeI       2      C’TnA_G
DpnI       2      GA’TC
EcoRI      1      G’AATT_C
HinfI      2      G’AnT_C
MaeIII     1      ’GTnAC_
MnlI       1      CCTCnnnnnn_n’
MseI       2      T’TA_A
MspI       1      C’CG_G
NdeI       1      CA’TA_TG
Sau3AI     2      ’GATC_
SstI       1      G_AGCT’C
TfiI       2      G’AwT_C
Tsp45I     1      ’GTsAC_
Tsp509I    3      ’AATT_
TspRI      1      CAGTGnn’
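Finding where an enzyme can cut comes down to pattern matching. The sketch below is a simplified illustration, not the program used for the map above: it searches only the given strand (enough for palindromic sites such as EcoRI), ignores the cut position within the site, and the helper name `find_sites` is hypothetical. Ambiguity codes (r, y, w, s, d, h, n) are expanded using the standard IUPAC nucleotide alphabet.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "D": "[AGT]", "H": "[ACT]", "N": "[ACGT]",
}


def find_sites(seq, site):
    """Return 1-based start positions of every occurrence of `site` in `seq`."""
    pattern = "".join(IUPAC[base] for base in site.upper())
    # A lookahead is used so that overlapping occurrences are all reported.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", seq.upper())]


seq = "AAGAATTCAGCTTT"              # made-up test sequence
print(find_sites(seq, "GAATTC"))    # EcoRI
print(find_sites(seq, "AGCT"))      # AluI
print(find_sites(seq, "RAATTY"))    # ApoI
```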
PCR Primer Design
Oligonucleotides for use in the polymerase chain reaction can
be designed using computer-based programs
OPTIMAL primer length                --> 20
MINIMUM primer length                --> 18
MAXIMUM primer length                --> 22
OPTIMAL primer melting temperature   --> 60.000
MINIMUM acceptable melting temp      --> 57.000
MAXIMUM acceptable melting temp      --> 63.000
MINIMUM acceptable primer GC%        --> 20.000
MAXIMUM acceptable primer GC%        --> 80.000
Salt concentration (mM)              --> 50.000
DNA concentration (nM)               --> 50.000
MAX no. unknown bases (Ns) allowed   --> 0
MAX acceptable self-complementarity  --> 12
MAXIMUM 3' end self-complementarity  --> 8
GC clamp how many 3' bases           --> 0
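A few of these parameters can be checked by hand. The sketch below is only a rough illustration: it uses the simple Wallace rule (Tm = 2(A+T) + 4(G+C)) as a stand-in for the melting-temperature calculation, whereas primer design programs use more elaborate formulas that also account for salt and DNA concentration. The limits are copied from the parameter list above; the function names are made up for this example.

```python
# Acceptable ranges taken from the parameter list above.
LIMITS = {"length": (18, 22), "tm": (57.0, 63.0), "gc": (20.0, 80.0)}


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature (degrees C) by the Wallace rule (an approximation)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def primer_problems(primer, limits=LIMITS):
    """Return the names of the parameters that fall outside their limits."""
    values = {"length": len(primer), "tm": wallace_tm(primer), "gc": gc_percent(primer)}
    return [name for name, (low, high) in limits.items() if not low <= values[name] <= high]


print(primer_problems("ATGCATGCATGCATGCATGC"))  # within all limits
print(primer_problems("ATATATATATATATATAT"))    # Tm and GC% too low
```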
Gene discovery
Computer programs can be used to recognize the
protein coding regions in DNA
[Figure: three stacked plots, each with a y-axis from 0 to 2.0, over sequence positions 0 to 4,000]
Plot created using codon preference (GCG)
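The simplest way to look for candidate coding regions is to scan for open reading frames (ORFs). The sketch below is a much cruder technique than the codon preference statistic used for the plot above: it only reports stretches that start with ATG and end at a stop codon in the three forward frames, and it ignores the reverse strand and introns. The function name and the `min_codons` cut-off are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (frame, start, end) for forward-strand ORFs; `end` is exclusive, 0-based."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                   # look for the next ORF in this frame
    return orfs


dna = "CCATGAAACCCGGGTAACC"  # made-up test sequence
for frame, start, end in find_orfs(dna):
    print(frame, dna[start:end])
```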
Protein structure prediction
Particular structural features can be recognized in protein sequences
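[Figure: plots along the protein sequence (residues 50 and 100 marked) of KD hydrophobicity (-5.0 to 5.0), surface probability (0.0 to 10), flexibility (0.8 to 1.2) and antigenic index (-1.7 to 1.7), with tracks for CF turns, CF alpha helices, CF beta sheets, GOR turns, GOR alpha helices, GOR beta sheets and glycosylation sites]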
Protein Structure
The 3-D structure of
proteins is used to
understand protein
function and design
new drugs
Multiple sequence alignment
Sequences of proteins from different organisms can be
aligned to see similarities and differences
Alignment formatted using Jalview
Phylogeny inference
Analysis of sequences allows evolutionary relationships to be determined
E.coli
C.botulinum
C.cadaveris
C.butyricum
B.subtilis
B.cereus
Phylogenetic tree constructed using the Phylip package
DNA sequence analysis
Inferring function by homology
• The fact that functionally important aspects of
sequences are conserved across evolutionary time
allows us to find, by homology searching, the
equivalent genes in one species to those known to be
important in other model species.
• Logic: if the linear alignment of a pair of sequences is
similar, then we can infer that the 3-dimensional
structure is similar; if the 3-D structure is similar then
there is a good chance that the function is similar.
Basic Local Alignment Search Tool
(BLAST)
• BLAST programs (there are several) compare a query sequence
to all the sequences in a database in a pairwise manner.
• Breaks the query and database sequences into fragments known
as "words", and seeks matches between them.
• Looks for query words of length "W" that align with words in the
database with a score of at least a threshold value, "T".
• These word hits are then extended in either direction in an attempt to
generate an alignment with a score exceeding another
threshold, "S"; such alignments are known as High-Scoring Segment Pairs (HSPs).
• The highest-scoring of these is the Maximal-Scoring Segment Pair
(MSP).
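The word-and-extend idea can be sketched as follows. This is a deliberately simplified illustration, not the BLAST implementation: words must match exactly (real BLAST also accepts similar words scoring at least T), extension is ungapped, and the +1/-1 scores and the `drop` cut-off are assumed values chosen for this example.

```python
from collections import defaultdict


def words(seq, w):
    """Yield (offset, word) for every overlapping word of length w."""
    for i in range(len(seq) - w + 1):
        yield i, seq[i:i + w]


def seed_hits(query, subject, w=3):
    """Return (query_offset, subject_offset) pairs where a word matches exactly."""
    index = defaultdict(list)
    for qi, word in words(query, w):
        index[word].append(qi)
    return [(qi, si) for si, word in words(subject, w) for qi in index.get(word, [])]


def extend(query, subject, qi, si, w, match=1, mismatch=-1, drop=2):
    """Extend a seed without gaps; stop once the score falls `drop` below the best seen.

    Returns (best_score, query_start, query_end) with an exclusive end.
    """
    best = w * match
    left = right = 0
    score, k = best, 0
    while qi + w + k < len(query) and si + w + k < len(subject):   # extend to the right
        score += match if query[qi + w + k] == subject[si + w + k] else mismatch
        k += 1
        if score > best:
            best, right = score, k
        elif best - score >= drop:
            break
    score, k = best, 0
    while qi - k - 1 >= 0 and si - k - 1 >= 0:                     # extend to the left
        score += match if query[qi - k - 1] == subject[si - k - 1] else mismatch
        k += 1
        if score > best:
            best, left = score, k
        elif best - score >= drop:
            break
    return best, qi - left, qi + w + right


query, subject = "GARFIELDTHECAT", "XXGARFIELDTHERATXX"
hits = seed_hits(query, subject)
print(hits[0], extend(query, subject, *hits[0], w=3))
```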
2 sequence alignment
To align GARFIELDTHECAT with
GARFIELDTHERAT is easy
GARFIELDTHECAT
||||||||||| ||
GARFIELDTHERAT
Gaps
Sometimes, you can get a better overall
alignment if you insert gaps
GARFIELDTHECAT
||||||||   |||
GARFIELDA--CAT
is better (scores higher) than
GARFIELDTHECAT
||||||||
GARFIELDACAT
No gap penalty
But there has to be some sort of gap penalty, otherwise you can align ANY two
sequences:
G-R--E------AT
| |  |      ||
GARFIELDTHECAT
Affine gap penalty
• Could set a fixed score for each indel
• Usually use an affine gap penalty
– open + (extend × gap length)
• Open -10, extend -0.05
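To see why the gap penalty matters, the alignments from the previous slides can be scored directly. This is a minimal sketch assuming the usual affine form, penalty = open + extend × gap length, with the open and extend values from the slide; the match and mismatch scores (+5 and -4) are assumptions added for this example.

```python
def gap_penalty(length, gap_open=-10.0, gap_extend=-0.05):
    """Affine penalty for one gap of the given length (zero if there is no gap)."""
    return gap_open + gap_extend * length if length > 0 else 0.0


def score_alignment(a, b, match=5.0, mismatch=-4.0, gap_open=-10.0, gap_extend=-0.05):
    """Score two aligned strings of equal length, where '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score, gap_len = 0.0, 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gap_len += 1                      # still inside a gap
            continue
        score += gap_penalty(gap_len, gap_open, gap_extend)  # close any open gap
        gap_len = 0
        score += match if x == y else mismatch
    return score + gap_penalty(gap_len, gap_open, gap_extend)


silly = ("G-R--E------AT", "GARFIELDTHECAT")
print(score_alignment(*silly, gap_open=0.0, gap_extend=0.0))  # no gap penalty
print(score_alignment(*silly))                                # affine gap penalty
print(score_alignment("GARFIELDA--CAT", "GARFIELDTHECAT"))
```

With no gap penalty the first alignment is rewarded for its five matches alone; with the affine penalty its three gaps cost more than the matches gain.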
2+ similar sequences
• When doing a similarity search against a database
you are trying to decide which of many sequences is the
CLOSEST match to your search sequence.
• Which of the following alignment pairs is better?:
Scoring Alignments
GARFIELDTHECAT
||||   |||||||
GARFRIEDTHECAT

GARFIELDTHECAT
||| |||  |||||
GARWIELESHECAT

GARFIELDTHECAT
||  ||||||| ||
GAVGIELDTHEMAT
?
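Counting identical positions is the simplest score, and a quick check shows why it is not enough here. The helper below is a hypothetical illustration; it assumes the two sequences are already aligned and of equal length.

```python
def identities(a, b):
    """Number of aligned positions where both sequences carry the same residue."""
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


query = "GARFIELDTHECAT"
for hit in ("GARFRIEDTHECAT", "GARWIELESHECAT", "GAVGIELDTHEMAT"):
    print(hit, identities(query, hit), "/", len(query))
```

All three pairs share the same number of identities, so identity alone cannot rank them. Deciding which is the closest match needs a scoring scheme that treats some substitutions as more acceptable than others, which is the role of a substitution matrix.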
Low Complexity Masking
• Some sequences are similar even if they have no
recent common ancestor.
• Huntington's disease is caused by poly-CAG tracts in
the DNA, which result in polyglutamine (Gln, Q)
tracts in the protein.
• If you do a homology search with QQQQQQQQQQ
you get hits to other proteins that have a lot of
glutamines but have totally different function.
2 sequence alignment
Huntingtin:
MATLEKLMKA FESLKSFQQQ QQQQQQQQQQ
QQQQQQQQQQ PPPPPPPPPP PQLPQPPPQA
hits
>MM16_MOUSE MATRIX METALLOPROTEINASE-16
Score = 34.4 bits (78), Expect = 0.18
Identities = 21/65 (32%), Positives = 25/65 (38%), Gaps = 2/65 (3%)
Query: FQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQ--AQPLLPQPQPPPPPP
       F Q  +    +      Q  Q+   PP   PPP   LP  PP     P  P+    P PP
Sbjct: FYQYMETDNFKLPNDDLQGIQKIYGPPDKIPPPTRPLPTVPPHRSVPPADPRRHDRPKPP
But not because it is involved in microtubule mediated
transport!
E-values
• An E-value is the number of hits scoring at least as well as
the given hit that would be expected to occur by chance.
• Dependent on the size of the query sequence and
the database.
• The lower the E-value the more confidence you
can have that a hit is a true homologue (sequence
related by common descent).
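The dependence on query and database size can be made concrete with the standard relation between a normalized bit score S' and the expected number of chance hits, E = m × n × 2^(-S'), where m is the query length and n the total database length. The sketch below uses raw lengths; BLAST itself uses slightly shorter "effective" lengths, so its reported values will differ somewhat. The function name and the example lengths are made up.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits with at least this bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Same bit score, database ten times larger: the E-value is ten times larger.
print(expect_value(40, query_length=250, database_length=1_000_000))
print(expect_value(40, query_length=250, database_length=10_000_000))
```

This is why the same alignment can look significant against a small database and unremarkable against a large one.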
Protein sequence analysis
1. Physico-chemical properties.
2. Cellular localization.
3. Signal peptides.
4. Transmembrane domains.
5. Post-translational modifications.
6. Motifs & domains.
7. Secondary structure.
8. Other resources.
ExPASy (Expert Protein Analysis System)
• Swiss Institute of Bioinformatics (SIB).
• Dedicated to the analysis of protein
sequences and structures.
• Many of the programs for protein sequence
analysis can be accessed via ExPASy.
1) Physico-chemical properties:
• ProtParam tool
o molecular weight
o theoretical pI (the pH at which the protein carries no net electrical charge)
o amino acid composition
o atomic composition
o extinction coefficient
o estimated half-life
o instability index
o aliphatic index
o grand average of hydropathicity (GRAVY)
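Two of these properties are easy to compute by hand. The sketch below is an independent illustration rather than ProtParam's own code: it counts the amino acid composition and computes GRAVY as the mean Kyte-Doolittle hydropathy value over all residues. The function names are made up for this example.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(protein):
    """Return a mapping from residue to its count in the sequence."""
    return dict(Counter(protein.upper()))


def gravy(protein):
    """Grand average of hydropathicity: the mean hydropathy value per residue."""
    protein = protein.upper()
    return sum(KYTE_DOOLITTLE[residue] for residue in protein) / len(protein)


print(composition("PEPTIDE"))
print(round(gravy("PEPTIDE"), 3))  # negative values indicate a hydrophilic sequence
```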
2) Cellular localization:
• Proteins destined for particular subcellular
localizations have distinct amino acid properties
particularly in their N-terminal regions.
• This can be used to predict whether a protein is localized in the
cytoplasm, nucleus or mitochondria, is retained in
the ER, or is destined for the lysosome (vacuole) or the
peroxisome.
• PSORT
• The end of the PSORT output gives the percentage likelihood of each
subcellular localization.
3) Signal peptides:
• Proteins destined for secretion, for operation within the
endoplasmic reticulum or lysosomes, and many
transmembrane proteins are synthesized with leading
(N-terminal) 13 – 36 residue signal peptides.
• SignalP WWW server can be used to predict the
presence and location of signal peptide cleavage
sites in your proteins.
• Useful to know whether your protein has a signal
peptide as it indicates that it may be secreted from
the cell.
• Proteins in their active form will have their signal
peptides removed.
4) Transmembrane domains:
• The TMpred program makes a prediction of membrane-spanning regions and their orientation.
• Algorithm is based on the statistical analysis of
TMbase, a database of naturally occurring
transmembrane proteins.
• Presence of transmembrane domains is an indication
that the protein is membrane-bound, for example located on the cell surface.
5) Post-translational modifications:
• After translation has occurred, proteins may undergo a number
of post-translational modifications.
• These can include cleavage of the pro-region to release the active
protein, removal of the signal peptide, and numerous
covalent modifications such as acetylations, glycosylations,
hydroxylations, methylations and phosphorylations.
• Post-translational modifications may alter the molecular weight of
your protein and thus its position on a gel.
• Many programs are available for predicting the presence of
post-translational modifications. We will take a look at one for the
prediction of O-type glycosylation sites in mammalian proteins.
• These programs work by looking for consensus sites and just
because a site is found does not mean that a modification
definitely occurs.
6) Motifs and Domains:
• Motifs and domains give you information on the
function of your protein.
• Search the protein against one of the motif or profile
databases.
• ProfileScan allows you to search both the
Prosite and Pfam databases simultaneously.
7) Secondary Structure Prediction:
• WHY:
– If protein structure, even secondary structure, can be
accurately predicted from the now abundantly available gene
and protein sequences, such sequences become immensely
more valuable for the understanding of drug design, the
genetic basis of disease, the role of protein structure in its
enzymatic, structural, and signal transduction functions, and
basic physiology from molecular to cellular, to fully systemic
levels.
• JPRED - works by combining a number of modern,
high quality prediction methods to form a consensus.
Secondary Structure Prediction
• Essentially, protein secondary structure
consists of 3 major conformations:
– α helix
– β pleated sheet
– coil conformation
Accessing Completed Genomes
1. GeneDB
2. TigrDB
3. TIGR Gene Indices
4. Ensembl
5. NCBI Genomic Biology
6. Accessing other genomes
GeneDB
http://www.genedb.org
• Multi-organism BLAST
• Datasets from Fungi, Bacteria, Protozoa, Parasite
Vectors
• Curated Genome Database for three major
organisms:
– Schizosaccharomyces pombe,
– Leishmania major and
– Trypanosoma brucei
TigrDB
http://www.tigr.org/tdb/parasites/
• Access to parasites sequenced at TIGR
• Genome Annotation Database
• Relevant to WHO/TDR Pathogens:
– Trypanosomes, Schistosoma mansoni, Brugia
malayi
TIGR Gene Indices
• http://www.tigr.org/tdb/tgi/
• Organism specific databases providing EST
and gene sequence transcripts.
• Gene indexes available for:
– Animals
– Plants
– Protists and
– Fungi.
Ensembl
• Ensembl is a joint project between EMBL-EBI and the Sanger
Institute to develop a software system which produces and maintains
automatic annotation on eukaryotic genomes.
NCBI Genomic Biology
http://www.ncbi.nlm.nih.gov
• Literature
• Nucleotide Sequence
• Protein Sequence
• Complete Genomes
• Genome Maps
• … etc
Homology Searching
What is BLAST?
BLAST® (Basic Local Alignment Search Tool) is a set of
similarity search programs designed to explore all of the
available sequence databases regardless of whether the
query is protein or DNA.
“Local” means it searches and aligns sequence segments,
rather than aligning the entire sequence. It is able to detect
relationships among sequences which share only isolated
regions of similarity.
Currently, it is the most popular and most accepted
sequence analysis tool.
Why BLAST?
• Identify unknown sequences - The best way to identify an
unknown sequence is to see if that sequence already exists
in a public database. If the database sequence is a well-characterized
sequence, then you may have access to a
wealth of biological information.
• Help gene/protein function and structure prediction – genes
with similar sequences tend to share similar functions or
structure.
• Identify protein family – group related (paralog or ortholog)
genes and their proteins into a family.
• Prepare sequences for multiple alignments
• And more …
Blast 1
Blast 2
Low Complexity masking
>GDB1_WHEAT
MKTFLVFALIAVVATSAIAQMETSCISGLERPWQQQPLPPQQSFSQQPPFSQQQQQPLPQ
QPSFSQQQPPFSQQQPILSQQPPFSQQQQPVLPQQSPFSQQQQLVLPPQQQQQQLVQQQI
PIVQPSVLQQLNPCKVFLQQQCSPVAMPQRLARSQMWQQSSCHVMQQQCCQQLQQIPEQS
RYEAIRAIIYSIILQEQQQGFVQPQQQQPQQSGQGVSQSQQQSQQQLGQCSFQQPQQQLG
QQPQQQQQQQVLQGTFLQPHQIAHLEAVTSIALRTLPTMCSVNVPLYSATTSVPFGVGTG
VGAY
>GDB1_WHEAT
SEG filtered
MKTFLVFALIAVVATSAIAQMETSCISGLERPWXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXLNPCKVFLQQQCSPVAMPQRLARSQMWXXXXXXXXXXXXXXXXXXXXXXX
RYEAIRAIIYSIIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXHQIAHLEAVTSIALRTLPTMCSVNVPLYSATTSVPFGVGTG
VGAY
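The effect of masking can be imitated with a very simple filter. The sketch below is a simplified stand-in and not the SEG algorithm: it slides a fixed window along the sequence and replaces any window whose Shannon entropy is below a cut-off with X. The window size of 12 and the threshold of 2.2 bits are assumed values chosen for this example.

```python
import math
from collections import Counter


def entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace every window whose entropy is below `threshold` with X characters."""
    masked = list(seq)
    for i in range(len(seq) - window + 1):
        if entropy(seq[i:i + window]) < threshold:
            masked[i:i + window] = "X" * window
    return "".join(masked)


huntingtin = "MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQA"
print(mask_low_complexity(huntingtin))
```

The poly-Q and poly-P tracts are masked while the more varied N-terminal residues are left intact, so a database search is driven by the informative part of the sequence.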
Blast limit by taxon
Blast results
Interpret BLAST results - Distribution
[Figure callouts: the query sequence; the BLAST hits, which can be clicked to access the pairwise alignment]
This image shows the distribution of BLAST hits on the query
sequence. Each line represents a hit. The span of a line
represents the region where similarity is detected. Different
colors represent different ranges of scores.
Interpret BLAST results - Description
The description (also called definition) lines are listed below under the heading
"Sequences producing significant alignments". The term "significant" simply refers
to all those hits whose E value was less than the threshold. It does not imply
biological significance.
[Figure callouts]
• ID (GI #, refseq #, DB-specific ID #): click to access the record in GenBank
• Gene/sequence definition
• Bit score: higher is better. Click to access the pairwise alignment
• Expect value: lower is better. It indicates how likely it is that this is a random hit
• Links
Interpret BLAST results - pairwise alignments
Query line: the segment from the query sequence.
Sbjct line: the segment from the hit (subject) sequence.
Middle line: the consensus bases.
Summary - If your sequence is NUCLEOTIDE
Length | DB | Purpose | Program
20 bp or longer | Nucl | Identify the query sequence | MegaBlast, blastn
20 bp or longer | Nucl | Find sequences similar to query sequence | blastn
20 bp or longer | Nucl | Find similar proteins to translated query in a translated database | tblastx
20 bp or longer | Prot | Find similar proteins to translated query in a protein database | blastx
7-20 bp | Nucl | Find primer binding sites or map short contiguous motifs | Search for short, nearly exact matches
Summary - If your sequence is PROTEIN
Length | DB | Purpose | Program
15 residues or longer | Prot | Identify the query sequence or find protein sequences similar to query | blastp
15 residues or longer | Prot | Find members of a protein family or build a custom position-specific score matrix | PSI-blast
15 residues or longer | Prot | Find proteins similar to the query around a given pattern | PHI-blast
15 residues or longer | Nucl | Find similar proteins in a translated nucleotide database | tblastn
5-15 residues | Prot | Search for peptide motifs | Search for short, nearly exact matches
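The core of both summary tables is a lookup from query type and database type to a BLAST program. A minimal sketch of that lookup is shown below; the function name and its arguments are hypothetical, and the specialised options (MegaBlast, PSI-blast, PHI-blast, short-match searches) are left out.

```python
def choose_blast_program(query, database, compare_as_protein=False):
    """Pick a basic BLAST program.

    `query` and `database` are "nucleotide" or "protein". `compare_as_protein`
    only matters when both are nucleotide: it selects a translated comparison.
    """
    if query == "nucleotide" and database == "nucleotide":
        return "tblastx" if compare_as_protein else "blastn"
    if query == "nucleotide" and database == "protein":
        return "blastx"      # query is translated in all six frames
    if query == "protein" and database == "protein":
        return "blastp"
    if query == "protein" and database == "nucleotide":
        return "tblastn"     # database is translated in all six frames
    raise ValueError("query and database must be 'nucleotide' or 'protein'")


print(choose_blast_program("nucleotide", "protein"))
print(choose_blast_program("protein", "nucleotide"))
```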
Multiple Sequence Alignments
Why Do MSAs?
• Although BLAST may give you a good E-value, an MSA is more
convincing evidence that a protein is related and can be aligned over its entire
length.
• Identification of conserved regions or domains in proteins.
– Regions that are evolutionarily conserved are likely to be important
for structure/function.
– Mutations in these areas are more likely to affect function.
• Identification of conserved residues in proteins.
• Prerequisite for doing phylogenetic trees.
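Once sequences are aligned, conserved residues can be read off column by column. The sketch below is a minimal illustration that assumes the alignment is given as equal-length strings with "-" for gaps; it reports only fully identical columns, whereas alignment viewers usually also score similar residues. The function name is made up.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where every sequence has the same residue."""
    return [
        i for i, column in enumerate(zip(*alignment))
        if len(set(column)) == 1 and "-" not in column
    ]


alignment = [
    "GARFIELDTHECAT",
    "GARFRIEDTHECAT",
    "GARWIELESHECAT",
]
columns = conserved_columns(alignment)
print(columns)
# Mark conserved positions with '*' under the alignment.
print("".join("*" if i in columns else " " for i in range(len(alignment[0]))))
```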
Identification of conserved domains
How MSAs are computed
T-Coffee Vs Clustal
• ClustalW is the standard program for MSAs.
• However, the newer program T-Coffee often does
a better job, particularly with more distantly
related proteins.
Comparative Genomics
The Big Picture: Why compare?
• Conservation over long evolutionary distances suggests
functional constraints;
– useful for discovery of genes and other functional elements
– lack of conservation over short distances may be indicative of
adaptive evolution
• Characterizing the differences between organisms reveals
insights into the mechanisms of change.
• Leveraging knowledge between species, e.g. from well-characterized
model systems to species of strategic or
economic interest.
• Correlating intraspecies genotypic and phenotypic variation.
Matching Apples and Oranges:
Similarity/Homology
• Regardless of what the unit is, all comparisons require some
objective metric for defining how to match; e.g. in silico
hybridization protocol, substitution matrix, similarity scores
• According to the selected definition, similarity is an
observed/computed fact
• Homology is an inference about common ancestry usually
based on similarity and some underlying model of evolution
• Convergent evolution can result in similarity without homology
• Mutational saturation obscures homology over time, especially
in “neutral” areas
Matching Lemons and Limes:
Shades of homology
• Orthology: denotes descent from common precursor via
“speciation” event; the basic “copy” operation with divergence
• Paralogy: denotes descent from common precursor via
intraspecies duplication event; single element, segmental, whole
genome
• Horizontal transfer: denotes descent from common precursor via
interspecies transfer
• Gene fusion, Gene loss, Exon skipping/shuffling…
Artemis
• Artemis is a DNA viewer program.
• View EMBL and GenBank style files.
• View Prokaryotic and Eukaryotic annotations.
• Display genome features on a six-frame
translation.
• Is the main annotation tool used for analysis
of microbial genomes at the Sanger Institute.
ARTEMIS - example
ACT (Artemis Comparison Tool)
• A DNA sequence comparison viewer.
• Based on Artemis.
• Visualise multiple genome comparisons.
• The comparison shown in ACT is usually the result of running a blastn
or tblastx search.
• Retains all functionality of Artemis
The ACT Display
[Figure labels: genome1, genome2 and genome3 sequence panels; BLAST HSPs; zoom scroll bar; filter scroll bar]
ACT
• Designed for looking at complete bacterial
genomes.
ACT - example
• Trypanosoma brucei chromosome 1 versus Trypanosoma
cruzi chromosome 3
Running ACT
[Flowchart] Sequence 1 and Sequence 2 → BLASTn or tBLASTx → MSPcrunch → Reformat
MSPcrunch output
62 49.00 22 168 py.synt.contigs.00000001 3928 4074 chr_cm none
140 64.00 232 495 py.synt.contigs.00000001 4141 4404 chr_cm none
73 62.00 793 936 py.synt.contigs.00000001 4642 4785 chr_cm none
92 58.00 3498 3752 py.synt.contigs.00000001 7271 7525 chr_cm none
59 58.00 3724 3873 py.synt.contigs.00000001 7497 7646 chr_cm none
56 57.00 4333 4458 py.synt.contigs.00000001 8074 8199 chr_cm none
165 62.00 4825 5199 py.synt.contigs.00000001 8728 9102 chr_cm none
79 79.00 5239 5367 py.synt.contigs.00000001 9142 9270 chr_cm none
95 67.00 5103 5354 py.synt.contigs.00000001 9006 9257 chr_cm none
135 55.00 7486 7770 py.synt.contigs.00000001 12766 13050 chr_cm none
87 65.00 8014 8175 py.synt.contigs.00000001 13345 13506 chr_cm none
56 46.00 8698 8835 py.synt.contigs.00000001 14149 14286 chr_cm none
89 51.00 8812 9012 py.synt.contigs.00000001 14266 14466 chr_cm none
52 67.00 11117 11215 py.synt.contigs.00000001 16541 16639 chr_cm none
51 73.00 12611 12709 py.synt.contigs.00000001 18019 18117 chr_cm none
54 53.00 12622 12750 py.synt.contigs.00000001 18030 18158 chr_cm none
90 72.00 14131 14304 py.synt.contigs.00000001 19094 19267 chr_cm none
231 72.00 14374 14829 py.synt.contigs.00000001 19337 19792 chr_cm none
93 67.00 14803 14994 py.synt.contigs.00000001 19769 19960 chr_cm none
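Each line of this output describes one match between the two sequences, which makes it easy to filter before loading into ACT. The sketch below assumes the columns are, in order: score, percent identity, start and end in the first sequence, first sequence name, start and end in the second sequence, second sequence name, and an optional description. That reading is inferred from the values shown above; the `Match` type and `parse_match` function are made up for this example.

```python
from typing import NamedTuple


class Match(NamedTuple):
    score: int
    percent_id: float
    start1: int
    end1: int
    name1: str
    start2: int
    end2: int
    name2: str


def parse_match(line):
    """Parse one whitespace-separated match line (assumed column order, see above)."""
    f = line.split()
    return Match(int(f[0]), float(f[1]), int(f[2]), int(f[3]), f[4],
                 int(f[5]), int(f[6]), f[7])


lines = [
    "62 49.00 22 168 py.synt.contigs.00000001 3928 4074 chr_cm none",
    "140 64.00 232 495 py.synt.contigs.00000001 4141 4404 chr_cm none",
    "231 72.00 14374 14829 py.synt.contigs.00000001 19337 19792 chr_cm none",
]
matches = [parse_match(line) for line in lines]
# Keep only the stronger matches, e.g. score of at least 100.
for m in matches:
    if m.score >= 100:
        print(m.start1, m.end1, "<->", m.start2, m.end2, m.percent_id)
```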