MATLAB Bioinformatics Tools
Rob Henson
The MathWorks, Inc.
Who Am I?
• Development manager for Bioinformatics
group at The MathWorks
– Natick, MA
• Software developer
• Background in algorithm design and
software engineering
What do I do?
• Write software for bioinformatics
– Sequence analysis
– Microarray data analysis
• Some consulting
– Bioinformatics algorithm design
– Machine learning tools
• E.g. Neural networks, HMMs etc.
My solution to dotplot
>> map = eye(128);
>> spy(map(seq1,seq2))
Why does this work?
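One way to see the answer, sketched with two made-up sequences (seq1 and seq2 below are illustrative and not from the slides): MATLAB converts characters to their character codes when they are used as subscripts, so map(seq1,seq2) picks row seq1(i) and column seq2(j) of the identity matrix. That element is 1 only when the two codes are equal, which is exactly a match.

```matlab
% Illustrative sequences (assumed for this example only).
seq1 = 'GATTACA';
seq2 = 'GATCACA';

% Characters used as subscripts become their character codes,
% e.g. 'A' -> 65, so 128 rows and columns cover 7-bit ASCII.
map = eye(128);
dots = map(seq1, seq2);   % dots(i,j) is 1 exactly when seq1(i) == seq2(j)

% The same matrix built by explicit comparison, as a cross-check.
[col, row] = meshgrid(double(seq2), double(seq1));
expected = double(row == col);

disp(isequal(dots, expected))
```

Calling spy(dots) then draws one dot per matching pair of positions.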
How could we make this better?
Enhancements to dotplot
• Does map need to be 128?
– What is the right value?
• Can we use less memory?
• How do we deal with bad inputs?
• Can we extend this to look for longer
patterns?
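A minimal sketch of how the first three questions might be approached, under stated assumptions: the inputs are plain letter sequences, the validity check isSeq is a hypothetical helper introduced here, and the error identifier is made up for illustration. Shifting 'A'..'Z' to 1..26 means the map only needs 26 rows and columns, and a sparse logical map keeps memory use low.

```matlab
% Illustrative inputs (not from the slides); note the mixed case.
seq1 = 'gattaca';
seq2 = 'GATCACA';

% Hypothetical validity check: a non-empty character array of letters A-Z.
isSeq = @(s) ischar(s) && ~isempty(s) && ...
    all(upper(s(:)) >= 'A' & upper(s(:)) <= 'Z');
if ~isSeq(seq1) || ~isSeq(seq2)
    error('dotplot:badInput', ...
        'Sequences must be non-empty character arrays of letters.');
end

% Normalize to upper-case row vectors so that 'a' matches 'A'.
seq1 = upper(seq1(:)');
seq2 = upper(seq2(:)');

% After shifting 'A'..'Z' to 1..26, a 26-by-26 map is large enough.
% A sparse logical identity keeps both the map and the result small.
map = logical(speye(26));
dots = map(seq1 - 'A' + 1, seq2 - 'A' + 1);
```

spy accepts the sparse result directly, so the plotting call stays spy(dots).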
Some useful tools
• edit
• dbstop
• profiler
• Getting help
– Documentation
– Technical Support Knowledge Base
– Newsgroup
A full implementation of dotplot
function matches = dotplot(seq1,seq2,window,stringency)
% DOTPLOT Visualize sequence matches.
%
%   DOTPLOT(S,T) plots the sequence matches of sequences S and T.
%
%   DOTPLOT(S,T,WINDOW,NUM) plots sequence matches when there
%   are at least NUM matches in a window of size WINDOW. For nucleotide
%   sequences a WINDOW of 11 and NUM of 7 is recommended in the
%   literature.
%
%   MATCHES = DOTPLOT(...) returns the number of dots in the dotplot
%   matrix.
%
%   Example:
%       moufflon = getgenbank('AB060288','sequence',true);
%       takin = getgenbank('AB060290','sequence',true);
%       dotplot(moufflon,takin,11,7)
%
%   This shows the similarities between prion protein (PrP) nucleotide
%   sequences of two ruminants, the moufflon and the golden takin.
%
%   See also NWALIGN, SWALIGN.
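The slides show only the help text, so the following is a sketch of one way the WINDOW and NUM behavior could be computed, not necessarily how the author's DOTPLOT does it. Convolving the single-symbol match matrix with an identity matrix sums the matches along each diagonal window. The sequences and parameter values below are made up for illustration.

```matlab
% Illustrative inputs (assumed; not from the slides).
seq1 = 'ACGTACGTTTGACC';
seq2 = 'TTACGTACGAAGACC';
window = 5;       % window size along the diagonal
stringency = 4;   % minimum number of matches within a window (NUM)

% Single-symbol matches: dots(i,j) is true when seq1(i) == seq2(j).
dots = bsxfun(@eq, seq1(:), seq2(:)');

% Convolution with eye(window) adds up dots(i+k, j+k) for k = 0..window-1,
% i.e. the number of matches in each diagonal window.
diagSum = conv2(double(dots), eye(window), 'valid');

% Keep only the windows that reach the required number of matches.
hits = diagSum >= stringency;
matches = nnz(hits);   % number of dots in the filtered dotplot
```

spy(hits) displays the filtered plot. With window = 1 and stringency = 1 the result reduces to the simple dotplot.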
Sequence properties
• Amino acid composition
– histc function
• Molecular weight
– Indexing and sum function
• Hydrophobicity
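A short sketch of the composition count using histc, as the slide suggests. The sequence is the one used later in these slides. The variable names and the ordering string letters are assumptions made for this example. In newer releases MathWorks recommends histcounts over histc, but histc matches the slide.

```matlab
seq = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';

% The 20 standard amino acid letters, in the order of the weight table.
letters = 'ARNDCQEGHILKMFPSTWYV';

% With integer edges 'A'..'Z', each bin counts one letter.
counts = histc(double(seq), double('A':'Z'));

% Pick out the counts for the 20 standard amino acids.
composition = counts(letters - 'A' + 1);
```

bar(composition) gives a quick picture of the composition.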
Molecular weights
A: 89.000
R: 174.000
N: 132.000
D: 133.000
C: 121.000
Q: 146.000
E: 147.000
G: 75.000
H: 155.000
I: 131.000
L: 131.000
K: 146.000
M: 149.000
F: 165.000
P: 115.000
S: 105.000
T: 119.000
W: 204.000
Y: 181.000
V: 117.000
http://cn.expasy.org/tools/pscale/Molecularweight.html
mw = [89.0900
0
121.1500
133.1000
147.1300
165.1900
75.0700
155.1600
131.1700
0
146.1900
131.1700
149.2100
132.1200
0
115.1300
146.1500
174.2000
105.0900
119.1200
0
117.1500
204.2300
0
181.1900];
seq = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';
seqmw = mw(seq-'A'+1);
plot(seqmw)
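The indexing trick above extends to a total weight with sum. The sketch below repeats the weight table as a row vector so that it runs on its own. The water correction is an addition that is not on the slides: the table lists free amino acids, so one water (about 18.015) is subtracted for each peptide bond.

```matlab
% Weights for 'A'..'Y'; letters that are not standard amino acids get 0.
mw = [89.09 0 121.15 133.10 147.13 165.19 75.07 155.16 131.17 0 ...
      146.19 131.17 149.21 132.12 0 115.13 146.15 174.20 105.09 ...
      119.12 0 117.15 204.23 0 181.19];

seq = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';

% 'A' maps to index 1, 'B' to 2, and so on.
residueMw = mw(seq - 'A' + 1);

% Sum of the free amino acid weights.
freeSum = sum(residueMw);

% Assumption (not from the slides): subtract one water per peptide bond.
waterMw = 18.015;
proteinMw = freeSum - (numel(seq) - 1) * waterMw;
```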
proteinplot
Assignments
1. Create a hydrophobicity plot
You can get the amino acid values from
http://cn.expasy.org/cgi-bin/protscale.pl
Use Kyte & Doolittle’s values.
Create a function that has two inputs, the
sequence and the window size. The function
will create a hydrophobicity plot. The help
for the function is on the next slide…
function hydrophobic(sequence, window_length)
% HYDROPHOBIC plots the hydrophobicity of an amino acid sequence
%
%   HYDROPHOBIC(SEQUENCE,WINDOW_LENGTH) creates a hydrophobicity plot of
%   SEQUENCE using a smoothing window of length WINDOW_LENGTH.
%
%   SEQUENCE must be a valid amino acid sequence. If SEQUENCE contains any
%   symbols other than the standard 20 amino acid letters, the function
%   will give an error message. SEQUENCE can be either upper or lower case.
%
%   WINDOW_LENGTH must be an odd positive integer.
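As a hint for the smoothing step only, here is a sketch of a moving average over an odd-length window. The numbers in values are placeholders, not hydrophobicity values, and the error identifier is made up. Looking up the Kyte & Doolittle values and validating the sequence are left to the assignment.

```matlab
values = [1 2 3 4 5 6 7];   % placeholder per-residue values
window_length = 3;

% An odd positive integer leaves the same number of residues on each side.
if ~(isscalar(window_length) && window_length > 0 && ...
        mod(window_length, 2) == 1)
    error('hydrophobic:badWindow', ...
        'WINDOW_LENGTH must be an odd positive integer.');
end

% Moving average: 'valid' keeps only the fully covered windows.
kernel = ones(1, window_length) / window_length;
smoothed = conv(values, kernel, 'valid');

% Each smoothed value belongs to the residue at the center of its window.
halfWidth = (window_length - 1) / 2;
positions = (1:numel(smoothed)) + halfWidth;
```

plot(positions, smoothed) then puts each average at the center of its window.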
Assignments
2. Modify the function to return the maximum
and minimum hydrophobicity values in the
plot.
Make appropriate changes to the help for
the function.
Advanced example
• Alignment significance
– Alignment algorithms such as Smith-Waterman
and Needleman-Wunsch always find some
alignment. How do we know if what they find
is significant or simply random?
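One common way to approach this question is a permutation test: shuffle one sequence many times, score each shuffled version against the other sequence, and see how often chance alone does as well as the real pair. The slides do not say which method is intended, so this is only a sketch. The score function scorefun is a simple stand-in (count of identical positions), and seq2 and nperm are made up for illustration.

```matlab
seq1 = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';
seq2 = 'MATLAPEEPQSDASVEPPLSQETFSDLWKLLPENNVLTP';   % assumed variant of seq1

% Stand-in score (an assumption): number of identical positions.
scorefun = @(a, b) sum(a == b);

observed = scorefun(seq1, seq2);

% Null distribution: scores against shuffled copies of seq2, which keep
% the composition but destroy the order.
nperm = 200;
nullScores = zeros(1, nperm);
for k = 1:nperm
    shuffled = seq2(randperm(numel(seq2)));
    nullScores(k) = scorefun(seq1, shuffled);
end

% Empirical p-value: fraction of shuffles scoring at least as well.
pValue = (1 + sum(nullScores >= observed)) / (nperm + 1);
```

With the Bioinformatics Toolbox available, the stand-in could be replaced by an alignment score, for example scorefun = @(a,b) swalign(a,b).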