Bioinformatics
Richard Tseng and Ishawar Hosamani
Outline
• Homology modeling (Ishwar)
• Structural analysis
– Structure prediction
– Structure comparisons
• Cluster analysis
– Partitioning method
– Density-based method
• Phylogenetic analysis
Structural Analysis
• Overview
– Structure prediction
– Structural alignment
– Similarity
• Tools for protein structure prediction
– Protein
• Secondary structure prediction: SSEA
http://protein.cribi.unipd.it/ssea/
• Tertiary structure prediction:
– Wurst: http://www.zbh.uni-hamburg.de/wurst/
– LOOPP: http://cbsuapps.tc.cornell.edu/loopp.aspx
• WURST (Torda et al. (2004) Wurst: a protein threading server with a
structural scoring function, sequence profiles and optimized substitution
matrices. Nucleic Acids Res., 32, W532-W535)
• Rationale
– Alignment: Sequence to structure alignments are
done with a Smith-Waterman style alignment and
the Gotoh algorithm
– Score function: fragment-based sequence to
structure compatibility score and a pure
sequence-sequence component substitution score
– Library: Dali PDB90 (24599 structures)
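The alignment step named above can be illustrated with a minimal sketch of Smith-Waterman local alignment with affine gap penalties, computed with Gotoh's three-matrix recurrence. This is not Wurst's implementation: Wurst scores sequence against structure, whereas this sketch assumes a plain match/mismatch score between two sequences. The function name and the default values of `match`, `mismatch`, `gap_open` and `gap_extend` are illustrative assumptions.

```python
def smith_waterman_gotoh(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Return the best local alignment score of a and b with affine gaps.

    Assumed scoring (not Wurst's): a gap of length k costs
    gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    # H: best local score ending at (i, j)
    # E: best score ending at (i, j) with a gap that consumes b[j-1]
    # F: best score ending at (i, j) with a gap that consumes a[i-1]
    H = [[0] * cols for _ in range(rows)]
    E = [[neg_inf] * cols for _ in range(rows)]
    F = [[neg_inf] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Gotoh: either open a new gap from H or extend an existing one
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Smith-Waterman: a local alignment may restart at score 0
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_gotoh("AAAACCCC", "AAAAGGCCCC"))
```

Keeping E and F separate from H is what makes the gap cost affine: extending a gap is cheaper than opening a new one, and the running time stays proportional to the product of the two lengths.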
• Tools for structure comparison
– Pairwise structure comparison:
• TopMatch
• Matras: (http://biunit.naist.jp/matras/)
– Multiple structure comparison:
• 3D-surfer
• Matras: (http://biunit.naist.jp/matras/)
• TopMatch (Sippl & Wiederstein (2008) A note on difficult structure
alignment problems. Bioinformatics 24, 426-427)
– Rationale:
• Structure alignment:
http://www.cgl.ucsf.edu/home/meng/grpmt/structalign.html
• Similarity measurement
$S_{a,b} = (L_a + L_b - D_{a,b}) / 2$
– Input format
• PDB, SCOP and CATH code
• PDB structure directly
– Exercise: http://topmatch.services.came.sbg.ac.at/
• 3D-surfer (David La et al. 3D-SURFER: software for high throughput
protein surface comparison and analysis. Bioinformatics, in press (2009))
– Rationale
1. Define a surface function
2. Transform the surface function into a 3D Zernike
description function
$Z_{nl}^{m}(r, \vartheta, \varphi) = R_{nl}(r)\, Y_{l}^{m}(\vartheta, \varphi)$
– Input format
• PDB and CATH code
• PDB structure directly
– Exercise: http://dragon.bio.purdue.edu/3d-surfer/
Cluster analysis
• Goal:
– Grouping the data into classes or clusters, so that
objects within a cluster have high similarity in
comparison to one another but are very dissimilar
to objects in other clusters.
• Methods
– Partitioning method: k-means
– Density-based method: Ordering Points to
Identify the Clustering Structure (OPTICS)
• k-means
– Rationale: Partition n observations into k clusters
in which each observation belongs to the cluster
with the nearest mean
$E = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - m_i \rVert^2$
– Exercise
http://cgm.cs.ntust.edu.tw/etrex/kMeansClustering/kMeansClustering2.html
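The k-means rationale and the error E can be sketched in a few lines of plain Python. This is a minimal illustration using Lloyd's iteration. The function names, the random choice of initial means and the `seed` parameter are assumptions made for the example and are not taken from the slides.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def k_means(points, k, iterations=100, seed=0):
    """Partition points into k clusters; return (means, clusters, E)."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumed initialisation: k random points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, means[i]))
            clusters[nearest].append(p)
        # Update step: each mean becomes the centroid of its cluster
        new_means = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else means[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_means == means:  # converged: the assignment can no longer change
            break
        means = new_means
    # E: sum over clusters of squared distances to the cluster mean
    error = sum(
        squared_distance(p, means[i])
        for i, cluster in enumerate(clusters)
        for p in cluster
    )
    return means, clusters, error


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    means, clusters, error = k_means(data, k=2)
    print(means, error)
```

Both steps can only lower E or leave it unchanged, which is why the loop terminates. The result can still depend on the initial means, so in practice several seeds are usually tried.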
• OPTICS
– Rationale: Partition observations based on the density of
similar objects
– Exercise
http://www.dbs.informatik.uni-muenchen.de/Forschung/KDD/Clustering/OPTICS/Demo/
• Example: Folding of Trp-cage peptide
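The density idea behind OPTICS rests on two quantities, the core distance and the reachability distance. The sketch below computes only these two quantities and is not a full OPTICS implementation. It assumes Euclidean distance and counts a point as its own neighbor. The function names and the parameters `eps` and `min_pts` are illustrative.

```python
import math


def core_distance(points, index, eps, min_pts):
    """Distance from points[index] to its min_pts-th nearest neighbor.

    The point itself counts as a neighbor (an assumption of this sketch).
    Returns None when fewer than min_pts points lie within eps, i.e. the
    point is not a core point.
    """
    dists = sorted(math.dist(points[index], q) for q in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return None
    return dists[min_pts - 1]


def reachability_distance(points, target, source, eps, min_pts):
    """Reachability of points[target] from points[source].

    Defined as max(core distance of source, distance between the two);
    None when source is not a core point. OPTICS itself evaluates this
    only for neighbors of source that lie within eps.
    """
    core = core_distance(points, source, eps, min_pts)
    if core is None:
        return None
    return max(core, math.dist(points[source], points[target]))


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
    print(core_distance(pts, 0, eps=3, min_pts=3))
    print(reachability_distance(pts, 1, 0, eps=3, min_pts=3))
```

OPTICS visits the points in order of increasing reachability distance, and the clusters then appear as valleys in the resulting reachability plot.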
Phylogenetic analysis
• Overview
– Comparisons of more than two sequences
– Analysis of gene families, including functional
predictions
– Estimation of evolutionary relationships among
organisms
• Theoretical tree
– Parsimony method
– Distance matrix method
– Maximum likelihood and Bayesian method
– Invariants method
• Software
– Collections of tools
http://evolution.genetics.washington.edu/phylip/software.html
– Web servers for tree construction and display
• PHYLIP, http://bioweb2.pasteur.fr/phylogeny/intro-en.html
• Interactive Tree of Life, http://itol.embl.de/
– Commonly used standalone software
• PHYLIP, tools for evaluating the similarity of nucleotide and amino
acid sequences.
http://evolution.gs.washington.edu/phylip.html
• TreeView, a tool for visualizing and manipulating family
trees.
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
• MATLAB Bioinformatics Toolbox
• Example: Alignment-based phylogenetic tree of
the tubulin family
– Searching for homologous sequences of tubulin (PDB
code: 1JFF) in the RCSB Protein Data Bank
• BLAST for pairwise sequence alignment
• ClustalW for multiple sequence alignment
– Evaluating the protein distance matrix
• using “Protdist” of PHYLIP (specifically, the Point Accepted
Mutation (PAM) matrix is used)
– Clustering proteins using “Neighbor” of PHYLIP
(the neighbor-joining method is used)
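The clustering step can be illustrated with a minimal sketch of the neighbor-joining algorithm applied to a distance matrix. This is not the code of PHYLIP's “Neighbor” program, only the textbook recurrence it is based on. The function name, the nested-dictionary input format and the small four-taxon matrix are assumptions made for the example.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix.

    names: list of taxon labels; dist: dict of dicts with dist[a][b] for a != b.
    Returns (joins, final_edge): joins is a list of
    (node_a, node_b, branch_a, branch_b) and final_edge is
    (node_a, node_b, length) connecting the last two nodes.
    """
    nodes = list(names)
    d = {a: dict(dist[a]) for a in nodes}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # r[a]: total distance from a to every other current node
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r[a] - r[b]
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to the new internal node
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a},{b})"
        joins.append((a, b, len_a, len_b))
        # Distances from the new node to every remaining node
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return joins, (a, b, d[a][b])


if __name__ == "__main__":
    # Hypothetical additive distances for four taxa
    m = {
        "A": {"B": 3, "C": 5, "D": 8},
        "B": {"A": 3, "C": 6, "D": 9},
        "C": {"A": 5, "B": 6, "D": 5},
        "D": {"A": 8, "B": 9, "C": 5},
    }
    joins, final_edge = neighbor_joining(["A", "B", "C", "D"], m)
    print(joins, final_edge)
```

Unlike simple average-linkage clustering, the Q criterion corrects each pairwise distance by how far both taxa are from everything else, so a long branch does not by itself pull a taxon toward the root.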
• Example: n-distance phylogenetic tree
– Evaluating the n-distance matrix
• n-distance method
– Clustering proteins using “Neighbor” of PHYLIP
(the neighbor-joining method is used)
• 16S and 18S ribosomal RNA sequences of 35 organisms
Summary
• Homology modeling
• Tools for structure prediction and comparisons
• Tools for phylogenetic tree construction
Thanks for your attention!!
• Protein distance matrix
         1Z5V_A    3CB2_A    1JFF_B    1FFX_B    1TUB_B    1Z2B_B
1Z5V_A   0         0.000010  1.349411  1.349411  1.303115  1.345634
3CB2_A   0.000010  0         1.350506  1.350506  1.303115  1.346730
1JFF_B   1.349411  1.350506  0         0.000010  0.000010  0.010729
1FFX_B   1.349411  1.350506  0.000010  0         0.000010  0.010729
1TUB_B   1.303115  1.303115  0.000010  0.000010  0         0.006725
1Z2B_B   1.345634  1.346730  0.010729  0.010729  0.006725  0
• Tubulin family tree
• n-distance method
– Frequency count of “n-letter words”: $p = f / N$
MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQLERINVYYNE
– n-distance matrix: the entry $D_n$ for a pair of sequences is computed
from the differences $p - p'$ between their n-letter word frequencies

– Advantages:
1. Identifies fully conserved words located at nearly the
same sites
2. Efficient
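The word-frequency idea can be sketched as follows. Two points are assumptions of this sketch rather than statements from the slides: N is taken to be the number of overlapping n-letter windows in a sequence, and the distance between two sequences is taken to be the sum of absolute differences of their word frequencies. The function names are illustrative.

```python
from collections import Counter


def word_frequencies(sequence, n):
    """Frequencies p = f / N of overlapping n-letter words.

    Assumption: N is the number of overlapping windows, len(sequence) - n + 1.
    """
    total = len(sequence) - n + 1
    counts = Counter(sequence[i:i + n] for i in range(total))
    return {word: count / total for word, count in counts.items()}


def n_distance(seq_a, seq_b, n):
    """Assumed distance: sum over all words of |p - p'|."""
    pa = word_frequencies(seq_a, n)
    pb = word_frequencies(seq_b, n)
    return sum(abs(pa.get(w, 0.0) - pb.get(w, 0.0)) for w in set(pa) | set(pb))


def n_distance_matrix(sequences, n):
    """Symmetric matrix of pairwise n-distances for a list of sequences."""
    return [[n_distance(a, b, n) for b in sequences] for a in sequences]


if __name__ == "__main__":
    seqs = ["MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQLERINVYYNE", "MREIVHIQ"]
    print(n_distance_matrix(seqs, 2))
```

No alignment is required: each sequence is scanned once to count its words, which is consistent with the efficiency noted in the list of advantages.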