Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Bioinformatics Richard Tseng and Ishawar Hosamani Outline • Homology modeling (Ishwar) • Structural analysis – Structure prediction – Structure comparisons • Cluster analysis – Partitioning method – Density-based method • Phylogenetic analaysis Structural Analysis • Overview – Structure prediction – Structural alignment – Similarity • Tools for protein structure prediction – Protein • Secondary structure prediction: SSEA http://protein.cribi.unipd.it/ssea/ • Tertiary structure prediction: – Wurst: http://www.zbh.uni-hamburg.de/wurst/ – LOOPP: http://cbsuapps.tc.cornell.edu/loopp.aspx • WURST( Torda et al. (2004) Wurst: A protein threading server with a structural scoring function, sequence profiles and optimized substitution matrices Nucleic Acids Res., 32, W532-W535) • Rationale – Alignment: Sequence to structure alignments are done with a Smith-Waterman style alignment and the Gotoh algorithm – Score function: fragment-based sequence to structure compatibility score and a pure sequence-sequence component substitution score – Library: Dali PDB90 (24599 srtuctures) • Tools for structure comparison – Pair structures comparison: • TopMatch • Matras: (http://biunit.naist.jp/matras/) – Multiple structures comparison: • 3D-surfer • Matras: (http://biunit.naist.jp/matras/) • TopMatch (Sippl & Wiederstein (2008) A note on difficult structure alignment problems. Bioinformatics 24, 426-427) – Rationale: • Structure alignment: http://www.cgl.ucsf.edu/home/meng/grpmt/structalign.html • Similarity measurement S a ,b La Lb Da ,b 2 – Input format • PDB, SCOP and CATH code • PDB structure directly – Exercise: http://topmatch.services.came.sbg.ac.at/ • 3D-surfer (David La et al. 3D-SURFER: software for high throughput protein surface comparison and analysis. Bioinformatics , in press. (2009)) – Rationale 1. Define a surface function 2. Transform the surface function into a 3D Zernike description function – Input format • • Z nlm r , , Rnl r Yl m , PDB and CATH code PDB structure directly – Exercise: http://dragon.bio.purdue.edu/3d-surfer/ Cluster analysis • Goal: – Grouping the data into classes or clusters, so that objects within a cluster have high similarity in comparison to one another but are very dissimilar to objects in other clusters. • Methods – Partitioning method: k-means – Density-based method: Ordering Points to Identify the Clustering Structure (OPTICS) • k-means – Rationale: Partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean k E p mi – Exercise 2 i 1 pCi http://cgm.cs.ntust.edu.tw/etrex/kMeansClustering/kMeansClustering2 .html • OPTICS – Rationle: Partition observations based on the density of similar objects – Exercise http://www.dbs.informatik.unimuenchen.de/Forschung/KDD/Clustering/OPTICS/Demo/ • Example: Folding of Trp-cage peptide Phylogenetic analysis • Overviews – Comparisons of more than two sequences – Analysis of gene families, including functional predictions – Estimation of evolutionary relationships among organisms • Theoretical tree – Parsimony method – Distance matrix method – Maximum likelihood and Bayesian method – Invariants method • Software – Collections of tools http://evolution.genetics.washington.edu/phylip/software.html – A web server version for tree construction and display • PHYLIP, http://bioweb2.pasteur.fr/phylogeny/intro-en.html • Interactive tree of life, http://itol.embl.de/ – Mostly common used stand alone software • PHYLIP, tool for evaluating similarity of nucleotide and amino acid sequences. http://evolution.gs.washington.edu/phylip.html • TreeView, tool for visualization and manipulation of family tree. http://taxonomy.zoology.gla.ac.uk/rod/treeview.html • Matlab - bioinformatics tool box • Example: Alignment phylogenetic tree of Tubulin family – Searching homologous sequences of Tubulin (PDB code: 1JFF) from RCSB protein databank • Blast for pair sequence alignment • Clustalw for comparative sequence alignment – Evaluating protein distance matrix • using “Protdist” of PHYILIP (Particularly, Point Accepted Mutation (PAM) matrix is used) – Clustering proteins using “Neighbor” of PHYILIP (Neightboring-Joint method is considered) • Example: n-distance phylogenetic tree – Evaluating n-distance matrix • n-distance method – Clustering proteins using “Neighbor” of PHYILIP (Neightboring-Joint method is considered) • 16S and 18S Ribosomal RNA sequenecs of 35 organisms Summary • Homology modeling • Tools for structure prediction and comparisons • Tools for phylogenetic tree construction Thanks for your attention!! •Protein distance matrix 1Z5V_A 3CB2_A 1JFF_B 1FFX_B 1TUB_B 1Z2B_B 1Z5V_A 0 0.000010 1.349411 1.349411 1.303115 1.345634 3CB2_A 0.000010 0 1.350506 1.350506 1.303115 1.346730 1JFF_B 1.349411 1.350506 0 0.000010 0.000010 0.010729 1FFX_B 1.349411 1.350506 0.000010 0 0.000010 0.010729 1TUB_B 1.303115 1.303115 0.000010 0.000010 0 0.006725 1Z2B_B 1.345634 1.346730 0.010729 0.010729 0.006725 0 •Tubulin family tree • n-distance method – Frequency count of “n-letter words” p f / N MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQLERINVYYNE – n-dsiatnce matrix D n , ' p p ' – Advantage: 1. Identify fully conservative words located at nearly the same sites 2. Effecient