Lecture 13 summary
Intro to Bioinformatics Summary

What did we learn
• Pairwise alignment
– Local and global alignments
When? How?
Tools: for local alignment, blast2seq; for global alignment, best use MSA tools such as Clustal X or MUSCLE

What did we learn
• Multiple alignments (MSA)
When? How?
MSAs are needed as input for many different purposes: searching for motifs, phylogenetic analysis, protein and RNA structure prediction, conservation of specific nucleotides/residues
Tools: Clustal X (for DNA and RNA), MUSCLE (for proteins)
Tools for phylogenetic trees: PHYLIP …

What did we learn
• Searching a sequence against a database
When? How?
- BLAST: remember the different options for BLAST (blastp, blastn, …), and make sure to search the right database!!!
DO NOT FORGET: you can change the scoring matrices, gap penalties, etc.
- PSI-BLAST: searching for remote homologies
- PHI-BLAST: searching for a short pattern within a protein

What did we learn
• Motif search
When? How?
- Searching for known motifs in a given promoter (JASPAR)
- Searching for overabundance of unknown regulatory motifs in a set of sequences, e.g. promoters of genes which have similar expression patterns (MEME)
Tools: MEME, logo
Databases of motifs: JASPAR (transcription factor binding sites), PRATT in PROSITE (searching for motifs in protein sequences)

What did we learn
• Protein function prediction
When? How?
- Pfam (database to search for protein motifs/domains; Pfam-A/Pfam-B)
- PROSITE
- Protein annotations in UniProt (Swiss-Prot/TrEMBL)

What did we learn
• Protein secondary structure prediction
When? How?
– Helix/Beta/Coil (PHDsec, PSIPRED)
– Transmembrane helix prediction (PHDhtm, TMHMM)
– Solvent accessibility: important for the prediction of ligand binding sites (PHDacc)

What did we learn
• Protein tertiary structure prediction
When? How?
– First we must look at sequence identity to a sequence with a known structure!!
– Homology modeling/Threading
– ModBase: database of models
Remember: low-quality models can be misleading!!
Tools: SWISS-MODEL, GenTHREADER, ModBase

What did we learn
• RNA structure and function prediction
When? How?
– RNAfold: good for local interactions, gives several predictions of low-energy structures
– RNAalifold: adds information from an MSA
– Rfam: specific database and search tools: tRNA, microRNA …

What did we learn
• Gene expression
When? How?
– Many databases of gene expression: GEO …
– Clustering analysis: EPClust (different clustering methods: K-means, hierarchical clustering; transformations of rows/columns/both …)
– GO annotation (analysis of gene clusters …)

So how do we start …
• Given a hypothetical sequence, predict its function.
What should we do???

Example
• Amyloids are proteins which tend to aggregate in solution. Abnormal accumulation of amyloid in organs is assumed to play a role in various neurodegenerative diseases.
Question: can we predict whether a protein X is an amyloid?

Research Plan
1. Building a database
- Search the protein database (Swiss-Prot) for proteins annotated as amyloids
- Select a set of 10-30 proteins which are amyloids related to human diseases
2. Analyzing the unique properties of the family
- Use tools learned in class to calculate the protein properties (for each protein) which can be related to amyloidosis (5-10 different sequence/predicted structural features)
For example
Fact: amyloids tend to aggregate via beta sheets
– Calculate: the percentage of each secondary structure state (H, E, C)
Fact: amyloids tend to aggregate via aromatic residues
– Calculate: the percentage of the different amino acids in the proteins …
3. Summarize the results and use basic statistical tools to evaluate whether there are features which are characteristic of this group; suggest a model to predict whether a new protein is related to this group of proteins.
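The two "Calculate" items in step 2, and the summary in step 3, could be sketched roughly as follows. This is only an illustration under stated assumptions: each protein is given as a plain sequence string, its predicted secondary structure (for example from PSIPRED) has already been reduced to a per-residue string of H, E and C, and aromatic residues are taken to be F, W and Y. The function names and the toy inputs are hypothetical and are not real amyloid data.

```python
from collections import Counter
from statistics import mean, stdev

# Assumption: aromatic residues are Phe (F), Trp (W) and Tyr (Y).
AROMATIC = frozenset("FWY")


def percent_by_symbol(text, alphabet):
    """Percentage of each symbol of `alphabet` in `text` (case-insensitive)."""
    text = text.upper()
    if not text:
        raise ValueError("input string is empty")
    counts = Counter(text)
    return {symbol: 100.0 * counts[symbol] / len(text) for symbol in alphabet}


def secondary_structure_percent(ss_string):
    """Percent helix (H), strand (E) and coil (C) in a predicted SS string."""
    return percent_by_symbol(ss_string, "HEC")


def aromatic_percent(sequence):
    """Percent of residues in the sequence that are aromatic (F, W, Y)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    aromatic = sum(1 for residue in sequence if residue in AROMATIC)
    return 100.0 * aromatic / len(sequence)


def summarize(values):
    """Mean and sample standard deviation of one feature across a protein set."""
    return mean(values), stdev(values)


# Hypothetical toy inputs (sequence, predicted SS string), NOT real amyloid data.
toy_proteins = {
    "toy_A": ("MKFWYAAGLV", "CCEEEEHHCC"),
    "toy_B": ("GGFAVVYIIA", "CEEEEEEECC"),
}

for name, (seq, ss) in toy_proteins.items():
    print(name, aromatic_percent(seq), secondary_structure_percent(ss))

# Step 3: summarize one feature (percent beta strand) over the whole set.
beta = [secondary_structure_percent(ss)["E"] for _, ss in toy_proteins.values()]
print("beta-strand %: mean, stdev =", summarize(beta))
```

To judge whether a feature is characteristic of the amyloid group, the same summary would be computed for a comparison set of non-amyloid proteins; choosing that set and a suitable statistical test is left open here.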