Predicting Protein Function
DNA → RNA → protein

Biochemical function (molecular function)
What does it do? Is it a kinase? A ligase?
(Page 245)

Function based on ligand-binding specificity
What (or whom) does it bind?

Function based on biological process
What is it good for? Amino acid metabolism?

Function based on cellular location
Where is it active? In the nucleolus? In the cytoplasm?

Function based on expression pattern
Where is the protein expressed? Brain? Testis? Where is it under-expressed?

GO (Gene Ontology) http://www.geneontology.org/
• The GO project aims to develop three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated:
• molecular functions (F)
• biological processes (P)
• cellular components (C)
An ontology is a description of the concepts and relationships that can exist for an agent or a community of agents.

Inferring protein function: the bioinformatics approach
• Based on homology
• Based on functional characteristics (the "protein signature")

Homologous proteins
Rule of thumb: two proteins are likely homologous if they are at least 25% identical over an alignment longer than 100 residues.

Homologous proteins: proteins with a common evolutionary origin
• Orthologs: proteins from different species that evolved by speciation (e.g., human hemoglobin vs. mouse hemoglobin).
• Paralogs: proteins encoded within a given species that arose from one or more gene duplication events (e.g., human hemoglobin vs. human myoglobin).

COGs: Clusters of Orthologous Groups of proteins
> Each COG consists of individual orthologous proteins or orthologous sets of paralogs.
> Orthologs typically have the same function, which allows functional information to be transferred from one member to an entire COG.
Reference: Classification of conserved genes according to their homologous relationships (Koonin et al., NAR).

Inferring protein function based on the protein signature
The protein signature: expression pattern (where is it expressed?)
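The homology rule of thumb above can be checked mechanically once two sequences are aligned. The sketch below is a minimal illustration under assumptions the lecture does not state: the input is an existing pairwise alignment ('-' marks a gap), identity is counted only over columns where neither sequence has a gap, and "length" is taken to mean the number of such columns. The function names are hypothetical, and the code does not perform the alignment itself.

```python
# Minimal sketch of the "25% identity over >100 residues" rule of thumb.
# Assumptions: inputs are an existing pairwise alignment ('-' = gap),
# identity is counted over gap-free columns, and "length" = number of
# such columns. Function names are illustrative, not from any library.


def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, number of gap-free aligned columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0, 0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns), len(columns)


def likely_homologous(aligned_a, aligned_b, min_identity=25.0, min_length=100):
    """Apply the rule of thumb: >= min_identity % over > min_length columns."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity >= min_identity and length > min_length


if __name__ == "__main__":
    # Hypothetical aligned pair: 120 columns, 36 identical (30% identity).
    a = "A" * 36 + "C" * 84
    b = "A" * 36 + "D" * 84
    print(percent_identity(a, b))   # (30.0, 120)
    print(likely_homologous(a, b))  # True
```

Because this is only a rule of thumb, a pair just below 25% identity is not proof of non-homology; it simply falls outside the range where homology can be inferred from identity alone.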
Motif (or fingerprint):
• a short, conserved region of a protein
• typically 10 to 20 contiguous amino acid residues
Domain:
• a region of a protein that can adopt a three-dimensional structure

Protein Motifs
Protein motifs can be represented as a consensus or as a profile.
Alignment, positions 1-50 ('.' and '~' are gaps):
ecblc  MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD
vc     MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD
hsrbp  ~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD
Consensus pattern: GXW[YF][EA][IVLM]
Profile (residues observed at each position of the motif): G | T, K | W | Y | E, A | I, V, M

Searching for Protein Motifs
- PROSITE: a database of protein patterns that can be searched with either regular-expression patterns or sequence profiles.
- PHI-BLAST: searches for a specific protein sequence pattern and builds local alignments surrounding the match.
- MEME: searches for common motifs in unaligned sequences.

Protein Domains
• Domains can be considered building blocks of proteins.
• Some domains are found in many proteins with different functions, while others are found only in proteins with a particular function.
[Figure: a DNA-binding domain, the zinc finger]

Varieties of protein domains (Page 228)
• extending along the length of a protein
• occupying a subset of a protein sequence
• occurring one or more times

Example of a protein with two domains: methyl-CpG-binding protein 2 (MeCP2)
The protein includes a methylated-DNA-binding domain (MBD) and a transcriptional repression domain (TRD). MeCP2 is a transcriptional repressor.
Result of an MeCP2 blastp search: a methyl-binding domain shared by several proteins.
Are proteins that share only a domain homologous?

Pfam
> A database that contains a large collection of multiple sequence alignments of protein domains, based on profile hidden Markov models (HMMs).
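PROSITE patterns are a form of regular expression, so the motif example above (GXW[YF][EA][IVLM]) can be searched with Python's `re` module. This is a minimal sketch: `pattern_to_regex` and `find_motif` are hypothetical helpers, not part of PROSITE or any library, and they handle only the simple notation used on the slide. Gap characters ('.' and '~') are removed before searching, and positions are reported 0-based on the ungapped sequence.

```python
import re

# Aligned sequences from the motif example ('.' and '~' are gaps).
ALIGNED = {
    "ecblc": "MRLLPLVAAATAAFLVVACSSPTPPRGVTVVNNFDAKRYLGTWYEIARFD",
    "vc": "MRAIFLILCSV...LLNGCLG..MPESVKPVSDFELNNYLGKWYEVARLD",
    "hsrbp": "~~~MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKD",
}


def pattern_to_regex(pattern):
    """Translate the simple pattern notation used here into a compiled regex.

    Supported subset only: X/x = any residue, [..] = allowed residues,
    optional PROSITE-style '-' separators. Other PROSITE syntax
    ({..} exclusions, x(2,4) repeats, < and > anchors) is not handled.
    """
    return re.compile(pattern.replace("-", "").replace("X", ".").replace("x", "."))


def find_motif(pattern, sequences):
    """Return {name: [(0-based start, matched text), ...]} on ungapped sequences."""
    regex = pattern_to_regex(pattern)
    hits = {}
    for name, aligned in sequences.items():
        ungapped = re.sub(r"[.~-]", "", aligned)  # drop alignment gap symbols
        hits[name] = [(m.start(), m.group()) for m in regex.finditer(ungapped)]
    return hits


if __name__ == "__main__":
    for name, matches in find_motif("GXW[YF][EA][IVLM]", ALIGNED).items():
        print(name, matches)
```

On these three sequences each one has exactly one hit: GTWYEI in ecblc, GKWYEV in vc and GTWYAM in hsrbp, which are the residues summarized by the consensus and profile lines.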
Profile HMM (Hidden Markov Model)
An HMM is a probabilistic model of a multiple sequence alignment (MSA), consisting of a number of interconnected states: match (M), insert (I) and delete (D) states.
[Figure: profile HMM built from alignment columns 16-19]
Aligned sequences (columns 16-19): DRTR, DRTS, S--S, SPTR, DRTR, DPTS, D--S, D--S, D--S, D--R
Match-state emission probabilities: M16: D 0.8, S 0.2; M17: P 0.4, R 0.6; M18: T 1.0; M19: R 0.4, S 0.6
Transitions: half of the sequences skip columns 17-18, so from M16 the model moves to M17 with probability 50% and to the delete state D17 with probability 50%; M17 → M18, M18 → M19, D17 → D18 and D18 → M19 each have probability 100%. Insert states (I16-I19) emit any residue (X).

Pfam (continued)
> The Pfam database is based on two distinct classes of alignments:
- Seed alignments, which are deemed accurate and are used to produce Pfam-A
- Alignments derived by automatic clustering of SwissProt, which are less reliable and give rise to Pfam-B

Physical properties of proteins
DNA-binding domains have a relatively high frequency of basic (positively charged) amino acids:
GCN4    MKDPAALKRARNTEAARRSSRARKLQRM
zif268  MERPYACPVESCDRRFSRSDELTRHIRIHT
myoD    SKVNEAFETLKRCTSSNPNQRLPKVEILRNAIR
Transmembrane proteins have a characteristic hydrophobicity pattern: stretches of roughly 20 hydrophobic residues that span the membrane as alpha-helices.

Knowledge Based Approach
• Idea: find the common properties of a protein family (or any group of proteins of interest) that are unique to the group and different from all other proteins. Generate a model for the group, and use it to predict new members of the family that have similar properties.

Knowledge Based Approach: Basic Steps
The approach has two steps: (1) building a model and (2) predicting the function of a new protein.
1. Building a Model
• Generate a dataset of proteins with a common function (e.g., DNA-binding proteins).
• Generate a control dataset (e.g., non-DNA-binding proteins).
• For all proteins in both datasets, calculate the properties that are characteristic of the protein family of interest.
• Represent each protein by a vector of the calculated features, and build a statistical model that separates the two groups.
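The lecture does not say which features or which statistical model to use, so the sketch below is one possible instantiation under stated assumptions. It turns the two physical properties above into a feature vector: the fraction of basic residues (K, R, H) and the highest Kyte-Doolittle hydropathy averaged over a 19-residue window (the scale and window size are assumptions). It then standardizes the features and uses a nearest-centroid classifier as a stand-in for "a statistical model to split the groups". `predict` corresponds to the prediction step. The positive set is the three DNA-binding fragments shown above; `HYPOTHETICAL_CONTROLS` are made-up toy strings, not real proteins.

```python
"""Sketch of the knowledge-based approach (build a model, then predict).

Assumptions, not taken from the lecture: the two features, the
Kyte-Doolittle scale, the 19-residue window, standardization plus a
nearest-centroid classifier, and the hypothetical control sequences.
"""
from math import dist
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}
BASIC = set("KRH")

# DNA-binding fragments from the lecture (the protein family of interest).
DNA_BINDING_FRAGMENTS = [
    "MKDPAALKRARNTEAARRSSRARKLQRM",       # GCN4
    "MERPYACPVESCDRRFSRSDELTRHIRIHT",     # zif268
    "SKVNEAFETLKRCTSSNPNQRLPKVEILRNAIR",  # myoD
]
# Hypothetical control set, for illustration only (not real proteins).
HYPOTHETICAL_CONTROLS = [
    "LLAVLIGAFLVAVGLSWLAGIVFLA",
    "MALFVAILGLLVVAWIGSLLAFTIVGA",
]


def basic_fraction(seq):
    """Fraction of residues that are basic (positively charged)."""
    return sum(aa in BASIC for aa in seq) / len(seq)


def max_hydropathy(seq, window=19):
    """Highest mean hydropathy over any window (whole sequence if shorter)."""
    window = min(window, len(seq))
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return max(mean(scores[i:i + window]) for i in range(len(seq) - window + 1))


def features(seq):
    """Represent a protein by a vector of calculated properties."""
    return (basic_fraction(seq), max_hydropathy(seq))


def train(family, controls):
    """Step 1: per-class centroids of standardized feature vectors."""
    vectors = {True: [features(s) for s in family],
               False: [features(s) for s in controls]}
    pooled = vectors[True] + vectors[False]
    centers = [mean(col) for col in zip(*pooled)]
    spreads = [pstdev(col) or 1.0 for col in zip(*pooled)]  # avoid divide-by-zero

    def scale(v):
        return tuple((x - c) / s for x, c, s in zip(v, centers, spreads))

    centroids = {label: tuple(mean(col) for col in zip(*map(scale, vs)))
                 for label, vs in vectors.items()}
    return centroids, scale


def predict(model, seq):
    """Step 2: True if the new protein is closer to the family centroid."""
    centroids, scale = model
    v = scale(features(seq))
    return min(centroids, key=lambda label: dist(v, centroids[label]))


if __name__ == "__main__":
    model = train(DNA_BINDING_FRAGMENTS, HYPOTHETICAL_CONTROLS)
    print(predict(model, "KR" * 10))  # True
```

Standardizing matters in this design: without it, hydropathy values (roughly -4.5 to 4.5) would dominate the basic fraction (0 to 1) in the distance. With real family and control datasets, one would also hold out some proteins to estimate accuracy before using the model on a new sequence.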
Predicting the function of a new protein
• Calculate the properties of the new protein and represent them as a vector.
• Predict whether the tested protein belongs to the family.

TEST CASE
Y14: a protein sequence translated from an ORF (open reading frame) obtained from the complete Drosophila genome
>Y14
PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHL
NLDRRTGFSKGYALVEYETHKQALAAKEALNGAEIM
GQTIQVDWCFVKGG
Y14 does not bind RNA.

Projects 2011-12
Instructions for the final project, Introduction to Bioinformatics 2011-12

Key dates
19.12 Lists of suggested projects published*
*You are highly encouraged to choose a project yourself, or to find a relevant project that can help your research.
29.1 Submission of the project overview (PowerPoint presentation, max. 5 slides):
-Title
-Main question
-Major tools you plan to use to answer the question
30.1/31.1 Presentation of the project overview
7.3 Poster submission
14.3 Poster presentation

2. Planning your research
After you have described the main question or questions of your project, carefully plan your next steps:
A. Make sure you understand the problem, and read the background you need to proceed.
B. Formulate your working plan, step by step.
C. Once you have a plan, start by extracting the necessary data and deciding which tools to use in the first step. When running a tool, summarize the results and extract the information you need to answer your question. It is recommended to save the raw data for your records, but don't present raw data in your final written project. Your initial results should guide your next steps.
D. When you feel you have explored all the tools you can apply to answer your question, summarize your findings and draw conclusions. Remember that NO is also an answer, as long as you are sure it is NO. Also remember that this is a course project, not just a homework exercise.

3.
Summarizing the final project in a poster (in pairs)
Prepare the poster in PowerPoint, poster size 90-120 cm, with:
• the title of the project
• the names and affiliations of the students presenting
The poster should include 5 sections:
• Background: describe your question (you can add a figure).
• Goal and Research Plan: describe the main objective and the research plan.
• Results (main section): present your results in 3-4 figures, describe each figure (figure legends) and give each result a title.
• Conclusions: summarize the conclusions of your project in points.
• References: list the papers, databases and tools used in your project.
Examples of posters will be presented in class.