(52925 Protein analysis practicals)
Advanced practical course in bioinformatics
HOW – hands-on workshop on protein analysis
Liisa Holm

Instructors
• Patrik Koskinen
• Petri Törönen

Course web page
http://ekhidna.biocenter.helsinki.fi/how
  – Schedule
  – Talks
  – Exercises
  – Course assignments
  – Instructions for computer use

Mode of work
• Course assignments
  – Two sequences assigned to each team
• Sessions (12-16)
  – Demonstrations (~ 1 hour)
  – Practical exercises
    • Structured questions
    • First try it yourself, then ask your team mate, then ask an instructor
    • Discuss results with your team mate
  – Try out tools on your assigned sequences during the course
    • Second-last session reserved solely for working on course assignments
• Presentations
  – Course grade based on presentation (2 March)
    • Two sequence assignments per team

Objectives
• Infer function and/or structure starting from the amino acid sequence of a query protein
  – Identify related sequences, place in family
  – Identify conserved positions in sequence and structure
• Learn to use representative web-based tools
• No programming, no Unix/Linux

Introduction
• Most cellular functions are performed or facilitated by proteins.
  – Primary biocatalyst
  – Cofactor transport/storage
  – Mechanical motion/support
  – Immune protection
  – Control of growth/differentiation

(Figure: linear DNA, Watson & Crick (1953); 3D structure of myoglobin, Kendrew & Perutz (1957), PDB entry 1mbn; Function = Σ interactions; Evolution)

Sequence – Structure – Function
(Diagram labels: DNA sequence, protein sequence, natural selection, protein function, protein structure)

What can sequence analysis do?
• Homology
  – Inference of inherited complex features: what is conserved is important
  – Most powerful approach
  – Good tertiary structure prediction
• Diagnostic patterns
  – E.g. subcellular localization signals
• Physical preferences
  – Good secondary structure prediction
  – Prediction of transmembrane segments
  – Poor ab initio tertiary structure prediction

Application: Finding Homologues
• Find similar ones in different organisms
• Human vs. mouse vs. yeast
  – Easier to do experiments on the latter!
(Section from NCBI Disease Genes Database reproduced below.)

Best Sequence Similarity Matches to Date Between Positionally Cloned Human Genes and S. cerevisiae Proteins

Human Disease | MIM # | Human Gene | GenBank Acc# for Human cDNA | BLASTX P-value | Yeast Gene | GenBank Acc# for Yeast cDNA | Yeast Gene Description
Hereditary Non-polyposis Colon Cancer | 120436 | MSH2 | U03911 | 9.2e-261 | MSH2 | M84170 | DNA repair protein
Hereditary Non-polyposis Colon Cancer | 120436 | MLH1 | U07418 | 6.3e-196 | MLH1 | U07187 | DNA repair protein
Cystic Fibrosis | 219700 | CFTR | M28668 | 1.3e-167 | YCF1 | L35237 | Metal resistance protein
Wilson Disease | 277900 | WND | U11700 | 5.9e-161 | CCC2 | L36317 | Probable copper transporter
Glycerol Kinase Deficiency | 307030 | GK | L13943 | 1.8e-129 | GUT1 | X69049 | Glycerol kinase
Bloom Syndrome | 210900 | BLM | U39817 | 2.6e-119 | SGS1 | U22341 | Helicase
Adrenoleukodystrophy, X-linked | 300100 | ALD | Z21876 | 3.4e-107 | PXA1 | U17065 | Peroxisomal ABC transporter
Ataxia Telangiectasia | 208900 | ATM | U26455 | 2.8e-90 | TEL1 | U31331 | PI3 kinase
Amyotrophic Lateral Sclerosis | 105400 | SOD1 | K00065 | 2.0e-58 | SOD1 | J03279 | Superoxide dismutase
Myotonic Dystrophy | 160900 | DM | L19268 | 5.4e-53 | YPK1 | M21307 | Serine/threonine protein kinase
Lowe Syndrome | 309000 | OCRL | M88162 | 1.2e-47 | YIL002C | Z47047 | Putative IPP-5-phosphatase
Neurofibromatosis, Type 1 | 162200 | NF1 | M89914 | 2.0e-46 | IRA2 | M33779 | Inhibitory regulator protein
Choroideremia | 303100 | CHM | X78121 | 2.1e-42 | GDI1 | S69371 | GDP dissociation inhibitor
Diastrophic Dysplasia | 222600 | DTD | U14528 | 7.2e-38 | SUL1 | X82013 | Sulfate permease
Lissencephaly | 247200 | LIS1 | L13385 | 1.7e-34 | MET30 | L26505 | Methionine metabolism
Thomsen Disease | 160800 | CLC1 | Z25884 | 7.9e-31 | GEF1 | Z23117 | Voltage-gated chloride channel
Wilms Tumor | 194070 | WT1 | X51630 | 1.1e-20 | FZF1 | X67787 | Sulphite resistance protein
Achondroplasia | 100800 | FGFR3 | M58051 | 2.0e-18 | IPL1 | U07163 | Serine/threonine protein kinase
Menkes Syndrome | 309400 | MNK | X69208 | 2.1e-17 | CCC2 | L36317 | Probable copper transporter

What you will learn
• Multiple alignment
  – Used as input to many prediction tools
  – Improves sequence-structure alignment
  – Identify functional sites
• Protein structure
  – Visualisation
  – Comparative modelling
• Using phylogeny in function assignment
  – Family classifications

Flowchart
Query = protein sequence
• Sequence similarity to other proteins?
  – Yes: does similarity imply homology?
    • Yes: place query in family tree
      – Known function(s) in family?
        • Yes: transfer function; verify conservation of functional motifs
        • No: motif search; use other data
      – Known structure in family?
        • Yes: comparative modelling; validate motifs against 3D model
        • No: secondary structure prediction
    • No: use single sequence methods
  – No: single sequence methods
    • Motif search
    • Secondary structure prediction
    • Use other data

Course assignments
• Goal: using the flowchart, what can you say, and with what confidence, about the structure and function of the protein?
• Maximum length of presentation is 25 minutes. No need to dwell on negative results.
• More detailed guidelines are given in Session 7.

Teams
• Team n, n=1,…,12, works on both sequence_nA and sequence_nB
• The A and B sequences have been selected to present different challenges. The team members should work together on both sequences, discussing the findings between them and making notes for the final report (presentation).

Sequences are here: http://ekhidna.biocenter.helsinki.fi/how/proteinlist.fasta
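
The course itself requires no programming, but the flowchart can also be read as a small decision procedure. The sketch below is one possible reading of it, not part of the course material. The names `Evidence` and `plan_analysis` are hypothetical, and the yes/no answers are assumed to come from your own judgment of the web-tool results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    """Yes/no answers to the flowchart questions (hypothetical container)."""
    has_similar_sequences: bool
    similarity_implies_homology: bool = False
    family_has_known_function: bool = False
    family_has_known_structure: bool = False


# Fallback branch of the flowchart when no homologous family is available.
SINGLE_SEQUENCE_METHODS = [
    "motif search",
    "secondary structure prediction",
    "use other data",
]


def plan_analysis(ev: Evidence) -> List[str]:
    """Return the analysis steps the flowchart suggests for a query protein."""
    # Without similar sequences, or without homology, fall back to
    # single sequence methods.
    if not (ev.has_similar_sequences and ev.similarity_implies_homology):
        return list(SINGLE_SEQUENCE_METHODS)

    steps = ["place query in family tree"]

    # Function branch: transfer a known function, otherwise search for motifs.
    if ev.family_has_known_function:
        steps += ["transfer function",
                  "verify conservation of functional motifs"]
    else:
        steps += ["motif search", "use other data"]

    # Structure branch: model on a known structure, otherwise predict
    # secondary structure only.
    if ev.family_has_known_structure:
        steps += ["comparative modelling",
                  "validate motifs against 3D model"]
    else:
        steps += ["secondary structure prediction"]

    return steps


if __name__ == "__main__":
    query = Evidence(True, True, family_has_known_function=True)
    for step in plan_analysis(query):
        print("-", step)
```

For a presentation, the returned list can serve as a checklist: each step is a place to report what was found and how confident the team is in it.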