The importance of proteins (Proteiinien merkitys) - University of Helsinki
52925 Proteiinianalyysin työt
HOW – hands-on workshop on protein analysis
Liisa Holm

Instructors
• Patrik Koskinen
• Samuli Eldfors
• Xuan Hung Ta
• Martin Heger
• Petri Törönen
• Jussi Nokso-Koivisto

Course web page
http://ekhidna.biocenter.helsinki.fi/how
  – Schedule
  – Talks
  – Exercises
  – Course assignments
  – Instructions for computer use

Topics
Week I
• Monday
  – Introduction
  – Pairwise alignments
• Tuesday
  – Manual editing of sequence alignment
• Wednesday
  – Secondary structure prediction
• Thursday
  – Structure visualisation
• Friday
  – Comparative modelling
Week II
• Monday
  – Phylogenomics
• Tuesday
  – Sequence classifications
• Wednesday
  – Protein-protein interactions
• Thursday
  – Structure classifications
• Friday
  – Review day
Week III
  – Work on course assignments

Course organization
• 1st and 2nd week (Structured)
  – Demonstrations (12 -…)
  – Practical exercises (…-17)
• 3rd week (Self-organized)
  – Course assignment
• Instructor available two hours daily
  – Discussion on Tuesday 13-15
• Written report due 8th December

Mode of work
• Demonstrations
• Practical exercises
  – Structured questions
  – You should first try yourself, then ask team mate, then ask instructor
  – Discuss results with team mate
• Course assignments
  – Written reports, due 8 December
  – Two sequence assignments per team
  – Course grade based on report

Objectives
• Infer function and/or structure starting from the amino acid sequence of a query protein
  – Identify related sequences, place in family
  – Identify conserved positions in sequence and structure
• Learn to use representative web-based tools
• No programming, no Unix/Linux

Introduction
• Most cellular functions are performed or facilitated by proteins.
  – Primary biocatalyst
  – Cofactor transport/storage
  – Mechanical motion/support
  – Immune protection
  – Control of growth/differentiation

[Figure labels: Linear DNA, Watson & Crick (1953); 3D structure, Myoglobin, Kendrew & Perutz (1957), 1mbn; Function = S interactions]

Evolution
Sequence – Structure – Function
[Diagram labels: DNA sequence, Protein sequence, Protein structure, Protein function, Natural selection]

What can sequence analysis do?
• Homology
  – Inference of inherited complex features: what is conserved is important
  – Most powerful approach
  – Good tertiary structure prediction
• Diagnostic patterns
  – E.g. subcellular localization signals
• Physical preferences
  – Good secondary structure prediction
  – Prediction of transmembrane segments
  – Poor ab initio tertiary structure prediction

Application: Finding Homologues
• Find similar ones in different organisms
• Human vs. mouse vs. yeast
  – Easier to do experiments on the latter!
(Section from NCBI Disease Genes Database reproduced below.)

Best Sequence Similarity Matches to Date Between Positionally Cloned Human Genes and S. cerevisiae Proteins

Human Disease | MIM # | Human Gene | GenBank Acc# for Human cDNA | BLASTX P-value | Yeast Gene | GenBank Acc# for Yeast cDNA | Yeast Gene Description
Hereditary Non-polyposis Colon Cancer | 120436 | MSH2 | U03911 | 9.2e-261 | MSH2 | M84170 | DNA repair protein
Hereditary Non-polyposis Colon Cancer | 120436 | MLH1 | U07418 | 6.3e-196 | MLH1 | U07187 | DNA repair protein
Cystic Fibrosis | 219700 | CFTR | M28668 | 1.3e-167 | YCF1 | L35237 | Metal resistance protein
Wilson Disease | 277900 | WND | U11700 | 5.9e-161 | CCC2 | L36317 | Probable copper transporter
Glycerol Kinase Deficiency | 307030 | GK | L13943 | 1.8e-129 | GUT1 | X69049 | Glycerol kinase
Bloom Syndrome | 210900 | BLM | U39817 | 2.6e-119 | SGS1 | U22341 | Helicase
Adrenoleukodystrophy, X-linked | 300100 | ALD | Z21876 | 3.4e-107 | PXA1 | U17065 | Peroxisomal ABC transporter
Ataxia Telangiectasia | 208900 | ATM | U26455 | 2.8e-90 | TEL1 | U31331 | PI3 kinase
Amyotrophic Lateral Sclerosis | 105400 | SOD1 | K00065 | 2.0e-58 | SOD1 | J03279 | Superoxide dismutase
Myotonic Dystrophy | 160900 | DM | L19268 | 5.4e-53 | YPK1 | M21307 | Serine/threonine protein kinase
Lowe Syndrome | 309000 | OCRL | M88162 | 1.2e-47 | YIL002C | Z47047 | Putative IPP-5-phosphatase
Neurofibromatosis, Type 1 | 162200 | NF1 | M89914 | 2.0e-46 | IRA2 | M33779 | Inhibitory regulator protein
Choroideremia | 303100 | CHM | X78121 | 2.1e-42 | GDI1 | S69371 | GDP dissociation inhibitor
Diastrophic Dysplasia | 222600 | DTD | U14528 | 7.2e-38 | SUL1 | X82013 | Sulfate permease
Lissencephaly | 247200 | LIS1 | L13385 | 1.7e-34 | MET30 | L26505 | Methionine metabolism
Thomsen Disease | 160800 | CLC1 | Z25884 | 7.9e-31 | GEF1 | Z23117 | Voltage-gated chloride channel
Wilms Tumor | 194070 | WT1 | X51630 | 1.1e-20 | FZF1 | X67787 | Sulphite resistance protein
Achondroplasia | 100800 | FGFR3 | M58051 | 2.0e-18 | IPL1 | U07163 | Serine/threonine protein kinase
Menkes Syndrome | 309400 | MNK | X69208 | 2.1e-17 | CCC2 | L36317 | Probable copper transporter

What you will learn
• Multiple alignment
  – Used as input to many prediction tools
  – Improves sequence-structure alignment
  – Identify functional sites
• Protein structure
  – Visualisation
  – Comparative modelling
• Using phylogeny in function assignment
  – Family classifications

Flowchart
Query = Protein sequence
• Sequence similarity to other proteins?
  – Yes: does similarity imply homology?
    • Yes: place query in family tree
      – Known function(s) in family?
        • Yes: Transfer function; Verify conservation of functional motifs
        • No: Motif search; Use other data
      – Known structure in family?
        • Yes: Comparative modelling; Validate motifs against 3D model
        • No: Secondary structure prediction
    • No: use single sequence methods
  – No: single sequence methods
    • Motif search
    • Secondary structure prediction
    • Use other data

Course assignments
• Goal: using the flowchart, what can you say, with what confidence, about the structure and function of the protein?
• Max length of report ~10 pages. No need to show negative results.
• More detailed guidelines given on Day 10.

Teams
Team n, n=1,…,12, works on both sequence_nA and sequence_nB
• A and B sequences have been selected to present different challenges; it is therefore strongly recommended that the team members work together on both sequences
Sequences are here: http://ekhidna.biocenter.helsinki.fi/how/proteinlist.fasta

Preparing sequence reports
• Week 3 is reserved for preparing the reports.
  – Experience has shown that students progress at different speeds. Fast students may try the tools out on their sequence assignments during weeks 1-2.
  – Checkpoint on Tuesday (Day 12)
    • It is expected that sequence database searches and some downstream analyses have been done by then
    • The purpose is to summarize progress and discuss strategies forward
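
Optional illustration of the flowchart
The course itself requires no programming, but the decision flowchart for a query protein can also be read as a small decision procedure. The Python sketch below is only one possible reading of that flowchart: the names QueryEvidence, SINGLE_SEQUENCE_METHODS and plan_analysis are hypothetical and not part of the course material, and the yes/no answers are assumed to come from the analyst's own judgement of database search results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryEvidence:
    """Yes/no answers to the flowchart questions (hypothetical container)."""
    has_similar_sequences: bool
    similarity_implies_homology: bool = False
    family_has_known_function: bool = False
    family_has_known_structure: bool = False


# Fallback branch of the flowchart when no homologous family is available.
SINGLE_SEQUENCE_METHODS = [
    "Motif search",
    "Secondary structure prediction",
    "Use other data",
]


def plan_analysis(ev: QueryEvidence) -> List[str]:
    """Return the analysis steps the flowchart suggests for this evidence."""
    # No similarity, or similarity that does not imply homology:
    # both branches end in single sequence methods.
    if not (ev.has_similar_sequences and ev.similarity_implies_homology):
        return list(SINGLE_SEQUENCE_METHODS)

    steps = ["Place query in family tree"]

    # Function branch.
    if ev.family_has_known_function:
        steps += ["Transfer function",
                  "Verify conservation of functional motifs"]
    else:
        steps += ["Motif search", "Use other data"]

    # Structure branch.
    if ev.family_has_known_structure:
        steps += ["Comparative modelling",
                  "Validate motifs against 3D model"]
    else:
        steps.append("Secondary structure prediction")

    return steps
```

For example, plan_analysis(QueryEvidence(True, True, True, False)) lists the family-tree placement, the function transfer steps, and secondary structure prediction, because no structure is known in the family. The code only orders the steps; deciding how confident each answer is remains the analyst's task in the report.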