(52925 Protein analysis practical work)
Advanced practical course in bioinformatics
HOW – hands-on workshop on protein analysis
Liisa Holm
Instructors
• Patrik Koskinen
• Petri Törönen
Course web page
http://ekhidna.biocenter.helsinki.fi/how
– Schedule
– Talks
– Exercises
– Course assignments
– Instructions for computer use
Mode of work
• Course assignments
– Two sequences assigned to each team
• Sessions (12-16)
– Demonstrations (~ 1 hour)
– Practical exercises
• Structured questions
• Try it yourself first, then ask your teammate, then ask an instructor
• Discuss results with your teammate
– Try out tools on your assigned sequences during the course
• The second-last session is reserved solely for working on course assignments
• Presentations
– Course grade based on presentation (2 March)
• Two sequence assignments per team
Objectives
• Infer function and/or structure starting from the amino acid sequence of a query protein
– Identify related sequences, place in family
– Identify conserved positions in sequence and structure
• Learn to use representative web-based tools
• No programming, no Unix/Linux
Introduction
• Most cellular functions are performed or facilitated by proteins.
– Primary biocatalyst
– Cofactor transport/storage
– Mechanical motion/support
– Immune protection
– Control of growth/differentiation
Figure: Linear DNA, Watson & Crick (1953); 3D structure of myoglobin, Kendrew & Perutz (1957), PDB 1mbn. Function = Σ interactions.
Figure: Evolution. Sequence – structure – function: DNA sequence → protein sequence → protein structure → protein function, with natural selection acting on function.
What can sequence analysis do?
• Homology
– Inference of inherited complex features: what is conserved is important
– Most powerful approach
– Good tertiary structure prediction
• Diagnostic patterns
– E.g. subcellular localization signals
• Physical preferences
– Good secondary structure prediction
– Prediction of transmembrane segments
– Poor ab initio tertiary structure prediction
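Transmembrane-segment prediction is a typical use of physical preferences. The sketch below is a minimal illustration, assuming the Kyte-Doolittle hydropathy scale with the commonly used 19-residue window and a mean-hydropathy cutoff of 1.6; the function names and the toy sequence are hypothetical, and dedicated transmembrane predictors use far more sophisticated models.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6):
    """Merge overlapping windows whose mean hydropathy exceeds the threshold.

    Returns a list of (start, end) 0-based half-open intervals.
    """
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                # Window overlaps the previous hit: extend that segment.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Hypothetical toy sequence: a hydrophobic stretch flanked by charged residues.
    toy = "MKKRDE" + "LIVLAVLIGLFAVLLIVAL" + "KRDEKR"
    print(candidate_tm_segments(toy))
```

On the toy sequence the script prints `[(2, 29)]`. The reported span is wider than the 19-residue hydrophobic stretch (positions 6 to 24) because windows that only partly overlap the stretch still exceed the cutoff, which is one reason simple window methods give fuzzy segment boundaries.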
Application: Finding Homologs
• Find similar proteins in different organisms
• Human vs. mouse vs. yeast
– Experiments are easier to do on the latter!
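Finding homologues starts with a sequence similarity search. Tools such as BLAST use fast heuristics to approximate local alignment; the sketch below computes an exact Smith-Waterman local alignment score instead, purely as an illustration. It assumes a toy scoring scheme (match +2, mismatch -1, linear gap -2) in place of a substitution matrix such as BLOSUM62 with affine gap penalties, and the function name is hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman("XXACDEXX", "ACDE"))
```

Here `smith_waterman("XXACDEXX", "ACDE")` returns 8: the shared ACDE core aligns without gaps, and the unrelated flanks are simply left out of the local alignment.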
(Section from the NCBI Disease Genes Database, reproduced below.)
Best Sequence Similarity Matches to Date Between Positionally Cloned Human Genes and S. cerevisiae Proteins

| Human disease | MIM # | Human gene | GenBank acc. (human cDNA) | BLASTX P-value | Yeast gene | GenBank acc. (yeast cDNA) | Yeast gene description |
|---|---|---|---|---|---|---|---|
| Hereditary Non-polyposis Colon Cancer | 120436 | MSH2 | U03911 | 9.2e-261 | MSH2 | M84170 | DNA repair protein |
| Hereditary Non-polyposis Colon Cancer | 120436 | MLH1 | U07418 | 6.3e-196 | MLH1 | U07187 | DNA repair protein |
| Cystic Fibrosis | 219700 | CFTR | M28668 | 1.3e-167 | YCF1 | L35237 | Metal resistance protein |
| Wilson Disease | 277900 | WND | U11700 | 5.9e-161 | CCC2 | L36317 | Probable copper transporter |
| Glycerol Kinase Deficiency | 307030 | GK | L13943 | 1.8e-129 | GUT1 | X69049 | Glycerol kinase |
| Bloom Syndrome | 210900 | BLM | U39817 | 2.6e-119 | SGS1 | U22341 | Helicase |
| Adrenoleukodystrophy, X-linked | 300100 | ALD | Z21876 | 3.4e-107 | PXA1 | U17065 | Peroxisomal ABC transporter |
| Ataxia Telangiectasia | 208900 | ATM | U26455 | 2.8e-90 | TEL1 | U31331 | PI3 kinase |
| Amyotrophic Lateral Sclerosis | 105400 | SOD1 | K00065 | 2.0e-58 | SOD1 | J03279 | Superoxide dismutase |
| Myotonic Dystrophy | 160900 | DM | L19268 | 5.4e-53 | YPK1 | M21307 | Serine/threonine protein kinase |
| Lowe Syndrome | 309000 | OCRL | M88162 | 1.2e-47 | YIL002C | Z47047 | Putative IPP-5-phosphatase |
| Neurofibromatosis, Type 1 | 162200 | NF1 | M89914 | 2.0e-46 | IRA2 | M33779 | Inhibitory regulator protein |
| Choroideremia | 303100 | CHM | X78121 | 2.1e-42 | GDI1 | S69371 | GDP dissociation inhibitor |
| Diastrophic Dysplasia | 222600 | DTD | U14528 | 7.2e-38 | SUL1 | X82013 | Sulfate permease |
| Lissencephaly | 247200 | LIS1 | L13385 | 1.7e-34 | MET30 | L26505 | Methionine metabolism |
| Thomsen Disease | 160800 | CLC1 | Z25884 | 7.9e-31 | GEF1 | Z23117 | Voltage-gated chloride channel |
| Wilms Tumor | 194070 | WT1 | X51630 | 1.1e-20 | FZF1 | X67787 | Sulphite resistance protein |
| Achondroplasia | 100800 | FGFR3 | M58051 | 2.0e-18 | IPL1 | U07163 | Serine/threonine protein kinase |
| Menkes Syndrome | 309400 | MNK | X69208 | 2.1e-17 | CCC2 | L36317 | Probable copper transporter |
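The table lists BLASTX P-values, whereas BLAST output usually reports E-values. Under Karlin-Altschul statistics the two are related by P = 1 - exp(-E), so for values as small as those in the table they are practically identical. A small sketch of the conversion in both directions (function names are hypothetical):

```python
import math


def evalue_to_pvalue(e: float) -> float:
    """P = 1 - exp(-E); expm1 keeps full precision for very small E."""
    return -math.expm1(-e)


def pvalue_to_evalue(p: float) -> float:
    """Inverse relation E = -ln(1 - P); log1p keeps precision for small P."""
    return -math.log1p(-p)


if __name__ == "__main__":
    for e in (9.2e-261, 1e-3, 1.0, 10.0):
        print(f"E = {e:g}  ->  P = {evalue_to_pvalue(e):.6g}")
```

For E = 1 the P-value is about 0.632, while for E = 10 it is already very close to 1, which is why E-values are the more informative number for weak hits.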
What you will learn
• Multiple alignment
– Used as input to many prediction tools
– Improves sequence-structure alignment
– Identify functional sites
• Protein structure
– Visualisation
– Comparative modelling
• Using phylogeny in function assignment
– Family classifications
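One simple way to identify conserved positions from a multiple alignment is to score each column by the Shannon entropy of its residue frequencies: a fully conserved column has entropy 0. The sketch below assumes gaps are ignored and uses a hypothetical cutoff of 0.5 bits; it does not reproduce the scoring of any particular tool, many of which also correct for background amino acid frequencies and sequence weighting. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of residue frequencies in one column, gaps ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return float("nan")  # all-gap column: undefined
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def conserved_positions(alignment: list[str], max_entropy: float = 0.5) -> list[int]:
    """0-based columns whose entropy is at most max_entropy (low = conserved)."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        h = column_entropy("".join(s[i] for s in alignment))
        if not math.isnan(h) and h <= max_entropy:
            hits.append(i)
    return hits


if __name__ == "__main__":
    toy = ["MKC-HG", "MRCAHG", "MKCSYG", "MQCAHG"]
    print(conserved_positions(toy))
```

With the toy alignment, `conserved_positions` returns `[0, 2, 5]`, the three invariant columns (M, C and G); the H/Y column scores about 0.81 bits and falls just outside the cutoff.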
Flowchart
Query = protein sequence
Sequence similarity to other proteins?
  Yes: does similarity imply homology?
    Yes: place query in family tree
      Known function(s) in family?
        Yes: transfer function; verify conservation of functional motifs
        No: motif search; use other data
      Known structure in family?
        Yes: comparative modelling; validate motifs against 3D model
        No: secondary structure prediction
    No: use single sequence methods
  No: use single sequence methods
    – Motif search
    – Secondary structure prediction
    – Use other data
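Motif search typically means scanning the query against pattern or profile databases such as PROSITE. As an illustration, the sketch below translates a basic subset of PROSITE pattern syntax (`x`, `[..]`, `{..}`, `(n)` and `(n,m)` repeats, `<` and `>` anchors) into a Python regular expression. It is not a substitute for ScanProsite: it rejects forms such as `[G>]`, and `re.finditer` reports only non-overlapping matches. Function names are hypothetical; `N-{P}-[ST]-{P}` is the N-glycosylation pattern (PROSITE PS00001), while `K-D-E-L>` is a simplified, hypothetical C-terminal ER-retention pattern.

```python
import re

# One PROSITE element: residue spec plus an optional (n) or (n,m) repeat.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern such as 'N-{P}-[ST]-{P}' to a regex."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""    # C-terminal anchor
        m = _ELEMENT.fullmatch(element.strip("<>"))
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        spec, low, high = m.groups()
        if spec == "x":
            core = "."
        elif spec.startswith("{"):
            core = "[^" + spec[1:-1] + "]"  # {P} = any residue except P
        else:
            core = spec  # single residue or [ST] class is already valid regex
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        regex += prefix + core + suffix
    return regex


def find_motif(pattern: str, sequence: str) -> list[tuple[int, int]]:
    """0-based half-open (start, end) spans of non-overlapping matches."""
    rx = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.end()) for m in rx.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_motif("N-{P}-[ST]-{P}", "MANGTAKNPSA"))
    print(find_motif("K-D-E-L>", "MSSKDELAAKDEL"))
```

`find_motif("N-{P}-[ST]-{P}", "MANGTAKNPSA")` returns `[(2, 6)]`; the second N is rejected because it is followed by P. For the C-terminal pattern, only the final KDEL matches, because the internal copy is not at the end of the sequence.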
Course assignments
• Goal: using the flowchart, what can you say, and with what confidence, about the structure and function of the protein?
• Presentations may last at most 25 minutes. There is no need to dwell on negative results.
• More detailed guidelines will be given in Session 7.
Teams
Team n (n = 1, …, 12) works on both sequence_nA and sequence_nB
• The A and B sequences have been selected to present different challenges. Team members should work together on both sequences, discussing their findings and making notes for the final report (presentation).
Sequences are here:
http://ekhidna.biocenter.helsinki.fi/how/proteinlist.fasta
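The sequences are distributed as a single FASTA file. If you want to split it into individual records locally, a minimal parser could look like the sketch below. It assumes each record's identifier is the first word after `>` (for example `sequence_1A`, following the sequence_nA / sequence_nB naming); the actual header format of the file is not stated here, so check it before relying on this. The function name and sample data are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record's ID (first word after '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    current = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)  # close previous record
            header = line[1:].split()
            current = header[0] if header else ""
            chunks = []
        elif current is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if current is not None:
        records[current] = "".join(chunks)
    return records


if __name__ == "__main__":
    sample = ">sequence_1A demo record\nMKV\nLLA\n>sequence_1B\nGGS\n"
    for seq_id, seq in parse_fasta(sample).items():
        print(seq_id, len(seq), seq)
```

Wrapped sequence lines are joined, so `sequence_1A` in the sample becomes `MKVLLA`.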