Download MCSIS - Radboud Universiteit

What can (many) sequences tell us?
Nuclear receptor function
Nuclear receptor family
[Figure: members of the nuclear receptor family. Labels:
NR2A2-HN4G, NR2B3-RRXG, NR2A5-HN4 d?, NR2B1-RRXA, NR2B2-RRXB, NR3C1-GCR,
NR2A1-HNF4, NR3C4-ANDR, NR2C2-TR4, NR2C1-TR2-11, NR2E1-TLX, NR0B1-DAX1,
NR0B2-SHP, NR2E3-PNR, NR3A1-ESTR, NR3C2-MCR, NR3A2-ERBT, NR3B1-ERR1,
NR6A1-GCNF, NR2F6-EAR2, NR3B2-ERR2, NR5A1-SF1, NR5A2-FTF, NR2F2-ARP1,
NR2F1-COTF, NR3C3-PRGR, NR4A1-NGFI, NR4A3-NOR1, NR1C1-PPAR, NR4A2-NOT,
NR1C2-PPAS, NR1H4-FAR, NR1C3-PPAT, NR1H3-LXR, NR1D1-EAR1, NR1D2-BD73,
NR1I1-VDR, NR1F3-RORG, NR1A2-THB1, NR1F1-ROR1, NR1I2-PXR, NR1A1-THA1,
NR1F2-RORB, NR1B3-RRG1, NR1B2-RRB2, NR1B1-RRA1, NR1I4-CAR1-MOUSE-,
NR1H2-NER, NR1I3-MB67]
Nuclear receptor structure
[Figure: domain layout A-B (AF-1), C (DNA), D, E (LBD), F]
C: DNA binding domain
– highly conserved
– > 90% similarity
E: Ligand binding domain
– conserved protein fold
– > 20% sequence similarity
The questions
How do ligands relate to activity?
What is the role of each amino acid in the NR LBD?
Which data handling / bioinformatics is needed to answer
these questions?
3D structure LBD
(hER)
Available NR data
56 structures in the PDB (>200 now*)
>500 sequences (scattered) (>1500 now)
>1000 mutations (very scattered)
>10000 ligand-binding studies (secret)
Disease patterns, expression, >1000 SNPs, genetic
localization, etc., etc., etc.
This data must be integrated, sorted, combined,
validated, understood, and used to answer our questions.
* "Now" was in 2007…
Step 1
The first important step is a common numbering
scheme, because all structures have different
numbering schemes, and insertions and deletions
between species confuse any numbering.
Whoever solves that problem once and for all should
get three Nobel prizes.
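One way to picture a common numbering scheme, sketched here under assumptions that are not taken from the slides: number the columns of a multiple sequence alignment and translate each sequence's own residue numbers to those column numbers. The function name `column_to_residue`, the toy fragments, and the starting residue numbers are all hypothetical, and the sketch assumes each sequence is numbered contiguously, which real PDB files often are not.

```python
def column_to_residue(aligned_seq, first_residue_number=1, gap="-"):
    """Map 1-based alignment columns to a sequence's own residue numbers.

    Gap columns map to None because the sequence has no residue there.
    """
    mapping = {}
    number = first_residue_number
    for column, symbol in enumerate(aligned_seq, start=1):
        if symbol == gap:
            mapping[column] = None
        else:
            mapping[column] = number
            number += 1
    return mapping


# Two hypothetical aligned fragments with different native numbering.
human = column_to_residue("MK-LVDE", first_residue_number=301)
mouse = column_to_residue("MKALV-E", first_residue_number=12)

# Column 4 is the same aligned position in both sequences,
# even though the native residue numbers differ.
print(human[4], mouse[4])
```

Once every sequence carries such a mapping, mutations, ligand contacts, and structural observations can be compared by column number instead of by species-specific residue number.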
Large data volumes
Large data volumes allow us to develop new data
analysis techniques.
Entropy-variability analysis is a novel technique to look
at very large multiple sequence alignments.
Entropy-variability analysis requires ‘better’ alignments
than are routinely obtained with ‘standard’ multiple
sequence alignment programs.
Part of the big alignment
We see correlations between columns and between ‘things’.
Vriend’s first rule of sequence analysis
If it is conserved,
it is important
Vriend’s second rule of sequence analysis
If it is very conserved,
it is very important
Consequence:
If something is conserved in each sub-family,
it is involved in a sub-family specific function.
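The consequence above can be turned into a simple filter: keep the alignment columns that are conserved inside every sub-family but differ between sub-families. The following is a minimal sketch; the function name `subfamily_specific_positions` and the two toy sub-families are hypothetical and are not real receptor sequences.

```python
def subfamily_specific_positions(subfamilies):
    """Return 1-based columns conserved within every sub-family but not across them.

    `subfamilies` maps a sub-family name to a list of equal-length aligned sequences.
    """
    n_columns = len(next(iter(subfamilies.values()))[0])
    positions = []
    for col in range(n_columns):
        # One set of observed residue types per sub-family at this column.
        residues_per_family = [
            {seq[col] for seq in seqs} for seqs in subfamilies.values()
        ]
        conserved_within = all(len(res) == 1 for res in residues_per_family)
        differs_between = len(set().union(*residues_per_family)) > 1
        if conserved_within and differs_between:
            positions.append(col + 1)
    return positions


# Hypothetical toy sub-families.
toy = {
    "A": ["GKDLS", "GKDMS", "GKDIS"],
    "B": ["GRDLT", "GRDVT", "GRDAT"],
}
print(subfamily_specific_positions(toy))  # [2, 5]
```

Columns 1 and 3 are conserved across the whole toy family, column 4 is variable, and columns 2 and 5 are the sub-family-specific candidates.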
What is CMA?
Function is never just one residue
QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW
Red = conserved
Green = variable
Blue = correlated
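The colour legend can be reproduced on the four toy sequences with a few lines of code, assuming CMA stands for correlated mutation analysis. This is a sketch, not the method used for the real alignments: a column counts as "correlated" only if another column splits the sequences into exactly the same groups, and columns in which every sequence has a different residue are treated as merely variable. The function names and that last rule are assumptions made for this illustration.

```python
from collections import defaultdict

toy_alignment = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]


def partition(column):
    """Encode which sequences share a residue, e.g. 'YYNN' -> (0, 0, 1, 1)."""
    first_seen = {}
    return tuple(first_seen.setdefault(res, len(first_seen)) for res in column)


def classify_columns(alignment):
    """Label each column as 'conserved', 'correlated' or 'variable'."""
    patterns = [partition(col) for col in zip(*alignment)]
    columns_by_pattern = defaultdict(list)
    for idx, pattern in enumerate(patterns):
        columns_by_pattern[pattern].append(idx)

    labels = []
    for pattern in patterns:
        n_types = len(set(pattern))
        if n_types == 1:
            labels.append("conserved")
        elif n_types < len(alignment) and len(columns_by_pattern[pattern]) > 1:
            # Another column groups the sequences in exactly the same way.
            labels.append("correlated")
        else:
            labels.append("variable")
    return labels


for number, label in enumerate(classify_columns(toy_alignment), start=1):
    print(number, label)
```

On this toy alignment, columns 6 to 8 (YAS/NMK) move together, columns 10 and 11 (FG/TH) move together, and the last two columns vary without any shared pattern.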
Example: (chymo)trypsin
Correlations
Residues can correlate with residues, and when
that happens we have found a function, no matter the
conservation or variability.
Residues that have a function correlate with that
function.
Correlations with wavelength
Residues can also correlate with something else.
Example: optimal wavelength for opsin excitation.
Wavelength   Loop1   Loop2
UV           Gln     His
Blue         Asn     Gln
Red/Green    Leu     Gln
Correlations with drug binding
(so no longer evolution-based…)
Wilma Kuipers Thesis
Correlation analysis
• Correlate sequences with ligand binding affinities
• Alignments showed a 100% correlation between affinity for
pindolol and the presence/absence of Asn386
Receptor  Affinity  res. 386
1         +         N
2         +         N
3         +         N
4         +         N
5         -         T
6         -         T
7         -         T
8         -         T
9         -         A
10        -         A
11        -         A
12        -         V
13        -         V
14        -         L
15        -         L
16        +         N
17        +         N
18        +         N
19        -         Y
20        -         Y
21        -         Y
22        -         Y
23        -         T
24        -         T
...       ...       ...

1 = 5HT-1a, 2 = 5HT-1b, 3 = 5HT-1d, ....
• Obviously, Asn386 plays an important role in ligand binding
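The 100% correlation can be checked directly from the 24 receptors listed above. The sketch below encodes the affinity and residue-386 columns as strings and, for each residue type, counts how often "has this residue" agrees with "binds pindolol". The function name `agreement` is hypothetical; the data are the values from the slide.

```python
# Affinity (+/-) and residue 386 for receptors 1..24, in slide order.
affinity = "++++" + "-" * 11 + "+++" + "-" * 6
residue_386 = "NNNNTTTTAAAVVLLNNNYYYYTT"


def agreement(affinity, residues, residue_type):
    """Fraction of receptors where 'has residue_type' matches 'binds (+)'."""
    matches = sum(
        (res == residue_type) == (aff == "+")
        for aff, res in zip(affinity, residues)
    )
    return matches / len(affinity)


for residue_type in sorted(set(residue_386)):
    score = agreement(affinity, residue_386, residue_type)
    print(residue_type, round(score, 2))
```

Only Asn (N) reaches an agreement of 1.0. Every other residue type at position 386 occurs exclusively in non-binding receptors.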
Summary correlation
If it's conserved, it's important; if it's important, it remains conserved.
If residue positions show correlation with ‘something’ it is
involved in that ‘something’.
‘Something’ can be any of a very large number of functions.
Example correlation: Which cysteines form a
pair in this protein family? Shown are aligned
peptides from five different bacteria.
ASDFGCHIKLMCNPQRSCTVW
YSDYGCNIKLFCQPQRSCT--
ATDYPVQIKLMCNPQKSCSMW
YTDFGCHVKLLVQPNRSVTVW
-TDFGVHVKLMCNPQKSCSFW
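The cysteine question can be answered mechanically: two cysteines that form a pair should be present together or absent together in every sequence. The sketch below assumes the five peptides are 21 columns wide, with the second peptide ending in two gap characters (the extracted text fuses it with the third peptide and shows only one). The function name `candidate_cysteine_pairs` is hypothetical.

```python
from itertools import combinations

peptides = [
    "ASDFGCHIKLMCNPQRSCTVW",
    "YSDYGCNIKLFCQPQRSCT--",  # trailing gaps are an assumption, see lead-in
    "ATDYPVQIKLMCNPQKSCSMW",
    "YTDFGCHVKLLVQPNRSVTVW",
    "-TDFGVHVKLMCNPQKSCSFW",
]


def candidate_cysteine_pairs(alignment):
    """Return 1-based column pairs whose Cys presence/absence pattern is identical."""
    cys_pattern = {
        idx + 1: tuple(res == "C" for res in column)
        for idx, column in enumerate(zip(*alignment))
        if "C" in column
    }
    return [
        (a, b)
        for a, b in combinations(sorted(cys_pattern), 2)
        if cys_pattern[a] == cys_pattern[b]
    ]


print(candidate_cysteine_pairs(peptides))  # [(12, 18)]
```

Columns 6, 12 and 18 all contain cysteines, but only columns 12 and 18 lose their cysteine in the same sequence (the fourth peptide), so they are the candidate pair. Column 6 loses its cysteine in the third and fifth peptides, independently of the other two.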
Conserved or very conserved? Recalcitrant.
ASDFGCHIKLMCNPQRSCTVW
YSDYGCNIKLFCNPQRSCT--
ATDLPVQIKLMANPQKSCSVW
LSDFGCHIKLMCNPQRSCTVW
YTDFGCHVKLLVQPNRSVAFW
-SDAGVHVKLMVQPNKSVSF-
YTDFGCHVKLLVQPNRSVVFW
-TDSGVHVKLMIQPNKSVSFW
Conclusion from recalcitrance
The more exceptions you find in other
(homologous) families,
the less important is the residue in your family.
Entropy and variability
So far we saw that conservation and correlation
can help us find functionally important residues.
Can variability patterns also tell us something?
Entropy
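Sequence entropy E_i at position i is calculated
from the frequencies p_{a,i} of the twenty amino acid
types a at position i:
\[ E_i = -\sum_{a=1}^{20} p_{a,i} \ln(p_{a,i}) \]
With the minus sign, E_i runs from 0 for a fully conserved
position to ln(20) ≈ 3.0 when all twenty types are equally frequent.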
Variability
Sequence variability Vi is the number of
amino acid types observed at position i in
more than 0.5% of all sequences.
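Both measures are easy to compute for one alignment column. The following is a minimal sketch: it uses Shannon entropy with the usual minus sign and natural logarithm, and it counts a residue type towards variability only if it occurs in more than 0.5% of all sequences. Ignoring gaps and non-standard symbols when computing the frequencies is an assumption of this sketch, and the function name `entropy_variability` and the toy columns are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def entropy_variability(column, min_fraction=0.005):
    """Return (entropy, variability) for one alignment column given as a string."""
    # Assumption: gaps and non-standard symbols do not count as residue types.
    counts = Counter(res for res in column if res in AMINO_ACIDS)
    n_residues = sum(counts.values())
    if n_residues == 0:
        return 0.0, 0
    entropy = -sum(
        (n / n_residues) * math.log(n / n_residues) for n in counts.values()
    )
    # Variability: residue types seen in more than 0.5% of all sequences.
    variability = sum(1 for n in counts.values() if n / len(column) > min_fraction)
    return entropy, variability


# Hypothetical columns from a 1000-sequence alignment.
examples = {
    "conserved": "A" * 1000,
    "rare variant": "A" * 996 + "C" * 4,
    "two types": "Y" * 500 + "N" * 500,
}
for name, column in examples.items():
    e, v = entropy_variability(column)
    print(f"{name}: E = {e:.3f}, V = {v}")
```

The "rare variant" column shows why the two measures are used together: the four cysteines raise the entropy slightly above zero, but at 0.4% they stay below the threshold and the variability remains 1.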
Intermezzo
It is a common concept in bioinformatics to
create a hypothesis.
But… every hypothesis must be tested
against real data from real experiments.
Ras Entropy-Variability
11 Red
12 Orange
22 Yellow
23 Green
33 Blue
GPCR Entropy-Variability; signalling path
GPCR
11 G protein
12 Support
22 Signaling
23 Ligand in
33 Ligand out
NR LBD Entropy-Variability
11 main function
12 first shell around main function
22 core residues (signal transduction)
23 modulator
33 mainly surface
[Figure: entropy (0.0 to 2.8) plotted against variability (0 to 18), divided into boxes 11, 12, 22, 23 and 33]
Example: role of Asp 351
EV and correlation. But the correlation would
never have been found from sequence analyses.
[Figure panels: agonist; antagonist]
Summary variability analysis
Variability patterns hold information.
Entropy and Variability are two (of the) ways to measure
variability patterns.
Entropy and Variability patterns can say something
about the type of function, and thus add detail to
correlation studies.
Conclusions:
Data is difficult, but we need it (sic); life would be so
nice if we could do without it. PDB files are the worst.
Nomenclature is not homogeneous. Ontologies….
Much data has been carefully hidden in the literature,
where it can only be retrieved with great difficulty.
Residue numbering is difficult but very necessary.
Variability-entropy analysis is powerful, but requires
very 'good' alignments.
A short break for a word from our sponsors
Laerte Oliveira
Wilma Kuipers, Weesp
Bob Bywater, Copenhagen
Nora vd Wenden, The Hague
Mike Singer, New Haven
Ad IJzerman, Leiden
Margot Beukers, Leiden
Fabien Campagne, New York
Øyvind Edvardsen, Tromsø
Simon Folkertsma, Frisia
Henk-Jan Joosten, Wageningen
Joost van Durma, Brussels
David Lutje Hulsik, Utrecht
Tim Hulsen, Goffert
Manu Bettler, Lyon
Florence Horn
Elmar Krieger
Our industrial sponsor: