BiC BioCentrum-DTU
Technical University of Denmark
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS
Creation of a functional B cell receptor/antibody
Germ line gene organization
© 2001 by Garland Publishing
Gene organization
The 12/23 rule of recombination
Recombination signal sequence (RSS)
Only a 12 RSS (12 bp spacer) is combined with a 23 RSS (23 bp spacer)
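The 12/23 rule can be sketched as a small check. This is a minimal illustration, not code from the slides: the `Segment` type and the function name are hypothetical, and the spacer assignments assume the standard heavy-chain layout (VH followed by a 23 RSS, D flanked by 12 RSS on both sides, JH preceded by a 23 RSS).

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """A gene segment with the spacer length (12 or 23) of each flanking RSS."""
    name: str
    rss_5prime: Optional[int]  # RSS upstream of the segment, None if absent
    rss_3prime: Optional[int]  # RSS downstream of the segment, None if absent


def obeys_12_23_rule(upstream: Segment, downstream: Segment) -> bool:
    """True if joining upstream's 3' RSS to downstream's 5' RSS pairs a 12 with a 23."""
    return {upstream.rss_3prime, downstream.rss_5prime} == {12, 23}


# Assumed heavy-chain layout (see lead-in).
VH = Segment("VH", None, 23)
D = Segment("D", 12, 12)
JH = Segment("JH", 23, None)

for a, b in [(VH, D), (D, JH), (VH, JH), (D, D)]:
    print(f"{a.name}-{b.name}: {'allowed' if obeys_12_23_rule(a, b) else 'violates 12/23'}")
```

Under these assumptions a direct VH-JH join and a D-D join both pair two RSS of the same spacer length, which is why insertion of multiple D genes counts as a violation of the rule.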
Mechanism of gene rearrangement
Addition of P and N nucleotides
Questions to be addressed
• Can multiple D genes be inserted?
– Violation of 12/23 rule
• Can D genes be inserted backwards?
• Is there a D gene preference?
• Is there a reading frame preference for D genes?
– If yes, is it part of the gene rearrangement?
• Who is doing the end trimming?
P nucleotides

Table 1. Manuscript 2.
Distance = distance from heptamer to gene end.

                              Sequences                          Permutated sequences
Gene end          Distance    No. of seq  No. with P  % with P   No. of seq  No. with P  % with P   p-value
VH gene           1           1448        474         32.7       1635        103         6.3        <10^-5
                  2           1027        48          4.7        1068        65          6.1        0.091
                  3           762         53          7.0        612         36          5.9        0.245
JH gene           1           324         60          18.5       350         23          6.6        <10^-5
                  2           184         2           1.0        209         3           1.4        0.560
                  3           219         8           3.7        250         14          5.6        0.220
5' end of D gene  1           519         128         24.7       619         54          8.7        <10^-5
                  2           343         31          9.0        347         26          7.5        0.275
                  3           474         25          5.3        454         17          3.7        0.168
3' end of D gene  1           616         86          14.0       684         58          8.5        0.001
                  2           266         30          11.3       276         24          8.7        0.195
                  3           460         5           1.1        485         9           1.9        0.241
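The comparison between real and permutated sequences in Table 1 can be illustrated with a significance test on two proportions. The slides do not say which test produced the p-values, so the pooled two-proportion z-test below is an assumption and its output need not match the table; the function name is hypothetical. The counts are the two VH rows at distance 1 and 2.

```python
import math


def two_proportion_p_value(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (assumed test, see lead-in)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# VH gene, distance 1: 474 of 1448 sequences vs 103 of 1635 permutated sequences.
p_vh_distance_1 = two_proportion_p_value(474, 1448, 103, 1635)
# VH gene, distance 2: 48 of 1027 sequences vs 65 of 1068 permutated sequences.
p_vh_distance_2 = two_proportion_p_value(48, 1027, 65, 1068)

print(f"VH, distance 1: p = {p_vh_distance_1:.3g}")
print(f"VH, distance 2: p = {p_vh_distance_2:.3g}")
```

With this test, only the row where the heptamer is directly adjacent to the gene end (distance 1) shows a clear excess of P nucleotides over the permutated background, in line with the pattern of p-values in the table.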
How many types of D genes?
• Conventional D genes
– Identified in 81% of unmutated sequences, 64% of mutated sequences
• D genes with irregular RSS (DIR)
• Chromosome 15 OR
• Two D genes
• Inverted D genes
– Long inverted D genes cannot be excluded
Inverted (palindromic) D genes
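An inverted D gene appears in the junction as the reverse complement of the germline sequence. The slides do not describe how D genes were identified, so the following is only a naive sketch: it compares the longest exact match of a D gene in forward and in inverted orientation. All function names are hypothetical, and the D gene and junction strings are illustrative inputs written for this example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def longest_common_substring(a: str, b: str) -> str:
    """Longest exact substring of a that also occurs in b (brute force)."""
    best = ""
    for start in range(len(a)):
        # Only try candidates longer than the best match found so far.
        for end in range(start + len(best) + 1, len(a) + 1):
            if a[start:end] in b:
                best = a[start:end]
            else:
                break
    return best


def d_gene_orientation(junction: str, d_gene: str):
    """Return ('forward' or 'inverted', matched sequence) for the longer exact match."""
    forward = longest_common_substring(d_gene, junction)
    inverted = longest_common_substring(reverse_complement(d_gene), junction)
    if len(forward) >= len(inverted):
        return "forward", forward
    return "inverted", inverted


# Illustrative inputs (assumptions, not data from the slides).
d_gene = "AGGATATTGTAGTAGTACCAGCTGCTATGCC"
core = "GTAGTAGTACCAGCTGC"
junction_forward = "TGTGCGAGA" + core + "TACTTTGAC"
junction_inverted = "TGTGCGAGA" + reverse_complement(core) + "TACTTTGAC"

print(d_gene_orientation(junction_forward, d_gene))
print(d_gene_orientation(junction_inverted, d_gene))
```

Short matches arise by chance in either orientation, which is why comparing against permutated sequences, as done on the slides, is needed before calling a D gene inverted.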
D genes with irregular RSS (DIR)
• Very long, >180 bp
• Contain a family 1 D gene
• Found in 1% of sequences, inverted in 1.2%
• Some explained as family 1 gene plus N additions
• Median length of remaining not different from that in permutated sequences
Two D genes
• 2 D genes found in 1% of sequences
• Frequency not different from permutated sequences
• Some explained as one long D gene with a deletion
• Some not possible due to D gene location
• Median length of the longer gene resembles normal D genes, that of the shorter resembles permutated sequences
Chromosome 15 OR
• 10 OR resembling D genes on chromosome 15
• High homology to conventional D genes
• Very few OR15 in unmutated sequences
• Median length not different from hits in permutated sequences
D-Gene Usage and Lengths
27 conventional D genes, 34 known alleles
[Figure: D gene usage. Number of sequences (bars), average length (triangles) and germline length (diamonds) per D gene: IGHD1-1, IGHD2-2, IGHD3-3, IGHD4-11/IGHD4-4, IGHD5-18/IGHD5-5, IGHD6-6, IGHD1-7, IGHD2-8, IGHD3-9, IGHD3-10, IGHD5-12, IGHD6-13, IGHD1-14, IGHD2-15, IGHD3-16, IGHD4-17, IGHD6-19, IGHD1-20, IGHD2-21, IGHD3-22, IGHD4-23, IGHD5-24, IGHD6-25, IGHD1-26, IGHD7-27]
D-gene usage and JH gene
• JH proximal D genes are more often recombined to JH4 than to JH6, and JH distal D genes are more often recombined to JH6
D gene reading frames

Table 4. Manuscript 2.
P and NP are not defined on the slide; presumably productive and non-productive rearrangements.

            Stop                            Hydrophilic                     Hydrophobic
Gene        Reading frame    P      NP      Reading frame    P      NP      Reading frame    P      NP
D2-2*01     RIL**YQLLC (1)   6.5    34.7    GYCSSTSCYA (2)   61.2   32.6    DIVVVPAAM (3)    32.2   32.6
D2-2*02     RIL**YQLLY (1)   11.3   46.7    GYCSSTSCYT (2)   55.0   20.0    DIVVVPAAI (3)    33.8   33.3
D2-2*03     WIL**YQLLC (1)   0.0    50.0    GYCSSTSCYA (2)   66.7   50.0    DIVVVPAAM (3)    33.3   0.0
D2-8*01     RILY*WCMLY (1)   2.4    42.9    GYCTNGVCYT (2)   68.3   28.6    DIVLMVYAI (3)    29.3   28.6
D2-8*02     RILYWWCMLY (1)   0.0    0.0     GYCTGGVCYT (2)   88.9   0.0     DIVLVVYAI (3)    11.1   100
D2-15*01    RIL*WW*LLL (1)   1.5    32.5    GYCSGGSCYS (2)   70.8   37.5    DIVVVVAAT (3)    27.7   30.0
D2-21*01    SILWW*LLF (1)    8.3    50.0    AYCGGDCYS (2)    58.3   25.0    HIVVVIAI (3)     33.3   25.0
D2-21*02    SILWW*LLF (1)    0.0    54.5    AYCGGDCYS (2)    78.0   18.2    HIVVVTAI (3)     22.0   27.3
Total       -                10.8   33.6    -                62.2   32.4    -                26.9   34.0

• The recombination mechanism utilises each D gene reading frame at the same frequency
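The three reading frames listed for each D gene in Table 4 come from translating the same nucleotide sequence from three different start positions. The sketch below uses the standard genetic code. The nucleotide string is an assumption chosen so that its three frames reproduce the D2-2*01 translations listed in the table; treat it as illustrative rather than as a reference germline sequence. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int) -> str:
    """Translate seq in reading frame 1, 2 or 3; '*' marks a stop codon."""
    seq = seq.upper()
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Illustrative sequence (assumption, see lead-in).
d2_2_01 = "AGGATATTGTAGTAGTACCAGCTGCTATGCC"
frames = [translate(d2_2_01, frame) for frame in (1, 2, 3)]
for number, peptide in enumerate(frames, start=1):
    print(f"frame {number}: {peptide}")
```

Frame 1 contains stop codons, frame 2 gives the hydrophilic peptide and frame 3 the hydrophobic one, matching the three column groups of the table.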
N nucleotide dependence on end nucleotide

               Position X
Position X+1   A        T        G        C          Expected
A              0.292    0.260    0.204    0.136      0.210
T              0.146    0.290    0.172    0.204      0.201
G              0.292    0.207    0.453    0.231      0.292
C              0.271    0.243    0.172    0.430      0.298
P-value        0.04     0.016    0.0004   <0.0001    -

N addition is not random but dependent on the end nucleotide
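The dependence on the end nucleotide can be made concrete by dividing each observed frequency by the expected frequency of the added nucleotide. This is a minimal sketch under one assumption about the table layout: each column (position X) is read as a conditional distribution over position X+1, which is consistent with every column summing to about 1. The names `OBSERVED`, `EXPECTED` and `enrichment` are hypothetical.

```python
# OBSERVED[x][y]: frequency of nucleotide y at position X+1 given x at position X.
OBSERVED = {
    "A": {"A": 0.292, "T": 0.146, "G": 0.292, "C": 0.271},
    "T": {"A": 0.260, "T": 0.290, "G": 0.207, "C": 0.243},
    "G": {"A": 0.204, "T": 0.172, "G": 0.453, "C": 0.172},
    "C": {"A": 0.136, "T": 0.204, "G": 0.231, "C": 0.430},
}
# EXPECTED[y]: the "Expected" column of the table.
EXPECTED = {"A": 0.210, "T": 0.201, "G": 0.292, "C": 0.298}


def enrichment(end_nucleotide: str, added_nucleotide: str) -> float:
    """Observed frequency divided by expected frequency (1.0 means no dependence)."""
    return OBSERVED[end_nucleotide][added_nucleotide] / EXPECTED[added_nucleotide]


for x in "ATGC":
    ratios = ", ".join(f"{y}: {enrichment(x, y):.2f}" for y in "ATGC")
    print(f"after {x} -> {ratios}")
```

Ratios well above 1, such as G after G or C after C, mark the combinations that drive the small p-values in the G and C columns.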
Trimming of gene ends
[Figure: Trimming of VH. Number of sequences (0 to 140), observed and predicted, by end position (15 to 1, then P1, P2). Avg. 3.8 bp.]
→ Trimming depends on the gene end and cannot be described only by a simple removal of one nucleotide at a time
Results regarding recombination and
diversity and open questions
• DIR, OR15, multiple D genes and VH replacements are not used at a significant rate
• Inverted D genes are used rarely
• Not all D genes are used at the same frequency
→ What determines if a D gene is used?
• D gene usage is somewhat dependent on the JH gene
→ Do multiple D-J recombination steps take place?
• All D gene reading frames are used at an equal rate at the recombination step
→ At what step in development does the selection for the hydrophilic reading frame take place?
Results regarding recombination and diversity
and open questions (cont.)
• N addition is not random but dependent on the end nucleotide
→ Does nucleotide availability or the specificity of TdT determine the N addition?
• Trimming is not random but dependent on gene and sequence
→ What enzyme(s) are responsible for the trimming?
The translated functional rearrangement
Numbering Schemes
http://www.bioinf.org.uk/abs/#kabatnum
http://imgt.cines.fr/
→ The Kabat numbering scheme is a widely adopted standard for numbering the residues in an antibody in a consistent manner. However, the scheme has problems!
→ The Chothia numbering scheme is identical to the Kabat scheme, but places the insertions in CDR-L1 and CDR-H1 at the structurally correct positions. This means that topologically equivalent residues in these loops do get the same label (unlike the Kabat scheme).
→ The IMGT unique numbering for all IG and TR V-REGIONs of all species relies on the high conservation of the structure of the variable region [1-6]. This numbering, set up after aligning more than 5 000 sequences, takes into account and combines the definition of the framework (FR) and complementarity determining regions (CDR) [8], structural data from X-ray diffraction studies [9], and the characterization of the hypervariable loops [10].
Identification of CDR regions

CDR-L1
Start: approximately residue 24
Residue before: always C
Residue after: always W. Typically WYQ, but also WLQ, WFQ, WYL
Length: 10 to 17 residues

CDR-L2
Start: always 16 residues after the end of CDR-L1
Residues before: generally IY, but also VY, IK, IF
Length: always 7 residues

CDR-L3
Start: always 33 residues after end of CDR-L2
Residue before: always C
Residues after: always FGXG
Length: 7 to 11 residues

CDR-H1
Start: approximately residue 31 (always 9 after a C) (Chothia/AbM definition starts 5 residues earlier)
Residues before: always CXXXXXXXX
Residues after: always W. Typically WV, but also WI, WA
Length: 5 to 7 residues (Kabat definition); 7 to 9 residues (Chothia definition); 10 to 12 residues (AbM definition)

CDR-H2
Start: always 15 residues after the end of Kabat/AbM definition of CDR-H1
Residues before: typically LEWIG, but a number of variations
Residues after: K[RL]IVFT[AT]SIA (where residues in square brackets are alternatives at that position)
Length: Kabat definition 16 to 19 residues (AbM definition and most recent Chothia definition ends 7 residues earlier; earlier Chothia definition starts 2 residues later and ends 9 earlier)

CDR-H3
Start: always 33 residues after end of CDR-H2 (always 3 after a C)
Residues before: always CXX (typically CAR)
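The rules above translate naturally into sequence patterns. As a minimal sketch, the CDR-L3 rule (preceded by C, followed by FGXG, 7 to 11 residues long) can be written as a regular expression. The function name and the light-chain fragment are hypothetical, written for this example; a real pipeline would also use the positional rules, since a pattern alone can match in the wrong place.

```python
import re
from typing import Optional

# C, then 7 to 11 residues (the CDR-L3), then FGXG where X is any residue.
CDR_L3_PATTERN = re.compile(r"C([A-Z]{7,11})FG[A-Z]G")


def find_cdr_l3(light_chain: str) -> Optional[str]:
    """Return the first stretch matching the CDR-L3 rule, or None if there is none."""
    match = CDR_L3_PATTERN.search(light_chain.upper())
    return match.group(1) if match else None


# Illustrative light-chain fragment (assumption, not taken from the slides).
fragment = "GSGSGTDFTLTISSLQPEDFATYYCQQYNSYPLTFGQGTKVEIK"
print(find_cdr_l3(fragment))
```

The same approach would work for CDR-L1 (C, 10 to 17 residues, W), although that pattern is less specific and benefits more from the positional constraints.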
Identifying the CDRs