BiC BioCentrum-DTU
Technical University of Denmark
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS
Creation of a functional B cell receptor/antibody
Germ line gene organization
© 2001 by Garland Publishing
Gene organization
The 12/23 rule of recombination
Only an RSS with a 12 bp spacer is joined to an RSS with a 23 bp spacer
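As a minimal sketch of the 12/23 rule, the check below reduces each RSS to its spacer length. The spacer assignments for the heavy-chain locus (VH 23 bp, D 12 bp on both sides, JH 23 bp) are standard textbook values rather than something stated on the slide, and the names `SPACERS` and `can_recombine` are illustrative only.

```python
# Spacer length (bp) of the RSS flanking each heavy-chain gene segment.
# Assumption: textbook values for the heavy-chain locus, not taken from the slides.
SPACERS = {
    "VH": 23,
    "D_5prime": 12,
    "D_3prime": 12,
    "JH": 23,
}


def can_recombine(rss_a, rss_b):
    """Return True if the two RSS types satisfy the 12/23 rule."""
    return {SPACERS[rss_a], SPACERS[rss_b]} == {12, 23}


if __name__ == "__main__":
    pairs = [
        ("D_3prime", "JH"),        # D-J joining
        ("VH", "D_5prime"),        # V-DJ joining
        ("VH", "JH"),              # direct V-J joining
        ("D_3prime", "D_5prime"),  # D-D joining
    ]
    for a, b in pairs:
        print(a, b, can_recombine(a, b))
```

Under these assumptions, direct VH-to-JH joining and D-to-D joining both fail the check, which is why the insertion of multiple D genes would count as a violation of the 12/23 rule.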
recombination signal sequence (RSS)
Mechanism of gene rearrangement
Addition of P and N nucleotides
Questions to be addressed
• Can multiple D genes be inserted?
– Violation of 12/23 rule
• Can D genes be inserted backwards?
• Is there a D gene preference?
• Is there a reading frame preference for D genes?
– If yes, is it part of the gene rearrangement?
• What is doing the end trimming?
P nucleotides

                    Distance from    Sequences                       Permutated sequences
                    heptamer to      No. of   No. with   % with     No. of   No. with   % with
Gene end            gene end         seq      P          P          seq      P          P         p-value
VH gene             1                1448     474        32.7       1635     103        6.3       <10^-5
                    2                1027     48         4.7        1068     65         6.1       0.091
                    3                762      53         7.0        612      36         5.9       0.245
JH gene             1                324      60         18.5       350      23         6.6       <10^-5
                    2                184      2          1.0        209      3          1.4       0.560
                    3                219      8          3.7        250      14         5.6       0.220
5’ end of D gene    1                519      128        24.7       619      54         8.7       <10^-5
                    2                343      31         9.0        347      26         7.5       0.275
                    3                474      25         5.3        454      17         3.7       0.168
3’ end of D gene    1                616      86         14.0       684      58         8.5       0.001
                    2                266      30         11.3       276      24         8.7       0.195
                    3                460      5          1.1        485      9          1.9       0.241

Table 1. Manuscript 2.
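The "% with P" columns are simply counts divided by the number of sequences, and the comparison against permutated sequences is a two-group comparison of proportions. The slides do not say which test produced the p-values, so the one-sided Fisher's exact test below is an assumption chosen for illustration; `percent_with_p` and `fisher_one_sided` are hypothetical helper names.

```python
from math import comb


def percent_with_p(n_with_p, n_seq):
    """Percentage of sequences carrying P nucleotides, rounded as in the table."""
    return round(100.0 * n_with_p / n_seq, 1)


def fisher_one_sided(a, n1, c, n2):
    """One-sided Fisher's exact test (assumed test, not confirmed by the slides).

    a of n1 real sequences and c of n2 permutated sequences carry P nucleotides.
    Returns the probability of seeing at least `a` P-carrying sequences in the
    real group if P nucleotides were distributed at random between both groups.
    """
    total = n1 + n2
    successes = a + c
    upper = min(successes, n1)
    numerator = sum(
        comb(successes, k) * comb(total - successes, n1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, n1)


if __name__ == "__main__":
    # VH gene, distance 1: 474 of 1448 sequences vs. 103 of 1635 permutated sequences.
    print(percent_with_p(474, 1448), percent_with_p(103, 1635))
    print(fisher_one_sided(474, 1448, 103, 1635))
```

For the VH gene at distance 1 the percentages reproduce the tabulated 32.7 and 6.3, and the computed probability falls well below 10^-5.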
How many types of D genes?
• Conventional D genes
– Identified in 81% of unmutated sequences and 64% of mutated sequences
• D genes with irregular RSS (DIR)
• Chromosome 15 OR
• Two D genes
• Inverted D genes
– Long inverted D genes cannot be excluded
Inverted (palindromic) D genes
D genes with irregular RSS (DIR)
• Very long, >180 bp
• Contain a family 1 D gene
• Found in 1% of sequences, inverted in 1.2%
• Some explained as a family 1 gene plus N additions
• Median length of the remainder not different from that in permutated sequences
Two D genes
• 2 D genes found in 1% of sequences
• Frequency not different from permutated sequences
• Some explained as one long D gene with a deletion
• Some not possible due to the location of the D genes
• Median length of the longer gene resembles normal D genes; that of the shorter resembles permutated sequences
Chromosome 15 OR
• 10 OR resembling D genes on chromosome 15
• High homology to conventional D genes
• Very few OR15 in unmutated sequences
• Median length not different from hits in permutated sequences
D gene usage
• 27 conventional D genes, 34 known alleles
[Figure: D-Gene Usage and Lengths. x-axis: D Gene (IGHD1-1, IGHD2-2, IGHD3-3, IGHD4-11/IGHD4-4, IGHD5-18/IGHD5-5, IGHD6-6, IGHD1-7, IGHD2-8, IGHD3-9, IGHD3-10, IGHD5-12, IGHD6-13, IGHD1-14, IGHD2-15, IGHD3-16, IGHD4-17, IGHD6-19, IGHD1-20, IGHD2-21, IGHD3-22, IGHD4-23, IGHD5-24, IGHD6-25, IGHD1-26, IGHD7-27). Plotted: Number of Sequences (bars), Average Length (triangles), Germline Length (diamonds).]
D-gene usage and JH gene
• JH-proximal D genes are more often recombined to JH4 than to JH6, and JH-distal D genes more often to JH6
D gene reading frames
Gene        Reading frame   Type          Sequence      P       NP
D2-2*01     1               Stop          RIL**YQLLC    6.5     34.7
            2               Hydrophilic   GYCSSTSCYA    61.2    32.6
            3               Hydrophobic   DIVVVPAAM     32.2    32.6
D2-2*02     1               Stop          RIL**YQLLY    11.3    46.7
            2               Hydrophilic   GYCSSTSCYT    55.0    20.0
            3               Hydrophobic   DIVVVPAAI     33.8    33.3
D2-2*03     1               Stop          WIL**YQLLC    0.0     50.0
            2               Hydrophilic   GYCSSTSCYA    66.7    50.0
            3               Hydrophobic   DIVVVPAAM     33.3    0.0
D2-8*01     1               Stop          RILY*WCMLY    2.4     42.9
            2               Hydrophilic   GYCTNGVCYT    68.3    28.6
            3               Hydrophobic   DIVLMVYAI     29.3    28.6
D2-8*02     1               Stop          RILYWWCMLY    0.0     0.0
            2               Hydrophilic   GYCTGGVCYT    88.9    0.0
            3               Hydrophobic   DIVLVVYAI     11.1    100
D2-15*01    1               Stop          RIL*WW*LLL    1.5     32.5
            2               Hydrophilic   GYCSGGSCYS    70.8    37.5
            3               Hydrophobic   DIVVVVAAT     27.7    30.0
D2-21*01    1               Stop          SILWW*LLF     8.3     50.0
            2               Hydrophilic   AYCGGDCYS     58.3    25.0
            3               Hydrophobic   HIVVVIAI      33.3    25.0
D2-21*02    1               Stop          SILWW*LLF     0.0     54.5
            2               Hydrophilic   AYCGGDCYS     78.0    18.2
            3               Hydrophobic   HIVVVTAI      22.0    27.3
Total       1               Stop          -             10.8    33.6
            2               Hydrophilic   -             62.2    32.4
            3               Hydrophobic   -             26.9    34.0

Table 4. Manuscript 2.
• The recombination mechanism utilises each D gene reading frame at the same frequency
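To make the three reading frames concrete, the sketch below translates a D gene in frames 1 to 3 with the standard genetic code. The nucleotide string for IGHD2-2*01 is quoted from memory and is an assumption that should be checked against a germline database; it is used here because its three translations reproduce the D2-2*01 entries in the table.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Assumed germline sequence of IGHD2-2*01 (not given in the slides).
IGHD2_2_01 = "AGGATATTGTAGTAGTACCAGCTGCTATGCC"


def translate(dna, frame):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def three_frames(dna):
    """Return the translations of reading frames 1, 2 and 3."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    for number, peptide in enumerate(three_frames(IGHD2_2_01), start=1):
        print(number, peptide)
```

Frame 1 contains two stop codons, frame 2 is the cysteine- and serine-rich hydrophilic frame, and frame 3 is the valine-rich hydrophobic frame, matching the column types of the table.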
N nucleotide dependence on end nucleotide

                Position X
Position X+1    A        T        G         C          Expected
A               0.292    0.260    0.204     0.136      0.210
T               0.146    0.290    0.172     0.204      0.201
G               0.292    0.207    0.453     0.231      0.292
C               0.271    0.243    0.172     0.430      0.298
P-value         0.04     0.016    0.0004    <0.0001    -

N addition is not random but dependent on end nucleotide
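One way to read these frequencies is to compare, for each end nucleotide at position X, the observed frequency of each nucleotide at position X+1 with its expected frequency. The sketch below does this with the values from the slide; the orientation (columns are position X, rows are position X+1) is an assumption inferred from the fact that each column sums to about 1, and `enrichment` is a hypothetical helper name.

```python
NUCLEOTIDES = "ATGC"

# FREQ[x_plus_1][x]: frequency of nucleotide x_plus_1 at position X+1,
# given nucleotide x at position X (values copied from the slide).
FREQ = {
    "A": {"A": 0.292, "T": 0.260, "G": 0.204, "C": 0.136},
    "T": {"A": 0.146, "T": 0.290, "G": 0.172, "C": 0.204},
    "G": {"A": 0.292, "T": 0.207, "G": 0.453, "C": 0.231},
    "C": {"A": 0.271, "T": 0.243, "G": 0.172, "C": 0.430},
}
EXPECTED = {"A": 0.210, "T": 0.201, "G": 0.292, "C": 0.298}


def column_sum(x):
    """Sum of the X+1 frequencies for a given nucleotide at position X."""
    return sum(FREQ[n][x] for n in NUCLEOTIDES)


def enrichment(x_plus_1, x):
    """Observed over expected frequency of x_plus_1 following x."""
    return FREQ[x_plus_1][x] / EXPECTED[x_plus_1]


if __name__ == "__main__":
    for n in NUCLEOTIDES:
        print(n, round(column_sum(n), 3), round(enrichment(n, n), 2))
```

With these values every same-nucleotide ratio exceeds 1, meaning the first added nucleotide matches the end nucleotide more often than expected.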
Trimming of gene ends
[Figure: Trimming of VH. Observed and Predicted Number of Sequences by End Position (15 to 1, P1, P2). Avg. 3.8 bp]
Trimming depends on the gene end and cannot be described solely by simple removal of one nucleotide at a time
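A "one nucleotide at a time" model can be made concrete as a geometric distribution: after each removed nucleotide, trimming continues with a fixed probability. The sketch below is one possible form of such a null model, not necessarily the one behind the slide's Predicted series; it assumes that the 3.8 bp average is the mean number of trimmed nucleotides, and `geometric_trim_pmf` is a hypothetical name.

```python
def geometric_trim_pmf(mean_trim, max_len):
    """Probability of trimming exactly k nucleotides, for k = 0..max_len.

    Assumes each step removes one more nucleotide with constant probability
    p_continue, chosen so that the mean trim length equals mean_trim.
    """
    p_continue = mean_trim / (1.0 + mean_trim)
    return [(1.0 - p_continue) * p_continue ** k for k in range(max_len + 1)]


if __name__ == "__main__":
    pmf = geometric_trim_pmf(3.8, 15)
    for k, prob in enumerate(pmf):
        print(k, round(prob, 3))
```

Such a model always predicts a smoothly decreasing distribution that is identical for every gene, so a gene-end-dependent trimming profile cannot be reproduced by it.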
Results regarding recombination and
diversity and open questions
• DIR, OR15, multiple D genes and VH replacements are not used at a significant rate
• Inverted D genes are used rarely
• Not all D genes are used at the same frequency
What determines whether a D gene is used?
• D gene usage is somewhat dependent on the JH gene
Do multiple D-J recombination steps take place?
• All D gene reading frames are used at an equal rate at the recombination step
At what step in development does the selection for the hydrophilic reading frame take place?
Results regarding recombination and diversity
and open questions (cont.)
• N addition not random but dependent on end nucleotide
Does nucleotide availability or the specificity of TdT
determine the N addition?
• Trimming not random but dependent on gene and
sequence
Which enzyme(s) are responsible for the trimming?
The translated functional
rearrangement
Numbering Schemes
http://www.bioinf.org.uk/abs/#kabatnum
http://imgt.cines.fr/
The Kabat numbering scheme is a widely adopted standard for numbering
the residues in an antibody in a consistent manner. However, the scheme has
problems!
The Chothia numbering scheme is identical to the Kabat scheme, but
places the insertions in CDR-L1 and CDR-H1 at the structurally correct
positions. This means that topologically equivalent residues in these loops do
get the same label (unlike the Kabat scheme).
The IMGT unique numbering for all IG and TR V-REGIONs of all species
relies on the high conservation of the structure of the variable region [1-6].
This numbering, set up after aligning more than 5 000 sequences, takes into
account and combines the definition of the framework (FR) and
complementarity determining regions (CDR) [8], structural data from X-ray
diffraction studies [9], and the characterization of the hypervariable loops [10].
Identification of CDR regions

CDR-L1
  Start: approximately residue 24
  Residue before: always C
  Residue after: always W. Typically WYQ, but also WLQ, WFQ, WYL
  Length: 10 to 17 residues
CDR-L2
  Start: always 16 residues after the end of CDR-L1
  Residues before: generally IY, but also VY, IK, IF
  Length: always 7 residues
CDR-L3
  Start: always 33 residues after the end of CDR-L2
  Residue before: always C
  Residues after: always FGXG
  Length: 7 to 11 residues
CDR-H1
  Start: approximately residue 31 (always 9 after a C) (Chothia/AbM definition starts 5 residues earlier)
  Residues before: always CXXXXXXXX
  Residues after: always W. Typically WV, but also WI, WA
  Length: 5 to 7 residues (Kabat definition); 7 to 9 residues (Chothia definition); 10 to 12 residues (AbM definition)
CDR-H2
  Start: always 15 residues after the end of the Kabat/AbM definition of CDR-H1
  Residues before: typically LEWIG, but a number of variations
  Residues after: K[RL]IVFT[AT]SIA (where residues in square brackets are alternatives at that position)
  Length: Kabat definition 16 to 19 residues (AbM definition and most recent Chothia definition end 7 residues earlier; earlier Chothia definition starts 2 residues later and ends 9 earlier)
CDR-H3
  Start: always 33 residues after the end of CDR-H2 (always 3 after a C)
  Residues before: always CXX (typically CAR)
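The light-chain rules above translate fairly directly into a pattern search. The following is a minimal sketch rather than a validated annotation tool: it reads "16 residues after the end of CDR-L1" as 15 framework residues lying between the two CDRs (and likewise 32 residues between CDR-L2 and CDR-L3), which is an interpretation of the rule, and the example sequence `EXAMPLE_VL` is an illustrative kappa-like variable region chosen for this sketch.

```python
import re

# CDR-L1: preceded by C, 10 to 17 residues, followed by W (WYQ, WLQ, WFQ or WYL).
L1_RE = re.compile(r"C([A-Z]{10,17})(?=W(?:YQ|LQ|FQ|YL))")
# CDR-L3: 7 to 11 residues, followed by FGXG.
L3_RE = re.compile(r"([A-Z]{7,11})(?=FG[A-Z]G)")

# Illustrative light-chain variable region (an assumption for this example).
EXAMPLE_VL = (
    "DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSG"
    "SRSGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIK"
)


def find_light_chain_cdrs(seq):
    """Locate CDR-L1, CDR-L2 and CDR-L3 using the positional rules; None if they fail."""
    m1 = L1_RE.search(seq)
    if m1 is None:
        return None
    l2_start = m1.end(1) + 15   # assumed: 15 residues between CDR-L1 and CDR-L2
    l2_end = l2_start + 7       # CDR-L2 is always 7 residues long
    l3_start = l2_end + 32      # assumed: 32 residues between CDR-L2 and CDR-L3
    if l3_start > len(seq) or seq[l3_start - 1] != "C":
        return None             # the residue before CDR-L3 must be C
    m3 = L3_RE.match(seq, l3_start)
    if m3 is None:
        return None
    return {
        "CDR-L1": m1.group(1),
        "CDR-L2": seq[l2_start:l2_end],
        "CDR-L3": m3.group(1),
    }


if __name__ == "__main__":
    print(find_light_chain_cdrs(EXAMPLE_VL))
```

The function anchors everything on CDR-L1 and then checks the cysteine and FGXG landmarks around CDR-L3, so a sequence that breaks any of the "always" rules is reported as not found rather than guessed at.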
Identifying the CDRs