BiC BioCentrum-DTU Technical University of Denmark
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS

Creation of a functional B cell receptor/Antibody

Germ line gene organization
[Figure © 2001 by Garland Publishing]

Gene organization
[Figure © 2001 by Garland Publishing]

The 12/23 rule of recombination
• Only a 12 RSS is combined with a 23 RSS (RSS: recombination signal sequence)

Mechanism of gene rearrangement

Addition of P and N nucleotides

Questions to be addressed
• Can multiple D genes be inserted?
  – Violation of 12/23 rule
• Can D genes be inserted backwards?
• Is there a D gene preference?
• Is there a reading frame preference for D genes?
  – If yes, is it part of the gene rearrangement?
• Who is doing the end trimming?

P nucleotides
Table 1. Manuscript 2. Distance = distance from heptamer to gene end.

Gene end            Distance   Sequences                        Permutated sequences             p-value
                               No. of seq  No. with P  % with P  No. of seq  No. with P  % with P
VH gene             1          1448        474         32.7      1635        103         6.3       <10^-5
                    2          1027        48          4.7       1068        65          6.1       0.091
                    3          762         53          7.0       612         36          5.9       0.245
JH gene             1          324         60          18.5      350         23          6.6       <10^-5
                    2          184         2           1.0       209         3           1.4       0.560
                    3          219         8           3.7       250         14          5.6       0.220
5' end of D gene    1          519         128         24.7      619         54          8.7       <10^-5
                    2          343         31          9.0       347         26          7.5       0.275
                    3          474         25          5.3       454         17          3.7       0.168
3' end of D gene    1          616         86          14.0      684         58          8.5       0.001
                    2          266         30          11.3      276         24          8.7       0.195
                    3          460         5           1.1       485         9           1.9       0.241

How many types of D genes?
• Conventional D genes
  – Identified in 81% of unmutated sequences and 64% of mutated sequences
• D genes with irregular RSS (DIR)
• Chromosome 15 OR
• Two D genes
• Inverted D genes
  – Long inverted D genes cannot be excluded

Inverted (palindromic) D genes

D genes with irregular RSS (DIR)
• Very long, >180 bp
• Contain a family 1 D gene
• Found in 1% of sequences, inverted in 1.2%
• Some explained as a family 1 gene plus N additions
• Median length of the remainder not different from that in permutated sequences

Two D genes
• 2 D genes found in 1% of sequences
• Frequency not different from permutated sequences
• Some explained as one long D gene with a deletion
• Some not possible because of the location of the D genes
• Median length of the longest gene resembles normal D genes, that of the shortest resembles permutated sequences

Chromosome 15 OR
• 10 OR resembling D genes on chromosome 15
• High homology to conventional D genes
• Very few OR15 in unmutated sequences
• Median length not different from hits in permutated sequences

D gene usage
• 27 conventional D genes, 34 known alleles
[Figure: D-Gene Usage and Lengths. Bars: number of sequences. Triangles: D gene average length. Diamonds: germline length. D genes shown: IGHD1-1, IGHD2-2, IGHD3-3, IGHD4-11/IGHD4-4, IGHD5-18/IGHD5-5, IGHD6-6, IGHD1-7, IGHD2-8, IGHD3-9, IGHD3-10, IGHD5-12, IGHD6-13, IGHD1-14, IGHD2-15, IGHD3-16, IGHD4-17, IGHD6-19, IGHD1-20, IGHD2-21, IGHD3-22, IGHD4-23, IGHD5-24, IGHD6-25, IGHD1-26, IGHD7-27]

D-gene usage and JH gene
• JH proximal D genes are more often recombined to JH4 than to JH6, and JH distal D genes more often to JH6

D gene reading frames
Table 4. Manuscript 2. Reading frame number in parentheses.

Gene       Stop             P     NP     Hydrophilic      P     NP     Hydrophobic     P     NP
D2-2*01    RIL**YQLLC (1)   6.5   34.7   GYCSSTSCYA (2)   61.2  32.6   DIVVVPAAM (3)   32.2  32.6
D2-2*02    RIL**YQLLY (1)   11.3  46.7   GYCSSTSCYT (2)   55.0  20.0   DIVVVPAAI (3)   33.8  33.3
D2-2*03    WIL**YQLLC (1)   0.0   50.0   GYCSSTSCYA (2)   66.7  50.0   DIVVVPAAM (3)   33.3  0.0
D2-8*01    RILY*WCMLY (1)   2.4   42.9   GYCTNGVCYT (2)   68.3  28.6   DIVLMVYAI (3)   29.3  28.6
D2-8*02    RILYWWCMLY (1)   0.0   0.0    GYCTGGVCYT (2)   88.9  0.0    DIVLVVYAI (3)   11.1  100
D2-15*01   RIL*WW*LLL (1)   1.5   32.5   GYCSGGSCYS (2)   70.8  37.5   DIVVVVAAT (3)   27.7  30.0
D2-21*01   SILWW*LLF (1)    8.3   50.0   AYCGGDCYS (2)    58.3  25.0   HIVVVIAI (3)    33.3  25.0
D2-21*02   SILWW*LLF (1)    0.0   54.5   AYCGGDCYS (2)    78.0  18.2   HIVVVTAI (3)    22.0  27.3
Total      -                10.8  33.6   -                62.2  32.4   -               26.9  34.0

• The recombination mechanism utilises each D gene reading frame at the same frequency

N nucleotide dependence on end nucleotide

                Position X
Position X+1    A       T       G       C         Expected
A               0.292   0.260   0.204   0.136     0.210
T               0.146   0.290   0.172   0.204     0.201
G               0.292   0.207   0.453   0.231     0.292
C               0.271   0.243   0.172   0.430     0.298
P-value         0.04    0.016   0.0004  <0.0001   -

• N addition is not random but dependent on the end nucleotide

Trimming of gene ends
[Figure: Trimming of VH. Number of sequences, observed and predicted, by end position (15 down to 1, then P1 and P2). Avg. 3.8 bp.]
• Trimming depends on the gene end and cannot be described solely as the removal of one nucleotide at a time

Results regarding recombination and diversity and open questions
• DIR, OR15, multiple D genes and VH replacements are not used at a significant rate
• Inverted D genes are used rarely
• Not all D genes are used at the same frequency
  – What determines whether a D gene is used?
• D gene usage is somewhat dependent on the JH gene
  – Do multiple D-J recombination steps take place?
• All D gene reading frames are used at an equal rate at the recombination step
  – At what step in development does the selection for the hydrophilic reading frame occur?

Results regarding recombination and diversity and open questions (cont.)
• N addition is not random but dependent on the end nucleotide
  – Does nucleotide availability or the specificity of TdT determine the N addition?
• Trimming is not random but dependent on gene and sequence
  – Which enzyme(s) are responsible for the trimming?

The translated functional rearrangement

Numbering Schemes
http://www.bioinf.org.uk/abs/#kabatnum
http://imgt.cines.fr/

The Kabat numbering scheme is a widely adopted standard for numbering the residues in an antibody in a consistent manner. However, the scheme has problems!
The Chothia numbering scheme is identical to the Kabat scheme, but places the insertions in CDR-L1 and CDR-H1 at the structurally correct positions. This means that topologically equivalent residues in these loops do get the same label (unlike in the Kabat scheme).

The IMGT unique numbering for all IG and TR V-REGIONs of all species relies on the high conservation of the structure of the variable region [1-6]. This numbering, set up after aligning more than 5,000 sequences, takes into account and combines the definition of the framework (FR) and complementarity determining regions (CDR) [8], structural data from X-ray diffraction studies [9], and the characterization of the hypervariable loops [10].

Identification of CDR regions

CDR-L1
• Start: approximately residue 24
• Residue before: always C
• Residue after: always W. Typically WYQ, but also WLQ, WFQ, WYL
• Length: 10 to 17 residues

CDR-L2
• Start: always 16 residues after the end of CDR-L1
• Residues before: generally IY, but also VY, IK, IF
• Length: always 7 residues

CDR-L3
• Start: always 33 residues after the end of CDR-L2
• Residue before: always C
• Residues after: always FGXG
• Length: 7 to 11 residues

CDR-H1
• Start: approximately residue 31 (always 9 after a C) (Chothia/AbM definition starts 5 residues earlier)
• Residues before: always CXXXXXXXX
• Residue after: always W. Typically WV, but also WI, WA
• Length: 5 to 7 residues (Kabat definition); 7 to 9 residues (Chothia definition); 10 to 12 residues (AbM definition)

CDR-H2
• Start: always 15 residues after the end of the Kabat/AbM definition of CDR-H1
• Residues before: typically LEWIG, but a number of variations
• Residues after: K[RL]IVFT[AT]SIA (where residues in square brackets are alternatives at that position)
• Length: Kabat definition 16 to 19 residues (AbM definition and most recent Chothia definition end 7 residues earlier; earlier Chothia definition starts 2 residues later and ends 9 earlier)

CDR-H3
• Start: always 33 residues after the end of CDR-H2 (always 3 after a C)
• Residues before: always CXX (typically CAR)

Identifying the CDRs
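The light-chain rules above (CDR-L1, CDR-L2 and CDR-L3) are mechanical enough to sketch in code. The following is a minimal illustration only, not a validated annotation tool. The function name `find_light_chain_cdrs`, the test sequence `EXAMPLE_VL`, and the search window of residues 20 to 26 for the cysteine before CDR-L1 (standing in for "approximately residue 24") are all assumptions made for this sketch. The "residues before CDR-L2" rule (IY, VY, IK, IF) is described as only general and is therefore not enforced.

```python
import re
from typing import Dict, Optional, Tuple

# Illustrative kappa light-chain variable domain (an assumption, used only as test input).
EXAMPLE_VL = (
    "DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIY"
    "SASFLYSGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIK"
)


def find_light_chain_cdrs(seq: str) -> Optional[Dict[str, Tuple[int, str]]]:
    """Return {CDR name: (1-based start residue, CDR sequence)} or None.

    Applies the positional rules for CDR-L1, CDR-L2 and CDR-L3 and returns
    the first combination of loop lengths that satisfies all of them.
    """
    seq = seq.upper()
    # CDR-L1 starts at about residue 24, directly after a Cys.
    # Assumed window: the Cys is looked for at residues 20..26 (0-based 19..25).
    for c_idx in range(19, 26):
        if c_idx >= len(seq) or seq[c_idx] != "C":
            continue
        l1_start = c_idx + 1
        for l1_len in range(10, 18):           # CDR-L1 length: 10 to 17
            l1_end = l1_start + l1_len         # exclusive end index
            if l1_end >= len(seq) or seq[l1_end] != "W":
                continue                       # residue after CDR-L1 must be W
            # CDR-L2 starts 16 residues after the last residue of CDR-L1.
            l2_start = (l1_end - 1) + 16
            l2_end = l2_start + 7              # CDR-L2 length: always 7
            # CDR-L3 starts 33 residues after the last residue of CDR-L2.
            l3_start = (l2_end - 1) + 33
            if l3_start >= len(seq) or seq[l3_start - 1] != "C":
                continue                       # residue before CDR-L3 must be C
            for l3_len in range(7, 12):        # CDR-L3 length: 7 to 11
                l3_end = l3_start + l3_len
                # Residues after CDR-L3 must match FGXG.
                if re.fullmatch(r"FG.G", seq[l3_end:l3_end + 4]):
                    return {
                        "CDR-L1": (l1_start + 1, seq[l1_start:l1_end]),
                        "CDR-L2": (l2_start + 1, seq[l2_start:l2_end]),
                        "CDR-L3": (l3_start + 1, seq[l3_start:l3_end]),
                    }
    return None


if __name__ == "__main__":
    for name, (start, loop) in find_light_chain_cdrs(EXAMPLE_VL).items():
        print(name, start, loop)
```

Requiring all three loops to satisfy their rules at once helps when a tryptophan inside CDR-L1 would otherwise end the loop too early. The heavy-chain rules could be treated in the same way; they are left out here.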