Download Supplementary Material (doc 28K)

Parameters used for pattern analysis

The complete set of TEIRESIAS parameters used in this analysis is as follows: number of amino acids in the pattern (-l), number of overlapping characters in the convolved pattern (-c), maximum length of an elementary pattern (-w), minimum number of appearances of the pattern (-k), maximum number of brackets (indicating equivalent amino acids) allowed in the pattern (-n), flag for the support k to be the minimum number of sequences in which a pattern should appear (rather than the minimum number of instances of the pattern) (-v), flag for the algorithm to use amino acid equivalences (-b<equivalences file>), and flag for the algorithm to consider only the uppercase characters during pattern discovery (-u). The parameters were set to [l=3] [c=2] [w=6] [k=2] [n=2] [-v] [-b<equivalences file>] [-u].

Clustering of non-CLL sequences based on HCDR3 patterns

In the 5,344 non-CLL HCDR3 sequences from public databases, TEIRESIAS discovered 1,106,692 patterns, which were filtered down to 1,714, a reduction of 99.9%. This final set of patterns was 21.5% smaller than the one in the CLL dataset, although the number of sequences analyzed was almost twice as high (5,344 vs. 2,845). This was partly because this set of sequences pooled several different entities. Furthermore, a significant number of groups of identical or near-identical sequences were referenced in the same publication, i.e. they were probably clonally related and were consequently described by very few patterns (see also Supplemental Figure 8).

The non-CLL level 0 clusters were significantly smaller (most included 2-3 members) than the corresponding CLL clusters. Furthermore, in stark contrast to high-level clusters in CLL, high-level clusters in the non-CLL group were characterized by marked IGHV gene heterogeneity. For instance, non-CLL level 3 clusters had a minimum of 8 different genes, reaching up to 35 in a cluster of 152 members. Strikingly, three of the six most frequent genes in level 3 CLL clusters (IGHV1-69, IGHV3-21 and IGHV1-3) were under-represented in level 3 non-CLL clusters.

Clustering of all sequences based on HCDR3 patterns

Finally, analysis of all 8,189 HCDR3 sequences from patients with CLL and from the other entities yielded 2,033,781 patterns, which were subsequently filtered down to 4,955, a reduction of 99.7%. These patterns allowed us to place 2,493 sequences in clusters of different levels. Taking into account the origin of the sequences in each cluster, the 1,364 level 0 clusters were subdivided into five categories along a CLL-to-non-CLL axis: CLL-unique, CLL-biased, “neutral” (i.e. equal number of CLL and non-CLL sequences), non-CLL-biased, and non-CLL-unique. Analysis of cluster sizes in these five “specificity” categories showed a strong bias for CLL sequences to group into significantly larger clusters than non-CLL sequences, which mainly formed level 0 clusters of only two or three members (Supplemental Figure 8).
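
The five-way subdivision of level 0 clusters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text defines "neutral" explicitly (equal numbers of CLL and non-CLL sequences), whereas the rules used here for "unique" (all members from one origin) and "biased" (a simple majority from one origin) are our reading of the category names and may differ from the criteria actually applied. The function names and the toy cluster counts are hypothetical and are not data from the study.

```python
from collections import defaultdict
from statistics import median


def classify_cluster(n_cll: int, n_non_cll: int) -> str:
    """Assign a cluster to one of five "specificity" categories.

    Assumption: "unique" means every member has the same origin and
    "biased" means a simple majority; only "neutral" is defined in the text.
    """
    if n_cll < 0 or n_non_cll < 0 or n_cll + n_non_cll == 0:
        raise ValueError("a cluster needs non-negative counts and at least one member")
    if n_non_cll == 0:
        return "CLL-unique"
    if n_cll == 0:
        return "non-CLL-unique"
    if n_cll > n_non_cll:
        return "CLL-biased"
    if n_cll < n_non_cll:
        return "non-CLL-biased"
    return "neutral"


def median_size_by_category(clusters):
    """Return the median cluster size for each category that occurs."""
    sizes = defaultdict(list)
    for n_cll, n_non_cll in clusters:
        sizes[classify_cluster(n_cll, n_non_cll)].append(n_cll + n_non_cll)
    return {category: median(values) for category, values in sizes.items()}


# Toy (invented) clusters given as (CLL count, non-CLL count) pairs.
toy_clusters = [(6, 0), (4, 1), (1, 1), (1, 2), (0, 2), (0, 3)]
summary = median_size_by_category(toy_clusters)
```

Comparing the per-category medians (or the full size distributions) is one simple way to examine the kind of size bias summarized in Supplemental Figure 8.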
