Download Immune infiltrate estimation: review of methods for deriving cell type

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Molecular mimicry wikipedia , lookup

Innate immune system wikipedia , lookup

Adoptive cell transfer wikipedia , lookup

Cancer immunotherapy wikipedia , lookup

Psychoneuroimmunology wikipedia , lookup

Polyclonal B cell response wikipedia , lookup

Immunosuppressive drug wikipedia , lookup

Immunomics wikipedia , lookup

Transcript
Immune infiltrate estimation: review of methods for deriving
cell type profiles from purified cell sample data
Maxim Zaslavsky
June 2016
1
Introduction
Several groups have introduced methods for the deconvolution of bulk tumor gene expression data. Every
method includes a unique procedure for isolating a reference profile corresponding to each cell type using
sample gene expression data collected from enriched cell lines. For robust deconvolution, it is essential that
these reference profiles – taking the form of marker gene lists or characteristic gene expression vectors – are
determined carefully. Specifically, we are interested in the following properties of these methods:
•
•
•
•
Do these training methods extract biological intuition or noise?
How well do their selected genes or expression vectors differentiate between similar classes?
How well can these methods differentiate all classes, based on metrics like condition number?
Do their chosen genes overlap, and are the differences between their training expression vectors or
marker genes biologically significant or noise?
• How unique are genes to individual cell types? How many are shared between multiple types?
We introduce each approach, discuss its theoretical limitations, and examine its output to understand
whether there is motivation for new approaches.
2
2.1
Methods for selecting reference profiles
Marker gene methods
There are three notable methods that produce marker gene lists. First, [1] examined gene expression within
immune cell types and in other tissue to produce a list of genes that are specifically expressed in particular
immune cell types. To determine whether a certain gene g is exemplary of any cell type(s), the authors find the
array with the highest expression level of g and determine which enriched cell type it corresponds to. They
multiply this highest expression level by 0.1625 (chosen arbitrarily) and add the maximum expression level of
g seen in non-immune tissue samples. If this weighted sum is greater than the next highest expression level of
gene g across arrays from other immune cell types, then g is considered characteristic of the cell type in which
it was most highly expressed. Finally, if this gene has higher expression in another immune cell type than this
weighted sum, the gene is also considered to be characteristic of the other cell type.
However, fold change alone is a poor indicator of uniqueness; high expression should not be the only
indicator that a gene corresponds to a particular cell type! A more robust method would consider rare expression, even at low levels.
[2] pursues the same task with a similar method. To determine whether the expression of gene g is
1
characteristic of cell type t, the authors essentially compute the score:
∆g,t = min (ei (g)) −
ei ∈Xt
max (meanej ∈Xt′ (ej (g))),
t′ ∈T −{t}
where Xi is the set of all arrays of cell type i and e(g) is the expression of g in some array. Then, they keep
all g ’s with highest ∆g,t . The authors do not specify their filtering cutoff, unfortunately. The authors finally
add some cell type-specific genes for populations not sampled, again without much detail (the code is not
available).
This filtering mechanism ensures that selected genes are unique to their corresponding cell types, but
would fail if any two types are very similar. In this case, genes whose differential expression has biological
meaning but is small might not pass the filter, whereas genes with seemingly high differential expression – as
may be found in the noise from low sample sizes – may pass.
Finally, though [3] attempts to estimate tumor purity, which is the absolute fraction of stromal and
immune cells in a tumor sample, instead of the relative abundances of specific immune infiltrate cell types, the
method also relies on identifying immune signature genes from gene expression in enriched samples and thus
deserves investigation. The authors simply divided samples into extremely low and extremely high immune
cell infiltration groups (using leukocyte methylation signature scores that are given in many TCGA datasets),
removing any samples with medium immune cell infiltration. They computed Significance Analysis of Microarray (SAM) scores on the differential expression of genes between the high-and low-infiltration groups
[4]. They selected genes that were significantly differentially expressed to form a gene list.
SAM is a straightforward, well-known, and statistically sound method for finding genes that are differentially expressed between two classes. Moreover, the SAM technique can be applied to multi-class situations to
determine genes that are significantly differentially expressed in one combination of cell types versus another.
I believe SAM would form more robust gene lists in comparison to previous methods that are based solely on
fold change.
2.1.1
Analysis
Our first measure of whether these marker gene extraction methods are successful is whether known immune
pathways are enriched in the gene list. For example, are the genes that these methods believe to be associated
with T cells part of the T cell receptor signaling pathway, or are these methods pulling out noise?
The two marker gene lists, which we call IRIS [1] and Bindea [2], do not have much agreement on B
cells or T cells; on NK cells, there are no intersecting genes at all. We run gene ontology enrichment analysis
on the genes they have in common and on the genes unique to each list to see which T cell pathways are
found and where. The resulting significant (p < .001) GO terms are contained in the tables below. Though
the genes are different, they belong to the same pathways. The IRIS list contains much more noise than the
Bindea list. This suggests that the method of ??? is more effective at extracting the unique properties of each
immune cell subtype.
GO terms in intersection of T cell IRIS and Bindea marker gene lists:
1. T cell receptor signaling pathway
7. T cell aggregation
2. antigen receptor-mediated signaling pathway
8. lymphocyte aggregation
3. T cell costimulation
9. leukocyte aggregation
10. T cell selection
4. lymphocyte costimulation
11. leukocyte cell-cell adhesion
5. immune response-activating cell surface receptor signaling pathway
12. immune response-activating signal transduction
6. T cell activation
13. positive regulation of T cell activation
2
14. homotypic cell-cell adhesion
35. biological adhesion
15. positive regulation of homotypic cell-cell adhesion
36. positive regulation of cell adhesion
37. regulation of lymphocyte activation
16. positive regulation of leukocyte cell-cell adhesion
38. regulation of cell-cell adhesion
17. immune response-regulating cell surface receptor signaling pathway
39. cell-cell adhesion
18. activation of immune response
41. regulation of leukocyte activation
19. positive regulation of cell-cell adhesion
42. cell activation
20. lymphocyte activation
43. regulation of cell activation
21. positive regulation of lymphocyte activation
44. regulation of immune response
22. positive regulation of leukocyte activation
45. thymic T cell selection
23. immune response-regulating signaling pathway
46. positive T cell selection
24. positive regulation of immune response
25. T cell differentiation in thymus
47. positive regulation of calcium-mediated signaling
26. thymocyte aggregation
48. T cell differentiation
27. positive regulation of cell activation
49. regulation of cell adhesion
28. regulation of T cell activation
50. regulation of calcium-mediated signaling
29. regulation of leukocyte cell-cell adhesion
51. regulation of immune system process
30. leukocyte activation
52. immune system process
31. regulation of homotypic cell-cell adhesion
53. lymphocyte differentiation
32. single organismal cell-cell adhesion
54. immune response
33. single organism cell adhesion
55. olfactory bulb axon guidance
34. cell adhesion
56. positive regulation of response to stimulus
40. positive regulation of immune system process
GO terms of T cell genes in IRIS but not in Bindea:
1. cell division
12. cell cycle G2/M phase transition
2. nuclear division
3. organelle fission
13. anaphase-promoting complex-dependent proteasomal ubiquitin-dependent protein catabolic
process
4. cell cycle process
14. T cell activation
5. mitotic nuclear division
15. T cell aggregation
6. mitotic cell cycle process
16. lymphocyte aggregation
7. mitotic cell cycle
17. leukocyte aggregation
8. cell cycle
18. regulation of cell cycle
9. cell cycle checkpoint
19. spindle organization
10. mitotic cell cycle checkpoint
20. mitotic cell cycle phase transition
11. G2/M transition of mitotic cell cycle
21. leukocyte cell-cell adhesion
3
22. regulation of mitotic cell cycle
56. neuronal stem cell division
23. cell cycle phase transition
57. neuroblast division
24. negative regulation of mitotic cell cycle
58. single organism cell adhesion
25. regulation of spindle organization
59. microtubule cytoskeleton organization
26. homotypic cell-cell adhesion
60. V(D)J recombination
27. mitotic spindle organization
61. immune system development
28. somatic diversification of T cell receptor genes
62. meiotic nuclear division
29. somatic recombination of T cell receptor gene
segments
63. mitotic G2 DNA damage checkpoint
64. interleukin-5 production
30. T cell receptor V(D)J recombination
65. regulation of interleukin-5 production
31. spindle stabilization
66. meiotic cell cycle process
32. spindle assembly involved in meiosis
67. DNA integrity checkpoint
33. lymphocyte activation
68. negative regulation of mitotic cell cycle phase
transition
34. positive regulation of ubiquitin-protein transferase activity
69. nuclear envelope organization
35. regulation of ubiquitin homeostasis
70. positive regulation of ubiquitin-protein ligase activity involved in regulation of mitotic cell cycle
transition
36. free ubiquitin chain polymerization
37. positive regulation of ligase activity
71. positive regulation of proteolysis involved in cellular protein catabolic process
38. meiotic cell cycle
39. mitotic nuclear envelope disassembly
72. regeneration
40. membrane disassembly
73. oogenesis
41. nuclear envelope disassembly
74. spindle assembly
42. forebrain neuroblast division
75. organ regeneration
43. leukocyte activation
76. T cell costimulation
44. sister chromatid segregation
77. positive regulation of protein ubiquitination
45. response to insecticide
78. nuclear chromosome segregation
46. activation of anaphase-promoting complex activity
79. lymphocyte costimulation
80. cell-cell adhesion
47. single organismal cell-cell adhesion
81. centrosome localization
48. neural precursor cell proliferation
82. regulation of mitotic spindle organization
49. regulation of cell cycle process
83. negative regulation of cell cycle phase transition
50. cell proliferation
84. positive regulation of cellular protein catabolic
process
51. meiotic spindle organization
52. cell activation
85. negative regulation of cell cycle
53. regulation of ligase activity
86. positive regulation of protein modification by
small protein conjugation or removal
54. regulation of ubiquitin-protein transferase activity
87. negative regulation of cell division
55. histone-serine phosphorylation
88. single-organism organelle organization
4
GO terms of T cell genes in Bindea but not in IRIS:
1. T cell receptor signaling pathway
33. positive regulation of cell-cell adhesion
2. antigen receptor-mediated signaling pathway
34. immune response
3. positive regulation of immune system process
35. positive regulation of lymphocyte activation
4. regulation of immune system process
36. T cell costimulation
5. positive regulation of leukocyte activation
37. lymphocyte costimulation
6. regulation of immune response
38. immune system process
7. positive regulation of cell activation
39. lymphocyte activation
8. regulation of T cell activation
40. immune response-regulating signaling pathway
9. regulation of leukocyte cell-cell adhesion
10. regulation of homotypic cell-cell adhesion
41. positive regulation of interleukin-2 biosynthetic
process
11. regulation of cell adhesion
42. leukocyte activation
12. immune response-activating cell surface receptor signaling pathway
43. single organismal cell-cell adhesion
13. positive regulation of immune response
45. regulation of interleukin-2 biosynthetic process
14. positive regulation of cell adhesion
46. interleukin-2 biosynthetic process
15. regulation of lymphocyte activation
47. cell-cell adhesion
16. regulation of cell-cell adhesion
48. cell activation
17. regulation of leukocyte activation
49. regulation of defense response to virus by virus
18. T cell activation
50. positive regulation of interleukin-2 production
19. T cell aggregation
51. T cell differentiation
20. lymphocyte aggregation
52. positive regulation of response to stimulus
21. leukocyte aggregation
53. regulation of interleukin-2 production
22. cell adhesion
23. biological adhesion
54. positive regulation of alpha-beta T cell activation
24. regulation of cell activation
55. interleukin-2 production
25. positive regulation of T cell activation
56. positive regulation of cytokine biosynthetic process
44. single organism cell adhesion
26. positive regulation of homotypic cell-cell adhesion
57. positive regulation of myeloid dendritic cell activation
27. positive regulation of leukocyte cell-cell adhesion
58. lymphocyte differentiation
28. leukocyte cell-cell adhesion
59. positive regulation of adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains
29. immune response-activating signal transduction
30. homotypic cell-cell adhesion
60. regulation of alpha-beta T cell activation
31. immune response-regulating cell surface receptor signaling pathway
61. positive regulation of lymphocyte mediated immunity
32. activation of immune response
5
62. Fc-epsilon receptor signaling pathway
64. positive regulation of adaptive immune response
63. positive regulation of signal transduction
2.2
Expression barcodes
Storing representative gene expression signatures, as opposed to just marker genes, is key to more robust
predictions of immune infiltrate cell type abundances. These distinctive transcriptional profiles are often called
unique expression “barcodes” (seemingly named for the heatmaps commonly used to visualize microarray
data). We now examine two methods that extract representative expression profiles.
[5] introduces the following procedure to select barcodes. For each expressed gene, the authors find
the two cell types with highest expression of this gene (perhaps in terms of mean expression across all samples
from each cell type, although the details are not given). If the gene is differentially expressed within a 95%
fold change confidence interval between those cell types, the gene is flagged as a potential marker for the cell
type with higher expression. This approach would clearly fail for very similar subtypes, and may only pull out
noise because of low sample sizes. So the authors also compare the cell types with highest and third-highest
expression of this gene in case it is hard to tell between the top two groups. They progressively refine their
basis matrix with an increasing number of top genes, and report that they minimize the condition number of
their matrix with an intermediate number of included genes (360 genes).
The authors note that their method produces a well-conditioned matrix. This is an important consideration because the condition number, defined as the ratio of the largest to smallest singular values in the singular
value decomposition of the basis matrix, estimates how imprecise solutions to linear systems with this matrix
are, and thus is a good proxy for the accuracy of deconvolution under the well-justified biological assumption
of linearity [6]. The smaller the condition number, the better conditioned the basis matrix is, meaning the cell
types are more distinct. However, more strict statistical testing with a controlled false discovery rate is desired.
[7] provides this desired statistical rigor. Like the previous method, this one also iteratively deletes
irrelevant genes. The authors find significantly differentially expressed genes between all populations using
two-sided unequal variance t-tests, with a (fairly loose) false discovery rate threshold of q < .3 and with log
fold change greater than 2.0. The number of selected genes per cell type is reduced from at most the first 150
towards 50 final selected genes in search of the best-conditioned matrix (minimum condition number).
Here is an example of the output of these methods. [5] provides raw samples from several populations:
T cells, two lines of B cells, and monocytes. Figure 1 and 2 are correlation matrices of the pure samples
and of processed basis matrices (via [7] codebase), respectively. Note the poor differentiation in the raw data
(especially note the scale), whereas differentiation is much easier in the processed matrix.
2.2.1
Analysis
We want to characterize how well expression barcode methods distinguish similar cell types. [5] does not
provide code to regenerate their full basis matrix from many samples. However, I was able to reproduce the
basis matrix from [7] using their tools and supplied input data, albeit with less filtering: the authors postprocessed their signature matrix to remove some junk genes using annotations from cancer cell lines. Though my
basis matrix thus included more genes, I obtained a very similar condition number to their matrix (which they
call LM22), and the genes in common all had almost exactly the same expressions throughout. This suggest
that the postprocessing that was poorly described and that I was unable to run did not significantly refine the
matrix.
I performed hierarchical clustering and computed pairwise Pearson correlations between cell typespecific profiles in the signature matrices from [5] and [7]. The pairwise Pearson correlation of the LM22
matrix [7] showed nice differentiation between cell types, and biologically-related cell types were highly correlated (Figure 3). In contrast, the pairwise Pearson correlations from the matrix in [5], hereafter called Abbas,
showed very poor differentiation among several B cell types (Figure 4). I also computed pairwise Pearson
6
Figure 1: Pairwise correlation in raw data from [5].
7
Figure 2: Pairwise correlation in basis matrix created from raw data of [5].
8
correlations from the combined matrices (Figure 5). Different methods with different datasets still produce
nice expected correlations, although are also several unexpected inter-matrix correlations.
Figure 3: Pairwise Pearson correlation in LM22 [7].
Hierarchical clustering of genes and cell types in LM22 generally recovers biological similarities between
cell types (Figure 6). There is one exception: gamma delta T cells. However, this cell type has been flagged as
problematic and may be ignored [8].
Since LM22 has nice differentiation between cell types, it is interesting to examine the most similar cell
types in this matrix. The Pearson correlations and the hierarchical clustering reveal that the following classes
in LM22 are most similar:
• B cells memory, naive
• CD4 T cells naive, memory resting
When one of each pair of similar cell types is removed, the condition number decreases from 11.38 to
9.30, meaning the resulting matrix is considerably better at deconvolving the more distinct set of cell types.
3
Future directions
In total, these papers have 390 microarrays samples. I downloaded and normalized all this array data. We
can construct a much richer set of expression profiles from this expanded dataset. In fact, the sample size
could potentially allow us to model variance and not just use mean expression profiles, which could be critical
for deconvolving the immune contexture of tumors, in which immune cells may have differing activations or
other properties depending on the state of the tumor.
9
Figure 4: Pairwise correlation in Abbas [5].
10
Figure 5: Pairwise pearson correlation in combination of LM22 and Abbas basis matrices, as well as with raw
data from [5].
11
Figure 6: LM22 hierarchical clustering
12
T.cells.CD4.memory.resting
T.cells.CD8
T.cells.CD4.naive
T.cells.regulatory..Tregs.
T.cells.follicular.helper
Macrophages.M2
T.cells.CD4.memory.activated
Dendritic.cells.resting
Macrophages.M0
Eosinophils
Mast.cells.activated
Mast.cells.resting
Neutrophils
Monocytes
T.cells.gamma.delta
NK.cells.resting
NK.cells.activated
Macrophages.M1
Dendritic.cells.activated
B.cells.naive
B.cells.memory
Plasma.cells
SEPT5
CEMP1
EFNA5
KIRREL
NPAS1
NTN3
SSX1
PCDHA5
ZNF442
CDH12
KRT18P50
IFNA10
MBL2
PTPRG
LHCGR
PKD2L2
DACH1
GIPR
SMPDL3B
MAGEA11
GYPE
BRSK2
HIC1
GPR1
CLCA3P
GALR1
PDE6C
TARDBPP1
BFSP1
MAP9
MAP4K2
CRYBB1
MAP3K13
KCNG2
ATXN8OS
GPR25
FRK
SLC12A1
UPK3A
WNT7A
TRAV12−2
CA8
TSHR
CDK6
SCN9A
PPFIBP1
CCDC102B
TEC
NOX3
IL5
LINC00597
IL17A
IL26
IL4
SKA1
CDC25A
ZBTB32
REN
MROH7
DENND5B
UGT2B17
PAX7
HIST1H2AE
ANGPT4
ZNF204P
FLJ13197
ZNF324
SERGEF
CXorf57
ZNF135
CDHR1
TEP1
KLRC4
NAALADL1
FAM124B
STXBP6
PAQR5
SEPT8
RRP9
ORC1
FZD3
SLC7A10
GGT5
TRPM4
KIAA0754
NME8
WNT5B
HTR2B
GRAP2
COLQ
ZBP1
KIR2DL1
S1PR5
TRAF4
MICAL3
IL7
NMBR
BEND5
LINC00921
ABCB4
FAM212B
STEAP4
C5AR2
CASP5
ABCB9
MAST1
TGM5
CCR10
HIST1H2BG
MANEA
RASA3
FBXL8
EPB41
RCAN3
CCR6
RPL10L
TRAV13−2
HMGB3P30
BARX2
LILRA4
CRISP3
GPR19
TRAV8−6
EPHA1
DSC1
GAL3ST4
VILL
BCL7A
MEP1A
UGT1A8
ZNF286A
TMEM156
FRMD8
RYR1
LOC126987
PLCH2
TNFRSF11A
SEC31B
EPN2
TRPM6
IL5RA
P2RY2
DEPDC5
ZNF222
RRP12
DAPK2
TLR7
ARRB1
LILRA3
ACHE
APOL6
TNIP3
SLCO5A1
PDCD1LG2
TMEM255A
DHRS11
FZD2
GSTT1
COL8A2
GPC4
CSF1
CR2
CD72
FCER2
NLRP3
ASGR1
PADI4
MEFV
VNN1
HPSE
FES
ASGR2
HNMT
FAM198B
CD33
LTC4S
IL1RL1
CMA1
CCL7
SLC12A8
CCL1
NTRK1
FAM174B
ADAMTS3
RAB27B
HOXA1
BPI
AZU1
GFI1
CSF2
IL3
CD40LG
TRAV13−1
DPP4
CXCR6
LAG3
PRR5L
KLRC3
PLEKHF1
CRTAM
ST6GALNAC4
C11orf80
PDK1
KCNA3
TNFRSF13B
GNG7
REPS2
TREML2
PGLYRP1
MAK
BTNL8
CREB5
CEACAM3
VNN3
CAMP
PLEKHG3
MXD1
P2RY10
SMPD3
OSM
ST3GAL6
ICA1
PDCD1
CXCR5
ST8SIA1
ZBTB10
IL21
FLT3LG
LEF1
UBASH3A
LY9
LIME1
ACAP1
ANKRD55
CD70
FOXP3
TYR
TRAV21
TRAV9−2
ETS1
CD28
CD8B
FASLG
PTGDR
KIR2DS4
TXK
IL12RB2
KIR3DL2
KIR2DL4
TNFSF14
LTA
MSC
DHX58
PTGIR
RASSF4
TLR8
NOD2
PLA1A
HESX1
SOCS1
CHST7
ARHGAP22
IL12B
ETV3
CCL14
HRH1
RENBP
FRMD4A
CD68
CD209
FLVCR2
SLAMF8
ALOX15
CCR5
HSPA6
HAL
FPR2
CDA
NFE2
LILRA2
BST1
ALOX5
CCR2
CD1D
CFP
DCSTAMP
MARCO
PPBP
BHLHE41
CXCL5
HK3
PSG2
NIPSNAP3B
ADAM28
RALGPS2
BRAF
NPIPB15
SP140
CD180
GPR18
BACH2
HHEX
STAP1
VPREB3
CD22
FCRL2
BLK
SPIB
FCGR2B
PNOC
AIM2
IL2RA
IFNG
IL9
SIT1
CD5
PBXIP1
CLEC2D
ZFP36L2
CD96
LAIR2
SKAP1
SPOCK2
DGKA
MAP4K1
SH2D1A
SIRPG
CD6
TRAT1
TNFRSF4
CTLA4
TBX21
GZMM
NCR3
DEFA4
TTC38
CD244
CD160
KLRG1
DUSP2
RPL3P7
CD3G
TCF7
ATHL1
BMP2K
P2RX1
MARCH3
RGS13
MS4A2
IL18R1
ATP8B4
CEACAM8
MS4A3
IL1A
MYB
FCER1A
CTSG
ELANE
ADRB2
CXCL3
CCL20
MAN1A1
AMPD1
SPAG4
IGHE
RASGRP3
EAF2
TNFRSF17
LOC100130100
CD7
PMCH
GPR171
PASK
ICOS
CHI3L2
TRIB2
CXCL13
CD8A
GPR97
EMR3
EMR2
CCR3
ZNF165
NR4A3
GPR183
RGS1
GPR65
P2RY14
RNASE2
LRMP
FOSB
CYP27B1
SLC2A6
SLAMF1
SIGLEC1
CCL8
IFI44L
CD80
CD40
CXCL11
APOL3
ADAMDEC1
KYNU
RSAD2
CD86
CLIC2
CCL17
BIRC3
TREM2
C1orf54
FPR3
EGR2
CLEC4A
CLEC10A
NPL
CCL23
HLA−DQA1
CCL13
SLC15A3
CHI3L1
CYP27A1
PLA2G7
IL18RAP
KLRD1
KLRF1
APOBEC3G
CTSW
PTGER2
CCND2
CD300A
IL1B
HDC
FFAR2
TNFRSF10C
MGAM
CXCR1
MMP25
CHST15
QPCT
DPEP2
P2RY13
TREM1
CSF3R
TLR2
LILRB2
APOBEC3A
CLEC7A
IGSF6
S100A12
HCK
LST1
C5AR1
AQP9
VNN2
CXCR2
CD1B
CD1A
MMP12
RNASE6
CD1C
CD1E
ACP5
AIF1
MS4A6A
CD4
CCL18
CD19
CD79B
KIAA0226L
BANK1
HLA−DOB
P2RX5
CD79A
SIK1
LY86
IL4R
TCL1A
IRF8
MS4A1
CD37
FAIM3
FAM65B
RASGRP2
CD69
FCN1
CCL22
SAMSN1
C3AR1
BCL2A1
EMR1
CLC
NCF2
MNDA
FPR1
FCGR3B
MMP9
TPSAB1
CPA3
HPGDS
PRG2
CD3E
PIK3IP1
BCL11B
LAT
PVRIG
ZAP70
PTPRCAP
CD27
ITK
LCK
CD2
SELL
IL7R
TRAC
TRBC1
CD3D
LTB
PRF1
GNLY
GZMA
IL2RB
CD247
GZMB
GZMH
KLRK1
KLRB1
GZMK
CST7
NKG7
TRDC
CCL5
IDO1
LAMP3
CCR7
TNFAIP6
EBI3
CD38
CXCL9
CCL19
CXCL10
CCL4
IGKC
IGHM
IGLL3P
GUSBP11
IGHD
MZB1
Since RNAseq is popular today for tumor sequencing, it is desirable to obtain enriched immune cell line
RNAseq data and produce a new basis matrix. However, online discussion suggests that RNAseq does not
support the independence assumptions in microarray analysis: https://www.biostars.org/p/160961/. This
context may require different reference profile expression methods.
References
1. Abbas A, Baldwin D, Ma Y, Ouyang W, Gurney A, Martin F, et al. Immune response in silico (iRIS):
Immune-specific genes identified from a compendium of microarray expression data. Genes and immunity.
Nature Publishing Group; 2005;6: 319–331.
2. Bindea G, Mlecnik B, Tosolini M, Kirilovsky A, Waldner M, Obenauf AC, et al. Spatiotemporal
dynamics of intratumoral immune cells reveal the immune landscape in human cancer. Immunity. Elsevier;
2013;39: 782–795.
3. Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W, et al. Inferring
tumour purity and stromal and immune cell admixture from expression data. Nature communications. Nature
Publishing Group; 2013;4.
4. Storey JD, Xiao W, Leek JT, Tompkins RG, Davis RW. Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences of the United States of America. National
Acad Sciences; 2005;102: 12837–12842.
5. Abbas AR, Wolslegel K, Seshasayee D, Modrusan Z, Clark HF. Deconvolution of blood microarray
data identifies cellular activation patterns in systemic lupus erythematosus. PloS one. Public Library of Science;
2009;4: e6098.
6. Shen-Orr SS, Tibshirani R, Khatri P, Bodian DL, Staedtler F, Perry NM, et al. Cell type–specific
gene expression differences in complex tissues. Nature methods. Nature Publishing Group; 2010;7: 287–289.
7. Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, et al. Robust enumeration of cell
subsets from tissue expression profiles. Nature methods. Nature Publishing Group; 2015;12: 453–457.
8. Senbabaoglu Y, Winer AG, Gejman RS, Liu M, Luna A, Ostrovnaya I, et al. The landscape of t cell
infiltration in human cancer and its association with antigen presenting gene expression. bioRxiv. Cold Spring
Harbor Labs Journals; 2015; doi:10.1101/025908
13