Copy Number and Gene Expression Integration in Partek® Genomics Suite™
Introduction
Integrating experimental data from different genomic levels lets researchers define
genomic regions of interest from the intersection of two experimental approaches, rather
than from somewhat arbitrary false discovery rate cutoffs. Partek can measure and
visualize both copy number variant regions and differentially expressed genes. This lets you
find genes whose expression may be altered by copy number aberration and then verify the
association by correlating copy number with gene expression data.
This tutorial will illustrate how to:
• Open a previously created project in Partek containing multiple files
• Find regions of significant copy number variation across multiple samples
• Merge copy number and gene expression data in multiple ways to find association between copy number aberration and differential expression
• Visualize gene expression and copy number data simultaneously at the genome level
• Correlate and visualize gene expression and copy number at the gene level
Data were originally downloaded from ArrayExpress (E-TABM-282, E-TABM-283 and
E-TABM-284). The data are from a study published in Molecular Cancer (2008) by Cifola
et al. To obtain a copy of the data, already formatted for Partek, download the data
associated with this tutorial from within Partek at Help > On-line Tutorials. After
unzipping the file, open the Partek project by selecting File > Open Project and browsing
to the cifola.ppj file within the unzipped data. A collection of files containing both gene
expression and copy number data will open within Partek (Figure 1).
Please note: This tutorial was built using Partek® Genomics Suite™ v6.4. Because Partek
is a rapidly evolving software application, future versions may differ from what is
displayed in this tutorial. To ensure that you are using the most current version of Partek,
select Help > Check for Updates from within Partek.
Figure 1: Viewing the contents of the Partek project
Understanding the Contents of the Partek Project
Expression Data
Twenty-seven samples were run on Affymetrix U133plus2 arrays: 16 renal cancer and 11
normal samples, some of which form tumor-normal pairs. The first spreadsheet,
Cifola_U133_2, contains the imported and summarized log2-transformed gene intensities.
A two-factor ANOVA was run including both disease state and individual. A list of 710
genes that were differentially expressed with respect to disease state was then created,
using an FDR-adjusted p-value cutoff of 0.0001.
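The gene list above is defined by an FDR-adjusted p-value cutoff. The tutorial does not say which FDR procedure Partek applies. The sketch below assumes the Benjamini-Hochberg step-up method. The function names, the gene IDs and the `<=` comparison against the cutoff are hypothetical illustration choices, not Partek's API.

```python
from typing import Dict, List

def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    n = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def significant_genes(pvalues: Dict[str, float], alpha: float = 1e-4) -> List[str]:
    """Hypothetical helper: genes whose FDR-adjusted p-value is at or below alpha."""
    genes = list(pvalues)
    adjusted = bh_adjust([pvalues[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q <= alpha]

print(significant_genes({"GENE_A": 1e-7, "GENE_B": 0.5, "GENE_C": 2e-6}))
# prints ['GENE_A', 'GENE_C']
```

Each adjusted value is the smallest `p * n / rank` at or above that p-value's rank. This makes the filter monotone: a gene with a smaller raw p-value can never be dropped while a gene with a larger one is kept.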
Copy Number Data
Fifty-four samples were each run on both Affymetrix 50K SNP arrays (100K total). These
54 samples represent 27 tumor-normal pairs from the same individuals. For all samples,
both arrays were imported into Partek together. Copy number estimates were created
using each tumor's paired normal as the normal baseline. The resulting paired copy number
estimate file is the first file in the copy number section. The intensity copy number sheet is
not included in the data set. The segmentation algorithm was then run with default
settings to group markers that showed aberrant levels of DNA abundance relative to each
normal pair. The resulting segmentation table lists one region of aberrant DNA abundance
per row, for each sample.
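Here is a minimal sketch of the two steps above, under stated assumptions:

- The paired estimate is taken as 2 × tumor/normal intensity, assuming the normal genome is diploid.
- Segmentation is replaced by a simple threshold-and-run rule. This is a stand-in, not Partek's segmentation algorithm.
- The cutoffs 2.3 and 1.7 and the three-marker minimum are hypothetical values.

```python
from typing import List, Optional, Tuple

def paired_copy_number(tumor: List[float], normal: List[float], ploidy: float = 2.0) -> List[float]:
    """Per-marker copy number of a tumor relative to its paired normal (normal assumed diploid)."""
    return [ploidy * t / n for t, n in zip(tumor, normal)]

def call_segments(cn: List[float], amp_cut: float = 2.3, del_cut: float = 1.7,
                  min_markers: int = 3) -> List[Tuple[int, int, str]]:
    """Simplified stand-in for segmentation: runs of consecutive aberrant markers.

    Returns (first_marker_index, last_marker_index, call) tuples.
    """
    def state(v: float) -> Optional[str]:
        if v >= amp_cut:
            return "Amplification"
        if v <= del_cut:
            return "Deletion"
        return None

    segments = []
    start = 0
    for i in range(1, len(cn) + 1):
        # Close the current run when the marker state changes or the input ends.
        if i == len(cn) or state(cn[i]) != state(cn[start]):
            call = state(cn[start])
            if call is not None and i - start >= min_markers:
                segments.append((start, i - 1, call))
            start = i
    return segments

cn = paired_copy_number([100, 100, 150, 150, 150, 100, 50, 50, 50, 50, 100], [100] * 11)
print(call_segments(cn))
# prints [(2, 4, 'Amplification'), (6, 9, 'Deletion')]
```

A real segmentation method also tests whether neighboring segments differ significantly. The fixed cutoffs here only illustrate the idea of grouping adjacent aberrant markers into one row per region.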
Continuing the Copy Number Analysis – Histogram of Segments
Start by viewing a histogram of segmented regions measured at the marker level.
• From within the Copy Number workflow, select Plot Detected Regions
• Select the segmentation spreadsheet from the drop-down, and select the Plot Histogram radio button (Figure 2)
Figure 2: Setting up a Histogram of Segments
The resulting histogram is displayed in Figure 3. Deletions are displayed to the left of the
center of the chromosome in blue and amplifications are displayed to the right in red. For
each genomic marker, the height of the red or blue bars displays the number of samples that
contain an amplification (red) or deletion (blue) within the experimental population. Note
that a new spreadsheet will be created, called Histogram, which is used to create the
graphic.
Figure 3: Viewing the Histogram of segmentation measured per marker
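The histogram plots, for each marker, the number of samples with an amplification or a deletion covering it. That quantity can be sketched as follows. The tuple layouts for markers and segments are assumptions made for illustration, not the format of Partek's Histogram spreadsheet.

```python
from typing import Dict, List, Tuple

# (sample, chromosome, start, end, call); end position is inclusive.
Segment = Tuple[str, str, int, int, str]

def marker_histogram(markers: List[Tuple[str, int]],
                     segments: List[Segment]) -> Dict[Tuple[str, int], Tuple[int, int]]:
    """For each marker, count distinct samples with an amplification and with a deletion covering it."""
    counts = {}
    for chrom, pos in markers:
        amplified, deleted = set(), set()
        for sample, seg_chrom, start, end, call in segments:
            if seg_chrom == chrom and start <= pos <= end:
                (amplified if call == "Amplification" else deleted).add(sample)
        counts[(chrom, pos)] = (len(amplified), len(deleted))
    return counts
```

Using sets means a sample is counted once per marker, even if it has overlapping segments there. The bar height is then "number of samples", as in the plot.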
Interpreting the Histogram
The histogram in Figure 3 shows two major regions of segmentation shared across
multiple samples: a deletion (shown in blue) across the p arm of chromosome 3 and an
amplification (shown in red) across the q arm of chromosome 5. Based on the color coding
of the histogram bars, both regions appear to be present in more than 8 samples. To define
these regions more precisely, use the Find Regions in Multiple Samples function.
Finding Regions in Multiple Samples or Creating the Sig-Regions Table
The segmentation table reports individual segments on a per sample basis. Two methods
exist within Partek to determine which segments are shared across multiple samples – Find
Regions in Multiple Samples and Detect Changes by Category. For this tutorial, use Find
Regions in Multiple Samples.
• Select Find Regions in Multiple Samples from the Copy Number workflow. The setup dialog is shown in Figure 4
• Specify the segmentation spreadsheet, and select the minimum number of samples in which a region must be present to be reported in the new table. Based on the example histogram, use eight (8). Note that the researcher chooses this value somewhat arbitrarily.
• Specify the output sheet name (or accept the default), and select OK. A new spreadsheet called Sig-Regions will be created
The sig-regions table reports one shared segment per row with values such as length,
number of samples, and average copy number value, across the region.
Figure 4: Configuring the Find Regions in Multiple Samples dialog
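Conceptually, finding regions in multiple samples means locating stretches of a chromosome where at least the chosen number of samples share the same call. The sketch below is a simplified illustration of that idea under these assumptions:

- Regions are maximal contiguous stretches where the count of samples with the same call stays at or above the minimum.
- Each region reports the fewest samples seen inside it.

Partek's exact merging and reporting rules may differ. The function name and tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

# (sample, chromosome, start, end, call); end position is inclusive.
Segment = Tuple[str, str, int, int, str]

def shared_regions(segments: List[Segment], min_samples: int) -> List[Tuple[str, int, int, str, int]]:
    """Maximal regions where at least min_samples samples share the same call.

    Returns (chromosome, start, end, call, fewest_samples_in_region) tuples.
    """
    groups = defaultdict(list)
    for sample, chrom, start, end, call in segments:
        groups[(chrom, call)].append((sample, start, end))

    regions = []
    for (chrom, call), segs in sorted(groups.items()):
        # Every segment start and end+1 is a boundary; between two consecutive
        # boundaries each segment covers either the whole interval or none of it.
        bounds = sorted({s for _, s, _ in segs} | {e + 1 for _, _, e in segs})
        current = None  # [start, end, fewest samples seen so far]
        for lo, hi in zip(bounds, bounds[1:]):
            n = len({sample for sample, s, e in segs if s <= lo and e >= hi - 1})
            if n >= min_samples:
                if current is None:
                    current = [lo, hi - 1, n]
                else:  # intervals are contiguous, so extend the open region
                    current[1] = hi - 1
                    current[2] = min(current[2], n)
            elif current is not None:
                regions.append((chrom, current[0], current[1], call, current[2]))
                current = None
        if current is not None:
            regions.append((chrom, current[0], current[1], call, current[2]))
    return regions
```

Amplifications and deletions are grouped separately, so a region amplified in some samples and deleted in others is not counted as shared by both.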
Visualizing the Results of the Sig-Regions Table
• To visualize the results of the sig-regions table, select Plot Detected Regions from the Copy Number workflow
• Select Sig-Regions as the spreadsheet to graph; the output is displayed in Figure 5
Figure 5: Plot Detected Regions using Sig-Regions as the input
The karyograph in Figure 5 clearly shows the regions identified earlier, but it also shows a
deletion on chromosome 6. Looking back at the histogram (Figure 3), the aberrant region
on chromosome 6 is present there as well.
Integrating Gene Expression (GX) with Copy Number (CN)
The Integrated Genomics workflows in Partek v6.4 focus on generating 3 data tables:
• A gene-focused table merging GX & CN
• A copy-number segment focused table merging CN & GX
• A table correlating CN & GX
In the following sections, we will begin with selecting the appropriate table and
understanding the information in the table. Next, we will continue with visualizations,
including clustergrams, and correspondence analysis for gene-focused merging, as well as
using a scatter plot to display CN & GX correlation.
Merging GX and CN to Create a Gene-Focused Data Table
• From the Genomics Integration workflow of either the Gene Expression or Copy Number workflows, select Merge Copy Number Regions with Gene Expression; the dialog in Figure 6 will appear
• For this tutorial, select One result per gene, labeled by copy number regions from the drop-down list. This will build a gene-focused data table merging both gene expression and copy number data with one gene per row
Figure 6: Setting up the Genomics Integration Workflow and Merge Copy Number Regions
with Gene Expression dialog
• Next, select the gene expression and copy number spreadsheets to merge as well as the resulting file name (Figure 7)
Figure 7: Configuring the dialog for gene-based CN/GX merge table
Determining Which Tables to Select
Gene Expression
When building a gene-based table, the gene expression input should be a filtered set of
differentially expressed genes. Ideally, this list was produced from the Gene Expression
workflow. Alternatively, an ANOVA table with a filter applied can be used. If a contrast
was used in the ANOVA model, then the fold changes and direction call will be included in
the merged GX/CN table. For this example, select the premade list of 710 changed genes
between tumor and normal.
Copy Number
When building a gene-based table, select either a segmentation table or a shared segment
table, such as chi-square or sig-regions. Whether to use chi-square or sig-regions depends
on which phenotypic categories are available for the samples. In this paired (tumor-normal)
example, additional information such as grade, drug response, or remission is not available,
so sig-regions is more appropriate. Chi-square is more useful when additional phenotypic
information is available and will be used as grouping variables.
For this example, use the Sig-Regions table, which includes segments that are observed in 8
or more samples. Using a shared segments table, rather than a straight segmentation table,
will produce more interesting results as differentially expressed genes will be merged with
aberrant regions shared across many samples. The resulting table will be named genes
merged with CN.txt, by default.
Understanding the Number of Rows in the “Genes Merged with CN” Table
The resulting table will have one row for each gene in the selected gene table that
overlaps a segment in the copy number table. Sometimes more than one region will be
presented for a single gene. This happens when a gene overlaps more than one segment in
the copy number table, most commonly because the region is deleted in some samples and
amplified in others. A gene name can also appear twice because of “redundant”
representation on the array; in that case, the probe set IDs are still unique.
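The row-count behavior described above follows from joining genes to segments by genomic overlap. Here is a minimal sketch of such a join, assuming genes and regions are given as simple tuples with inclusive coordinates. The layouts, names and example values are hypothetical, not Partek's file format.

```python
from typing import List, Tuple

Gene = Tuple[str, str, str, int, int]    # (probe_set_id, symbol, chromosome, start, end)
Region = Tuple[str, int, int, str, int]  # (chromosome, start, end, call, n_samples)

def merge_genes_with_cn(genes: List[Gene], regions: List[Region]) -> List[Tuple[str, str, str, str, int]]:
    """One output row per (gene, region) pair whose genomic intervals overlap."""
    rows = []
    for probe, symbol, g_chrom, g_start, g_end in genes:
        for r_chrom, r_start, r_end, call, n_samples in regions:
            # Closed intervals overlap when each one starts before the other ends.
            if g_chrom == r_chrom and g_start <= r_end and r_start <= g_end:
                rows.append((probe, symbol, f"{r_chrom}:{r_start}-{r_end}", call, n_samples))
    return rows
```

A gene that overlaps both a deleted and an amplified segment produces two rows. A gene that overlaps no segment produces none, which matches the table description above.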
Selecting Suggested Visualizations
Chromosome View
• From the View menu, select Chromosome View with the “genes merged with CN” spreadsheet selected. A histogram (Figure 8) will be displayed with two tracks for the regions on chromosome 3. The bottom track is a histogram of the number of samples showing amplification or deletion in each region. The top track displays the fold change of overlapping genes, according to the merge table. By default, only one chromosome is shown at a time. To switch to a different whole chromosome, right-click on the chromosome number in the upper left corner of the graph. Left-clicking on a chromosome number will add that chromosome to the view. Only chromosomes represented in the merge table can be viewed. Note that the region at the end of the q arm of chromosome 3 shows both decreased expression and decreased copy number abundance.
Region View
• <Right click> on a row header, and select Plot > Region View (not shown). This will start the genomic browser wizard and display the selected tracks for the zoomed-in region associated with the selected row. To plot expression data, select Add Spreadsheet with Other Genomic Data selected in the drop-down, and then choose to plot the expression intensity spreadsheet in this newly added track.
Figure 8: Viewing two chromosome views side by side. On the left is chromosome 3 and on
the right is chromosome 5. Right-clicking on the chromosome numbers near the top will
switch to the selected chromosome; left-clicking will add another chromosome to the
same view. On each plot, the first track shows gene expression fold changes for the genes
in the gene list selected when the merge table was created. Red upward bars represent
increases and blue downward bars represent decreases. The second (bottom) track shows
regions of copy number change. Amplifications point upward and deletions point
downward. The y axis on this track is the frequency of the aberration, and the bars are
also color coded by number of samples.
Understanding the “Genes Merged with Copy Number” Table
The next four figures display the output of the gene focused merge table. Brief
descriptions of the values are listed in the corresponding figure legends.
Figure 9: Viewing the annotation section. Expression probe set ID, cytoband location, the
Gene Title, and Gene Symbol
Figure 10: Viewing the gene expression data. For the given gene, these are the p-values of
differentially expressed genes from the ANOVA model and fold change magnitude and
direction, if a contrast was used
Figure 11: Viewing the copy number data. The gene’s location in genomic coordinates, the
copy number direction, the number of samples with an amplification in this region, and the
average copy number magnitude across this region. The same information is then given
for deletions. If the table was joined with a chi-square table, the average chi-square value
is reported as well. The question mark in column 19 appears because the row contains
only amplifications, so an average for deletions cannot be calculated
Figure 12: Viewing the Standardized Gene Expression. The rest of the data table has the
standardized gene expression values (mean = 0 and standard deviation = 1) of all genes in
the table across all of the samples. These data would be used to generate the optional
clustergram
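The standardization described for Figure 12 is a per-gene z-score across samples. Here is a minimal sketch, with one assumption: it uses the sample standard deviation (n − 1 denominator), and Partek may use a different convention.

```python
from statistics import mean, stdev
from typing import List

def standardize(values: List[float]) -> List[float]:
    """Z-score one gene's intensities across samples so they have mean 0 and SD 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1); an assumption
    return [(v - m) / s for v in values]

print(standardize([1.0, 2.0, 3.0]))
# prints [-1.0, 0.0, 1.0]
```

Standardizing each gene separately puts genes with very different baseline intensities on the same color scale. This is why the clustergram can show grey for a gene's own mean and red or blue for deviations from it.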
Interpreting the Optional Hierarchical Clustergram
A checkbox to create an optional hierarchical clustergram from the gene expression
intensities is located in the Merge Genes with CN dialog (Figure 7). If this box is checked
during set up, then the resulting clustergram will be displayed (Figure 13).
Figure 13: Viewing the clustered standardized gene expression. Data are clustered based on
expression pattern across samples. Each vertical column contains data from one gene
expression sample, and samples with similar expression patterns are grouped in the
dendrogram. The copy number abundance is displayed as a row label on the left, with red
indicating amplification and blue indicating deletion. The standardized gene expression
intensities are displayed within the clustergram, with grey indicating mean expression,
blue below-average expression, and red above-average expression.
Understanding Correspondence Analysis on the Gene-Focused “Genes Merged with CN” Table
Correspondence analysis is a descriptive, exploratory technique that compares the
frequency of co-occurring categories across the rows and columns of a table. Here it shows
how the direction of fold change corresponds to the direction of copy number change. The
Genes merged with CN table gives a direction call for copy number (Amplification or
Deletion) and a fold change direction for gene expression (Up or Down for a given contrast).
GX direction calls appear in the table only if a contrast was included in the ANOVA model.
The concept of up versus down in gene expression is limited to a two-group comparison.
• To invoke correspondence analysis, select it from the Integrated Genomics workflow. In the dialog, only a Genes merged with CN sheet can be selected. By default, the copy number description and the gene expression direction will be analyzed. The resulting table shows how often up- and down-regulated genes co-occur with amplified and/or deleted copy number. For an easier view, select the bar chart icon at the top of the correspondence interface (Figure 14). The bar chart shows that genes whose expression goes up in cancer are about equally distributed across amplified and deleted regions. Genes whose expression goes down, however, fall in deleted regions far more often than in amplified ones.
Figure 14: Viewing the correspondence analysis table and bar chart. This identifies the
global trend of DNA abundance corresponding to changes in RNA abundance
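Correspondence analysis starts from a cross-tabulation of the two direction calls. It then examines how each cell departs from what independence of rows and columns would predict. The sketch below builds that table and the standardized residuals, (observed − expected)/√expected, from hypothetical calls. The full analysis would go on to take a singular value decomposition of these residuals (scaled by 1/√N) to obtain plotting coordinates; that step is omitted here. The function names are illustrative, not Partek's API.

```python
from collections import Counter
from math import sqrt
from typing import List, Tuple

def contingency(pairs: List[Tuple[str, str]]):
    """Cross-tabulate (copy number call, expression direction) pairs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

def standardized_residuals(table: List[List[int]]) -> List[List[float]]:
    """(observed - expected) / sqrt(expected) under independence of rows and columns."""
    total = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            out.append((observed - expected) / sqrt(expected))
        residuals.append(out)
    return residuals

# Hypothetical calls, one pair per row of a "Genes merged with CN" table.
pairs = ([("Amplification", "Up")] * 5 + [("Deletion", "Up")] * 5
         + [("Amplification", "Down")] * 2 + [("Deletion", "Down")] * 8)
rows, cols, table = contingency(pairs)
print(rows, cols, table)
# prints ['Amplification', 'Deletion'] ['Down', 'Up'] [[2, 5], [8, 5]]
```

With these made-up counts, the Deletion/Down cell has a positive residual and the Amplification/Down cell a negative one. This is the same kind of pattern the bar chart in Figure 14 summarizes.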
Merging GX and CN to Create a Copy Number Focused Data Table
The next merged table is a copy number focused table.
• To create the table, select Merge Copy Number Regions with Gene Expression from the Genomic Integration workflow
• In the first dialog, select One result per copy number region from the drop-down (Figure 15). This will create a copy-number focused data table displaying one segment per row
Figure 15: Configuring the initial set up dialog to create the CN merged with genes table
• A second dialog will appear. Here, select the specific gene expression and copy number sheets to be merged and define the output file name (Figure 16)
Figure 16: Configuring the dialog to create CN merged with genes
Understanding Which Tables to Select
Gene Expression
Again, a table of differentially expressed genes, such as a filtered ANOVA table or a gene
list, is best. Including a contrast in the ANOVA is recommended, because the direction
descriptions can then be added to the merged table. If the ANOVA has more than one
contrast, you can select the contrast to be merged into the table. If an unfiltered ANOVA
table is selected, all genes within the selected copy number regions will be displayed
regardless of statistical significance. The counts in columns 2 and 3 include any gene with
the appropriate direction regardless of its p-value. To limit the analysis to significant
genes, apply a p-value filter to the ANOVA table or create a gene list.
• For this example, select the same list of 710 differentially expressed genes as the GX input
Copy Number
The input here is limited to a shared segment table such as sig-regions or chi-square. A
raw segmentation table cannot be selected, as we will be monitoring trends with gene
expression across multiple samples.
• Select the sig-regions table with shared segments in 8 or more samples
The chi-square table could be selected if one had been generated, but this data set has no
other suitable categorical variable to group the experiment (the grade category is
attractive, but the experiment is underpowered for this category).
Sorting Options for the “CN Merged with Genes” Table
To find gene-rich copy number segments, sort descending on column 4 – % of genes higher
expressed in renal cell carcinoma.
• To perform this sort, <right click> over column header 4 and select Sort Descending from the drop-down.
• It may also be helpful to use the interactive filter to limit the table to segments that contain 3 or more genes, which enriches for gene-heavy segments. To apply this limit, select the interactive filter icon at the top and select column 5 from the drop-down. Next, drag the minimum slider to a value above 2 but below 3. The resulting table will only display segments with 3 or more genes.
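The filter-then-sort steps above amount to keeping rows whose gene count (column 5) is at least 3 and ordering them by column 4, highest first. Here is a minimal sketch, assuming the table rows are available as records. The `SegmentRow` field names and the example values are hypothetical.

```python
from typing import List, NamedTuple

class SegmentRow(NamedTuple):
    region: str        # column 1: genomic coordinates
    n_higher: int      # column 2
    n_lower: int       # column 3
    pct_higher: float  # column 4
    n_genes: int       # column 5
    n_samples: int     # column 6

def gene_rich_segments(rows: List[SegmentRow], min_genes: int = 3) -> List[SegmentRow]:
    """Keep segments with at least min_genes genes, sorted descending on % higher."""
    kept = [r for r in rows if r.n_genes >= min_genes]
    return sorted(kept, key=lambda r: r.pct_higher, reverse=True)
```

Filtering before sorting matters here. A segment with a single up-regulated gene scores 100% on column 4, and without the gene-count limit it would crowd the top of the list.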
Understanding the Information in the Segment Focused “CN Merged with Genes” Table
Column 1 contains the genomic coordinates of the segments from the sig-regions or chi-square table (Figure 17).
Column 2 is the number of differentially expressed genes in one side of the contrast.
Column 3 is the number of differentially expressed genes in the other side of the contrast.
Column 4 is column 2 divided by column 5, i.e., the percentage of affected genes within the given segment.
Column 5 is the number of genes, based on expression probe sets, within the segment.
Column 6 is the number of samples with aberrant copy number abundance associated with this segment.
The first row of Figure 17 shows a segment with 7 genes, all of which have higher
expression in renal cancer, and 8 samples that are positive for altered copy number
abundance in this region. This region is an excellent candidate for a phenotypic “driver”,
since the change in chromosomal abundance leads to a change in the associated RNA
abundance for the affected genes.
Figure 17: Viewing the contents of the CN merged with genes table with a filter applied to
only show segments with 3 or more genes and then sorted descending on column 4, “% of
genes higher in renal carcinoma”.
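Columns 2 through 5 of the segment-focused table can be sketched as simple counts over the genes falling inside each segment. Assumptions:

- Each gene is placed by a single position.
- The gene input already contains only differentially expressed genes with an "Up" or "Down" call.
- Column 4 is expressed as a percentage (column 2 ÷ column 5 × 100).

The tutorial does not state how Partek counts column 5 when an unfiltered gene table is supplied, so treat this as an illustration only. Names and tuple layouts are hypothetical.

```python
from typing import List, Tuple

def segment_summary(regions: List[Tuple[str, int, int, int]],
                    genes: List[Tuple[str, int, str]]):
    """Build columns 1-6 of a CN-focused table.

    regions: (chromosome, start, end, n_samples)
    genes:   (chromosome, position, "Up" or "Down")
    """
    table = []
    for chrom, start, end, n_samples in regions:
        inside = [d for g_chrom, pos, d in genes if g_chrom == chrom and start <= pos <= end]
        n_up, n_down = inside.count("Up"), inside.count("Down")
        n_genes = len(inside)
        # Column 4: share of the segment's genes that are higher in the first group.
        pct_up = 100.0 * n_up / n_genes if n_genes else 0.0
        table.append((f"{chrom}:{start}-{end}", n_up, n_down, pct_up, n_genes, n_samples))
    return table
```

A row like the first one in Figure 17 (7 genes, all higher, 8 samples) would come out of this sketch as `(..., 7, 0, 100.0, 7, 8)`.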
Correlating GX and CN
The last table available within Partek v6.4 to merge CN and GX is the correlation table.
• To invoke the table, select Correlate Copy Number with Gene Expression from either the CN or GX workflow under Genomic Integration. The setup dialog is displayed in Figure 18.
The goal of this table is to identify genes whose level of mRNA expression correlates
with the abundance of the DNA template. Suppose the DNA for a given gene is amplified
in some samples relative to others, and the mRNA of the same gene shows a coordinated
increase. That gene will then have a strong correlation value across the experimental
population. The inference is that the regulation of these mRNAs is affected more by
increases in template abundance than by transcriptional or post-transcriptional regulation.
Figure 18: Configuring the Correlate Copy Number and Gene Expression Setup dialog
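The per-gene statistic behind this table can be illustrated with a Pearson correlation between a gene's copy number values and its expression values across samples. Here is a minimal stdlib-only sketch, assuming Pearson correlation; the tutorial does not name the correlation type Partek uses. The t-statistic is what a two-sided p-value is usually derived from, using Student's t with n − 2 degrees of freedom (for example, `scipy.stats.pearsonr` returns both r and that p-value). This is why the p-value reflects sample size while r alone does not.

```python
from math import inf, sqrt
from typing import List

def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation between copy number (x) and expression (y) across samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)); compare against Student's t with n - 2 df."""
    if abs(r) >= 1.0:
        return inf if r > 0 else -inf
    return r * sqrt((n - 2) / (1 - r * r))

print(pearson_r([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
# prints 0.5
```

The same r gives a larger |t|, and hence a smaller p-value, as the number of samples grows. Because the statistic is signed, a two-sided test ranks strong negative correlations as well as positive ones.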
Deciding Which Tables to Select in the CN and GX Correlation Set-up Dialog
A specific table will need to be selected to correlate for both CN & GX. For the copy
number table, the input can be either a segmentation table or a shared segment table, like
chi-square or sig-regions. In this data set, additional phenotypic groups (that are well
powered) are not available, so use the sig-regions table with shared segments in 8 or more
samples.
For gene expression, a list of differentially expressed genes or an ANOVA table with a
filter applied is recommended. However, because of the low density of markers available
on the 100K SNP array, the segments shared in 8 or more samples overlap very poorly
with the list of 710 genes. To increase the scope of the data, use all genes that overlap
the segments in the sig-regions table. To accomplish this, select the Gene Expression
Intensities table, labeled 1 (Cifola_U133_2), from the Gene Expression drop-down.
Note that under some conditions, Partek will generate an intermediate spreadsheet that has
been transformed to get the data into the correct table structure. For example, the
sig-regions table may be transformed (this process is identical to that used in the Tools >
Genomic Transform menu with Average from Original Data selected).
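Here is a minimal sketch of what an "average from original data" transform could look like. For each sample and each shared region, it averages the marker-level copy number values that fall inside the region, giving a samples × regions table that can be correlated with expression. This is an assumption about the transform's behavior based only on its name in the menu. The function name, data layout and example values are hypothetical.

```python
from math import nan
from typing import Dict, List, Tuple

def average_region_copy_number(regions: List[Tuple[str, int, int]],
                               markers: List[Tuple[str, int]],
                               cn_by_sample: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Samples-by-regions table: mean marker-level copy number inside each region.

    markers gives (chromosome, position) in the same order as each sample's CN list.
    Regions containing no markers get NaN.
    """
    table = {}
    for sample, values in cn_by_sample.items():
        row = []
        for chrom, start, end in regions:
            inside = [v for (m_chrom, pos), v in zip(markers, values)
                      if m_chrom == chrom and start <= pos <= end]
            row.append(sum(inside) / len(inside) if inside else nan)
        table[sample] = row
    return table

print(average_region_copy_number([("3", 50, 250), ("5", 50, 150)],
                                 [("3", 100), ("3", 200), ("5", 100)],
                                 {"s1": [1.0, 1.5, 3.0]}))
# prints {'s1': [1.25, 3.0]}
```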
Understanding the Information in the Correlation Table
The following two figures display the contents of the correlation spreadsheet and the figure
legends give brief descriptions of the values. Figure 19 shows the output when a GX
intensity sheet was selected. Figure 20 shows the output when either an ANOVA table or a
gene list was selected for the GX input.
Figure 19: Viewing of the format of the correlation table when GX intensity table is
selected. (The sig-regions table was the CN table selected.) Column 1 shows the
chromosome and genomic coordinates of the region. Column 2 shows the probe set ID
from the gene expression experiment. Column 3 shows the correlation value between the
gene expression and copy number values across all the samples in the input tables, and
column 4 shows the p-value of that correlation, which takes into account the sample size
(whereas the correlation value does not). Further, the p-value will rank both positive and
negative correlations
Figure 20: Viewing the format of the correlation table when ANOVA or gene list is
selected. (The sig-regions table was selected. If chi-square was selected as input, the
chi-square statistics will also be represented in this table.) The values are similar to those
described in Figure 19, but the p-values from the ANOVA tables are displayed. If a
contrast was included in the ANOVA table, the fold change direction is also displayed
Tip! To add more genomic annotation information to the correlation table, simply select a
column header and <right click> to Insert Annotation.
A more detailed correlation interface is available from Tools > Correlate Genomic Data.
The availability of this tool within the menu is context dependent. You need to have a
table of transposed genomic data selected or a table of gene expression intensities.
Correlation of Gene Expression with SNP Markers
This tutorial has described the correlation of copy number intensities with gene expression,
but it is also possible to correlate SNP markers with gene expression. However, you must
select the advanced interface to accomplish this.
• Select Tools > Correlate Genomic Data, then select a table of SNP genotype calls as one of the two spreadsheets to correlate (Figure 21)
Figure 21: Configuring the Advanced Genomic Correlation Setup dialog. This can be
called from the Tools menu, then Correlate Genomic Data. This menu option is only
available when a spreadsheet of transposed genomic data is selected, e.g., transposed
sig-regions or expression intensities
Selecting Suggested Visualizations
Once you’ve created the correlation table, then you can visualize the raw intensities for
both gene expression and copy number across all of the samples in your table.
• From the row header of interest, <right click> and select Scatter Plot (Orig. data) from the drop-down (Figure 22). The resulting plot is displayed in Figure 23
Figure 22: Invoking a scatter plot from the correlation table from a right-click on the row
header
Figure 23: Viewing the resulting correlation plot. Use the drop-down at the top right of the
graph to color, size or shape the dots according to any of a variety of categorical variables.
To generate the regression line, click the red ball, then “axes”, then “set regression line”,
and select “regression line of y on x”
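"Regression line of y on x" refers to the ordinary least-squares line that minimizes vertical distances from the points to the line, with y (here, expression) treated as the response. Here is a minimal sketch of that fit; the function name is hypothetical.

```python
from typing import List, Tuple

def regression_y_on_x(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

print(regression_y_on_x([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# prints (0.0, 2.0)
```

Regressing x on y instead would minimize horizontal distances and generally gives a different line unless the correlation is perfect. This is why the plot asks which direction to use.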
End of Tutorial
This is the end of the copy number and gene expression integration tutorial. If you need
additional assistance with this data set, you can call our technical support staff at
+1-314-878-2329 or email @partek.com.
References
Cifola, I., Spinelli, R., Beltrame, L., Peano, C., Fasoli, E., Ferrero, S., Bosari, S., Signorini,
S., Rocco, F., Perego, R., Proserpio, V., Raimondo, F., Mocarelli, P., & Battaglia,
C. Genome-wide screening of copy number alterations and LOH events in renal
cell carcinomas and integration with gene expression profile. Molecular Cancer
2008, 7(1):6.
Copyright © 2009 by Partek Incorporated. All Rights Reserved. Reproduction of this material without express written
consent from Partek Incorporated is strictly prohibited.