Download ANNOTATATION OF GENE LISTS

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Pathogenomics wikipedia , lookup

Essential gene wikipedia , lookup

Behavioral epigenetics wikipedia , lookup

Metagenomics wikipedia , lookup

X-inactivation wikipedia , lookup

Gene wikipedia , lookup

Polycomb Group Proteins and Cancer wikipedia , lookup

Designer baby wikipedia , lookup

Genome evolution wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Minimal genome wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Ridge (biology) wikipedia , lookup

Genomic imprinting wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Genome (book) wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Gene expression programming wikipedia , lookup

RNA-Seq wikipedia , lookup

Microevolution wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Gene expression profiling wikipedia , lookup

Transcript
ANNOTATATION OF GENE LISTS
As we have seen from the SAM session, we can visualize the table containing the
significant probesets typing:
> [email protected]
The problem of this list is that it is not very informative on the biological point of
view. We sure would like to know more about each of the significant probesets.
In this session, we will try to build a HTML output carrying relevant biological
annotation for the significant probesets. We will also present the results ordered by
fold change.
Preliminary operations.
1. Open R-Gui;
2. Load the workspace containing the SAM analysis objects.
Obtaining and modifying SAM results.
We can create a data.frame where we can store the SAM results:
> samsig<[email protected]
let’s visualize the first 5 rows of this new data.frame:
>samsig[1:5,]
Row
d.value
206716_at
6242
0.01642527
209443_at
8936
0.06041301
204254_s_at 3781
0.17826224
206054_at
5580
0.03757413
220281_at
19645
0.03544170
stdev rawp q.value
-21.86692 0.2683784
R.fold
0
0
-20.88361 0.1948048
0
0
-20.61441 0.1205610
0
0
-19.61923 0.2375107
0
0
-19.42689 0.2467351
0
0
The columns containing the q-values and the fold changes are the fifth and the sixth.
So, we can create a new data.frame containing only the probeset IDs, the q-values
and the fold changes:
> res<-samsig[,5:6]
You can notice that there are fold changes < 1 (0.-something). These fold changes
indicate down-regulation of the expression in the RCC samples, compared to NORM
samples. The reason of this, is that in SAM the fold changes have been calculated
dividing the average expression of RCC by the average expression of NORM:
RCC / NORM.
It is more comfortable to visualize the down-regulated genes with negative fold
change. For instance, a gene having fold change 0.50 would be expressed by the fold
change -2. This is more readable.
Everything we need to do, is to transform the fold changes < 1 in -1/fold change:
> dim(res)
[1] 217
2
The data.frame “res” contains 217 rows and 2 columns.
>
+
+
+
for(i in 1:217) {
if(res[i,2]<1)
res[i,2] <- -1/res[i,2]
}
For each row of the data.frame “res” (217), we will recalculate the new fold change if
this is < 1.
In this very simple function, we use the variable i as a counter that will increase of
one unit at the end of each cycle, till the value 217.
Let’s visualize the first 5 rows of the new res object:
> res[1:5,]
206716_at
209443_at
204254_s_at
206054_at
220281_at
q.value
0
0
0
0
0
R.fold
-60.881782
-16.552726
-5.609713
-26.614057
-28.215350
We can notice that now the fold changes expressing down-regulation are negative
numbers, but the probesets are ordered by increasing q-value, while we want to order
the results by decreasing fold change.
We can sort the column containing fold changes and generate and index carrying the
information of the position of each row of the data.frame according to the new order.
To do this, we can use the function sort() specifying that we want the decreasing
order (dec=T) and that we want to generate the index (index.return=T).
> sorted<-sort(res[,2], dec=T, index.return=T)
The new object named “sorted” contains two slots: sorted$x contains the
ordered fold changes, and the slot sorted$ix contains the index.
Now we can re-order all the rows of “res” using sorted$ix:
> res<-res[sorted$ix,]
Biological annotation and output creation.
In order to build up the annotation for our list of genes and to create the output table
we need to use two libraries: hgu133a and annaffy.
> library(hgu133a)
>library(annaffy)
To see which columns of data can be included, we use the aaf.handler()
function:
> aaf.handler()
[1] "Probe"
"Description"
[5] "Chromosome"
"GenBank"
[9] "Cytoband"
"Gene Ontology"
[13] "Pathway"
"Symbol"
"Function"
"Chromosome Location"
"LocusLink"
"UniGene"
"PubMed"
In this example we will use all the annotation available, but many times we might
decide to use only few of them.
> anncols <- aaf.handler()[c(1:13)]
Once we have defined the annotation columns (anncols), we can build the
annotation table using the function aafTableAnn():
> anntable <- aafTableAnn(row.names(res), "hgu133a",
anncols)
Now we can create separate tables carrying the q-values and the fold changes
information:
> qvaltable <- aafTable("q-value"=res[,1])
> fctable <- aafTable("fold change"=res[,2], signed=T)
Using the function merge() it is possible to push together the qvaltable and the
fctable. In the same way, we can merge also the annotation table (anntable).
> tb<-merge(qvaltable, fctable)
> tb<-merge(anntable, tb)
Finally, we can create the output table in HTML format. We need to specify the name
of the file (“SAMout.html”).
> saveHTML(tb, "SAMout.html", title = "Significant Genes
with annotation")
Visualizing the genomic position of the genes.
We can produce a plot to graphically visualize the position of our significant genes on
the chromosomes.
First, we need to load the package geneplotter.
> library(geneplotter)
We can now build an object containing the location of all the genes represented on the
Affymetrix HG-U133A chipset using the function buildChromLocation().
>chrloc<-buildChromLocation(“hgu133a”)
Now we can plot chrloc:
> cPlot(chrloc)
The relative position of the genes on the chromosomes is visualized in white.
Now we want to highlight our 217 significant probesets.
The probesets are stored as row.names of the data.frame res.
> gn<-row.names(res)
Finally, we can color our genes in red, for instance:
> cColor(gn, "red", chrloc)
Thank you for your attention!
Dario
For contacts:
Dario Greco
Institute of Biotechnology - University of Helsinki
Building Cultivator II
P.O.Box 56
Viikinkaari 4
FIN-00014
Finland
Office: +358 9 191 58951
Fax: +358 9 191 58952
Mobile: +358 44 023 5780
Email: [email protected]