Exercises for Introduction to the Analysis of Microarray Data
Martina Bremer, Edward Himelblau, Andreas Madlung
The Data Set
The dataset Microarray.xls contains the results of a microarray dye-swap experiment. In
this experiment, six genes were spotted onto two arrays with “swapped” dye colors. Each
array consists of two blocks, each with three rows and five columns. The genes are
spotted with differing amounts of replication.
The goal of this experiment is to decide which of the six genes are differentially
expressed between the treatment and control groups. The treatment group is labeled with
red dye (635) on array one and with green dye (532) on array two. The control group is
labeled with the opposite dye on each array.
Exercise 1: Open the dataset in Excel. Go to “File, Save As…” and save this file under a
new name. Look at the column headers. The columns “Block”, “Column”, and “Row”
describe the position of a spot on the array. The column “Name” contains the gene name,
and the remaining two columns for each array contain the background-corrected median
intensities for the red and green channels, respectively. There are two arrays. Since the
arrays are manufactured identically, the position of each gene is the same on both arrays.
Exercise 2: Check whether any of the background-corrected intensities are smaller than
0 and, if necessary, replace those values with 1.
Check: You should have made two replacements.
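For readers who want to cross-check the replacement step outside Excel, here is a minimal Python sketch. The function name floor_negative_intensities and the example intensities are made up for illustration and are not taken from Microarray.xls.

```python
def floor_negative_intensities(intensities, replacement=1):
    """Replace every intensity below 0 with `replacement`.

    Returns the cleaned list and the number of replacements made.
    """
    cleaned = [replacement if value < 0 else value for value in intensities]
    n_replaced = sum(1 for value in intensities if value < 0)
    return cleaned, n_replaced


# Hypothetical intensities for illustration only (not from Microarray.xls).
example = [1520, -12, 873, 45, -3]
cleaned, n_replaced = floor_negative_intensities(example)
print(cleaned)     # [1520, 1, 873, 45, 1]
print(n_replaced)  # 2
```

Counting the replacements gives the same kind of check as the one stated above.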
Exercise 3: Compute the M-values for each gene on both arrays. For each array, insert a
new column into the spreadsheet and label it “M”. The M-value is the base-2 logarithm of the
ratio of red to green intensity. To compute it in Excel, click into the empty
cell in the “M” column in the first data row and enter “=LOG(E3/F3,2)”. Here E3 is the cell with
the red intensity and F3 is the cell with the green intensity for this spot. Then drag down
the column to fill in all the other M-values for array one. Repeat this procedure for array
two. You can round all your M-values to three decimal places by adjusting the format of the
entries.
Check: For the gene At3g03050 on Array 1/Block 1/Column 5/Row 2 you should have
obtained an M-value of –0.436 (rounded to three decimal places).
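The M-value formula can also be sketched in Python. The function name m_value is hypothetical, and the intensities below are invented so that the ratios are easy to check by hand; they are not values from the dataset.

```python
import math


def m_value(red, green):
    """Return log2(red / green), the M-value of one spot."""
    return math.log2(red / green)


# Hypothetical intensities for illustration only (not from Microarray.xls).
print(round(m_value(200, 800), 3))  # -2.0  (red is a quarter of green)
print(round(m_value(800, 200), 3))  # 2.0   (red is four times green)
```

This mirrors the Excel formula “=LOG(E3/F3,2)”, with red in place of E3 and green in place of F3.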
Question (Exercise 4): This value is negative. What does that mean biologically for this
spot of gene At3g03050?
Answer:
Exercise 5: We will conduct t-tests for the genes individually. To do this, we first need to
sort the dataset by “Name” to group the replicates of each gene together.
Highlight the data (not including the first row). Next, click Data, click Sort, click Sort by
“(1)Name” and make sure that “My list has header Row” is marked.
Insert empty lines between genes to separate them visually. We have six genes,
each represented by 2-8 spots. Now your data file should look something like this:
Data Analysis
Exercise 6: Conduct the dye-swap normalization. For each spot, we need to compute the
average of its M-values from the two arrays. To do this, insert another column and label it
“corrected M”. Click into the first cell and enter “=AVERAGE(G3,K3)”. Now do the
same for all other spots by filling down the column.
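As a sketch of the same averaging step in Python, following the AVERAGE formula given in this exercise: the function name corrected_m and the M-values are hypothetical and only illustrate the arithmetic.

```python
def corrected_m(m_array1, m_array2):
    """Average the two M-values of each spot (one value per array)."""
    if len(m_array1) != len(m_array2):
        raise ValueError("both arrays must have the same number of spots")
    return [(m1 + m2) / 2 for m1, m2 in zip(m_array1, m_array2)]


# Hypothetical M-values for two spots (not from Microarray.xls).
print(corrected_m([1.5, -0.5], [0.5, -1.5]))  # [1.0, -1.0]
```

The two input lists must list the spots in the same order, just as the two M columns share rows in the spreadsheet.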
The data file is getting quite big at this point. We want to decide for each gene whether it
is differentially expressed between the treatment and control groups. All the information we need
for this is the gene name and the corrected M-values for that gene.
Exercise 7: Using Excel, compute the average, standard deviation, test statistic and p-value
for the first gene, At1g00100. The Excel commands needed are AVERAGE and
STDEV (for example “=STDEV(L3,L4)”). For the square root you can use SQRT. Do these calculations
in your Excel sheet in the columns next to “corrected M”. First compute the average,
then the SD, then type in a formula for t as given below. (Ask for help if needed.)
The (rounded) value of the test statistic then becomes:
$$ t = \frac{\bar{x}}{\sqrt{s^2 / n}} = 8.684 $$
The degrees of freedom are df = n-1 = 2-1 = 1. Click on any empty cell in Excel and type “=TDIST(8.684,1,2)” to
obtain the two-tailed p-value (rounded to three digits). Now calculate the p-values for the other genes.
Show the p-values to your instructor before proceeding.
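The calculation in Exercise 7 can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the macro or the Excel implementation: the function names are hypothetical, the corrected M-values are made up, and the two-tailed p-value is approximated by integrating the t density numerically with Simpson's rule.

```python
import math
import statistics


def one_sample_t(values):
    """Return (t, df) for testing whether the mean of `values` is zero."""
    n = len(values)
    mean = statistics.mean(values)
    s2 = statistics.variance(values)  # sample variance, i.e. STDEV squared
    return mean / math.sqrt(s2 / n), n - 1


def t_density(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * (integral of the density from 0 to |t|).

    The integral is approximated with Simpson's rule; `steps` must be even.
    """
    upper = abs(t)
    h = upper / steps
    total = t_density(0, df) + t_density(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return 1 - 2 * total * h / 3


# Hypothetical corrected M-values for one gene with two spots.
t, df = one_sample_t([1.0, 3.0])
print(round(t, 3), df)                # 2.0 1
print(round(two_tailed_p(t, df), 3))  # 0.295
```

Calling two_tailed_p(8.684, 1) plays the role of “=TDIST(8.684,1,2)”, so it can be used to compare against the value Excel returns.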
Exercise 8: Let’s tidy things up. Open the empty spreadsheet named “Analysis.xls”. Go to
“File, Save As…” and save the file under a new name. This file contains a macro that we
will use to compute the p-value for each gene.
Copy the column of gene names and paste it into the first column of the Analysis file.
Highlight the column of “corrected M” values in the Microarray spreadsheet, click Copy,
click into the first cell in the second column of the Analysis spreadsheet and click “Paste
Special”. Make sure to select “Values”. Save the new Analysis file that contains the gene
names and corrected M-values.
Check: Your new file should look like the one on the right. It is not necessary to repeat the
gene name 2-8 times; you just need it once.
Exercise 9: To conduct the t-test for each gene, label the column next to the one with
corrected M-values “p-values”.
For each gene, highlight all of its corrected M-values (between 2 and 6
values) and press “Ctrl t”. The value that appears is the p-value for this gene.
Enter the p-values that you obtained into this table:
Gene Name    p-value
At1g00100    ________
At2g01250    ________
At3g01020    ________
At3g03050    ________
At4g00235    ________
At5g00405    ________
Exercise 10: For which of the six genes can you say that they are differentially expressed
between the treatment and control group (use a significance level of 5%)? What does that
mean biologically for these genes?
Answer:
Note: Our very small array contains only six genes. That is not so many as to make a
multiple comparison procedure absolutely necessary. But to illustrate the process that is
used for “real” microarrays, which have several thousand genes spotted on them, we will
use both the Bonferroni method and the Linear Step Up Procedure to adjust for multiple
comparisons.
Exercise 11: Sort the genes by p-value (smallest first) and write the gene names as well
as the corresponding p-values into the table below. Use the Bonferroni method to decide
for each gene whether it is differentially expressed at significance level 5%. To compute
the Bonferroni correction, divide the desired significance level (5%, i.e. 0.05) by the number of genes on
the array (6). A gene is declared differentially expressed if its p-value is smaller than this corrected level.
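Gene Name    p-value     Bonferroni correction    Differentially expressed at level 5%
                         (0.05/6 = 0.00833)       (Bonferroni)? YES or NO
________     ________    0.00833                  ________
________     ________    0.00833                  ________
________     ________    0.00833                  ________
________     ________    0.00833                  ________
________     ________    0.00833                  ________
________     ________    0.00833                  ________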
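The Bonferroni decision in Exercise 11 can be sketched in Python as follows. The function name bonferroni and the six p-values are hypothetical; they are not the results of this experiment.

```python
def bonferroni(p_values, alpha=0.05):
    """Compare each p-value to alpha divided by the number of tests.

    Returns the corrected threshold and one True/False decision per p-value.
    """
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]


# Hypothetical p-values for six genes (not the results of this experiment).
threshold, decisions = bonferroni([0.001, 0.02, 0.04, 0.3, 0.6, 0.9])
print(round(threshold, 5))  # 0.00833
print(decisions)            # [True, False, False, False, False, False]
```

With six genes the threshold matches the 0.00833 shown in the table, and only p-values below it count as differentially expressed.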
Exercise 12: Use the linear step up procedure to decide for each gene whether it is
differentially expressed while controlling the False Discovery Rate (FDR) at 5%.
To do that, first rank the p-values in ascending order with the smallest one first. Then,
using a calculator, compute $0.05 \times \frac{i}{6}$ for each gene and compare the p-value to
this number. Here 0.05 is the desired FDR of 5%, 6 is the number of genes, and i is the rank
of the gene (i = 1 for the gene with the lowest p-value, i = 2 for the next lowest, etc.).
If the p-value is lower than the computed value, the gene is differentially expressed
at level 5% FDR.
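Gene Name    p-value     p-value smaller than …?    Differentially expressed at level 5% (FDR)
________     ________    0.008                      ________
________     ________    ________                   ________
________     ________    ________                   ________
________     ________    ________                   ________
________     ________    ________                   ________
________     ________    ________                   ________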
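The rank-based thresholds of Exercise 12 can be sketched in Python. The function name linear_step_up and the p-values are hypothetical. Beyond computing 0.05 × i/6 for each rank, the sketch applies the step-up rule as it is usually stated: every gene up to the largest rank whose p-value is at or below its threshold is declared significant. This can differ from a gene-by-gene comparison when a lower-ranked gene misses its own threshold.

```python
def linear_step_up(p_values, fdr=0.05):
    """Return rows (rank, p_value, threshold, significant), sorted by p-value."""
    m = len(p_values)
    ranked = sorted(p_values)
    thresholds = [fdr * i / m for i in range(1, m + 1)]
    # Largest rank whose p-value is at or below its own threshold (0 if none).
    passing = [i for i, (p, t) in enumerate(zip(ranked, thresholds), start=1) if p <= t]
    largest = max(passing, default=0)
    return [
        (i, p, t, i <= largest)
        for i, (p, t) in enumerate(zip(ranked, thresholds), start=1)
    ]


# Hypothetical p-values for six genes (not the results of this experiment).
rows = linear_step_up([0.5, 0.001, 0.03, 0.02, 0.9, 0.021])
for rank, p, threshold, significant in rows:
    print(rank, p, round(threshold, 5), significant)
print([rank for rank, _, _, significant in rows if significant])  # [1, 2, 3, 4]
```

In this made-up example the gene at rank 2 (p = 0.02) is above its own threshold of about 0.01667, but it is still declared significant because the gene at rank 4 passes.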