Exercises for Introduction to the Analysis of Microarray Data

Martina Bremer, Edward Himelblau, Andreas Madlung

The Data Set

The dataset Microarray.xls contains the results of a microarray dye-swap experiment. In this experiment, six genes were spotted onto two arrays with “swapped” dye colors. Each array consists of two blocks, each with three rows and five columns. The genes are spotted with differing amounts of replication. The goal of this experiment is to decide which of the six genes are differentially expressed between the treatment and control groups. The treatment group is labeled with red dye (635) on array one and with green dye (532) on array two. The control group is labeled with the opposite dye.

Exercise 1: Open the dataset in Excel. Go to “File, Save As…” and save this file under a new name. Look at the column headers. The columns “Block”, “Column”, and “Row” describe the position of a spot on the array. The column “Name” contains the gene name, and the remaining two columns for each array contain the background-corrected median intensities for the red and green channels, respectively. There are two arrays. Since the arrays are manufactured identically, the position of the genes is the same on both arrays.

Exercise 2: Check whether any of the background-corrected intensities are smaller than 0 and, if necessary, replace those values with 1. Check: You should have made two replacements.

Exercise 3: Compute the M-values for each gene on both arrays. For each array, insert a new column into the spreadsheet and label it “M”. The M-value is the base-2 logarithm of the ratio of red to green intensity. To compute it in Excel, click into the empty cell in the first row of the “M” column and write “=LOG(E3/F3,2)”. Here E3 is the cell with the red intensity and F3 is the cell with the green intensity for this spot. Then drag down the column to fill in all the other M-values for array one. Repeat this procedure for array two.
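The calculation in Exercises 2 and 3 can also be sketched outside Excel. The following is a minimal Python illustration, not part of the original handout: the function names and the example intensities are hypothetical, and the handout does not say how an intensity of exactly 0 should be treated, so that case is left unhandled here.

```python
import math

def floor_negative(intensity, replacement=1.0):
    """Exercise 2: replace a background-corrected intensity below 0 with 1."""
    return replacement if intensity < 0 else intensity

def m_value(red, green):
    """Exercise 3: M = log2(red / green), the same quantity as =LOG(E3/F3,2)."""
    return math.log2(floor_negative(red) / floor_negative(green))

# Hypothetical intensities for illustration; these are NOT taken from Microarray.xls.
print(round(m_value(800.0, 200.0), 3))  # red is four times green
print(round(m_value(-15.0, 64.0), 3))   # negative red intensity is replaced by 1 first
```

An M-value above 0 means the red channel is brighter than the green channel for that spot, and an M-value below 0 means the opposite.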
You can round all your M-values to three digits by adjusting the format of the entries. Check: For the gene At3g03050 on Array 1/Block 1/Column 5/Row 2 you should have obtained an M-value of -0.436 (rounded to three digits).

Question (Exercise 4): This value is negative. What does that mean biologically for this spot of gene At3g03050?

Answer:

Exercise 5: We will conduct t-tests for the genes individually. To do this, we first need to sort the dataset by “Name” to group the gene replications together. Highlight the data (not including the first row). Next, click Data, click Sort, click Sort by “(1)Name”, and make sure that “My list has header Row” is marked. Insert lines between genes to separate them visually. We have six genes, each represented by 2-8 spots. Now your data file should look something like this:

[Figure: screenshot of the sorted spreadsheet]

Data Analysis

Exercise 6: Conduct the dye-swap normalization. For each spot, we need to compute the average M-value for both arrays. To do this, include another column and label it “corrected M”. Click into the first cell and enter “=AVERAGE(G3,K3)”. Now do the same for all spots by filling down the column.

The data file is getting quite big at this point. We want to decide for each gene whether it is differentially expressed in the treatment and control groups. All the information we need to do this is the gene name and all the corrected M-values for this gene.

Exercise 7: Using Excel, compute the average, standard deviation, test statistic, and p-value for the first gene, At1g00100. The Excel commands needed are AVERAGE and STDEV (for example, “=STDEV(L3,L4)”). For the square root you can use SQRT. Do these calculations in your Excel sheet in the columns next to “corrected M”. First compute the average, then the standard deviation, then type in a formula for t as you find it below. (Ask for help if needed.) With $\bar{x}$ the average, $s$ the standard deviation, and $n$ the number of corrected M-values for the gene, the (rounded) value of the test statistic then becomes:

$t = \frac{\bar{x}}{\sqrt{s^2 / n}} = 8.684$, with df = n - 1 = 2 - 1 = 1.
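The test statistic of Exercise 7 can be sketched in Python as a cross-check on the Excel formulas. This is a minimal illustration under stated assumptions: the function names and the two example M-values are hypothetical, and the p-value helper is valid only for df = 1, where the t distribution coincides with the standard Cauchy distribution and the two-tailed probability has a closed form. Genes with more than two spots need a general t distribution, which this sketch does not provide.

```python
import math
from statistics import mean, stdev

def one_sample_t(values):
    """Return (t, df) for a one-sample t-test of the mean against 0."""
    n = len(values)
    standard_error = math.sqrt(stdev(values) ** 2 / n)  # sqrt(s^2 / n)
    return mean(values) / standard_error, n - 1

def two_tailed_p_df1(t):
    """Two-tailed p-value for df = 1 ONLY (t with 1 df is the standard Cauchy)."""
    return 1.0 - 2.0 * math.atan(abs(t)) / math.pi

# Hypothetical corrected M-values for a gene with two spots (NOT from Microarray.xls).
t, df = one_sample_t([1.0, 1.5])
print(round(t, 3), df)

# The rounded test statistic quoted in Exercise 7, with df = 1.
print(round(two_tailed_p_df1(8.684), 3))
```

The second printed value should agree with the two-tailed result that Excel's TDIST returns for the same statistic and one degree of freedom.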
Click on any empty cell in Excel and type “=TDIST(8.684,1,2)” to obtain the p-value (rounded to three digits). Now calculate the other p-values. Show the p-values to your instructor before proceeding.

Exercise 8: Let’s clean up a little. Open the empty spreadsheet named “Analysis.xls”. Go to “File, Save As…” and save the file under a new name. This file contains a macro that we will use to compute the p-value for each gene. Copy the column of gene names and paste it into the first column of the Analysis file. Highlight the column of “corrected M” values in the Microarray spreadsheet, click Copy, click into the first cell in the second column of the Analysis spreadsheet, and click “Paste Special”. Make sure to select “Values”. Save the new Analysis file that contains the gene names and corrected M-values. Check: Your new file should look like the one on the right. It is not necessary to repeat the gene name 2-8 times; you just need it once.

Exercise 9: To conduct the t-test for each gene, label the column next to the one with corrected M-values “p-values”. For each gene, highlight all the M-values that we have for that gene (between 2 and 6 values) and press “Ctrl t”. The value that appears is the p-value for this gene. Enter the p-values that you obtained into this table:

Gene Name     p-value
At1g00100     ________
At2g01250     ________
At3g01020     ________
At3g03050     ________
At4g00235     ________
At5g00405     ________

Exercise 10: For which of the six genes can you say that they are differentially expressed between the treatment and control groups (use a significance level of 5%)? What does that mean biologically for these genes?

Answer:

Note: Our very small array contains only six genes. That is not so many that a multiple comparison procedure becomes absolutely necessary. But to illustrate the process that is used in “real” microarrays, which have several thousand genes spotted on them, we will use both the Bonferroni method and the Linear Step Up Procedure to adjust for multiple comparisons.
Exercise 11: Sort the genes by p-value (smallest first) and write the gene names as well as the corresponding p-values into the table below. Use the Bonferroni method to decide for each gene whether it is differentially expressed at significance level 5%. To compute the Bonferroni correction, divide the level wanted (5%, or 0.05) by the number of genes on the array (6).

Gene Name     p-value     Bonferroni correction       Differentially expressed at
                          (0.05/6 = 0.00833)          level 5% (Bonferroni), YES or NO
________      ________    0.00833                     ________
________      ________    0.00833                     ________
________      ________    0.00833                     ________
________      ________    0.00833                     ________
________      ________    0.00833                     ________
________      ________    0.00833                     ________

Exercise 12: Use the linear step up procedure to decide for each gene whether it is differentially expressed while controlling the False Discovery Rate (FDR) at 5%. To do that, first rank the p-values in ascending order, with the smallest one first. Then, using a calculator, compute $0.05 \cdot \frac{i}{6}$ for each gene, where 0.05 is the 5% FDR wanted, 6 is the number of genes, and i is the rank of the gene (i.e., i = 1 for the gene with the lowest p-value, i = 2 for the next lowest, and so on). Compare each p-value to this number. If the p-value is lower than the computed value, the gene is differentially expressed at level 5% FDR.

Gene Name     p-value     p-value smaller than …?     Differentially expressed at
                                                      level 5% (FDR)
________      ________    0.008                       ________
________      ________    ________                    ________
________      ________    ________                    ________
________      ________    ________                    ________
________      ________    ________                    ________
________      ________    ________                    ________
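Both corrections from Exercises 11 and 12 can be sketched in Python. This is a minimal illustration, not the handout's macro: the function names, gene labels, and p-values below are hypothetical. The step-up function follows the usual Benjamini-Hochberg form of the linear step up procedure, which finds the largest rank i whose p-value is at or below 0.05 · i/6 and declares every gene up to that rank. That can differ from comparing each gene to its own threshold in isolation, as the second gene in the example shows.

```python
def bonferroni(p_values, alpha=0.05):
    """Declare a gene if its p-value is at or below alpha / (number of genes)."""
    threshold = alpha / len(p_values)
    return {gene: p <= threshold for gene, p in p_values.items()}

def linear_step_up(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up: declare all genes up to the largest passing rank."""
    m = len(p_values)
    ranked = sorted(p_values.items(), key=lambda item: item[1])  # smallest p first
    largest_passing_rank = 0
    for i, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * i / m:
            largest_passing_rank = i
    return {gene: rank <= largest_passing_rank
            for rank, (gene, _) in enumerate(ranked, start=1)}

# Hypothetical p-values for six genes (NOT results from the exercise data).
example = {"geneA": 0.001, "geneB": 0.018, "geneC": 0.020,
           "geneD": 0.030, "geneE": 0.200, "geneF": 0.700}

print(bonferroni(example))
print(linear_step_up(example))
```

In this made-up example, geneB (rank 2, p = 0.018) is above its own threshold of 0.05 · 2/6 ≈ 0.0167, but it is still declared by the step-up rule because geneD at rank 4 passes its threshold of about 0.0333.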