Significance analysis of microarrays (SAM)

• SAM can be used to pick out significant genes based on differential expression between sets of samples. It is currently implemented for the following designs:
  - two-class unpaired
  - two-class paired
  - multi-class
  - censored survival
  - one-class
• SAM gives estimates of the False Discovery Rate (FDR), which is the proportion of genes likely to have been wrongly identified by chance as being significant.
• It is a highly interactive algorithm: users can dynamically change the threshold for significance (through the tuning parameter delta) after looking at the distribution of the test statistic.

SAM designs

• Two-class unpaired: picks out genes whose mean expression level is significantly different between two groups of samples (analogous to a between-subjects t-test).
• Two-class paired: samples are split into two groups, and there is a 1-to-1 correspondence between a sample in group A and one in group B (analogous to a paired t-test).
• Multi-class: picks out genes whose mean expression is different across more than two groups of samples (analogous to one-way ANOVA).
• Censored survival: picks out genes whose expression levels are correlated with duration of survival.
• One-class: picks out genes whose mean expression across experiments is different from a user-specified mean.

SAM Two-Class Unpaired

1. Assign experiments to two groups. For example, in an expression matrix of Genes 1 to 6 (rows) by Experiments 1 to 6 (columns), assign Experiments 1, 2 and 5 to group A, and Experiments 3, 4 and 6 to group B.

[Figure: the expression matrix (Gene 1 to Gene 6 by Exp 1 to Exp 6), with its columns regrouped into Group A (Exp 1, Exp 2, Exp 5) and Group B (Exp 3, Exp 4, Exp 6).]

2. Question: Is the mean expression level of a gene in group A significantly different from its mean expression level in group B?

Permutation tests

i) For each gene, compute the d-value (analogous to the t-statistic). This is the observed d-value for that gene.
ii) Rank the genes in ascending order of their d-values.
iii) Randomly shuffle the values of the genes between groups A and B, such that the reshuffled groups A and B have the same number of elements as the original groups A and B, respectively. Compute the d-value for each randomized gene.

[Figure: Gene 1 under the original grouping (Group A: Exp 1, Exp 2, Exp 5; Group B: Exp 3, Exp 4, Exp 6) and under a randomized grouping in which the six experiments are reassigned between groups A and B.]

iv) Rank the permuted d-values of the genes in ascending order.
v) Repeat steps iii) and iv) many times, so that each gene has many randomized d-values corresponding to its rank among the observed (unpermuted) d-values. Take the average of these randomized d-values for each gene. This is the expected d-value of that gene.
vi) Plot the observed d-values vs. the expected d-values.

[Figure: plot of observed vs. expected d-values, showing the "observed d = expected d" line, the significant negative genes and the significant positive genes.]

• Significant negative genes: mean expression of group A > mean expression of group B.
• Significant positive genes: mean expression of group B > mean expression of group A.
• The more a gene deviates from the "observed = expected" line, the more likely it is to be significant.
• Moving along the x-axis in the positive or negative direction, the first gene whose observed d-value differs from its expected d-value by at least delta, and every gene beyond it, is considered significant.

Estimating the False Discovery Rate

For each permutation of the data, compute the number of positive and negative significant genes for a given delta, as explained above. The median number of significant genes across these permutations is the median number of falsely significant genes; taken relative to the number of genes called significant in the original data, it gives the median False Discovery Rate.

The rationale is that any genes designated as significant in the randomized data are being picked up purely by chance (i.e., "falsely" discovered). Therefore, the median number picked up over many randomizations is a good estimate of how many false discoveries to expect.
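
The two-class unpaired procedure described above (observed d-values, permutation-based expected d-values, the delta rule and the median false-discovery estimate) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SAM software itself: the function names (`d_value`, `cutoffs`, `n_called`, `sam_two_class`) are hypothetical, the d-value is taken to be a pooled-variance t-like statistic with a small constant `s0` added to the denominator to avoid division by zero, each permutation reassigns whole experiments (columns) to groups A and B, and permuted data are scored against the d-value cutoffs found on the original data. The sketch reports counts only and does not track which genes were called.

```python
import random
import statistics


def d_value(values, group_a, group_b, s0=0.1):
    """d-value of one gene: (mean B - mean A) / (pooled standard error + s0).

    s0 is an assumed small constant; the sign convention makes genes that
    are higher in group B positive.
    """
    a = [values[i] for i in group_a]
    b = [values[i] for i in group_b]
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    se = ((1 / len(a) + 1 / len(b)) * ss / (len(a) + len(b) - 2)) ** 0.5
    return (mean_b - mean_a) / (se + s0)


def cutoffs(observed, expected, delta):
    """Walk outward from the centre of the ranked genes and return the
    observed d-value of the first gene on each side that differs from
    its expected d-value by at least delta."""
    n = len(observed)
    start = next((r for r in range(n) if expected[r] >= 0), n)
    cut_up = next((observed[r] for r in range(start, n)
                   if observed[r] - expected[r] >= delta), float("inf"))
    cut_low = next((observed[r] for r in range(start - 1, -1, -1)
                    if expected[r] - observed[r] >= delta), float("-inf"))
    return cut_low, cut_up


def n_called(d_values, cut_low, cut_up):
    """Count genes at or beyond either cutoff."""
    return sum(1 for d in d_values if d >= cut_up or d <= cut_low)


def sam_two_class(matrix, group_a, group_b, delta, n_perm=100, seed=0):
    rng = random.Random(seed)
    columns = list(group_a) + list(group_b)
    # Steps i-ii: observed d-values, ranked in ascending order.
    observed = sorted(d_value(row, group_a, group_b) for row in matrix)
    # Steps iii-v: reshuffle group membership, recompute and rank d-values.
    permuted = []
    for _ in range(n_perm):
        shuffled = columns[:]
        rng.shuffle(shuffled)
        perm_a, perm_b = shuffled[:len(group_a)], shuffled[len(group_a):]
        permuted.append(sorted(d_value(row, perm_a, perm_b) for row in matrix))
    # Expected d-value at each rank: average over all permutations.
    expected = [statistics.fmean([p[r] for p in permuted])
                for r in range(len(matrix))]
    # Delta rule on the original data, then the same cutoffs on each permutation.
    cut_low, cut_up = cutoffs(observed, expected, delta)
    called = n_called(observed, cut_low, cut_up)
    median_false = statistics.median(
        [n_called(p, cut_low, cut_up) for p in permuted])
    fdr = median_false / called if called else 0.0
    return {"observed": observed, "expected": expected,
            "called": called, "median_false": median_false, "fdr": fdr}


# Made-up expression values: rows are genes, columns are Experiments 1-6.
matrix = [
    [1.0, 1.2, 5.1, 4.9, 0.9, 5.3],  # higher in group B
    [6.0, 5.8, 1.1, 0.9, 6.2, 1.0],  # higher in group A
    [2.0, 2.1, 1.9, 2.2, 2.0, 1.8],
    [3.1, 2.9, 3.0, 3.2, 2.8, 3.0],
]
# Experiments 1, 2, 5 -> group A; Experiments 3, 4, 6 -> group B (0-based columns).
result = sam_two_class(matrix, group_a=[0, 1, 4], group_b=[2, 3, 5], delta=1.0)
print(result["called"], result["median_false"], result["fdr"])
```

Plotting `result["observed"]` against `result["expected"]` gives the observed vs. expected plot from step vi). Raising `delta` makes the rule stricter, so fewer genes are called; with only six experiments there are few distinct regroupings, so the estimates from such a small example are coarse.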