Microarray module lecture (both courses)
Distinguishing active from non-active genes

Main principle: DNA hybridization
- DNA hybridizes due to base pairing using H-bonds
- A/T, C/G and A/U pairs are possible

RNA:   AUGCAUGCUGCUAGCUACGUAUGCAUGCUGCUAGCUACGU
cDNA:  TACGTACGACGATCGATGCATACGTACGACGATCGATGCA
Probe: GCTACGTATGCAT

Mix probe with cDNA: the probe will find the complementary DNA sequence and bind to it.

cDNA:  TACGTACGACGATCGATGCATACGTACGACGATCGATGCA
Probe:              GCTACGTATGCAT

Expression microarray

Statistical analysis of Microarrays: An Introduction

Why do replication of arrays?
[Figure: control and treatment samples; biological replication, technical replication (RNA, mixed probe pool), dye-swap design]

What type of replication?

Background subtraction

Transformation using logarithmic values
- Assume red and green signal are the same: log2(1/1) = log2(1) = 0 (by definition)
- Assume red signal is twice the green signal: log2(2/1) = log2(2) = 1 (because 2^1 = 2)
- Assume red signal is half the green signal: log2(1/2) = log2(1) - log2(2) = 0 - 1 = -1

Using logarithmic values
[Figure: Normal scale: 2, 1, 0.5 (unequal arrow distances). Logarithmic scale: log2(2) = 1, log2(1) = 0, log2(0.5) = -1 (equal arrow distances, same absolute values for the same-fold up- or down-regulation).]

Graphing all array values: the MA plot
- M: the greater the distance from 0, the greater the R/G ratio (up or down).
- A: the greater the distance from 0, the more intensely colored (darker) the spot on the microarray (redder or greener).

Using logarithmic values
Two values used in microarray analysis:
- M = log2 of the ratio red value / green value
- A = overall spot intensity

The Dye-swap
Why? To account for dye bias. Cy5, the red dye, fluoresces brighter than Cy3, the green dye. This is unfortunate but impossible to change because the two dyes differ in chemical structure.

Normalizing
Why? It is a mathematical way to account for the systematic error due to dye intensity differences.
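The log2 transformation and the M and A values described above can be sketched in a few lines of Python. This is only an illustration: the function name `ma_values` and the spot intensities are made up, and the formula for A (the mean of the two log2 intensities) is the commonly used convention, assumed here because the slides only describe A as "overall spot intensity".

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from background-subtracted intensities.

    M = log2(red / green): 0 means equal signal, +1 means red is twice green,
    -1 means red is half of green.
    A = mean of log2(red) and log2(green). This is an assumed convention for
    "overall spot intensity"; the slides do not give the formula.
    """
    m = math.log2(red / green)
    a = 0.5 * (math.log2(red) + math.log2(green))
    return m, a


# Hypothetical spot intensities, chosen to mirror the three cases on the slide.
spots = {
    "red equals green": (1024.0, 1024.0),
    "red twice green": (2048.0, 1024.0),
    "red half of green": (512.0, 1024.0),
}

for label, (red, green) in spots.items():
    m, a = ma_values(red, green)
    print(f"{label}: M = {m:+.1f}, A = {a:.1f}")
```

On the log scale a two-fold increase and a two-fold decrease sit at the same distance from 0 (+1 and -1), which is why M is plotted instead of the raw ratio.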
Example: Gene X is 2-fold up-regulated by drought stress.
- R/G: 2.0 for gene X (drought/normal)
- G/R: should be 2.0 as well after swapping the dyes and RNA samples, but let's say it is 1.9 for gene X (drought/normal).

Normalizing, cont'd
Bottom line: Mi is the average of the 2 dye-swap array slides for each spot.

Remember: How do you analyze replicated results?
- Mean (average)
- Median (value in the middle)
- Standard deviation (spread around the average)
x = each data point, x̄ = average, n = number of data points

Is a gene differentially expressed?
In other words: is the log ratio M = 0 (that is, R/G = 1) or not?

The test statistic
x̄ = average of n samples, s = SD

Example
Null hypothesis: treatment and control show equal gene expression (M = 0).
Six observations of the same gene: average = -1.15, SD = 1.28, n = 6.
Look up the p-value for the calculated t-statistic. Here: 9.21% of the distribution lies in the red shaded area, so p = 0.09.
Since p is greater than 0.05, do not reject the null hypothesis: treatment and control are NOT significantly different (M = 0).

The null hypothesis

Bonferroni Correction
Assume you do a stats test for more than one gene. Each time you accept α = 0.05 (5%) uncertainty. That means you accept false positives 5% of the time for each gene.
If you accept the same error for two genes, it is 1 - (1 - 0.05)^2 = 0.0975, or about 0.1 (10% uncertainty). You accept that in about 10% of cases at least one of the 2 genes is a false positive.
For an array with n = 1000 genes, this means: 1 - (1 - 0.05)^1000 > 0.999. In more than 99.9% of cases you WILL make an error in at least one gene.
Assume 1000 genes and a desired Bonferroni-corrected error rate of 10%: use only those genes with a p-value below 0.10/1000 = 0.0001.

False Discovery Rate (FDR) Correction
Why use FDR? It can be used instead of Bonferroni.
How?
- Sort all p-values from low to high.
- Decide on your desired FDR (e.g. 5%).
- Rank the genes (here: 1-6).
- Calculate 0.05 * (i/N), where i = rank (here 1-6) and N = total number of genes (here 6).
- If a gene's p-value is less than 0.05 * (i/N), it is a significant gene.
Here:
1. 0.05 * (1/6) = 0.008 --> p-value under this threshold? YES, significant
2. 0.05 * (2/6) = 0.017 --> p-value under this threshold? NO, not significant
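The arithmetic in the last few slides can be checked with a short Python sketch. Several things here are assumptions rather than content from the slides: the t-statistic is written in the usual one-sample form x̄ / (s / √n), since the slide's own formula did not survive; the function names are illustrative; and the six p-values are invented so that the first gene passes its threshold and the second does not, as in the slide example. Because the t-statistic is computed from the rounded summary values, a p-value derived from it need not match the slide's figure exactly.

```python
import math


def t_statistic(mean, sd, n):
    """One-sample t-statistic for the null hypothesis M = 0 (assumed form)."""
    return mean / (sd / math.sqrt(n))


def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-gene p-value cutoff that keeps the overall error near alpha."""
    return alpha / n_tests


def fdr_calls(p_values, fdr=0.05):
    """Apply the slide's rule: gene at rank i is significant if p < fdr * i / N.

    Returns (rank, p_value, threshold, significant) tuples, sorted by p-value.
    """
    n = len(p_values)
    calls = []
    for rank, p in enumerate(sorted(p_values), start=1):
        threshold = fdr * rank / n
        calls.append((rank, p, threshold, p < threshold))
    return calls


# Summary values from the slide example: average -1.15, SD 1.28, n = 6.
print(f"t = {t_statistic(-1.15, 1.28, 6):.2f}")

# Error accumulation for 2 and 1000 genes at alpha = 0.05.
print(f"2 genes: {familywise_error(0.05, 2):.4f}")
print(f"1000 genes: {familywise_error(0.05, 1000):.4f}")
print(f"Bonferroni cutoff: {bonferroni_threshold(0.10, 1000):.4f}")

# Hypothetical p-values for six genes (not taken from the slides).
example_p = [0.03, 0.001, 0.6, 0.02, 0.2, 0.04]
for rank, p, threshold, significant in fdr_calls(example_p):
    print(f"rank {rank}: p = {p}, threshold = {threshold:.4f}, significant = {significant}")
```

The widely used Benjamini-Hochberg procedure goes one step further than the per-rank comparison shown here: it finds the largest rank that passes its threshold and calls every gene at or above that rank significant.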