Microarray module lecture (both courses)

Distinguishing active from inactive genes:
Main principle: DNA hybridization
- DNA hybridizes through base pairing via hydrogen bonds
- Possible pairs: A/T, C/G, and A/U (when RNA is involved)
RNA: AUGCAUGCUGCUAGCUACGUAUGCAUGCUGCUAGCUACGU
cDNA: TACGTACGACGATCGATGCATACGTACGACGATCGATGCA
Probe: GCTACGTATGCAT
Mix the probe with the cDNA: the probe will find its complementary DNA sequence
and bind to it.
TACGTACGACGATCGATGCATACGTACGACGATCGATGCA
             GCTACGTATGCAT
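The probe search above can be sketched in a few lines of Python. This is only an illustration of position-by-position base pairing as drawn on the slide (strand direction is ignored); the function names `complement` and `find_binding_site` are made up for this example.

```python
# Watson-Crick pairing for DNA bases
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in seq)

def find_binding_site(probe: str, target: str) -> int:
    """Return the index in target where the probe pairs base by base, or -1."""
    return target.find(complement(probe))

cdna = "TACGTACGACGATCGATGCATACGTACGACGATCGATGCA"
probe = "GCTACGTATGCAT"

site = find_binding_site(probe, cdna)
print(cdna)
print(" " * site + probe)  # probe printed underneath its binding site
```

A probe with no complementary stretch in the cDNA gives -1, which corresponds to a spot with no signal.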
Expression microarray:
Statistical analysis of Microarrays: An Introduction
Why replicate arrays?
[Figure: control and treatment samples; biological replication vs. technical replication; RNA, RNA, mixed probe pool]
Dye Swap Design
What type of replication?
Background subtraction
Transformation using logarithmic values
Assume the red and green signals are the same:
log2(1/1) = log2(1) = 0 (by definition)
Assume the red signal is twice the green signal:
log2(2/1) = log2(2) = 1 (because 2^1 = 2)
Assume the red signal is half the green signal:
log2(1/2) = log2(1) - log2(2) = 0 - 1 = -1
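The three cases above can be checked with a short Python sketch. The helper name `log2_ratio` is chosen for this example only.

```python
import math

def log2_ratio(red: float, green: float) -> float:
    """Log2 of the red/green signal ratio."""
    return math.log2(red / green)

# (red, green) pairs for the three cases: equal, red doubled, red halved
for red, green in [(1, 1), (2, 1), (1, 2)]:
    print(f"R={red}, G={green}: log2(R/G) = {log2_ratio(red, green)}")
```

A 2-fold increase and a 2-fold decrease end up at the same distance from 0, with opposite signs.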
Using logarithmic values
Normal scale: 2, 1, 0.5 (unequal arrow distances)
Logarithmic scale: log2(2) = 1, log2(1) = 0, log2(0.5) = -1 (equal arrow distances, same absolute values for the same-fold up- or down-regulation)
Graphing all array values: the MA plot
M: the greater the distance from 0, the larger the difference between the red and green signals (positive M: more red; negative M: more green).
A: the greater the distance from 0, the more intensely colored the spot on the
microarray (redder or greener).
Using logarithmic values
Two values used in microarray analysis:
M = log2 ratio of red value to green value, log2(R/G)
A = overall spot intensity
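A minimal sketch of how M and A could be computed for one spot. The slide does not give a formula for A; the code assumes the commonly used definition A = 0.5 * log2(R * G), the average log2 intensity of the two channels. The function name `ma_values` and the intensities are made up for illustration.

```python
import math

def ma_values(red: float, green: float) -> tuple[float, float]:
    """Return (M, A) for one spot.

    M = log2(R/G); A = 0.5 * log2(R*G) (assumed definition, not from the slide).
    """
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a

# Hypothetical background-subtracted intensities (red, green)
spots = [(800, 200), (150, 150), (50, 400)]
for red, green in spots:
    m, a = ma_values(red, green)
    print(f"R={red:4d} G={green:4d}  M={m:+.2f}  A={a:.2f}")
```

Plotting M (y-axis) against A (x-axis) for all spots gives the MA plot.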
The Dye-swap
Why?
To account for dye bias: Cy5, the red dye, fluoresces brighter than
Cy3, the green dye. This is unfortunate but impossible to change
because of differences in the chemical structures of the two dyes.
Normalizing
Why?
A mathematical way to account for the systematic error due to
dye intensity differences.
Example: Gene X is 2-fold up-regulated by drought stress
R/G: 2.0 for gene X (drought/normal)
G/R: should be 2.0 as well after swapping the dyes and RNA
samples, but let’s say it is 1.9 for gene X (drought/normal).
Normalizing, cont’d
Bottom line: M_i is the average of the 2 dye-swap array slides for each spot
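The gene X example can be worked through in Python. This sketch assumes the averaging is done on the log2 scale and that both ratios are already expressed as drought/normal; the function name `dye_swap_average` is hypothetical.

```python
import math

def dye_swap_average(ratio_slide1: float, ratio_slide2: float) -> float:
    """Average the log2(treatment/control) ratios of two dye-swapped slides."""
    m1 = math.log2(ratio_slide1)  # slide 1: drought in red, normal in green
    m2 = math.log2(ratio_slide2)  # slide 2: dyes swapped, still drought/normal
    return (m1 + m2) / 2

m_x = dye_swap_average(2.0, 1.9)
print(f"M for gene X: {m_x:.3f}")
print(f"fold change:  {2 ** m_x:.2f}")
```

Because the dye bias pushes the two slides in opposite directions, averaging them cancels most of it.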
Remember:
How do you analyze replicated results?
Mean (average)
Median (value in middle)
Standard deviation (spread around average)
x = each data point, x̄ = average, n = number of data points
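These three summaries are available in Python's standard `statistics` module. The M values below are made up for illustration; `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which is an assumption about the formula the slide intended.

```python
import statistics

# Hypothetical M values for one gene from four replicate arrays
m_values = [-2.0, -1.0, 0.5, -1.5]

mean = statistics.mean(m_values)      # average
median = statistics.median(m_values)  # value in the middle
sd = statistics.stdev(m_values)       # sample SD, spread around the average

print(f"mean={mean}, median={median}, SD={sd:.2f}")
```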
Is a gene differentially expressed?
In other words: Is the log2(R/G) ratio (M) = 0 or not?
The test statistic
x̄ = average of n samples
s = SD (standard deviation)
Example:
Null hypothesis: treatment and control
show equal gene expression (M=0)
(see next slide, too)
Six observations of the same gene:
average = -1.15
SD= 1.28
N=6
Look up the p-value for the calculated t-statistic.
Here: 9.21% lies in the red shaded area.
p = 0.09
Accept the null hypothesis (p is above 0.05): treatment and control
are NOT different, M = 0
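The formula for the test statistic is not preserved in this transcript. A minimal sketch, assuming the standard one-sample t-statistic t = x̄ / (s / √n) tested against M = 0, applied to the summary values of the example (the function name `one_sample_t` is made up):

```python
import math

def one_sample_t(mean: float, sd: float, n: int, mu0: float = 0.0) -> float:
    """One-sample t-statistic for the null hypothesis 'true mean = mu0'."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error

t = one_sample_t(mean=-1.15, sd=1.28, n=6)
degrees_of_freedom = 6 - 1
print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
```

The p-value is then read from a t-table or statistics software for this t and these degrees of freedom. Because the average and SD are rounded, the result may differ slightly from the value used on the slide.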
The null hypothesis
Bonferroni Correction
Assume you do a stats test for more than one gene:
Each time you accept α = 0.05 (5%) uncertainty.
That means you accept false positives 5% of the time for each gene.
If you accept the same error for two genes, it is
1 - (1 - 0.05)^2 = 0.0975, or about 0.1 (10% uncertainty).
You accept that in about 10% of cases at least one of the 2 genes is a false positive.
For an array with n = 1000 genes, this means:
1 - (1 - 0.05)^1000 > 0.9999
This means that in more than 99.99% of cases you WILL make an error in at least one gene.
Assume 1000 genes and a desired overall error rate of 10% after Bonferroni correction:
Use only those genes with a p-value below 0.10/1000 = 0.0001
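The arithmetic above can be reproduced with a short Python sketch. The function names are chosen for this example, and the first formula assumes the tests are independent.

```python
def family_wise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive in n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(overall_alpha: float, n_tests: int) -> float:
    """Per-gene p-value cutoff that keeps the overall error at overall_alpha."""
    return overall_alpha / n_tests

for n in (1, 2, 1000):
    print(f"{n:5d} genes: {family_wise_error(0.05, n):.4f}")

print(f"cutoff for 1000 genes at 10%: {bonferroni_threshold(0.10, 1000):.4f}")
```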
False Discovery Rate (FDR) Correction
Why use FDR? It can be used instead of Bonferroni.
How?
Sort all p-values from low to high.
Decide on your desired FDR (e.g. 5%).
Rank the genes (here: 1-6).
Calculate 0.05 * (i/N)
i = rank (here 1-6)
N = total number of genes (here 6)
If the p-value is less than 0.05 * (i/N), then
it is a significant gene.
Here:
1. 0.05 * (1/6) = 0.0083 --> p-value under this threshold? YES, significant
2. 0.05 * (2/6) = 0.0167 --> p-value under this threshold? NO, not significant
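The rank-based rule described above can be sketched as follows. The six p-values are made up for illustration (the slide's own values are not preserved in this transcript), and the function name `fdr_rank_test` is hypothetical.

```python
def fdr_rank_test(p_values: list[float], fdr: float = 0.05):
    """Compare each sorted p-value with its rank threshold fdr * (i / N).

    Returns a list of (rank, p_value, threshold, is_significant) tuples.
    """
    n = len(p_values)
    results = []
    for rank, p in enumerate(sorted(p_values), start=1):
        threshold = fdr * rank / n
        results.append((rank, p, threshold, p < threshold))
    return results

# Hypothetical p-values for 6 genes
p_values = [0.03, 0.001, 0.9, 0.02, 0.5, 0.2]

results = fdr_rank_test(p_values)
for rank, p, threshold, significant in results:
    verdict = "YES, significant" if significant else "NO, not significant"
    print(f"{rank}. p={p:<6} threshold={threshold:.4f} --> {verdict}")
```

In the Benjamini-Hochberg procedure, which uses these same thresholds, every gene up to the highest rank that passes is called significant, even if a lower rank on its own fails the comparison. With the values above both readings give the same answer.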