Microarrays
Tzu Lip Phang, Ph.D.
Lawrence Hunter, Ph.D.
Associate Professor of Bioinformatics
Director, Computational Bioscience Program
Division of Pulmonary Sciences and Critical Care Medicine
University of Colorado School of Medicine
University of Colorado School of Medicine
[email protected]
[email protected]
http://compbio.uchsc.edu/Hunter
The Central Dogma
Genome
Transcriptome
Microarrays in the Literature
[Figure: number of papers per year; y-axis "Number of papers" (0 to 7000), x-axis "Year"]
Barrett, T., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim,
I. F., Tomashevsky, M., et al. (2012). NCBI GEO: archive
for functional genomics data sets--update. Nucleic acids
research, 41(D1)
Public Data Usages
• Preliminary Data/Results, hypothesis generation
• Test Algorithm
• Power Analysis (sample size calculation)
• Enhance sample size
Array technology
• Basic idea: Genomic material (DNA/RNA) hybridizes best to exactly complementary sequences.
• Method:
– Probes are attached to a substrate in a known location
– DNA/RNA in one or more samples is fluorescently labelled
– Samples are hybridized to the probe array, excess is washed off, and fluorescence readings are taken for each position
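The hybridization principle above can be illustrated with a toy Python sketch. It is not a model of real hybridization chemistry: the function names and the example sequences are made up, and binding quality is approximated simply by counting bases that differ from the probe's exact reverse complement.

```python
# Toy illustration: a probe pairs best with its exact reverse complement.
# Sequences and function names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def mismatches(probe, target):
    """Count positions where target differs from the probe's exact complement."""
    ideal = reverse_complement(probe)
    if len(ideal) != len(target):
        raise ValueError("probe and target must have the same length")
    return sum(a != b for a, b in zip(ideal, target))

probe = "ATGCCGTTA"
perfect_target = reverse_complement(probe)   # exactly complementary
one_off_target = "A" + perfect_target[1:]    # first base replaced with "A"

print(mismatches(probe, perfect_target))
print(mismatches(probe, one_off_target))
```

A target with zero mismatches corresponds to the "exactly complementary" case; each additional mismatch stands in for weaker binding.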
Microarray: Primer
Array synthesis
• Photolithography for oligonucleotides
• Cost proportional to length of oligo, not number of features (genes) per chip!
• Many layers compared to computer chips.
Affymetrix Probe Sets
http://intermedin.stanford-edu/Arrays.ppt
[Figure: a probe set along a transcript with poly-A tail (AAAA); 11 to 16 25mer probes, each shown as a PM and MM pair]
Gene Expression
• Still most common use for microarrays
• Aim to determine differential expression between groups of samples, e.g. disease and control
• Generate hypotheses about the mechanisms underlying the disease of interest
Basic Statistical Analysis
Experimental Design
• Biological replication is essential
– Technical replication is not essential except for quality control studies
• Pooling biological samples to reduce array variability
– Increases sample size without running more chips
– BUT, if individual variation is important, pooling washes out the effect
• Power Analysis is essential
Power Analysis
• How many biological replicates?
• My experience: at least 3, preferably 5, even 7
• Bioconductor: SSPA
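As a rough companion to the rule of thumb above, a sample-size calculation for a single gene can be sketched in Python. This is not the SSPA method (SSPA is an R/Bioconductor package); it is the textbook normal approximation for a two-sided, two-group comparison, and the function name, effect size, and standard deviation are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def samples_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2.
    effect and sd are on the same scale (for example, log2 intensity).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)

# Hypothetical inputs: effect sizes of one and two standard deviations.
n_one_sd = samples_per_group(effect=1.0, sd=1.0)
n_two_sd = samples_per_group(effect=2.0, sd=1.0)
# A stricter alpha, as needed when many genes are tested, requires more replicates.
n_strict = samples_per_group(effect=1.0, sd=1.0, alpha=0.001)

print(n_one_sd, n_two_sd, n_strict)
```

The normal approximation tends to understate the requirement when the resulting n is small, so the output is best read as a lower bound.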
Preprocessing
• Includes image analysis, normalization, and data transformation
• Data normalization:
– Remove systematic errors introduced in labeling, hybridization and scanning procedures
– Correct these errors while preserving biological variability / information
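The slides do not name a specific normalization method, so the following Python sketch uses one simple option as an assumption: log2 transformation followed by median centering of each array. It removes an array-wide intensity shift (a systematic error) while leaving differences between probes within an array untouched. The function name and intensities are made up.

```python
import math
from statistics import median

def log2_median_center(arrays):
    """Log2-transform each array and subtract that array's median.

    arrays: dict mapping array name -> list of raw intensities,
            with probes in the same order on every array.
    """
    normalized = {}
    for name, raw in arrays.items():
        logged = [math.log2(value) for value in raw]
        center = median(logged)
        normalized[name] = [x - center for x in logged]
    return normalized

# Hypothetical technical replicates: array "b" was scanned twice as bright.
raw = {"a": [100.0, 200.0, 400.0], "b": [200.0, 400.0, 800.0]}
norm = log2_median_center(raw)
print(norm)
```

After centering, the two replicates agree probe by probe, which is the behaviour a normalization step is meant to produce for a purely technical difference.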
Why normalization?
Technical replicate difference
A different look …
Average Intensity Values
To normalize or not to …
AffyComp
Rafael Irizarry, Dept BioStat
Johns Hopkins University
Statistical Testing
• Hypothesis Testing: Are the means of two groups different from each other?
– Fold Change
– Student's t-Test
Microarray Scatter Plot
Student's t-Test
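For a single gene, the two measures named above can be sketched in Python as follows. The intensities are made up and assumed to be on a log2 scale already; the function names are illustrative. Only the t statistic and its degrees of freedom are computed here, since turning them into a p-value needs a t-distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance

def log2_fold_change(control, treated):
    """Difference of group means; assumes values are already on a log2 scale."""
    return mean(treated) - mean(control)

def student_t(control, treated):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    n1, n2 = len(control), len(treated)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(treated) - mean(control)) / se, df

control = [1.0, 2.0, 3.0]   # made-up log2 intensities for one gene
disease = [4.0, 5.0, 6.0]
lfc = log2_fold_change(control, disease)
t_stat, df = student_t(control, disease)
print(lfc, t_stat, df)
```

Fold change looks only at the size of the difference, while the t statistic also scales it by the within-group variability, which is why the two can rank genes differently.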
What is Multiple Comparison Testing??!
Alpha level = 0.05

Genes     P-values        Critical level   Ho
Gene 1    0.0001     <=   0.05             1
Gene 2    0.0002     <=   0.05             1
Gene 3    0.008      <=   0.05             1
Gene 4    0.009      <=   0.05             1
Gene 5    0.005      <=   0.05             1
Gene 6    0.09       <=   0.05             0
Gene 7    0.05       <=   0.05             0
Gene 8    0.09       <=   0.05             0
Gene 9    0.2        <=   0.05             0
Gene 10   0.3        <=   0.05             0
With a large number of tests …
Alpha level = 0.05
50 wrong genes …

Genes       P-values        Critical level   Ho
Gene 1      0.0001     <=   0.05             1
Gene 2      0.0002     <=   0.05             1
Gene 3      0.008      <=   0.05             1
Gene 4      0.009      <=   0.05             1
Gene 5      0.005      <=   0.05             1
Gene 6      0.09       <=   0.05             0
…           …          …    …                …
…           …          …    …                …
Gene 999    0.2        <=   0.05             0
Gene 1000   0.3        <=   0.05             0
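The "50 wrong genes" figure follows from multiplying the number of tests by the alpha level. A minimal Python sketch, assuming the extreme case in which no gene is truly differentially expressed (so every p-value is uniformly distributed), compares that expectation with one simulated experiment. The function names and the random seed are illustrative.

```python
import random

def expected_false_positives(n_tests, alpha):
    """Expected number of null genes that pass the threshold by chance."""
    return n_tests * alpha

def simulate_false_positives(n_tests, alpha, seed=0):
    """Draw uniform p-values for n_tests null genes and count p <= alpha."""
    rng = random.Random(seed)
    return sum(rng.random() <= alpha for _ in range(n_tests))

expected = expected_false_positives(1000, 0.05)
observed = simulate_false_positives(1000, 0.05)
print(expected, observed)
```

The simulated count varies from run to run with the seed, but it stays close to the expected value of 50.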
Correction … Bonferroni
Alpha level = 0.05 / 1000 = 0.00005

Genes       P-values        Critical level   Ho
Gene 1      0.0001     <=   0.00005          0
Gene 2      0.0002     <=   0.00005          0
Gene 3      0.008      <=   0.00005          0
Gene 4      0.009      <=   0.00005          0
Gene 5      0.005      <=   0.00005          0
Gene 6      0.09       <=   0.00005          0
…           …          …    0.00005          …
…           …          …    0.00005          …
Gene 999    0.2        <=   0.00005          0
Gene 1000   0.3        <=   0.00005          0
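The Bonferroni correction shown above divides the alpha level by the number of tests. A short Python sketch, using only the p-values visible on the slide and an illustrative function name, reproduces both the uncorrected and the corrected calls:

```python
def bonferroni_reject(p_values, alpha=0.05, n_tests=None):
    """Flag p-values that pass the Bonferroni threshold alpha / n_tests.

    n_tests defaults to len(p_values); pass it explicitly when only a
    subset of the tested genes is supplied, as here.
    """
    m = len(p_values) if n_tests is None else n_tests
    threshold = alpha / m
    return [p <= threshold for p in p_values]

# P-values shown on the slide (Genes 1 to 6, 999 and 1000).
shown = [0.0001, 0.0002, 0.008, 0.009, 0.005, 0.09, 0.2, 0.3]
uncorrected = [p <= 0.05 for p in shown]
corrected = bonferroni_reject(shown, alpha=0.05, n_tests=1000)
print(sum(uncorrected), sum(corrected))
```

With 1000 tests the threshold becomes 0.00005, so even the smallest p-value shown (0.0001) no longer passes, which matches the column of zeros in the table.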
Strike the balance …
Most Conservative: Bonferroni
In between: False Discovery Rate
Most Lenient: No correction
The False Discovery Rate (FDR) of a set of predictions is the expected proportion
of false predictions in that set.
Example:
If the algorithm returns 100 genes with a false discovery rate of 0.3, then we
should expect about 70 of them to be correct.
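The slides do not say which FDR procedure is meant. One widely used choice is the Benjamini-Hochberg step-up procedure, sketched here in Python as an assumption; the function name and the example p-values are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (in the input order) marking which
    hypotheses are rejected while controlling the FDR at level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is within rank * q / m.
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
calls = benjamini_hochberg(p, q=0.05)

# Arithmetic from the example above: 100 genes at FDR 0.3.
expected_correct = round(100 * (1 - 0.3))
print(calls, expected_correct)
```

On these ten p-values the procedure keeps the two smallest, whereas an uncorrected 0.05 cut-off would keep five and a Bonferroni threshold of 0.005 would keep one, so it sits between the two extremes.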
Put them together
Result Validation
• RT-PCR: most common method
• Gene levels at the borderline of differential expression
– Their measurability is reduced by random error
• For highly differentially expressed genes, having sufficient replicates would serve as validation.
Biological Interpretation