Microarray Data Analysis Using R
Studies in Tissue Databases
Mark Reimers, NCI
Outline

The GNF tissue database
Exploratory analysis - clustering
Positional co-regulation
Insight via co-regulation
Apoptotic configuration of tissues
Probe-level analysis
The GNF Expression Atlas

Su et al. (PNAS 2004) hybridized 150 samples from 61 tissues to Affymetrix U133A and custom arrays
Variation in gene expression (as a proportion of the transcriptome):
  95% show at least one 2-fold change among the 61 tissues
  37% show more than 2-fold differences between the lowest 10% and highest 10%
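
The two percentages above depend on how a "2-fold change" is measured. The following is a minimal Python sketch of both statistics. The gene names and expression values are made up, and reading "lowest 10% and highest 10%" as the mean of each tail across tissues is an assumption, not a detail taken from Su et al.

```python
# Toy expression table: gene -> level in each of 20 tissues (linear scale).
# All values are invented for illustration; they are not atlas data.
expression = {
    "geneA": [100, 110, 95, 105, 250, 90, 102, 98, 101, 99,
              100, 104, 97, 103, 96, 108, 100, 99, 101, 102],
    "geneB": [50, 52, 49, 51, 50, 48, 53, 50, 51, 49] * 2,
    "geneC": [10, 200, 15, 12, 180, 11, 14, 13, 220, 16] * 2,
}


def max_fold_change(values):
    """Largest ratio between any two tissues (max / min)."""
    return max(values) / min(values)


def decile_fold_change(values):
    """Mean of the top 10% of tissues divided by the mean of the bottom 10%."""
    ordered = sorted(values)
    k = max(1, len(ordered) // 10)  # number of tissues in each tail
    low = sum(ordered[:k]) / k
    high = sum(ordered[-k:]) / k
    return high / low


def fraction_exceeding(table, statistic, threshold=2.0):
    """Fraction of genes whose statistic exceeds the fold-change threshold."""
    hits = sum(1 for values in table.values() if statistic(values) > threshold)
    return hits / len(table)


frac_any = fraction_exceeding(expression, max_fold_change)
frac_decile = fraction_exceeding(expression, decile_fold_change)
print(f"any 2-fold change: {frac_any:.2f}; decile 2-fold: {frac_decile:.2f}")
```

In this toy table geneA passes the first criterion only because of a single outlying tissue, which is why the decile-based figure is the stricter of the two.
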
Clustering samples

All biological replicates are nearest neighbors
Dendrogram reflects the discrepancy between healthy and cancerous samples
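
The nearest-neighbor claim can be checked directly, without drawing a dendrogram. The sketch below assumes a distance of 1 minus the Pearson correlation (the slide does not state which distance was used) and uses invented sample names and values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log-expression profiles (one value per gene) for replicate pairs.
samples = {
    "liver_1": [8.0, 2.1, 5.0, 1.0, 7.2],
    "liver_2": [7.8, 2.0, 5.3, 1.2, 7.0],
    "brain_1": [1.0, 9.0, 3.0, 6.5, 2.0],
    "brain_2": [1.3, 8.7, 3.2, 6.9, 2.2],
    "heart_1": [4.0, 4.2, 9.0, 2.0, 1.0],
    "heart_2": [4.3, 4.0, 8.8, 2.3, 1.1],
}


def nearest_neighbor(name):
    """Sample with the smallest correlation distance to the named sample."""
    others = (s for s in samples if s != name)
    return min(others, key=lambda s: 1.0 - pearson(samples[name], samples[s]))


def tissue(name):
    """Strip the replicate suffix, e.g. 'liver_1' -> 'liver'."""
    return name.rsplit("_", 1)[0]


replicates_pair_up = all(tissue(nearest_neighbor(s)) == tissue(s) for s in samples)
print("every sample's nearest neighbor is its replicate:", replicates_pair_up)
```
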
Co-regulation of Nearby Genes

Some groups of genes next to one another on a chromosome show high correlation across tissues
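
One way to look for such groups is to slide a window along the genes in chromosomal order and keep windows in which every pair is highly correlated. The sketch below assumes a window of 3 genes and a threshold of 0.6. The gene names, their order, and the values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical genes listed in chromosomal order, with expression in 6 tissues.
ordered_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
expression = {
    "g1": [5, 1, 4, 2, 6, 3],
    "g2": [1, 2, 3, 4, 5, 6],
    "g3": [1.2, 2.1, 2.9, 4.3, 4.8, 6.1],
    "g4": [2, 2.5, 3.8, 4.1, 5.5, 5.9],
    "g5": [6, 1, 5, 2, 4, 3],
    "g6": [3, 6, 1, 5, 2, 4],
}


def coregulated_windows(genes, table, window=3, threshold=0.6):
    """Windows of adjacent genes whose pairwise correlations all exceed threshold."""
    hits = []
    for start in range(len(genes) - window + 1):
        names = genes[start:start + window]
        profiles = [table[g] for g in names]
        if all(pearson(a, b) > threshold for a, b in combinations(profiles, 2)):
            hits.append(tuple(names))
    return hits


hits = coregulated_windows(ordered_genes, expression)
print(hits)
```
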
Significance of Co-regulation

How often would such correlations happen ‘by chance’, e.g. by selecting genes at random?
Three independent random measures would all have pairwise correlations greater than 0.6 with p < 10^-20!
However, 3 genes selected at random from the atlas have probability ~ 10^-3 of having all correlations > 0.6
156 regions of high correlation determined
  In 30,000 positions, we should see about 30 by chance (30,000 × 10^-3)
  Many are paralogs
  Perhaps 50% false discovery rate among the rest
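
The empirical chance probability can be estimated by drawing random gene triples from the data themselves rather than assuming independence. The sketch below does this on a synthetic atlas. The shared-factor model, the seed, and the sizes are assumptions chosen only to make the example run, so the printed probability says nothing about the real atlas. The last lines redo the slide's arithmetic.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def chance_probability(atlas, n_draws, rng, group_size=3, threshold=0.6):
    """Fraction of random gene groups whose pairwise correlations all exceed threshold."""
    genes = list(atlas)
    hits = 0
    for _ in range(n_draws):
        picked = [atlas[g] for g in rng.sample(genes, group_size)]
        if all(pearson(a, b) > threshold for a, b in combinations(picked, 2)):
            hits += 1
    return hits / n_draws


# Synthetic atlas (assumption): each gene follows one shared tissue factor plus noise,
# so genes are not independent, as in real expression data.
rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(12)]
toy_atlas = {}
for i in range(200):
    loading = rng.uniform(-1, 1)
    toy_atlas[f"gene{i}"] = [loading * s + rng.gauss(0, 0.5) for s in shared]

p_chance = chance_probability(toy_atlas, n_draws=2000, rng=rng)
print(f"toy-atlas chance probability: {p_chance:.4f}")

# Arithmetic with the slide's numbers.
expected_by_chance = 30_000 * 1e-3   # about 30 regions expected by chance
observed_regions = 156
chance_fraction = expected_by_chance / observed_regions
print(f"expected by chance: {expected_by_chance:.0f} of {observed_regions}")
```

Before paralogs are set aside, chance accounts for roughly a fifth of the 156 regions. The 50% figure on the slide applies to the non-paralog remainder, whose size is not given here.
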
Prediction of Function

Zhang et al. (J. Biol. 2004, 3:21) hybridized 55 mouse tissues to spotted oligo arrays
Hypothesis: genes with similar tissue expression patterns share similar function
Able to recover the GO biological process of known genes with better than 50% accuracy for many categories
Extended prediction to 1,092 uncharacterized transcripts
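
The hypothesis can be illustrated with a nearest-neighbor vote: give a gene the label of the annotated genes whose tissue profiles correlate best with it, and measure accuracy by leaving each known gene out in turn. This is a simplified stand-in, not the classifier used by Zhang et al. The gene names, labels, and values are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical annotated genes: (expression in 4 tissues, GO-style label).
annotated = {
    "met1": ([9, 1, 2, 8], "metabolism"),
    "met2": ([8, 2, 1, 9], "metabolism"),
    "neu1": ([1, 9, 8, 2], "synaptic transmission"),
    "neu2": ([2, 8, 9, 1], "synaptic transmission"),
    "imm1": ([1, 2, 9, 9], "immune response"),
}


def predict(profile, reference, k=1):
    """Majority label among the k reference genes most correlated with profile."""
    ranked = sorted(reference,
                    key=lambda g: pearson(profile, reference[g][0]),
                    reverse=True)
    votes = Counter(reference[g][1] for g in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(reference):
    """Predict each known gene from all the others and count correct labels."""
    correct = 0
    for gene, (profile, label) in reference.items():
        rest = {g: v for g, v in reference.items() if g != gene}
        correct += predict(profile, rest) == label
    return correct / len(reference)


accuracy = leave_one_out_accuracy(annotated)
prediction = predict([7, 2, 2, 9], annotated, k=3)  # an unannotated profile
print(f"leave-one-out accuracy: {accuracy:.2f}; prediction: {prediction}")
```

The one miss in this toy set is the gene that has no same-category partner, which is a reminder that accuracy will vary by category.
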
Investigation of a Poorly Characterized Gene: Top1MT

10-fold variation in expression (odd for a ‘housekeeping gene’)
>50 genes with expression highly correlated (> 0.75) with Top1MT across the tissue database
Large proportion are splicing factors
  Top1MT has an odd splice junction in intron 1, and may depend critically on abundant splicing factors
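
The search for co-expressed partners amounts to ranking every gene by its correlation with the target and keeping those above a cutoff. The sketch below assumes a cutoff of 0.75. The expression values and the partner names (`sf_a`, `sf_b`, `other_1`, `other_2`) are invented placeholders, not measurements.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented expression values across 6 tissues (not real measurements).
target = [1, 2, 4, 8, 10, 3]          # stands in for Top1MT
candidates = {
    "sf_a": [2, 3, 5, 9, 11, 4],      # hypothetical splicing factor
    "sf_b": [1, 3, 4, 7, 9, 2],       # hypothetical splicing factor
    "other_1": [5, 5, 4, 6, 5, 5],
    "other_2": [10, 8, 4, 2, 1, 7],
}


def correlated_partners(profile, table, cutoff=0.75):
    """(gene, r) pairs with r above cutoff, strongest first."""
    scored = [(gene, pearson(profile, values)) for gene, values in table.items()]
    kept = [(gene, r) for gene, r in scored if r > cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


fold_variation = max(target) / min(target)
partners = correlated_partners(target, candidates)
print(f"fold variation: {fold_variation:.0f}")
print([gene for gene, _ in partners])
```
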

Apoptosis Patterns

Majority of epithelial tissues show a common pattern (indisposed to apoptosis)
Blood cells show a variety of patterns
Exploration of Probe Sets

Examine correlation of probe sets across 150 samples
All but one probe verified to match the latest Unigene build for the gene
Probes organized by position in 3’ end
Correlation color scale: red = 1; white < 0
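
The picture behind this slide is a probe-by-probe correlation matrix, with probes ordered by position. The sketch below builds such a matrix and flags probes that disagree with the rest of their set. The probe names, the values, and the 0.5 cutoff on mean correlation are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical probes of one probe set, in positional order, across 6 samples.
probes = {
    "p1": [1, 2, 3, 4, 5, 6],
    "p2": [1.1, 2.3, 2.8, 4.2, 5.1, 5.9],
    "p3": [4, 1, 6, 2, 5, 3],
    "p4": [0.8, 2.2, 3.1, 3.9, 4.7, 6.3],
}
names = list(probes)

# Square correlation matrix; rows and columns follow the probe order.
matrix = [[pearson(probes[a], probes[b]) for b in names] for a in names]


def mean_correlation_with_others(i):
    """Average correlation of probe i with every other probe in the set."""
    others = [matrix[i][j] for j in range(len(names)) if j != i]
    return sum(others) / len(others)


discordant = [names[i] for i in range(len(names))
              if mean_correlation_with_others(i) < 0.5]

for name, row in zip(names, matrix):
    print(name, " ".join(f"{r:5.2f}" for r in row))
print("discordant probes:", discordant)
```
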
Quality of Arrays

Regional bias images