Microarray Databases
Ka-Lok Ng
Asia University
Microarray Databases
• Time series data – yeast cell cycle
• Stanford Genomic Resources - Stanford
Microarray Database (SMD)
• yeast cell cycle analysis project
• Yeast database - SGD
• Genes Periodically Expressed in the Human Cell Cycle and Their Expression in Tumors
Analysis of cell-cycle regulation
• Yeast cells stopped at
different stages of cell
cycle
– G1, S, G2, and M
• RNA extracted from each
stage
• Control RNA from
unsynchronized culture
Results of cell-cycle analysis
• 800 genes identified whose expression
changes during cell cycle
• Grouped by peak expression
• M/G1, G1, S, G2, and M
• Four different treatments used to synchronize
cells
– All gave similar results
• Results from Spellman et al., 1998; Cho et al.,
1998
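Grouping by peak expression can be illustrated with a minimal sketch. It assigns each gene to the stage label of the time point where its value is largest. The gene names, the values and the one-stage-per-time-point mapping are hypothetical, and this is a simplification for illustration, not the procedure used in the cited studies.

```python
from collections import defaultdict


def group_by_peak(profiles, stage_of_timepoint):
    """Group genes by the stage label of the time point where expression is highest."""
    groups = defaultdict(list)
    for gene, values in profiles.items():
        # Index of the largest expression value in this gene's profile
        peak_index = max(range(len(values)), key=lambda i: values[i])
        groups[stage_of_timepoint[peak_index]].append(gene)
    return dict(groups)


# Hypothetical toy data: one measurement per stage
stages = ["M/G1", "G1", "S", "G2", "M"]
profiles = {
    "geneA": [0.1, 1.2, 0.3, -0.4, -0.9],
    "geneB": [-0.5, 0.0, 0.2, 1.1, 0.4],
    "geneC": [0.2, 0.9, 0.1, -0.2, -0.3],
}
print(group_by_peak(profiles, stages))
```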
Cell-cycle regulated genes
• Each gene is a line on
the longitudinal axis
• Treatments in different
panels
• Cell-cycle stages are
color coded at top
• Vertical axis groups
genes by stage in which
expression peaks
Figure panels: Alpha, cdc15, cdc28, Elu; gene groups: M/G1, G1, S, G2, M
Brown and Botstein, 1999
Time-course microarray data
• Why study time-course microarray data?
• One application of time-course microarray data is to
group together genes with similar expression
patterns.
• Genes clustered in the same group are
considered likely to perform
similar functions.
• Model system – yeast cell cycle
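Grouping genes with similar expression patterns can be sketched as a greedy, threshold-based clustering on the Pearson correlation. The threshold of 0.8, the gene names and the profiles are assumptions chosen for illustration; real analyses typically use hierarchical or k-means clustering instead.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cluster_by_correlation(profiles, threshold=0.8):
    """Put each gene in the first cluster whose seed profile correlates with it
    at or above the threshold; otherwise start a new cluster with that gene."""
    clusters = []  # list of (seed_profile, member_names)
    for gene, values in profiles.items():
        for seed, members in clusters:
            if pearson(seed, values) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append((values, [gene]))
    return [members for _, members in clusters]


# Hypothetical profiles: g2 follows g1, g3 moves in the opposite direction
toy = {
    "g1": [1, 2, 3, 2, 1],
    "g2": [2, 4, 6, 4, 2],
    "g3": [3, 2, 1, 2, 3],
}
print(cluster_by_correlation(toy))
```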
Cell cycle
Checkpoints in Cell-Cycle
Checkpoint controls function to ensure that
chromosomes are intact and that critical stages of
the cell cycle are completed before the following
stage is initiated.
Checkpoints that function in response to DNA
damage prevent entry into S or M until the damage
is repaired. Moderate DNA damage activates
p53, a transcription factor (TF) that stimulates expression of p21CIP. This
cyclin-kinase inhibitor then binds to and inhibits all
Cdk-cyclin complexes, causing arrest in G1 and G2.
One checkpoint operates during S and G2 to
prevent the activation of MPF (Mitosis-promoting
factor) before DNA synthesis is complete.
Another checkpoint operates during early mitosis to
prevent the activation of APC (Anaphase-promoting
complex) and the initiation of anaphase until the
mitotic spindle apparatus is completely assembled
and all chromosome kinetochores are properly
attached to spindle fibers.
Microarray Databases - SMD
• http://genome-www.stanford.edu/
• SMD: Advanced Search → select arrays by Experimenter, Category, Subcategory and Organism
• http://genome-www5.stanford.edu/cgi-bin/search/QuerySetup.pl
Microarray Databases - SMD
• Spellman, yeast cell cycle time series data
– Alpha factor block data – every 7 min., 18 time points
– Cdc15 data – 24 time points
– Cdc28 data – every 10 min., 17 time points
– Elu data – every 30 min., 14 time points
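The sampling schedules listed for the Spellman data translate into time axes as sketched below. The sketch assumes each series starts at 0 min, which the slide does not state. Cdc15 is left out because no sampling interval is given for it.

```python
def time_axis(interval_min, n_points, start_min=0):
    """Return the sampling times in minutes for an evenly spaced series."""
    return [start_min + i * interval_min for i in range(n_points)]


# Intervals and point counts are taken from the slide; start_min=0 is an assumption
alpha = time_axis(7, 18)
cdc28 = time_axis(10, 17)
elu = time_axis(30, 14)

for name, axis in [("alpha", alpha), ("cdc28", cdc28), ("elu", elu)]:
    print(name, len(axis), "points, last sample at", axis[-1], "min")
```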
Microarray Databases - yeast cell cycle analysis project
• yeast cell cycle analysis project http://genome-www.stanford.edu/cellcycle/ or
http://www.cs.sunysb.edu/~skiena/gene/jizu/
• The web site provides the alpha factor block, cdc15, cdc28 and elu data sets
• Keyword searching using "regul*" on the YPD database as of Feb. 2000 yielded 1007 genes.
• By reviewing the published literature on these 1007 genes, 888 transcriptional regulations were
collected, of which 647 were activations and 241 were inhibitions (647+241=888).
• 469/155 known transcription activations/inhibitions in cdc28 data set
• 343/96 known transcription activations/inhibitions in alpha data set
• Download the Tab delimited data
• Use the Tab delimited option in Excel to store the data
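A tab-delimited download can also be read programmatically instead of through Excel. The sketch below assumes a header row followed by one gene per row, with the gene ID in the first column, and treats empty cells as missing values. The column names and numbers in the sample are made up for illustration and are not taken from the actual data set.

```python
import csv
import io


def read_expression_table(handle):
    """Parse a tab-delimited table into (time_labels, {gene: [values]}).

    Empty cells become None so that missing measurements stay visible.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    time_labels = header[1:]
    profiles = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene = row[0]
        profiles[gene] = [float(cell) if cell.strip() else None for cell in row[1:]]
    return time_labels, profiles


# Hypothetical sample; a real file would be opened with open(path, newline="")
sample = (
    "ORF\talpha0\talpha7\talpha14\n"
    "YLR256W\t0.12\t\t-0.30\n"
    "YPL028W\t-0.05\t0.41\t0.22\n"
)
labels, profiles = read_expression_table(io.StringIO(sample))
print(labels)
print(profiles)
```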
Microarray Databases - yeast cell cycle analysis project
• The Pearson correlation coefficient of the genes YLR256W and YPL028W in the alpha factor block
data set is -0.5094 → anti-correlated
• The Spearman correlation coefficient is -0.5698 → anti-correlated
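Both coefficients can be computed with a few lines of code. The sketch below uses the standard definitions: Pearson on the raw values, and Spearman as Pearson applied to the ranks (ties receive their average rank). The input values are toy numbers; the coefficients quoted above come from the slide and are not recomputed here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy example: y is a monotone but non-linear function of x
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

A negative value from either function corresponds to the "anti-correlated" reading used on the slide.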
Microarray Databases - yeast cell cycle analysis project
• time series plot
Finding Similar Time Series
• Time series are an important class of complex data objects; they arise in financial and
scientific applications. Examples include stock price indices, the volume of product sales,
telecommunications data, 1D medical signals, audio data and environmental measurement sequences.
• Two time series are similar if they exhibit similar behavior for a large subset of
their length. The sequences to be compared can have
– outliers, i.e. values that are measurement errors and should be omitted when comparing the
sequence against others
– different scaling factors and baselines: the sequences can come from measurements done
using different devices, so the scaling and baseline values can differ
• That is, if we have a sequence X and modify it to sequence X’ by introducing outliers,
by scaling and translation, and by adding or removing some observations, the
sequences X and X’ should still be considered reasonably similar.
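The idea of similarity under outliers, scaling and translation can be sketched as follows. Each sequence is first normalized with its median and median absolute deviation, then the two are compared with a longest-common-subsequence match that tolerates small differences and skips points that do not fit. This is a simplified stand-in written for illustration, not the algorithm of the cited paper; the tolerance `eps=0.5` and the example values are assumptions.

```python
import statistics


def robust_normalize(seq):
    """Remove baseline and scale using median and median absolute deviation,
    which are less sensitive to outliers than mean and standard deviation."""
    med = statistics.median(seq)
    mad = statistics.median(abs(v - med) for v in seq)
    scale = mad if mad > 0 else 1.0
    return [(v - med) / scale for v in seq]


def lcs_similarity(x, y, eps=0.5):
    """Fraction of the shorter sequence covered by the longest common
    subsequence, where two points match if they differ by at most eps."""
    a, b = robust_normalize(x), robust_normalize(y)
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m] / min(n, m)


# X' is X scaled by 2, shifted by 10, with one outlier (100) introduced
X = [1, 2, 3, 4, 5, 6, 7, 8]
X_prime = [12, 14, 16, 100, 20, 22, 24, 26]
print(lcs_similarity(X, X_prime))
print(lcs_similarity(X, list(reversed(X))))
```

The first score should be high despite the outlier and the different scale, while the reversed sequence scores low.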
References
Gautam Das, Dimitrios Gunopulos, Heikki Mannila
(1996), Finding Similar Time Series. Principles of Data
Mining and Knowledge Discovery.
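陳延洛 (2003), "Cluster Analysis Methods and System Implementation for Gene Expression Time Series".
Master's thesis, Institute of Computer Science and Information Engineering, National Cheng Kung University.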
Finding Similar Time Series
http://www.cs.sunysb.edu/~skiena/gene/jizu/
Now HAP1 is 137
Finding Similar Time Series
PCC rank 331
among 343 genes
Finding Similar Time Series
• Remove the last point in YLR256W (GO: HAP1) and the first point in YPL028W (GO: ERG10)
• PCC changes from -0.5094 to 0.6233 → correlated
• SRC changes from -0.5698 to 0.6169 → correlated
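Dropping the last point of one series and the first point of the other amounts to correlating the two series at a lag of one time step. A minimal sketch of such a lagged Pearson correlation is given below; the function name `lagged_pearson` and the toy series are illustrative assumptions, and the HAP1/ERG10 values from the slide are not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_pearson(x, y, lag):
    """Correlate x[t] with y[t + lag].

    lag=1 drops the last point of x and the first point of y.
    """
    if lag < 0:
        return lagged_pearson(y, x, -lag)
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


# Toy series: y repeats x one time step later
x = [0, 1, 0, -1, 0, 1, 0, -1]
y = [-1, 0, 1, 0, -1, 0, 1, 0]
print(pearson(x, y))            # unshifted comparison
print(lagged_pearson(x, y, 1))  # comparison after shifting by one step
```

A pair of genes that looks uncorrelated or anti-correlated at lag 0 can therefore turn out to be correlated once a regulatory delay is taken into account.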
Yeast database - SGD
SGD - http://www.yeastgenome.org/
Cancer Gene Expression Database
• CGED (Cancer Gene Expression Database) is a database of gene
expression profiles and accompanying clinical information.
• http://cged.genes.nig.ac.jp/
• CGED includes data on breast, colorectal, hepatocellular, esophageal,
thyroid, and gastric cancers.
• Follow “Information” to get the flat file data
Cancer Gene Expression Database
Liver cancer data
Human Cell Cycle Genes Expression Data
• Genes Periodically Expressed in the Human
Cell Cycle and Their Expression in Tumors
http://genome-www.stanford.edu/HumanCellCycle/Hela/
• Over 850 genes showed periodic variation
during the cell cycle
• Hierarchical clustering of the gene expression
patterns revealed co-expressed groups of
genes involved in cell cycle processes such
as DNA replication, chromosome segregation,
and cell adhesion
• Most of the genes whose expression had
previously been reported to correlate with the
proliferative state of tumors were found herein
also to be periodically expressed during the
HeLa cell cycle.
Human fibroblasts Genes Expression Data
• http://www-sequence.stanford.edu:16080/human_cell_cycle/
• Fibroblasts prepared from human skin
• Synchronized in late G1 using a double thymidine-block protocol
• Cultures were then released from arrest, and cells were collected
every 2 hours for 24 hours, covering nearly 2 complete cell cycles.
• Identified approximately 700 genes that display transcriptional
fluctuation with a periodicity consistent with that of the cell cycle.
Reference
• Cho et al. (2001) Nature Genetics 27, 48 - 54
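One simple way to score whether a profile fluctuates with a given period is to project it onto a sine and a cosine of that period. The sketch below does this for a series sampled every 2 hours for 24 hours, as described above. The 12-hour period and both example profiles are assumptions for illustration; the scoring actually used in the cited study may differ.

```python
import math


def fourier_score(values, times, period):
    """Magnitude of the Fourier component of a mean-centred profile at the given period."""
    omega = 2 * math.pi / period
    mean = sum(values) / len(values)
    sin_part = sum((v - mean) * math.sin(omega * t) for v, t in zip(values, times))
    cos_part = sum((v - mean) * math.cos(omega * t) for v, t in zip(values, times))
    return math.hypot(sin_part, cos_part)


times = [2 * i for i in range(13)]  # 0, 2, ..., 24 hours
period = 12.0                       # hypothetical cell-cycle length in hours

# Hypothetical profiles: one oscillating with the cell cycle, one jittering around zero
periodic = [math.sin(2 * math.pi * t / period) for t in times]
noisy_flat = [0.1 * (-1) ** i for i in range(13)]

print(fourier_score(periodic, times, period))
print(fourier_score(noisy_flat, times, period))
```

Genes could then be ranked by this score, with the cut-off for calling a gene periodic chosen separately.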
Human fibroblasts Genes Expression Data
• Functional analysis results from the human cell
cycle experiments
• How to study the biological functions?
• Fibroblast cells were exposed to the following treatments
• E7 – transfection with the E7 oncoprotein
• MMS – methyl methanesulfonate
• UV – ultraviolet light
Human fibroblasts Genes Expression Data
G2 phase
Phase    Number of genes
G1       53
S        107
G2       108
M        119