Functional annotation and network reconstruction through cross-platform integration of microarray data
X. J. Zhou et al. 2005

Challenges in microarray data analysis
• Integration of multiple microarray data sets.
  – Different platforms, e.g. cDNA arrays, Affymetrix arrays
  – Alternative experimental parameters
• Identification of functionally related genes which do not have similar expression patterns.
• Reconstruction of transcriptional regulatory networks.
  – It is difficult to elucidate the cooperativity between transcription factors (TFs) because the changes in their expression are often subtle and their activities are often controlled at levels other than expression.

Data pre-processing
• Classify the 618 expression profiles into 39 data sets. A data set contains a set of expression profiles measured under related conditions.
  – 19 cDNA data sets from SMD
  – 4 Affymetrix data sets from GEO
  – 16 data sets from Rosetta

19 SMD data sets
Corresponding to 19 SMD subcategories
• Alpha factor release
• cdc15 block release
• DTT Exposure
• Elutriation
• Forkhead regulation
• Gamma radiation
• Menadione exposure
• DNA damage (MMS) response
• Nitrogen depletion
• Nutrition limitation
• Osmotic shock
• SIR proteins (Chromatin Silencing)
• Sorbitol effects
• H2O2 response
• Heat shock
• Heat steady
• CellCycle Factor
• YPD Stationary phase
• Zinc homoeostasis

4 GEO data sets
• Aging
• Chitin synthesis
• Fermentation time course
• Ume6 regulon

16 Rosetta data sets
Classification is based on the Gene Ontology (GO) biological process categories of the deleted genes.
• Cell cycle control
• Cell wall organization
• Chromatin assembly
• Ion homeostasis
• Nucleotide metabolism
• Organelle biogenesis
• Perception of external stimulus
• Protein biosynthesis
• Protein degradation
• Protein metabolism
• Protein phosphorylation
• Protein transport
• Pseudohyphal growth
• Steroid metabolism
• Amino Acid Starvation
• MAPK pathway

The idea: 2nd-order expression correlation
• 1st-order expression correlation
  – Correlation of expression patterns from one data set
  – For each pair of genes, a vector of length n is obtained, where n is the number of data sets.
• 2nd-order expression correlation
  – Correlation of the 1st-order expression correlation profiles

An example
The overall expression similarity between the two gene pairs is not significantly high. However, their 1st-order expression correlation profiles exhibit high correlation, that is, the four genes have high 2nd-order expression correlation.

Clustering functionally related genes
• Procedure
  – Identification of doublets
    • A doublet is a pair of genes that is tightly co-expressed in multiple data sets.
  – Clustering of doublets based on their 1st-order expression correlation profiles
• Results
  – 72 of the top 100 tightest clusters are functionally homogeneous.

Gene function prediction
• A prediction of function is made for a doublet only if it is in a tight cluster that includes at least three doublets and in which all remaining doublets share the same function.
• 79 functions are assigned to 67 unknown genes. Some have been verified by experimental studies.

Reconstruction of regulatory networks
• For each transcription module (TM), a 1st-order average expression correlation profile (a vector with the same length as the number of data sets) is calculated. The profile of a module can be interpreted as the activity profile of the transcription factor(s) that regulate the module.
  – A transcription module is defined to be a set of genes that are regulated by the same transcription factor(s) based on genome-wide location data, and are co-expressed in multiple data sets.
  – 60 TMs are identified.
• A 2nd-order expression correlation is calculated for two activity profiles of transcription factors, to measure the cooperativity between the two transcription factors.
  – 34 pairs of transcription factors show high 2nd-order correlation.
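
The 1st-order profile, the module activity profile, and the 2nd-order correlation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Pearson correlation is assumed as the similarity measure, "average" is read as the mean over all gene pairs in a module, and the function names and toy matrices are hypothetical, not taken from Zhou et al.

```python
from itertools import combinations

import numpy as np


def pearson(x, y):
    """Pearson correlation of two 1-D arrays (assumed similarity measure)."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def first_order_profile(datasets, gene_a, gene_b):
    """1st-order correlation profile of one gene pair.

    datasets: list of 2-D arrays (genes x conditions), one per data set,
              all using the same gene row order (an assumption).
    Returns a vector of length n, where n is the number of data sets.
    """
    return np.array([pearson(d[gene_a], d[gene_b]) for d in datasets])


def module_activity_profile(datasets, genes):
    """Mean pairwise 1st-order profile over all gene pairs in a module."""
    profiles = [first_order_profile(datasets, a, b)
                for a, b in combinations(genes, 2)]
    return np.mean(profiles, axis=0)


def second_order_correlation(profile_1, profile_2):
    """2nd-order correlation: correlation of two 1st-order profiles."""
    return pearson(profile_1, profile_2)


# Toy data (hypothetical): 4 genes, 5 conditions, 4 data sets.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([1.0, -2.0, 2.0, -2.0, 1.0])   # uncorrelated with a
on = np.vstack([a, 2 * a, b, b + 1])        # both pairs co-expressed
off = np.vstack([a, -a, b, -b])             # both pairs anti-correlated
datasets = [on, off, on, off]

profile_01 = first_order_profile(datasets, 0, 1)   # genes 0 and 1
profile_23 = first_order_profile(datasets, 2, 3)   # genes 2 and 3
second_order = second_order_correlation(profile_01, profile_23)
```

In this toy case genes 0 and 2 are uncorrelated within every data set, yet the two pairs switch their co-expression on and off in the same data sets, so their 1st-order profiles agree. Passing two module activity profiles to `second_order_correlation` gives the corresponding cooperativity score for a pair of transcription factors.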

Clustering of modules

Annotation of TFs
• The function of a TF is predicted based on two lines of evidence:
  – The functions of known genes in its target module
  – The functions of known genes in other modules in the same module cluster
• TF GAT3 is predicted to play a role in mitotic and meiotic cell cycles.
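
The two lines of evidence above can be combined in many ways, and the slides do not say how they are weighed. The sketch below assumes one simple rule: keep only functions that appear both in the TF's target module and in other modules of the same cluster, ranked by total support. The function name, gene identifiers, and annotations are hypothetical placeholders.

```python
from collections import Counter


def predict_tf_function(target_module, cluster_modules, annotations):
    """Rank candidate functions for a TF from two evidence sources.

    target_module:   genes in the TF's own target module
    cluster_modules: list of gene lists, the other modules in the same cluster
    annotations:     dict mapping gene -> set of known functions
                     (genes of unknown function are simply absent)
    Returns the functions seen in BOTH sources, most supported first.
    The "must appear in both" rule is an assumption of this sketch.
    """
    own = Counter(f for g in target_module for f in annotations.get(g, ()))
    neighbours = Counter(
        f
        for module in cluster_modules
        for g in module
        for f in annotations.get(g, ())
    )
    shared = set(own) & set(neighbours)
    # Sort by descending total support, then alphabetically for stable output.
    return sorted(shared, key=lambda f: (-(own[f] + neighbours[f]), f))


# Hypothetical toy annotations; g3 and g6 have no known function.
annotations = {
    "g1": {"cell cycle"},
    "g2": {"cell cycle", "ion homeostasis"},
    "g4": {"cell cycle"},
    "g5": {"protein transport"},
}
target_module = ["g1", "g2", "g3"]
cluster_modules = [["g4", "g5"], ["g6"]]

prediction = predict_tf_function(target_module, cluster_modules, annotations)
```

Here only "cell cycle" is supported by both the target module and a neighbouring module, so it is the single candidate returned. A looser rule, such as pooling both sources, would also report functions seen in only one of them.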