Sai Moturu

Introduction
• Current approaches to microarray data analysis
  – Analysis of experimental data, followed by a later step in which biological information is incorporated to make inferences
• Integrative analysis technique in this paper
  – Integrate gene annotation with expression data to discover intrinsic associations between the two data sources, based on co-occurrence patterns

Methods and Data
– Association Rules Discovery
– Gene expression data
– Gene annotation: Gene Ontology categories, metabolic pathways and transcriptional regulators
– Applied to two previously studied experiments

Association Rules Discovery
– Antecedent -> Consequent (X -> Y)
– Measures of Quality
  • Support: P(X ∪ Y)
  • Confidence: P(Y|X) = P(X ∪ Y)/P(X)
  • Improvement: Confidence/P(Consequent) = P(X ∪ Y)/(P(X)*P(Y))

Association Rules Discovery
– Itemsets
  • Genes and the set of experiments in which each gene is over- or underexpressed
  • Gene characteristics
– Constraint
  • The antecedent must be a gene annotation
– Expression Thresholds
  • Genes with log expression values > 1 are overexpressed and those with values < -1 are underexpressed (two-fold change)

Mining Association Rules
– The association rules of interest have low support values and high confidence values
– A variant of the Apriori algorithm is used, which has previously helped with mining low-support, high-confidence, biologically significant patterns

Filtering
– A major drawback of association rules is the huge number of rules generated
– There is also redundancy among the rules
– Both problems are handled with two filters
  • Redundant filter
  • Single antecedent filter

Diauxic shift dataset
– Gene expression accompanying the metabolic shift from fermentation to respiration that occurs in fermenting yeast cells when glucose is depleted
– Expression levels recorded at 7 time points
– External information
  • Metabolic pathways
  • Transcriptional regulators

Results
– Association rules among metabolic pathways and expression patterns
  • 1126 out of over 6000 genes were annotated with at least one pathway
  • Association rules with minimum support of 5, minimum confidence of 40% and minimum improvement of 1
  • Redundant and single antecedent filters applied
  • 21 association rules

Results
– Association rules among transcriptional regulators and expression patterns
  • 3490 genes were annotated with at least one regulator
  • Association rules with minimum support of 5, minimum confidence of 80% and minimum improvement of 1
  • Redundant filter applied
  • 28 association rules

Results
– Association rules among transcriptional regulators, metabolic pathways and expression patterns
  • 3882 genes
  • Association rules with minimum support of 5, minimum confidence of 80% and minimum improvement of 1
  • Redundant filter applied
  • 37 association rules

Results (three slides with no extracted text)

Serum stimulation dataset
– Gene expression program of human fibroblasts after serum exposure
– External information
  • Gene Ontology terms

Results
– Association rules among biological process annotation and expression patterns
  • 4092 genes out of over 8000
  • Minimum support of 4, minimum confidence of 10% and minimum improvement of 1
  • Single antecedent and redundant filters applied
  • 12 association rules

Results
– Association rules among terms from all GO categories
  • 4630 genes out of over 8000
  • Minimum support of 4, minimum confidence of 10% and minimum improvement of 1
  • Redundant filter applied
  • 31 association rules

Results (three slides with no extracted text)

Conclusions
– Some of the inferred biological implications matched those found experimentally
– The others could be explored further
– Integrative data analysis helps extract meaningful discoveries from gene expression data
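
Worked example: rule quality measures

The measures from the Association Rules Discovery slides can be illustrated with a minimal Python sketch. It is not the paper's implementation: it does no Apriori-style rule mining, only the scoring of one candidate rule. The gene names, pathway labels, time-point labels and expression values are hypothetical, and the helper names discretize and rule_quality are invented for this illustration. Each gene is treated as one transaction holding its annotations plus the conditions in which it is over- or underexpressed, using the > 1 and < -1 log thresholds from the slides.

```python
from typing import Dict, FrozenSet, List, Set


def discretize(log_ratios: Dict[str, float], threshold: float = 1.0) -> Set[str]:
    """Turn one gene's log expression ratios into expression items.

    Values above +threshold become '<condition>:over', values below
    -threshold become '<condition>:under'; everything else is dropped.
    """
    items: Set[str] = set()
    for condition, value in log_ratios.items():
        if value > threshold:
            items.add(f"{condition}:over")
        elif value < -threshold:
            items.add(f"{condition}:under")
    return items


def rule_quality(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Score the rule antecedent -> consequent over all gene transactions."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_y = sum(1 for t in transactions if consequent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_xy / n                                # P(X ∪ Y)
    confidence = n_xy / n_x if n_x else 0.0           # P(X ∪ Y) / P(X)
    improvement = confidence / (n_y / n) if n_y else 0.0  # confidence / P(Y)
    return {
        "count": n_xy,
        "support": support,
        "confidence": confidence,
        "improvement": improvement,
    }


# Hypothetical toy data: (annotations, log ratios per time point) for each gene.
genes = {
    "g1": ({"pathway:TCA"}, {"t6": 1.8, "t7": 2.1}),
    "g2": ({"pathway:TCA"}, {"t6": 0.2, "t7": 1.5}),
    "g3": ({"pathway:ribosome"}, {"t6": -0.3, "t7": -1.4}),
    "g4": ({"pathway:ribosome"}, {"t6": 0.1, "t7": 0.4}),
}

transactions = [
    frozenset(annotations | discretize(expression))
    for annotations, expression in genes.values()
]

quality = rule_quality(
    transactions,
    antecedent=frozenset({"pathway:TCA"}),
    consequent=frozenset({"t7:over"}),
)
print(quality)
```

In this toy data both TCA genes are overexpressed at t7 and no other gene is, so the rule has confidence 1.0 and improvement 2.0. The minimum support values of 4 and 5 quoted in the results look like absolute gene counts rather than probabilities, which is why the sketch also returns the raw count; that reading is an assumption.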