Sai Moturu
Introduction
• Current approaches to microarray data analysis
– Experimental data are analyzed first; biological
information is incorporated only afterwards, in a
separate step, to make inferences
• Integrative analysis technique in this paper
– Integrate gene annotation with expression data to
discover intrinsic associations between the two data
sources based on co-occurrence patterns
Methods and Data
– Association Rules Discovery
– Gene expression data
– Gene annotation: Gene ontology categories,
metabolic pathways and transcriptional regulators
– Applied to two previously studied experiments
Association Rules Discovery
– Antecedent -> Consequent
X -> Y
– Measures of Quality
• Support: P(X∪Y)
• Confidence: P(Y|X) = P(X∪Y)/P(X)
• Improvement: Confidence/P(Consequent) = P(X∪Y)/(P(X)*P(Y))
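The three quality measures can be illustrated with a minimal sketch. It assumes each gene is represented as a set of items (a "transaction") and uses the standard definition of confidence, P(X∪Y)/P(X). The function name `rule_measures`, the item labels, and the toy data are hypothetical and not taken from the paper.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, improvement) for antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Count transactions containing X, Y, and both X and Y.
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_y = sum(1 for t in transactions if consequent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_xy / n                                   # P(X∪Y)
    confidence = n_xy / n_x if n_x else 0.0              # P(X∪Y) / P(X)
    improvement = confidence / (n_y / n) if n_y else 0.0  # confidence / P(Y)
    return support, confidence, improvement


# Hypothetical toy data: one item set per gene.
genes = [
    {"pathway:TCA", "t6:over"},
    {"pathway:TCA", "t6:over"},
    {"pathway:TCA"},
    {"t6:over"},
    {"pathway:ribosome", "t6:under"},
]
support, confidence, improvement = rule_measures(
    genes, {"pathway:TCA"}, {"t6:over"}
)
```

An improvement above 1 means the consequent occurs more often among genes carrying the antecedent than among all genes.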
Association Rules Discovery
– Itemsets
• Genes and the set of experiments in which each gene is over- or
underexpressed
• Gene characteristics
– Constraint
• Antecedent needs to be gene annotation
– Expression Thresholds
• Genes with log2 expression values > 1 are treated as overexpressed and
those < -1 as underexpressed (a two-fold change)
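As a rough sketch of how the itemsets described above could be built, the following turns per-experiment log ratios into "over"/"under" items and merges them with a gene's annotations. It assumes base-2 logarithms (so that a threshold of 1 corresponds to a two-fold change); the function names, item labels, and example values are hypothetical.

```python
def expression_items(log_ratios, threshold=1.0):
    """Map per-experiment log2 ratios to 'over'/'under' items."""
    items = set()
    for experiment, value in log_ratios.items():
        if value > threshold:
            items.add(f"{experiment}:over")
        elif value < -threshold:
            items.add(f"{experiment}:under")
        # Values within [-threshold, threshold] produce no item.
    return items


def build_transaction(annotations, log_ratios, threshold=1.0):
    """Combine a gene's annotation items with its expression items."""
    return set(annotations) | expression_items(log_ratios, threshold)


# Hypothetical gene with one pathway annotation and three time points.
example = build_transaction(
    {"pathway:glycolysis"},
    {"t1": 0.2, "t6": 1.8, "t7": -1.3},
)
```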
Mining Association Rules
– The association rules of interest have
low support values and high confidence values
– A variant of the Apriori algorithm is used, one that
has previously helped mine low-support, high-confidence,
biologically significant patterns
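The slides do not specify the Apriori variant, so the following is only a simplified, textbook-style level-wise search, not the authors' algorithm. It applies the constraint that the antecedent consists of annotation items and the consequent of expression items. All names (`frequent_itemsets`, `mine_rules`, the `pathway:` prefix) and the toy data are assumptions for illustration; support is given as a gene count.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Level-wise search for itemsets present in >= min_count transactions."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]): count(frozenset([i])) for i in items}
    current = {s: c for s, c in current.items() if c >= min_count}
    frequent = dict(current)
    size = 1
    while current and size < max_size:
        size += 1
        # Join step: unions of frequent sets that have exactly `size` items.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, size - 1))}
        current = {c: count(c) for c in candidates}
        current = {c: k for c, k in current.items() if k >= min_count}
        frequent.update(current)
    return frequent


def mine_rules(transactions, is_annotation, min_count, min_conf, min_impr=1.0):
    """Return (antecedent, consequent, count, confidence, improvement) tuples."""
    n = len(transactions)
    freq = frequent_itemsets(transactions, min_count)
    rules = []
    for itemset, n_xy in freq.items():
        x = frozenset(i for i in itemset if is_annotation(i))  # annotation
        y = itemset - x                                         # expression
        if not x or not y:
            continue
        conf = n_xy / freq[x]
        impr = conf / (freq[y] / n)
        if conf >= min_conf and impr >= min_impr:
            rules.append((x, y, n_xy, conf, impr))
    return rules


# Hypothetical toy data: one item set per gene.
genes = [
    {"pathway:TCA", "t6:over"},
    {"pathway:TCA", "t6:over"},
    {"pathway:TCA", "t6:over", "t7:over"},
    {"pathway:ribosome", "t6:under"},
    {"pathway:ribosome", "t6:under"},
    {"t6:over"},
]
rules = mine_rules(genes, lambda i: i.startswith("pathway:"),
                   min_count=2, min_conf=0.8)
```

Keeping the support threshold as a small absolute count while demanding high confidence matches the low-support, high-confidence setting described above.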
Filtering
– A major drawback of association rules is the huge
number of rules generated
– Many of the rules are also redundant
– Two filters address both problems
• Redundant filter
• Single antecedent filter
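The slides name the two filters without defining them, so the sketch below rests entirely on assumed definitions: a rule is treated as redundant when a rule with the same consequent and a strictly smaller antecedent has at least the same confidence, and the single antecedent filter is taken to keep only rules whose antecedent has exactly one item. The paper's actual criteria may differ; the rule representation and example values are hypothetical.

```python
def redundant_filter(rules):
    """Drop rules dominated by a more general rule (assumed definition)."""
    kept = []
    for rule in rules:
        dominated = any(
            other["consequent"] == rule["consequent"]
            and other["antecedent"] < rule["antecedent"]   # strict subset
            and other["confidence"] >= rule["confidence"]
            for other in rules
        )
        if not dominated:
            kept.append(rule)
    return kept


def single_antecedent_filter(rules):
    """Keep only rules with exactly one antecedent item (assumed definition)."""
    return [rule for rule in rules if len(rule["antecedent"]) == 1]


# Hypothetical rules sharing one consequent.
example_rules = [
    {"antecedent": frozenset({"A"}), "consequent": frozenset({"up"}),
     "confidence": 0.90},
    {"antecedent": frozenset({"A", "B"}), "consequent": frozenset({"up"}),
     "confidence": 0.85},
    {"antecedent": frozenset({"A", "C"}), "consequent": frozenset({"up"}),
     "confidence": 0.95},
]
non_redundant = redundant_filter(example_rules)
single_only = single_antecedent_filter(example_rules)
```

In this toy case the rule A,B -> up is dropped as redundant because A -> up is simpler and at least as confident, while A,C -> up survives because adding C raises the confidence.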
Diauxic shift dataset
– Gene expression accompanying the metabolic shift
from fermentation to respiration that occurs in
fermenting yeast cells as glucose is depleted
– Expression levels recorded at 7 time points
– External information
• Metabolic pathways
• Transcriptional regulators
Results
– Association rules among metabolic pathways and
expression patterns
• 1126 out of over 6000 genes were annotated with at least
one pathway
• Association rules with minimum support of 5, minimum
confidence of 40% and minimum improvement of 1
• Redundant and single antecedent filters applied
• 21 association rules
Results
– Association rules among transcriptional regulators
and expression patterns
• 3490 genes were annotated with at least one regulator
• Association rules with minimum support of 5, minimum
confidence of 80% and minimum improvement of 1
• Redundant filter applied
• 28 association rules
Results
– Association rules among transcriptional regulators,
metabolic pathways and expression patterns
• 3882 genes
• Association rules with minimum support of 5, minimum
confidence of 80% and minimum improvement of 1
• Redundant filter applied
• 37 association rules
Serum stimulation dataset
– Gene expression program of human fibroblasts after
serum exposure
– External information
• Gene ontology terms
Results
– Association rules among biological process
annotation and expression patterns
• 4092 genes of over 8000
• Support of 4, min confidence of 10% and min improvement
of 1
• Single antecedent and redundant filters applied
• 12 associations
Results
– Association rules among terms from all GO
categories
• 4630 genes of over 8000
• Support of 4, min confidence of 10% and min improvement
of 1
• Redundant filter applied
• 31 associations
Conclusions
– Some of the biological implications matched the
ones found experimentally
– The others could be explored further
– Integrative data analysis is very useful for
meaningful discoveries using gene expression data