Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Biochemistry wikipedia , lookup
Genomic imprinting wikipedia , lookup
Metabolomics wikipedia , lookup
Pharmacometabolomics wikipedia , lookup
Gene regulatory network wikipedia , lookup
Metabolic network modelling wikipedia , lookup
Natural product wikipedia , lookup
Biosynthesis wikipedia , lookup
Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes by C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B. Kell, P. Reiser and R.D. King Presenter: Mark H. Rich 2/7/2003 University of Wisconsin - Madison CS 838 Learning and Modeling Biological Networks Discovering Gene Function Yeast (S. cerevisiae) has 6,000 proteinencoding genes Only 60% can be assigned function with confidence The cell is a bio-chemical machine Logic can help us discover these metabolic functions and networks Robot Scientist Background Knowledge Learning Engine Analysis New Knowledge Experiment Selection Results ASE-Progol Outline Introduction Abduction and Active Learning Functional Genomics Metabolism in Logic Experiments Results Logic in AI Deduction Given facts with sound and complete proof theory, show that other facts can be proven Induction Given positive and negative examples of facts and background knowledge, find hypothesis that explains difference between positives and negatives Abduction and TCIE Given a theory and partial facts, discover what facts are missing to form one consistent hypothesis Lateral Thinking Puzzles Presented with a confusing situation There is an Oracle that knows what happened You can only ask yes or no questions The Mysterious Package One day a man received a parcel in the post. Carefully packed was a human arm. He examined it, repacked it and then sent it on to another man. The second man also carefully examined the arm before taking it to the woods and burying it. Why did they do this? The Mysterious Package Was the arm cut off intentionally? Is the arm’s person still alive? Is he a doctor? Did the three men know each other? Are the other men also missing an arm? Were they ever stuck on a desert island with no food, make a pact to each cut off an arm to eat and survive, but were rescued before the doctor could cut off his own arm, and the doctor later fulfilled his commitment? YES! Lateral Thinking Lessons Certain questions are valuable and lead to large leaps of information . . . How do we form hypotheses? How can we pick good questions? probability that question leads to consistent hypotheses cost of asking question We want to find quickest cheapest path to consistent hypotheses Hypothesis Generation Use contra-positives for inverse entailment Background Knowledge hasbeak(X) :- bird(X). bird(X) :- vulture(X). Example hasbeak(tweety). Hypotheses bird(tweety). bird(X). vulture(tweety). vulture(X). Trial Selection Theory e1 e2 e3 e4 One possible trial path e1 H1 0 1 1 1 H2 1 1 0 1 e2 t H3 1 0 1 1 f t H2 f H3 H1 Hypothesis Probability Each trial partitions H into {H[t],H[t’]} Assuming optimal encoding scheme… Prior probability of each hypothesis 2Compression( hi |E ) p(hi | E) 2Compression( h|E ) hH Compression is rounded f measure E f E (l1 l2 n) p Experiment Cost Ct is the cost of a trial t p(t) p(h) h H t Ct p(t)(meant'(T t )Ct' )J H t EC(H,T) min t T (1 p(t))(mean C )J t' (T t ) t' H t J H p(h)log 2 ( p(h)) hH Functional Genomics Want to learn gene-enzyme mapping Genes encode for Enzymes that catalyze reactions between Metabolites to eventually create Amino Acid Products Perform auxotrophic growth experiments to determine phenotype Functional Genomics: Simple gene1 X A gene2 Y B gene3 Z C Trp A, B and C are Enzymes X is ubiquitous metabolite, Y and Z optional If we knock out gene2, we need to add nutrient Z to produce Trp want to learn codes(gene2, B, [Y], [Z]) but only ask: pheno_effect(gene2,[Y]) is false pheno_effect(gene2,[Z]) is true pheno_effect(gene2,[Y,Z]) is true Aromatic amino acid pathway aromatic amino acids enzymes metabolites Metabolism in Logic Hypotheses: codes(‘YDR254W’, ‘4.2.1.11’, [‘C00631’],[‘C00074’]). codes(‘YDR254W’, ‘5.3.1.24’, [‘C04302’],[‘C01302’]). etc ... Background Knowledge: enzyme(‘4.2.1.11’,[‘C00631’],[‘C00074’]). enzyme(‘5.3.1.24’,[‘C04302’],[‘C01302’]). etc ... generated_by_other_pathways([‘C00002’, ‘C00005’, ‘C00006’, ... , ‘C03356’]). ends([‘C00078’, ‘C00079’, ‘C00082’]). Metabolism in Logic What the Oracle answers: phenotypic_effect(ORF, Growth_medium):generated_by_other_pathways(Ubiquitous_metabolites), union(Ubiquitous_metabolites, Growth_medium, Starts), connected(Starts, Wild_products), ends(Ends), subset(Wild_products, Ends), enz(Enzyme, Reactants, Products), encodes(ORF, Enzyme, Reactants, Products), connected_without_this_step(Starts, Mutant_products, Enzyme, Reactants, Products), not(subset(Mutant_products, Ends)). Experiments Learn function of 17 genes by removing ORF Growth Media 13 optional nutrients, at most 3 at a time 378 possible experiments for each ORF Cost of Optional Nutrients Determined from www.sigmaaldrich.com catalog Strategies for Comparison Random Naïve Cheapest ASE-Progol Experiments Remove all codes(…) facts Loop Generate random sample of trials Generate hypotheses using Theory Completion by Inverse Entailment Find minimum EC(H,T) trial and perform Add results to known examples until hypotheses consistent with trials Results: Cost Results: Time Conclusions and Future Work ASE-Progol finds hypotheses inexpensively and quickly 5 of 17 genes had only negative examples… why? Look into inhibitors and nonmonotonic logics. Limited answers to yes/no. Probabilities? Can this be applied to gene regulatory networks, using microarray technology? What other networks have similar frameworks?