Knowledge Space Map for Organic Reactions

* Knowledge Space Theory
* Existing Rule Set Basis for Chemistry Knowledge Space Model
* Data Model Proposal
* Constructing and Learning the Map

Knowledge Space Map

* Isolate atomic knowledge units / nodes / elements
* Determine the dependency graph of knowledge units (defines a learning order by topological sort)
* Enables targeted and purposeful lesson plans based on the “fringes” of the student’s current knowledge state
* [Figure: example dependency graph with nodes Addition, Subtraction, Multiplication, Division, Fractions, Exponents, Logarithms, Spelling, Vocabulary, Grammar]

Chemistry Knowledge Space?

* Current system: the user selects which chapter(s) to work on, then the system randomly generates a problem
* Idealized approach: assess the student’s current knowledge state and auto-generate the next problem to target the next most useful subject
* The existing tutorial is based on the predictive power of 80+ reagents, which are in turn based on 1500+ elemental rules
  * These could be interpreted as 1500+ knowledge units

Rule Clustering

* Many rules are just variants of the same concept / knowledge unit:
  * Alkene, Protic Acid Addition, Alkoxy
  * Alkene, Protic Acid Addition, Benzyl
  * Alkene, Protic Acid Addition, Allyl
  * Alkene, Protic Acid Addition, Tertiary
  * Alkene, Protic Acid Addition, Secondary
  * Alkene, Protic Acid Addition, Generic
  * …
* Some rules will always be used in conjunction with another (like “qu”)
  * There is then no real learning dependency order between these rules: you essentially know one of the rules if and only if (IFF) you know the others

Data Model Proposal

* Want a general framework for representing relationships
* Each reaction rule represents an elementary knowledge unit node
* A weighted, directed edge between each pair of nodes represents a learning dependency relationship
* Example: an edge relating A and B with weight 90%
  * Given that a student “knows” rule B, there is a 90% probability that they “know” rule A
  * Conversely, if they do NOT know rule A, there is a 90% probability that they do NOT know rule B (exact when each rule has a 50% baseline probability of being known)
* Define “know”: a student should consistently answer correctly any problem that is based only on rules that they “know”
* Define the rule similarity measure as the average of the two reciprocal dependency relationships

Major Relationship Cases

* Strong learning dependency: 99% in one direction between A and B, 50% in the other
* Strong similarity / mutual dependency: 99% in both directions between A and B
* No relation (random correlation): 50% in both directions between A and B

Additional Enhancements

* Add a baseline probability of “knowing” each node, instead of assuming a uniform 50%
  * Analogous to using background weights for the amino acid distribution in protein sequences
* Add a confidence number for each of these probability weights to reflect how trustworthy our prior data is
  * Analogous (maybe equal) to n, the number of data points that were used to arrive at the current estimate

Learning Relationship Map

* Give students assessment exams based on the rule sets, with criteria to distinguish problems that students get “right” vs. “wrong”
* This defines two sets of rules
  * R: all rules used in problems students got right
  * W: all rules used in problems students got wrong (that are not in R)
* Adjust rule relation values
  * Decrease relations between Ri and Wj
  * Increase relations between Ri and Rk
  * Scale the adjustment based on confidence in the prior

Learning Propagation

* Each assessment exam may only cover a handful of specific rules in R and W
* When updating the relation between rules R1 and R2, look for all rules similar to R1 and all rules similar to R2
* Assume respective updates for all relations between similar rule pairs, scaled by the magnitude of similarity to R1 and R2
* Technically, all rules are similar to all others to some degree, but we don’t want to update 1500^2 relations every time
  * Set a similarity threshold, which effectively defines clusters around rules

Constructing Relationship Map

* An initial pass should be able to automatically find many “similarity” relationships just based on existing structured data
  * Rule names
  * Combined usage in test examples
  * Inclusion in common reagents, chapters, etc.
* Use book chapter order as an initial guess for dependency order
* Similarity analysis could reduce 1500+ rules to ~100? rule “clusters”, a more tractable number for manually assigning the major dependencies not automatically addressed by book chapter order

Open Questions

* Student knowledge evolves over time, maybe even during a single exam. How to hit the “moving target” of their current knowledge state?
* Baseline probabilities of knowing a rule: a random sample of all students? These will differ greatly based on the population sample chosen.

SMILES Extensions

* Atom Mapping
  * Necessary to map reactant atoms to product atoms
  * A proper transform requires balanced stoichiometry
  * Hydrogens generally must be explicitly specified
* [Figure: atom-mapped reaction scheme, Carboxylic acid + Primary amine → Amide + Water]

[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]

Transformation Rules

* Chemical state machine modeling at a mechanistic level of detail
  * State information: molecular structure
  * State transition: transformation rules
* [Figure: two-step mechanism of HBr addition to an alkene: π-bond protic acid addition gives a carbocation and bromide, then carbocation halide addition gives the alkyl bromide]

SMIRKS | Description
[C:1]=[C:2].[H:3][Cl,Br,I:4]>>[+0:3][C:1][C+:2].[Cl,Br,I;-:4] | Alkene, Protic Acid Addition
[C+:1].[Cl,Br,I;-:2]>>[C+0:1][+0:2] | Carbocation, Halide Addition
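
The atom mapping requirement above (every reactant atom maps to a product atom, with balanced stoichiometry) can be spot-checked without a chemistry toolkit. The following is a minimal sketch in Python that only compares the atom map labels on each side of a `reactants>>products` string; the function names `atom_maps` and `check_atom_mapping` are hypothetical, and the check says nothing about chemical validity.

```python
import re
from collections import Counter

# Matches an atom map label such as ":7" just before the closing bracket of an atom.
MAP_LABEL = re.compile(r":(\d+)\]")


def atom_maps(side: str) -> Counter:
    """Count how often each atom map number appears on one side of a reaction."""
    return Counter(int(n) for n in MAP_LABEL.findall(side))


def check_atom_mapping(reaction: str) -> dict:
    """Compare atom map numbers on the reactant and product sides.

    Only the map labels are inspected (an illustrative assumption);
    elements, charges, and bonds are ignored.
    """
    reactants, sep, products = reaction.partition(">>")
    if not sep:
        raise ValueError("expected 'reactants>>products'")
    left, right = atom_maps(reactants), atom_maps(products)
    return {
        # Map numbers used more than once on the same side.
        "duplicated": sorted({n for c in (left, right) for n, k in c.items() if k > 1}),
        # Mapped reactant atoms that never appear in the products.
        "missing_in_products": sorted(set(left) - set(right)),
        # Mapped product atoms that have no reactant counterpart.
        "missing_in_reactants": sorted(set(right) - set(left)),
    }


amide = (
    "[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]"
    ">>[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]"
)
report = check_atom_mapping(amide)
```

For the amide formation transform, all three lists in `report` come back empty, since each of the ten map numbers appears exactly once on each side. Dropping the water product would instead list its atoms under `missing_in_products`.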
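
Returning to the relationship map: the similarity measure, the exam-driven update, and the topological learning order from the earlier slides could be sketched as follows. This is one possible reading, not the proposal's actual implementation. It assumes `weights[(a, b)]` estimates the probability that a student knows rule `a` given that they know rule `b`, and it uses a running average as the confidence-scaled update. The class `RelationMap`, the parameter `prior_n`, and the `strong` / `weak` thresholds are all hypothetical. `graphlib` requires Python 3.9 or later.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from itertools import permutations


@dataclass
class RelationMap:
    """weights[(a, b)] estimates P(student knows a | student knows b)."""
    prior: float = 0.5      # uniform baseline used until data arrives
    prior_n: float = 1.0    # hypothetical confidence assigned to that prior
    weights: dict = field(default_factory=dict)
    counts: dict = field(default_factory=dict)

    def weight(self, a, b):
        return self.weights.get((a, b), self.prior)

    def similarity(self, a, b):
        # Average of the two reciprocal dependency relationships.
        return (self.weight(a, b) + self.weight(b, a)) / 2

    def _observe(self, a, b, outcome):
        # Running average (an assumed update rule): the larger n is,
        # the less a single observation moves the estimate.
        n = self.counts.get((a, b), self.prior_n)
        self.weights[(a, b)] = (n * self.weight(a, b) + outcome) / (n + 1)
        self.counts[(a, b)] = n + 1

    def update_from_exam(self, right_rules, wrong_rules):
        right = set(right_rules)
        wrong = set(wrong_rules) - right      # W excludes anything already in R
        for a, b in permutations(sorted(right), 2):
            self._observe(a, b, 1.0)          # increase relations between Ri and Rk
        for r in right:
            for w in wrong:
                self._observe(w, r, 0.0)      # decrease: student knew r but not w


def learning_order(rel, rules, strong=0.9, weak=0.6):
    """Order rules so that prerequisites come first (thresholds are illustrative)."""
    sorter = TopologicalSorter({r: set() for r in rules})
    for a, b in permutations(rules, 2):
        # Treat a as a prerequisite of b when knowing b strongly implies
        # knowing a, but knowing a does not strongly imply knowing b.
        if rel.weight(a, b) >= strong and rel.weight(b, a) <= weak:
            sorter.add(b, a)
    return list(sorter.static_order())


exam = RelationMap()
exam.update_from_exam(right_rules={"A", "B"}, wrong_rules={"C"})

arithmetic = RelationMap()
arithmetic.weights[("Addition", "Multiplication")] = 0.99
arithmetic.weights[("Multiplication", "Addition")] = 0.50
order = learning_order(arithmetic, ["Multiplication", "Addition"])
```

With the default prior, one exam moves the A/B relation from 0.5 to 0.75 and the relation "knows C given knows A" from 0.5 to 0.25. The arithmetic example places Addition before Multiplication. Propagation to similar rules is left out, and a cycle of three or more inferred prerequisites would make `static_order` raise `graphlib.CycleError`, so a real map would need a policy for breaking such cycles.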