Download Computational Discovery of Communicable Knowledge

The Computational Discovery of Communicable Knowledge Pat Langley Computational Learning Laboratory Center for the Study of Language and Information Stanford University, Stanford, CA 94304 http://hypatia.stanford.edu/cll/ [email protected] Also affiliated with the DaimlerChrysler Research & Technology Center and the Institute for the Study of Learning and Expertise. The Problem and the Potential Our society is collecting increasing amounts of data in commercial and scientific domains. These include complex spatial/temporal data sets like:  traces of traffic behavior from GPS and cell phones  prices of stocks and currencies from exchanges  measurements of climate and ecosystem variables Computational techniques should let us find relations in these data that are useful for business and society. Drawbacks of Current Approaches The fields of machine learning and data mining have developed methods to find regularities in data. Despite many successful applications, most techniques:  assume attribute-value representations that cannot handle time or space  cannot tell interesting discoveries from mundane ones  state the discovered knowledge in some opaque form This indicates the need for alternative methods that can address these issues. Paradigms for Machine Learning decision-tree induction induction of logical rules case-based learning neural networks probabilistic induction Paradigms for Scientific Discovery taxonomy formation qualitative law discovery equation discovery structural model construction process model formation Discovering Numeric Laws Statement of the task: • Given: Quantitative measurements about objects or events in the world. • Find: Numeric relations that hold among variables that describe these items and that predict future behavior. Historical examples: • Kepler’s three laws of planetary motion • Archimedes’ principle of displacement in water • Black’s law relating specific heat, mass, and temperature • Proust’s and Gay-Lussac’s laws of definite proportions BACON on Kepler’s Third Law BACON carries out heuristic search through a space of numeric terms, looking for constant values and linear relations. moon d p d/p d2/p d3/p2 A B C D 5.67 8.67 14.00 24.67 1.77 3.57 7.16 16.69 3.20 2.43 1.96 1.48 18.15 21.04 27.40 36.46 58.15 51.06 53.61 53.89 This example shows the system’s progression from primitive variables (distance and period of Jupiter’s moons) to a complex term that has a nearly constant value. Some Laws Discovered by BACON Basic numeric relations: • Ideal gas law PV = aNT + bN • Kepler’s third law D3 = [(A - k) / t]2 = j • Coulomb’s law FD2 / Q1Q2 = c • Ohm’s law TD2 / (LI - rI) = r Relations with intrinsic properties: • Snell’s law of refraction sin I / sin R = n1 / n2 • Archimedes’ law C =V + i • Momentum conservation m1V1 = m2V2 • Black’s specific heat law c1m1T1 + c2m2T2 = (c1m1+ c2m2 ) Tf Temporal Laws of Ecological Behavior (Todorovski & Dzeroski, 1997) Input: time phyt zoo phosp temp time 1 time 2 phyt 1 phyt 2 zoo 1 zoo 2 phosp 1 phosp 2 temp 1 temp 2 .. .. .. time m phyt m zoo m .. phosp m .. temp m Input: a context-free grammar of domain constraints Output: phosp phyt = c1 • phyt • – c3 • phyt c2 + phosp • Formulating Structural Models Statement of the task: • Given: Qualitative or numeric empirical laws that describe observed phenomena. • Find: Explanatory models of these phenomena in terms of component objects and their relations. Historical examples: • Dalton’s and Avogadro’s molecular models of chemicals • Mendel’s genetic model of inherited traits • Quark models of elementary particles • Structural models of planets, comets, and stars DALTON on Chemical Reactions Initial state: (reacts in {hydrogen oxygen} out {water}) (reacts in {hydrogen nitrogen} out {ammonia}) (reacts in {oxygen nitrogen} out {nitrous oxide}) . . . Final state: 2 hydrogen + 1 oxygen  2 water 3 hydrogen + 1 nitrogen  2 ammonia 2 oxygen + 1 nitrogen  2 nitrous oxide hydrogen  {h h} water  {h h o} oxygen  {h h} ammonia  {h h h n} nitrogen  {h h} nitrous oxide  {n o o} . . . DALTON finds these structural models through a depth-first search process constrained by conservation assumptions. Constructing Process Models Statement of the task: • Given: Qualitative or numeric empirical laws that describe temporal phenomena. • Find: Explanatory models of these phenomena in terms of processes among component objects. Historical examples: • Caloric and kinetic theories of heat phenomena • Reaction pathways in chemistry and nucleosynthesis • Models of continental drift and plate tectonics • Process models of stellar evolution and destruction ASTRA on Nucleosynthesis Inputs: - quantum properties for elements and isotopes - conservation relations among these properties - an element to be explained (e.g., O or C) - elements to be assumed (e.g., H or He) Outputs: - elementary reactions that obey conservation laws - reaction pathways that explain the element’s evolution ASTRA uses depth-first search to find reaction pathways for: - proton and neutron captures - neutron and deuteron production - generation of helium (He) from hydrogen (H) - generation of carbon (C) and oxygen (O) Three Pathways for Carbon Synthesis Standard pathway: 4He + 4He  8Be 4He + 8Be  12C Alternative pathways: 4He + D  6Li 3He + 6Li  9Be 4He + 9Be  12C + n + D  6Li 4He + 6Li  10Be 4He + 10Be  12C + D 4He ASTRA generates many pathways novel to astrophysics, some of which have viable reaction rates. Proposed Research We plan to develop and evaluate discovery methods that:  are designed to process temporal and structured data  use techniques from computational scientific discovery  describe new knowledge in a communicable form Likely notations for the discovered knowledge include:  structural models of relations among entities  process models of change over time  sets of simultaneous differential equations We will apply our methods to domains that benefit from such communicable representations. Benefits of the Approach Unlike most previous work on data mining and knowledge discovery, our methods will:  support discoveries in domains that involve complex spatial, temporal, or relational data  use domain knowledge to filter only discoveries that are interesting and novel to the domain user  present the new knowledge in some understandable notation that can be communicated among humans Such techniques will improve the way we manipulate and understand complex data.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Computational Discovery of Communicable Knowledge