Knowledge Representation and Machine Learning
Stephen J. Guy

Overview
  Recap some Knowledge Rep.
    History
    First order logic
  Machine Learning
    ANN
    Bayesian Networks
    Reinforcement Learning
  Summary

Knowledge Representation?
  Ambiguous term
  “The study of how to put knowledge into a form that a computer can reason with” (Russell and Norvig)
  Originally coupled w/ linguistics
  Led to philosophical analysis of language

Knowledge Representation?
  Cool Robots
  Futuristic Robots

Early Work
  SAINT (1963)
    Closed form Calculus Problems
  STUDENT (1967)
    “If the number of customers Tom gets is twice the square of 20% of the number of advertisements he runs, and the number of advertisements he runs is 45, what is the number of customers Tom gets?”
  Blockworlds (1972)
    SHRDLU
    “Find a block which is taller than the one you are holding and put it in the box”

Early Work - Theme
  Limit domain
    “Microworlds”
    Allows precise rules
  Generality
  Problem Size
    1) Making rules is hard
    2) State space is unbounded

Generality
  First-order Logic
  Is able to capture simple Boolean relations and facts
    $\forall x\, \forall y\; Brother(x,y) \Rightarrow Sibling(x,y)$
    $\forall x\, \exists y\; Loves(x,y)$
  Can capture lots of commonsense knowledge
  Not a cure-all

First order Logic - Problems
  Faithfully captures facts, objects and relations
  Problems
    Does not capture temporal relations
    Does not handle probabilistic facts
    Does not handle facts w/ degrees of truth
  Has been extended to:
    Temporal logic
    Probability theory
    Fuzzy logic

First order Logic - Bigger Problem
  Still lots of human effort
    “Knowledge Engineering”
    Time consuming
    Difficult to debug
  Size still a problem
  Automated acquisition of knowledge is important

Machine Learning
  Sidesteps all of the previous problems
  Represent Knowledge in a way that is immediately useful for decision making
  3 specific examples
    Artificial Neural Networks (ANN)
    Bayesian Networks
    Reinforcement Learning

Artificial Neural Networks (ANN)
  1st work in AI (McCulloch & Pitts, 1943)
  Attempt to mimic brain neurons
  Several binary inputs, one binary output
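
The binary threshold unit described on this slide can be illustrated with a minimal sketch. It assumes a unit that fires (outputs 1) when the weighted sum of its binary inputs exceeds a threshold. The function names, weights, and thresholds below are illustrative choices, not values taken from the slides.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * i for w, i in zip(weights, inputs))
    return 1 if total > threshold else 0


# Hypothetical weight/threshold choices that realise the basic connectives.
def logical_and(a, b):
    return threshold_unit([a, b], [1, 1], 1.5)   # fires only when both inputs are 1


def logical_or(a, b):
    return threshold_unit([a, b], [1, 1], 0.5)   # fires when at least one input is 1


def logical_not(a):
    return threshold_unit([a], [-1], -0.5)       # fires only when the input is 0


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", logical_and(a, b), "OR:", logical_or(a, b))
    print("NOT 0:", logical_not(0), "NOT 1:", logical_not(1))
```

Chaining such units (feeding outputs of one into inputs of another) is what lets a network of them express more complex Boolean functions.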
Artificial Neural Networks (ANN)
  Inputs: I1, I2, …
  Responses: R1, R2, …
  Output: $O = 1$ if $\sum_{n=0}^{N} R_n I_n > \text{threshold}$, otherwise $O = 0$
  Can be chained together to
    Represent logical connectives (and, or, not)
    Compute any computable function
  Hebb (1949) introduced simple rule to modify connection strength (Hebbian Learning)

Single Layer feed-forward ANNs (Perceptrons)
  [Figure: input layer feeding a single output unit]
  Can easily represent otherwise complex (linearly separable) functions
    And, Or
    Majority Function
  Can learn based on gradient descent
  Cannot tell if 2 inputs are different!! (Minsky, 1969)

Learning in Perceptrons
  Replace threshold function w/ sigmoid g(x)
  Define error metric (sum of squared differences): $E = \tfrac{1}{2} Err^2$, with $Err$ = target output minus actual output
  Calculate gradient wrt weight: $\partial E / \partial W_j = -Err \cdot g'(in) \cdot x_j$
  Update: $W_j \leftarrow W_j + \alpha \cdot Err \cdot g'(in) \cdot x_j$, where $\alpha$ is the learning rate

Multi Layer feed-forward ANNs
  [Figure: input layer, hidden layer, output unit]
  Breaks free of the problems of perceptrons
  Simple gradient descent no longer works for learning

Learning in Multilayer ANNs (1/2)
  Backpropagation
  Treat top level just like single-layer ANN
  Diffuse error down network based on input strength from each hidden node

Learning in Multilayer ANNs (2/2)
  Output layer: $\Delta_i = Err_i \cdot g'(in_i)$
  $W_{j,i} \leftarrow W_{j,i} + \alpha \cdot a_j \cdot \Delta_i$
  Hidden layer: $\Delta_j = g'(in_j) \sum_i W_{j,i} \Delta_i$
  $W_{k,j} \leftarrow W_{k,j} + \alpha \cdot a_k \cdot \Delta_j$

ANN - Summary
  Single Layer ANNs (Perceptrons) can capture linearly separable functions
  Multi-layer ANNs can capture much more complex functions and can be effectively trained using backpropagation
  Not a silver bullet
    How to avoid over-fitting?
    What shape should the network be?
    Network values are meaningless to humans

ANN – In Robots (Simple)
  Can be easily set up as a robot brain
  Input = Sensors
  Output = Motor Control
  Simple robot learns to avoid bumps

ANN – In Robots (Complex)
  Autonomous Land Vehicle In a Neural Network (ALVINN)
  CMU project learned to drive from humans
  32x30 “retina”
  5 hidden units
  30 output nodes
  Capable of driving itself after 2-3 minutes of training

Bayesian Networks
  Combines advantages of basic logic and ANNs
  Allows for “efficient representation of, and rigorous reasoning with, uncertain knowledge” (R&N)
  Allows for learning from experience

Bayes’ Rule
  P(b|a) = P(a|b)*P(b)/P(a) = nrm(<P(a|b)*P(b), P(a|~b)*P(~b)>)
  Meningitis Example (From R&N)
    s = stiff neck, m = has meningitis
    P(s|m) = 0.5
    P(m) = 1/50000
    P(s) = 1/20
    P(m|s) = P(s|m)P(m)/P(s) = .5*(1/50000)/(1/20) = .0002
  Diagnostic knowledge more fragile than causal knowledge

Bayesian Networks
  [Figure: two-node network, Meningitis -> Stiff Neck]
  P(M) = 1/50000
  M | P(S)
  T | .5
  F | 1/20
  Allows us to chain together more complex relations
  Creating network is not necessarily easy
    Create a fully connected network
    Cluster groups w/ high correlation together
    Find probabilities using rejection sampling

Bayesian Networks (Temporal Models)
  [Figure: chain Rain(t-1) -> Rain(t) -> Rain(t+1), with Rain(t) -> Umbrella(t) at each step]
  More complex Bayesian networks are possible
  Time can be taken into account
  Imagine predicting if it will rain tomorrow, based only on whether your co-worker brings in an umbrella

Bayesian Networks (Temporal Models)
  4 possible inference tasks based on this knowledge
    Filtering – Computing belief as to current state
    Prediction – Computing belief of future state
    Smoothing – Improving knowledge of past states using hindsight (Forward-backward Algorithm)
    Most likely explanation – Finding the single most likely explanation for a set of observations (Viterbi)

Bayesian Networks (Temporal Models)
  Assume you see umbrella 2 days in a
row (U1 = 1, U2 = 1)
  P(R0) = <0.5, 0.5> (<.5 R0 = T, .5 R0 = F>)
  P(R1) = P(R1|R0)*P(R0) + P(R1|~R0)*P(~R0) = 0.7*0.5 + 0.3*0.5 = 0.5, i.e. <0.5, 0.5>
  P(R1|U1) = nrm(P(U1|R1)*P(R1)) = nrm<.9*.5, .2*.5> = nrm<.45, .1> = <.818, .182>

Bayesian Networks (Temporal Models)
  P(R2|U1) = P(R2|R1)P(R1|U1) + P(R2|~R1)P(~R1|U1) = .7*.818 + .3*.182 = .627, i.e. <.627, .373>
  P(R2|U2,U1) = nrm(P(U2|R2)*P(R2|U1)) = nrm<.9*.627, .2*.373> = nrm<.565, .075> = <.883, .117>
  On the 2nd day of seeing the umbrella we are more confident that it is raining

Bayesian Networks - Summary
  Bayesian Networks are able to capture some important aspects of human Knowledge Representation and use
    Uncertainty
    Adaptation
  Still difficulties in network design
  Overall a powerful tool
    Meaningful values in network
    Probabilistic logical reasoning

Bayesian Networks in Robotics
  Speech Recognition
  Inference
  Sensors
  Computer Vision
  SLAM
  Estimating Human Poses
  Robot going through doorway using Bayesian networks (Univ. of Basque)

Reinforcement Learning
  How much can we take the human out of the loop?
  How do humans/animals do it?
    Genes
    Pain
    Pleasure
  Simply define rewards/punishments and let the agent figure out all the rest

Reinforcement Learning - Example
  [Figure: grid world with a start state, a goal (+1) and a pitfall (-1); a move goes in the intended direction with probability .8 and to either side with probability .1 each]
  R(s) = Reward of state s
    R(Goal) = 1
    R(pitfall) = -1
    R(anything else) = ?
  Attempts to move forward may move left or right
  Many (~262,000) possible policies
  Different policies are optimal depending on the value of R(anything else)

Reinforcement Learning - Policy
  [Figure: grid world with start, goal (+1) and pitfall (-1), showing a policy]
  Above is the optimal policy for R(s) = -.04
  Given a policy how can an agent evaluate U(s), the utility of a state? (Passive Reinforcement Learning)
    Adaptive Dynamic Programming (ADP)
    Temporal Difference Learning (TD)
  With only an environment how can an agent develop a policy?
  (Active Reinforcement Learning)
    Q-learning

Reinforcement Learning - Utility
  [Figure: grid world annotated with the utility U(s) of each state next to the +1 and -1 terminal states; surviving values range from .338 to .918]
  $U(s) = R(s) + \sum_{s'} P(s')\, U(s')$
  ADP: Update all U(s) based on each new observation
  TD: Update U(s) only for the last state change (s to s’)
    Ideally: U(s) = R(s) + U(s’), but s’ is probabilistic
    $U(s) \leftarrow U(s) + \alpha\,(R(s) + U(s') - U(s))$
    $\alpha$ decays from 1 to 0 as a function of # times state is visited
    U(s) is guaranteed to converge to correct value

Reinforcement Learning – Policy
  Ideally agents can create their own policies
  Exploration: agents must be rewarded for exploring as well as taking best known path
  Adaptive Dynamic Programming (ADP)
    Can be achieved by changing U(s) to U’(s)
    U’(s) = n < N ? Max_Reward : U(s)
    Agent must also update transition model
  Temporal Difference Learning (TD)
    No changes to utility calculation!
    Can explore based on balancing utility and novelty (like ADP)
    Can choose random directions with a decreasing rate over time
  Both converge on optimal value

Reinforcement Learning in Robotics
  Robot Control
    Discretize workspace
  Policy Search
    Pegasus System (Ng, Stanford)
    Learned how to control robots
    Better than human pilots w/ Remote Control

Summary
  3 different general learning approaches
  Artificial Neural Networks
    Good for learning correlation between inputs and outputs
    Little human work
  Bayesian Networks
    Good for handling uncertainty and noise
    Human work optional
  Reinforcement Learning
    Good for evaluating and generating policies/behaviors
    Can handle complex tasks
    Little human work

References
1. Russell S, Norvig P (1995) Artificial Intelligence: A Modern Approach, Prentice Hall Series in Artificial Intelligence. Englewood Cliffs, New Jersey (http://aima.cs.berkeley.edu/)
2. Mitchell, Thomas. Machine Learning. McGraw Hill, 1997. (http://www.cs.cmu.edu/~tom/mlbook.html)
3. Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning.
Cambridge, MA: MIT Press, 1998. (http://www.cs.ualberta.ca/~sutton/book/the-book.html)
4. Hecht-Nielsen, R. "Theory of the backpropagation neural network." Neural Networks 1 (1989): 593-605. (http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=3401&arnumber=118638)
5. P. Batavia, D. Pomerleau, and C. Thorpe, Tech. report CMU-RI-TR-96-31, Robotics Institute, Carnegie Mellon University, October, 1996 (http://www.ri.cmu.edu/projects/project_160.html)
6. D.J. Jung, K.S. Kwon, and H.J. Kim (Korea), "Bayesian Network based Human Pose Estimation" (http://www.actapress.com/PaperInfo.aspx?PaperID=23199)
7. Frank L. Lewis, "Neural Network Control of Robot Manipulators," IEEE Expert: Intelligent Systems and Their Applications, vol. 11, no. 3, pp. 64-75, June, 1996. (http://doi.ieeecomputersociety.org/10.1109/64.506755)
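
Worked check: umbrella filtering

The two-day umbrella calculation in the temporal-model slides can be reproduced with a short sketch of forward filtering (predict, then weight by the observation and normalize). The transition probabilities (0.7, 0.3) and the sensor probabilities (0.9, 0.2) are read off the arithmetic in the slides. The function names and the tuple layout (rain, no rain) are assumptions made for illustration.

```python
def normalize(dist):
    """Scale a tuple of non-negative weights so that it sums to 1 (the slides' nrm)."""
    total = sum(dist)
    return tuple(p / total for p in dist)


def forward_step(belief, umbrella_seen,
                 p_rain_given_rain=0.7, p_rain_given_dry=0.3,
                 p_umb_given_rain=0.9, p_umb_given_dry=0.2):
    """One filtering step: predict today's rain from yesterday's belief, then update."""
    rain, dry = belief
    # Prediction: P(R_t | u_1..t-1) = sum over yesterday's state of transition * belief
    predicted_rain = p_rain_given_rain * rain + p_rain_given_dry * dry
    predicted = (predicted_rain, 1.0 - predicted_rain)
    # Update: weight each state by how likely the observation is in that state
    if umbrella_seen:
        likelihood = (p_umb_given_rain, p_umb_given_dry)
    else:
        likelihood = (1.0 - p_umb_given_rain, 1.0 - p_umb_given_dry)
    return normalize((likelihood[0] * predicted[0], likelihood[1] * predicted[1]))


def filter_rain(observations, prior=(0.5, 0.5)):
    """Return the filtered belief (rain, no rain) after each observation."""
    belief = prior
    history = []
    for seen in observations:
        belief = forward_step(belief, seen)
        history.append(belief)
    return history


beliefs = filter_rain([True, True])  # umbrella seen on day 1 and day 2

if __name__ == "__main__":
    for day, (rain, dry) in enumerate(beliefs, start=1):
        print(f"day {day}: P(rain) = {rain:.3f}, P(no rain) = {dry:.3f}")
```

Running it gives P(rain) of about .818 after the first umbrella and about .883 after the second, matching the slides.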