Designing a
Safe Motivational System
for Intelligent Machines
Mark R. Waser
Inflammatory Statements
• Human intelligence REQUIRES ethics
• All humans want the same things
• Ethics are universal
• Ethics are SIMPLE in concept
• Difference in power is irrelevant (to ethics)
• Evolution has “designed” you to disagree with the above five points
Definitions (disguised assumptions)
• Human – goal-directed entity
• Goals – a destination OR a direction
• Restrictions – conditional overriding goals
• Motivation – incentive to move
• Actions – determined by goals + motivations
• Path (or direction)
• Preferences, Rules-of-Thumb and Defaults
• Ethics (the *goal* includes the path)
• Safety
Asimov's 3 Laws:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given to it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
http://www.markzug.com/
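The Three Laws form a strict priority ordering: a lower-numbered law always overrides a higher-numbered one. That lexicographic structure (sketched here with invented action names and effect flags; deciding whether an action actually "harms a human" is the hard part) can be illustrated as:

```python
# Toy example: each candidate action tagged with its law-relevant effects.
# The action names and boolean flags are purely illustrative placeholders.
actions = {
    "push human out of danger, damaging self": dict(harm=False, obey=True, self_ok=False),
    "obey order that injures a bystander":     dict(harm=True,  obey=True, self_ok=True),
    "refuse order, stay safe":                 dict(harm=False, obey=False, self_ok=True),
}

def choose(actions):
    # Lexicographic priority: First Law > Second Law > Third Law.
    # Python compares tuples element by element, so "no harm" dominates
    # "obeys orders", which in turn dominates "preserves itself".
    def score(name):
        a = actions[name]
        return (not a["harm"], a["obey"], a["self_ok"])
    return max(actions, key=score)

print(choose(actions))  # -> "push human out of danger, damaging self"
```

The First Law eliminates the harmful action even though it obeys an order, and the Second Law then prefers obedience over self-preservation.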
Four Possible Scenarios
• Asimov’s early robots (little foresight, helpful but easily confused or conflicted)
• Immediate shutdown/suicide
• VIKI from the movie “I, Robot” (generalize to “bubble-wrapping” humanity)
• Asimov’s late robots (further generalize to self-exile with invisible continuing assistance)
goals & motivations

SIAI’s Definitions
• Friendly AI – an AI that takes actions that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent; nice rather than hostile
• Coherent Extrapolated Volition of Humanity (CEV) – “In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together.”
SIAI’s First Law
An AI must be
beneficial to humans and humanity
(benevolent rather than malevolent)
But . . .
What is beneficial?
What are humans and humanity?
Value Formula
Values (good/bad) are *entirely* derivative/relative with respect to some goal (CEV)

Value = f(x, y)
where
x is a set of circumstances (world state),
y is a set of (proposed) actions, and
f is an evaluation of how well your goal is advanced

Value = f(x, y, t, e)
where, additionally,
t is the time point at which goal progress is judged, and
e is the set of entities which the goal covers
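The extended formula Value = f(x, y, t, e) can be sketched as a higher-order function. Everything concrete below (the types, the toy goal, the action tuples) is a hypothetical placeholder; the slides only specify the formula's arguments.

```python
# Sketch of the extended value formula Value = f(x, y, t, e).
# The goal itself is passed in as a callable, since values are
# *entirely* relative to some goal (per the slide above).

def value(x, y, t, e, goal_progress):
    """Evaluate proposed actions y in world state x.

    x: dict describing the circumstances (world state)
    y: list of proposed actions
    t: time point at which goal progress is judged
    e: set of entities which the goal covers
    goal_progress: callable scoring how well the goal is advanced
    """
    return goal_progress(x, y, t, e)

# Toy goal: count how many covered entities have been "helped" by time t.
def toy_goal_progress(x, y, t, e):
    helped = {ent for (action, ent, when) in y if action == "help" and when <= t}
    return len(helped & e)

score = value(
    x={"world": "initial"},
    y=[("help", "alice", 1), ("help", "bob", 5), ("harm", "carol", 2)],
    t=3,
    e={"alice", "bob", "carol"},
    goal_progress=toy_goal_progress,
)
print(score)  # only "alice" is helped by t=3 -> 1
```

Note how the same actions y would score differently under a different t (judge at t=5 and "bob" counts too) or a different e, which is exactly why those two arguments were added.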
Questions
• Is this moral relativism?
• Are values complex?
• Must our goal (CEV) be complex?
Copernicus!
Assume that beneficial was a relatively simple formula (like z² + c)
Mandelbrot set
Color Illusions
Assume further that we are trying to determine that formula (beneficial) by looking at the results (color) one example (pixel) at a time
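The analogy can be made concrete. The Mandelbrot rule z → z² + c is a one-line formula, yet the pictures it generates are arbitrarily intricate; trying to reverse-engineer the rule one pixel at a time would be hopeless. A minimal escape-time membership test (a standard rendering technique, not something from the slides):

```python
# Escape-time test for the Mandelbrot rule z -> z^2 + c.
# A single simple formula generates arbitrarily intricate boundary
# detail -- the point of the analogy: simple rule, complex results.

def in_mandelbrot(c, max_iter=100):
    z = 0j
    for _ in range(max_iter):
        z = z * z + c          # the whole "formula": z^2 + c
        if abs(z) > 2:         # escaped: c is certainly outside the set
            return False
    return True                # never escaped: c is (likely) inside

print(in_mandelbrot(0 + 0j))   # True  -- the origin is in the set
print(in_mandelbrot(1 + 0j))   # False -- iterates escape quickly (1, 2, 5, ...)
```

Judging "beneficial" case by case is like judging membership pixel by pixel: the underlying rule can be simple even when every individual verdict looks bewildering.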
Current Situation of Ethics
• Two formulas (beneficial to humans and humanity & beneficial to me)
• As long as you aren’t caught, all the incentive is to shade towards the second
• Evolution has “designed” humans to be able to shade to the second (Trivers, Hauser)
• Further, for very intelligent people, it is far more advantageous for ethics to be complex
Definition
Ethics
*IS*
What is beneficial for the community
OR
What maximizes cooperation
Goal(s)/Omohundro Drives
1. AIs will want to self-improve
2. AIs will want to be rational
3. AIs will try to preserve their utility
4. AIs will try to prevent counterfeit utility
5. AIs will be self-protective
6. AIs will want to acquire resources and use
them efficiently
GDEs
“Without explicit goals to the contrary, AIs are likely to behave like human sociopaths in their pursuit of resources.”
7. GDEs (goal-directed entities) will want cooperation and to be part of a community
8. GDEs will want FREEDOM!
Humans . . .
• Are classified as obligatorily gregarious because we come from a long lineage for which life in groups is not an option but a survival strategy (Frans de Waal, 2006)
• Evolved to be extremely social because mass cooperation, in the form of community, is the best way to survive and thrive
• Have empathy not only because it helps to understand and predict the actions of others but, more importantly, because it prevents us from doing anti-social things that will inevitably hurt us in the long run (although we generally won’t believe this)
• Have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our subconscious “sense of morality”
Circles of Morality/Moral Sombrero
Relationships and Loyalty
Redefining Friendly Entity
• Friendly Entity (“Friendly”) – an entity with goals and motivations that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent
• Friendly Entity (“Friendly”) – an entity with goals and motivations that are, on the whole, beneficial to the community of Friendlies (i.e. the set of all Friendlies, known or unknown); benevolent rather than malevolent
Friendliness’s First Law
An entity must be
beneficial to the community of Friendlies
(benevolent rather than malevolent)
But . . .
What is beneficial?
What are humans and humanity?
What is beneficial?
• Cooperation (minimize conflicts & frictions)
• Omohundro drives
• Increasing the size of the community (both growing and preventing defection)
• To meet the needs/goals of each member of the community better than any alternative (as judged by them — without interference or gaming)
What is harmful?
• Blocking/Perverting Omohundro Drives
• Lying
• Single-goaled entities
• Over-optimization (achievable top-level goals)
• The fact that we do not maintain our top-level goal and have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our “moral sense”
OPTIMAL < community’s sense of what is correct (ethical)

This makes ethics much more complex because it includes the cultural history
The anti-gaming drive to maintain utility adds friction/resistance to the discussion of ethics
ONE non-organ donor + avoiding a defensive arms race > SIX dying patients

Credit to: Eric Baum, What Is Thought?
Triangle
[Diagram: triangle linking CEV (LOGICAL VIEW), GOAL(S), and ACTIONS; stimuli implement moral rules of thumb]
Sloman’s architecture for a human-like agent (Sloman 1999)
Inflammatory Statements
• Human intelligence REQUIRES ethics
• All humans want the same things
• Ethics are universal
• Ethics are SIMPLE in concept
• Difference in power is irrelevant (to ethics)
• Evolution has “designed” you to disagree with the above five points
Next . . . .
CEV Candidate #1:
We wish that all entities were Friendlies

Necessary? Sufficient/Complete? Possible?

Copies of this PowerPoint are available from [email protected]