Experimental Studies of Integrated Cognitive Systems

Pat Langley
Computational Learning Laboratory
Center for the Study of Language and Information
Stanford University, Stanford, California

Elena Messina
Intelligent Systems Division
National Institute of Standards and Technology
Gaithersburg, Maryland

Thanks to David Aha, Michael Genesereth, and Barney Pell. This work was funded in part by DARPA IPTO, which is not responsible for the points made herein.

Experimentation in Artificial Intelligence

Controlled experiments are the primary evaluation tool in modern AI, including the subfields of:
    supervised learning and reinforcement learning;
    generative planning and scheduling;
    computational linguistics and text processing;
but not for work on integrated cognitive systems.

Extending experimental methods to the latter is crucial, since it deals with the ultimate goals of artificial intelligence.

Challenges for Experimentation

The reasons that experiments with integrated cognitive systems have lagged behind are clear from the phrase itself:
    systems are harder to evaluate than component algorithms;
    cognitive methods involve complex, multi-step reasoning;
    integrated software relies on interactions among components.

Together, these factors have slowed the development and wide acceptance of an experimental framework.

In this talk, we propose the key elements of an experimental method for the study of integrated cognitive systems.

Dependent Variables: Basic Measures

Dependent variables in an experiment measure system behavior. Some basic measures of integrated cognitive systems include:
    success or failure on a given problem;
    speed or efficiency of the system’s response;
    desirability or quality of the system’s response.

Such metrics provide the building blocks for more sophisticated and informative measures of behavior.

Dependent Variables: Combined Measures

Statistics tells us we should not draw conclusions from one case.
Collecting multiple samples supports combined measures like:
    average behavior of the system;
    cumulative behavior of the system;
    variance of the system’s behavior.

Combined measures also partly cancel variation due to unknown or uncontrolled factors.

However, this requires some population from which samples are drawn, which one should always specify clearly.

Dependent Variables: Higher-Order Metrics

Combined measures present only a small window on behavior. However, one can also derive higher-order measures such as:
    the slope and intercept with respect to a control system;
    the intercept, rate, and asymptote of a learning curve.

Such metrics let one summarize behavior even when variation across samples is not systematic.

Conclusions about higher-order measures are more important than ones about basic or combined variables.

Independent Variables: Task Characteristics

Independent variables in an experiment reflect factors thought to influence system behavior. An important class of factors are domain or task features like:
    the complexity of the environment;
    the difficulty of achieving a given task;
    the resources available for pursuing the task.

Experiments that vary these factors reveal how the intelligent system’s behavior depends on them.

Synthetic domains let one alter such variables systematically, but it is crucial that they be similar to natural domains.

Independent Variables: System Characteristics

Another important class of variables involves system features. Varying these factors leads to different types of experiments:
    parametric studies (altering system parameters);
    lesion studies (removing a system component);
    replacement studies (replacing one module with another).

Such experiments suggest ways that the intelligent system’s behavior depends on its parameters and components.

Studies that vary two or more factors can reveal interactions among them.
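
As an illustration of how a lesion study and the combined measures above fit together, here is a minimal Python sketch. It is not taken from the talk: the function names `run_system` and `lesion_study`, the "planner" component, and the success model are all hypothetical stand-ins chosen for the example. Problems are sampled from an explicitly stated population, and the full and lesioned variants are run on the same sample so that uncontrolled variation affects both equally.

```python
import random
import statistics


def run_system(difficulty, planner_enabled, luck):
    """Hypothetical stand-in for one trial of an integrated system.

    Returns 1 on success and 0 on failure. The success model is an
    assumption made for illustration, not a model of any real system.
    """
    p_success = 0.9 - 0.5 * difficulty
    if planner_enabled:
        p_success += 0.3  # assumed benefit of the component under study
    p_success = max(0.0, min(1.0, p_success))
    return 1 if luck < p_success else 0


def lesion_study(n_problems=200, seed=0):
    """Compare the full system with a lesioned variant on the same sample."""
    rng = random.Random(seed)
    # Population stated explicitly: difficulty uniform on [0, 1).
    # Each problem also gets one fixed random draw ("luck") so that both
    # variants face identical conditions (a paired design).
    problems = [(rng.random(), rng.random()) for _ in range(n_problems)]

    report = {}
    for label, planner_enabled in (("full", True), ("lesioned", False)):
        outcomes = [run_system(d, planner_enabled, u) for d, u in problems]
        report[label] = {
            "mean": statistics.mean(outcomes),          # average behavior
            "cumulative": sum(outcomes),                # cumulative behavior
            "variance": statistics.pvariance(outcomes)  # variance of behavior
        }
    return report


if __name__ == "__main__":
    for label, measures in lesion_study().items():
        print(label, measures)
```

The same loop could serve as a parametric study by iterating over values of a system parameter instead of a component switch, or as a replacement study by swapping in a different stand-in for the component.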
Independent Variables: System Knowledge

A third class of factors concerns the knowledge and experience of the intelligent system. One can adapt lesion and replacement studies to examine:
    the presence or absence of types of knowledge;
    the amount of knowledge about a given subject;
    the amount of experience with a class of tasks.

Such experiments let one plot behavioral measures as a function of knowledge and experience (learning curves).

They also let one compute higher-order measures such as rate of improvement and asymptotic performance.

Repositories for Cognitive Systems

Public repositories are now common among the AI subfields, and they offer clear advantages for research by:
    providing fast and cheap materials for experiments;
    supporting replication and standards for comparison.

However, they can also produce undesirable side effects by:
    focusing attention on a narrow class of problems;
    encouraging a ‘bake-off’ mentality among researchers.

To support research on cognitive systems, we need testbeds and environments designed to evaluate general intelligence.

Desirable Characteristics of Testbeds

Testbeds that are designed to support research on integrated cognitive systems should:
    include a variety of domains to ensure generality;
    be well documented and simple for researchers to use;
    have standard formats to ease interface with systems.

However, these features are already present in many existing repositories, and more work is necessary.

Desirable Characteristics of Testbeds (continued)

In addition, testbeds for integrated cognitive systems should:
    contain not data sets but task environments
        which support agents that exist over time
        at least some of which involve physical domains
    provide an infrastructure to ease experimentation with
        external databases (e.g., geographic information systems)
        controlled capture, replay, and restart of scenarios
        methods for recording performance measures

Also, environments should have little or no dependence on sensory processing.

Physical vs. Simulated Environments

For domains that involve external settings, one can use either a physical or a simulated environment for evaluation.

Simulated environments have many advantages, including:
    ability to vary domain parameters and physical layout;
    ease of recording traces of behavior and cognitive state.

One can make simulated environments more realistic by:
    using simulators that support kinematics and dynamics;
    including data from real sensors in analogous locations.

This approach combines the relevance of physical testbeds with the affordability of synthetic ones.

Some Promising Domains

A number of domains hold promise for the experimental study of integrated cognitive systems:
    urban search and rescue (Balakirsky & Messina, 2002);
    flying aircraft on military missions (Jones et al., 1999);
    driving a vehicle in a city (Choi et al., 2004);
    playing strategy games (Aha & Molineaux, 2004);
    general game playing (Genesereth, 2004).

Each requires the integration of cognition, perception, and action in a complex, dynamical setting.

Goals of Scientific Experimentation

Science aims not to show that one method is better than another, but to understand the reasons for complex behavior. This goal can best be achieved through experimental studies that:
    ask clear questions or test specific hypotheses;
    examine relations between behavior and independent factors;
    move beyond descriptions to explanations of phenomena.

Good experiments provide insight into the reasons that underlie system behavior.

Also, whether or not they support a hypothesis, they do not end the story, but rather suggest ideas for further studies.
Concluding Remarks

In this talk, we considered the experimental study of integrated cognitive systems, including:
    challenges posed by their distinctive characteristics;
    dependent measures that describe their behavior;
    independent variables that influence this behavior;
    the need for environments and testbeds that:
        exercise the full capabilities of integrated agents;
        evaluate their behavior at the system level;
        support studies of interactions among components.

Taking these into account will transform the study of integrated cognitive systems into a well-balanced experimental science.

End of Presentation