Normative schemas. Cognitive theory of norms.
Lukasz Lazarz, M.A. (Faculty of Law and Administration at Jagiellonian University)
Keywords: Cognitive science, Artificial Intelligence, cognitive architectures, Morality,
Ethics, Philosophy of law.
Abstract:
This article joins the outcomes of the modern cognitive sciences, especially cognitive psychology, with the most crucial problems of the philosophy of law. I try to fit the concept of a norm into cognitive frames. I strongly believe that this results in the most useful theory of norms, because such a theory operates on frames developed by cognitive science, the discipline concerned with the human mind to the greatest extent among other disciplines.
At the beginning I briefly recall the relatively short history of modern cognitive science. I then draw out the two main dimensions of any cognitive architecture: the declarative one, concerned with problems of information (mental representation, schemas, concepts), and the procedural one, concerned with behavioral problems (decision making, problem solving). I distinguish a specific kind of schema among other types of schemas and show how they work from the procedural perspective.
In the end I draw conclusions directly on the ground of the philosophy of law. I also compare the proposed theory with other theories of norms and argue that it is better than any other.
1. Development of the cognitive perspective
The human mind has always been one of the most crucial objects of scientific interest. For philosophy, psychology and anthropology the human mind has been the most important object for obvious reasons: all of them treat it as the central object of their interest. It has slowly been realized, however, that particular assumptions about the functioning of the human mind have a strong impact on many crucial issues in other sciences as well: sociology, linguistics, logic, literary studies.
At the turn of the century the development of computer science gave rise to a new idea within the sciences focusing on the human mind: the idea that the human mind may be treated as a data processing unit. Claude Shannon, in his 1937 graduate thesis and in the classic article A Mathematical Theory of Communication (Shannon, Weaver 1948), developed the foundations of the theory of binary information. In 1948 the Hixon Symposium took place, which became one of the significant events in the history of science. Mathematicians and computer scientists (among others John von Neumann, Norbert Wiener and Warren McCulloch) discussed "core behavioral mechanisms". In 1956 Noam Chomsky and George Miller participated in a workshop on information held at the Massachusetts Institute of Technology. Chomsky discussed his project of generative grammar. Miller presented his research on short-term memory, suggesting that human short-term memory has a span of approximately seven items, plus or minus two. The meeting opened up the possibility of joining the perspectives of experimental psychology, theoretical linguistics and computer simulation of cognitive processes. In the same year Shannon, Marvin Minsky and John McCarthy organized a workshop in Dartmouth with the ambitious aim of developing formalized forms of thinking. It was quickly noticed that this new research perspective opened up wide, unexplored territories of knowledge about the human mind. The optimism accompanying the origin of the cognitive sciences fostered a strong belief that all cognitive processes of human beings would soon be algorithmized. Later this optimism weakened, as many technical problems arose in simulating particular cognitive abilities, and many serious philosophical objections were raised regarding the possibility of an overall algorithmization of cognitive processes (Penrose 2001, Searle 1980).
None of these problems changed the fact that the science of the human mind was not to be the same after 1956. A new branch of science came into being: Artificial Intelligence. Artificial Intelligence has successfully algorithmized simple cognitive processes, which has been exploited in engineering for many years. The term Artificial Intelligence was coined by McCarthy during the above mentioned workshop in Dartmouth. Representatives of a more radical branch, called Artificial General Intelligence, claim to build cognitive architectures which already model all core cognitive processes (Franklin 1995). After this cognitive revolution psychology turned away from the behavioral paradigm, which had treated the human mind as a black box. Already in the 1980s more than 75% of American psychologists identified with the cognitive approach, which shows how important this approach became within psychology.
The development of computer science also created new technical possibilities for the neurobiological study of brain structure. The dynamic development of knowledge about the human mind has had a significant impact on all disciplines focusing, more or less, on problems of the mind. This remark concerns, inter alia, mathematics, logic, statistics, computer science, Artificial Intelligence, psychology, neuropsychology, neurobiology and biology.
The last bastion relatively unaffected by the cognitive revolution is the normative disciplines. Normative disciplines often focus on the most complex cognitive processes, and for many this complexity makes any attempt at their algorithmization impossible. Recently, however, the cognitive sciences more and more often cross this line and focus on problems that until now have been associated with ethics and philosophy. Moreover, I strongly believe there are further limitations that have a strong impact on the slow development of the cognitive sciences within normative territory. As has been said, despite the belief that truth liberates, there is a kind of psychological barrier which protects us against too deep a cognition of our own minds (Duch 1998). The reason is quite simple: if we come close to our dream of understanding the functioning of the human mind in a clear, algorithmized way, it will be dangerous for our whole normative system, which is based on entirely different assumptions.
2. Two dimensions of human cognitive architectures
Summing up the results of the development of the cognitive sciences, it may be said that the new approach to the human mind consists in the application of the computer metaphor, or at least in the observation that the fundamental role in the creation of mental phenomena is played by data processing. Such an approach forces two kinds of problems: the declarative problem (data representation) and the procedural problem (how to act on data). So, when we attempt to build a cognitive architecture that simulates cognitive processes, two dimensions of such an architecture have to be distinguished: the declarative and the procedural one.
Such a distinction between the declarative and procedural aspects of human cognitive processes, or of a cognitive architecture in general (hereinafter referred to as a cognitive processor), can be observed in the typical distinction between declarative and procedural memory (Ryle 1949). Declarative memory concerns facts encoded in permanent memory; facts can concern general or episodic knowledge. Declarative memory is associated with "knowledge that". Procedural memory concerns procedures for the execution of mental and motor activities, also encoded in permanent memory; procedures concern executive or cognitive skills. Procedural knowledge is associated with "knowledge how". The distinction between declarative and procedural memory is also a foundation of some cognitive architectures, such as ACT (Anderson 1995).
At the meta level, an analogous distinction can be observed between algorithm and functional architecture (Pylyshyn 1980), or between an algorithm and its implementation (Anderson 1987). In both cases the algorithm concerns data (knowledge that), both declarative and procedural, while the functional architecture or the algorithm's implementation concerns action (knowledge how).
The declarative aspect of human cognitive processes, or of a cognitive architecture in general, concerns the problem of knowledge and its components: schemas and mental representations. The procedural aspect concerns the problem of choice and its subject: actions (motor activities), attention and problem solving (mental activities).
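As a minimal sketch of this two-dimensional picture (my own illustration, not a description of any particular architecture), a cognitive processor can be thought of as two stores: a declarative memory of information units ("knowledge that") and a procedural memory of executable procedures ("knowledge how"):

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class CognitiveProcessor:
    # Declarative dimension: "knowledge that" - stored information units (chunks).
    declarative_memory: List[str] = field(default_factory=list)
    # Procedural dimension: "knowledge how" - executable condition/action procedures.
    procedural_memory: List[Callable[[], None]] = field(default_factory=list)
```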
Taking the above into consideration, I describe below the problem of normativity within the frame of such a two-dimensional cognitive framework.
3. Declarative dimension of the cognitive processor – knowledge representation
3.1. Declarative dimension in general - declarative knowledge
3.1.1. Chunks as indivisible information units
At the beginning I would like to propose a definition of "chunk", a concept fundamental to the declarative problem of the cognitive processor and to the considerations that follow.
A chunk shall mean a single unit of information derived by a cognitive processor, being a basis for any of its choices and actions in the procedural dimension.
The concept of a chunk was first proposed by Miller, who presented the idea that short-term memory can only hold 5-9 chunks of information (seven plus or minus two), where a chunk is any meaningful unit (Miller 1956). A chunk can refer to digits, words, chess positions, or people's faces. The concept of chunking and the limited capacity of short-term memory became a basic element of all subsequent theories of memory. Chunks are indivisible. They may be complex, but their structural data are not available to the cognitive processor until they are abstracted from the given chunk by way of chunking.
Chunking is the process of deriving information from the cognitive processor's sources: the senses, memory, emotionality and information on its own actions. The senses and memory are obvious sources for a cognitive processor, and it is also easy to imagine the possibility of deriving information on one's own actions. Emotionality as a source of chunking is a slightly more complicated issue; however, it is only a technical and configurational problem. The implementation of a given emotionality, understood as a motivational module, will have a crucial impact on the subsequent activity of the cognitive processor, in particular from the perspective of efficiency, but it will not change the identity of the mechanism. I will comment on this issue more widely later in this paper.
Chunks are the informational units of cognitive architectures such as ACT-R (Anderson 1995). Declarative knowledge is represented in terms of chunks, which are schema-like structures consisting of an isa pointer specifying their category and some number of additional pointers encoding their contents.
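As an illustration of such a structure (a sketch only; the isa value and slot names below are hypothetical, chosen to match the "Kowalski is eating a cake" example used later in this paper):

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Chunk:
    isa: str                                              # pointer specifying the chunk's category
    slots: Dict[str, str] = field(default_factory=dict)   # additional pointers encoding content

# A chunk encoding "Kowalski is eating a cake":
eating_event = Chunk(isa="eating-event",
                     slots={"eater": "Kowalski", "eaten": "a cake"})
```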
The problem "how" information is derived into a single informational unit is a procedural
issue that will be discussed later on in this paper. For sake of simplicity, we shall assume for
the time being that any decisions of actions are undertaken on basis of calculation maximum
utility of the given action. We can call this assumption as Utility Rational Procedural
Assumption (URPA).
For the sake of conceptual unity, I want to underline that chunks can be understood as indivisible information units as presented above (e.g. "Kowalski is eating a cake" - the whole datum representing Kowalski is eating a cake, or "eating" - the whole datum representing the fact of eating, or "a cake" - the whole datum representing a cake), but also as schemas and as relations.
3.1.2. Chunks as schema structures
The idea of a schema has a number of antecedents. While its origins may be found in Kant's idea of concepts, the foundations of the modern understanding of the schema were laid by Sir Frederic Bartlett at Cambridge University. Bartlett (1932) was interested in how expectations play a critical role in how people remember and understand events in daily life. In the sixties, Piaget (1967) used schemata to understand changes in children's cognition. The modern understanding of the schema concept was built in the seventies. Schank's (1972) conceptual dependency theory uses a form of schemata to represent relational concepts; Schank and Abelson (1977) proposed a form of schemata called scripts, which contain organized sequences of stereotypical actions; Bower et al. (1979) also experimented with a similar conception of scripts, and discussed their segmentation into low-level action sequences called scenes. In artificial intelligence, Minsky (1975) proposed another schema-like structure, called frames. Frames are intended mainly for the representation of concepts, by grouping together sets of attributes, and then regrouping sets of frames in arbitrarily complex forms. According to Rumelhart (1980), schemas are well integrated parts of a semantic network.
When the cognitive processor focuses on a given object (Kowalski is eating a cake), the following single information units are chunked: "Kowalski is eating a cake", "Kowalski is eating", "eating a cake", "Kowalski", "eating", "a cake". Single chunks are connected by associative connections: "Kowalski" with "eating a cake", "Kowalski is eating" with "a cake", "Kowalski" with "eating", "eating" with "a cake". Such a set of chunks, which can itself be a single chunk as well ("Kowalski is eating a cake"), is often referred to as a schema structure.
[Figure]
As mentioned above, schemas are well integrated parts of the semantic network representing the cognitive processor's knowledge. The idea of a network representing the cognitive processor's knowledge rests on the idea of single information units (chunks) and associative connections between them (Collins, Loftus 1975). Both have strong neurobiological inspirations, referring to neural networks. When we call up the "Kowalski" concept, represented by a single chunk, we also call up its associated features, represented by other chunks such as "eating" or "eating" and "a cake", each with a given strength. This strength of association refers to the probability that the associated feature applies to the given concept ("Kowalski" is "eating" with a given probability, "Kowalski" is "eating" "a cake" with a given probability). The probabilistic character of the connections between chunks is underlined by representatives of the probabilistic view (Smith, Shoben, Rips 1973, 1974).
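As a rough sketch of this picture (my own illustration; the chunk names and strength values are hypothetical), the semantic network can be represented as chunks plus weighted associative links, where each weight is read as the probability of the associated feature:

```python
# Chunks as nodes, associative connections as weighted edges.
# Weights are read as probabilities of the associated feature (hypothetical values).
chunks = {"Kowalski", "eating", "a cake", "Kowalski is eating", "eating a cake"}

associations = {
    ("Kowalski", "eating"): 0.10,            # "Kowalski" is "eating" with probability 0.1
    ("Kowalski is eating", "a cake"): 0.10,  # ...and eats "a cake" in 1 of 10 such cases
    ("eating", "a cake"): 0.30,
}

def strength(source: str, target: str) -> float:
    """Return the associative strength (probability) from one chunk to another."""
    return associations.get((source, target), 0.0)
```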
Such a set of chunks can be characterized along the vertical and horizontal dimensions of categories (Rosch 1978). The vertical dimension of a category concerns the generality of the concept (the connection between the concepts "Man" and "Kowalski"): the more general the level of description, the less data is needed (all the sensory data needed to describe the "Man" concept are included in the more detailed set of data in the case of the "Kowalski" concept). The horizontal dimension of categories concerns the specimens of a concept existing at the higher level of generality: the more typical a specimen of the given category (concept), the higher the probability that calling (chunking) this concept brings with it the calling (chunking) of the chunk representing this specimen.
Such a set of chunks can also be analyzed as consisting of a permanent and a variable part. A schema is like a play or a game (Rumelhart 1980): the screenplay represents the permanent part of a schema (the core), the actors its variable part (the tracks). Similarly, Minsky distinguishes two levels of his frames (schemata), where the lower part (the slots) consists of variable data.
3.1.3. Chunks as relations
Earlier work by Collins and Quillian (1969) assumed that chunks are nodes of a semantic network connected by relations that also carry semantic information (the node "Kowalski" and the relation "eating"). Difficulties in distinguishing between chunks and relations caused modifications of the theory of semantic networks. As a result, in Collins and Loftus relations are defined as associative connections having a given strength, and previous relations such as "eating" are properly classified as nodes (chunks).
The above shows another understanding of chunks: as relations between other chunks that are connected with each other via those given chunks. For example: the chunk "Kowalski" stands in the chunk relation "eating" to the chunk "a cake". If we see "Kowalski is eating" once for every ten times we see "Kowalski", the probability of "Kowalski" being "eating" amounts to 10%. If we see "Kowalski is eating a cake" once for every ten times we see "Kowalski is eating", the probability of "Kowalski" standing in the "eating" relation to "a cake" amounts to 1%. This functional, relational role of chunks is very important in learning processes.
Synthetic reasoning can be understood, from the symbolic view, as a process of repeating schemas. This leads to an increase in the probability of a given relation (represented as a connecting chunk or a connecting set of chunks) between two other chunks (induction). From the connectionist view, synthetic reasoning can be understood as an increase of the weights between the chunks representing the connection between the given two chunks.
Analytic reasoning can be understood, from the symbolic view, as the deduction of one chunk standing in a given relation to another chunk (or chunks) with a given probability. For example, when we see "Kowalski" but we see neither "Kowalski is eating a cake" nor "Kowalski is not eating a cake" (we have no knowledge about the fact of Kowalski eating a cake), the higher the probability of such a relation previously induced, the higher the probability that such a chunk in the given relation is chunked again. From the connectionist view, analytic reasoning can be understood as a spread of activation. For example: when we see "Kowalski", the "Kowalski" chunk is activated; the bigger the weights connecting it with the further chunks "is eating" and "a cake", the more intensive the spread of activation from the "Kowalski" chunk and the higher the probability that those chunks are chunked again.
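A minimal sketch of these two processes under the connectionist reading described above (the weights and the learning increment are hypothetical, chosen only for illustration):

```python
from collections import defaultdict

# Associative weights between chunks (connectionist view); values are illustrative.
weights = defaultdict(float, {("Kowalski", "is eating"): 0.4,
                              ("is eating", "a cake"): 0.2})

def induce(src: str, dst: str, increment: float = 0.1) -> None:
    """Synthetic reasoning: repeating a schema strengthens the connection."""
    weights[(src, dst)] += increment

def spread_activation(start: str, activation: float = 1.0) -> dict:
    """Analytic reasoning: activation spreads from a seen chunk along weighted links."""
    activations = {start: activation}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for (src, dst), w in weights.items():
            if src == node and dst not in activations:
                activations[dst] = activations[node] * w
                frontier.append(dst)
    return activations

induce("Kowalski", "is eating")        # seeing "Kowalski is eating" once again
print(spread_activation("Kowalski"))   # e.g. {'Kowalski': 1.0, 'is eating': 0.5, 'a cake': 0.1}
```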
3.2. Conclusions
The concept of chunks, their understanding as schema structures, and their functional role as relations are crucial for the representation of knowledge available for further data processing. The cognitive view presented above is the modern approach to the first of the two fundamental dimensions of the cognitive processor: the declarative one.
4. Declarative procedural knowledge
4.1. Declarative procedural schemas (Normative schemas)
As previously said, chunking is the process of deriving information from the cognitive processor's sources: the senses, memory, emotionality and information on its own actions. In this section I focus on schemas in which the crucial role is played by chunks representing information on the processor's own actions and information on emotions (following its own actions). Such chunks shall be referred to as declarative procedural schemas (or normative schemas).
[Figure]
The model declarative procedural schema (normative schema) consists of a conditional chunk (or chunks), a chunk (or chunks) representing knowledge of the processor's own actions, and a chunk (or chunks) representing information on emotions (following its own actions). As the conditional chunk plays a descriptive part, no remarks beyond those raised in the section above need to be made about it. The problem starts with the chunks representing knowledge of one's own actions and information on emotions.
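Under these assumptions, a normative schema can be sketched as a triple of chunks; the field names and example values below (a "cake" condition, an "eat it" action, a positive emotional payoff) are hypothetical and serve only to make the structure concrete:

```python
from dataclasses import dataclass

@dataclass
class NormativeSchema:
    condition: str    # conditional chunk: descriptive state of affairs
    own_action: str   # chunk representing knowledge of the processor's own action
    emotion: float    # emotional (utility) chunk following the action; sign = polarity

# Hypothetical example: "when offered a cake, eating it brings a positive emotion".
schema = NormativeSchema(condition="offered a cake",
                         own_action="eat the cake",
                         emotion=+0.7)
```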
4.2. Emotional (utility) chunks
Emotions are one of the cognitive processor's sources for the chunking process. In general, emotions shall be understood as the cognitive processor's information on its internal state, that is, information on the realization of the goals assumed for the given cognitive processor. For example, there is no doubt that in the case of human beings, as well as other living biological beings, the basic goal is to survive. Emotional chunks play the role of a utility function for other chunks, according to their polarity and motivational characteristic. This undoubted polarity and motivational characteristic is described with different terms: emotions or values (philosophy), emotions, needs or urges (psychology), utility, payoffs or goals (computer science).
Two things about emotional chunks have to be stressed next.
First, the problem of the appropriate configuration of the values of emotional (utility) chunks is crucial for human-level Artificial General Intelligence projects. If we aim to build a human-level cognitive architecture (cognitive processor), we would need complete knowledge of how utility chunks are set up in the case of human beings, which is equivalent to complete knowledge about the emotional consequences of any possible action. Such knowledge is unavailable for the time being, and it will not be available in the foreseeable future. The unavailability of such knowledge, and the fact that any differences in motivational configuration (the configuration of values of emotional chunks) may have a strong impact on the procedural activity of the cognitive architecture (cognitive processor), also from the point of view of effectiveness, are in my opinion among the biggest obstacles on the way to human-level AGI.
Obviously, human beings generally follow "good emotions". But "bad emotions" are also needed for our long-term emotional optimum; the very old proverb "without the bitter, the sweet ain't as sweet" expresses this very well. So in the case of human beings we have difficulties in recognizing many basic goals, although many of them are clearly specified. The difficulties are even greater because this relatively flexible set of most basic needs (goals) is associated with a much more flexible set of subgoals. For example, the basic need of affiliation may be connected with different subgoals (material status, competence, etc.) depending on time and the particular environment. However, the difficulties in the proper specification (configuration) of human goals (values of emotional chunks) for a cognitive architecture are not discussed here; there are many relevant papers on this subject. Moreover, the above mentioned configuration difficulties are not important for the considerations in this article. This is the second thing to be underlined as regards emotional chunks.
So the second thing to be noticed is that knowledge of how to set up the values of emotional (utility) chunks, how to specify the cognitive processor's needs, or how to configure its motives, is irrelevant to the essence of the mechanism of the cognitive processor. In other words, these configurational and technical issues have no impact on cognition per se. Its mechanism is the only important basis for the normative considerations carried out herein.
4.3. Declarative procedural knowledge versus procedural knowledge
The second issue important for a proper description of declarative procedural knowledge, or declarative procedural schemas (normative schemas), is to distinguish them from procedural knowledge. The distinction between declarative knowledge and procedural knowledge descends from Gilbert Ryle (1949). As previously said, such a distinction is also an expression of the declarative and procedural dimensions of cognitive processors. Declarative memory concerns facts encoded in permanent memory, general or episodic, and is associated with "knowledge that"; this kind of knowledge has a strong semantic component. Procedural memory concerns procedures for the execution of mental and motor activities, also encoded in permanent memory; procedures concern executive or cognitive skills, and procedural knowledge is associated with "knowledge how". As noted above, this distinction is also a foundation of some cognitive architectures, such as ACT (Anderson 1995).
The idea underlying this distinction is obvious. We can have all the necessary information about what to do, and yet that is still not enough to do it. First we have to acquire a skill, to train the performance of the method, even if we have complete declarative knowledge about it. There is no doubt that the acquisition of declarative knowledge and of procedural knowledge differ, but there is no reason to claim that they are stored in different ways, that they have different internal structures, or that their extraction methods are different (Nęcka 2008). One form of knowledge is inadequate without the other, and thus choosing one form of knowledge over the other for any reason would lead to philosophical falsehoods or, worse, to serious neglect. There has been such a neglect of procedural knowledge over the course of history, rooted in our human predilection for language on the one hand and pictures on the other. Add to this that we are still puzzled over the nature of time; it has only recently, and reluctantly, been incorporated into our understanding of the physical world as an equal to space. Processes are hard to perceive, hard to understand, and, in the end, get the short end of the theoretical stick. We are much happier with observable measurement - the visual realm again - and static relationships between relatively fixed quantities (Hartley ?).
To keep the unity of both kinds of knowledge while recognizing their obvious difference, which can be expressed with another old proverb, "knowing the path is not walking the path", I would propose a cognitive processor that is not able to abstract complex chunks (schemas) concerning information on its own acts (actions) into simpler ones. If we assume that only such simple schemas are available for execution by the cognitive processor, then declarative knowledge of how to do a certain thing cannot, below some level, be executed directly from that declarative knowledge, and the cognitive processor has to learn how to "translate" the available simple chunks into complex ones. This mechanism seems to frame the question of the difference between declarative and procedural knowledge. Still, the storage and internal structures remain the same in both cases.
Declarative procedural (normative) schemas (as declarative procedural knowledge) are part of declarative knowledge, as they concern more generally described knowledge that is acquired not by way of training, as in skill acquisition, but by way of learning. The figure ... shows visually the relationships between the kinds of knowledge described above: declarative, declarative procedural (normative) and procedural.
[Figure]
5. Procedural dimension of the cognitive processor
5.1. Procedural dimension in general
The question arises: what is the value of such considerations of knowledge representation, in particular as regards normative schemas, if it is not certain that they can be fruitfully used within an acting cognitive architecture (processor)? This question becomes a gate to the second dimension of the cognitive framework - the procedural one. As previously said, the procedural problem is also expressed in the distinction between the kinds of knowledge described above: declarative and procedural.
Although the procedural problem is not the main issue here, I want to show briefly the declarative dimension of the cognitive processor in action, which requires some general procedural remarks.
The cognitive processor in action uses its devices, such as the senses (data input device), memory (data storage device) and motor abilities (output device). It controls them by choices which (as we assumed) are generally taken on the basis of the Utility Rational Procedural Assumption (URPA), commented on below. The subjects of these choices are procedures for the execution of mental and motor activities (productions). Productions are part of the procedural knowledge described above, associated with "knowledge how". They are the low-level schemas described above, consisting of the model conditional, information-on-own-actions and emotional (utility) chunks. This cooperation of the choosing mechanism and the devices is carried out within the frame of motor and mental activities (low-order cognition: attention; high-order cognition: thinking, reasoning, problem solving).
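A minimal sketch of this match-choose-execute cycle under URPA (the production contents and utility numbers below are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Production:
    condition: str              # conditional chunk matched against the current chunk
    action: Callable[[], None]  # own action to execute
    utility: float              # expected utility of the production, e.g. PG - C

def cycle(current_chunk: str, productions: List[Production]) -> None:
    """One processor cycle: match productions, choose by utility (URPA), execute."""
    matching = [p for p in productions if p.condition == current_chunk]
    if matching:
        chosen = max(matching, key=lambda p: p.utility)  # URPA: maximize utility
        chosen.action()

productions = [
    Production("offered a cake", lambda: print("eat the cake"), utility=0.7),
    Production("offered a cake", lambda: print("refuse politely"), utility=0.4),
]
cycle("offered a cake", productions)  # prints "eat the cake"
```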
5.2. Utility Rational Procedural Assumption (URPA)
According to the rational analysis of choice, cognitive processors make choices among options in decision making and problem solving situations in order to maximize their expected utilities: each option has an expected probability (P) of achieving the goal and an expected cost (C) of achieving the goal. If the value of the goal is G, the expected gain associated with that option is PG. Subtracting the cost from this gives the expected utility PG - C. The rational analysis claims that subjects choose the option with the highest PG - C. This might seem to fly in the face of the conventional wisdom that people do not behave so as to maximize their expected utilities (Dawes, 1988). For instance, in a simple probability matching situation one might expect subjects always to guess the more probable outcome, which would maximize the number of correct guesses. In contrast, subjects will often choose the more probable alternative with a probability that approximates the empirical probability (Kintsch, 1970). Lovett (1988) accounted for such deviations by assuming that there is some randomness in the estimation of these expected values.
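As a worked illustration with hypothetical numbers: for an option with P = 0.8, G = 10 and C = 3, the expected gain is PG - C = 0.8 x 10 - 3 = 5, so it would be preferred over an option with P = 0.95, G = 10 and C = 6, whose expected gain is only 3.5.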
Productions are fired (the own actions described by a procedural schema are executed) when their conditional chunks match the currently chunked data (chunk). In the typical situation, when there are many productions whose conditional chunks match the currently chunked data, a conflict rule is needed. The conflict resolution process selects among the various productions which control the cognitive processor's behavior. As we assumed, the choice among competing productions is generally made according to the Utility Rational Procedural Assumption (URPA), i.e. by their expected utility calculated as U = PG - C.
In ACT-R theory each production rule is chosen with a probability that reflects its expected gain E(i) relative to the competitors' expected gains E(j). ACT-R chooses the production with the highest expected gain, but because of noise in the evaluation, the production with the highest expected gain is chosen only a certain proportion of the time. The Conflict Resolution Equation describes the probability that a production with expected gain E(i) will have the highest noise-added expected gain, where t controls the noise of the evaluation. The evaluations of expected gain are computed as the quantity E = PG - C, where P is the estimated probability of achieving the production's goal, G is the value of the goal and C is the cost to be expected in reaching the goal. P, the estimated probability of eventual success in attaining the goal, is decomposed into two parts: P = qr, where q is the probability that the production under consideration will achieve its intended next state, and r is the probability of achieving the production's goal given arrival at the intended next state. For practical reasons we can take q as 1, leaving r as the main quantity to estimate. Under this constraint the r parameter is the important quantity for determining the choice among competing productions. When a production's parameter r is low, it implies that the production tends not to lead to the goal even when it leads to its intended next state; this low r value will be reflected in a low P value, which will lead the production to have a low expected gain. In contrast, a production with a high likelihood of leading to its goal will have a higher estimated probability of achieving the goal and hence a higher expected gain evaluation.
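The Conflict Resolution Equation referred to above is standardly given in the ACT-R literature as a softmax over noisy expected gains; in that standard form, the probability of choosing production i is
Probability(i) = e^(E(i)/t) / Σj e^(E(j)/t)
where the sum runs over all matching productions j and t controls the noise of the evaluations.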
In ACT-R the value of the production's r parameter is estimated as:
r = successes / (successes + failures)
A very important refinement within the ACT-R theory of choice is the implementation of decay of the success and failure experiences used in computing the expected gain. In other words, the more time has elapsed since the last successful use of a production rule, the lower its r parameter shall be:
r(t) = successes(t) / (successes(t) + failures(t))
where successes(t) and failures(t) are the past successes and failures weighted by their age, and t(j) is defined as how long ago past success or failure j occurred.
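A sketch of this estimate in code (the decay exponent d and the experience lists are hypothetical; ACT-R's own parameterization may differ in detail):

```python
from typing import List

def r_parameter(success_ages: List[float], failure_ages: List[float], d: float = 0.5) -> float:
    """Estimate r = successes / (successes + failures), with each past experience
    discounted by its age t_j, so that older experiences count for less."""
    successes = sum(t_j ** -d for t_j in success_ages)
    failures = sum(t_j ** -d for t_j in failure_ages)
    total = successes + failures
    return successes / total if total > 0 else 0.5  # neutral prior with no experience

# Hypothetical example: three successes (1, 5, 20 time units ago) and one recent failure.
print(r_parameter(success_ages=[1.0, 5.0, 20.0], failure_ages=[2.0]))
```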
For example, SOAR-RL in each cycle uses the Boltzmann equation to select an action from the set of possible actions:
P(a) = e^(Qt(s,a) x B) / Σb=1..n e^(Qt(s,b) x B)
P(a) is the probability of action a being chosen. Qt(s,a) is the estimated value of performing action a in state s at time t. The equation returns the probability of action a being chosen out of the set of n possible actions, based on the agent's Q values for those actions and on the variable B, called the inverse temperature. The B parameter controls the greediness of the Boltzmann probability distribution. If B -> 0, each action has the same probability (1/n) of being selected. In other words, a low B corresponds to random behavior and a high B to greedy behavior (Hogewoning).
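A sketch of this selection rule (the Q values and B below are hypothetical; this is the generic Boltzmann/softmax rule, not SOAR-RL's own code):

```python
import math
import random
from typing import Dict

def boltzmann_select(q_values: Dict[str, float], B: float) -> str:
    """Select an action with probability proportional to exp(Qt(s,a) * B)."""
    actions = list(q_values)
    weights = [math.exp(q_values[a] * B) for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]

# Hypothetical Q values: with a high B the greedy action dominates,
# with B close to 0 the choice approaches uniform (1/n).
print(boltzmann_select({"eat the cake": 0.7, "refuse politely": 0.4}, B=5.0))
```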
One thing has to be stressed strongly here, however much resistance it meets: there is no cognitive evidence that any factors other than utility calculation and some randomness are needed to simulate the behavior of any cognitive processor, including the behavior of human beings. No matter how hard it is to define a human-level utility aspect (as discussed above), and regardless of what randomness is from a philosophical point of view (and of the fact that we can obtain only quasi-random values), these two concepts are much better specified than any other metaphysical concepts (such as free will). They are also sufficient to describe the reasons for any psychological behavior (behavior which is the subject of psychological research).
6. Theory of norms
6.1. Definition of norms
There are many possible definitions of norms, depending on which of their aspects we would like to stress. If we say that legal norms are sentences written in codes, we stress their material aspect; in fact, we lose all their other aspects. If we build a theory of norms, we have to answer the question of which function such a theory should most importantly fulfill. The answer follows from the above example. A good theory of norms shall effectively characterize all possible aspects of norms, in the given example legal norms. By "effectively" I understand two things: the theory shall stress the distinguished aspects of the norm phenomenon proportionally (more important or more frequent aspects are stressed more strongly), and it shall use a minimum of terms to describe the norm phenomenon reasonably widely.
6.2. Cognitive theory of norms (CTN)
Norms are declarative procedural knowledge (normative knowledge). Below I show how CTN performs the functions described in section 6.1.
6.3. Comparison with other theories