Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
1 Self-organization in Normal and Abnormal Cognitive Development Denis Mareschal Centre for Brain and Cognitive Development, Birkbeck College, University of London, Malet Street, London [email protected] Michael S. C. Thomas Neurocognitive Development Unit, Institute of Child Health, 30, Guilford St., London [email protected] To appear in: The Handbook of Brain and Behaviour in Human Development. 2 Running head: Self-organization in cognitive development Abstract This chapter discusses self-organization as a motor for cognitive development. Selforganization occurs in systems with many degrees of freedom and is ubiquitous in the brain. The principal means of investigating the role of self-organization in cognitive development is through connectionist computational modeling. Connectionist models are computer models loosely based on neural information processing. We survey a range of models of cognitive development in infants and children and identify the constraints on self-organization that lead to the emergence of target behaviors. A survey of connectionist models of abnormal cognitive development illustrates how deviations in these constraints can lead to the development of abnormal behaviors. Special attention is paid to models of development in autistic children. 3 We have come a long way in understanding the processes that underlie brain development since the days of Piaget’s attempts to relate cognitive development to an unfolding biological substrate (e.g., Piaget, 1971; 1980). Developmental Cognitive Neuroscience is a new field of research that addresses that very issue. The aim of this field is to bridge the gap between children’s cognitive development (as assessed by behavioral studies) and the underlying development of the brain (Johnson, 1997). Although the age-old debate concerning the relative importance of nature and nurture in determining development rages on, it has recently taken a new twist. Few people now claim that innate knowledge is hard-wired in a priori neural connections (representational innateness). Rather, it is generally accepted that both nature and nurture play a role in children’s cognitive development. The pivotal question that remains is the extent to which plasticity dominates development and the extent to which structural constraints are genetically determined such that experience only plays a limited role in fine-tuning these structures (Elman, Bates, Johnson, Karmiloff-Smith, Parisi, and Plunkett, 1996). This chapter will explore how the concept of self-organization can provide an account of behavioral development in infants and children. Along the way it will explore how constraints (or boundary conditions) guide self-organization. An important tool for exploring self-organization in cognitive development is connectionist computational modeling. Connectionist network models (or artificial neural network models) are computer models loosely based on neural information processing. These models allow us to explore how different system constraints interact with an environment to give rise to 4 observed system behaviors. They also provide a means of exploring how deviations in self-organization (due to a shift in boundary conditions) can result in the emergence of abnormal behaviors. In the rest of this chapter we begin by discussing self-organization in the brain. We then turn to discussing self-organization in cognitive development. Following this, connectionist modeling is introduced as a means of investigating self-organization in development. The two subsequent sections review connectionist models of normal cognitive development and abnormal cognitive development. Self-organization in the brain Many chapters in this handbook provide examples of self-organization. Selforganization occurs when structure emerges in response to a system’s dynamic interactions with an environment. Self-organization is a fundamental characteristic of the brain (Willshaw and von der Malsburg, 1976; Grossberg, 1982; Changeux, Heidman and Patte, 1984; Edelman and Finkel, 1984; von der Malsburg, 1995; Leslo, 1995). It can occur at several time scales: a learning time scale of hours and days, a developmental time scale of months and years, but also on the functional time scale of seconds and minutes. All stages of brain organization involve an element of self-organization (Leslo, 1995; Johnson, 1997). It is unlikely that the genes can (in any direct way) encode the full information necessary to describe the brain (Elman et al. 1996). Given that the cerebral cortex alone contain some 1014 synapses, and given the variability of vertebrate brain structure, it is difficult to see how individual wiring diagrams could be encoded within the limited coding space of the genome (von der Malsburg, 1995). 5 There are many well-studied examples of self-organization in the physical and biological sciences (Prigogine and Stengers, 1986). Self-organization occurs in systems with a large number of degrees of freedom (e.g., synapses in the brain). Initially the system is undifferentiated (randomly organized) but, as a result of small adaptive changes, an order begins to emerge among the elements of the system. These changes can self-amplify, resulting in a form of positive feedback. If there is a limitation in resources, this limitation can lead to competition and selection among the changes. Finally, changes can co-operate, enhancing the “fitness” of some changes over the others in spite of competition. With respect to self-organization in the brain, synaptic adjustment rules (such as the Hebbian learning rule; Hebb, 1949) can lead to ordered connection patterns that in turn lead to structured behaviors. A fundamental characteristic of self-organizing systems is that global order can arise from local interactions. This is extremely important in the brain where local interactions between cellular neighbors create states of global order and ultimately generate coherent behavior. However, we should be careful in what we understand by local here, because nerve cells are connected by long axons. Local neural interactions are not necessarily topologically arranged (von der Malsburg, 1995). Connected cells can be neighbors although they are physically located at other ends of the brain. One implication of this is that some ordered structures in the brain may not initially “look” ordered to our eyes because we rely heavily on spatial contiguity to perceive patterns. There are two relevant parameters in network self-organization. The first is the information or activation transmitted through the network (cf. action potentials). The second is the connection strength between successive units (cf. synaptic strength). 6 Connections control neural interactions and are characterized by a continuous weight variable. These connections reflect the size of the effect exerted on one unit by another unit. Organization in the brain can therefore take place at two levels: activity and connectivity. Changes in activation levels reflect self-organization at the instantaneous, functional level whereas changes in connectivity correspond to self-organization on a learning or developmental time scale. Self-organization in cognitive development The fundamental question of cognitive development is where new behaviors come from. Traditionally, developmentalists have looked for the source of these behaviors either in the organism or in the environment. Perhaps new structures arise as a result of instructions stored beforehand in some code (e.g., the genes). Or, perhaps new behaviors are acquired by absorbing the structures and patterns of the environment directly. Oyama (1985) has suggested that both of these accounts are fundamentally preformationist in that it is either the genes or the environment that determines the nature of the structures that are developed. She argues that attributing the origin of structure to the genes or the environment simply pushes back to another level questions concerning the causal origin of these structures. Therefore, it fails to answer the fundamental question of cognitive development. Oyama further suggests that it is the concept of self-organization that rescues developmentalists from this logical hole of infinite regress. In biological systems, pattern and order can emerge from the process of interaction without the need for explicit instructions. 7 There is now ample evidence of self-organization occurring during development in both linguistic and cognitive domains. Specific examples include children’s understanding of balancing relations, (Karmiloff-Smith and Inhelder, 1971), children’s drawing abilities (Karmiloff-Smith, 1990), and language acquisition (Karmiloff-Smith, 1985). Karmiloff-Smith (1992) suggests that what drives development is the endogenous principle of “representational re-description”. Even when performance seems to be adequate, there are pressures arising from within the cognitive system to re-describe existing knowledge in more abstract and accessible forms. These pressures arise from the need to make information in one functional module accessible to another functional module. There have been several attempts to explain the apparent stage-like growth of competence in children in terms of self-organizing principles. Van Geert (1991) outlined a framework for discussing language and cognitive development as growth under limited resources. He formulated a dynamic systems model of development in terms of logistic growth equations. This model describes development as a result of supportive and competitive interactions between “cognitive growers”. Similarly, Van der Maas and Molenaar (1992) presented an account of stage transitions on conservation tasks in terms of catastrophe theory. According to this account, discrete and qualitative shifts in behavior arise as a result in continuous changes in the underlying parameters of a system. Although these models provide a good account of how new behaviors can emerge through the continuous adjustment of abstract parameters, and as a result of endogenous pressures, they rarely relate those parameters to any measurable cognitive quantities. Nor do they relate them to some underlying neurological substrate. Explicit accounts of self- 8 organization in behavior have been limited to describing how actions are elicited in infancy, or to how separate motor systems become coupled to induce higher levels of motor action (e.g., Goldfield, 1995). A more effective means of exploring selforganization in cognitive development (and to relate development to neural information processing) is to construct neurally based computer simulations of cognitive development. Connectionist computational modeling Connectionist models are computer models loosely based on the principles of neural information processing (Rumelhart and McClelland, 1986; Elman, Bates, Johnson, Karmiloff-Smith, Parisi, and Plunkett, 1996; McLeod, Plunkett, and Rolls, 1998). They are information processing models and are not intended to be neural models. They embody general principles such as inhibition and excitation within a distributed, parallel processing system. They attempt to strike the balance between importing some of the key ideas from the neurosciences while maintaining sufficiently discrete and definable components to allow questions about behavior to be formulated in terms of a high-level cognitive computational framework. From a developmental perspective, connectionist networks are ideal for modeling because they develop their own internal representations as a result of interacting with an environment (Plunkett and Sinha, 1992). However, these networks are not simply tabula rasa empirical learning machines. The representations they develop can be strongly determined by initial constraints (or boundary conditions). These constraints can take the form of different associative learning mechanisms attuned to specific information in the 9 environment (e.g., temporal correlation or spatial correlation), or they can take the form of architectural constraints that guide the flow of information in the system. Although connectionist modeling has its roots in associationist learning paradigms, it has inherited the Hebbian rather than the Hullian tradition. That is, what goes on inside the box (inside the network) is as important in determining the overall behavior of the networks as is the correlation between the inputs (stimuli) and the outputs (responses). Connectionist networks are made up of simple processing units (idealized neurons) interconnected via weighted communication lines. Units are often represented as circles and the weighted communication lines (the idealized synapses) as lines between these circles. Activation flows from unit to unit via these connection weights. Figure 1a shows a generic connectionist network in which activation can flow in any direction. However most applications of connectionist networks impose constraints on the way activation can flow. These constraints are embodied by the pattern of connections between units. ============== Insert Figure 1 about here =========== Figure 1b shows a typical feed-forward network. Activation (information) is constrained to move in one direction only. Some units (those units through which information enters the network) are called input units. Other units (those units through which information leaves the network) are called output units. All other units are called hidden units. In a feed-forward network, information is first encoded as a pattern of activation across the bank of input units. That activation then filters up through a first 10 layer of weights until it produces a pattern of activation across the band of hidden units. The pattern of activation produced across the hidden units constitutes an internal rerepresentation of the information originally presented to the network. The activation at the hidden units continues to flow through the network until it reaches the output unit. The pattern of activation produced at the output units is taken as the network’s response to the initial input. Each unit in the network is a very simple processor that mimics the functioning of an idealized neuron. The unit sums the weighted activation arriving into it. It then sets it own level of activation according to some non-linear function of that weighted sum. The non-linearity allows the unit to respond differentially to different ranges of input values. The key idea of connectionist modeling is that of collective computation. That is, although the behavior of the individual components in the network is simple, the behavior of the network as a whole can be very complex. It is the behavior of the network as a whole that is taken to model different aspect of infant development. Given the units’ response functions, the network’s behavior is determined by the connection weights. As activation flows through the network, it is transformed by the set of connection weights between successive layers in network. Thus, learning (i.e., adapting one’s behavior) is accomplished by tuning the connection weights until some stable state of behavior is obtained. Supervised networks adjust their weights until the output response (for a given input) matches a target response. The target can be obtained from an explicit teacher, or it can arise from the environment, but it must come from outside the system. Unsupervised networks adjust their weights until some internal constraint is satisfied (e.g., maximally different inputs must have maximally different 11 internal representations). Backpropagation (Rumelhart, Hinton, and William, 1986) is a popular training algorithm for supervised connectionist networks that incrementally updates the network weights so as to minimize the difference between the network's output and some desired target output. These networks self-organize in such a way as to internalize structures in the environment. Through adaptation, the connection weights come to encode regularities about the network’s environment that are relevant to a task the network must solve. Networks are very sensitive to the distribution statistics of relevant features in their environment. A feedforward network with a single layer of hidden units can approximate arbitrarily well any finite output response function, given enough hidden units (Cybenko, 1989). Further details of the similarities between connectionist network learning and statistical learning procedures can be found elsewhere (e.g., Hertz, Krogh, and Palmer, 1991). There are two levels of knowledge in these networks. The connection weights encode generalities about the problem that have been accumulated over repeated encounters with the environment. One can think of this as a form of long-term memory or category specific knowledge as opposed to knowledge about a particular task or object. In contrast, the pattern of activation that arises in response to inputs, encodes information about the current state of the world. Internal representations are determined by an interaction between the current input (activation across the input units) and previous experience as encoded in the connection weights. Many connectionist networks are very simple. They may contain some 100 units or so. This is not to suggest that the part of the brain solving the corresponding task only has 100 neurons. It is important to understand that most connectionist models are not 12 intended as neural models, but rather as information processing models of behavior. The models constitute examples of how systems with similar computational properties to the brain can give rise to a set of observed behaviors. Sometimes, individual units are taken to represent pools of neurons or cell assemblies. According to this interpretation, the activation level of the units corresponds to the proportion of neurons firing in the pool (e.g., Changeux and Dehaene, 1989). Connectionist models of cognitive development Infancy provides an excellent opportunity to model self-organizing processes because behavior is closely tied to perceptual-motor skills. We begin this section by describing two models of infant cognitive development and then turn to describing selforganizing models of children’s cognitive development. Many infant categorization tasks rely on preferential looking or habituation techniques, based on the finding that infants direct more attention to unfamiliar or unexpected stimuli (Reznick and Fagan, 1984). Connectionist autoencoder networks have been used to model the relation between sustained attention and representation construction (Mareschal & French, 1997; Mareschal & French, 2000; Mareschal, French, & Quinn, submitted; Schafer & Mareschal, in press). An autoencoder is a feedforward connectionist network with a single layer of hidden units (Figure 1b). It is called an autoencoder because it associates an input with itself. The network learns to reproduce on the output units the pattern of activation across the input units. The successive cycles of training in the autoencoder are an iterative process by which a reliable internal representation of the input is developed. 13 This approach to modeling novelty preference assumes that infant looking times are positively correlated with the network error. The greater the error, the longer the looking time, because it takes more training cycles to reduce the error. The degree to which error (looking time) increases on presentation of a novel object depends on the similarity between the novel object and the familiar object. Presenting a series of similar objects leads to a progressive error drop on future similar objects. An unusual asymmetry has been observed in natural category formation in infants (Quinn, Eimas, & Rosenkrantz, 1993; Quinn & Eimas, 1996). When 3- to 4-month-olds are initially exposed to a series of pictures of cats, they will form a category of cat that excludes dogs. We can tell this because, after habituating to a series of pictures of cats, they nevertheless show a novelty response to a picture of a dog. However, when they are exposed to a series of pictures of dogs, they will form a category of dog that does include cats. Thus, there would be no novelty response to a picture of a cat after a series of dogs. We used the autoencoder network above to explain this behavior. The original cat and dog pictures were measured along 10 dimensions and presented to the networks for categorization. The same presentation procedure was used as with the infants. These networks developed CAT and DOG categories with the same exclusivity asymmetry as the 3- to 4-month-olds. Moreover, the model predicted that learning DOG after CAT would disrupt the prior learning of CAT whereas as learning CAT after DOG would not disrupt the prior learning of CATS. A subsequent study with 3- to 4-month-olds found this to be true of infants as well (Mareschal, French, and Quinn, submitted). The asymmetry was explained in terms of the distribution of cat and dog feature values in the stimuli presented to the infants. Most cat values fell within the range of dog 14 values but the converse was not true. Thus for a system that processes the statistical distribution of features of a stimulus, the cats would appear as a subset of the dog category. Further analyses revealed that these networks could parse the world into distinct categories according to the correlation of feature values in the same way as 10-montholds have been shown to do (Younger, 1985; Mareschal & French, 2000). In short, this model demonstrates that categorical representations can self-organize in a neural system as a result of exposure to familiarization exemplars. We now turn to discussing object-directed behaviors in infancy. Newborns possess sophisticated object-oriented perceptual skills (Slater, 1995) but the age at which infants are able to reason about hidden objects remains unclear. Using manual search to test infants’ understanding of hidden objects, Piaget concluded it was not until 7.5 to 9 months that infants can understand that an object continues to exist beyond direct perception. He concluded this because infants younger than this age fail to reach successfully for an object hidden behind an occluding screen (Piaget, 1952, 1954). More recent studies using a violation of expectancy paradigm have suggested that infants as young as 3.5 months do have some understanding of hidden objects. These studies rely on non-search indices such as surprise instead of manual retrieval to assess infant knowledge (e.g., Baillargeon, Spelke, & Wasserman, 1985; Baillargeon, 1993). Infants watch an event in which some physical property of a hidden object is violated (e.g., solidity). Surprise at this violation (as measured by increased visual inspection of the event) is interpreted as showing that the infants know: (a) that the hidden object still exists, and (b) that the hidden object maintains the physical property that was violated (Baillargeon, 1993). The nature and origins of this developmental lag between 15 understanding the continued existence of a hidden object and searching for it remains a central question of infant cognitive development. An initial attempt to account for this behavior in terms of the principles of selforganization was put forward by Munakata, McClelland, Johnson, and Siegler (1997). These authors reported on a connectionist model that learned to keep track of the potential reappearance of a hidden object. The model experienced a number of events in which a screen would move past a stationary object, thereby hiding the object. With experience, the model learned to predict when the object would reappear. Knowledge about the hidden object was gradually encoded in the network connection weights in response to experience with the environment. Similar accounts of infant performance on delayed response tasks that involved reaching for a hidden object, accounts also based on the interaction and competition between neural systems, have been proposed (Deheane and Changeux, 1989; Munakata, 1988). Mareschal, Plunkett, and Harris (1999) describe a model also designed to address this question (Figure 2). This model embodies more neurophyisiological constraints than the previous models. Anatomical, neurophysiological, and psychophysical evidence points to the existence of two processing routes for visual object information in the cortex (Ungerleider & Mishkin,1982; Van Essen, Anderson, & Felleman, 1992; Goodale, 1993; Milner & Goodale, 1995). Although the exact functionality of the two routes remains a hotly debated question, it is generally accepted that they contain radically different kinds of representations. The dorsal (or parietal) route processes spatial-temporal object information, whereas the ventral (or temporal) route processes object feature information. 16 The Mareschal et al. model is more complex than the simple autoencoder networks described above. Rather than drawing individual units (as in Figure 1), each box represents a layer of units and each arrow represents a full set of connections between successive layers. The dotted lines delimit separate modules. Information enters the networks via a simplified retina. The Object Recognition Network develops a spatially invariant feature-based representation of the object (cf. the functions of the ventral cortical route) whereas the Trajectory Prediction Network develops a spatial temporal representation of the object (cf. the functions of the dorsal cortical route). The Response Integration Network recruits and co-ordinates these representations as and when required by an active, voluntary response. ============= Insert Figure 2 about here ======== Like infants, the model showed a developmental lag between expectancy and retrieval. Active tasks such as retrieval of a desired hidden object required the integration of information across the multiple object representations, whereas surprise or dishabituation tasks may only require access to one of the representations separately. A developmental lag appeared between retrieval and surprise-based tasks because of the added cognitive demands of accessing two object representations simultaneously in an active response task. The model predicted that dishabituation tasks requiring infants to access cortically separable representations would also show a development lag, as compared to tasks that only required access to one cortical representation. 17 We now turn to considering models of cognitive development in children rather than infants. The recent revival in connectionist modeling of cognitive development was triggered by the need to provide a mechanistic account of Piagetian cognitive development. Piaget described the motors of development in terms of assimilation (that new information is changed to match existing knowledge better), accommodation (that existing knowledge is changed to match new information better), and equilibration (a combination of assimilation and accommodation, Piaget, 1977). However, it was never entirely clear what these terms referred to (Boden, 1980). The first connectionist models of children’s cognitive development attempted to couch connectionist modeling in terms of assimilation and accommodations (Plunkett and Sinha, 1992). So, for example, the pattern of activation produced by the presentation of an input to a network was described as assimilation, while weight adjustment in response to an error signal was seen as accommodation (McClelland, 1989). In an alternative account, weight adjustments were seen as assimilative learning and the addition of new units (the growth of the networks) was seen as accommodation (Mareschal, 1991; Shultz, Schmidt, Buckingham and Mareschal, 1995). As a result of the focus on Piagetian development, many of the early models tried to capture the trademark of Piagetian theory: stage-like development. Perhaps the first connectionist model of cognitive development was Papert’s (1963) Genetron model. Papert used a series of simple neural networks (Perceptrons) to model how children learn to assess which of two sticks is the longest. He found that, like the children, perceptrons would focus on one end of the stick before considering both ends in conjunction. In his model, development was viewed as the hierarchical integration of successive perceptron modules. 18 A classic post-piagetian problem-solving task on which children show stage-like development is Siegler’s (1976) balance-scale task. Siegler examined children’s developing abilities to predict which side of a balance scale would tip when a set of weights was positioned at varying distances on either side of the fulcrum. Detailed analyses of behaviors across of a battery of balance-scale problems revealed that children progressed through 4 stages in the development of balance scale understanding. In stage 1 children always chose the side with the most weights as the one that would go down. In stage 2 children would attend predominantly to the weight dimension, but if there were equal weights on both sides, they would chose the side with the greatest distance as the one to go down. In stage 3, children would succeed on weight problems, but would guess in conflict problems where there were more weights on one side but greater distance on the other side. Finally, in stage 4 children would solve all problems by implicitly computing the torque on each side of the fulcrum. Performance on this task has been modeled by presenting feedforward connectionist networks with a series of balance scale problems (McClelland and Jenkins, 1991; Shultz, Mareschal and Schmidt, 1995). Networks were presented with weight and distance information for the left and right sides of the fulcrum and had to learn to predict which way the balance scale would tip. The networks developed through a series of stages analogous to the children. Development was accounted for in terms of the selforganizing properties of weight adjustment and the possible addition of new units to the network. The macroscopic behavior of stage development emerged as a result of finegrained, local adjustments to the weights. An important assumption of these models was that the children’s learning environment was biased towards learning from problems 19 where weight was the important dimension. That is, the dimension of weight was made more important in the training set. Thus, self-organization in these networks reflected the distribution of problems that children learn from. Mareschal and Shultz (2000) describe a connectionist model of the development of children’s Seriation (or sorting) abilities. Seriation is a task originally developed by Piaget (1965) to probe children’s developing transitive reasoning skills. Piaget identified 4 stages of development. In stage 1, young children made no effort to sort a series of sticks when asked to do so. In stage 2, they would group sticks according to local size comparisons, but were unable to extend this order to the entire array. Their sorted series consisted of small groups of consistently sorted sticks but would not extend over the entire set of sticks. In stage 3, children successfully sorted the entire array of sticks, but used a slow and fastidious empirical trial and error strategy. Finally, in stage 4, children quickly and efficiently sorted the entire series by systematically choosing the smallest unordered stick at each move. Mareschal and Shultz modeled this task by decomposing the Seriation task into the two subtasks. These were: (1) identifying where a stick should be moved to and (2) identifying which stick need to be selected. Each task was learnt by a distinct connectionist module. There was no initial architectural difference between modules that biased a network to learn one task better than the other task. However, one module received feedback about the correct stick to move and the other module received feedback about the correct location that a stick needed to be moved to. In response to this feedback, the two modules self-organized differently and in such a way as to encode the order relationship that ultimately enabled them to accomplish their individual subtasks. 20 During this process the whole network progressed through the same behavioral stages (described above) as do the children. Again, the macroscopic stage-wise developmental profiles could be attributed to fine-grained changes in the connection weights between units in the sub-networks. Also of interest was the fact that the observed level of competence of a given network resulted from the interacting competence of the two modules developing at their own distinct rates. Observed sorting errors could be explained in terms of an interaction between errors on selecting a stick and errors in placing a stick in a partially ordered series. In continuing with the general trend to model Piagetian tasks, Shultz (1998) described a connectionist model of conservation behaviors in children. Conservation is the understanding that some quantities are preserved during some perceptual transformations (e.g., the total amount of clay is preserved when a lump of clay is stretched out). The model shows a shift from attending to perceptual information to transformation information as a result of self-organization in the network. Tasks on which children’s development appears not to show stages have also been modeled. Sirois and Shultz (1999) described a model of discrimination shift learning in children. Discrimination shift tasks represent a basic form of concept learning in which the subjects must shift the criteria for categorization during the task. Manipulations of the amount of experience the networks had with the task simulated the developmental phenomena. This suggests that human developmental differences in shift learning arise from spontaneous over-adaptation in older participants. Connectionist models have also been used to investigate processes of early language development (see Plunkett, 1997, for a review). These include models of early 21 phonological development in learning to categorize speech sounds (Nakisa and Plunkett, 1998, Schafer and Mareschal, in press), learning to segment the speech stream into discrete words (Christiansen, Allen, and Seidenberg, 1998), vocabulary development (Gasser and Smith, 1998; Plunkett, Sinha, Muller, and Strandsby, 1992), the acquisition of inflectional morphology, for instance in forming the past tense of verbs (Daugherty & Seidenberg, 1992; Juola & Plunkett, 1999; MacWhinney and Leinbach, 1991; Plunkett and Marchman, 1991, 1993, 1996; Rumelhart and McClelland, 1986) and in forming the plural of nouns (Forrester and Plunkett, 1994; Juola & Plunkett, 1999; Plunkett and Nakisa, 1997), and models of the acquisition of syntax (Elman, 1990, 1992, 1993). In all of these models, an initial network structure is postulated by the modeler, but the connection weights are randomized, so that the model has no initial content. The model is then exposed to language input of some sort, and required to adjust its weights, either to form concise representations with an unsupervised algorithm (e.g. Nakisa and Plunkett’s model of early phonological development) or to learn a mapping between two domains using a supervised algorithm (e.g., in Christiansen et al.’s (1998) model, the network had to learn a mapping between the current phoneme and the next phoneme in the speech stream; in Rumelhart and McClelland’s (1986) model, the network had to learn a mapping between the stem and the past tense form of verbs). During training, the network weights self-organize to reflect the structure of the language task. Thus in the lexical segmentation model, Christiansen et al. showed how the model could learn to combine probabilistic cues about phonemes, relative lexical stress, and boundaries between utterances, to predict word boundaries. Each of these cues on its own is an 22 unreliable indicator, yet after training on a large corpus of child-directed speech, the model was able to use these cues to reliably identify word boundaries. These models have been successful in showing how distributional information in the child’s language input can be very useful in helping the child to learn language. It is the latent structure in the input that allows these self-organizing models to form structured representations appropriate to the linguistic task (Redington and Chater, 1998). The fact that such information can be extracted from the linguistic environment has led to a reconsideration of the degree to which aspects of language must be innately coded into the developing language system (Elman et al., 1996). However, the difficulty of extracting information which is not readily apparent in the language input, such as underlying grammatical structure, continues to drive claims that significant aspects of our knowledge of the structure of language are innately specified (see e.g. Chomsky, 1986, 1988, 1995; Pinker, 1994, 1999). This remains a very active area of research (see e.g. Henderson and Lane, 1998, for recent work on the acquisition of grammatical structures in connectionist networks). In summary connectionist models provide concrete examples of how the processes of self-organization can account for many aspects of behavioral change during cognitive development. Moreover, connectionist models are loosely based on the principles of neural information processing and thereby provide a means of linking Behavioral development with neural or brain development. Connectionist models of developmental disorders The ability of self-organizing connectionist models to capture changes during development also presents us with the opportunity to investigate disorders of 23 development. Connectionist models have been used to explore a range of developmental disorders, including autism (Cohen, 1994, Gustafsson, 1997), developmental dyslexia (Harm and Seidenberg, 1999; Manis, Seidenberg, Doi, McBride-Chang, and Peterson, 1996; Plaut, McClelland, Seidenberg, and Patterson, 1996; Seidenberg and McClelland, 1989; Zorzi, Butterworth, and Houghton, 1998), Specific Language Impairment (Hoeffner and McClelland, 1993), the development of morphology in a damaged language system (Marchman, 1993), and language processing in Williams syndrome (Thomas & Karmiloff-Smith, 1999). Connectionist models of normal development carry with them a number of assumptions about constraints (boundary conditions). These include (1) the initial state of the network, in terms of the number of units, layers, connections, and the pattern of connectivity, (collectively known as the network architecture); (2) the way a particular cognitive problem is presented to the network, in terms of the input and output representations; (3) the dynamics of activation changes in the network; (4) the learning algorithm that the network will use to change its connection weights (and potentially, its architecture); and (5) the regime of training which the network will undergo. Models of developmental disorders have assumed that one or more of these elements is atypical in the disordered system. In particular, the following initial changes have been proposed to simulate each disorder: ‘too few’ hidden units (developmental dyslexia, autism), ‘too many’ hidden units (autism, Williams syndrome), ‘too much’ lateral inhibition in the output layers of self-organizing feature maps (autism), alterations in input and output representations (developmental dyslexia, Williams syndrome, Specific Language Impairment), and elimination of intermediate layers of units 24 (developmental dyslexia, Williams syndrome). We will return to consider what constitutes ‘too few’, ‘too many’, or ‘too much’ later. For the time being, the key aspect to note with these models is that, with the exception of Marchman (1993), who explored the ability of a network to recover from damage at different points in development, these models all postulate differences in the system prior to training. That is, the system undergoes a process of development under a set of constraints that is different right from the start. As a consequence, the developmental profile that we find in these systems may be qualitatively different from the normal case, rather than simply being a delayed version, or a version of development which is normal but terminates at an earlier stage. As we shall see, connectionist models provide us with a new type of vocabulary to consider the nature of this disordered development. The application of connectionist modeling methods to developmental disorders is still very new. However, enough work has been done to outline the potential contribution of this approach, as well as its possible weaknesses. In the following section, we will consider one example in more detail, that of autism. Autism is a developmental disorder characterized by a central triad of deficits. These deficits are in social interaction, communication, and imagination (Wing and Gould, 1979). In addition, other features have been associated with the syndrome. These include a restricted repertoire of interests, an obsessive desire for sameness, savant abilities, excellent rote memory, a preoccupation with parts of objects, improved perceptual discrimination, and an impaired ability to form abstractions or generalize knowledge to new situations (see Happé, 1994, for a review). Autism has multiple biological causes (Gillberg and Coleman, 1992), and a disposition to the disorder is 25 probably inherited (Simonoff, Bolton, and Rutter, 1998). Autism appears to have multiple Behavioral subgroups, so that it is perhaps best characterized as a spectrum of disorders rather than a single disorder (Wing, 1988). Cohen (1994) focused on the pattern of improved perceptual discrimination and impaired generalization found in the syndrome. He concluded that evidence from neuropathological investigations of the brains of affected individuals was suggestive of abnormal wiring patterns in various brain regions, perhaps caused by deficits in neuronal migration during fetal development (Piven et al., 1990), by curtailment of normal neuronal growth (Bauman, 1991), and/or by aberrant development (Courchesne et al., 1993). In comparison with the normal brain, in Cohen’s view, the structural deficits were consistent with too few neurons in some areas, such as the cerebellum, and too many neurons in other areas, such as the amygdala and hippocampus (see also Bauman, 1999). Cohen took a simple three-layer feedforward network and trained it on a classification task using Backpropagation. The network had two output units for the binary classification task, and 11 inputs units on which to present exemplars of the two classes. Cohen explored the effect of varying the number of units in the hidden layer between 1 and 13, firstly on the ability of the network to learn the classification in the training set, and secondly on its ability to generalize this classification to a new test set. Cohen found that with one hidden unit, performance on both the training set (examples previously presented to the network) and generalization set (novel examples) was poor. With 13 hidden units, the network quickly and accurately learnt the classification for the training set. Early on in training, generalization was also good but with further training, generalization then declined, as the network was subject to over- 26 learning. In over-learning, the network’s representations become increasingly focused on details of the training set. Cohen suggested that his networks captured different characteristics found in autistic subjects. In some cases, autistic children have trouble acquiring simple discriminations and attend to a restricted range of stimuli (corresponding to too few neurons). In others, autistic children have good discrimination and indeed very good memory, but this relies on representing too many unique details of stimuli, details that subsequently interfere with generalization (corresponding to too many neurons). This study illustrates some of the advantages and disadvantages of using connectionist networks to model disordered development. Let us begin with the disadvantages. Using a modeler’s rule of thumb, Cohen decides that a network with three hidden units will be the ‘normal’ case (p.12). Less than this and learning suffers, greater than this and generalization suffers. But where does the rule of thumb come from? Well, connectionist modelers have traditionally attempted to set the number of hidden units precisely to achieve optimal learning on the training set and optimal generalization. Cohen can provide us with no independent justification for the ‘normal’ set of parameters in the network. And this holds true for all of the models mentioned above. Thus far, we simply don’t have solid grounds for saying why some set-up of architecture, representations, and training regime should represent the ‘normal’ case for development – other than it seems to capture the normal pattern of development. On the other hand, it is worthwhile to show that a single model can account for normal development using one set of parameters, and atypical development using a second set. This suggests we may be on the right track. Nevertheless, work remains to be done to ground these parameter 27 settings. In Cohen’s case, this might be done by pursuing the hypotheses that certain patterns of neural connectivity are a sufficient condition for autism, and that variation in this connectivity can account for the various sub-groups of autism. On the plus side, Cohen’s model does generate some interesting new hypotheses. One of these is that the apparent regression in functioning that some autistic children show between 18 and 30 months of age could be explained by over-learning in systems which have too many neurons. The second is that Behavioral therapy may be effective because its training regimes weaken previously established, idiosyncratic connections in the networks which have too many neurons, and strengthen connections for knowledge that forms a better basis for generalization. These are useful unanticipated hypotheses that have arisen from the modeling process. Gustafsson (1997) followed a similar line of thought in proposing that the capacity of autistic individuals to form representations of sensory experience may be impaired. Gustafsson suggested that their cognitive impairment arose in the development of ‘cortical feature maps’. These maps are thought to develop in systems using unsupervised learning, where the aim is for a set of processing units to self-organize such that they form a concise representation of the information they receive. This concise representation may then be used to drive higher-order processes. In Gustafsson’s view, the formation of such feature maps is damaged in autism due to too much lateral inhibition in this layer of units. The idea can be explained as follows. A simple, unsupervised network has two connected layers of units, the input units and the output units. On a given learning trial, the output units compete with each other to become the one that will represent the pattern of activation arriving from the input units. Through 28 exposure to many different patterns, areas of neighboring output units come to specialize in representing certain patterns. Lateral inhibition is the process that mediates the competition between the output units. Now, if there is ‘too much’ competition between these units, the output units may come to represent very precise details of particular input patterns. For example, instead of a unit coming to represent a book, it might represent the book on a shelf at a certain angle with another book next to it. While such fine-grained information would be useful in classification tasks (such as spotting the book on the shelf), this context-bound information is a poor basis for generalization (e.g. recognizing the book over a range of situations). In relation to autism, fine-grained information would explain good perceptual discrimination abilities. The inability to form abstractions of sensory data might explain impairments to higher order processes, since such processes would have to rely on “raw data” or rote memory (Hermelin and O’Connor, 1970). Although Gustafsson did not run any computer simulations, Oliver, Johnson, Karmiloff-Smith, and Pennington (2000) have examined the ways in which just such a process of feature map formation can be disrupted by changes in the initial properties of a self-organizing connectionist network. Moreover these authors did so in an attempt to develop a more explicit conceptual framework in which to consider developmental disorders. Oliver et al. employed a model based on that proposed by Miller, Keller, and Stryker (1989) to explain the emergence of ocular dominance columns in the primary visual cortex of mammals. Ocular dominance columns are sets of neurons tuned to respond to information coming from only one eye. Initially, neurons in this area of visual cortex are not specialized, responding to both eyes. However, through a process of self- 29 organization, the sheet of neurons segregates itself into strips that respond only to one eye or the other. These strips of neurons supply information that can be used in subsequent computations involved in stereovision. Miller et al.’s neural network model had an output layer which received input from two retinas. In the output layer, output neurons competed with each other to respond to input patterns. Miller et al. showed how this network, constructed using biologically plausible principles, could self-organize its output layer to segregate into areas responding only to input from one or other of the retinas. Similarly, Oliver et al. employed a two-layer network, whereby an output layer of 30x30 units received information from a single input retina of 30x30 units. The network was shown a set of 4 stimuli, in the form of bars lying across the input retina. The bars had a certain similarity structure, whereby A was similar to B, B was similar to C, and C was similar to D. Oliver et al. showed that, using their initial parameter set, the output layer formed a topographic map of the possible inputs. That is, certain areas of the output layer specialized in responding to input A, other areas specialized in responding to input B, other areas to C, and so on. Moreover, the similarity structure of the input was reflected in the topographic map: the area responding to A was next to that responding to B, the area for B was next to that for C, and so on. Oliver et al. then went on to disrupt the network in various different ways. In each case they performed the manipulation prior to exposing the network to the training stimuli. They varied the threshold of the output units, disrupted the connectivity between the input and output layers, disrupted the connectivity responsible for lateral inhibition in output layer, and changed the similarity of the input stimuli to each other. These manipulations demonstrated that small initial differences in the constraints under which 30 the model developed could have significant effects on the outcome of development. The resulting topographic map suffered a range of disruptions, including output units failing to specialize at all or simply turning off, specialization emerging but not in organized areas, and organized areas emerging but without adjacent areas representing similar looking bars. Three points are of particular note here. Firstly, loss of function in the final performance of the network was not due to removing specific parts of the network responsible for that function. Final performance was the outcome of a developmental process. Secondly, changes in the initial network conditions altered the trajectory of development that the network followed. Disordered development was qualitatively different. Oliver et al.’s model demonstrates that Gustafsson’s conjecture is a plausible one. Changes in factors such as the initial levels of lateral inhibition in the output layer may well disrupt the featural representations subsequently formed by the network, and indirectly, any processes that use these representations. This leads us to the third point. The outcome of this differential development may not simply be a generalized impairment, but a pattern of strengths and weaknesses. The representations may be changed in ways that are good for some tasks but bad for others. Thus, context-bound features will support superior discrimination, but will lead to inferior generalization. Even abilities that seem unimpaired (sometimes referred to as ‘intact’ or ‘spared’ abilities) may, as a consequence of disordered development, be achieved by qualitatively different cognitive processes (Karmiloff-Smith, 1998). 31 This explanation of autism is exciting in that it attempts to build bridges between levels of description, from its basis in biological principles through computational modeling to Behavioral outcome. However, the explanation is somewhat limited. Both Cohen and Gustafsson’s models tackle associated perceptual impairments of the disorder. But neither addresses the central triad of features that is generally taken to characterize the syndrome – deficits in socialization, communication, and imagination. It is far from clear that their biologically constrained network models can be straightforwardly extended to account for deficits in, say, play behavior (Thomas, 2000). Currently, when we extend connectionist principles to investigating high level behavior of this sort, these principles serve less as biologically motivated constraints and more as an extended metaphor, a set of conceptual tools to formulate new ideas that themselves must be empirically tested (Thomas and Stone, 1998). Perhaps it is the case that context-bound sensory feature maps impair imagination, socialization, and so on. If so, we must derive testable hypotheses and collect empirical support for this proposal. For example, we might identify a particular social task, hypothesize how such a task might be achieved using ‘social’ features, and then predict how context-bound social features would lead to certain kinds of deficits in this task. Support for a hypothesis of this type would strengthen Cohen and Gustafsson’s respective claims that something akin to learning with too many neurons or learning with unusual levels of lateral inhibition could account for the range of deficits associated with autism. It is likely that the application of connectionist modeling to developmental disorders will have at least two implications for this field. The first will be to favor dynamic rather than static accounts of selective deficits. The second will be to force 32 clarification of the relationship of individual differences within normal development to atypical development. With regard to the first point, it is important to note that in recent times, it has been common to characterize developmental disorders as if they were equivalent to cases of adult brain damage – as if they represented a normal cognitive system largely ‘intact’ but for the loss of some specialized functional module. This is nicely illustrated by one of the leading explanations of the triad of deficits in autism, the “theory of mind” hypothesis (Baron-Cohen, Leslie, and Frith, 1985). Under this hypothesis, autistic people lack the specific ability to understand and reason about mental states. For instance, they would not be able to understand other people’s behaviors in terms of their (potentially false) beliefs and desires. This inability to ‘think about thoughts’ leads to specific impairments in social skills, communication, and imagination. Leslie (1987) has suggested that a particular, discrete functional component of the cognitive system might be responsible for this ability (the ‘Theory of Mind Module). Moreover, Baron-Cohen suggests that this module is innately built into the human mind (e.g. Baron-Cohen, 1998). Autism, in this view, is the consequence of highly selective damage to one component of an otherwise normally developing cognitive system. Similar explanations of behavioral deficits in developmental disorders can also be found for Specific Language Impairment and for Williams syndrome (Clahsen and Almazan, 1998; Pinker, 1991, 1994, 1999). As discussion of Oliver et al.’s (2000) model demonstrates, connectionist accounts of developmental disorders place much greater emphasis on the process of development itself. That is, the selective behavioral deficits found in the end state of an atypically developing system are the outcome of a developmental process which, due to 33 differing initial constraints, has followed an alternative trajectory. The system is qualitatively different, not a normal system with some elements missing. This view is consistent with recent criticisms of the static adult brain damage account of developmental disorders (Bishop, 1997a; Karmiloff-Smith, 1998). The adult brain damage account is only feasible if one of two rather unlikely assumptions turn out to be true. These are either that the computational modules comprising the cognitive system develop independently, or that the content of these modules is static and innately specified (as implied by Baron-Cohen, 1998). However, Bishop (1997b) has argued that interactivity rather than independence is the hallmark of early development; and a number of researchers have argued that functional modules are the outcome of a developmental process, rather than being a precursor to it (Elman et al., 1996; KarmiloffSmith, 1992). Indeed connectionist modeling thus far tends to lend support to the idea that damage to a system prior to development has a quite different effect to damage of the endstate system (Elman et al. 1996; Thomas, 2000). The same kind of pre-training network manipulations that were used to model developmental disorders have also been used after network training is complete, but now to capture a range of acquired disorders, including Alzheimer’s disease, Parkinson’s disease, epilepsy, phantom limbs, stroke, frontal lobe damage, prosopagnosia, semantic memory deficits, acquired dyslexia, alexia, schizophrenia, and unipolar depression (see Reggia, Ruppin, and Berndt, 1996, for a selection of such models. See Marchman, 1993, for an investigation of the effect of lesioning a network while development is still under way). 34 The second implication of connectionist models for the study of developmental disorders is with regard to the relation of atypical development to individual differences. McLeod, Plunkett and Rolls (1998) have suggested that connectionist models may be able to capture individual differences between people’s cognitive processes by varying network parameters such as learning rate, number of hidden units, and initial weight configurations. Given that the same types of manipulations have been used to capture disordered development, this suggests that (albeit implicitly) connectionist theorists support the view that developmental disorders and individual differences lie on a continuum. However, this view has been strongly contested. Developmental disorders are frequently thought to involve organic damage, and there is evidence that atypical development is qualitatively different from the development found at the lower end of the normal distribution of typical development (Bennett-Gates and Zigler, 1998). Work remains to be done to explore the extent to which initial differences in constraints overlap for typical and atypical groups in development and the extent to which these constraints are different. Discussion and Future Directions This chapter has illustrated how models can be used to investigate selforganization in cognitive development. When artificial neural network models are used, developmental phenomena can be related to processes of self-organization in the brain. This offers one way of addressing the central question of Developmental Cognitive Neuroscience (Johnson, 1997); namely, the relation between brain development and cognitive development. The models also provide a means of exploring the cause of 35 abnormal development in terms of divergent self-organization. A number of outstanding questions still need to be addressed. The biological plausibility of connectionist models has been questioned (O’Reilly, 1999). Although these models are functionally equivalent to some of the features of neural information processing, many other processes exist in the brain. These other processes (e.g., diffusion of neural modulators) may radically effect the self-organizing properties of these systems. Such processes must be incorporated in any future modeling work. One example is synaptogenesis, the creation of new synapses. Until recently, synaptogenesis was not thought to play an important role in later brain development. However, Quartz and Sejnowski (1995) have argued that synaptogenesis and arborization (the development of new dendritic trees on which new synapses may be formed) play an important part in increasing the computational power of natural neural networks. Although these ideas have been incorporated in some models of cognitive development (see Mareschal and Shultz, 1996, for a review) they still remain relatively rare. The realization that synaptogenesis may play an important role in some aspects of cognitive development begs the question of which processes are best accounted for in terms of synaptogenesis and which are best accounted for in terms of self-organization in systems with fixed numbers of synapses. Shultz and Mareschal (1997) provide one tentative answer by suggesting that abilities that emerge early in development and that are culturally invariant should be modeled with static network models, whereas later developing and culturally specific abilities should be modeled with a synaptic growth process. 36 Many of the models described above speak of cognitive modules. These are components dedicated to processing certain types of information. While there is no denying that modularization exists both in the brain and in cognition (Karmiloff-Smith, 1992), the origin of these modules is still unresolved. Again, the modules may be prewired, or they may emerge though self-organizing processes such as competition (e.g., Jacobs, 1999; Jacobs, Jordan, and Barto, 1991). Throughout this chapter we have restricted our review to a single class of connectionist models (see Grossberg, 1982) for an example of an other class of models). The appropriateness of this class of model for capturing self-organization in cognitive development has been brought into question. Raijmaker, van Koten, and Molenaar (1996) have suggested that existing connectionist models do not capture dynamic aspects of selforganization sufficiently well. They suggested that temporal oscillations in fully connected recurrent networks are the appropriate markers of self-organization in neural systems (see also, Molenaar, 1986). Finally, the models described in this chapter make no attempt to account for explicit or metacognitive knowledge. That is, knowledge that can be verbalized and that can be subjected to syntactic logical operations. Because of this shortcoming, it has been suggested that connectionist networks cannot reveal anything of interest about cognitive development (Marcus, 1998). What this issue really raises is the degree to which explicit and metacognitive knowledge is important in the observed development of behavior. It also suggests that modeling work needs to be done in order to understand how the brain might instantiate these abilities and how they might emerge as self-organizing processes. 37 Further Reading Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. MIT Press, Cambridge (Mass). Johnson, M. H. (1997). Developmental cognitive neuroscience. Blackwells, Oxford, UK. Karmiloff-Smith, A. (1998). Development itself is the key to understanding developmental disorders. Trends in Cognitive Science, 2, 389-398. Keslo, S. (1995). Dynamic Pattern: The Self-organization of Brain and Behavior. MIT Press, Cambridge (Mass). Siegler, R. S. (1997). Children’s Thinking (3rd Ed). Prentice Hall, London, UK. Thelen, E, and Smith, L. (1994). A dynamic systems approach to the development of cognition and action. MIT press, Cambridge (Mass). Keywords: Self-organization, connectionism, cognitive development, developmental disorders, children, infants 38 References Baillargeon, R., Spelke, E. S., & Wasserman, S. (1985). Object permanence in 5-monthold infants. Cognition, 20, 191-208. Baillargeon, R. (1993). The object concept revisited: New directions in the investigation of infants' physical knowledge. In Visual perception and cognition in infancy (C. E. Granrud, ed.). LEA, London, UK, pp. 265-315. Baron-Cohen, S. (1998). Modularity in developmental cognitive neuropsychology: Evidence from autism and Gilles de la Tourette syndrome. In Handbook of Mental Retardation and Development (J. A. Burack, R. M. Hodapp, & E. Zigler, eds.). Cambridge Un. Press, Cambridge, UK, pp. 334-348. Baron-Cohen, S. (in press). Does the study of autism justify minimalist innate modularity? Learning and Individual Differences. Baron-Cohen, S., Leslie, A. M., and Frith, U. (1985). Does the autistic child have a "theory of mind"? Cognition, 21, 37-46. Bauman, M. (1991). Microscopic neuroanatomic abnormalities in autism. Pediatrics, 87, 791-796. Bauman, M. (1999). Autism: Clinical features and neurobiological observations. In Neurodevelopmental disorders (H. Tager-Flusberg, ed.). MIT Press, Cambridge (Mass), pp.383-399. Bennett-Gates, H. and Zigler, E. (1998). Resolving the development-difference debate: An evaluation of the triarchic and systems theory models. In Handbook of Mental 39 Retardation and Development (J. A. Burack, R. M. Hodapp, & E. Zigler, eds.). Cambridge Un. Press, Cambridge, UK, pp. 115-131. Bishop, D. V. M. (1997a). Cognitive neuropsychology and developmental disorders: Uncomfortable bedfellows. Quarterly Journal of Experimental Psychology, 50A (4), 899-923. Bishop, D. V. M. (1997b). Uncommon understanding: Development and disorders of language comprehension in children. Psychology Press, Hove, UK. Boden, M. A. (1982). Is equilibration important? A view from artificial intelligence. British Journal of Psychology, 73, 65-173. Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin and Use. Praegar, New York. Chomsky, N. (1988). Language and Problems of Knowledge: The Managua Lectures. MIT Press, Cambridge (Mass). Chomsky, N. (1995). The Minimalist Program. MIT Press, Cambridge (Mass). Changeux, J. P., Heidman, T., Patte, P. (1984). Learning by selection. In The biology of learning (P. Marler & H. S. Terrace, eds.). Springer-Verlag, Berlin, pp. 115-133. Changeux, J. P, & Dehaene, S. (1989). Neuronal models of cognitive function. Cognition, 33, 63-109. Christiansen, M. H., Allen, J. and Seidenberg, M. S. (1998). Learning to segment speech using multiple cues: a connectionist model. Language and Cognitive Processes, 13, 221-268. Clahsen, H. and Almazan, M. (1998). Syntax and morphology in Williams Syndrome. Cognition, 68, 167-198. 40 Cohen, I. L. (1994). An artificial neural network analogue of learning in autism. Biological Psychiatry, 36(1) 5-20 Courchesne, E., Press, G. A., Yeung-Courchesne, R. (1993). Parietal lobe abnormalities detected with MR in patients with infantile autism. American Journal of Roentgenol, 160, 387-393. Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2, 303-314. Daugherty, K. & Seidenberg, M. S. (1992). Rules or connections? The past tense revisited. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society (pp. 259-264). Hillsdale, NJ: Lawrence Erlbaum Associates Inc. Edelman, G. M. & Finkel, L. H. (1985). Neuronal group selection in the cerebral cortex. In Dynamic aspects of neocortical function (G. M. Edelman, W. E. Gall, & W. M. Cowan, eds.). Wiley-Interscience, New York, pp.635-695. Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211. Elman, J. L. (1992). Grammatical structure and distributed representations. In Connectionism theory and practice (S. Davis, ed.). Oxford Un. Press, Oxford, UK. Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71-99. Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. MIT Press, Cambridge (Mass). Dehaene, S. & Changeux, J. P. (1989). A simple model of prefrontal cortex function in delayed-response tasks. Journal of Cognitive Neuroscience, 1, 244-261. 41 Forrester, N. and Plunkett, K. (1994). Learning the Arabic plural: the case for minority default mappings in connectionist networks. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society (A. Ramand &, K. Eiselt, eds.). Erlbaum, Mahwah, NJ, Pp. 319-323. Gasser, M. and Smith, L. B. (1998). Learning nouns and adjectives: A connectionist account. Language and Cognitive Processes, 13, 269-306. Gillberg, C. and Coleman, M. (1992). The biology of the autistic syndromes (2nd edition). MacKeith, London, UK. Goldfield, E. C. (1995). Emergent forms. Origins and Early Development of Human Action and perception. Oxford Un. Press, Oxford, UK. Goodale, M. A. (1993). Visual pathways supporting perception and action in the primate cerebral cortex. Current Opinion in Neurobiology, 3, 578-585. Grossberg, S. (1982). How does the brain build a cognitive code? Psychological Review, 87, 1-51. Gustafsson, L. (1997). Inadequate cortical feature maps: A neural circuit theory of autism. Biological-Psychiatry, 42, 1138-1147. Happé, F. G. E. (1994). Autism: an introduction to psychological theory. UCL Press. Harm, M. W. and Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106, 491-528. Henderson, J. and Lane, P. (1998). A connectionist architecture for learning to parse. In Proceedings of COLING-ACL’98, pp. 531-537. Hebb. D. (1949) . The organization of behavior. Wiley, New York. 42 Hertz, J., Krogh, A. & Palmer, R. G. (1991). Introduction to the theory of neural computation. Addison-Wesley, Redwood city (CA). Hermelin, B. and O’Connor, N. (1970). Psychological experiments with autistic children. Oxford Pergamon Press. Hoeffner, J. H. and McClelland, J. L. (1993). Can a perceptual processing deficit explain the impairment of inflectional morphology in developmental dysphasia? A computational investigation. In E.V. Clark (Ed.), Proceedings of the 25th Child language research forum. Stanford University Press. Jacobs, R. A. (1999). Computational studies of the development of functionally specialized neural modules. Trends in Cognitive Science, 3, 31-38. Jacobs, R. A., Jordan, M. I., & Barto, A. G. (1991). Task decomposition through competition in a modular connectionist architecture: The what and where vision tasks. Cognitive Science, 15, 219-250. Johnson, M. H. (1997). Developmental Cognitive Neuroscience. Blackwells, Oxford, UK. Karmiloff-Smith, A. & Inhelder, B. (1971). If you want to get ahead, get a theory. Cognition, 3, 195-212. Karmiloff-Smith, A. (1985). Language and cognitive processes from a developmental perspective. Language and Cognitive Processes, 1, 60-85. Karmiloff-Smith, A. (1990). Constraints on representational change: evidence from children’s drawing. Cognition, 34, 57-83. Karmiloff-Smith, A. (1992). Beyond Modularity: A Developmental Perspective on Cognitive Science. MIT Press, Cambridge (Mass). 43 Karmiloff-Smith, A. (1998). Development itself is the key to understanding developmental disorders. Trends in Cognitive Sciences, Vol. 2, No. 10, 389-398. Karmiloff-Smith, A. & Thomas, M. S. C. (in press). Developmental disorders. To appear in M. A. Arbib and P. H. Arbib (Eds.), Handbook of Brain Theory and Neural Networks 2nd Edition. MIT Press. Keslo, S. (1995). Dynamic Pattern: The Self-organization of Brain and Behavior. MIT Press, Cambridge (Mass). Leslie, A. M. (1987). Pretence and representation: The origins of "theory of mind". Psychological Review, 94, 412-26. MacWhinney, B. and Leinbach, J. (1991). Implementations are not conceptualizations: revising the verb learning model. Cognition, 40, 121-157. Manis, F. R., Seidenberg, M. S., Doi, L. M., McBride-Chang, C., and Peterson, A. (1996). On the bases of two subtypes of developmental dyslexia. Cognition, 58, 157195. Marchman, V. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 2, 215-234. Marcus, G. (1998). Rethinking eliminative connectionism. Cognitive Psychology, 37, 243-282. Mareschal, D., Plunkett, K., & Harris, P. (1999). A computational and neuropsychological account of object-oriented behaviours in infancy. Developmental Science, 2, 306-317. Mareschal, D., & French, R. M. (1997). A connectionist account of interference effects in early infant memory and categorization. In Proceedings of the 19th annual conference 44 of the Cognitive Science Society (M. G. Shafto & P. Langley, eds.). Erlbaum, Mahwah, NJ, pp. 484-489. Mareschal, D. & French, R. M. (2000). Mechanisms of Categorization in Infancy. Infancy, 1. 111-122. Mareschal, D. & Shultz, T. R. (1999). Development of children's seriation: A connectionist approach. Connection Science. 11, 153-188 Mareschal, D. French, R. M., & Quinn, P. C. (submitted). A Connectionist Account of Asymmetric Category Learning in Early Infancy. McClelland, J. L. (1989). Parallel distributed processing: implications for cognition and development. In Parallel distributed processing: Implications for psychology and neurobiology (R. G. M. Morris, ed.). Oxford Un Press, Oxford, UK, pp. 9-45. McClelland, J. L. & Jenkins, E. (1991). Nature, nurture, and connections: Implications of connectionist models for cognitive development. In Architectures for intelligence (K. van Lehn, ed.) LEA, Hillsdale, NJ, pp. 41-73. McLeod, P., Plunkett, K. and Rolls, E.T. (1998) Introduction to connectionist modeling of cognitive processes. Oxford Un. Press, Oxford, UK. Miller, K. D., Keller, J. B., and Stryker, M. P. (1989). Ocular dominance column development: analysis and simulation. Science, 245, 605-615. Milner, A. D. & Goodale, M. A. (1995). The Visual Brain in Action. Oxford Un. Press, Oxford, UK. Molenaar, P. C. M. (1986). On the impossibility of acquiring more powerful structures: A neglected alternative. Human Development, 29, 245-251. 45 Munakata, Y., McClelland, J. L., Johnson, M. N., & Siegler, R. S. (1997). Rethinking infant knowledge: Towards an adaptive process account of successes and failures in object permanence tasks. Psychological Review, 104, 686-713. Munakata, Y. (1998). Infants perseveration and implication for object permanence theories: A PDP model of the AB task. Developmental Science, 1, 161-211. Nakisa, R. C. and Plunkett, K. (1998). Innately guided learning by a neural network: the case of featural representation of speech language. Language and Cognitive Processes, 13, 105-128. Oliver, A., Johnson, M. H., Karmiloff-Smith, A., and Pennington, B. (2000). Deviations in the emergence of representations: A neuroconstructivist framework for analysing developmental disorders. Developmental Science, 3, 1-23. O’Reilly, R. C. (1998). Six principles for biologically based computational models of cortical cognition. Trends in Cognitive Sciences, 2, 455-462. Oyama, S (1985). The ontogeny of information. Cambridge Un. Press, Cambridge, UK. Papert, S. (1963). Intelligence chez l'enfant et chez le robot. Etudes D'Epistemologie Génétiques, 15, 131-194. Piaget, J. (1952). The Origins of Intelligence in the Child. International Universities Press, New York. Piaget, J. (1954). The Construction of Reality in the Child. Basic Books, New York. Piaget, J. (1965). The child's concept of number. Norton Library, New York. Piaget, J. (1971). Biology and knowledge. Edinburgh Un, Press, Edinburgh, UK. Piaget, J. (1977). The development of thought: Equilibration of cognitive structures. Blackwells, Oxford, UK. 46 Piaget, J. (1980). Adaptation and intelligence. Organic selection and phenocopy. Chicago Un. Press, Chicago. Pinker, S. (1991). Rules of language. Science, 253, 530-535. Pinker, S. (1994). The Language Instinct: The New Science of Language and Mind. Penguin, New York. Pinker, S. (1999). Words and rules. London: Weidenfeld & Nicolson Piven, J., Berthier, M. L., Starkstein, S. E., Nehme, E., Pearlson, G., and Folstein, S. (1990). Magnetic resonance imaging evidence for a defect of cerebral cortical development in autism. American Journal of Psychiatry, 147, 734-739. Plaut, D. C., McClelland, J. L., Seidenberg, M. S., and Patterson, K. E. (1996). Understanding normal and impaired word reading: Computational principles in quasiregular domains. Psychological Review, 103, 56-115. Plunkett, K. (1997). Theories of early language acquisition. Trends in Cognitive Sciences, Vol.1, No.4, 146-153. Plunkett, K. & Juola, P. (1999). A connectionist model of English past tense and plural morphology. Cognitive Science, 23, 463-490. Plunkett, K. and Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 1-60. Plunkett, K. and Marchman, V. (1993). From rote learning to system building: acquiring verb morphology in children and connectionist nets. Cognition, 48, 21-69. Plunkett, K. and Marchman, V. (1996). Learning from a connectionist model of the acquisition of the English past tense. Cognition, 61, 299-308. 47 Plunkett, K. and Nakisa, R. C. (1997). A connectionist model of the Arabic plural system. Language and Cognitive Processes, 12, 807-836. Plunkett, K. & Sinha, C. (1992). Connectionism and developmental theory. British Journal of Developmental Psychology, 10, 209-254. Plunkett, K., Sinha, C., Moeller, M. F., and Strandsby, O. (1992). Vocabulary growth in children and a connectionist net. Connection Science, 4(3-4), 293-312. Prigogine, I. & Stengers, I. (1986). Order out of chaos. Flamingo Press, London, UK. Quartz, S. R, & Sejnowski, T. J. (1997). The neural basis of cognitive development: A constructivist manifesto. Behavioural and Brain Sciences,20, 537-596. Quinn, P. C., Eimas, P. D., & Rosenkrantz, S. L. (1993). Evidence for representations of perceptually similar natural categories by 3-month-old and 4-month-old infants. Perception, 22, 463-475. Quinn, P. C., & Eimas, P. D. (1996). Perceptual organization and categorization in young infants. Advances in infancy research, 10, 1-36. Raijmaker, M. E., J., van Koten, S., & Molenaar, P. C. M. (1996). On the validity of simulating stagewise development by means of PDP networks: Application of catastrophe analysis and an experimental test of rule-like network performance. Cognitive Science, 20, 101-136. Redington, M. and Chater, N. (1998). Connectionist and statistical approaches to language acquisition: a distributional perspective. Language and Cognitive Processes, 13, 129-192. Reggia, J. A., Ruppin, E., and Berndt, R. S. (Eds.) (1996). Neural modeling of brain and cognitive disorders. World Scientific. 48 Reznick, J. S. & Kagan, J. (1983). Category detection in infancy. Advances in Infancy Research, 2, 79-111. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Nature, 323, 533-536. Rumelhart, D. E. & McClelland, J. L. (1986). Parallel Distributed Processing, Vol. 1. MIT Press, Cambridge (Mass). Rumelhart, D. E. and McClelland, J. L. (1986). On learning the past tense of English verbs. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2: Psychological and Biological Models. (J. L. McClelland, D. E. Rumelhart, and the PDP Research Group), MIT Press, Cambridge (Mass), pp. 216271. Schafer, G. & Mareschal, D. (in press). Qualitative shifts in behavior without qualitative shifts in processing: the case of speech sound discrimination and word learning in infancy. Infancy.,1(3). Seidenberg, M. S. and McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 452-477. Shultz, T. R. (1998). A computational analysis of conservation. Developmental Science, 1, 103-126. Shultz, T. R., Mareschal, D., & Schmidt, W. C. (1994). Modeling cognitive development on balance scale phenomena. Machine Learning, 16, 57-86. Shultz, T. R., Schmidt, W. C., Buckingham, D., & Mareschal, D. (1995). Modeling cognitive development with a generative connectionist algorithm. In Developing cognitive competence: New approaches to process modeling (T. J. Simon, & G. S. Halford, eds.), Erlbaum , Hillsdale, NJ, pp. 205-261. 49 Siegler, R. S. (1976). Three aspects of cognitive development. Cognitive Psychology, 8, 481-520. Shultz, T. R., & Mareschal, D. (1997). Rethinking innateness, learning, and constructivism. Cognitive Development, 12, 563-586. Simonoff, E., Bolton, P., and Rutter, M. (1998). Genetic perspectives on mental retardation. In Handbook of Mental Retardation and Development (J. A. Burack, R. M. Hodapp, & E. Zigler, eds.). Cambridge Un. Press, Cambridge, UK, pp. 41-79. Sirois, S. & Shultz, T. R. (1998). Neural network modeling of developmental effects in discrimination shifts. Journal of Experimental Child Psychology, 71, 235-274. Slater, A. (1995). Visual perception and memory at birth. Advances in Infancy Research, 9, 107-162. Thomas, M. S. C. (2000). Neuroconstructivism's promise. Commentary on Oliver et al. Developmental Science, 3, 35-37. Thomas, M. S. C. & Karmiloff-Smith, A. (1999). Connectionist modeling of past tense formation in Williams syndrome. Presentation to the Cognitive Psychology Section of the British Psychological Society, York, UK, September 1999. Thomas, M. S. C. and Stone, A. (1998). ‘Cognitive’ connectionist models are just models, and connectionism is a progressive research programme. Commentary on Green on Connectionist Explanation. Psycoloquy, 9(36) Ungerleider, L. G. & Mishkin, M. (1982). Two cortical visual systems. In Analysis of visual behavior (D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield, eds.), MIT Press, Cambridge (Mass), pp. 549-586. 50 van der Maas, H. L. C. & Molenaar, P. C. M. (1992), Stagewise cognitive development: An application of catastrophe theory. Psychological Review, 99, 395-417. van Essen, D. C., Anderson, C. H., & Felleman, D. J. (1992). Information processing in the primate visual system: An integrated systems perspective. Science, 255, 419-423. van Geert, P. (1991). A dynamic system model of cognitive and language growth. Psychological Review, 98, 3-53. von der Malsburg, C. (1995). Self-organization in the brain. In The handbook of brain theory and neural networks (M. A. Arbib, ed), MIT Press, Cambridge (Mass), pp. Willshaw, D. J. & von der Malsburg, C. (1976). How patterned neural connections can be set up by self-organization. Proceedings of the Royal Society of London, B 194, 431445. Wing, L. (1988). The continuum of autistic characteristics. In Diagnosis and assessment in autism, (E. Schopler and G. B. Mesibov, eds.). Plenum Press, New York, pp. 91110. Wing, L. and Gould, J. (1979). Severe impairments of social interaction and associated abnormalities in children: epidemiology and classification. Journal of Autism and Developmental Disorders, 9, 11-29. Younger, B, A. (1985). The segregation of items into categories by ten-month-old infants, Child Development, 56, 1574-1583. Zorzi, M., Houghton, G., and Butterworth, B. (1998). Two routes or one in reading aloud? A connectionist dual-process model. Journal of Experimental Psychology: Human Perception and Performance, 24, 1131-1161. 51 Figure 1. Schema of (a) a generic and (b) a feedforward connectionist network. 52 Figure 2. Schema of Mareschal, Plunkett, & Harris, (1999) object processing model.