Active vision system for embodied intelligence based on retina sampling model and hierarchical representation
Janusz A. Starzyk, Xinming Yu
Ohio University, Athens, OH

INTRODUCTION

Fig. 1: Retina structure

Retina structure is fundamental to the human vision system, which is much more efficient than any current robotic vision system.
● Photoreceptors (cones and rods) are concentrated around the fovea, giving the highest resolution on the target.
● The retina processes the sampled scene using ganglion cells and sends activation through the optic nerve and the LGN to the primary visual cortex (V1).
● The neurons in V1 fire in groups, responding to different visual features from the retina.

The retina sampling model uses a prespecified sampling density.
● Correlation-based sparse connections are used to mimic the neuron connections in V1.
● Neurons that are locally correlated connect to the same group of neurons in the higher layer.
● Winners and their neighbors fire and have their weights adjusted together, for smooth processing and increased robustness.

Retina sampling: Model of data collection

● The photoreceptors are not evenly distributed inside the retina. Most of them are concentrated on or around the fovea.
● Fig. 2 shows the density distribution curves of the cones inside the retina. The 1D probability distribution curve is shown in Fig. 3.
● The cortex receives distorted images, which are sharper in the fovea area.
● The fovea is the reference point of gaze shifting and focuses on the most interesting part of the scene.

Fig. 2: Cone densities
Fig. 3: PDF of the photoreceptors (cones and rods)

Fig. 4 shows the sampling points for the retina model, with higher density in the center than on the periphery.

Fig. 4: Sampling points for retina model

● Using the retina sampling, the vision system receives more useful information.
● Table 1 compares the retina sampling model (human vision) with uniform sampling (computer vision).
● The resolution (density of the sampling points) in the central part of the retina sampling is much higher than that of uniform sampling.

                      Percentage of sampling points
Region                Retina sampling      Uniform sampling
                      (human vision)       (computer vision)
Part inside blue      31%                  4%
Part inside black     52%                  14%
Part inside red       63%                  25%
Part inside green     78%                  50%
Whole range           100%                 100%
Table 1: Comparison (human vs. computer vision)

When this artificial retina sampling is applied to a visual scene, the vision system receives much more data from the object in focus while still retaining peripheral vision, as in the human retina.

Fig. 5: An example of retina sampling. Original resolution: 900x900; resolution after sampling: 60x60.

Correlation-based Connection

The retina, unlike a camera, does not simply send a picture to the brain. The retina spatially encodes (compresses) the image to fit the limited capacity of the optic nerve.
● In the primary visual cortex (V1), neurons are activated by stimuli from similar groups of inputs.
● Connections built on the correlation of the input reflect observed relations in the real world. Fig. 6 shows the correlations based on real images.

Fig. 6: Correlation of the input data
Fig. 7: Correlation-based connections with remote but correlated area

● Linsker obtained useful features in the visual field with a fixed connectivity model and noise input for self-organizing training. The disadvantage of his model is that the fixed connectivity model
◘ may not deliver connections to remote but correlated areas of the visual field (Fig. 7 shows that such remote but correlated areas exist),
◘ may not result in useful features on higher levels,
◘ uses a local connectivity region that is set arbitrarily.

In the active vision system, we apply a connection mechanism based on the correlation between the input neurons' activation and the activation of local winners.
● Correlation between input neurons' activation
◘ Use real image data instead of noise.
◘ Use images organized in time sequences to obtain feedback connections for invariance building.
◘ Process the input data from layer N-1 and calculate the correlations.
◘ For each neuron, find the best correlated set of neurons and create connections to those neurons.
● Local winners are used to adjust the connection weights
◘ The local winners are activated (e.g. the green one in layer N in Fig. 8).
◘ The weights of connections to the neighbors of a local winner are adjusted.
◘ The local winners help their neighbors to fire together (horizontal red arrows in Fig. 8 are excitatory).
◘ All winner sets in layer N (local winners and their neighbors) are used to activate layer N+1.
◘ Oja's learning rule is used to adjust the weights of connections to the winners.

Fig. 8: The excitation of local winner and its neighbors

Procedure of the weight adjustment:
Activate layer N-1 → find the strongest winners in layer N → excite the neighbors as co-winners → adjust weights for all activated neurons → activate layer N+1

An active servo system, shown in Fig. 9, is being built with real-time video input to demonstrate the active vision system for embodied intelligence. Both the retina sampling model and the correlation-based connections are used with the servo system.
◘ A webcam captures the visual data.
◘ The raw data is uniformly sampled (320x240 pixels); it is first processed by the retina model and compressed to 40x30 with little data loss in the center.
◘ With the compressed data and the correlation-based sparse connections, the active vision system processes the real-time input, finds the interesting object and generates the object coordinates.
◘ The servo system receives the real-time coordinates and follows the object with a laser pointer.

Fig. 9: Servo system
Fig. 10: Servo system working with the active vision system to follow the object in view

EI Architecture

Learning of hierarchical representations is based on external reinforcement for primitive goals and on an internal goal creation system for abstract goals and internal rewards.

Fig. 11: The pathways through which the system is built up from interactions with the unspecified environment (diagram blocks: INPUT, Perceive, Pain or Goal Creation, Competing goals, Planning, Act, OUTPUT, Task Environment, Simulation or Real-World System)

Building up memories from environment

In learning, it is not easy to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act. Reinforcement learning (RL) is a good choice for learning in an unspecified external environment.

As shown in Fig. 12, the agent (A) receives data, which includes input (I) and reward (R), from the environment (E), and sends an action (M) back to the environment. With the aid of the reward, the agent learns which actions to take to maximize the reward.

Fig. 12: The reinforcement learning model (diagram labels: E, S, I, A, M, R)

The goal creation system provides a mechanism that organizes the learning of intentional representations and of associations between sensory and motor pathways. When an agent realizes that a specific action resulted in a desirable effect related to the current goal, it stores a representation of the perceived object involved in that action and learns associations between the sensory and motor pathways.

CONCLUSIONS

An active vision system for embodied intelligence based on a retina sampling model and hierarchical representation is developed. The retina sampling model mimics the efficiency of the human vision system. A hierarchical representation is built up with sparse connections, which are generated locally from the correlation of the neurons' activity. Using the goal creation learning scheme, the active vision system can learn complex knowledge. Goals evolve from simple ones through interaction with the environment.
Such organization of the learning process is conducive to the creation of a general intelligence, with a self-organizing structure and dynamic goals.

BIBLIOGRAPHY

Curcio, C.A., Sloan, K.R. Jr., Packer, O., Hendrickson, A.E. & Kalina, R.E., "Distribution of cones in human and monkey retina: individual variability and radial asymmetry", Science, Vol. 236, pp. 579-582, 1987.
Riedel, G., Physiology of Human Cells. Available: http://www.aberdeen.ac.uk/sms/ugradteaching/course.php?ID=10
Linsker, R., "From Basic Network Principles to Neural Architecture: Emergence of Spatial-Opponent Cells", Proc. National Academy of Sciences, Vol. 83, pp. 7508-7512, 1986.
Oja, E., "Simplified neuron model as a principal component analyzer", Journal of Mathematical Biology, Vol. 15 (3), pp. 267-273, 1982.
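
Illustrative sketch: retina sampling. The poster states that the retina model samples densely in the center and sparsely on the periphery (e.g. 900x900 reduced to 60x60) but does not give the density function in this text. The sketch below assumes a simple per-axis power-law warp; the function names and the `gamma` parameter are hypothetical and are not the authors' model.

```python
import numpy as np

def retina_sample_coords(out_size, in_size, gamma=2.0):
    """Source pixel indices along one axis for a centre-weighted grid.

    gamma > 1 packs sampling points toward the centre; gamma == 1 gives
    uniform sampling. The power-law warp is an assumption for illustration.
    """
    # Evenly spaced output coordinates in [-1, 1].
    u = np.linspace(-1.0, 1.0, out_size)
    # Warp so that points cluster around 0 (the fovea).
    x = np.sign(u) * np.abs(u) ** gamma
    # Map [-1, 1] onto valid pixel indices [0, in_size - 1].
    return np.rint((x + 1.0) / 2.0 * (in_size - 1)).astype(int)

def retina_sample(image, out_rows=60, out_cols=60, gamma=2.0):
    """Pick pixels of `image` on the centre-weighted grid."""
    rows = retina_sample_coords(out_rows, image.shape[0], gamma)
    cols = retina_sample_coords(out_cols, image.shape[1], gamma)
    # np.ix_ builds the 2-D grid of (row, col) index pairs.
    return image[np.ix_(rows, cols)]
```

For example, `retina_sample(np.zeros((900, 900)))` returns a 60x60 array. With `gamma=2.0`, about 70% of the sampling points along each axis fall in the central half of the image, compared with 50% for uniform sampling.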
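
Illustrative sketch: correlation-based connections. The Correlation-based Connection section describes computing correlations between layer N-1 activations and connecting each neuron to its best correlated set. A minimal version of that idea, assuming activations are stored as a samples-by-neurons array and that a fixed number `k` of connections is kept per neuron (both are assumptions, as are the names), could look like this:

```python
import numpy as np

def correlated_connections(activations, k=3):
    """For each neuron, return the indices of its k most correlated peers.

    activations: array of shape (n_samples, n_neurons) recorded from
    layer N-1 over a sequence of real images (hypothetical layout).
    Note: a neuron with constant activation yields NaN correlations.
    """
    # Pairwise Pearson correlation between neurons (columns).
    corr = np.corrcoef(activations, rowvar=False)
    # A neuron should not connect to itself.
    np.fill_diagonal(corr, -np.inf)
    # Sort each row by descending correlation and keep the top k.
    return np.argsort(-corr, axis=1)[:, :k]
```

Because the selection uses correlation rather than spatial distance, two neurons that are far apart but carry related signals can still be connected, which a fixed local connectivity region would miss.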