Integration of different features in guiding eye-movements
Frank Schumann, Alper Acik, Selim Onat & Peter König
Neurobiopsychology Department, Institute of Cognitive Science, Albrechtstrasse 28, 49076 Osnabrück

Introduction
In natural behaviour we actively attend to parts of a visual scene by moving our eyes. Models of such overt attention behaviour combine different local visual features in a bottom-up process to derive salient locations (e.g. Itti & Koch, 2001; Parkhurst & Niebur, 2002). Here, we developed a Bayesian, data-driven model of saliency and studied this feature-integration process empirically. We investigated the interaction of luminance, luminance contrast, texture contrast, edges and colour contrast during free viewing of natural stimuli. Using an eye-tracking-data-driven Bayesian measure of saliency, we study:
(1) How does the saliency of a feature depend on the feature values?
(2) How does the saliency of a feature vary in the context of a second feature? We model this feature interaction with additive and multiplicative integration processes.

Example Stimuli and Features
[Figure: example stimuli from each image category (Natural, ManMade, Faces, Fractals, Rainforest); example feature maps (mean luminance 1°, luminance contrast 1°, texture 5°, barness 1°); fixations on image 1 and control fixations.]
The upper row shows example stimuli from all image categories. The middle row shows example feature maps for the first natural image. We analyzed feature integration within the intensity channel, and in the colour data set also across the intensity and colour channels. The bottom row shows fixations on the first natural image and a spatial control fixation distribution.

Methods
We analyzed eye-tracking data from the free-viewing baseline conditions of two studies. A grayscale data set was taken from Acik et al. (submitted) and comprises 64 images in 4 categories (natural images, man-made scenes and objects, close-up faces, and fractals).
Natural images were taken from the “Zurich Natural Image Database” (http://www.klab.caltech.edu/~wet/ZurichNatDB.tar.gz). A colour set contained free-viewing data for 96 colour-calibrated images from the Kibale rainforest in Uganda (courtesy of Prof. Tom Troscianko, University of Bristol). We generated feature maps for mean luminance (L), luminance contrast (LC), luminance contrast on lowpass-filtered images (LCLP), texture contrast (TC), edges (Bar), red-green contrast (RG) and yellow-blue contrast (YB) at spatial scales of 1, 2 and 5 degrees, according to the respective measures used in Einhäuser & König (2003). Colour contrasts follow the definition of LC, applied to the respective channel of the DKL colour space (Derrington et al., 1984).

(1) Measuring Feature Interaction: a Bayesian Model of Salience
[Figure: (1) eye-tracking data yield the joint feature statistics in the images and the measured individual and joint saliencies P(Fix|LC, TC) over LC 1° and TC 5°; (2) model: additive and multiplicative integration of the individual saliencies, with
Salience ∝ FixationStatistics / ImageStatistics × Constant.]

Results
Linearity of individual feature salience
1A) We measured how the salience of a feature depends on the value of the feature. For some features, salience rises linearly with the value of the feature, while other features contribute non-linearly to saliency, depending on the image category. Only luminance contrast seems to contribute linearly to saliency in all categories except natural images.

2) Individual features contribute independently to salience. Integration of features can be explained by additive integration. However, multiplicative models had a similar fit to the data.

Mutual information between the saliencies of two features
We modelled the obtained empirical feature-interaction data with additive and multiplicative interaction processes. We first derived the saliencies of the two individual features as the respective marginal distributions of the measured joint salience.
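The marginalisation step just described can be sketched as follows. This is a minimal illustration on a hypothetical 10×10 grid of percentile-binned feature values, with random numbers standing in for the study's actual data:

```python
import numpy as np

# Hypothetical measured joint salience P(Fix | LC, TC) on a 10x10 grid of
# percentile-binned feature values -- random numbers stand in for real data.
rng = np.random.default_rng(0)
joint_salience = rng.random((10, 10))
joint_salience /= joint_salience.sum()     # normalise so it sums to 1

# Individual saliencies are the marginal distributions of the joint salience:
# sum over the other feature's axis.
salience_lc = joint_salience.sum(axis=1)   # marginal over TC -> salience of LC
salience_tc = joint_salience.sum(axis=0)   # marginal over LC -> salience of TC

print(salience_lc.sum(), salience_tc.sum())   # both 1.0, up to rounding
```

The two marginal vectors are what the additive and multiplicative reconstructions below take as their building blocks.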
Then, we again combined the two individual saliencies additively and multiplicatively. For additive integration, the two individual salience distributions served as regressor variables. For multiplicative integration, an interaction term was constructed as the outer-product matrix of the individual saliencies. We used multiple regression to compare the global fit of the additive and multiplicative reconstructions of the empirical data with the coefficient of determination R².

(2) Modelling Feature Interaction:
[Figure: goodness of fit (R²) for additive models, and mutual information (bit) between the saliencies of two features; the maximal possible MI is 4.3 bit.]

We used a Bayesian measure of salience to study the interaction of two features in eye-tracking data during free viewing of different image categories. We model the integration of two features in saliency with additive and multiplicative integration processes. We find that (1) features contribute to salience in linear and non-linear ways, and (2) in general, knowing the value of one feature does not give much information about the salience of another feature; that is, different features contribute independently to a joint saliency map. Additive integration of the individual saliencies gives a good description of the data for most analysed feature combinations, within and across feature channels. However, multiplicative integration explained the data equally well in most cases. In summary, we argue that the selection of salient regions can be explained by independent processing of individual features that are additively combined.

Contact: The work reported here is based on the Master’s thesis of Frank Schumann ([email protected]).
Acknowledgements: Hans-Peter Frey, Daniel Lang (University of Osnabrück) & Tom Troscianko (University of Bristol) for eye-tracking data of colour-calibrated Uganda rainforest images; Lina Jansen and Stefan Scherbaum for help with statistical analysis and implementation.

Discussion
1) Some features contribute linearly and others non-linearly to a joint saliency map.
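As a toy illustration of this point, single-feature salience can be estimated per feature bin and its linearity assessed with a straight-line fit. The synthetic fixation process and the bin count below are assumptions of this sketch, not the study's recordings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic example: percentile-scaled feature values (0..1) of all image
# patches, and a fixation process whose probability grows quadratically
# with the feature value -- a deliberately non-linear ground truth.
feature_all = rng.random(100_000)                 # image statistics
p_fix = 0.05 + 0.4 * feature_all**2
fixated = rng.random(feature_all.size) < p_fix
feature_fix = feature_all[fixated]                # gaze statistics

# Single-feature salience: likelihood at gaze relative to the prior in the
# images, evaluated per feature bin.
bins = np.linspace(0, 1, 11)
gaze_hist, _ = np.histogram(feature_fix, bins=bins, density=True)
img_hist, _ = np.histogram(feature_all, bins=bins, density=True)
salience = gaze_hist / img_hist

# Assess linearity: R^2 of a straight-line fit of salience against bin centre.
centres = 0.5 * (bins[:-1] + bins[1:])
slope, intercept = np.polyfit(centres, salience, 1)
resid = salience - (slope * centres + intercept)
r2 = 1 - resid.var() / salience.var()
print(f"linear-fit R^2: {r2:.3f}")   # below 1, since the true curve is quadratic
```

A feature that contributed linearly would give an R² near 1 in the same analysis.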
2A) We quantify the interaction of features in measured salience with the mutual information (MI) between the saliencies of the two features. We find that knowing the value of one feature gives little information (< 0.1 bit, out of a maximal MI of 4.3 bit) about the saliency of the second feature. Hence, features contribute independently to saliency.

Correcting for spatial viewing biases
Subjects tend to look more at the centre of an image than at the periphery, independent of the images shown. Hence, our measure would overestimate the importance of features at central locations, introducing a dependence of the model on eventual regularities in the spatial arrangement of the images. We estimate spatial viewing biases by accumulating the available fixation mass over all subjects and images into a spatial distribution of control fixations. To accommodate such biases, we weighted features according to this spatial control distribution, so that features at prominent central locations enter the fixation statistics less strongly, correcting for spatial biases.

We derive joint salience, the fixation probability of a patch given two image features, from eye-tracking data and test feature integration for additive and multiplicative interaction. A) We acquire the joint distribution of the respective feature combinations in the images (image statistics) and at the subset of selected fixation points (gaze statistics). B) The empirical feature interaction in salience is calculated in a Bayesian manner as the relative difference between the likelihood of the two features at gaze and the respective prior probability of the two features in the images. This empirical saliency model describes the active selection of image regions based on two features. We quantify the interaction of features in saliency with mutual information.
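Steps A) and B), and the MI quantification, can be sketched as follows. The data are synthetic, the two toy features contribute independently by construction, and the choice of 20 percentile bins (giving a maximal MI of log2(20) ≈ 4.3 bit, consistent with the bound quoted above) is an assumption of this illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n_bins = 20   # 20 percentile bins per feature -> maximal MI = log2(20) ~ 4.3 bit

# A) Synthetic feature values of all patches (percentile-scaled to 0..1) and a
# fixation process that prefers high values of BOTH features, independently.
f1, f2 = rng.random(200_000), rng.random(200_000)
p_fix = (0.2 + 0.6 * f1) * (0.2 + 0.6 * f2)
sel = rng.random(f1.size) < p_fix

edges = np.linspace(0, 1, n_bins + 1)
img_stats, _, _ = np.histogram2d(f1, f2, bins=edges)              # P(F1, F2)
gaze_stats, _, _ = np.histogram2d(f1[sel], f2[sel], bins=edges)   # P(F1, F2 | Fix)

# B) Bayesian salience: likelihood at gaze relative to the prior in the images.
salience = (gaze_stats / gaze_stats.sum()) / (img_stats / img_stats.sum())

# Mutual information between the saliencies of the two features: normalise the
# joint salience and compare it with the product of its marginals.
p = salience / salience.sum()
px = p.sum(axis=1, keepdims=True)
py = p.sum(axis=0, keepdims=True)
nz = p > 0
mi = np.sum(p[nz] * np.log2(p[nz] / (px * py)[nz]))
print(f"MI = {mi:.3f} bit")   # close to 0: the toy features contribute independently
```

A strongly interacting pair of features would instead produce a joint salience that deviates from the product of its marginals, and hence a substantially larger MI.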
C) We use multiple regression to reconstruct the measured joint salience model with additive and multiplicative combinations of the individual salience distributions for the two features.

Models of saliency aim to describe how subjects actively use local features of an image to direct visual attention and gaze. Here, we ask whether the salience of one feature varies in the context of another, i.e. whether features interact. We measure the salience of two features as the conditional probability of fixating an image patch given the two feature values of the patch:

P(Fix | Feat1, Feat2) = P(Feat1, Feat2 | Fix) × P(Fix) / P(Feat1, Feat2)

P(Feat1, Feat2|Fix) denotes the distribution of the two features in the subset of patches selected by subjects. P(Feat1, Feat2) denotes the prior probability of a feature combination in the images, the image statistics; it reflects the probability of encountering a given feature combination under random sampling of the images. The relation between the gaze statistics (what was selected) and the image statistics (what could have been selected) shows how subjects actively choose feature combinations from the images; it is an empirical model of saliency. P(Fix) is a constant factor depending on the number of images taken into the analysis.

Comparing different features
Different features have different units of measurement. To allow comparison across features, we rank-ordered the bin edges of the two single-feature domains in percentiles of the occurrence of the respective feature in the images. Such histogram equalization of individual features is plausible in light of contrast gain-control mechanisms (Ohzawa et al., 1982) and temporal whitening of natural visual input (Dan et al., 1996) in the early visual system.

[Figure: Bayesian salience model: the joint feature statistics in the images, P(LC, TC), and at gaze, P(LC, TC|Fix), yield the measured joint salience P(Fix|LC, TC), which is compared with additive and multiplicative integration.]

2B) We reconstruct the measured joint salience with additive and multiplicative interaction terms using multiple regression.
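The reconstruction in 2B) can be sketched as follows. The joint salience here is synthetic and additive by construction; the grid size, the noise level and the intercept term are assumptions of this illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20                                        # 20x20 percentile grid

# Synthetic individual salience curves for the two features.
s1 = 0.5 + rng.random(n)                      # salience of feature 1 per bin
s2 = 0.5 + rng.random(n)                      # salience of feature 2 per bin

# A measured joint salience that is (noisily) additive in this toy example.
joint = s1[:, None] + s2[None, :] + 0.05 * rng.standard_normal((n, n))

y = joint.ravel()
add1 = np.repeat(s1, n)                       # feature-1 regressor
add2 = np.tile(s2, n)                         # feature-2 regressor
inter = (s1[:, None] * s2[None, :]).ravel()   # outer-product interaction term
ones = np.ones_like(y)

def r_squared(X, y):
    """Least-squares fit and coefficient of determination R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_additive = r_squared(np.column_stack([ones, add1, add2]), y)
r2_multipl = r_squared(np.column_stack([ones, add1, add2, inter]), y)
# Both fits are high; the interaction term adds little for additive data.
print(f"additive R^2 = {r2_additive:.3f}, with interaction R^2 = {r2_multipl:.3f}")
```

Comparing the two R² values in this way mirrors the poster's global comparison of additive and multiplicative reconstructions.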
Additive integration generally provides a good model of the measured feature interaction, across and within feature channels. However, multiplicative models had a similar fit to additive models for most feature combinations (not shown here).

References:
Dan, Y., Atick, J.J. & Reid, R.C. (1996). Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. J Neurosci (16).
Derrington, A.M., Krauskopf, J. & Lennie, P. (1984). Chromatic mechanisms in lateral geniculate nucleus of macaque. J Physiol (357).
Einhäuser, W. & König, P. (2003). Does luminance contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience (17).
Itti, L. & Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience (2).
Ohzawa, I., Sclar, G. & Freeman, R.D. (1982). Contrast gain control in the cat visual cortex. Nature (298).
Parkhurst, D., Law, K. & Niebur, E. (2002). Modeling the role of salience in the allocation of overt visual attention. Vision Research (42).