Eye-Based Interaction in Graphical Systems: Theory & Practice
Part I
Introduction to the Human Visual System

A: Visual Attention
“When the things are apprehended by the senses, the number of them that can be attended to at once is small, ‘Pluribus intentus, minor est ad singula sensus’” — William James
• Latin translation: “Many filtered into few for perception”
• Visual scene inspection is performed minutatim (piecemeal), not in toto

A.1: Visual Attention—chronological review
• Qualitative historical background: a dichotomous theory of attention—the “what” and “where” of (visual) attention
• Von Helmholtz (ca. 1900): mainly concerned with eye movements to spatial locations, the “where”, i.e., attention as an overt mechanism (eye movements)
• James (ca. 1900): defined attention mainly in terms of the “what”, i.e., attention as a more internally covert mechanism
• Broadbent (ca. 1950): defined attention as a “selective filter” from auditory experiments; generally agreeing with Von Helmholtz’s “where”
• Deutsch and Deutsch (ca. 1960): rejected the “selective filter” in favor of “importance weightings”; generally corresponding to James’ “what”
• Treisman (ca. 1960): proposed a unified theory of attention—an attenuation filter (the “where”) followed by “dictionary units” (the “what”)
• Main debate at this point: is attention parallel (the “where”) or serial (the “what”) in nature?
• Gestalt view: recognition is a holistic process (e.g., Kanizsa figure)
• Theories advanced through early recordings of eye movements
• Yarbus (ca. 1967): demonstrated sequential, but variable, viewing patterns over particular image regions (akin to the “what”)
• Noton and Stark (ca. 1970): showed that subjects tend to fixate identifiable regions of interest containing “informative details”; coined the term “scanpath” to describe eye movement patterns
• Scanpaths helped cast doubt on the Gestalt hypothesis

Fig. 2: Yarbus’ early scanpath recording:
• trace 1: examine at will
• trace 2: estimate wealth
• trace 3: estimate ages
• trace 4: guess previous activity
• trace 5: remember clothing
• trace 6: remember position
• trace 7: time since last visit

• Posner (ca. 1980): proposed an attentional “spotlight”, a covert mechanism independent of eye movements (akin to the “where”)
• Treisman (ca. 1986): once again unified the “what” and “where” dichotomy by proposing the Feature Integration Theory (FIT), describing attention as a “glue” which integrates features at particular locations to allow holistic perception
• Summary: the “what” and “where” dichotomy provides an intuitive sense of the attentional, foveo-peripheral visual mechanism
• Caution: the “what/where” account is probably overly simplistic and is but one theory of visual attention

B: Neurological Substrate of the Human Visual System (HVS)
• Any theory of visual attention must address the fundamental properties of early visual mechanisms
• Examination of the neurological substrate provides evidence of the limited information capacity of the visual system—a physiological reason for an attentional mechanism

B.1: The Eye
Fig. 3: The eye—“the world’s worst camera”
• suffers from numerous optical imperfections...
• ...endowed with several compensatory mechanisms

Fig. 4: Ocular optics
• Imperfections:
  • spherical aberrations
  • chromatic aberrations
  • curvature of field
• Compensations:
  • iris—acts as a stop
  • focal lens—sharp focus
  • curved retina—matches curvature of field

B.2: The Retina
• Retinal photoreceptors constitute the first stage of visual perception
• Photoreceptors are transducers converting light energy to electrical impulses (neural signals)
• Photoreceptors are functionally classified into two types: rods and cones

B.2: The Retina—rods and cones
• Rods: sensitive to dim and achromatic light (night vision)
• Cones: respond to brighter, chromatic light (day vision)
• Retinal construction: 120M rods, 7M cones arranged concentrically

B.2: The Retina—cellular makeup
• The retina is composed of 3 main layers of different cell types (a 3-layer “sandwich”)
• Surprising fact: the retina is “inverted”—photoreceptors are found in the bottom layer (furthest away from incoming light)
• Connection bundles between layers are called plexiform or synaptic layers

Fig. 5: The retinocellular layers (w.r.t. incoming light):
• ganglion layer
• inner synaptic plexiform layer
• inner nuclear layer
• outer synaptic plexiform layer
• outer layer

Fig. 5 (cont’d): The neuron:
• all retinal cells are types of neurons
• certain neurons mimic a “digital gate”, firing when the activation level exceeds a threshold
• rods and cones are specific types of dendrites

B.2: The Retina—retinogeniculate organization (from outside in, w.r.t. cortex)
• Outer layer: rods and cones
• Inner layer: horizontal cells, laterally connected to photoreceptors
• Ganglion layer: ganglion cells, connected (indirectly) to horizontal cells, project via the myelinated pathways to the Lateral Geniculate Nuclei (LGN) in the thalamus

B.2: The Retina—receptive fields
• Receptive fields: collections of interconnected cells within the inner and ganglion layers
• Field organization determines the impulse signature of cells, based on cell types
• Cells may depolarize due to light increments (+) or decrements (-)

Fig. 6: Receptive fields:
• signal profile resembles a “Mexican hat”
• receptive field sizes vary concentrically
• color-opposing fields also exist

B.3: Visual Pathways
• Retinal ganglion cells project to the LGN along two major pathways, distinguished by morphological cell types: α and β cells
  • α cells project to the magnocellular (M-) layers
  • β cells project to the parvocellular (P-) layers
• Ganglion cells are functionally classified by three types: X, Y, and W cells

B.3: Visual Pathways—functional response of ganglion cells
• X cells: respond to sustained stimulus, location, and fine detail
  • innervate along both M- and P-projections
• Y cells: respond to transient stimulus, coarse features, and motion
  • innervate along only the M-projection
• W cells: respond to coarse features and motion
  • project to the Superior Colliculus (SC)

B.3: Visual Pathways (cont’d)
Fig. 7: Optic tract and radiations (visual pathways):
• The LGN is of particular clinical importance
• M- and P-cellular projections are clearly visible under microscope
• Axons from M- and P-layers of the LGN terminate in area V1

Table 1: Functional characteristics of ganglionic projections

Characteristics                          Magno    Parvo
ganglion size                            large    small
transmission time                        fast     slow
receptive fields                         large    small
sensitivity to small objects             poor     good
sensitivity to change in light levels    large    small
sensitivity to contrast                  low      high
sensitivity to motion                    high     low
color discrimination                     no       yes

B.4: The Occipital Cortex and Beyond
Fig. 8: The brain and visual pathways:
• the cerebral cortex is composed of numerous regions classified by their function

• M- and P-pathways terminate in distinct layers of cortical area V1
• Cortical cells (unlike center-surround ganglion receptive fields) respond to orientation-specific stimulus
• Pathways emanating from V1 and joining multiple cortical areas involved in vision are called streams

B.4: The Occipital Cortex and Beyond—directional selectivity
• Cortical Directional Selectivity (CDS) of cells in V1 contributes to motion perception and control of eye movements
• CDS cells establish a motion pathway from V1 projecting to areas V2 and MT (V5)
• In contrast, Retinal Directional Selectivity (RDS) may not contribute to motion perception, but is involved in eye movements

B.4: The Occipital Cortex and Beyond—cortical cells
• Two consequences of the visual system’s motion-sensitive, single-cell organization:
  • due to motion sensitivity, the eyes are never perfectly still (instead tiny jitter is observed, termed microsaccades)—if the eyes were stabilized, the image would fade!
  • due to single-cell organization, representation of natural images is quite abstract: there is no “retinal buffer”

B.4: The Occipital Cortex and Beyond—2 attentional streams
• Dorsal stream:
  • V1, V2, MT (V5), MST, Posterior Parietal Cortex
  • sensorimotor (motion, location) processing
  • the attentional “where”?
• Ventral (temporal) stream:
  • V1, V2, V4, Inferotemporal Cortex
  • cognitive processing
  • the attentional “what”?
B.4: The Occipital Cortex and Beyond—3 attentional regions
• Posterior Parietal Cortex (dorsal stream):
  • disengages attention
• Superior Colliculus (midbrain):
  • relocates attention
• Pulvinar (thalamus; colocated with LGN):
  • engages, or enhances, attention

C: Visual Perception (with emphasis on foveo-peripheral distinction)
• Measurable performance parameters may often (but not always!) fall within ranges predicted by known limitations of the neurological substrate
• Example: visual acuity may be estimated from knowledge of the density and distribution of the retinal photoreceptors
• In general, performance parameters are obtained empirically

C.1: Spatial Vision
• Main parameters sought: visual acuity, contrast sensitivity
• Dimensions of retinal features are measured in terms of the scene projected onto the retina, in units of degrees visual angle,

$$A = 2 \arctan\left(\frac{S}{2D}\right)$$

  where S is the object size and D is the distance to the object (in the same units as S)

C.1: Spatial Vision—visual angle
Fig. 9: Visual angle

C.1: Spatial Vision—common visual angles
Table 2: Common visual angles

Object             Distance        Angle subtended
thumbnail          arm’s length    1.5-2 deg
sun or moon        -               0.5 deg
US quarter coin    arm’s length    2 deg
US quarter coin    85 m            1 min
US quarter coin    5 km            1 sec

C.1: Spatial Vision—retinal regions
• Visual field: 180° horiz., 130° vert.
• Fovea Centralis (foveola): highest acuity
  • 1.3° visual angle; 25,000 cones
• Fovea: high acuity (at 5°, acuity drops to 50%)
  • 5° visual angle; 100,000 cones
• Macula: within “useful” acuity region (to about 30°)
  • 16.7° visual angle; 650,000 cones
• Hardly any rods in the foveal region

C.1: Spatial Vision—visual angle and receptor distribution
Fig. 10: Retinotopic receptor distribution

C.1: Spatial Vision—visual acuity
Fig. 11: Visual acuity at eccentricities and light levels:
• at photopic (day) light levels, acuity is fairly constant within the central 2°
• acuity drops off linearly to 5°; drops sharply (exponentially) beyond
• at scotopic (night) light levels, acuity is poor at all eccentricities

C.1: Spatial Vision—measuring visual acuity
• Acuity roughly corresponds to receptor distribution in the fovea, but not necessarily in the periphery
• Due to various contributing factors (synaptic organization and later-stage neural elements), effective relative visual acuity is generally measured by psychophysical experimentation

C.2: Temporal Vision
• Visual response to motion is characterized by two distinct facts: persistence of vision (POV) and the phi phenomenon
• POV: essentially describes the human temporal sampling rate
• Phi: describes the threshold above which humans detect apparent movement
• Both facts are exploited in media to elicit motion perception

C.2: Temporal Vision—persistence of vision
Fig. 12: Critical Fusion Frequency (CFF):
• stimulus flashing at about 50-60 Hz appears steady
• CFF explains why flicker is not seen when viewing a sequence of still images
• cinema: 24 fps × 3 = 72 Hz due to 3-bladed shutter
• TV: 60 fields/sec, interlaced

C.2: Temporal Vision—phi phenomenon
• Phi phenomenon explains why motion is perceived in cinema, TV, graphics
• Besides the necessary flicker rate (60 Hz), the illusion of apparent, or stroboscopic, motion must be maintained
• Similar to old-fashioned neon signs with stationary bulbs
• Minimum rate: 16 frames per second

C.2: Temporal Vision—peripheral motion perception
• Motion perception is not homogeneous across the visual field
• Sensitivity to target motion decreases with retinal eccentricity for slow motion...
  • a higher rate of target motion (e.g., spinning disk) is needed to match apparent velocity in the fovea
• ...but motion is more salient in the periphery than in the fovea (easier to detect moving targets than stationary ones)

C.2: Temporal Vision—peripheral sensitivity to direction of motion
Fig. 13: Threshold isograms for peripheral rotary movement:
• periphery is twice as sensitive to horizontal-axis movement as to vertical-axis movement
• (numbers in diagram are rates of pointer movement in rev./min.)

C.3: Color Vision—cone types
Fig. 14: Spectral sensitivity curves of cone photoreceptors
• foveal color vision is facilitated by three types of cone photoreceptors
• a good deal is known about foveal color vision; relatively little is known about peripheral color vision
• of the 7,000,000 cones, most are packed tightly into the central 30° foveal region

C.3: Color Vision—peripheral color perception fields
Fig. 15: Visual fields for monocular color vision (right eye)
• blue and yellow fields are larger than red and green fields
• most sensitive to blue, up to 83°; red up to 76°; green up to 74°
• chromatic fields do not have definite borders; sensitivity gradually and irregularly drops off over a 15-30° range

C.4: Implications for Design of Attentional Displays
• Need to consider distinct characteristics of foveal and peripheral vision, in particular:
  • spatial resolution
  • temporal resolution
  • luminance / chrominance
• Furthermore, gaze-contingent systems must match the dynamics of human eye movement

D: Taxonomy and Models of Eye Movements
• Eye movements are mainly used to reposition the fovea
• Five main classes of eye movements:
  • saccadic
  • vestibular
  • smooth pursuit
  • physiological nystagmus
  • vergence
  • (fixations)
• Other types of movements are non-positional (adaptation, accommodation)

D.1: Extra-Ocular Muscles
Fig. 16: Extrinsic muscles of the eyes:
• in general, eyes move within 6 degrees of freedom (6 muscles)

D.1: Oculomotor Plant
Fig. 17: Oculomotor system:
• eye movement signals emanate from three main distinct regions:
  • occipital cortex (areas 17, 18, 19, 22)
  • superior colliculus (SC)
  • semicircular canals (SCC)

• Two pertinent observations:
  1. the eye movement system is, to a large extent, a feedback circuit
  2. controlling cortical regions can be functionally characterized as:
    • voluntary (occipital cortex—areas 17, 18, 19, 22)
    • involuntary (superior colliculus, SC)
    • reflexive (semicircular canals, SCC)

D.2: Saccades
• Rapid eye movements used to reposition the fovea
• Voluntary and reflexive
• Range in duration from 10 ms to 100 ms
• Viewer is effectively blind during the transition
• Deemed ballistic (pre-programmed) and stereotyped (reproducible)

D.2: Saccades—modeling

$$x_t = g_0 s_t + g_1 s_{t-1} + \cdots = \sum_{k=0}^{\infty} g_k s_{t-k}$$

Fig. 18: Linear moving average filter model:
• $s_t$ = input (pulse), $x_t$ = output (step), $g_k$ = filter coefficients
• e.g., Haar filter {1,-1}

D.3: Smooth Pursuits
• Involved when visually tracking a moving target
• Depending on the range of target motion, the eyes are capable of matching target velocity
• Pursuit movements are an example of a control system with built-in negative feedback

D.3: Smooth Pursuits—modeling

$$h(s_t - x_t) = x_{t+1}$$

Fig. 19: Linear, time-invariant filter model:
• $s_t$ = target position, $x_t$ = (desired) eye position, $h$ = filter
• retinal receptors give additive velocity error

D.4: Nystagmus
• Conjugate eye movements characterized by a sawtooth-like time course pattern (pursuits interspersed with saccades)
• Two types (virtually indistinguishable):
  • Optokinetic: compensation for retinal movement of target
  • Vestibular: compensation for head movement
• May be possible to model with a combination of saccade/pursuit filters

D.5: Fixations
• Possibly the most important type of eye movement for attentional applications
  • 90% of viewing time is devoted to fixations
  • duration: 150 ms - 600 ms
• Not technically eye movements in their own right, rather characterized by miniature eye movements:
  • tremor, drift, microsaccades

D.6: Eye Movement Analysis
• Two significant observations:
  1. only three types of eye movements are mainly needed to gain insight into overt localization of visual attention:
    • fixations
    • saccades
    • smooth pursuits (to a lesser extent)
  2. all three signals may be approximated by linear, time-invariant (LTI) filter systems

D.6: Eye Movement Analysis—assumptions
• Important point: it is assumed that observed eye movements disclose evidence of overt visual attention
  • it is possible to attend to objects covertly (without moving the eyes)
• Linearity: although practical, this assumption is an operational oversimplification of neuronal (non-linear) systems

D.6: Eye Movement Analysis—goals
Fig. 20: Hypothetical eye movement signal
• goal of analysis is to locate regions where the signal average changes abruptly
  • fixation end, saccade start
  • saccade end, fixation start
• two main approaches:
  • summation-based
  • differentiation-based
• both approaches rely on empirical thresholds

D.6: Eye Movement Analysis—denoising
Fig. 21: Signal denoising—reduce noise due to:
• eye instability (jitter), or worse, blinks
• removal possible based on device characteristics (e.g., blink = [0,0])

D.6: Eye Movement Analysis—summation based
• Dwell-time fixation detection depends on:
  • identification of a stationary signal (fixation), and
  • size of the time window specifying the range of duration (and hence the temporal threshold)
• Example: position-variance method:
  • determine whether M of N points lie within a certain distance D of the mean (μ) of the signal
  • values M, N, and D are determined empirically

D.6: Eye Movement Analysis—differentiation based
• Velocity-based saccade/fixation detection:
  • calculated velocity (over signal window) is compared to a threshold
  • if velocity > threshold then saccade, else fixation
• Example: velocity detection method:
  • use short Finite Impulse Response (FIR) filters to detect saccades (may be possible in real-time)
  • assuming a symmetrical velocity profile, can extend to velocity-based prediction

D.6: Eye Movement Analysis (cont’d)
Fig. 22: Saccade/fixation detection: (a) position-variance, (b) velocity-detection

D.6: Eye Movement Analysis—example
Fig. 23: FIR filter velocity-detection method based on idealized saccade detection:
• 4 conditions on measured acceleration:
  • $|I_1| > A$ (acceleration exceeds threshold A)
  • $|I_2| > B$ (acceleration exceeds threshold B)
  • $\mathrm{Sgn}(I_2) \neq \mathrm{Sgn}(I_1)$ (sign change)
  • $T_{min} < I_2 - I_1 < T_{max}$ (duration threshold)
• thresholds derived from empirical values

• Amplitude thresholds A, B: derived from expected peak saccade velocities: 600°/s
• Duration thresholds $T_{min}$, $T_{max}$: derived from expected saccade duration: 120 ms - 300 ms
Fig. 24: FIR filters for saccade detection
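
The two detection approaches named in D.6 (position-variance and velocity-based) can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the slides: the function names, the sample data, the 10 ms sampling interval, and the 100°/s velocity threshold are all assumptions chosen for the example, and the velocity estimate is a plain two-tap difference rather than the FIR filters of Fig. 24.

```python
import math


def position_variance_fixation(points, m, d):
    """Summation-based check: True if at least m of the given points
    lie within distance d of the mean of those points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    close = sum(
        1 for x, y in points
        if math.hypot(x - mean_x, y - mean_y) <= d
    )
    return close >= m


def velocity_labels(points, dt, threshold):
    """Differentiation-based check: label each inter-sample interval
    'saccade' if its speed exceeds the threshold, else 'fixation'."""
    labels = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Two-tap difference of successive samples, divided by the
        # sampling interval, gives speed in degrees per second.
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        labels.append("saccade" if speed > threshold else "fixation")
    return labels


# Hypothetical gaze samples in degrees visual angle, one every 10 ms.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 0.0), (10.0, 0.0), (10.1, 0.0)]

# The first three samples cluster tightly, so they pass the M-of-N test.
print(position_variance_fixation(samples[:3], m=3, d=0.5))

# The two large jumps in the middle are labelled as saccades.
print(velocity_labels(samples, dt=0.01, threshold=100.0))
```

As the slides note, the values M, N, D and the velocity threshold are empirical. In practice they would be tuned to the eye tracker's sampling rate and noise level, and the signal would be denoised first.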