LIU-ITN-TEK-A--15/046--SE

Gaze control for detail and overview in image exploration
(Ögonstyrning för detalj och översikt, i bild utforskning)

Sebastian Rauhala

Master's thesis in Media Technology (Examensarbete utfört i Medieteknik), carried out at the Institute of Technology, Linköping University.
Department of Science and Technology (Institutionen för teknik och naturvetenskap), Linköping University, SE-601 74 Norrköping, Sweden
Supervisor: Matthew Cooper
Examiner: Jimmy Johansson
Norrköping, 2015-06-16
Published by Linköping University Electronic Press, http://www.ep.liu.se/
© Sebastian Rauhala

Abstract

Eye tracking technology has made it possible to accurately and consistently track a user's gaze position on a screen.
The human eye's center of focus, where it can see the most detailed information, is quite small at any given moment. The peripheral vision of humans has a much lower level of detail than the center of gaze. Knowing this, it is possible to display a view that increases the level of resolution at the position of the user's gaze point on the screen, while the rest of the screen keeps a lower resolution. An implementation of such a system can generate a representation of data with both detail and overview. The results indicate that even with simple gaze data processing it is possible to use gaze control to help explore details of a high resolution image. Gaze data processing often involves a compromise between stability, responsiveness and latency. A low latency, highly responsive gaze data filter would increase the risk of lens oscillation, and demand a higher level of concentration from the viewer than a slower filter would. Applying a gaze data filter that allowed for smooth and stable lens movement for small saccades and responsive movement for large saccades proved successful. With the use of gaze control the user might be able to use a gaze aware application more efficiently, since gaze precedes actions. Gaze control would also reduce the need for hand motions, which could provide an improved work environment for people interacting with computers.

Keywords: Eye tracker, Gaze control, Gaze data processing, detail, overview, Media technology, Medieteknik, Tobii, lens, ögonstyrning, ögontracking
Acknowledgments

I would like to express my sincere thanks to my supervisor Matthew Cooper and my examiner Jimmy Johansson. They introduced me to the interesting fields of information visualization and human computer interaction, and shared their good ideas and knowledge with me. Our discussions and their feedback have been of great help and I really appreciate it. I would also like to thank my good friend and classmate Philip Zanderholm for all his help, encouragement and support during my years in Norrköping. Thanks to Kahin Akram for your company and support during my thesis work. I take this opportunity to express gratitude to all of the Department faculty members for their help and support. I also thank my parents and siblings for their unceasing encouragement, support and attention.

Norrköping, June 2015
Sebastian Rauhala

Contents

List of Figures
Notation
1 Introduction
  1.1 The problem statement
  1.2 Goals
  1.3 Audience
  1.4 Organization of the thesis
2 Human vision
  2.1 Anatomy of the human eye
  2.2 Eye movements
    2.2.1 Fixations
    2.2.2 Saccades
    2.2.3 Microsaccades
    2.2.4 Gaze accuracy
3 Eye tracking techniques
  3.1 Types of eye tracking systems
  3.2 Calibration of remote video-based eye tracker systems
  3.3 Accuracy and precision
  3.4 Eye tracking as human computer interaction
  3.5 Tobii TX300 eye tracker
    3.5.1 Tobii Gaze SDK
4 Gaze aware image viewing application, GazRaz
  4.1 The idea
  4.2 Hardware setup
  4.3 Software setup
  4.4 Gaze data
  4.5 Gaze data filtering
    4.5.1 Mean
    4.5.2 Static threshold, weighted mean filter
    4.5.3 Exponential threshold, weighted mean filter
  4.6 Lens
    4.6.1 Double texture lens
    4.6.2 Refraction lens
    4.6.3 Dynamic lens size
    4.6.4 Gaze point correction
  4.7 User interface
5 Evaluation
  5.1 Method
    5.1.1 Participants
    5.1.2 GazRaz settings
    5.1.3 TX300 settings
  5.2 Result
6 Conclusions and Future work
  6.1 Discussion
  6.2 Future work
A GLIF, Additional information and Acknowledgements
Bibliography
List of Figures

2.1 A simple illustration of the schematics of the human eye. Picture by Rhcastilhos. From wikipedia.org.
3.1 An illustration of the difference between accuracy and precision in an eye tracker context.
3.2 TX300 eye tracker, viewed from the front. The purple circles under the screen are light from the IR lamps captured by the camera.
3.3 Illustration of the front of the TX300 eye tracker. Notice that the camera represents the origin.
3.4 TX300 eye tracker viewed from the side. Notice that the eye tracker and the z axis are angled with respect to the head. Head portrait designed by Freepik.com.
3.5 Illustration of the three coordinate systems used by the Gaze SDK.
3.6 Illustration of two eyes looking at a point on the active display.
4.1 Illustration of a plane zoom lens for detail and overview. The blue area is hidden behind the lens and cannot be seen. The view plane is the background image.
4.2 Gaze points during smooth pursuit with the static filter. Green points are the raw gaze points for the right eye and they fade towards yellow the older they get. Red points are the raw gaze points for the left eye and they fade towards purple with age. The white line consists of the filtered gaze points. The red, green and blue circles represent the distance thresholds. The mean gaze point calculation was done with 14 samples, i.e. 7 gaze point pairs. The eyes move from the left side towards the right side of the image. Notice the distance between the leftmost raw gaze points and the leftmost filtered gaze points.
4.3 Gaze points during smooth pursuit with the exponential filter. Green points are raw gaze points for the right eye and fade to yellow the older they get. Red points are raw gaze points for the left eye and fade to purple with age. The white line consists of the filtered gaze points. The red and cyan coloured circles represent the distance thresholds. The mean calculation was done with 14 samples, i.e. 7 gaze point pairs. The eyes move from the top left corner towards the bottom of the image. Notice the distance between the lowest raw gaze points and the lowest filtered gaze points.
4.4 Gaze points during a saccade with the static filter (upper) and the exponential filter (lower). Green points are raw gaze points for the right eye and fade to yellow the older they get. Red points are raw gaze points for the left eye and fade to purple with age. The white points are the filtered gaze points. The mean calculation was done with 12 samples, i.e. 6 gaze point pairs. The eyes move from the leftmost circle to the rightmost circle.
4.5 Plane lens (upper), with a zoom factor of 2.0. Fish-eye lens (lower). Both are placed above Europe on a high resolution world map image taken from shaderelief.com.
4.6 Refract lens over the USA's east coast. Texture image taken from glif.is, see appendix for additional information.
4.7 Plane lens over the USA's east coast. Texture image taken from glif.is, see appendix for additional information.
4.8 Triangle. a, b, c represent the side lengths. α, β, γ represent the angles. Image by David Weisman. Taken from wikipedia.org.
4.9 Illustration of gaze point correction for the plane lens. The right arrow shows where the lens would be moved to if there were no gaze point correction. The dotted line shows how the gaze point correction repositions the gaze point to target the area that the lens is showing at the moment.
4.10 The information window in the top left corner of the screen.
4.11 The red text in the center of the lens is the direction the user is asked to move when he/she is too close to the track box border. The warning disappears as soon as the eye distance comes inside the allowed limit.
5.1 The image used during the introduction to GazRaz. Image resolution 16200x8100 pixels. Taken from shadedrelief.com.
5.2 The image used during the think aloud evaluation. The image shows a world map centred over America. On the map there are coloured lines illustrating the GLIF infrastructure of international research and education network bandwidth. The network cables have names, and some cities are marked with points and names. Image resolution 16384x8192 pixels. Taken from glif.is, see appendix for additional information.

Notation

Abbreviations

Abbreviation   Meaning
HCI            Human Computer Interaction
SDK            Software Development Kit
API            Application Programming Interface
IP             Internet Protocol
IR             Infra Red

1 Introduction

With an increasingly data dense society growing up around us, computers of all sizes are becoming essential for information sharing. To perceive the fast and frequently varying information, we have a variety of computer screens and monitors to help us; see, for example, example 1.1.
Since a digital monitor is limited to a physical size and resolution, the user will have to interact with the device to make it present the information that is of interest at the moment. Human computer interaction (HCI) is often done with a mouse and keyboard, or through the increasingly popular use of touchscreens (a touchscreen refers to a screen of an electronic device that responds to, or is able to track, a finger's motion on the screen). Other forms of HCI could be voice control, gesture sensors or eye tracking. The classical use of mouse and keyboard has proven itself reliable and useful. Eye tracking technology has made important leaps in the last decade, making remote eye trackers increasingly powerful, accurate and unobtrusive (Tob [1]). This provides new opportunities in HCI.

The term eye tracker can be used in different ways, from simple camera applications that try to determine whether there are any eyes present in the scene, for example a mobile device that keeps its screen illuminated as long as the front mounted camera can detect any eyes, to more advanced systems that have the ability to not only find eyes in the scene but also determine an accurate gaze direction or gaze position of the viewer. In this thesis, eye tracker refers to a device that is able to identify the eyes of a user and calculate their gaze position.

The progressive improvement of eye tracking technology makes it an interesting complement to the classical ways of human computer interaction. A gaze aware application might help the user to perceive information faster than they would if they used a mouse and keyboard. This thesis will further explore the use of eye tracking as a form of human computer interaction, and how the technique could be used in an image viewing application.

Example 1.1: MRI scanners
Medical MRI scanners produce a large amount of high resolution images. To fully diagnose the patient, an expert needs to do a thorough investigation of the produced images. To review the vast amount of information, the medical expert uses digital monitors. The limits of the display make it necessary for the medical personnel to interact with the device, to be able to navigate and zoom into interesting parts of the data.

1.1 The problem statement

Vast amounts of graphical data are produced by various machines and systems. Whether it is medical scanners, astronomical observatories or security systems, the graphical output will be presented to a human viewer. Even though digital monitors are evolving together with their graphical information, they will continue to be limited to a physical size and a maximum resolution.

The human eye has evolved in such a way that we perceive the finest visual information in a small region, the fovea. The fovea is the central 2 degrees of vision. Outside the fovea the visual accuracy becomes poorer, see Rayner [2]. As a result, humans will be limited to studying a small part of a large digital monitor at any given moment.

Modern systems and sensors can produce such high resolution imagery that it is necessary to be able to zoom and navigate through the image. With the help of modern eye tracking technology, interaction with the information could be improved. A software application that knows where on the display the user is looking could enhance the resolution at the given gaze position, thus giving the user the ability to process the image in detail without losing the overall perception.
A gaze aware application could also be beneficial for efficiency, without the need to drag a mouse pointer to the required position. Some people who spend large parts of their days interacting with a computer using common tools such as keyboard and mouse claim to experience pain in hands and forearms, see Lassen et al. [3]. Using the eyes to interact with computers could reduce the need for hand movement, which in turn should reduce the stress to which the hands and arms are exposed. The human eye constantly moves in its normal state (Martinez-Conde et al. [4]), so tracked or not, the eyes should not be strained when interacting with computers. That is, provided we do not change the way we use our eyes when we become aware that they are tracked.

This thesis aims to explore how a gaze aware image viewing application can be developed to provide the user with the abilities of gaze control.

1.2 Goals

This thesis was motivated by the following goals:

• Research and summarize the advanced processes that are involved in human vision. What key components are involved in human visual perception, and how could this knowledge be used to build a responsive and user friendly gaze aware application?

• Research eye tracking technology. Review the different types of eye tracking technology. In which situations are they suitable for human computer interaction? What information can an eye tracker provide to its user and what are their limitations?

• Dynamic resolution of a high quality image. A high resolution image is, in many cases, viewed through a digital monitor of lower resolution than the image original. To fully review the image the user must zoom into, and navigate around in, the image. A gaze aware application could possibly increase the efficiency of this process by dynamically zooming the image at the center of the gaze position, instead of relying on the need for hand gestures.

• Non-intrusive gaze interaction. A gaze aware application should aim to help the user interact in such a way that it feels natural, is easy to use and provides a boost in efficiency.

1.3 Audience

This thesis is addressed to people interested in the field of gaze tracking and the development of gaze aware applications. Previous knowledge of software engineering and calculus is advantageous for understanding this thesis.

1.4 Organization of the thesis

Chapter 2 discusses human vision and explains important parts of eye movements. Chapter 3 discusses eye tracking technology with a focus on video-based eye tracking. It further describes the TX300 eye tracker system that was used during this thesis work, and the software development kit (SDK) that was used together with the system. Chapter 4 describes the GazRaz application and how it was designed. Chapter 5 discusses the evaluation of GazRaz and the result. Chapter 6 provides a discussion of the evaluation result and the design of GazRaz, together with a discussion of future work.

2 Human vision

To accomplish a useful gaze aware application it is necessary to understand the components involved in human vision. Vision is arguably one of humans' most important senses. It has evolved to be an instrument that, with high precision, can describe the world around us. Beyond being an input device, the eyes also serve communicative purposes, see Majaranta and Bulling [5]. Because of the human eye's foveal construction, we are forced to direct our gaze at the object we want to see clearly. This behavior gives humans the ability to seamlessly communicate their direction of interest.
The case explained by Bolt [6] in example 2.1 highlights the natural instinct in humans to use gaze as an extra dimension in communication. If a computer has the ability to understand when it is being looked at, it could also act accordingly, for example illuminate its screen or activate a listener for voice control. How a person uses their gaze for communication can, however, vary depending on personality, cultural references or even emotional state, see Drewes [7].

The book "Neurology of eye movements" by R. John Leigh and David S. Zee gives a comprehensive review of the human visual system and how disorders can have an effect on it. The article "The impact of microsaccades on vision: towards a unified theory of saccadic function." by Martinez-Conde et al. provides a good description of microsaccades (small, unconscious eye movements, often defined as less than 1 degree, Martinez-Conde et al. [4]) in human vision.

Example 2.1: Gaze in communication
Bolt [6] made an example where he pronounced that humans use their gaze to express to whom they communicate.
"Consider the case of asking the question "What is your favorite sport?" in the presence of several people, but looking at Mary, say, not at Frank, Judy, or Dave. Then you utter the selfsame words, but now looking at Dave. The question is a different question; the difference lies not in its verbal component, but in its intended addressee as given by eye."

2.1 Anatomy of the human eye

The eye can, from an engineering point of view, be described as a photo sensor with stabilizing muscles. Three pairs of stabilizing muscles are used to give the eye the ability to move horizontally, vertically and to rotate around the axis pointing out of the pupil. This provides the eye 3 degrees of freedom, which is sufficient to compensate for all movements of the head. To stabilize head movements, the nerves controlling the eye muscles are closely connected with the equilibrium organ located in the ear (Drewes [7]).

Figure 2.1: A simple illustration of the schematics of the human eye. Picture by Rhcastilhos. From wikipedia.org.

Figure 2.1 illustrates a schematic diagram of the human eye. Light enters the eye through the pupil. The area surrounding the pupil, the iris, is responsible for the light admission and is able to adapt to changing light conditions. The light sensitive area of the inside of the eye is called the retina. The retina stretches in an ellipsoidal manner with a horizontal extent of 180 degrees and 130 degrees vertically (Drewes [7]).

It should also be noted that the fovea is not located directly in line with the pupil, but 4 to 8 degrees higher up (Drewes [7]). This means that the optical axis (pupillary axis) does not fully correspond to the line of sight.

For clear vision of an object, its image needs to be held steady on the retina and kept close to the center of the fovea. Leigh and Zee [8] explain that, at a 2 degree distance from the fovea, the visual acuity has declined by about 50%. Furthermore, Leigh and Zee [8] state that for the best perception of an object it needs to be within 0.5 degrees of the center of the fovea. The visual field outside the fovea can be separated into parafoveal and peripheral. The fovea covers the central 2 degrees of vision, and the parafovea extends to 5 degrees, see Rayner [2].
The remaining peripheral vision supplies cues about where the eye should look next and also provides information on movements or changes that occur in the scene (Majaranta and Bulling [5]). For example, Hillstrom and Yantis [9] argue that "when motion segregates a perceptual element from a perceptual group, a new perceptual object is created, and this event captures attention" and that "motion as such does not capture attention but that the appearance of a new perceptual object does.".

2.2 Eye movements

The human eyes move and shift gaze direction frequently. Because of the foveal construction of the eyes, humans need to shift their gaze to the object they want to study. When there are no moving objects in the view and the head is in a stable position, the eye changes gaze direction in abrupt movements. These jerky eye movements are known as saccades. Right after a saccade the eye needs a moment to fetch the visual information before another saccade can be achieved. This time for rest is called a fixation. When the eye is fixating on an object but the body or head is moving, the eye muscles can counteract the motion to maintain a stable projection on the retina. Such action is known as smooth pursuit. Smooth pursuit also includes when the eye strives to maintain a fixation on a moving or accelerating object. The eye's movements can be summarized into two main types: those that stabilize the gaze to preserve steady images on the retina, and those that shift gaze direction to explore new objects (Leigh and Zee [8]). The rest of the section will discuss fixations, saccades and gaze accuracy in more detail.

2.2.1 Fixations

The eyes need a stable exposure on the retina for image recognition. For this purpose the eyes fixate and oppose drift and disturbance of the eye. Fixations can be characterized as pauses that last at least 100 ms, but typically in the interval of 200 to 600 ms (Majaranta and Bulling [5]). Depending on the fixation identification technique, some authors present different results for fixation duration. Rayner [2] experimented on eye movement when reading and suggested fixation lengths of 100 to 400 ms.

The human eye's stabilizing muscles are good at counteracting head and body movements, but they are not stable enough to hold the eyes completely still. Even if we attempt to fixate our gaze on a steady point, small ocular motions will shift our eye position (Martinez-Conde et al. [4]). The gaze instability during fixation can be separated into three main components: high-frequency low-amplitude tremor, microsaccades and slow drifts. The frequency of the tremor can reach up to 150 Hz and its amplitude is less than 0.01 degree, which corresponds to less than the size of one photo receptor (Leigh and Zee [8]).

2.2.2 Saccades

Saccades are the rapid motions the eye makes when shifting the gaze direction between fixation points. Saccades include a range of ocular behaviors, and can be activated both voluntarily and involuntarily (Leigh and Zee [8]). There are multiple roles for saccades in vision: they correct gaze errors, foveate targets and are used during visual search (Martinez-Conde et al. [4]).

Saccades show a consistent relation between speed, size and duration. The larger the saccade angle, the longer its duration. Leigh and Zee [8] argue that even large saccades, with an amplitude up to 30 degrees, do not last much longer than 100 ms, which is close to the response time of the visual system. According to Thorpe Simon and Catherine [10], the time from a visual stimulus to a reaction in the brain is roughly 150 ms. This implies that no visual feedback has time to reach the brain during the saccade. The brain must therefore calculate the predicted motions needed to translate the gaze onto a new target before the saccade begins. If the eye fails to hit the desired target, a corrective saccade is usually made with a latency of 100 to 130 ms (Leigh and Zee [8]). Leigh and Zee [8] further mention that saccades can make corrective motions before the initial saccade is complete. These corrective responses are believed to be triggered without any visual feedback. This is supported by experiments showing that corrective saccades can be conducted even in complete darkness (Leigh and Zee [8]).

The duration of a saccade can be approximately described as linearly related to its amplitude of movement, if it is within the range of 1 to 50 degrees (Leigh and Zee [8]). There are, however, a couple of different mathematical models suggested by various authors to describe the relation between saccade duration and saccade distance. Some also argue for and against the use of target size as a factor for saccade duration. For more insight and discussion about these models, see Drewes [7] as a pointer to the literature.
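This relation between speed and amplitude is also what makes saccades straightforward to separate from fixations in sampled gaze data. The sketch below, in C++ like the rest of the code in this thesis, shows a standard velocity-threshold classification; it is not taken from GazRaz, and the 30 degrees-per-second threshold and the sample values are assumptions chosen only for illustration.

#include <cmath>
#include <cstdio>
#include <vector>

// One gaze sample: gaze direction offsets in degrees and a timestamp in seconds.
struct Sample { double xDeg, yDeg, t; };

// Classify each inter-sample interval as part of a saccade or a fixation by
// comparing the angular velocity against a threshold (assumed 30 deg/s here).
std::vector<bool> classifySaccades(const std::vector<Sample>& s,
                                   double thresholdDegPerSec = 30.0)
{
    std::vector<bool> isSaccade;
    for (size_t i = 1; i < s.size(); ++i) {
        double dx = s[i].xDeg - s[i - 1].xDeg;
        double dy = s[i].yDeg - s[i - 1].yDeg;
        double dt = s[i].t - s[i - 1].t;
        double velocity = std::sqrt(dx * dx + dy * dy) / dt; // deg/s
        isSaccade.push_back(velocity > thresholdDegPerSec);
    }
    return isSaccade;
}

int main()
{
    // Three slow samples (fixation-like) followed by a large jump (saccade-like),
    // sampled at roughly 300 Hz as on the TX300.
    std::vector<Sample> samples = {
        {0.00, 0.00, 0.0000}, {0.02, 0.01, 0.0033},
        {0.03, 0.02, 0.0066}, {5.00, 3.00, 0.0100}};
    std::vector<bool> flags = classifySaccades(samples);
    for (size_t i = 0; i < flags.size(); ++i)
        std::printf("interval %zu: %s\n", i, flags[i] ? "saccade" : "fixation");
    return 0;
}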
2.2.3 Microsaccades

During an attempted fixation the eye still produces low amplitude saccades known as microsaccades. These saccades can occur at a frequency of 1-2 times per second during fixation. The purpose of microsaccades is still debated, but they are believed to share a common generator with larger saccades, see Martinez-Conde et al. [4]. The amplitude of a microsaccade had previously, before the 1990s, been defined to be around 12 arc min (0.2 degree). Later studies, however, show microsaccades frequently exceeding this limit. Instead a microsaccade magnitude distribution of around 1 degree is suggested, see Martinez-Conde et al. [4]. Martinez-Conde et al. [4] describe the distinction between saccades and microsaccades during free viewing or visual search as "saccades produced during active exploration may be considered 'regular saccades', and saccades produced in the fixation periods between exploratory saccades may be considered 'microsaccades'".

Even though the exact mechanism that triggers microsaccades is unknown (Otero-Millan et al. [11]), Steinman et al. [12] showed that conscious attempts to fixate could decrease the rate of microsaccades. Steinman et al. [13] further confirmed the evidence for saccade suppression, and also stated that voluntary saccades can be the size of microsaccades. The average of these voluntary "microsaccades" was 5.6 arc min (0.093 degree) with a standard deviation of < 3.0 arc min (0.05 degree). This leads to the conclusion that it is not sensible to treat all small saccades as unwanted gaze disturbances. Steinman et al. [13] reason that
"The frequent occurrence of miniature saccades during maintained fixation may merely confirm something we already know. Human beings are uncertain and human beings are curious. After all, how can you be sure that you are really looking exactly at a target, and how can you resist the temptation to look for something more interesting nearby?"

2.2.4 Gaze accuracy

The eye's high resolution vision (the foveal region) covers 1 degree of the visual angle. The size of this area is commonly described as approximately the same size as a thumbnail when the arm is fully stretched.
If a viewed object fits within this area, it is sufficient for the eye to have its projection somewhere on the fovea. The object does not necessarily have to be in the center (Drewes [7]). Drewes [7] also argues that there is no need for the eye to position the gaze more accurately, since the target will be perceived clearly as long as it is within the fovea. Drewes [7] concludes that, due to ocular disturbance and the size of the fovea, the gaze accuracy is limited to around ±0.5 degrees, which also means that eye tracking technology will, at its best, produce the same accuracy as the eyes themselves.

3 Eye tracking techniques

A lot of research has been done in the field of eye tracking. The earliest work dates back to the 18th century (Drewes [7]). An essential part of eye tracking research has been directed towards understanding human visual perception and the physical movement of the eyes. Eye tracking has a history in medical and psychological research as an important tool to study human visual behavior. The technology has, since its early stages, developed to be increasingly accurate and non-intrusive, which has made it applicable in new fields such as accessibility to help the handicapped, or market advertisement testing. New affordable eye tracking systems will also provide the possibility of gaze aware applications to a wider audience. Modern affordable eye tracking systems might provide the motivation to expand to new user domains, even becoming an established way of human computer interaction.

Human cognitive processes are reflected in our gaze behavior, which can provide hints of our thoughts and intentions, see Majaranta and Bulling [5]. Land and Furneaux [14] stated that humans look at things before acting on them. By introducing gaze data to a software application, it could adapt to the user's interest or intention and increase the efficiency of the software. This chapter will provide a foundation for understanding eye tracking technology, and discuss how it could make or break human computer interaction.

3.1 Types of eye tracking systems

There are various sets of techniques to track the motions of the eyes. The most direct method is to place sensors in direct contact with the eye, for example attaching small levers to the eyeball, as in the experiments of Steinman et al. [13]. This method is however not recommended because of the high risk of injuries (Drewes [7]). An improved way of applying sensors to the eye is through the use of contact lenses. The lens could hold an integrated coil that allows measurement of the coil's magnetic field, which in turn can be used to track eye movement. The wire needed to connect the lens to the rest of the gear could, however, be annoying for the user. The advantage of the method is its high accuracy and nearly unlimited resolution in time (Drewes [7]).

Another method for eye tracking is electrooculography. This method uses multiple sensors attached to the skin around the eyes. The sensors measure the electric field of the eye, which is an electric dipole, see Drewes [7]. One advantage of this method is that it allows for eye tracking with the eyes closed or in complete darkness, which makes it usable when studying eye movement during sleep.

Both these methods are most suitable for scientific research and not for human computer interaction, due to their obtrusive nature. A non-intrusive technique suitable for HCI is video-based eye tracking.
Some video-based systems even allow for modest head movement, as long as the head stays within the system's "trackbox" (the trackbox is the volume in front of the eye tracker camera where the system is able to detect the eyes). Video-based eye tracking systems can be further subdivided into different subcategories, and be remote or head-mounted. Video-based tracker systems use, as the name reveals, a video camera that, together with image processing, is able to estimate the gaze direction. In remote systems the camera is typically placed below the computer screen that is to be tracked. In head-mounted systems the camera is either mounted on the frame of eyeglasses or on a helmet. The frame rate and the resolution of the tracker camera will have a significant effect on the accuracy of the tracking (Majaranta and Bulling [5]). There are also a number of other factors that could have an effect on the quality of the tracker data, such as eyeglasses, lenses, droopy eyelids, light conditions or even makeup. For more information on what to consider when setting up a video-based eye tracking environment or experiment, see section 4.4 in Holmqvist [15].

Many trackers aim to detect the pupil of the eye to calculate the gaze direction. There are two illumination methods used to detect the pupil, referred to as the dark and the bright pupil method. The bright pupil method uses infrared light directed towards the eye to create a reflection on the retina. The reflection can be detected by the camera but is not visible to the human eye. The effect is similar to that of "red eye" when taking a photograph with the flash activated. To use this method, infrared lamps need to be mounted close to the eye tracking camera. The dark pupil method uses image processing to locate the dark position of the eye's pupil. This can be problematic if the hue of the pupil and iris are close.

Tracker systems that are based on visible light and pupil center tracking tend to have accuracy issues and be sensitive to head movement (Majaranta and Bulling [5]). To address this issue, a reflective glint on the cornea of the eye is used. The reflection is caused by infra red (IR) light aimed on- or off-axis at the eye. An on-axis light will result in a bright pupil effect and off-axis will result in a dark pupil. By measuring the corneal reflection(s) from the IR light relative to the pupil center, the system is able to compensate for the inaccuracies and also allow for modest head movement (Majaranta and Bulling [5]). Since the cornea has a spherical shape, the reflective glint stays in the same position for any direction of the gaze. The gaze direction can therefore be calculated by measuring the changing relationship between the moving pupil center of the eye and the corneal reflection. As the physical size and features of the eye differ from person to person, a calibration procedure is needed. The calibration adjusts the system to suit the current user.

3.2 Calibration of remote video-based eye tracker systems

For a remote video-based eye tracking system to accurately map a user's gaze onto the tracked screen, the system needs to be calibrated for each user. This is normally done by presenting a grid of calibration points on the screen. In general, more calibration points generate better accuracy. In research, nine calibration points are commonly used in a 3x3 grid, but software used in non-experimental setups can cope with fewer and still have good enough accuracy.
The accuracy will also be favored if the calibration conditions are the same as the application conditions. During the calibration procedure the user consecutively fixates on the calibration points, one after another. The relationship between the pupil center position and the corneal reflection during a fixation changes as a function of the eye's gaze direction, see Majaranta and Bulling [5]. The images of the eye orientation when looking at each of the calibration points are used to analyse the correspondence to screen coordinates. The system then knows the eye's gaze direction when it is looking at certain points on the screen. By interpolation of these key points, the system is thereafter able to estimate any gaze point on the screen.

Calibration is essential for the accuracy of an eye tracker system. Its properties can be summarized by the number of points, the placement of the calibration points and the conditions during calibration. Depending on the purpose of the application, the calibration can be designed to suit it. If an eye tracker is meant to be used by the same person all day, it would make sense to conduct a more thorough calibration, with the benefit of better accuracy. An eye tracker that is supposed to be used by multiple users every hour would instead be favored by having as simple and short a calibration procedure as possible, to enable fast and easy access to the application.

There exist eye tracker systems that, by different techniques, try to avoid the need for calibration. Drewes [7] reasoned that these systems will still have to be calibrated in some sense, due to the uniqueness of the eyes. As mentioned in section 2.1, the fovea is not located directly opposite the pupil but instead a few degrees higher. This individual difference is difficult to measure without a calibration. Hence a calibration free system is unlikely to be developed. The calibration procedure could, however, be reduced and hidden from the user.

3.3 Accuracy and precision

Accuracy and precision are two important metrics to describe an eye tracker's performance. Figure 3.1 describes the difference between precision and accuracy. Accuracy refers to how closely the eye tracker is able to track the exact gaze position. Precision, on the other hand, indicates the gaze point consistency during a completely steady fixation. In the best case the gaze point samples should be exactly in the center of the target and have a minimal spread. Since the human eyes are never completely still, and not necessarily fixating on the exact position the mind is trying to explore, synthetic eyes are used when calculating the performance of an eye tracker.

Figure 3.1: An illustration of the difference between accuracy and precision in an eye tracker context.

The accuracy is calculated as the root mean square distance between the gaze data samples and the target stimulus. The distance is thereafter transformed from a distance in length to a gaze angle with the help of the known distance between the eye and the screen. Precision is calculated as the standard deviation of the data samples or as the root mean square of inter-sample distances (Holmqvist [15]). The precision distance is also transformed to a gaze angle. For more information on how to calculate eye tracker performance, see Tob [1].
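As a concrete reading of these definitions, the sketch below computes accuracy as the root mean square offset from a known target and precision as the root mean square of inter-sample distances, and converts the on-screen distances to gaze angles using the known eye-to-screen distance. It is an illustrative example only; the function names, the sample values and the 650 mm viewing distance are assumptions and not part of any vendor's code.

#include <cmath>
#include <cstdio>
#include <vector>

const double kPi = 3.14159265358979323846;

struct PointMm { double x, y; }; // gaze sample on the screen, in millimetres

// Convert an on-screen distance to a gaze angle (degrees) using the
// eye-to-screen distance, i.e. the inverse of l = d * tan(alpha).
double mmToDegrees(double lengthMm, double eyeDistanceMm)
{
    return std::atan(lengthMm / eyeDistanceMm) * 180.0 / kPi;
}

// Accuracy: RMS distance between the samples and the target stimulus.
double accuracyDeg(const std::vector<PointMm>& s, PointMm target, double eyeDistMm)
{
    double sum = 0.0;
    for (const PointMm& p : s) {
        double dx = p.x - target.x, dy = p.y - target.y;
        sum += dx * dx + dy * dy;
    }
    return mmToDegrees(std::sqrt(sum / s.size()), eyeDistMm);
}

// Precision: RMS of the distances between successive samples.
double precisionDeg(const std::vector<PointMm>& s, double eyeDistMm)
{
    double sum = 0.0;
    for (size_t i = 1; i < s.size(); ++i) {
        double dx = s[i].x - s[i - 1].x, dy = s[i].y - s[i - 1].y;
        sum += dx * dx + dy * dy;
    }
    return mmToDegrees(std::sqrt(sum / (s.size() - 1)), eyeDistMm);
}

int main()
{
    std::vector<PointMm> samples = {{1.0, 0.5}, {1.2, 0.4}, {0.9, 0.7}, {1.1, 0.6}};
    PointMm target = {0.0, 0.0};
    std::printf("accuracy: %.2f deg, precision: %.3f deg\n",
                accuracyDeg(samples, target, 650.0), precisionDeg(samples, 650.0));
    return 0;
}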
Many commercial eye trackers state an accuracy of about 0.5 degree; for example, the eye tracker used in this thesis, the Tobii TX300, has a majority of participants with an accuracy distribution of 0.4 degree during ideal conditions. Drewes [7] argues that it is difficult to achieve higher accuracy due to the uniqueness of the eye. As discussed in section 2.2.4, the eye does not need to be more accurate than to have the target somewhere within the fovea. Therefore eye trackers are also constrained to the accuracy of the eyes themselves. It should also be stated that small, often unattended ocular motions such as microsaccades, tremor and drift will have an effect on the accuracy of the eye tracking system. Disturbances during calibration will result in poorer accuracy. Microsaccades can have a noticeable amplitude up to 1 degree and occur frequently, but are not necessarily produced deliberately by the viewer. The user of an eye tracker system can therefore experience the system as noisy even though the noise is produced by the eyes themselves.

3.4 Eye tracking as human computer interaction

With today's use of the computer mouse to interact with computers, many users are able to pick a target to the closest pixel, even though it might require some concentration. With a quality eye tracker and during optimal conditions, it is still going to be difficult to come closer than 0.5 degree of the target, which at a distance of 650 mm is 5.7 mm away. For this reason it is not suitable to replace the mouse with an eye tracker without major changes to the user interface. The advantage of using an eye tracker compared to a mouse is that the eyes often gaze at an object prior to making an action on it, for example finding a button before pushing it. Eye tracking could give the user the ability to act on the target without having to find the mouse cursor and drag it to the target, which could enhance the efficiency of the application and reduce the need for hand motions.

Using gaze as an input method could also introduce problems, since using the eyes for both perception and control could induce unwanted actions. It is difficult to distinguish automated eye behavior from knowingly altered eye behavior, which means that using only gaze, or gaze plus eye twitching, to control software could prove unsuccessful. To quote Jacob [16]:
"The most naive approach to using eye position as an input might be to use it as a direct substitute for a mouse: changes in the user's line of gaze would cause the mouse cursor to move. This is an unworkable (and annoying) approach, because people are not accustomed to operating devices just by moving their eyes. They expect to be able to look at an item without having the look "mean" something. Normal visual perception requires that the eyes move about, scanning the scene before them. It is not desirable for each such move to initiate a computer command."

There is a common term, "Midas Touch", used in the HCI community to describe the problem, referring to Greek mythology, where King Midas was blessed and doomed to turn everything he touched into gold. To avoid this problem a second source of interaction is often used, such as the push of a button. This way the user is able to study an object without creating actions. Another way is to use a timer function for the fixation: if the fixation stays long enough, an action is executed. This method is called dwell time. The dwell time can be longer or shorter depending on the experience of the user. For example, an eye typing application can use dwell time to distinguish key search from key push.

Majaranta and Bulling [5] argue that, when using gaze initiated selection, it is important to provide corresponding feedback to the user. By pressing a button the user makes a selection and physically executes it. When using dwell time the user only initiates the action and the system executes it after the time has elapsed. Appropriate feedback to the user is necessary to provide an understanding of the system, for example feedback as to whether the system has recognized the correct target, whether it is about to perform an action, and when the action will be executed. This could be done by highlighting the target and providing a progress-bar of the dwell time.
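The dwell time mechanism described above can be sketched as a small state machine: a target is only activated once the gaze has remained on it for the full dwell period, and the elapsed fraction can drive the suggested progress-bar feedback. The class below is not taken from GazRaz or any SDK; the 800 ms dwell time and the interface are assumptions made for illustration.

#include <cstdio>

// Tracks how long the gaze has dwelled on one target and reports
// when the dwell time has elapsed, i.e. when the action should fire.
class DwellTimer {
public:
    explicit DwellTimer(double dwellSeconds) : dwell_(dwellSeconds) {}

    // Call once per gaze sample. 'onTarget' is whether the (filtered) gaze
    // point currently hits the target, 'dt' is the time since the last call.
    // Returns true exactly once, when the dwell time has been reached.
    bool update(bool onTarget, double dt)
    {
        if (!onTarget) { elapsed_ = 0.0; fired_ = false; return false; }
        elapsed_ += dt;
        if (!fired_ && elapsed_ >= dwell_) { fired_ = true; return true; }
        return false;
    }

    // Fraction of the dwell time completed, usable for a progress bar.
    double progress() const
    {
        double p = elapsed_ / dwell_;
        return p > 1.0 ? 1.0 : p;
    }

private:
    double dwell_;
    double elapsed_ = 0.0;
    bool fired_ = false;
};

int main()
{
    DwellTimer timer(0.8);                 // assumed 800 ms dwell time
    for (int i = 0; i < 300; ++i) {        // simulate 1 s of 300 Hz samples on target
        if (timer.update(true, 1.0 / 300.0))
            std::printf("select after %d samples (progress %.0f%%)\n",
                        i + 1, timer.progress() * 100.0);
    }
    return 0;
}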
For example eye typing application can use dwell time to distinguish key search from key push. Majaranta and Bulling [5] argue that, when using gaze initiated selection, it is important to provide corresponding feedback to the user. By pressing a button the user makes a selection and physically executes it. When using dwell time the user only initiates the action and the system executes it after the time has elapsed. Approximate feedback to the user is necessary to provide an understanding of the system. For example, feedback, as to whether the system has recognized the correct target, is about to do an action, and when the action will be executed. This could be done by highlighting the target and providing a progress-bar of the dwell time. To summarize eye tracking have a good potential to be used in HCI. A gaze aware application should however be adapted to the use of an eye tracker. Picking task with an eye tracker will be less accurate than with a computer mouse. Which makes it necessary to design the user interface to be able to handle such conditions. The user should also be able to control or at least be aware of when the gaze is used to initialize actions. 3.5 Tobii TX300 eye tracker During the development and evaluation of the gaze aware image viewing application GazRaz, the Tobii TX300 eye tracker system was used together with an HP PC. The TX300 is a video-based eye tracker. It uses the relative positions of corneal reflection cased by its infra red lamps to estimate the gaze angle as discussed in section 3.1. The TX300 uses the dark pupil method which is also explained in section 3.1. The system needs a recalibration for each new user of the system. The TX300 is a high quality eye tracker system with the ability to collect gaze data up to a rate of 300Hz for each eye individually. Its accuracy is claimed to be 0.4◦ for binocular data, during ideal conditions. Binocular data is the average of the two eyes. Ideal conditions for the system is when the users head stays in the middle of the track box, at a distance of 65 cm from the eye tracker camera and with a illumination of 300 lux in the room. Precision is stated to be 0.07◦ for binocular data without any filter, and at a distance of 65 cm. Precision is calculated as root mean square of successive samples. In table 3.1 some of the system specification is stated. For further information on the TX300 specifications see 3.5 Tobii TX300 eye tracker 17 Table 3.1: Relevant specifications for the TX300 eye tracker Type Values Accuracy Accuracy, Large angle 0.4◦ At ideal conditions, binocular 0.5◦ At 30 ◦ gaze angle, binocular Precision 0.07 ◦ Without filter, binocular Sample rate Sample rate variablility 300, 250, 120, 60 Hz 0.3% Total system latency <10 ms Head movement Operation distance 37 x 17 cm. Freedom of head movement at a distance of 65 cm 50-80 cm Max gaze angle 35 ◦ TX3 [17]. For information on how the specifications are calculated see Tob [1]. Figure 3.2: TX300 eye tracker, viewed from the front. The purple circles under the screen is light from the IR lamps captured by the camera. The TX300 has an integrated computer screen that is used to map the gaze data upon. The screen can be dismounted, and the eye tracker part can be used separately, or with other screens. As long as the system is configured correctly and the screen fits within the trackable area. However during this project only the accompanying screen was used. Figure 3.2 shows the TX300 eye tracker as seen by the user. 
With the maximal sampling frequency of 300 Hz, the time between each sample will be 1/300 = 0.0033 s, which translates to 3.3 ms. With the total system latency added, the "youngest" gaze sample will be <13.3 ms old.

Table 3.2: Relevant specifications for the TX300 TFT monitor
Type                    Values
Screen size             23 inch (509 mm x 286 mm)
Resolution              1920 x 1080 pixels
Pixels per millimeter   3.775 ppmm
Response time           5 ms typically

From the specifications in tables 3.2 and 3.1, the accuracy in gaze angle of the eye tracker can be translated to a distance in millimetres and in pixels. With simple trigonometry, an accuracy of ±0.5° translates to a distance of ±5.67 mm or ±21 px at a distance of 650 mm, which is a noticeable error if a task demands perfect accuracy. In equation 3.1, l is the length, d the distance and α the gaze angle.

l = d · tan(α)    (3.1)

Figures 3.3 and 3.4 illustrate the TX300 eye tracker system. The eye tracker camera is located in the center, right below the screen. If the user is looking straight into the camera, the gaze angle is 0°, but looking at the top of the screen above the camera the gaze angle would be 26.4°, at a distance of 650 mm from the camera. The top left and right corners would render a gaze angle of 32.4°, which is within the max gaze angle limit of 35°. The whole screen will be within the trackable area and can be used for the application.

Figure 3.3: Illustration of the front of the TX300 eye tracker. Notice that the camera represents the origin.

Figure 3.4: TX300 eye tracker viewed from the side. Notice that the eye tracker and the z axis are angled with respect to the head. Head portrait designed by Freepik.com.

3.5.1 Tobii Gaze SDK

The Gaze SDK provides a low-level API for accessing eye tracker data from the TX300 and other Tobii eye trackers. Being low-level means it gives the developer full control of how to mount the eye tracker and how to process and use the data. It also means that there is no finished functionality for calibration and data processing. The rest of this section describes the inner workings of the Gaze SDK. For further information on the Tobii Gaze SDK see Tob [18].

GazRaz uses the Gaze SDK to:
• Initialize and connect to an eye tracker.
• Subscribe to gaze data.
• Get information about the eye tracker.

Communication between the GazRaz application and the eye tracker is performed asynchronously. The TobiiGazeCore library uses a dedicated thread for processing data and messages received from the eye tracker. When a connection to the eye tracker is established, the thread waits for gaze data and invokes a callback whenever data is received. This way the gaze data can be gathered and processed without interference from the rest of the application. Lengthy operations should however be avoided in the callback function, since they could block the thread and result in loss of data.

Gaze SDK coordinate system

Figure 3.5: Illustration of the three coordinate systems used by the Gaze SDK.

The Gaze SDK uses three different coordinate systems, as illustrated in figure 3.5. The Active Display Coordinate System (ADCS) is possibly the most interesting, since it holds the gaze data for both eyes, each mapped onto a 2D coordinate system. ADCS has its origin in the top left corner of the active display, i.e. the integrated screen. The coordinate values are (0, 0) for the origin and (1, 1) for the bottom right corner. Since the trackable area stretches a bit outside the active display, it is possible to receive coordinates outside the interval of 0 to 1.
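To relate these normalized ADCS coordinates to the accuracy figures discussed earlier in this chapter, it helps to express both in pixels. The worked example below maps an ADCS gaze point to pixel coordinates on the integrated 1920 x 1080 screen and applies equation 3.1 to show how large a 0.5 degree error is at a 650 mm viewing distance. The function names are illustrative and not part of the Gaze SDK.

#include <cmath>
#include <cstdio>

const double kPi = 3.14159265358979323846;

// ADCS: origin in the top left corner of the active display,
// (1, 1) in the bottom right corner.
struct AdcsPoint { double x, y; };

// Map a normalized ADCS point to pixel coordinates on the integrated screen.
void adcsToPixels(AdcsPoint p, int screenW, int screenH, double& px, double& py)
{
    px = p.x * screenW;
    py = p.y * screenH;
}

// Equation 3.1: l = d * tan(alpha), alpha in degrees, l and d in millimetres.
double angleToMm(double angleDeg, double eyeDistanceMm)
{
    return eyeDistanceMm * std::tan(angleDeg * kPi / 180.0);
}

int main()
{
    AdcsPoint gaze = {0.5, 0.5};           // a gaze point in the screen centre
    double px, py;
    adcsToPixels(gaze, 1920, 1080, px, py);
    std::printf("ADCS (%.2f, %.2f) -> pixel (%.0f, %.0f)\n", gaze.x, gaze.y, px, py);

    // The 0.5 degree accuracy from table 3.1, at the ideal 650 mm distance and
    // with 3.775 pixels per millimetre (table 3.2), is roughly 21 pixels.
    double errMm = angleToMm(0.5, 650.0);
    std::printf("0.5 deg -> %.2f mm -> %.0f px\n", errMm, errMm * 3.775);
    return 0;
}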
The User Coordinate System (UCS) describes each eye's position in 3D space coordinates. The origin is in the center of the eye tracker's front panel. The coordinates are in millimeters. The x-axis points horizontally toward the user's right. The y-axis points straight up. The z-axis points towards the user with the same angle as the eye tracker itself.

The Track Box Coordinate System (TBCS) describes each eye's position in a normalized track box. The track box is the volume in front of the eye tracker in which the tracker is able to detect the eyes and collect gaze data. The TBCS has its origin in the top right corner.

Gaze SDK gaze data

Figure 3.6: Illustration of two eyes looking at a point on the active display.

When a connection with the eye tracker is established, data packets will be streamed to the application. Each of these gaze data packets will contain:

Timestamp. This value represents the time at which the information used to produce the gaze data package was sampled by the eye tracker.
Eye position from the eye tracker. The 3D space position of each eye in the UCS coordinate system.
Eye position in the track box. The normalized 3D space position of each eye within the track box, described in the TBCS.
Gaze point from the eye tracker. The gaze point on the calibration plane for each eye. The calibration plane is the virtual plane used for calibration. The gaze point refers to the position where the user's gaze would intersect the calibration plane.
Gaze point on display. The gaze point for each eye individually mapped to the active display. The position is described in ADCS.

Figure 3.6 illustrates two eyes fixating on an object on the screen. The eye tracker also provides a status code to help determine which eye is being tracked if only one is found.

4 Gaze aware image viewing application, GazRaz

This chapter will discuss how the gaze aware image viewing application, GazRaz, was constructed.

4.1 The idea

The idea for the application, originally stated by Matthew Cooper, was to use gaze control to help explore high-resolution images. High-resolution images are in this case defined as images with a larger resolution than the screen they are displayed on. A common method to explore high-resolution images is to zoom into the image and navigate around in it with the use of a mouse or keyboard. The method is referred to as pan + zoom navigation. The problem with this method is that the user loses the overview of the image when zooming into it. To keep the overview of the image, the GazRaz application uses a lens to zoom into the image at the cursor/gaze position on the screen, which creates what is known as a detail + overview interface (Baudisch et al. [19]). Since the GazRaz application uses one screen and not two, as Baudisch et al. [19] did, the lens (detailed view) is going to overlap the background (overview) to some extent, depending on the amount of zoom and the size of the lens, therefore also hiding some of the information of the overview image under the lens. Figure 4.1 illustrates how a plane zoom lens magnifies an area of the image, but also hides some of the overview under the lens. To try to solve or decrease the effect of this issue, a fish-eye lens was also developed. With a fish-eye lens the information at the border would be distorted, but possibly provide enough information for the user to understand what lies behind or just outside the lens.

The GazRaz application was developed in an experimental manner to explore whether gaze control is suitable for image scanning in a detail + overview interface.

Figure 4.1: Illustration of a plane zoom lens for detail and overview. The blue area is hidden behind the lens and cannot be seen. The view plane is the background image.
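One common way to realize a plane zoom lens like the one in figure 4.1 is as a per-pixel mapping: pixels inside the lens radius sample the image closer to the gaze point, scaled by the zoom factor, while pixels outside show the overview unchanged. The sketch below expresses this mapping on the CPU for clarity; it is an assumption about how such a lens can be implemented, not the actual GazRaz lens (the lens variants used are described in section 4.6).

#include <cmath>
#include <cstdio>

struct Vec2 { double x, y; };

// For one output pixel, compute which image coordinate it should display.
// Inside the lens the image is magnified around the gaze point by 'zoom';
// outside the lens the overview image is shown unchanged. The magnified
// content is what hides part of the overview, as illustrated in figure 4.1.
// Names and parameters here are illustrative, not taken from GazRaz.
Vec2 planeLensSource(Vec2 pixel, Vec2 lensCenter, double lensRadius, double zoom)
{
    double dx = pixel.x - lensCenter.x;
    double dy = pixel.y - lensCenter.y;
    double dist = std::sqrt(dx * dx + dy * dy);

    if (dist >= lensRadius)
        return pixel;                       // overview: sample the image as-is

    // Detail: sample closer to the lens center, which magnifies the
    // neighbourhood of the gaze point by the zoom factor.
    return {lensCenter.x + dx / zoom, lensCenter.y + dy / zoom};
}

int main()
{
    Vec2 src = planeLensSource({980.0, 560.0}, {960.0, 540.0}, 200.0, 2.0);
    std::printf("pixel (980, 560) samples the image at (%.1f, %.1f)\n", src.x, src.y);
    return 0;
}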
With a fish-eye lens the information at the border is distorted, but it may provide enough information for the user to understand what lies behind or just outside the lens.

The GazRaz application was developed in an experimental manner to explore whether gaze control is suitable for image scanning in a detail + overview interface.

Figure 4.1: Illustration of a plane zoom lens for detail and overview. The blue area is hidden behind the lens and cannot be seen. The view plane is the background image.

4.2 Hardware setup

The Tobii TX300 eye tracker, discussed in section 3.5, was used during the development and evaluation of GazRaz. The computer used together with the eye tracker system is an HP laptop with a standard PC architecture. Table 4.1 holds the specifications for the HP computer.

Table 4.1: Specifications for the HP ZBook 17
    Processor (CPU):    Intel Core i7-4700MQ, 2.4 GHz
    Memory:             8 GB RAM
    Graphics (GPU):     Nvidia Quadro K4100M
    Operating system:   Windows 7, 64-bit

4.3 Software setup

This section describes the software, libraries and SDKs used. GazRaz was developed in C++ (http://www.cplusplus.com/). Modern OpenGL (https://www.opengl.org/) was used for programming against the graphics drivers, with GLEW (http://glew.sourceforge.net/) as extension manager. GLFW (http://www.glfw.org/) was used for OpenGL context handling and input/output management. The Tobii Gaze SDK was used to communicate with the eye tracker. Furthermore, OpenGL Mathematics, glm (http://glm.g-truc.net/0.9.6/index.html), was used for vector and matrix calculations, SOIL2 (https://bitbucket.org/SpartanJ/soil2) for loading images into OpenGL textures, and FreeType (http://www.freetype.org/) for text output in the user interface. Outside of GazRaz, Tobii Studio was used to calibrate the eye tracker and Tobii Eye Tracker Browser was used to configure it.

4.4 Gaze data

The gaze data used by the GazRaz application consists of the gaze position for each eye on the active display, and the 3D space coordinates for each eye described in UCS. Listing 4.1 shows the struct for the gaze position on the active display, and listing 4.2 is the data object that holds the eye positions.

Listing 4.1: Gaze position on the active display

    /**
     * Gaze point position on the screen.
     * Normalized to values between -1 and 1.
     * NOTE: if the user looks outside the active
     * display area, or due to precision errors, the
     * x, y, z coordinates may be outside the interval -1 to 1.
     */
    struct GazePoint {
        float x;
        float y;
        float z;
    };

Listing 4.2: Eye position in 3D space

    /**
     * Eye distance in mm from the eye tracker.
     * The coordinate system has its origin in the
     * center of the eye tracker's front panel.
     */
    struct EyeDistanceMm {
        double x;
        double y;
        double z;
    };

The original gaze point values provided by the Gaze SDK are in the active display coordinate system, see section 3.5.1. To make the gaze data suitable for OpenGL it is transformed to a 2D Cartesian coordinate system normalized to values between -1 and 1. The origin is then in the middle of the screen, the top right corner has the coordinates (1, 1) and the bottom left corner (-1, -1). The gaze coordinates are cast to floats to allow direct parsing to graphics buffers.
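A minimal sketch of this transformation, assuming the GazePoint struct from listing 4.1 (the helper name is illustrative and not taken from the GazRaz source):

    // Map a gaze sample from ADCS, where (0, 0) is the top left and (1, 1) the
    // bottom right corner, to the normalized 2D Cartesian system used for OpenGL,
    // where (-1, -1) is the bottom left and (1, 1) the top right corner.
    GazePoint adcsToCartesian(double adcsX, double adcsY)
    {
        GazePoint p;
        p.x = static_cast<float>(adcsX * 2.0 - 1.0);  // 0..1  ->  -1..1
        p.y = static_cast<float>(1.0 - adcsY * 2.0);  // flip y, since ADCS y grows downwards
        p.z = 0.0f;                                   // no z value is provided for gaze points
        return p;
    }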
To store the gaze data a C++ std::vector container is used. To avoid flooding the memory with gaze data a reasonable size is chosen, and the oldest gaze data is deleted as new gaze data is pushed to the queue. Every other element in the vector is the gaze point for either the left or the right eye. A vector size of 300 therefore holds 150 GazePoint pairs, which is 0.5 seconds of gaze data when the TX300 eye tracker runs at a 300 Hz sample rate.

No z values are provided by the Gaze SDK for gaze points. The z dimension in GazePoint was added to make it possible to extend the gaze point into 3D space in the future; during development, z was instead used to represent whether a GazePoint was valid or not. To maintain the knowledge of how old a GazePoint is, even invalid data (no eye tracked) is pushed to the GazePoint vector. This means that short disruptions, for example a blink, result in a series of invalid gaze samples. In the same way as the gaze points, the eye positions are also stored in a vector. EyeDistanceMm is described in the same way as the user coordinate system in section 3.5.1.

4.5 Gaze data filtering

The data provided by the eye tracker will, even under ideal conditions, contain a certain amount of noise. The noise can come from inaccurate calibration, inherent system noise or noise produced by the eyes themselves, and positioning the head close to the track box boundaries increases the noise significantly. To reduce the effect of noise and outliers (data samples that differ greatly from their neighbouring samples), common techniques involve calculating mean values over multiple data samples and processing the data through filters. Since the gaze position is meant to control the lens position in real time (here defined as fast enough for the user to perceive the response of the application as instant), it is essential that the gaze data is processed carefully. If noise influences the lens position it could severely reduce the user experience and render the gaze control feature useless. On the other side of the noise issue is latency: taking a large number of gaze samples to calculate a mean value makes the mean gaze data older than the latest gaze data pushed by the eye tracker. Gaze data processing is thus a compromise between noise reduction and latency. The rest of this section describes how GazRaz processes the gaze data.

4.5.1 Mean

The simplest filtering of the gaze data is to substitute the received gaze point with the mean of the last N points. This reduces noise and increases precision at the cost of latency. The eye tracker provides gaze point data for the left and the right eye independently. The gaze point distribution between the right and the left eye changes depending on the gaze angle, but the two stay close to each other: the eyes might be looking at two slightly different locations, while the brain produces one image of the scene. Calculating the mean gaze point of the right and left eye produces a gaze point with better accuracy than for one eye separately; the specification for the TX300 eye tracker states an accuracy of 0.5° monocular and 0.4° binocular. Equation 4.1 is used to calculate the mean gaze point GazePoint_mean out of n samples. Note that the number of samples should be selected such that an equal number of samples from the left and right eye are used, and invalid gaze point data should be ignored.
    GazePoint_mean = (1/n) · Σ_{i=1}^{n} GazePoint_i    (4.1)

An eye tracker with a sample rate of 300 Hz produces 600 gaze points each second (one for each eye), so one gaze point pair is produced every 3.3 ms. If 10 gaze point samples, i.e. 5 gaze point pairs, are used to calculate a mean gaze point, the oldest gaze point affecting the value is 16.67 ms old and the mean age of the samples is 10 ms. This is less than the update interval of the screen, which is 16.67 ms. However, due to thread safety the very latest gaze data was not used in the mean calculation. To be able to use the latest gaze data, the software would need to be constructed such that the eye tracker thread issues the draw calls, which would also make it the user interface thread. This would have many disadvantages: a blocking of gaze data would mean blocking of the user interface, and latency would still be an issue due to the calculations needed for the draw calls. The mean age of the mean gaze point should therefore be estimated as in equation 4.2, where n represents the number of sample pairs and the total system latency is specified to <10 ms. This gives a mean age of <23.3 ms for 5 gaze sample pairs.

    meanAge(n) = (1/n) · Σ_{i=2}^{n+1} (i / sampleRate) + totalSystemLatency    (4.2)

4.5.2 Static threshold, weighted mean filter

With only the mean value as filtering method the gaze point appeared unstable. Further increasing the number of samples used to calculate the mean gaze point helped, but also increased the latency. Instead a high-pass filter was constructed for the gaze data. The purpose of the filter is to allow the gaze point to move rapidly when the eyes produce a saccade and, conversely, to hold the gaze point steady when the eyes are fixating. Algorithm 1 shows simplified pseudocode for the filter. distance in algorithm 1 represents the distance between the "new" gaze point and the "old" gaze point. If the distance is large enough, the new gaze point remains unchanged. If the distance is within the different thresholds, the old gaze point is weighted into the new gaze point, creating a weighted average of the two. If the distance is smaller than the input parameter noiseThreshold, the new gaze point is discarded and the old gaze point is kept.

Algorithm 1 Static Filter
    1:  procedure staticFilter
    2:      distance ← absoluteValueOf(newGazePoint − oldGazePoint)
    3:      if distance < noiseThreshold then
    4:          return oldGazePoint
    5:      else if distance < noiseThreshold * thresholdAmplifierFirst then
    6:          a weighted mean average of oldGazePoint and newGazePoint is returned:
    7:          gazePoint ← getWeightedMeanAverageGazePoint(weightFirst)
    8:          return gazePoint
    9:      else if distance < noiseThreshold * thresholdAmplifierSecond then
    10:         a weighted mean average of oldGazePoint and newGazePoint is returned:
    11:         gazePoint ← getWeightedMeanAverageGazePoint(weightSecond)
    12:         return gazePoint
    13:     else
    14:         return newGazePoint
    15:     end if
    16: end procedure
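A compact C++ sketch of the same logic follows. The weighted mean is reduced to a simple linear blend here and all parameter values are supplied by the caller; this is an illustration, not the exact GazRaz implementation.

    #include <cmath>

    // Weighted mean of two gaze points: w = 0 keeps the old point, w = 1 takes the new one.
    GazePoint blend(const GazePoint& oldGp, const GazePoint& newGp, float w)
    {
        GazePoint r;
        r.x = oldGp.x * (1.0f - w) + newGp.x * w;
        r.y = oldGp.y * (1.0f - w) + newGp.y * w;
        r.z = newGp.z;
        return r;
    }

    // Static threshold, weighted mean filter as in algorithm 1.
    GazePoint staticFilter(const GazePoint& oldGp, const GazePoint& newGp,
                           float noiseThreshold, float ampFirst, float ampSecond,
                           float weightFirst, float weightSecond)
    {
        const float dx = newGp.x - oldGp.x;
        const float dy = newGp.y - oldGp.y;
        const float distance = std::sqrt(dx * dx + dy * dy);

        if (distance < noiseThreshold)
            return oldGp;                              // fixation: keep the old point
        if (distance < noiseThreshold * ampFirst)
            return blend(oldGp, newGp, weightFirst);   // small saccade: move slowly
        if (distance < noiseThreshold * ampSecond)
            return blend(oldGp, newGp, weightSecond);  // medium saccade: move faster
        return newGp;                                  // large saccade: jump directly
    }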
The noiseThreshold in algorithm 1 is described as a distance in the two-dimensional Cartesian coordinate system with axis limits of -1 to 1. The threshold limits and the weighting were developed by trial and error to what seems a fair compromise between "speed" and stability. The base value for the noise threshold was set to 0.00015 which, with equation 4.3, translates to a distance radius r_threshold of 2.5 mm on the TX300 monitor. In equation 4.3, s_width and s_height are the screen width and height in mm, and x_length and y_length are the x and y axis lengths in Cartesian coordinates.

    r_threshold = √(s_width² + s_height²) / √(x_length² + y_length²)    (4.3)

The threshold radius can be further translated to a gaze angle of 0.22° at a distance of 650 mm with equation 3.1. This means that the lens stays steady during fixation if the gaze point data achieves a precision better than 0.22°; note that the TX300 system specification states a precision of 0.07°. Microsaccades, discussed in section 2.2.3, can however have a larger amplitude than 0.22° and cause the lens to move. This can be perceived as if the lens makes unintended movements, since microsaccades are not necessarily intended by the viewer. Increasing the base noise threshold reduces this movement, but at the same time suppresses intended small saccades.

Lines 5 and 9 in algorithm 1 compare the distance to an amplified noise threshold. thresholdAmplifierFirst was set to 200, which translates to a threshold radius of 3.15° at a distance of 650 mm, and thresholdAmplifierSecond was set to 600, which translates to a threshold radius of 5.44° at the same distance. This means that if a saccade is within 5.44° the lens does not jump directly to the new gaze point, but rather moves gradually towards it, depending on the weights.

With this filter the gaze point could be kept steady during fixation and move rapidly during "large" saccades, while small saccades inside the threshold limits cause the gaze point to move smoothly towards the new gaze position. Figure 4.2 shows the raw and filtered gaze points during smooth pursuit eye movement. Smooth pursuit is only used to show the latency between the latest gaze point in the raw data and the filtered mean gaze point, since smooth pursuit will be uncommon or non-existent in an image viewing application due to the lack of moving targets to follow; alternation between saccades and fixations will instead be the primary eye movement in GazRaz. Figure 4.4 shows the raw and filtered gaze points during a saccade. Notice that the statically filtered gaze points make a "jump" from the leftmost circle to three fourths of the way to the target, because the new gaze point distance falls outside the thresholds. After the jump, the filtered gaze point moves smoothly towards the target.

4.5.3 Exponential threshold, weighted mean filter

An exponentially weighted mean average filter was developed to explore whether it could allow rapid gaze point repositioning for both large and small saccades. Algorithm 2 shows the simplified pseudocode for the filter. In the same way as for the static filter, the parameter distance represents the distance between the "old" and the "new" gaze point. The weight between the "old" and the "new" gaze point is calculated as an exponential function of the distance between the two.

Algorithm 2 Exponential Filter
    1:  procedure exponentialFilter
    2:      distance ← absoluteValueOf(newGazePoint − oldGazePoint)
    3:      if distance < noiseThreshold then
    4:          return oldGazePoint
    5:      else if distance < noiseThreshold * thresholdAmplifier then
    6:          an exponentially weighted mean average of oldGazePoint and newGazePoint is returned:
    7:          gazePoint ← getExponentiallyWeightedMeanAverageGazePoint(distance)
    8:          return gazePoint
    9:      else
    10:         return newGazePoint
    11:     end if
    12: end procedure
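A sketch of how such a filter could look in C++. The exact weight function used in GazRaz is not specified in the pseudocode, so an exponential ramp towards the new gaze point is assumed here for illustration:

    #include <cmath>

    // Exponential threshold, weighted mean filter as in algorithm 2. The weight grows
    // exponentially with the distance, so points near the outer threshold are weighted
    // almost entirely towards the new gaze point.
    GazePoint exponentialFilter(const GazePoint& oldGp, const GazePoint& newGp,
                                float noiseThreshold, float thresholdAmplifier)
    {
        const float dx = newGp.x - oldGp.x;
        const float dy = newGp.y - oldGp.y;
        const float distance = std::sqrt(dx * dx + dy * dy);

        if (distance < noiseThreshold)
            return oldGp;                                   // treat as noise: keep the old point
        if (distance >= noiseThreshold * thresholdAmplifier)
            return newGp;                                   // large saccade: jump directly

        const float t = distance / (noiseThreshold * thresholdAmplifier);  // 0..1
        const float w = (std::exp(t) - 1.0f) / (std::exp(1.0f) - 1.0f);    // exponential weight, 0..1
        GazePoint r;
        r.x = oldGp.x * (1.0f - w) + newGp.x * w;
        r.y = oldGp.y * (1.0f - w) + newGp.y * w;
        r.z = newGp.z;
        return r;
    }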
The thresholdAmplifier was set to 1000, which translates to a threshold radius of 7.01° at a distance of 650 mm. If a saccade is close to the border of 7.01° the lens jumps to the new gaze point, while for shorter saccades the gaze point is weighted more towards the old gaze point, moving it gradually towards the new gaze position. This filter produced rapid movement of the gaze point for both large and small saccades, and since it was so rapid it was given a larger boundary than the static filter. Figure 4.3 shows the raw and filtered gaze points during smooth pursuit with the exponential filter. Comparing figures 4.2 and 4.3, it is noticeable that the exponential filter has a lower latency (produces gaze points closer to the latest raw gaze point) than the static filter. Figure 4.4 shows the raw and filtered gaze points during a saccade. Notice that it takes fewer exponentially filtered gaze points to reach the target than statically filtered ones.

Figure 4.2: Gaze points during smooth pursuit with the static filter. Green points are the raw gaze points for the right eye and fade towards yellow as they age. Red points are the raw gaze points for the left eye and fade towards purple with age. The white line consists of the filtered gaze points. The red, green and blue circles represent the distance thresholds. The mean gaze point calculation was done with 14 samples, i.e. 7 gaze point pairs. The eyes move from the left side towards the right side of the image. Notice the distance between the leftmost raw gaze points and the leftmost filtered gaze points.

Figure 4.3: Gaze points during smooth pursuit with the exponential filter. Green points are the raw gaze points for the right eye and fade towards yellow as they age. Red points are the raw gaze points for the left eye and fade towards purple with age. The white line consists of the filtered gaze points. The red and cyan circles represent the distance thresholds. The mean calculation was done with 14 samples, i.e. 7 gaze point pairs. The eyes move from the top left corner towards the bottom of the image. Notice the distance between the lowest raw gaze points and the lowest filtered gaze points.

Figure 4.4: Gaze points during a saccade with the static filter (upper) and the exponential filter (lower). Green points are the raw gaze points for the right eye and fade towards yellow as they age. Red points are the raw gaze points for the left eye and fade towards purple with age. The white points are the filtered gaze points. The mean calculation was done with 12 samples, i.e. 6 gaze point pairs. The eyes move from the leftmost circle to the rightmost circle.

4.6 Lens

Two types of lenses were developed for the application: one plane circular lens and one fish-eye spherical lens. This section discusses how the lenses were implemented, and their pros and cons.

4.6.1 Double texture lens

The plane lens was the first to be developed. It uses a double texture technique. To begin with, a plane that covers the whole screen is placed in front of the camera and covered with a texture. The gaze point, together with a zoom factor, is then used to calculate new texture coordinates for the zoomed-in image, and these texture coordinates are used to render a new texture. The new zoom texture and the original texture are passed to the shader that draws the view plane. The shader draws the original texture on the plane, except for the region where the gaze point is located, where the new zoom texture is drawn instead.
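A sketch of how the zoomed texture coordinates could be derived from the gaze point and the zoom factor (names and the rectangular region are illustrative; they are not taken from the GazRaz source):

    #include <glm/glm.hpp>

    // Texture-coordinate rectangle for the zoomed-in region around the gaze point.
    // gazeTexCoord is the gaze position mapped to texture space (0..1), lensRadius is
    // the lens radius in texture space, and zoomFactor > 1 magnifies the image.
    struct TexRect { glm::vec2 min; glm::vec2 max; };

    TexRect zoomedTexCoords(glm::vec2 gazeTexCoord, float lensRadius, float zoomFactor)
    {
        // The sampled region shrinks by the zoom factor, so the same screen area
        // shows a smaller part of the image, i.e. a magnified view.
        const float half = lensRadius / zoomFactor;
        TexRect r;
        r.min = gazeTexCoord - glm::vec2(half);
        r.max = gazeTexCoord + glm::vec2(half);
        return r;
    }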
The advantage of the plane lens is that it is accurate in texture coordinates and zoom amplitude: the position of the lens follows the gaze point exactly, and the zoomed-in area is in the correct position relative to the background. Another advantage is that it is easy to resize without altering the zoom amplitude. Figures 4.5 and 4.7 show examples of the plane lens. The disadvantage of the lens is, as shown in figure 4.1, that it hides some of the background image; see example 4.1.

Example 4.1: Overview hidden by plane lens. If a saccade fails to accurately target a small island in a map, there is a risk that the lens will hide the island from the viewer, and the only thing perceived is water.

4.6.2 Refraction lens

A fish-eye lens was developed to help the user avoid scenarios such as example 4.1; the distorted boundary of the lens could possibly provide a clue to the user about where the island is. The fish-eye lens was developed with an OpenGL sphere refraction technique, where refraction refers to the change in direction of a light ray when it enters another medium, for example from air to glass. A sphere is placed in 3D space in front of the view plane, and the same zoom texture as produced for the plane lens is passed to the refraction shader programme. From the incident direction vector from the camera onto the sphere's surface, the refraction vector is calculated. The refraction vector is affected by the sphere's surface normals, which creates a glass-like effect. The refraction vector is calculated with OpenGL refract (https://www.opengl.org/sdk/docs/man/html/refract.xhtml), with a refraction index of 1.0/1.52. The refraction causes a distorting zoom effect when it bends the incident vector, so two layers of zoom affect the lens: one from the zoomed texture and one from the refraction. The sphere can also be scaled up or down in size. Changing the "thickness" of the sphere, i.e. its width along the z axis (which in our case points out of the screen towards the viewer), also affects the distorting zoom effect, which can be either magnifying or minifying. When the gaze position changes on the screen the sphere is translated to that position, which in turn updates the zoom texture. Since the incident vector changes direction depending on where in space the sphere is located, the refractive zoom effect is also affected: when the lens is located near the edges of the screen the refractive zoom is skewed towards the boundary of the lens, in the direction of the middle of the screen. The advantage of this lens is that, with the right settings, it can provide clues about what information lies at the boundaries of the lens. However, in many situations the lens still covers part of the view plane. There are no properties to determine the distance between the input texture and the sphere, which makes it difficult to use as a proper simulation of a magnifying glass. The skewing of the refractive zoom effect is also an unwanted feature, since it skews the magnified area away from the user's gaze point. Figure 4.5 shows the two different lenses and figure 4.6 is another example of the refraction lens.
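Since glm is already used for vector mathematics, the refraction computation that the shader performs can be illustrated on the CPU side as follows (a sketch; in GazRaz this happens in the refraction shader):

    #include <glm/glm.hpp>

    // Bend the incident view direction at the sphere surface using the ratio of
    // refractive indices eta = 1.0 / 1.52 (air to glass). The returned vector is then
    // used to look up the zoom texture, producing the glass-like zoom distortion.
    glm::vec3 lensRefraction(const glm::vec3& cameraPos,
                             const glm::vec3& surfacePoint,
                             const glm::vec3& surfaceNormal)
    {
        const float eta = 1.0f / 1.52f;
        const glm::vec3 incident = glm::normalize(surfacePoint - cameraPos);
        return glm::refract(incident, glm::normalize(surfaceNormal), eta);
    }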
4.6.3 Dynamic lens size

In section 2.1 it was stated that human vision only sees sharply within a limited visual angle. With the eye tracker's ability to measure the distance of the eyes from the eye tracker itself, and with simple trigonometry, it is possible to determine an appropriate lens radius. This makes it possible to continuously reshape the lens to fit a predefined visual angle. The cosine law gives equation 4.4 (see figure 4.8 for an explanation of the variables). Assuming that the sides b and c are of equal length, i.e. b = c, gives equation 4.5.

    a = √(b² + c² − 2bc · cos(α))    (4.4)

    a = √(2b² − 2b² · cos(α))    (4.5)

Figure 4.8: Triangle. a, b and c represent the side lengths; α, β and γ represent the angles. Image by David Weisman, taken from wikipedia.org.

With equation 4.5 the diameter a of the lens can be calculated, using the approximation that b has the same length as the distance reported between the eye tracker and the eye. The visual angle α is chosen to be a reasonable size, between at least 1° (the size of the fovea) and at most 26° (the height of the screen). With this implementation the size of the lens adapts to the distance between the user and the screen, so that the lens continuously covers the same area of the retina independent of the distance to the screen. For example, the lens could be set to a size of 5°, which is the size of the parafoveal region. To avoid noise causing the lens to flicker in size, a mean eye distance is calculated, following the method described in section 4.5.1.

Figure 4.5: Plane lens (upper), with a zoom factor of 2.0, and fish-eye lens (lower). Both are placed above Europe on a high-resolution world map image taken from shadedrelief.com.

Figure 4.6: Refraction lens over the east coast of the USA. Texture image taken from glif.is; see the appendix for additional information.

Figure 4.7: Plane lens over the east coast of the USA. Texture image taken from glif.is; see the appendix for additional information.

4.6.4 Gaze point correction

When using gaze control to steer the lens it is common to want to look, or saccade, to some area already shown within the lens. Originally the application did not take into consideration that the saccade was made within the lens, and moved the lens as if the new gaze point pointed into the background image. This caused the target to disappear out of the lens, or to the other side of the lens. To avoid this issue, the gaze point is corrected while the user is looking inside the lens, so that the corrected gaze point targets what the user sees within the lens. Figure 4.9 illustrates the difference between the corrected and uncorrected gaze point, and algorithm 3 presents pseudocode for the gaze point correction.

Algorithm 3 Gaze Point Correction
    1: procedure gazePointCorrection
    2:     xDistance ← getXDistance(newGazePoint, oldGazePoint)
    3:     yDistance ← getYDistance(newGazePoint, oldGazePoint)
    4:     if distance < lensRadiusNormalized and zoomFactor > 1.0 then
    5:         gazePoint.x ← oldGazePoint.x + xDistance / zoomFactor
    6:         gazePoint.y ← oldGazePoint.y + yDistance / zoomFactor
    7:     end if
    8: end procedure

Figure 4.9: Illustration of gaze point correction for the plane lens. The right arrow shows where the lens would be moved if there were no gaze point correction. The dotted line shows how the gaze point correction repositions the gaze point to target the area that the lens is currently showing.
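A C++ sketch of the correction, mirroring algorithm 3 (the GazePoint struct from listing 4.1 is assumed; parameter names are illustrative):

    #include <cmath>

    // When the new gaze point falls inside the lens and the lens magnifies, the gaze
    // movement is scaled down by the zoom factor so that the corrected gaze point
    // targets what the user actually sees inside the lens.
    GazePoint correctGazePoint(const GazePoint& oldGp, const GazePoint& newGp,
                               float lensRadiusNormalized, float zoomFactor)
    {
        const float dx = newGp.x - oldGp.x;
        const float dy = newGp.y - oldGp.y;
        const float distance = std::sqrt(dx * dx + dy * dy);

        GazePoint corrected = newGp;
        if (distance < lensRadiusNormalized && zoomFactor > 1.0f) {
            corrected.x = oldGp.x + dx / zoomFactor;
            corrected.y = oldGp.y + dy / zoomFactor;
        }
        return corrected;
    }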
4.7 User interface

This section describes how the user interacts with the application and how feedback is provided back to the user. Keyboard, mouse and gaze control are used to interact with the application. The list below describes how application attributes are controlled or changed by user input.

• Connect/disconnect eye tracker. The eye tracker is connected or disconnected with the push of a keyboard button.
• Move lens. When the eye tracker is connected the lens is continuously moved to the user's current gaze point, as long as he/she is holding down the left mouse button. When the left mouse button is released the lens freezes at its current position. The eye tracker will however continue to stream gaze data, and whenever the left mouse button is pressed again the lens positions itself at the current gaze position. When the eye tracker is not connected the lens moves to, and follows, the position of the mouse cursor.
• Change image. Keyboard buttons are used to switch between the various test images.
• Change filter. A keyboard button is used to switch between the filter methods.
• Change lens type. The lens type can be changed with the right mouse button or a keyboard button. The alternatives are plane lens, fish-eye lens or no lens.
• Change lens size. The lens size is changed by pressing and holding a keyboard button while using the scroll-wheel of the mouse. When using the plane lens, changing the size changes the lens diameter in degrees of gaze angle. If the eye tracker is connected, the plane lens automatically adapts its size depending on the distance of the user's eyes, as described in section 4.6.3.
• Change zoom factor. The zoom amplitude is changed with the scroll-wheel of the mouse.
• Change noise threshold. Increased and decreased with keyboard buttons.
• Change number of samples for the mean gaze point calculation. Increased and decreased with keyboard buttons.
• Show/hide gaze points. Both the filtered and the raw gaze points can be toggled by pressing a keyboard button. The alternatives are filtered, raw, filtered and raw, or none.
• Show/hide threshold borders. The visualisation of the threshold limits can be toggled by pressing a keyboard button.
• Default size/zoom. By pushing a keyboard button the current lens is resized back to its original size and the zoom factor is returned to its original value.

To provide feedback to the user on the system status, a transparent information window is placed in the top left corner of the screen, see figure 4.10. The window presents information on:

• Zoom factor. The zoom amplitude.
• Lens size. The lens diameter in gaze angle.
• Samples. The number of gaze samples used for the mean gaze point calculation.
• Eye distance X, Y, Z. The mean distance of the left and right eye, described in the user coordinate system.
• Fps. Frames per second, i.e. the update frequency of the UI thread.
• Eye tracker sps. Samples per second, i.e. the gaze data sample frequency of the eye tracker.
• Five rows of messages. Changes in the status of the application and/or warnings are pushed to the message field. This way the user is given feedback on the system status, or is informed why an action was dismissed.

A feature to warn users if they are about to leave the track box was also added. If the mean eye position in user coordinates reaches outside the track box boundaries, a warning is displayed at the gaze position, telling the user to move in a direction back towards the track box.
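A minimal sketch of such a boundary check follows; the track box limits and the warning strings are hypothetical placeholder values, not the ones used in GazRaz:

    #include <string>

    // Returns a warning direction if the mean eye position, given in the user
    // coordinate system (mm), is outside an assumed track box volume.
    // An empty string means no warning needs to be shown.
    std::string trackBoxWarning(const EyeDistanceMm& meanEyePos)
    {
        // Hypothetical track box limits in mm, centred on the eye tracker.
        const double minY = -150.0, maxY = 150.0;
        const double minZ = 500.0,  maxZ = 800.0;

        if (meanEyePos.z > maxZ) return "Move closer to the screen";
        if (meanEyePos.z < minZ) return "Move away from the screen";
        if (meanEyePos.y > maxY) return "Move down";
        if (meanEyePos.y < minY) return "Move up";
        return "";
    }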
Figure 4.11 shows the warning presented to the 4.7 User interface user when the mean eye distance have exceeded the max z and min y limit. 39 40 4 Gaze aware image viewing application, GazRaz Figure 4.10: The information window in the top left corner of the screen. Figure 4.11: The red text in the center of the lens is the direction the user is asked to move when he/she is too close to the track box border. The warning disappears as fast as the eye distance comes inside the allowed limit. 5 Evaluation To understand how potential users would would perceive the GazRaz application a evaluation was done. The evaluation method is called think aloud method and is discussed in section 5.1. The evaluation aims to provide a guidance on which features were successful and which settings were preferred. Further the evaluation aimed to compare the participants experience of using gaze control as interaction, compared to a conventional computer mouse. 5.1 Method The method used for the evaluation is knows as the think aloud method. For more information on think aloud method see van Someren and And [20]. After the think aloud test the participant was also asked to complete a questionnaire. The think aloud test is done one participant in at a time together with the test facilitator. To begin with the participant answers a couple of questions on their age, the uses of any vision correction, whether they have full ability to see colors and if they approve of audio recording during the test. The TX300 can track the eyes of a person wearing glasses and contact lenses, but as mentioned in section 3.1 it might influence the gaze data quality. The participant were introduced to the test, explained that the test is done to evaluate the usability of the GazRaz application and evaluate how they experience gaze control compared to using a mouse. It was explained to the participant that the test was not designed to evaluate their problem solving abilities but solely to understand how they use GazRaz to solve the tasks they were given during 41 5 42 Evaluation the test. The participants were also asked to talk out loud about what they were thinking and trying to do, to solve the tasks they were given. Thereafter the participants conducted the eye tracker calibration procedure. The calibration is done with Tobii Studio1 . The calibration type was set to regular with 9 red calibration points on a dark grey background. The calibration speed was set to normal. Then the participant was presented with the GazRaz application. They were guided on how they switch between lenses how they move the lens with the mouse and with gaze control, how they increase/decrease the zoom and increase/decrease the size of the lens. The participants are allowed to try the experiment with the application for minute or two. During the introduction the image in figure 5.1 was used. When the participant have tried the application for a short while he/she are asked whether he/she feel comfortable and understand how to change the zoom, resize the lens and use gaze-control/mouse to move the lens. Was the answer yes the test proceeded, otherwise the controls were further explained. When the test proceeded the image was changed to figure 5.2. 
The participant was explained to the background of the images and that the tasks they will be given might include finding a certain cable or city and that these tasks are not meant to measure their geographical knowledge but rather to see how they use GazRaz during searching tasks, it was also explained that they can ask to abort a task if they feel it is to difficult. Thereafter the participants were asked to do a series of task, involving finding cities, cables, follow lines etc. To avoid misunderstanding due to pronunciation, names were also shown to the participants. The tasks were structured to be solved with the plane lens or refracting lens with mouse or gaze control, with static or exponential filter. Half (4) of the participants started the test by using the mouse and the other half (3) by using gaze control. During the whole test the test-facilitator took notes of how the participant solved the tasks. After the tasks were completed the participants were asked to complete a questionnaire with 14 statements. The participants was asked to rank the statements on a scale between 1-7. 1-3 Strongly disagree, 4-5 Neither, 6-7 Strongly agree. 5.1.1 Participants 7 participants did the evaluation, 4 male and 3 female, in the ages of 22-32. All claimed to have a full ability to see colors. Two were wearing contact lenses during the evaluation. 6 participants managed to collect data for all nine calibration points during calibration. 1 http://www.tobii.com/eye-tracking-research/global/products/software/tobii-studio-analysissoftware/ 5.1 Method 43 Figure 5.1: The image used during the introduction to GazRaz. Image resolution 16200x8100 pixels. Taken from shadedrelief.com. Figure 5.2: The image used during the think aloud evaluation. The image shows a world map centred over America. On the map there are coloured lines illustrating GLIF infrastructure of international research and education network bandwidth. The network cables have names, and some cities are marked with points and names. Image resolution 16384x8192 pixels. Taken from glif.is, see appendix for additional information 5 44 5.1.2 Evaluation GazRaz settings During the evaluation the participant was free to change the zoom factor and lens size as she/he wished. The number of samples for the mean calculation was set to 12 for the static filter and 30 for the exponential filter. The noise threshold was linked together with the zoom factor. Which caused the noise threshold to increase as the zoom factor increased and vice versa. Dynamic lens size, discussed in section 4.6.3 was active for the plane lens only. Gaze point correction, as explained in section 4.6.4 was active for both lenses. The Fish-eye lens was limited to a maximal size of 12◦ , while the plane lens had no upper limit. 5.1.3 TX300 settings The TX300 where set in 300 Hz sample rate with default illumination mode. The room was illuminated with diffuse ceiling light. 5.2 Result When using the mouse all participants succeeded with their tasks. Using the fisheye lens three of the participants kept the original size of the lens which was set to approximately 8◦ of visual angle. The others increased the size of the lens to around 10-11◦ . When using the plane lens most participants increased the lens size to around 10◦ with around 3 x zoom factor. Some participants made attempts with big lens sizes up to 23◦ , but shortly afterwards decreased them. With gaze control activated most of the task was solved. 
The task most participants gave up on was a search and find task with the fish-eye lens and the exponential filter activated. When using gaze control with the exponential filter active, the participants had problems stabilizing the lens, especially together with the fish-eye lens. The lens would easily start to oscillate around the target the participant tried to fixate on. The oscillation decreased somewhat with the plane lens, but from the facilitator's point of view the lens still seemed shaky and unstable. Some participants described it as very hard to read when the lens followed the gaze point, and also as too sensitive. Participants seemed to concentrate hard to keep the lens stable, which could also influence them to move closer to the screen, causing the noise to increase due to being too close to the track box border. With the static filter the oscillation and shakiness decreased, and the participants seemed able to use gaze control more easily.

After two to three tasks with gaze control, 3 of the participants noted that it was easier to read if they released the left mouse button, causing the lens to freeze at its current position. They asked the facilitator if it was allowed to release the button or click it, thereby causing the lens to alternate between moving and stationary, and it was explained that this was allowed. The other 4 kept holding down the left mouse button during the whole test. Some participants complained that they wanted the lens to stand still, yet kept holding down the mouse button. One of the tasks was to find the cable that the participant thought had the highest bandwidth, with the bandwidth stated next to the name, which made it a search and compare task. The participants that adopted the click and release method managed to find the highest value, whilst the others all selected some other value. With gaze control activated some participants experimented with large lens sizes, but most seemed satisfied with a lens size of around 10°. The zoom factor used was around 2-5 x.

To summarize the results: the plane lens was preferred over the fish-eye lens. The exponential filter was more likely to cause oscillating behaviour and forced the participants to concentrate more. Some of the participants kept the left mouse button pressed during the whole test, making the lens constantly follow the gaze position. The participants expressed enjoyment in using gaze control, but the evaluation is not sufficient to claim that gaze control is preferred over the use of a computer mouse. The tasks provided were, however, solvable with gaze control.

6 Conclusions and Future work

This chapter discusses the results of the evaluation in section 5.2 and the decisions made during the development of GazRaz.

6.1 Discussion

The results of the evaluation in section 5.2 indicate that even with quite uncomplicated gaze data processing it is possible to acquire a gaze position stable and accurate enough to control the position of a magnifying lens. The evaluation partially aimed to explore how the participants would perceive the different aspects of GazRaz, such as the two lenses and the two filters, which makes it difficult to directly compare the participants' experience of mouse usage with gaze control. A more specific evaluation comparing only the computer mouse to gaze control could provide a more reliable result. There are also some learnt abilities to take into account when reviewing the evaluation results.
Some of the participants might have performed better and stressed their eyes less if they had thought about releasing the left mouse button once they managed to place the lens at a target. A more thorough introduction to the GazRaz application before the test started might have made them aware of that possibility. The reason why the left button was added as a gaze control delimiter was to allow free viewing within the lens once the target was hit. In my own experience it was difficult to use continuous gaze control with a zoom factor greater than 6; beyond that the lens tended to oscillate aggressively during attempted fixation. By using a low zoom factor to aim the lens at the target, and then releasing the left mouse button to lock the lens onto it, the zoom and size could be increased without oscillation or the need for intense concentration. In my experience this procedure can be done rapidly and effortlessly. However, none of the participants fully adopted this process during the evaluation, although some used a click approach to make the lens jump to the current gaze position whenever they wished.

Between the fish-eye and the plane lens, the plane lens was the one preferred by the participants. This might be because text is easier to read in the plane lens, where the words are not distorted, see figure 4.6. My opinion is, however, that the idea of a fish-eye lens should not be discarded; a different implementation technique than the one used in GazRaz could have provided a better result.

When it comes to the exponential and static filters, the exponential filter proved difficult to use. Even though it used many more gaze point samples, 30 compared to 12 for the static filter, it moved jerkily (as do the eyes) and the participants had problems stabilizing the lens. With the static filter, small saccades close to the lens cause the lens to move gradually towards the new gaze point. The movement of the static filter was balanced such that it was responsive enough for the user to understand where the lens was heading, without the confusing jerky movements. In section 2.2.2 corrective saccades are discussed, and during development my experience was that corrective saccades are very common. With the static filter the first saccade moves the lens close to the target (within the 5.44° discussed in section 4.5.2), and the corrective saccade that follows causes the lens to move gradually towards the target. With the exponential filter the corrective saccade would be followed almost instantly by the lens. The mechanisms involved in corrective saccades are still debated, but it might be disturbing for the visual sense to have unexpected stimuli at the gaze point. Remember the quote from Hillstrom and Yantis [9] in section 2.1: "when motion segregates a perceptual element from a perceptual group, a new perceptual object is created, and this event captures attention" and "motion as such does not capture attention but that the appearance of a new perceptual object does."

The idea of the adaptive lens size was that the lens size would adapt to the size of the sensor-dense part of the vision, as discussed in section 2.1. When the participants were free to choose the lens size they tended to use a much larger lens than the fovea or parafovea region would represent.
Large lens sizes are understandable when using the mouse, since the eye is then able to search freely within the lens, and gaze control with the click and release method could also work well with a large lens. With gaze control constantly activated, however, the eye will mostly only be able to see the centre of the lens, so a large lens size is unnecessary.

The out-of-track-box warning discussed in section 4.7 could have provided better warnings than it was designed to do during the evaluation. Three times a participant positioned themselves such that the eyes were on the border of the track box, which increased the noise. For an inexperienced participant this will be perceived as if the application is unstable or broken. An earlier warning could provide the feedback needed to make the participant reposition within the track box.

Gaze point correction was not evaluated in chapter 5, but in my experience it increased usability, since it made it possible to conduct saccades within the lens and target what you expect to see, rather than the actual gaze point behind the lens.

6.2 Future work

Further development of the gaze data filtering algorithm. The static filter that proved successful in this thesis has a simple construction. Making the weighting and thresholds smarter and adaptable to the movement state of the eyes could provide improvements, for example by detecting saccades and fixations and making the filter adapt to these behaviours. Špakov [21] has written a paper comparing different gaze data filtering methods.

The fish-eye lens implementation was relatively unsuccessful and could possibly be implemented in a better way. A better implementation could provide a solution to the overlapping of the overview that the plane lens creates, as in example 4.1.

In section 4.5.1 it was mentioned that GazRaz does not use the latest gaze point sample due to thread safety. This could be solved by implementing a circular queue design, which would reduce the latency slightly. In our case, with the TX300 eye tracker system, the difference would have been small because of the high sample rate; for an eye tracker system with a lower sample rate this latency improvement would have a greater impact.

It would be interesting to review the results of an evaluation specifically comparing the efficiency of gaze control to that of a computer mouse in GazRaz. Would gaze control improve efficiency if the participants had sufficient time to learn how to use it? It would also be interesting to evaluate how different types of images are perceived with the GazRaz application: images presenting other forms of data than the maps used in figures 5.2 and 5.1, for example text-rich images, text-free images, or images that demand a high zoom factor.

Appendix A GLIF, Additional information and Acknowledgements

Acknowledgements - The Global Lambda Integrated Facility (GLIF) Map 2011 visualization was created by Robert Patterson of the Advanced Visualization Laboratory (AVL) at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign (UIUC), using an Earth image provided by NASA with texture retouching by Jeff Carpenter, NCSA. Data was compiled by Maxine D. Brown of the Electronic Visualization Laboratory (EVL) at the University of Illinois at Chicago (UIC). Support was provided by GLIF, NCSA/UIUC, the State of Illinois, and US National Science Foundation grants # OCI0962997 to EVL/UIC. For more information on GLIF, see http://www.glif.is/.
Additional Information - The GLIF map does not represent all the world’s Research and Education optical networks, and does not show international capacity that is dedicated to production usage. The GLIF map only illustrates excess capacity that its participants are willing to share with international research teams for applications-driven and computer-system experiments, in full or in part, all or some of the time. GLIF does not provide any network services itself, and researchers should approach individual GLIF network resource providers to obtain lightpath services. 53 54 A GLIF, Additional information and Acknowledgements Bibliography [1] Accuracy and precision test method for remote eye trackers. Tobii Technology AB, 2.1.1 edition, February 2011. Cited on pages 1, 15, and 17. [2] Keith Rayner. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3):372 – 422, 1998. ISSN 0033-2909. URL https://login.e.bibl.liu.se/login?url=http: //search.ebscohost.com/login.aspx?direct=true&db=pdh& AN=1998-11174-004&site=ehost-live. Cited on pages 2 and 7. [3] Christina F Lassen, Sigurd Mikkelsen, Ann I Kryger, and Johan H Andersen. Risk factors for persistent elbow, forearm and hand pain among computer workers. Scandinavian Journal of Work, Environment & Health, (2):122, 2005. ISSN 03553140. URL https://login.e.bibl.liu.se/login? url=http://search.ebscohost.com/login.aspx?direct=true& db=edsjsr&AN=edsjsr.40967478&site=eds-live. Cited on page 2. [4] Susana Martinez-Conde, Jorge Otero-Millan, and Stephen L. Macknik. The impact of microsaccades on vision: towards a unified theory of saccadic function. Nature Reviews Neuroscience, 14(2):83 – 96, 2013. ISSN 1471003X. URL https://login.e.bibl.liu.se/login?url=http: //search.ebscohost.com/login.aspx?direct=true&db=aph& AN=84942422&site=eds-live. Cited on pages 2, 5, 8, and 9. [5] Päivi Majaranta and Andreas Bulling. Eye tracking and eye-based human–computer interaction. Advances in Physiological Computing, page 39, 2014. ISSN 9781447163916. URL https://login.e.bibl.liu. se/login?url=http://search.ebscohost.com/login.aspx? direct=true&db=edb&AN=95560053&site=eds-live. Cited on pages 5, 7, 11, 12, 13, and 16. [6] Richard A. Bolt. Eyes at the interface. In Proceedings of the 1982 Conference on Human Factors in Computing Systems, CHI ’82, pages 360–362, New York, NY, USA, 1982. ACM. doi: 10.1145/800049.801811. URL http:// doi.acm.org/10.1145/800049.801811. Cited on pages 5 and 6. 55 56 Bibliography [7] Heiko Drewes. Eye gaze tracking for human computer interaction. March 2010. URL http://nbn-resolving.de/urn:nbn:de:bvb: 19-115914. Cited on pages 5, 6, 7, 8, 9, 11, 12, 14, and 15. [8] R. John Leigh and David S. Zee. Neurology of eye movements. Contemporary neurology series: 70. Oxford : Oxford University Press, 2006, 2006. ISBN 0195300904. URL https://login.e.bibl.liu.se/login? url=http://search.ebscohost.com/login.aspx?direct=true& db=cat00115a&AN=lkp.451202&site=eds-live. Cited on pages 7 and 8. [9] AnneP. Hillstrom and Steven Yantis. Visual motion and attentional capture. Perception & Psychophysics, 55(4):399–411, 1994. ISSN 0031-5117. doi: 10. 3758/BF03205298. URL http://dx.doi.org/10.3758/BF03205298. Cited on pages 7 and 48. [10] Fize Denis Thorpe Simon and Marlot Catherine. Speed of processing in the human visual system. Nature, pages 520–522, 1996. URL http://dx. doi.org/10.1038/381520a0. Cited on page 8. [11] Jorge Otero-Millan, Stephen L. Macknik, Alessandro Serra, R. John Leigh, and Susana Martinez-Conde. 
Triggering mechanisms in microsaccade and saccade generation: a novel proposal. Annals of the New York Academy of Sciences, 1233(1):107–116, 2011. ISSN 1749-6632. doi: 10.1111/j.1749-6632.2011.06177.x. URL http://dx.doi.org/10.1111/j.1749-6632.2011.06177.x. Cited on page 9.

[12] Robert M. Steinman, Robert J. Cunitz, George T. Timberlake, and Magdalena Herman. Voluntary control of microsaccades during maintained monocular fixation. Science, 155(3769):1577–1579, 1967. ISSN 0036-8075. URL http://www.jstor.org/stable/1721006. Cited on page 9.

[13] Robert M. Steinman, Genevieve M. Haddad, Alexander A. Skavenski, and Diane Wyman. Miniature eye movement. Science, 181(4102):810–819, 1973. ISSN 0036-8075. URL http://www.jstor.org/stable/1736402. Cited on pages 9 and 12.

[14] Michael F. Land and Sophie Furneaux. The knowledge base of the oculomotor system. Philosophical Transactions: Biological Sciences, 352(1358):1231–1239, 1997. ISSN 0962-8436. URL http://www.jstor.org/stable/56660. Cited on page 11.

[15] Kenneth B. I. Holmqvist. Eye tracking: a comprehensive guide to methods and measures. Oxford: Oxford University Press, 2011. ISBN 9780199697083. URL https://login.e.bibl.liu.se/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=cat00115a&AN=lkp.636826&site=eds-live. Cited on pages 12 and 14.

[16] Robert J. K. Jacob. The use of eye movements in human-computer interaction techniques: What you look at is what you get. ACM Trans. Inf. Syst., 9(2):152–169, April 1991. ISSN 1046-8188. doi: 10.1145/123078.128728. URL http://doi.acm.org/10.1145/123078.128728. Cited on page 15.

[17] Tobii TX300 eye tracker. http://www.tobii.com/Global/Analysis/Marketing/Brochures/ProductBrochures/Tobii_TX300_Brochure.pdf, 2014. Accessed: 2015-05-18. Cited on page 17.

[18] Tobii Gaze SDK Developer's Guide General Concepts. Tobii Technology AB, 2014. URL http://developer.tobii.com/. Cited on page 18.

[19] Patrick Baudisch, Nathaniel Good, Victoria Bellotti, and Pamela Schraedley. Keeping things in context: A comparative evaluation of focus plus context screens, overviews, and zooming. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '02, pages 259–266, New York, NY, USA, 2002. ACM. ISBN 1-58113-453-3. doi: 10.1145/503376.503423. URL http://doi.acm.org/10.1145/503376.503423. Cited on page 23.

[20] Maarten W. van Someren et al. The Think Aloud Method: A Practical Guide to Modelling Cognitive Processes. 1994. ISBN 0-12714270-3. URL https://login.e.bibl.liu.se/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=eric&AN=ED399532&site=eds-live. Cited on page 41.

[21] Oleg Špakov. Comparison of eye movement filters used in HCI. In Proceedings of the Symposium on Eye Tracking Research and Applications, ETRA '12, pages 281–284, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1221-9. doi: 10.1145/2168556.2168616. URL http://doi.acm.org/10.1145/2168556.2168616. Cited on page 49.