LIU-ITN-TEK-A--15/046--SE
Gaze control for detail and
overview in image exploration
Sebastian Rauhala
2015-06-16
Department of Science and Technology
Linköping University
SE-601 74 Norrköping, Sweden
Institutionen för teknik och naturvetenskap
Linköpings universitet
601 74 Norrköping
LIU-ITN-TEK-A--15/046--SE
Gaze control for detail and
overview in image exploration
Thesis work carried out in Media Technology
at the Institute of Technology,
Linköping University
Sebastian Rauhala
Supervisor: Matthew Cooper
Examiner: Jimmy Johansson
Norrköping 2015-06-16
Upphovsrätt (Copyright)
This document is made available on the Internet, or via its possible future replacement, for a considerable time from the date of publication, provided that no exceptional circumstances arise.
Access to the document implies permission for anyone to read, download, and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. Transfer of the copyright at a later date cannot revoke this permission. All other use of the document requires the author's consent. To guarantee authenticity, security and accessibility, solutions of a technical and administrative nature are in place.
The author's moral rights include the right to be mentioned as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character.
For additional information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/
Copyright
The publishers will keep this document online on the Internet - or its possible
replacement - for a considerable time from the date of publication barring
exceptional circumstances.
The online availability of the document implies a permanent permission for
anyone to read, to download, to print out single copies for your own use and to
use it unchanged for any non-commercial research and educational purpose.
Subsequent transfers of copyright cannot revoke this permission. All other uses
of the document are conditional on the consent of the copyright owner. The
publisher has taken technical and administrative measures to assure authenticity,
security and accessibility.
According to intellectual property law the author has the right to be
mentioned when his/her work is accessed as described above and to be protected
against infringement.
For additional information about the Linköping University Electronic Press
and its procedures for publication and for assurance of document integrity,
please refer to its WWW home page: http://www.ep.liu.se/
© Sebastian Rauhala
Avdelning, Institution (Division, Department): Medie- och informationsteknik, Department of Science and Technology, SE-601 74 Norrköping
Datum (Date): 2015-06-23
Språk (Language): Engelska/English
Rapporttyp (Report category): Examensarbete
ISBN: -
ISRN: LiU-ITN-TEK-A--15/046--SE
Serietitel och serienummer (Title of series, numbering): -
ISSN: -
URL för elektronisk version: -
Titel (Title): Ögonstyrning för detalj och översikt, i bild utforskning
Title: Gaze control for detail and overview in image exploration
Författare (Author): Sebastian Rauhala
Sammanfattning (Abstract)
Eye tracking technology has made it possible to accurately and consistently track a user's gaze position on a screen. The human eye's center of focus, where it can see the most detailed information, is quite small at any given moment. Human peripheral vision has a much lower level of detail than the center of gaze. Knowing this, it is possible to display a view that increases the level of resolution at the position of the user's gaze point on the screen, while the rest of the screen keeps a lower resolution. An implementation of such a system can generate a representation of data with both detail and overview.
The results indicate that even with simple gaze data processing it is possible to use gaze control to help explore details of a high resolution image. Gaze data processing often involves a compromise between stability, responsiveness and latency. A low latency, highly responsive gaze data filter would increase the risk of lens oscillation, and demand a higher level of concentration from the viewer than a slower filter would. Applying a gaze data filter that allowed for smooth and stable lens movement for small saccades and responsive movement for large saccades proved successful.
With the use of gaze control the user might be able to use a gaze aware application more efficiently, since gaze precedes action. Gaze control would also reduce the need for hand motions, which could provide an improved work environment for people interacting with computers.
Nyckelord (Keywords): Eye tracker, Gaze control, Gaze data processing, detail, overview, Media technology, Medieteknik, Tobii, lens, ögonstyrning, ögontracking
Sammanfattning
Eye tracking technology has made it possible to accurately and continuously follow a user's gaze position on a screen. The focal point of the human eye, where it is able to perceive the most detailed information, is relatively small at any given moment. Human peripheral vision is considerably worse at perceiving detail than the central parts of the visual field. This knowledge makes it possible to construct a scene in such a way that the resolution is higher in the area the gaze is currently directed at, while the surrounding areas retain their original resolution. A system with such an implementation can show a representation with both detail and overview.
The results indicate that even with simple gaze data processing it is possible to use gaze control to explore details of high resolution images. Gaze data processing often involves a compromise between stability, responsiveness and latency. A gaze data filter with high responsiveness and low latency increased the risk of oscillating lens movements, which demanded more concentration from the users than a slower filter would have. Using a gaze data filter that allows smooth and stable lens movement for small eye movements and responsive lens movement for large eye movements proved to be successful.
With the use of gaze control a user may become more efficient when using a gaze aware application, since gaze often precedes action. Gaze control can also reduce the need for hand movements, which could improve the work environment for people who work with computers.
Abstract
Eye tracking technology has made it possible to accurately and consistently track a user's gaze position on a screen. The human eye's center of focus, where it can see the most detailed information, is quite small at any given moment. Human peripheral vision has a much lower level of detail than the center of gaze. Knowing this, it is possible to display a view that increases the level of resolution at the position of the user's gaze point on the screen, while the rest of the screen keeps a lower resolution. An implementation of such a system can generate a representation of data with both detail and overview.
The results indicate that even with simple gaze data processing it is possible to use gaze control to help explore details of a high resolution image. Gaze data processing often involves a compromise between stability, responsiveness and latency. A low latency, highly responsive gaze data filter would increase the risk of lens oscillation, and demand a higher level of concentration from the viewer than a slower filter would. Applying a gaze data filter that allowed for smooth and stable lens movement for small saccades and responsive movement for large saccades proved successful.
With the use of gaze control the user might be able to use a gaze aware application more efficiently, since gaze precedes action. Gaze control would also reduce the need for hand motions, which could provide an improved work environment for people interacting with computers.
Acknowledgments
I would like to express my sincere thanks to my supervisor Matthew Cooper and
my examiner Jimmy Johansson. They introduced me to the interesting fields of information visualization and human computer interaction, and shared their good
ideas and knowledge with me. Our discussions and their feedback have been of
great help and I really appreciate it.
I would also like to thank my good friend and classmate Philip Zanderholm for all his help, encouragement and support during my years in Norrköping. Thanks to Kahin Akram for your company and support during my thesis work.
I take this opportunity to express gratitude to all of the Department faculty members for their help and support. I also thank my parents and siblings for the
unceasing encouragement, support and attention.
Norrköping, June 2015
Sebastian Rauhala
Contents

List of Figures
Notation

1 Introduction
  1.1 The problem statement
  1.2 Goals
  1.3 Audience
  1.4 Organization of the thesis

2 Human vision
  2.1 Anatomy of the human eye
  2.2 Eye Movements
    2.2.1 Fixations
    2.2.2 Saccades
    2.2.3 Microsaccades
    2.2.4 Gaze accuracy

3 Eye tracking techniques
  3.1 Types of eye tracking systems
  3.2 Calibration of remote video-based eye tracker systems
  3.3 Accuracy and precision
  3.4 Eye tracking as human computer interaction
  3.5 Tobii TX300 eye tracker
    3.5.1 Tobii Gaze SDK

4 Gaze aware image viewing application, GazRaz
  4.1 The idea
  4.2 Hardware setup
  4.3 Software setup
  4.4 Gaze data
  4.5 Gaze data filtering
    4.5.1 Mean
    4.5.2 Static threshold, weighted mean filter
    4.5.3 Exponential threshold, weighted mean filter
  4.6 Lens
    4.6.1 Double texture lens
    4.6.2 Refraction lens
    4.6.3 Dynamic lens size
    4.6.4 Gaze point correction
  4.7 User interface

5 Evaluation
  5.1 Method
    5.1.1 Participants
    5.1.2 GazRaz settings
    5.1.3 TX300 settings
  5.2 Result

6 Conclusions and Future work
  6.1 Discussion
  6.2 Future work

A GLIF, Additional information and Acknowledgements

Bibliography
List of Figures

2.1 A simple illustration of the schematics of the human eye. Picture by Rhcastilhos. From wikipedia.org.
3.1 An illustration of the difference between accuracy and precision in an eye tracker context.
3.2 TX300 eye tracker, viewed from the front. The purple circles under the screen are light from the IR lamps captured by the camera.
3.3 Illustration of the front of the TX300 eye tracker. Notice that the camera represents the origin.
3.4 TX300 eye tracker viewed from the side. Notice that the eye tracker and the z axis are angled with respect to the head. Head portrait designed by Freepik.com.
3.5 Illustration of the three coordinate systems used by the Gaze SDK.
3.6 Illustration of two eyes looking at a point on the active display.
4.1 Illustration of a plane zoom lens for detail and overview. The blue area is hidden behind the lens and cannot be seen. The view plane is the background image.
4.2 Gaze points during smooth pursuit with the static filter. Green points are the raw gaze points for the right eye and they fade towards yellow the older they get. Red points are the raw gaze points for the left eye and they fade towards purple with age. The white line consists of the filtered gaze points. The red, green and blue circles represent the distance thresholds. The mean gaze point calculation was done with 14 samples, i.e. 7 gaze point pairs. The eyes move from the left side towards the right side of the image. Notice the distance between the leftmost raw gaze points and the leftmost filtered gaze points.
4.3 Gaze points during smooth pursuit with the exponential filter. Green points are raw gaze points for the right eye and fade to yellow the older they get. Red points are raw gaze points for the left eye and fade to purple with age. The white line consists of the filtered gaze points. The red and cyan coloured circles represent the distance thresholds. The mean calculation was done with 14 samples, i.e. 7 gaze point pairs. The eyes move from the top left corner towards the bottom of the image. Notice the distance between the lowest raw gaze points and the lowest filtered gaze points.
4.4 Gaze points during a saccade with the static filter (upper) and the exponential filter (lower). Green points are raw gaze points for the right eye and fade to yellow the older they get. Red points are raw gaze points for the left eye and fade to purple with age. The white points are the filtered gaze points. The mean calculation was done with 12 samples, i.e. 6 gaze point pairs. The eyes move from the leftmost circle to the rightmost circle.
4.5 Plane lens (upper), with a zoom factor of 2.0. Fish-eye lens (lower). Both are placed above Europe on a high resolution world map image taken from shadedrelief.com.
4.6 Refraction lens over the east coast of the USA. Texture image taken from glif.is, see appendix for additional information.
4.7 Plane lens over the east coast of the USA. Texture image taken from glif.is, see appendix for additional information.
4.8 Triangle. a, b, c represent the side lengths. α, β, γ represent the angles. Image by David Weisman. Taken from wikipedia.org.
4.9 Illustration of gaze point correction for the plane lens. The right arrow shows where the lens would be moved if there were no gaze point correction. The dotted line shows how the gaze point correction repositions the gaze point to target the area that the lens is showing at the moment.
4.10 The information window in the top left corner of the screen.
4.11 The red text in the center of the lens is the direction the user is asked to move when he/she is too close to the track box border. The warning disappears as soon as the eye distance comes back inside the allowed limit.
5.1 The image used during the introduction to GazRaz. Image resolution 16200x8100 pixels. Taken from shadedrelief.com.
5.2 The image used during the think aloud evaluation. The image shows a world map centred over America. On the map there are coloured lines illustrating the GLIF infrastructure of international research and education network bandwidth. The network cables have names, and some cities are marked with points and names. Image resolution 16384x8192 pixels. Taken from glif.is, see appendix for additional information.
Notation

Abbreviations

Abbreviation   Meaning
HCI            Human Computer Interaction
SDK            Software Development Kit
API            Application Programming Interface
IP             Internet Protocol
IR             Infra Red
1 Introduction
With an increasingly data dense society growing up around us, computers of all
sizes are becoming essential for information sharing. To perceive this fast and frequently changing information, we have a variety of computer screens and monitors to help us, see example 1.1.
Since a digital monitor is limited in physical size and resolution, the user will
have to interact with the device to make it present the information that is of
interest at the moment. Human computer interaction (HCI) is often done with
a mouse and keyboard or the increasingly popular use of touchscreens1 . Other
forms of HCI could be by voice control, gesture sensors or eye tracking. Classical
use of mouse and keyboard has proven itself reliable and useful. Eye tracking technology has made important leaps over the last decade, making remote eye trackers increasingly powerful, accurate and unobtrusive (Tob [1]). This provides new opportunities for HCI.
The term eye tracker can be used in different ways, from simple camera applications that try to determine whether there are any eyes present in the scene (for example, a mobile device that keeps its screen illuminated as long as the front mounted camera can detect any eyes) to more advanced systems that have the ability not only to find eyes in the scene but also to determine an accurate gaze direction or gaze position of the viewer. In this thesis, eye tracker refers to a device that is able to identify the eyes of a user and calculate their gaze position.
The progressive improvement of eye tracking technology makes it an interesting complement to the classical ways of human computer interaction. A gaze aware application might help the user to perceive information faster than they would if they used a mouse and keyboard. This thesis will further explore the use of eye tracking as a form of human computer interaction, and how the technique could be used in an image viewing application.
1 Touchscreen refers to a screen of an electronic device that responds to, or is able to track, a finger's motion on the screen.
Example 1.1: MRI scanners
Medical MRI scanners produce large numbers of high resolution images.
To fully diagnose the patient an expert needs to do a thorough investigation of
the produced images. To review the vast amount of information the medical
expert uses digital monitors. The limits of the display make it necessary for the
medical personnel to interact with the device, to be able to navigate and zoom
into interesting parts of the data.
1.1 The problem statement
Vast amounts of graphical data are produced by various machines and systems.
Whether it is medical scanners, astronomical observatories or security systems,
the graphical output will be presented to a human viewer. Even though digital
monitors are evolving together with the graphical information, they will continue
to be limited to a physical size and a maximum resolution.
The human eye has evolved in such a way that we perceive the finest visual information in a small region, the fovea. The fovea covers the central 2 degrees of vision. Outside the fovea, visual accuracy becomes poorer, see Rayner [2]. As a result, humans are limited to studying a small part of a large digital monitor at any given moment.
Modern systems and sensors can produce such high resolution imagery that it
is necessary to be able to zoom and navigate through the image. With the help
of modern eye tracking technology, interaction with the information could be improved. A software application that knows where on the display the user is looking could enhance the resolution at that gaze position, thus giving the user the ability to study the image in detail without losing the overall perception. A gaze aware application could also improve efficiency by removing the need to drag a mouse pointer to the desired position.
Some people who spend large parts of their days interacting with a computer using common tools such as the keyboard and mouse report experiencing pain in their hands and forearms, see Lassen et al. [3]. Using the eyes to interact with computers could reduce the need for hand movement, which in turn should reduce the stress to which the hands and arms are exposed. The human eye moves constantly in its normal state (Martinez-Conde et al. [4]), so, tracked or not, the eyes should not be strained by interacting with computers, provided that we do not change the way we use our eyes once we become aware that they are tracked.
This thesis aims to explore how a gaze aware image viewing application can be developed to provide the user with the abilities of gaze control.
1.2 Goals
This thesis was motivated by the following goals:
• Research and summarize the processes that are involved in human vision. What key components are involved in human visual perception, and how can this knowledge be used to build a responsive and user-friendly gaze aware application?
• Research eye tracking technology. Review the different types of eye tracking technology. In which situations are they suitable for human computer
interaction? What information can an eye tracker provide to its user and
what are their limitations?
• Dynamic resolution of a high quality image. A high resolution image is, in many cases, viewed through a digital monitor of lower resolution than the original image. To fully review the image the user must zoom into, and navigate around, the image. A gaze aware application could possibly increase the efficiency of this process by dynamically zooming the image at the center of the gaze position, instead of relying on hand gestures.
• Non-intrusive gaze interaction. A gaze aware application should aim to help the user interact in a way that feels natural and easy to use, and that provides a boost in efficiency.
1.3 Audience
This thesis is addressed to people interested in the field of gaze tracking and the development of gaze aware applications. Previous knowledge of software engineering and calculus is advantageous for understanding this thesis.
1.4 Organization of the thesis
Chapter 2 discusses human vision and explains the important types of eye movements. Chapter 3 discusses eye tracking technology with a focus on video-based
eye tracking. It further describes the TX300 eye tracker system that was used
during this thesis work, and the software development kit (SDK) that was used
together with the system. Chapter 4 describes the GazRaz application and how
it was designed. Chapter 5 discusses the evaluation of GazRaz and the results. Chapter 6 provides a discussion of the evaluation results and the design of GazRaz,
together with a discussion of future work.
2 Human vision
To accomplish a useful gaze aware application it is necessary to understand the
components involved in human vision. Vision is arguably one of humans' most important senses. It has evolved to be an instrument that, with high precision, can describe the world around us. Beyond being an input device, the eyes also serve communicative purposes, see Majaranta and Bulling [5]. Because of the human eye's foveal construction, we are forced to direct our gaze at the object we want to see clearly. This behavior gives humans the ability to seamlessly communicate their direction of interest. The case explained by Bolt [6] in example 2.1 highlights the natural instinct in humans to use gaze as an extra dimension in communication. If a computer has the ability to understand when it is being looked at, it could also act accordingly, for example illuminate its screen or activate a listener for voice control. How a person uses their gaze for communication can, however, vary depending on personality, cultural references or even emotional state, see Drewes [7].
The book "Neurology of eye movements" by R. John Leigh and David S. Zee gives a comprehensive review of the human visual system and how disorders can affect it. The article "The impact of microsaccades on vision: towards a unified theory of saccadic function" by Martinez-Conde et al. provides a good description of microsaccades1 in human vision.
1 Microsaccades are small unconscious eye movements, often defined as less than 1 degree (Martinez-Conde et al. [4]).
Example 2.1: Gaze in communication
Bolt [6] gave an example in which he pointed out that humans use their gaze to express to whom they are communicating. "Consider the case of asking the question
"What is your favorite sport?" in the presence of several people, but looking at
Mary, say, not at Frank, Judy, or Dave. Then you utter the selfsame words, but
now looking at Dave. The question is a different question; the difference lies not
in its verbal component, but in its intended addressee as given by eye.”
2.1 Anatomy of the human eye
The eye can, from an engineering point of view, be described as a photo sensor
with stabilizing muscles. Three pairs of stabilizing muscles are used to give the
eye the ability to move horizontally, vertically and to rotate around the axis pointing out of the pupil. This provides the eye with 3 degrees of freedom, which is sufficient to compensate for all movements of the head. To stabilize against head movements, the nerves controlling the eye muscles are closely connected with the equilibrium organ located in the ear (Drewes [7]).
Figure 2.1: A simple illustration of the schematics of the human eye. Picture by Rhcastilhos. From wikipedia.org.
Figure 2.1 illustrates a schematic diagram of the human eye. Light enters the eye
through the pupil. The area surrounding the pupil, the iris, is responsible for
the light admission and is able to adapt to changing light conditions. The light
sensitive area of the inside of the eye is called the retina. The retina stretches in
an ellipsoidal manner with a horizontal extent of 180 degrees and 130 degrees
vertically (Drewes [7]).
It should also be noted that the fovea is not located directly in line with the pupil, but 4 to 8 degrees higher up (Drewes [7]). This means that the optical axis (pupillary axis) does not fully correspond to the line of sight.
For a clear vision of an object, its image needs to be held steady on the retina and
kept close to the center of the fovea. Leigh and Zee [8] explain that, at a 2 degree
distance from the fovea, the visual acuity has declined by about 50%. Furthermore, Leigh and Zee [8] state that for best perception of an object it needs to be within 0.5 degrees of the center of the fovea. The visual field outside the fovea can be separated into parafoveal and peripheral. The fovea covers the central 2 degrees of vision and the parafovea extends to 5 degrees, see Rayner [2]. The remaining peripheral vision supplies cues about where the eye should look next and also provides information on movements or changes that occur in the scene (Majaranta
and Bulling [5]). For example Hillstrom and Yantis [9] argue that “when motion
segregates a perceptual element from a perceptual group, a new perceptual object is created, and this event captures attention” and “motion as such does not
capture attention but that the appearance of a new perceptual object does".
2.2 Eye Movements
The human eyes move and shift gaze direction frequently. Because of the foveal
construction of the eyes, humans need to shift their gaze to the object they want
to study. When there are no moving objects in the view and the head is in a
stable position the eye changes gaze direction in abrupt movements. These jerky
eye movements are known as saccades. Right after a saccade the eye needs a moment to gather the visual information before another saccade can be made. This resting period is called a fixation. When the eye is fixating on an object, but the body or head is moving, the eye muscles can counteract the motion to maintain a stable projection on the retina. Such action is known as smooth pursuit. Smooth pursuit also includes when the eye strives to maintain a fixation on a moving or accelerating object.
Eye movements can be summarized into two main types: those that stabilize the gaze to preserve steady images on the retina, and those that shift gaze direction to explore new objects (Leigh and Zee [8]).
The rest of the section will discuss fixations, saccades and gaze accuracy in more
detail.
2.2.1 Fixations
The eyes need a stable exposure on the retina for image recognition. For this
purpose the eyes fixate and oppose drift and disturbance of the eye. Fixations can
be characterized as pauses that last at least 100 ms, but typically in the interval of
200 to 600 ms (Majaranta and Bulling [5]). Depending on the fixation identification technique used, authors report different fixation durations. Rayner [2] experimented on eye movements during reading and suggested fixation lengths of 100 to 400 ms.
The human eye's stabilizing muscles are good at counteracting head and body movements, but they are not stable enough to hold the eyes completely still. Even if we attempt to fixate our gaze on a steady point, small ocular motions will shift our eye position (Martinez-Conde et al. [4]). The gaze instability during fixation can be separated into three main components: high-frequency low-amplitude tremor, microsaccades and slow drifts. The frequency of the tremor can reach up to 150 Hz and its amplitude is less than 0.01 degree, which corresponds to less than the size of one photoreceptor (Leigh and Zee [8]).
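As an illustration of what a fixation identification technique can look like (a minimal sketch of a dispersion-threshold approach, not the specific method used by any of the authors cited above; the function names and thresholds are assumptions), consecutive gaze samples can be grouped into a fixation when they stay within a small spatial window for a minimum duration:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct GazeSample { double t;      // timestamp in seconds
                    double x, y; };  // gaze position, e.g. in degrees or pixels

// Dispersion of a window of samples: (max x - min x) + (max y - min y).
static double dispersion(const std::vector<GazeSample>& s, std::size_t a, std::size_t b) {
    double minX = s[a].x, maxX = s[a].x, minY = s[a].y, maxY = s[a].y;
    for (std::size_t i = a; i <= b; ++i) {
        minX = std::min(minX, s[i].x); maxX = std::max(maxX, s[i].x);
        minY = std::min(minY, s[i].y); maxY = std::max(maxY, s[i].y);
    }
    return (maxX - minX) + (maxY - minY);
}

// Returns [start, end] sample index pairs of detected fixations.
// maxDispersion and minDuration are tuning parameters.
std::vector<std::pair<std::size_t, std::size_t>>
identifyFixations(const std::vector<GazeSample>& s, double maxDispersion, double minDuration) {
    std::vector<std::pair<std::size_t, std::size_t>> fixations;
    std::size_t start = 0;
    while (start < s.size()) {
        // Grow a window from 'start' while it stays compact enough.
        std::size_t end = start;
        while (end + 1 < s.size() && dispersion(s, start, end + 1) <= maxDispersion)
            ++end;
        if (s[end].t - s[start].t >= minDuration) {
            fixations.emplace_back(start, end);
            start = end + 1;   // continue after the fixation
        } else {
            ++start;           // no fixation starting here, slide onwards
        }
    }
    return fixations;
}
```

With a dispersion threshold of roughly 1 degree and a minimum duration of around 100 ms, such a scheme matches the fixation characteristics described above.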
2.2.2 Saccades
Saccades are the rapid motions the eye makes when shifting the gaze direction
between fixation points. Saccades include a range of ocular behaviors, and can
be activated both voluntarily and involuntarily (Leigh and Zee [8]). There are
multiple roles for saccades in vision: they correct gaze errors, foveate targets and
are used during visual search (Martinez-Conde et al. [4]).
Saccades show a consistent relation between speed, size and duration. The larger the saccade angle, the longer the duration it needs. Leigh and Zee [8] argue that even large saccades, with an amplitude of up to 30 degrees, do not last much longer than 100 ms, which is close to the response time of the visual system.
According to Thorpe et al. [10] the time from a visual stimulus
to a reaction in the brain is roughly 150 ms. This implies that no visual feedback
has time to reach the brain during the saccade. The brain must therefore calculate
the predicted motions needed to translate the gaze onto a new target, before the
saccade begins. If the eye fails to hit the desired target a corrective saccade is
usually made with a latency of 100 to 130 ms (Leigh and Zee [8]). Leigh and Zee [8] further mention that saccades can make corrective motions before the initial saccade is complete. These corrective responses are believed to be triggered without
any visual feedback. This is supported by experiments showing that corrective
saccades can be conducted even in complete darkness (Leigh and Zee [8]).
The duration of a saccade can be approximately described as linearly related to its amplitude of movement, if the amplitude is within the range of 1 to 50 degrees (Leigh and Zee [8]). There are, however, several mathematical models suggested by various authors to describe the relation between saccade duration and saccade distance. Some also argue for and against the use of target size as a factor in saccade duration. For more insight and discussion about these models, see Drewes [7] as a pointer to the literature.
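As a rough sketch of such a linear model (the exact constants differ between authors and the values below are only indicative of the order of magnitude, not taken from any specific source cited here), the duration D of a saccade with amplitude A degrees is often written as

D ≈ D0 + k · A

where D0 is a base duration of roughly 20 to 30 ms and k a slope of a few milliseconds per degree. With such values a 30 degree saccade stays close to the 100 ms figure mentioned above.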
2.2.3 Microsaccades
During an attempted fixation the eye still produces low amplitude saccades known
as microsaccades. These saccades can occur at a frequency of 1-2 times per second during fixation. The purpose of microsaccades is still debated, but they are believed to share a common generator with larger saccades, see Martinez-Conde et al. [4].
The amplitude of a microsaccade has previously, before the 1990s, been defined
to be around 12 arc min (0.2 degree). Later studies, however, show microsaccades
frequently exceeding this limit. Instead a microsaccade magnitude distribution
of around 1 degree is suggested, see Martinez-Conde et al. [4].
Martinez-Conde et al. [4] describe the distinction between saccades and microsaccades during free viewing or visual search as “saccades produced during active exploration may be considered ‘regular saccades’, and saccades produced in
the fixation periods between exploratory saccades may be considered 'microsaccades'".
Even though the exact mechanism that triggers microsaccades is unknown (Otero-Millan et al. [11]), Steinman et al. [12] showed that conscious attempts to fixate could decrease the rate of microsaccades. Steinman et al. [13] further confirmed the evidence for saccade suppression, and also stated that voluntary saccades can be the size of microsaccades. The average of these voluntary "microsaccades" was 5.6 arc min (0.093 degree) with a standard deviation of < 3.0 arc min (0.05 degree).
This leads to the conclusion that it is not sensible to treat all small saccades as unwanted gaze disturbances.
Steinman et al. [13] reason that “The frequent occurrence of miniature saccades
during maintained fixation may merely confirm something we already know. Human beings are uncertain and human beings are curious. After all, how can you
be sure that you are really looking exactly at a target, and how can you resist the
temptation to look for something more interesting nearby?”
2.2.4 Gaze accuracy
The eye's high resolution vision (the foveal region) covers 1 degree of the visual angle. The size of this area is commonly described as approximately the size of a thumbnail when the arm is fully stretched. If a viewed object fits within this area, it is sufficient for the eye to have its projection somewhere on the fovea. The object does not necessarily have to be in the center (Drewes [7]). Drewes [7] also argues that there is no need for the eye to position the gaze more accurately, since the target will be perceived clearly as long as it is within the fovea.
Drewes [7] concludes that, due to ocular disturbance and the size of the fovea, the gaze accuracy is limited to around ±0.5 degrees. This also means that eye tracking technology will, at its best, produce the same accuracy as the eyes themselves.
3 Eye tracking techniques
A lot of research has been done in the field of eye tracking. The earliest work dates back to the 18th century (Drewes [7]). An essential part of eye tracking research has been directed towards understanding human visual perception and the physical movement of the eyes.
Eye tracking has a history in medical and psychological research as an important tool to study human visual behavior. The technology has, since its early stages, developed to be increasingly accurate and non-intrusive, which has made it applicable in new fields such as accessibility, to help the handicapped, or market advertisement testing. New affordable eye tracking systems will also bring the possibilities of gaze aware applications to a wider audience. Modern affordable eye tracking systems might provide the motivation to expand to new user domains, even becoming an established way of human computer interaction.
Human cognitive processes are reflected in our gaze behavior, which can provide hints of our thoughts and intentions, see Majaranta and Bulling [5]. Land and Furneaux [14] stated that humans look at things before acting on them. By introducing gaze data to a software application, it could adapt to the user's interest or intention and increase the efficiency of the software.
This chapter will provide a foundation for understanding eye tracking technology and discuss how it could make or break human computer interaction.
3.1 Types of eye tracking systems
There are various sets of techniques to track the motions of the eyes. The most direct method is to place sensors in direct contact with the eye, for example attaching small levers to the eyeball, as in the experiments of Steinman et al. [13]. This method is, however, not recommended because of the high risk of injuries (Drewes [7]). An improved way of applying sensors to the eye is through the use of contact lenses. The lens can hold an integrated coil that allows measurement of the coil's magnetic field, which in turn can be used to track eye movement.
The wire needed to connect the lens to the rest of the gear could, however, be
annoying for the user. The advantage of the method is its high accuracy and
nearly unlimited resolution in time (Drewes [7]).
Another method for eye tracking is electrooculography. This method uses multiple sensors attached to the skin around the eyes. The sensors measure the electric
field of the eye, which is an electric dipole, see Drewes [7]. One advantage of this method is that it allows for eye tracking with the eyes closed or in complete darkness, which makes it usable when studying eye movement during sleep. Both these methods are more suitable for scientific research than for human computer interaction due to their obtrusive nature.
A non-intrusive technique suitable for HCI is video-based eye tracking. Some video-based systems even allow for modest head movement as long as the head stays within the system's "trackbox"1. Video-based eye tracking systems can be
further subdivided into different subcategories, and be remote or head-mounted.
Video-based tracker systems use, as the name reveals, a video camera that, together with image processing, is able to estimate the gaze direction. In remote
systems the camera is typically placed below the computer screen that is to be
tracked. On head-mounted systems the camera is either mounted to the frame
of eyeglasses or to a helmet. The frame rate and the resolution of the tracker
camera will have a significant effect on the accuracy of the tracking (Majaranta
and Bulling [5]). There are also a number of other factors that could have an effect
on the quality of the tracker data such as eyeglasses, lenses, droopy eyelids, light
conditions or even makeup. For more information on what to consider when setting up a video-based eye tracking environment or experiment, see section 4.4 in Holmqvist [15].
Many trackers aim to detect the pupil of the eye, to calculate the gaze direction.
There are two illumination methods to detect the pupil, referred to
as the dark and the bright pupil method. The bright pupil method uses infrared
light directed towards the eye to create a reflection on the retina. The reflection
can be detected by the camera but is not visible to the human eye. The effect is
similar to that of “red eye” when taking a photograph with flash activated. To use
this method infrared lamps need to be mounted close to the eye tracking camera.
1 Trackbox is the volume space in front of the eye tracker camera where the system is able to detect
the eyes.
The dark pupil method uses image processing to locate the dark pupil of the eye. This can be problematic if the hues of the pupil and iris are close.
Tracker systems that are based on visible light and pupil center tracking tend to
have accuracy issues and be sensitive to head movement (Majaranta and Bulling
[5]). To address this issue a reflective glint on the cornea of the eye is used. The
reflection is caused by infrared (IR) light aimed on- or off-axis at the eye. An on-axis light will result in a bright pupil effect and an off-axis light will result in a dark
pupil. By measuring the corneal reflection(s) from the IR light relative to the
pupil center, the system is able to compensate for the inaccuracies and also allow
for modest head movement (Majaranta and Bulling [5]). Since the cornea has a
spherical shape, the reflective glint stays in the same position for any direction
of the gaze. The gaze direction can therefore be calculated by measuring the
changing relationship between the moving pupil center of the eye and the corneal
reflection. As the physical size and features of the eye differ from person to
person a calibration procedure is needed. The calibration adjusts the system to
suit the current user.
3.2 Calibration of remote video-based eye tracker
systems
For a remote video-based eye tracking system to accurately map a user’s gaze
onto the tracked screen the system needs to be calibrated for each user. This is
normally done by presenting a grid of calibration points on the screen. In general, more calibration points generate better accuracy. In research, nine calibration points in a 3x3 grid are commonly used, but software used in a non-experimental setup can cope with fewer and still have good enough accuracy. The accuracy is also favored if the calibration conditions are the same as the application conditions. During the calibration procedure the user consecutively fixates on the calibration points one after another. The relationship between the pupil center position and the corneal reflection during a fixation changes as a function of the eye gaze direction, see Majaranta and Bulling [5]. The images of the eye orientation when looking at each of the calibration points are used to analyse the correspondence to screen coordinates. The system then knows the eye's gaze direction when it is looking at certain points on the screen. By interpolating between these key points, the system is thereafter able to estimate any gaze point on the screen.
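As an illustration of how such an interpolation can be implemented (a common polynomial mapping from the eye tracking literature, given here as a hedged sketch and not as the actual model used inside the TX300; the coefficient arrays are assumed to come from a least-squares fit over the calibration data), the screen position can be expressed as a low-order polynomial of the pupil-to-glint vector:

```cpp
// Hypothetical sketch of a polynomial gaze mapping. The coefficients ax[6]
// and ay[6] are assumed to come from a least-squares fit over the pupil-to-
// glint vectors recorded while the user fixated the calibration points.
struct ScreenPoint { double x, y; };

ScreenPoint mapToScreen(double vx, double vy,           // pupil-to-glint vector
                        const double ax[6], const double ay[6]) {
    // Second-order polynomial basis: 1, vx, vy, vx*vy, vx^2, vy^2.
    const double basis[6] = {1.0, vx, vy, vx * vy, vx * vx, vy * vy};
    ScreenPoint p{0.0, 0.0};
    for (int i = 0; i < 6; ++i) {
        p.x += ax[i] * basis[i];
        p.y += ay[i] * basis[i];
    }
    return p;  // normalized or pixel screen coordinates, depending on the fit
}
```

The nine calibration points of a 3x3 grid give more than enough observations to determine the six coefficients per axis.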
Calibration is essential for the accuracy of an eye tracker system. Its properties
can be summarized by the number of points, placement of calibration points and
conditions during calibration. Depending on the purpose of the application, the calibration can be designed to suit it. If an eye tracker is meant to be used by the same person all day, it would make sense to conduct a more thorough calibration, with the benefit of better accuracy. An eye tracker that is supposed to be used by multiple users every hour would instead be favored by having as simple and short a calibration procedure as possible, to enable fast and easy access to the application.
There exist eye tracker systems that by different techniques try to avoid the need for calibration. Drewes [7] reasoned that these systems will still have to be calibrated in some sense, due to the uniqueness of the eyes. As mentioned in section 2.1, the fovea is not located directly opposite the pupil but instead a few degrees higher. This individual difference is difficult to measure without a calibration; hence a fully calibration-free system is unlikely to be developed. The calibration procedure could, however, be reduced and hidden from the user.
3.3 Accuracy and precision
Accuracy and precision are two important metrics to describe an eye tracker’s
performance. Figure 3.1 describes the difference between precision and accuracy.
Accuracy refers to how closely the eye tracker is able to track the exact gaze position. Precision, on the other hand, indicates the gaze point consistency during a completely steady fixation. In the best case the gaze point samples should be exactly in the center of the target and have a minimal spread. Since the human eyes are never completely still and not necessarily fixating at the exact position the mind is trying to explore, synthetic eyes are used when calculating the performance of an eye tracker.
Figure 3.1: An illustration of the difference between accuracy and precision in an eye tracker context.
The accuracy is calculated as the root mean square distance between the gaze data samples and the target stimulus. The distance is thereafter transformed from a distance in length to a gaze angle with the help of the known distance between the eye and the screen. Precision is calculated as the standard deviation of the data samples or as the root mean square of inter-sample distances (Holmqvist [15]). The precision distance is also transformed to a gaze angle. For more information on how to calculate eye tracker performance see Tob [1].
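A minimal sketch of these two measures, assuming gaze samples expressed in millimetres on the screen (the function names are illustrative and not taken from Holmqvist [15] or Tob [1]); the angle conversion uses the known eye-to-screen distance as described above:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };   // gaze samples in millimetres on the screen

// Accuracy: RMS distance between the gaze samples and the target stimulus.
double accuracyRms(const std::vector<Point>& samples, Point target) {
    double sum = 0.0;
    for (const Point& s : samples)
        sum += (s.x - target.x) * (s.x - target.x)
             + (s.y - target.y) * (s.y - target.y);
    return std::sqrt(sum / samples.size());
}

// Precision: RMS of the distances between successive samples.
// Assumes at least two samples.
double precisionRms(const std::vector<Point>& samples) {
    double sum = 0.0;
    for (std::size_t i = 1; i < samples.size(); ++i)
        sum += (samples[i].x - samples[i - 1].x) * (samples[i].x - samples[i - 1].x)
             + (samples[i].y - samples[i - 1].y) * (samples[i].y - samples[i - 1].y);
    return std::sqrt(sum / (samples.size() - 1));
}

// Convert a distance on the screen (mm) to a gaze angle (degrees), given the
// eye-to-screen distance in mm.
double toGazeAngleDeg(double lengthMm, double eyeDistanceMm) {
    const double kPi = 3.14159265358979323846;
    return std::atan(lengthMm / eyeDistanceMm) * 180.0 / kPi;
}
```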
Many commercial eye trackers state an accuracy of about 0.5 degree, for example
the eye tracker used in this thesis, Tobii TX300, has a majority of participants
with an accuracy distribution of 0.4 degree during ideal conditions. Drewes [7]
argues that it is difficult to achieve higher accuracy due to the uniqueness of the
eye. As discussed in section 2.2.4, the eye does not need to be more accurate than
to have the target somewhere within the fovea. Therefore eye trackers are also
constrained to the accuracy of the eyes themselves.
It should also be stated that small, often unintended, ocular motions such as microsaccades, tremor and drift will have an effect on the accuracy of the eye tracking system. Disturbances during calibration will result in poorer accuracy. Microsaccades can have a noticeable amplitude of up to 1 degree and occur frequently, but are not necessarily produced deliberately by the viewer. The user of an eye
tracker system can therefore experience the system as noisy even though the noise
is produced by the eyes themselves.
3.4 Eye tracking as human computer interaction
With today's use of the computer mouse to interact with computers, many users are able to pick a target to the closest pixel, even though it might require some concentration. With a quality eye tracker and during optimal conditions it is still going to be difficult to come closer than 0.5 degrees of the target, which at a distance of 650 mm is 5.7 mm away. For this reason it is not suitable to replace the mouse with an eye tracker without major changes to the user interface.
The advantage of using an eye tracker compared to a mouse is that the eyes often gaze at an object prior to making an action on it, for example finding a button before pushing it. Eye tracking could give the user the ability to act on the target without having to find the mouse cursor and drag it to the target, which could enhance the efficiency of the application and reduce the need for hand motions.
Using gaze as an input method could also introduce problems. Since the eyes are used for both perception and control, unwanted actions could be triggered. It is difficult to distinguish automatic eye behavior from knowingly altered eye behavior, which means that using only gaze, or gaze plus eye twitching, to control software could prove unsuccessful.
To quote Jacob [16] “The most naive approach to using eye position as an input
might be to use it as a direct substitute for a mouse: changes in the user’s line of
gaze would cause the mouse cursor to move. This is an unworkable (and annoying) approach, because people are not accustomed to operating devices just by
moving their eyes. They expect to be able to look at an item without having the
look “mean” something. Normal visual perception requires that the eyes move
about, scanning the scene before them. It is not desirable for each such move to
initiate a computer command.”
The term "Midas Touch" is commonly used in the HCI community to describe this problem, referring to Greek mythology, where King Midas was blessed and doomed to turn everything he touched into gold. To avoid this problem a second source of interaction is often used, such as the push of a button. This way the user is able to study an object without creating actions. Another way is to use a timer function for the fixation: if the fixation stays long enough an action is executed. This method is called dwell time. The dwell time can be longer or shorter depending on the experience of the user. For example, an eye typing application can use dwell time to distinguish key search from key push. Majaranta and Bulling [5] argue that, when using gaze initiated selection, it is important to provide corresponding feedback to the user. By pressing a button the user makes a selection and physically executes it. When using dwell time the user only initiates the action and the system executes it after the time has elapsed. Appropriate feedback to the user is necessary to provide an understanding of the system: for example, feedback as to whether the system has recognized the correct target, is about to perform an action, and when the action will be executed. This could be done by highlighting the target and providing a progress bar of the dwell time.
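As a hedged sketch of such a dwell-time mechanism (the class name and threshold values are illustrative assumptions, not taken from any of the systems discussed above), the application can accumulate the time the gaze stays inside a target, expose the progress for feedback, and fire the action exactly once when the threshold is passed:

```cpp
// Minimal dwell-time trigger sketch. 'dwellThreshold' is the time in seconds
// the gaze must stay on the target before the action fires; typical values are
// a few hundred milliseconds, tuned to the experience of the user.
struct Rect {
    double x, y, w, h;
    bool contains(double px, double py) const {
        return px >= x && px <= x + w && py >= y && py <= y + h;
    }
};

class DwellButton {
public:
    DwellButton(Rect area, double dwellThreshold)
        : area_(area), threshold_(dwellThreshold) {}

    // Call once per gaze sample; dt is the time since the previous sample.
    // Returns true exactly once when the dwell time has been reached.
    bool update(double gazeX, double gazeY, double dt) {
        if (!area_.contains(gazeX, gazeY)) { elapsed_ = 0.0; fired_ = false; return false; }
        elapsed_ += dt;
        if (!fired_ && elapsed_ >= threshold_) { fired_ = true; return true; }
        return false;
    }

    // Fraction of the dwell time completed, e.g. for drawing a progress bar.
    double progress() const { return elapsed_ < threshold_ ? elapsed_ / threshold_ : 1.0; }

private:
    Rect area_;
    double threshold_;
    double elapsed_ = 0.0;
    bool fired_ = false;
};
```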
To summarize, eye tracking has good potential to be used in HCI. A gaze aware application should, however, be adapted to the use of an eye tracker. Picking tasks with an eye tracker will be less accurate than with a computer mouse, which makes it necessary to design the user interface to be able to handle such conditions. The user should also be able to control, or at least be aware of, when the gaze is used to initiate actions.
3.5 Tobii TX300 eye tracker
During the development and evaluation of the gaze aware image viewing application GazRaz, the Tobii TX300 eye tracker system was used together with an
HP PC. The TX300 is a video-based eye tracker. It uses the relative positions
of the corneal reflections caused by its infrared lamps to estimate the gaze angle, as discussed in section 3.1. The TX300 uses the dark pupil method, which is also explained in section 3.1. The system needs to be recalibrated for each new user.
The TX300 is a high quality eye tracker system with the ability to collect gaze
data at a rate of up to 300 Hz for each eye individually. Its accuracy is claimed to be 0.4◦ for binocular data during ideal conditions. Binocular data is the average of the two eyes. Ideal conditions for the system are when the user's head stays in the middle of the track box, at a distance of 65 cm from the eye tracker camera, and with an illumination of 300 lux in the room. Precision is stated to be 0.07◦ for binocular data without any filter, at a distance of 65 cm. Precision is calculated as the root mean square of successive samples. Table 3.1 states some of the relevant system specifications. For further information on the TX300 specifications see TX3 [17]. For information on how the specifications are calculated see Tob [1].

Table 3.1: Relevant specifications for the TX300 eye tracker
Type                       Values
Accuracy                   0.4◦ at ideal conditions, binocular
Accuracy, large angle      0.5◦ at 30◦ gaze angle, binocular
Precision                  0.07◦ without filter, binocular
Sample rate                300, 250, 120, 60 Hz
Sample rate variability    0.3%
Total system latency       <10 ms
Head movement              37 x 17 cm freedom of head movement at a distance of 65 cm
Operation distance         50-80 cm
Max gaze angle             35◦
Figure 3.2: TX300 eye tracker, viewed from the front. The purple circles
under the screen are light from the IR lamps captured by the camera.
The TX300 has an integrated computer screen that is used to map the gaze data
upon. The screen can be dismounted, and the eye tracker part can be used separately, or with other screens, as long as the system is configured correctly and the screen fits within the trackable area. However, during this project only the accompanying screen was used. Figure 3.2 shows the TX300 eye tracker as seen by the user. With the maximal sampling frequency of 300 Hz the time between samples will be 1/300 s ≈ 3.3 ms. With the total system latency added, the "youngest" gaze sample will be <13.3 ms old.
Table 3.2: Relevant specifications for the TX300 TFT monitor
Type                     Values
Screen size              23 inch (509 mm x 286 mm)
Resolution               1920 x 1080 pixels
Pixels per millimeter    3.775 ppmm
Response time            5 ms typically
From the specifications in tables 3.1 and 3.2, the accuracy in gaze angle of the eye tracker can be translated to a distance in millimetres and in pixels. With simple trigonometry an accuracy of ±0.5◦ translates to a distance of ±5.67 mm or ±21 px at a viewing distance of 650 mm, which is a noticeable error if a task demands perfect accuracy. In equation 3.1, l is the length on the screen, d the distance to the screen and α the gaze angle.

l = d · tan(α)    (3.1)
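A small worked sketch of this conversion, using equation 3.1 together with the values from tables 3.1 and 3.2 (0.5 degrees of error, a 650 mm viewing distance and 3.775 pixels per millimetre):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double kPi = 3.14159265358979323846;
    const double gazeAngleDeg  = 0.5;    // accuracy of the tracker, degrees
    const double eyeDistanceMm = 650.0;  // distance from eye to screen, mm
    const double pixelsPerMm   = 3.775;  // from the TX300 monitor specification

    // Equation 3.1: l = d * tan(alpha)
    const double errorMm = eyeDistanceMm * std::tan(gazeAngleDeg * kPi / 180.0);
    const double errorPx = errorMm * pixelsPerMm;

    std::printf("%.2f mm, %.1f px\n", errorMm, errorPx);  // about 5.67 mm, 21 px
    return 0;
}
```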
Figures 3.3 and 3.4 illustrate the TX300 eye tracker system. The eye tracker camera is located in the center, right below the screen. If the user is looking straight into the camera, the gaze angle is 0◦, but looking at the top of the screen above the camera the gaze angle would be 26.4◦, at a distance of 650 mm from the camera.
The top left and right corners would render a gaze angle of 32.4◦ , which is within
the max gaze angle limit of 35◦ . The whole screen will be within the trackable
area and can be used for the application.
3.5.1 Tobii Gaze SDK
Gaze SDK provides a low-level API for accessing eye tracker data from the TX300,
and other Tobii eye trackers. Being low-level means it gives the developer full control of how to mount the eye tracker and how to process and use the data. It also
means that there is no finished functionality for calibration and data processing.
The rest of this section describes the inner workings of the Gaze SDK. For further
information on the Tobii Gaze SDK, see Tob [18].
GazRaz uses the Gaze SDK to:
• Initialize and connect to an eye tracker.
• Subscribe to gaze data.
• Get information about the eye tracker.
Figure 3.3: Illustration of the front of the TX300 eye tracker. Notice that the camera represents the origin.
Figure 3.4: TX300 eye tracker viewed from the side. Notice that the eye tracker and the z axis are angled with respect to the head. Head portrait designed by Freepik.com.
Communication between the application GazRaz and the eye tracker is performed asynchronously. The TobiiGazeCore library uses a dedicated thread for processing data and messages received from the eye tracker. When a connection to the eye tracker is established the thread waits for gaze data and invokes a callback whenever data is received. This way the gaze data can be gathered and processed without interference from the rest of the application. Lengthy operations should, however, be avoided in the callback function, since they could block the thread and result in loss of data.
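As a sketch of this callback pattern (the type and function names below are hypothetical placeholders and do not reproduce the actual Gaze SDK signatures; see Tob [18] for the real API), the callback only copies the latest packet into shared state, leaving all heavier processing to the rest of the application:

```cpp
#include <atomic>
#include <mutex>

// Hypothetical gaze packet type standing in for the data described below.
struct GazePacket { double timestamp; double leftX, leftY, rightX, rightY; };

// Shared state written by the tracker thread and read by the render loop.
struct SharedGaze {
    std::mutex mutex;
    GazePacket latest{};
    std::atomic<bool> hasData{false};
};

// Callback invoked on the SDK's dedicated thread whenever a packet arrives.
// It must stay short: it only copies the packet, no lengthy processing here.
void onGazeData(const GazePacket& packet, void* userData) {
    auto* shared = static_cast<SharedGaze*>(userData);
    {
        std::lock_guard<std::mutex> lock(shared->mutex);
        shared->latest = packet;
    }
    shared->hasData = true;
}

// Registration with the (hypothetical) SDK would then look something like:
//   register_gaze_callback(tracker, &onGazeData, &sharedGaze);
// after which the render loop polls sharedGaze at its own frame rate.
```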
Gaze SDK coordinate system
Figure 3.5: Illustration of the three coordinate systems used by the Gaze SDK.
The Gaze SDK uses three different coordinate systems as illustrated in figure 3.5.
The Active Display Coordinate System (ADCS) is possibly the most interesting, since it holds the gaze data for both eyes, each mapped onto a 2D coordinate system. The ADCS has its origin in the top left corner of the active display, e.g. the integrated screen. The coordinate values are (0, 0) at the origin and (1, 1) at the bottom right corner. Since the trackable area stretches a bit outside the active display it is possible to receive coordinates outside the interval of 0 to 1.
The User Coordinate System (UCS) describes each eye’s position in 3D space coordinates. The origin is in the center of the eye tracker front panel. The coordinates
are in millimeters. The x-axis points horizontally toward the user's right. The
y-axis points straight up. The z-axis points towards the user with the same angle
as the eye tracker itself.
The Track Box Coordinate System (TBCS) describes each eye's position in a normalized track box. The track box is the volume in front of the eye tracker in which
the tracker is able to detect the eyes and collect gaze data. The TBCS has its origin
in the top right corner.
Gaze SDK gaze data
Figure 3.6: Illustration of two eyes looking at a point on the active display.
When a connection with the eye tracker is established, data packets will be streamed
to the application. Each of these gaze data packets will contain:
Timestamp. This value represents the time at which the information used to produce the gaze data package was sampled by the eye tracker.
Eye position from the eye tracker. The 3D space position of each eye in the UCS coordinate system.
Eye position in the track box. The normalized 3D space position of each eye within the track box, described in the TBCS.
Gaze point from the eye tracker. The gaze point on the calibration plane for each eye. The calibration plane is the virtual plane used for calibration. The gaze point refers to the position where the user's gaze would intersect the calibration plane.
Gaze point on display. The gaze point for each eye, individually mapped to the active display. The position is described in ADCS. Figure 3.6 illustrates the two eyes fixating on a point on the screen.
The eye tracker also provides a status code to help determine which eye is being
tracked if only one is found.
4 Gaze aware image viewing application, GazRaz
This chapter will discuss how the gaze aware image viewing application, GazRaz,
was constructed.
4.1 The idea
The idea for the application, originally stated by Matthew Cooper, was to use gaze control to help explore high-resolution images. High-resolution images are in this case defined as images with a larger resolution than the screen they are displayed on. A common method to explore high-resolution images is to zoom into the image and navigate around in it with the mouse or keyboard. This method is referred to as pan + zoom navigation. The problem with this method is that the user loses the overview of the image when zooming in. To keep the overview of the image, the GazRaz application uses a lens to zoom into the image at the cursor/gaze position on the screen, which creates what is known as a detail + overview interface (Baudisch et al. [19]). Since the GazRaz application uses one screen, and not two as Baudisch et al. [19] did, the lens (detailed view) is going to overlap the background (overview) to some extent, depending on the amount of zoom and the size of the lens, therefore also hiding some of the information of the overview image under the lens. Figure 4.1 illustrates how a plane zoom lens magnifies an area of the image, but also hides some of the overview under the lens. To try to solve or decrease the effect of this issue, a fish-eye lens was also developed. With a fish-eye lens the information at the border is distorted but could possibly provide enough information for the user to understand what lies behind or just outside the lens.
The GazRaz application was developed in an experimental manner to explore whether gaze control is suitable for image scanning in a detail + overview interface.
Figure 4.1: Illustration of a plane zoom lens for detail and overview. The blue area is hidden behind the lens and cannot be seen. The view plane is the background image.
4.2 Hardware setup
The Tobii TX300 eye tracker, discussed in section 3.5, was used during the development and evaluation of GazRaz. The computer used together with the eye tracker system is an HP laptop with a standard PC architecture. Table 4.1 holds the specifications for the HP computer.
Table 4.1: Specifications for the HP Zbook17

Component          Name
Processor (CPU)    Intel Core i7-4700MQ 2.4 GHz
Memory             8 GB RAM
Graphics (GPU)     Nvidia Quadro K4100M
Operating system   Windows 7, 64-bit
4.3 Software setup
This section will discuss the software, libraries and SDKs used. GazRaz was developed in the C++1 language. Modern OpenGL2 was used for programming against the graphics drivers, with GLEW3 as the extension manager for OpenGL. For OpenGL context handling and input/output management GLFW4 was used. To communicate with the eye tracker the Tobii Gaze SDK was used.
Furthermore, OpenGL Mathematics (glm)5 was used for vector and matrix calculations, SOIL26 was used for loading images into OpenGL textures, and FreeType7 was used for text output to the user interface.
Outside of GazRaz, Tobii Studio was used to calibrate the eye tracker, and Tobii Eye Tracker Browser was used to configure it.
1 http://www.cplusplus.com/
2 https://www.opengl.org/
3 http://glew.sourceforge.net/
4 http://www.glfw.org/
5 http://glm.g-truc.net/0.9.6/index.html
6 https://bitbucket.org/SpartanJ/soil2
7 http://www.freetype.org/
4.4 Gaze data
The gaze data used by the GazRaz application consists of the gaze position for each eye on the active display, and the 3D space coordinates for each eye described in UCS. Listing 4.1 shows the struct for the gaze position on the active display. Listing 4.2 is the data object that holds the eye positions.
Listing 4.1: Gaze position in the active display
/**
 * Gaze point position on the screen.
 * Normalized to values between -1 and 1.
 * NOTE: If the user looks outside the active
 * display area, or due to precision errors, the
 * x, y, z coordinates may be outside the interval of -1 to 1.
 */
struct GazePoint {
    float x;
    float y;
    float z;
};
Listing 4.2: Eye position in 3D space
/**
 * Eye distance in mm from the eye tracker.
 * The coordinate system has its origin in the
 * center of the eye tracker's front panel.
 */
struct EyeDistanceMm {
    double x;
    double y;
    double z;
};
The original gaze point values provided by the Gaze SDK are in the active display coordinate system, see section 3.5.1. For the gaze data to be suitable for OpenGL it is transformed to a 2D Cartesian coordinate system normalized to values between -1 and 1. The origin is then in the middle of the screen, the top right corner has the coordinates (1, 1) and the bottom left corner has the coordinates (-1, -1). The gaze coordinates are cast to floats so they can be passed directly to graphics buffers.
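A minimal sketch of this mapping (an assumed implementation, not the exact GazRaz code):

struct GazePoint { float x; float y; float z; };   // as in listing 4.1

// Maps a gaze point from ADCS (origin top left, (1,1) bottom right) to the
// normalized 2D coordinate system used for rendering (origin in the centre,
// (1,1) top right, (-1,-1) bottom left).
GazePoint adcsToNormalized(double adcsX, double adcsY) {
    GazePoint p;
    p.x = static_cast<float>(2.0 * adcsX - 1.0);    // 0..1  ->  -1..1 (left to right)
    p.y = static_cast<float>(1.0 - 2.0 * adcsY);    // 0..1  ->   1..-1 (flip so +y is up)
    p.z = 1.0f;                                     // used as a validity flag in GazRaz
    return p;
}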
To store the gaze data a C++ std::vector container is used. To avoid flooding the memory with gaze data, the vector is kept at a fixed, reasonable size: the oldest gaze data is deleted as new gaze data is pushed to the queue. Every other element in the vector is the gaze point for either the left or the right eye. A vector size of 300 will hold 150 GazePoint pairs, which is 0.5 seconds of gaze data when the TX300 eye tracker runs at a 300 Hz sample rate.
No z values are provided by the Gaze SDK for gaze points. The z dimension
in GazePoint was added to provide a possibility to extend the gaze point into
3D space in the future. However, during development z was used to represent whether a GazePoint was valid or not. To maintain the knowledge of how old a GazePoint is, even invalid data (no eye tracked) is pushed to the GazePoint vector. This means that short disruptions, for example a blink of an eye, will result in a series of invalid gaze samples.
In the same way as gaze points, eye position also uses a vector to store data. The
EyeDistanceMm is described in the same way as the user coordinate system in
section 3.5.1.
4.5 Gaze data filtering
The data provided by the eye tracker, even during ideal conditions, will contain
a certain amount of noise. The origin of the noise can be inaccurate calibration, inherent system noise or noise produced by the eyes themselves. Positioning the head close to the track box boundaries significantly increases the noise. To
reduce the effect of noise and outliers8 in the data, common techniques involve
calculating the mean values of multiple data samples and processing the data
through filters. Since the gaze position is meant to control the lens position in
real-time9 , it is essential that the gaze data is processed carefully. If noise influences the lens position, it could severely reduce the user experience and render
the gaze control feature useless.
On the other side of the noise issue is latency. Taking a large number of gaze
samples to calculate the mean value will cause the mean gaze data to be older
than the latest gaze data pushed by the eye tracker. Gaze data processing is a
8 Outliers refers to data samples that differ greatly from their neighbouring data samples.
9 Real-time is in this context defined as fast enough for the user to perceive the response of the
application as instant.
compromise between noise reduction and latency. The rest of this section will
discuss how GazRaz processes the gaze data.
4.5.1 Mean
The simplest filtering of gaze data is to substitute the received gaze point with the mean of the last N points. This way noise is reduced and precision is increased at the cost of latency. The eye tracker provides gaze point data for the left and the right eye independently. The gaze point distribution between the right and the left eye changes depending on the gaze angle, but the two stay close to each other. The eyes might be looking at two slightly different locations, whilst the brain produces one image of the scene. Calculating the mean gaze point of the right and left eye produces a gaze point with better accuracy than for one eye separately. The specification for the TX300 eye tracker states an accuracy of 0.5◦ monocular and 0.4◦ binocular. Equation 4.1 is used to calculate the mean gaze point GazePoint_mean from n samples. Note that the number of samples should be selected such that an equal number of samples from the left and right eye are used. Invalid gaze point data should be ignored.
GazePoint_mean = ( Σ_{i=1}^{n} GazePoint_i ) / n    (4.1)
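A C++ sketch of equation 4.1 follows, assuming (as in section 4.4) that the z component of a GazePoint is used to flag validity, with z > 0 meaning valid; the flag convention is an assumption made for this sketch.

#include <vector>

struct GazePoint { float x; float y; float z; };   // as in listing 4.1; z > 0 assumed to mean valid

// Mean of the gaze point samples, skipping invalid ones (e.g. blinks).
// The caller should pass an equal number of left- and right-eye samples.
GazePoint meanGazePoint(const std::vector<GazePoint>& samples) {
    GazePoint mean = {0.0f, 0.0f, 0.0f};
    int validCount = 0;
    for (const GazePoint& p : samples) {
        if (p.z <= 0.0f) continue;                 // invalid sample, ignore
        mean.x += p.x;
        mean.y += p.y;
        ++validCount;
    }
    if (validCount > 0) {
        mean.x /= validCount;
        mean.y /= validCount;
        mean.z = 1.0f;                             // mark the result as valid
    }
    return mean;
}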
An eye tracker with a sample rate of 300 Hz produces 600 gaze points each second (one for each eye). At this speed one gaze point pair is produced every 3.3 ms. If 10 gaze point samples, i.e. 5 gaze point pairs, are used to calculate a mean gaze point, the oldest gaze point affecting the value is 16.67 ms old and the mean age of the samples is 10 ms. This is less than the update period of the screen, which is 16.67 ms. However, due to thread safety, the very latest gaze data was not used in the mean calculation. To be able to use the latest gaze data in the gaze point calculation the software would need to be constructed in such a way that the eye tracker thread issues the draw calls, which would effectively make it the user interface thread. This would have many disadvantages: a blocking of gaze data would mean blocking of the user interface, and latency would still be an issue due to the time needed for the draw calls. So the mean age of the mean gaze point should be estimated as in equation 4.2, where n represents the number of sample pairs and the total system latency is specified as <10 ms. This gives a mean age of <23.3 ms for 5 gaze sample pairs.
( Σ_{i=2}^{n+1} i / sampleRate ) / n + totalSystemLatency    (4.2)
4.5.2 Static threshold, weighted mean filter
With only the mean value as the filtering method the gaze point appeared unstable. Further increasing the number of samples used to calculate the mean gaze point helped, but also increased the latency. Instead a high-pass filter was constructed for the gaze data. The purpose of the filter is to allow the gaze point to move rapidly when the eyes produce a saccade, and conversely to hold the gaze point steady when the eyes are fixating. Algorithm 1 shows the simplified pseudocode for the filter.
Distance in algorithm 1 represents the distance between the "new" gaze point and the "old" gaze point. If the distance is large enough the new gaze point will remain unchanged. If the distance is within the different thresholds the old gaze point is weighted into the new gaze point, creating a weighted average of the two. If the distance is smaller than the input parameter noiseThreshold the new gaze point is discarded and the old gaze point is kept.
Algorithm 1 Static Filter
1:  procedure staticFilter
2:      distance ← absoluteValueOf(newGazePoint - oldGazePoint);
3:      if distance < noiseThreshold then
4:          oldGazePoint is returned.
5:      else if distance < noiseThreshold * thresholdAmplifierFirst then
6:          A weighted mean average gaze point between
7:          the oldGazePoint and the newGazePoint is returned.
8:          gazePoint ← getWeightedMeanAverageGazePoint(weightFirst);
9:      else if distance < noiseThreshold * thresholdAmplifierSecond then
10:         A weighted mean average gaze point between
11:         the oldGazePoint and the newGazePoint is returned.
12:         gazePoint ← getWeightedMeanAverageGazePoint(weightSecond);
13:     else
14:         newGazePoint is returned.
15:     end
16: end
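As a concrete illustration, one possible C++ rendering of algorithm 1 is sketched below. The weight values are illustrative assumptions, and the squared distance is compared against the thresholds so that the base value of 0.00015 discussed next corresponds to the stated 2.5 mm radius; the actual GazRaz implementation may differ in these details.

struct GazePoint { float x; float y; float z; };   // as in listing 4.1

static GazePoint weightedMean(const GazePoint& oldP, const GazePoint& newP, float w) {
    // w is the weight given to the old gaze point (0..1).
    GazePoint p = { w * oldP.x + (1.0f - w) * newP.x,
                    w * oldP.y + (1.0f - w) * newP.y,
                    newP.z };
    return p;
}

GazePoint staticFilter(const GazePoint& oldP, const GazePoint& newP,
                       float noiseThreshold      = 0.00015f,
                       float thresholdAmplifier1 = 200.0f,
                       float thresholdAmplifier2 = 600.0f,
                       float weightFirst         = 0.9f,    // assumed weight
                       float weightSecond        = 0.5f) {  // assumed weight
    float dx = newP.x - oldP.x;
    float dy = newP.y - oldP.y;
    float distance = dx * dx + dy * dy;            // squared distance (an assumption, see text)

    if (distance < noiseThreshold)
        return oldP;                               // fixation: keep the old gaze point
    if (distance < noiseThreshold * thresholdAmplifier1)
        return weightedMean(oldP, newP, weightFirst);   // small saccade: lean on the old point
    if (distance < noiseThreshold * thresholdAmplifier2)
        return weightedMean(oldP, newP, weightSecond);  // medium saccade: move faster
    return newP;                                   // large saccade: jump directly
}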
The noiseThreshold in algorithm 1 is described as a distance in two-dimensional Cartesian coordinates with axis limits of -1 to 1. The threshold limits and the weighting were developed by trial and error to what seems a fair compromise between "speed" and stability. The base value for the noise threshold was set to 0.00015, which, with equation 4.3, translates to a distance radius r_threshold of 2.5 mm on the TX300 monitor. s_width and s_height are the screen width and height in mm, and x_length and y_length are the x/y axis lengths in Cartesian coordinates. The threshold radius can be further translated to a gaze angle of 0.22◦ at a distance of 650 mm with equation 3.1. This means that the lens will stay steady during fixation if the gaze point data achieves a precision of less than 0.22◦. Note that the TX300 system specification stated a precision of 0.07◦.
Microsaccades, discussed in section 2.2.3, can, however, have a larger amplitude than 0.22◦ and cause the lens to move. This can be perceived as if the lens is making unintended movements, since microsaccades are not necessarily intended by the viewer. Increasing the base noise threshold will reduce this movement but at the same time also suppresses intended small saccades.
Lines 5 and 9 in algorithm 1 compare the distance to an amplified noise threshold.
The thresholdAmplifierFirst was set to 200 which translates to a threshold radius
of 3.15◦ at a distance of 650 mm. The thresholdAmplifierSecond was set to 600,
which translates to a threshold radius of 5.44◦ at a distance of 650 mm. This
means that if a saccade is within 5.44◦ the lens will not directly jump to the new
gaze point, but rather gradually move towards it, depending on the weights.
r_threshold = sqrt(s_width^2 + s_height^2) / sqrt(x_length^2 + y_length^2)    (4.3)
With this filter the gaze point could be kept steady during fixation and move
rapidly during "large" saccades. Small saccades inside the threshold limits would
cause the gaze point to smoothly move towards the new gaze position.
Figure 4.2 shows the raw and filtered gaze points during smooth pursuit eye
movement. Smooth pursuit is only used to show the latency between the latest gaze point in the raw data and the filtered mean gaze point, since smooth pursuit eye movement will be uncommon or non-existent in an image viewing application due to the lack of moving targets to follow. Instead, alternation between saccades and fixations will be the primary eye movement in the GazRaz application.
Figure 4.4 shows the raw and filtered gaze points during a saccade. Notice that
the statically filtered gaze points make a "jump" from the leftmost circle to three fourths of the way to the target. This is because the new gaze point distance falls outside the thresholds. After the jump the filtered gaze point smoothly
moves toward the target.
4.5.3 Exponential threshold, weighted mean filter
An exponentially weighted mean average filter was developed to explore whether it could produce a filter that allowed rapid gaze point repositioning for both large and small saccades. Algorithm 2 shows the simplified pseudocode for the filter. In the same way as for the static filter, the parameter distance represents the distance between the "old" and the "new" gaze point. The weight between the "old" and the "new" gaze point is calculated as an exponential function of the distance between the two.
The thresholdAmplifier was set to 1000 which translates to a threshold radius of
7.01◦ at a distance of 650 mm. If a saccade is close to the border of 7.01◦ the lens
Figure 4.2: Gaze points during smooth pursuit with the static filter. Green points are the raw gaze points for the right eye and they fade towards yellow the older they get. Red points are the raw gaze points for the left eye and they fade towards purple with age. The white line consists of the filtered gaze points. The red, green and blue circles represent the distance thresholds. The mean gaze point calculation was done with 14 samples, i.e. 7 gaze point pairs. The eyes move from the left side towards the right side in the image. Notice the distance between the leftmost raw gaze points and the leftmost filtered gaze points.
Figure 4.3: Gaze points during smooth pursuit with the exponential filter. Green points are the raw gaze points for the right eye and fade to yellow the older they get. Red points are the raw gaze points for the left eye and fade to purple with age. The white line consists of the filtered gaze points. The red and cyan coloured circles represent the distance thresholds. The mean calculation was done with 14 samples, i.e. 7 gaze point pairs. The eyes move from the top left corner towards the bottom of the image. Notice the distance between the lowest raw gaze points and the lowest filtered gaze points.
Figure 4.4: Gaze points during a saccade with the static filter (upper) and the exponential filter (lower). Green points are the raw gaze points for the right eye and fade to yellow the older they get. Red points are the raw gaze points for the left eye and fade to purple with age. White points are the filtered gaze points. The mean calculation was done with 12 samples, i.e. 6 gaze point pairs. The eyes move from the leftmost circle to the rightmost circle.
will jump to the new gaze point, while with shorter saccades the gaze point will be weighted more towards the old gaze point, moving it gradually towards the new gaze position.
This filter produced rapid movement of the gaze point for both large and small
saccades. Since the filter was so rapid it was given a larger boundary than the
static filter.
Figure 4.3 shows the raw and filtered gaze points during smooth pursuit with the exponential filter. Comparing figures 4.2 and 4.3, it is noticeable that the exponential filter has a lower latency (produces gaze points closer to the latest raw gaze point) than the static filter.
Figure 4.4 shows the raw and filtered gaze points during a saccade. Notice that it takes fewer exponentially filtered gaze points to reach the target than statically filtered ones.
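A possible C++ sketch of algorithm 2 is shown below. The exact exponential weighting used in GazRaz is not spelled out, so the weighting function and the decayRate constant are assumptions; as in the static filter sketch, the squared distance is used.

#include <cmath>

struct GazePoint { float x; float y; float z; };   // as in listing 4.1

GazePoint exponentialFilter(const GazePoint& oldP, const GazePoint& newP,
                            float noiseThreshold     = 0.00015f,
                            float thresholdAmplifier = 1000.0f,
                            float decayRate          = 5.0f) {   // assumed constant
    float dx = newP.x - oldP.x;
    float dy = newP.y - oldP.y;
    float distance = dx * dx + dy * dy;            // squared distance in normalized coordinates

    if (distance < noiseThreshold)
        return oldP;                               // fixation: keep the old gaze point
    float outer = noiseThreshold * thresholdAmplifier;
    if (distance >= outer)
        return newP;                               // large saccade: jump directly

    // The weight on the new point grows exponentially with the distance,
    // approaching 1 near the outer threshold.
    float t = (distance - noiseThreshold) / (outer - noiseThreshold);   // 0..1
    float w = 1.0f - std::exp(-decayRate * t);
    GazePoint p = { (1.0f - w) * oldP.x + w * newP.x,
                    (1.0f - w) * oldP.y + w * newP.y,
                    newP.z };
    return p;
}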
4.6 Lens
Two types of lenses were developed for the application: one plane circular lens and one fish-eye spherical lens. This section will discuss how the lenses were implemented and their pros and cons.
Algorithm 2 Exponential Filter
1:  procedure exponentialFilter
2:      distance ← absoluteValueOf(newGazePoint - oldGazePoint);
3:      if distance < noiseThreshold then
4:          oldGazePoint is returned.
5:      else if distance < noiseThreshold * thresholdAmplifier then
6:          An exponentially weighted mean average gaze point between
7:          the oldGazePoint and the newGazePoint is returned.
8:          gazePoint ← getExponentiallyWeightedMeanAverageGazePoint(distance);
9:      else
10:         newGazePoint is returned.
11:     end
12: end
4.6.1 Double texture lens
The plane lens was the first to be developed. The lens uses a double texture technique. To begin with, a plane that covers the whole screen is placed in front of the camera. The plane is covered with a texture. Then the gaze point, together with a zoom factor, is used to calculate new texture coordinates for the zoomed-in image. These texture coordinates are thereafter used to render a new texture. The new zoom texture and the original texture are passed to the shader that draws the view plane. The shader will draw the original texture on the plane, except for the region where the gaze point is located, where the new zoom texture is drawn instead.
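One way the zoomed texture coordinates could be derived is sketched below; the names and the clamping of the region to the texture edges are assumptions made for this illustration, not the exact GazRaz code.

struct TexRect { float u0, v0, u1, v1; };

// Returns the sub-rectangle of the original texture, of side 1/zoom and
// centred on the gaze point (given in texture coordinates 0..1), that is
// rendered into the zoom texture shown inside the lens.
TexRect zoomedTexCoords(float gazeU, float gazeV, float zoom) {
    float half = 0.5f / zoom;                       // half side of the zoomed region
    float u0 = gazeU - half, v0 = gazeV - half;
    float u1 = gazeU + half, v1 = gazeV + half;
    // Keep the region inside the texture so the lens never samples outside it.
    if (u0 < 0.0f) { u1 -= u0; u0 = 0.0f; }
    if (v0 < 0.0f) { v1 -= v0; v0 = 0.0f; }
    if (u1 > 1.0f) { u0 -= u1 - 1.0f; u1 = 1.0f; }
    if (v1 > 1.0f) { v0 -= v1 - 1.0f; v1 = 1.0f; }
    TexRect r = { u0, v0, u1, v1 };
    return r;
}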
The advantage of the plane lens is that it is accurate in texture coordinates and zoom amplitude. The position of the plane lens matches the gaze point, and the zoomed-in area is correctly positioned relative to the background. Another advantage is that it is easy to resize without altering the zoom amplitude. Figures 4.5 and 4.7 show examples of the plane lens.
The disadvantage of the lens is, as shown in figure 4.1, that the lens will hide some of the background image; see example 4.1.
Example 4.1: Overview hidden by plane lens
If a saccade fails to accurately target a small island in a map, there is a risk that
the lens will hide the island from the viewer, and the only thing perceived is
water.
4.6.2 Refraction lens
A fish-eye lens was developed to help the user avoid scenarios such as example
4.1. The distorted boundary of the lens could possibly provide a clue to the user as to where the island could be.
The fish-eye lens was developed with an OpenGL sphere refraction10 technique. A sphere is placed in the 3D space in front of the view plane. The same zoom texture as produced for the plane lens is passed to the refract shader program. With the incident direction vector from the camera onto the sphere's surface, the refraction vector is calculated. The refraction vector will be affected by the sphere's surface normals, creating a glass-like effect. The refraction vector is calculated with OpenGL refract11, using a refraction index of 1.0/1.52. The refraction causes a distortional zoom effect when it bends the incident vector. Thus two layers of zoom affect the lens: one from the zoomed texture and one from the refraction. The sphere can also be scaled to increase or shrink in size. Changing the "thickness", the width in the z axis12, of the sphere also affects the distortional zoom effect, which can be either magnifying or minimizing. When the gaze position changes on the screen the sphere is translated to this position, which in turn updates the zoom texture. Since the incident vector will change in direction depending on where in the space the sphere is located, the refractional zoom effect will also be affected. When the lens is located near the edges of the screen the refractional zoom is skewed towards the boundary of the lens in the direction towards the middle of the screen.
The advantage of this lens is that, with the right settings, it can provide clues as
to what information is on the boundaries of the lens. However in many situations
the lens still covers part of the view plane. There are no properties to determine
the distance between the input texture and the sphere which makes it difficult
to use as a proper magnifying glass simulation. The skewing of the refractional
zoom effect is also an unwanted feature since it will skew the magnified area away
from the user's gaze point. Figure 4.5 shows the two different lenses. Figure 4.6
is another example of the refract lens.
4.6.3 Dynamic lens size
In section 2.1 it was stated that human vision can only see sharply within a limited visual angle. With the eye tracker's ability to measure the distance of the eyes from the eye tracker itself, and with simple trigonometry, it is possible to determine an appropriate lens radius. This provides the ability to continuously reshape the lens to fit a predefined visual angle.
The cosine law gives equation 4.4 (see figure 4.8 for variable explanations). We assume the sides b and c are of equal length, i.e. b = c, which gives equation 4.5. With equation 4.5 we can calculate the diameter a of the lens with the approximation that b has the same length as the distance provided between the eye tracker and the eye. α is chosen to be a reasonable size, between at least 1◦ (the size of the fovea) and at most 26◦ (the height of the screen). With this implementation the size of the lens will adapt to the distance between the user and the screen.
10 Refraction refers to the change in direction of a light ray when entering another medium. For
example from air to glass.
11 https://www.opengl.org/sdk/docs/man/html/refract.xhtml
12 In our case the z axis is directed out of the screen towards the viewer
Figure 4.5: Plane lens (upper), with a zoom factor of 2.0. Fish-eye lens (lower). Both are placed above Europe on a high-resolution world map image taken from shadedrelief.com.
Figure 4.6: Refract lens over the USA's east coast. Texture image taken from glif.is; see the appendix for additional information.
Figure 4.7: Plane lens over the USA's east coast. Texture image taken from glif.is; see the appendix for additional information.
This way the lens can continuously cover the same area of the retina independent of the distance to the screen. For example, the lens could be set to a size of 5◦, which would be the same size as the parafoveal region. To avoid noise causing the lens to flicker in size, a mean eye distance calculation was done. The mean calculation followed the method described in section 4.5.1.
Figure 4.8: Triangle. a, b, c represent the side lengths. α, β, γ represent the
angles. Image by David Weisman. Taken from wikipedia.org
a = sqrt(b^2 + c^2 − 2bc ∗ cos(α))    (4.4)
a = sqrt(2b^2 − 2b^2 ∗ cos(α))    (4.5)
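A C++ sketch of equation 4.5 follows, assuming the mean eye distance reported by the eye tracker is used directly as the side b.

#include <cmath>

// Lens diameter, in the same unit as the eye distance (millimetres here),
// for a chosen visual angle. Variable names follow figure 4.8. Converting
// the result to pixels would additionally require the screen's pixel pitch.
double lensDiameterMm(double eyeDistanceMm, double visualAngleDeg) {
    const double pi = 3.14159265358979323846;
    double alpha = visualAngleDeg * pi / 180.0;
    double b = eyeDistanceMm;                       // b = c, eye-to-screen distance
    return std::sqrt(2.0 * b * b - 2.0 * b * b * std::cos(alpha));   // equation 4.5
}

// Example: lensDiameterMm(650.0, 5.0) is roughly 56.7 mm, i.e. a lens
// covering approximately the parafoveal region at a 650 mm viewing distance.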
4.6.4 Gaze point correction
When using gaze control to steer the lens it is common to want to look/saccade
to some area already within the lens. Originally the application did not take into consideration that the saccade was made within the lens, and would move the lens to the new gaze point as if it pointed to the background image.
This caused the target to disappear out of the lens or to the other side of the lens.
To avoid this issue a correction is made of the gaze point while looking inside the
lens, so that the corrected gaze point targets what the user sees within the lens.
Figure 4.9 describes the difference between the corrected and uncorrected gaze
point. Algorithm 3 presents a pseudocode example for gaze point correction.
Algorithm 3 Gaze Point Correction
1:  procedure gazePointCorrection
2:      xDistance ← getXDistance(newGazePoint, oldGazePoint);
3:      yDistance ← getYDistance(newGazePoint, oldGazePoint);
4:      if distance < lensRadiusNormalized and zoomFactor > 1.0 then
5:          gazePoint.x = (oldGazePoint.x + xDistance/zoomFactor)
6:          gazePoint.y = (oldGazePoint.y + yDistance/zoomFactor)
7:      end
8:  end
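A C++ sketch of algorithm 3 follows. The Euclidean distance test is an assumption made here, since the pseudocode only refers to a generic distance between the new and old gaze points.

struct GazePoint { float x; float y; float z; };   // as in listing 4.1

// If the new gaze point lands inside the (magnifying) lens, the on-screen
// saccade distance is divided by the zoom factor, so that the corrected
// point targets what the user actually sees inside the lens.
GazePoint correctGazePoint(const GazePoint& oldP, const GazePoint& newP,
                           float lensRadiusNormalized, float zoomFactor) {
    float dx = newP.x - oldP.x;
    float dy = newP.y - oldP.y;
    float dist2 = dx * dx + dy * dy;
    if (dist2 < lensRadiusNormalized * lensRadiusNormalized && zoomFactor > 1.0f) {
        GazePoint corrected = { oldP.x + dx / zoomFactor,
                                oldP.y + dy / zoomFactor,
                                newP.z };
        return corrected;
    }
    return newP;                                   // outside the lens or no zoom: use the raw point
}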
Figure 4.9: Illustration of gaze point correction for the plane lens. The right
arrow shows where the lens would be moved to if there were no gaze point
correction. The dotted line shows how the gaze point correction repositions
the gaze point to target the area that the lens is showing at the moment.
4.7 User interface
This section will discuss how the user interacts with the application and how feedback is provided to the user. Keyboard, mouse and gaze control are used to interact with the application. Below is a list that describes how application attributes are controlled/changed by user input.
• Connect/disconnect eye tracker. The eye tracker is connected/disconnected
with a push of a keyboard button.
• Move lens. When the eye tracker is connected the lens will continuously be
moved to the user's current gaze point, as long as he/she is holding down
the left mouse button. When the left mouse button is released the lens
will freeze at its current position. The eye tracker will however continue to
stream gaze data and whenever the left mouse button is pressed again the
lens will position itself at the current gaze position. When the eye tracker
is unconnected the lens will move to and follow the position of the mouse
cursor.
• Change image. Keyboard buttons are used to switch between the various
test images.
• Change filter. A keyboard button is used to switch between the filter methods.
• Change lens type. Lens type can be changed with right mouse button or a
keyboard button. The alternatives are plane lens, fish-eye lens or no lens.
• Change lens size. Lens size is changed by pressing and holding a keyboard
button and at the same time using the scroll-wheel of the mouse. When using
the plane lens, changing the size will change the lens diameter in degrees of
gaze angle. If the eye tracker is connected, the plane lens will automatically
adapt its size depending on the distance of the users eyes, as described in
section 4.6.3.
• Change zoom factor. The zoom amplitude is changed with the scroll-wheel
of the mouse.
• Change noise threshold. Increased and decreased with keyboard buttons.
• Change number of samples for mean gaze point calculation. Increased
and decreased with keyboard buttons.
• Show/hide gaze points. Both the filtered and the raw gaze points can be
toggled to be shown by pressing a keyboard button. The alternatives are filtered, raw, filtered and raw, or none.
• Show/hide threshold borders. The visual representation of the threshold limits can be toggled to be shown by pressing a keyboard button.
• Default size/zoom. By pushing a keyboard button the current lens is resized back to its original size and the zoom factor is returned to its original value.
To provide feedback to the user of the system status a transparent information
window is placed in the top left corner of the screen, see figure 4.10. The window
presents information on:
• Zoom factor. The zoom amplitude.
• Lens size. Lens diameter in gaze angle.
• Samples. Number of gaze samples used for the mean gaze point calculation.
• Eye distance X, Y, Z. The mean distance of left and right eye, described in
user coordinate system.
• Fps. Fps stands for frames per second, i.e. the update frequency for the UI
thread.
• Eye tracker sps. Sps stands for samples per second, i.e. the gaze data sample
frequency of the eye tracker.
• 5 rows of messages. Changes in the status of the application and/or warnings are pushed to the message field. This way the user is provided feedback on the system status or is informed why an action was dismissed.
A feature to warn users if they are about to leave the track box was also added. If the mean eye position in user coordinates reached outside the track box boundaries, a warning was displayed at the gaze position, telling the user to move in a direction towards the track box. Figure 4.11 shows the warning presented to the
user when the mean eye distance has exceeded the max z and min y limits.
Figure 4.10: The information window in the top left corner of the screen.
Figure 4.11: The red text in the center of the lens is the direction the user is
asked to move when he/she is too close to the track box border. The warning
disappears as soon as the eye distance comes back inside the allowed limit.
5 Evaluation
To understand how potential users would perceive the GazRaz application, an evaluation was done. The evaluation method is called the think aloud method and is discussed in section 5.1. The evaluation aims to provide guidance on which features were successful and which settings were preferred. Further, the evaluation aimed to compare the participants' experience of using gaze control for interaction with that of a conventional computer mouse.
5.1 Method
The method used for the evaluation is known as the think aloud method. For more information on the think aloud method, see van Someren et al. [20]. After the think aloud test the participant was also asked to complete a questionnaire.
The think aloud test is done with one participant at a time together with the test facilitator. To begin with, the participant answers a couple of questions on their age, the use of any vision correction, whether they have full ability to see colors, and whether they approve of audio recording during the test. The TX300 can track the eyes of a person wearing glasses or contact lenses, but as mentioned in section 3.1 it might influence the gaze data quality.
The participants were introduced to the test, and it was explained that the test was done to evaluate the usability of the GazRaz application and how they experience gaze control compared to using a mouse. It was explained to the participant that the test was not designed to evaluate their problem solving abilities but solely to understand how they use GazRaz to solve the tasks they were given during
the test. The participants were also asked to talk out loud about what they were
thinking and trying to do, to solve the tasks they were given.
Thereafter the participants conducted the eye tracker calibration procedure. The
calibration was done with Tobii Studio1. The calibration type was set to regular
with 9 red calibration points on a dark grey background. The calibration speed
was set to normal.
Then the participant was presented with the GazRaz application. They were guided on how to switch between lenses, how to move the lens with the mouse and with gaze control, and how to increase/decrease the zoom and the size of the lens. The participants were allowed to experiment with the application for a minute or two. During the introduction the image in figure 5.1 was used.
When the participant had tried the application for a short while, he/she was asked whether he/she felt comfortable and understood how to change the zoom, resize the lens and use gaze control/mouse to move the lens. If the answer was yes the test proceeded, otherwise the controls were further explained.
When the test proceeded the image was changed to figure 5.2. The background of the images was explained to the participant, as was the fact that the tasks they would be given might include finding a certain cable or city, that these tasks were not meant to measure their geographical knowledge but rather to see how they used GazRaz during search tasks, and that they could ask to abort a task if they felt it was too difficult. Thereafter the participants were asked to do a series of tasks involving finding cities and cables, following lines, etc. To avoid misunderstandings due to pronunciation, names were also shown to the participants. The tasks were structured to be solved with the plane lens or the refracting lens, with mouse or gaze control, and with the static or exponential filter. Half (4) of the participants started the test by using the mouse and the other half (3) by using gaze control. During the whole test the test facilitator took notes on how the participants solved the tasks.
After the tasks were completed the participants were asked to complete a questionnaire with 14 statements. The participants were asked to rank the statements on a scale from 1 to 7: 1-3 strongly disagree, 4-5 neither, 6-7 strongly agree.
5.1.1 Participants
7 participants did the evaluation, 4 male and 3 female, aged 22-32. All
claimed to have a full ability to see colors. Two were wearing contact lenses during the evaluation. 6 participants managed to collect data for all nine calibration
points during calibration.
1 http://www.tobii.com/eye-tracking-research/global/products/software/tobii-studio-analysissoftware/
Figure 5.1: The image used during the introduction to GazRaz. Image resolution 16200x8100 pixels. Taken from shadedrelief.com.
Figure 5.2: The image used during the think aloud evaluation. The image
shows a world map centred over America. On the map there are coloured
lines illustrating GLIF infrastructure of international research and education
network bandwidth. The network cables have names, and some cities are
marked with points and names. Image resolution 16384x8192 pixels. Taken
from glif.is, see appendix for additional information
5.1.2 GazRaz settings
During the evaluation the participant was free to change the zoom factor and
lens size as she/he wished. The number of samples for the mean calculation
was set to 12 for the static filter and 30 for the exponential filter. The noise
threshold was linked to the zoom factor, which caused the noise
threshold to increase as the zoom factor increased and vice versa. Dynamic lens
size, discussed in section 4.6.3, was active for the plane lens only. Gaze point correction, as explained in section 4.6.4, was active for both lenses. The fish-eye
lens was limited to a maximal size of 12◦ , while the plane lens had no upper
limit.
5.1.3 TX300 settings
The TX300 was set to a 300 Hz sample rate with the default illumination mode. The
room was illuminated with diffuse ceiling light.
5.2 Result
When using the mouse all participants succeeded with their tasks. Using the fish-eye lens, three of the participants kept the original size of the lens, which was set to approximately 8◦ of visual angle. The others increased the size of the lens to around 10-11◦.
When using the plane lens most participants increased the lens size to around 10◦ with a zoom factor of around 3x. Some participants made attempts with large lens sizes up to 23◦, but shortly afterwards decreased them.
With gaze control activated most of the tasks were solved. The task most participants gave up on was a search and find task with the fish-eye lens and the exponential filter activated. When using gaze control with the exponential filter active, the participants had problems stabilizing the lens, especially together with the fish-eye lens. The lens would easily start to oscillate around the target the participant tried to fixate on. The oscillation decreased somewhat with the plane lens, but from the facilitator's point of view the lens seemed shaky and unstable. Some participants described it as very hard to read when the lens followed the gaze point. Participants also described it as too sensitive. Participants seemed to be concentrating hard to keep the lens stable, which also could influence them to move closer to the screen, causing the noise to increase due to being too close to the track box border.
With the static filter the oscillation and shakiness decreased, and the participants
seemed to be able to use gaze control more easily.
After two to three tasks with gaze control, 3 of the participants noted that it was easier to read if they released the left mouse button, causing the lens to freeze
at its current position. They asked the facilitator if it was allowed to release the button or click it, thereby causing the lens to alternate between movement and a stationary position, and it was explained that it was allowed to do so. The other 4 kept holding down the left mouse button during the whole test. Some participants complained that they wanted the lens to stand still but still kept holding down the mouse button. One of the tasks was to find the cable that the participant thought had the highest bandwidth; the bandwidth was stated next to the name. This created a search and compare task. The participants that adopted the click and release method managed to find the highest value whilst the others all selected some other value.
With gaze control activated some participants experimented with large lens sizes
but most seemed satisfied with a lens size of around 10◦ . The zoom factor used
was around 2-5x.
To summarize the results: the plane lens was preferred over the fish-eye lens. The exponential filter was more likely to cause oscillating behaviour and forced the participants to concentrate more. Some of the participants kept holding the left mouse button pressed during the whole test, making the lens constantly follow the gaze position. The participants expressed enjoyment in using gaze control, but the evaluation is not sufficient to make claims that gaze control is preferred over the use of a computer mouse. The tasks provided were, however, solvable with gaze control.
6 Conclusions and Future work
This chapter will discuss the results of the evaluation in section 5.2 and the decisions made during the development of GazRaz.
6.1 Discussion
The results of the evaluation in section 5.2 indicate that even with quite uncomplicated gaze data processing it is possible to acquire a gaze position that is stable and accurate enough to control the position of a magnifying lens. The evaluation partially aimed to explore how the participants would perceive the different aspects of GazRaz, such as both lenses and filters, which makes it difficult to directly compare the participants' experience of mouse usage with gaze control. A more specific evaluation, only comparing the computer mouse to gaze control, could provide a more reliable result.
There are some learnt abilities to take into account when reviewing the evaluation results. Some of the participants might have performed better and stressed their eyes less if they had thought about releasing the left mouse button when they managed to place the lens at a target. Maybe a more thorough introduction to the GazRaz application before the test started would have made them aware of that possibility.
The reason why the left button was added as a gaze control delimiter was to allow for free viewing within the lens when the target was hit. In my own experience it was difficult to use continuous gaze control with a zoom factor greater than 6; beyond that the lens would tend to oscillate aggressively during attempted
fixation. By using a low zoom factor to aim the lens at the target, and thereafter releasing the left mouse button to lock the lens to the target, the zoom and size could be increased without oscillation or the need for intense concentration. In my experience this procedure can be done rapidly and effortlessly. However, none of the participants fully adopted this process during the evaluation, but some used a click approach to make the lens jump to the current gaze position whenever they wished it to do so.
Between the fish-eye and the plane lens, the plane lens was the one preferred by
the participants. This might be because text becomes easier to read on the plane
lens where the words are not distorted, see figure 4.6. My opinion is, however,
that the idea of a fish-eye lens should not be discarded. A different implementation technique for a fish-eye lens then done in GazRaz could have provided a
better result.
When it comes to the exponential and static filters, the exponential filter proved to be difficult to use. Even though it used many more gaze point samples, 30 compared to the 12 of the static filter, it moved jerkily (as do the eyes) and the participants had problems stabilizing the lens. With the static filter, small saccades close to the lens would cause the lens to gradually move toward the new gaze point. The movement of the static filter was balanced such that it was responsive enough for the eye to understand the lens's intention without the confusing jerky movements. In section 2.2.2 corrective saccades are discussed, and during development my experience was that corrective saccades are very common. With the static filter the first saccade will move the lens close (within the 5.44◦ discussed in section 4.5.2) to the target, and the following corrective saccade will cause the lens to gradually move towards the target. With the exponential filter the corrective saccade would be almost instantly followed by the lens. The mechanisms involved in corrective saccades are still argued about, but it might be disturbing for the
visual sense to have unexpected stimuli at the gaze point. Remember the quote
from Hillstrom and Yantis [9] in section 2.1, “when motion segregates a perceptual element from a perceptual group, a new perceptual object is created, and
this event captures attention” and “motion as such does not capture attention
but that the appearance of a new perceptual object does.”.
The idea of the adaptive lens size was that the lens size would adapt to the size of the sensor-dense part of the vision, as discussed in section 2.1. When the participants were free to choose the lens size they tended to use a much larger lens size than the fovea or parafovea region would represent. Large lens sizes when using the mouse are understandable, since the eye is then able to search freely within the lens, and using gaze control with the click and release method could also prove successful with a large lens size. With gaze control constantly activated the eye will mostly only be able to see the center of the lens, thus a large lens size is unnecessary.
The out of track box warning discussed in section 4.7 could have provided better warnings than it was designed to do during the evaluation. Three times a
participant positioned him/herself such that the eyes were on the border of the
track box, which increased the noise. For an inexperienced participant this will
be perceived as if the application is unstable or broken. An earlier warning could
provide the feedback needed to inform the participant to reposition in the track
box.
Gaze point correction was not evaluated in chapter 5, but in my experience it increased usability, since it made it possible to conduct saccades within the lens and target what you would expect to see rather than the actual gaze point behind the lens.
6.2 Future work
Further development of the gaze data filtering algorithm: the static filter that proved successful in this thesis has a simple construction. Making the weighting and thresholds smarter and adaptable to the eyes' movement state could provide improvements, for example by detecting saccades and fixations and making the filter adapt to these behaviours. Špakov [21] wrote a paper comparing different gaze data filtering methods.
The fish-eye lens implementation was relatively unsuccessful and could possibly
be implemented in a better way. A better implementation could provide a solution for the overlapping of the overview that the plane lens creates, as in example
4.1.
In section 4.5.1 it was mentioned that GazRaz does not use the latest gaze point sample due to thread safety. This could be solved by implementing a circular queue design. This way the latency would be reduced slightly. In our case, with the TX300 eye tracker system, the difference would have been small because of the high sample rate. For an eye tracker system with a lower sample rate this latency improvement would have a greater impact.
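A minimal sketch of such a circular queue is shown below: a fixed-size, single-producer/single-consumer ring buffer written by the eye tracker thread and read by the UI thread. It is an illustration of the idea under those assumptions, not an existing GazRaz component, and it assumes the reader keeps up within one full wrap of the buffer.

#include <array>
#include <atomic>
#include <cstddef>

struct GazeSample { float x, y; bool valid; };

class GazeRingBuffer {
public:
    // Called from the eye tracker thread for every new sample.
    void push(const GazeSample& s) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        buffer_[head % kCapacity] = s;
        head_.store(head + 1, std::memory_order_release);
    }

    // Called from the UI thread; returns the most recent sample, if any.
    bool latest(GazeSample& out) const {
        std::size_t head = head_.load(std::memory_order_acquire);
        if (head == 0) return false;
        out = buffer_[(head - 1) % kCapacity];
        return true;
    }

private:
    static constexpr std::size_t kCapacity = 512;   // comfortably more than 0.5 s at 300 Hz
    std::array<GazeSample, kCapacity> buffer_{};
    std::atomic<std::size_t> head_{0};
};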
It would be interesting to review the results of an evaluation specifically comparing the efficiency of gaze control with that of the computer mouse in GazRaz. Would
gaze control improve efficiency if the participants had sufficient time to learn how
to use it?
It would also be interesting to evaluate how different types of images would be perceived with the GazRaz application: images presenting other forms of data than the maps used in figures 5.1 and 5.2, for example text-rich images, text-free images, or images that demand a high zoom factor, etc.
Appendix A
GLIF, Additional information and Acknowledgements
Acknowledgements - The Global Lambda Integrated Facility (GLIF) Map 2011
visualization was created by Robert Patterson of the Advanced Visualization Laboratory (AVL) at the National Center for Supercomputing Applications (NCSA)
at the University of Illinois at Urbana-Champaign (UIUC), using an Earth image
provided by NASA with texture retouching by Jeff Carpenter, NCSA. Data was
compiled by Maxine D. Brown of the Electronic Visualization Laboratory (EVL)
at the University of Illinois at Chicago (UIC). Support was provided by GLIF, NCSA/UIUC, the State of Illinois, and US National Science Foundation grants # OCI0962997 to EVL/UIC. For more information on GLIF, see http://www.glif.is/.
Additional Information - The GLIF map does not represent all the world’s Research and Education optical networks, and does not show international capacity
that is dedicated to production usage. The GLIF map only illustrates excess capacity that its participants are willing to share with international research teams
for applications-driven and computer-system experiments, in full or in part, all
or some of the time. GLIF does not provide any network services itself, and researchers should approach individual GLIF network resource providers to obtain
lightpath services.
Bibliography
[1] Accuracy and precision test method for remote eye trackers. Tobii Technology AB, 2.1.1 edition, February 2011. Cited on pages 1, 15, and 17.
[2] Keith Rayner. Eye movements in reading and information processing: 20
years of research. Psychological Bulletin, 124(3):372 – 422, 1998. ISSN
0033-2909. URL https://login.e.bibl.liu.se/login?url=http:
//search.ebscohost.com/login.aspx?direct=true&db=pdh&
AN=1998-11174-004&site=ehost-live. Cited on pages 2 and 7.
[3] Christina F Lassen, Sigurd Mikkelsen, Ann I Kryger, and Johan H Andersen.
Risk factors for persistent elbow, forearm and hand pain among computer
workers. Scandinavian Journal of Work, Environment & Health, (2):122,
2005. ISSN 03553140. URL https://login.e.bibl.liu.se/login?
url=http://search.ebscohost.com/login.aspx?direct=true&
db=edsjsr&AN=edsjsr.40967478&site=eds-live. Cited on page 2.
[4] Susana Martinez-Conde, Jorge Otero-Millan, and Stephen L. Macknik.
The impact of microsaccades on vision: towards a unified theory of saccadic function. Nature Reviews Neuroscience, 14(2):83 – 96, 2013. ISSN
1471003X. URL https://login.e.bibl.liu.se/login?url=http:
//search.ebscohost.com/login.aspx?direct=true&db=aph&
AN=84942422&site=eds-live. Cited on pages 2, 5, 8, and 9.
[5] Päivi Majaranta and Andreas Bulling. Eye tracking and eye-based human–computer interaction. Advances in Physiological Computing, page 39,
2014.
ISBN 9781447163916.
URL https://login.e.bibl.liu.
se/login?url=http://search.ebscohost.com/login.aspx?
direct=true&db=edb&AN=95560053&site=eds-live.
Cited on
pages 5, 7, 11, 12, 13, and 16.
[6] Richard A. Bolt. Eyes at the interface. In Proceedings of the 1982 Conference
on Human Factors in Computing Systems, CHI ’82, pages 360–362, New
York, NY, USA, 1982. ACM. doi: 10.1145/800049.801811. URL http://
doi.acm.org/10.1145/800049.801811. Cited on pages 5 and 6.
[7] Heiko Drewes.
Eye gaze tracking for human computer interaction.
March 2010. URL http://nbn-resolving.de/urn:nbn:de:bvb:
19-115914. Cited on pages 5, 6, 7, 8, 9, 11, 12, 14, and 15.
[8] R. John Leigh and David S. Zee. Neurology of eye movements. Contemporary neurology series: 70. Oxford : Oxford University Press, 2006, 2006.
ISBN 0195300904.
URL https://login.e.bibl.liu.se/login?
url=http://search.ebscohost.com/login.aspx?direct=true&
db=cat00115a&AN=lkp.451202&site=eds-live. Cited on pages 7
and 8.
[9] Anne P. Hillstrom and Steven Yantis. Visual motion and attentional capture.
Perception & Psychophysics, 55(4):399–411, 1994. ISSN 0031-5117. doi: 10.
3758/BF03205298. URL http://dx.doi.org/10.3758/BF03205298.
Cited on pages 7 and 48.
[10] Simon Thorpe, Denis Fize, and Catherine Marlot. Speed of processing in the
human visual system. Nature, pages 520–522, 1996. URL http://dx.
doi.org/10.1038/381520a0. Cited on page 8.
[11] Jorge Otero-Millan, Stephen L. Macknik, Alessandro Serra, R. John Leigh,
and Susana Martinez-Conde. Triggering mechanisms in microsaccade
and saccade generation: a novel proposal. Annals of the New York
Academy of Sciences, 1233(1):107–116, 2011. ISSN 1749-6632. doi: 10.
1111/j.1749-6632.2011.06177.x. URL http://dx.doi.org/10.1111/
j.1749-6632.2011.06177.x. Cited on page 9.
[12] Robert M. Steinman, Robert J. Cunitz, George T. Timberlake, and Magdalena
Herman. Voluntary control of microsaccades during maintained monocular
fixation. Science, 155(3769):pp. 1577–1579, 1967. ISSN 00368075. URL
http://www.jstor.org/stable/1721006. Cited on page 9.
[13] Robert M. Steinman, Genevieve M. Haddad, Alexander A. Skavenski, and
Diane Wyman. Miniature eye movement. Science, 181(4102):pp. 810–819,
1973. ISSN 00368075. URL http://www.jstor.org/stable/1736402.
Cited on pages 9 and 12.
[14] Michael F. Land and Sophie Furneaux. The knowledge base of the oculomotor system. Philosophical Transactions: Biological Sciences, 352(1358):
pp. 1231–1239, 1997. ISSN 09628436. URL http://www.jstor.org/
stable/56660. Cited on page 11.
[15] Kenneth B. I. Holmqvist. Eye tracking : a comprehensive guide to methods
and measures. Oxford : Oxford University Press, 2011, 2011. ISBN
9780199697083. URL https://login.e.bibl.liu.se/login?url=
http://search.ebscohost.com/login.aspx?direct=true&db=
cat00115a&AN=lkp.636826&site=eds-live.
Cited on pages 12
and 14.
[16] Robert J. K. Jacob. The use of eye movements in human-computer interac-
tion techniques: What you look at is what you get. ACM Trans. Inf. Syst.,
9(2):152–169, April 1991. ISSN 1046-8188. doi: 10.1145/123078.128728.
URL http://doi.acm.org/10.1145/123078.128728. Cited on page
15.
[17] Tobii tx300 eye tracker. http://www.tobii.com/Global/Analysis/
Marketing/Brochures/ProductBrochures/Tobii_TX300_
Brochure.pdf, 2014. Accessed: 2015-05-18. Cited on page 17.
[18] Tobii Gaze SDK Developer’s Guide General Concepts. Tobii Technology AB,
2014. URL http://developer.tobii.com/. Cited on page 18.
[19] Patrick Baudisch, Nathaniel Good, Victoria Bellotti, and Pamela Schraedley.
Keeping things in context: A comparative evaluation of focus plus context
screens, overviews, and zooming. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems, CHI ’02, pages 259–266, New
York, NY, USA, 2002. ACM. ISBN 1-58113-453-3. doi: 10.1145/503376.
503423. URL http://doi.acm.org/10.1145/503376.503423. Cited
on page 23.
[20] Maarten W. van Someren et al. The Think Aloud Method:
A Practical Guide to Modelling Cognitive Processes. 1994. ISBN 0-12714270-3. URL https://login.e.bibl.liu.se/login?url=http:
//search.ebscohost.com/login.aspx?direct=true&db=eric&
AN=ED399532&site=eds-live. Cited on page 41.
[21] Oleg Špakov. Comparison of eye movement filters used in HCI. In Proceedings of the Symposium on Eye Tracking Research and Applications, ETRA
’12, pages 281–284, New York, NY, USA, 2012. ACM. ISBN 978-1-45031221-9. doi: 10.1145/2168556.2168616. URL http://doi.acm.org/
10.1145/2168556.2168616. Cited on page 49.