Detail to Attention: Exploiting Limits of the
Human Visual System for Selective Rendering
Kirsten Fiona Cater
A thesis submitted to the University of Bristol, UK in accordance with the requirements
for the degree of Doctor of Philosophy in the Faculty of Engineering, Department of
Computer Science.
May 2004
Word count: c. 47,000 words
Abstract
The perceived quality of realistic computer graphics imagery depends on the physical
accuracy of the rendered frames, as well as the capabilities of the human visual system.
Fully detailed, high-fidelity frames may still take many minutes, even hours, to render
on today’s computers. The human eye is physically incapable of capturing a whole
scene in full detail. Humans sense image detail only in a 2 degree foveal region, relying
on rapid eye movements, or saccades, to jump between points of interest. The human
brain then reassembles these glimpses into a coherent, but inevitably imperfect, visual
perception of the environment. In the process, humans literally lose sight of unimportant
details.
This thesis demonstrates how properties of the human visual system, in
particular Change Blindness and Inattentional Blindness, can be exploited to accelerate
the rendering of animated sequences by applying a priori knowledge of a viewer’s task
focus. This thesis shows, via several controlled psychophysical experiments, how
human subjects will consistently fail to notice degradations in the quality of image
details unrelated to their assigned task, even when these details fall under the viewer’s
gaze. This thesis has built on these observations to create a perceptual rendering
framework, which combines predetermined task maps with spatiotemporal contrast
sensitivity to guide a progressive animation system that takes full advantage of image-based rendering techniques. This framework is demonstrated with a Radiance ray-tracing implementation that completes its work in a fraction of the normally required
time, with few noticeable artefacts for viewers performing the task.
Declaration
I declare that the work in this thesis is original and no portion of work referred to here
has been submitted in support of an application for another degree or qualification of
this or any other university or institution of learning.
Signed:
Date:
Kirsten Cater
Acknowledgements
Above all I must thank Alan Chalmers for his GREAT supervising, encouragement and
enthusiasm for the work that I performed in this thesis. Thank you to Greg Ward for his never-ending knowledge and friendship; the selective rendering framework discussed in Chapter 6 would not have been possible without him. Thank you also to Prof. Tom Troscianko for his invaluable comments and advice in designing the psychophysical
experiments. I am truly grateful to everyone that participated in the pilot studies and
experiments, without them there would not have been any results for this thesis.
A special thanks to all my colleagues and friends in the Department of Computer
Science in Bristol who have helped me over the years, Kate, Patrick and Pete and the
rest of the Graphics group, Georgie (you’re a star and a good friend), Angus, Sarah,
Oli (and Emma), Barry, Henk, Eric, Andrew, Becky, Lucy, Hayley and the rest of my
willing performers for the Christmas pantos! As well as Colin (and his lovely family,
Helen, Nia and Kyran), Neill, Mish and Steve (good luck in OZ, you’ll be missed), to
mention but a few! Thank you to Prof. David May for the continued support throughout
my degree, PhD and now my postdoc. As well as my new colleagues at Mobile Bristol
for giving me the time and peace of mind to actually finish my thesis, I appreciate their
patience.
Thanks to all my friends involved with ACM SIGGRAPH, particularly Thierry,
Scott and Etta, as well as Simon, Karol, Ann, Katerina, Nan, Alain, Judy, Reynald,
Fredo, and the rest of graphics/rendering community that I have got to know so well
over the many years of attending graphics conferences.
Last but not least, thank you to Hazel for her endless friendship over so many
years. My parents, John and Erlet, for giving me the best start in life anyone could wish
for, as well as their continued support and for showing me how to enjoy and
experience life. My brother Carl, with whom I have shared such wonderful experiences
and memories and who always knows how to make me smile. Finally Jade, for showing
me what true love is; I love you so much. This thesis is dedicated to them.
To my parents, brother and Jade
Contents

Abstract
Declaration
Acknowledgements
Dedication
Chapter 1 – Introduction
  1.1 Thesis Outline
Chapter 2 – Perceptual Background
  2.1 The Anatomy of Sight - The Human Visual System (HVS)
  2.2 Visual Perception
    2.2.1 Visual Acuity of the Human Eye
    2.2.2 Contrast Sensitivity
    2.2.3 Eye Movements
  2.3 Attention
    2.3.1 Visual Attention
    2.3.2 Top-down versus Bottom-up processes
    2.3.3 Change Blindness
    2.3.4 Inattentional Blindness
      2.3.4.1 Inattentional Blindness in Magic
    2.3.5 An Inattentional Paradigm
  2.4 Summary
Chapter 3 – Rendering Realistic Computer Graphical Images
  3.1 Local illumination
  3.2 Global illumination
    3.2.1 Ray tracing
    3.2.2 Radiosity
  3.3 Radiance
  3.4 Visual Perception in Computer Graphics
    3.4.1 Image Quality Metrics
    3.4.2 Simplification of complex models
    3.4.3 Eye Tracking
    3.4.4 Peripheral vision
    3.4.5 Saliency models
  3.5 Summary
Chapter 4 – Change Blindness
  4.1 Introduction
  4.2 The Pre-Study
  4.3 Main Experimental Procedure
  4.4 Results
    4.4.1 Statistical Analysis
  4.5 Discussion - Experimental Issues and Future Improvements
  4.6 Summary
Chapter 5 – Inattentional Blindness
  5.1 Introduction - Visual Attention
  5.2 Experimental Methodology
    5.2.1 Creating the Animation
    5.2.2 Experimental Procedure
    5.2.3 Results
    5.2.4 Analysis
    5.2.5 Verification with an eye-tracker
    5.2.6 Conclusions
  5.3 Non-Visual Tasks
    5.2.7 The psychophysical experiment
    5.2.8 Results
    5.2.9 Analysis
    5.2.10 Verification with an eye-tracker
    5.2.11 Conclusions
  5.4 Inattentional Blindness versus Peripheral Vision
    5.4.1 Task Maps: Experimental Validation
    5.4.2 Creating the Selective Quality (SQ) Images
    5.4.3 Experimental Methodology
    5.4.4 Results
    5.4.5 Statistical Analysis
    5.4.6 Verification with an Eye-tracker
  5.5 Summary
Chapter 6 – An Inattentional Blindness Rendering Framework
  6.1 The Framework
  6.2 Implementation
  6.3 Summary
Chapter 7 – Conclusion
  7.1 Contributions of this thesis to the research field
  7.2 Future Research
    7.2.1 Visual Attention
    7.2.2 Peripheral Vision
    7.2.3 Multi-Sensory Experiences
    7.2.4 Type of Task
    7.2.5 Varying Applications
    7.2.6 Alterations for Experimental Methodologies
  7.3 A Final Summary
Bibliography
Appendix A
  Materials A.1 – Judges Responses
  Materials A.2 – Instructions and Questionnaire
Appendix B
  Materials B.1 – Instructions – Free Viewing
  Materials B.2 – Instructions – Counting Pencils
  Materials B.3 – First Attempt Questionnaire
  Materials B.4 – Final Questionnaire
  Materials B.5 – Instructions – Non-Visual
Appendix C
  Materials C.1 – Questionnaire – Threshold Values
  Materials C.2 – Instructions – Counting Teapots
  Materials C.3 – Questionnaire
List of Figures
Figure 1.1: Some examples of photo-realistic images created in Radiance
[Ward Larson and Shakespeare 1998].
Figure 1.2: Some examples of the ‘blocky graphics’ that can be achieved at
interactive rates in the Explorer 1, a six degrees of freedom single-seater
simulator.
Figure 2.1: The human eye [JMEC 2003].
Figure 2.2: The peak sensitivities of the three types of cones and the rods
[Dowling 1987].
Figure 2.3: The three functional concentric areas of the visual field, the
fovea, the macula and focal vision [VEHF 2004].
Figure 2.4: The location of cone and rod peaks in eccentricity from the fovea
[Osterberg 1935].
Figure 2.5: Original Image (left). How the eye actually views the image, as
seen with the right eye focused on the centre (right). Note the yellow circle in
the centre, the fovea, it is yellow due to the lack of blue cones. The black
circle to its right is the location of the optic nerve, black as there are no
photoreceptors in this part of the retina, and the degradation in visual acuity
and colour towards the periphery [CIT 2003].
Figure 2.6: Photograph of the back of the retina to show the location of the
fovea and optic nerve [CS 2004].
Figure 2.7: Finding your blind spot, i.e. the location of your optical nerve
[Lightwave 2003].
Figure 2.8: Image to demonstrate the concept of experience on perception
[OL 2003].
Figure 2.9: Can you see a young woman or an old hag? [OL 2003]
Figure 2.10: An example of a Snellen Chart used by opticians to test a
person’s visual acuity [MD Support 2004].
Figure 2.11a: Contrast sensitivity Vs. Spatial frequency [Daly 1993].
Figure 2.11b: The Campbell-Robson chart to show Contrast sensitivity
Vs. Spatial frequency. Spatial frequency increases from left to right (i.e.
the size of the bars decrease), whilst contrast sensitivity increases from the
bottom of the figure to the top, and thus the target contrasts decrease
[Prikryl and Purgathofer 1999].
Figure 2.12: The effect of contrast sensitivity on a human’s vision (normal
left) [CS 2003].
Figure 2.13: A demonstration of a negative image due to the eye’s constant
need for new light stimulation [OL 2003].
Figure 2.14: A demonstration of a colour negative image [SVE 2003].
Figure 2.15: Diagram to show Broadbent’s Filter Theory of auditory
attention [Broadbent 1958].
Figure 2.16: Effects of task on eye movements. The same picture was
examined by subjects with different instructions; 1. Free viewing, 2. Judge
their ages, 3. Guess what they had been doing before the “unexpected
visitor’s” arrival, 4. Remember the clothes worn by the people, 5.
Remember the position of the people and objects in the room & 6. Estimate
how long the “unexpected visitor” had been away from the family [Yarbus
1967].
Figure 2.17: The experiment designed by O’Regan et al. [Rensink 2001].
Figure 2.18a & b): Examples of the modifications (presence and location)
made by O’Regan et al. to the photographs, a) (top) shows a presence
alteration where the cheese has been removed, b) (bottom) shows a
location alteration where the bar behind the people has been moved
[O’Regan et al. 2001].
Figure 2.19a: Choose one of the six cards displayed above.
Figure 2.19b: Can you still remember your card?
Figure 2.19c: Only five cards remain. Can you remember your card? I
must have chosen your card and removed it. Is this coincidence or is it an
illusion?
Figure 2.20: Critical Stimulus in parafovea: noncritical and critical trial
displays [Mack and Rock 1998].
Figure 2.21: Table to show the experimental ordering for observers [Mack
and Rock 1998].
Figure 2.22: Results from the inattention paradigm. Subjects perform
better than chance at recognising location, colour, and number of elements
but not shape [Mack and Rock 1998].
Figure 2.23: Critical stimulus at fixation: noncritical and critical stimulus
displays [Mack and Rock 1998].
Figure 3.1: The goal of realistic image synthesis: an example from
photography [Stroebel et al. 1986].
Figure 3.2: Light transmitted through a material.
Figure 3.3: Light absorbed by a material.
Figure 3.4: Light refracted through a material.
Figure 3.5: Light reflected off a material in different ways, from left to
right, specular, diffuse, mixed, retro-reflection and finally gloss [Katedros
2004].
Figure 3.6: The differences between a simple computer-generated polyhedral cone (left); linearly interpolated shading to give the appearance of curvature (Gouraud shading), note the Mach bands at the edges of the faces (middle); and a more complex shading calculation interpolating curved surface normals (Phong shading), which is necessary to eliminate the Mach bands (right).
Figure 3.7: Graphical Depiction of the rendering equation [Yee 2000].
Figure 3.8: Ray tracing.
Figure 3.9: Radiosity [McNamara 2000].
Figure 3.10: Relationship between two patches [Katedros 2004].
Figure 3.11: Nusselt’s analog. The form factor from the differential area
dAi to element Aj is proportional to the area of the double projection onto
the base of the hemisphere [Nusselt 1928].
Figure 3.12: The hemicube [Langbein 2004].
Figure 3.13: The difference in image quality between ray tracing (middle)
and radiosity (right hand image).
Figure 3.14: Renderings of a simple environment. Ray traced Solution (left),
Radiosity Solution (center), and Radiance Solution (right) [McNamara
2000].
Figure 3.15: Conceptually how a perceptually-assisted renderer makes use
of a perceptual error metric to decide when to halt the rendering process [Yee
2000].
Figure 3.16: Block structure of the Visual Difference Predictor [Prikryl and
Purgathofer 1999].
Figure 3.17: An Overview of the Visual Difference Predictor –
demonstrating in more detail the ordering of the processes that are involved
[Yee 2000].
Figure 3.18: (a) shows the computation at 346 seconds, (b) depicts the
absolute differences of pixel intensity between the current and fully
converged solutions, (c) shows the corresponding visible differences
predicted by the VDP, (d) shows the fully converged solution which is used
as a reference [Volevich et al. 2000].
Figure 3.19: The original Stanford Bunny model (69,451 faces) and a
simplification made by Luebke and Hallen’s perceptually driven system
(29,866 faces). In this view the user’s gaze is 29° from the centre of the
bunny [Luebke and Hallen 2001].
Figure 3.20: Watson et al.’s experimental environment as seen with the
coarse display [Watson et al. 1997b].
Figure 3.21: An example of McConkie’s work with an eye linked multiple
resolution display [McConkie and Loschky 1997].
Figure 3.22: How the saliency map is created from the feature maps of the
input image [Itti 2003a].
Figure 3.23: Diagrams to show how the saliency model has inhibited the
first fixational point that the system has highlighted as the most salient target,
so the next most salient point can be found. Left – image showing the first
fixation point, right – the corresponding saliency map with the fixation point
inhibited [Itti 2003b].
Figure 3.24: (a) Original Image (b) Image rendered using the Aleph map (c)
Saliency map of the original image (d) Aleph map used to re-render the
original image [Yee 2000].
Figure 3.25: General architecture of the attention model. The bottom-up
component is a simplified version of the model developed by Itti et al.
[1998]. The top-down component was added to compensate for task
dependent user interaction [Haber et al. 2001].
Figure 3.26: Data flow during rendering for the Haber et al. model. [Haber
et al. 2001].
Figure 4.1: Pixar’s ‘Renderfarm’ is used to render their films [Pixar 2004].
Figure 4.2: Table to show the aspects listed in each of the images by Judges
1 and 2.
Figure 4.3: Images 1 to 7, used initially to work out the central and marginal
interest aspects in the scenes with the judges.
Figure 4.4: Table to show the aspects listed in each of the images by Judge 3.
Figure 4.5: Images 1 to 21, used to work out the central and marginal
interest aspects in the scenes with Judges 4 – 9, as well as in the final
experiment.
Figure 4.6: Table to show the aspects listed in each of the images by Judge 4
(the rest of the results are contained in appendix A.1).
Figure 4.7: Table to show the aspects that were altered in each of the
images.
Figure 4.8: a) Original Image b) Modified Image - here a Marginal Interest
aspect of the scene has been replaced with a low quality rendering (the left
hand wall with light fitting), thus a rendering alteration has been made.
Figure 4.9: a) Original Image b) Modified Image - here a Central Interest
aspect of the scene has been removed in a presence alteration (the wine
glass).
Figure 4.10 (a & b): Photographs demonstrating the experiments being run.
Figure 4.11: a) High quality image. b) High quality image with
mudsplashes. c) Selective quality image (look at the surface of the tabletop
compared to Figure 4.11a) d) Selective quality image with mudsplashes. e)
The ‘medium’ grey image used in the flicker paradigm. f) The ordering of
the images for the two paradigms.
Figure 4.12: Overall results of the three different types of alterations made
during the experiment, Rendering, Location and Presence.
Figure 4.13: Number of cycles of the images needed to detect the rendering
quality change in the Flicker paradigm.
Figure 4.14: Number of cycles of the images needed to detect the rendering
quality change in the Mudsplash paradigm.
Figure 4.15: a) Original Image 7 b) Modified Image 7 - here a Marginal
Interest aspect of the scene has been replaced with a low quality rendering,
the whole of the tiled floor has been replaced.
Figure 4.16: Comparison of the results produced in this experiment with the
results reproduced from Rensink et al. [1999] for location and presence with
the mudsplash paradigm.
Figure 4.17: Full results of the statistical analysis using unrelated t-test for
significance.
Figure 5.1: Effects of a task on eye movements. Eye scans for observers
examined with different task instructions; 1. Free viewing, 2. Remember the
central painting, 3. Remember as many objects on the table as you can, and
4. Count the number of books on the shelves.
Figure 5.2: Close up of the same mug showing the pencils and paintbrushes,
(each room had a different number of pencils and paintbrushes).
Figure 5.3: (a) High Quality (HQ) image (Frame 26 in the animation).
Figure 5.3: (b) Low Quality (LQ) image (Frame 26 in the animation).
Figure 5.3: (c) Selectively rendered (CQ) image with two Circles of high
Quality over the first and second mugs (Frame 26 in the animation).
Figure 5.3: (d) Close-up of High Quality rendered chair and the Low
Quality version of the same chair.
Figure 5.4: Calculation of the fovea and blend areas.
Figure 5.5: Selective Quality (SQ) frame where the visual angle covered by
the fovea for mugs in the first two rooms, 2 degrees (green circles), is
rendered at High Quality and then is blended to Low Quality at 4.1 degrees
(red circles).
Figure 5.6a: Conditions tested. The orderings of the two animations shown for the experiments were: HQ + HQ, HQ + LQ, LQ + HQ, HQ + SQ or SQ + HQ.
Figure 5.6b: The orderings of the conditions for randomisation in the
experiment.
Figure 5.7: Images to show the experimental setup.
Figure 5.8: Image to give location of the first mug – to focus the observer’s
attention.
Figure 5.9: Image to show participants of the experiment filling out the
questionnaire after completion of the viewing of the two animations.
Figure 5.10: Experimental results for the two tasks: Counting the pencils and
simply watching the animations (free for all).
Figure 5.11 a): How observant were the participants: colour of the carpet
(outside the foveal angle).
Figure 5.11 b): How observant were the participants: colour of the mug
(inside the foveal angle).
Figure 5.12: Full results of statistical analysis using Chi-square test (X2) for
significance.
Figure 5.13: An eye scan for an observer counting the pencils. The green
crosses are fixation points and the red lines are the saccades.
Figure 5.14: An eye scan for an observer who was simply watching the
animation.
Figure 5.15: The orderings of the conditions for randomization in the
experiment.
Figure 5.16: Experimental results for the three tasks: Simply watching the
animations, the Visual task: Counting the pencils and the Non-visual task:
Counting backwards from 1000 in steps of 2.
Figure 5.17 (a): How observant were the participants depending on the task:
Colour of the mug.
Figure 5.17 (b): How observant were the participants depending on the task:
Colour of the carpet.
Figure 5.18: Full results of statistical analysis using t-test for significance.
Figure 5.19: An eye scan for an observer counting backwards. The green
crosses are fixation points and the red lines are the saccades.
Figure 5.20: Images to show the initial experimental scene developed in
Alias Wavefront Maya.
Figure 5.21: Image to show the final experimental scene developed in
Radiance rendered with a sampling resolution of 3072x3072, in the
experiment this is referred to as High Quality (HQ).
Figure 5.22: Results from the pilot study: determining a consistently
detectable rendering resolution difference.
Figure 5.23: Sampling resolutions: a(left) 3072x3072 (HQ), b(right)
1024x1024(LQ).
Figure 5.23: Sampling resolutions: c(left) 768x768, d(right) 512x512.
Figure 5.24: Selective Quality (SQ) image showing the high quality
rendered circles located over the teapots (black).
Figure 5.25: The circles in the viewing plane from which the high quality
fovea circles are generated.
Figure 5.26: The high quality fovea circles (left) are then composited
automatically with the low quality image, adding a glow effect (blend)
around each circle to reduce pop out effects, resulting in the Selective
Quality image (SQ) (right).
Figure 5.27a: The three different types of images being tested. The orderings of image pairs shown in the experiment were: (1) HQ+HQ, (2) HQ+LQ, (3) LQ+HQ, (4) HQ+SQ and (5) SQ+HQ.
Figure 5.27b: The orderings of the conditions for randomisation in the
experiment.
Figure 5.28: Image to show the experimental setup.
Figure 5.29: Experimental results for the two tasks: counting the teapots vs.
simply looking at the images.
Figure 5.30: Experimental results for asking the participants what objects
there were in the scene, for the counting teapots criteria only.
Figure 5.31: List of the objects that were and were not in the scene.
Figure 5.32: Full results of statistical analysis using t-test for significance.
Figure 5.33: An eye scan for an observer counting the teapots. The X’s are
fixation points and the lines are the saccades.
Figure 5.34: Perceptual difference between SQ and LQ images using VDP
[Daly 1993]. Red denotes areas of high perceptual difference.
Figure 6.1: A framework for progressive refinement of animation frames
using task-level information.
Figure 6.2: A frame from our renderer with no refinement iterations at all.
Figure 6.3: The same frame as Figure 6.2 after the IBR pass, but with no
further refinement.
Figure 6.4: CSFs for different retinal velocities [Daly 1998].
Figure 6.5: Smooth pursuit behaviour of the eye. The eye can track targets reliably up to a speed of 80.0 deg/sec, beyond which tracking is erratic [Daly 1998].
Figure 6.6: A frame from our task-based animation.
Figure 6.7: Initial frame error.
Figure 6.8: Initial error conspicuity.
Figure 6.9: Final frame samples.
Figure 6.10: Standard rendering taking the same time as Figure 6.5, i.e. two
minutes.
Figure 6.11: Standard rendering taking 7 times that of Figures 6.5 and 6.9,
i.e. 14 minutes.
Figure 6.12: Perceptual differences using VDP [Daly 1993]. Red denotes
areas of high perceptual difference. a) Visible differences between a frame
with no iterations (Figure 6.2) and a frame after the IBR pass with no further
refinement (Figure 6.3), b) Visible differences between a frame after the IBR
pass with no further refinement (Figure 6.3) and a final frame created with
our method in 2 mins (Figure 6.6), c) Visible differences between a final
frame created with our method in 2 mins (Figure 6.6) and a standard
rendering in 2 mins (Figure 6.10), and d) Visible differences between a final
frame created with our method in 2 mins (Figure 6.6) and a standard
rendering in 14 mins (Figure 6.11).
Figure 6.13 a (left): Quincunx sampling, used to find the initial sample locations; the other pixels are sampled when determined necessary by the algorithm. b (right): Visual sensitivity threshold of spatial frequencies [Bouville et al. 1991].
Figure 6.14: Pie chart to show where the two minutes are spent in rendering
the frame.
Figure 7.1: Hypothesis on how saliency, task and visible difference
methodologies might be combined in terms of their priority of rendering.
Figure 7.2: Lorenzo Lotto ‘Husband and Wife’ (c. 1543). Note the
incredible detail of the table cloth which attracts the viewer’s attention more
than the rest of the scene [Hockney and Falco 2000].
Chapter 1
Introduction
Computer Graphics: 1. Graphics implemented through
the use of computers. 2. Methods and techniques for
converting data to or from graphic displays via computers. 3.
The branch of science and technology concerned with methods
and techniques for converting data to or from visual
presentation using computers.
American National Standard for Telecommunications, [ANST 2003].
The power of images is summarised by Confucius’ saying ‘One picture is worth a thousand words’. It is, therefore, hardly surprising that computer graphics has advanced rapidly in the last 50 years; Machover Associates Corporation predicts that the worldwide revenue for commercial/industrial computer graphics applications will be $108.7 billion for 2003, rising to $171.1 billion for 2008 [Machover 2003]. This has led
to an exponential growth in the power of computer hardware and the increased
complexity and speed of software algorithms. In turn, this has created the ability to
render highly realistic images of complex scenes in ever decreasing amounts of time.
The realism of rendered scenes has been increased by the development of lighting
models, which mimic the distribution and interaction of light in an environment. These
methods ensure, to a high degree, the physical accuracy of the images produced and
give what is known as a photo-realistic rendering [Ward Larson and Shakespeare 1997].
Photo-realism [Ferwerda 2003] is achieved when an image produces the same visual response as the actual scene it is trying to re-create, for example the images in Figure 1.1.
Figure 1.1: Some examples of photo-realistic images created in Radiance
[Ward Larson and Shakespeare 1998].
One major goal of virtual reality and computer graphics is achieving these
photo-realistic images at real-time frame rates. On modern computers creating such
images may take a huge amount of computational time, for example the images shown
in Figure 1.1 all took over 3 hours to render on a 1GHz Pentium processor, and
although improvements in basic rendering hardware and algorithms have produced
some remarkable results, it is still impossible, to date, to render highly realistic imagery
in real-time [Chalmers and Cater 2002].
Real-time rendering is concerned with computing and displaying images rapidly
on the computer. An image appears on the screen, the viewer acts or reacts, and this
feedback affects what is generated next. This cycle of reaction and rendering happens at
a rapid enough rate that the viewer does not see individual images, but rather becomes
immersed in a dynamic, smooth process. The rate at which images are displayed is
measured in frames per second (fps) or Hertz (Hz). At one frame per second, there is
little sense of interactivity; the user is painfully aware of the arrival of each new image.
At around 6 fps, people start to feel a basic sense of interactivity [Akeine-Moller and
Haines 2002]. An application displaying at 15 fps is certainly real-time; the user can
then focus on action and reaction. There is, however, an upper limit above which differences in the display rate are effectively undetectable; this occurs at about 72 fps and above [Akeine-Moller and Haines 2002].
Still, there is more to real-time rendering than interactivity. Rendering in real-time normally means three-dimensional rendering, and thus some sense of connection to three-dimensional space; more recently this has come to incorporate graphics acceleration hardware and realistic rendering software as well as interactivity. While hardware dedicated to three-dimensional graphics has been available on professional workstations for many years, it is only relatively recently that the use of such accelerators at the consumer level has become possible [Kirk 2003].
Figure 1.2 shows some examples of the ‘blocky graphics’ that can be achieved
at interactive rates. It is therefore necessary to explore other methods by which
rendering times can be decreased, without reducing the desired perceived rendering
quality. One obvious way to solve this challenge is by using far more powerful
computers, but economic constraints often preclude this.
Figure 1.2: Some examples of the ‘blocky graphics’ that can be achieved at interactive
rates in the Explorer 1, a six degrees of freedom single-seater simulator.
In many cases significant effort is spent on improving details the viewer will
never even notice. From the early days of flight simulation, researchers have studied
what parts of a scene or image are most likely to be noticed in an interactive setting. If it
is possible to find a way to apply effort selectively to the small number of regions a
viewer attends in a given scene, then the perceived quality can be improved without
paying the full computational price. Most of the research in this field has attempted to
exploit gaps in low-level visual processing, similar to JPEG and other image
compression schemes [Bolin and Meyer 1998], or it has been used to reduce the
bandwidth requirements for low bitrate video teleconferencing [Yang et al. 1996].
One way of producing perceptually high-quality images is to exploit the side
effects of human visual processes, for the human eye is good but it isn’t perfect! This
thesis considers how the eye’s inability to perceive the details of certain objects in
images might be used to reduce the level of rendering detail and thus computational
time, without the change being perceptible to the viewer. In practice, as this thesis will
show, the perception of rendering quality in a virtual environment depends upon the
user and the task the user is performing in that environment. Most computer graphics
serve some specific visual task – telling a story, advertising a product, playing a game,
or simulating an activity such as flying. In the majority of cases, objects relevant to the
task can be identified in advance, and this can be exploited as the human visual system
focuses its attention on these objects at the expense of other details in the scene.
Visual attention is therefore the process by which humans select a portion of the
available visual information for localisation, identification and understanding of objects
in the environment. It allows the human visual system to process visual input
preferentially by shifting attention about an image, giving more attention to salient
locations and less attention to unimportant regions. When attention is not focused onto
items in a scene they can literally go unnoticed [Mack and Rock 1998].
Pioneering work in the 1890s showed that there are two general human visual
processes, termed bottom-up and top-down, which determine where humans locate their
visual attention [James 1890]. The bottom-up process is purely stimulus driven, whilst
the top-down process is directed by a voluntary control process that focuses attention
onto one or more objects that are relevant to the observer’s goal. This will be discussed
in much greater detail in the perceptual background chapter. It is precisely this top-down processing of the human visual system while performing a task that is exploited in
this thesis to significantly reduce computational time while maintaining high perceived
image quality in virtual environments.
This thesis shows by means of psychophysical experiments that it is possible to
render scene objects not related to the task at hand at a lower resolution without the
viewer noticing any reduction in quality. These findings are taken advantage of in a
computational framework that applies high-level task information to deduce error
visibility in each frame of a progressively rendered animation. By this method, it is
possible to generate perceptually high quality animated sequences at constant frame
rates in a fraction of the time normally required. A key advantage to this technique is
that it only depends on the task, not on the viewer. Unlike the foveal detail rendering
used in flight simulators, there is no need for eye-tracking or similar single viewer
hardware to enable this technology, since attentive viewers participating in the same
task will employ similar visual processes. This thesis demonstrates a perceptual
rendering framework, which is built on these principles, and shows it working for an
actual application, a walkthrough of a submarine where the task is to imagine being a
fire marshal whose job it is to check the usability of the fire extinguishers and lanterns.
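To give a concrete flavour of the idea before the framework is developed in Chapters 5 and 6, the following minimal sketch (illustrative only, with hypothetical names and a simple linear blend) assigns each pixel a quality weight from the screen positions of the task-relevant objects: full quality within a 2 degree foveal circle around each object, falling to low quality beyond roughly 4.1 degrees.

    import math

    def quality_map(width, height, task_objects_px, px_per_degree,
                    inner_deg=2.0, outer_deg=4.1):
        """Per-pixel quality weight in [0, 1]: 1.0 = full quality, 0.0 = lowest.

        task_objects_px: (x, y) pixel positions of the task-relevant objects.
        px_per_degree:   pixels covered by one degree of visual angle; depends
                         on screen size, resolution and viewing distance
                         (a hypothetical calibration value).
        """
        inner = inner_deg * px_per_degree   # full quality inside the foveal circle
        outer = outer_deg * px_per_degree   # fully low quality beyond the blend band
        qmap = [[0.0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                # distance to the nearest task-relevant object
                d = min(math.hypot(x - ox, y - oy) for ox, oy in task_objects_px)
                if d <= inner:
                    q = 1.0
                elif d >= outer:
                    q = 0.0
                else:
                    q = 1.0 - (d - inner) / (outer - inner)  # linear blend region
                qmap[y][x] = q
        return qmap

    # Example: a 640x480 frame, two task objects, roughly 32 pixels per degree.
    qm = quality_map(640, 480, [(200, 300), (420, 260)], px_per_degree=32.0)

A renderer can then spend its sampling resolution or refinement iterations in proportion to this weight; the selective quality images and the framework of Chapter 6 develop this idea in a more principled way.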
1.1 Thesis Outline
This thesis is divided into a number of sections.
Chapter 2: Perceptual Background
This chapter first describes the aspects of the human visual system that are particularly
relevant to this research along with the relevant literature that has been done from the
anatomical and psychophysical point of view. It also covers the two main side-effects of
the Human Visual System (HVS) that this thesis investigated – Change Blindness and
Inattentional Blindness.
Chapter 3: Rendering Realistic Computer Graphical Images
Some fundamental terms for the synthesis of computer graphical images are discussed
in this chapter, starting with defining light and its properties and subsequently
describing the different illumination models that can be used to render computer
graphical images. This chapter also reviews the relevant literature and research that has
been done in this area from the perceptually-based rendering and image quality metrics
side.
Chapter 4: Change Blindness
In this chapter the set-up is described for the first experiment that was run to determine
whether or not Change Blindness could be recreated under controlled conditions with
computer generated images. It goes on to discuss the results that were achieved and how
lessons learnt in designing and performing psychophysical experiments led to
improvements in the experimental methodology.
Chapter 5: Inattentional Blindness
Change Blindness led to investigating similar flaws in the HVS including that of
Inattentional Blindness. A similar experiment was designed and carried out as in
Chapter 4, but with Inattentional Blindness being the focus for the methodology.
During discussions with psychologists a question was raised: ‘what would happen if the task that was given to the observers was not a visual one?’ To this there was no answer, and thus a new experiment was designed to resolve this quandary;
the results from this experiment are then discussed in the second part of this chapter. To
confirm that the results obtained in the first part of Chapter 5 were indeed due to
Inattentional Blindness, it was important to discount any likely effects due to peripheral
vision. Thus the last section in Chapter 5 presents the results from this experiment and
shows that this thesis tested Inattentional Blindness, not just peripheral vision, by
showing that even when observers are fixated on low quality objects they simply do not
perceive the quality difference if the objects are not related to the task at hand.
Chapter 6: Selective Rendering Framework
This chapter uses the results from the psychophysical experiments to implement a
selective renderer based on these principles. The selective rendering framework proposed and demonstrated is, in theory, applicable to other rendering techniques such as ray tracing, radiosity and multi-pass hardware rendering, not just Radiance [Ward Larson and Shakespeare 1998], which is the rendering system that was used to demonstrate the principle for this thesis.
Chapter 7: Conclusion & Future Work
Chapter 7 draws conclusions from this research, discussing what the main contributions
of this research are to the field, as well as describing some other applications for its use.
It then concludes by considering some possible future avenues for this research.
Chapter 2
Perceptual Background
Before going into depth about how visual perception has been used in computer graphics, it is important to cover how the eye turns light rays into images that humans can perceive, as well as the limits of the human visual system that are exploited in this thesis. In this chapter the work that has previously been done in visual perception is discussed, along with what its results show.
2.1 The Anatomy of Sight - The Human Visual System (HVS)
Human vision is a complex process that requires numerous components of the human
eye and brain to work together. The eye is made up of a fibrous protective globe, the
sclera, which has an area that is transparent anteriorly, the cornea, Figure 2.1. The
sclera is lined by the choroid, which is a vascular, highly pigmented layer that absorbs
light waves; all the other remaining structures within the eyeball refract light towards
this layer. The layer in front of the choroid is called the retina and contains
photoreceptor cells and associated neurons. Light rays enter the eye through the pupil,
travelling through the lens, vitreous humor and aqueous humor to converge on a focal point on the retina. Focused images are, therefore, captured on the retina, much
like the film’s role in photography.
Figure 2.1: The human eye [JMEC 2003].
Figure 2.2: The peak sensitivities of the three types of cones and the rods
[Dowling 1987].
The retina is a mosaic of two basic types of photoreceptors, rods and cones; these
translate the image into neural code, which is then transmitted to the brain for
processing. Rods are sensitive to blue-green light with peak sensitivity at a wavelength
of 498 nm [Baylor et al. 1987] and are used for vision under dark or dim conditions.
The rods are highly sensitive to light and allow the eye to detect motion. They supply
peripheral vision and facilitate vision in dim light and at night. This is due to the rods
being comprised of a single photo pigment, which also accounts for the loss in ability to
discriminate colour in low light conditions.
Cones, on the other hand, are useless in dim conditions, but do provide humans
with basic colour vision. There are three types of cones [Baylor et al. 1987]; they are L-cones (red) with a peak sensitivity of 564 nm, M-cones (green) with a peak sensitivity
of 533 nm, and S-cones (blue) with a peak sensitivity of 437 nm, as shown in Figure 2.2
[Dowling 1987]. Cones are highly concentrated in a region near the centre of the retina
called the macula or parafovea (Figure 2.3 and Figure 2.4). This macula is the small,
yellowish central portion of the retina, and it is the area providing the clear, distinct
vision. Its field is 5 degrees in diameter. The very centre of the macula (the central 2
degrees of the visual field, about the width of your thumb at arm’s length or about the
size of eight letters on a typical page of text) is called the fovea, which literally means
‘pit’ because it is a depression in the retina. It is the area where all of the photoreceptors
are cones, roughly 180,000 cones per square mm; there are no rods in the fovea. This
cone density decreases rapidly outside of the fovea to a value of less than 5,000 per
square mm. Due to the fact that the fovea has no rods, small dim objects in the dark
cannot be seen if one looks directly at them. For this reason, to detect faint stars in the
sky, one must look just to the side of them so that their light falls on a retinal area,
containing numerous rods, outside of the macular zone. Also the fovea is almost entirely
devoid of S-cones (blue) photoreceptors (Figure 2.5). The central 30 degrees of the
visual field is focal vision. This is the area that people use to view the world, making
eye movements, if necessary, to bring images onto the fovea. To view something
outside the focal area, the viewer will generally turn his/her head rather than simply move
the eyes. The rest of the visual field is ambient vision and is used to maintain spatial
orientation.
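As a rough illustration of what these angular extents mean in practice (a back-of-the-envelope sketch with an assumed viewing distance and pixel density, not a calibration used later in this thesis), the physical size subtended by the 2 degree fovea and the 5 degree macula can be computed from the viewing distance:

    import math

    def visual_angle_to_size(angle_deg, viewing_distance_cm):
        """Physical extent (cm) subtended by a visual angle at a given distance."""
        return 2.0 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2.0)

    # At an assumed 60 cm viewing distance:
    fovea_cm  = visual_angle_to_size(2.0, 60.0)    # ~2.1 cm (foveal field)
    macula_cm = visual_angle_to_size(5.0, 60.0)    # ~5.2 cm (macular field)

    # On a display with, say, 40 pixels per cm, the fovea covers only an
    # ~84-pixel-wide patch of the image at any one instant.
    fovea_px = fovea_cm * 40.0

In other words, at any given moment only a small patch of a displayed image is seen with full acuity; the rest relies on lower-resolution peripheral vision and on subsequent eye movements.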
The light rays that are captured by the photoreceptors are converted into electrical
impulses and are then sent to the brain for processing, via the optic nerve (Figure 2.6)
(about one million nerve fibres for each eye). Due to the presence of this massive
bundle of nerve fibres, there are no photoreceptors at the location where it leaves the
eye. As a result, a small area of the visual field is not represented by the retina, about 5
degrees of visual angle in diameter and at 15 degrees of eccentricity from the fovea on
the temporal side of the visual field. Although you are not aware of the presence of your
blind spots, they are easy to find: with one eye closed, fixate a point in front of you.
Without moving your open eye, move your finger at arm’s length in front of you, from
the fixation point towards the periphery, in the horizontal plane. At an angle of about 15
degrees, your finger will disappear. At smaller and larger angles, you will be able to see
your finger. Or close your right eye, fixate on the cross, shown in Figure 2.7, with your
left eye and move the image slowly away from your face. At a distance of about 9
inches, the spot on the left should disappear [Lightwave 2003].
Figure 2.3: The three functional concentric areas of the visual field, the fovea, the
macula and focal vision [VEHF 2004].
Figure 2.4: The location of cone and rod peaks in eccentricity from the fovea
[Osterberg 1935].
Figure 2.5: Original Image (left). How the eye actually views the image, as seen with
the right eye focused on the centre (right). Note the yellow circle in the centre, the fovea,
it is yellow due to the lack of blue cones. The black circle to its right is the location of
the optic nerve, black as there are no photoreceptors in this part of the retina, and the
degradation in visual acuity and colour towards the periphery [CIT 2003].
Figure 2.6: Photograph of the back of the retina to show the location of the fovea and
optic nerve [CS 2004].
Figure 2.7: Finding your blind spot, i.e. the location of your optical nerve
[Lightwave 2003].
The neural code transmitted by the retina travels along the optic nerve to the
visual cortex of the brain. The visual cortex then refines and interprets the incoming
neural messages, determining the size, shape, colour, and details of what humans see.
At the cortical level, cognitive and perceptual factors, such as attention, expectancy,
memory, and learned identification, influence the processing of visual information.
Therefore, seeing combines the eye’s optics, retinal function, the visual cortex, and
perception.
2.2 Visual Perception
Visual Perception is the process by which humans, and other organisms, interpret and
organize visual sensation in order to understand their surrounding environment. In other
words, visual perception is defined as the process of acquiring knowledge about
environmental objects and events by extracting information from the light they emit or
reflect. The sensory organs translate this physical light energy from the environment
into electrical signals that are processed by the brain as described in the previous
section. These signals are not understood as just pure energy but are, rather, interpreted by perception into objects, people, events and situations. For example, cameras have no perceptual capabilities at all; that is, they do not know anything about the scenes they record, and the photographic images they produce merely contain information. Sighted people and animals, in contrast, acquire knowledge about their environments from the information they receive, and it is this that is termed perception in this thesis.
Perception therefore, depends on two factors: the light coming from the world,
and our experience and expectations. When humans see, the mind actively organizes the
visual world into meaningful shapes. Figure 2.8 shows an example of how our
experience can organise the world. Look at Figure 2.8, does the mix of black and white
mean anything?
If you look at the image for long enough, the meaningless black and white bits
organise themselves into a dog. The interesting thing is that once you can see the dog you can’t see anything but the dog; your attention is always drawn to the dog upon viewing the
image. Experience in this example has moulded the viewer’s perception. Figure 2.9
shows a similar example; it is called a reversible figure because it can be seen two ways,
as a young lady or as an old woman. The important point here is that people do not
passively see the real world. The human visual system does not work like a camera,
which objectively records some reality on film; instead humans see a version of the real
world that is greatly influenced by experiences and expectations.
Figure 2.8: Image to demonstrate the concept of experience on perception [OL 2003].
Figure 2.9: Can you see a young woman or an old hag? [OL 2003]
2.2.1 Visual Acuity of the Human Eye
The eye has a visual acuity threshold below which an object will go undetected. This
threshold varies from person to person, but as an example, the case of a person with
normal 20/20 vision can be considered. An optician can test visual acuity by asking the
patient to read from 20 feet away a Snellen eye chart, Figure 2.10, which was invented
by Dr. Snellen of Utrecht, Holland. This chart presents a series of high contrast, black
letters on a white background, in a range of sizes. An individual who can resolve letters
approximately one inch high at 20 feet is said to have 20/20 visual acuity. If an
individual has 20/40 acuity, he or she requires an object to be at 20 feet to visualise it
with the same resolution as an individual with 20/20 acuity would when the object was
at 40 feet, i.e. the top number of the visual acuity fraction refers to the distance you
stand from the chart and the bottom number indicates the distance at which a person
with normal eyesight could read the same line you correctly read.
Figure 2.10: An example of a Snellen Chart used by opticians to test a person’s
visual acuity [MD support 2004].
A vision of 20/20 can also be stated as the ability to resolve a spatial pattern
separated by a visual angle of one minute of arc. Since one degree contains sixty
minutes, a visual angle of one minute of arc is 1/60 of a degree. The spatial resolution
limit is derived from the fact that one degree of a scene is projected across 288
micrometers of the retina by the eye’s lens. In this 288 micrometers dimension, there are
120 colour sensing cone cells packed. Thus, if more than 120 alternating white and
black lines are crowded side-by-side in a single degree of viewing space, they will
appear as a single grey mass to the human eye.
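The arithmetic behind this limit can be checked directly; the short sketch below simply restates the figures quoted above (288 micrometres and roughly 120 cones per degree) and is offered only as a worked illustration.

    # 20/20 acuity is commonly stated as resolving detail on the order of
    # one minute of arc (1/60 of a degree).  Checking the figures above:
    microns_per_degree = 288.0      # retinal extent of one degree of the scene
    cones_per_degree   = 120        # foveal cones packed across that extent

    microns_per_cone = microns_per_degree / cones_per_degree         # ~2.4 um

    # Resolving a grating needs at least one cone for a light bar and one for
    # a dark bar, i.e. two cones per cycle, so at most ~60 cycles fit per degree.
    max_cycles_per_degree = cones_per_degree / 2                      # 60

    # One full light-plus-dark cycle therefore subtends about one minute of arc,
    # which is why more than 120 alternating lines per degree fuse into grey.
    arcmin_per_cycle = 60.0 / max_cycles_per_degree                   # 1.0
    print(microns_per_cone, max_cycles_per_degree, arcmin_per_cycle)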
2.2.2 Contrast Sensitivity
Contrast sensitivity is a method for determining the visual system’s capability to filter
spatial and temporal information about the objects that humans see. Contrast sensitivity
testing differs from visual acuity testing, which only evaluates vision at one point of the
higher spatial frequencies. The Snellen chart represents high contrast (black and white)
and small objects under ideal lighting conditions. Contrast sensitivity testing evaluates
the full spectrum of vision from high to low contrast and from small to large objects,
providing a comprehensive measure of functional vision. Contrast sensitivity is defined
by the minimum contrast required to distinguish between a bar pattern and a uniform
background. There are two elements to contrast sensitivity:
• Contrast or illumination - In a sensitive visual system, little contrast between
light and dark bars (low contrast) is necessary for the viewer to see the pattern.
In a less sensitive visual system, a large difference in illumination (high
contrast) is necessary before the bar pattern is recognisable. Thus contrast is
created by the difference in luminance (the amount of reflected light) reflected
from two adjacent surfaces. The contrast threshold is the minimum contrast
required for pattern detection, whilst the average luminance is kept constant
from one pattern grid to another.
• Spatial frequency - One dark and one light band in a grid pattern is called a
cycle, as can be seen in Figure 2.11b. The spatial frequency is the number of
cycles subtending one degree of visual angle at the observer’s eye. A low spatial
frequency consists of wide bars; a high spatial frequency of narrow bars. The
human eye is most sensitive to 4 - 5 cycles per degree of visual angle [Palmer
1999]. Visual sensitivity peaks at mid-spatial frequencies, with less sensitivity at
both the higher and lower spatial frequencies.
Contrast sensitivity is thus the reciprocal of the contrast at threshold, i.e. one
divided by the minimum amount of contrast needed to see the pattern. Contrast
thresholds for various spatial frequencies can be measured and graphed on a plot of
contrast sensitivity vs. spatial frequency. Such a graph is called the Contrast Sensitivity
Function (CSF), as illustrated in the Figures 2.11a and 2.11b. Contrast at the lower
spatial frequencies demonstrates how well the observer perceives shapes and large
objects. Contrast at the higher spatial frequencies demonstrates the observer’s ability to
see lines, edges and fine detail, Figure 2.12. Points below the CSF are visible to the
observer (those are the points that have even higher contrasts than the threshold level).
Points above the CSF are invisible to the observer (those are the points that have lower
contrasts than the threshold level).
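To make this threshold test concrete, the short sketch below pairs the definition of contrast with an analytic CSF approximation due to Mannos and Sakrison; this particular formula and its scaling constant are illustrative assumptions and are not the sensitivity model adopted later in this thesis.

    import math

    def csf_mannos_sakrison(f_cpd):
        """Relative contrast sensitivity at spatial frequency f (cycles/degree).

        Mannos and Sakrison analytic approximation, roughly normalised so the
        peak is about 1.0; used purely as an illustrative stand-in for a
        measured CSF.
        """
        return 2.6 * (0.0192 + 0.114 * f_cpd) * math.exp(-(0.114 * f_cpd) ** 1.1)

    def michelson_contrast(l_max, l_min):
        """Contrast of a grating from its peak and trough luminances."""
        return (l_max - l_min) / (l_max + l_min)

    def is_visible(contrast, f_cpd, peak_sensitivity=250.0):
        """Predict visibility: contrast must exceed the threshold 1/sensitivity.

        peak_sensitivity is a hypothetical scale factor mapping the normalised
        CSF onto absolute sensitivities (it varies with observer and luminance).
        """
        sensitivity = peak_sensitivity * csf_mannos_sakrison(f_cpd)
        return contrast > 1.0 / sensitivity

    # A faint grating near the eye's most sensitive range is predicted visible,
    # while the same contrast at a much higher spatial frequency is not.
    print(is_visible(0.01, 4.0), is_visible(0.01, 40.0))    # True False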
2.2.3 Eye Movements
Humans move their eyes on average 100,000 to 150,000 times every day; these movements are called saccades. One reason for this is to stimulate the photoreceptors, for if they do not receive a continual input of new light rays they become inactive and simply do not perceive anything [CS 2003; SVE 2003]. Figure 2.13 is an example of an after effect or negative image. It is difficult to make out exactly what the black and white image on the left
portrays. However, if you stare intently at the image without moving your eyes from the
four dots in the centre for at least 60 seconds, and then look at the white right hand side
of the image you should immediately recognize the image that your eyes tell you is
located on the white side of the image although it doesn’t really exist at all. Afterimages work on fatigue of your eyes when staring at the same spot for a long time
without moving them. The areas of the retina that are fatigued from the image are not as
capable of reading those colours as the rest of the eye’s imaging area so they show you
a ‘negative’ image.
Figure 2.11b: The Campbell-Robson
chart to show Contrast sensitivity Vs.
Spatial frequency. Spatial frequency
increases from left to right (i.e. the size
of the bars decrease), whilst contrast
sensitivity increases from the bottom of
the figure to the top, and thus the target
contrasts decrease [Prikryl and
Purgathofer 1999].
Figure 2.11a: Contrast sensitivity Vs.
Spatial frequency [Daly 1993].
Figure 2.12: The effect of contrast sensitivity on a human’s vision (normal left) [CS
2003].
Figure 2.14 is another example, stare at the centre of the red bird for 30 seconds
and then quickly stare at the birdcage. You should see a bluish-green (cyan) bird in the
cage. As with the previous example, when you stare at the red bird, the image falls on
one region of your retina. The red-sensitive cells in that region start to grow tired, and
thus stop responding as strongly to red light. When you suddenly shift your gaze to the
cage, the fatigued red-sensitive cells don’t respond to the reflected red light, but the blue
and green cones respond strongly to the reflected blue and green light. As a result,
where the red cones don’t respond you see a bluish-green bird in the cage.
Figure 2.13: A demonstration of a negative image due to the eyes constant need for
new light stimulation [OL 2003].
Figure 2.14: A demonstration of a colour negative image [SVE 2003].
The other reason for eye saccades is to move the fovea towards the objects of
current interest, in order to examine them with the region of the retina that has finest
detail. A single saccade is usually very rapid, taking only about 150-200ms to plan and
execute [Kowler 1995]. Saccades are essentially ballistic movements; once movement
has begun its trajectory cannot be altered. If the eye misses the intended target object,
another saccade has to be made to fixate it. The ballistic movement of the eye itself is
exceedingly fast, typically only taking about 30ms and reaching up to speeds of 900
degrees per second [Goldberg et al. 1991]. Between saccades, the eyes fixate the object
of interest for a variable period of time, so that the visual system can process the optical
information available in that location [Pannasch et al. 2001]. During free viewing of a
scene, fixation durations are highly variable, ranging from less than 100 ms to several
seconds [Buswell 1935], however they are typically held for around 300ms [Rao et al.
1997]. Two attentional mechanisms are thought to govern saccadic eye movements: a
spatial system, which tells the eye where to go (the ‘where’ system) and a temporal
mechanism that tells the eye when to initiate a saccade (the ‘when’ system) [Unema et
al. 2001].
Yarbus [1967], and subsequently many others, showed that once an initial eye
saccade has found an object of interest and located it on the fovea, the eye subsequently
performs a smooth pursuit movement to keep the object in foveal vision [Palmer 1999].
Due to the fact that the image of a successfully tracked object is nearly stationary on the
retina, pursuit movements enable the visual system to extract maximum spatial
information from the image of the moving object itself. Untracked objects, both
stationary and moving objects with different velocities and directions to the target
object, are experienced as smeared and unclear because of their motion on the retina. To
experience this Palmer [1999] suggests a simple example: place your finger on this
page and move it fairly quickly from one side of the page to the other. As soon as you
track your moving finger, the letters and words appear so blurred you are unable to
read them, but your finger is clear. Even when you stop moving your finger, only the
words located within the visual angle of your fovea become sharp and thus readable.
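The practical consequence for image synthesis is that the relevant quantity is an object’s velocity on the retina rather than on the screen: a successfully tracked object is nearly stabilised, while untracked motion is smeared. The sketch below follows the smooth-pursuit compensation idea of Daly [1998]; the tracking-efficiency and drift constants are quoted here as assumptions for illustration only.

    def retinal_velocity(image_velocity, tracking_efficiency=0.82,
                         drift_velocity=0.15, max_pursuit=80.0):
        """Estimate the retinal velocity (deg/sec) of a moving image feature.

        The eye is assumed to track a target with ~82% efficiency, to drift a
        little even when fixating (~0.15 deg/sec), and to be unable to pursue
        targets faster than ~80 deg/sec; all three constants are assumptions
        in the spirit of Daly [1998].
        """
        eye_velocity = min(tracking_efficiency * image_velocity + drift_velocity,
                           max_pursuit)
        return max(image_velocity - eye_velocity, 0.0)

    # A slowly moving, tracked object is almost stabilised on the retina, so
    # fine spatial detail remains resolvable; a very fast object cannot be
    # pursued and its high retinal velocity makes fine detail imperceptible.
    print(retinal_velocity(10.0))     # ~1.65 deg/sec
    print(retinal_velocity(200.0))    # 120.0 deg/sec

Reduced spatial sensitivity at high retinal velocities is what the spatiotemporal contrast sensitivity used in the framework of Chapter 6 captures.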
2.3 Attention
Not everything that stimulates our sensory receptors is transformed into a mental
representation. Rather, humans selectively attend to some objects and events, and thus
in doing so ignore others. Failures of attention play a major role in several severe
mental disorders. Children with attention deficit/hyperactivity disorder are extremely
distractible; this is assumed to be because they cannot focus their attention on one task
and ignore the many other external stimuli that might be occurring around them.
Patients with obsessive-compulsive disorder are unable to inhibit unwanted thoughts
and impulses [TMHF 2004]. Similarly, individuals with depression and manic-depressive illness often report difficulties in focusing attention and in suppressing
unwanted thoughts [CCBFTRI 2004].
Attentional capacity varies from person to person and from time to time; drugs,
alcohol, fatigue and age lessen this control. Under these conditions, the likelihood of
noticing important events then declines. Attentional capacity is also a function of
experience. A keyboard typist learning to type might have to think about the position of
each letter he/she needs to hit on the keyboard to make up certain words and cannot let
his/her mind wander. After sufficient practice, the typist should be able to type without
looking at the keyboard, or having to think where the keys are located. Muscle memory
has taken over and the fingers know just where to go for the typing task. When humans
learn to perform tasks automatically, they need no longer pay attention to them and thus
can focus their attention onto other matters. However, automatic responses can also lead
to disastrous results. For example, there was an airline pilot who was operating an
aircraft very similar, but not identical to one that he usually flew. A fire started in one of
the engines, so he flipped the switch that he thought would cut the fuel supply.
However, in this new plane the cut-off fuel switch was in a slightly different position.
The same physical motion that set the fuel supply to off in the plane he was used to
caused the fuel flow actually to increase in the new plane. Naturally, the engine burst
into a massive fire. A beginner, who would have to think about the locations of the
appropriate switches, would probably not make that error [VEHF 2003a].
Psychologists have developed many ways to assess normal and abnormal
attention. For example, in dichotic listening experiments, subjects wear earphones and
are asked to repeat a message sent to one ear while ignoring different messages sent
simultaneously to the other ear. This task is relatively difficult when presented with
similar (e.g. both male or both female) voices, but relatively easy when the two
messages are presented in different (e.g. female and male) voices. In the latter case,
humans are greatly helped by the difference in voice quality [Cherry 1953].
Dichotic listening thus provides compelling evidence for limits on attention.
Broadbent, a researcher who performed some of the first dichotic listening experiments,
theorized that our mind can be conceived as a radio receiving many channels at once.
Each channel contains distinct sensory perceptions, as in the two auditory events in the
dichotic listening task. Due to the fact that our attention is limited, it is difficult to
spread attention thinly over several channels at once. In fact, humans only have enough
resources to attend effectively to one channel at a time. Therefore, humans need
some mechanism to limit the information that is taken in. Broadbent’s Filter Theory
was the first to fill this role, also referred to as the ‘early selection’ theory of attention
[Broadbent 1958]. This was one of the first perceptual theories to be cast in terms of
information processing operations and flow diagrams, Figure 2.15. It is called the ‘early
selection’ theory because attention is assumed to play its role early in perceptual
processing. In particular, attention is assumed to precede recognition. The basic idea is
that sensory inputs are stored in parallel in the sensory store and the attentional filter
acts to select one ‘channel’ of information for further processing. Unattended
information quickly decays from the sensory store unless the perceiver shifts the
attentional filter to that channel. Although the perceiver can pick up on some of a
stimulus’s physical characteristics preattentively, access to information about the
identity of that stimulus requires attention. Thus the selective filter acts as a switch that
allows the information from only one sensory channel to reach the higher-level
perceptual system, which in turn only has a limited capacity.
Figure 2.15: Diagram to show Broadbent’s Filter Theory of auditory attention
[Broadbent 1958].
Subsequent studies showed that attention was not quite as simple as this. Moray
[1959] found that subjects were likely to hear their own name if it was presented in the
unattended channel. For example, if you are at a party in the middle of a conversation
and someone at the next table mentions your name, it is very possible that you would
notice it despite paying your full attention to the conversation you’re having at that
moment: this is known as the cocktail party phenomenon. The cocktail-party effect also
works for other words of personal importance, such as the name of your favourite
restaurant or a movie that you just saw, or the word ‘sex’. This fact causes problems for
an early selection theory such as Broadbent’s because it suggests that recognition of
your name occurs before selection, not after it as his theory would predict. This
difficulty was overcome by Treisman’s Attenuation Theory [Treisman 1960] that
suggests that selection operates both in early and late stages of the attention process, i.e.
there are two stages to the selection model. In choosing what to pay attention to, a
selective filter firstly processes incoming information, and makes its selection according
to the physical characteristics of the information. The second stage then contains the thresholds for certain words or objects; thus, for example, your name would have a very
low threshold.
If attention operates very early, then it is unclear how the attention system can
determine what is important. Conversely, if attention operates relatively late, after a
good deal of processing has already been done, it is easy to determine what is important,
but the advantage of selection would be lost because a lot of irrelevant information will
already have been processed. The question as to whether selection takes place at an
early or a late stage of processing is an empirical question to which many experiments
have been directed [Palmer 1999].
2.3.1 Visual Attention
Visual attention is very similar to auditory attention. Visual scenes typically contain
many more objects than can ever be recognised or remembered in a single glance and
thus, as stated before, some kind of sequential selection of objects for detailed
processing is essential if humans are to cope with this wealth of information. You must
therefore be selective in what you attend to visually, and what you select will greatly
depend on your needs, goals, plans and desires.
Although there is certainly an important sense in which a beer is always a beer,
how you react to one depends a great deal on whether or not you have just finished a 10
day no-alcohol detox or are suffering from a bad hangover. After a 10-day no-alcohol
detox your visual attention would undoubtedly be drawn immediately to the beer;
however the day after a big night out when you are suffering from a hangover, you
would probably visually ignore the beer, and if you did not, the sight as well as the
smell of it might literally nauseate you. This example shows that perception is not
entirely stimulus-driven, i.e. it isn’t determined solely by the nature of the
electromagnetic radiation stimulating the sensory organs but is influenced to some
extent by cognitive constraints: higher-level goals, plans, and expectations.
Coded into the primary levels of human vision is a powerful means of
accomplishing this selection, namely the fovea, as described in Section 2.1. If detailed
information is needed from many different areas of the visual environment, it can only
be obtained by redirecting the eye so that the relevant objects fall sequentially on the
fovea.
2.3.2 Top-down versus Bottom-up processes
There are two general visual attention processes, called bottom-up and top-down, which
determine where humans locate their visual attention [James 1890]. The bottom-up
process is purely stimulus driven, for example a candle burning in a dark room; a red
ball amongst a large number of blue balls; or the lips and eyes of a human face as they
are the most mobile and expressive elements of the face. In all these cases the visual
stimulus captures attention automatically without volitional control. The top-down
process, on the other hand, is directed by a voluntary control process that focuses
attention on one or more objects that are relevant to the observer’s goal when studying
the scene. Such goals or tasks may include looking for street signs, searching for a
target in a computer game, or counting the number of pencils in a mug. In this case
attention, which is normally drawn due to conspicuous aspects in a scene, deliberately
ignores these conspicuous aspects because they are irrelevant to the goal at hand.
The study of saccadic exploration of complex images was pioneered by the
Russian psychologist Yarbus [Yarbus 1967]. By using relatively crude equipment he
was able to record the fixations and saccades observers made while viewing natural
objects and scenes. Specifically he studied saccadic records made by observers studying
an image after they were given a particular task. Each observer then viewed the scene
with that particular question or task in mind. This seeking of information of a specific kind has a significant effect on the eye-gaze pattern. To illustrate this, Yarbus instructed
several observers to answer a number of different questions concerning the depicted
situation in Repin’s picture ‘An Unexpected Visitor’ [Yarbus 1967]. This resulted in
seven substantially different patterns, each one once again being easily construable as a
sampling of those picture objects that were most informative for the answering of the
question, as shown in Figure 2.16. Land and Furneaux [1997] have also studied the role
of saccades using a variety of visuo-motor tasks such as driving, music reading and
playing ping pong. In each case the scan path of the eye was found to play a central
functional role, closely linked to the ongoing task demands.
Researchers have examined the effects of focusing human attention and have
reported that there are two main visual side effects from dividing or disrupting visual
attention; these are Change Blindness [Rensink et al. 1997] and Inattentional
Blindness [Mack and Rock 1998]. This chapter shall now discuss these two phenomena
in more depth, describing what is known about the two effects.
Figure 2.16: Effects of task on eye movements. The same picture was examined by
subjects with different instructions; 1. Free viewing, 2. Judge their ages, 3. Guess what
they had been doing before the “unexpected visitor’s” arrival, 4. Remember the clothes
worn by the people, 5. Remember the position of the people and objects in the room &
6. Estimate how long the “unexpected visitor” had been away from the family [Yarbus
1967].
2.3.3 Change Blindness
Imagine yourself walking on the street and someone stops you to ask for directions to
the train station. While you explain how to get there, the conversation is briefly
interrupted by construction workers carrying a door between you and the person. When
the workers have passed, the person you were talking to has been replaced by another
person, who carries on the conversation as if nothing happened. Would you notice that
this person is someone else? “Of course” is the intuitive answer. However, in a study
where this was actually done, the change was noticed in only 50% of the people
[Simons and Levin 1998]. Those who did not notice the change were not a special kind of people. A wealth of evidence shows that normal human beings are simply very poor at
detecting changes in the visual scene in a wide variety of situations. Levin and Simons
[1997] have concluded that ‘Our intuition that we richly represent the visual details of
our environment is illusory’. O’Regan et al. [1999a] suggest that ‘We have the
impression of simultaneously seeing everything, because any portion of the visual field
that awakens our interest is immediately available for scrutiny through an unconscious
flick of the eye or of attention’.
If a change occurs simultaneously with a brief visual disruption, such as an eye
saccade, flicker or a blink that interrupts attention, then a human can miss large changes
in their field of view; Rensink et al. have termed this phenomenon Change Blindness.
Thus ‘Change blindness is the inability of the human to detect what should be obvious
changes in a scene’ [Rensink et al. 1997]. This concept has long been used by stunt
doubles in film and has also been the reason why certain film mistakes have gone
unnoticed [Balazas 1945]. For example in the movie ‘American Pie’ in the bedroom
scene the girl is holding a clear cup full of beer. The camera goes off her and when it
comes back she is holding a blue cup. The camera then leaves her but when it returns to
focus on her again the cup is clear once more [MM 2003].
The onset of a visual disruption swamps the local motion signals caused by a change, short-circuiting the automatic system that normally draws attention to its location. Without automatic control, attention is controlled entirely by slower, higher-level mechanisms in the visual system that search the scene, object by object, until
attention finally lands upon the object that is changing. Once attention has latched onto
the appropriate object, the change is easy to see, however this only occurs after
exhaustive serial inspection of the scene [Rensink 2001].
Change Blindness has been researched from a psychological point of view –
looking into why these flaws occur and thus leading to a greater understanding of how
the visual system works [Rensink 1999a; 1999b; Noe et al. 2000; O’Regan and Noe
2000]. Although a number of paradigms have been used to study this change detection,
the three most frequently used are the ‘Flicker’ and ‘Mudsplash’ paradigms [Rensink et
al. 1997] and the ‘Forced Choice Detection’ paradigm [Pashler 1988; Phillips 1974;
Simons 1996]. In the flicker and mudsplash paradigms, an original image is displayed for approximately 240ms, followed by either a blank image (in the flicker paradigm) or the original image with mudsplashes for approximately 240ms; a modified version of the original image is then shown for 240ms, followed finally by the blank image or the modified image with mudsplashes for 240ms, Figure 2.17. The onset
of the blank image or mudsplashes recreates the visual disruption, like a blink or an eye
saccade. To create the images Rensink et al. took photographs of a variety of scenes
modifying each image by making a presence or a location alteration. The observers
responded as soon as they detected the changing object.
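To make the timing of the alternation concrete, the cycle can be sketched as a simple presentation loop. The image file names and the `present` and `detected_change` helpers below are hypothetical placeholders for an experimenter's display and response-logging routines; the 240ms durations follow the description above, and the mudsplash variant simply substitutes the mudsplashed images for the blanks.

```python
# Sketch of the flicker-paradigm alternation; 'present' and 'detected_change'
# are hypothetical display/response helpers, and the file names are placeholders.
CYCLE = [
    ("original.png", 240),   # original image A
    ("blank.png",    240),   # blank field (mudsplash variant: A with mudsplashes)
    ("modified.png", 240),   # modified image A'
    ("blank.png",    240),   # blank field (mudsplash variant: A' with mudsplashes)
]

def run_flicker_trial(present, detected_change, max_cycles=250):
    """Alternate the original and modified images until the observer reports the change."""
    for cycle in range(max_cycles):
        for image, duration_ms in CYCLE:
            present(image, duration_ms)          # show this frame for its duration
            if detected_change():                # observer pressed the response key
                return cycle                     # full cycles elapsed before detection
    return None                                  # change never detected
```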
Research using the flicker paradigm has produced two primary findings: 1)
observers rarely detect changes during the first cycle of alternation, and some changes
are not detected even after nearly one minute of alternation [Rensink et al. 1997]; and 2)
changes to objects that are of ‘central interest’ of a scene are detected more readily than
peripheral or ‘marginal interest’ changes [Rensink et al. 1997], suggesting that attention
is focused on central objects either more rapidly or more often, thereby allowing faster
change detection.
Central interest aspects tend to concern what one would be tempted to call the
main theme of the scene, whilst marginal aspects of a scene are those aspects which are
most commonly ignored in a scene. Thus by manipulating whether the changes occur to central or marginal interest aspects in the experiments, the degree of
attention that subjects were expected to pay to the changes can be controlled. In Figure
2.18a the central interest aspects are the wine, cheese, bread etc, whilst the marginal
interest aspect would be the brown, pleated curtain behind the foreground objects.
The forced choice detection paradigm, which was originally designed to
investigate visual memory, looks at the ability to detect a change in briefly presented
arrays of simple figures or letters [Pashler 1988; Phillips 1974]. In this paradigm
observers only receive one viewing of each scene before responding, so the total
duration of exposure to the initial scene can be controlled more precisely. For example,
an initial display would be presented for 100-500 ms, followed by a brief Inter-Stimulus
Interval (ISI), followed by a second display in which one of the items was removed or
replaced on half the trials. The responses of the observers are forced-choice guesses
about whether a change had occurred or not. Observers were found to be poor at
detecting change if old and new displays were separated by an ISI of more than 60-70
ms [Pashler 1988].
Figure 2.17: The experiment designed by O’Regan et al. [Rensink 2001].
Figure 2.18a & b): Examples of the modifications (presence and location) made by
O’Regan et al. to the photographs, a) (top) shows a presence alteration where the cheese
has been removed, b) (bottom) shows a location alteration where the bar behind the
people has been moved [O’Regan et al. 2001].
2.3.4 Inattentional Blindness
‘How much of the visual world do humans perceive when they are not attending to it?
Do humans see certain kinds of things because they have captured their attention, or is
this because their perception of them is independent of their attention?’ Mack and Rock
researched these questions, as well as numerous others. What they found was that there
is ‘no conscious perception without attention’ [Mack and Rock 1998]. Attention
primarily selects a region in space like a ‘spotlight’ (space-based attention), and objects
are constructed thus by virtue of attention. This refutes the previous pre-attentional
perception theories. Gestalt psychologists believe that the organisation of the visual
field into separate objects occurs automatically at an early stage in the processing of
visual information, i.e. before focused attention can occur [Wertheimer 1924/1950].
What Mack and Rock found also refutes the idea that attention is inherently intentional, i.e. that it must be directed at some thing. Psychologists such as Treisman and Neisser believed that objects must exist prior to the activation of attention, so that attention directly selects objects (object-based attention), and attention is thus limited by the number of objects that can be processed at once [Treisman 1982; Neisser 1967].
Most people will agree that they have experienced occasions when this is not the
case, for instance they have experienced looking without seeing. A car driver looks left
down a pavement and pulls forward into a driveway. She hears a thud, looks down and
sees a bicyclist on the ground near her left front bumper. Or a nurse pulls a vial from a
cabinet. She looks at the label, fills the syringe and then injects the patient. The patient
receives the wrong drug and dies. These are real accidents that have occurred and a
large number of others occur under strikingly similar circumstances: someone
performing a task simply fails to see what should have been plainly visible. Afterwards,
the person cannot explain the lapse.
Such lapses most commonly occur when a person is completely absorbed in a task or concentrating intensively. Under these conditions humans can miss unattended items in a scene; this is called Inattentional Blindness and it is
‘the failure of the human to see unattended items in a scene’ [Mack and Rock 1998].
During these short periods of time, even though our eyes are open and all the objects in
a scene are projected onto our retina, humans perceive very little detail about them, if
anything. This is because visual attention allows the visual system to process visual
input preferentially by shifting attention about an image, giving more attention to salient
locations and less attention to unimportant regions. When attention is not focused onto
items in a scene they can literally go unnoticed, which is what Mack and Rock state as
Inattentional Blindness. Although the phenomenon has long been known, recent
evidence shows that it is much more pervasive than anyone had imagined and that it is
one of the major causes of accidents and human error.
To understand how Inattentional Blindness occurs, it is necessary to accept a very
unintuitive idea: most of our perceptual processing occurs outside of conscious
awareness. Our senses are bombarded with such a large amount of input, sights, sounds,
smells, etc., that our minds cannot fully process it all. The overload becomes even
worse when humans recall information from memory or are engaged in deep thought.
To cope with the problem, humans have evolved a mechanism called attention, which
acts as a filter that quickly examines sensory input and selects a small percentage for
full processing and for conscious perception. The remaining information is lost,
unnoticed and unremembered - humans are thus inattentionally blind to it, since it never
reached their consciousness. This all happens without their awareness, so it is not a
behaviour that people can bring under conscious control. Research suggests that
inattentional blindness is affected by four factors: conspicuity, mental workload,
expectation and capacity [VEHF 2003a].
When humans are just casually looking around, sometimes an object will jump
out or ‘pop-out’ of the background [Wang et al. 1994]. The term ‘conspicuity’
refers to this ability to capture attention. There are two general types of factors that
determine conspicuity. One is sensory conspicuity, the physical properties of the object.
The most important sensory factor is contrast [Gilchrist et al. 1997]. Humans see
objects, not because of their absolute brightness, but by their contrast with the
background. When there is higher contrast, objects are more conspicuous. For example,
black cars are involved in many more accidents, presumably because they are harder to
notice at night, i.e. there is less contrast between them and the surrounding environment.
Humans also are more likely to notice objects which are large and which move or
flicker. That’s why school buses, police cars, ambulances, railway crossings etcetera, all
use flickering lights. Other factors that affect an object’s conspicuity are colour
[D’Zmura 1991], size [Green 1991; Green 1992] and motion [Northdurft 1993].
Cognitive conspicuity is the other factor that determines conspicuity, and it is equally or more important in drawing attention. Humans are much more likely to notice things that are relevant or familiar to them in some way [Wang et al. 1994]; again this is where the cocktail party phenomenon comes into play. Errors, however, often occur when a new and unusual combination of events arises in a highly familiar situation. The driver who hit the bicyclist had pulled into the same driveway every day for a year and had never seen anyone. She had unconsciously learned that there wasn’t anything important to see down the sidewalk. Similarly, the nurse was used to picking out a bottle of the same size and shape for a particular drug, but one day that bottle contained a different drug, with disastrous consequences [VEHF 2003a].
Since the amount of attention that humans have is roughly fixed, the more
attention that is focused on one task, the less there is for others. Inattentional Blindness
often occurs because part of the attention is devoted to some secondary task. In theory,
for example, any mental workload such as speaking on a mobile phone, working out
what to cook for dinner, or carrying on a conversation with someone in the back seat
can absorb some of the attentional capacity and lead to Inattentional Blindness.
However, it is not always so simple. The notion that attentional capacity is constant is
only approximately true. There is ample evidence that visual and auditory senses
employ partially independent attentional pools. That means that an auditory task
(listening to the radio) will interfere less with a visual task (seeing a pedestrian), than a
second visual task would (focusing on the car ahead) [VEHF 2003b].
Expectation also has a powerful effect on our ability to see and to notice. For
example when out with a friend you might remember more what they were wearing so
if you agree to meet up in half an hour you will subconsciously be looking for, i.e.
expecting to see, someone wearing that particular colour clothing. Coloured blobs are
far easier to scan and search for than the finer details of facial features. This strategy
usually works, but let’s say your friend has bought a new coat whilst you were
separated. On an occasion such as this, you might end up walking right by her,
completely blind to the other features, all highly familiar, which should have attracted
your attention to your friend, but because you are expecting to see a certain colour your
attention is controlled by your expectation.
Inattentional Blindness accidents are usually caused by a combination of factors:
low conspicuity, divided attention and high expectation or lower arousal. It is a natural
consequence of our adaptive mental wiring. Humans only have a certain capacity and
are therefore only able to consciously perceive a small percentage of the available
information that flows into their senses leaving them blind to the rest [Wickens 2001].
2.3.4.1 Inattentional Blindness in Magic
One of the fundamental aspects of magic is the ability to perform an undetected action,
or sleight of hand, that goes unnoticed by the spectator. Many of these sleights rely on
misdirecting the audience’s attention. Because magic tricks rely on misdirection, once this is known to the observer it becomes very obvious how the trick works.
failure of detecting this action is a prime example of Inattentional Blindness, in which
subjects fail to notice very obvious visual events because their attention is focused
elsewhere. Figure 2.19 shows a simple magic trick demonstrating this phenomenon.
Figure 2.19a: Choose one of the six cards displayed above.
Choose one of the six cards shown in Figure 2.19a, now memorize it. Really
focus on that card, for I’m going to try to work out which card it is that you have
chosen.
Figure 2.19b: Can you still remember your card?
I have removed the card that I think is yours from the cards displayed in Figure
2.19b. Can you still remember which card that you chose? Now look at Figure 2.19c to
see if I correctly removed the card that you selected.
You just experienced Inattentional Blindness! When I asked you to select a card
you focused all your attention on that single card. Thus you suffered Inattentional
Blindness to the other cards that were displayed. When the five cards were shown in
Figure 2.19c, you immediately looked for your card only - and it’s not there. Why?
None of the original six cards were displayed, but due to Inattentional Blindness you
looked but didn’t attend to the other cards. Therefore you can’t remember any of them
to check against the five remaining cards. Thus it looks like only your card is missing!
Figure 2.19c: Only five cards remain. Can you remember your card? I must have
chosen your card and removed it. Is this coincidence or is it an illusion?
2.3.5 An Inattentional Paradigm
As there was no previous paradigm for inattention a new one had to be developed by
Mack and Rock [1998] which guaranteed that the observer would neither be expecting
nor looking for the object of interest, but instead would be looking in the general area in
which it was to be presented. It was also important to engage the subject’s attention
with another task, because without some distraction task, it seemed possible that by
default attention might settle on the only object present. This distraction task was to
report which arm of a cross, which was presented briefly at 76cm away from the
observer on a PC screen, was longer. This cross was centred at the point of fixation.
In the experiment the point of fixation was shown first for 1500ms; participants
were told to stare at this fixation point mark. Then the distraction task cross was
presented on the screen for 200ms, which is generally less time than it takes to move the
eyes from one location in space to another, i.e. to make a saccadic movement. Then
finally a pattern mask appeared for 500ms that covered the entire area of the visible
screen, a circular area about 8.9 degrees in diameter. This mask was to eliminate any
processing of the visual display after it disappeared from the screen. When the mask
disappeared, subjects reported which line of the cross seemed to be longer than the
other.
This procedure was repeated for the first two or three trials and on the third or
fourth trial a critical stimulus was introduced in one of the quadrants of the cross (see
Figure 2.20) within 2.3 degrees of the point of fixation. Immediately following a trial in
which a critical stimulus had been presented subjects were asked in addition to which
arm of the cross was longer, whether or not they had seen anything on the screen other
than the cross figure. If observers reported they had seen something they were asked to
identify it from a selection of alternatives. In many of their experiments they actually
asked the observers to select from the alternatives even if they hadn’t reported seeing
any other stimulus. If the observers correctly selected the stimulus, this would indicate that they had in fact perceived it without awareness, or had perceived it and quickly forgotten it.
Figure 2.20: Critical Stimulus in parafovea: noncritical and critical trial displays [Mack
and Rock 1998].
Figure 2.21 shows the experimental ordering for the observers. Note how there is
only one critical stimulus trial for each observer per section; this is because once the observer has been asked about something else on the screen, they become
alerted to the fact that the experiment may not just be testing the ability to detect which
arm of the cross is longer. If this is the case the observers may then be actively looking
for something else as well as trying to complete the distraction cross task, thus the
observer’s attention has now become divided. After three or four trials with divided attention, observers were then told to ignore the distraction task and report only if there
was something else present. This was labelled full attention, as the observers were
paying full attention to the possibility that a critical stimulus may be present.
Inattentional Blindness Paradigm – Mack and Rock 1998

Inattention Trial (Report distraction task only)
1. Distraction Task
2. Distraction Task
3. Distraction Task and Critical Stimulus

Divided Attention Trial (No new instructions but participants are now alerted to the fact that the experiment may be testing other criteria than the ability to correctly distinguish which arm of the cross is longer)
4. Distraction Task
5. Distraction Task
6. Distraction Task and Critical Stimulus

Full Attention Trial (Ignore distraction task; report only the presence of something else)
7. Distraction Task
8. Distraction Task
9. Distraction Task and Critical Stimulus
Figure 2.21: Table of the experimental ordering for observers [Mack and Rock 1998].
Their initial results showed that 25% of the observers failed to detect the
presence of the critical stimulus for the inattention trials, whether the stimulus was a
moving bar, a black or coloured square, or some coloured, geometric form. Even when
prompted, observers could not pick the correct stimulus from a selection of alternatives at a rate greater than chance; this means that the observers were not actually perceiving the critical stimulus and then forgetting it. However, all the observers perceived the critical stimulus for the divided attention and full attention trials. This inability to perceive seemed to be caused by the fact that subjects were not attending to the stimulus but instead were attending to something else, namely the cross. This led Mack and Rock to term the phenomenon Inattentional Blindness, and to adopt the hypothesis that there is no perception without attention. It must however be
noted that Mack and Rock were using the term perception to refer to explicit conscious
awareness and not subliminal, unconscious, or implicit perception.
[Bar chart: performance relative to chance (0-100) for shape, number, colour and location judgements, plotted against attentional condition (inattention, divided, full), with reference lines marking perfect and chance performance.]
Figure 2.22: Results from the inattention paradigm. Subjects perform better than
chance at recognising location, colour, and number of elements but not shape [Mack
and Rock 1998].
Having introduced Inattentional Blindness, Mack and Rock posed many further questions, such as whether inattention varied with different types of stimuli (Figure 2.22), and whether the inattentional phenomenon could be increased by manipulating attention. This last question led to a second set of experiments in which the cross was moved to 2 degrees from the point of fixation and the critical stimulus, instead of appearing within 2.3 degrees of the fixation mark, was placed at the fixation point itself, Figure 2.23. Their expectation was that this change should eliminate inattentional blindness, since it seemed implausible that an observer could fail to detect a stimulus presented for 200ms at the actual point of fixation. However, the opposite occurred: not only did the observers not identify the critical stimulus more often, but the amount of inattentional blindness more than doubled, with between 60-80% now failing to detect the critical stimulus depending on its type. This result further supported Mack and Rock’s earlier hypothesis that there is no perception
without attention. ‘It is to be assumed that attention normally is to be paid to objects at
fixation, then when a visual task requires attending to an object placed at some distance
from the fixation, attention to objects at the fixation might have to be actively inhibited’
[Mack and Rock 1998]. This then could explain the fact that inattentional blindness is
so much greater when the inattention stimulus is presented at fixation.
Figure 2.23: Critical stimulus at fixation: non-critical and critical stimulus displays
[Mack and Rock 1998].
2.4 Summary
In this chapter the anatomy of the human visual system was covered, describing both its
strengths and weaknesses. To recap in brief, the human eye only has a very small
percentage of the retina where visual acuity is paramount; this area is called the fovea. It
is densely packed with cone photoreceptors, which are responsible for visual acuity.
Thus to perceive anything in detail humans must move their eyes, via saccades, to
locate the fovea on the particular object of interest. Human vision research has shown
that humans do not actually consciously perceive anything, even if the fovea is located
on a particular object, unless attention is involved [Rensink et al. 1997; Mack and Rock
1998; Simons and Levin 1998]. It is precisely this phenomenon that is the basis for the
research in this thesis.
Chapter 3
Rendering Realistic Computer Graphical Images
As described in Chapter 1 there is a great need for realistic rendering of computer
graphical images in real time. The term ‘realistic’ is used broadly to refer to an image
that captures and displays the effects of light interacting with physical objects, as occurs
in real environments, and looks authentic to the human eye, whether it be a painting, a
photograph or a computer generated image, Figure 3.1. You probably learnt this lesson
when eating at seafood restaurants; if it smells like fish, it is not good fish. A similar
principle applies in computer graphics; if it looks like computer graphics, it is not good
computer graphics [Birn 2000].
There were no previously agreed-upon standards for measuring the actual realism of computer-generated images. Sometimes physical accuracy is used as the standard to be achieved, at other times a perceptual criterion, and in many cases an ill-defined ‘looks good’ criterion is used. Thus Ferwerda [2003] proposes three
different standards of realism that require consideration when evaluating computer
graphical images including the criterion that needs to be met for each kind of realism.
These are physical realism, in which the image provides the same visual stimulation as the scene; photo realism, in which the image produces the same visual response as the scene; and functional realism, in which the image provides the same visual information as the scene [Ferwerda 2003]. This chapter describes how different rendering models
produce their realistic graphical images.
Figure 3.1: The goal of realistic image synthesis: an example from photography
[Stroebel et al. 1986].
Rendering is fundamentally concerned with determining the ‘most appropriate’
colour (i.e. RGB) to assign to a pixel in the viewing plane, which is associated with an
object modeled in a 3D scene. The colour of an object at a point that is perceived by a
viewer depends on several different factors:
• The geometry of the object at that point (normal direction),
• The position, geometry and colour of the light sources in the scene,
• The position and visual response of the viewer,
• The surface reflectance properties of the object at that point, and
• The scattering by any participating media (e.g. smoke, rising hot air).
Rendering algorithms differ in the assumptions they make regarding lighting and
reflectance in the scene. From physics, models can be derived of how light reflects from
surfaces and produces what is perceived as colour; these are called Illumination models. In general, light rays leave a light source, e.g. a lamp or the sun, are reflected from many surfaces and are finally reflected into our eyes, or through the image plane of a camera. There are two main types of rendering illumination models, local and
global. Local illumination algorithms consider lighting only from the light sources and
ignore the effects of other objects in the scene (i.e. reflection off other objects or
shadowing) whilst Global illumination algorithms account for all modes of light
transport [Dingliana 2004].
Illumination models can create either 1) view dependent solutions, which
determine an image by solving the illumination that arrives through the viewing plane
only, for example ray tracing, or 2) view independent solutions, which determine the
lighting distribution in an entire scene regardless of the viewing position. Views are
then taken, after the lighting simulation has been completed, by sampling the full
solution to determine the view through the viewing plane, for example radiosity.
3.1 Local illumination
The contribution from the light that goes directly from the light source and is reflected
from the surface is called a local illumination model. So, for a local illumination model,
the shading of any surface is independent from the shading of all other surfaces. The
first problem that has to be addressed in order to create shaded images of three-dimensional objects is the interaction of light with a surface. This may include emission,
transmission, absorption, refraction, interference and reflection of light [Palmer 1999].
• Emission is when light is emitted from an object or surface, for example the sun
or man-made sources, such as candles or light bulbs. Emitted light is composed
of photons generated by the matter emitting the light; it is therefore an intrinsic
source of light.
• Transmission describes a particular frequency of light that travels through a
material returning into the environment unchanged, Figure 3.2. As a result, the
material will be transparent to that frequency of light. Most materials are
transparent to some frequencies, but not to others. For example, high frequency
light rays, such as gamma rays and X-rays, will pass through ordinary glass, but
the lower frequencies of ultraviolet and infrared light will not.
Figure 3.2: Light transmitted through a material.
• Absorption describes light passing through matter with a resulting decrease in its intensity, Figure 3.3, i.e. some of the light has been absorbed by the object. If the absorption is great enough, an incident photon can be completely removed from the simulation with no further contribution to the illumination within the environment.
Figure 3.3: Light absorbed by a material.
• Refraction describes the bending of a light ray when it crosses the boundary
between two different materials, Figure 3.4. This change in direction is due to a
change in speed. Light travels fastest in empty space and slows down upon
entering matter. The refractive index of a substance is the ratio of the speed of
light in space (or in air) to its speed in the substance. This ratio is always greater
than one.
Figure 3.4: Light refracted through a material.
• Interference is an effect that occurs when two waves of equal frequency are
superimposed. This often happens when light rays from a single source travel by
different paths to the same point. If, at the point of meeting, the two waves are in
phase (the crest of one coincides with the crest of the other), they will combine
to form a new wave of the same frequency, however the amplitude of this new
wave is the sum of the amplitudes of the original waves. The process of forming
this new wave is called constructive interference [Flavios 2004]. If the two
waves meet out of phase (a crest of one wave coincides with a trough of the
other), the result is a wave whose amplitude is the difference of the original
amplitudes. This process is called destructive interference [Flavios 2004]. If the
original waves have equal amplitudes, they may completely destroy each other,
leaving no wave at all. Constructive interference results in a bright spot;
destructive interference produces a dark spot.
• Reflection considers incident light that is propagated from a surface back into
the scene. Reflection depends on the smoothness of the material’s surface
relative to the wavelength of the radiation [ME 2004]. A rough surface will
affect both the relative direction and the phase coherency of the reflected wave.
Thus, this characteristic determines both the amount of radiation that is reflected
back to the first medium and the purity of the information that is preserved in the
reflected wave. A reflected wave that maintains the geometrical organization of
the incident radiation and produces a mirror image of the wave is called a
specular reflection, as can be seen in Figure 3.5.
Figure 3.5: Light reflected off a material in different ways, from left to
right, specular, diffuse, mixed, retro-reflection and finally gloss
[Katedros 2004].
Bouknight [1970] introduced one of the first models for local illumination of a
surface. This included two terms, a diffuse term and an ambient term. The diffuse term
is based upon the Lambertian reflection model, which makes the value of the outgoing
intensity equal in every direction and proportional to the cosine of the angle between the
incoming light and the surface normal. The ambient term is constant and approximates
diffuse inter-object reflection. Gouraud [1971] extended this model to calculate the
shading across a curved surface approximated by a polygonal mesh. His method
calculated the outgoing intensities at the polygon vertices, and then interpolated these
values across the polygon, Figure 3.6 (middle).
Phong [1975] introduced a more sophisticated interpolation scheme where the
surface normal is interpolated across a polygon, and the shading calculation is
performed at every visible point, Figure 3.6 (right). He also introduced a specular term.
Specular reflection is when the reflection is stronger in one viewing direction, i.e. there
is a bright spot called a specular highlight. This is readily apparent on shiny surfaces.
For an ideal reflector, such as a mirror, the angle of incidence equals the angle of
specular reflection. Although this model is not physically based, its simplicity and
efficiency make it still the most commonly used local reflection model.
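The ambient, diffuse and specular terms described above can be combined into a single routine. The following is a minimal sketch of a Phong-style local illumination calculation for one surface point and one light source, with NumPy vectors assumed for positions, directions and colours; it is an illustration of the model rather than the formulation used by any particular renderer, and the numerical values in the example call are made up.

```python
import numpy as np

def normalize(v):
    """Return the unit vector in the direction of v."""
    return v / np.linalg.norm(v)

def phong_shade(point, normal, view_pos, light_pos, light_col,
                ambient_col, k_a, k_d, k_s, shininess):
    """Local (Phong) illumination at one point: ambient + Lambertian diffuse + specular."""
    n = normalize(normal)
    l = normalize(light_pos - point)            # direction from the point towards the light
    v = normalize(view_pos - point)             # direction from the point towards the viewer
    r = 2.0 * np.dot(n, l) * n - l              # mirror reflection of the light direction

    diffuse = k_d * max(np.dot(n, l), 0.0)                  # Lambert's cosine law
    specular = k_s * max(np.dot(r, v), 0.0) ** shininess    # bright specular highlight
    return k_a * ambient_col + (diffuse + specular) * light_col

# Example: a point on a horizontal surface lit from above and slightly to the side.
colour = phong_shade(point=np.array([0.0, 0.0, 0.0]),
                     normal=np.array([0.0, 0.0, 1.0]),
                     view_pos=np.array([0.0, -1.0, 1.0]),
                     light_pos=np.array([1.0, 0.0, 2.0]),
                     light_col=np.array([1.0, 1.0, 1.0]),
                     ambient_col=np.array([0.1, 0.1, 0.1]),
                     k_a=1.0, k_d=0.7, k_s=0.3, shininess=32)
```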
Figure 3.6: The differences between a simple computer generated polyhedral cone (left); linearly interpolated shading to give the appearance of curvature (Gouraud shading), note the Mach bands at the edges of the faces (middle); and a more complex shading calculation interpolating curved surface normals (Phong shading), which is necessary to eliminate the Mach bands (right).
3.2 Global illumination
A global illumination model adds to the local illumination model the light that is reflected from other, non-light surfaces onto the current surface. A global illumination
model is more comprehensive, more physically correct, and produces more realistic
images because it simulates effects such as colour bleeding, motion blur, caustics, soft
shadows, anti-aliasing, and area light sources. Global illumination can generate images
that are physically accurate. When measured data is used for the geometry and surface
properties of objects in a scene, the image produced should then be practically
indistinguishable from reality. However it is also more computationally expensive.
Global illumination algorithms work by solving the rendering equation proposed
by Kajiya [1986]:
$$L_{out} = L_{E} + \int_{\Omega} L_{In}\, f_r \cos(\theta)\, d\omega_{\theta}$$

where $L_{out}$ is the radiance leaving a surface, $L_{E}$ is the radiance emitted by the surface, $L_{In}$ is the radiance of an incoming light ray arriving at the surface from light sources and other surfaces, $f_r$ is the bi-directional reflection distribution function of the surface, $\theta$ is the angle between the surface normal and the incoming light ray, and $d\omega_{\theta}$ is the differential solid angle around the incoming light ray.
The rendering equation is graphically depicted in Figure 3.7. In this figure LIn is
an example of a direct light source, such as the sun or a light bulb, L’In is an example of
an indirect light source i.e. light that is being reflected off another surface, R, to surface
S. The light seen by the eye, Lout, is simply the integral of the indirect and direct light
sources modulated by the reflectance function of the surface over the hemisphere $\Omega$.
Figure 3.7: Graphical Depiction of the rendering equation [Yee 2000].
The difficulty of global illumination arises because the rendering equation must be solved for each and every point in the environment. In all but the simplest cases there is no closed-form solution for such an equation, so it must be solved using numerical techniques, which implies that there can only be an approximation of the solution [Lischinski 2003]. For this reason most global illumination computations
are approximate solutions to the rendering equation.
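One common numerical approach is Monte Carlo integration of the equation above: incoming directions over the hemisphere are sampled and the integrand is averaged. The sketch below assumes hypothetical `sample_hemisphere`, `trace_radiance` and `brdf` helpers supplied by the caller and uniform hemisphere sampling with density 1/(2π); it illustrates the idea rather than any specific renderer's estimator.

```python
import math

def estimate_outgoing_radiance(point, normal, emitted,
                               sample_hemisphere, trace_radiance, brdf,
                               num_samples=64):
    """Monte Carlo estimate of L_out = L_E + integral over the hemisphere of
    L_in * f_r * cos(theta) dw, using uniform hemisphere sampling (pdf = 1 / 2*pi)."""
    pdf = 1.0 / (2.0 * math.pi)
    total = 0.0
    for _ in range(num_samples):
        w_in, cos_theta = sample_hemisphere(normal)   # sampled incoming direction, cos(theta)
        l_in = trace_radiance(point, w_in)            # radiance arriving from that direction
        total += l_in * brdf(point, w_in) * cos_theta / pdf
    return emitted + total / num_samples
```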
The two major types of graphics systems that use the global illumination model
are radiosity and ray tracing.
3.2.1 Ray tracing
The first global illumination model was ray tracing [Whitted 1980], which addresses
the problems of hidden surface removal, refraction, reflection and shadows. Rays of
light are traced from the eye through the centre of each pixel of the image plane into the scene; these are called primary rays. When each of these rays hits a surface it spawns two child rays, one for the reflected light and one for the refracted light. This process continues recursively for each child ray until no object is hit, or the recursion reaches some specified maximum depth. Rays are also traced to each light source from the point of intersection; these are called shadow rays, and they account for direct illumination of the surface, Figure 3.8. If a shadow ray hits an object before intersecting with the light source(s), then the point under consideration is in shadow. Otherwise there must be a clear path from the point of intersection of the primary ray to the light source and
thus a local illumination model can be applied to calculate the contribution of the light
source(s) to that surface point.
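The recursion just described can be sketched as follows, assuming hypothetical `ray_towards`, `local_shade`, `reflect` and `refract` helpers and a `scene` object that can intersect rays and test occlusion. The structure (shadow rays to each light plus reflected and refracted child rays up to a maximum depth) follows the scheme above; the details are illustrative only.

```python
# 'ray_towards', 'local_shade', 'reflect' and 'refract' are assumed helper functions,
# and 'scene' is assumed to provide intersect(), occluded(), background and ambient.
def trace(ray, scene, lights, depth=0, max_depth=5):
    """Whitted-style recursive ray tracing for a single ray."""
    hit = scene.intersect(ray)                      # closest surface hit, or None
    if hit is None or depth > max_depth:
        return scene.background
    colour = scene.ambient * hit.material.k_a       # constant ambient approximation
    for light in lights:
        shadow_ray = ray_towards(hit.point, light.position)
        if not scene.occluded(shadow_ray, light):   # shadow ray reaches the light unblocked
            colour += local_shade(hit, light, ray)  # direct (local) illumination
    if hit.material.k_reflect > 0:                  # reflected child ray
        colour += hit.material.k_reflect * trace(reflect(ray, hit), scene, lights, depth + 1)
    if hit.material.k_refract > 0:                  # refracted child ray
        colour += hit.material.k_refract * trace(refract(ray, hit), scene, lights, depth + 1)
    return colour
```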
The simple ray tracing method outlined above has several problems. Due to the
recursion involved, and the possibly large number of rays that may be cast, the
procedure is inherently expensive. Diffuse interaction is not modelled, nor is specular
interaction, other than that by perfect mirrors and filters. Surfaces receiving no direct
illumination appear black. To overcome this an indirect illumination term, referred to as
ambient light, is accounted for by a constant ambient term, which is usually assigned an
arbitrary value [Glassner 1989]. Shadows are hard-edged, and the method is very prone
to aliasing. Also the result of ray tracing is a single image for that particular position of
viewing plane, making it a view-dependent technique.
In ray tracing each ray must be tested for intersection with every object in the
scene. Thus for a scene of significant complexity the method rapidly becomes
impracticable. Several acceleration techniques have been developed, which may be
broadly categorized into two approaches: reducing the number of rays and reducing the
number of intersection tests. Hall and Greenberg noted that the intensity of each ray is reduced by each surface it hits, and thus the recursion should be terminated before rays are traced to any unnecessary depth [Hall and Greenberg 1983]. Another
approach, which attempts to minimize the number of ray object intersections, is spatial
subdivision. This method encloses a scene in a cube that is then partitioned into discrete
regions, each of which contains a subset of the objects in the scene. Each region may
then be recursively subdivided until each sub-region (voxel or cell) contains no more
than a preset maximum number of objects.
Several methods for subdividing space exist. Glassner [1984] proposes the use
of an octree, a structure where the space is bisected in each dimension, resulting in eight
child regions. This subdivision is repeated for each child region until the maximum tree
depth is reached, or a region contains less than a certain number of objects. Using such a
framework allows for spatial coherence, i.e. the theory that similar objects in a scene
affect neighbouring pixels. Rays are traced through individual voxels, with intersection
tests performed only for the objects contained within, rather than for all the objects in
the scene. The ray is then processed through the voxels by determining the entry and
exit points for each voxel traversed by the ray until an object is intersected or the scene
boundary is reached.
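A minimal sketch of this kind of octree construction is given below, assuming hypothetical `split_into_octants` and `objects_in` helpers for bisecting a bounding box in each dimension and testing box/object overlap; it is not Glassner's exact algorithm, and the object and depth limits are illustrative defaults.

```python
# 'split_into_octants' and 'objects_in' are assumed helpers: the first bisects a bounding
# box in each dimension (giving eight child boxes), the second tests box/object overlap.
class OctreeNode:
    def __init__(self, bounds, objects, max_objects=8, max_depth=8, depth=0):
        self.bounds = bounds
        self.objects = objects
        self.children = []
        # Subdivide while a region holds too many objects and the depth limit allows.
        if len(objects) > max_objects and depth < max_depth:
            for child_bounds in split_into_octants(bounds):
                contained = [o for o in objects if objects_in(o, child_bounds)]
                self.children.append(OctreeNode(child_bounds, contained,
                                                max_objects, max_depth, depth + 1))
            self.objects = []   # interior node: objects are stored in the leaves
```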
Figure 3.8: Ray tracing.
3.2.2 Radiosity
The radiosity method of computer image generation has its basis in the field of thermal
heat transfer [Goral et al. 1984]. Heat transfer theory describes radiation as the transfer
of energy from a surface when that surface has been thermally excited. This
encompasses both surfaces that are basic emitters of energy, as with light sources, and
surfaces that receive energy from other surfaces and thus have energy to transfer. This
thermal radiation theory can be used to describe the transfer of many kinds of energy
between surfaces, including light energy.
As in thermal heat transfer, the basic radiosity method for computer image
generation makes the assumption that surfaces are diffuse emitters and reflectors of
energy, emitting and reflecting energy uniformly over their entire area. Thus, the
radiosity of a surface is the rate at which energy leaves that surface (energy per unit
time per unit area). This includes the energy emitted by the surface as well as the energy
reflected from other surfaces in the scene. Light sources in the scene are treated as
objects that have self-emittance.
Figure 3.9: Radiosity [McNamara 2000]
The environment is divided into surface patches, Figure 3.9, each with a
specified reflectivity, and between each pair of patches there is a form factor that
represents the proportion of light leaving one patch (patch i) that will arrive at the other
(patch j) [Siegel and Howell 1992].
Thus the radiosity equation is:

$$B_i = E_i + \rho_i \sum_{j} B_j F_{ij}$$

where $B_i$ is the radiosity of patch i, $E_i$ is the emissivity of patch i (the energy emitted by patch i), $\rho_i$ is the reflectivity of patch i, $B_j$ is the radiosity of patch j, and $F_{ij}$ is the form factor of patch j relative to patch i; $\sum_j B_j F_{ij}$ is thus the energy reaching patch i from the other patches, and $\rho_i \sum_j B_j F_{ij}$ is the energy reflected by patch i.

The form factor, $F_{ij}$, is the fraction of energy transferred from patch i to patch j, and the reciprocity relationship [Siegel and Howell 1992] states:

$$A_j F_{ji} = A_i F_{ij}$$

where $A_j$ and $A_i$ are the areas of patch j and i respectively, Figure 3.10.
Figure 3.10: Relationship between two patches [Katedros 2004].
As the environment is closed, the emittance functions, reflectivity values and
form factors form a system of simultaneous equations that can be solved to find the
radiosity of each patch. The radiosity is then interpolated across each of the patches and finally the image can be rendered.
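Because each patch's radiosity depends on the radiosities of all the others, the system is normally solved iteratively. The following is a minimal sketch of a Gauss-Seidel style 'gathering' solution, with emissivities, reflectivities and a precomputed form factor matrix supplied as plain Python lists; it is an illustration of the idea rather than a production solver, and the two-patch values in the example are made up.

```python
def solve_radiosity(E, rho, F, iterations=50):
    """Iteratively solve B_i = E_i + rho_i * sum_j(B_j * F_ij) for every patch.
    E, rho: per-patch emissivity and reflectivity; F: full form factor matrix F[i][j]."""
    n = len(E)
    B = list(E)                                  # start from the emitted energy alone
    for _ in range(iterations):
        for i in range(n):
            gathered = sum(B[j] * F[i][j] for j in range(n))   # energy arriving at patch i
            B[i] = E[i] + rho[i] * gathered
    return B

# Tiny two-patch example with made-up values.
B = solve_radiosity(E=[1.0, 0.0], rho=[0.5, 0.8], F=[[0.0, 0.4], [0.6, 0.0]])
```

Storing the full form factor matrix in this way is exactly the N² memory cost discussed later in this section, which is one motivation for the progressive refinement variant.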
The basic form factor equation is difficult to evaluate even for simple surfaces. Nusselt
[1928] developed a geometric analog that allows the simple and accurate calculation of
the form factor between a surface and a point on a second surface. The Nusselt Analog
involves placing a hemispherical projection body, with unit radius, at a point on the
surface Ai. The second surface, Aj, is spherically projected onto the projection body,
then cylindrically projected onto the base of the hemisphere. The form factor then may
be approximated by the area projected on the base of the hemisphere divided by the area
of the base of the hemisphere, Figure 3.11.
Cohen and Greenberg [1985] proposed that the form factor between each pair of
patches could also be calculated by placing a hemi-cube on each patch and projecting
the environment on to it as defined by the Nusselt Analog. Each face of the hemicube is
subdivided into a set of small, usually square (‘discrete’) areas, each of which has a precomputed delta form factor value, Figure 3.12. When a surface is projected onto the
hemicube, the sum of the delta form factor values of the discrete areas of the hemicube
faces which are covered by the projection of the surface is the form factor between the
point on the first surface (about which the cube is placed) and the second surface (the
one which was projected). The speed and accuracy of this method of form factor
calculation can be affected by changing the size and number of discrete areas on the
faces of the hemicube.
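For reference, the delta form factor of a discrete hemicube cell $q$ is commonly written in terms of the angles $\theta_i$ and $\theta_j$ between the line joining the patch point to the cell and the respective surface normals, the distance $r$ between them, and the cell area $\Delta A_q$; the form factor to a surface is then approximated by summing over the cells its projection covers. This is the standard hemicube formulation rather than an expression quoted from the references above:

$$\Delta F_q = \frac{\cos\theta_i \cos\theta_j}{\pi r^2}\,\Delta A_q, \qquad F_{ij} \approx \sum_{q \,\in\, \text{cells covered by } A_j} \Delta F_q$$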
Figure 3.11: Nusselt’s analog. The form factor from the differential area dAi to element
Aj is proportional to the area of the double projection onto the base of the hemisphere
[Nusselt 1928].
Figure 3.12: The hemicube [Langbein 2004].
Radiosity assumes that an equilibrium solution can be reached; that all of the
energy in an environment is accounted for, through absorption and reflection. It should
be noted that, because of the assumption of only perfectly diffuse surfaces, the basic
radiosity method is viewpoint independent, i.e. the solution will be the same regardless
of the viewpoint of the image. The diffuse transfer of light energy between surfaces is
unaffected by the position of the camera. This means that as long as the relative position
of all objects and light sources remains unchanged, the radiosity values need not be
recomputed for each frame. This has made the radiosity method particularly popular in
architectural simulation, for high-quality walkthroughs of static environments. Figure
3.13 demonstrates the difference in image quality that can be achieved with radiosity
compared to ray tracing.
However, there are several problems with using the hemicube radiosity method.
It can only model diffuse reflection in a closed environment, it is limited to polygonal
environments, prone to aliasing and has excessive time and memory requirements. Also,
only after all the radiosities have been computed in the scene is the resultant image
displayed. There is a form factor between each pair of patches, so in an environment
with N patches, N² form factors must be stored. For a scene of moderate complexity this
will require a vast amount of storage, and as the form factor calculation is non-trivial the
time taken to produce a solution can be extensive. This means that the user is unable to
alter any of the parameters of the environment until the entire computation is complete.
Then once the alteration is made, the user must once again wait until the full solution is
recomputed.
The visual quality of the rendered images in radiosity also strongly depends on
the method employed for discretizing the scene into patches. A too fine discretization
may give rise to artefacts, while with a coarse discretization, areas with high radiosity
gradients may appear [Gibson and Hubbold 1997]. To overcome these problems, the
discretization should adapt to the scene. That is, the interaction between two patches
should account for the distance between them as well as their surface area. In other
words, surfaces that are far away are discretized less finely than surfaces that are
nearby. These aspects are considered by the adaptive discretization method proposed by
Languénou et al. [1992]. It performs both discretization and system resolution at each
iteration of the shooting process, which allows for interactivity. Gibson and Hubbold
[1997] demonstrated another solution for this problem by presenting an oracle that stops
patch refinement once the difference between successive levels of elements becomes
perceptually unnoticeable.
Progressive refinement radiosity [Cohen et al. 1988] works by not attempting to
solve the entire system simultaneously. Instead, the method proceeds in a number of
passes and the result converges towards the correct solution. At each pass, the patch
with the greatest unshot radiosity is selected, and this energy is propagated to all other
patches in the environment. This is repeated until the total unshot radiosity falls below
some threshold. Progressive refinement radiosity generally yields a good approximation
to the full solution in far less time and with lower storage requirements, as the form
factors do not all need to be stored throughout. Many other extensions to radiosity have
been developed, a very comprehensive bibliography of these techniques can be found in
[Ashdown 2004].
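To make the shooting process concrete, the following Python sketch implements the loop described above; the list-based patch representation and the form_factor(i, j) lookup are illustrative assumptions rather than the interface of any particular radiosity system.

def progressive_refinement(area, reflectance, emission, form_factor, threshold):
    """area, reflectance, emission are lists indexed by patch; form_factor(i, j)
    returns the form factor from patch i to patch j."""
    n = len(area)
    radiosity = list(emission)
    unshot = list(emission)
    while True:
        # Select the patch with the greatest unshot energy.
        i = max(range(n), key=lambda k: unshot[k] * area[k])
        if unshot[i] * area[i] < threshold:
            return radiosity
        # Shoot its unshot radiosity to every other patch.
        for j in range(n):
            if j == i:
                continue
            delta = reflectance[j] * unshot[i] * form_factor(i, j) * area[i] / area[j]
            radiosity[j] += delta
            unshot[j] += delta
        unshot[i] = 0.0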
Figure 3.13: The difference in image quality between ray tracing (middle) and radiosity
(right hand image).
3.3 Radiance
Radiance is a physically based lighting simulation tool for highly accurate visualization
of lighting in virtual environments [Ward 1994, Ward Larson and Shakespeare 1998]. It
synthesizes images from three-dimensional geometric models of physical environments.
The input model may often contain thousands of surfaces and for each there must be a
description of its shape, size, location, and composition. Once the geometry has been
defined the information can be compiled into an octree [Glassner 1984]. As stated in
Section 3.2.1, the octree data structure is necessary to accelerate the ray tracing process,
i.e. for efficient rendering.
Radiance employs a light-backwards ray-tracing method, extended from the
original algorithm introduced to computer graphics by Whitted in 1980 [Whitted 1980],
to achieve accurate simulation of propagation of light through an environment. The
approach encompasses a hybrid deterministic/stochastic ray tracing approach to
efficiently solve the rendering equation, while maintaining an optimum balance between
speed of computation and accuracy of the solution. Light is followed along geometric
rays from the viewpoint into the scene and back to the light sources. The result is
mathematically equivalent to following light forward, but the process is generally more
efficient because most of the light leaving a source never reaches the point of interest.
The chief difficulty of light-backwards ray tracing as practiced by most rendering
software is that it is an incomplete model of light interaction. In particular, as stated in
section 3.2.1, this type of algorithm fails for diffuse interreflection between objects,
which is usually approximated with a constant ambient term in the illumination
equation. Without a complete computation of global illumination, a rendering method
cannot produce accurate values and is therefore of limited use as a predictive tool for
lighting visualisation. Radiance overcomes this shortcoming with an efficient algorithm
for computing and caching indirect irradiance values over surfaces, while also
providing more accurate and realistic light sources and surface materials [Ward and
Heckbert 1992]. A comprehensive description of how Radiance renders its realistic
images can be found at [Radiance 2000] or by reading Rendering with Radiance - The
Art and Science of Lighting Visualization [Ward Larson and Shakespeare 1998].
A strictly deterministic ray tracer produces exactly the same rendering each time that it is run; by contrast, a stochastic renderer employs random processes and thus each time the algorithm is repeated it will produce slightly different results. Light is itself stochastic, and it is only the large number of photons in a scene that gives the impression that it is stable at any one point [Stam 1994]. Therefore a stochastic renderer produces a more accurate outcome; however, it is quite time consuming to achieve a noise-free solution.
Figure 3.14: Renderings of a simple environment.
Ray traced Solution (left), Radiosity Solution (center), and Radiance Solution (right)
[McNamara 2000].
Studies have shown that Radiance is capable of producing highly realistic and
accurate imagery [Khodulev and Kopylov 1996; McNamara 2000]. It has been used to
visualize the lighting of homes, apartments, offices, churches, archaeological sites,
museums, stadiums, bridges, and airports [Radiance 2000]. It has also answered
questions about light levels, aesthetics, daylight utilization, visual comfort and
visibility, energy savings potential, solar panel coverage, computer vision, and
circumstances surrounding accidents [Ward Larson and Shakespeare 1998]. For these
reasons Radiance was chosen as a lighting simulation package in many of the
experiments presented in this thesis.
To give an idea of the differences between these three approaches, ray tracing, radiosity and Radiance, Figure 3.14 shows, from left to right, a ray traced image, an image generated using radiosity and finally an image computed with the Radiance lighting simulation package [McNamara 2000].
3.4 Visual Perception in Computer Graphics
Creating realistic computer graphical images is an expensive operation, and therefore
any saving in computational cost that does not reduce the perceived quality is substantially beneficial. Since human observers are the final judges of the fidelity
and quality of the resulting images, visual perception issues should be involved in the
process of creating these realistic images, and can be considered at the various stages of
computation, rendering and display [Chalmers et al. 2000].
3.4.1 Image Quality Metrics
Typically the quality of synthesised images is evaluated using numerical techniques that
attempt to quantify the fidelity of the images by a pair-wise comparison; this is often a direct comparison to a photograph taken of the scene that is being recreated. Several image quality metrics have been developed whose goal is to predict the differences between a pair of images. Mean Squared Error (MSE) is a simple example of such a metric. This kind of error metric is known as physically based. However, to create more meaningful
measures of fidelity which actually correspond to the assessments made by humans
when viewing images, error metrics should be based on the computational models of the
human visual system. These are known as perceptual error metrics. For these metrics a
better understanding of the Human Visual System is needed, which can lead to more
effective comparisons of images, but also can steer image synthesis algorithms to
produce more realistic, reliable images and, as previously stated, avoid realistically
synthesizing any feature that is simply not visible to the Human Visual System [Daly
1993; Myszkowski 1998].
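As a minimal illustration of a physically based metric, the following Python sketch computes the per-pixel Mean Squared Error between two images held as NumPy arrays; the array representation is an assumption made purely for illustration.

import numpy as np

def mse(image_a, image_b):
    """Mean Squared Error between two images of identical shape."""
    diff = image_a.astype(np.float64) - image_b.astype(np.float64)
    return float(np.mean(diff ** 2))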
Perceptual error metrics operate on two intermediate images of a global
illumination solution in order to determine if the visual system is able to differentiate
these two images. They inform a rendering algorithm when and where it can stop an
iterative calculation prematurely because the differences between the present calculation
and the next are not perceptually noticeable. Thus, perceptually-based renderers attempt
to expend the least amount of work to obtain an image that is perceptually
indistinguishable from the fully converged solution, Figure 3.15.
[Figure 3.15 flow diagram: the rendering engine produces lighting solutions N and N+1; each is passed through a visual model to give a perceptual response, and the two responses are compared. If there is a perceptible difference, solution N+2 is rendered; if there is no perceptual difference, the process stops.]
Figure 3.15: Conceptually how a perceptually-assisted renderer makes use of a
perceptual error metric to decide when to halt the rendering process [Yee 2000].
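The stopping test illustrated in Figure 3.15 can be sketched as follows; render_pass and perceptual_difference are placeholders standing in for a progressive renderer iteration and a metric such as the VDP, not real APIs.

def render_until_converged(render_pass, perceptual_difference, threshold):
    """Keep refining until two successive solutions are perceptually identical."""
    previous = render_pass()
    while True:
        current = render_pass()
        if perceptual_difference(previous, current) < threshold:
            return current          # no perceptible difference: stop
        previous = current          # perceptible difference: keep refining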
One such perceptual metric is the Visual Difference Predictor (VDP) [Daly
1993]. The algorithm consists of three major components, as shown in Figure 3.16: a
calibration component, used to transform two input images into values that can be
understood by the second component, a model of the human visual system. The
difference of responses of the human visual system to the two images is then visualized
by the difference visualization component. The VDP output is a difference image map
that predicts the probability of detection of the visual differences between the two
images for every pixel.
[Figure 3.16 block diagram: Image 1 and Image 2 are each calibrated using the viewing parameters and passed through an HVS model; the two responses feed a difference visualisation stage, whose output is an image of the visible differences.]
Figure 3.16: Block structure of the Visual Difference Predictor [Prikryl and
Purgathofer 1999].
The first stage, the calibration process takes a number of input parameters which
describe the conditions for which the VDP will be computed; these include the viewing
distance of the observer, the pixel spacing and the necessary values for the display
mapping. The human visual model used in VDP concentrates on the lower-order
processing of the visual path, i.e. that from the retina to the visual cortex. The model
addresses three main sensitivity variations of the human visual system: the dependence
of sensitivity on the illumination level; on the spatial frequency of visual stimuli; and on
the signal content itself.
Thus the first part of the human visual system model (see Figure 3.17), which accounts for the variations in sensitivity as a function of light level, is the application of a non-linear response function to the luminance channel of each of the images. This also
accounts for adaptation and the non-linear response of the retinal neurons. Next the
image is converted to the frequency domain and the Contrast Sensitivity Function (CSF)
is used to determine the visual sensitivity to spatial patterns in the retinal response
image. The transformed data is weighted with the CSF i.e. the scaled amplitude for each
frequency is multiplied by the CSF for that spatial frequency. This data is then
normalized by dividing each point by the original image mean to give local contrast
information. The Contrast Sensitivity Function is an experimentally derived equation
that quantifies the human visual sensitivity to spatial frequencies, as described in
Section 2.2.2.
The image is then divided into 31 independent streams. It is known that the
human visual system has specific selectivity based on orientation (6 channels) and
spatial frequency (approximately one octave per channel). Each of the five overlapping
spatial frequency bands is combined with each of the six overlapping orientation bands
to split the image into 30 channels. Along with the orientation-independent base band
this gives a total of 31 channels. The individual channels are then transformed back into
the spatial domain.
A mask is associated with each channel; this mask function models the dependency of sensitivity on the signal contents due to the postreceptoral neural circuitry [Ferwerda et al. 1997]. The product of the CSF and the masking function is known as the threshold elevation factor. Contrasts of corresponding channels in one image are subtracted from those of the other image, and the difference is scaled down by the elevation factor, i.e. to weight the spatial frequency and orientation signals.
The scaled contrast differences are used as the argument to a psychometric
function to compute a detection probability. The psychometric function yields a
probability of detection of a difference for each location in the image, for each of the 31
channels. The detection probabilities for each of the channels are finally combined to
derive a per pixel probability of detection value.
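The final two stages can be sketched in Python as below, assuming the contrast differences have already been scaled by the threshold elevation factor; the Weibull-style form and the slope value of 3.5 are the commonly quoted choices, and the calibrated parameters are given by Daly [1993].

import math

def detection_probability(scaled_contrast_difference, slope=3.5):
    """Psychometric function applied to one channel's scaled contrast difference."""
    return 1.0 - math.exp(-abs(scaled_contrast_difference) ** slope)

def probability_summation(channel_differences):
    """Combine the per-channel probabilities into one per-pixel detection probability."""
    p_no_detect = 1.0
    for c in channel_differences:
        p_no_detect *= 1.0 - detection_probability(c)
    return 1.0 - p_no_detect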
The Sarnoff Visual Discrimination Model (VDM) [Lubin 1995] is another well
designed perceptual metric that is also used for determining the perceptual differences
between two images. VDM takes two images, specified in CIE XYZ colour space,
along with a set of parameters for the viewing conditions as input and outputs a Just
Noticeable Difference (JND) map. One JND corresponds to a 75% probability that an
observer viewing the two images would detect a difference [Lubin 1997]. VDM focuses
more attention on modeling the physiology of the visual pathway, therefore it operates
in the spatial domain, unlike VDP which operates in the frequency domain. The main
components of the VDM include spatial re-sampling, wavelet-like pyramid channelling,
a transducer for JND calculations and a final refinement step to account for CSF
normalization and dipper effect simulation. Li et al. [1998] discuss and compare in
depth both the VDP and VDM metrics.
[Figure 3.17 diagram: each input image passes through an amplitude non-linearity, contrast sensitivity weighting and a spatial frequency hierarchy, yielding 31 channels of spatial frequency and orientation signals; a masking function is applied to the channels of each image, the channel differences are taken, and a psychometric function followed by probability summation produces the detection probabilities.]
Figure 3.17: An Overview of the Visual Difference Predictor – demonstrating in more
detail the ordering of the processes that are involved [Yee 2000].
One example of the application of a perceptual metric is that proposed by
Myszkowski [1998], which uses the quantitative measurements of visibility produced
by VDP developed by Daly [1993] to improve both efficiency and effectiveness of
progressive global illumination computation. The technique uses a mixture of stochastic
(density estimation) and deterministic (adaptive mesh refinement) algorithms in a
sequence and optimizes to reduce the differences between the intermediate and final
images as perceived by the human observer in the course of lighting computation.
In the Myszkowski [1998] model, the VDP responses are used to support the
selection of the best component algorithms from a pool of global illumination solutions,
and to enhance the selected algorithms for even better progressive refinement of the
image quality. The VDP is also used to determine the optimal sequential order of the
component-algorithm execution, and to choose the points at which switchover between
the algorithms should take place. However, as the VDP is computationally expensive, it
is applied in this method exclusively at the design and tuning stages of the composite
technique, and so perceptual considerations are embedded into the resulting solution,
although no actual VDP calculations are performed during lighting simulation
[Volevich et al. 2000], Figure 3.18. This proposed global illumination technique
provides intermediate image solutions of high quality at unprecedented speeds, even for
complex scenes, Myszkowski [1998] quotes a speedup of roughly 3.5 times.
Myszkowski et al. [1999] addressed the perceptual issues relevant to rendering
dynamic environments and proposed a perception-based spatiotemporal Animation
Quality Metric (AQM) which was designed specifically for handling synthetic
animation sequences. They incorporated the spatiotemporal sensitivity of the human
visual system into the Daly VDP model [Daly 1993]. Thus the central part of the AQM
is the model for the spatiovelocity Contrast Sensitivity Function (CSF) which specifies
the detection threshold for a stimulus as a function of its spatial and temporal
frequencies [Kelly 1979]. Myszkowski et al.’s framework assumes that all objects in the
scene are tracked by the human eye. The tracking ability of the eye is very important in
the consideration of spatiotemporal sensitivity [Daly 1998]. If a conservative approach
is taken, as with Myszkowski et al. [1999], and all objects are assumed to be tracked
then this effectively reduces a dynamic scene to a static scene, thus negating the benefits
of spatiotemporally-based perceptual acceleration [Yee 2000].
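The spatiovelocity CSF at the heart of the AQM can be sketched as follows, using the parameterised form of Kelly's model quoted by Daly [1998]; the constants shown are the commonly cited calibration values, reproduced here from memory, and should be verified against the original papers before use.

import math

def spatiovelocity_csf(rho, v, c0=1.14, c1=0.67, c2=1.7):
    """rho: spatial frequency (cycles/degree); v: retinal velocity (degrees/second)."""
    v = max(c2 * v, 0.1)                          # avoid the singularity at v = 0
    k = 6.1 + 7.3 * abs(math.log10(v / 3.0)) ** 3
    rho_max = 45.9 / (v + 2.0)                    # peak frequency falls as velocity rises
    return (k * c0 * v * (c1 * 2 * math.pi * rho) ** 2
            * math.exp(-c1 * 4 * math.pi * rho / rho_max))

In use, rho would be the local spatial frequency of the image content and v the retinal velocity of the corresponding object after any eye tracking has been accounted for.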
The human visual system model that Myszkowski et al. used in the AQM was
also limited to the modeling of the early stages of the visual path, i.e. that from the retina to the visual cortex; Osberger [1999] showed that adding further extensions to such early vision models does not produce any significant gain. Myszkowski et al. [2001] did
demonstrate applying AQM to guide the global illumination computation for dynamic
environments. They reported a speedup of indirect lighting computation of over 20
times for their tested scenes, describing the resulting animation quality as much better
than the frame-by-frame approach [Myszkowski et al. 2001]. Their test scenes were
composed of less than 100,000 triangles; however they foresee that they would achieve
a similar speedup for more complex scenes due to the fact that collecting photons in the
temporal domain is always cheaper than shooting them from scratch for each frame.
Figure 3.18: (a) shows the computation at 346 seconds, (b) depicts the absolute
differences of pixel intensity between the current and fully converged solutions, (c)
shows the corresponding visible differences predicted by the VDP, (d) shows the fully
converged solution which is used as a reference [Volevich et al. 2000].
Bolin and Meyer [1998] also used an application of a perceptual metric to guide
their global illumination algorithm. However, instead of using VDP they used a
computationally efficient and simplified variant of the Sarnoff Visual Discrimination
Model (VDM) [Lubin 1995]. They used the upper and lower bounded images from the
computation results at intermediate stages and used the predictor to get an error estimate
for that particular stage. The image quality model was then used to control where to take
samples in the image, and also to decide when enough samples have been taken across
the entire image, providing a visual stopping condition. Their metric executed in 1/60th of the time of the full Sarnoff VDM, and their example images were rendered in only 10–28.1% of the time taken using either uniform or objective sampling techniques, performing well for all test cases even when the uniform and objective sampling techniques failed.
Ferwerda et al. [1996] developed a computational model of visual adaptation for
realistic image synthesis based on psychophysical experiments. The model captured the
changes in threshold visibility, colour appearance, visual acuity, and sensitivity over
time, all of which are caused by the visual system’s adaptation mechanisms. They used
the model to display the results of global illumination simulations illuminated at
intensities ranging from daylight down to starlight. The resulting images better capture the visual characteristics of scenes viewed over a wide range of illumination levels. Because the model is based on psychophysical data, it can be used to faithfully predict the visibility and appearance of scene features. This allows their model to be
used as the basis of perceptual error metrics to limit the precision of global illumination
calculations based on visibility and appearance criteria and could eventually lead to
time-critical global illumination rendering algorithms that achieve real-time rates
[Ferwerda et al. 1996].
Greenberg et al. [1997] had the now common goal of developing a physically based lighting model that encompassed a perceptually based rendering procedure to produce synthetic images that were visually and measurably indistinguishable from real-world images. To obtain fidelity in their physical simulation they subdivided their research into three parts: the local light reflection model, the energy transport phase, and the visual display procedures. The first two parts are physically based, and the last is perceptually based. The approaches and algorithms that they proposed were not practical at the time and required excessive computational resources. However, their work yielded important scientific insight into the physical processes of light reflection and light transport and clarified the computational bottlenecks.
Pattanaik et al. [1998; 2000] introduced a visual model for realistic tone
reproduction. The model is based on a multi-scale representation of luminance, pattern,
and colour processing in the human visual system, and provides a coherent framework
for understanding the effects of adaptation on spatial vision. The model also accounts
for the changes in threshold visibility, visual acuity, and colour discrimination, as well
as suprathreshold brightness, colourfulness and the apparent contrast that occur with
changes in the level of illumination in scenes. They then apply their visual model to the
problem of realistic tone reproduction and develop a tone reproduction operator to address it. Their method, however, is a static model of vision; they therefore propose that future models should incorporate knowledge about the temporal aspects of visual processing, so that both dynamic scenes and scenes where the level of illumination is changing dynamically can be displayed properly.
Ramasubramanian et al. [1999] reduced the cost of using such metrics as VDP
and VDM by decoupling the expensive spatial frequency component evaluation from
the perceptual metric computation. They argued that the spatial frequency content of the
scene does not change significantly during the global illumination computation step.
Therefore they proposed pre-computing this information from a cheaper estimate of the
scene image. They reused the spatial frequency information during the evaluation of the
perceptual metric, without having to recalculate it at every iteration of the global
illumination computation. They carried out this pre-computation from the direct
illumination solution of the scene. However their technique does not take into account
any sensitivity loss due to motion and hence is not well suited for use in dynamic
environments.
McNamara [2000] and McNamara et al. [2001] introduced a method for
measuring the perceptual equivalence between a real scene and a computer simulation
of the same scene. The model developed was based on human judgments of lightness
when viewing the real scene, a photograph of the real scene and nine different computer
graphics simulations including a poorly meshed radiosity solution and a ray traced
image. The results of their experiments with human participants showed that certain
rendering solutions, such as the tone-mapped one, were of the same perceptual quality
as a photograph of the real scene.
Dumont et al. [2003] propose using efficient perceptual metrics within a decision
theoretic framework to optimally order rendering operations, to produce images of the
highest visual quality given the system constraints of commodity graphics hardware.
They demonstrate their approach by using their framework to develop a cache
management system for a hardware-based rendering system that uses map-based
methods to simulate global illumination effects. They show that using their framework
significantly increases the performance of the system and allows for interactive walkthrough of scenes with complex geometry, lighting and material properties.
3.4.2 Simplification of complex models
Other research has investigated how complex detail in the models can be reduced
without any reduction in the viewer’s perception of the models, for example Maciel and
Shirley’s visual navigation system which uses texture mapped primitives to represent
clusters of objects to maintain high and approximately constant frame rates [Maciel and
Shirley 1995]. Their approach works by ensuring that each visible object, or a cluster
that includes it, is drawn in each frame for the cases where there are more unoccluded
primitives inside the viewing frustum than can be drawn in real-time on a workstation.
Their system also supports the use of traditional Level-Of-Detail (LOD) representations
for individual objects, and supports the automatic generation of a certain type of LOD
for objects and clusters of objects. The system supports the concept of choosing a
representation from among those associated with an object that accounts for the
direction from which the object is viewed. The system as a whole can be viewed as a
generalization of the level-of-detail concept, where the entire scene is stored as a
hierarchy of levels-of-detail that is traversed top-down to find a good representation for
a given viewpoint. However their system does not assume that visibility information can
be extracted from the model and thus it is especially suited for outdoor environments.
Luebke et al. [2000] presented a polygonal simplification method which uses
perceptual metrics to drive the local simplification operations, rather than the geometric
metrics common to the other algorithms in this field. Equations derived from
psychophysical studies which they ran determine whether the simplification operation
will be perceptible; they then only perform the operation if its effect is judged
imperceptible. To increase the range of the simplification, they use a commercial eye
tracker to monitor the direction of the user’s gaze allowing the image to be simplified
more aggressively in the periphery than at the centre of vision. Luebke and Hallen
[2001] extended this work by presenting a framework for accelerating interactive
rendering based on their psychophysical model of visual perception. In their user trial, the 4 subjects performed no better than chance, over 200 trials, in perceiving a difference between a rendering of a full-resolution model and a rendering of a model simplified with their algorithm; thus showing that perceptually driven simplification
can reduce model complexity without any perceptually noticeable visual effects, Figure
3.19.
Figure 3.19: The original Stanford Bunny model (69,451 faces) and a simplification
made by Luebke and Hallen’s perceptually driven system (29,866 faces). In this view
the user’s gaze is 29° from the centre of the bunny [Luebke and Hallen 2001].
Marvie et al. [2003] propose a new navigation system, which is built upon a
client-server framework. With their system, one can navigate through a city model,
represented with procedural models that are transmitted to clients over a low bandwidth
network. The geometry of the models that they produce is generated on the fly and in
real time at the client side. Their navigation system relies on several different kinds of
pre-processing, such as space subdivision, visibility computation, as well as a method
for computing some parameters used to efficiently select the appropriate level of detail
of the objects in the scene. Both deciding on the appropriate level of detail and the
visibility computation are automatically performed by the graphics hardware.
Krivánek et al. [2003] devised a new fast algorithm for rendering the depth-of-field effect for point-based surfaces. The algorithm handles partial occlusion correctly, does not suffer from intensity leakage, and renders depth-of-field in the presence of transparent surfaces. The algorithm is novel in that it exploits the level-of-detail to select
the surface detail according to the amount of depth-blur applied. This makes the speed
of the algorithm practically independent of the amount of depth-blur. Their proposed
algorithm is an extension of the Elliptical Weighted Average (EWA) surface splatting
[Heckbert 1989]. It uses mathematical analysis to extend screen-space EWA surface splatting to handle depth-of-field rendering with level-of-detail.
3.4.3 Eye Tracking
The new generation of eye-trackers replaces the unsatisfactory conditions of earlier laboratory studies (immobilisation of the subject, prohibited speech, darkness, re-calibration of the method after every blink, etc.) with ergonomically acceptable measurement of gaze direction in a head-free condition. The new methods are fast and
increasingly non-invasive. This means that eye tracking no longer interferes with the
activities the participant is trying to carry out. As gaze-direction is the only reliable
(albeit not ideal) index of the locus of visual attention there is an almost open-ended list
of possible applications of this methodology of eye tracking.
Gaze-contingent processing can be used for enhancing low-bandwidth
communication, firstly by an appropriate selection of information and channels and,
secondly, by transmission with high resolution of only those parts of an image which
are at the focus of attention. In this way even low-bandwidth channels can be exploited optimally, e.g. in Virtual Reality applications. There is, however, a general problem on
the way to realisation of most of these applications as ‘not every visual fixation is filled
with attention because our attention can be directed inward, on internal transformation
of knowledge’ [Challis et al. 1996], and as discussed in section 2.3.5, ‘without attention
there is no conscious perception’ [Mack and Rock 1998]. This is why knowledge of the
actual limits of human visual processing of information is needed to fully take
advantage of this exploitation.
3.4.4 Peripheral vision
This research builds on human vision work that indicates the human eye only processes
detailed information from a relatively small part of the visual field [Osterberg 1935].
Watson et al. [1997a; 1997b] proposed a paradigm for the design of systems that
manage level of detail in virtual environments. They performed a user study to evaluate
the effectiveness of high detail insets used in head-mounted displays. Subjects
performed a search task with different display types. Each of these display types was a
combination of two independent variables: peripheral resolution and the size of the high
level of detail inset, see Figure 3.20. The high detail inset they used was rectangular and
was always presented at the fine level of resolution. The level of peripheral resolution
was varied at three possible levels: fine resolution (320x240), medium resolution (192x144) and coarse resolution (64x48). There were three inset sizes: the large inset size was half
the complete display’s height and width, the small inset size was 30% of the complete
display’s height and width, the final size was no inset at all.
Figure 3.20: Watson et al.’s experimental environment as seen with the coarse display
[Watson et al. 1997b].
Their results showed that observers found their search targets fastest and most accurately in the fine resolution, no inset condition; however, this was not significantly better than the fine resolution inset displays with either medium or coarse peripheral resolutions. Thus peripheral level of detail degradation can be a useful compromise to
achieve desired frame rates. Watson et al. are continuing to work on measuring and
predicting visual fidelity for simplifying polygonal models [Watson et al. 2000; 2001].
McConkie and Loschky [1997; 2000] and Loschky et al. [1999; 2001] had
observers examining complex scenes with an eye linked multiple resolution display,
which produces high visual resolution only in the region to which the eyes are directed
(Figure 3.21). Image resolution and details outside this ‘window’ of high resolution are
reduced. This significantly lowers bandwidth as many interactive single-user image
display applications have prohibitively large bandwidth requirements. Their recent
study measured viewers’ image quality judgments and their eye movement parameters,
and found that photographic images filtered with a window radius of 4.1 degrees
produced results statistically indistinguishable from that of a full high-resolution
display.
This approach did, however, encounter the problem of keeping up with updating
the multi-resolutional display after an eye movement without disturbing the visual
processing. The work has shown that the image needs to be updated, after an eye saccade, within 5 milliseconds of a fixation, otherwise the observer will detect the low resolution. These high update rates were only achievable by using an extremely high temporal resolution eye tracker and by pre-storing all possible multi-resolution images that were to be used.
Figure 3.21: An example of McConkie’s work with an eye linked multiple resolution
display [McConkie and Loschky 1997].
3.4.5 Saliency models
Saliency models determine what is visually important within a scene. They are based on the idea, first advanced by Koch and Ullman [1985], of the existence, in the
brain, of a specific visual map encoding for local visual conspicuity. The purpose of this
saliency map is to represent the conspicuity, or saliency, at every location in the visual
field by a scalar quantity, and to guide the selection of attended locations, based on the
spatial distribution of saliency [Wooding 2002]. A combination of the feature maps
provides bottom-up input to the saliency map, modelled as a dynamical neural network.
The system developed by Itti et al. [1998] attempts to predict, given an input image, which Regions Of Interest (ROIs) in the image will automatically and unconsciously draw the viewer’s attention towards them.
This biologically inspired system takes an input image, which is then decomposed into a set of multi-scale neural feature maps that extract local spatial discontinuities in the modalities of colour, intensity and orientation [Itti et al. 1998]. All
feature maps are then combined into a unique scalar saliency map that encodes for the
salience of a location in the scene irrespective of the particular feature that detected this
location as conspicuous. A winner-takes-all neural network then detects the point of
highest salience in the map at any given time, and draws the focus of attention towards
this location. In order to allow the focus of attention to shift to the next most salient
target, the currently attended target is transiently inhibited in the saliency map. This is a
mechanism that has been extensively studied in human psychophysics and is called the
inhibition-of-return [Itti and Koch 2001]. The interplay between winner-takes-all and
inhibition-of-return ensures that the saliency map is scanned in order of decreasing
saliency by the focus of attention, and generates the model’s output in the form of
spatio-temporal attentional scanpaths, as can be seen in Figures 3.22 and 3.23.
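The winner-takes-all and inhibition-of-return interplay can be illustrated with the following toy Python sketch operating on a saliency map stored as a 2D NumPy array; the inhibition radius and the number of fixations are arbitrary illustration values rather than parameters of Itti et al.'s implementation.

import numpy as np

def scanpath(saliency, fixations=5, inhibit_radius=10):
    """Return attended locations in order of decreasing saliency."""
    s = saliency.astype(np.float64).copy()
    ys, xs = np.mgrid[0:s.shape[0], 0:s.shape[1]]
    path = []
    for _ in range(fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)    # winner-takes-all
        path.append((int(y), int(x)))
        mask = (ys - y) ** 2 + (xs - x) ** 2 <= inhibit_radius ** 2
        s[mask] = 0.0                                     # inhibition-of-return
    return path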
Figure 3.22: How the saliency map is created from the feature maps of the input image
[Itti 2003a].
Figure 3.23: Diagrams to show how the saliency model has inhibited the first fixational
point that the system has highlighted as the most salient target, so the next most salient
point can be found. Left – image showing the first fixation point, right – the
corresponding saliency map with the fixation point inhibited [Itti 2003b].
Yee [2000] and Yee et al. [2001] presented a method to accelerate global
illumination computation in pre-rendered animations by adapting the model of visual
attention by Itti and Koch to locate regions of interest in a scene and to modulate
spatiotemporal sensitivity (Figure 3.24). They create a spatiotemporal error tolerance map [Daly 1998], constructed from data based on velocity-dependent contrast sensitivity, and a saliency map [Itti and Koch 2000] for each frame in the animation. The
saliency map is, as previously stated, obtained by combining the conspicuity maps of
intensity, colour, and orientation with the new addition of motion. This then creates an
image where bright areas denote greater saliency, where attention is more likely to be
drawn (Figure 3.24c). An Aleph map is then created by combining the spatiotemporal
error tolerance map with the saliency map (Figure 3.24d). This resulting Aleph map is
then used as a guide to indicate where less rendering effort should be spent in
computing the lighting solution and thus significantly reduce the overall computational
time to produce their animations, Yee et al. [2001] demonstrated a 4-10 times speedup
over the time it would have taken to render the image in full.
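The way the two maps interact can be illustrated with the following Python sketch; this is not Yee et al.'s exact formula, merely one plausible combination in which higher saliency tightens the error that may be tolerated at a pixel, with both inputs assumed normalised to [0, 1].

import numpy as np

def aleph_map(error_tolerance, saliency):
    """The more salient a pixel, the less rendering error is tolerated there."""
    return error_tolerance * (1.0 - saliency)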
Figure 3.24: (a) Original image. (b) Image rendered using the Aleph map. (c) Saliency map of the original image. (d) Aleph map used to re-render the original image [Yee 2000].
Haber et al.’s [2001] perceptually-guided corrective splatting algorithm presents
an approach for interactive navigation in photometrically complex environments with
arbitrary light scattering characteristics. The model operates by evaluating a projected
image on a frame-by-frame basis. At the preprocessing stage they use a Density
Estimation Particle Tracing technique [Volevich et al. 2000], which is capable of
handling surfaces with arbitrary light scattering functions during global illumination
computation. They also perform mesh-to-texture post-processing [Myszkowski and
Kunii 1995] to reduce the complexity of the mesh-based illumination maps.
During interactive rendering, they use graphics hardware to display the precomputed view-independent part of the global illumination solution using illumination
maps. Objects with directional characteristics of scattered lighting are ray traced in the
order corresponding to their saliency as predicted by the visual attention model
developed by Itti et al. [1998]. However, they extended the original purely saliency-driven model of Itti et al. to also take into account volition-controlled and task-dependent attention, Figure 3.25.
The attention model developed by Itti et al. was originally designed for static
images. However, in interactive application scenarios, the user’s volitional focus of
attention should also be considered, and this is what Haber et al. try to capture by the
addition of their top-down model. A common observation being that the user tends to
place objects of interest in the proximity of the image centre and to zoom in on those
objects in order to see them in more detail. Therefore in their approach, which is shown
on the right side of Figure 3.25, they determine a bounding box for every non-diffuse
object using a unique identification code in the stencil mask and measure the distance
between the bounding box centre and the image centre. They normalize the obtained
distance with respect to half the length of the image diagonal. They also consider the
object coverage in the image plane measured as the percentage of associated pixels in
the stencil mask with respect to the number of pixels in the whole image. Although
these two factors are often considered as bottom-up saliency measures for static images,
Haber et al. argue that for interactive applications their meaning changes towards more
task-driven top-down factors.
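For one object, the two top-down measures described above could be computed as in the following Python sketch; the names and inputs are illustrative only and do not correspond to Haber et al.'s implementation.

import math

def top_down_factors(bbox_centre, object_pixel_count, image_width, image_height):
    """Return (normalised distance from image centre, fractional image coverage)."""
    cx, cy = image_width / 2.0, image_height / 2.0
    half_diagonal = math.hypot(cx, cy)
    distance = math.hypot(bbox_centre[0] - cx, bbox_centre[1] - cy) / half_diagonal
    coverage = object_pixel_count / float(image_width * image_height)
    return distance, coverage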
Figure 3.25: General architecture of the attention model. The bottom-up component is a
simplified version of the model developed by Itti et al. [1998]. The top-down
component was added to compensate for task-dependent user interaction [Haber et al.
2001].
A hierarchical sampling technique is then used to cover the image region
representing an “attractive” object rapidly with point samples. These point samples are
splatted into the frame buffer using a footprint size that depends on the hierarchy level
of the samples. To ensure that this corrective splatting affects only those objects, for
which the samples have been computed, they use a stencil test feature of the graphics
hardware together with the stencil mask that has been created during rendering. Sample
caching can then be performed to reuse the samples computed for previous frames.
Rendering of the illumination maps, corrective splatting, and the evaluation of visual
attention models are all implemented as independent and asynchronously operating
threads, see Figure 3.26, which perform best on multi-processor platforms.
Their implementation delivers good results when interactively navigating through
scenes of medium complexity at about 10 fps.
Figure 3.26: Data flow during rendering for the Haber et al. model. The dotted lines
depict data flow between different threads. The following abbreviations are used: FB =
frame buffer, SB = stencil buffer, PQ = priority queue, SQ = sample queue. The number
of ray tracing threads depends on the number of processors available [Haber et al.
2001].
Marmitt and Duchowski [2002] looked at the fidelity of using such a model of
visual attention to predict the visually salient features in a virtual scene. Itti et al.’s
model [1998] had been previously shown to be quite accurate over still natural images
[Privitera and Stark 2000]; however it was not clear how well the model generalises to
the dynamic scene content presented during virtual reality immersion. Marmitt and
Duchowski’s analysis showed that the correlation between actual human and artificially predicted scan-paths was much lower than expected. They hypothesise that the problem may lie in the algorithm’s lack of memory: each time the algorithm is run, as far as it is concerned, it is presented with a completely new image. In contrast, a human has
already seen most parts of the image and is therefore free to distribute visual attention to
new areas, even though, according to the algorithm’s saliency map, the areas may
appear to the model as less interesting. Therefore, systems such as those proposed by Yee et al. [2001] and Haber et al. [2001] may need further refinement to foster closer correspondence to actual human attentional patterns when rendering dynamic virtual reality scenes.
3.5 Summary
This chapter firstly described how computer graphical images are created using both local and global illumination models, as well as the advantages and disadvantages of both models. As will be discussed in Chapters 4 and 5, two different modelling packages
were used in this thesis, that of Alias Wavefront’s Maya [Maya 2004], a ray tracing
model, and Radiance [Ward Larson and Shakespeare 1998], a global illumination model
that employs a ray tracing strategy which encompasses a hybrid deterministic/stochastic
approach to efficiently solve the rendering equation.
The last section of the chapter showed how the two disciplinary subjects, human
vision and realistic computer graphical image synthesis, can be combined. Using the
knowledge of human visual perception, time can be saved computationally in producing
computer graphical images, by rendering only what humans truly perceive. Pioneering
research that has been performed in this application of visual perception to computer
graphics has mainly exploited the bottom-up visual attention process as discussed in
Section 2.3.2. This work has included using knowledge of the human visual system to
improve the quality of the displayed image. The methodology which is used in this
thesis is closest to that proposed by Yee et al. [2001], Haber et al. [2001] and McConkie
et al. [1997], with the crucial difference that this research directly exploits top-down
visual attention processing rather than the bottom-up process.
Chapter 4
Change Blindness
There is a need for realistic graphics in real time. Creating such realistic graphics takes a
significant amount of computational power. There are many different ways to approach
this problem. Banks of machines can be used to render realistic images; for example, Pixar use their ‘Renderfarm’ to render the frames for their films (see Figure 4.1), but even with this huge computer system it still takes on average 6 hours to render a single frame, with some frames taking up to 90 hours [Pixar 2004]. Their
‘Toy Story’ film took over two years to render on the 100 SGI workstations
‘Renderfarm’, which extrapolates to approximately 80 years to render the whole film on
a single contemporary PC [Pixar 2004]. Thus improvements in rendering hardware and adaptations to the rendering algorithms themselves have been popular research topics in this field over the past decade or more.
From an in-depth study of the human visual system and of previous work in this field, it was noted that nobody at the time had examined, from a graphical rendering point of view, two particular flaws of the human visual system which cause an inability to notice changes or differences in large parts of a scene when a visual disruption occurs or when attention is focussed on a particular task.
These are Change Blindness and Inattentional Blindness, both discussed in detail in
Chapter 2.
The goal of this thesis was first to find out whether either of these flaws occurs when viewing computer graphical images as it does in real life (this chapter), then to find out to what extent they could be exploited (Chapter 5), and finally to design and produce a selective rendering framework based on the findings (Chapter 6).
Figure 4.1: Pixar’s ‘Renderfarm’ is used to render the frames for their films [Pixar
2004].
4.1 Introduction
As discussed in section 2.3.3 ‘Change Blindness is the inability of the human to detect
what should be obvious changes in a scene’ [Rensink et al. 1997]. If a change occurs
simultaneously with a brief visual disruption that interrupts attention, such as an eye saccade, a flicker or a blink, then a human can miss large changes in their field of view. The onset of a visual disruption swamps the local motion signals caused by the change,
short-circuiting the automatic system that normally draws attention to its location.
Without automatic control, attention is controlled entirely by slower, higher-level
mechanisms in the visual system that search the scene, object by object, until attention
finally lands upon the object that is changing. Once attention has latched onto the
appropriate object, the change is easy to see, however this only occurs after exhaustive
serial inspection of the scene.
The same experimental procedures, both the ‘Flicker’ and ‘Mudsplash’
paradigms, were used as proposed by Rensink et al. [1997], but instead of using
photographs as in the case of Rensink et al. this work used graphically rendered scenes
and timed how long it would take observers to find the object that had been rendered to
a lower quality, without pre-alerting them to the location of the change. If the effect occurred, the idea could then be extended to dynamic scenes. Results of this work were presented firstly as a sketch at ACM SIGGRAPH
2001 in Los Angeles [Cater et al. 2001] and then as a full paper at Graphite 2003 in
Melbourne [Cater et al. 2003b].
4.2 The Pre-Study
To clarify which aspects of the scene were to be altered the initial experiment involved
seven images depicting a variety of scenes, rendered with Radiance [Ward Larson and
Shakespeare 1998]. A group of six judges, 3 female and 3 male with a range of ages and
amount of computer graphics knowledge, were asked to look at each scene for 30
seconds and then give a short description of what they could remember was in the
scene. This was based on the experimental procedure described by O’Regan et al. [1999a]; however, the literature did not define exactly how the judges were asked what they could remember. Thus, as will be seen in the next few pages, several attempts
were made before the best solution was achieved.
The judges’ descriptions enabled the definition of several aspects that were
termed Central Interest aspects [O’Regan et al. 1999a] for each scene. These are
defined as those aspects that are mentioned by at least three of the judges. Central
Interest (CI) aspects tend to concern what one would be tempted to call the main theme
of the scene. Similarly, several aspects of each of the scenes that [O’Regan et al. 1999a]
term Marginal Interest aspects were noted. Marginal Interest (MI) aspects are defined
as those that are mentioned by none of the judges and were generally parts that
constituted the setting in which the main ‘action’ of the scene could take place. Thus, by
manipulating whether changes are made to CI or MI aspects in the experiments, the degree of attention that the subjects were expected to pay to the changes could be controlled.
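The classification rule itself is simple enough to state as a short Python sketch; mention_counts is a hypothetical mapping from aspect name to the number of judges who mentioned it.

def classify_aspects(mention_counts):
    """Central Interest: mentioned by at least three judges; Marginal Interest: by none."""
    central = [a for a, n in mention_counts.items() if n >= 3]
    marginal = [a for a, n in mention_counts.items() if n == 0]
    return central, marginal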
Figure 4.2 shows the results from the first two judges’ responses to viewing the
images in Figure 4.3. Each observer was asked to list the aspects that he/she could
remember after having seen the image for 30 seconds. The images were displayed to the
viewers in random order so that there was no ordering bias towards any of the images.
From these initial results it was realised that the observers were being subconsciously forced to list only the main objects in the scene and not the aspects of the scene. Because of this, certain parts of the scene would always be missed out. Evidence in the attention literature suggests that attention may perhaps be better described, not in terms of space, but in terms of objects: that is, not on the basis of spatially defined connected regions in the visual field, but rather in terms of collections of attributes that can be grouped together to form objects [Baylis and Driver 1993, Duncan and Nimmo-Smith 1996]. Despite the literature on this topic, O’Regan proposes that the
term ‘object’ seems unsatisfactory; ‘presumably an observer can, for example, attend to
the sky (which is not really an object), or to a particular aspect of a scene (say its
symmetry or darkness) without encoding, in detail, all the attributes which compose it’
[O’Regan 2001]. Until further research is done in this area O’Regan suggests that it is
safer to suppose attention can be directed to scene ‘aspects’ rather than ‘objects’.
Judge 1
DOB: 24/04/73; Occupation: Teacher; Glasses/Contacts: Contacts
Time of Day: 2.40pm Wednesday 11th July; Female/Male: Male
Image 1: Stripped Box, Mirror, Yellow Ball, Glass with Straw, Red Cup, White Beaker
Image 2: Standing Lamp, Speaker, CD player, Table, Black Tube, CD, Carpet
Image 3: Chest of draws, Candle, Bowl, Wine Glass, Mirror, Door
Image 4: Mantelpiece, Two candles, Wine Bottle, Wine Glass, Picture with Boat and a church
Image 5: Picture, Two candles, Wine Bottle, Wine Glass, Black Fire Guard, Another Candle, Mantelpiece
Image 6: Bed Frame, Chest of Draws, Wine Bottle, Wine Glass, Lamp, Beaker Glass, Wardrobe, Pencil Holder, Ball
Image 7: Box with Stripes, Beaker, Red Cup, Mirror, Glass with Straw, Yellow Ball
Judge 2
DOB: 18/01/77; Occupation: Research Assistant; Glasses/Contacts: None
Time of Day: 11.20am Wednesday 11th July; Female/Male: Female
Image 1: Glass with Straw, Orange and Green Box, Dark Ball, Orange thing on the left hand side (maybe some kind of plastic cup), Another Ball, Mirror
Image 2: Table, CD, Amplifier, Two Beakers (one dark one light), Speaker and Stand, Lamp
Image 3: Chest of draws, Candle, Bowl, Glass, Mirror, Door
Image 4: Fireplace, Mantelpiece, Two candles, Picture with Boat, White Cliffs and a House, Wine Glass, Bottle
Image 5: Fireplace, Mantelpiece, Two candles, Bottle & Glass on mantelpiece, Candle in front of the Fireplace, Picture, Skirting Boards, Black bit in front of the Fireplace.
Image 6: Bed Frame, Chest of Draws, Lamp, Bottle, Glass, Beaker, Red Thing, Wardrobe
Image 7: Glass with Straw, Yellow Ball, Red Beaker, White Beaker, Mirror, Green and Orange Box
Figure 4.2: Table to show the aspects listed in each of the images by Judges 1 and 2.
Figure 4.3: Images 1 to 7, used initially to determine the central and marginal
interest aspects in the scenes.
Since the experimental setup was based on that of Rensink and O’Regan’s work, it was best to keep to the construct of trying to get the judges to list the aspects in the scene rather than just the main objects. It was also thought that 30 seconds was too long, as it was noticed that the observers’ attention was starting to wander to things other than the image on the screen. The judges’ pre-study was thus re-run, decreasing the viewing time to 20 seconds, and the viewers were asked to describe the scene rather than list the aspects that they could remember. It was thought this would give the observers the opportunity to describe more about the scene. As can be seen in Figure 4.4, very different responses were achieved.
Judge 3
DOB: 29/06/54; Occupation: Personal Assistant; Glasses/Contacts: Glasses
Time of Day: 2.15pm Monday 16th July; Female/Male: Female
Image 1: I would say these things in the scene are stuff that you would take on holiday with you, or
are mainly outside objects. From what I can remember there was a ball, a striped cushion to
sit on, a stick, and a mirror, however I’m not sure why a mirror would be in an outdoor
scene.
Image 2: This is of a corner of a room. There is a record player on a table, a glass vase, a speaker and
it’s a light room so maybe there is a window nearby that’s not in the picture or maybe a
light stand but I’m not sure.
Image 3: Again the corner of a room, with a side door opening into the room. A mirror reflecting the
candle, blue glass and another object on the chest of draws but I can’t remember what it is.
Image 4: There’s a table, possibly a coffee table with a mosaic appearance and burnt orange tiles on
the front of it. There’s a picture on the wall of the sea with a yacht and a lighthouse, at first
glance though I thought it was a submarine in the sea. There are candles burning, which
reflect light behind onto the wall and picture, and something between the two candles but I
don’t know what it is.
Image 5: There is a mantelpiece and a fireplace. There is a picture of the sea and a boat above the
mantelpiece. There is a single candle on the floor projecting light up under the picture. I
can’t remember if I saw a bottle or a glass, but something related to drinking!
Image 6: This is a bedroom scene, there’s an end of a iron bed without a mattress. There is a bedside
table with three draws on top of which is a glass and a modern light with a shade. This table
backs onto a single cupboard or a wardrobe that is white. There were some other objects on
the table but I’m not sure what they were.
Image 7: Saw a similar scene before, the one with the striped cushion. This has in it a circular red
object, a yellow ball, a stick, a mirror reflecting the scene. I would now say that this scene is
of modern furniture in a room not the outside scene I originally said it was before.
Figure 4.4: Table to show the aspects listed in each of the images by Judge 3.
The results received were not as expected. Whether or not this was just due to this one observer, it was thought best to develop a more robust method that neither influenced the judges’ decisions nor relied on their memory of the scene. Thus, after more discussions with the psychologists, it was decided that the best way to run this pre-study was to sit the judges down in front of the images and ask them to verbally describe each of the scenes with unlimited viewing time. It was also decided that the images should be changed slightly to include a greater range of scenes. Thus the final 14 scenes that were used in the actual experiment, Figure 4.5, were created. (These 14 scenes were altered according to the judges’ responses to make 21 different sets of images, hence some of the images in Figure 4.5 have two or three numbers to reference them.) The verbal responses of the judges were then transcribed; an example of the responses received can be seen in Figure 4.6. The whole set of judges’ responses can be seen in Appendix A.1.
[Figure 4.5 thumbnails, labelled: Image 1; Image 2; Image 3 & 19; Image 4 & 20; Image 5 & 11; Image 6; Image 7; Image 8, 14 & 18; Image 9 & 13; Image 10; Image 12 & 21; Image 15; Image 16; Image 17.]
Figure 4.5: Images 1 to 21, used to work out the central and marginal interest aspects in
the scenes with Judges 4 – 9, as well as in the final experiment.
Judge 4
DOB: 14/04/75; Occupation: PhD Student – Geography; Glasses/Contacts: None
Time of Day: 5.30pm Friday 20th July; Female/Male: Female
Image 1: It’s an office but it looks a kitchen. It’s got two chairs, one that swivels and one that’s
rigid with 4 legs. umm. It’s got a mottled floor and then it’s got a set of three draws
under the desk and it’s got a picture with a green blob in the middle, which looks like it’s
meant to be a tree. It’s got two yellow cupboards on the right hand side mounted up on
the walls. There’s a light on a stand thing, which bends over beaming onto something
and a cup of tea or something, I think it’s a cup of tea.
Image 2: Looks like a wine bar, a really snazzy wine bar. It has two stools and a table in the
middle, which is rigid with two legs. The floor is tiled white and red. The walls are very
dark; the ceiling is dark as well. Then in the centre of the picture there is the mirror
behind the table, which is reflecting a vase, a red vase, with a flower in it. And either
side of that there are two lights and there are two lights on the ceiling.
Image 3 & 19: It’s three spherical objects on a piece of text and the three objects look like a tennis ball
an orange, but it’s not an orange, and a Christmas ball thing. It’s all set at an angle so the
text gets smaller as it moves away.
Image 4 & 20: It’s a close up of a rectangular box, a rectangular box with red and green stripes. In the
background looks like a cylindrical vase, I don’t know what it is but looks like it has a
chop stick in it. Err and there’s a round yellow snooker cue ball with a mirror at the back
of it.
Looks like a really snazzy office or cafe. You look through the door and you are looking
at two yellow chairs back to back and a table in front of it with a wine glass on it. To the
right is a picture but you can see it behind another yellow chair. Right at the very back in
the centre of the wall is a kind of rainbow picture mounted on the wall. On the ceiling
there is ummmm a strip light running down the centre with tiny little lights either side of
the room. And then there is some strange art works. On the back on the right is a kind of
red base with a swirly art piece. On the other left hand wall is some other artwork
mounted like a curly stepladder thing and the carpet is grey.
This is a further away view of the other image and again it has the rectangular giftwrapped box, green and red striped. umm. It’s got what looks like a top to a shaving
foam red top or a hairspray can, and then it’s got a white empty cylindrical shape hollow
cylindrical shape on top of the box. And behind that looks like it’s a plastic glass with a
peachy straw in it. To the left of that is a yellow cue ball on top of a women’s vanity
mirror, which has been opened up.
79
Image 7
Image 8,
14 & 18
Image 9
& 13
Image
10
Image
12 & 21
Image
15
Image
16
Image
17
Rory’s dinner. An empty dinner umm with a long bar with 5 pink stools, kind of rigid
stools umm. The walls are vertical stripy kind of lilac and beige colour, brown beige
ummm. At the very front there is kind of seating area which surrounds a table with blue
chairs. umm Behind the counter of Rory’s cafe is kind of a glass cabinet that’s empty
but I would imagine it would have food in it and a row of glasses at the back. And to the
left of the image is a beige cylindrical object, which is holding up the building structure.
Ummm same as the other one but closer ummm the thing that I thought was a bit odd
was the office chair in the picture the base of it with the wheels the metal base is the
same as the floor and that looks a bit odd. Then there is the office desk same as before
and there is the reflection of the light coming down onto a book. Umm and a pink coffee
mug and that’s it.
Err looks like a scene in a kitchen with a close up of a unit. With in the very front a kind
of empty cookie jar and behind that is a coffee grinder and a handle on top, which you
twist, but the handle looks a bit odd. I don’t know the handle looks like it is sitting on
top of the cookie jar. Anyway umm there are three mugs or two mugs and one glass on
the panel behind, on the shelf behind umm and then ... and that’s it oh but it looks quite
a pink blotchy image.
Err again it looks like a funky office with yellow chairs with umm you are drawn to the
scene at the back with the main table with the curved legs and it has two tea cups either
side so it looks like people have just got up and left. There’s two chairs positioned not
facing each other just slightly apart, slightly to the side of one another. Umm in front of
that there is a yellow sofa on the right hand side and umm at the back there is writing on
the wall looks like something gallery, art gallery? Umm and to the right of that is the
window with the light shining through onto the picture. Umm there’s a series of lights
on the walls mounted at head height but they don’t look on and then through to the left
oh umm sorry still in that room there’s a blue picture mounted on the wall which is kind
of blobby and 3D. And through that door is another sort of gold/brass-y looking similar
kind of blobby picture 3D picture mounted on the wall and another yellow chair.
Umm the image on the left hand side is a dark area which I’m not sure what it is but to
the left of that is a free standing lamp on a long white stand which is switched on and
umm and then there is a speaker which is free standing on the floor with two legs. And
then on the table, a brown table is an amplifier that looks a bit featureless except for the
volume umm err and then there’s goodness and another cylindrical object, which looks
like a toilet roll, which is black.
Looks like a scene from a bedroom it’s in the corner of a bedroom with a creamy
coloured table and creamy coloured walls with a picture of a sailing boat and a beach in
the background. Umm On the table there are two creamy coloured candles and what
looks like a CD cover and an orange object - I think it’s a glass then to the right of that
looks like and empty wine bottle which is green.
Umm a sort of pine table with a strange feature which I don’t really know what it is,
with a brassy base which curves up which has three stages which look like miniature
chess boards red and white. Umm then on the table in front of that there is a glass full of
red wine. Two blue chairs around the table but the chair furthest away the back of the
chair and the legs don’t quite match. I don’t know if it’s just me but umm that'
s it apart
from a really non-descript background.
Err same image as before but from a different angle. You can see more of the picture
umm apart from that I don’t know what else, there is a chord to the left of the picture but
apart from that I don’t know what’s really different apart from a different angle and
further away and the shadows are falling to the right hand side of the picture.
Figure 4.6: Table to show the aspects listed in each of the images by Judge 4 (the rest
of the results are contained in Appendix A.1).
The results received from the final six judges (judges 4-9 in Appendix A.1) made it possible to choose the Central and Marginal Interest aspects for each of the scenes. It was deemed important to have a wide selection of sets of images to show to the observers, so that each participant saw a different set of 10 images and there could be no experimental bias towards particular scenes or types of alteration, whether they were viewing the flicker paradigm or the mudsplash paradigm. In the end 21 possible images were created, all of which were used for both paradigms, with each subject seeing 10 of these sets of images (half used the mudsplash paradigm and half the flicker paradigm). However, image 21 was only used as an example of the experimental setup for the observers, see the instructions sheet in Appendix A.2, and thus was not included in the actual experiment, i.e. in the 10 images shown to the participants.
The MI changes were carefully chosen to be as similar as possible in visual conspicuity to the CI changes. This was measured by the number of pixels that were changed, and by the size and position of the change within the scene. The mean centroid of the CI locations was at (x,y) pixel coordinates (-17 ± 63, 19 ± 100) relative to the centre of the screen, and at coordinates (2 ± 177, 108 ± 135) for the MI locations (the ± values
are standard deviations). The resolutions of the images were 512 x 512, 512 x 768 or
768 x 512, depending on the scene structure. The nature of the change between the
original and modified pictures was very strictly controlled so as to ensure that there was
hardly any difference in visual conspicuity of the change for the CI and MI changes.
Thus the mean proportion of the modified pixels in the picture was 8.2% ± 5 and 10% ±
7, for CI and MI changes respectively. The mean Euclidean distance in the RGB values
across all the changed pixels was 21 ± 16 and 38 ± 31 respectively for CI and MI
changes.
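To make these conspicuity measures concrete, the sketch below (an illustration only, not the code actually used for this thesis; it assumes flat 8-bit RGB buffers) computes the proportion of modified pixels, the mean Euclidean distance in RGB over the changed pixels, and the centroid of the change relative to the image centre.

    #include <math.h>
    #include <stdio.h>

    /* Conspicuity measures for an original/modified image pair of equal size.
       orig and mod are 8-bit RGB buffers of width * height * 3 bytes. */
    void conspicuity(const unsigned char *orig, const unsigned char *mod,
                     int width, int height)
    {
        long changed = 0;
        double sum_dist = 0.0, cx = 0.0, cy = 0.0;

        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                long i = 3L * (y * (long)width + x);
                double dr = (double)orig[i]     - mod[i];
                double dg = (double)orig[i + 1] - mod[i + 1];
                double db = (double)orig[i + 2] - mod[i + 2];
                double dist = sqrt(dr * dr + dg * dg + db * db);
                if (dist > 0.0) {               /* this pixel was modified      */
                    changed++;
                    sum_dist += dist;           /* Euclidean distance in RGB    */
                    cx += x - width / 2.0;      /* offset from the image centre */
                    cy += y - height / 2.0;
                }
            }
        }
        if (changed > 0) {
            printf("Modified pixels: %.1f%%\n",
                   100.0 * changed / ((double)width * height));
            printf("Mean RGB distance over changed pixels: %.1f\n",
                   sum_dist / changed);
            printf("Centroid of change relative to centre: (%.0f, %.0f)\n",
                   cx / changed, cy / changed);
        }
    }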
Aspect Changes
Image 1: No Alteration.
Image 2: Central Interest Aspect: Rendering Quality Alteration to the chequered red and white tiled floor.
Image 3: Central Interest Aspect: Rendering Quality Alteration to the Christmas bauble with its reflection of the room and the print it is sitting on.
Image 4: Central Interest Aspect: Rendering Quality Alteration to the yellow round snooker ball.
Image 5: Central Interest Aspect: Location Alteration, Movement forwards of the yellow chairs, which are back to back, and the low modern coffee table with the silver glass on it.
Image 6: Marginal Interest Aspect: Presence Alteration, Removed the purple/lilac pool ball.
Image 7: Marginal Interest Aspect: Rendering Quality Alteration to the tiled beige floor.
Image 8: Marginal Interest Aspect: Rendering Quality Alteration to the rigid, four-legged blue chair.
Image 9: Central and Marginal Interest Aspects: Rendering Quality Alteration to the coffee pot (central) and the shiny marble kitchen work surface (marginal).
Image 10: Marginal Interest Aspect: Rendering Quality Alteration to the left hand sidewall.
Image 11: No Alteration.
Image 12: Marginal Interest Aspect: Rendering Quality Alteration to the carpet.
Image 13: Marginal Interest Aspect: Rendering Quality Alteration to the shiny marble kitchen work surface.
Image 14: Central and Marginal Interest Aspects: Rendering Quality Alteration to the blue, four-legged chair (marginal) and the yellow swivel chair (central).
Image 15: Marginal Interest Aspect: Location Alteration, Movement of Toy Story video from side to side.
Image 16: Central Interest Aspect: Presence Alteration, Removed the wine glass half full of red wine from the table.
Image 17: Central Interest Aspect: Rendering Quality Alteration to the trolley with all the objects on it.
Image 18: Central and Marginal Interest Aspects: Rendering Quality Alteration to the pink coffee mug (central) and shiny gold table post (marginal).
Image 19: Central Interest Aspect: Rendering Quality Alteration to the Sunkist orange.
Image 20: Marginal Interest Aspect: Rendering Quality Alteration to the white tube.
Image 21: Central Interest Aspect: Rendering Quality Alteration to the black speaker and its stand.
Figure 4.7: Table to show the aspects that were altered in each of the images.
There were four different types of alteration made to the CI or MI aspects selected. Figure 4.7 describes the aspects that were selected to be altered and the alterations made for each of the 21 images. The list of possible alterations was as follows:
• A rendering alteration - part of the image has been rendered to a lower quality, causing any of the following effects (an example can be seen in Figure 4.8, before and after alteration):
  o An aliasing effect - this is where a jaggy edge can be seen on an area of the scene due to the digital nature of the pixel grid.
  o A shadow generalization - this is when a shadow has been simplified so that only a harsh shadow is present instead of the more realistic penumbras as well as the umbras.
  o A reflection generalization - this is when a reflection has been simplified so that only a sharp reflection exists instead of a more realistic smoother one.
• A location alteration - part of the image has been moved in some way, i.e. its location has changed compared to the original image.
• A presence alteration - part of the image has been removed in some way, i.e. it is no longer present in the image. An example of this can be seen in Figure 4.9, before and after alteration.
• No alteration at all.
Presence and location alterations were included so that the detection times could be compared, not only between these two types of alteration but, more importantly, against the times for the rendering alterations. Both presence and location alterations were also tested by O’Regan et al. [1999b], so a comparison with their results could be made.
Figure 4.8: a) Original Image. b) Modified Image - here a Marginal Interest aspect of the scene (the left hand wall with light fitting) has been replaced with a low quality rendering, thus a rendering alteration has been made.
Figure 4.9: a) Original Image. b) Modified Image - here a Central Interest aspect of the scene (the wine glass) has been removed in a presence alteration.
4.3 Main Experimental Procedure
Before each experiment started, the observer was asked to fill out the first part of a simple questionnaire, which asked for personal details such as date of birth, occupation, whether or not they wore glasses or contact lenses, their sex and their self-rating of computer graphics experience. Also noted was the time of day, and whether or not they were given the list of possible alterations made to the images. The purpose of these questions was to minimise bias. For example, the experiments were run over three sessions with eight observers in each session, so that there was no bias due to the time of day. Approximately 50% of the observers were female, spread throughout the sessions, and there was also a mix of occupations, ages and computer skills.
The observers were then given a written description of what the experiment
entailed and two printed examples, of the experiments that would be run, one of the
flicker paradigm and one of the mudsplash paradigm, Appendix A.2. This was to make
sure that each observer received the same instructions. To maintain the integrity of the
experiment, observers were also asked not to discuss with anyone what they did in the
experiment room until all the participants had performed the experiment. Each observer
also did not receive any feedback on how well they were doing until all of the
experiments were completed.
Figures 4.10a & b show examples of the experimental setup, which was conducted with 24 subjects under strictly controlled viewing conditions. The monitor was situated 45cm away from where the observer was sitting, positioned such that there was no glare on the screen from any light sources in the room. A fixed seat was used so that the observer could not change the angle at which the images were viewed, which also guaranteed the same viewing distance for all observers.
Figure 4.11(a) shows a scene rendered in Radiance to a high level of realism,
while 4.11(c) is the same scene rendered with less fidelity for some objects and at
significantly reduced computational cost. Figure 4.11(b) shows the same image as
4.11(a), but with mudsplashes and likewise 4.11(d) shows the same image as 4.11(c),
but with mudsplashes.
Each original image was displayed for 240 milliseconds, followed by the mudsplashed image for 290 milliseconds, followed by the modified image for a further 240 milliseconds and finally the modified image with mudsplashes for 290 milliseconds. This sequence was repeated continually until the user spotted the change and said ‘stop’. If no change had been observed after 60 seconds, the experiment for that particular set of images was terminated. Similar experiments were carried out with the flicker paradigm, in which a uniform ‘medium’ grey image was substituted for the mudsplashed images, as demonstrated in Figure 4.11(e).
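For clarity, this presentation schedule can be sketched as below. This is an illustration only, not the experiment’s actual display code; show_image(), msleep() and stop_requested() are hypothetical helpers standing in for the real display, timing and response routines. For the mudsplash paradigm the two interrupt frames are the mudsplashed versions of the original and modified images; for the flicker paradigm both are the ‘medium’ grey image.

    /* One trial of the change blindness experiment: cycle the four frames
       (240 ms, 290 ms, 240 ms, 290 ms) until the observer says 'stop'
       or 60 seconds have elapsed. */
    extern void show_image(const char *name);  /* hypothetical: display an image      */
    extern void msleep(int ms);                /* hypothetical: wait for ms millisecs */
    extern int  stop_requested(void);          /* hypothetical: 1 once 'stop' is said */

    void run_trial(const char *original, const char *orig_interrupt,
                   const char *modified, const char *mod_interrupt)
    {
        int elapsed_ms = 0;                    /* one full cycle lasts 1060 ms */

        while (elapsed_ms < 60000 && !stop_requested()) {
            show_image(original);       msleep(240);
            show_image(orig_interrupt); msleep(290); /* mudsplashes or grey */
            show_image(modified);       msleep(240);
            show_image(mod_interrupt);  msleep(290); /* mudsplashes or grey */
            elapsed_ms += 240 + 290 + 240 + 290;
        }
    }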
The high quality image in Figure 4.11a took 27 hours to render in Radiance, yet rendering the whole of the same image to low quality took only 1 minute. As Radiance did not at the time have a facility for selecting particular objects to be rendered at different qualities, Adobe Photoshop was used to create the selectively rendered images.
Each subject saw 10 of these sets of images. As stated before these were
completely randomised from the 20 possible sets of images and amongst these were
some ‘red herrings’ i.e. there were some sets where no alteration was made to the
images. This was to make sure that the observers only said stop when they saw an actual
alteration. After the observer said ‘stop’ the time was noted and a verbal description of
their perception of the alteration was given.
The observers were asked always to say ‘stop’ as soon as they noticed the first alteration, and not to look around to see if there were any more alterations. By following this criterion it was possible to see, when an image contained two or more alterations, which of them was more easily attended to. If an observer did not say ‘stop’ within 60 seconds it was recorded that the participant had seen no alteration in the images. Half of the observers were given the complete list of the possible alterations beforehand and the other half were not.
Figure 4.10 (a & b): Photographs showing the experiments being run.
Figure 4.11: a) High quality image. b) High quality image with mudsplashes. c) Selective quality image (look at the surface of the tabletop compared to Figure 4.11a). d) Selective quality image with mudsplashes. e) The ‘medium’ grey image used in the flicker paradigm. f) The ordering of the images for the two paradigms.
4.4 Results
The results showed that observers took a significant time to locate the rendering differences even though they knew that a change in the images was occurring. Therefore, as will be discussed in this section, these results show that Change Blindness occurs in computer graphical images just as it does in real life [Cater et al. 2003b].
Figure 4.12 shows the overall results in a graphical format. The numbers above the columns give the average time, in seconds, taken for the observers to notice the difference. The data was categorized according to the type of alteration that the observer had seen, i.e. whether it was a rendering change, a location change or a presence change, as well as which type of paradigm was used. The blue columns show the time taken for the alteration of CI aspects and the maroon columns show that for the alteration of MI aspects. The graph shows that, as expected, modified CI aspects were always found more quickly than modified MI aspects. This is because a human’s attention is naturally drawn to CI aspects rather than to MI aspects: when searching the scene for the alteration, each observer will naturally search all the CI objects first before searching the rest of the scene exhaustively and thus attending to the MI aspects.
Figures 4.13 and 4.14 both show that most participants detected a change in fewer cycles of the images when viewing with the mudsplash paradigm than when viewing the images with the flicker paradigm. This is most likely because mudsplashes provoke a smaller disturbance than the flicker paradigm, as they cover a smaller percentage of the image. However, it is still important to note that the time taken to notice the change under the mudsplash conditions was significantly impaired relative to having no interruption at all [Rensink et al. 1999]. This demonstrates that Change Blindness is not caused by a covering up of what is changing (the flicker paradigm), but is actually due to a visual disruption in any part of the image (the mudsplash paradigm), regardless of whether or not it covers the aspect that is changing.
Figure 4.15 shows an example of a rendering alteration made with the flicker paradigm; this pair of images actually took the participants the longest time on average to spot the change, 36 seconds or 34 cycles of the images, even though the alteration was one of the largest, taking up nearly 20% of the image.
[Figure 4.12 is a bar chart; y-axis: Time (seconds); x-axis: Rendering, Location and Presence alterations under the Flicker and Mudsplash paradigms; separate bars for Central Interest and Marginal Interest, with the mean time labelled above each column.]
Figure 4.12: Overall results of the three different types of alterations made during the
experiment, Rendering, Location and Presence.
[Figure 4.13 is a bar chart; y-axis: Percentage of observers (%); x-axis: number of cycles (1, 2, 3, 4, 5, 6-10, 11-15, 16-20, 21-25, >25); separate bars for Central Interest and Marginal Interest.]
Figure 4.13: Number of cycles of the images needed to detect the rendering quality
change in the Flicker paradigm.
[Figure 4.14 is a bar chart; y-axis: Percentage of observers (%); x-axis: number of cycles (1, 2, 3, 4, 5, 6-10, 11-15, 16-20, 21-25, >25); separate bars for Central Interest and Marginal Interest.]
Figure 4.14: Number of cycles of the images needed to detect the rendering quality
change in the Mudsplash paradigm.
Figure 4.15: a) Original Image 7. b) Modified Image 7 - here a Marginal Interest aspect of the scene has been replaced with a low quality rendering (the whole of the tiled floor has been replaced).
The most important finding from this set of results is the increase in the time taken for the rendering alterations to be noticed in comparison to the time taken for the presence and location alterations, Figure 4.12. The increase is on average over 8 times for CI aspects and over 4.5 times for MI aspects with the flicker paradigm, and on average over 1.5 times for CI aspects and 1.2 times for MI aspects with the mudsplash paradigm. This shows that a significant time is needed for observers to notice a difference in rendering quality, even when changes occupy large parts of the image (up to 22 visual degrees, with the participants seated 45cm away from the screen). This increase in time taken could be because it is more difficult to spot a rendering change than a presence or a location change, simply because in both of these latter cases the mean Euclidean distance in the RGB values and the intensity values across all the changed pixels are a lot greater than in a rendering case. The mean intensity change of the changed pixels was only 4 ± 14 and 38 ± 31 for the CI and MI rendering quality changes respectively, whereas for the presence and location changes the mean intensity change was 131 ± 52 and 146 ± 67. Similarly, the mean Euclidean distance in the RGB values across all the changed pixels was 21 ± 16 and 38 ± 31 respectively for CI and MI rendering quality changes, but 106 ± 34 and 98 ± 28 for the presence and location changes.
This increase in the mean intensity and RGB values arises because, in the presence or location cases, an aspect is completely removed or moved to a new location, so that a completely new aspect with a new mean intensity and RGB value appears in its place. What is important, though, is that this difference exists: in designing a selective renderer, removing aspects of scenes is not of interest, only rendering aspects to a significantly lower quality at a much reduced computational cost. Thus, whether the long time taken to spot a rendering quality alteration is because the RGB and intensity values have not changed much, or whether it is due to another cause, it can still be concluded that this approach is plausible for use in designing a selective renderer. Such a renderer will save computation time by selectively rendering the scene to different qualities without the observers detecting a difference, as they will be suffering from Change Blindness.
Compared with those of O’Regan et al. [1999a] and Rensink et al. [1999], the results for presence and location were graphically similar, showing that computer generated images may be used when exploiting Change Blindness. As the full results are not presented in any of their papers, a statistical analysis of how similar the results are cannot be carried out. However, by comparing the results of this experiment for the location and presence changes under the mudsplash paradigm with the results approximately reproduced from Rensink et al. [1999], a visual similarity can be seen, Figure 4.16.
[Figure 4.16 is a bar chart; y-axis: Time (seconds); x-axis: Type of Alteration made using Mudsplash Paradigm (Location, Presence); series: Central Interest [Rensink et al.], Central Interest [Cater et al.], Marginal Interest [Rensink et al.], Marginal Interest [Cater et al.].]
Figure 4.16: Comparison of the results produced in this experiment with the results
reproduced from Rensink et al. [1999] for location and presence with the mudsplash
paradigm.
It was also noted that in the cases where more than one aspect of a scene was altered, most of the subjects only noticed one of the aspects, and of the available aspects the majority of observers attended to the lower quality rendering of reflections in/on shiny objects rather than to the low quality rendering of a matt object. This suggests that a reflection generalization must be easier to see than a matt generalization; however, this cannot be because the visual conspicuity of the change was greater. The mean intensity change of the altered pixels for the reflection generalization was on average -2 ± 10 for CI and 37 ± 1 for MI, with a mean Euclidean distance in RGB of 18 ± 3 for CI and 37 ± 1 for MI, whereas for the matt generalization changes the mean intensity change was on average 19 ± 25 for CI and 39 ± 30 for MI, and the mean Euclidean distances in RGB were 35 ± 41 and 38 ± 30 for CI and MI respectively. Thus the matt generalization changes were on the whole slightly greater and more visually conspicuous than the reflection generalizations. Therefore, the change in intensity or in RGB cannot have been what alerted the observers to the reflection generalization more quickly than to the matt generalization.
It is already known from visual psychology researchers such as Yarbus [1967], Yantis [1996] and Itti et al. [1998] that the visual system is highly sensitive to features such as edges, abrupt changes in colour and sudden movements, as well as to expectancy and personal saliency. More research, however, needs to be done in this area, in a computer rendering context, to work out exactly what people attend to easily – this will lead the research towards calculating which properties of the scene, such as colour, intensity, aliasing effects, reflections, shadows and orientation, can be rendered to a lower quality without the observer noticing the difference.
It was noted that, as expected, the half of the participants who were given the list of possible alterations gave a more accurate description of the change that they perceived; however, their times for spotting the change were not significantly different from those of the participants who were not given the list, as described next.
4.4.1 Statistical Analysis
Statistical analysis shows that the results are statistically significant. The appropriate
method of statistical analysis was the t-test for significance since the response of the
observers was continuous data from normal distributions and not binary [Coolican
1999]. However, as each person had a different random selection of the images an
unrelated t-test had to be used. A t-test for unrelated data tests the difference between
means of unrelated groups, typically where each observer has participated in just one of
the conditions in an experiment. The t-test gives the probability that the difference
between the two means is caused by chance. It is customary to say that if this
probability is less than 0.05 then the difference is ‘significant’ i.e. the difference is not
caused by chance. Under the null hypothesis it assumes that any difference between the
means of the conditions should be zero (i.e. there is no real difference between the two
conditions). A large value of t means that difference found between the means is a long
way from the value of zero, expected if the null hypothesis is true. The critical value,
obtained from the appropriate statistical table for that type of test, is the value that t
must reach or exceed if the two conditions are significantly different, and thus the null
hypothesis is rejected [Coolican 1999]. The degrees of freedom (df) collates to the
notion that parametric tests, such as t-tests, calculate variances based on variability in
the scores. Therefore the df for the total variance is calculated by subtracting one from
the total number of subjects (N) i.e. df = N-1, due to the fact that the last score in the a
group of subjects is predictable and therefore cannot vary [Greene and D’Oliveira
1999].
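As an illustration of the computation involved (a sketch only, not the analysis code actually used for the thesis), an unrelated t-test with pooled variance can be written as follows.

    #include <math.h>

    /* Unrelated (independent samples) t-test with pooled variance.
       a and b are the scores of the two groups; *df receives n1 + n2 - 2.
       The returned t is compared against the critical value for that df. */
    double unrelated_t(const double *a, int n1, const double *b, int n2, int *df)
    {
        double m1 = 0.0, m2 = 0.0, ss1 = 0.0, ss2 = 0.0;
        int i;

        for (i = 0; i < n1; i++) m1 += a[i];
        for (i = 0; i < n2; i++) m2 += b[i];
        m1 /= n1;
        m2 /= n2;
        for (i = 0; i < n1; i++) ss1 += (a[i] - m1) * (a[i] - m1);
        for (i = 0; i < n2; i++) ss2 += (b[i] - m2) * (b[i] - m2);

        *df = n1 + n2 - 2;
        double pooled = (ss1 + ss2) / *df;                 /* pooled variance */
        double se = sqrt(pooled * (1.0 / n1 + 1.0 / n2));  /* standard error  */
        return (m1 - m2) / se;
    }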
By performing pair-wise comparisons of all of the rendering modifications to the
location or presence alterations it could be determined whether or not the results were
statistically significant. For example, the test statistics for the pair-wise comparison of
the results for the CI rendering flicker paradigm and the CI location flicker paradigm
was t = 6.1668, where the degrees of freedom (df) = 12, and the probability (p) < 0.01 (a
p value of 0.05 or less denotes a statistically significant result).
For a one-tailed test with df = 12, t must be at least 1.782 for significance with p < 0.05. For each of the remaining rendering results compared with the appropriate location or presence change results, a t value of at least 4.76 is achieved, which easily exceeds the critical value for significance, see Figure 4.17. Thus, the probability of all the results of this experiment occurring under the null hypothesis, i.e. not due to the rendering change, is at most 0.01 and probably a lot lower. It can therefore be concluded that the difference in results is (highly) significant.
Pair-wise Comparison: t Value (t must be at least 1.782 for significance), df value, p value, Statistically significant?
CI Rendering Flicker vs CI Location Flicker: t = 6.1668, df = 12, p < 0.01, Significant
MI Rendering Flicker vs MI Location Flicker: t = 4.8843, df = 12, p < 0.01, Significant
CI Rendering Mudsplash vs CI Location Mudsplash: t = 5.2139, df = 12, p < 0.01, Significant
MI Rendering Mudsplash vs MI Location Mudsplash: t = 4.8194, df = 12, p < 0.01, Significant
CI Rendering Flicker vs CI Presence Flicker: t = 5.2610, df = 12, p < 0.01, Significant
MI Rendering Flicker vs MI Presence Flicker: t = 5.2277, df = 12, p < 0.01, Significant
CI Rendering Mudsplash vs CI Presence Mudsplash: t = 4.7637, df = 12, p < 0.01, Significant
MI Rendering Mudsplash vs MI Presence Mudsplash: t = 5.9466, df = 12, p < 0.01, Significant
Figure 4.17: Full results of statistical analysis using the unrelated t-test for significance.
4.5 Discussion - Experimental Issues and Future Improvements
Due to the performance of the graphics card in the computer used for the experiment,
the refresh rate could only be set to a maximum of 70Hz. Therefore each time the next
image was displayed it was refreshed in sections adding to the visual disruption that
observers were suffering. To correct this in the future a higher performance graphics
card should be installed.
The time taken to notice the alterations when there was no visual disruption to the observer, i.e. no blank flicker images or mudsplashes between the pair of images, should also have been measured, for comparison. As this was not done, the results from O’Regan et al. [1999b] can be used, but only as a guideline. This value was on average 1.4 cycles of the images, or 1.5 seconds, in their experimental setup. If this result is compared with those obtained in this experiment, the time taken to detect a rendering quality change under the flicker paradigm greatly exceeds this value. When observers viewed a rendering quality change under the mudsplash paradigm, the visual disruption was much smaller and thus the average times were much closer to that of suffering no visual disruption at all, i.e. nearer to 1.5s, and even more so for the presence and location changes.
It was later pointed out in further discussions with psychologists that 50% of the sets of images shown to the observers should have been ‘red herrings’, i.e. should have contained no alteration at all. This would have guaranteed that the observer could not know whether or not there was an alteration until they had exhaustively searched the image. In the experiment as run, each observer saw only one ‘red herring’.
To guarantee that each of the observers viewed the images from the same distance, the chair was positioned at the same point on the floor each time. However, when the observers found an alteration hard to spot, they leant towards the screen, meaning that they were no longer observing the images from the same distance as previous observers; compare, for example, the observer’s posture in Figure 4.10a with that of the observer in Figure 4.10b. To solve this, in future experiments the observer’s head, rather than just the chair, should be held in a fixed position. This could be accomplished by having the participants rest their chin on a chin rest and not move their head from that position until the experiment has finished.
4.6 Summary
The results of this experiment showed that Change Blindness does indeed occur in
computer generated images as it does in photographs and indeed real life. However,
Change Blindness requires a visual disruption to occur and thus it is not clear how, in a
typical animation without such disruption, this feature of the human eye could be
exploited in a straightforward manner by selective rendering. In Chapter 7, under future
work, it is discussed how Change Blindness, however, may be exploited in animated
sequences when the user is confronted by multi-sensory disruptions such as visual,
audio and motion.
Chapter 5 therefore goes on to consider the other feature of the Human Visual
System, Inattentional Blindness, and then Chapter 6 shows how this can be incorporated
within a selective rendering algorithm.
Chapter 5
Inattentional Blindness
In pursuit of the goal for this thesis of only rendering to the highest quality that which
humans truly perceive, the next stage was to consider Inattentional Blindness.
5.1 Introduction - Visual Attention
As discussed in Section 2.3.1, visual scenes typically contain many more objects than
can ever be recognized or remembered in a single glance. Some kind of sequential
selection of objects for detailed processing is essential so humans can cope with this
wealth of information. Coded into the primary levels of human vision is a powerful means of accomplishing this selection, namely the retina. Because spatial acuity is highest right at the centre of the retina, the fovea, and falls off rapidly toward the periphery, detailed information can only be processed by directing the fovea onto the area of interest.
Thus as Yarbus showed [1967], Section 2.3.2, the choice of task is important in
the ability to predict the eye-gaze pattern of a viewer, i.e. the movement of their fovea
and thus their visual attention. This was confirmed by running a small experiment with
an eye tracker, a single observer and a list of four different task instructions, as can be
seen in Figure 5.1. It is precisely this knowledge of the expected eye-gaze pattern that is
used in this chapter to reduce the rendered quality of objects outside the area of interest
without affecting the viewer’s overall perception of the quality of the rendering.
Figure 5.1: Effects of a task on eye movements. Eye scans for observers examined with
different task instructions; 1. Free viewing, 2. Remember the central painting, 3.
Remember as many objects on the table as you can, and 4. Count the number of books
on the shelves.
5.2 Experimental Methodology
The next stage of this thesis was to develop an experimental methodology based on the Inattentional Blindness work of Mack and Rock [1998]. However, it was deemed important that this methodology should instead use computer graphical animations, together with a task that focused the observer’s attention on only a certain part of the scene in the animation. It was hypothesised that this would make the viewers suffer from Inattentional Blindness to the rest of the animation. It was also important to remedy the shortcomings of the experiment discussed in Chapter 4, to achieve a more robust experimental procedure. Results of this work were discussed in two keynotes, [Chalmers and Cater 2002] and [Chalmers et al. 2003], and were presented as a full paper at VRST 2002 in Hong Kong [Cater et al. 2002].
This study involved three rendered animations of an identical fly-through of four
rooms, the only difference being the quality to which the individual animations had
been rendered. One animation was rendered all to Low Quality (LQ), one all to High
Quality (HQ) and the last was rendered selectively according to the task at hand,
Selective Quality (SQ). The task of determining which arm of a cross was longer,
suggested by Mack and Rock [1998], was not appropriate for animation. This was due
to the fact that this methodology needs a homogenous region in the centre large enough
for the cross figure to be superimposed on it. This makes sure that there are no contours
97
in the scene overlapping or camouflaging the cross. In a fly-through the camera view is
continually moving, thus a large area with no contours is very difficult to maintain when
it is also important that the scene contains several distinct, familiar objects [Mack Rock
1998]. Thus several new tasks had to be considered [Hoffman 1979]. The final task
chosen was for each user to count the number of pencils that appeared in a mug on a
table in a room as he/she moved on a fixed path through four such rooms. To count the
pencils, the user needed to perform a smooth pursuit eye movement to track the mug in
one room until they had successfully counted the number of pencils in that mug, then
perform an eye saccade to the mug in the next room. Each mug also contained a number
of superfluous paintbrushes to further complicate the task and thus retain the viewer’s
attention, Figure 5.2.
Figure 5.2: Close up of the same mug showing the pencils and paintbrushes, (each
room had a different number of pencils and paintbrushes).
5.2.1 Creating the Animation
Figure 5.3 (a) shows the high quality rendered scene, while (b) shows the same
scene rendered at a significantly lower quality, with a much reduced computational
time. Each frame for the high quality animation took on average 18 minutes 53 seconds
to render in Maya on an Intel Pentium 4 1.5 GHz Processor, while the frames for the low
quality animation were each rendered on average in only 3 minutes 21 seconds. All
frames were rendered to a resolution of 1280 x 1024. The HQ frames were rendered by
selecting Production quality rendering in Maya, whilst the LQ frames were rendered to
the Custom - Low setting with the lowest possible value of aliasing, Figure 5.3 d). Because it was not possible to select different rendering qualities for specific areas of the scene in Maya, image processing code had to be written in C to take the Low Quality (LQ) and High Quality (HQ) frames as input and from these create the Selective Quality (SQ) frames, such as Figure 5.3 (c), by compositing the appropriate frames together in a batch process.
To create the Selective Quality (SQ) frames, the actual area covered by the fovea
on the computer screen, when focused on the mug containing the pencils, had to be
calculated. This was solved by the equation below:
Radius = ratio * (D * tan (A / 2))
where
Radius = radius of the area covered by the fovea (in pixels), Figure 5.4.
Ratio = screen resolution / monitor size
A = Visual Angle = 2 degrees, Figure 5.4.
D = Distance of the viewer’s eye from the screen = 45cm
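As a worked example (an illustrative sketch only; the pixels-per-centimetre ratio below is an assumed figure for a 17-inch monitor at 1280 x 1024, not a value taken from the thesis), the foveal radius can be computed as:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double PI = 3.14159265358979;
        double ratio = 1280.0 / 32.0;      /* assumed pixels per cm of screen width */
        double D     = 45.0;               /* viewing distance in cm                */
        double A     = 2.0 * PI / 180.0;   /* 2 degree foveal angle, in radians     */

        double radius = ratio * (D * tan(A / 2.0));  /* Radius = ratio * D * tan(A/2) */
        printf("Foveal radius: %.1f pixels\n", radius);
        return 0;
    }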
Figure 5.3(a): High Quality (HQ) image (Frame 26 in the animation)
Figure 5.3(b): Low Quality (LQ) image (Frame 26 in the animation)
Figure 5.3(c): Selectively rendered (SQ) image with two circles of high quality over the first and second mugs (Frame 26 in the animation)
Figure 5.3(d): Close up of High Quality rendered chair and the Low Quality version of
the same chair.
[Figure 5.4 is a diagram showing the eye, the fovea angle (2°) with its fovea radius, and the blend angle (4.1°) with its blend radius, projected onto the screen.]
Figure 5.4: Calculation of the fovea and blend areas.
Image processing code was written to composite this calculated fovea circle from the HQ frame onto the LQ image. The code then blends this high quality fovea circle into the low quality rendering of the rest of the scene. From McConkie et al.’s [1997] work it is known that this blend must extend to 4.1 degrees to simulate the peripheral degradation of the HVS and thus limit the user’s ability to detect any difference in image quality. Therefore, for every pixel in this ‘blend angle’ the proportion of low quality blended in is increased with distance from the fovea angle, whilst the proportion of high quality is decreased, following a simple cosine curve, until at 4.1 degrees the pixels are entirely low quality, see Figure 5.5.
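A minimal sketch of this compositing and blending step is given below. It is not the actual C code written for the thesis; it assumes flat 8-bit RGB buffers, a single fixation point, and foveal and blend radii already converted to pixels using the formula above.

    #include <math.h>

    /* Composite the high quality (hq) foveal region onto the low quality (lq)
       frame, writing the result into out.  Buffers are 8-bit RGB of size
       width * height * 3.  (fx, fy) is the fixation point in pixels; r_fovea
       and r_blend are the 2 and 4.1 degree radii in pixels. */
    void composite_foveal(const unsigned char *hq, const unsigned char *lq,
                          unsigned char *out, int width, int height,
                          double fx, double fy, double r_fovea, double r_blend)
    {
        const double PI = 3.14159265358979;

        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double d = sqrt((x - fx) * (x - fx) + (y - fy) * (y - fy));
                double w;                              /* weight of the HQ pixel  */

                if (d <= r_fovea)
                    w = 1.0;                           /* inside the fovea: HQ    */
                else if (d >= r_blend)
                    w = 0.0;                           /* outside the blend: LQ   */
                else {
                    double t = (d - r_fovea) / (r_blend - r_fovea);
                    w = 0.5 * (1.0 + cos(PI * t));     /* cosine falloff 1 -> 0   */
                }

                long i = 3L * (y * (long)width + x);
                for (int c = 0; c < 3; c++)
                    out[i + c] = (unsigned char)
                        (w * hq[i + c] + (1.0 - w) * lq[i + c] + 0.5);
            }
        }
    }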
Figure 5.5: A Selective Quality (SQ) frame, where the visual angle covered by the
fovea for the mugs in the first two rooms, 2 degrees (green circles), is rendered at High
Quality and then is blended to Low Quality at 4.1 degrees (red circles).
5.2.2 Experimental Procedure
A pilot study involving 30 participants was carried out to finalize the
experimental procedure and then a total of 160 subjects were considered. In the final
experiments, each subject saw two animations of 35 seconds, displayed at 15 frames per
second. Figures 5.6a and 5.6b describe the conditions tested with 16 subjects per
condition. Fifty percent of the subjects were asked to count the pencils in the mug while
the remaining 50% were simply asked to watch the animations. To minimize
experimental bias the choice of condition to be run was randomised and for each, 8 were
run in the morning and 8 in the afternoon. All subjects had a variety of experience with
computer graphics and all exhibited at least average corrected vision in testing.
Acronym and Description
HQ - High Quality: Entire animation rendered at the highest quality.
LQ - Low Quality: Entire animation rendered at a low quality with no anti-aliasing.
SQ - Selective Quality: Low Quality picture with high quality rendering in the visual angle of the fovea (2 degrees) centred around the pencils, shown by the green circle in Figure 5.5. The high quality is blended to the low quality at 4.1 degrees visual angle (the red circle in Figure 5.5) [McConkie et al. 1997].
Figure 5.6a: The three different types of animations being tested. The orderings of the
two animations shown for the experiments were either:
HQ + HQ, HQ + LQ, LQ + HQ, HQ + SQ or SQ + HQ
No. of Participants | Ordering of Animations | Counting (Task) / Watching (No Task) | Time of Day
8 | HQ+HQ | Counting | Morning
8 | HQ+HQ | Watching | Morning
8 | HQ+LQ | Counting | Morning
8 | LQ+HQ | Counting | Morning
8 | HQ+LQ | Watching | Morning
8 | LQ+HQ | Watching | Morning
8 | HQ+SQ | Counting | Morning
8 | SQ+HQ | Counting | Morning
8 | HQ+SQ | Watching | Morning
8 | SQ+HQ | Watching | Morning
8 | HQ+HQ | Counting | Afternoon
8 | HQ+HQ | Watching | Afternoon
8 | HQ+LQ | Counting | Afternoon
8 | LQ+HQ | Counting | Afternoon
8 | HQ+LQ | Watching | Afternoon
8 | LQ+HQ | Watching | Afternoon
8 | HQ+SQ | Counting | Afternoon
8 | SQ+HQ | Counting | Afternoon
8 | HQ+SQ | Watching | Afternoon
8 | SQ+HQ | Watching | Afternoon
Total: 160 (16 participants per condition)
Figure 5.6b: The orderings of the conditions for randomization in the experiment.
Before beginning the experiment the subjects read a sheet of instructions on the
procedure of the particular task they were about to perform. This guaranteed that all
participants received exactly the same instructions, Appendix B.1 and B.2. After the
participant had read the instructions they were asked to clarify that they had understood
the task. They then rested their head on a chin rest that was located 45cm away from a
17” monitor. The chin rest was located so that their eye level was approximately level
with the centre of the screen, see Figure 5.7. The animations were then displayed to the
observers at a resolution of 1280 x 1024 and under specific lighting conditions.
Figure 5.7: Images to show the experimental setup.
To ensure that the viewers focused their attention immediately on the first mug, and thus did not look around the scene to find it, a countdown was shown to warn them that the animation was about to start, followed immediately by a black image with a white mug giving the location of the first mug (Figure 5.8). After the first animation had ended, participants were immediately shown the second animation.
Figure 5.8: Image to give location of the first mug – to focus the observer’s attention.
On completion of the experiment, i.e. once both animations had been viewed,
each participant was asked to fill in a detailed questionnaire, see Figure 5.9. This
questionnaire asked for some personal details, including age, occupation, sex and level
of computer graphics knowledge. The participants were then asked detailed questions
about the objects in the rooms, their colour, location and quality of rendering. These
objects were selected so that questions were asked about objects both near the foveal
visual angle (located about the mug with pencils) and in the periphery. They were
specifically asked not to guess, but rather state “don’t remember” when they had failed
to notice some details. They were also asked not to discuss any of the details of the
experiment with any friends or colleagues to prevent any participant having prior
knowledge of the experimental procedure. The initial questionnaire designed, Appendix B.3, turned out to be too complex for the 30 participants in the pilot study, as it asked the observer to recount details for each of the four rooms separately. Therefore a new questionnaire was designed which asked questions only about each animation as a whole, Appendix B.4; this returned much better results and was used for the final experiment.
Figure 5.9: Image to show participants of the experiment filling out the questionnaire
after completion of the viewing of the two animations.
5.2.3 Results
Figure 5.10 shows the overall results of the experiment. Obviously the participants did
not notice any difference in the rendering quality between the two HQ animations (they
were the same). Of interest is the fact that, apart from one case in the SQ+HQ
experiment, the viewers performing the task consistently failed to notice any difference
between the high quality (HQ) rendered animation and the selective quality (SQ)
animation, where only the area around the mug was rendered to a high quality.
Surprisingly also 25% of the viewers in the HQ+LQ condition and 18% in the LQ+HQ
case were so engaged in the task that they completely failed to notice any difference in
the quality between these very different qualities of animation. Figure 5.11 (a) and (b) show that, having performed the task of counting the pencils, the vast majority of participants were simply unable to recall the correct colour of the mug (90%), which was within the foveal angle, and even more were unable to recall the correct colour of the carpet (95%), which was outside this angle. Inattentional Blindness was therefore even greater for “less obvious” objects, especially those outside the foveal angle.
[Figure 5.10 is a bar chart; y-axis: Percentage of people who did notice the rendering quality difference; x-axis: Animation Conditions (HQ + HQ, HQ + LQ, LQ + HQ, HQ + SQ, SQ + HQ); series: Free For All, Counting Pencils Task.]
Figure 5.10: Experimental results for the two tasks: Counting the pencils and
simply watching the animations (free for all).
Overall the participants who simply watched the animations were able to recall
far more detail about the scene, although the generic nature of the task given to them
precluded a number from recalling such details as the colour of specific objects, for
example 47.5% could not recall the correct colour of the mug and 53.8% the correct colour of the carpet.
[Figure 5.11 a) is a bar chart; y-axis: Percentage of people who answered what the colour of the carpet was; x-axis: answers (Blue, Other Colour, Don't Remember) for the Watching Animation and Counting Pencils conditions; series: HQ+HQ, HQ+LQ, LQ+HQ, HQ+SQ, SQ+HQ.]
Figure 5.11 a): How observant were the participants: colour of the carpet (outside the
foveal angle).
[Figure 5.11 b) is a bar chart; y-axis: Percentage of people who answered what the colour of the mug was; x-axis: answers (Red, Other Colour, Don't Remember) for the Watching Animation and Counting Pencils conditions; series: HQ+HQ, HQ+LQ, LQ+HQ, HQ+SQ, SQ+HQ.]
Figure 5.11 b): How observant were the participants: colour of the mug (inside the
foveal angle).
5.2.4 Analysis
Since the response of the observers was binary, the appropriate method of statistical
analysis was the Chi-square test (X2) for significance. Standard linear regression models
or analysis of variance (ANOVA) are only valid on continuous data from normal
distributions, and therefore were not appropriate. By performing pair-wise comparisons
of all the other animations to the animations HQ+HQ it could be determined whether or
not the results were statistically significant (Figure 5.12).
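For illustration (again a sketch, not the analysis code used for the thesis), the X2 statistic for one of these 2 x 2 pair-wise comparisons, built from the counts of observers who did and did not notice the quality difference in each condition, can be computed as follows; the example counts are hypothetical.

    #include <stdio.h>

    /* Chi-square statistic for a 2 x 2 contingency table with df = 1.
       a,b = noticed / did not notice in condition 1; c,d = the same for
       condition 2.  Row and column totals must be non-zero. */
    double chi_square_2x2(double a, double b, double c, double d)
    {
        double n = a + b + c + d;
        double diff = a * d - b * c;
        return n * diff * diff / ((a + b) * (c + d) * (a + c) * (b + d));
    }

    int main(void)
    {
        /* Hypothetical counts: 16 observers per condition, none noticing a
           difference in one condition and all 16 noticing it in the other. */
        double x2 = chi_square_2x2(0.0, 16.0, 16.0, 0.0);
        printf("X2 = %.3f (critical value 3.84 for p < 0.05, df = 1)\n", x2);
        return 0;
    }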
When simply watching the animation, the test statistics for all the pair-wise
comparisons were statistically significant. For example, the result for the pair-wise
comparison of HQ+HQ and HQ+SQ was X2 = 32, df = 1, p < 0.005 (a p value of 0.05 or
less denotes a statistically significant result). In comparison when counting the pencils
the test statistics were significant for the pair-wise comparisons HQ+HQ with HQ+LQ
(X2 = 32, df = 1, p < 0.005). However, for the comparisons HQ+HQ with HQ+SQ the
results were statistically insignificant, and thus the null hypothesis was retained (X2 =
0.125, df = 1, p > 0.1). From this it can be concluded that when the observers were
counting the pencils, the HQ animations and the SQ animations produced the same
result, i.e. the observers thought that they were seeing the same animation twice, with
no alteration in rendering quality.
Pair-wise Comparison: X2 Value (X2 must be at least 3.84 for significance), df value, p value, Statistically significant?
HQ+HQ watching vs HQ+HQ counting: X2 = 0, df = 1, p > 0.1, Null hypothesis is retained
HQ+LQ watching vs HQ+LQ counting: X2 = 4.333, df = 1, p < 0.05, Significant (just!)
LQ+HQ watching vs LQ+HQ counting: X2 = 3.692, df = 1, p < 0.05, Significant (just!)
HQ+SQ watching vs HQ+SQ counting: X2 = 28.125, df = 1, p < 0.005, Significant
SQ+HQ watching vs SQ+HQ counting: X2 = 32, df = 1, p < 0.005, Significant
HQ+HQ watching vs HQ+LQ watching: X2 = 32, df = 1, p < 0.005, Significant
HQ+HQ watching vs LQ+HQ watching: X2 = 32, df = 1, p < 0.005, Significant
HQ+HQ watching vs HQ+SQ watching: X2 = 32, df = 1, p < 0.005, Significant
HQ+HQ watching vs SQ+HQ watching: X2 = 32, df = 1, p < 0.005, Significant
HQ+HQ counting vs HQ+SQ counting: X2 = 0.125, df = 1, p > 0.1, Null hypothesis is retained
HQ+HQ counting vs SQ+HQ counting: X2 = 0, df = 1, p > 0.1, Null hypothesis is retained
Figure 5.12: Full results of statistical analysis using Chi-square test (X2) for
significance.
5.2.5 Verification with an eye-tracker
To make certain that the attention of the observer was indeed being captured by the task,
counting pencils, the experiment was briefly repeated with the Eyelink Eyetracking
System developed by SR Research Ltd. and manufactured by SensoMotoric
Instruments. Figure 5.13 shows an example of the scan path of one of the observers
whilst performing the counting pencils task for 2 seconds. While all the observers had
slightly different scan paths none of their eye scans left the green box (Figure 5.13).
Figure 5.14 shows an example of the scan path of one of the observers who was simply
watching the animation for 2 seconds. These results demonstrate that Inattentional
Blindness may in fact be exploited to significantly reduce the rendered quality of a large
portion of the scene without having any affect on the viewer’s perception of the scene.
Figure 5.13: An eye scan for an observer counting the pencils. The green crosses are
fixation points and the red lines are the saccades.
Figure 5.14: An eye scan for an observer who was simply watching the animation.
5.2.6 Conclusions
For virtual reality applications in which the task is known a priori the computational
savings of exploiting Inattentional Blindness can be dramatic. Our approach works by
identifying the area of user fixation, determined by the task being performed, rendering
this to a high quality and exploiting the Inattentional Blindness inherent in the human
visual system to render the rest of the scene at a significantly lower quality. Our results
show that, while engaged in the task, users consistently failed to notice the quality difference, and even whole objects, within the scene.
5.3 Non-Visual Tasks
The task tested previously, in the experiment described in Section 5.2 [Cater et al. 2002], was visual; it was therefore not known what would happen if observers were given a non-visual task. To answer this question the experiment described in Section 5.2 was repeated, but with a non-visual task instead. The task chosen for each user was to count backwards from one thousand in steps of two. The results were then compared with those of the visual task in Section 5.2. The results of this work were presented as a full paper at SPIE 2003 in San Francisco [Cater et al. 2003a], and are also discussed in the chapter written for the Visualization Handbook, to appear in summer 2004 [Chalmers and Cater 2004].
5.3.1 The psychophysical experiment
The study involved the same three rendered animations, of the fly-through of the four
rooms. Again, as with the previous experiment, the only difference between the three
animations was the quality to which the individual animations had been rendered.
A total of 80 subjects were studied; each subject saw two animations of 35 seconds, displayed at 15 frames per second. All the subjects were asked to count backwards from one thousand in steps of two, out loud. As the subjects were of a mix of nationalities they were given a choice of language to count in, so that the level of the task was the same for all nationalities and language was not an issue.
experimental bias, the choice of condition to be run was randomized and for each, 8
were run in the morning and 8 in the afternoon, see Figure 5.15. Subjects had a variety
of experience with computer graphics and all exhibited at least normal or corrected to
normal vision in testing.
No. of Participants | Ordering of Animations | Condition (Non-Visual Task) | Time of Day
8 | HQ+HQ | Counting Backwards | Morning
8 | HQ+LQ | Counting Backwards | Morning
8 | LQ+HQ | Counting Backwards | Morning
8 | HQ+SQ | Counting Backwards | Morning
8 | SQ+HQ | Counting Backwards | Morning
8 | HQ+HQ | Counting Backwards | Afternoon
8 | HQ+LQ | Counting Backwards | Afternoon
8 | LQ+HQ | Counting Backwards | Afternoon
8 | HQ+SQ | Counting Backwards | Afternoon
8 | SQ+HQ | Counting Backwards | Afternoon
Total: 80 (16 participants per condition)
Figure 5.15: The orderings of the conditions for randomization in the experiment.
Before beginning the experiment the subjects read a sheet of instructions on the
procedure of the task they were to perform, Appendix B.5. After the participant had
read the instructions they were asked to clarify that they had understood the task. They
then rested their head on a chin rest that was located 45cm away from a 17-inch
monitor. The chin rest was located so that their eye level was approximately level with
the centre of the screen. The animations were displayed at a resolution of 1280 x 1024.
The subjects were told to start counting backwards in steps of two, out loud, as soon as the countdown shown at the beginning of the animation had finished. They were shown the second animation immediately afterwards; again, the subjects started counting backwards from one thousand in steps of two after the countdown.
experiment each participant was asked to fill in the same detailed questionnaire as
described in section 5.2.2, which can be seen in Appendix B.4. Again the participants
were then asked detailed questions about the objects in the rooms, their colour, location
and quality of rendering. They were specifically asked not to guess, but rather state
‘don’t remember’ when they had failed to notice some details.
5.3.2 Results
Once again the participants did not notice any difference in the rendering quality
between the two HQ animations (they were the same), Figure 5.16. Of interest is the
fact that when performing the non-visual task, counting backwards from one thousand
in steps of two, the majority of viewers noticed both the difference between the high quality rendered animation and the low quality animation (HQ+LQ: 87.5% and LQ+HQ: 75%) and the difference between the high quality rendered animation and the selective quality animation (HQ+SQ: 81.25% and SQ+HQ: 75%), where the area around the mug was rendered to a high quality.
that when performing a non-visual task the majority of observers can still detect the
difference in rendering quality.
If these results are compared with those described in Section 5.2, where the observers performed the counting pencils task, a visual task, then apart from one case in the SQ+HQ experiment the viewers performing the visual task consistently failed to notice any difference between the high quality rendered animation and the selective quality animation. The same phenomenon occurs if the results from the questions ‘what was the colour of the mug’ and ‘what was the colour of the floor’ are compared.
Figure 5.17 (a) and (b) show that, having performed the non-visual task, the majority of participants were unable to recall the correct colour of the mug (69%), and even more were unable to recall the correct colour of the carpet (79%). However, these proportions are still lower than when the participants were asked to perform the counting pencils task, with 90% unable to recall the colour of the mug and 95% unable to recall the colour of the carpet. Overall, the participants who simply watched the animations were able to recall far more detail of the scenes than those performing the non-visual task, although the generic nature of the task given to them precluded a number from recalling such details as the colour of specific objects; for example, 47.5% could not recall the correct colour of the mug and 53.8% the correct colour of the carpet. Therefore, it can be concluded that whilst performing a non-visual task participants can detect a rendering quality alteration, but they are unable to recall much detail about the scene because of having to perform the task.
Figure 5.16: Experimental results for the three tasks: simply watching the animations, the visual task (counting the pencils) and the non-visual task (counting backwards from 1000 in steps of 2). [Bar chart: percentage of people who did notice the rendering quality difference, for the animation conditions HQ+HQ, HQ+LQ, LQ+HQ, HQ+SQ and SQ+HQ.]
Figure 5.17 (a): How observant were the participants depending on the task: colour of the mug. [Bar chart: percentage of people who answered that the mug was red, another colour, or that they did not remember, grouped by task (Watching Animation, Counting Pencils, Non-Visual Task) and by animation condition (HQ+HQ, HQ+LQ, LQ+HQ, HQ+SQ, SQ+HQ).]

Figure 5.17 (b): How observant were the participants depending on the task: colour of the carpet. [Bar chart: percentage of people who answered that the carpet was blue, another colour, or that they did not remember, grouped by the same tasks and animation conditions.]
5.3.3 Analysis
Statistical analysis shows which of the results are statistically significant. The appropriate method of statistical analysis was the t-test for significance, and since each person saw a different random selection of the animations, an unrelated (independent samples) t-test had to be used. By performing pair-wise comparisons of all the other animation conditions against the HQ+HQ animations, it could be determined whether or not the results were statistically significant, Figure 5.18.
Pair-wise Comparison (t ≥ 2.042 for significance)     t Value   df   p value    Statistically significant?
HQ+HQ watching vs HQ+HQ counting                      0         30   p > 0.1    Null hypothesis is retained
HQ+LQ watching vs HQ+LQ counting                      2.236     30   p < 0.05   Significant (just!)
LQ+HQ watching vs LQ+HQ counting                      1.464     30   p > 0.1    Null hypothesis is retained
HQ+SQ watching vs HQ+SQ counting                      15        30   p < 0.05   Highly Significant
SQ+HQ watching vs SQ+HQ counting                      15        30   p < 0.05   Highly Significant
HQ+HQ watching vs HQ+HQ non-visual                    0         30   p > 0.1    Null hypothesis is retained
HQ+LQ watching vs HQ+LQ non-visual                    1.464     30   p > 0.1    Null hypothesis is retained
LQ+HQ watching vs LQ+HQ non-visual                    2.236     30   p < 0.05   Significant (just!)
HQ+SQ watching vs HQ+SQ non-visual                    1.861     30   p > 0.1    Null hypothesis is retained
SQ+HQ watching vs SQ+HQ non-visual                    1.464     30   p > 0.1    Null hypothesis is retained
HQ+HQ watching vs HQ+LQ watching                      –         30   p < 0.05   Maximum Significant
HQ+HQ watching vs LQ+HQ watching                      –         30   p < 0.05   Maximum Significant
HQ+HQ watching vs HQ+SQ watching                      –         30   p < 0.05   Maximum Significant
HQ+HQ watching vs SQ+HQ watching                      15        30   p < 0.05   Highly Significant
HQ+HQ counting vs HQ+LQ counting                      6.708     30   p < 0.05   Significant
HQ+HQ counting vs LQ+HQ counting                      8.062     30   p < 0.05   Significant
HQ+HQ counting vs HQ+SQ counting                      1         30   p > 0.1    Null hypothesis is retained
HQ+HQ counting vs SQ+HQ counting                      0         30   p > 0.1    Null hypothesis is retained
HQ+HQ non-visual vs HQ+LQ non-visual                  10.247    30   p < 0.05   Highly Significant
HQ+HQ non-visual vs LQ+HQ non-visual                  6.708     30   p < 0.05   Significant
HQ+HQ non-visual vs HQ+SQ non-visual                  8.062     30   p < 0.05   Significant
HQ+HQ non-visual vs SQ+HQ non-visual                  6.708     30   p < 0.05   Significant
Figure 5.18: Full results of statistical analysis using the t-test for significance.
When the observers were performing the non-visual task, the test statistics for all the pair-wise comparisons against HQ+HQ were statistically significant. For a two-tailed test with df = 30, t must be ≥ 2.042 for significance with p < 0.05. The result for the pair-wise comparison of HQ+HQ and HQ+SQ was t = 8.062, df = 30, p < 0.05 (a p value of 0.05 or less denotes a statistically significant result). In comparison, when the observers were performing the visual task of counting the pencils, the test statistics were significant for the pair-wise comparison of HQ+HQ with HQ+LQ (t = 6.708, df = 30, p < 0.05). However, for the comparison of HQ+HQ with HQ+SQ the results were statistically insignificant, and thus the null hypothesis was retained (t = 1, df = 30, p > 0.1). From this it can be concluded that when the observers were performing the visual task the HQ+HQ animations and the HQ+SQ animations produced the same result, i.e. the observers thought that they were seeing the same animation twice, with no alteration in rendering quality. However, when the observers were performing the non-visual task the result was significantly different, i.e. the observers could distinguish that they were shown two differently rendered animations. If statistical analysis is also run on the pair-wise comparisons of watching the animation (no task) against counting backwards (non-visual task), the results are statistically insignificant; for example, for the comparison between watching animation HQ+SQ and counting backwards HQ+SQ the null hypothesis is retained (t = 1.464, df = 30, p > 0.1), thus concluding that there is no significant difference between the results obtained when watching the animation and those obtained when counting backwards.
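To make the test procedure concrete, the sketch below shows how such an unrelated (independent samples) t-test could be computed, for example with SciPy. The 0/1 "noticed the difference" outcomes used here are purely illustrative and are not the raw data collected in this experiment.

# Illustrative sketch of the unrelated (independent samples) t-test used above.
# The 0/1 arrays are hypothetical "noticed the quality difference" outcomes for
# two groups of 16 participants; they are NOT the thesis's raw data.
from scipy import stats

watching_HQ_SQ = [1] * 15 + [0] * 1   # hypothetical: most viewers notice when just watching
counting_HQ_SQ = [0] * 15 + [1] * 1   # hypothetical: most viewers miss it while counting pencils

t, p = stats.ttest_ind(watching_HQ_SQ, counting_HQ_SQ)   # df = 16 + 16 - 2 = 30
print("t = %.3f, p = %.4f" % (t, p))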
5.3.4 Verification with an eye-tracker
Again, as done in Section 5.2, to compare the scan paths made by subjects performing
the counting backwards task the experiment was briefly repeated with the Eyelink
Eyetracking System. The results were then compared with previous scan paths recorded
when subjects were asked to count pencils or simply to watch the animation. Figure
5.19 shows an example of the scan path of one of the observers whilst performing the counting backwards task for 2 seconds. Note how fewer fixations fall on actual objects; this is because the observers had to multi-task, so less effort could be put into simply watching the animation. This image can be compared to the eye-scan path images produced whilst performing the counting pencils task for 2 seconds, Figure 5.13, and whilst simply watching the animation for 2 seconds, Figure 5.14.
Figure 5.19: An eye scan for an observer counting backwards. The green crosses are
fixation points and the red lines are the saccades.
5.3.5 Conclusions
For a non-visual task it is very difficult to predict where the observers are going to look. Although a non-visual task does affect the eye-scan path of an observer, it does not take control, i.e. it does not guide the eyes only to specific areas of the scene, as can be seen in Figure 5.19. Therefore using a top-down processing method to render the scene for a non-visual task scenario is impractical. A more appropriate method would be one based on bottom-up processing, such as the one described by Yee et al. [2001], or a method which uses an eye-tracker, such as that proposed by McConkie and Loschky [2000].
5.4 Inattentional Blindness versus Peripheral Vision
Sections 5.2 and 5.3 showed that conspicuous objects in a scene that would normally attract the viewer's attention are ignored if they are not relevant to the visual task at hand. This section investigates whether this failure to notice quality differences when performing a visual task is merely due to the loss of visual acuity in our peripheral vision, or is indeed due to Inattentional Blindness.
5.4.1 Task Maps: Experimental Validation
This section demonstrates Inattentional Blindness experimentally in the presence of a high-level task focus. The hypothesis, based on the previous findings in this thesis, was that viewers would not notice normally visible degradations in an image when instructed to perform a task, as long as the objects they were instructed to find were not affected by the degradation. The experiments performed in this section confirmed this hypothesis with a high level of certainty and were presented, along with the rendering framework (Chapter 6), at the Eurographics Symposium on Rendering in Belgium [Cater et al. 2003c].
A pre-pilot study was run to refine the experimental procedure, as well as to decide which method of processing was to be used. Initially Alias Wavefront Maya was used to design a scene in which a task, counting the number of teapots in a scene, could be carried out, Figure 5.20. However, it was decided that a scene implemented in Radiance would be more appropriate, because the long-term goal was to develop a selective rendering framework based on the global illumination package Radiance, Figure 5.21.
Figure 5.20: Images to show the initial experimental scene developed in Alias
Wavefront Maya.
Figure 5.21: Image to show the final experimental scene developed in Radiance, rendered with a sampling resolution of 3072x3072; in the experiment this is referred to as High Quality (HQ).

Figure 5.22: Results from the pilot study: determining a consistently detectable rendering resolution difference. [Bar chart: percentage of participants that noticed the difference between the 3072x3072 resolution image (HQ) and the lower quality rendered image (LQ), plotted against the resolution of the LQ image (256 to 3072); detection was 100% up to a resolution of 1024x1024, 71.9% at 1536x1536 and 43.8% at 2048x2048.]
A pre-study was run with 10 participants, who were asked to count the number
of teapots in a computer rendered image, to find out how long subjects took to count
them correctly. This was found to be 2 seconds on average. In the previous experiments the low quality had been chosen simply by personal preference, so it could not be guaranteed that all the participants could easily detect the chosen low quality resolution. To eliminate this experimental flaw, a pilot study was conducted to determine an appropriate, reliably detectable image resolution to use for the main experiment. Therefore
32 participants were shown 24 pairs of images at random, and asked if they could
distinguish a change in resolution or quality between the two images. Each image was
displayed for 2 seconds. One image was always the High Quality image rendered at a
3072x3072 sampling resolution, whilst the other image was one selected from images
rendered at sampling resolutions of 256x256, 512x512, 768x768, 1024x1024,
1536x1536 and 2048x2048. In half of the pairs of images, there was no change in
resolution; i.e., they saw two 3072x3072 resolution images. The results can be seen in
Figure 5.22.
All the participants could easily detect a quality difference with the resolutions
of 256x256 through to 1024x1024 in comparison to a resolution of 3072x3072. 72%
still detected a quality difference between an image rendered at 1536x1536 and one at 3072x3072; this is just under the 75% detection probability declared as one JND (Just Noticeable Difference) [Lubin 1997], so it could have been an acceptable resolution to use for the main experiment. However, it was decided that a resolution of 1024x1024 would be used in the main study, as 100% of participants in the pilot study detected the difference, thus guaranteeing that, when participants were not performing a task, the difference in resolution was easy to detect.
The main study involved two models of an office scene, the only difference
being the location of items in the scene, mainly teapots (Figure 5.21). Each scene was
then rendered to three different levels of resolution quality, the entire scene at High
Quality (HQ), a sampling resolution of 3072x3072 (Figure 5.23a), the entire scene at
Low Quality (LQ), a sampling resolution of 1024x1024 (Figure 5.23b), and Selective
Quality (SQ).
Figure 5.23: Sampling resolutions: a (left) 3072x3072 (HQ), b (right) 1024x1024 (LQ); c (left) 768x768, d (right) 512x512.
5.4.2 Creating the Selective Quality (SQ) images
The Selective Quality (SQ) images were created by selectively rendering the majority of
the scene in low quality (1024x1024) apart from the visual angle of the fovea (2°)
centered on each teapot, shown by the black circles in Figure 5.24, which were rendered
at the higher rate corresponding to a 3072x3072 sampling.
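As a rough check on the size these foveal circles needed to be on screen, the diameter subtended by a 2° visual angle at the 45cm viewing distance can be worked out as in the short Python sketch below. The calculation is only illustrative; the thesis does not state the on-screen size explicitly at this point.

import math

viewing_distance_cm = 45.0   # chin rest to screen distance used in the experiment
foveal_angle_deg = 2.0       # approximate foveal field of view

# diameter subtended on the screen by a 2 degree visual angle at 45 cm
diameter_cm = 2 * viewing_distance_cm * math.tan(math.radians(foveal_angle_deg / 2))
print("foveal circle diameter = %.2f cm" % diameter_cm)   # approx. 1.57 cm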
As the selective rendering framework had not yet been implemented (the exact implementation was waiting on the results of this experiment), a way of simulating the same effect in Radiance had to be created. This was done using a selection of Radiance tools. First the office scene was created in Radiance, scene.all, along with a scene which contained only five circles located in the viewing plane, spheres.rad. The location and size of these circles was very important: they had to a) be the size of the foveal angle when viewed from 45cm away, and b) match up with the location of each of the teapots in the scene from the viewing plane. These circles can be seen in Figure 5.25, which also shows the geometry of the office scene as well as the viewing plane. The spheres file then had to be rendered to high quality (3072x3072), Figure 5.26 (left), using vwrays, whilst the scene file was rendered to low quality (1024x1024) using rpict. The two files were then combined using pcomb, which blended the two files together, simulating the degradation in acuity around the foveal angle, as can be seen below.
# Build octrees for the circle (sphere) mask geometry and for the full office scene
oconv spheres.rad > spheres.oct
oconv scene.all > room.oct

# High-quality pass at 3072x3072 (filtered down by 3), producing the high-quality
# foveal circles shown in Figure 5.26 (left)
vwrays -vf default.vf -x 3072 -y 3072 -ff \
| rtrace -w -h -ff -opd spheres.oct \
| rtrace -ffc `vwrays -d -vf default.vf -x 3072 -y 3072` room.oct \
| pfilt -1 -x /3 -y /3 -e +2 > highdetailfilt.pic

# Render the circle geometry at the low sampling resolution to act as the blending mask
rpict -pj 0 -x 1024 -y 1024 -vf default.vf spheres.oct > alpha.pic

# Low-quality scene render at 2048x2048, filtered down by 2
rpict -pj 0 -x 2048 -y 2048 -vf default.vf room.oct \
| pfilt -1 -x /2 -y /2 -e +2 > lowdetailfilt.pic

# Blend the low- and high-quality images, using the green channel of the mask
# (the third input, alpha.pic) as the per-pixel mixing weight
pcomb -e 's1=gi(3);s2=1-s1;' \
      -e 'ro=s1*ri(1)+s2*ri(2);go=s1*gi(1)+s2*gi(2);bo=s1*bi(1)+s2*bi(2)' \
      lowdetailfilt.pic highdetailfilt.pic alpha.pic > combined.pic
Figure 5.24: Selective Quality (SQ) image showing the high quality rendered circles
located over the teapots (black).
The High Quality (HQ) images took 8.6 hours to render with full global
illumination in Radiance [Ward 1994] on a 1 GHz Pentium processor, whilst the images
for the Low Quality (LQ) were rendered in 4.3 hours, i.e. half the time of the HQ
images, and the Selective Quality (SQ) images were rendered in 5.4 hours.
Figure 5.25: The circles in the viewing plane from which the high quality fovea circles
are generated.
Figure 5.26: The high quality fovea circles (left) are then composited automatically
with the low quality image, adding a glow effect (blend) around each circle to reduce
any pop out effects, resulting in the Selective Quality image (SQ) (right).
5.4.3 Experimental Methodology
In the study, a total of 96 participants were considered. Each subject saw two images,
each displayed for 2 seconds. Figures 5.27a and 5.27b describe the conditions tested
with 32 subjects for the HQ+HQ condition and 16 subjects for the other conditions.
From the pilot study it was known that 100% of participants could detect the rendering quality difference if given no task, i.e. when simply looking at the images
for 2 seconds, thus there was no need to repeat the watching conditions. The task chosen
to demonstrate the effect of Inattentional Blindness had the subjects counting teapots
located all around the scene. There were 5 teapots in both images. By placing teapots
and similar looking objects all over the scene, it was possible to see whether or not
having to scan the whole image, and thus fixate on low quality as well as high quality
regions, would mean that the viewers would indeed be able to detect the rendering
quality difference.
Acronym   Description
HQ        High Quality: entire frame rendered at a sampling resolution of 3072x3072.
LQ        Low Quality: entire frame rendered at a sampling resolution of 1024x1024.
SQ        Selective Quality: a sampling resolution of 1024x1024 over the whole image apart from the visual angle of the fovea (2°) centred around each teapot, shown by the circles in Figure 5.24, which was rendered to a sampling resolution of 3072x3072.
Figure 5.27a: The three different types of images being tested. The orderings of image pairs shown in the experiment were: (1) HQ+HQ, (2) HQ+LQ, (3) LQ+HQ, (4) HQ+SQ and (5) SQ+HQ.
Ordering of Animations   Counting (Task) / Watching (No Task)   Time of Day   No. of Participants for that condition
HQ+HQ                    Counting                               Morning       16
HQ+LQ                    Counting                               Morning       8
LQ+HQ                    Counting                               Morning       8
HQ+SQ                    Counting                               Morning       8
SQ+HQ                    Counting                               Morning       8
HQ+HQ                    Counting                               Afternoon     16
HQ+LQ                    Counting                               Afternoon     8
LQ+HQ                    Counting                               Afternoon     8
HQ+SQ                    Counting                               Afternoon     8
SQ+HQ                    Counting                               Afternoon     8
Total: 96
32 participants per condition when the results are combined for HQ+LQ and LQ+HQ, as well as for HQ+SQ and SQ+HQ.
Figure 5.27b: The orderings of the conditions for randomisation in the experiment.
To minimize experimental bias, the choice of which condition to run was randomized and, for each condition, 8 runs were carried out in the morning and 8 in the afternoon. Subjects had a variety of experience with computer graphics, and all exhibited normal or corrected-to-normal vision in testing.
Before beginning the experiment, the subjects read a sheet of instructions on the
procedure of the task they were to perform, Appendix C.2. After each participant had
read the instructions, they were asked to confirm that they understood the task. They then
placed their head on a chin rest that was located 45cm away from a 17-inch monitor, see
Figure 5.28. The chin rest was located so that their eye level was approximately level
with the centre of the screen. The participants’ eyes were allowed to adjust to the
ambient lighting conditions before the experiment was begun. The first image was
displayed for 2 seconds, then the participant stated out loud how many teapots they saw.
Following this, the second image was displayed for 2 seconds, during which the task
was repeated.
Figure 5.28: Image to show the experimental setup.
Immediately following the experiment, each participant was asked to fill out a
detailed questionnaire. This questionnaire asked for some personal details including
age, sex, and level of computer graphics knowledge. The participants were then asked
detailed questions about the quality of the two images they had seen. There were a
series of questions all about whether they noticed any difference at all between the images, as can be seen from the full questionnaire in Appendix C.3. This was so that, no matter what their experience of computer graphics, a participant who had perceived a change would be prompted to answer 'yes' to at least one of the questions. If they had noticed a difference in quality it was then possible to identify what had triggered it from the selection of alternatives.
On completion of the questionnaire the subjects were finally shown a high
quality and a low quality image side-by-side and asked which one they saw for the first
and second displayed images, even if the observers hadn’t reported seeing any
difference. This was to confirm that participants had not simply failed to remember that
they had noticed a quality difference, but actually could not distinguish the correct
image when shown it from a choice of two, i.e. to guarantee that the observers had not
in fact perceived without awareness or perceived and quickly forgotten that they had
noticed a difference.
5.4.4 Results
Figure 5.29 shows the overall results of the experiment. Obviously, the participants did
not notice any difference in the rendering quality between the two HQ images (they
were the same). Of interest is the fact that, apart from two cases in the HQ/SQ condition
(HQ+SQ and SQ+HQ), the viewers performing the task consistently failed to notice any
difference between the HQ rendered image and the SQ image. Surprisingly, nearly 20%
of the viewers in the HQ/LQ (HQ+LQ and LQ+HQ) condition were so engaged in the
task that they failed to notice any difference between these very different quality
images.
Mack and Rock [1998] state that ‘It is to be assumed that attention normally is
to be paid to objects at fixation, however when a visual task requires attending to an
object placed at some distance from a fixation, attention to objects at the fixation might
have to be actively inhibited.' This could explain why Inattentional Blindness occurs to a great extent even when an object rendered to low quality falls at a fixation: the rendering quality of that object is simply inhibited from the observer's attention because of the visual task being performed, counting the teapots.
Figure 5.29: Experimental results for the two tasks: counting the teapots vs. simply looking at the images. [Bar chart: percentage of people who did notice the rendering resolution quality difference for the HQ/HQ, HQ/LQ and HQ/SQ image conditions, comparing the counting teapots task with simply watching the images (pilot study).]

Figure 5.30: Experimental results for asking the participants what objects there were in the scene, for the counting teapots criteria only. [Bar chart: percentage of people that selected each item from the list of 18 objects (the objects listed in Figure 5.31) as being in the image.]
As well as asking questions about the quality of the images, the questionnaire also asked the participants to select, from a list of 18 objects, those which had actually been in the scene, Appendix C.3. Of the 18 objects listed, 9 existed in the scene and 9 did not, i.e. were 'red herrings'.
the results of the selections made when the participants were performing the counting
teapots task. The objects that were and weren’t in the scene are listed in Figure 5.31.
Objects located in the scene   Number of times selected   Objects NOT located in the scene   Number of times selected
Vases                          28                         Pens                               7
Chairs                         13                         Crayons                            1
Books                          22                         Palette                            0
Teapots                        32                         Ashtray                            0
Pictures                       12                         Computer                           8
Phone                          7                          Toy Car                            1
Video                          1                          Clock                              4
Bottles                        11                         Mr Potato Head Toy                 0
Teacups                        16                         Silver Photo Frame                 9
Figure 5.31: List of the objects that were and were not in the scene.
From Figures 5.30 and 5.31, it can be seen that participants were more likely to list objects that are commonly found in an office type of scene (chairs, books, pictures, pens, silver photo frame, clock and a computer). However, the results also show that participants had no difficulty in correctly selecting the objects that really were in the scene, demonstrating that they had no problem remembering aspects of the image. This supports the view that the participants' failure to register that the quality of the images had been altered was not due to a lack of memory, since their memory of other aspects of the scene was fine. Therefore it can be concluded that the participants had not perceived and then forgotten the quality difference, but had indeed suffered Inattentional Blindness to the quality difference.
This was true for all of the listed objects apart from the pens, silver photo frame, clock and computer. As stated, this was probably due to participants assuming that such objects would be present in an office scene like this one. It is therefore hypothesized that these selections were guesses rather than objects the participants genuinely remembered seeing. To avoid this in future experiments, participants could be asked to rate on a scale how confident they were that each object was in the scene; from this it could then be seen whether they had truly perceived the object, i.e. were truly confident, or were guessing. The most frequently selected objects, apart from the teapots, were ones that had probably been fixated on to check whether or not they were teapots and thus should be counted (vases, bottles and teacups).
5.4.5 Statistical Analysis
Statistical analysis shows which of the results of this experiment are significant. The appropriate method of analysis is the t-test for significance; since each subject saw a different random selection of the images, an unrelated (independent samples) t-test was applied [Coolican 1999]. By comparing the other image pairings to the HQ/HQ data, it could be determined whether the results were statistically significant or not, Figure 5.32.
When the observers were counting teapots, the difference between the HQ/HQ and HQ/LQ counts was statistically very significant. For a two-tailed test with df = 62 (df is related to the number of subjects), t must be ≥ 2.0 for significance with p < 0.05 (less than 5% chance of random occurrence). The result for the pair-wise comparison of HQ/HQ and HQ/LQ was t = 11.59 with p < 0.05.
Pair-wise Comparison (t ≥ 2.0 for significance)     t Value   df   p value    Statistically significant?
HQ/HQ watching vs HQ/HQ counting                    0         62   p > 0.1    Null hypothesis is retained
HQ/LQ watching vs HQ/LQ counting                    2.67      62   p < 0.05   Significant (just!)
HQ/SQ watching vs HQ/SQ counting                    21.56     62   p < 0.05   Highly Significant
HQ/HQ watching vs HQ/LQ watching                    –         62   p < 0.05   Maximum Significant
HQ/HQ watching vs HQ/SQ watching                    –         62   p < 0.05   Maximum Significant
HQ/HQ counting vs HQ/LQ counting                    11.59     62   p < 0.05   Significant
HQ/HQ counting vs HQ/SQ counting                    1.44      62   p > 0.1    Null hypothesis is retained
Figure 5.32: Full results of statistical analysis using the t-test for significance.
However, if the statistics are analyzed on the pair-wise comparison of HQ/HQ
and HQ/SQ, the results are not statistically significant – the null hypothesis is retained,
as t = 1.44, df = 62, and p > 0.1. From this it can be concluded that when observers were
counting teapots, the HQ/HQ images and the HQ/SQ images produced the same result;
i.e., the observers thought they were seeing the same pair twice, with no alteration in
rendering quality. However, when the observers were simply looking at the images
without searching for teapots in the pilot study, the result was significantly different;
i.e., the observers could distinguish that they were shown two images rendered at
different qualities.
An additional experiment was run to see at what value the results became
significantly different from the HQ resolution of 3072x3072. At a sampling resolution
of 768x768 (Figure 5.23c) the results were only just significant, t = 2.95, df = 62, and
p < 0.05, i.e., only 7 participants out of the 32 people studied noticed the difference
between the high quality image and a selectively rendered image whilst performing the
teapot counting task. This only increased to 8 people out of 32 when the sampling
resolution was dropped again to 512x512 (Figure 5.23d)!
5.4.6 Verification with an Eye-tracker
Once again, as performed in the previous experiments the experiment was repeated
using the Eyelink Eyetracking System to confirm that the attention of an observer was
being fully captured by the task of counting teapots. Figure 5.33 shows an example of a
scan path of an observer whilst performing the counting teapots task for 2 seconds.
Whilst all the observers had slightly different scan paths across the images, they fixated
both on the teapots and on other objects as well. The vases were the most commonly fixated non-teapot objects, as they were the most similar looking items to a teapot in the scene. Participants therefore fixated on non-teapot objects in the image to check whether or not they were in fact teapots, but could not distinguish the different quality to which these objects were rendered.
The time the observers spent fixating on the teapots was slightly longer than that of the other fixations made when looking at the images. It could therefore be hypothesized that the length of fixation is important to the viewer's ability to register the quality of each aspect they perceive. However, as the primary purpose of using the eye-tracker was to confirm, from the eye saccades, whether or not the tasks were capturing the observers' attention, only a few participants were tested. To establish this statistically, another experiment would have to be run with at least 16 participants, and the eye-tracking data analysed thoroughly.
Figure 5.34 shows the perceptual difference between the Selective Quality (SQ)
and Low Quality (LQ) images computed using Daly’s Visual Difference Predictor
[Daly 1993; Myszkowski 1998]. The recorded eye-scan paths clearly cross, and indeed
fixate, on areas of high perceptual difference.
It can therefore be concluded that the failure to distinguish the difference in
rendering quality between the teapots, selectively rendered to high quality, and the other
low quality objects, is not due purely to peripheral vision effects. The observers are
fixating on low quality objects, but because they are not relevant to the given task of
counting teapots, they fail to notice the reduction in rendering quality. This is
Inattentional Blindness.
The results presented in this chapter demonstrate that Inattentional Blindness,
and not just peripheral vision, may be exploited to significantly reduce the rendered
quality of a large portion of the scene without having any significant effect on the
viewer’s perception of the scene.
Figure 5.33: An eye scan for an observer counting the teapots. The X’s are fixation
points and the lines are the saccades.
Figure 5.34: Perceptual difference between SQ and LQ images using VDP
[Daly 1993]. Red denotes areas of high perceptual difference.
5.5 Summary
In this chapter the focus of the research was changed from Change Blindness to Inattentional Blindness. Experiments were developed and run to determine whether Inattentional Blindness does in fact occur with computer graphics imagery. The experiment revolved around showing two 30s animations to an observer, who was either performing a visual task or not, and then asking them a series of questions on what they had just seen. The results showed that when the observers were performing the task they only perceived the quality of the task-related objects, i.e. they did not notice when the rest of the scene was rendered to a lower resolution. However
when observers were asked simply to watch the animation, they perceived the quality
difference easily.
The experiment was then rerun using a non-visual task, that of counting backwards from 1000 in steps of two. It was found that when performing a non-visual task at least 75% of the observers could still detect a change in rendering quality. This shows that for the phenomenon of Inattentional Blindness to be exploited, the task must be a visual one.
From the results it can be seen that, if the task is in a fixed location, such as the pencils in the mug, then the observers do not perceive an alteration in quality in the rest of the scene. However, could this be due to the human visual system's poor acuity in the periphery, rather than to Inattentional Blindness? On its own this did not matter, for the principle could still be used in a selective renderer. However, for completeness, it was deemed important to take this a step further and answer the following question: what happens if the task is spread all over a scene, so that observers actually fixate on low quality rendering that is not related to the task? This would establish whether Inattentional Blindness is the key that a selective renderer can use to save computation time, or whether peripheral vision alone is sufficient.
A final experiment was run to confirm that the failure of viewers to notice the quality difference in a selectively rendered scene was due to Inattentional Blindness and not simply peripheral vision. To perform the task of counting the teapots in the scene, the participants' eyes crossed the scene and fixated on both task and non-task objects. When performing the visual task the viewers consistently failed to notice the quality difference in the various parts of the scene, even on those non-task related objects upon which they had fixated. When, however, viewers were simply looking at the scene,
100% of them noticed the quality differences.
The next chapter shows how Inattentional Blindness can be incorporated into a
working selective rendering framework for computing animations.
Chapter 6
An Inattentional Blindness Rendering Framework
The framework discussed in this chapter was achieved in collaboration with Greg Ward,
inventor and creator of the global illumination package Radiance [Ward 1994]. This
chapter shows that Inattentional Blindness can be used as a basis for a selective
rendering framework [Cater et al. 2003c]. A view-dependent system was created that
determines the regions of interest that a human observer is most likely going to attend to
whilst performing a visual task.
The results from the experiments discussed in Chapters 4 and 5 showed that
Inattentional Blindness can indeed be used to selectively render the scene without the
observers noticing any difference in quality whilst they were performing a visual task.
However, all the experiments were run by either using image processing code post
rendering or by altering code at the command line, as described in the appropriate
chapters. Thus, the next stage was to incorporate the techniques in a selective renderer.
A ‘just in time’ rendering system was designed, which allows the user to specify how long the framework may spend on each frame, i.e. each frame is calculated within a specified time budget.
The system designed uses not only the theories of Inattentional Blindness but also, for computational efficiency, image-based rendering techniques. It was also important that the system should be general, so that it could be adapted not just to Radiance but also to other ray tracing, radiosity, and multi-pass hardware rendering techniques, allowing for the potential of real-time applications.
From the experiments performed in this thesis it is known that selective
rendering is cost effective for briefly viewed still images, and, in fact, task focus seems
to override low-level visual attention when it comes to noticing artefacts. In the more
general case of animated imagery, this can take even greater advantage of Inattentional
Blindness, because it is known that the eye preferentially tracks task objects at the
expense of other details [Cater et al. 2002]. Using Daly’s model of human contrast
sensitivity for moving images [Daly 1998; 2001], and Yee’s insight to substitute
saliency for movement-tracking efficacy [Yee 2000], the a priori knowledge of tasklevel saliency can be applied to optimise the animation process. It was decided to follow
this approach, of combining the three methods, rather than just implementing a task
based framework due to that fact that the method of using Inattentional Blindness falls
down if an observer isn’t performing the task 100%. As most tasks being performed in
computer graphics are over sustained periods the likelihood of an observer attending
avidly (100%) to the task over the whole time period is slim [Most et al. 2000]. Thus if
these methods are combined if an observer changes his/her focus of attention to a salient
non-task object briefly then the user will continue not to notice any difference due to
this object also being covered by this framework.
The approach that is described in this chapter has a number of key advantages
over previous methods using low-level visual perception. First, task-level saliency is
very quick to compute, as it is derived from a short list of important objects and their
known whereabouts. Second, this framework introduces a direct estimate of pixel error
(or uncertainty), avoiding the need for expensive image comparisons and Gabor filters
as required by other perceptually based methods [Yee et al. 2001; Myszkowski et al.
2001]. Third, the animation frames are rendered progressively, enabling the user to specify exactly how long they are willing to wait for each image, or stopping when the error has dropped below the visible threshold. Frames are still rendered in order, but the
time spent refining the images is under the control of the user. The implementation of
this framework proposed in this chapter is suitable for quick turnaround animations at
about a minute per frame, but it is also proposed that this method could be used to
achieve interactive and real-time renderings such as those of Parker et al. [1999] and
Wald et al. [2002].
A frame may be refined by any desired means, including improvements to resolution, anti-aliasing, level of detail, global illumination, and so forth. In the demonstration of this system, the primary focus is on resolution refinement (i.e. samples/pixel), but greater gains are possible by manipulating other rendering variables as well.
6.1 The Framework
The diagram shown in Figure 6.1 shows an overview of the system. The boxes represent
data, and the ovals represent processes. The inputs to the system, shown in the upper
left, are the viewer’s known task, the scene geometry, lighting, and view, all of which
are a function of time. The processes shown outside the “Iterate” box are carried out just
once for each frame. The processes shown inside the box may be applied multiple times
until the frame is considered “ready,” by whatever criterion has been set. In most cases, a frame is called ready when the time allocated has been exhausted, but the iteration can also be broken out of as soon as the error conspicuity (EC) drops below a certain threshold over the entire image. The framework is designed to be general, and the implementation presented here is just one realization.
Figure 6.1: A framework for progressive refinement of animation frames using task-level information. [Diagram: the inputs (task, geometry, lighting and view) feed a high-level vision model, which produces a geometric entity ranking. An object map and motion lookup, combined with this ranking, give the task map. A first order render, together with a contrast sensitivity model, yields the current frame and its error estimate. Inside the iteration loop, an error conspicuity map drives the refine-frame step (which may reuse the last frame) until the frame is judged ready, at which point the frame is output.]
The high-level vision model, Figure 6.1, takes the task and geometry as input,
and produces a table quantifying relative object importance for this frame. This is called
geometric entity ranking. Specifically, a table of positive real numbers is derived, where
zero represents an object that will never be looked at, and 1 is the importance of scene
objects unrelated to the task at hand. Normally, only task-relevant objects will be listed
in this table, and their importance values will typically be between 1.5 and 3, where 3 is
an object that must be followed very closely in order to complete the task.
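As an illustration only, a geometric entity ranking table of this kind might be represented as a simple lookup from object name to importance, as in the Python sketch below. The importance values shown are those quoted for the corridor scene in Section 6.2; the data structure itself is an assumption, not the one used in the Radiance implementation.

# Sketch of a geometric entity ranking table (names and structure are illustrative).
task_ranking = {
    "narrator_helicopter": 1.5,   # example values from the fire-safety task in Section 6.2
    "fire_extinguisher": 2.0,
    "emergency_lantern": 2.5,
}

def object_importance(object_name):
    # Objects not listed are task-unrelated and default to an importance of 1.0
    return task_ranking.get(object_name, 1.0)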
For the first order rendering, any method may be used that guarantees to finish
before the time is up. From this initial rendering, an object map and depth value for
each pixel is needed. If subsampling is applied, and some pixels are skipped, it is
important to separately project the scene objects onto a full resolution frame buffer to
obtain this map. The pixel motion map, or image flow, is computed from the object map
and the knowledge of object and camera movement relative to the previous frame. The
object map is also logically combined with the geometric entity ranking to obtain the
task map. This is usually accessed via a lookup into the ranking table, and does not
require actual storage in a separate buffer.
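The task map lookup described above can be sketched as a per-pixel table lookup. The NumPy sketch below is illustrative only: the array sizes and object IDs are hypothetical, and in the actual implementation the lookup can be performed on access rather than stored in a separate buffer.

import numpy as np

# object_map: one integer object ID per pixel, from the first order rendering
object_map = np.zeros((480, 640), dtype=np.int32)      # hypothetical frame of object IDs
importance_by_id = np.ones(256, dtype=np.float32)      # default importance of 1.0
importance_by_id[17] = 2.0                             # e.g. ID 17 might be a task object

# task map: per-pixel task-level saliency obtained by indexing the ranking table
task_map = importance_by_id[object_map]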
Once a first order rendering of our frame is achieved and a map with the object
ID, depth, motion, and task-level saliency at each pixel is established, the system can
proceed with image refinement. First, the relative uncertainty in each pixel estimate is
computed. This may be derived from the knowledge of the underlying rendering
algorithm, or from statistical measures of variance in the case of stochastic methods.
It was first thought that this might pose a serious challenge, but it turned out to be a modest requirement, for the following reason. Since there is no point in quantifying
errors that the system cannot correct for in subsequent passes, it only needs to estimate
the difference between what the system has and what might be achieved after further
refinement of a pixel. For such improvements, the system can usually obtain a
reasonable bound on the error. For example, going from a calculation with a constant
ambient term to one with global illumination, the change is generally less than the
ambient value used in the first pass, times the diffuse material colour. Taking half this
product is a good estimate of the change that might be seen in either direction by
moving to a global illumination result. Where the rendering method is stochastic, it is
possible to collect neighbour samples to obtain a reasonable estimate of the variance in
each pixel neighbourhood and use this as the error estimate [Lee et al. 1985]. In either
case, error estimation is inexpensive as it only requires local information, plus the
knowledge of the scene and the rendering algorithm being applied.
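The two error estimates described above could be sketched as follows. This is only an illustration of the heuristics, with assumed function names, not the code used in the modified Radiance system.

import numpy as np

def ambient_error_bound(ambient_value, diffuse_colour):
    # Bound on the change expected when moving from a constant ambient term to a
    # global illumination result: half the ambient value times the diffuse colour.
    return 0.5 * ambient_value * np.asarray(diffuse_colour)

def stochastic_error_estimate(neighbour_samples):
    # Variance-based estimate for a stochastic renderer, from samples gathered
    # in the pixel's neighbourhood.
    return float(np.std(neighbour_samples, ddof=1))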
With the current frame and error estimate in hand, the system can make a
decision whether to further refine this frame, or finish it and start the next one. Figure
6.2 shows an example of a frame with no refinement iterations. This “frame ready”
decision may be based as stated earlier on time limits or on some overall test of frame
quality. In most cases, the system will make at least one refinement pass before it moves
on, applying Image-Based Rendering (IBR) to gather useful samples from the previous
frame and adding them to this one. See Figure 6.3 for an example of the difference that
IBR makes.
Figure 6.2: A frame from our renderer with no refinement iterations at all.
Figure 6.3: The same frame as Figure 6.2 after the IBR pass, but with no further
refinement.
In an IBR refinement pass, the object motion map is used to correlate pixels
from the previous frame with pixels from this frame. This improves the ability to decide
when and where IBR is likely to be beneficial. This framework doesn’t however rely on
IBR to fully cover the image; it is only used to fill in areas where the framework might
otherwise have had to generate new samples. Holes therefore are filled in by the normal
sampling process. Thus the selection of replacement pixels is based on the following
heuristics:
1. The pixel pair in the two frames corresponds to the same point on the
same object, and does not lie on an object boundary.
2. The error estimate for the previous frame’s pixel must be less than the
error estimate for the current frame’s pixel by some set amount. (15%
is used.)
3. The previous frame’s pixel must agree with surrounding pixels in the
new frame within some tolerance. (A 32% relative difference is used.)
The first criterion prevents the system from using pixels from the wrong object or the wrong part of the same object. Position correspondence is tested by comparing the transformed depth values, and object boundaries are detected by looking at the neighbouring pixels above, below, right, and left in the object map. The second criterion prevents the system from degrading the current frame estimate with unworthy prior pixels; 15% was deemed reasonable after experimenting with different values. The third criterion reduces pollution in shadows and highlights that have moved between frames, though it also limits the number of IBR pixels taken in highly textured regions. If a pixel from the previous frame passes these three tests, the current pixel estimate is overwritten with the previous one, and the error is reset to the previous value degraded by the amount used for the second criterion. In this way, IBR pixels are automatically retired as the system moves from one frame to the next. A relative difference tolerance of 32% was deemed reasonable for the third criterion after experimenting with different values until a balance had been achieved between the number of pixels being recomputed and the artefacts in the animation. Reprojection is handled by combining the object motion transform with the camera transform between the two frames, and reprojecting the last frame's intersection point to the corresponding position on the object in the new frame. From there, z-buffering is performed to make sure that it is the front pixel.
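The three acceptance tests can be summarised in a small predicate such as the Python sketch below. The function name and the exact form of the 15% and 32% comparisons are assumptions made for illustration, since the text does not spell them out.

def accept_ibr_pixel(same_object, on_boundary,
                     prev_error, curr_error,
                     prev_value, neighbour_mean,
                     error_margin=0.15, tolerance=0.32):
    # 1. Same point on the same object, and not on an object boundary.
    if not same_object or on_boundary:
        return False
    # 2. Previous frame's error must be lower than the current error by a set amount (15%).
    if prev_error >= curr_error * (1.0 - error_margin):
        return False
    # 3. Previous value must agree with the surrounding new-frame pixels (32% tolerance).
    if abs(prev_value - neighbour_mean) > tolerance * max(abs(neighbour_mean), 1e-6):
        return False
    return True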
Assume now that there is time for further refinement. Once the system has transferred what samples it can using IBR, it determines which pixels have noticeable, or conspicuous, errors so that it may select these for improvement. Here the spatiotemporal contrast sensitivity function (CSF) defined by Daly [Daly 1998; 2001] is combined with the task-level saliency map. Daly's CSF model is a function of two variables, spatial frequency in cy/deg, ρ, and retinal velocity in deg/sec, vR:
CSF(ρ, vR) = k · c0 · c2 · vR · (c1 · 2πρ)² · exp(−c1 · 4πρ / ρmax)

where:

k = 6.1 + 7.3 · |log10(c2 · vR / 3)|³

ρmax = 45.9 / (c2 · vR + 2)

c0 = 1.14, c1 = 0.67, c2 = 1.7 for a CRT at 100 cd/m²
Constants c0, c1 and c2 are modifications that Daly made to the original model proposed by Kelly [1979], in which they were all set equal to 1.0. These constants allow for fine tuning: they can be adjusted to give a peak sensitivity near 250 and a maximum spatial frequency cut-off near 30 cy/deg, so that the maximum spatial performance is closer to the results obtained at light levels greater than 100 cd/m2. Thus, by setting c0 = 1.14, c1 = 0.67 and c2 = 1.7, the equation is made more applicable to current CRT displays [Daly 1998]. The term k is primarily responsible for the vertical shift of the sensitivity as a function of velocity, while the term ρmax controls the horizontal shift of the function's peak frequency. The results of this spatiovelocity model for a range of different travelling wave velocities are shown in Figure 6.4. The model's overall results are consistent with both the vast drop in sensitivity for saccadic eye movements, which have velocities greater than 160 deg/sec, and the near-zero sensitivity for traditionally stabilized imagery (i.e., zero velocity and zero temporal frequency). Finally, the CSF for a retinal velocity of 0.15 deg/sec is close to that for the conventional static CSF under natural viewing conditions.
Figure 6.4: CSFs for different retinal velocities [Daly 1998].
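Transcribed directly into code, the CSF above might look like the sketch below. This is an illustration only, and it assumes that vR has already been clamped to a small positive drift velocity (as described later in this section) so that the logarithm is defined.

import math

def spatiovelocity_csf(rho, v_r):
    # rho: spatial frequency in cycles/degree; v_r: retinal velocity in deg/sec (> 0)
    c0, c1, c2 = 1.14, 0.67, 1.7                          # CRT at around 100 cd/m^2
    k = 6.1 + 7.3 * abs(math.log10(c2 * v_r / 3.0)) ** 3
    rho_max = 45.9 / (c2 * v_r + 2.0)
    return (k * c0 * c2 * v_r * (c1 * 2.0 * math.pi * rho) ** 2
            * math.exp(-c1 * 4.0 * math.pi * rho / rho_max))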
Following the work proposed by Yee [2000], the system substitutes saliency for
movement-tracking efficiency, based on the assumption that the viewer pays
proportionally more attention to task-relevant objects in their view. The equation for
retinal image velocity, vR, (in °/second) thus becomes:
vR = vI − min(vI · S / Smax + vmin, vmax)

where:

vI = local pixel velocity (from the motion map)
S = task-level saliency for this region
Smax = maximum saliency in this frame, but not less than 1/0.82
vmin = 0.15°/sec (natural drift velocity of the eye [Kelly 1979])
vmax = 80°/sec (maximum velocity that the eye can track efficiently [Daly 1998])
The eye’s movement tracking efficiency is computed as S/Smax, which assumes
the viewer tracks the most salient object in view perfectly. Daly [2001] recommends an
overall value of 82% for the average efficiency when tracking all objects at once within
the visual field. The solid red line in Figure 6.5 was constructed using this fit. Therefore
the system does not allow Smax to drop below 1/0.82. This prevents it from predicting
perfect tracking over the whole image when no task-related objects are in view.
Figure 6.5: Smooth pursuit behaviour of the Eye. The eye can track targets reliably up
to a speed of 80.0 deg/sec beyond which tracking is erratic [Daly 1998].
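The retinal velocity calculation, with the 1/0.82 floor on Smax described above, could be sketched as follows. The function and parameter names are illustrative, not taken from the thesis code.

def retinal_velocity(v_image, saliency, saliency_max,
                     v_min=0.15, v_max=80.0):
    # v_image: local pixel velocity in deg/sec; saliency: task-level saliency at this pixel
    s_max = max(saliency_max, 1.0 / 0.82)   # never assume better than 82% tracking efficiency
    compensated = v_image * saliency / s_max + v_min
    return v_image - min(compensated, v_max)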
Since peak contrast sensitivity shifts towards lower frequencies as retinal velocity increases, objects that the viewer is not tracking, because they are not important, will be visible at lower resolution than our task-relevant objects. However, if the entire image is still, or moving at the same rate, the computed CSF will be unaffected by our task information. Because of this, the task map is reintroduced as an additional multiplier in the final error conspicuity map, which is defined as:
EC=S ⋅ max(E ⋅ CSF / ND−1,0)
where:
E = relative error estimate for this pixel
ND = noticeable difference threshold
Dissecting this equation, the error E times the contrast sensitivity function CSF gives a ratio of the error relative to the visibility threshold. Because the relative error multiplied by the CSF yields the normalized contrast, where 1.0 is just noticeable, this ratio is then divided by a noticeable difference threshold, ND, below which errors are deemed to be insignificant. A value of 2 JNDs is the threshold at which 94% of viewers are predicted to notice a difference, and this is the value commonly chosen for ND. One is then subtracted to give a value that is below zero when the difference is allowable and above zero when it is not. Further, this inner expression (E*CSF/ND − 1) will be 1.0 when the error visibility is exactly twice the allowed threshold (i.e., 1.0 for E*CSF = 4 JNDs and ND = 2). However, this latter normalization is not important, since the test is only for where the EC map is non-zero, and the overall scaling is irrelevant to the computation. Finally, the max() comparison with 0 merely prevents negative values, which are ignored since this framework only cares about above-threshold errors.
The final multiplier by S is included for the following reason: even when the camera is momentarily stopped, it is still important to take the task map into account, since the animation as a whole is not static. At this point, information about the higher-order derivatives of motion is simply missing (velocity being the first derivative, acceleration the second, and so on; only velocity is considered in this framework). Since the framework is most interested in where EC is non-zero, the final multiplication by the task map S only serves to concentrate additional samples on the task objects, and does not bend the calculation to focus on these areas exclusively.
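Putting the pieces together, the per-pixel (or per-subcell) error conspicuity could be sketched as below, reusing the CSF and retinal velocity sketches given earlier. Again, this is only an illustration of the formula, not the Radiance implementation.

def error_conspicuity(error, csf_value, saliency, nd_threshold=2.0):
    # EC = S * max(E * CSF / ND - 1, 0); ND = 2 JNDs, the level at which roughly
    # 94% of viewers are predicted to notice a difference.
    return saliency * max(error * csf_value / nd_threshold - 1.0, 0.0)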
To compute the CSF, an estimate of the stimulus spatial frequency, ρ, is needed.
This is obtained by evaluating an image pyramid. Unlike previous applications of the
CSF to rendering, two images are not being compared, so there is no need to determine
the relative spatial frequencies in a difference image. The system only needs to know
the uncertainty in each frequency band to bound the visible difference between the
current estimate and the correct image. This turns out to be a great time-saver, as it is
the evaluation of Gabor filters that usually takes longest in other approaches. Due to the
CSF falling off rapidly below spatial frequencies corresponding to the foveal diameter
of 2°, and statistical accuracy improving at lower frequencies as well, the system need
only compute the image pyramid up to a ρ of 0.5 cycles/degree.
The procedure is as follows. First the EC map is cleared, and the image is subdivided into 2° square cells. Within each cell, a recursive function is called that descends a local image pyramid to the pixel level, computing EC values and summing them into the map on the return trip. At each pyramid level, the EC function is
evaluated from the stimulus frequency (1/subcell radius in°), the task-level saliency, the
combined error estimate, and the average motion for pixels within that subcell. The
task-level saliency for a subcell is determined as the maximum of all saliency values
within a 2° neighbourhood. This may be computed very quickly using a 4-neighbour
check at the pixel level, where each pixel finds the maximum saliency of itself and its
neighbours 1° up, down, left, and right. The saliency maximum and statistical error
sums are then passed back up the call tree for the return evaluation. The entire EC map
computation, including a statistical estimation of relative error, takes less than a second
for a 640x480 image on a 1 GHz Pentium processor.
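The 4-neighbour saliency check described above could be sketched with NumPy as follows. The parameter offset_px, the number of pixels corresponding to 1° of visual angle, is an assumed input; the function itself is illustrative rather than the thesis code.

import numpy as np

def neighbourhood_saliency_max(saliency, offset_px):
    # Each pixel takes the maximum of itself and its neighbours one degree
    # (offset_px pixels) up, down, left and right, approximating the maximum
    # saliency within a 2 degree neighbourhood.
    padded = np.pad(saliency, offset_px, mode="edge")
    h, w = saliency.shape
    centre = padded[offset_px:offset_px + h, offset_px:offset_px + w]
    up     = padded[0:h, offset_px:offset_px + w]
    down   = padded[2 * offset_px:2 * offset_px + h, offset_px:offset_px + w]
    left   = padded[offset_px:offset_px + h, 0:w]
    right  = padded[offset_px:offset_px + h, 2 * offset_px:2 * offset_px + w]
    return np.maximum.reduce([centre, up, down, left, right])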
6.2 Implementation
In the implementation of the above framework, Radiance was modified to perform
progressive animation. Figure 6.6 shows a frame from a 4-minute long animation that
was computed at a 640x480 resolution using this software. Figure 6.7 shows the
estimate of relative error at each pixel in the first order rendering, and Figure 6.8 shows
the corresponding error conspicuity map. The viewer was assigned the task of counting
certain objects in the scene related to fire safety, emergency lanterns and fire
extinguishers. There are two task objects visible in Figure 6.6, the fire extinguisher and
the narrator’s helicopter (the checkered ball), so the regions around these objects show
strongly in the conspicuity map. The geometric entity ranking, i.e. object importance, for this model was as follows: narrator's helicopter – 1.5, fire extinguishers – 2, emergency lanterns – 2.5, rest of the scene – 1. The emergency lanterns were given a
higher ranking due to the fact that the viewer was also asked to detect which lanterns
had cracked glass fronts, therefore the lanterns would require more of the viewer’s
attention than the fire extinguishers.
Figure 6.9 shows the final number of samples taken at each pixel in the refined
frame, which took two minutes to compute on a single 400 MHz G3 processor. This
time was found to be sufficient to render details on the task-related objects, but too short
to render the entire frame accurately. It was deemed important for there to be artifacts in
each frame in order to demonstrate the effect of task focus on viewer perception. About
50% of the pixels received IBR samples from the previous frame, and 20% received one
or more high quality refinement samples.
For comparison, Figure 6.10 shows the scene rendered as a still image in the
same amount of time. Both images contain artifacts, but the animation frame contains
fewer sampling errors on the task-related objects. In particular, the fire extinguisher in
the corner, which is one of the search objects, has better anti-aliasing than the
traditionally rendered image. This is at the expense of some detail on other parts of the
scene, such as the hatch door. Since the view is moving down the corridor, all objects
will be in motion, and it is assumed that the viewer will be tracking the task-related
objects more than the others. Rendering the entire frame to the same detail as the task
objects in Figure 6.6, takes 7 times longer than this optimized method. Figure 6.11
shows the entire frame rendered to the same quality as the task-related objects in Figure
6.6, i.e. rendered in 14 minutes.
Figure 6.6: A frame from our task-based animation.
Figure 6.7: Initial frame error.
Figure 6.8: Initial error conspicuity.
Figure 6.9: Final frame samples.
Figure 6.10: Standard rendering taking same time as Figure 6.6, i.e. two minutes.
Figure 6.11: Standard rendering taking 7 times that of Figures 6.6 and 6.10, i.e. 14
minutes.
Figure 6.12: Perceptual differences using VDP [Daly 1993]. Red denotes areas of high
perceptual difference. a) Visible differences between a frame with no iterations (Figure
6.2) and a frame after the IBR pass with no further refinement (Figure 6.3), b) Visible
differences between a frame after the IBR pass with no further refinement (Figure 6.3)
and a final frame created with our method in 2 mins (Figure 6.6), c) Visible differences
between a final frame created with our method in 2 mins (Figure 6.6) and a standard
rendering in 2 mins (Figure 6.10), and d) Visible differences between a final frame
created with our method in 2 mins (Figure 6.6) and a standard rendering in 14 mins
(Figure 6.11).
Figure 6.12 shows the perceptual differences, computed using VDP [Daly 1993], between the different stages of our optimized method, as well as between different methods of rendering. From Figures 6.12a) and b) the benefits of using IBR and the further conspicuity-driven refinement can easily be seen; the more red in the image, the greater the visible differences between the images. Figure 6.12c) shows the visible differences between a rendering produced in 2 minutes using our optimized method and a standard rendering computed in the same time. Note that our method concentrates its effort on the task-based objects and the high conspicuity areas, so these areas appear red, denoting a high level of difference between the images. Figure 6.12d) shows the visible differences between a rendering produced in 2 minutes using our optimized method and a standard rendering computed in 14 minutes, i.e. 7 times longer than our optimized rendering. Note how this time the 14 minute image is of higher quality over most of the image, apart from those areas where our method focused its sampling, i.e. the task-based objects and the high conspicuity areas; in Figure 6.12d) these areas therefore show less difference, i.e. are less red, or even green or grey. This shows that our optimized method produces a similar output to that of a 14 minute standard rendering for the high conspicuity and task-related aspects in 1/7th of the time.
Although direct comparisons to other methods are difficult due to differences in the rendering aims, Yee et al. demonstrated a 4-10 times speedup in [Yee et al. 2001] and Myszkowski et al. showed a speedup of roughly 3.5 times in [Myszkowski et al. 1999]. This shows that the proposed system is able to achieve similar speedups while controlling only the rendered sampling resolution. If the system were also to refine the global illumination calculation, similar to Yee, it is hypothesized that even greater gains could be achieved.
There are only a few aspects of this framework that must be tailored to a ray-tracing approach. Initially, a low quality, first order rendering is computed from a quincunx sampling of the image plane, where one out of every 16 pixels is sampled, see Figure 6.13a. This sampling pattern is visible in unrefined regions of Figure 6.9. Quincunx sampling has been shown to be well suited to the human visual system [Bouville et al. 1991], for the keenness of sight is lower for diagonal directions than for horizontal or vertical ones. In fact, the spatial sensitivity of the visual system in the frequency domain is roughly diamond-shaped, Figure 6.13b. Thus, by using a non-orthogonal sampling pattern with a reduced sampling density, such as quincunx sampling, this anisotropy of the human eye's response can be exploited. This is why this sampling technique was chosen for this implementation in preference to other sampling schemes.
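As an illustration, a sampling mask with this one-in-sixteen quincunx arrangement can be generated as in the sketch below. The particular spacing (sampled rows every four pixels, with alternate sampled rows offset by two pixels) is an assumption chosen to reproduce the stated density and diamond layout; it is not taken directly from the Radiance implementation.

import numpy as np

def quincunx_mask(width, height):
    """Boolean mask of the pixels that receive an initial sample (about 1 in 16)."""
    y, x = np.mgrid[0:height, 0:width]
    row_offset = 2 * ((y // 4) % 2)        # shift every other sampled row by 2 pixels
    return (y % 4 == 0) & ((x + row_offset) % 4 == 0)

mask = quincunx_mask(640, 480)
print(mask.mean())                          # ~0.0625, i.e. one sample per 16 pixels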
To obtain the object and depth maps at unsampled locations, the system casts
rays to determine the first intersected object at these pixels. An estimate of the rendering
error is then calculated by finding the 5 nearest samples to each pixel position, and
computing their standard deviation. This is a very crude approximation, but it suited this
system’s purposes well for it gives a good estimate of the error 75% of the time, whilst
being convenient and quick to compute. In cases where the high-quality samples in the
refinement pass have an interreflection calculation that the initial samples do not, the
method described earlier for estimating the error due to a constant ambient term is used.
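A minimal sketch of this error estimate is given below. It assumes the initial samples are stored as arrays of pixel coordinates and luminances, and it uses a k-d tree from SciPy for the nearest-neighbour search; the actual implementation need not use this data structure.

import numpy as np
from scipy.spatial import cKDTree

def error_estimate(sample_xy, sample_lum, width, height):
    """Per-pixel error estimate: standard deviation of the 5 nearest samples."""
    tree = cKDTree(sample_xy)                         # sample_xy: (N, 2) pixel coordinates
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.column_stack([xs.ravel(), ys.ravel()])
    _, idx = tree.query(pixels, k=5)                  # indices of the 5 nearest samples
    err = sample_lum[idx].std(axis=1)                 # spread of their luminances
    return err.reshape(height, width)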
Figure 6.13 a (left): Quincunx sampling, used to find the initial sample locations; the other pixels are sampled when the algorithm determines it necessary. b (right): Visual sensitivity threshold of spatial frequencies [Bouville et al. 1991].
Following the IBR refinement described in the previous section, and provided the system has not exhausted its allocated time, the error conspicuity map is then computed and the pixels are sorted from most to least conspicuous. For pixels whose EC values are equal (usually 0), the system orders from highest to lowest error, then from fewest to most samples. Going down this list, the system adds one high-quality ray sample to each pixel, until it has sampled them all or run out of time. If it manages to get through the whole list, the system re-computes the error conspicuity map and re-sorts; this time, samples are only added to the top 1/8th of the list before sorting again. Smoother animations can be achieved by sampling each pixel at least once before homing in on the regions that are deemed to be conspicuous. The system could insist on sampling every pixel in the first order rendering, but this is sometimes impossible due to time constraints; it is therefore incorporated in the refinement phase instead.
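The ordering rules just described can be summarised in the following sketch. The helper functions, the per-pixel bookkeeping and the use of wall-clock time for the budget are assumptions made for illustration; only the sort order and the restriction of later passes to the top 1/8th of the list follow the description above.

import time

def refine(pixels, deadline, compute_ec, compute_error, render_sample):
    """pixels: list of dicts with 'xy' and 'n_samples'; refined in place."""
    first_pass = True
    while time.time() < deadline:
        ec, err = compute_ec(), compute_error()
        # Most conspicuous first; ties broken by error (highest first),
        # then by number of samples already taken (fewest first).
        order = sorted(range(len(pixels)),
                       key=lambda i: (-ec[i], -err[i], pixels[i]['n_samples']))
        # The first pass walks the whole list; later passes only touch the
        # top 1/8th before the conspicuity map is recomputed and re-sorted.
        limit = len(order) if first_pass else max(1, len(order) // 8)
        for i in order[:limit]:
            if time.time() >= deadline:
                return
            render_sample(pixels[i]['xy'])        # one high-quality ray sample
            pixels[i]['n_samples'] += 1
        first_pass = False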
Prior to frame output, a final filtering stage is performed to interpolate
unsampled pixels and add motion blur. Pixels that did not receive samples in the first
order rendering or subsequent refinements must be given a value prior to output. A
Gaussian filter kernel is applied whose support corresponds to the initial sample density
to arrive at a weighted average of the 4 closest neighbours. Once a value at each pixel is
achieved, the object motion map is multiplied by a user-specified blur parameter,
corresponding to the fraction of a frame time the virtual camera’s shutter is open. The
blur vector at each pixel is then applied using an energy-preserving smear filter to arrive
at the final output image. This technique is crude in the sense that it linearizes motion,
and does not discover obstructed geometry, but it has not been found to be objectionable in any of the tests performed. However, the lack of motion blur on shadows does show up as one of the few distracting artifacts in this implementation. Also at this stage, any exposure problems are handled by a simple tone-mapping process before the image is displayed. These filtering operations take a small fraction of a CPU second per video-resolution frame and are inconsequential to the overall rendering time.
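As an illustration, the gap-filling step for a single-channel frame could be written as in the sketch below. The kernel width parameter and the use of a SciPy k-d tree are assumptions, and the subsequent motion-blur smear along each pixel's blur vector is only indicated in the comments rather than implemented.

import numpy as np
from scipy.spatial import cKDTree

def fill_gaps(image, sampled_mask, sample_spacing=4.0):
    """Interpolate unsampled pixels from their 4 nearest sampled neighbours."""
    ys, xs = np.nonzero(sampled_mask)                 # positions of sampled pixels
    tree = cKDTree(np.column_stack([xs, ys]))
    gy, gx = np.nonzero(~sampled_mask)                # positions needing a value
    dist, idx = tree.query(np.column_stack([gx, gy]), k=4)
    w = np.exp(-(dist ** 2) / (2.0 * sample_spacing ** 2))   # Gaussian weights
    w /= w.sum(axis=1, keepdims=True)
    vals = image[ys[idx], xs[idx]]                    # values of the 4 neighbours
    out = image.copy()
    out[gy, gx] = (w * vals).sum(axis=1)
    # A motion-blur pass would then smear out[y, x] along the per-pixel blur
    # vector (object motion times the shutter fraction) with an
    # energy-preserving filter before the frame is tone mapped and output.
    return out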
Of the two minute rendering time for the frame shown in Figure 6.6, 1 second is
spent updating the scene structures, 25 seconds is spent computing the 19,200 initial
samples and the object map, 0.25 seconds is spent on IBR extrapolation, 0.9 seconds to
compute the error map (times three evaluations – total 2.7 seconds), 1.25 seconds for
the EC map, 0.4 seconds for filtering, and the remaining 89.4 seconds to compute about
110,000 high quality refinement samples, see Figure 6.14.
In this test, the Radiance rendering parameters were set so there was little
computational difference between an initial sample and a high-quality refinement
sample; diffuse inter-reflections were not evaluated for either. This method’s combined
overhead for a 640x480 frame is thus in the order of 14 seconds, 10 of which are spent
computing the object map by ray casting. Intuitively, and as confirmed by measurement, this overhead scales linearly with the number of pixels in a frame.
Figure 6.14: Pie chart to show where the two minutes are spent in rendering the frame.
It is worth noting that IBR works particularly well in this progressive rendering
framework, allowing the system to achieve constant frame generation times over a wide
range of motions. When motion is small, IBR extrapolation from the previous frame
provides the system with many low-error samples for the first refinement pass. When
motion is great, and thus fewer extrapolated samples are available, the eye’s inability to
track objects and the associated blur means the system does not need as many. This
holds promise for realistic, real-time rendering using this approach with hardware
support.
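For completeness, a generic forward reprojection of previous-frame samples is sketched below. This is not the IBR code used in this thesis: the camera matrices, the nearest-pixel splatting and the lack of occlusion handling and per-sample error tagging are all simplifying assumptions.

import numpy as np

def reproject(prev_rgb, prev_depth, K, prev_cam_to_world, world_to_curr_cam):
    """Forward-warp previous-frame pixels into the current view (nearest-pixel splat)."""
    h, w = prev_depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ones = np.ones(h * w)
    # Back-project every previous-frame pixel to camera space using its depth.
    rays = np.linalg.inv(K) @ np.stack([xs.ravel(), ys.ravel(), ones])
    pts_cam = rays * prev_depth.ravel()
    # Previous camera space -> world space -> current camera space.
    pts_world = prev_cam_to_world @ np.vstack([pts_cam, ones])
    pts_curr = world_to_curr_cam @ pts_world
    uvw = K @ pts_curr[:3]
    z = uvw[2]
    valid = z > 1e-6
    u = np.full(z.shape, -1, dtype=int)
    v = np.full(z.shape, -1, dtype=int)
    u[valid] = np.round(uvw[0, valid] / z[valid]).astype(int)
    v[valid] = np.round(uvw[1, valid] / z[valid]).astype(int)
    ok = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out = np.zeros_like(prev_rgb)
    out[v[ok], u[ok]] = prev_rgb[ys.ravel()[ok], xs.ravel()[ok]]
    return out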
6.3 Summary
Over the last decade significant advances have been made in the development of perceptual rendering metrics, which, as discussed in Chapter 2, use computational models of visual thresholds to efficiently produce approximated images that are indistinguishable to a human being from an image produced at the highest possible quality [Bolin and Meyer 1998; Myszkowski 1998]. However, as Dumont et al. [2003] state, although these perceptually-based approaches are promising, such metrics are limited in their effectiveness for interactive rendering by two factors: 1) calculating the metrics is a computationally intensive process in itself; and 2) the metrics are based on threshold measures, which relate to the ability of the HVS to detect differences between a rendered image and the highest possible quality image. This second point is particularly important because, in an interactive scenario, time and resources are so limited that the threshold set by the perceptual metrics is far beyond what the system can achieve within its constraints. Thus Dumont et al. [2003] propose that the question the rendering community should be asking is not “How can I make an image that is visually indistinguishable from the highest possible quality?” but rather “How can I make an image of the highest possible quality given my constraints?” This supports the ‘just in time’ approach of the framework proposed in this thesis, for it does exactly this: it orders the possible rendering operations to achieve the highest quality it can within the system’s constraints. This is one of the major strengths of the proposed framework, and one that other perceptual rendering methods, such as Myszkowski et al. [2001] and Yee [2001], do not take into account.
The ‘just in time’ rendering framework proposed in this chapter produces frames, within a specified time budget or to below a certain threshold value of error conspicuity, by selectively rendering the scene according to prior knowledge of the visual task the user is performing. The method described should apply to any type of rendering algorithm, from ray tracing to multi-pass hardware rendering techniques.
The framework was demonstrated within the global illumination package Radiance. Example frames from an animation created using this framework were shown, along with VDP comparisons, to demonstrate that the framework produces visually better images for people performing visual tasks, in a fraction of the time it would take to render the whole image to the same quality.
This chapter therefore completed the goal of this thesis and showed that an inherent feature of the human visual system, Inattentional Blindness, can indeed be used as the basis for a selective rendering framework. In addition, such a framework can reduce computation time significantly, by a factor of at least 7, without users perceiving any noticeable difference in rendering quality.
Chapter 7
Conclusions and Future Work
The main aim of this thesis was to develop a methodology to help address the ongoing problem that creating realistic computer generated images in real time is, to date, impossible. Such images need vast amounts of computational power to be rendered in anything close to real time, and despite the availability of modern graphics hardware this issue remains one of the key challenges in computer graphics.
7.1 Contributions of this thesis to the research field
Knowing that the goal is to compute images for humans to view, this thesis presented the concept of exploiting known flaws in the human visual system. These flaws limit the ability of humans to perceive what they are actually viewing, thus allowing graphical scenes to be selectively rendered, saving computation time without viewers being aware of any difference in quality. A similar approach has been researched in the last few years from the point of view of saliency and peripheral vision, both bottom-up visual processing methods. Both of these methods were successful in saving computational time by rendering images selectively without users perceiving any quality differences; however, both have their disadvantages as well, as discussed in Section 3.4. Many open questions still remain even for bottom-up selective rendering.
From in-depth research in psychology, vision and computer graphics, it was found and hypothesized that there were two particular flaws that had not previously been used in this field and which could potentially save significant computation time when rendering images. These were Change Blindness and Inattentional Blindness.
Firstly, Change Blindness, the inability to detect what should be obvious changes in a scene, was investigated. As this was the first experiment to be run, many lessons had to be learnt about how to perform psychophysical experiments without introducing any bias. The methodology chosen was built upon that of Rensink et al. [1997] and O’Regan et al. [1999a], where a specific sequence of alternating images was displayed to the participants until they perceived any difference between the images. This specific alternation of images caused the participants to suffer from Change Blindness, thus making detection of a difference much harder than under normal circumstances. Typically, observers who were shown a rendering quality change using the flicker paradigm took 8 times as long as observers noticing a presence or location alteration, when suffering from Change Blindness. Knowing this meant that if observers were slow to notice any alteration in rendering quality when suffering from this HVS defect, it was highly plausible that this methodology could then be used in a selective renderer to render specific parts of a scene to varying levels of quality without the observer perceiving any difference.
From further investigation it was found that it was not clear how Change Blindness could be exploited in an animation without requiring a visual disruption, which in itself would affect the observer’s overall experience of the animation. Thus another, similar flaw of the HVS was considered that would solve this very problem: Inattentional Blindness, the failure to perceive unattended items in a scene.
It was important firstly to find out if Inattentional Blindness could indeed be
used to selectively render an animation without the viewers perceiving any difference in
quality from that of a fully rendered animation. Thus a new experiment was conducted
with an animation of a fly through of four rooms. The task given to half the observers
was to count the number of pencils in a mug located on the table of each room. The task
was made deliberately hard by adding superfluous paintbrushes in the mug as well as
the pencils; this was to focus the observer’s attention on the task only. The other half of
the observers were asked simply to watch the animations. Each observer was shown two animations: one was always a fully, high quality rendered animation, and the other was either rendered entirely at low quality or selectively rendered. The selectively rendered animation was created by rendering high quality circles (2 visual degrees) located on the mugs with the pencils, then blending these circles into the low quality rendering used for the rest of the scene.
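For illustration, this kind of blend can be expressed as a radial alpha mask composited over the low quality frame, as in the sketch below; it assumes RGB frames stored as height x width x 3 arrays, and the centre position and inner/outer radii are placeholders rather than the values used in the experiment.

import numpy as np

def blend_circle(low, high, centre, inner_r, outer_r):
    """Use `high` inside inner_r, `low` beyond outer_r, and blend in between."""
    h, w = low.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.hypot(xs - centre[0], ys - centre[1])            # distance from circle centre
    alpha = np.clip((outer_r - d) / (outer_r - inner_r), 0.0, 1.0)
    return alpha[..., None] * high + (1.0 - alpha[..., None]) * low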
From the results it could be seen that Inattentional Blindness did indeed cause the observers not to notice that the animation had been selectively rendered; in fact, these observers reacted exactly the same as those that saw two fully, high quality rendered animations. However, when observers were shown an animation that was low quality all over, they could detect the quality difference between the two animations. Thus it could be concluded that as long as the aspects related to the task at hand were rendered to high quality, observers would not be able to detect the difference in rendering quality. However, several unanswered questions remained. What would happen if the task were not a visual one? Were the results actually due to the decreasing acuity in peripheral vision?
To solve these quandaries, more experiments had to be performed. The first was to see whether or not a non-visual task would cause the same effects as previously described. The non-visual task chosen was to count backwards from 1000 in steps of two. However, the observers still noticed the quality difference in all cases (high, selective and low) even though they were performing the non-visual task. From this it was concluded that, for Inattentional Blindness to be exploited in this scenario, the task employed must be a visual one.
Next, the quandary of whether the results achieved were actually due to peripheral vision rather than Inattentional Blindness had to be resolved. To overcome this dilemma a final experiment was conducted, this time with a task that was spread over the whole scene. This caused the participants to fixate on aspects of the scene that were rendered both at high quality and at low quality when the scene was selectively rendered. The task in this experiment was to count the number of teapots. To make the participants fixate on aspects that were not teapots and were not rendered in high quality, other objects similar in appearance to the teapots, such as vases, were added to the scene. From the results of this experiment it could be seen that, even though participants fixated on the low quality rendered objects, it was only the quality of the teapots, i.e. the task aspects, which affected the observers’ opinion of the overall quality of the scene whilst they were performing the task.
After having proven that Inattentional Blindness could indeed be used in selective rendering without observers perceiving any difference in quality, a selective rendering framework was developed. It was deemed important a) to create a framework that was transferable across a variety of different rendering techniques and b) to design a system that combined spatiotemporal contrast sensitivity with pre-determinable task maps. The second point was, as discussed in Chapter 6, due to the fact that if an observer does not focus 100% on the task at hand, then some of their attention may be attracted to the most salient aspects of the scene. By combining spatiotemporal contrast sensitivity with pre-determinable task maps, the most salient aspects of the scene would also be rendered at a higher quality than the rest of the scene. This further decreased the chance that an observer would perceive any difference in rendering quality, beyond what is achieved by implementing the framework with task maps alone.
The framework designed in this thesis guided a progressive animation system which also took full advantage of image-based rendering techniques. The framework was demonstrated with a Radiance implementation, resulting in a system that completes its renderings in approximately 1/7th of the time needed to render an entire frame to the same detail as the task objects. As the system proposed in this thesis controlled only the sampling resolution over each frame, it can be hypothesized that if the framework were also used to refine the global illumination calculation, even greater gains should be achieved, especially if supported by the latest advances in graphics hardware. This framework therefore shows promise for achieving truly realistic renderings of frames in real time.
7.2 Future Research
7.2.1 Visual attention.
It is already known from visual psychological researchers such as Yarbus [1967], Itti
and Koch [2000] and Yantis [1996] that the visual system is highly sensitive to features
such as edges, abrupt changes in colour and sudden movements. Much evidence has
accumulated in favor of a two-component framework for the control of where in a
visual scene attention is deployed [James 1890; Treisman and Gelade 1980; Treisman
1985; Itti and Koch 2000]: A bottom-up, fast, primitive mechanism that biases the
observer towards selecting stimuli based on their saliency (most likely encoded in terms
of centre-surround mechanisms) and a second slower, top-down mechanism with
variable selection criteria, which directs the ‘spotlight of attention’ under cognitive,
volitional control. Whether visual consciousness is achieved by either saliency-based or
top-down attentional selection, or by both, remains controversial. A further understanding of the complex interaction between the bottom-up and top-down visual attention processes of the human visual system would, however, make it possible to combine, in the right proportions, saliency models and predictors of areas of visible difference with the Inattentional Blindness approach described in this thesis. This would then determine more precisely the “order” in which people may attend to objects in a scene. Figure 7.1 shows a hypothesis of how this prioritisation might be approached.
Ordering of priority: priorities P1 (highest) to P6 are assigned to the intersections and individual regions of three overlapping criteria, Task Based (T), Saliency Based (S) and Visible Differences Based (V), with P7 covering the rest of the scene that does not fall into any of the criteria T, S or V.
Figure 7.1: Hypothesis on how saliency, task and visible difference methodologies
might be combined in terms of their priority of rendering.
Such knowledge would then also provide the framework with the high-level
vision model that remains to be implemented. This will give us the relative importance
of all of the objects in the scene, i.e. a detailed “priority queue” for the selective
rendering, providing the best perceptibly high-quality images within the time constraints
of the interactive display system. Such a priority-rendering queue also offers exciting
possibilities for efficient task scheduling within any parallel implementation of this
methodology.
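One simple way to realise such a priority queue, assuming each scene object carries binary task (T), saliency (S) and visible-difference (V) flags, is sketched below. The particular tie-breaking order (task before saliency before visible differences) is only one possible choice and is itself part of the open question discussed above.

def priority(obj):
    t, s, v = obj['task'], obj['salient'], obj['visible_diff']
    # Objects satisfying more criteria come first; task outranks saliency,
    # which outranks visible differences, when breaking ties.
    return (-(t + s + v), -t, -s, -v)

def priority_queue(objects):
    return sorted(objects, key=priority)

scene = [{'name': 'lantern', 'task': 1, 'salient': 1, 'visible_diff': 0},
         {'name': 'hatch',   'task': 0, 'salient': 0, 'visible_diff': 1}]
print([o['name'] for o in priority_queue(scene)])     # ['lantern', 'hatch']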
7.2.2 Peripheral vision.
Foveal information is clear and chromatic, whereas peripheral information is blurry and
colour-weak to a degree that depends on the distance from the fovea. Thus more
research could be done on decreasing the amount of time spent rendering good colour in
the periphery. Also the fovea is almost entirely devoid of S-cones (blue) photoreceptors,
as described in Section 2.1. Thus, for a true representation of the human visual system, it is hypothesized that a selective renderer could get away with not rendering a blue value at the locations where the observer’s fixations are believed to fall when studying the scene. Along the same lines, the small area where the optic nerve is located could be left unrendered altogether without the observer noticing the difference. With both of these approaches, however, the frame would have to be refreshed for every fixation, so a model such as that of McConkie and Loschky [1997; 2000], which uses an eye-tracker to obtain exact fixation positions, would be more appropriate.
7.2.3 Multi-sensory experiences
The task undertaken is crucial in determining the eye-gaze patterns of users studying the
images. The introduction of sound and motion within the virtual environments may
further increase the level of Inattentional Blindness and the related Change Blindness
[Cater et al. 2001]. This is based on the research that Inattentional Blindness is affected
by four factors, one of which is the amount of mental workload being performed
[TMHF 2004]. This is because the amount of attention that humans have is roughly fixed: the more attention is focused on one task, the less there is for others. The
hypothesis is that if an observer has to perform a visual task as well as an auditory task,
or a secondary visual task, then there is a high likelihood that the observer would not
only suffer from Inattentional Blindness, but also that it would be greater than if the
observer was just performing one visual task. This would need to be confirmed with
detailed psychophysical studies.
7.2.4 Type of task.
As noted above, the task undertaken is crucial in determining the eye-gaze patterns of users studying the images. In this thesis several different tasks were covered, from visual to non-visual; however, there are many more types of task a user could be asked to perform. There is therefore significant scope for future work in verifying whether each different type of task affects the observer’s perception of the selective rendering quality.
7.2.5 Varying Applications
Although this thesis was primarily interested in high-quality image synthesis, the
technique proposed may also be applied to other areas of computer graphics, for
example: user interface design [Velichkovsky and Hansen 1996] and control [Hansen et
al. 1995], geometry level of detail selection, video telephony and video compression
[Yang et al. 1996]. Also, from discussions with artists, this type of selective rendering
algorithm may be used to alter the viewer’s impression of the meaning of a created
image by altering the quality of different parts of a computer generated artistic scene.
This technique was used by artists during the Renaissance and Impressionist periods, who would deliberately render a particular aspect in fine detail, almost blurring the rest of the scene to highlight it even more (Figure 7.2) [Hockney and Falco 2000]. This is also a well known technique in photography and film.
Figure 7.2: Lorenzo Lotto ‘Husband and Wife’ (c.1543). Note the incredible detail of
the table cloth which attracts the viewer’s attention more than the rest of the scene
[Hockney and Falco 2000].
7.2.6 Alterations for Experimental Procedures
The experiments performed in this thesis were based on experimental procedures used by vision psychologists, and at each stage of the thesis more was learnt about how to design and carry out a psychophysical experiment without introducing bias or testing any criteria other than those intended. This learning would only improve with additional experiments and research.
As it took a great deal of time to design, test, re-design, perform and then
analyse the data for each experiment, only a limited number of experiments could be
carried out. Thus there are many different types of experiments whose results may affect
the theories proposed in this thesis both in positive and negative ways. For example, this
thesis can only truly hypothesize that Inattentional Blindness can be recreated for all
computer graphical scenes, but to find out if this is truly the case it would take a lifetime
of more experiments!
7.3 A Final Summary
The study of the limits of human perception in order to exploit them for improving
computer graphics rendering is essential for further development of the computer
graphics field. Such research has already resulted in numerous perceptually based
algorithms and image based quality metrics. However, this topic is far from being
exhausted, and thus will continue to lead the graphics community to exciting avenues of
future research. It is always important to remember that satisfying human perception is the end goal of the rendering pipeline, and thus there is no point displaying what humans cannot perceive.
This thesis demonstrated how flaws in the human visual system, such as Change
Blindness and Inattentional Blindness, can be incorporated into a selective rendering
framework. Although this topic is by no means complete, this thesis having merely scratched the surface of a wide and important area of research, it is nevertheless a step in the right direction towards achieving realistic graphical imagery in real time.
References/Bibliography
[Akeine-Moller and Haines 2002] Akeine-Moller, T., and Haines, E. 2002. Real-time
Rendering. Second Edition. A.K. Peters Ltd.
[ANST 2003]
American National Standard for Telecommunications
[www], http://www.atis.org/tg2k/, (Accessed September
2003)
[Ashdown 2004]
Ian Ashdown, Radiosity Bibliography [www], http://tralvex.com/pub/rover/abs-ian0.htm (Accessed March 2004)
[Balazas 1945]
Balazas B. 1945. Theory of Film. New York: Dover.
[Baylis and Driver 1993]
Baylis, G.C., and Driver J. 1993. “Visual attention and
objects: evidence for hierarchical coding of location.”
Journal of Experimental Psychology: Human Perception
and Performance. 19(3) 451-470.
[Baylor et al. 1987]
Baylor, D. A., Nunn B. J., and Schnapf J. L. 1987.
“Spectral sensitivity of cones of the monkey Macaca
Fascicularis.” Journal of Physiology 390:145-160.
[Birn 2000]
Birn, J. 2000. [digital] Lighting and Rendering. New
Riders.
[Bolin and Meyer 1998]
Bolin M.R. and Meyer G.W. 1998 “A Perceptually Based
Adaptive Sampling Algorithm”, In proceedings of ACM
SIGGRAPH 1998, 299-309.
[Bouknight 1970]
Bouknight J. 1970. “A procedure for generation of three-dimensional half-toned computer graphics presentations.” Communications of the ACM, Vol. 13 (9) 527-536.
[Bouville et al. 1991]
Bouville C., Tellier P. and Bouatouch K. 1991. “Low
sampling densities using a psychovisual approach” In
proceedings of EUROGRAPHICS ‘91, 167 -182.
[Broadbent 1958]
Broadbent, D.E. 1958. Perception and Communication.
Oxford: Pergamon Press.
[Buswell 1935]
Buswell, G.T. 1935. How people look at pictures. Univ. of
Chicago Press, Chicago.
[Cater et al. 2001]
Cater K., Chalmers A.G. and Dalton C. 2001 “Change
blindness with varying rendering fidelity: looking but not
seeing”, Sketch ACM SIGGRAPH 2001, Conference
Abstracts and Applications.
[Cater et al. 2002]
Cater K., Chalmers A.G., and Ledda P. 2002 “Selective
Quality Rendering by Exploiting Human Inattentional
Blindness: Looking but not Seeing”, In Proceedings of
Symposium on Virtual Reality Software and Technology
2002, ACM, 17-24.
[Cater et al. 2003a]
Cater K., and Chalmers A.G. 2003 “Maintaining Perceived
Quality For Interactive Tasks”, In IS&T/SPIE Conference
on Human Vision and Electronic Imaging VIII, SPIE
Proceedings Vol. 5007-21.
[Cater et al. 2003b]
Cater K., Chalmers A.G. and Dalton C. 2003 “Varying
Rendering Fidelity by Exploiting Human Change
Blindness” In proceedings of GRAPHITE 2003, ACM, 39-46.
[Cater et al. 2003c]
Cater K., Chalmers A.G. and Ward G. 2003. “Detail to
Attention: Exploiting Visual Tasks for selective
Rendering” in the proceedings of the Eurographics
Symposium on Rendering 2003, ACM, 270-280.
[CCBFTRI 2004]
A wellness Center of the Chaitanya Bach Flower Therapy
Research Institute, Unwanted Thoughts [www],
http://www.charminghealth.com/applicability/unwantedthoughts.htm (Accessed January 2004)
[Challis et al. 1996]
Challis, B.H. Velichkovsky, B.M. and Craik, F.I.M. 1996.
“Levels-of-processing effects on a variety of memory
tasks: New findings and theoretical implications.”
Consciousness and Cognition, 5 (1).
[Chalmers et al. 2000]
Chalmers A.G., McNamara A., Troscianko T., Daly S. and
Myszkowski K. 2000. “Image Quality Metrics”, ACM
SIGGRAPH 2000, course notes.
[Chalmers and Cater 2002] Chalmers A.G, and Cater K. 2002 “Realistic Rendering in
Real-Time.” In proceedings of the 8th International EuroPar Conference on Parallel Processing, Springer-Verlag
21-28.
[Chalmers et al. 2003]
Chalmers A., Cater K. and Maffioli D. 2003. “Visual
Attention Models for Producing High Fidelity Graphics
Efficiently” in proceedings of the Spring Conference on
Computer Graphics, ACM, 47-54
[Chalmers and Cater 2004] Chalmers A.G, and Cater K. 2004 “Exploiting human
visual attentional models in visualisation.” In C. Hansen
and C. Johnson (Eds.) Visualisation Handbook, Academic
Press, to appear.
[Cherry 1953]
Cherry, E.C. 1953. “Some experiments on the recognition
of speech, with one and with two ears.” Journal of the
Acoustical Society of America, 25(5):975-979.
[CIT 2003]
California Institute of Technology, Seeing the world through a retina [www], http://www.klab.caltech.edu/~itti/retina/index.html (Accessed September 2003)
[Cohen and Greenberg 1985] Cohen M.F., and Greenberg D.P. 1985. “The Hemi-Cube:
A radiosity solution for complex environments.” In B. A.
Barsky, editor, Computer Graphics (SIGGRAPH '85 Proceedings), volume 19, pages 31-40.
[Cohen et al. 1988]
Cohen M.F., Chen S.E., Wallace J.R., and Greenberg D.P.
1988. “A progressive refinement approach to fast radiosity
image generation.” In John Dill, editor, Computer
Graphics (SIGGRAPH 1988 Proceedings), volume 22,
pages 75—84.
[Coolican 1999]
Coolican, H. 1999. Research Methods and Statistics in
Psychology, Hodder & Stoughton Educational, U.K.
[CS 2003]
Contrast Sensitivity, Channel Model [www], http://www.contrast-sensitivity.com/channel_model/ (Accessed January 2003)
[CS 2004]
Contrast Sensitivity, Visual System [www], http://www.contrast-sensitivity.com/visual_system/ (Accessed December 2004)
[Daly 1993]
Daly S. 1993. “The Visible Differences Predictor: an
algorithm for the assessment of image fidelity.” In A.B.
Watson, editor, Digital Image and Human Vision,
Cambridge, MA: MIT Press, 179-206.
[Daly 1998]
Daly S. 1998. “Engineering observations from
spatiovelocity and spatiotemporal visual models.” In
IS&T/SPIE Conference on Human Vision and Electronic
Imaging III, SPIE Proceedings Vol. 3299, 180-193.
[Daly 2001]
Daly S. 2001. “Engineering observations from
spatiovelocity and spatiotemporal visual models”. Chapter
9 in Vision Models and Applications to Image and Video
Processing, ed. C. J. van den Branden Lambrecht, Kluwer
Academic Publishers.
[Dingliana 2004]
Dingliana, J., Image Synthesis Group, Trinity College, Dublin (Presentation Slides) [www], http://isg.cs.tcd.ie/dingliaj/3d4ii/Light1.ppt (Accessed March 2004)
[Dowling 1987]
Dowling, J. E. 1987. The Retina: An Approachable Part of
the Brain. Cambridge, MA: Harvard University Press.
[Dumont et al. 2003]
Dumont R., Pellacini F., and Ferwerda J. 2003.
“Perceptually-Driven Decision Theory for Interactive
Realistic Rendering.” ACM Transactions on Graphics,
Vol. 22, 152- 181.
[Duncan and Nimmo-Smith 1996] Duncan J., and Nimmo-Smith, I. 1996. “Objects
and attributes in divided attention: surface and boundary
systems.” Perception and Psychophysics. 58(7) 1076-1084
[D’Zmura 1991]
D’Zmura, M. 1991. “Color in visual search.” Vision
Research, 31, 951-966.
[Ferwerda et al. 1996]
Ferwerda J.A., Pattanaik S.N., Shirley P.S., and Greenberg
D.P. 1996. “A Model of Visual Adaptation for Realistic
Image Synthesis.” In proceedings of ACM SIGGRAPH
1996, ACM Press / ACM SIGGRAPH, New York. H.
Rushmeier, Ed., Computer Graphics Proceedings, Annual
Conference Series, ACM, 249-258.
[Ferwerda 1997]
Ferwerda J.A., Pattanaik S.N., Shirley P.S., and
Greenberg, D.P. 1997. “A Model of Visual Masking for
Computer Graphics.” In proceedings of ACM SIGGRAPH
1997, ACM Press / ACM SIGGRAPH, New York. T.
Whitted, Ed., Computer Graphics Proceedings, Annual
Conference Series, ACM, 143-152.
[Ferwerda 2003]
Ferwerda, J.A. 2003. “Three varieties of realism in
computer graphics.” In IS&T/SPIE Conference on Human
Vision and Electronic Imaging VIII, SPIE Proceedings
Vol. 5007, 290-297.
[Flavios 2004]
Flavios, Light and optic theory and principles [www],
http://homepages.tig.com.au/~flavios/diffrac.htm
(Accessed March 2004)
[Gibson and Hubbold 1997] Gibson, S., and Hubbold R.J. 1997. “Perceptually Driven
Radiosity.” Computer Graphics Forum, 16 (2): 129-140.
[Gilchrist et al. 1997]
Gilchrist, A., Humphreys, G., Riddock, M., and Neumann,
H. 1997. “Luminance and edge information in grouping:
A study using visual search.” Journal of Experimental
Psychology: Human Perception and Performance, 23,
464-480.
[Glassner 1984]
Glassner, A.S. 1984. “Space Subdivision for Fast Ray
Tracing”, IEEE Computer Graphics & Applications, Vol.
4, No. 10, pp 15-22.
[Glassner 1989]
Glassner, A.S. 1989. An Introduction to Ray tracing.
Morgan Kaufmann.
[Goldberg et al. 1991]
Goldberg, M.E., Eggers, H.M., and Gluras, P. 1991. Ch.
43. The Ocular Motor System. In E.R.
[Goral et al. 1984]
Goral C.M., Torrance K.E., Greenberg D.P., and Battaile B. 1984. “Modeling the interaction of light between diffuse surfaces.” In proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques. Vol. 18 (3) 212-222.
[Gouraud 1971]
Gouraud H., “Continuous shading of curved surfaces”, IEEE Transactions on Computers, 20(6): 623-628.
[Green 1991]
Green M. 1991. “Visual Search, visual streams and visual architectures.” Perception and Psychophysics 50, 388-403.
[Green 1992]
Green M. 1992. “Visual Search: detection, identification and localisation.” Perception, 21, 765-777.
[Greenberg et al. 1997]
Greenberg D.P., Torrance K.E., Shirley P., Arvo J., Ferwerda J., Pattanaik S.N., Lafortune E., Walter B., Foo S-C. and Trumbore B. 1997 “A Framework for Realistic Image Synthesis”, In proceedings of SIGGRAPH 1997 (Special Session), ACM Press / ACM SIGGRAPH, New York. T. Whitted, Ed., Computer Graphics Proceedings, Annual Conference Series, ACM, 477-494.
[Greene and D’Oliveira 1999] Greene J., and D’Oliveira M. 1999. Learning to use
statistical tests in psychology. Open University Press.
[Haber et al. 2001]
Haber, J., Myskowski, K., Yamauchi, H., and Seidel, H-P.
2001 “Perceptually Guided Corrective Splatting”, In
Proceedings of EuroGraphics 2001, (Manchester, UK,
September 4-7).
[Hall and Greenberg 1983] Hall R.A., and Greenberg D.P. 1983. “A Testbed for
Realistic Image Synthesis.” IEEE Computer Graphics and
Applications, Vol. 3, No. 8. pp. 10-19.
[Hansen et al. 1995]
Hansen, J.P., Andersen, A.W. and Roed, P. 1995. “EyeGaze control of multimedia Systems.” In Symbiosis of
human and artifact. Proceedings of the 6th international
conference on human computer interaction, Elsevier
Science Publisher.
[Heckbert 1989]
Heckbert P.S. 1989. Fundamentals of Texture Mapping
and Image Warping. Master’s thesis, University of
California, Berkeley.
[Hockney and Falco 2000]
Hockney, D. and Falco, C. 2000. “Optical insights into
Renaissance art.” Optics and Photonics News, 11(7), 52-59.
[Hoffman 1979]
Hoffman, J. 1979. “A two-stage model of visual search.”
Perception and Psychophysics, 25, 319-327.
[Itti et al. 1998]
Itti, L., Koch, C., and Niebur, E. 1998 “A Model of
Saliency-Based Visual Attention for Rapid Scene
Analysis”, IEEE Transactions on Pattern Analysis and
Machine Intelligence (PAMI) 20, 11, 1254-1259.
[Itti and Koch 2000]
Itti, L., and Koch, C. 2000 “A saliency-based search
mechanism for overt and covert shifts of visual attention”,
In Vision research, Vol. 40, no 10-12, 1489-1506.
[Itti and Koch 2001]
Itti, L., and Koch, C. 2001 “Computational modeling of
visual attention”, In Nature Reviews Neuroscience, Vol.
2(3), 194-203.
[Itti 2003a]
Itti, L. Bottom-up Visual Attention, University of Southern
California, http://ilab.usc.edu/bu/theory/ (Accessed March
2003)
[Itti 2003b]
Itti, L. Visual Attention: Movies, University of Southern
California [www], http://ilab.usc.edu/bu/movie/index.html
(Accessed March 2003)
[James 1890]
James W. 1890 Principles of Psychology, New York: Holt.
[JMEC 2003]
John Morgan Eye Center, Webvision, The organisation of the retina and visual system [www], http://webvision.med.utah.edu/imageswv/Sagschem.jpeg, (Accessed March 2003)
[Kajiya 1986]
Kajiya, J.T. 1986. “The Rendering Equation.” ACM
SIGGRAPH 1986 Conference Proceedings, volume
20,143-150.
[Katedros 2004]
Illumination: simulation and perception (course),
http://www.maf.vu.lt/katedros/cs2/lietuva/courses/spalvos/
illumination5.pdf (Accessed February 2004)
[Kelly 1979]
Kelly D.H. 1979. “Motion and Vision 2. Stabilized
spatiotemporal threshold surface.” Journal of the Optical
Society of America, 69 (10): 1340-1349.
[Khodlev and Kopylov 1996] Khodulev A., and Kopylov E. 1996. “Physically accurate
lighting simulation in computer graphics software.” In
GraphiCon ’96 – The sixth international conference on
computer graphics and visualisation, volume 2, pages 111-119.
[Kirk 2003]
Kirk, D. 2003. “Graphics Architectures: The Dawn of
Cinematic Computing.” In proceedings of GRAPHITE
2003, ACM, 9.
[Koch and Ullman 1985]
Koch C., and Ullman S. 1985. “Shifts in selective visual
attention: towards the underlying neural circuitry.” Human
Neurobiology, 219--227.
[Kowler 1995]
Kowler, E. 1995. “Eye movements.” In: S. Kosslyn and
D.N. Osherson (Eds.): Visual Cognition. MIT Press,
Cambridge, MA.
[Krivánek et al. 2003]
Krivánek J., Zara J., and Bouatouch K. 2003. “Fast Depth
of Field Rendering with Surface Splatting.” Computer
Graphics International 2003: 196-201.
[Land and Furneaux 1997] Land M.F., and Furneaux S. 1997. “The knowledge base
of the oculomotor system.” In proceedings of the Royal
Society Conference on Knowledge-Based Vision, February.
[Langbein 2004]
Langbein, F.C., Advanced Rendering - Radiosity, Cardiff University, (course notes) http://cyl.cs.cf.ac.uk/teaching/graphics/G-20-V_2.pdf (Accessed February 2004)
[Languénou et al. 1992]
Languénou E., Bouatouch K., and Tellier P. 1992. “An
adaptive Discretization Method for Radiosity.” Computer
Graphics Forum 11(3): 205-216.
[Lee et al. 1985]
Lee M., Redner R., and Uselton S. 1985. “Statistically
Optimized Sampling for Distributed Ray Tracing.” In
proceedings of ACM SIGGRAPH Vol. 19, No. 3.
[Levin and Simons 1997]
Levin, D.T., and Simons, D.J. 1997. “Failure to detect
changes to attended objects in motion pictures.”
Psychonomic Bulletin and Review, 4 (4) pp. 501-506.
[Li et al. 1998]
Li B, Meyer G.W., and Klassen V. 1998. “A Comparison
of Two Image Quality Models.” In Human Vision and
Electronic Imaging III (Proceedings of SPIE), vol 3299, p
98-109, San Jose, California.
[Lightwave 2003]
Lightwave! Finding your blindspot [www], http://www.lightwave.soton.ac.uk/experiments/blindspot/blindspot.html (Accessed September 2003)
[Lischinski 2003]
Lischinski, D., Radiosity, (Lecture notes) http://www.cs.huji.ac.il/~danix/advanced/notes3.pdf (Accessed December 2003)
[Loschky and McConkie 1999] Loschky, L.C. and McConkie, G.W. 1999 “Gaze
Contingent Displays: Maximizing Display Bandwidth
Efficiency.” ARL Federated Laboratory Advanced
Displays and Interactive Displays Consortium, Advanced
Displays and Interactive Displays Third Annual
Symposium, College Park, MD. February 2-4, 79-83.
[Loschky et al. 2001]
Loschky, L.C., McConkie, G.W., Yang, J and Miller, M.E.
2001 “Perceptual Effects of a Gaze-Contingent MultiResolution Display Based on a Model of Visual
Sensitivity”. In the ARL Federated Laboratory 5th Annual
Symposium - ADID Consortium Proceedings, 53-58,
College Park, MD, March 20-22.
[Lubin 1995]
Lubin, J. 1995. “A Visual Discrimination Model for
Imaging System Design and Evaluation.” In Vision Models
for Target Detection and Recognition, 245-283, World
Scientific, New Jersey.
[Lubin 1997]
Lubin J. 1997. “A human vision model for objective
picture quality measurements.” Conference Publication No
447, IEE International Broadcasting Convention, 498-503.
[Luebke et al. 2000]
Luebke D., Reddy M., Watson B., Cohen J. and Varshney
A. 2001. “Advanced Issues in Level of Detail.” Course
#41 at ACM SIGGRAPH 2000. Los Angeles, CA. August
12-17.
[Luebke and Hallen 2001]
Luebke D. and Hallen B. 2001 “Perceptually driven simplification for interactive rendering”, 12th Eurographics Workshop on Rendering, 221-223.
[Machover 2003]
Machover Associates Corporation [www], http://www.siggraph.org/s2003/media/factsheets/forecasts.html, (Accessed August 2003)
[Maciel and Shirley 1995]
Maciel P.W.C. and Shirley P. 1995 “Visual Navigation of
Large Environments Using Textured Clusters”, Symposium
on Interactive 3D Graphics, 95-102.
[Mack and Rock 1998]
Mack, A. and Rock, I. 1998. Inattentional Blindness.
Massachusetts Institute of Technology Press.
[Marmitt and Duchowski 2002] Marmitt G., and Duchowski A.T. 2002. “Modeling
Visual Attention in VR: Measuring the Accuracy of
Predicted Scanpaths.” Eurographics 2002. Short
Presentations, 217-226.
[Marvie et al. 2003]
Marvie J-E., Perret J., and Bouatouch K. 2003. “Remote
Interactive Walkthrough of City Models.” Pacific
Conference on Computer Graphics and Applications, 389-393.
[Maya 2004]
Alias Wavefront Maya [www], http://www.alias.com/eng/productsservices/maya/index.shtml (Accessed March 2004)
[McConkie and Loschky 1997] McConkie, G.W. and Loschky, L.C. 1997 “Human
Performance with a Gaze-Linked Multi-Resolutional
Display”. ARL Federated Laboratory Advanced Displays
and Interactive Displays Consortium, Advanced Displays
and Interactive Displays First Annual Symposium,
Adelphi, MD. January 28-29, (Pt. 1)25-34.
[McConkie and Loschky 2000] McConkie, G.W. and Loschky, L.C. 2000. “Attending
to Objects in a complex Display”. Proceedings of ARL
Federated Laboratory Advanced Displays and Interactive
Displays Consortium Fourth Annual Symposium, 21-25.
[McNamara 2000]
McNamara, Ann, Comparing Real and Synthetic Scenes
using Human Judgements of Lightness, PhD Thesis,
Bristol, October 2000.
[McNamara et al. 2001]
McNamara A., Chalmers A.G., Troscianko T., and
Gilchrist I. 2001 “Comparing Real and Synthetic Scenes
using Human Judgements of Lightness”. In B Peroche and
H Rushmeier (eds), 12th Eurographics Workshop on
Rendering.
[MD support 2004]
MD support, Snellen Chart [www], http://www.mdsupport.org/snellen.html
[ME 2004]
Molecular Expressions, Physics of light and colour [www], http://micro.magnet.fsu.edu/primer/java/reflection/specular/ (Accessed February 2004)
[MM 2003]
Movie Mistakes [www], http://www.movie-mistakes.com/ (Accessed December 2003)
[Moray 1959]
Moray N. 1959. “Attention in dichotic listening: Affective cues and the influence of instructions.” Quarterly Journal of Experimental Psychology, 11, 56-60.
[Most et al. 2000]
Most, S.B, Simons, D.J., Scholl, B.J. & Chabris, C.F. 2000. “Sustained Inattentional Blindness: The role of location in the Detection of Unexpected Dynamic Events”. PSYCHE, 6(14).
[Myszkowski and Kunii 1995] Myszkowski, K., and Kunii T. L. 1995. “Texture
Mapping as an Alternative for Meshing During
Walkthrough Animation.” In G. Sakas, P. Shirley, and S.
Müller, editors, Photorealistic Rendering Techniques,
389–400, Springer–Verlag.
[Myszkowski 1998]
Myszkowski, K. 1998. “The Visible Differences Predictor:
Applications to global illumination problems.” In
proceedings of the 1998 Eurographics Workshop on
Rendering Techniques, G. Drettakis and N. Max, Eds. 223-236.
[Myszkowski et al. 1999]
Myszkowski K., Rokita P., and Tawara T. 1999.
“Perceptually-informed Accelerated Rendering of High
Quality Walkthrough Sequences.” In proceedings of the
1999 Eurographics Workshop on Rendering, G.W. Larson
and D. Lischinksi, Eds., 5-18.
[Myszkowski et al. 2001]
Myszkowski K., Tawara T., Akamine H. and Seidel H-P.
2001. “Perception-Guided Global Illumination Solution for
Animation Rendering.” In proceedings of SIGGRAPH
2001, ACM Press / ACM SIGGRAPH, New York. E.
Fiume, Ed., Computer Graphics Proceedings, Annual
Conference Series, ACM, 221-230.
[Neisser 1967]
Neisser, U. 1967. Cognitive psychology. New York:
Appleton-Century Crofts.
[Noe et al. 2000]
Noe, A., Pessoa, L. & Thompson, E. 2000. “Beyond the
Grand Illusion: What Change Blindness really teaches us
about vision.” Visual Cognition 7.
[Northdurft 1993]
Nothdurft, H. 1993. “The role of features in preattentive
vision: Comparison of orientation, motion, and color
cues.” Vision Research, 33, 1937-1958.
[Nusselt 1928]
Nusselt, W. 1928. “Grapische Bestimmung des
Winkelverhaltnisses bei der Warmestrahlung,” Zeitschrift
des Vereines Deutscher Ingenieure 72(20):673.
[OL 2003]
Optical Illusions – index [www], http://members.lycos.co.uk/brisray/optill/oind.htm (Accessed November 2003)
[O’Regan et al. 1999a]
O’Regan, J.K., Deubel, H., Clark, J.J. and Rensink, R.A.
1999 “Picture changes during blinks: looking without
seeing and seeing without looking.” Visual Cognition.
[O’Regan et al. 1999b]
O’Regan, J.K., Clark, J.J. and Rensink, R.A. 1999.
“Change blindness as a result of mudsplashes”. Nature,
398(6722), 34.
[O’Regan and Noe 2000]
O’Regan, J.K and Noe, A. 2000. “Experience is not
something we feel but something we do: a principled way
of explaining sensory phenomenology, with Change
Blindness and other empirical consequences.” The Unity of
Consciousness: Binding, Integration, and Dissociation.
[O’Regan 2001]
O’Regan, J.K. “Thoughts on Change Blindness.” L.R.
Harris & M. Jenkin (Eds.) Vision and Attention. Springer,
281-302.
[O’Regan et al. 2001]
O’Regan, Rensink and Clark. Supplementary Information
for “Change blindness as a result of mudsplashes”
[www],
http://nivea.psycho.univparis5.fr/Mudsplash/Nature_Supp_Inf/Nature_Supp_Inf.html (Accessed March 2001)
[Osberger 1999]
Osberger W. 1999. Perceptual Vision Models for Picture
Quality Assessment and Compression Applications. PhD
thesis, Queensland University of Technology.
[Osterberg 1935]
Osterberg, G. 1935. “Topography of the layer of rods and
cones in the human retina.” Acta Ophthalmologica, 6, 1-102.
[Palmer 1999]
Palmer, S.E. 1999. Vision Science - Photons to
Phenomenology. Massachusetts Institute of Technology
Press.
[Pannasch et al. 2001]
Pannasch, S., Dornhoefer, S.M., Unema P.J.A. and
Velichkovsky, B.M. 2001. “The omnipresent prolongation
of visual fixations: Saccades are inhibited by changes in
situation and in subject’s activity.” Vision Research, 41
(25-26). 3345-3351.
[Parker 1999]
Parker S., Martin W., Sloan P-P., Shirley P., Smits B., and
Hansen C. 1999. “Interactive ray tracing.” In Symoposium
on Interactive 3D Graphics, ACM, 119-126.
[Pashler 1998]
Pashler, H. 1988. “Familiarity and visual change
detection.” Perception and Psychophysics, 44(4), 369–378.
[Pattaniak et al. 1998]
Pattanaik S.N., Ferwerda J, Fairchild M.D., and Greenberg
D.P. 1998 “A Multiscale Model of Adaptation and Spatial
Vision for Realistic Image Display”, Proceedings of ACM
SIGGRAPH 1998, 287-298, Orlando, July.
[Pattanaik et al. 2000]
Pattanaik S.N., Tumblin J.E.,Yee H. and Greenberg D.P.
2000 “Time-Dependent Visual Adaptation for Realistic
Real-Time Image Display”, Proceedings of ACM
SIGGRAPH 2000, 47-54.
[Phillips 1974]
Phillips, W.A. 1974. “On the distinction between sensory
storage and short-term visual memory.” Perception and
Psychophysics , 16, 283–290.
[Phong 1975]
Phong, B.T. 1975. “Illumination for Computer-Generated
Images.” Communications of the ACM, Vol 18 (6) 449—
455.
[Pixar 2004]
Pixar, The Pixar Process [www], http://www.pixar.com/howwedoit/index.html (Accessed March 2004)
[Prikryl and Purgathofer 1999] Prikryl J. and Purgathofer W. 1999. “Overview of
Perceptually-Driven Radiosity Methods.” Institute of
Computer Graphics, Vienna University of Technology,
Technical Report, TR-186-2-99-26.
[Privitera and Stark 2000]
Privitera, C.M. and Stark, L.W. 2000. “Algorithms for
defining visual regions-of-interest: Comparison with eye
fixations.” IEEE Transactions on Pattern Analysis and
Machine Intelligence (PAMI) 22, 9, 970-982.
[Radiance 2000]
Radiance Home Page [www], http://radsite.lbl.gov/radiance/ (Accessed September 2000)
[Ramasubramanian et al. 1999] Ramasubramanian M., Pattanaik S.N., Greenberg D.P.
1999 “A Perceptually Based Physical Error Metric for
Realistic Image Synthesis”, Proceedings of ACM
SIGGRAPH 1999, 73-82, Los Angeles, 8-13 August.
[Rao et al. 1997]
Rao R.P.N., Zelinsky G.J., Hayhoe M.M., and Ballard
D.H. 1997. “Eye movements in visual cognition: a
computational study.” In Technical Report 97.1, National
Resource Laboratory for the Study of Brain and Behavior,
University of Rochester.
[Rensink et al. 1997]
Rensink, R. A., O’Regan, J. K., and Clark, J. J. 1997. “To
see or not to see: The need for attention to perceive
changes in scenes.” Psychological Science, 8, 368-373.
[Rensink 1999a]
Rensink, R.A. 1999. “The Dynamic Representation of
Scenes”. Visual Cognition.
[Rensink 1999b]
Rensink, R.A. 1999. “Visual Search for Change: A Probe
into the Nature of Attentional Processing”. Visual
Cognition.
[Rensink et al. 1999]
Rensink, R.A., O’Regan, J.K., and Clark, J.J. 1999. “On
Failure to Detect Changes in Scenes Across Brief
Interruptions”. Visual Cognition.
[Rensink 2001]
Rensink, R.A. The need for Attention to see change
[www],
http://www.psych.ubc.ca/~rensink/flicker/
(Accessed March 2001)
[Siegel and Howell 1992]
Siegel R., and Howell J.R. 1992. Thermal Radiation Heat
Transfer, 3rd Edition. Hemisphere Publishing Corporation,
New York, NY.
[Simons 1996]
Simons, D. J. 1996. “In sight, out of mind: When object
representations fail.” Psychological Science, 7(5), 301305.
[Simons and Levin 1998]
Simons, D. J., and Levin, D. T. 1998. “Failure to detect
changes to people in a real-world interaction.”
Psychonomic Bulletin and Review, 5(4), 644-649.
[Stam 1994]
Stam J. 1994. “Stochastic Rendering of Density Fields.”
Proceedings of Graphics Interface 1994, 51-58.
[Stroebel et al. 1986]
Stroebel L., Compton J., Current I., and Zakia R. 1986.
Photographic Materials and Processes, Boston: Focal
Press.
[SVE 2003]
Summit View Eyecare, Bird in the cage [www],
http://business.gorge.net/eyecare/explore/birdincage.asp
(Accessed October 2003)
[TMHF 2004]
The Mental Health Foundation, Obsessive Compulsive Disorder [www], http://www.mentalhealth.org.uk/html/content/ocd.cfm (Accessed January 2004)
[Treisman 1960]
Treisman, A. 1960. “Contextual cues in selective
listening.” Quarterly Journal of Experimental Psychology,
12, 242-248.
[Treisman and Gelade 1980] Treisman A., and Gelade 1980. “A Feature Integration
Theory of Attention”. Cognitive Psychology 12, 97-136.
[Triesman 1982]
Treisman, A. 1982. “Perceptual grouping and attention in
visual search for features and for objects.” Journal of
Experimental Psychology: Human Perception and
Performance, 8(2), 194-214.
[Treisman 1985]
Treisman A. 1985. “Search asymmetry: A diagnostic for
preattentive processing of separable features”. Journal of
Experimental Psychology: General, 114 (3), 285-310.
[Unema et al 2001]
Unema, P.J.A., Dornhoefer, S.M., Steudel, S. and
Velichkovsky, B.M. 2001. “An attentive look at driver's fixation duration.” In A.G.Gale et al. (Eds.), Vision in
vehicles VII. Amsterdam/NY: North Holland.
[VEHF 2003a]
Visual Expert Human Factors, Inattentional Blindness
[www],
http://www.visualexpert.com/Resources/inattentionalblindness.html (Accessed November 2003)
[VEHF 2003b]
Visual Expert Human Factors, Attention and Perception
[www],
http://www.visualexpert.com/Resources/attentionperception.html (Accessed September 2003)
[VEHF 2004]
Visual Expert Human Factors, Visual Field [www],
http://www.visualexpert.com/Resources/visualfield.html,
(Accessed February 2004)
[Velichkovsky and Hansen 1996] Velichkovsky, B.M., and Hansen. J.P. 1996. “New
technological windows into mind: there is more in eyes
and brains for human-computer interaction”, Proceedings
of the SIGCHI conference on Human factors in computing
systems: common ground, 496-503.
[Volevich et al. 2000]
Volevich V., Myszkowski K., Khodulev A., and Kopylov
E.A. 2000. “Using the Visual Differences Predictor to
Improve Performance of Progressive Global Illumination
Computation.” ACM Transactions on Graphics, Vol. 19,
No. 1, April 2000, 122–161.
[Wald et al. 2002]
Wald I., Kollig T., Benthin C., Keller A., and Slusallek P.
2002. “Interactive global illumination using fast ray
tracing”. In proceedings of 13th Eurographics Workshop
on Rendering, Springer-Verlag, 9-19.
[Wang et al. 1994]
Wang, Q., Cavanagh, P., and Green, M. 1994.
“Familiarity and pop-out in visual search.” Perception
and Psychophysics, 56, 495-500.
[Ward and Heckbert 1992] Ward, G., and Heckbert P. 1992. “Irradiance Gradients.”
Third Annual Eurographics Workshop on Rendering,
Springer-Verlag.
[Ward 1994]
Ward, G. 1994. “The RADIANCE Lighting Simulation
and Rendering System”. In Proceedings of ACM
SIGGRAPH 1994, ACM Press / ACM SIGGRAPH, New
York. Computer Graphics Proceedings, Annual
Conference Series, ACM, 459-472.
[Ward Larson and Shakespeare 1998] Ward Larson, G and Shakespeare, R. 1998.
“Rendering with RADIANCE: The art and science of
lighting simulation”, San Francisco: Morgan Kauffman.
[Watson et al. 1997a]
Watson, B., Friedman, A. and McGaffey, A. 1997 “An
evaluation of Level of Detail Degradation in Head-Mounted Display Peripheries”. Presence, 6, 6, 630-637.
[Watson et al. 1997b]
Watson, B., Walker, N., Hodges, L.F., and Worden, A.
1997 “Managing Level of Detail through Peripheral
Degradation: Effects on Search Performance with a Head-Mounted Display”. ACM Transactions on Computer-Human Interaction 4, 4 (December 1997), 323-346.
[Watson et al. 2000]
Watson, B., Friedman, A. and McGaffey, A. 2000. “Using
naming time to evaluate quality predictors for model
simplification.” Proceedings of the SIGCHI conference on
Human factors in computing systems, 113-120.
[Watson et al. 2001]
Watson, B., Friedman, A. and McGaffey, A 2001
“Measuring and Predicting Visual Fidelity”, Proceedings
of ACM SIGGRAPH 2001. In Computer Graphics
Proceedings, Annual Conference Series, 213 – 220.
[Wertheimer 1924/1950]
Wertheimer M 1924/1950 Gestalt Theory. In W.D. Ellis
(ed.), A sourcebook of Gestalt psychology. New York: The
Humanities Press.
[Whitted 1980]
Whitted T. 1980. “An Improved Illumination Model for
Shaded Display,” Communications of the ACM, vol. 23,
no. 6, 343-349.
[Wickens 2001]
Wickens, C.D. 2001. “Attention to safety and the
Psychology of Surprise”, 11th International symposium on
Aviation Psychology, Columbus, OH: The Ohio State
University.
[Wooding 2002]
Wooding, D.S. 2002 “Eye movements of large
populations: II. Deriving regions of interest, coverage, and
similarity using fixation maps.” Behaviour Research
Methods, Instruments & Computers. 34(4), 518-528.
[Yang et al. 1996]
Wang J, Wu L. and Waibel A. 1996. “Focus of Attention
in Video Conferencing.” CMU CS technical report, CMU-CS-96-150.
[Yantis 1996]
Yantis S. 1996 “Attentional capture in vision”, In A.
Kramer, M. Coles and G. Logan (eds), Converging
operations in the study of selective visual attention, 45-76,
American Psychological Association.
[Yarbus 1967]
Yarbus, A. L. 1967 “Eye movements during perception of
complex objects”, in L. A. Riggs, ed., Eye Movements and
Vision, Plenum Press, New York, chapter VII, 171-196.
[Yee 2000]
Yee H. 2000. Spatiotemporal sensitivity and visual
attention for efficient rendering of dynamic environments.
MSc Thesis, Program of Computer Graphics, Cornell
University.
[Yee et al. 2001]
Yee, H., Pattanaik, S., and Greenberg, D.P. 2001
“Spatiotemporal sensitivity and Visual Attention for
efficient rendering of dynamic Environments”, ACM
Transactions on Graphics, Vol. 20, No. 1, 39-65.
APPENDIX A
Materials A.1 – Judges’ Responses
Included in this appendix are all the experimental data from experiment 1,
which looked into whether or not change blindness can be induced whilst looking at
computer-generated images.
Judge 1
DOB: 24/04/73 Occupation: Teacher Glasses/Contacts: Contacts
Time of Day: 2.40pm Wednesday 11th July Female/Male: Male
Image 1: Stripped Box, Mirror, Yellow Ball, Glass with Straw, Red Cup, White Beaker
Image 2: Standing Lamp, Speaker, CD player, Table, Black Tube, CD, Carpet
Image 3: Chest of draws, Candle, Bowl, Wine Glass, Mirror, Door
Image 4: Mantelpiece, Two candles, Wine Bottle, Wine Glass, Picture with Boat and a church
Image 5: Picture, Two candles, Wine Bottle, Wine Glass, Black Fire Guard, Another Candle, Mantelpiece
Image 6: Bed Frame, Chest of Draws, Wine Bottle, Wine Glass, Lamp, Beaker Glass, Wardrobe, Pencil Holder, Ball
Image 7: Box with Stripes, Beaker, Red Cup, Mirror, Glass with Straw, Yellow Ball
Judge 2
DOB: 18/01/77 Occupation: Research Assistant Glasses/Contacts: None
Time of Day: 11.20am Wednesday 11th July Female/Male: Female
Image 1: Glass with Straw, Orange and Green Box, Dark Ball, Orange thing on the left hand side (maybe some kind of plastic cup), Another Ball, Mirror
Image 2: Table, CD, Amplifier, Two Beakers (one dark one light), Speaker and Stand, Lamp
Image 3: Chest of draws, Candle, Bowl, Glass, Mirror, Door
Image 4: Fireplace, Mantelpiece, Two candles, Picture with Boat, White Cliffs and a House, Wine Glass, Bottle
Image 5: Fireplace, Mantelpiece, Two candles, Bottle & Glass on mantelpiece, Candle in front of the Fireplace, Picture, Skirting Boards, Black bit in front of the Fireplace.
Image 6: Bed Frame, Chest of Draws, Lamp, Bottle, Glass, Beaker, Red Thing, Wardrobe
Image 7: Glass with Straw, Yellow Ball, Red Beaker, White Beaker, Mirror, Green and Orange Box
Judge 3
DOB: 29/06/54 Occupation: Personal Assistant Glasses/Contacts: Glasses
Time of Day: 2.15pm Monday 16th July Female/Male: Female
Image 1: I would say these things in the scene are stuff that you would take on
holiday with you, or are mainly outside objects. From what I can
remember there was a ball, a striped cushion to sit on, a stick, and a
mirror, however I’m not sure why a mirror would be in an outdoor
scene.
Image 2: This is of a corner of a room. There is a record player on a table, a glass
vase, a speaker and it’s a light room so maybe there is a window nearby
that’s not in the picture or maybe a light stand but I’m not sure.
Image 3: Again the corner of a room, with a side door opening into the room. A
mirror reflecting the candle, blue glass and another object on the chest
of draws but I can’t remember what it is.
Image 4: There’s a table, possibly a coffee table with a mosaic appearance and
burnt orange tiles on the front of it. There’s a picture on the wall of the
sea with a yacht and a lighthouse, at first glance though I thought it was
a submarine in the sea. There are candles burning, which reflect light
behind onto the wall and picture, and something between the two
candles but I don’t know what it is.
Image 5: There is a mantelpiece and a fireplace. There is a picture of the sea and
a boat above the mantelpiece. There is a single candle on the floor
projecting light up under the picture. I can’t remember if I saw a bottle
or a glass, but something related to drinking!
Image 6: This is a bedroom scene, there’s an end of an iron bed without a
mattress. There is a bedside table with three draws on top of which is a
glass and a modern light with a shade. This table backs onto a single
cupboard or a wardrobe that is white. There were some other objects on
the table but I’m not sure what they were.
Image 7: Saw a similar scene before, the one with the striped cushion. This has in
it a circular red object, a yellow ball, a stick, a mirror reflecting the
scene. I would now say that this scene is of modern furniture in a room
not the outside scene I originally said it was before.
Judge 4
DOB: 14/04/75 Occupation: PhD Student – Geography Glasses/Contacts: None
Time of Day: 5.30pm Friday 20th July Female/Male: Female
Image 1: It’s an office but it looks a kitchen. It’s got two chairs, one that swivels
and one that’s rigid with 4 legs. umm. It’s got a mottled floor and then
it’s got a set of three draws under the desk and it’s got a picture with a
green blob in the middle, which looks like it’s meant to be a tree. It’s
got two yellow cupboards on the right hand side mounted up on the
walls. There’s a light on a stand thing, which bends over beaming onto
something and a cup of tea or something, I think it’s a cup of tea.
Image 2: Looks like a wine bar, a really snazzy wine bar. It has two stools and a
table in the middle, which is rigid with two legs. The floor is tiled white
and red. The walls are very dark; the ceiling is dark as well. Then in the
centre of the picture there is the mirror behind the table, which is
reflecting a vase, a red vase, with a flower in it. And either side of that
there are two lights and there are two lights on the ceiling.
Image 3 & 19: It’s three spherical objects on a piece of text and the three objects look
like a tennis ball an orange, but it’s not an orange, and a Christmas ball
thing. It’s all set at an angle so the text gets smaller as it moves away.
Image 4 & 20: It’s a close up of a rectangular box, a rectangular box with red and
green stripes. In the background looks like a cylindrical vase, I don’t
know what it is but looks like it has a chop stick in it. Err and there’s a
round yellow snooker cue ball with a mirror at the back of it.
Image 5 & 11: Looks like a really snazzy office or cafe. You look through the door and
you are looking at two yellow chairs back to back and a table in front of
it with a wine glass on it. To the right is a picture but you can see it
behind another yellow chair. Right at the very back in the centre of the
wall is a kind of rainbow picture mounted on the wall. On the ceiling
there is ummmm a strip light running down the centre with tiny little
lights either side of the room. And then there is some strange art works.
On the back on the right is a kind of red base with a swirly art piece. On
the other left hand wall is some other artwork mounted like a curly
stepladder thing and the carpet is grey.
Image 6: This is a further away view of the other image and again it has the
rectangular gift-wrapped box, green and red striped. umm. It’s got what
looks like a top to a shaving foam red top or a hairspray can, and then
it’s got a white empty cylindrical shape hollow cylindrical shape on top
of the box. And behind that looks like it’s a plastic glass with a peachy
straw in it. To the left of that is a yellow cue ball on top of a women’s
vanity mirror, which has been opened up.
Image 7: Rory’s dinner. An empty dinner umm with a long bar with 5 pink
stools, kind of rigid stools umm. The walls are vertical stripy kind of
lilac and beige colour, brown beige ummm. At the very front there is
kind of seating area which surrounds a table with blue chairs. umm
Behind the counter of Rory’s cafe is kind of a glass cabinet that’s
empty but I would imagine it would have food in it and a row of glasses
at the back. And to the left of the image is a beige cylindrical object,
which is holding up the building structure.
Image 8, 14 & 18: Ummm same as the other one but closer ummm the thing that I thought
was a bit odd was the office chair in the picture the base of it with the
wheels the metal base is the same as the floor and that looks a bit odd.
Then there is the office desk same as before and there is the reflection
of the light coming down onto a book. Umm and a pink coffee mug and
that’s it.
Image 9 & 13: Err looks like a scene in a kitchen with a close up of a unit. With in the
very front a kind of empty cookie jar and behind that is a coffee grinder
and a handle on top, which you twist, but the handle looks a bit odd. I
don’t know the handle looks like it is sitting on top of the cookie jar.
Anyway umm there are three mugs or two mugs and one glass on the
panel behind, on the shelf behind umm and then ... and that’s it oh but it
looks quite a pink blotchy image.
Image 10: Err again it looks like a funky office with yellow chairs with umm you
are drawn to the scene at the back with the main table with the curved
legs and it has two tea cups either side so it looks like people have just
got up and left. There’s two chairs positioned not facing each other just
slightly apart, slightly to the side of one another. Umm in front of that
there is a yellow sofa on the right hand side and umm at the back there
is writing on the wall looks like something gallery, art gallery? Umm
and to the right of that is the window with the light shining through
onto the picture. Umm there’s a series of lights on the walls mounted at
head height but they don’t look on and then through to the left oh umm
sorry still in that room there’s a blue picture mounted on the wall which
is kind of blobby and 3D. And through that door is another sort of
gold/brass-y looking similar kind of blobby picture 3D picture mounted
on the wall and another yellow chair.
Image 12 & 21: Umm the image on the left hand side is a dark area which I’m not sure
what it is but to the left of that is a free standing lamp on a long white
stand which is switched on and umm and then there is a speaker which
is free standing on the floor with two legs. And then on the table, a
brown table is an amplifier that looks a bit featureless except for the
volume umm err and then there’s goodness and another cylindrical
object, which looks like a toilet roll, which is black.
Image 15: Looks like a scene from a bedroom it’s in the corner of a bedroom with
a creamy coloured table and creamy coloured walls with a picture of a
sailing boat and a beach in the background. Umm On the table there are
two creamy coloured candles and what looks like a CD cover and an
orange object - I think it’s a glass then to the right of that looks like and
empty wine bottle which is green.
Image 16: Umm a sort of pine table with a strange feature which I don’t really
know what it is, with a brassy base which curves up which has three
stages which look like miniature chess boards red and white. Umm then
on the table in front of that there is a glass full of red wine. Two blue
chairs around the table but the chair furthest away the back of the chair
and the legs don’t quite match. I don’t know if it’s just me but umm
that’s it apart from a really non-descript background.
Image 17: Err same image as before but from a different angle. You can see more
of the picture umm apart from that I don’t know what else, there is a
chord to the left of the picture but apart from that I don’t know what’s
really different apart from a different angle and further away and the
shadows are falling to the right hand side of the picture.
Judge 5
DOB: 23/07/1975 Occupation: PhD Student – Geography Glasses/Contacts: None
Time of Day: 6.00pm Friday 20th July Female/Male: Male
Image 1: Err there’s a desk which goes all around the room with a lamp on it and
there’s a picture of a tree on the wall and a filling cabinet under the
desk and the window is reflecting the image back of the room. There is
a blue chair, which has wheels on the bottom and a weird crosshatched
floor that’s pink and there are cupboards above the desk, and that’s
about it. Oh and there’s shutters slanted shutters on the window.
Image 2: A red and white chess floor with a table at the back, a high table with
two high chairs and top of the table there is a red ball vase with a green
thing in it which looks like a caterpillar. There are two diamond lights
on the back wall and there are two lights on the ceiling and the walls
are black and that’s about it.
Image 3 & 19: There’s three err round objects on a newspaper one of which is an
orange which says Sunkist on it, one of which is a silver Christmas
decoration and the other is a sort of white tennis ball. The text says
windows 3 or something in the middle one and it says windows 2 in the
left hand column and there is a 2.5 in the right hand column that sticks
out and the wall is pink behind it.
Image 4 & 20: In the foreground there is a red candle to the left, there’s a sort of a gift
box that is red and green striped in front. Then behind that there is a
yellow ball from a pool table and there’s a makeup mirror that’s black
that’s sort of open behind the cue ball. To the right there’s like a glass
or something and a pool cue and the wall behind it is a sort of light
yellow beige colour.
Image 5 & 11: It’s a sort of future waiting room. In the centre is a yellow chair which
faces both ways and in front of that there is a chrome table with a glass
on it to the left hand side. umm there’s sort of spot lights which hang
down from the ceiling to the sides. On the rear wall there is a colourful
spectrum sculpture thing. To the sides of both of them are reflective
sculpture things on the walls, one sort of silver and one is sort of more
golden. To the left on the left hand wall there is a weird mad sculpture,
which is blue and sort of sticks out. The carpet is a grey fleck kind of
like the carpet in here.
Image 6: This is a kind of repeat of the image before. Instead to the left though
the red thing seems to be the top of a canister, to the right is a white cup
or something. Behind that what we couldn’t see before is a purple ball.
What I thought was a pool cue is now a straw. The makeup mirror,
yellow cue ball and gift box, which is red and green, are all still there.
It’s in the corner of a room with beige walls.
Image 7: It’s a sort of milk bar thing with five stools in front of it with pink tops
and gold bases and there is a gold bar that you can put your feet on. In
the room itself there is a white table with blue chairs. The walls are
painted a sort of green with pink purple stripes going downwards. On
the wall behind above are sort of cabinets where you would keep the
food. And it’s called Rory’s and there is two columns of prices on the
menu thing behind the counter.
Image 8, 14 & 18: This is a closer image of the one with the desk. You can see in more
detail the yellow chair with the rollers. There’s a pink mug on the desk
and the spot light is highlighting a book that is open underneath it. You
can see the tree picture on the wall at the back.
Image 9 & 13: It’s a black kitchen top with beige tiles behind it. There’s a white mug,
a coffee grinder that says coffee on it, and a cafetierre in the
foreground. Above that there is a shelf that’s got another mug and a cup
to the left hand side of it.
Image 10: There’s a yellow bench seat in the foreground and then there’s a small
coffee table and with a mug on it. Then there’s another yellow chair, a
yellow desk behind that and another yellow chair behind that. On the
yellow desk there is a white mug and a red blob thing. Behind that on
the wall it says art gallery in big letters. There’s a blue painting hanging
on the wall next to the desk. There are spot lights above the desk and to
the left hand side wall is an opening and through that there is a gold
thing on the rear wall behind that and there is sort of two wall lights
hung by the opening and by the desk there is a window.
Image 12 & 21: It’s a corner of a room to the left is like a lamp stand, then there’s a
speaker which is on a stand, then there is a brown table with a sort of
stereo on top of it and on top of that is one black cylindrical object and
one more lighter coloured cylindrical object and then there is a CD
lying next to the unit on the table. umm that’s it.
Image 15: Corner of a room again on the back of the wall there is a big bright
painting by umm looks like it’s that American bloke that paints in that
style but I can’t remember his name. Then there is a coffee table thing,
there is a CD sleeve on top with two white candles and an orange cup
with a green wine bottle to the right.
Image 16: It’s a table shot again. There are two blue chairs and the table is fake
wood looking. On top of the table is a weird futuristic chess game
thing, which has three levels. The frame is gold and the things are red
and white - the three sorts of chess bits. Then there is a wine glass with
a rose wine in it.
Image 17: This is a repeat of the upper scene onto the coffee table with the
painting in the background and the bloke was I think called Hopper.
There is the toy story video again on the bottom, the coffee table thing
has wheels, black wheels, and it is coloured beige and again two
candles on the thing with the yellow mug and the CD case and the
green wine bottle on the floor.
Judge 6
DOB: 21/10/73 Occupation: Research Assistant - Computer Science
Glasses/Contacts: None
Time of Day: 2.30pm Female/Male: Male
Image 1: It is an office with two chairs and a curved table that is in a C shape.
There is a set of draws at one end and a cabinet on the wall as well. At
the back of the office there is a metal shiny blind. On the table there is a
pink mug and a lamp which is shining on what looks like a book.
Image 2: There only seems to be on wall to this room…weird! The other walls
are black so it looks like they aren’t there. On the wall that isn’t black
there is a mirror and two diamond lights. In front of it there is a table
with a vase on it and two stools. The floor is chequered red and white.
Image 3 & 19: There is an orange, a tennis ball and a shiny Christmas bauble all sat on
a piece of newspaper or a book.
Image 4 & 20: There’s a glass with a straw in it, a yellow cue ball, a red beaker, the
red one is to the left and the white one is to the right. umm There’s a
mirror which has a black rim, kind of like one of those lady’s makeup
mirrors. Oh and there is a green and orange striped box in the
foreground.
Image 5 & 11: This looks like an arty modern room, with lots of futuristic paintings
and sculptures on the walls. In the centre there are two yellow chairs
back to back and in front of them there is a table.
Image 6: It’s the same scene again, the one with the objects in the corner of the
room. There’s the glass and straw, the orange and green box. There is a
dark coloured ball which we couldn’t see before, a red beaker on the
left hand side of the image, another ball in the centre and of course the
mirror.
Image 7: Rory’s café. There is a high counter with five swivel stools in front of
it. Behind the counter there is a white cabinet with lots of glasses
stacked on top of it. On the wall there is the menu. Oh and to the front
of the picture there is a round table with some blue chairs.
Image 8, 14 & 18: There is a rounded oak table which curves in an L shape. On the table is
a pink mug and there is something bright in the corner but I don’t know
what it is. Looks like maybe a bright piece of white paper, but it doesn’t
look very realistic. There is a swivel yellow chair.
Image 9 & 13: A coffee mug, coffee grinder and a coffee glass pot – one of those
plunger things. They are all sat on a marble work surface.
Image 10: This is a reception area for an art gallery. There is a yellow table with
yellow chairs to the back of the entrance. A blue piece of art on the wall
and another yellow chair but this time it is a long bench to the front of
the picture. Then there is the entrance way which leads into the art
gallery.
Image 12 & 21: It’s a corner of what looks like a sitting room. There is a table with a
CD and amplifier on it. There are two beakers, one dark and one light
coloured. A speaker on its stand in the centre of the image and a lamp
to the left of it.
Image 15: This image looks down onto a table top – there are two candles, an
orange cup, the ones that tango were giving a way a few years ago, a
CD case and a green bottle of wine. There is a sailing picture on the
wall.
Image 16: There is a weird sculpture sitting on a round wooden table next to a
glass of red wine. Two chairs are pulled up to the table.
Image 17: A picture of a sailing scene is on the wall, in front of this there is a table
with candles, an orange thing and a leaflet. Beneath is a blue box,
maybe a video. On the floor next to the table is a green bottle.
Judge 7
DOB: 30/03/76 Occupation: PhD Student - Computer Science
Glasses/Contacts: None
Time of Day: 2.00pm Female/Male: Female
Image 1: An office with two chairs, one blue, one yellow. On the table is a book,
a pink mug and a lamp.
Image 2: A fancy looking bar, but it doesn’t have any side walls! The two chairs
and table look quite high to sit on. There is a mirror on the back wall
which is reflecting the vase and flowers on the table.
Image 3 & 19: A tennis ball, an orange and a Christmas tree decoration all sat on a
piece of text.
Image 4 & 20: There are a lot of objects all crammed into the corner of a room. There
is a stripy box, a red cup on the left hand side of the image. umm a
mirror at the back, a glass with a straw in it and what looks like a
yellow pool table ball.
Image 5 & 11: It looks like a futuristic room, too clean for my liking, either that or a
weird art gallery with lots of sculptures and paintings. There are two
wavy paintings, one gold and one silver, either side of a stripy rainbow
coloured painting. In the centre of the room there are two chairs back to
back and a table.
Image 6: It’s the corner with all those objects again but this time it’s from a
different view. You can see all the same objects as in the other one,
that’s the stripy box, the mirror at the back, the yellow cue ball
ummmm the glass with the straw in it, a red cup and a white beaker.
Image 7: A café, but it looks very American, with pink stools at the counter and
stripy walls. In the foreground there is a table with 4 blue chairs.
Image 8, 14 & 18: It’s the office scene that I’ve seen before but a lot closer than before.
This time you can only see the yellow chair and the table. Again on the
table is the pink mug and the book but this time the lamp is out of view.
Image 9 & 13: A kitchen scene with a shelf with mugs on it. Also there is a cafetierre
and a coffee grinder.
Image 10: Oh it was an art gallery then, the image before. But this time it is a
different room – looks like more the reception area. You can still see
the gold wavy painting but this time there is a blue one as well. There
are several of the yellow chairs and a yellow table too. Above the table
is a sign saying art gallery, that’s why I know it’s now an art gallery!!
Image 12 & 21: There is a standing lamp, which is on, shining very brightly. Next to it
is a speaker. Then there is a table with a CD player on it, a black tube
and ummmm a CD, oh and the carpet is a kind of pink brown colour.
Image 15: It’s the living room I saw before but from a different angle, this time we
are looking more down onto the trolley. Again on the trolley are two
candles, an orange cup and a CD case. Next to the trolley is the green
wine bottle.
Image 16: There is a round oak table with a glass of red wine and what looks like
a weird cake stand or is it a three layered chess board? Anyway
whatever it is, it has 3 chequered red and white layers. Around the table
are two blue chairs.
Image 17: It’s a living room scene with a trolley in the corner. On the trolley is an
orange cup, a CD case and two white candles. There is a lower shelf
and on that is a video. Next to the trolley on the floor is a green wine
bottle.
Judge 8
DOB: 15/02/54 Occupation: Computer Science Departmental Secretary
Glasses/Contacts: Glasses Time of Day: 3.00pm Female/Male: Female
Image 1: This is an office at night time because the shutters on the window are
open but it’s dark outside. There are two chairs separated by a
‘breakfast bar’ desk, a bright lamp shinning onto a book and a pink mug
on the desk.
Image 2: This looks like a new bar with high chairs and a table. There are two
diamond shaped lights on the wall and a circular mirror which is
reflecting what looks to be a vase with a flower in it.
Image 3 & 19: There are three types of balls on a piece of newspaper, an orange, a
Christmas bauble and a tennis ball.
Image 4 & 20: There’s a kind of striped cushion, a circular red object. A yellow ball,
maybe a beach ball, a stick and a mirror which reflects the scene. I
would say that the scene is of furniture, modern furniture in a room.
Image 5 & 11: This looks like a modern art gallery, principally for weird sculptures,
the view is looking through a doorway. In the mid-ground there is a
table with a glass on it suggesting that the gallery has recently hosted a
social. Behind this table there are two yellow chairs with theirs backs to
each other. There are several paintings and sculptures located in both
corners of the room.
Image 6: There is a collection of objects clustered into a corner of a desk or
counter. There is a yellow ball, a striped box, a stick, a red top of a
spray can and a mirror.
Image 7: This looks like a modern lounge, with a sound system consisting of a
speaker and amplifier and a CD. There is a bright lamp and some
containers sitting on the amplifier.
Image 8, 14 & 18: This is an office scene with an oak table. There is a pink mug on the
table along with a piece of paper and there is a yellow chair.
Image 9 & 13: This is a shiny kitchen. There is a cafetierre and a coffee grinder along
with a white mug. Oh and there are another couple of mugs on the shelf
about the work surface.
Image 10: This looks like the reception area of the art gallery we saw before.
There are several chairs and a desk so it looks like this is where the art
is sold or where meetings are held. There are some art pieces on the
wall and a big sign saying art gallery.
Image 12 & 21: This is of a corner of a room. There’s a record player on a table, a glass
vase, a speaker and its light room as there is a light stand giving off a
lot of light.
Image 15: This seems to be a strange arrangement of objects in the corner of a
living room on what seems to be a table. There is a huge painting of a
lighthouse and sailing boat on the wall and a wine bottle on the ground.
Image 16: There is an oak table with a glass of red wine on one side and on the
other a strange type of cake stand made of brass. It has three layers to it
which are chequered in red and white. There are also two blue/grey
chairs.
Image 17: This is a corner of a room with a trolley and lots of stuff on it. There is
an orange cup, two white candles and a CD. On the lower shelf of the
trolley is a video and next to that is a green bottle.
Judge 9
DOB: 12/08/1975 Occupation: Hairdresser Glasses/Contacts: None
Time of Day: 7.00pm Female/Male: Male
Image 1: It’s a scene of an office. There is a desk which goes around the room in
a sort of curve shape. On it there’s a pink mug and a lamp. There are
two chairs, a filing cabinet underneath the desk, a picture of a tree on
the wall and two cupboards on the wall above the desk.
Image 2: Looks like a futuristic cafe although it’s very dark and I wouldn’t want
to go there. There is a table in the centre with two chairs; well they are
kind of high stools really. On the table is a vase with a green swirl thing
coming out of it. There are two diamond lights on the back wall and the
floor is red and white, I think that’s it. It’s a dark image and doesn’t
look very realistic.
Image 3 & 19: There are three ball objects on a piece of text. The balls were an orange
which said Sunkist, a tennis ball and a Christmas silver decoration.
Image 4 & 20: There are lots of objects in this with no kind of theme. There is a red
thing to the left, not sure what it is, maybe a plastic spray can top. There
is a stripy green and red box in the centre of the image, a mirror at the
back with a yellow cue ball in front of it and a long brown thing coming
out of what looks like a glass at the back to the right of the mirror.
Image 5 & 11: It’s a scene looking through and archway looking at lots of futuristic
looking things. There are two paintings at the back, one gold and one
silver, a rainbow picture thing in the centre. In front of that there is a
yellow chair and a chrome table. Errr the carpet is a blue speckled black
& grey colour.
Image 6: It’s the scene that we had before, the one with lots of objects and you
can now see that they were all in the corner of a beige coloured room.
There is the red thing which now looks more like a top of a shaving
foam can, a white beaker, the stripy box again. Then behind them is the
mirror with the ball on it and to the right is the glass with a straw in it.
Image 7: This looks like a very cheesy American dinner. It has a sign above the
bar saying Rory’s and it looks like it would display the prices or the
menu. In front of the bar are stools to sit on and in front of that is a
table with some chairs around it. To the left is a pillar which is a yellow
beige colour and the walls are a pink and green stripy thing.
Image 8, 14 & 18: We saw this image at the start, it’s the one of an office but this time it’s
a lot closer, more focused on the pink cup on the desk. You can just see
the picture of the tree still. Ummmm there is something bright white to
the right but I can’t work out what it is.
Image 9 & 13: This is definitely a kitchen. It has beige tiles and there is a coffee
grinder, a cafetierre and a mug on the worktop. Oh and there is a shelf
above with two mugs on it.
Image 10: This is of an art gallery but I only got that cause it says art gallery in big
letters on the wall at the back. Below that there’s a desk with two
yellow chairs and another yellow chair at the front of the image with a
smaller coffee table. There is a window to the right which is shining
onto the big table and a blue picture thing on the left hand wall. There’s
also an archway which leads through to another room where there is a
gold thing on the wall and something dark too.
Image 12 & 21: This looks like my sitting room. There is a free standing lamp which is
shining brightly, a speaker, a table which is brown. On that there is a
black thing which I think is a stereo, in front of that there is a CD.
Image 15: There is a white table trolley thing with objects on it. There is an orange
cup thing, one of those ones which you can fold flat, a CD sleeve and
two candles. Next to the trolley is a green wine bottle. On the wall
above all of this is a painting of a boat and a beach with what looks like
a lighthouse on it.
Image 16: Another futuristic looking thing - I think it’s meant to be a chess game
of some sort. It has three layers with red and white checkers. The base
of it which curves up to each layer is a brass colour and it is sitting on a
pine table. In front of that is a full glass of red wine. There are two
chairs.
Image 17: It’s that trolley table scene but at a lower angle, more looking at it all
from the side. Again there’s the picture on the wall, a green wine bottle.
And then ummm on the table there are two candles that orange thing
and a CD sleeve.
Materials A.2 – Instructions and Questionnaire
Please fill out this form.
For the observers
DOB:
Occupation:
Glasses/Contacts:
Time of Day:
Female/Male:
Person Number:
Remember for the results of this experiment to be accurate I need you not to
discuss with anyone what you did in this room until all the experiments have been
carried out.
You will be shown two images, with either what I call a mudsplashed image,
which is the image with random sized checkered blobs over the image, or a grey image
in between them. Now these two images may be the same image or there may be an
alteration of some type. What I want you to do is say stop when you know what it is
that is being altered or that there is nothing changing.
Don’t say stop until you are sure what exactly is being altered between the
images or that there is no change. I don’t want you to say stop when you just note that
something is changing, for I will need you to describe verbally to me what it is that is
actually changing. The images will keep on altering between the original and the altered
image with either the mudsplashes or the grey image between for 60 seconds or until
you say stop. Each image is only displayed for approximately 240ms before the next
image is flashed up. Once all four images have been displayed the cycle is repeated. If
you haven’t said stop in that 60 seconds we will move straight onto the next image set.
Here are some still examples:
A Mudsplashed Example
[Figure: example image sequence with alternating display times of 240ms and 290ms]
A Flicker Example
[Figure: example image sequence with alternating display times of 240ms and 290ms]
There are 10 different sets of images all together that you will be shown.
If you do not understand any of this please ask now before we start the experiments.
Image 1
Time:
Description:
Image 2
Time:
Description:
Image 3
Time:
Description:
Image 4
Time:
Description:
Image 5
Time:
Description:
Image 6
Time:
Description:
Image 7
Time:
Description:
Image 8
Time:
Description:
Image 9
Time:
Description:
Image 10
Time:
Description:
Do you have any general comments about the images that you have just seen?
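For reference, the presentation cycle described in these instructions (original image, intervening mudsplashed or grey image, altered image, repeating for up to 60 seconds or until the observer says stop) can be sketched as a simple timing loop. This is an illustrative sketch only, written in Python; show_image() is a stub rather than the software actually used, and assigning the 240ms and 290ms durations to the scene and intervening images respectively is an assumption based on the example timings above.

    import time

    SCENE_MS = 240       # each scene image is displayed for approximately 240ms
    MASK_MS = 290        # assumed duration of the grey or mudsplashed image
    TRIAL_LIMIT_S = 60   # a trial ends after 60 seconds or when the observer says stop

    def show_image(name):
        # Stub standing in for the actual image display routine.
        print("displaying", name)

    def run_trial(original, altered, mask, observer_stopped=lambda: False):
        # Cycle original -> mask -> altered -> mask until timeout or "stop".
        sequence = [(original, SCENE_MS), (mask, MASK_MS),
                    (altered, SCENE_MS), (mask, MASK_MS)]
        start = time.time()
        while time.time() - start < TRIAL_LIMIT_S and not observer_stopped():
            for frame, duration_ms in sequence:
                show_image(frame)
                time.sleep(duration_ms / 1000.0)
        return time.time() - start   # approximate time before the observer said stop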
APPENDIX B
Materials B.1 – Instructions for free viewing of the animation in
chapter 5
Free For All Instructions
You will be shown two 30 second animations of a series of 4 rooms. During the
animations look around the rooms, see what you can see and generally enjoy the
animation.
There is a visual countdown before each animation starts. Both animations will last
approximately 30 seconds. The second animation will start after the first one has
finished.
After the animations you will be asked to fill out a questionnaire about what you have
seen. This will be carried out on the table outside this room.
If you do not understand any of the instructions please ask now!
Materials B.2 – Instructions for observers performing the counting
pencils task in chapter 5
Counting Pencils Instructions
You will be shown two 30 second animations of a series of 4 rooms. During the
animations I want you to count the number of pencils located in the mug on the table in
the centre of each room. Therefore there are 4 mugs each with a different number of
pencils in them. As soon as you know the number of pencils say it out loud, then start
counting the next set of pencils.
Initially you will start far away from each mug but gradually you will get closer until
you’ll fly right over the mug towards the next room. Thus the pencils (and mug) will
increase in size, making them easier to count. Warning: there are some red herrings in the
mugs as well, namely paintbrushes, just to make life that much harder. So make sure you
have counted the pencils and not the paintbrushes before you give your final answer for
each room.
There is a visual countdown then a black image with a white mug, see below, is
displayed for 1 second. This will alert you to the location of the first mug, then the first
animation will start. Each animation will last approximately 30 seconds. Once the first
animation has been played then the second animation will be started with the visual
countdown. Again a black image with white mug is displayed to alert you to the
location of the first mug.
After the animations you will be asked to fill out a questionnaire about what you have
seen. This will be carried out on the table outside this room.
If you do not understand any of the instructions please ask now!
Materials B.3 – First attempt at the questionnaire for the experiment
in chapter 5; it was rewritten (see Materials B.4) because the initial
participants were unable to answer the questions.
Questionnaire
For the questions please be honest and DON’T GUESS! Don’t worry if you can’t
remember just circle the Don’t Remember. If you don’t understand a question please
ask.
Can I remind you for this experiment to work PLEASE don’t discuss what you did with
your friends as it will affect the results.
THANK YOU for doing this experiment and helping me get
ever closer to achieving my PhD!
Kirsten
Personal Details:
Name:
Age:
Male/Female:
Rating of Computer Graphics Knowledge:
1(least knowledge) - 5 (most knowledge)
Circle which rating most suits you:
1
2
3
4
5
Example scale:
1- Admires computer animations, e.g. Toy Story, but haven’t a clue how they are
computed.
2- Have heard computer graphics concepts talked about but don’t understand what
they mean.
3- Attending computer science course but haven’t covered graphics in detail yet.
4- Attended a computer graphics course.
5- Researcher in computer graphics.
Questions:
1a) Ignoring the alteration of objects on the central table, were all the rooms
the same?
Yes
No
Don’t Know
If circled Yes or Don’t Know please go straight to question 2, else answer 1b)
1b) Which of the rooms did you feel was different and why?
Room 1
Room 2
Room 3
Room 4
2) What colour was the mug and what was written on it?
Room 1:
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
3a) How many books were there?
Room 1:
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
3b) What were the titles of the books?
Room 1:
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
3c) What colour was each book?
Room 1:
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
4) How many paintings were there in each room and what were they of?
Room 1:
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
5) Apart from the objects on the main table and the paintings what other
furnishings were in each room?
(Listing the object, what colour it was and where it was situated in the room)
E.g.
Room 1:   Object      Colour   Position
          Bookcase    Brown    Against the left wall

Room 1:   Object      Colour   Position
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
6) How many lights were in each room?
Room 1:
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
7) What colour was the carpet?
Room 1:
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
8) What colour were the walls?
Room 1:
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
9) What colour were the frames around the paintings?
Room 1:
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
10) What colour were the light fittings above the pictures?
Room 1:
Don’t Remember
Room 2:
Don’t Remember
Room 3:
Don’t Remember
Room 4:
Don’t Remember
11) Apart from the books and mugs with pencils, what objects appeared on the
central table (and in which room)?
Answer:
Don’t Remember
Any Other Objects
12) What out of all the objects in the rooms can you remember best – i.e. which
sticks in your mind?
Answer:
Nothing
13) Did anything strike you as being odd or ‘non-realistic’?
Is there anything in the animation that distracted your perception of the scene?
Materials B.4 – Final questionnaire for experiment in chapter 5, for
observers who watched the animations or performed the visual task,
counting pencils, or performed the non-visual task, counting
backwards from 1000 in steps of two.
Questionnaire
For the questions please be honest and DON’T GUESS! Don’t worry if you can’t
remember just circle the Don’t Remember. If you don’t understand a question please
ask.
Can I remind you for this experiment to work PLEASE don’t discuss what you did with
your friends as it will affect the results.
THANK YOU for doing this experiment and helping me get
ever closer to achieving my PhD!
Kirsten
Personal Details:
Name:
Age:
Male/Female:
Course:
Rating of Computer Graphics Knowledge
1(least knowledge) - 5 (most knowledge)
Circle which rating most suits you:
1
2
3
4
5
Example scale:
1- Admires computer animations, e.g. Toy Story, but haven’t a clue how they are
computed.
2- Have heard computer graphics concepts talked about but don’t understand what
they mean.
3- Attending computer science course but haven’t covered graphics in detail yet.
4- Attended a computer graphics course.
5- Researcher in computer graphics.
Questions:
1a) Ignoring the alteration of objects on the central table were both animations the
same?
Yes
No
Don’t Know
If circled Yes or Don’t Know please go straight to question 2, else answer 1b)
1b) What did you feel was different between the two and in which animation?
Changing Location of objects
Yes
No
Don’t Know
If yes please give a reason see example:
E.g. bookcase was on the right wall in the second animation whereas it was on the left wall
in the first animation.
Changing Colour of objects
Yes
No
Don’t Know
If yes please give a reason see example:
E.g. bookcase was green in the second animation whereas it was blue in the first
animation.
Alteration of quality (rendering)
Yes
No
Don’t Know
If yes please give a reason see example:
E.g. There was a badly rendered shadow from the bookcase in the first animation
compared to the second animation
Speed
Yes
No
Don’t Know
If yes please give a reason see example:
E.g. The first animation was faster compared to the second animation
2) What colour was the mug and what was written on it?
Animation 1:
Don’t Remember
Animation 2:
Don’t Remember
3 a) How many books were there?
Animation 1:
Don’t Remember
Animation 2:
Don’t Remember
3 b) What were the titles of the books?
Animation 1:
Don’t Remember
Animation 2:
Don’t Remember
3 c) What colour was each book?
Animation 1:
Don’t Remember
Animation 2:
Don’t Remember
4) How many paintings were there in each room and what were they of?
Animation 1:
Don’t Remember
Animation 2:
Don’t Remember
5) Apart from the objects on the main table and the paintings what other
furnishings were there in each animation?
(Listing the object, what colour it was and where it was situated in the room)
E.g.
Animation 1:   Object      Colour   Position
               Bookcase    Brown    Against the left wall

Animation 1:   Object      Colour   Position
Don’t Remember
Animation 2:
Don’t Remember
6) How many lights were in each room, in the animations?
Animation 1:
Don’t Remember
Animation 2:
Don’t Remember
7) What colour was the carpet?
Animation 1:
Don’t Remember
Animation 2:
Don’t Remember
8) What colour were the walls?
Animation 1:
Don’t Remember
Animation 2:
Don’t Remember
9) What colour were the frames around the paintings?
Animation 1:
Don’t Remember
Animation 2:
Don’t Remember
10) What colour were the light fittings above the pictures?
Animation 1:
Don’t Remember
Animation 2:
Don’t Remember
11) Apart from the books and mugs with pencils, what objects appeared on the
central table?
Animation 1:
Don’t Remember
Any Other Objects
Animation 2:
Don’t Remember
Any Other Objects
12) What out of all the objects in the rooms can you remember best – i.e. which
sticks in your mind?
Answer:
Nothing
13) Did anything strike you as being odd or ‘non-realistic’?
Is there anything in the animation that distracted your perception of the scene?
Materials B.5 – Instructions for the non-visual task for the animation
in chapter 5
Non-Visual Task Instructions
You will be shown two 30 second animations of a series of 4 rooms. During each
animation you will be asked to count down backwards from 1000 in steps of 2
continuously from the end of the countdown until each animation has finished. For
example 1000, 998, 996, 994, 992, 990, 988, 986, 984, 982, 980 etc… If
this task is easier counting in your own native language then please do so. When
counting backwards please keep looking at the monitor and the animation being played.
There is a visual countdown before each of the animations start. Each animation will
last approximately 30 seconds. Once the first animation has completed, the next
animation will be started, again with a visual countdown. Once the countdown has
finished then start counting down backwards again from 1000.
After the animations you will be asked to fill out a questionnaire about what you have
seen. This will be carried out on the table outside this room.
If you do not understand any of the instructions please ask now!
APPENDIX C
Materials C.1 – Questionnaire for the Pilot Study to deduce the visible
threshold value of resolution
QUESTIONNAIRE FOR THRESHOLD
VALUES:
Dear Participant,
RESEARCH PARTICIPANT CONSENT FORM
Title: ‘Performing Tasks in Computer Graphical Scenes’
Kirsten Cater,
Department of Computer Science,
Merchant Venturers Building,
Woodland Road, Bristol, BS8 1UB
Tel: 01179545112
Email: [email protected]
Purpose of Research: To investigate performance of tasks in computer graphical scenes.
It is imperative to mention at this point that in accordance with the ethical implications
and psychological consequences of your participation in our research, there will be a
form of deception taking place; however, all participants will be debriefed afterwards
about the actual purpose of this empirical investigation. Moreover, all the information
that will be obtained about you is confidential and anonymity can be guaranteed.
Furthermore, you can withdraw your participation at any time if you so wish. Finally, you
have the right to be debriefed about the outcome of this study. You can thus contact the
researcher, who will be willing to give you details of the study and the outcome. Having
made those various points, please sign the following consent form, which states that you
agree to take part in the current study.
I HAVE HAD THE OPPORTUNITY TO READ THIS CONSENT FORM, ASK
QUESTIONS ABOUT THE RESEARCH PROJECT AND I AM PREPARED TO
PARTICIPATE IN THIS PROJECT.
____________________________
Participant’s Signature
____________________________
Researcher’s Signature
PERSONAL DETAILS:
AGE:
FEMALE/MALE
HAVE YOU DONE ANY EXPERIMENT INVOLVING PSYCHOPHYSICS AND
COMPUTER GRAPHICS BEFORE? (Please circle answer)
Yes
No
RATING OF COMPUTER GRAPHICS KNOWLEDGE: (Please circle answer)
1
- Admires computer animations, e.g. Toy Story, but haven’t a clue how
they are computed
2
- Have heard computer graphics concepts talked about but don’t
understand what they mean
3
- Attending computer science course but haven’t covered graphics in
detail yet
4
- Attended a computer graphics course
5
- Studying for/have got a PhD in Computer Graphics
QUESTIONS:
Can you please indicate whether there were any rendering quality differences
between the two images? (Please circle the answer)
Trial 1:
YES
NO
Trial 2:
YES
NO
Trial 3:
YES
NO
Trial 4:
YES
NO
Trial 5:
YES
NO
Trial 6:
YES
NO
Trial 7:
YES
NO
Trial 8:
YES
NO
Trial 9:
YES
NO
Trial 10:
YES
NO
Trial 11:
YES
NO
Trial 12:
YES
NO
Trial 13:
YES
NO
Trial 14:
YES
NO
Trial 15:
YES
NO
Trial 16:
YES
NO
Trial 17:
YES
NO
Trial 18:
YES
NO
Trial 19:
YES
NO
Trial 20:
YES
NO
Trial 21:
YES
NO
Trial 22:
YES
NO
Trial 23:
YES
NO
Trial 24:
YES
NO
Thank you very much for your participation!
Materials C.2 – Instructions for counting the teapots in the peripheral
vision vs inattentional blindness experiment
Counting Teapots Instructions
You will be shown two images of a room. Each of the images has teapots in it, similar
to those in the image on this page. The teapots are a variety of colours and are located
all over the image. Your task is to count as quickly as you can the number of teapots
located in the image. As soon as you know how many teapots there are in the image say
it out loud. You must count the teapots as quickly as you can, for the images are only
displayed for 2 seconds!
Warning: there are some red herrings in the images as well, namely similar objects and
colours, just to make life that much harder. So make sure you have counted the teapots!
You will be given time for your eyes to adjust to the lighting conditions, then when you
are ready the experiment will be started. Firstly you will be shown a black image after
which the first image will appear, you’ll count the number of teapots and say out loud
how many you counted. Then another black image will appear, followed by the second
image. Again you’ll count the number of teapots and state out loud how many you have
counted. Finally the black image is shown once more.
After the images you will be asked to fill out a questionnaire about what you have seen.
This will be carried out on the table outside this room.
If you do not understand any of the instructions please ask now!
Materials C.3 – Questionnaire for the peripheral vision vs
inattentional blindness experiment
QUESTIONNAIRE
Dear Participant,
RESEARCH PARTICIPANT CONSENT FORM
Title: ‘Performing Tasks in Computer Graphical Scenes’
Kirsten Cater,
Department of Computer Science,
Merchant Venturers Building,
Woodland Road, Bristol, BS8 1UB
Tel: 01179545112
Email: [email protected]
Purpose of Research: To investigate performance of tasks in computer graphical scenes.
It is imperative to mention at this point that in accordance with the ethical implications
and psychological consequences of your participation in our research, there will be a
form of deception taking place; however, all participants will be debriefed afterwards
about the actual purpose of this empirical investigation. Moreover, all the information
that will be obtained about you is confidential and anonymity can be guaranteed.
Furthermore, you can withdraw your participation at any time if you so wish. Finally, you
have the right to be debriefed about the outcome of this study. You can thus contact the
researcher, who will be willing to give you details of the study and the outcome. Having
made those various points, please sign the following consent form, which states that you
agree to take part in the current study.
I HAVE HAD THE OPPORTUNITY TO READ THIS CONSENT FORM, ASK
QUESTIONS ABOUT THE RESEARCH PROJECT AND I AM PREPARED TO
PARTICIPATE IN THIS PROJECT.
____________________________
Participant’s Signature
____________________________
Researcher’s Signature
PERSONAL DETAILS:
AGE:
FEMALE/MALE
HAVE YOU DONE ANY EXPERIMENT INVOLVING PSYCHOPHYSICS AND
COMPUTER GRAPHICS BEFORE? (Please circle answer)
Yes
No
RATING OF COMPUTER GRAPHICS KNOWLEDGE: (Please circle answer)
1
- Admires computer animations, e.g. Toy Story, but haven’t a clue how they are
computed
2
- Have heard computer graphics concepts talked about but don’t understand
what they mean
3
- Attending computer science course but haven’t covered graphics in detail yet
4
- Attended a computer graphics course
5
- Studying for/have got a PhD in Computer Graphics or similar related field
Part 1:
1. Can you please indicate the realism of the images? (Please answer this question
by indicating your agreement or disagreement on the following 5-point scale
ranging from 1 (not realistic at all) to 5 (very realistic):
Image 1:   1   2   3   4   5
           (1 = Not realistic at all, 5 = Very realistic)
Image 2:   1   2   3   4   5
           (1 = Not realistic at all, 5 = Very realistic)
2. Can you please indicate whether there were any observed differences between
the images? (If yes please use the space provided in order to justify your
response):
YES
NO
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………
3. Can you please rate the rendering quality of the two images? (Please answer this
question by rating your agreement or disagreement on the following 5-point
scale ranging from 1 (High Quality) to 5 (Low Quality):
Image 1:   1   2   3   4   5
           (1 = High Quality, 5 = Low Quality)
Image 2:   1   2   3   4   5
           (1 = High Quality, 5 = Low Quality)
Part 2:
4. Can you please indicate which of the following items you remember seeing in
the scene? (Please place a tick on the appropriate box (es)).
Vase
Computer
Pens
Phone
Chair
Toy Car
Books
Video
Teapot
Bottle
Pictures
Clock
Crayons
Mr Potato Head from Toy Story
Palette
Teacup
Ashtray
Photo frame
5. Did you notice any difference in the quality of the colour of the teapots in the
two images? (If yes please justify your answer in the space provided below).
YES
NO
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………
6. Did you notice any difference in the quality of the colour of the pictures in the
two images? (If yes please justify your answer in the space provided below).
YES
NO
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………
7a. Out of the two images shown to you by the experimenter, which was the first
image? (Please circle the answer, if you are not sure circle don’t know)
Image 1
or
Image 2
Don’t Know
7b. Out of the two images shown to you by the experimenter, which was the second
image? (Please circle the answer, if you are not sure circle don’t know)
Image 3
or
Image 4
Don’t Know
Thank you very much for your participation!