Detail to Attention: Exploiting Limits of the Human Visual System for Selective Rendering

Kirsten Fiona Cater

A thesis submitted to the University of Bristol, UK in accordance with the requirements for the degree of Doctor of Philosophy in the Faculty of Engineering, Department of Computer Science.

May 2004 (c. 47,000 words)

Abstract

The perceived quality of realistic computer graphics imagery depends on the physical accuracy of the rendered frames, as well as the capabilities of the human visual system. Fully detailed, high fidelity frames may still take many minutes, even hours, to render on today's computers. The human eye is physically incapable of capturing a whole scene in full detail. Humans sense image detail only in a 2 degree foveal region, relying on rapid eye movements, or saccades, to jump between points of interest. The human brain then reassembles these glimpses into a coherent, but inevitably imperfect, visual perception of the environment. In the process, humans literally lose sight of unimportant details. This thesis demonstrates how properties of the human visual system, in particular Change Blindness and Inattentional Blindness, can be exploited to accelerate the rendering of animated sequences by applying a priori knowledge of a viewer's task focus. This thesis shows, via several controlled psychophysical experiments, how human subjects will consistently fail to notice degradations in the quality of image details unrelated to their assigned task, even when these details fall under the viewer's gaze. This thesis has built on these observations to create a perceptual rendering framework, which combines predetermined task maps with spatiotemporal contrast sensitivity to guide a progressive animation system that takes full advantage of image-based rendering techniques. This framework is demonstrated with a Radiance ray tracing implementation that completes its work in a fraction of the normally required time, with few noticeable artefacts for viewers performing the task.

Declaration

I declare that the work in this thesis is original and no portion of work referred to here has been submitted in support of an application for another degree or qualification of this or any other university or institution of learning.

Signed: Date: Kirsten Cater

Acknowledgements

Above all I must thank Alan Chalmers for his GREAT supervision, encouragement and enthusiasm for the work that I performed in this thesis. Thank you to Greg Ward for his never-ending knowledge and friendship; the selective rendering framework discussed in Chapter 6 would not have been possible without him. Thanks also to Prof. Tom Troscianko for his invaluable comments and advice in designing the psychophysical experiments. I am truly grateful to everyone that participated in the pilot studies and experiments; without them there would not have been any results for this thesis. A special thanks to all my colleagues and friends in the Department of Computer Science in Bristol who have helped me over the years, Kate, Patrick and Pete and the rest of the Graphics group, Georgie (you're a star and a good friend), Angus, Sarah, Oli (and Emma), Barry, Henk, Eric, Andrew, Becky, Lucy, Hayley and the rest of my willing performers for the Christmas pantos! As well as Colin (and his lovely family, Helen, Nia and Kyran), Neill, Mish and Steve (good luck in Oz, you'll be missed), to mention but a few! Thank you to Prof. David May for the continued support throughout my degree, PhD and now my postdoc.
Thanks as well to my new colleagues at Mobile Bristol for giving me the time and peace of mind to actually finish my thesis; I appreciate their patience. Thanks to all my friends involved with ACM SIGGRAPH, particularly Thierry, Scott and Etta, as well as Simon, Karol, Ann, Katerina, Nan, Alain, Judy, Reynald, Fredo, and the rest of the graphics/rendering community that I have got to know so well over the many years of attending graphics conferences. Last but not least, thank you to Hazel for her endless friendship over so many years. My parents, John and Erlet, for giving me the best start in life anyone could wish for, as well as their continued support and for showing me how to enjoy and experience life. My brother Carl, with whom I have shared such wonderful experiences and memories and who always knows how to make me smile. Finally Jade, for showing me what true love is; I love you so much. This thesis is dedicated to them.

To my parents, brother and Jade

Contents

Abstract ii
Declaration iii
Acknowledgements iv
Dedication v
Chapter 1 – Introduction 1-6
1.1 Thesis Outline 5-6
Chapter 2 – Perceptual Background 7-35
2.1 The Anatomy of Sight - The Human Visual System (HVS) 7
2.2 Visual Perception 12
2.2.1 Visual Acuity of the Human Eye 13
2.2.2 Contrast Sensitivity 14
2.2.3 Eye Movements 15
2.3 Attention 18
2.3.1 Visual Attention 21
2.3.2 Top-down versus Bottom-up processes 22
2.3.3 Change Blindness 23
2.3.4 Inattentional Blindness 26
2.3.4.1 Inattentional Blindness in Magic 29
2.3.5 An Inattentional Paradigm 31
2.4 Summary 35
Chapter 3 - Rendering Realistic Computer Graphical Images 36-70
3.1 Local illumination 38
3.2 Global illumination 41
3.2.1 Ray tracing 42
3.2.2 Radiosity 44
3.3 Radiance 49
3.4 Visual Perception in Computer Graphics 51
3.4.1 Image Quality Metrics 51
3.4.2 Simplification of complex models 60
3.4.3 Eye Tracking 61
3.4.4 Peripheral vision 62
3.4.5 Saliency models 64
3.5 Summary 70
Chapter 4 – Change Blindness 71-95
4.1 Introduction 72
4.2 The Pre-Study 73
4.3 Main Experimental Procedure 84
4.4 Results 87
4.4.1 Statistical Analysis 92
4.5 Discussion - Experimental Issues and Future Improvements 93
4.6 Summary 94
Chapter 5 – Inattentional Blindness 96-133
5.1 Introduction - Visual Attention 96
5.2 Experimental Methodology 97
5.2.1 Creating the Animation 98
5.2.2 Experimental Procedure 102
5.2.3 Results 105
5.2.4 Analysis 108
5.2.5 Verification with an eye-tracker 109
5.2.6 Conclusions 110
5.3 Non-Visual Tasks 110
5.2.7 The psychophysical experiment 111
5.2.8 Results 112
5.2.9 Analysis 115
5.2.10 Verification with an eye-tracker 116
5.2.11 Conclusions 117
5.4 Inattentional Blindness versus Peripheral Vision 117
5.4.1 Task Maps: Experimental Validation 118
5.4.2 Creating the Selective Quality (SQ) Images 121
5.4.3 Experimental Methodology 123
5.4.4 Results 126
5.4.5 Statistical Analysis 129
5.4.6 Verification with an Eye-tracker 130
5.5 Summary 132
Chapter 6 – An Inattentional Blindness Rendering Framework 134-153
6.1 The Framework 136
6.2 Implementation 144
6.3 Summary 153
Chapter 7 – Conclusion 154-162
7.1 Contributions of this thesis to the research field 154
7.2 Future Research 158
7.2.1 Visual Attention 158
7.2.2 Peripheral Vision 160
7.2.3 Multi-Sensory Experiences 160
7.2.4 Type of Task 160
7.2.5 Varying Applications 161
7.2.6 Alterations for Experimental Methodologies 162
7.3 A Final Summary 162
Bibliography 163-178
Appendix A 179-192
Materials A.1 – Judges Responses 180-190
Materials A.2 – Instructions and Questionnaire 191-192
Appendix B 193-208
Materials
B.1 – Instructions – Free Viewing 194 Materials B.2 – Instructions – Counting Pencils 195 Materials B.3 – First Attempt Questionnaire viii 196-201 Materials B.4 – Final Questionnaire Materials B.5 – Instructions – Non-Visual Appendix C 202-207 208 209 - 218 Materials C.1 – Questionnaire – Threshold Values Materials C.2 – Instructions – Counting Teapots Materials C.3 – Questionnaire 210-212 213 214-218 ix List of Figures Page No. Figure 1.1: Some examples of photo-realistic images created in Radiance [Ward Larson and Shakespeare1998]. Figure 1.2: Some examples of the ‘blocky graphics’ that can be achieved at interactive rates in the Explorer 1, a six degrees of freedom single-seater simulator. 2 Figure 2.1: The human eye [JMEC 2003]. Figure 2.2: The peak sensitivities of the three types of cones and the rods [Dowling 1987]. Figure 2.3: The three functional concentric areas of the visual field, the fovea, the macula and focal vision [VEHF 2004]. Figure 2.4: The location of cone and rod peaks in eccentricity from the fovea [Osterberg 1935]. Figure 2.5: Original Image (left). How the eye actually views the image, as seen with the right eye focused on the centre (right). Note the yellow circle in the centre, the fovea, it is yellow due to the lack of blue cones. The black circle to its right is the location of the optic nerve, black as there are no photoreceptors in this part of the retina, and the degradation in visual acuity and colour towards the periphery [CIT 2003]. Figure 2.6: Photograph of the back of the retina to show the location of the fovea and optic nerve [CS 2004]. Figure 2.7: Finding your blind spot, i.e. the location of your optical nerve [Lightwave 2003]. Figure 2.8: Image to demonstrate the concept of experience on perception [OL 2003]. Figure 2.9: Can you see a young woman or an old hag? [OL 2003] Figure 2.10: An example of a Snellen Chart used by opticians to test a person’s visual acuity [MD Support 2004]. Figure 2.11a: Contrast sensitivity Vs. Spatial frequency [Daly 1993]. 8 8 x 3 10 10 11 11 11 13 13 14 16 Figure 2.11b: The Campbell-Robson chart to show Contrast sensitivity Vs. Spatial frequency. Spatial frequency increases from left to right (i.e. the size of the bars decrease), whilst contrast sensitivity increases from the bottom of the figure to the top, and thus the target contrasts decrease [Prikryl and Purgathofer 1999]. Figure 2.12: The affect of contrast sensitivity on a human’s vision (normal left) [CS 2003]. Figure 2.13: A demonstration of a negative image due to the eyes constant need for new light stimulation [OL 2003]. Figure 2.14: A demonstration of a colour negative image [SVE 2003]. Figure 2.15: Diagram to show Broadbent’s Filter Theory of auditory attention [Broadbent 1958]. Figure 2.16: Effects of task on eye movements. The same picture was examined by subjects with different instructions; 1. Free viewing, 2. Judge their ages, 3. Guess what they had been doing before the “unexpected visitor’s” arrival, 4. Remember the clothes worn by the people, 5. Remember the position of the people and objects in the room & 6. Estimate how long the “unexpected visitor” had been away from the family [Yarbus 1967]. Figure 2.17: The experiment designed by O’Regan et al. [Rensink 2001]. Figure 2.18a & b): Examples of the modifications (presence and location) made by O’Regan et al. 
to the photographs, a) (top) shows a presence alteration where the cheese has been removed, b) (bottom) shows a location alteration where the bar behind the people has been moved [O’Regan et al. 2001]. Figure 2.19a: Choose one of the six cards displayed above. Figure 2.19b: Can you still remember your card? Figure 2.19c: Only five cards remain. Can you remember your card? I must have chosen your card and removed it. Is this coincidence or is it an illusion? Figure 2.20: Critical Stimulus in parafovea: noncritical and critical trial displays [Mack and Rock 1998]. Figure 2.21: Table to show the experimental ordering for observers [Mack and Rock 1998]. Figure 2.22: Results from the inattention paradigm. Subjects perform better than chance at recognising location, colour, and number of elements but not shape [Mack and Rock 1998]. Figure 2.23: Critical stimulus at fixation: noncritical and critical stimulus displays [Mack and Rock 1998]. 16 Figure 3.1: The goal of realistic image synthesis: an example from photography [Stroebel et al. 1986]. Figure 3.2: Light transmitted through a material. Figure 3.3: Light absorbed by a material. Figure 3.4: Light refracted through a material. Figure 3.5: Light reflected off a material in different ways, from left to right, specular, diffuse, mixed, retro-reflection and finally gloss [Katedros 2004]. 37 xi 16 17 17 20 23 26 26 30 30 31 32 33 34 35 38 39 39 40 Figure 3.6: The differences between a simple computer generated polyhedral cone (left), linearly interpolated shading to give appearance of curvature (Gouraud Shading). Note Mach bands at edges of faces (middle) and a more complex shading calculation, interpolating curved surface normals (Phong Shading). This is necessary to eliminate Mach Bands (right). Figure 3.7: Graphical Depiction of the rendering equation [Yee 2000]. Figure 3.8: Ray tracing. Figure 3.9: Radiosity [McNamara 2000]. Figure 3.10: Relationship between two patches [Katedros 2004]. Figure 3.11: Nusselt’s analog. The form factor from the differential area dAi to element Aj is proportional to the area of the double projection onto the base of the hemisphere [Nusselt 1928]. Figure 3.12: The hemicube [Langbein 2004]. Figure 3.13: The difference in image quality between ray tracing (middle) and radiosity (right hand image). Figure 3.14: Renderings of a simple environment. Ray traced Solution (left), Radiosity Solution (center), and Radiance Solution (right) [McNamara 2000]. Figure 3.15: Conceptually how a perceptually-assisted renderer makes use of a perceptual error metric to decide when to halt the rendering process [Yee 2000]. Figure 3.16: Block structure of the Visual Difference Predictor [Prikryl and Purgathofer 1999]. Figure 3.17: An Overview of the Visual Difference Predictor – demonstrating in more detail the ordering of the processes that are involved [Yee 2000]. Figure 3.18: (a) shows the computation at 346 seconds, (b) depicts the absolute differences of pixel intensity between the current and fully converged solutions, (c) shows the corresponding visible differences predicted by the VDP, (d) shows the fully converged solution which is used as a reference [Volevich et al. 2000]. Figure 3.19: The original Stanford Bunny model (69,451 faces) and a simplification made by Luebke and Hallen’s perceptually driven system (29,866 faces). In this view the user’s gaze is 29° from the centre of the bunny [Luebke and Hallen 2001]. Figure 3.20: Watson et al.’s experimental environment as seen with the coarse display [Watson et al. 1997b]. 
Figure 3.21: An example of McConkie’s work with an eye linked multiple resolution display [McConkie and Loschky 1997]. Figure 3.22: How the saliency map is created from the feature maps of the input image [Itti 2003a]. Figure 3.23: Diagrams to show how the saliency model has inhibited the first fixational point that the system has highlighted as the most salient target, so the next most salient point can be found. Left – image showing the first fixation point, right – the corresponding saliency map with the fixation point inhibited [Itti 2003b]. Figure 3.24: (a) Original Image (b) Image rendered using the Aleph map (c) Saliency map of the original image (d) Aleph map used to re-render the original image [Yee 2000]. xii 41 42 44 45 46 47 47 49 50 52 53 55 57 60 62 63 65 65 66 Figure 3.25: General architecture of the attention model. The bottom-up component is a simplified version of the model developed by Itti et al. [1998]. The top-down component was added to compensate for task dependent user interaction [Haber et al. 2001]. Figure 3.26: Data flow during rendering for the Haber et al. model. [Haber et al. 2001]. 68 Figure 4.1: Pixar’s ‘Renderfarm’ is used to render their films [Pixar 2004]. Figure 4.2: Table to show the aspects listed in each of the images by Judges 1 and 2. Figure 4.3: Images 1 to 7, used initially to work out the central and marginal interest aspects in the scenes with the judges. Figure 4.4: Table to show the aspects listed in each of the images by the Judge 3. Figure 4.5: Images 1 to 21, used to work out the central and marginal interest aspects in the scenes with Judges 4 – 9, as well as in the final experiment. Figure 4.6: Table to show the aspects listed in each of the images by Judge 4 (the rest of the results are contained in appendix A.1). Figure 4.7: Table to show the aspects that were altered in each of the images. Figure 4.8: a) Original Image b) Modified Image - here a Marginal Interest aspect of the scene has been replaced with a low quality rendering (the left hand wall with light fitting), thus a rendering alteration has been made. Figure 4.9: a) Original Image b) Modified Image - here a Central Interest aspect of the scene has been removed in a presence alteration (the wine glass). Figure 4.10 (a & b): Photographs demonstrating the experiments being run. Figure 4.11: a) High quality image. b) High quality image with mudsplashes. c) Selective quality image (look at the surface of the tabletop compared to Figure 4.11a) d) Selective quality image with mudsplashes. e) The ‘medium’ grey image used in the flicker paradigm. f) The ordering of the images for the two paradigms. Figure 4.12: Overall results of the three different types of alterations made during the experiment, Rendering, Location and Presence. Figure 4.13: Number of cycles of the images needed to detect the rendering quality change in the Flicker paradigm. Figure 4.14: Number of cycles of the images needed to detect the rendering quality change in the Mudsplash paradigm. Figure 4.15: a) Original Image 7 b) Modified Image 7 - here a Marginal Interest aspect of the scene has been replaced with a low quality rendering, the whole of the tiled floor has been replaced. Figure 4.16: Comparison of the results produced in this experiment with the results reproduced from Rensink et al. [1999] for location and presence with the mudsplash paradigm. Figure 4.17: Full results of the statistical analysis using unrelated t-test for significance. 72 74 Figure 5.1: Effects of a task on eye movements. 
Eye scans for observers examined with different task instructions; 1. Free viewing, 2. Remember the central painting, 3. Remember as many objects on the table as you can, and 4. Count the number of books on the shelves. xiii 69 75 76 77-79 79-80 81-82 83 83 85 86 88 88 89 89 91 93 97 Figure 5.2: Close up of the same mug showing the pencils and paintbrushes, (each room had a different number of pencils and paintbrushes). Figure 5.3: (a) High Quality (HQ) image (Frame 26 in the animation). Figure 5.3: (b) Low Quality (LQ) image (Frame 26 in the animation). Figure 5.3: (c) Selectively rendered (CQ) image with two Circles of high Quality over the first and second mugs (Frame 26 in the animation). Figure 5.3: (d) Close-up of High Quality rendered chair and the Low Quality version of the same chair. Figure 5.4: Calculation of the fovea and blend areas. Figure 5.5: Selective Quality (SQ) frame where the visual angle covered by the fovea for mugs in the first two rooms, 2 degrees (green circles), is rendered at High Quality and then is blended to Low Quality at 4.1 degrees (red circles). Figure 5.6a: Conditions tested. The order of the two animations shown for the experiments were: HQ + HQ, HQ + LQ, LQ + HQ, HQ + SQ or SQ + HQ Figure 5.6b: The orderings of the conditions for randomisation in the experiment. Figure 5.7: Images to show the experimental setup. Figure 5.8: Image to give location of the first mug – to focus the observer’s attention. Figure 5.9: Image to show participants of the experiment filling out the questionnaire after completion of the viewing of the two animations. Figure 5.10: Experimental results for the two tasks: Counting the pencils and simple watching the animations (free for all). Figure 5.11 a): How observant were the participants: colour of the carpet (outside the foveal angle). Figure 5.11 b): How observant were the participants: colour of the mug (inside the foveal angle). Figure 5.12: Full results of statistical analysis using Chi-square test (X2) for significance. Figure 5.13: An eye scan for an observer counting the pencils. The green crosses are fixation points and the red lines are the saccades. Figure 5.14: An eye scan for an observer who was simply watching the animation. Figure 5.15: The orderings of the conditions for randomization in the experiment. Figure 5.16: Experimental results for the three tasks: Simply watching the animations, the Visual task: Counting the pencils and the Non-visual task: Counting backwards from 1000 in steps of 2. Figure 5.17 (a): How observant were the participants depending on the task: Colour of the mug. Figure 5.17 (b): How observant were the participants depending on the task: Colour of the carpet. Figure 5.18: Full results of statistical analysis using t-test for significance. Figure 5.19: An eye scan for an observer counting backwards. The green crosses are fixation points and the red lines are the saccades. Figure 5.20: Images to show the initial experimental scene developed in Alias Wavefront Maya. Figure 5.21: Image to show the final experimental scene developed in Radiance rendered with a sampling resolution of 3072x3072, in the experiment this is referred to as High Quality (HQ). xiv 98 99 100 100 101 101 102 103 103 104 104 105 106 107 107 108 109 110 111 113 114 114 115 117 118 119 Figure 5.22: Results from the pilot study: determining a consistently detectable rendering resolution difference. Figure 5.23: Sampling resolutions: a(left) 3072x3072 (HQ), b(right) 1024x1024(LQ). 
Figure 5.23: Sampling resolutions: c(left) 768x768, d(right) 512x512. Figure 5.24: Selective Quality (SQ) image showing the high quality rendered circles located over the teapots (black). Figure 5.25: The circles in the viewing plane from which the high quality fovea circles are generated. Figure 5.26: The high quality fovea circles (left) are then composited automatically with the low quality image, adding a glow effect (blend) around each circle to reduce pop out effects, resulting in the Selective Quality image (SQ) (right). Figure 5.27a: The three different types of images being tested. The ordering image pairs shown in the experiment were: (1) HQ+HQ, (2) HQ+LQ, (3) LQ+HQ, (4) HQ+SQ and (5) SQ+HQ. Figure 5.27b: The orderings of the conditions for randomisation in the experiment. Figure 5.28: Image to show the experimental setup. Figure 5.29: Experimental results for the two tasks: counting the teapots vs. simply looking at the images. Figure 5.30: Experimental results for asking the participants what objects there were in the scene, for the counting teapots criteria only. Figure 5.31: List of the objects that were and were not in the scene. Figure 5.32: Full results of statistical analysis using t-test for significance. Figure 5.33: An eye scan for an observer counting the teapots. The X’s are fixation points and the lines are the saccades. Figure 5.34: Perceptual difference between SQ and LQ images using VDP [Daly 1993]. Red denotes areas of high perceptual difference. 119 Figure 6.1: A framework for progressive refinement of animation frames using task-level information. Figure 6.2: A frame from our renderer with no refinement iterations at all. Figure 6.3: The same frame as Figure 6.2 after the IBR pass, but with no further refinement. Figure 6.4: CSFs for different retinal velocities [Daly 1998]. Figure 6.5: Smooth pursuit behaviour of the Eye. The eye can track targets reliably up to a speed of 80.0 deg/sec beyond which tracking is erratic [Daly 1998]. Figure 6.6: A frame from our task-based animation. Figure 6.7: Initial frame error. Figure 6.8: Initial error conspicuity. Figure 6.9: Final frame samples. Figure 6.10: Standard rendering taking the same time as Figure 6.5, i.e. two minutes. Figure 6.11: Standard rendering taking 7 times that of Figures 6.5 and 6.9, i.e. 14 minutes. 136 xv 121 121 122 123 123 124 124 125 127 127 128 129 131 132 138 139 141 142 146 146 147 147 148 148 Figure 6.12: Perceptual differences using VDP [Daly 1993]. Red denotes areas of high perceptual difference. a) Visible differences between a frame with no iterations (Figure 6.2) and a frame after the IBR pass with no further refinement (Figure 6.3), b) Visible differences between a frame after the IBR pass with no further refinement (Figure 6.3) and a final frame created with our method in 2 mins (Figure 6.6), c) Visible differences between a final frame created with our method in 2 mins (Figure 6.6) and a standard rendering in 2 mins (Figure 6.10), and d) Visible differences between a final frame created with our method in 2 mins (Figure 6.6) and a standard rendering in 14 mins (Figure 6.11). Figure 6.13 a (left): Quincunx Sampling, used to find the initial sample locations, the other pixels are sampled when determined necessary by the algorithm; b (right) Visual Sensitivity threshold of spatial frequencies [Bouville et al. 1991]. Figure 6.14: Pie chart to show where the two minutes are spent in rendering the frame. 
Figure 7.1: Hypothesis on how saliency, task and visible difference methodologies might be combined in terms of their priority of rendering.
Figure 7.2: Lorenzo Lotto 'Husband and Wife' (c. 1543). Note the incredible detail of the table cloth which attracts the viewer's attention more than the rest of the scene [Hockney and Falco 2000].

Chapter 1
Introduction

Computer Graphics:
1. Graphics implemented through the use of computers.
2. Methods and techniques for converting data to or from graphic displays via computers.
3. The branch of science and technology concerned with methods and techniques for converting data to or from visual presentation using computers.
American National Standard for Telecommunications [ANST 2003].

The power of images is summarised by Confucius' saying 'One picture is worth a thousand words'. It is, therefore, hardly surprising that computer graphics has advanced rapidly in the last 50 years: Machover Associates Corporation predict that the worldwide revenue for commercial/industrial computer graphics applications will be $108.7 billion for 2003, rising to $171.1 billion for 2008 [Machover 2003]. This has led to an exponential growth in the power of computer hardware and the increased complexity and speed of software algorithms. In turn, this has created the ability to render highly realistic images of complex scenes in ever decreasing amounts of time. The realism of rendered scenes has been increased by the development of lighting models, which mimic the distribution and interaction of light in an environment. These methods ensure, to a high degree, the physical accuracy of the images produced and give what is known as a photo-realistic rendering [Ward Larson and Shakespeare 1997]. Photo-realism [Ferwerda 2003] is achieved when an image produces the same visual response as the actual scene it is trying to re-create, for example the images in Figure 1.1.

Figure 1.1: Some examples of photo-realistic images created in Radiance [Ward Larson and Shakespeare 1998].

One major goal of virtual reality and computer graphics is achieving these photo-realistic images at real-time frame rates. On modern computers creating such images may take a huge amount of computational time; for example, the images shown in Figure 1.1 all took over 3 hours to render on a 1 GHz Pentium processor, and although improvements in basic rendering hardware and algorithms have produced some remarkable results, it is still impossible, to date, to render highly realistic imagery in real time [Chalmers and Cater 2002]. Real-time rendering is concerned with computing and displaying images rapidly on the computer. An image appears on the screen, the viewer acts or reacts, and this feedback affects what is generated next. This cycle of reaction and rendering happens at a rapid enough rate that the viewer does not see individual images, but rather becomes immersed in a dynamic, smooth process. The rate at which images are displayed is measured in frames per second (fps) or Hertz (Hz). At one frame per second, there is little sense of interactivity; the user is painfully aware of the arrival of each new image. At around 6 fps, people start to feel a basic sense of interactivity [Akeine-Moller and Haines 2002]. An application displaying at 15 fps is certainly real-time; the user can then focus on action and reaction. There is, however, a useful limit above which differences in the display rate are effectively undetectable; this occurs from about 72 fps and above [Akeine-Moller and Haines 2002].
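To put these rates in perspective, the per-frame time budget is simply the reciprocal of the frame rate; the comparison below reuses only numbers already quoted in this chapter.

\[
t_{\text{frame}} = \frac{1}{\text{fps}}: \qquad 15\ \text{fps} \approx 67\ \text{ms per frame}, \qquad 72\ \text{fps} \approx 14\ \text{ms per frame},
\]

whereas a single fully converged frame such as those in Figure 1.1 took over three hours, a gap of five to six orders of magnitude.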
Still, there is more to real-time rendering than interactivity. Rendering in real time normally means three-dimensional rendering, and thus some sense of connection to three-dimensional space; more recently it also incorporates graphics acceleration hardware and realistic rendering software as well as the interactivity. While hardware dedicated to three-dimensional graphics has been available on professional workstations for many years, it is only relatively recently that the use of such accelerators at the consumer level has become possible [Kirk 2003]. Figure 1.2 shows some examples of the 'blocky graphics' that can be achieved at interactive rates. It is therefore necessary to explore other methods by which rendering times can be decreased, without reducing the desired perceived rendering quality. One obvious way to solve this challenge is by using far more powerful computers, but economic constraints often preclude this.

Figure 1.2: Some examples of the 'blocky graphics' that can be achieved at interactive rates in the Explorer 1, a six degrees of freedom single-seater simulator.

In many cases significant effort is spent on improving details the viewer will never even notice. From the early days of flight simulation, researchers have studied what parts of a scene or image are most likely to be noticed in an interactive setting. If it is possible to find a way to apply effort selectively to the small number of regions a viewer attends to in a given scene, then the perceived quality can be improved without paying the full computational price. Most of the research in this field has attempted to exploit gaps in low-level visual processing, similar to JPEG and other image compression schemes [Bolin and Meyer 1998], or it has been used to reduce the bandwidth requirements for low bit-rate video teleconferencing [Yang et al. 1996]. One way of producing perceptually high-quality images is to exploit the side effects of human visual processes, for the human eye is good but it isn't perfect! This thesis considers how the eye's inability to perceive the details of certain objects in images might be used to reduce the level of rendering detail, and thus computational time, without the change being perceptible to the viewer. In practice, as this thesis will show, the perception of rendering quality in a virtual environment depends upon the user and the task the user is performing in that environment. Most computer graphics serve some specific visual task – telling a story, advertising a product, playing a game, or simulating an activity such as flying. In the majority of cases, objects relevant to the task can be identified in advance, and this can be exploited as the human visual system focuses its attention on these objects at the expense of other details in the scene. Visual attention is therefore the process by which humans select a portion of the available visual information for localisation, identification and understanding of objects in the environment. It allows the human visual system to process visual input preferentially by shifting attention about an image, giving more attention to salient locations and less attention to unimportant regions. When attention is not focused onto items in a scene they can literally go unnoticed [Mack and Rock 1998].
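As a concrete illustration of this idea, and only as a sketch rather than the renderer developed later in this thesis, a predetermined task map can be turned into per-pixel quality weights. The 2 degree foveal angle and the 4.1 degree blend angle echo the selective-quality experiment described in Chapter 5; the function name, the pixels-per-degree shortcut and the numbers in the example call are illustrative assumptions.

```python
import numpy as np

def task_quality_map(width, height, task_points, px_per_degree,
                     fovea_deg=2.0, blend_deg=4.1):
    """Per-pixel quality weights: 1 = render at full quality, 0 = lowest.

    task_points   : (x, y) pixel positions of task-relevant objects.
    px_per_degree : how many pixels one degree of visual angle covers at
                    the assumed viewing distance (a small-angle shortcut).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    quality = np.zeros((height, width))
    for tx, ty in task_points:
        # angular distance of every pixel from this task object
        angle = np.hypot(xs - tx, ys - ty) / px_per_degree
        # full quality inside the foveal angle, linear blend out to blend_deg
        w = np.clip((blend_deg - angle) / (blend_deg - fovea_deg), 0.0, 1.0)
        quality = np.maximum(quality, w)
    return quality

# Example: one task object at the centre of a 640x480 frame,
# assuming roughly 40 pixels per degree of visual angle.
qmap = task_quality_map(640, 480, [(320, 240)], px_per_degree=40.0)
```

A renderer could then spend more samples, or finer sampling resolution, wherever the weight is high and fall back to a cheap solution elsewhere.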
Pioneering work in the 1890s showed that there are two general human visual processes, termed bottom-up and top-down, which determine where humans locate their visual attention [James 1890]. The bottom-up process is purely stimulus driven, whilst the top-down process is directed by a voluntary control process that focuses attention onto one or more objects that are relevant to the observer's goal. This will be discussed in much greater detail in the perceptual background chapter. It is precisely this top-down processing of the human visual system while performing a task that is exploited in this thesis to significantly reduce computational time while maintaining high perceived image quality in virtual environments. This thesis shows by means of psychophysical experiments that it is possible to render scene objects not related to the task at hand at a lower resolution without the viewer noticing any reduction in quality. These findings are taken advantage of in a computational framework that applies high-level task information to deduce error visibility in each frame of a progressively rendered animation. By this method, it is possible to generate perceptually high quality animated sequences at constant frame rates in a fraction of the time normally required. A key advantage of this technique is that it only depends on the task, not on the viewer. Unlike the foveal detail rendering used in flight simulators, there is no need for eye-tracking or similar single-viewer hardware to enable this technology, since attentive viewers participating in the same task will employ similar visual processes. This thesis demonstrates a perceptual rendering framework, which is built on these principles, and shows it working for an actual application: a walkthrough of a submarine where the task is to imagine being a fire marshal whose job it is to check the usability of the fire extinguishers and lanterns.

1.1 Thesis Outline

This thesis is divided into a number of sections.

Chapter 2: Perceptual Background
This chapter first describes the aspects of the human visual system that are particularly relevant to this research, along with the relevant literature from the anatomical and psychophysical points of view. It also covers the two main side-effects of the Human Visual System (HVS) that this thesis investigated – Change Blindness and Inattentional Blindness.

Chapter 3: Rendering Realistic Computer Graphical Images
Some fundamental terms for the synthesis of computer graphical images are discussed in this chapter, starting with defining light and its properties and subsequently describing the different illumination models that can be used to render computer graphical images. This chapter also reviews the relevant literature and research that has been done in this area from the perceptually-based rendering and image quality metrics side.

Chapter 4: Change Blindness
In this chapter the set-up is described for the first experiment, which was run to determine whether or not Change Blindness could be recreated under controlled conditions with computer generated images. It goes on to discuss the results that were achieved and how lessons learnt in designing and performing psychophysical experiments led to improvements in the experimental methodology.

Chapter 5: Inattentional Blindness
The work on Change Blindness led to investigating similar flaws in the HVS, including that of Inattentional Blindness.
A similar experiment was designed and carried out as in Chapter 4, but with Inattentional Blindness being the focus for the methodology. During discussions with psychologists a question was raised: 'what would happen if the task that was given to the observers was not a visual one?' To this there was no answer, and thus a new experiment was designed to resolve this quandary; the results from this experiment are then discussed in the second part of this chapter. To confirm that the results obtained in the first part of Chapter 5 were indeed due to Inattentional Blindness, it was important to discount any likely effects due to peripheral vision. Thus the last section in Chapter 5 presents the results from this experiment and shows that this thesis tested Inattentional Blindness, not just peripheral vision, by showing that even when observers are fixated on low quality objects they simply do not perceive the quality difference if the objects are not related to the task at hand.

Chapter 6: Selective Rendering Framework
This chapter uses the results from the psychophysical experiments to implement a selective renderer based on these principles. The selective rendering framework proposed and demonstrated is, in theory, applicable to other rendering techniques such as ray tracing, radiosity and multi-pass hardware rendering techniques, not just Radiance [Ward Larson and Shakespeare 1998], which is the rendering system that was used to demonstrate the principle for this thesis.

Chapter 7: Conclusion & Future Work
Chapter 7 draws conclusions from this research, discussing what the main contributions of this research are to the field, as well as describing some other applications for its use. It then concludes by considering some possible future avenues for this research.

Chapter 2
Perceptual Background

Before going into depth about how visual perception has been used in computer graphics, it is important to cover how the eye turns light rays into images that humans can perceive, as well as the limits of the human visual system that are exploited in this thesis. In this chapter the work that has previously been done in visual perception is discussed, along with what its results show.

2.1 The Anatomy of Sight - The Human Visual System (HVS)

Human vision is a complex process that requires numerous components of the human eye and brain to work together. The eye is made up of a fibrous protective globe, the sclera, which has an area that is transparent anteriorly, the cornea (Figure 2.1). The sclera is lined by the choroid, which is a vascular, highly pigmented layer that absorbs light waves; all the other remaining structures within the eyeball refract light towards this layer. The layer in front of the choroid is called the retina and contains photoreceptor cells and associated neurons. Light rays enter the eye through the pupil, travelling through the aqueous humor, lens and vitreous humor to converge on a focal point on the retina. Focused images are, therefore, captured on the retina, much like the film's role in photography.

Figure 2.1: The human eye [JMEC 2003].
Figure 2.2: The peak sensitivities of the three types of cones and the rods [Dowling 1987].

The retina is a mosaic of two basic types of photoreceptors, rods and cones; these translate the image into neural code, which is then transmitted to the brain for processing. Rods are sensitive to blue-green light with peak sensitivity at a wavelength of 498 nm [Baylor et al. 1987] and are used for vision under dark or dim conditions.
The rods are highly sensitive to light and allow the eye to detect motion. They supply peripheral vision and facilitate vision in dim light and at night. This is due to the rods containing a single photopigment, which also accounts for the loss in ability to discriminate colour in low light conditions. Cones, on the other hand, are useless in dim conditions, but do provide humans with basic colour vision. There are three types of cones [Baylor et al. 1987]: L-cones (red) with a peak sensitivity of 564 nm, M-cones (green) with a peak sensitivity of 533 nm, and S-cones (blue) with a peak sensitivity of 437 nm, as shown in Figure 2.2 [Dowling 1987]. Cones are highly concentrated in a region near the centre of the retina called the macula or parafovea (Figure 2.3 and Figure 2.4). The macula is the small, yellowish central portion of the retina, and it is the area providing clear, distinct vision. Its field is 5 degrees in diameter. The very centre of the macula (the central 2 degrees of the visual field, about the width of your thumb at arm's length or about the size of eight letters on a typical page of text) is called the fovea, which literally means 'pit' because it is a depression in the retina. It is the area where all of the photoreceptors are cones, roughly 180,000 cones per square mm; there are no rods in the fovea. This cone density decreases rapidly outside of the fovea to a value of less than 5,000 per square mm. Because the fovea has no rods, small dim objects in the dark cannot be seen if one looks directly at them. For this reason, to detect faint stars in the sky, one must look just to the side of them so that their light falls on a retinal area, containing numerous rods, outside of the macular zone. The fovea is also almost entirely devoid of S-cone (blue) photoreceptors (Figure 2.5). The central 30 degrees of the visual field is focal vision. This is the area that people use to view the world, making eye movements, if necessary, to bring images onto the fovea. To view something outside the focal area, the viewer will generally turn his/her head rather than simply move the eyes. The rest of the visual field is ambient vision and is used to maintain spatial orientation. The light rays that are captured by the photoreceptors are converted into electrical impulses and are then sent to the brain for processing, via the optic nerve (Figure 2.6), which contains about one million nerve fibres for each eye. Due to the presence of this massive bundle of nerve fibres, there are no photoreceptors at the location where it leaves the eye. As a result, a small area of the visual field is not represented by the retina, about 5 degrees of visual angle in diameter and at 15 degrees of eccentricity from the fovea on the temporal side of the visual field. Although you are not aware of the presence of your blind spots, they are easy to find: with one eye closed, fixate a point in front of you.
Figure 2.3: The three functional concentric areas of the visual field, the fovea, the macula and focal vision [VEHF 2004]. Figure 2.4: The location of cone and rod peaks in eccentricity from the fovea [Osterberg 1935]. 10 Figure 2.5: Original Image (left). How the eye actually views the image, as seen with the right eye focused on the centre(right). Note the yellow circle in the centre, the fovea, it is yellow due to the lack of blue cones. The black circle to its right is the location of the optic nerve, black as there are no photoreceptors in this part of the retina, and the degradation in visual acuity and colour towards the periphery [CIT 2003]. Figure 2.6: Photograph of the back of the retina to show the location of the fovea and optic nerve [CS 2004]. Figure 2.7: Finding your blind spot, i.e. the location of your optical nerve [Lightwave 2003]. 11 The neural code transmitted by the retina travels along the optic nerve to the visual cortex of the brain. The visual cortex then refines and interprets the incoming neural messages, determining the size, shape, colour, and details of what humans see. At the cortical level, cognitive and perceptual factors, such as attention, expectancy, memory, and learned identification, influence the processing of visual information. Therefore, seeing combines the eye’s optics, retinal function, the visual cortex, and perception. 2.2 Visual Perception Visual Perception is the process by which humans, and other organisms, interpret and organize visual sensation in order to understand their surrounding environment. In other words, visual perception is defined as the process of acquiring knowledge about environmental objects and events by extracting information from the light they emit or reflect. The sensory organs translate this physical light energy from the environment into electrical signals that are processed by the brain as described in the previous section. These signals are then not understood as just pure energy, but, rather, interpreted by perception into objects, people, events and situations. For example, cameras have no perceptual capabilities at all; that is, they do not know anything about the scenes they record, and the photographic images they produce merely contain information. Whereas sighted people and animals acquire knowledge about their environments from the information they receive, and this is termed perception in this thesis. Perception therefore, depends on two factors: the light coming from the world, and our experience and expectations. When humans see, the mind actively organizes the visual world into meaningful shapes. Figure 2.8 shows an example of how our experience can organise the world. Look at Figure 2.8, does the mix of black and white mean anything? If you look at the image for long enough, the meaningless black and white bits organise themselves into a dog. The interesting thing is that once you can see the dog you can’t see anything but the dog, your focus is always to the dog upon viewing the image. Experience in this example has moulded the viewer’s perception. Figure 2.9 shows a similar example; it is called a reversible figure because it can be seen two ways, 12 as a young lady or as an old woman. The important point here is that people do not passively see the real world. The human visual system does not work like a camera, which objectively records some reality on film; instead humans see a version of the real world that is greatly influenced by experiences and expectations. 
Figure 2.8: Image to demonstrate the concept of experience on perception [OL 2003].
Figure 2.9: Can you see a young woman or an old hag? [OL 2003]

2.2.1 Visual Acuity of the Human Eye

The eye has a visual acuity threshold below which an object will go undetected. This threshold varies from person to person, but as an example, the case of a person with normal 20/20 vision can be considered. An optician can test visual acuity by asking the patient to read, from 20 feet away, a Snellen eye chart (Figure 2.10), which was invented by Dr. Snellen of Utrecht, Holland. This chart presents a series of high contrast, black letters on a white background, in a range of sizes. An individual who can resolve letters approximately one inch high at 20 feet is said to have 20/20 visual acuity. If an individual has 20/40 acuity, he or she requires an object to be at 20 feet to visualise it with the same resolution as an individual with 20/20 acuity would when the object was at 40 feet, i.e. the top number of the visual acuity fraction refers to the distance you stand from the chart and the bottom number indicates the distance at which a person with normal eyesight could read the same line you correctly read.

Figure 2.10: An example of a Snellen Chart used by opticians to test a person's visual acuity [MD Support 2004].

A vision of 20/20 can also be stated as the ability to resolve a spatial pattern separated by a visual angle of one minute of arc. Since one degree contains sixty minutes, a visual angle of one minute of arc is 1/60 of a degree. The spatial resolution limit is derived from the fact that one degree of a scene is projected across 288 micrometers of the retina by the eye's lens. Within this 288 micrometer span there are 120 colour-sensing cone cells packed. Thus, if more than 120 alternating white and black lines are crowded side-by-side in a single degree of viewing space, they will appear as a single grey mass to the human eye.
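For convenience, the figures just quoted can be tied together with a little arithmetic; the short derivation below only rearranges numbers already stated in the text.

\[
1' = \tfrac{1}{60}^{\circ}, \qquad 1^{\circ} \approx 288\ \mu\mathrm{m}\ \text{on the retina}
\;\Rightarrow\; \text{cone spacing} \approx \frac{288\ \mu\mathrm{m}}{120} = 2.4\ \mu\mathrm{m}.
\]

With roughly 120 cones per degree, and at least two cones needed to sample each light-dark cycle of a grating, the finest resolvable pattern is about 120/2 = 60 cycles per degree, which corresponds to the 120 alternating bars per degree quoted above; each bar at that limit subtends half a minute of arc, and each full cycle one minute of arc.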
2.2.2 Contrast Sensitivity

Contrast sensitivity is a method for determining the visual system's capability to filter spatial and temporal information about the objects that humans see. Contrast sensitivity testing differs from visual acuity testing, which only evaluates vision at one point of the higher spatial frequencies. The Snellen chart represents high contrast (black and white) and small objects under ideal lighting conditions. Contrast sensitivity testing evaluates the full spectrum of vision from high to low contrast and from small to large objects, providing a comprehensive measure of functional vision. Contrast sensitivity is defined by the minimum contrast required to distinguish between a bar pattern and a uniform background. There are two elements to contrast sensitivity:

• Contrast or illumination - In a sensitive visual system, little contrast between light and dark bars (low contrast) is necessary for the viewer to see the pattern. In a less sensitive visual system, a large difference in illumination (high contrast) is necessary before the bar pattern is recognisable. Thus contrast is created by the difference in luminance (the amount of reflected light) between two adjacent surfaces. The contrast threshold is the minimum contrast required for pattern detection, whilst the average luminance is kept constant from one pattern grid to another.

• Spatial frequency - One dark and one light band in a grid pattern is called a cycle, as can be seen in Figure 2.11b. The spatial frequency is the number of cycles subtending one degree of visual angle at the observer's eye. A low spatial frequency consists of wide bars; a high spatial frequency of narrow bars. The human eye is most sensitive to 4 - 5 cycles per degree of visual angle [Palmer 1999]. Visual sensitivity peaks at mid spatial frequencies, with less sensitivity at both the higher and lower spatial frequencies.

Contrast sensitivity is thus the reciprocal of the contrast at threshold, i.e. one divided by the minimum amount of contrast needed to see the pattern. Contrast thresholds for various spatial frequencies can be measured and graphed on a plot of contrast sensitivity vs. spatial frequency. Such a graph is called the Contrast Sensitivity Function (CSF), as illustrated in Figures 2.11a and 2.11b. Contrast at the lower spatial frequencies demonstrates how well the observer perceives shapes and large objects. Contrast at the higher spatial frequencies demonstrates the observer's ability to see lines, edges and fine detail (Figure 2.12). Points below the CSF are visible to the observer (those are the points that have even higher contrasts than the threshold level). Points above the CSF are invisible to the observer (those are the points that have lower contrasts than the threshold level).
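In symbols, and using the standard Michelson definition of grating contrast (an assumption made here for gratings; the text above describes contrast only informally as the luminance difference between adjacent surfaces):

\[
C = \frac{L_{\max}-L_{\min}}{L_{\max}+L_{\min}}, \qquad S(f) = \frac{1}{C_T(f)},
\]

where \(C_T(f)\) is the smallest contrast at which a grating of spatial frequency \(f\) (in cycles per degree) can just be detected. Plotting \(S(f)\) against \(f\) gives the CSF, which for the human eye peaks at around 4 - 5 cycles per degree.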
2.2.3 Eye Movements

Humans move their eyes on average 100,000 to 150,000 times every day; these movements are called saccades. One reason for them is to stimulate the photoreceptors: if the photoreceptors do not receive a continual input of new light rays they become inactive and simply do not perceive anything [CS 2003; SVE 2003]. Figure 2.13 is an example of an after effect, or negative image. It is difficult to make out exactly what the black and white image on the left portrays. However, if you stare intently at the image without moving your eyes from the four dots in the centre for at least 60 seconds, and then look at the white right-hand side of the image, you should immediately recognise the image that your eyes tell you is located on the white side of the image, although it doesn't really exist at all. Afterimages work on fatigue of your eyes when staring at the same spot for a long time without moving them. The areas of the retina that are fatigued from the image are not as capable of reading those colours as the rest of the eye's imaging area, so they show you a 'negative' image.

Figure 2.11a: Contrast sensitivity Vs. Spatial frequency [Daly 1993].
Figure 2.11b: The Campbell-Robson chart to show Contrast sensitivity Vs. Spatial frequency. Spatial frequency increases from left to right (i.e. the size of the bars decreases), whilst contrast sensitivity increases from the bottom of the figure to the top, and thus the target contrasts decrease [Prikryl and Purgathofer 1999].
Figure 2.12: The effect of contrast sensitivity on a human's vision (normal left) [CS 2003].

Figure 2.14 is another example: stare at the centre of the red bird for 30 seconds and then quickly stare at the birdcage. You should see a bluish-green (cyan) bird in the cage. As with the previous example, when you stare at the red bird, the image falls on one region of your retina. The red-sensitive cells in that region start to grow tired, and thus stop responding as strongly to red light. When you suddenly shift your gaze to the cage, the fatigued red-sensitive cells don't respond to the reflected red light, but the blue and green cones respond strongly to the reflected blue and green light. As a result, where the red cones don't respond you see a bluish-green bird in the cage.

Figure 2.13: A demonstration of a negative image due to the eye's constant need for new light stimulation [OL 2003].
Figure 2.14: A demonstration of a colour negative image [SVE 2003].

The other reason for eye saccades is to move the fovea towards the objects of current interest, in order to examine them with the region of the retina that has the finest detail. A single saccade is usually very rapid, taking only about 150-200 ms to plan and execute [Kowler 1995]. Saccades are essentially ballistic movements; once movement has begun its trajectory cannot be altered. If the eye misses the intended target object, another saccade has to be made to fixate it. The ballistic movement of the eye itself is exceedingly fast, typically only taking about 30 ms and reaching speeds of up to 900 degrees per second [Goldberg et al. 1991]. Between saccades, the eyes fixate the object of interest for a variable period of time, so that the visual system can process the optical information available in that location [Pannasch et al. 2001]. During free viewing of a scene, fixation durations are highly variable, ranging from less than 100 ms to several seconds [Buswell 1935]; however, they are typically held for around 300 ms [Rao et al. 1997]. Two attentional mechanisms are thought to govern saccadic eye movements: a spatial system, which tells the eye where to go (the 'where' system), and a temporal mechanism that tells the eye when to initiate a saccade (the 'when' system) [Unema et al. 2001]. Yarbus [1967], and subsequently many others, showed that once an initial eye saccade has found an object of interest and located it on the fovea, the eye subsequently performs a smooth pursuit movement to keep the object in foveal vision [Palmer 1999]. Because the image of a successfully tracked object is nearly stationary on the retina, pursuit movements enable the visual system to extract maximum spatial information from the image of the moving object itself. Untracked objects, both stationary and moving objects with different velocities and directions to the target object, are experienced as smeared and unclear because of their motion on the retina. To experience this, Palmer [1999] suggests a simple example: place your finger on this page and move it fairly quickly from one side of the page to the other. As soon as you track your moving finger, the letters and words appear so blurred you are unable to read them, but your finger is clear. Even when you stop moving your finger, only the words located within the visual angle of your fovea become sharp and thus readable.

2.3 Attention

Not everything that stimulates our sensory receptors is transformed into a mental representation. Rather, humans selectively attend to some objects and events, and in doing so ignore others. Failures of attention play a major role in several severe mental disorders. Children with attention deficit/hyperactivity disorder are extremely distractible; this is assumed to be because they cannot focus their attention on one task and ignore the many other external stimuli that might be occurring around them.
Similarly, individuals with depression and manicdepressive illness often report difficulties in focusing attention and in suppressing unwanted thoughts [CCBFTRI 2004]. Attentional capacity varies from person to person and from time to time; drugs, alcohol, fatigue and age lessen this control. Under these conditions, the likelihood of noticing important events then declines. Attentional capacity is also a function of experience. A keyboard typist learning to type might have to think about the position of each letter he/she needs to hit on the keyboard to make up certain words and cannot let his/her mind wander. After sufficient practice, the typist should be able to type without looking at the keyboard, or having to think where the keys are located. Muscle memory has taken over and the fingers know just where to go for the typing task. When humans learn to perform tasks automatically, they need no longer pay attention to them and thus can focus their attention onto other matters. However, automatic responses can also lead to disastrous results. For example, there was an airline pilot who was operating an aircraft very similar, but not identical to one that he usually flew. A fire started in one of the engines, so he flipped the switch that he thought would cut the fuel supply. However, in this new plane the cut-off fuel switch was in a slightly different position. The same physical motion that set the fuel supply to off in the plane he was use to caused the fuel flow actually to increase in the new plane. Naturally, the engine burst into a massive fire. A beginner, who would have to think about the locations of the appropriate switches, would probably not make that error [VEHF 2003a]. Psychologists have developed many ways to assess normal and abnormal attention. For example, in dichotic listening experiments, subjects wear earphones and are asked to repeat a message sent to one ear while ignoring different messages sent simultaneously to the other ear. This task is relatively difficult when presented with similar (e.g. both male or both female) voices, but relatively easy when the two messages are presented in different (e.g. female and male) voices. In the latter case, humans are greatly helped by the difference in voice quality [Cherry 1953]. Dichotic listening thus provides compelling evidence for limits on attention. Broadbent, a researcher who performed some of the first dichotic listening experiments, theorized that our mind can be conceived as a radio receiving many channels at once. 19 Each channel contains distinct sensory perceptions, as in the two auditory events in the dichotic listening task. Due to the fact that our attention is limited, it is difficult to spread attention thinly over several channels at once. In fact, humans only have enough resources to effectively attend to any one channel at once. Therefore, humans need some mechanism to limit the information that is taken in. Broadbent’s Filter Theory was the first to fill this role, also referred to as the ‘early selection’ theory of attention [Broadbent 1958]. This was one of the first perceptual theories to be cast in terms of information processing operations and flow diagrams, Figure 2.15. It is called the ‘early selection’ theory because attention is assumed to play its role early in perceptual processing. In particular, attention is assumed to precede recognition. 
The basic idea is that sensory inputs are stored in parallel in the sensory store and the attentional filter acts to select one ‘channel’ of information for further processing. Unattended information quickly decays from the sensory store unless the perceiver shifts the attentional filter to that channel. Although the perceiver can pick up on some of a stimulus’s physical characteristics preattentively, access to information about the identity of that stimulus requires attention. Thus the selective filter acts as a switch that allows the information from only one sensory channel to reach the higher-level perceptual system, which in turn only has a limited capacity. Figure 2.15: Diagram to show Broadbent’s Filter Theory of auditory attention [Broadbent 1958]. Subsequent studies showed that attention was not quite as simple as this. Moray [1959] found that subjects were likely to hear their own name if it was presented in the unattended channel. For example, if you are at a party in the middle of a conversation and someone at the next table mentions your name, it is very possible that you would notice it despite paying your full attention to the conversation you’re having at that moment: this is known as the cocktail party phenomenon. The cocktail-party effect also 20 works for other words of personal importance, such as the name of your favourite restaurant or a movie that you just saw, or the word ‘sex’. This fact causes problems for an early selection theory such as Broadbent’s because it suggests that recognition of your name occurs before selection, not after it as his theory would predict. This difficulty was overcome by Treisman’s Attenuation Theory [Treisman 1960] that suggests that selection operates both in early and late stages of the attention process, i.e. there are two stages to the selection model. In choosing what to pay attention to, a selective filter firstly processes incoming information, and makes its selection according to the physical characteristics of the information. The second stage, then contains the thresholds for certain words or objects, thus, for example, your name would have a very low threshold. If attention operates very early, then it is unclear how the attention system can determine what is important. Conversely, if attention operates relatively late, after a good deal of processing has already been done, it is easy to determine what is important, but the advantage of selection would be lost because a lot of irrelevant information will already have been processed. The question as to whether selection takes place at an early or a late stage of processing is an empirical question to which many experiments have been directed [Palmer 1999]. 2.3.1 Visual Attention Visual attention is very similar to auditory attention. Visual scenes typically contain many more objects than can ever be recognised or remembered in a single glance and thus, as stated before, some kind of sequential selection of objects for detailed processing is essential if humans are to cope with this wealth of information. You must therefore be selective in what you attend to visually, and what you select will greatly depend on your needs, goals, plans and desires. Although there is certainly an important sense in which a beer is always a beer, how you react to one depends a great deal on whether or not you have just finished a 10 day no-alcohol detox or are suffering from a bad hangover. 
After a 10-day no-alcohol detox your visual attention would undoubtedly be drawn immediately to the beer; however the day after a big night out when you are suffering from a hangover, you would probably visually ignore the beer, and if you did not, the sight as well as the smell of it might literally nauseate you. This example shows that perception is not 21 entirely stimulus-driven, i.e. it isn’t determined solely by the nature of the electromagnetic radiation stimulating the sensory organs but is influenced to some extent by cognitive constraints: higher-level goals, plans, and expectations. Coded into the primary levels of human vision is a powerful means of accomplishing this selection, namely the fovea, as described in Section 2.1. If detailed information is needed from many different areas of the visual environment, it can only be obtained by redirecting the eye so that the relevant objects fall sequentially on the fovea. 2.3.2 Top-down versus Bottom-up processes There are two general visual attention processes, called bottom-up and top-down, which determine where humans locate their visual attention [James 1890]. The bottom-up process is purely stimulus driven, for example a candle burning in a dark room; a red ball amongst a large number of blue balls; or the lips and eyes of a human face as they are the most mobile and expressive elements of the face. In all these cases the visual stimulus captures attention automatically without volitional control. The top-down process, on the other hand, is directed by a voluntary control process that focuses attention on one or more objects that are relevant to the observer’s goal when studying the scene. Such goals or tasks may include looking for street signs, searching for a target in a computer game, or counting the number of pencils in a mug. In this case attention, which is normally drawn due to conspicuous aspects in a scene, deliberately ignores these conspicuous aspects because they are irrelevant to the goal at hand. The study of saccadic exploration of complex images was pioneered by the Russian psychologist Yarbus [Yarbus 1967]. By using relatively crude equipment he was able to record the fixations and saccades observers made while viewing natural objects and scenes. Specifically he studied saccadic records made by observers studying an image after they were given a particular task. Each observer then viewed the scene with that particular question or task in mind. This seeking information of a specific kind has a significant affect on the eye-gaze pattern. To illustrate this, Yarbus instructed several observers to answer a number of different questions concerning the depicted situation in Repin’s picture ‘An Unexpected Visitor’ [Yarbus 1967]. This resulted in seven substantially different patterns, each one once again being easily construable as a sampling of those picture objects that were most informative for the answering of the 22 question, as shown in Figure 2.16. Land and Furneaux [1997] have also studied the role of saccades using a variety of visuo-motor tasks such as driving, music reading and playing ping pong. In each case the scan path of the eye was found to play a central functional role, closely linked to the ongoing task demands. Researchers have examined the effects of focusing human attention and have reported that there are two main visual side effects from dividing or disrupting visual attention; these are Change Blindness [Rensink et al. 1997] and Inattentional Blindness [Mack and Rock 1998]. 
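In a selective rendering setting, one practical consequence of the top-down versus bottom-up distinction is that stimulus-driven conspicuity and task relevance can be represented as separate per-pixel maps and blended into a single importance map. The sketch below is purely illustrative: the map names, the linear weighting and the example values are hypothetical, and it is not the specific framework developed later in this thesis.

```python
import numpy as np

def combined_importance(saliency_map, task_map, task_weight=0.7):
    """Blend a bottom-up saliency map with a top-down task map.
    Both maps hold per-pixel values in [0, 1]; task_weight controls how
    strongly the viewer's task overrides stimulus-driven conspicuity."""
    saliency_map = np.asarray(saliency_map, dtype=float)
    task_map = np.asarray(task_map, dtype=float)
    importance = (1.0 - task_weight) * saliency_map + task_weight * task_map
    return np.clip(importance, 0.0, 1.0)

# Example: a 4x4 region of an image in which a single pixel covers a task object.
saliency = np.random.default_rng(0).uniform(0.0, 0.3, (4, 4))
task = np.zeros((4, 4))
task[1, 2] = 1.0
print(combined_importance(saliency, task))
```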
This chapter shall now discuss these two phenomena in more depth, describing what is known from the two effects. Figure 2.16: Effects of task on eye movements. The same picture was examined by subjects with different instructions; 1. Free viewing, 2. Judge their ages, 3. Guess what they had been doing before the “unexpected visitor’s” arrival, 4. Remember the clothes worn by the people, 5. Remember the position of the people and objects in the room & 6. Estimate how long the “unexpected visitor” had been away from the family [Yarbus 1967]. 2.3.3 Change Blindness Imagine yourself walking on the street and someone stops you to ask for directions to the train station. While you explain how to get there, the conversation is briefly interrupted by construction workers carrying a door between you and the person. When the workers have passed, the person you were talking to has been replaced by another person, who carries on the conversation as if nothing happened. Would you notice that this person is someone else? “Of course” is the intuitive answer. However, in a study where this was actually done, the change was noticed in only 50% of the people [Simons and Levin 1998]. The ones who did not notice it are not a special kind of 23 people. A myriad of evidence shows that normal human beings are simply very poor at detecting changes in the visual scene in a wide variety of situations. Levin and Simons [1997] have concluded that ‘Our intuition that we richly represent the visual details of our environment is illusory’. O’Regan et al. [1999a] suggests that ‘We have the impression of simultaneously seeing everything, because any portion of the visual field that awakens our interest is immediately available for scrutiny through an unconscious flick of the eye or of attention’. If a change occurs simultaneously with a brief visual disruption, such as an eye saccade, flicker or a blink that interrupts attention then a human can miss large changes in their field of view; Rensink et al. have termed this phenomenon Change Blindness. Thus ‘Change blindness is the inability of the human to detect what should be obvious changes in a scene’ [Rensink et al. 1997]. This concept has long been used by stunt doubles in film and has also been the reason why certain film mistakes have gone unnoticed [Balazas 1945]. For example in the movie ‘American Pie’ in the bedroom scene the girl is holding a clear cup full of beer. The camera goes off her and when it comes back she is holding a blue cup. The camera then leaves her but when it returns to focus on her again the cup is clear once more [MM 2003]. The onset of a visual disruption swamps the user’s local motion signals caused by a change, short-circuiting the automatic system that normally draws attention to its location. Without automatic control, attention is controlled entirely by slower, higherlevel mechanisms in the visual system that search the scene, object by object, until attention finally lands upon the object that is changing. Once attention has latched onto the appropriate object, the change is easy to see, however this only occurs after exhaustive serial inspection of the scene [Rensink 2001]. Change Blindness has been researched from a psychological point of view – looking into why these flaws occur and thus leading to a greater understanding of how the visual system works [Rensink 1999a; 1999b; Noe et al. 2000; O’Regan and Noe 2000]. 
Although a number of paradigms have been used to study this change detection, the three most frequently used are the 'Flicker' and 'Mudsplash' paradigms [Rensink et al. 1997] and the 'Forced Choice Detection' paradigm [Pashler 1988; Phillips 1974; Simons 1996]. In the flicker and mudsplash paradigms, the original image is displayed for approximately 240ms, followed for approximately 240ms by either a blank image (flicker paradigm) or the original image with mudsplashes superimposed (mudsplash paradigm). The modified image is then shown for 240ms, followed again by the blank image or by the modified image with mudsplashes for 240ms, and the cycle repeats, Figure 2.17. The onset of the blank image or mudsplashes recreates the visual disruption, like a blink or an eye saccade. To create the images, Rensink et al. took photographs of a variety of scenes and modified each image by making a presence or a location alteration. The observers responded as soon as they detected the changing object. Research using the flicker paradigm has produced two primary findings: 1) observers rarely detect changes during the first cycle of alternation, and some changes are not detected even after nearly one minute of alternation [Rensink et al. 1997]; and 2) changes to objects that are of 'central interest' in a scene are detected more readily than peripheral or 'marginal interest' changes [Rensink et al. 1997], suggesting that attention is focused on central objects either more rapidly or more often, thereby allowing faster change detection. Central interest aspects tend to concern what one would be tempted to call the main theme of the scene, whilst marginal interest aspects are those that are most commonly ignored in a scene. Thus, by manipulating whether the changes occur to central or marginal interest aspects, the degree of attention that subjects are expected to pay to the changes can be controlled. In Figure 2.18a the central interest aspects are the wine, cheese, bread etc., whilst the marginal interest aspect would be the brown, pleated curtain behind the foreground objects.

The forced choice detection paradigm, which was originally designed to investigate visual memory, looks at the ability to detect a change in briefly presented arrays of simple figures or letters [Pashler 1988; Phillips 1974]. In this paradigm observers receive only one viewing of each scene before responding, so the total duration of exposure to the initial scene can be controlled more precisely. For example, an initial display would be presented for 100-500 ms, followed by a brief Inter-Stimulus Interval (ISI), followed by a second display in which one of the items was removed or replaced on half the trials. The responses of the observers are forced-choice guesses about whether a change had occurred or not. Observers were found to be poor at detecting change if the old and new displays were separated by an ISI of more than 60-70 ms [Pashler 1988].

Figure 2.17: The experiment designed by O'Regan et al. [Rensink 2001].

Figures 2.18a & b: Examples of the modifications (presence and location) made by O'Regan et al. to the photographs; a) (top) shows a presence alteration where the cheese has been removed, b) (bottom) shows a location alteration where the bar behind the people has been moved [O'Regan et al. 2001].

2.3.4 Inattentional Blindness

'How much of the visual world do humans perceive when they are not attending to it?
Do humans see certain kinds of things because they have captured their attention, or is this because their perception of them is independent of their attention?’ Mack and Rock researched these questions, as well as numerous others. What they found was that there is ‘no conscious perception without attention’ [Mack and Rock 1998]. Attention primarily selects a region in space like a ‘spotlight’ (space-based attention), and objects 26 are constructed thus by virtue of attention. This refutes the previous pre-attentional perception theories. Gestalt psychologists believe that the organisation of the visual field into separate objects occurs automatically at an early stage in the processing of visual information, i.e. before focused attention can occur [Wertheimer 1924/1950]. What Mack and Rock found out also refutes that attention is inherently intentional, i.e. it must be directed to some thing. Psychologists, such as Triesman and Neisser, believed it must exist prior to the activation of attention, thus attention directly selects the objects (object-based attention), and thus attention is limited by the number of objects that can be processed at once [Triesman 1982; Neisser 1967]. Most people will agree that they have experienced occasions when this is not the case, for instance they have experienced looking without seeing. A car driver looks left down a pavement and pulls forward into a driveway. She hears a thud, looks down and sees a bicyclist on the ground near her left front bumper. Or a nurse pulls a vial from a cabinet. She looks at the label, fills the syringe and then injects the patient. The patient receives the wrong drug and dies. These are real accidents that have occurred and a large number of others occur under strikingly similar circumstances: someone performing a task simply fails to see what should have been plainly visible. Afterwards, the person cannot explain the lapse. These examples will most commonly have occurred when they have been completely absorbed in a task or concentrating intensively. Due to this effect humans can miss unattended items in a scene – this is called Inattentional Blindness and it is ‘the failure of the human to see unattended items in a scene’ [Mack and Rock 1998]. During these short periods of time, even though our eyes are open and all the objects in a scene are projected onto our retina, humans perceive very little detail about them, if anything. This is because visual attention allows the visual system to process visual input preferentially by shifting attention about an image, giving more attention to salient locations and less attention to unimportant regions. When attention is not focused onto items in a scene they can literally go unnoticed, which is what Mack and Rock state as Inattentional Blindness. Although the phenomenon has long been known, recent evidence shows that it is much more pervasive that anyone had imagined and that it is one of the major causes of accidents and human error. To understand how Inattentional Blindness occurs, it is necessary to accept a very unintuitive idea: most of our perceptual processing occurs outside of conscious 27 awareness. Our senses are bombarded with such a large amount of input, sights, sounds, smells, etc., that our minds cannot fully process it all. The overload becomes even worse when humans recall information from memory or are engaged in deep thought. 
To cope with the problem, humans have evolved a mechanism called attention, which acts as a filter that quickly examines sensory input and selects a small percentage for full processing and for conscious perception. The remaining information is lost, unnoticed and unremembered - humans are thus inattentionally blind to it, since it never reached their consciousness. This all happens without their awareness, so it is not a behaviour that people can bring under a conscious control. Research suggests that inattentional blindness is affected by four factors: conspicuity, mental workload, expectation and capacity [VEHF 2003a]. When humans are just casually looking around, sometimes an object will jump out or ‘pop-out’ of the background at you [Wang et al. 1994]. The term ‘conspicuity’ refers to this ability to capture attention. There are two general types of factors that determine conspicuity. One is sensory conspicuity, the physical properties of the object. The most important sensory factor is contrast [Gilchrist et al. 1997]. Humans see objects, not because of their absolute brightness, but by their contrast with the background. When there is higher contrast, objects are more conspicuous. For example, black cars are involved in many more accidents, presumably because they are harder to notice at night, i.e. there is less contrast between them and the surrounding environment. Humans also are more likely to notice objects which are large and which move or flicker. That’s why school buses, police cars, ambulances, railway crossings etcetera, all use flickering lights. Other factors that affect an object’s conspicuity are colour [D’Zmura 1991], size [Green 1991; Green 1992] and motion [Northdurft 1993]. Cognitive conspicuity is the other factor that determines conspicuity. It is equally or more important in its effect at drawing attention. Humans are much more likely to notice things that are relevant or familiar to them in some way [Wang et al. 1994]. Again this is where the cocktail phenomenon comes into play. Errors, however, often occur when there is a new and unusual combination of circumstances in a highly familiar circumstance. The driver who hit the bicyclist had pulled into the same driveway every day for a year and had never seen anyone. She had unconsciously learned that there wasn’t anything important to see down the sidewalk. Or the nurse that was used to picking out the same size and shape bottle for a particular drug, but one day 28 it ended up containing a different drug resulting in disastrous circumstances [VEHF 2003a]. Since the amount of attention that humans have is roughly fixed, the more attention that is focused on one task, the less there is for others. Inattentional Blindness often occurs because part of the attention is devoted to some secondary task. In theory, for example, any mental workload such as speaking on a mobile phone, working out what to cook for dinner, or carrying on a conversation with someone in the back seat can absorb some of the attentional capacity and lead to Inattentional Blindness. However, it is not always so simple. The notion that attentional capacity is constant is only approximately true. There is ample evidence that visual and auditory senses employ partially independent attentional pools. That means that an auditory task (listening to the radio) will interfere less with a visual task (seeing a pedestrian), than a second visual task would (focusing on the car ahead) [VEHF 2003b]. 
Expectation also has a powerful effect on our ability to see and to notice. For example when out with a friend you might remember more what they were wearing so if you agree to meet up in half an hour you will subconsciously be looking for, i.e. expecting to see, someone wearing that particular colour clothing. Coloured blobs are far easier to scan and search for than the finer details of facial features. This strategy usually works, but let’s say your friend has bought a new coat whilst you were separated. On an occasion such as this, you might end up walking right by her, completely blind to the other features, all highly familiar, which should have attracted your attention to your friend, but because you are expecting to see a certain colour your attention is controlled by your expectation. Inattentional Blindness accidents are usually caused by a combination of factors: low conspicuity, divided attention and high expectation or lower arousal. It is a natural consequence of our adaptive mental wiring. Humans only have a certain capacity and are therefore only able to consciously perceive a small percentage of the available information that flows into their senses leaving them blind to the rest [Wickens 2001]. 2.3.4.1 Inattentional Blindness in Magic One of the fundamental aspects of magic is the ability to perform an undetected action, or slight of hand, that goes unnoticed by the spectator. Many of these slights rely on 29 misdirecting the audience’s attention. Magic tricks rely on misdirection, therefore once this is known to the observer it becomes very obvious how the magic trick works. The failure of detecting this action is a prime example of Inattentional Blindness, in which subjects fail to notice very obvious visual events because their attention is focused elsewhere. Figure 2.19 shows a simple magic trick demonstrating this phenomenon. Figure 2.19a: Choose one of the six cards displayed above. Choose one of the six cards shown in Figure 2.19a, now memorize it. Really focus on that card, for I’m going to try to work out which card it is that you have chosen. Figure 2.19b: Can you still remember your card? I have removed the card that I think is yours from the cards displayed in Figure 2.19b. Can you still remember which card that you chose? Now look at Figure 2.19c to see if I correctly removed the card that you selected. You just experienced Inattentional Blindness! When I asked you to select a card you focused all your attention on that single card. Thus you suffered Inattentional Blindness to the other cards that were displayed. When the five cards were shown in Figure 2.19c, you immediately looked for your card only - and it’s not there. Why? None of the original six cards were displayed, but due to Inattentional Blindness you looked but didn’t attend to the other cards. Therefore you can’t remember any of them to check against the five remaining cards. Thus it looks like only your card is missing! 30 Figure 2.19c: Only five cards remain. Can you remember your card? I must have chosen your card and removed it. Is this coincidence or is it an illusion? 2.3.5 An Inattentional Paradigm As there was no previous paradigm for inattention a new one had to be developed by Mack and Rock [1998] which guaranteed that the observer would neither be expecting nor looking for the object of interest, but instead would be looking in the general area in which it was to be presented. 
It was also important to engage the subject's attention with another task, because without some distraction task it seemed possible that by default attention might settle on the only object present. This distraction task was to report which arm of a cross, presented briefly at 76cm from the observer on a PC screen, was longer. This cross was centred at the point of fixation. In the experiment the point of fixation was shown first for 1500ms; participants were told to stare at this fixation mark. Then the distraction task cross was presented on the screen for 200ms, which is generally less time than it takes to move the eyes from one location in space to another, i.e. to make a saccadic movement. Finally a pattern mask appeared for 500ms that covered the entire area of the visible screen, a circular area about 8.9 degrees in diameter. This mask was to eliminate any processing of the visual display after it disappeared from the screen. When the mask disappeared, subjects reported which line of the cross seemed to be longer than the other. This procedure was repeated for the first two or three trials, and on the third or fourth trial a critical stimulus was introduced in one of the quadrants of the cross (see Figure 2.20) within 2.3 degrees of the point of fixation. Immediately following a trial in which a critical stimulus had been presented, subjects were asked, in addition to which arm of the cross was longer, whether or not they had seen anything on the screen other than the cross figure. If observers reported they had seen something, they were asked to identify it from a selection of alternatives. In many of their experiments Mack and Rock actually asked the observers to select from the alternatives even if they had not reported seeing any other stimulus. This would indicate whether observers who correctly selected the stimulus had in fact perceived it without awareness, or had perceived it and quickly forgotten it.

Figure 2.20: Critical Stimulus in parafovea: noncritical and critical trial displays [Mack and Rock 1998].

Figure 2.21 shows the experimental ordering for the observers. Note how there is only one critical stimulus trial for each observer per section; this is because once the observer has been asked about something else on the screen, they become alerted to the fact that the experiment may not just be testing the ability to detect which arm of the cross is longer. If this is the case the observers may then be actively looking for something else as well as trying to complete the distraction cross task, and thus the observer's attention has become divided. After three or four trials with divided attention, observers were then told to ignore the distraction task and report only if there was something else present. This was labelled full attention, as the observers were paying full attention to the possibility that a critical stimulus might be present.

Inattentional Blindness Paradigm – Mack and Rock 1998
Inattention Trial (Report distraction task only)
1. Distraction Task
2. Distraction Task
3. Distraction Task and Critical Stimulus
Divided Attention Trial (No new instructions, but participants are now alerted to the fact that the experiment may be testing other criteria than the ability to correctly distinguish which arm of the cross is longer)
4. Distraction Task
5. Distraction Task
6. Distraction Task and Critical Stimulus
Full Attention Trial (Ignore distraction task; report only the presence of something else)
7. Distraction Task
8. Distraction Task
9. Distraction Task and Critical Stimulus

Figure 2.21: Table of the experimental ordering for observers [Mack and Rock 1998].

Their initial results showed that 25% of the observers failed to detect the presence of the critical stimulus in the inattention trials, whether the stimulus was a moving bar, a black or coloured square, or some coloured geometric form. Even when prompted, these observers could not pick the correct stimulus from a selection of alternatives at a rate greater than chance, which indicates that they were not perceiving the critical stimulus and then forgetting it. However, all the observers perceived the critical stimulus in the divided attention and full attention trials. This inability to perceive seemed to be caused by the fact that subjects were not attending to the stimulus but instead were attending to something else, namely the cross. This led Mack and Rock to term the phenomenon Inattentional Blindness, and to adopt the hypothesis that there is no perception without attention. It must however be noted that Mack and Rock were using the term perception to refer to explicit conscious awareness and not subliminal, unconscious, or implicit perception.

Figure 2.22: Results from the inattention paradigm, plotted as performance relative to chance against attentional condition (inattention, divided and full attention) for shape, number, colour and location stimuli. Subjects perform better than chance at recognising location, colour, and number of elements but not shape [Mack and Rock 1998].

Having introduced Inattentional Blindness, Mack and Rock raised many further questions which they wanted to answer, such as: does inattention vary with different types of stimuli (Figure 2.22)? And could the inattentional phenomenon be increased by manipulating attention? This last question led to a second set of experiments in which the cross was moved to 2 degrees from the point of fixation and, instead of the critical stimulus appearing within 2.3 degrees of the fixation mark, it was placed actually at this point, Figure 2.23. Their expectation was that this change to the experiment should eliminate inattentional blindness, because how could an observer fail to detect a stimulus presented for 200ms at the actual point of fixation? However, the opposite occurred: not only did the observers fail to identify the critical stimulus more often, but the amount of inattentional blindness more than doubled, with between 60-80% now failing to detect the critical stimulus depending on its type. This result lent even stronger support to Mack and Rock's earlier hypothesis that there is no perception without attention. 'It is to be assumed that attention normally is to be paid to objects at fixation, then when a visual task requires attending to an object placed at some distance from the fixation, attention to objects at the fixation might have to be actively inhibited' [Mack and Rock 1998]. This could then explain the fact that inattentional blindness is so much greater when the inattention stimulus is presented at fixation.

Figure 2.23: Critical stimulus at fixation: non-critical and critical stimulus displays [Mack and Rock 1998].

2.4 Summary

In this chapter the anatomy of the human visual system was covered, describing both its strengths and weaknesses. To recap in brief, the human eye only has a very small percentage of the retina where visual acuity is paramount; this area is called the fovea.
It is densely packed with cone photoreceptors, which are responsible for visual acuity. Thus to perceive anything in detail humans must move their eyes, via saccades, to locate the fovea on the particular object of interest. Human vision research has shown that humans do not actually consciously perceive anything, even if the fovea is located on a particular object, unless attention is involved [Rensink et al. 1997; Mack and Rock 1998; Simons and Levin 1998]. It is precisely this phenomenon that is the basis for the research in this thesis. 35 Chapter 3 Rendering Realistic Computer Graphical Images As described in Chapter 1 there is a great need for realistic rendering of computer graphical images in real time. The term ‘realistic’ is used broadly to refer to an image that captures and displays the effects of light interacting with physical objects, as occurs in real environments, and looks authentic to the human eye, whether it be a painting, a photograph or a computer generated image, Figure 3.1. You probably learnt this lesson in eating at seafood restaurants; if it smells like fish, it is not good fish. Well a similar principle applies in computer graphics; if it looks like computer graphics, it is not good computer graphics [Birn 2000]. There were no previously agreed-upon standards for measuring the actual realism of computer-generated images. Sometimes physical accuracy is used as the standard to be achieved, other times it is a perceptual criteria, even in many cases an undetermined ‘looks good’ criteria can be used. Thus Ferwerda [2003] proposes three different standards of realism that require consideration when evaluating computer graphical images including the criterion that needs to be met for each kind of realism. These are physical realism- in which the image provides the same visual stimulation as the scene; photo realism- in which the image produces the same visual response as the scene; and functional realism- in which the image provides the same visual information as the scene [Ferwerda 2003]. This chapter describes how different rendering models produce their realistic graphical images. 36 Figure 3.1: The goal of realistic image synthesis: an example from photography [Stroebel et al. 1986]. Rendering is fundamentally concerned with determining the ‘most appropriate’ colour (i.e. RGB) to assign to a pixel in the viewing plane, which is associated with an object modeled in a 3D scene. The colour of an object at a point that is perceived by a viewer depends on several different factors: • The geometry of the object at that point (normal direction), • The position, geometry and colour of the light sources in the scene, • The position and visual response of the viewer, • The surface reflectance properties of the object at that point, and, • The scattering by any participating media (e.g. smoke, rising hot air). Rendering algorithms differ in the assumptions they make regarding lighting and reflectance in the scene. From physics, models can be derived of how light reflects from surfaces and produces what is perceived as colour, these are called Illumination models. In general, light rays leave a light source, e.g. a lamp or the sun, which are then reflected from many surfaces finally being reflected into our eyes, or through an image plane of a camera. There are two main types of rendering illumination models, local and global. Local illumination algorithms consider lighting only from the light sources and ignore the effects of other objects in the scene (i.e. 
reflection off other objects or 37 shadowing) whilst Global illumination algorithms account for all modes of light transport [Dingliana 2004]. Illumination models can create either 1) view dependent solutions, which determine an image by solving the illumination that arrives through the viewing plane only, for example ray tracing, or 2) view independent solutions, which determine the lighting distribution in an entire scene regardless of the viewing position. Views are then taken, after the lighting simulation has been completed, by sampling the full solution to determine the view through the viewing plane, for example radiosity. 3.1 Local illumination The contribution from the light that goes directly from the light source and is reflected from the surface is called a local illumination model. So, for a local illumination model, the shading of any surface is independent from the shading of all other surfaces. The first problem that has to be addressed in order to create shaded images of threedimensional objects is the interaction of light with a surface. This may include emission, transmission, absorption, refraction, interference and reflection of light [Palmer 1999]. • Emission is when light is emitted from an object or surface, for example the sun or man-made sources, such as candles or light bulbs. Emitted light is composed of photons generated by the matter emitting the light; it is therefore an intrinsic source of light. • Transmission describes a particular frequency of light that travels through a material returning into the environment unchanged, Figure 3.2. As a result, the material will be transparent to that frequency of light. Most materials are transparent to some frequencies, but not to others. For example, high frequency light rays, such as gamma rays and X-rays, will pass through ordinary glass, but the lower frequencies of ultraviolet and infrared light will not. Figure 3.2: Light transmitted through a material. 38 • Absorption describes light as it passes through matter resulting in a decrease in its intensity, Figure 3.3, i.e. some of the light has been absorbed by the object, an incident photon can be completely removed from the simulation with no further contribution to the illumination within the environment if the absorption is great enough. Figure 3.3: Light absorbed by a material. • Refraction describes the bending of a light ray when it crosses the boundary between two different materials, Figure 3.4. This change in direction is due to a change in speed. Light travels fastest in empty space and slows down upon entering matter. The refractive index of a substance is the ratio of the speed of light in space (or in air) to its speed in the substance. This ratio is always greater than one. Figure 3.4: Light refracted through a material. • Interference is an effect that occurs when two waves of equal frequency are superimposed. This often happens when light rays from a single source travel by different paths to the same point. If, at the point of meeting, the two waves are in phase (the crest of one coincides with the crest of the other), they will combine to form a new wave of the same frequency, however the amplitude of this new wave is the sum of the amplitudes of the original waves. The process of forming this new wave is called constructive interference [Flavios 2004]. If the two waves meet out of phase (a crest of one wave coincides with a trough of the other), the result is a wave whose amplitude is the difference of the original amplitudes. 
This process is called destructive interference [Flavios 2004]. If the original waves have equal amplitudes, they may completely destroy each other, 39 leaving no wave at all. Constructive interference results in a bright spot; destructive interference produces a dark spot. • Reflection considers incident light that is propagated from a surface back into the scene. Reflection depends on the smoothness of the material’s surface relative to the wavelength of the radiation [ME 2004]. A rough surface will affect both the relative direction and the phase coherency of the reflected wave. Thus, this characteristic determines both the amount of radiation that is reflected back to the first medium and the purity of the information that is preserved in the reflected wave. A reflected wave that maintains the geometrical organization of the incident radiation and produces a mirror image of the wave is called a specular reflection, as can be seen in Figure 3.5. Figure 3.5: Light reflected off a material in different ways, from left to right, specular, diffuse, mixed, retro-reflection and finally gloss [Katedros 2004]. Bouknight [1970] introduced one of the first models for local illumination of a surface. This included two terms, a diffuse term and an ambient term. The diffuse term is based upon the Lambertian reflection model, which makes the value of the outgoing intensity equal in every direction and proportional to the cosine of the angle between the incoming light and the surface normal. The ambient term is constant and approximates diffuse inter-object reflection. Gouraud [1971] extended this model to calculate the shading across a curved surface approximated by a polygonal mesh. His method calculated the outgoing intensities at the polygon vertices, and then interpolated these values across the polygon, Figure 3.6 (middle). Phong [1975] introduced a more sophisticated interpolation scheme where the surface normal is interpolated across a polygon, and the shading calculation is performed at every visible point, Figure 3.6 (right). He also introduced a specular term. Specular reflection is when the reflection is stronger in one viewing direction, i.e. there is a bright spot called a specular highlight. This is readily apparent on shiny surfaces. For an ideal reflector, such as a mirror, the angle of incidence equals the angle of specular reflection. Although this model is not physically based, its simplicity and efficiency make it still the most commonly used local reflection model. 40 Figure 3.6: The differences between a simple computer generated polyhedral cone (left), linearly interpolated shading to give appearance of curvature (Gouraud Shading). Note Mach bands at edges of faces (middle) and a more complex shading calculation, interpolating curved surface normals (Phong Shading). This is necessary to eliminate Mach Bands (right). 3.2 Global illumination A global illumination model adds to the local illumination model, the light that is reflected from other non-light surfaces to the current surface. A global illumination model is more comprehensive, more physically correct, and produces more realistic images because it simulates effects such as colour bleeding, motion blur, caustics, soft shadows, anti-aliasing, and area light sources. Global illumination can generate images that are physically accurate. When measured data is used for the geometry and surface properties of objects in a scene, the image produced should then be practically indistinguishable from reality. 
However it is also more computationally expensive. Global illumination algorithms work by solving the rendering equation proposed by Kajiya [1986]:

Lout = LE + ∫Ω LIn ƒr cos(θ) dω

where Lout is the radiance leaving a surface, LE is the radiance emitted by the surface, LIn is the radiance of an incoming light ray arriving at the surface from light sources and other surfaces, ƒr is the bi-directional reflection distribution function of the surface, θ is the angle between the surface normal and the incoming light ray, dω is the differential solid angle around the incoming light ray, and Ω is the hemisphere of incoming directions above the surface.

The rendering equation is graphically depicted in Figure 3.7. In this figure LIn is an example of a direct light source, such as the sun or a light bulb, and L'In is an example of an indirect light source, i.e. light that is being reflected off another surface, R, to surface S. The light seen by the eye, Lout, is simply the integral of the indirect and direct light sources, modulated by the reflectance function of the surface, over the hemisphere Ω.

Figure 3.7: Graphical depiction of the rendering equation [Yee 2000].

The problem of global illumination arises because the rendering equation must be solved for each and every point in the environment. In all but the simplest cases there is no closed form solution for such an equation, so it must be solved using numerical techniques, which implies that only an approximation of the solution can be obtained [Lischinski 2003]. For this reason most global illumination computations are approximate solutions to the rendering equation. The two major types of graphics systems that use the global illumination model are radiosity and ray tracing.

3.2.1 Ray tracing

The first global illumination model was ray tracing [Whitted 1980], which addresses the problems of hidden surface removal, refraction, reflection and shadows. Rays of light are traced from the eye through the centre of each pixel of the image plane into the scene; these are called primary rays. When each of these rays hits a surface it spawns two child rays, one for the reflected light and one for the refracted light. This process continues recursively for each child ray until no object is hit, or the recursion reaches some specified maximum depth. Rays are also traced from the point of intersection to each light source; these are called shadow rays, and they account for direct illumination of the surface, Figure 3.8. If a shadow ray hits an object before intersecting with the light source(s), then the point under consideration is in shadow. Otherwise there must be a clear path from the point of intersection of the primary ray to the light source, and thus a local illumination model can be applied to calculate the contribution of the light source(s) to that surface point.

The simple ray tracing method outlined above has several problems. Due to the recursion involved, and the possibly large number of rays that may be cast, the procedure is inherently expensive. Diffuse interaction is not modelled, nor is specular interaction, other than that by perfect mirrors and filters. Surfaces receiving no direct illumination appear black; to overcome this, an indirect illumination term, referred to as ambient light, is accounted for by a constant ambient term, which is usually assigned an arbitrary value [Glassner 1989]. Shadows are hard-edged, and the method is very prone to aliasing.
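The recursive process just outlined (primary rays, reflected child rays, shadow rays and a maximum recursion depth) can be made concrete with a short sketch. The toy sphere scene, the single point light and the omission of refraction below are simplifications for illustration only; this is not the renderer used elsewhere in this thesis.

```python
import numpy as np

MAX_DEPTH = 3  # maximum recursion depth for child rays

def normalize(v):
    return v / np.linalg.norm(v)

def intersect_sphere(origin, direction, centre, radius):
    """Distance along a (normalized) ray to the nearest sphere hit, or None."""
    oc = origin - centre
    b = 2.0 * np.dot(oc, direction)
    c = np.dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b - np.sqrt(disc)) / 2.0
    return t if t > 1e-4 else None

# Toy scene: (centre, radius, diffuse colour, reflectivity) per sphere.
SCENE = [
    (np.array([0.0, 0.0, -3.0]), 1.0, np.array([0.8, 0.2, 0.2]), 0.3),
    (np.array([0.0, -101.0, -3.0]), 100.0, np.array([0.6, 0.6, 0.6]), 0.1),
]
LIGHT_POS = np.array([5.0, 5.0, 0.0])

def nearest_hit(origin, direction):
    best = None
    for centre, radius, colour, refl in SCENE:
        t = intersect_sphere(origin, direction, centre, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, centre, colour, refl)
    return best

def trace(origin, direction, depth=0):
    """Shade a ray: local diffuse term, shadow ray, and a reflected child ray."""
    hit = nearest_hit(origin, direction)
    if hit is None or depth > MAX_DEPTH:
        return np.zeros(3)                          # background, or recursion limit
    t, centre, colour, refl = hit
    point = origin + t * direction
    normal = normalize(point - centre)
    to_light = normalize(LIGHT_POS - point)
    # Shadow ray: is there a clear path from the hit point to the light source?
    in_shadow = nearest_hit(point + 1e-3 * normal, to_light) is not None
    local = colour * max(np.dot(normal, to_light), 0.0) * (0.0 if in_shadow else 1.0)
    # Reflected child ray (the refracted child ray is omitted for brevity).
    refl_dir = direction - 2.0 * np.dot(direction, normal) * normal
    return local + refl * trace(point + 1e-3 * normal, refl_dir, depth + 1)

# Example: one primary ray traced from the eye through the centre of the image plane.
print(trace(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, -1.0])))
```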
Also the result of ray tracing is a single image for that particular position of viewing plane, making it a view-dependent technique. In ray tracing each ray must be tested for intersection with every object in the scene. Thus for a scene of significant complexity the method rapidly becomes impracticable. Several acceleration techniques have been developed, which may be broadly categorized into two approaches: reducing the number of rays and reducing the number of intersection tests. Hall and Greenberg noted that the intensity of each ray is reduced by each surface it hits, thus the number of rays should be stopped before any unnecessary recursion to a great depth occurs [Hall and Greenberg 1983]. Another approach, which attempts to minimize the number of ray object intersections, is spatial subdivision. This method encloses a scene in a cube that is then partitioned into discrete regions, each of which contains a subset of the objects in the scene. Each region may then be recursively subdivided until each sub-region (voxel or cell) contains no more than a preset maximum number of objects. Several methods for subdividing space exist. Glassner [1984] proposes the use of an octree, a structure where the space is bisected in each dimension, resulting in eight child regions. This subdivision is repeated for each child region until the maximum tree depth is reached, or a region contains less than a certain number of objects. Using such a 43 framework allows for spatial coherence, i.e. the theory that similar objects in a scene affect neighbouring pixels. Rays are traced through individual voxels, with intersection tests performed only for the objects contained within, rather than for all the objects in the scene. The ray is then processed through the voxels by determining the entry and exit points for each voxel traversed by the ray until an object is intersected or the scene boundary is reached. Figure 3.8: Ray tracing. 3.2.2 Radiosity The radiosity method of computer image generation has its basis in the field of thermal heat transfer [Goral et al. 1984]. Heat transfer theory describes radiation as the transfer of energy from a surface when that surface has been thermally excited. This encompasses both surfaces that are basic emitters of energy, as with light sources, and surfaces that receive energy from other surfaces and thus have energy to transfer. This thermal radiation theory can be used to describe the transfer of many kinds of energy between surfaces, including light energy. As in thermal heat transfer, the basic radiosity method for computer image generation makes the assumption that surfaces are diffuse emitters and reflectors of energy, emitting and reflecting energy uniformly over their entire area. Thus, the radiosity of a surface is the rate at which energy leaves that surface (energy per unit time per unit area). This includes the energy emitted by the surface as well as the energy reflected from other surfaces in the scene. Light sources in the scene are treated as objects that have self-emittance. 44 Figure 3.9: Radiosity [McNamara 2000] The environment is divided into surface patches, Figure 3.9, each with a specified reflectivity, and between each pair of patches there is a form factor that represents the proportion of light leaving one patch (patch i) that will arrive at the other (patch j) [Siegel and Howell 1992]. 
Thus the radiosity equation is:

Bi = Ei + ρi Σj Bj Fij

where:
• Bi = radiosity of patch i
• Ei = emissivity of patch i (the energy emitted by patch i)
• ρi = reflectivity of patch i
• Bj = radiosity of patch j
• Fij = form factor of patch j relative to patch i, so that Σj Bj Fij is the energy reaching patch i from the other patches, and ρi Σj Bj Fij is the energy reflected by patch i.

The form factor, Fij, is the fraction of energy transferred from patch i to patch j, and the reciprocity relationship [Siegel and Howell 1992] states:

Aj Fji = Ai Fij

where Aj and Ai are the areas of patch j and i respectively, Figure 3.10.

Figure 3.10: Relationship between two patches [Katedros 2004].

As the environment is closed, the emittance functions, reflectivity values and form factors form a system of simultaneous equations that can be solved to find the radiosity of each patch. The radiosity is then interpolated across each of the patches and finally the image can be rendered.

The basic form factor equation is difficult to evaluate even for simple surfaces. Nusselt [1928] developed a geometric analog that allows the simple and accurate calculation of the form factor between a surface and a point on a second surface. The Nusselt Analog involves placing a hemispherical projection body, with unit radius, at a point on the surface Ai. The second surface, Aj, is spherically projected onto the projection body, then cylindrically projected onto the base of the hemisphere. The form factor may then be approximated by the area projected on the base of the hemisphere divided by the area of the base of the hemisphere, Figure 3.11. Cohen and Greenberg [1985] proposed that the form factor between each pair of patches could also be calculated by placing a hemicube on each patch and projecting the environment onto it as defined by the Nusselt Analog. Each face of the hemicube is subdivided into a set of small, usually square ('discrete') areas, each of which has a precomputed delta form factor value, Figure 3.12. When a surface is projected onto the hemicube, the sum of the delta form factor values of the discrete areas of the hemicube faces which are covered by the projection of the surface is the form factor between the point on the first surface (about which the cube is placed) and the second surface (the one which was projected). The speed and accuracy of this method of form factor calculation can be affected by changing the size and number of discrete areas on the faces of the hemicube.

Figure 3.11: Nusselt's analog. The form factor from the differential area dAi to element Aj is proportional to the area of the double projection onto the base of the hemisphere [Nusselt 1928].

Figure 3.12: The hemicube [Langbein 2004].

Radiosity assumes that an equilibrium solution can be reached; that all of the energy in an environment is accounted for, through absorption and reflection. It should be noted that, because of the assumption of only perfectly diffuse surfaces, the basic radiosity method is viewpoint independent, i.e. the solution will be the same regardless of the viewpoint of the image. The diffuse transfer of light energy between surfaces is unaffected by the position of the camera. This means that as long as the relative position of all objects and light sources remains unchanged, the radiosity values need not be recomputed for each frame. This has made the radiosity method particularly popular in architectural simulation, for high-quality walkthroughs of static environments.
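The system of simultaneous equations just described can also be illustrated numerically. In the sketch below the three-patch 'scene', its emissivities, reflectivities and form factors are invented purely for illustration, and a simple Jacobi-style gathering iteration stands in for the more sophisticated solvers used in practice.

```python
import numpy as np

# Toy three-patch enclosure with made-up values: emissivities E, reflectivities rho
# and form factors F[i][j] (fraction of energy leaving patch i that reaches patch j).
E   = np.array([1.0, 0.0, 0.0])            # patch 0 acts as the light source
rho = np.array([0.0, 0.7, 0.5])
F   = np.array([[0.0, 0.6, 0.4],
                [0.3, 0.0, 0.4],
                [0.2, 0.4, 0.0]])

def solve_radiosity(E, rho, F, iterations=50):
    """Iterate B_i = E_i + rho_i * sum_j F_ij * B_j until it settles (gathering)."""
    B = E.copy()
    for _ in range(iterations):
        B = E + rho * (F @ B)               # F @ B computes sum_j F_ij * B_j for all i
    return B

print("patch radiosities:", solve_radiosity(E, rho, F))
```

Because the reflectivities are below one, each pass redistributes a little less energy than the last, so the iteration converges to the equilibrium solution assumed by the method.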
Figure 3.13 demonstrates the difference in image quality that can be achieved with radiosity compared to ray tracing. However, there are several problems with using the hemicube radiosity method. It can only model diffuse reflection in a closed environment, it is limited to polygonal 47 environments, prone to aliasing and has excessive time and memory requirements. Also, only after all the radiosities have been computed in the scene is the resultant image displayed. There is a form factor between each pair of patches, so in an environment with N patches, N2 form factors must be stored. For a scene of moderate complexity this will require a vast amount of storage, and as the form factor calculation is non-trivial the time taken to produce a solution can be extensive. This means that the user is unable to alter any of the parameters of the environment until the entire computation is complete. Then once the alteration is made, the user must once again wait until the full solution is recomputed. The visual quality of the rendered images in radiosity also strongly depends on the method employed for discretizing the scene into patches. A too fine discretization may give rise to artefacts, while with a coarse discretization, areas with high radiosity gradients may appear [Gibson and Hubbold 1997]. To overcome these problems, the discretization should adapt to the scene. That is, the interaction between two patches should account for the distance between them as well as their surface area. In other words, surfaces that are far away are discretized less finely than surfaces that are nearby. These aspects are considered by the adaptive discretization method proposed by Languénou et al. [1992]. It performs both discretization and system resolution at each iteration of the shooting process, which allows for interactivity. Gibson and Hubbold [1997] demonstrated another solution for this problem by presenting an oracle that stops patch refinement once the difference between successive levels of elements becomes perceptually unnoticeable. Progressive refinement radiosity [Cohen et al. 1988] works by not attempting to solve the entire system simultaneously. Instead, the method proceeds in a number of passes and the result converges towards the correct solution. At each pass, the patch with the greatest unshot radiosity is selected, and this energy is propagated to all other patches in the environment. This is repeated until the total unshot radiosity falls below some threshold. Progressive refinement radiosity generally yields a good approximation to the full solution in far less time and with lesser storage requirements, as the form factors do not all need to be stored throughout. Many other extensions to radiosity have been developed, a very comprehensive bibliography of these techniques can be found in [Ashdown 2004]. 48 Figure 3.13: The difference in image quality between ray tracing (middle) and radiosity (right hand image). 3.3 Radiance Radiance is a physically based lighting simulation tool for highly accurate visualization of lighting in virtual environments [Ward 1994, Ward Larson and Shakespeare 1998]. It synthesizes images from three-dimensional geometric models of physical environments. The input model may often contain thousands of surfaces and for each there must be a description of its shape, size, location, and composition. Once the geometry has been defined the information can be compiled into an octree [Glassner 1984]. 
As stated in Section 3.2.1, the octree data structure is necessary to accelerate the ray tracing process, i.e. for efficient rendering. Radiance employs a light-backwards ray-tracing method, extended from the original algorithm introduced to computer graphics by Whitted in 1980 [Whitted 1980], to achieve accurate simulation of the propagation of light through an environment. It uses a hybrid deterministic/stochastic ray tracing approach to efficiently solve the rendering equation, while maintaining an optimum balance between speed of computation and accuracy of the solution. Light is followed along geometric rays from the viewpoint into the scene and back to the light sources. The result is mathematically equivalent to following light forward, but the process is generally more efficient because most of the light leaving a source never reaches the point of interest. The chief difficulty of light-backwards ray tracing as practiced by most rendering software is that it is an incomplete model of light interaction. In particular, as stated in Section 3.2.1, this type of algorithm fails for diffuse interreflection between objects, which is usually approximated with a constant ambient term in the illumination equation. Without a complete computation of global illumination, a rendering method cannot produce accurate values and is therefore of limited use as a predictive tool for lighting visualisation. Radiance overcomes this shortcoming with an efficient algorithm for computing and caching indirect irradiance values over surfaces, while also providing more accurate and realistic light sources and surface materials [Ward and Heckbert 1992]. A comprehensive description of how Radiance renders its realistic images can be found at [Radiance 2000] or by reading Rendering with Radiance - The Art and Science of Lighting Visualization [Ward Larson and Shakespeare 1998].

A strictly deterministic ray tracer produces exactly the same rendering each time it is run; by contrast, a stochastic renderer employs random processes, and thus each time the algorithm is repeated it will produce slightly different results. Light itself is stochastic, and it is only the very large number of photons in a scene that gives the impression that it is stable at any one point [Stam 1994]. Therefore a stochastic renderer can produce a more accurate outcome; however, it is quite time-consuming to achieve a noise-free solution.

Figure 3.14: Renderings of a simple environment. Ray traced Solution (left), Radiosity Solution (center), and Radiance Solution (right) [McNamara 2000].

Studies have shown that Radiance is capable of producing highly realistic and accurate imagery [Khodulev and Kopylov 1996; McNamara 2000]. It has been used to visualize the lighting of homes, apartments, offices, churches, archaeological sites, museums, stadiums, bridges, and airports [Radiance 2000]. It has also answered questions about light levels, aesthetics, daylight utilization, visual comfort and visibility, energy savings potential, solar panel coverage, computer vision, and circumstances surrounding accidents [Ward Larson and Shakespeare 1998]. For these reasons Radiance was chosen as the lighting simulation package for many of the experiments presented in this thesis.
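To illustrate why a stochastic estimate is noisy yet converges as more samples are taken, the sketch below forms a Monte Carlo estimate of the indirect irradiance arriving over the hemisphere above a point, using cosine-weighted sampling. The incoming-radiance function is a hypothetical stand-in for tracing rays into a scene; this is not Radiance's actual sampling code.

```python
import math, random

def sample_cosine_hemisphere():
    """Cosine-weighted direction about the surface normal (the z axis)."""
    u1, u2 = random.random(), random.random()
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u1)))

def incoming_radiance(direction):
    """Hypothetical stand-in for tracing a ray into the scene and shading the hit:
    a bright region of the hemisphere and a dimmer one everywhere else."""
    x, y, z = direction
    return 3.0 if x > 0.5 else 0.5

def indirect_irradiance(samples):
    """Monte Carlo estimate of the integral of L_in * cos(theta) over the hemisphere.
    With cosine-weighted sampling the pdf is cos(theta)/pi, so each sample's
    contribution reduces to pi * L_in and the estimate is pi * mean(L_in)."""
    total = 0.0
    for _ in range(samples):
        total += incoming_radiance(sample_cosine_hemisphere())
    return math.pi * total / samples

random.seed(1)
for n in (16, 256, 4096):
    print(f"{n:5d} samples -> irradiance estimate {indirect_irradiance(n):.3f}")
```

Run repeatedly, the low-sample estimates scatter noticeably while the high-sample estimates cluster tightly, which is exactly the trade-off between noise and computation time noted above.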
To give an idea of the differences between these three approaches, ray tracing, radiosity and Radiance, Figure 3.14 shows, from left to right, a ray traced image, an image generated using radiosity and finally an image computed with the Radiance lighting simulation package [McNamara 2000].

3.4 Visual Perception in Computer Graphics

Creating realistic computer graphical images is an expensive operation, and therefore any saving of computational costs without reducing the perceived quality must be substantially beneficial. Since human observers are the final judges of the fidelity and quality of the resulting images, visual perception issues should be involved in the process of creating these realistic images, and can be considered at the various stages of computation, rendering and display [Chalmers et al. 2000].

3.4.1 Image Quality Metrics

Typically the quality of synthesised images is evaluated using numerical techniques that attempt to quantify the fidelity of the images by a pair-wise comparison; this is often a direct comparison to a photograph taken of the scene that is being recreated. Several image quality metrics have been developed whose goal is to predict the differences between a pair of images. Mean Squared Error (MSE) is a simple example of such a metric. This kind of error metric is known as physically based. However, to create more meaningful measures of fidelity which actually correspond to the assessments made by humans when viewing images, error metrics should be based on computational models of the human visual system. These are known as perceptual error metrics. For these metrics a better understanding of the Human Visual System is needed, which can lead to more effective comparisons of images, but can also steer image synthesis algorithms to produce more realistic, reliable images and, as previously stated, avoid realistically synthesizing any feature that is simply not visible to the Human Visual System [Daly 1993; Myszkowski 1998]. Perceptual error metrics operate on two intermediate images of a global illumination solution in order to determine if the visual system is able to differentiate these two images. They inform a rendering algorithm when and where it can stop an iterative calculation prematurely because the differences between the present calculation and the next are not perceptually noticeable. Thus, perceptually-based renderers attempt to expend the least amount of work to obtain an image that is perceptually indistinguishable from the fully converged solution, Figure 3.15.

Figure 3.15: Conceptually how a perceptually-assisted renderer makes use of a perceptual error metric to decide when to halt the rendering process [Yee 2000].

One such perceptual metric is the Visual Difference Predictor (VDP) [Daly 1993]. The algorithm consists of three major components, as shown in Figure 3.16: a calibration component, which transforms the two input images into values that can be understood by the second component, a model of the human visual system; the difference between the responses of this human visual system model to the two images is then visualized by the third component, the difference visualization component. The VDP output is a difference image map that predicts the probability of detection of the visual differences between the two images for every pixel.
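The stopping behaviour sketched conceptually in Figure 3.15 can be made concrete with a deliberately simplified example that uses plain MSE, the physically based metric mentioned earlier, in place of a perceptual model. The refine() argument and the threshold value are hypothetical placeholders standing in for a real progressive renderer and its tuning.

    def mse(image_a, image_b):
        # Mean squared error over two images stored as flat lists of pixel values.
        return sum((a - b) ** 2 for a, b in zip(image_a, image_b)) / len(image_a)

    def render_until_stable(initial_solution, refine, threshold=1e-4):
        previous = initial_solution
        while True:
            current = refine(previous)               # lighting solution N+1
            if mse(previous, current) < threshold:   # predicted difference negligible
                return current                       # stop: further refinement is wasted effort
            previous = current                       # otherwise keep refining

A perceptually assisted renderer replaces mse() with a model of the human visual system, such as the VDP described next, so that the loop stops when the remaining differences are predicted to be invisible rather than merely numerically small.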
Figure 3.16: Block structure of the Visual Difference Predictor [Prikryl and Purgathofer 1999].

The first stage, the calibration process, takes a number of input parameters which describe the conditions for which the VDP will be computed; these include the viewing distance of the observer, the pixel spacing and the necessary values for the display mapping. The human visual model used in the VDP concentrates on the lower-order processing of the visual path, i.e. that from the retina to the visual cortex. The model addresses three main sensitivity variations of the human visual system: the dependence of sensitivity on the illumination level; on the spatial frequency of visual stimuli; and on the signal content itself. Thus the first part of the human visual system model (see Figure 3.17), to account for the variations in sensitivity as a function of light level, is the application of a nonlinear response function to the luminance channel of each of the images. This also accounts for adaptation and the non-linear response of the retinal neurons. Next the image is converted to the frequency domain and the Contrast Sensitivity Function (CSF) is used to determine the visual sensitivity to spatial patterns in the retinal response image. The transformed data is weighted with the CSF, i.e. the scaled amplitude for each frequency is multiplied by the CSF for that spatial frequency. This data is then normalized by dividing each point by the original image mean to give local contrast information. The Contrast Sensitivity Function is an experimentally derived equation that quantifies the human visual sensitivity to spatial frequencies, as described in Section 2.2.2. The image is then divided into 31 independent streams. It is known that the human visual system has specific selectivity based on orientation (6 channels) and spatial frequency (approximately one octave per channel). Each of the five overlapping spatial frequency bands is combined with each of the six overlapping orientation bands to split the image into 30 channels. Along with the orientation-independent base band this gives a total of 31 channels. The individual channels are then transformed back into the spatial domain. A mask is associated with each channel; this mask function models the dependency of sensitivity on the signal contents due to the postreceptoral neural circuitry [Ferwerda et al. 1997]. The product of the CSF and the masking function is known as the threshold elevation factor. Contrasts of corresponding channels in one image are subtracted from those of the other image, and the difference is scaled down by the elevation factor, i.e. to weight the spatial frequency and orientation signals. The scaled contrast differences are used as the argument to a psychometric function to compute a detection probability. The psychometric function yields a probability of detection of a difference for each location in the image, for each of the 31 channels. The detection probabilities for each of the channels are finally combined to derive a per-pixel probability of detection value. The Sarnoff Visual Discrimination Model (VDM) [Lubin 1995] is another well-designed perceptual metric that is also used for determining the perceptual differences between two images.
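Before turning to the VDM, the final two VDP stages just described, the psychometric function and the combination of per-channel detection probabilities, can be illustrated with the following sketch. The exponential form of the psychometric function and the slope value BETA are typical values quoted in the literature and are assumptions here, not necessarily the exact parameters of Daly's implementation.

    import math

    BETA = 3.5   # slope of the psychometric function (assumed typical value)

    def detection_probability(normalised_difference):
        # Input: a channel's contrast difference already divided by its threshold
        # elevation factor, so a value of 1.0 means "exactly at threshold".
        return 1.0 - math.exp(-abs(normalised_difference) ** BETA)

    def per_pixel_probability(channel_differences):
        # Probability summation over the 31 channels: the difference is detected
        # if it is seen in at least one channel.
        p_missed_in_all = 1.0
        for d in channel_differences:
            p_missed_in_all *= 1.0 - detection_probability(d)
        return 1.0 - p_missed_in_all

Evaluating per_pixel_probability() for every pixel location yields the probability-of-detection map that forms the VDP output.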
The VDM takes two images, specified in CIE XYZ colour space, along with a set of parameters for the viewing conditions as input, and outputs a Just Noticeable Difference (JND) map. One JND corresponds to a 75% probability that an observer viewing the two images would detect a difference [Lubin 1997]. The VDM focuses more attention on modeling the physiology of the visual pathway and therefore operates in the spatial domain, unlike the VDP, which operates in the frequency domain. The main components of the VDM include spatial re-sampling, wavelet-like pyramid channelling, a transducer for JND calculations and a final refinement step to account for CSF normalization and dipper effect simulation. Li et al. [1998] discuss and compare in depth both the VDP and VDM metrics.

Figure 3.17: An overview of the Visual Difference Predictor, demonstrating in more detail the ordering of the processes that are involved [Yee 2000].

One example of the application of a perceptual metric is that proposed by Myszkowski [1998], which uses the quantitative measurements of visibility produced by the VDP developed by Daly [1993] to improve both the efficiency and effectiveness of progressive global illumination computation. The technique uses a mixture of stochastic (density estimation) and deterministic (adaptive mesh refinement) algorithms in a sequence, and is optimized to reduce the differences between the intermediate and final images as perceived by the human observer in the course of the lighting computation. In the Myszkowski [1998] model, the VDP responses are used to support the selection of the best component algorithms from a pool of global illumination solutions, and to enhance the selected algorithms for even better progressive refinement of the image quality. The VDP is also used to determine the optimal sequential order of the component-algorithm execution, and to choose the points at which switchover between the algorithms should take place. However, as the VDP is computationally expensive, it is applied in this method exclusively at the design and tuning stages of the composite technique, and so perceptual considerations are embedded into the resulting solution, although no actual VDP calculations are performed during lighting simulation [Volevich et al. 2000] (Figure 3.18). This global illumination technique provides intermediate image solutions of high quality at unprecedented speeds, even for complex scenes; Myszkowski [1998] quotes a speedup of roughly 3.5 times. Myszkowski et al. [1999] addressed the perceptual issues relevant to rendering dynamic environments and proposed a perception-based spatiotemporal Animation Quality Metric (AQM), which was designed specifically for handling synthetic animation sequences. They incorporated the spatiotemporal sensitivity of the human visual system into the Daly VDP model [Daly 1993]. Thus the central part of the AQM is the model for the spatiovelocity Contrast Sensitivity Function (CSF), which specifies the detection threshold for a stimulus as a function of its spatial and temporal frequencies [Kelly 1979]. Myszkowski et al.'s framework assumes that all objects in the scene are tracked by the human eye. The tracking ability of the eye is very important in the consideration of spatiotemporal sensitivity [Daly 1998].
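To give a flavour of such a spatiovelocity CSF, the sketch below uses the functional form of Kelly's [1979] model as adapted by Daly [1998], in the shape in which it is commonly reproduced in the literature. The constants are typical published values and are assumptions here; they may not match the exact parameterisation used in the AQM.

    import math

    def spatiovelocity_csf(rho, v):
        """Contrast sensitivity for a stimulus of spatial frequency rho
        (cycles per degree) moving across the retina at velocity v (degrees per second)."""
        c0, c1, c2 = 1.14, 0.67, 1.7                  # adjustment constants (assumed)
        rv = max(c2 * v, 0.1)                         # avoid the degenerate v = 0 case
        k = 6.1 + 7.3 * abs(math.log10(rv / 3.0)) ** 3
        rho_max = 45.9 / (rv + 2.0)                   # peak frequency falls as retinal speed rises
        return k * c0 * rv * (c1 * 2.0 * math.pi * rho) ** 2 \
                 * math.exp(-c1 * 4.0 * math.pi * rho / rho_max)

Sensitivity to high spatial frequencies collapses as retinal velocity increases, which is why degradations in fast-moving, untracked image regions tend to go unnoticed; the conservative assumption that every object is tracked removes exactly this source of savings, as discussed next.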
If a conservative approach is taken, as with Myszkowski et al. [1999], and all objects are assumed to be tracked, then this effectively reduces a dynamic scene to a static scene, thus negating the benefits of spatiotemporally-based perceptual acceleration [Yee 2000]. The human visual system model that Myszkowski et al. used in the AQM was also limited to the modeling of the early stages of the visual path, i.e. that from the retina to the visual cortex; Osberger [1999] showed that adding further extensions to such early vision models does not produce any significant gain. Myszkowski et al. [2001] did demonstrate applying the AQM to guide the global illumination computation for dynamic environments. They reported a speedup of indirect lighting computation of over 20 times for their tested scenes, describing the resulting animation quality as much better than the frame-by-frame approach [Myszkowski et al. 2001]. Their test scenes were composed of fewer than 100,000 triangles; however, they foresee that a similar speedup would be achieved for more complex scenes, because collecting photons in the temporal domain is always cheaper than shooting them from scratch for each frame.

Figure 3.18: (a) shows the computation at 346 seconds, (b) depicts the absolute differences of pixel intensity between the current and fully converged solutions, (c) shows the corresponding visible differences predicted by the VDP, (d) shows the fully converged solution which is used as a reference [Volevich et al. 2000].

Bolin and Meyer [1998] also used a perceptual metric to guide their global illumination algorithm. However, instead of using the VDP they used a computationally efficient and simplified variant of the Sarnoff Visual Discrimination Model (VDM) [Lubin 1995]. They used the upper and lower bounded images from the computation results at intermediate stages and used the predictor to get an error estimate for that particular stage. The image quality model was then used to control where to take samples in the image, and also to decide when enough samples had been taken across the entire image, providing a visual stopping condition. Their simplified metric executed in 1/60th of the time of the full Sarnoff VDM, and rendered their example images in only 10% to 28.1% of the time taken to render the images using either uniform or objective sampling techniques, performing well for all test cases even when the uniform and objective sampling techniques failed. Ferwerda et al. [1996] developed a computational model of visual adaptation for realistic image synthesis based on psychophysical experiments. The model captured the changes in threshold visibility, colour appearance, visual acuity, and sensitivity over time, all of which are caused by the visual system's adaptation mechanisms. They used the model to display the results of global illumination simulations illuminated at intensities ranging from daylight down to starlight. The resulting images better capture the visual characteristics of scenes viewed over a wide range of illumination levels. Because the model is based on psychophysical data, it can be used to faithfully predict the visibility and appearance of scene features.
This allows their model to be used as the basis of perceptual error metrics to limit the precision of global illumination calculations based on visibility and appearance criteria and could eventually lead to time-critical global illumination rendering algorithms that achieve real-time rates [Ferwerda et al. 1996]. Greenberg et al. 1997 had the now common goal to develop a physically based lighting model that encompassed a perceptually based rendering procedure to produce synthetic images that were visually and measurably indistinguishable from real-world images. To obtain fidelity of their physical simulation they subdivided their research into three parts: the local light reflection model, the energy transport phase, and the visual display procedures. The first two subsections being physically based, and the last perceptually based. The approaches and algorithms that they proposed were at the time not practical, and required excessive computational resources. However, it yielded an important scientific insight into the physical processes of light reflection and light transport and clarified the computational bottlenecks. Pattanaik et al. [1998; 2000] introduced a visual model for realistic tone reproduction. The model is based on a multi-scale representation of luminance, pattern, and colour processing in the human visual system, and provides a coherent framework for understanding the effects of adaptation on spatial vision. The model also accounts for the changes in threshold visibility, visual acuity, and colour discrimination, as well as suprathreshold brightness, colourfulness and the apparent contrast that occur with changes in the level of illumination in scenes. They then apply their visual model to the problem of realistic tone reproduction and develop a tone reproduction operator to solve 58 this quandary. Their method however is a static model of vision, thus they propose that future models should incorporate knowledge about the temporal aspects of visual processing in order to allow both dynamic scenes, and scenes where the level of illumination is dynamically changing to properly display images. Ramasubramanian et al. [1999] reduced the cost of using such metrics as VDP and VDM by decoupling the expensive spatial frequency component evaluation from the perceptual metric computation. They argued that the spatial frequency content of the scene does not change significantly during the global illumination computation step. Therefore they proposed pre-computing this information from a cheaper estimate of the scene image. They reused the spatial frequency information during the evaluation of the perceptual metric, without having to recalculate it at every iteration of the global illumination computation. They carried out this pre-computation from the direct illumination solution of the scene. However their technique does not take into account any sensitivity loss due to motion and hence is not well suited for use in dynamic environments. McNamara [2000] and McNamara et al. [2001] introduced a method for measuring the perceptual equivalence between a real scene and a computer simulation of the same scene. The model developed was based on human judgments of lightness when viewing the real scene, a photograph of the real scene and nine different computer graphics simulations including a poorly meshed radiosity solution and a ray traced image. 
The results of their experiments with human participants showed that certain rendering solutions, such as the tone-mapped one, were of the same perceptual quality as a photograph of the real scene. Dumont et al. [2003] propose using efficient perceptual metrics within a decision theoretic framework to optimally order rendering operations, to produce images of the highest visual quality given the system constraints of commodity graphics hardware. They demonstrate their approach by using their framework to develop a cache management system for a hardware-based rendering system that uses map-based methods to simulate global illumination effects. They show that using their framework significantly increases the performance of the system and allows for interactive walkthrough of scenes with complex geometry, lighting and material properties. 59 3.4.2 Simplification of complex models Other research has investigated how complex detail in the models can be reduced without any reduction in the viewer’s perception of the models, for example Maciel and Shirley’s visual navigation system which uses texture mapped primitives to represent clusters of objects to maintain high and approximately constant frame rates [Maciel and Shirley 1995]. Their approach works by ensuring that each visible object, or a cluster that includes it, is drawn in each frame for the cases where there are more unoccluded primitives inside the viewing frustum than can be drawn in real-time on a workstation. Their system also supports the use of traditional Level-Of-Detail (LOD) representations for individual objects, and supports the automatic generation of a certain type of LOD for objects and clusters of objects. The system supports the concept of choosing a representation from among those associated with an object that accounts for the direction from which the object is viewed. The system as a whole can be viewed as a generalization of the level-of-detail concept, where the entire scene is stored as a hierarchy of levels-of-detail that is traversed top-down to find a good representation for a given viewpoint. However their system does not assume that visibility information can be extracted from the model and thus it is especially suited for outdoor environments. Luebke et al. [2000] presented a polygonal simplification method which uses perceptual metrics to drive the local simplification operations, rather than the geometric metrics common to the other algorithms in this field. Equations derived from psychophysical studies which they ran determine whether the simplification operation will be perceptible; they then only perform the operation if its effect is judged imperceptible. To increase the range of the simplification, they use a commercial eye tracker to monitor the direction of the user’s gaze allowing the image to be simplified more aggressively in the periphery than at the centre of vision. Luebke and Hallen [2001] extended this work by presenting a framework for accelerating interactive rendering produced on their psychophysical model of visual perception. In their user trial the 4 subjects could perform not better than chance in perceiving a difference between a rendering of a full-resolution model and a rendering of a model simplified with their algorithm in 200 trials; thus showing that perceptually driven simplification can reduce model complexity without any perceptually noticeable visual effects, Figure 3.19. 
60 Figure 3.19: The original Stanford Bunny model (69,451 faces) and a simplification made by Luebke and Hallen’s perceptually driven system (29,866 faces). In this view the user’s gaze is 29° from the centre of the bunny [Luebke and Hallen 2001]. Marvie et al. [2003] propose a new navigation system, which is built upon a client-server framework. With their system, one can navigate through a city model, represented with procedural models that are transmitted to clients over a low bandwidth network. The geometry of the models that they produce is generated on the fly and in real time at the client side. Their navigation system relies on several different kinds of pre-processing, such as space subdivision, visibility computation, as well as a method for computing some parameters used to efficiently select the appropriate level of detail of the objects in the scene. Both deciding on the appropriate level of detail and the visibility computation are automatically performed by the graphics hardware. Krivánek et al. [2003] devised a new fast algorithm for rendering the depth-offield effect for point-based surfaces. The algorithm handles partial occlusion correctly, does not suffer from intensity leakage and it renders depth-of-field in presence of transparent surfaces. The algorithm is new in that it exploits the level-of-detail to select the surface detail according to the amount of depth-blur applied. This makes the speed of the algorithm practically independent of the amount of depth-blur. Their proposed algorithm is an extension of the Elliptical Weighted Average (EWA) surface splatting [Heckbert 1989]. It uses mathematical analysis to increase the screen space EWA surface splatting to handle depth-of-field rendering with level-of-detail. 3.4.3 Eye Tracking The new generation of eye-trackers replaces the unsatisfactory situation of former laboratory studies (with subject’s immobilisation, prohibited speech, darkness, recalibration of the method after every blink etc.) by ergonomically acceptable 61 measurement of gaze direction in a head-free condition. The new methods are fast and increasingly non-invasive. This means that eye tracking no longer interferes with the activities the participant is trying to carry out. As gaze-direction is the only reliable (albeit not ideal) index of the locus of visual attention there is an almost open-ended list of possible applications of this methodology of eye tracking. Gaze-contingent processing can be used for enhancing low-bandwidth communication, firstly by an appropriate selection of information and channels and, secondly, by transmission with high resolution of only those parts of an image which are at the focus of attention. In this way also low-bandwidth channels can be optimally exploited, e.g. in Virtual Reality applications. There is, however, a general problem on the way to realisation of most of these applications as ‘not every visual fixation is filled with attention because our attention can be directed inward, on internal transformation of knowledge’ [Challis et al. 1996], and as discussed in section 2.3.5, ‘without attention there is no conscious perception’ [Mack and Rock 1998]. This is why knowledge of the actual limits of human visual processing of information is needed to fully take advantage of this exploitation. 3.4.4 Peripheral vision This research builds on human vision work that indicates the human eye only processes detailed information from a relatively small part of the visual field [Osterberg 1935]. Watson et al. 
[1997a; 1997b] proposed a paradigm for the design of systems that manage level of detail in virtual environments. They performed a user study to evaluate the effectiveness of high detail insets used in head-mounted displays. Subjects performed a search task with different display types. Each of these display types was a combination of two independent variables: peripheral resolution and the size of the high level of detail inset, see Figure 3.20. The high detail inset they used was rectangular and was always presented at the fine level of resolution. The level of peripheral resolution was varied at three possible levels; fine resolution (320x240), medium resolution (192x144) and coarse (64x48). There were three inset sizes; the large inset size was half the complete display’s height and width, the small inset size was 30% of the complete display’s height and width, the final size was no inset at all. 62 Figure 3.20: Watson et al.’s experimental environment as seen with the coarse display [Watson et al. 1997b]. Their results showed observers found their search targets faster and more accurately for the fine resolution no inset condition, however it was not significantly better than the fine resolution inset displays with either medium or coarse peripheral resolutions. Thus peripheral level of detail degradation can be a useful compromise to achieve desired frame rates. Watson et al. are continuing to work on measuring and predicting visual fidelity for simplifying polygonal models [Watson et al. 2000; 2001]. McConkie and Loschky [1997; 2000] and Loschky et al. [1999; 2001] had observers examining complex scenes with an eye linked multiple resolution display, which produces high visual resolution only in the region to which the eyes are directed (Figure 3.21). Image resolution and details outside this ‘window’ of high resolution are reduced. This significantly lowers bandwidth as many interactive single-user image display applications have prohibitively large bandwidth requirements. Their recent study measured viewers’ image quality judgments and their eye movement parameters, and found that photographic images filtered with a window radius of 4.1 degrees produced results statistically indistinguishable from that of a full high-resolution display. This approach did, however, encounter the problem of keeping up with updating the multi-resolutional display after an eye movement without disturbing the visual processing. The work has shown that the image needs to be updated after an eye saccade within 5 milliseconds of a fixation otherwise the observer will detect the low resolution. These high update rates were only achievable by using an extremely high temporal 63 resolution eye tracker and by pre-storing all possible multi-resolutional images that were to be used. Figure 3.21: An example of McConkie’s work with an eye linked multiple resolution display [McConkie and Loschky 1997]. 3.4.5 Saliency models Saliency models determine what is visually important within the whole scene. It is based on the idea first advanced by Koch and Ullman [1985] of the existence, in the brain, of a specific visual map encoding for local visual conspicuity. The purpose of this saliency map is to represent the conspicuity, or saliency, at every location in the visual field by a scalar quantity, and to guide the selection of attended locations, based on the spatial distribution of saliency [Wooding 2002]. 
A combination of the feature maps provides bottom-up input to the saliency map, modelled as a dynamical neural network. The system developed by Itti et al. [1998] attempts to predict, given an input image, which Regions Of Interest (ROIs) in the image will automatically and unconsciously draw the viewer's attention towards them. This biologically inspired system takes an input image which is then decomposed into a set of multi-scale neural feature maps that extract local spatial discontinuities in the modalities of colour, intensity and orientation [Itti et al. 1998]. All feature maps are then combined into a unique scalar saliency map that encodes for the salience of a location in the scene irrespective of the particular feature that detected this location as conspicuous. A winner-takes-all neural network then detects the point of highest salience in the map at any given time, and draws the focus of attention towards this location. In order to allow the focus of attention to shift to the next most salient target, the currently attended target is transiently inhibited in the saliency map. This is a mechanism that has been extensively studied in human psychophysics and is called inhibition-of-return [Itti and Koch 2001]. The interplay between winner-takes-all and inhibition-of-return ensures that the saliency map is scanned in order of decreasing saliency by the focus of attention, and generates the model's output in the form of spatio-temporal attentional scanpaths, as can be seen in Figures 3.22 and 3.23.

Figure 3.22: How the saliency map is created from the feature maps of the input image [Itti 2003a].

Figure 3.23: Diagrams to show how the saliency model has inhibited the first fixational point that the system has highlighted as the most salient target, so the next most salient point can be found. Left: image showing the first fixation point; right: the corresponding saliency map with the fixation point inhibited [Itti 2003b].

Yee [2000] and Yee et al. [2001] presented a method to accelerate global illumination computation in pre-rendered animations by adapting the model of visual attention by Itti and Koch to locate regions of interest in a scene and to modulate spatiotemporal sensitivity (Figure 3.24). They create a spatiotemporal error tolerance map [Daly 1998], constructed from data based on velocity dependent contrast sensitivity, and a saliency map [Itti and Koch 2000] for each frame in the animation. The saliency map is, as previously stated, obtained by combining the conspicuity maps of intensity, colour, and orientation with the new addition of motion. This creates an image where bright areas denote greater saliency, where attention is more likely to be drawn (Figure 3.24c). An Aleph map is then created by combining the spatiotemporal error tolerance map with the saliency map (Figure 3.24d). This resulting Aleph map is then used as a guide to indicate where less rendering effort should be spent in computing the lighting solution, and thus significantly reduce the overall computational time to produce their animations; Yee et al. [2001] demonstrated a 4-10 times speedup over the time it would have taken to render the image in full.

Figure 3.24: (a) Original image. (b) Image rendered using the Aleph map. (c) Saliency map of the original image. (d) Aleph map used to re-render the original image [Yee 2000].
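A heavily simplified, single-channel version of the bottom-up saliency computation described above can be sketched as follows. It treats only the intensity modality, uses plain box blurs in place of the model's multi-scale neural feature maps, and omits the winner-takes-all and inhibition-of-return dynamics, so it illustrates the centre-surround idea rather than reproducing Itti et al.'s system.

    def box_blur(img, radius):
        # Crude box blur over a 2D list of luminance values.
        h, w = len(img), len(img[0])
        out = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                total, count = 0.0, 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            total += img[yy][xx]
                            count += 1
                out[y][x] = total / count
        return out

    def intensity_saliency(img, centre_radius=1, surround_radius=4):
        # Centre-surround difference: locations that differ strongly from their
        # neighbourhood are conspicuous, mirroring the model's feature maps.
        centre = box_blur(img, centre_radius)
        surround = box_blur(img, surround_radius)
        sal = [[abs(c - s) for c, s in zip(cr, sr)] for cr, sr in zip(centre, surround)]
        peak = max(max(row) for row in sal) or 1.0
        return [[v / peak for v in row] for row in sal]   # normalised to [0, 1]

In Yee et al.'s framework a map of this kind, extended with colour, orientation and motion channels, is then combined with the spatiotemporal error tolerance map to form the Aleph map that steers rendering effort.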
66 Haber et al.’s [2001] perceptually-guided corrective splatting algorithm presents an approach for interactive navigation in photometrical complex environments with arbitrary light scattering characteristics. The model operates by evaluating a projected image on a frame-by-frame basis. At the preprocessing stage they use a Density Estimation Particle Tracing technique [Volevich et al. 2000], which is capable of handling surfaces with arbitrary light scattering functions during global illumination computation. They also perform mesh-to-texture post-processing [Myszkowski and Kunii 1995] to reduce the complexity of the mesh-based illumination maps. During interactive rendering, they use graphics hardware to display the precomputed view-independent part of the global illumination solution using illumination maps. Objects with directional characteristics of scattered lighting are ray traced in the order corresponding to their saliency as predicted by the visual attention model developed by Itti et al. [1998]. However they extended the original purely saliencydriven model of Itti et al. to also take into account volition-controlled and taskdependent attention, Figure 3.25. The attention model developed by Itti et al. was originally designed for static images. However, in interactive application scenarios, the user’s volitional focus of attention should also be considered, and this is what Haber et al. try to capture by the addition of their top-down model. A common observation being that the user tends to place objects of interest in the proximity of the image centre and to zoom in on those objects in order to see them in more detail. Therefore in their approach, which is shown on the right side of Figure 3.25, they determine a bounding box for every non-diffuse object using a unique identification code in the stencil mask and measure the distance between the bounding box centre and the image centre. They normalize the obtained distance with respect to half the length of the image diagonal. They also consider the object coverage in the image plane measured as the percentage of associated pixels in the stencil mask with respect to the number of pixels in the whole image. Although these two factors are often considered as bottom-up saliency measures for static images, Haber et al. argue that for interactive applications their meaning changes towards more task-driven top-down factors. 67 Figure 3.25: General architecture of the attention model. The bottom-up component is a simplified version of the model developed by Itti et al. [1998]. The top-down component was added to compensate for task-dependent user interaction [Haber et al. 2001]. A hierarchical sampling technique is then used to cover the image region representing an “attractive” object rapidly with point samples. These point samples are splatted into the frame buffer using a footprint size that depends on the hierarchy level of the samples. To ensure that this corrective splatting affects only those objects, for which the samples have been computed, they use a stencil test feature of the graphics hardware together with the stencil mask that has been created during rendering. Sample caching can then be performed to reuse the samples computed for previous frames. 68 Rendering of the illumination maps, corrective splatting, and the evaluation of visual attention models are all implemented as independent and asynchronously operating threads, see Figure 3.26, which perform best on multi-processor operation platforms. 
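The two top-down factors Haber et al. derive from the stencil mask, the normalised distance of an object's bounding-box centre from the image centre and the object's pixel coverage, can be sketched as follows. The way the two factors are combined into a single value here is an assumption for illustration only; the published model weights its factors differently and uses them together with the bottom-up saliency term.

    import math

    def top_down_importance(bbox, image_width, image_height, pixel_count):
        # bbox = (x_min, y_min, x_max, y_max) of the object in the stencil mask.
        cx = (bbox[0] + bbox[2]) / 2.0
        cy = (bbox[1] + bbox[3]) / 2.0
        # Distance from the image centre, normalised by half the image diagonal.
        half_diag = math.hypot(image_width, image_height) / 2.0
        centredness = 1.0 - math.hypot(cx - image_width / 2.0,
                                       cy - image_height / 2.0) / half_diag
        # Fraction of the image covered by the object's pixels in the stencil mask.
        coverage = pixel_count / float(image_width * image_height)
        # Hypothetical combination: central, large objects are corrected first.
        return 0.5 * centredness + 0.5 * min(1.0, 10.0 * coverage)

Objects would then be ray traced in decreasing order of this importance, so that corrective samples are spent first on the non-diffuse objects the user is most likely attending to.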
Their implementation delivers good results when interactively navigating through scenes of medium complexity at about 10 fps. Figure 3.26: Data flow during rendering for the Haber et al. model. The dotted lines depict data flow between different threads. The following abbreviations are used: FB = frame buffer, SB = stencil buffer, PQ = priority queue, SQ = sample queue. The number of ray tracing threads depends on the number of processors available [Haber et al. 2001]. Marmitt and Duchowski [2002] looked at the fidelity of using such a model of visual attention to predict the visually salient features in a virtual scene. Itti et al.’s model [1998] had been previously shown to be quite accurate over still natural images [Privitera and Stark 2000]; however it was not clear how well the model generalises to the dynamic scene content presented during virtual reality immersion. Marmitt and 69 Duchowski’s analysis showed that the correlation between actual human and artificially predicted scan-paths was much lower than predicted. They hypothesise that the problem may lie in the algorithm’s lack of memory, for each time the algorithm is run, as far as it is concerned, it is presented with a completely new image. In contrast a human has already seen most parts of the image and is therefore free to distribute visual attention to new areas, even though, according to the algorithm’s saliency map, the areas may appear to the model as less interesting. Therefore such systems as Yee et al. [2001] and Haber et al. [2001] propose may need some more refining to foster closer correspondence to actual human attentional patterns for use with rendering dynamic virtual reality scenes. 3.5 Summary This chapter firstly described how computer graphical images are created using both local and global illumination models; as well as the advantages and disadvantages to both models. As will be discussed in Chapters 4 and 5 two different modelling packages were used in this thesis, that of Alias Wavefront’s Maya [Maya 2004], a ray tracing model, and Radiance [Ward Larson and Shakespeare 1998], a global illumination model that employs a ray tracing strategy which encompasses a hybrid deterministic/stochastic approach to efficiently solve the rendering equation. The last section of the chapter showed how the two disciplinary subjects, human vision and realistic computer graphical image synthesis, can be combined. Using the knowledge of human visual perception, time can be saved computationally in producing computer graphical images, by rendering only what humans truly perceive. Pioneering research that has been performed in this application of visual perception to computer graphics has mainly exploited the bottom-up visual attention process as discussed in Section 2.3.2. This work has included using knowledge of the human visual system to improve the quality of the displayed image. The methodology which is used in this thesis is closest to that proposed by Yee et al. [2001], Haber at al. [2001] and McConkie et al. [1997], with the crucial difference that this research directly exploits top-down visual attention processing rather than the bottom-up process. 70 Chapter 4 Change Blindness There is a need for realistic graphics in real time. Creating such realistic graphics takes a significant amount of computational power. There are many different ways to approach this problem. 
Banks of machines can be used to render realistic images, for example Pixar use their ‘Renderfarm’ to render the frames for their films, see Figure 4.1, but even with this huge computer system it still takes on average 6 hours to render a single frame with some of the frames taking up to 90 hours to be rendered [Pixar 2004]. Their ‘Toy Story’ film took over two years to render on the 100 SGI workstations ‘Renderfarm’, which extrapolates to approximately 80 years to render the whole film on a single contemporary PC [Pixar 2004]. Thus improvements in rendering hardware and adaptations to the actual rendering algorithms have been popular topics of recent research in this field in the past decade or more. From an in-depth study of the human visual system and what had previously been done in this field it was noted that nobody at the time had looked at, from a graphical rendering point of view, two particular flaws of the human visual system which cause the inability to notice changes or differences in large parts of a scene when suffering from a visual disruption or when attention is focussed on a particular task. These are Change Blindness and Inattentional Blindness, both discussed in detail in Chapter 2. The goal of this thesis was first to find out whether either of these flaws even occurred when viewing computer graphical images as they do in real life, this chapter, 71 then to find out to what extent they could be exploited (Chapter 5), and finally to design and produce a selective rendering framework based on the findings (Chapter 6). Figure 4.1: Pixar’s ‘Renderfarm’ is used to render the frames for their films [Pixar 2004]. 4.1 Introduction As discussed in section 2.3.3 ‘Change Blindness is the inability of the human to detect what should be obvious changes in a scene’ [Rensink et al. 1997]. If a change occurs simultaneously with a brief visual disruption, such as an eye saccade, flicker or a blink that interrupts attention then a human can miss large changes in their field of view. The onset of a visual disruption swamps the user’s local motion signals caused by a change, short-circuiting the automatic system that normally draws attention to its location. Without automatic control, attention is controlled entirely by slower, higher-level mechanisms in the visual system that search the scene, object by object, until attention finally lands upon the object that is changing. Once attention has latched onto the appropriate object, the change is easy to see, however this only occurs after exhaustive serial inspection of the scene. The same experimental procedures, both the ‘Flicker’ and ‘Mudsplash’ paradigms, were used as proposed by Rensink et al. [1997], but instead of using photographs as in the case of Rensink et al. this work used graphically rendered scenes and timed how long it would take observers to find the object that had been rendered to a lower quality without pre-alerting them to the location of the change. If the experiment was positive then further action could be taken using this idea for dynamic 72 scenes. Results of this work were presented firstly as a sketch at ACM SIGGRAPH 2001 in Los Angeles [Cater et al. 2001] and then as a full paper at Graphite 2003 in Melbourne [Cater et al. 2003b]. 4.2 The Pre-Study To clarify which aspects of the scene were to be altered the initial experiment involved seven images depicting a variety of scenes, rendered with Radiance [Ward Larson and Shakespeare 1998]. 
A group of six judges, 3 female and 3 male with a range of ages and amount of computer graphics knowledge, were asked to look at each scene for 30 seconds and then give a short description of what they could remember was in the scene. This was based on the experimental procedure described by O’Regan et al. 1999a, however in the literature; how the judges were asked what they could remember was not defined exactly. Thus, as will be seen in the next few pages, several attempts were made before the best solution was achieved. The judges’ descriptions enabled the definition of several aspects that were termed Central Interest aspects [O’Regan et al. 1999a] for each scene. These are defined as those aspects that are mentioned by at least three of the judges. Central Interest (CI) aspects tend to concern what one would be tempted to call the main theme of the scene. Similarly, several aspects of each of the scenes that [O’Regan et al. 1999a] term Marginal Interest aspects were noted. Marginal Interest (MI) aspects are defined as those that are mentioned by none of the judges and were generally parts that constituted the setting in which the main ‘action’ of the scene could take place. Thus, by manipulating whether changes caused are to occur to CI or MI aspects in the experiments, the degree of attention that our subjects were expected to pay to the changes could be controlled. Figure 4.2 shows the results from the first two judges’ responses to viewing the images in Figure 4.3. Each observer was asked to list the aspects that he/she could remember after having seen the image for 30 seconds. The images were displayed to the viewers randomly so there was no bias on the ordering to any of the images. From these initial results it was realised that the observers were being subconsciously forced to list only the main objects in the scene and not the aspects of 73 the scene. Due to this fact certain parts of the scene would always be missed out. Evidence in attention literature suggests that attention may perhaps be better be described, not in terms of space, but in terms of objects: that is not due to the basis of spatially defined connecting regions in the visual field, but rather to collections of attributes that can be grouped together to form objects [Baylis and Driver 1993, Duncan and Nimmo-Smith 1996]. Despite the literature on this topic O’Regan proposes that the term ‘object’ seems unsatisfactory; ‘presumably an observer can, for example, attend to the sky (which is not really an object), or to a particular aspect of a scene (say its symmetry or darkness) without encoding, in detail, all the attributes which compose it’ [O’Regan 2001]. Until further research is done in this area O’Regan suggests that it is safer to suppose attention can be directed to scene ‘aspects’ rather than ‘objects’. 
Judge 1 DOB: 24/04/73 Occupation: Teacher Glasses/Contacts: Contacts Time of Day: 2.40pm Wednesday 11th July Female/Male: Male Image 1 Stripped Box, Mirror, Yellow Ball, Glass with Straw, Red Cup, White Beaker Image 2 Standing Lamp, Speaker, CD player, Table, Black Tube, CD, Carpet Image 3 Chest of draws, Candle, Bowl, Wine Glass, Mirror, Door Image 4 Mantelpiece, Two candles, Wine Bottle, Wine Glass, Picture with Boat and a church Image 5 Picture, Two candles, Wine Bottle, Wine Glass, Black Fire Guard, Another Candle, Mantelpiece Image 6 Bed Frame, Chest of Draws, Wine Bottle, Wine Glass, Lamp, Beaker Glass, Wardrobe, Pencil Holder, Ball Image 7 Box with Stripes, Beaker, Red Cup, Mirror, Glass with Straw, Yellow Ball Judge 2 DOB: 18/01/77 Occupation: Research Assistant Glasses/Contacts: None Time of Day: 11.20am Wednesday 11th July Female/Male: Female Image 1 Glass with Straw, Orange and Green Box, Dark Ball, Orange thing on the left hand side (maybe some kind of plastic cup), Another Ball, Mirror Image 2 Table, CD, Amplifier, Two Beakers (one dark one light), Speaker and Stand, Lamp Image 3 Chest of draws, Candle, Bowl, Glass, Mirror, Door Image 4 Fireplace, Mantelpiece, Two candles, Picture with Boat, White Cliffs and a House, Wine Glass, Bottle Image 5 Fireplace, Mantelpiece, Two candles, Bottle & Glass on mantelpiece, Candle in front of the Fireplace, Picture, Skirting Boards, Black bit in front of the Fireplace. Image 6 Bed Frame, Chest of Draws, Lamp, Bottle, Glass, Beaker, Red Thing, Wardrobe Image 7 Glass with Straw, Yellow Ball, Red Beaker, White Beaker, Mirror, Green and Orange Box Figure 4.2: Table to show the aspects listed in each of the images by Judges 1 and 2. 74 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Image 7 Figure 4.3: Images 1 to 7, used initially to determine the central and marginal interest aspects in the scenes. 75 Due to the experimental setup being based on that of Rensink and O’Regan’s work it was best to keep to the construct of trying to get the judges to list the aspects in the scene rather than just the main objects, thus the judges pre-study was re-run. It was also thought that 30 seconds was too long as it was noticed that the observer’s attention was starting to wander to other things than the image on the screen. The judge’s prestudy was thus re-run decreasing the viewing time to 20 seconds and the viewers were asked to describe the scene and not list the aspects that they could remember. This was thought would give the opportunity for the observers to describe more about the scene. As can be seen in Figure 4.4, very different responses were achieved. Judge 3 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Image 7 DOB: 29/06/54 Occupation: Personal Assistant Glasses/Contacts: Glasses Time of Day: 2.15pm Monday 16th July Female/Male: Female I would say these things in the scene are stuff that you would take on holiday with you, or are mainly outside objects. From what I can remember there was a ball, a striped cushion to sit on, a stick, and a mirror, however I’m not sure why a mirror would be in an outdoor scene. This is of a corner of a room. There is a record player on a table, a glass vase, a speaker and it’s a light room so maybe there is a window nearby that’s not in the picture or maybe a light stand but I’m not sure. Again the corner of a room, with a side door opening into the room. A mirror reflecting the candle, blue glass and another object on the chest of draws but I can’t remember what it is. 
There’s a table, possibly a coffee table with a mosaic appearance and burnt orange tiles on the front of it. There’s a picture on the wall of the sea with a yacht and a lighthouse, at first glance though I thought it was a submarine in the sea. There are candles burning, which reflect light behind onto the wall and picture, and something between the two candles but I don’t know what it is. There is a mantelpiece and a fireplace. There is a picture of the sea and a boat above the mantelpiece. There is a single candle on the floor projecting light up under the picture. I can’t remember if I saw a bottle or a glass, but something related to drinking! This is a bedroom scene, there’s an end of a iron bed without a mattress. There is a bedside table with three draws on top of which is a glass and a modern light with a shade. This table backs onto a single cupboard or a wardrobe that is white. There were some other objects on the table but I’m not sure what they were. Saw a similar scene before, the one with the striped cushion. This has in it a circular red object, a yellow ball, a stick, a mirror reflecting the scene. I would now say that this scene is of modern furniture in a room not the outside scene I originally said it was before. Figure 4.4: Table to show the aspects listed in each of the images by the Judge 3. The results received were not what were expected. Whether or not it was just due to this one observer it was thought best to develop a more robust method without influencing the judges’ decision or by relying on their memory of the scene. Thus, after more discussions with the psychologists it was decided that the best way to run this prestudy with the judges was to sit them down in front of the images and get them to verbally describe each of the scenes with unlimited viewing time. It was also decided that the images should be changed slightly to include a greater range of scenes. Thus the 76 final 14 scenes that were used in the actual experiment, Figure 4.5, were created. (These 14 scenes were altered according to the judges’ responses to make 21 different sets of images, hence why some of the images in Figure 4.5 have two or three numbers to reference them.) The verbal responses of the judges were then transcribed; an example of the responses received can be seen in Figure 4.6. The whole set of judges responses can be seen in Appendix A.1. Image 1 Image 4 & 20 Image 2 Image 5 & 11 Image 3 & 19 Image 6 77 Image 7 Image 12 & 21 Image 8, 14 & 18 Image 15 Image 9 & 13 Image 16 Image 10 78 Image 17 Figure 4.5: Images 1 to 21, used to work out the central and marginal interest aspects in the scenes with Judges 4 – 9, as well as in the final experiment. Judge 4 Image 1 Image 2 Image 3 & 19 Image 4 & 20 Image 5 & 11 Image 6 DOB: 14/04/75 Occupation: PhD Student – Geography Glasses/Contacts: None Time of Day: 5.30pm Friday 20th July Female/Male: Female It’s an office but it looks a kitchen. It’s got two chairs, one that swivels and one that’s rigid with 4 legs. umm. It’s got a mottled floor and then it’s got a set of three draws under the desk and it' s got a picture with a green blob in the middle, which looks like it’s meant to be a tree. It’s got two yellow cupboards on the right hand side mounted up on the walls. There’s a light on a stand thing, which bends over beaming onto something and a cup of tea or something, I think it’s a cup of tea. Looks like a wine bar, a really snazzy wine bar. It has two stools and a table in the middle, which is rigid with two legs. 
The floor is tiled white and red. The walls are very dark; the ceiling is dark as well. Then in the centre of the picture there is the mirror behind the table, which is reflecting a vase, a red vase, with a flower in it. And either side of that there are two lights and there are two lights on the ceiling. It’s three spherical objects on a piece of text and the three objects look like a tennis ball an orange, but it’s not an orange, and a Christmas ball thing. It’s all set at an angle so the text gets smaller as it moves away. It’s a close up of a rectangular box, a rectangular box with red and green stripes. In the background looks like a cylindrical vase, I don’t know what it is but looks like it has a chop stick in it. Err and there’s a round yellow snooker cue ball with a mirror at the back of it. Looks like a really snazzy office or cafe. You look through the door and you are looking at two yellow chairs back to back and a table in front of it with a wine glass on it. To the right is a picture but you can see it behind another yellow chair. Right at the very back in the centre of the wall is a kind of rainbow picture mounted on the wall. On the ceiling there is ummmm a strip light running down the centre with tiny little lights either side of the room. And then there is some strange art works. On the back on the right is a kind of red base with a swirly art piece. On the other left hand wall is some other artwork mounted like a curly stepladder thing and the carpet is grey. This is a further away view of the other image and again it has the rectangular giftwrapped box, green and red striped. umm. It’s got what looks like a top to a shaving foam red top or a hairspray can, and then it’s got a white empty cylindrical shape hollow cylindrical shape on top of the box. And behind that looks like it’s a plastic glass with a peachy straw in it. To the left of that is a yellow cue ball on top of a women’s vanity mirror, which has been opened up. 79 Image 7 Image 8, 14 & 18 Image 9 & 13 Image 10 Image 12 & 21 Image 15 Image 16 Image 17 Rory’s dinner. An empty dinner umm with a long bar with 5 pink stools, kind of rigid stools umm. The walls are vertical stripy kind of lilac and beige colour, brown beige ummm. At the very front there is kind of seating area which surrounds a table with blue chairs. umm Behind the counter of Rory’s cafe is kind of a glass cabinet that’s empty but I would imagine it would have food in it and a row of glasses at the back. And to the left of the image is a beige cylindrical object, which is holding up the building structure. Ummm same as the other one but closer ummm the thing that I thought was a bit odd was the office chair in the picture the base of it with the wheels the metal base is the same as the floor and that looks a bit odd. Then there is the office desk same as before and there is the reflection of the light coming down onto a book. Umm and a pink coffee mug and that’s it. Err looks like a scene in a kitchen with a close up of a unit. With in the very front a kind of empty cookie jar and behind that is a coffee grinder and a handle on top, which you twist, but the handle looks a bit odd. I don’t know the handle looks like it is sitting on top of the cookie jar. Anyway umm there are three mugs or two mugs and one glass on the panel behind, on the shelf behind umm and then ... and that’s it oh but it looks quite a pink blotchy image. 
Err again it looks like a funky office with yellow chairs with umm you are drawn to the scene at the back with the main table with the curved legs and it has two tea cups either side so it looks like people have just got up and left. There’s two chairs positioned not facing each other just slightly apart, slightly to the side of one another. Umm in front of that there is a yellow sofa on the right hand side and umm at the back there is writing on the wall looks like something gallery, art gallery? Umm and to the right of that is the window with the light shining through onto the picture. Umm there’s a series of lights on the walls mounted at head height but they don’t look on and then through to the left oh umm sorry still in that room there’s a blue picture mounted on the wall which is kind of blobby and 3D. And through that door is another sort of gold/brass-y looking similar kind of blobby picture 3D picture mounted on the wall and another yellow chair. Umm the image on the left hand side is a dark area which I’m not sure what it is but to the left of that is a free standing lamp on a long white stand which is switched on and umm and then there is a speaker which is free standing on the floor with two legs. And then on the table, a brown table is an amplifier that looks a bit featureless except for the volume umm err and then there’s goodness and another cylindrical object, which looks like a toilet roll, which is black. Looks like a scene from a bedroom it’s in the corner of a bedroom with a creamy coloured table and creamy coloured walls with a picture of a sailing boat and a beach in the background. Umm On the table there are two creamy coloured candles and what looks like a CD cover and an orange object - I think it’s a glass then to the right of that looks like and empty wine bottle which is green. Umm a sort of pine table with a strange feature which I don’t really know what it is, with a brassy base which curves up which has three stages which look like miniature chess boards red and white. Umm then on the table in front of that there is a glass full of red wine. Two blue chairs around the table but the chair furthest away the back of the chair and the legs don’t quite match. I don’t know if it’s just me but umm that' s it apart from a really non-descript background. Err same image as before but from a different angle. You can see more of the picture umm apart from that I don’t know what else, there is a chord to the left of the picture but apart from that I don’t know what’s really different apart from a different angle and further away and the shadows are falling to the right hand side of the picture. Figure 4.6: Table to show the aspects listed in each of the images by Judge 4 (the rest of the results are contained in Appendix A.1). The results that were received from the final 6 judges (judges 4-9 in Appendix A.1) allowed for the ability to choose the Central and Marginal Interest aspects for each of the scenes. It was deemed important to have a wide selection of sets of images to show to the observers, so that each participant got to see a different set of 10 images, thus there could be no experimental bias to the scenes and the type of alteration made, 80 whether they were viewing the flicker paradigm or the mudsplash paradigm. In the end 21 possible images were created all of which were used for both paradigms, each subject seeing 10 of these sets of images (half used the mudsplash paradigm and half the flicker paradigm). 
However, image 21 was only used as an example for the observers of the experimental setup, see instructions sheet Appendix A.2, and thus was not used in the actual experiment, i.e. in the 10 images shown to the participants. The MI changes were carefully chosen to be as similar as possible in visual conspicuity as the CI changes. This was measured by the number of pixels that were changed, and by the size and position within the scene. The mean centroid of the CI locations were at (x,y) pixel coordinates (-17 ± 63, 19 ± 100) relative to the centre of the screen, and at coordinates (2 ± 177, 108 ± 135) for the MI locations (the ± values are standard deviations). The resolutions of the images were 512 x 512, 512 x 768 or 768 x 512, depending on the scene structure. The nature of the change between the original and modified pictures was very strictly controlled so as to ensure that there was hardly any difference in visual conspicuity of the change for the CI and MI changes. Thus the mean proportion of the modified pixels in the picture was 8.2% ± 5 and 10% ± 7, for CI and MI changes respectively. The mean Euclidean distance in the RGB values across all the changed pixels was 21 ± 16 and 38 ± 31 respectively for CI and MI changes. Aspect Changes Image 1 No Alteration Image 2 Central Interest Aspect: Rendering Quality Alteration to the chequered red and white tiled floor. Image 3 Central Interest Aspect: Rendering Quality Alteration to the Christmas bauble with its reflection of the room and the print it is sitting on. Image 4 Central Interest Aspect: Rendering Quality Alteration to the yellow round snooker ball. Image 5 Central Interest Aspect: Location Alteration, Movement forwards of the yellow chairs, which are back to back, and the low modern coffee table with the silver glass on it. Image 6 Marginal Interest Aspect: Presence Alteration, Removed the purple/lilac pool ball. Image 7 Marginal Interest Aspect: Rendering Quality Alteration to the tiled beige floor. Image 8 Marginal Interest Aspect: Rendering Quality Alteration to the rigid, four-legged blue chair. Image 9 Central and Marginal Interest Aspects: Rendering Quality Alteration to the coffee pot (central) and the shiny marble kitchen work surface (marginal). 81 Image 10 Marginal Interest Aspect: Rendering Quality Alteration to the left hand sidewall. Image 11 No Alteration Image 12 Marginal Interest Aspect: Rendering Quality Alteration to the carpet. Image 13 Marginal Interest Aspect: Rendering Quality Alteration to the shiny marble kitchen work surface. Image 14 Central and Marginal Interest Aspects: Rendering Quality Alteration to the blue, fourlegged chair (marginal) and the yellow swivel chair (central). Image 15 Marginal Interest Aspect: Location Alteration, Movement of Toy Story video from side to side. Image 16 Central Interest Aspect: Presence Alteration, Removed the wine glass half full of red wine from the table. Image 17 Central Interest Aspect: Rendering Quality Alteration to the trolley with all the objects on it. Image 18 Central and Marginal Interest Aspects: Rendering Quality Alteration to the pink coffee mug (central) and shiny gold table post (marginal). Image 19 Central Interest Aspect: Rendering Quality Alteration to the Sunkist orange. Image 20 Marginal Interest Aspect: Rendering Quality Alteration to the white tube. Image 21 Central Interest Aspect: Rendering Quality Alteration to the black speaker and its stand. Figure 4.7: Table to show the aspects that were altered in each of the images. 
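The two conspicuity measures used above, the proportion of pixels that differ between the original and modified frames and the mean Euclidean distance in RGB over those changed pixels, are straightforward to compute from the two images. The following is a minimal C sketch, assuming the frames have already been decoded into interleaved 8-bit RGB buffers of identical dimensions; the function and variable names are illustrative rather than taken from the image-processing code written for this thesis.

    #include <math.h>
    #include <stddef.h>

    /* Conspicuity of a change between an original and a modified frame:
     * reports the proportion of pixels whose RGB value differs at all, and
     * the mean Euclidean RGB distance taken over those changed pixels only.
     * Buffers are interleaved 8-bit RGB of identical size (illustrative). */
    void change_conspicuity(const unsigned char *orig, const unsigned char *mod,
                            size_t width, size_t height,
                            double *proportion_changed, double *mean_rgb_distance)
    {
        size_t npixels = width * height, nchanged = 0;
        double dist_sum = 0.0;

        for (size_t i = 0; i < npixels; ++i) {
            int dr = (int)orig[3*i]     - (int)mod[3*i];
            int dg = (int)orig[3*i + 1] - (int)mod[3*i + 1];
            int db = (int)orig[3*i + 2] - (int)mod[3*i + 2];
            if (dr || dg || db) {
                ++nchanged;
                dist_sum += sqrt((double)(dr*dr + dg*dg + db*db));
            }
        }

        *proportion_changed = npixels  ? (double)nchanged / (double)npixels : 0.0;
        *mean_rgb_distance  = nchanged ? dist_sum / (double)nchanged        : 0.0;
    }

The same loop applied to a single intensity channel, rather than the three colour channels, would give the mean intensity change of the altered pixels reported later in Section 4.4.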
There were four different types of alteration made to the CI or MI aspects selected. Figure 4.7 describes the aspects that were selected to be altered and the alterations made for each of the 21 images. The list of possible alterations was as follows: • A rendering alteration - part of the image has been rendered to a lower quality causing any of the following properties. An example of this can be seen in Figure 4.8, before and after alteration. o An aliasing effect - this is where a jaggy edge can be seen to an area of the scene due to the digital nature of the pixel grid. o A shadow generalization - this is when a shadow has been simplified so that only a harsh shadow is present instead of the more realistic penumbras with the umbras too. o A reflection generalization - this is when a reflection has been simplified so that only a sharp reflection exists instead of a more realistic smoother one. 82 • A location alteration - part of the image has been moved in some way i.e. its location has been moved compared to the original image. • A presence alteration - part of the image has been removed in some way i.e. it is no longer present in the image. An example of this can be seen in Figure 4.9, before and after alteration. • And No alteration at all. Presence and location alterations were made so that some comparison could be made of the timing differences between not only these two, but mainly with the rendering times. Both of these alterations, presence and location, were also tested by O’Regan et al. [1999b] so a comparison with their results could be achieved. Figure 4.8: a) Original Image b) Modified Image - here a Marginal Interest aspect of the scene has been replaced with a low quality rendering (the left hand wall with light fitting), thus a rendering alteration has been made. Figure 4.9: a) Original Image b) Modified Image - here a Central Interest aspect of the scene has been removed in a presence alteration (the wine glass). 83 4.3 Main Experimental Procedure Before each experiment was started the observer was asked to fill out the first part of a simple questionnaire, which asked about personal details such as date of birth, occupation, whether or not they wore glasses or contacts, their sex and their self rating of computer graphics experience. Also noted was the time of day, and whether or not they were given the list of possible alterations made to the images. The reason for these questions was to achieve minimum bias. For example, there were three sessions over which the experiments were run, eight observers in each session. This was so that there was no bias due to the time of day. Also there were approximately 50% females spread throughout the sessions, and also a mix of occupations, ages and computer skills. The observers were then given a written description of what the experiment entailed and two printed examples, of the experiments that would be run, one of the flicker paradigm and one of the mudsplash paradigm, Appendix A.2. This was to make sure that each observer received the same instructions. To maintain the integrity of the experiment, observers were also asked not to discuss with anyone what they did in the experiment room until all the participants had performed the experiment. Each observer also did not receive any feedback on how well they were doing until all of the experiments were completed. Figure 4.10a & b show examples of the experimental setup that was conducted with 24 subjects under strictly controlled viewing conditions. 
The monitor was situated 45cm away from where the observer was sitting and it was situated such there was no glare on the screen from any light sources in the room. A fixed seat was used so that the observer could not change the angle at which the images were being viewed, also guaranteeing the same viewing distance for all observers. Figure 4.11(a) shows a scene rendered in Radiance to a high level of realism, while 4.11(c) is the same scene rendered with less fidelity for some objects and at significantly reduced computational cost. Figure 4.11(b) shows the same image as 4.11(a), but with mudsplashes and likewise 4.11(d) shows the same image as 4.11(c), but with mudsplashes. Each original image was displayed for 240 milliseconds followed by the mudsplashed image for 290 milliseconds followed by the modified image for a further 84 240 milliseconds and finally the modified image with mudsplashes for 290 milliseconds. This sequence was continually repeated until the user spotted the change and said ‘stop’. If no change had been observed after 60 seconds the experiment for that particular set of images was terminated. Similar experiments were carried out with the flicker paradigm. The flicker paradigm was when a totally ‘medium’ grey image was substituted for the mudsplashed images, as demonstrated in Figure 4.11(e). The high quality image in Figure 4.11a took 27 hours to render in Radiance, yet to render the whole of the same image to low quality took only 1 minute. As Radiance did not at the time have a facility for selecting particular objects to be rendered at different qualities Adobe Photoshop was used to create the selectively rendered images. Each subject saw 10 of these sets of images. As stated before these were completely randomised from the 20 possible sets of images and amongst these were some ‘red herrings’ i.e. there were some sets where no alteration was made to the images. This was to make sure that the observers only said stop when they saw an actual alteration. After the observer said ‘stop’ the time was noted and a verbal description of their perception of the alteration was given. The observers were asked to say ‘stop’ when they noticed the first alteration always and not to look around to see if there were any more alterations. By following this criterion it was possible to see whether, if faced with two or more alterations, which of them were more easily attended to. If the observers did not say ‘stop’ after 60 seconds it was declared that the participant had seen no alteration in the images. Half of the observers were given the complete list of the particular alterations possible beforehand and the other half were not. Figure 4.10 (a & b): Photographs showing the experiments being run. 85 Figure 4.11 a) High quality image. b) High quality image with mudsplashes. Figure 4.11 c) Selective quality image (look at the surface of the tabletop compared to Figure 4.11a). d) Selective quality image with mudsplashes. Figure 4.11 e) The ‘medium’ grey image used in the flicker paradigm. f) The ordering of the images for the two paradigms. 86 4.4 Results The results showed that a significant time was taken to locate the rendering differences even though the observers knew that a change in the images was occurring. Therefore as will be discussed in this section, these results show that Change Blindness occurs in computer graphical images as it does in real life, [Cater et al. 2003b]. Figure 4.12 shows the overall results in a graphical format. 
The numbers above the columns give the average time taken, in seconds, for the observers to notice the difference. The data was categorised according to the type of alteration that the observer had seen, i.e. whether it was a rendering change, a location change or a presence change, as well as which paradigm was used. The blue columns show the time taken for the alteration of CI aspects and the maroon columns show the alteration of MI aspects. The graph shows, as expected, that modified CI aspects were always found more quickly than modified MI aspects. This is because a human's attention is naturally drawn to CI aspects rather than to MI aspects. When searching the scene for the alteration, each observer will naturally search all the CI objects first before searching the rest of the scene exhaustively and thus attending to the MI aspects. Figures 4.13 and 4.14 both show that most participants detected a change in fewer cycles of the images when viewing with the mudsplash paradigm than when viewing the images with the flicker paradigm. This is most likely because mudsplashes provoke a minor disturbance compared with the flicker paradigm, as the mudsplashes cover a smaller percentage of the image. However, it is still important to note that the time taken to notice the change under the mudsplash conditions was still significantly impaired relative to having no interruption at all [Rensink et al. 1999]. This demonstrates that Change Blindness is not caused by a covering up of what is changing (flicker paradigm), but is actually due to suffering from a visual disruption in any part of the image (mudsplash paradigm), regardless of whether or not it covers the aspect that is changing. In Figure 4.15 an example can be seen of the rendering alteration made with the flicker paradigm; this pair of images actually took the participants the longest time on average to spot the change, 36 seconds or 34 cycles of the images, even though the alteration was one of the largest, taking up nearly 20% of the image.

Figure 4.12: Overall results of the three different types of alterations made during the experiment, Rendering, Location and Presence (mean detection time in seconds for Central and Marginal Interest aspects under the Flicker and Mudsplash paradigms; bar values from the chart: 20.95, 14.74, 4.90, 6.28, 5.83, 5.51, 4.35, 2.54, 4.77, 3.82, 2.01, 1.59 seconds).

Figure 4.13: Number of cycles of the images needed to detect the rendering quality change in the Flicker paradigm (percentage of observers against number of cycles, for Central and Marginal Interest aspects).

Figure 4.14: Number of cycles of the images needed to detect the rendering quality change in the Mudsplash paradigm (percentage of observers against number of cycles, for Central and Marginal Interest aspects).

Figure 4.15: a) Original Image 7 b) Modified Image 7 - here a Marginal Interest aspect of the scene has been replaced with a low quality rendering, the whole of the tiled floor has been replaced.

The most important finding from this set of results is the increase in time taken for the rendering alterations to be noticed in comparison to the time taken for the presence and location alterations, Figure 4.12.
This is an increase of on average over 8 times for CI aspects and on average over 4.5 times for MI aspects for the flicker paradigm, and on average over 1.5 times for the CI aspects and on average 1.2 times for MI aspects for the mudsplash paradigm. This proves that a significant time is needed for observers to notice a difference in rendering quality, even when changes occupy large parts of the image (up to 22 visual degrees, when the participants were seated 45cm away from the screen). This increase in time taken could be due to the fact that it is more difficult to spot a rendering change than a presence or a location change simply 89 because in both of these latter cases, the mean Euclidean distance in the RGB values and the intensity values across all the changed pixels are a lot greater than in a rendering case. The mean intensity change of the changed pixels was only 4 ± 14 and 38 ± 31 for the CI and MI rendering quality changes respectively. However in the presence and the location changes the mean intensity change was 131 ± 52 and 146 ± 67. The mean Euclidean distance in the RGB values across all the changed pixels was 21 ± 16 and 38 ± 31 respectively for CI and MI rendering quality changes. However, in the presence and the location changes, the mean Euclidean distance in the RGB values across all the changed pixels was 106 ± 34 and 98 ± 28. This increase in the mean intensity and RGB values is obviously because, in the presence or location cases an aspect is being completely taken away or moved to a new location and thus a completely new aspect with a new mean intensity and RGB value is now located in its place. What is important, though, is that this increase actually exists and in designing a selective renderer taking away aspects of scenes is not of interest, only that of rendering the aspects to a significantly lower quality at a much reduced computational cost. Thus if it takes a long time to spot a rendering quality alteration due to the fact that the RGB and intensity values haven’t changed a lot or whether it is due to another cause, it can still be concluded that this theory is plausible to be used in designing a selective renderer. This renderer will save time computationally by selectively rendering the scene to different qualities without the observers detecting a difference as they will be suffering from Change Blindness. Compared with those of O’Regan et al. [1999a] and Rensink et al. [1999], the results for presence and location were graphically similar, showing that computer generated images may be used when exploiting Change Blindness. As the full results are not displayed in any of their papers a statistical analysis of how similar these results are cannot be carried out. By comparing the results of this experiment for the location and presence changes with the mudsplash paradigm with the results approximately reproduced from Rensink et al. [1999] a visual similarity however can be seen, Figure 4.16. 90 25 Time (seconds) 20 15 10 5 0 Location Presence Type of Alteration made using Mudsplash Paradigm Central Interest [Rensink et al.] Central Interest [Cater et al.] Marginal Interest [Rensink et al.] Marginal Interest [Cater et al.] Figure 4.16: Comparison of the results produced in this experiment with the results reproduced from Rensink et al. [1999] for location and presence with the mudsplash paradigm. 
It was also noted that in the cases where more than one aspect of a scene was altered most of the subjects only noticed one of the aspects, and out of the choice of aspects the majority of observers attended to lower rendering of the reflections in/on shiny objects over those of a low rendering of a matt object. This suggests that a reflection generalization must be easier to see than those of a matt generalization, however this cannot be due to the fact that the visual conspicuity of the change is greater; as the mean intensity change for the altered pixels for the reflection generalization was on average –2 ± 10 for CI and 37 ± 1 for MI, with the mean Euclidean distance in RGB 18 ± 3 for CI and 37 ± 1 for MI, but for the matt generalization changes it was on average 19 ± 25 for CI and 39 ± 30 for MI intensity changes, and 35 ± 41 and 38 ± 30 for CI and MI respectively for the mean Euclidean distances in RGB. Thus the matt generalization changes were actually on the whole slightly greater and more visually conspicuous than the reflection generalizations. Therefore, the change in intensity or in RGB cannot have been the cause of alerting the observers to reflection generalization quicker than the matt generalization. It is already known from visual psychological researchers such as Yarbus [1967], Yantis [1996] and Itti et al. [1998] that the visual system is highly sensitive to features 91 such as edges, abrupt changes in colour and sudden movements, as well as expectancy and personal saliency. More research, however, needs to be done in this area, in the computer rendering sense, to work out exactly what people attend to easily – this will lead the research into calculating which properties of the scene, such as colour, intensity, aliasing effects, reflections, shadows and orientation, can be rendered to a lower quality without the observer noticing the difference. It was noted that, as expected, the half of the participants that were given the list of possible alterations gave a more accurate description of the change that they perceived, however their timings on spotting the change were not significantly different from those that were not given the list as described next. 4.4.1 Statistical Analysis Statistical analysis shows that the results are statistically significant. The appropriate method of statistical analysis was the t-test for significance since the response of the observers was continuous data from normal distributions and not binary [Coolican 1999]. However, as each person had a different random selection of the images an unrelated t-test had to be used. A t-test for unrelated data tests the difference between means of unrelated groups, typically where each observer has participated in just one of the conditions in an experiment. The t-test gives the probability that the difference between the two means is caused by chance. It is customary to say that if this probability is less than 0.05 then the difference is ‘significant’ i.e. the difference is not caused by chance. Under the null hypothesis it assumes that any difference between the means of the conditions should be zero (i.e. there is no real difference between the two conditions). A large value of t means that difference found between the means is a long way from the value of zero, expected if the null hypothesis is true. 
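As an illustration of the unrelated (independent-samples) t-test used here, the statistic can be computed directly from the two groups of detection times. The C sketch below uses the standard pooled-variance form with df = n1 + n2 - 2; it is an illustrative reconstruction rather than the analysis code used for this thesis, and the computed t is then compared against the tabulated critical value as described next.

    #include <math.h>
    #include <stddef.h>

    static double mean(const double *x, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i) s += x[i];
        return s / (double)n;
    }

    /* Sum of squared deviations from the mean. */
    static double ss(const double *x, size_t n, double m)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i) s += (x[i] - m) * (x[i] - m);
        return s;
    }

    /* Unrelated (independent-samples) t-test with pooled variance.
     * Returns the t statistic and writes the degrees of freedom
     * (n1 + n2 - 2), which is compared against the tabulated
     * critical value for the chosen significance level. */
    double unrelated_t_test(const double *a, size_t n1,
                            const double *b, size_t n2, size_t *df)
    {
        double ma = mean(a, n1), mb = mean(b, n2);
        double pooled_var = (ss(a, n1, ma) + ss(b, n2, mb)) / (double)(n1 + n2 - 2);
        double se = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2));

        *df = n1 + n2 - 2;
        return (ma - mb) / se;
    }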
The critical value, obtained from the appropriate statistical table for that type of test, is the value that t must reach or exceed if the two conditions are significantly different, and thus the null hypothesis is rejected [Coolican 1999]. The degrees of freedom (df) collates to the notion that parametric tests, such as t-tests, calculate variances based on variability in the scores. Therefore the df for the total variance is calculated by subtracting one from the total number of subjects (N) i.e. df = N-1, due to the fact that the last score in the a group of subjects is predictable and therefore cannot vary [Greene and D’Oliveira 1999]. 92 By performing pair-wise comparisons of all of the rendering modifications to the location or presence alterations it could be determined whether or not the results were statistically significant. For example, the test statistics for the pair-wise comparison of the results for the CI rendering flicker paradigm and the CI location flicker paradigm was t = 6.1668, where the degrees of freedom (df) = 12, and the probability (p) < 0.01 (a p value of 0.05 or less denotes a statistically significant result). For a one-tailed test with the df of 12 t must be 1.782, for significance with p < 0.05. For each of the rest of the rendering results compared with the appropriate location or presence change results a t value of at least 4.76 is achieved which easily exceeds the critical value for significance, see Figure 4.17. Thus, the probability of all the results of this experiment occurring due to the null hypothesis, i.e. not due to the rendering change, is at least as low as 0.01 and probably a lot lower. It can be concluded that the difference in results therefore is (highly) significant. Pair-wise Comparison CI Rendering Flicker MI Rendering Flicker CI Rendering Mudsplash MI Rendering Mudsplash CI Rendering Flicker MI Rendering Flicker CI Rendering Mudsplash MI Rendering Mudsplash CI Location Flicker MI Location Flicker CI Location Mudsplash MI Location Mudsplash CI Presence Flicker MI Presence Flicker CI Presence Mudsplash MI Presence Mudsplash t Value (t 1.782 for significance) df value p value Statistically significant? 6.1668 12 p < 0.01 Significant 4.8843 12 p < 0.01 Significant 5.2139 12 p < 0.01 Significant 4.8194 12 p < 0.01 Significant 5.2610 12 p < 0.01 Significant 5.2277 12 p < 0.01 Significant 4.7637 12 p < 0.01 Significant 5.9466 12 p < 0.01 Significant Figure 4.17: Full results of statistical analysis using unrelated t test for significance. 4.5 Discussion - Experimental Issues and Future Improvements Due to the performance of the graphics card in the computer used for the experiment, the refresh rate could only be set to a maximum of 70Hz. Therefore each time the next image was displayed it was refreshed in sections adding to the visual disruption that observers were suffering. To correct this in the future a higher performance graphics card should be installed. 93 The time taken to notice the alterations when there was no visual disruption to the observer, i.e. no blank flicker images or mudsplashes between the pair of images, should have been timed, for comparison. As this was not done, the results from O’Regan et al. [1999b] can be used, but only as a guideline. This value was on average 1.4 cycles of the images or 1.5 seconds in their experimental setup. 
As can now be seen if this result is compared to those obtained in this experiment, the time taken to detect a rendering quality change whilst under the flicker paradigm greatly exceeds this value. When observers are viewing a rendering quality change whilst under the mudsplash paradigm, the visual disruption is a lot less and thus the average timings are a lot closer to that of not suffering a visual disruption at all i.e. nearer to 1.5s, and even more so in the presence and location changes. It was later pointed out in further discussions with psychologists that 50% of the sets of images shown to the observers should have been ‘red herrings’, i.e. had no alteration at all. This then guarantees that the observer would not have a clue whether or not there was an alteration until they had exhaustively searched the image. In the experiment run each observer saw only one ‘red herring’. To guarantee that each of the observers perceived the images at the same distance the chair was positioned at the same point on the floor each time. However, when the observers found an alteration hard to observe, they leant towards the screen, meaning that they were no longer observing the images from the same distance as previous observers. For example, the observer’s posture in Figure 4.10a compared to that of the observer in Figure 4.10b. To solve this, the observer’s head should be in a fixed position, not the chair’s location, in future experiments. This could be accomplished by getting the participants to rest their chin on a chin rest and not to move their head from that position until the experiment has finished. 4.6 Summary The results of this experiment showed that Change Blindness does indeed occur in computer generated images as it does in photographs and indeed real life. However, Change Blindness requires a visual disruption to occur and thus it is not clear how, in a typical animation without such disruption, this feature of the human eye could be 94 exploited in a straightforward manner by selective rendering. In Chapter 7, under future work, it is discussed how Change Blindness, however, may be exploited in animated sequences when the user is confronted by multi-sensory disruptions such as visual, audio and motion. Chapter 5 therefore goes on to consider the other feature of the Human Visual System, Inattentional Blindness, and then Chapter 6 shows how this can be incorporated within a selective rendering algorithm. 95 Chapter 5 Inattentional Blindness In pursuit of the goal for this thesis of only rendering to the highest quality that which humans truly perceive, the next stage was to consider Inattentional Blindness. 5.1 Introduction - Visual Attention As discussed in Section 2.3.1, visual scenes typically contain many more objects than can ever be recognized or remembered in a single glance. Some kind of sequential selection of objects for detailed processing is essential so humans can cope with this wealth of information. Coded into the primary levels of human vision is a powerful means of accomplishing this selection, namely the retina. Due to spatial acuity being highest right at the centre of the retina, the fovea, and then falling off rapidly toward the periphery detailed information can only be processed by locating this fovea onto the area of interest. Thus as Yarbus showed [1967], Section 2.3.2, the choice of task is important in the ability to predict the eye-gaze pattern of a viewer, i.e. the movement of their fovea and thus their visual attention. 
This was confirmed by running a small experiment with an eye tracker, a single observer and a list of four different task instructions, as can be seen in Figure 5.1. It is precisely this knowledge of the expected eye-gaze pattern that is used in this chapter to reduce the rendered quality of objects outside the area of interest without affecting the viewer’s overall perception of the quality of the rendering. 96 Figure 5.1: Effects of a task on eye movements. Eye scans for observers examined with different task instructions; 1. Free viewing, 2. Remember the central painting, 3. Remember as many objects on the table as you can, and 4. Count the number of books on the shelves. 5.2 Experimental Methodology The next stage of this thesis was to develop an experimental methodology, based on the Inattentional Blindness work of Mack and Rock [1998]. However, it was deemed important that this methodology should use computer graphical animations instead and of course a task that focused the observer’s attention only on a certain part of the scene in the animation. It was then hypothesised that this would make the viewers suffer from Inattentional Blindness to the rest of the animation. It was also important to remedy the shortcomings of the experiment discussed in Chapter 4 to achieve a more robust experimental procedure. Results of this work were discussed in two keynotes, [Chalmers and Cater 2002] and [Chalmers et al. 2003], and were presented as a full paper at VRST 2002 in Hong Kong [Cater et al. 2002]. This study involved three rendered animations of an identical fly-through of four rooms, the only difference being the quality to which the individual animations had been rendered. One animation was rendered all to Low Quality (LQ), one all to High Quality (HQ) and the last was rendered selectively according to the task at hand, Selective Quality (SQ). The task of determining which arm of a cross was longer, suggested by Mack and Rock [1998], was not appropriate for animation. This was due to the fact that this methodology needs a homogenous region in the centre large enough for the cross figure to be superimposed on it. This makes sure that there are no contours 97 in the scene overlapping or camouflaging the cross. In a fly-through the camera view is continually moving, thus a large area with no contours is very difficult to maintain when it is also important that the scene contains several distinct, familiar objects [Mack Rock 1998]. Thus several new tasks had to be considered [Hoffman 1979]. The final task chosen was for each user to count the number of pencils that appeared in a mug on a table in a room as he/she moved on a fixed path through four such rooms. To count the pencils, the user needed to perform a smooth pursuit eye movement to track the mug in one room until they had successfully counted the number of pencils in that mug, then perform an eye saccade to the mug in the next room. Each mug also contained a number of superfluous paintbrushes to further complicate the task and thus retain the viewer’s attention, Figure 5.2. Figure 5.2: Close up of the same mug showing the pencils and paintbrushes, (each room had a different number of pencils and paintbrushes). 5.2.1 Creating the Animation Figure 5.3 (a) shows the high quality rendered scene, while (b) shows the same scene rendered at a significantly lower quality, with a much reduced computational time. 
Each frame of the high quality animation took on average 18 minutes 53 seconds to render in Maya on an Intel Pentium 4 1.5 GHz processor, while the frames for the low quality animation were each rendered on average in only 3 minutes 21 seconds. All frames were rendered to a resolution of 1280 x 1024. The HQ frames were rendered by selecting Production quality rendering in Maya, whilst the LQ frames were rendered with the Custom - Low setting and the lowest possible anti-aliasing value, Figure 5.3 d). As Maya could not render different areas of the scene at different qualities, image processing code had to be written in C to take the Low Quality (LQ) and High Quality (HQ) frames as input and create the Selective Quality (SQ) frames from them, such as Figure 5.3 (c), by compositing the appropriate frames together in a batch process.

To create the Selective Quality (SQ) frames, the actual area covered by the fovea on the computer screen, when focused on the mug containing the pencils, had to be calculated. This was given by the equation below:

    Radius = ratio * (D * tan(A / 2))

where
    Radius = radius of the area covered by the fovea, in pixels (Figure 5.4)
    ratio  = screen resolution / monitor size
    A      = visual angle of the fovea = 2 degrees (Figure 5.4)
    D      = distance of the viewer's eye from the screen = 45cm

Figure 5.3(a): High Quality (HQ) image (Frame 26 in the animation)
Figure 5.3(b): Low Quality (LQ) image (Frame 26 in the animation)
Figure 5.3(c): Selectively rendered (SQ) image with two circles of High Quality over the first and second mugs (Frame 26 in the animation)
Figure 5.3(d): Close up of the High Quality rendered chair and the Low Quality version of the same chair.
Figure 5.4: Calculation of the fovea (2°) and blend (4.1°) areas.

Image processing code was written to composite this calculated fovea circle from the HQ frame onto the LQ image. The code then blends this high quality fovea circle into the low quality used for the rest of the scene. From McConkie et al.'s [1997] work it is known that this blend must extend to 4.1 degrees to simulate the peripheral degradation of the HVS and thus limit the user's ability to detect any difference in image quality. Therefore, for every pixel in this 'blend angle' the proportion of low quality is increased with distance from the fovea angle, whilst the high quality contribution is decreased, following a simple cosine curve, until at 4.1 degrees the pixels are entirely low quality, see Figure 5.5 (a short code sketch of this radius and blend calculation is given below).

Figure 5.5: A Selective Quality (SQ) frame, where the visual angle covered by the fovea for the mugs in the first two rooms, 2 degrees (green circles), is rendered at High Quality and then blended to Low Quality at 4.1 degrees (red circles).

5.2.2 Experimental Procedure

A pilot study involving 30 participants was carried out to finalize the experimental procedure and then a total of 160 subjects were considered. In the final experiments, each subject saw two animations of 35 seconds, displayed at 15 frames per second. Figures 5.6a and 5.6b describe the conditions tested, with 16 subjects per condition. Fifty percent of the subjects were asked to count the pencils in the mug while the remaining 50% were simply asked to watch the animations. To minimize experimental bias the choice of condition to be run was randomised and, for each, 8 were run in the morning and 8 in the afternoon.
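Returning briefly to the compositing step described in Section 5.2.1, the fovea radius equation and the cosine fall-off between 2 and 4.1 degrees can be written compactly. The C sketch below computes, for one pixel, the weight given to the high quality frame; the names and structure are illustrative, not the batch-processing code actually written for this thesis, and it assumes square pixels and a known point of fixation.

    #include <math.h>

    #define PI 3.14159265358979323846

    /* Radius (in pixels) subtended by a given visual angle:
     * radius = ratio * D * tan(angle / 2), with ratio = screen resolution /
     * monitor size (pixels per cm) and D the viewing distance in cm
     * (45cm in these experiments). */
    double visual_angle_radius(double angle_deg, double pixels_per_cm, double dist_cm)
    {
        return pixels_per_cm * dist_cm * tan((angle_deg * PI / 180.0) / 2.0);
    }

    /* Weight of the high quality frame at pixel (x, y) given the fixation
     * point (fx, fy): 1 inside the 2 degree foveal radius, 0 beyond the
     * 4.1 degree blend radius, and a cosine fall-off in between. The low
     * quality frame receives (1 - weight). */
    double high_quality_weight(double x, double y, double fx, double fy,
                               double fovea_radius, double blend_radius)
    {
        double d = sqrt((x - fx) * (x - fx) + (y - fy) * (y - fy));

        if (d <= fovea_radius) return 1.0;
        if (d >= blend_radius) return 0.0;

        /* Map the distance onto [0, pi/2] so the weight follows a cosine. */
        double t = (d - fovea_radius) / (blend_radius - fovea_radius);
        return cos(t * PI / 2.0);
    }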
All subjects had a variety of experience with computer graphics and all exhibited at least average corrected vision in testing. Acronym Description High Quality: Entire animation rendered at the highest quality. HQ Low Quality: Entire animation rendered at a low quality with no antiLQ aliasing. Selective Quality: Low Quality Picture with high quality rendering in SQ the visual angle of the fovea (2 degrees) centered around the pencils, shown by the green circle in Figure 5.5. The high quality is blended to the low quality at 4.1 degrees visual angle (the red circle in Figure 5.5) [McConkie et al 1997]. Figure 5.6a: The three different types of animations being tested. The orderings of the two animations shown for the experiments were either: HQ + HQ, HQ + LQ, LQ + HQ, HQ + SQ or SQ + HQ No. of Participants for that condition 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 Total: 160 Condition Counting (Task) / Ordering of Animations Watching (No Task) HQ+HQ Counting HQ+HQ Watching HQ+LQ Counting LQ+HQ Counting HQ+LQ Watching LQ+HQ Watching HQ+SQ Counting SQ+HQ Counting HQ+SQ Watching SQ+HQ Watching HQ+HQ Counting HQ+HQ Watching HQ+LQ Counting LQ+HQ Counting HQ+LQ Watching LQ+HQ Watching HQ+SQ Counting SQ+HQ Counting HQ+SQ Watching SQ+HQ Watching 16 Participants per condition Time of Day Morning Morning Morning Morning Morning Morning Morning Morning Morning Morning Afternoon Afternoon Afternoon Afternoon Afternoon Afternoon Afternoon Afternoon Afternoon Afternoon Figure 5.6b: The orderings of the conditions for randomization in the experiment. Before beginning the experiment the subjects read a sheet of instructions on the procedure of the particular task they were about to perform. This guaranteed that all participants received exactly the same instructions, Appendix B.1 and B.2. After the 103 participant had read the instructions they were asked to clarify that they had understood the task. They then rested their head on a chin rest that was located 45cm away from a 17” monitor. The chin rest was located so that their eye level was approximately level with the centre of the screen, see Figure 5.7. The animations were then displayed to the observers at a resolution of 1280 x 1024 and under specific lighting conditions. Figure 5.7: Images to show the experimental setup. To ensure that the viewers focused their attention immediately on the first mug and thus they did not look around the scene to find it, a count down was shown to prepare them that the animation was about to start followed immediately by a black image with a white mug giving the location of the first mug (Figure 5.8). After the first animation had ended participants were shown immediately the second animation. Figure 5.8: Image to give location of the first mug – to focus the observer’s attention. 104 On completion of the experiment, i.e. once both animations had been viewed, each participant was asked to fill in a detailed questionnaire, see Figure 5.9. This questionnaire asked for some personal details, including age, occupation, sex and level of computer graphics knowledge. The participants were then asked detailed questions about the objects in the rooms, their colour, location and quality of rendering. These objects were selected so that questions were asked about objects both near the foveal visual angle (located about the mug with pencils) and in the periphery. They were specifically asked not to guess, but rather state “don’t remember” when they had failed to notice some details. 
They were also asked not to discuss any of the details of the experiment with any friends or colleagues to prevent any participant having prior knowledge of the experimental procedure. The initial questionnaire designed, Appendix B.3, turned out to be too complex for the 30 participants in the pilot study, as it asked the observer to recount details for each of the four rooms separately. Therefore a new questionnaire was designed which only asked questions about each of the animations as a whole, Appendix B.4, this returned much better results and was used for the final experiment. Figure 5.9: Image to show participants of the experiment filling out the questionnaire after completion of the viewing of the two animations. 5.2.3 Results Figure 5.10 shows the overall results of the experiment. Obviously the participants did not notice any difference in the rendering quality between the two HQ animations (they 105 were the same). Of interest is the fact that, apart from one case in the SQ+HQ experiment, the viewers performing the task consistently failed to notice any difference between the high quality (HQ) rendered animation and the selective quality (SQ) animation, where only the area around the mug was rendered to a high quality. Surprisingly also 25% of the viewers in the HQ+LQ condition and 18% in the LQ+HQ case were so engaged in the task that they completely failed to notice any difference in the quality between these very different qualities of animation. Figure 5.11 (a) and (b) show that having performed the task of counting the pencils, the vast majority of participants were simply unable to recall the correct colour of the mug (90%) which was in the foveal angle and even less the correct colour of the carpet (95%) which was outside this angle. The Inattentional Blindness therefore being even higher for “less obvious” objects, especially those outside the foveal angle. 100 Percetage of people who did notice the rendering quality difference 90 80 70 60 50 40 30 20 10 0 HQ + HQ HQ + LQ LQ + HQ HQ + SQ SQ + HQ Animation Conditions Free For All Counting Pencils Task Figure 5.10: Experimental results for the two tasks: Counting the pencils and simply watching the animations (free for all). Overall the participants who simply watched the animations were able to recall far more detail about the scene, although the generic nature of the task given to them precluded a number from recalling such details as the colour of specific objects, for 106 example 47.5% could not recall the correct colour of the mug and 53.8% the correct Percentage of people who answered what the colour of the carpet was colour of the carpet. 100 90 80 70 60 50 40 30 20 10 0 Blue Other Colour Don't Remember Blue Watching Animation Other Colour Don't Remember Counting Pencils Animation Conditions HQ+HQ HQ+LQ LQ+HQ HQ+SQ SQ+HQ Percentage of people who answered what the colour of the mug was Figure 5.11 a): How observant were the participants: colour of the carpet (outside the foveal angle). 100 90 80 70 60 50 40 30 20 10 0 Red Other Colour Don't Remember Red Watching Animation Other Colour Don't Remember Counting Pencils Animation Conditions HQ+HQ HQ+LQ LQ+HQ HQ+SQ SQ+HQ Figure 5.11 b): How observant were the participants: colour of the mug (inside the foveal angle). 107 5.2.4 Analysis Since the response of the observers was binary, the appropriate method of statistical analysis was the Chi-square test (X2) for significance. 
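For reference, the chi-square statistic for these binary (noticed / did not notice) responses can be computed from a 2x2 contingency table of condition against response. The C sketch below is an illustrative reconstruction of that calculation, not the analysis software used for this thesis; the resulting X2 value, with df = 1, is compared against the tabulated critical value of 3.84 quoted in Figure 5.12.

    /* Chi-square test on a 2x2 contingency table: rows are the two animation
     * conditions being compared, columns are "noticed the quality difference"
     * and "did not notice". With df = 1 the critical value at p = 0.05 is
     * 3.84 (see Figure 5.12). */
    double chi_square_2x2(double obs[2][2])
    {
        double row[2], col[2], total = 0.0, chi2 = 0.0;

        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                total += obs[i][j];

        for (int i = 0; i < 2; ++i) row[i] = obs[i][0] + obs[i][1];
        for (int j = 0; j < 2; ++j) col[j] = obs[0][j] + obs[1][j];

        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j) {
                double expected = row[i] * col[j] / total;
                double diff = obs[i][j] - expected;
                chi2 += diff * diff / expected;
            }

        return chi2;
    }

As a check, two groups of 16 observers in which all of one group noticed the difference and none of the other did gives X2 = 32, matching the values reported below.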
Standard linear regression models or analysis of variance (ANOVA) are only valid on continuous data from normal distributions, and therefore were not appropriate. By performing pair-wise comparisons of all the other animations to the animations HQ+HQ it could be determined whether or not the results were statistically significant (Figure 5.12). When simply watching the animation, the test statistics for all the pair-wise comparisons were statistically significant. For example, the result for the pair-wise comparison of HQ+HQ and HQ+SQ was X2 = 32, df = 1, p < 0.005 (a p value of 0.05 or less denotes a statistically significant result). In comparison when counting the pencils the test statistics were significant for the pair-wise comparisons HQ+HQ with HQ+LQ (X2 = 32, df = 1, p < 0.005). However, for the comparisons HQ+HQ with HQ+SQ the results were statistically insignificant, and thus the null hypothesis was retained (X2 = 0.125, df = 1, p > 0.1). From this it can be concluded that when the observers were counting the pencils, the HQ animations and the SQ animations produced the same result, i.e. the observers thought that they were seeing the same animation twice, with no alteration in rendering quality. Pair-wise Comparison X2 Value (X2 3.84 for significance) df value p value Statistically significant? HQ+HQ watching HQ+HQ counting 0 1 p > 0.1 HQ+LQ watching HQ+LQ counting 4.333 1 p < 0.05 Null hypothesis is retained Significant (just!) LQ+HQ watching LQ+HQ counting 3.692 1 p < 0.05 Significant (just!) HQ+SQ watching HQ+SQ counting 28.125 1 p < 0.005 Significant SQ+HQ watching SQ+HQ counting 32 1 p < 0.005 Significant HQ+HQ watching HQ+LQ watching 32 1 p < 0.005 Significant HQ+HQ watching LQ+HQ watching 32 1 p < 0.005 Significant HQ+HQ watching HQ+SQ watching 32 1 p < 0.005 Significant HQ+HQ watching SQ+HQ watching 32 1 p < 0.005 HQ+HQ counting HQ+SQ counting 0.125 1 p > 0.1 HQ+HQ counting SQ+HQ counting 0 1 p > 0.1 Significant Null hypothesis is retained Null hypothesis is retained Figure 5.12: Full results of statistical analysis using Chi-square test (X2) for significance. 108 5.2.5 Verification with an eye-tracker To make certain that the attention of the observer was indeed being captured by the task, counting pencils, the experiment was briefly repeated with the Eyelink Eyetracking System developed by SR Research Ltd. and manufactured by SensoMotoric Instruments. Figure 5.13 shows an example of the scan path of one of the observers whilst performing the counting pencils task for 2 seconds. While all the observers had slightly different scan paths none of their eye scans left the green box (Figure 5.13). Figure 5.14 shows an example of the scan path of one of the observers who was simply watching the animation for 2 seconds. These results demonstrate that Inattentional Blindness may in fact be exploited to significantly reduce the rendered quality of a large portion of the scene without having any affect on the viewer’s perception of the scene. Figure 5.13: An eye scan for an observer counting the pencils. The green crosses are fixation points and the red lines are the saccades. 109 Figure 5.14: An eye scan for an observer who was simply watching the animation. 5.2.6 Conclusions For virtual reality applications in which the task is known a priori the computational savings of exploiting Inattentional Blindness can be dramatic. 
Our approach works by identifying the area of user fixation, determined by the task being performed, rendering this to a high quality and exploiting the Inattentional Blindness inherent in the human visual system to render the rest of the scene at a significantly lower quality. Our results show that while engaged in the task, users consistently failed to notice the quality difference, and even objects, within the scene. 5.3 Non-Visual Tasks The task tested previously in the experiment described in section 5.2 [Cater et al. 2002], was visual, therefore it was not known what would happen if observers were given a non-visual task. To answer this very question the experiment described in Section 5.2 was repeated but with a non-visual task instead. The task chosen for each user was to 110 count backwards from one thousand in steps of two. The results were then compared to those of the visual task in Section 5.2. The results of this work was presented as a full paper at SPIE 2003 in San Francisco [Cater et al. 2003a], and are also discussed in the chapter written for the Visualization Handbook to appear summer 2004 [Chalmers and Cater 2004]. 5.3.1 The psychophysical experiment The study involved the same three rendered animations, of the fly-through of the four rooms. Again, as with the previous experiment, the only difference between the three animations was the quality to which the individual animations had been rendered. A total of 80 subjects were studied, each subject saw two animations of 35 seconds, displayed at 15 frames per second. All the subjects were asked to count backwards from one thousand in steps of two, out loud. As subjects were a mix of nationalities they were given a choice of language to count in so that the level of task was the same for all nationalities and that language was not an issue. To minimize experimental bias, the choice of condition to be run was randomized and for each, 8 were run in the morning and 8 in the afternoon, see Figure 5.15. Subjects had a variety of experience with computer graphics and all exhibited at least normal or corrected to normal vision in testing. No. of Participants for that condition 8 8 8 8 8 8 8 8 8 8 Total: 80 Condition Counting Backwards Ordering of Animations (Non-Visual Task) HQ+HQ Counting Backwards HQ+LQ Counting Backwards LQ+HQ Counting Backwards HQ+SQ Counting Backwards SQ+HQ Counting Backwards HQ+HQ Counting Backwards HQ+LQ Counting Backwards LQ+HQ Counting Backwards HQ+SQ Counting Backwards SQ+HQ Counting Backwards 16 Participants per condition Time of Day Morning Morning Morning Morning Morning Afternoon Afternoon Afternoon Afternoon Afternoon Figure 5.15: The orderings of the conditions for randomization in the experiment. Before beginning the experiment the subjects read a sheet of instructions on the procedure of the task they were to perform, Appendix B.5. After the participant had read the instructions they were asked to clarify that they had understood the task. They then rested their head on a chin rest that was located 45cm away from a 17-inch 111 monitor. The chin rest was located so that their eye level was approximately level with the centre of the screen. The animations were displayed at a resolution of 1280 x 1024. The subjects were told to start counting backwards in steps of two out loud as soon as the countdown shown at the beginning of the animation had finished. 
They were shown the second animation immediately afterwards, again the subjects started counting from one thousand backwards in steps of two after the countdown. On completion of the experiment each participant was asked to fill in the same detailed questionnaire as described in section 5.2.2, which can be seen in Appendix B.4. Again the participants were then asked detailed questions about the objects in the rooms, their colour, location and quality of rendering. They were specifically asked not to guess, but rather state ‘don’t remember’ when they had failed to notice some details. 5.3.2 Results Once again the participants did not notice any difference in the rendering quality between the two HQ animations (they were the same), Figure 5.16. Of interest is the fact that when performing the non-visual task, counting backwards from one thousand in steps of two, the majority of viewers noticed both the difference between the high quality rendered animation and the low quality animation (HQ+LQ: 87.5% and LQ+HQ: 75%) as well as noticing the difference between the high quality rendered animation and the selective quality animation (HQ+SQ: 81.25% and SQ+HQ: 75%) where the area around the mug was rendered to a high quality. Thus, it can be deduced that when performing a non-visual task the majority of observers can still detect the difference in rendering quality. If these results are compared to those described in Section 5.2, where the observers performed the counting pencils task, a visual task, apart from one case in the SQ+HQ experiment, the viewers performing the visual task consistently failed to notice any difference between the high quality rendered animation and the selective quality animation. The same phenomenon occurs if the results from the questions what was the colour of the mug and what was the colour of the floor are compared. Figure 5.17 (a) and (b) show that having performed the non-visual task, the majority of participants were unable to recall the correct colour of the mug (69%) and even less the correct colour of the carpet (79%). However, this is still less than when 112 the participants were asked to perform the counting pencils task, with 90% unable to recall the colour of the mug and 95% unable to recall the colour of the carpet. Overall, the participants who simply watched the animations were able to recall far more detail of the scenes than the non-visual task, although the generic nature of the task given to them precluded a number from recalling such details as the colour of specific objects, for example 47.5% could not recall the correct colour of the mug and 53.8% the correct colour of the carpet. Therefore, it can be concluded that whilst performing a non-visual task, participants can detect a rendering quality alteration but they are unable to recall much detail about the scene due to having to perform the non-visual task. Percentage of People who did notice the rendering quality difference 100 90 80 70 60 50 40 30 20 10 0 HQ+HQ HQ+LQ LQ+HQ HQ+SQ SQ+HQ Animation Conditions Watching Animation Counting Pencils Non-Visual Task Graph 5.16: Experimental results for the three tasks: Simply watching the animations, the Visual task: Counting the pencils and the Non-visual task: Counting backwards from 1000 in steps of 2. 
113 90 80 70 60 50 40 30 20 10 Watching Animation Counting Pencils Don't Remember Other Colour Red Don't Remember Other Colour Red Other Colour Don't Remember 0 Red Percentage of people who answered what the colour of the mug was 100 Non-Visual Task Animation Conditions HQ+HQ HQ+LQ LQ+HQ HQ+SQ SQ+HQ Percentage of people who answered what the colour of the carpet was Figure 5.17 (a): How observant were the participants depending on the task: Colour of the mug. 100 90 80 70 60 50 40 30 20 10 Watching Animation Counting Pencils Don't Remember Other Colour Blue Don't Remember Other Colour Blue Don't Remember Other Colour Blue 0 Non-Visual Task Animation Conditions HQ+HQ HQ+LQ LQ+HQ HQ+SQ SQ+HQ Figure 5.17(b): How observant were the participants depending on the task: Colour of the carpet. 114 5.3.3 Analysis Statistical analysis shows that the results are statistically significant. The appropriate method of statistical analysis was the t-test for significance and since each person had a different random selection of the animations an unrelated t-test had to be used. By performing pair-wise comparisons of all the other animations to the animations HQ+HQ it could be determined whether or not the results were statistically significant, Figure 5.18. Pair-wise Comparison (t 2.042for significance) t Value df value p value Statistically significant? Null hypothesis is retained Significant (just!) Null hypothesis is retained Highly Significant HQ+HQ watching HQ+HQ counting 0 30 p > 0.1 HQ+LQ watching HQ+LQ counting 2.236 30 p < 0.05 LQ+HQ watching LQ+HQ counting 1.464 30 p > 0.1 HQ+SQ watching HQ+SQ counting 15 30 p < 0.05 SQ+HQ watching SQ+HQ counting 15 30 p < 0.05 HQ+HQ watching HQ+HQ non-visual 0 30 p > 0.1 HQ+LQ watching HQ+LQ non-visual 1.464 30 p > 0.1 LQ+HQ watching LQ+HQ non-visual 2.236 30 p < 0.05 HQ+SQ watching HQ+SQ non-visual 1.861 30 p > 0.1 SQ+HQ watching SQ+HQ non-visual 1.464 30 p > 0.1 HQ+HQ watching HQ+LQ watching 30 HQ+HQ watching LQ+HQ watching 30 HQ+HQ watching HQ+SQ watching 30 HQ+HQ watching SQ+HQ watching 15 30 p < 0.05 Highly Significant Null hypothesis is retained Null hypothesis is retained Significant (just!) Null hypothesis is retained Null hypothesis is retained Maximum Significant Maximum Significant Maximum Significant Highly Significant HQ+HQ counting HQ+LQ counting 6.708 30 p < 0.05 Significant HQ+HQ counting LQ+HQ counting 8.062 30 p < 0.05 HQ+HQ counting HQ+SQ counting 1 30 p > 0.1 HQ+HQ counting SQ+HQ counting 0 30 p > 0.1 HQ+HQ non-visual HQ+HQ non-visual HQ+HQ non-visual HQ+HQ non-visual HQ+LQ non-visual LQ+HQ non-visual HQ+SQ non-visual SQ+HQ non-visual 10.247 6.708 8.062 6.708 30 30 30 30 p < 0.05 p < 0.05 p < 0.05 p < 0.05 Significant Null hypothesis is retained Null hypothesis is retained Highly Significant Significant Significant Significant p < 0.05 p < 0.05 p < 0.05 Figure 5.18: Full results of statistical analysis using t-test for significance. When the observers were performing the non-visual task, the test statistics for all the pair-wise comparisons, were statistically significant. For a two-tailed test with the df 115 = 30, t must be 2.042 for significance with p < 0.05. The result for the pair-wise comparison of HQ+HQ and HQ+SQ was t = 8.062, df = 30, p < 0.05 (a p value of 0.05 or less denotes a statistically significant result). In comparison when the observers were performing the visual task, counting the pencils, the test statistics were significant for the pair-wise comparisons HQ+HQ with HQ+LQ (t = 6.708, df = 30, p < 0.05). 
However, for the comparisons HQ+HQ with HQ+SQ the results were statistically insignificant, and thus the null hypothesis was retained (t = 1, df = 30, p > 0.1). From this it can be concluded that when the observers were performing the visual task the HQ+HQ animations and the HQ+SQ animations produced the same result, i.e. the observers thought that they were seeing the same animation twice, with no alteration in rendering quality. However, when the observers were performing the non-visual task the result was significantly different, i.e. the observers could distinguish that they were shown two differently rendered animations. If statistical analysis is also run on the pairwise comparison of watching animation (no task) and counting backwards (non-visual task), the results are statistically insignificant, for example for the comparison between watching animation HQ+SQ with counting backwards HQ+SQ the null hypothesis is retained as t = 1.4638501, df = 30, p > 0.1, thus concluding there is no significant difference between the results caused when watching the animation and those caused when counting backwards. 5.3.4 Verification with an eye-tracker Again, as done in Section 5.2, to compare the scan paths made by subjects performing the counting backwards task the experiment was briefly repeated with the Eyelink Eyetracking System. The results were then compared with previous scan paths recorded when subjects were asked to count pencils or simply to watch the animation. Figure 5.19 shows an example of the scan path of one of the observers whilst performing the counting backwards task for 2 seconds, note how less fixations are on actual objects, this is because the observers are having to multi-task and thus less effort can be put into simply watching the animation. This image can then be compared to the two eye-scan path images produced whilst performing the counting pencils task for 2 seconds, Figure 5.13, and produced whilst simply watching the animation for 2 seconds, Figure 5.14. 116 Figure 5.19: An eye scan for an observer counting backwards. The green crosses are fixation points and the red lines are the saccades. 5.3.5 Conclusions For a non-visual task it is very difficult to predict where the observers are going to look, although a non-visual task does affect the eye-scan path of an observer it does not take control, i.e. guide the eyes only to specific areas of the scene, as can be see in Figure 5.19. Therefore using a top-down processing method to render the scene for a nonvisual task scenario is impractical. A more appropriate method would be one based on bottom-up processing, such as the one described by Yee et al. [2001], or by using a method which uses an eye-tracker, such as that proposed by McConkie and Loschky [2000]. 5.4 Inattentional Blindness versus Peripheral Vision Sections 5.2 and 5.3 showed that conspicuous objects in a scene that would normally attract the viewer’s attention are ignored if they are not relevant to the visual task at hand. This section investigates whether this failure to notice quality difference when 117 performing a visual task is merely due to the loss of visual acuity in our peripheral vision, or is indeed due to Inattentional Blindness. 5.4.1 Task Maps: Experimental Validation This section demonstrates Inattentional Blindness experimentally in the presence of a high-level task focus. 
The hypothesis, based on the previous findings in this thesis, was that viewers would not notice what would be normally visible degradations in an image when instructed to perform a task, as long as the objects which were being instructed to find were not affected by the degradation. The experiments performed in this section confirmed this hypothesis with a high level of certainty and were presented along with the rendering framework (Chapter 6) at the Eurographics Symposium on Rendering in Belgium [Cater et al. 2003c]. A pre-pilot study was run to refine the experimental procedure as well as which method of processing was to be used. Initially Alias Wavefront Maya was used to design a scene in which a task, counting the number of teapots in a scene, could be carried out, Figure 5.20. However it was decided that a scene implemented in Radiance would be more appropriate due to the fact that the long term goal was to develop a selective rendering framework based on the global illumination package of Radiance, Figure 5.21. Figure 5.20: Images to show the initial experimental scene developed in Alias Wavefront Maya. 118 Percentage of Participants that noticed the difference between a 3072 Resolution Image (HQ) and the Low Quality Rendered Image (LQ) Figure 5.21: Image to show the final experimental scene developed in Radiance rendered with a sampling resolution of 3072x3072, in the experiment this is referred to as High Quality (HQ). 100 100 90 80 71.9 70 60 50 43.8 40 30 20 10 0 0 256 512 768 1024 1280 1536 1792 2048 2304 2560 2816 3072 Resolution of Low Quality Rendered Image (LQ) Figure 5.22: Results from the pilot study: determining a consistently detectable rendering resolution difference. 119 A pre-study was run with 10 participants, who were asked to count the number of teapots in a computer rendered image, to find out how long subjects took to count them correctly. This was found to be 2 seconds on average. In the previous experiments the low quality was chosen simply by personal choice, however by using this method it couldn’t be justified that all the participants could easily detect the chosen low quality resolution. To eliminate this experimental flaw a pilot study was conducted to deduce the appropriate detectable image resolution to use for the main experiment. Therefore 32 participants were shown 24 pairs of images at random, and asked if they could distinguish a change in resolution or quality between the two images. Each image was displayed for 2 seconds. One image was always the High Quality image rendered at a 3072x3072 sampling resolution, whilst the other image was one selected from images rendered at sampling resolutions of 256x256, 512x512, 768x768, 1024x1024, 1536x1536 and 2048x2048. In half of the pairs of images, there was no change in resolution; i.e., they saw two 3072x3072 resolution images. The results can be seen in Figure 5.22. All the participants could easily detect a quality difference with the resolutions of 256x256 through to 1024x1024 in comparison to a resolution of 3072x3072. 72% still detected a quality difference between a resolution image of 1536x1536 and 3072x3072, this just being under the 75% probability that an observer would notice a difference, declared as one JND (Just Noticeable Difference) [Lubin 1997], and thus it could have been an acceptable resolution to use for the main experiment. 
However, it was decided that a resolution of 1024x1024 would be used in the main study, as 100% of participants in the pilot study detected the difference, thus guaranteeing that when participants were not performing a task the difference in resolution was easy to detect.

The main study involved two models of an office scene, the only difference being the location of items in the scene, mainly teapots (Figure 5.21). Each scene was then rendered to three different levels of resolution quality: the entire scene at High Quality (HQ), a sampling resolution of 3072x3072 (Figure 5.23a); the entire scene at Low Quality (LQ), a sampling resolution of 1024x1024 (Figure 5.23b); and Selective Quality (SQ).

Figure 5.23: Sampling resolutions: a (left) 3072x3072 (HQ), b (right) 1024x1024 (LQ); c (left) 768x768, d (right) 512x512.

5.4.2 Creating the Selective Quality (SQ) images

The Selective Quality (SQ) images were created by selectively rendering the majority of the scene in low quality (1024x1024), apart from the visual angle of the fovea (2°) centred on each teapot, shown by the black circles in Figure 5.24, which were rendered at the higher rate corresponding to a 3072x3072 sampling. As the selective rendering framework had not yet been implemented (its exact design was waiting on the results of this experiment), a way of simulating the same effect in Radiance had to be created. This was done using a selection of Radiance tools. First the office scene was created in Radiance, scene.all, along with a scene containing only five circles located in the viewing plane, spheres.rad. The location and size of these circles were very important: they had to (a) subtend the foveal angle when viewed from 45cm away, and (b) line up with the location of each of the teapots in the scene as seen from the viewing plane. These circles can be seen in Figure 5.25, which also shows the geometry of the office scene as well as the viewing plane. The spheres file was rendered to high quality (3072x3072), Figure 5.26 (left), using vwrays, whilst the scene file was rendered to low quality (1024x1024) using rpict. The two files were then combined using pcomb, which blended them together, simulating the degradation in acuity around the foveal angle, as can be seen below.

oconv spheres.rad > spheres.oct
oconv scene.all > room.oct

vwrays -vf default.vf -x 3072 -y 3072 -ff \
  | rtrace -w -h -ff -opd spheres.oct \
  | rtrace -ffc `vwrays -d -vf default.vf -x 3072 -y 3072` room.oct \
  | pfilt -1 -x /3 -y /3 -e +2 > highdetailfilt.pic

rpict -pj 0 -x 1024 -y 1024 -vf default.vf spheres.oct > alpha.pic

rpict -pj 0 -x 2048 -y 2048 -vf default.vf room.oct \
  | pfilt -1 -x /2 -y /2 -e +2 > lowdetailfilt.pic

pcomb -e 's1=gi(3);s2=1-s1;' \
      -e 'ro=s1*ri(1)+s2*ri(2);go=s1*gi(1)+s2*gi(2);bo=s1*bi(1)+s2*bi(2)' \
      lowdetailfilt.pic highdetailfilt.pic alpha.pic > combined.pic

Figure 5.24: Selective Quality (SQ) image showing the high quality rendered circles located over the teapots (black).

The High Quality (HQ) images took 8.6 hours to render with full global illumination in Radiance [Ward 1994] on a 1 GHz Pentium processor, whilst the Low Quality (LQ) images were rendered in 4.3 hours, i.e. half the time of the HQ images, and the Selective Quality (SQ) images were rendered in 5.4 hours.

Figure 5.25: The circles in the viewing plane from which the high quality fovea circles are generated.
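As a rough illustration of how the size of these high quality circles can be derived, the sketch below computes the physical and pixel radius of a 2° foveal region at the 45cm viewing distance used in the experiments. The monitor width and image resolution used for the pixel conversion are assumptions for illustration only, not the actual experimental hardware values.

```python
# Sketch: radius of a 2 degree foveal circle at a 45 cm viewing distance.
# The monitor width and image resolution below are illustrative assumptions.
import math

viewing_distance_cm = 45.0
fovea_deg = 2.0

# Physical radius on the viewing plane: d * tan(theta / 2)
radius_cm = viewing_distance_cm * math.tan(math.radians(fovea_deg / 2.0))

# Convert to pixels, assuming roughly 32 cm of visible screen width showing a
# 1024 pixel wide image.
screen_width_cm = 32.0
image_width_px = 1024
radius_px = radius_cm * image_width_px / screen_width_cm

print(f"foveal radius ~ {radius_cm:.2f} cm ~ {radius_px:.0f} pixels")
# ~0.79 cm radius, i.e. a circle roughly 1.6 cm (about 50 pixels) across
```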
Figure 5.26: The high quality fovea circles (left) are then composited automatically with the low quality image, adding a glow effect (blend) around each circle to reduce any pop out effects, resulting in the Selective Quality image (SQ) (right). 5.4.3 Experimental Methodology In the study, a total of 96 participants were considered. Each subject saw two images, each displayed for 2 seconds. Figures 5.27a and 5.27b describe the conditions tested with 32 subjects for the HQ+HQ condition and 16 subjects for the other conditions. From the pilot study it was known that all participants should 100% be able to detect the 123 rendering quality difference if given no task; i.e. they are simply looking at the images for 2 seconds, thus there was no need to repeat the watching conditions. The task chosen to demonstrate the effect of Inattentional Blindness had the subjects counting teapots located all around the scene. There were 5 teapots in both images. By placing teapots and similar looking objects all over the scene, it was possible to see whether or not having to scan the whole image, and thus fixate on low quality as well as high quality regions, would mean that the viewers would indeed be able to detect the rendering quality difference. Acronym HQ LQ SQ Description High Quality: Entire frame rendered at a sampling resolution of 3072x3072. Low Quality: Entire frame rendered at a sampling resolution of 1024x1024. Selective Quality: A sampling resolution of 1024x1024 all over the image apart from in the visual angle of the fovea (2°) centred around each teapot, shown by the circles in Figure 5.24, which were rendered to a sampling resolution of 3072x3072. Figure 5.27a: The three different types of images being tested. The ordering image pairs shown in the experiment were: (1) HQ+HQ, (2) HQ+LQ, (3) LQ+HQ, (4) HQ+SQ and (5) SQ+HQ. No. of Participants for that condition 16 8 8 8 8 16 8 8 8 8 Total: 96 Condition Time of Day Counting (Task) / Ordering of Animations Watching (No Task) HQ+HQ Counting Morning HQ+LQ Counting Morning LQ+HQ Counting Morning HQ+SQ Counting Morning SQ+HQ Counting Morning HQ+HQ Counting Afternoon HQ+LQ Counting Afternoon LQ+HQ Counting Afternoon HQ+SQ Counting Afternoon SQ+HQ Counting Afternoon 32 Participants per condition when the results are combined for HQ+LQ and LQ+HQ as well as HQ+SQ and SQ+HQ Figure 5.27b: The orderings of the conditions for randomisation in the experiment. To minimize experimental bias, the choice of which condition to run was randomized and for each; 8 were run in the morning and 8 in the afternoon. Subjects had a variety of experience with computer graphics, and all exhibited normal or corrected to normal vision in testing. 124 Before beginning the experiment, the subjects read a sheet of instructions on the procedure of the task they were to perform, Appendix C.2. After each participant had read the instructions, they were asked to clarify that they understood the task. They then placed their head on a chin rest that was located 45cm away from a 17-inch monitor, see Figure 5.28. The chin rest was located so that their eye level was approximately level with the centre of the screen. The participants’ eyes were allowed to adjust to the ambient lighting conditions before the experiment was begun. The first image was displayed for 2 seconds, then the participant stated out loud how many teapots they saw. Following this, the second image was displayed for 2 seconds, during which the task was repeated. Figure 5.28: Image to show the experimental setup. 
Immediately following the experiment, each participant was asked to fill out a detailed questionnaire. This questionnaire asked for some personal details including age, sex, and level of computer graphics knowledge. The participants were then asked detailed questions about the quality of the two images they had seen. There was a series of questions about whether they noticed any difference at all between the images, as can be seen in the full questionnaire in Appendix C.3. This was so that, whatever their experience of computer graphics, a participant would be prompted to answer 'yes' to at least one of the questions if they had perceived a change. If they had noticed a difference in quality it was then possible to identify what had triggered it from the selection of alternatives. On completion of the questionnaire the subjects were finally shown a high quality and a low quality image side-by-side and asked which one they saw for the first and second displayed images, even if they had not reported seeing any difference. This was to confirm that participants had not simply failed to remember that they had noticed a quality difference, but actually could not distinguish the correct image when shown it from a choice of two, i.e. to guarantee that the observers had not in fact perceived without awareness, or perceived and quickly forgotten that they had noticed a difference.

5.4.4 Results

Figure 5.29 shows the overall results of the experiment. Obviously, the participants did not notice any difference in the rendering quality between the two HQ images (they were the same). Of interest is the fact that, apart from two cases in the HQ/SQ condition (HQ+SQ and SQ+HQ), the viewers performing the task consistently failed to notice any difference between the HQ rendered image and the SQ image. Surprisingly, nearly 20% of the viewers in the HQ/LQ condition (HQ+LQ and LQ+HQ) were so engaged in the task that they failed to notice any difference between these very different quality images. Mack and Rock [1998] state that 'It is to be assumed that attention normally is to be paid to objects at fixation, however when a visual task requires attending to an object placed at some distance from a fixation, attention to objects at the fixation might have to be actively inhibited.' This could explain why Inattentional Blindness occurs to such a great extent even when an object rendered to low quality falls at a fixation: the rendering quality of that object is simply inhibited from the observer's attention by the need to perform the visual task of counting the teapots.

Figure 5.29: Experimental results for the two tasks: counting the teapots vs. simply looking at the images. (Bar chart of the percentage of people who did notice the rendering resolution quality difference for the HQ/HQ, HQ/LQ and HQ/SQ image conditions, for the counting teapots task and for watching the images in the pilot study.)

Figure 5.30: Experimental results for asking the participants what objects there were in the scene, for the counting teapots criteria only. (Bar chart of the percentage of people that selected each item as being in the image, over the list of 18 items participants could select from; the items themselves are listed in Figure 5.31.)
127 As well as asking questions about the quality of the images in the questionnaire there was also a question which asked the participants to select from a list of 18 objects which were actually in the scene, Appendix C.3. Of the 18 objects listed there were 9 that existed in the scene and 9 that did not i.e. were ‘red herrings’. Figure 5.30 shows the results of the selections made when the participants were performing the counting teapots task. The objects that were and weren’t in the scene are listed in Figure 5.31. Objects located in the scene Vases Chairs Books Teapots Pictures Phone Video Bottles Teacups Number of times that object was selected 28 13 22 32 12 7 1 11 16 Objects NOT located in the scene Pens Crayons Palette Ashtray Computer Toy Car Clock Mr Potato Head Toy Silver Photo Frame Number of times that object was selected 7 1 0 0 8 1 4 0 9 Figure 5.31: List of the objects that were and were not in the scene. From Figures 5.30 and 5.31, it can be seen that participants were more likely to list objects that are commonly found in an office type of scene (chairs, books, pictures, pens, silver photo frame, clock and a computer). However from the results it can be seen that participants did not have a problem in selecting correctly the right objects in the scene, thus showing that the participants did not have a problem in remembering aspects about the image. This backs up the theory that participants were not registering that the quality of the images had been altered due to their lack of memory of it, for as can be seen their memory of other aspects in the scene was fine. Therefore it can be concluded that the participants had not perceived and then forgotten the quality difference but had indeed suffered Inattentional Blindness to the quality difference. This was true for all of the criteria apart from the case of the pens, silver photo frame, clock and computer. As stated this was probably due to participants assuming that these objects would be in an office type of scene such as this one. Therefore it is hypothesized that these were more likely guesses rather than the participants definitely having thought they saw this object in the scene. To avoid having this result in future experiments participants could be asked to select on a scale their ranking of how confident they feel that the object is in the scene. From this scale it could then be seen if the participants had truly perceived, i.e. were truly confident, or were guessing. The most frequently selected objects, apart from the teapots, were ones that had probably 128 been fixated on to see whether or not they were teapots and thus should be counted (vases, bottles and teacups). 5.4.5 Statistical Analysis Statistical analysis shows where the results of this experiment are significant. The appropriate method of analysis is a “paired samples” t-test for significance, and since each subject had a different random selection of the images, an unrelated t-test was applied [Coolican 1999]. By performing comparisons of the other image pairings to the HQ/HQ data, it could be determined whether the results were statistically significant or not, Figure 5.32. When the observers were counting teapots, the difference between HQ/HQ and HQ/LQ counts were statistically very significant. For a two-tailed test with the df = 62 (df is related to the number of subjects), t must be 2.0 for significance with p < 0.05 (less than 5% chance of random occurrence). 
The result for the pair-wise comparison of HQ/HQ and HQ/LQ was t = 11.59 with p < 0.05.

Pair-wise Comparison (t >= 2.0 for significance)      t Value   df   p value    Statistically significant?
HQ/HQ watching  vs  HQ/HQ counting      0        62   p > 0.1    Null hypothesis is retained
HQ/LQ watching  vs  HQ/LQ counting      2.67     62   p < 0.05   Significant (just!)
HQ/SQ watching  vs  HQ/SQ counting      21.56    62   p < 0.05   Highly Significant
HQ/HQ watching  vs  HQ/LQ watching      -        62   p < 0.05   Maximum Significant
HQ/HQ watching  vs  HQ/SQ watching      -        62   p < 0.05   Maximum Significant
HQ/HQ counting  vs  HQ/LQ counting      11.59    62   p < 0.05   Significant
HQ/HQ counting  vs  HQ/SQ counting      1.44     62   p > 0.1    Null hypothesis is retained

Figure 5.32: Full results of statistical analysis using the t-test for significance.

However, for the pair-wise comparison of HQ/HQ and HQ/SQ, the results are not statistically significant and the null hypothesis is retained, as t = 1.44, df = 62, and p > 0.1. From this it can be concluded that when observers were counting teapots, the HQ/HQ images and the HQ/SQ images produced the same result; i.e., the observers thought they were seeing the same pair twice, with no alteration in rendering quality. However, when the observers were simply looking at the images without searching for teapots in the pilot study, the result was significantly different; i.e., the observers could distinguish that they were shown two images rendered at different qualities.

An additional experiment was run to see at what value the results became significantly different from the HQ resolution of 3072x3072. At a sampling resolution of 768x768 (Figure 5.23c) the results were only just significant, t = 2.95, df = 62, and p < 0.05, i.e., only 7 participants out of the 32 people studied noticed the difference between the high quality image and a selectively rendered image whilst performing the teapot counting task. This only increased to 8 people out of 32 when the sampling resolution was dropped again to 512x512 (Figure 5.23d)!

5.4.6 Verification with an Eye-tracker

Once again, as in the previous experiments, the experiment was repeated using the Eyelink Eyetracking System to confirm that the attention of an observer was being fully captured by the task of counting teapots. Figure 5.33 shows an example of a scan path of an observer whilst performing the counting teapots task for 2 seconds. Whilst all the observers had slightly different scan paths across the images, they fixated both on the teapots and on other objects as well. The vases were the most commonly fixated non-teapot object because they were the most similar looking item to a teapot in the scene. Participants therefore made fixations on non-teapot objects in the image to check whether or not they were in fact teapots, but could not distinguish the different quality to which they were rendered. The time the observers spent fixating on the teapots was slightly longer than the other fixations made when looking at the images. Thus it could be hypothesized that the length of fixation is important to a human's ability to register the quality of each aspect they perceive. However, as the primary purpose of using the eye-tracker was to confirm whether or not the tasks were capturing the observers' attention, by looking at their eye saccades, only a few participants were tested.
Thus to conclude significantly with statistics whether this is the case another experiment would have to be run with at least 16 participants, and the eye tracking data analysed thoroughly. Figure 5.34 shows the perceptual difference between the Selective Quality (SQ) and Low Quality (LQ) images computed using Daly’s Visual Difference Predictor 130 [Daly 1993; Myszkowski 1998]. The recorded eye-scan paths clearly cross, and indeed fixate, on areas of high perceptual difference. It can therefore be concluded that the failure to distinguish the difference in rendering quality between the teapots, selectively rendered to high quality, and the other low quality objects, is not due purely to peripheral vision effects. The observers are fixating on low quality objects, but because they are not relevant to the given task of counting teapots, they fail to notice the reduction in rendering quality. This is Inattentional Blindness. The results presented in this chapter demonstrate that Inattentional Blindness, and not just peripheral vision, may be exploited to significantly reduce the rendered quality of a large portion of the scene without having any significant effect on the viewer’s perception of the scene. Figure 5.33: An eye scan for an observer counting the teapots. The X’s are fixation points and the lines are the saccades. 131 Figure 5.34: Perceptual difference between SQ and LQ images using VDP [Daly 1993]. Red denotes areas of high perceptual difference. 5.5 Summary In this chapter the focus of the research was changed from Change Blindness, to that of Inattentional Blindness. Experiments were developed and run to determine if Inattentional Blindness does in fact work with computer graphics imagery. The experiment revolved around showing two 30s animations to an observer when they were either performing a visual task or not, and then asking them a series of questions on what they had just seen. What the results showed was that when the observers were performing the task they only perceived the quality of the task-related objects i.e. they did not notice when the rest of the scene was rendered to a lower resolution. However 132 when observers were asked simply to watch the animation, they perceived the quality difference easily. The experiment was then rerun using a non-visual task, that of counting backwards from 1000 in steps of two. It was found that when performing a non-visual task at least 75% of the observers could still detect when there was a change in alteration of rendering quality. This shows that for this phenomenon of Inattentional Blindness to be exploited, the task must be a visual one. From the results it can be seen that if the task was in a fixed location, such as the pencils in the mug, then the observers do not perceive an alteration in quality in the rest of the scene, but could this be due to the human visual system’s poor ability to detect acuity in the periphery, not due to Inattentional Blindness? As a result on its own this did not matter, for the principle could be still used as a selective renderer. However for completeness, it was deemed important to take this a step further and answer the following question: What happens if the task is all over a scene, thus what happens when observers actually fixate on the low quality rendering if it’s not related to the task? This would prove whether or not Inattentional Blindness is the key to be used in a selective render to save time computationally or whether peripheral vision is sufficient. 
A final experiment was run to confirm that the failure of viewers to notice quality difference in a selective rendered scene was due to inattentional blindness and not simply peripheral vision. To perform the task of counting the teapots in the scene, the participant’s eyes crossed the scene and fixated on both task and non-task objects. When performing the visual task the viewers consistently failed to notice the quality difference in the various parts of the scene even those non-task related objects upon which they had fixated. When, however, viewers were simply looking at the scene, 100% of them noticed the quality differences. The next chapter shows how Inattentional Blindness can be incorporated into a working selective rendering framework for computing animations. 133 Chapter 6 An Inattentional Blindness Rendering Framework The framework discussed in this chapter was achieved in collaboration with Greg Ward, inventor and creator of the global illumination package Radiance [Ward 1994]. This chapter shows that Inattentional Blindness can be used as a basis for a selective rendering framework [Cater et al. 2003c]. A view-dependent system was created that determines the regions of interest that a human observer is most likely going to attend to whilst performing a visual task. The results from the experiments discussed in Chapters 4 and 5 showed that Inattentional Blindness can indeed be used to selectively render the scene without the observers noticing any difference in quality whilst they were performing a visual task. However, all the experiments were run by either using image processing code post rendering or by altering code at the command line, as described in the appropriate chapters. Thus, the next stage was to incorporate the techniques in a selective renderer. A ‘just in time’ rendering system was designed. Such a ‘just in time’ rendering system allows the user to be able to specify how long the framework has to spend on each frame, i.e. each frame is calculated within a specified time budget. The system designed not only uses the theories of Inattentional Blindness, but also, for computational efficiency, image based rendering techniques as well. It was also important that the system should be general so that it could be used not just for Radiance, but could be adapted for raytracing, radiosity, and to multi-pass hardware rendering techniques, allowing for the potential of real-time applications. 134 From the experiments performed in this thesis it is known that selective rendering is cost effective for briefly viewed still images, and, in fact, task focus seems to override low-level visual attention when it comes to noticing artefacts. In the more general case of animated imagery, this can take even greater advantage of Inattentional Blindness, because it is known that the eye preferentially tracks task objects at the expense of other details [Cater et al. 2002]. Using Daly’s model of human contrast sensitivity for moving images [Daly 1998; 2001], and Yee’s insight to substitute saliency for movement-tracking efficacy [Yee 2000], the a priori knowledge of tasklevel saliency can be applied to optimise the animation process. It was decided to follow this approach, of combining the three methods, rather than just implementing a task based framework due to that fact that the method of using Inattentional Blindness falls down if an observer isn’t performing the task 100%. 
As most tasks being performed in computer graphics are over sustained periods the likelihood of an observer attending avidly (100%) to the task over the whole time period is slim [Most et al. 2000]. Thus if these methods are combined if an observer changes his/her focus of attention to a salient non-task object briefly then the user will continue not to notice any difference due to this object also being covered by this framework. The approach that is described in this chapter has a number of key advantages over previous methods using low-level visual perception. First, task-level saliency is very quick to compute, as it is derived from a short list of important objects and their known whereabouts. Second, this framework introduces a direct estimate of pixel error (or uncertainty), avoiding the need for expensive image comparisons and Gabor filters as required by other perceptually based methods [Yee et al. 2001; Myszkowski et al. 2001]. Third, the animation frames are rendered progressively, enabling the ability to specify exactly how long a user is willing to wait for each image, or stopping when the error has dropped below the visible threshold. Frames are still rendered in order, but the time spent refining the images is under the control of the user. The implementation of this framework proposed in this chapter is suitable for quick turnaround animations at about a minute per frame, but it is also proposed that this method could be used to achieve interactive and real-time renderings such as those of Parker et al. [1999] and Wald et al. [2002]. A frame may be refined by any desired means, including improvements to resolution, anti-aliasing, level of detail, global illumination, and so forth. In the 135 demonstration of this system, primarily focus is on resolution refinement (i.e. samples/pixel), but greater gains are possible by manipulating other rendering variables as well. 6.1 The Framework The diagram shown in Figure 6.1 shows an overview of the system. The boxes represent data, and the ovals represent processes. The inputs to the system, shown in the upper left, are the viewer’s known task, the scene geometry, lighting, and view, all of which are a function of time. The processes shown outside the “Iterate” box are carried out just once for each frame. The processes shown inside the box may be applied multiple times until the frame is considered “ready,” by whatever criteria that is set. In most cases, a frame is called ready when the time allocated has been exhausted, but the iteration can also be broken from the point when the error conspicuity (EC) drops below a certain threshold over the entire image. The framework is designed to be general, and the implementation presented here is just one realization. Input: • Task • Geometry • Lighting • View High-level Vision Model Geometric Entity Ranking Object Map & Motion Lookup Task Map First Order Render Contrast Sensitivity Model Current Frame & Error Estimate Iterate Frame Ready? No Error Conspicuity Map Refine Frame Yes Output Frame Last Frame Figure 6.1: A framework for progressive refinement of animation frames using tasklevel information. 136 The high-level vision model, Figure 6.1, takes the task and geometry as input, and produces a table quantifying relative object importance for this frame. This is called geometric entity ranking. Specifically, a table of positive real numbers is derived, where zero represents an object that will never be looked at, and 1 is the importance of scene objects unrelated to the task at hand. 
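A minimal sketch of how such a geometric entity ranking might be combined with a per-pixel object map to form the task map is given below. The importance values follow the fire-safety example used later in this chapter (1 for scene objects unrelated to the task, higher values for task objects); the object names, IDs and the tiny object map are illustrative assumptions, and the lookup is done with NumPy purely for brevity.

```python
# Sketch: turning a geometric entity ranking into a per-pixel task map.
# Importance values follow the fire-safety example of Section 6.2
# (1 = unrelated to the task, higher = task objects tracked more closely).
import numpy as np

ranking = {
    "background":        1.0,
    "helicopter":        1.5,   # narrator's helicopter
    "fire_extinguisher": 2.0,
    "emergency_lantern": 2.5,
}

# Give each scene object an integer ID and build a lookup table so the task
# map is just an indexed read of the object map, not a separate buffer.
object_ids = {name: i for i, name in enumerate(ranking)}
lut = np.array([ranking[name] for name in object_ids])

# object_map: H x W array of object IDs from the first order render
# (a tiny hypothetical 2x3 example here).
object_map = np.array([[0, 0, 2],
                       [0, 3, 3]])
task_map = lut[object_map]   # per-pixel task-level saliency S
print(task_map)
```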
Normally, only task-relevant objects will be listed in this table, and their importance values will typically be between 1.5 and 3, where 3 is an object that must be followed very closely in order to complete the task. For the first order rendering, any method may be used that guarantees to finish before the time is up. From this initial rendering, an object map and depth value for each pixel is needed. If subsampling is applied, and some pixels are skipped, it is important to separately project the scene objects onto a full resolution frame buffer to obtain this map. The pixel motion map, or image flow, is computed from the object map and the knowledge of object and camera movement relative to the previous frame. The object map is also logically combined with the geometric entity ranking to obtain the task map. This is usually accessed via a lookup into the ranking table, and does not require actual storage in a separate buffer. Once a first order rendering of our frame is achieved and a map with the object ID, depth, motion, and task-level saliency at each pixel is established, the system can proceed with image refinement. First, the relative uncertainty in each pixel estimate is computed. This may be derived from the knowledge of the underlying rendering algorithm, or from statistical measures of variance in the case of stochastic methods. This was first thought that this might pose a serious challenge, but it turned out to be a modest requirement, for the following reason. Since there is no point in quantifying errors that the system cannot correct for in subsequent passes, it only needs to estimate the difference between what the system has and what might be achieved after further refinement of a pixel. For such improvements, the system can usually obtain a reasonable bound on the error. For example, going from a calculation with a constant ambient term to one with global illumination, the change is generally less than the ambient value used in the first pass, times the diffuse material colour. Taking half this product is a good estimate of the change that might be seen in either direction by moving to a global illumination result. Where the rendering method is stochastic, it is possible to collect neighbour samples to obtain a reasonable estimate of the variance in each pixel neighbourhood and use this as the error estimate [Lee et al. 1985]. In either 137 case, error estimation is inexpensive as it only requires local information, plus the knowledge of the scene and the rendering algorithm being applied. With the current frame and error estimate in hand, the system can make a decision whether to further refine this frame, or finish it and start the next one. Figure 6.2 shows an example of a frame with no refinement iterations. This “frame ready” decision may be based as stated earlier on time limits or on some overall test of frame quality. In most cases, the system will make at least one refinement pass before it moves on, applying Image-Based Rendering (IBR) to gather useful samples from the previous frame and adding them to this one. See Figure 6.3 for an example of the difference that IBR makes. Figure 6.2: A frame from our renderer with no refinement iterations at all. 138 Figure 6.3: The same frame as Figure 6.2 after the IBR pass, but with no further refinement. In an IBR refinement pass, the object motion map is used to correlate pixels from the previous frame with pixels from this frame. This improves the ability to decide when and where IBR is likely to be beneficial. 
This framework doesn’t however rely on IBR to fully cover the image; it is only used to fill in areas where the framework might otherwise have had to generate new samples. Holes therefore are filled in by the normal sampling process. Thus the selection of replacement pixels is based on the following heuristics: 1. The pixel pair in the two frames corresponds to the same point on the same object, and does not lie on an object boundary. 2. The error estimate for the previous frame’s pixel must be less than the error estimate for the current frame’s pixel by some set amount. (15% is used.) 3. The previous frame’s pixel must agree with surrounding pixels in the new frame within some tolerance. (A 32% relative difference is used.) The first criterion prevents from using pixels from the wrong object or the wrong part of the same object. Position correspondence is tested by comparing the 139 transformed depth values, and for object boundaries by looking at neighbouring pixels above, below, right, and left in the object map. The second criterion prevents from degrading the current frame estimate with unworthy prior pixels. 15% was deemed reasonable by experimenting with different values until a logical percentage was obtained. The third criterion reduces pollution in shadows and highlights that have moved between frames, though it also limits the number of IBR pixels taken in highly textured regions. If a pixel from the previous frame passes these three tests, the current pixel estimate is overwritten with the previous one, and the error is reset to the previous value degraded by the amount used for the second criterion. In this way, IBR pixels are automatically retired as the system moves from one frame to the next. 32% was deemed a reasonable percentage for the relative difference agreement by experimenting with different values until a balance had been achieved between the number of pixels being recomputed against the artifacts in the animation. Reprojection is handled by combining the object motion transform with the camera transform between the two frames, and reprojecting the last frame’s intersection point to the corresponding position on the object in the new frame. From there, z-buffering is performed to make sure that it is the front pixel. For this next paragraph let it be assumed that there is time for further refinement. Once the system has transferred what samples the system can using IBR, it determines which pixels have noticeable, or conspicuous, errors so that it may select these for improvement. Here the spatiotemporal contrast sensitivity function (CSF) defined by Daly [Daly 1998; 2001] is combined with the task-level saliency map. Daly’s CSF model is a function of two variables, spatial frequency in cy/deg, ρ, and retinal velocity in deg/sec, vR: CSF ( ρ , v R )=k ⋅ c0 ⋅ c2 ⋅ v R ⋅ (c1 2πρ ) 2 exp − where: k = 6.1+ 7.3 log(c 2v R /3) ρmax =45.9 /(c2vR +2) 140 3 c1 4πρ ρ max c0 =1.14,c1 =0.67,c2 =1.7 for CRT at 100 cd/m2 Constants c0, c1 and c2 are modifications that Daly made to the original model proposed by Kelly [1979], all originally being set to be equal to 1.0. These constants allow for fine tuning, and can be adjusted to give a peak sensitivity near 250 and a maximum spatial frequency cut-off near 30 cy/deg, so its maximum spatial performance is closer to the results from light levels greater than 100 cd/m2, thus by making c0 = 1.14, c1 = 0.67 and c2 = 1.7 the equation is more applicable to current CRT displays [Daly 1998]. 
The term k is primarily responsible for the vertical shift of the sensitivity as a function of velocity, while the term ρmax controls the horizontal shift of the function's peak frequency. The results of this spatiovelocity model for a range of different travelling wave velocities are shown in Figure 6.4. The model's overall results are consistent both with the vast drop in sensitivity for saccadic eye movements, which have velocities greater than 160 deg/sec, and with the near-zero sensitivity for traditionally stabilised imagery (i.e., zero velocity and zero temporal frequency). Finally, the CSF for a retinal velocity of 0.15 deg/sec is close to the conventional static CSF under natural viewing conditions.

Figure 6.4: CSFs for different retinal velocities [Daly 1998].

Following the work proposed by Yee [2000], the system substitutes saliency for movement-tracking efficiency, based on the assumption that the viewer pays proportionally more attention to task-relevant objects in their view. The equation for retinal image velocity, vR (in deg/sec), thus becomes:

    vR = vI − min(vI · S / Smax + vmin, vmax)

where:

    vI = local pixel velocity (from the motion map)
    S = task-level saliency for this region
    Smax = maximum saliency in this frame, but not less than 1/0.82
    vmin = 0.15 deg/sec (natural drift velocity of the eye [Kelly 1979])
    vmax = 80 deg/sec (maximum velocity that the eye can track efficiently [Daly 1998])

The eye's movement-tracking efficiency is computed as S/Smax, which assumes the viewer tracks the most salient object in view perfectly. Daly [2001] recommends an overall value of 82% for the average efficiency when tracking all objects at once within the visual field; the solid red line in Figure 6.5 was constructed using this fit. Therefore the system does not allow Smax to drop below 1/0.82. This prevents it from predicting perfect tracking over the whole image when no task-related objects are in view.

Figure 6.5: Smooth pursuit behaviour of the eye. The eye can track targets reliably up to a speed of 80.0 deg/sec, beyond which tracking is erratic [Daly 1998].

Since peak contrast sensitivity shifts towards lower frequencies as retinal velocity increases, objects that the viewer is not tracking, because they are not important, will be visible at lower resolution than the task-relevant objects. However, if the entire image is still or moving at the same rate, the computed CSF will be unaffected by the task information. Because of this, the task map is reintroduced as an additional multiplier in the final error conspicuity map, which is defined as:

    EC = S · max(E · CSF / ND − 1, 0)

where:

    E = relative error estimate for this pixel
    ND = noticeable difference threshold

Dissecting this equation, the error E times the contrast sensitivity function CSF gives the error relative to the visible threshold. Because the relative error multiplied by the CSF yields the normalised contrast, where 1.0 is just noticeable, a threshold difference value ND is introduced, below which errors are deemed insignificant, and the product is divided by this number of JNDs. A value of 2 JNDs is the threshold at which 94% of viewers are predicted to notice a difference, and this is the value commonly chosen for ND. Subtracting 1 from the quotient gives a value that is below zero when the difference is allowable and above zero when it is not.
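The two expressions above, retinal velocity and error conspicuity, can be sketched directly in code as follows. The constants are those quoted in the text; the example values at the bottom are purely illustrative.

```python
# Sketch of the retinal velocity and error conspicuity computations above.
V_MIN = 0.15   # deg/sec, natural drift velocity of the eye [Kelly 1979]
V_MAX = 80.0   # deg/sec, maximum velocity the eye tracks efficiently [Daly 1998]

def retinal_velocity(v_image, saliency, saliency_max):
    saliency_max = max(saliency_max, 1.0 / 0.82)   # cap the tracking efficiency
    tracked = v_image * saliency / saliency_max + V_MIN
    return v_image - min(tracked, V_MAX)

def error_conspicuity(error, csf_value, saliency, nd=2.0):
    # E * CSF is the error in units of the visible threshold; divide by the
    # allowed number of JNDs (ND = 2) and clamp below-threshold values to 0.
    return saliency * max(error * csf_value / nd - 1.0, 0.0)

# Illustrative values only.
v_r = retinal_velocity(v_image=10.0, saliency=2.0, saliency_max=2.5)
print(v_r, error_conspicuity(error=0.1, csf_value=60.0, saliency=2.0))
```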
Further, this inner expression (E*CSF/ND - 1) will be 1.0 when the error visibility is exactly twice the allowed threshold (i.e., 1.0 for E * CSF = 4 * JND and ND = 2). However, this latter normalization is not important, since the test is only for where the EC map is non-zero, and the overall scaling is irrelevant to the computation. Finally, the max ( ) comparison to 0 merely prevents negative values, which are ignored since this framework only cares about above-threshold errors. The final multiplier by S is included due to the following reason; when the camera is momentarily stopped, it is still important to care about the task map because of course the entire animation cannot be static otherwise it wouldn’t be an animation! At this point information about the higher-order derivatives of motion is merely missing (velocity being the first derivative, acceleration the second, etc, in this framework only velocity was covered). Since this framework is most interested in where EC is non-zero, 143 the final multiplication by the task map S only serves to concentrate additional samples on the task objects, and does not bend the calculation to focus on these areas exclusively. To compute the CSF, an estimate of the stimulus spatial frequency, ρ, is needed. This is obtained by evaluating an image pyramid. Unlike previous applications of the CSF to rendering, two images are not being compared, so there is no need to determine the relative spatial frequencies in a difference image. The system only needs to know the uncertainty in each frequency band to bound the visible difference between the current estimate and the correct image. This turns out to be a great time-saver, as it is the evaluation of Gabor filters that usually takes longest in other approaches. Due to the CSF falling off rapidly below spatial frequencies corresponding to the foveal diameter of 2°, and statistical accuracy improving at lower frequencies as well, the system need only compute the image pyramid up to a ρ of 0.5 cycles/degree. The procedure is as follows, first the EC map is cleared up, and the image is subdivided into 2° square cells. Within each cell, a recursive function is called that descends a local image pyramid to the pixel level, computing EC values and summing them into the map on the return trip. At each pyramid level, the EC function is evaluated from the stimulus frequency (1/subcell radius in°), the task-level saliency, the combined error estimate, and the average motion for pixels within that subcell. The task-level saliency for a subcell is determined as the maximum of all saliency values within a 2° neighbourhood. This may be computed very quickly using a 4-neighbour check at the pixel level, where each pixel finds the maximum saliency of itself and its neighbours 1° up, down, left, and right. The saliency maximum and statistical error sums are then passed back up the call tree for the return evaluation. The entire EC map computation, including a statistical estimation of relative error, takes less than a second for a 640x480 image on a 1 GHz Pentium processor. 6.2 Implementation In the implementation of the above framework, Radiance was modified to perform progressive animation. Figure 6.6 shows a frame from a 4-minute long animation that was computed at a 640x480 resolution using this software. Figure 6.7 shows the 144 estimate of relative error at each pixel in the first order rendering, and Figure 6.8 shows the corresponding error conspicuity map. 
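The 4-neighbour saliency check used when building the error conspicuity map can be sketched with NumPy as below. The pixels-per-degree value is an assumption for illustration, and np.roll is used for brevity even though it wraps at the image borders, which the real implementation need not do.

```python
# Sketch of the per-pixel saliency maximum over a 2 degree neighbourhood,
# using the 4-neighbour check described above (each pixel takes the maximum
# of itself and its neighbours 1 degree up, down, left and right).
import numpy as np

def neighbourhood_saliency(task_map, pixels_per_degree=32):
    d = pixels_per_degree          # pixel offset corresponding to 1 degree
    s = np.asarray(task_map, dtype=float)
    out = s.copy()
    # Shift by +/- d pixels along each axis and keep the maximum
    # (np.roll wraps at the borders, acceptable for this sketch).
    for axis, shift in ((0, d), (0, -d), (1, d), (1, -d)):
        out = np.maximum(out, np.roll(s, shift, axis=axis))
    return out

# Illustrative: one salient object in an otherwise unimportant 640x480 frame.
task_map = np.ones((480, 640))
task_map[200:240, 300:360] = 2.0
print(neighbourhood_saliency(task_map).max())
```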
The viewer was assigned the task of counting certain objects in the scene related to fire safety, emergency lanterns and fire extinguishers. There are two task objects visible in Figure 6.6, the fire extinguisher and the narrator’s helicopter (the checkered ball), so the regions around these objects show strongly in the conspicuity map. The geometric entity ranking, i.e. object importance, for this model were as follows: narrator’s helicopter – 1.5, fire extinguishers – 2, emergency lanterns – 2.5, rest of the scene – 1. The emergency lanterns were given a higher ranking due to the fact that the viewer was also asked to detect which lanterns had cracked glass fronts, therefore the lanterns would require more of the viewer’s attention than the fire extinguishers. Figure 6.9 shows the final number of samples taken at each pixel in the refined frame, which took two minutes to compute on a single 400 MHz G3 processor. This time was found to be sufficient to render details on the task-related objects, but too short to render the entire frame accurately. It was deemed important for there to be artifacts in each frame in order to demonstrate the effect of task focus on viewer perception. About 50% of the pixels received IBR samples from the previous frame, and 20% received one or more high quality refinement samples. For comparison, Figure 6.10 shows the scene rendered as a still image in the same amount of time. Both images contain artifacts, but the animation frame contains fewer sampling errors on the task-related objects. In particular, the fire extinguisher in the corner, which is one of the search objects, has better anti-aliasing than the traditionally rendered image. This is at the expense of some detail on other parts of the scene, such as the hatch door. Since the view is moving down the corridor, all objects will be in motion, and it is assumed that the viewer will be tracking the task-related objects more than the others. Rendering the entire frame to the same detail as the task objects in Figure 6.6, takes 7 times longer than this optimized method. Figure 6.11 shows the entire frame rendered to the same quality as the task-related objects in Figure 6.6, i.e. rendered in 14 minutes. 145 Figure 6.6: A frame from our task-based animation. Figure 6.7: Initial frame error. 146 Figure 6.8: Initial error conspicuity. Figure 6.9: Final frame samples. 147 Figure 6.10: Standard rendering taking same time as Figure 6.6, i.e. two minutes. Figure 6.11: Standard rendering taking 7 times that of Figures 6.6 and 6.10, i.e. 14 minutes. 148 a) b) c) d) Figure 6.12: Perceptual differences using VDP [Daly 1993]. Red denotes areas of high perceptual difference. a) Visible differences between a frame with no iterations (Figure 6.2) and a frame after the IBR pass with no further refinement (Figure 6.3), b) Visible differences between a frame after the IBR pass with no further refinement (Figure 6.3) and a final frame created with our method in 2 mins (Figure 6.6), c) Visible differences between a final frame created with our method in 2 mins (Figure 6.6) and a standard rendering in 2 mins (Figure 6.10), and d) Visible differences between a final frame created with our method in 2 mins (Figure 6.6) and a standard rendering in 14 mins (Figure 6.11). Figure 6.12 shows the perceptual differences using VDP [Daly 1993] between the different stages of our optimized method and as well as between different methods of rendering. 
From studying Figures 6.12a) and b) the benefits of using IBR and further conspicuity refinement can easily be seen, the more red in the image the greater the visible differences between the images. Figure 6.12c) shows the visible differences between a rendering in 2 minutes using our optimized method and a standard rendering in the same time, note our method is more efficient on the task based aspects as well as the high conspicuity areas and thus these areas appear red, denoting a high level of difference between the images. Figure 6.12d) shows the visible differences between a rendering in 2 minutes using our optimized method and a standard rendering in 14 minutes, i.e. 7 times that of our optimized rendering. Note how this time the 14 minute 149 image is more efficient on most aspects of the image apart from those areas where our method focused its sampling, i.e. the task-based aspects and the high conspicuity areas. Therefore in Figure 6.12d) these areas show less difference i.e. are less red or even green or grey. This show that our optimized method produces a similar output to that of a 14 minute standard rendering for the high conspicuity and task-related aspects in a 1/7th of the time. Although direct comparisons to other methods are difficult due to differences in the rendering aims, Yee et al. demonstrated a 4-10 times speedup in [Yee et al. 2001] and Myszkowski et al. showed a speedup of roughly 3.5 times in [Myszkowski et al. 1999]. This shows that the system proposed is able to achieve similar speedups controlling only rendered sampling resolution. If the system was to refine the global illumination calculation also, similar to Yee, it is hypothesized that even greater gains could be achieved. There are only a few aspects of this framework that must be tailored to a raytracing approach. Initially, a low quality, first order rendering is computed from a quincunx sampling of the image plane, where one out of every 16 pixels is sampled, see Figure 6.13a. This sampling pattern is visible in unrefined regions of Figure 6.9. Qunicunx sampling has been proven to be well suited to the human visual system [Bouville et al. 1991], for the keenness of sight is lower for diagonal directions than for horizontal or vertical ones. In fact, the spatial sensitivity of the visual system in the frequency domain is roughly diamond-shaped, Figure 6.13b. Thus by using a nonorthogonal sampling pattern with a reduced sampling density, such as Qunicunx sampling, this anisotropy of the human eye response can be exploited. This is why this sampling technique was chosen for this implementation, rather than any other sampling techniques. To obtain the object and depth maps at unsampled locations, the system casts rays to determine the first intersected object at these pixels. An estimate of the rendering error is then calculated by finding the 5 nearest samples to each pixel position, and computing their standard deviation. This is a very crude approximation, but it suited this system’s purposes well for it gives a good estimate of the error 75% of the time, whilst being convenient and quick to compute. In cases where the high-quality samples in the refinement pass have an interreflection calculation that the initial samples do not, the method described earlier for estimating the error due to a constant ambient term is used. 
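A minimal sketch of the quincunx-style initial sampling (one pixel in sixteen) and the crude nearest-sample error estimate described above is given below. The exact sample offsets of the real implementation are not given in the text, so the particular pattern here is an assumption, as is the use of five nearest samples taken directly from the description.

```python
# Sketch: quincunx-style initial sample locations (one pixel in 16) and a
# crude per-pixel error estimate from the standard deviation of the 5
# nearest samples. The exact offsets of the real pattern are assumed.
import numpy as np

def quincunx_mask(height, width, step=4):
    """One sample per step x step block, with alternate sample rows offset by
    step/2 so the pattern is diagonal (quincunx-like) rather than square."""
    mask = np.zeros((height, width), dtype=bool)
    for row in range(0, height, step):
        offset = (step // 2) * ((row // step) % 2)
        mask[row, offset::step] = True
    return mask

def error_from_neighbours(sample_xy, sample_values, x, y, k=5):
    """Standard deviation of the k samples nearest to pixel (x, y)."""
    d2 = (sample_xy[:, 0] - x) ** 2 + (sample_xy[:, 1] - y) ** 2
    nearest = np.argsort(d2)[:k]
    return float(np.std(sample_values[nearest]))

mask = quincunx_mask(480, 640)
print(mask.mean())   # ~0.0625, i.e. one pixel in 16 sampled initially
```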
150 Figure 6.13 a (left): Quincunx Sampling, used to find the initial sample locations, the other pixels are sampled when determined necessary by the algorithm; b (right) Visual Sensitivity threshold of spatial frequencies [Bouville et al. 1991]. Following the IBR refinement described in the previous section, and provided the system is not out of the time allocated, the error conspicuity map is then computed, sorting the pixels from most to least conspicuous. For pixels whose EC value are equal (usually 0), the system orders from highest to lowest error, then from fewest to most samples. Going down this list, the system adds one high-quality ray sample to each pixel, until it has sampled them all or run out of time. If it managed to get through the whole list, the system re-computes the error conspicuity map and re-sorts. This time, only the samples are added to the top 1/8th of the list before sorting again. Smoother animations can be achieved by sampling each pixel at least once before honing in on the regions that are deemed to be conspicuous. The system could insist on sampling every pixel in the first order rendering, but this is sometimes impossible due to time constraints. Therefore, it is incorporated in the refinement phase instead. Prior to frame output, a final filtering stage is performed to interpolate unsampled pixels and add motion blur. Pixels that did not receive samples in the first order rendering or subsequent refinements must be given a value prior to output. A Gaussian filter kernel is applied whose support corresponds to the initial sample density to arrive at a weighted average of the 4 closest neighbours. Once a value at each pixel is achieved, the object motion map is multiplied by a user-specified blur parameter, corresponding to the fraction of a frame time the virtual camera’s shutter is open. The blur vector at each pixel is then applied using an energy-preserving smear filter to arrive at the final output image. This technique is crude in the sense that it linearizes motion, and does not discover obstructed geometry, but it has not been found to be objectionable 151 in any of the tests performed. However, the lack of motion blur on shadows does show up as one of the few distracting artifacts in this implementation. Also at this stage any exposure problems are handled by a simple tone-mapping process before the image is displayed. These filtering operations takes a small fraction of a CPU second per video resolution frame, and is inconsequential to the overall rendering time. Of the two minute rendering time for the frame shown in Figure 6.6, 1 second is spent updating the scene structures, 25 seconds is spent computing the 19,200 initial samples and the object map, 0.25 seconds is spent on IBR extrapolation, 0.9 seconds to compute the error map (times three evaluations – total 2.7 seconds), 1.25 seconds for the EC map, 0.4 seconds for filtering, and the remaining 89.4 seconds to compute about 110,000 high quality refinement samples, see Figure 6.14. In this test, the Radiance rendering parameters were set so there was little computational difference between an initial sample and a high-quality refinement sample; diffuse inter-reflections were not evaluated for either. This method’s combined overhead for a 640x480 frame is thus in the order of 14 seconds, 10 of which are spent computing the object map by ray casting. Intuitively and by measurements, this overhead scales linearly with the number of pixels in a frame. 
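The refinement ordering described earlier in this section, error conspicuity first, then error estimate, then existing sample count, can be outlined as in the sketch below. This is an illustrative outline of the scheduling idea under a simple wall-clock budget, not the actual Radiance code; the pixel containers and render_sample callback are hypothetical.

```python
# Sketch of the refinement ordering: pixels sorted by error conspicuity
# (highest first), then error estimate (highest first), then existing sample
# count (fewest first); one high quality sample is added to each pixel until
# the frame's time budget runs out.
import time

def refine(pixels, ec, err, nsamples, render_sample, deadline):
    order = sorted(pixels, key=lambda p: (-ec[p], -err[p], nsamples[p]))
    for p in order:
        if time.time() >= deadline:
            break                       # frame is "ready": budget exhausted
        render_sample(p)                # add one high quality ray sample
        nsamples[p] += 1

# Tiny illustrative run with three pixels and a no-op sampler.
ec, err, ns = {0: 0.0, 1: 2.0, 2: 0.5}, {0: 0.1, 1: 0.3, 2: 0.2}, {0: 1, 1: 1, 2: 1}
refine([0, 1, 2], ec, err, ns, lambda p: None, time.time() + 0.01)
print(ns)
```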
1s Updating the scene structures 25s Computing the 19,200 initial samples & object map 0.25s IBR extrapolation 2.7s Computing the error map (three evaluations) 1.25s Calculating the EC map 0.4s Filtering 89.4s Computing approximately 110,000 high quality refinement samples Figure 6.14: Pie chart to show where the two minutes are spent in rendering the frame. 152 It is worth noting that IBR works particularly well in this progressive rendering framework, allowing the system to achieve constant frame generation times over a wide range of motions. When motion is small, IBR extrapolation from the previous frame provides the system with many low-error samples for the first refinement pass. When motion is great, and thus fewer extrapolated samples are available, the eye’s inability to track objects and the associated blur means the system does not need as many. This holds promise for realistic, real-time rendering using this approach with hardware support. 6.3 Summary Over the last decade significant advances have been made in the development of perceptual rendering metrics, which as discussed in Chapter 2, use computational models of visual thresholds to efficiently produce approximated images that are indistinguishable to a human being from an image produced at the highest possible quality [Bolin and Meyer 1998; Myszkowski 1998]. However, as Dumont et al. [2003] state, although these perceptually-based approaches are promising, metrics are limited in their effectiveness for interactive rendering by two factors, 1) calculating these metrics is a computationally intensive process in itself; 2) these metrics are based on threshold measures, which collate to the ability of the HVS detecting differences between a rendered image and the highest possible quality image. This second point is particularly important due to the fact that in an interactive scenario time and resources are so limited that this threshold value created by the perceptual metrics is far above what the system can achieve within its constraints. Thus Dumont et al. [2003] propose that what the rendering community should be asking themselves is not “How can I make an image that is visually indistinguishable from the highest possible quality” but should be “How can I make an image of the highest possible quality given my constraints.” This backs up the ‘just in time’ attitude of the framework proposed in this thesis, for it does exactly this – it orders the possible rendering operations to achieve the highest quality it can within the system’s constraints. Thus this is one of the major strengths of the proposed framework in this thesis that other perceptual rendering methods such as Myszkowski et al. [2001], Yee [2001] etc. do not take into account. 153 The ‘just in time’ rendering framework proposed in this chapter produces frames, within a specified time budget or to below a certain threshold value of error conspicuity, by selectively rendering the scene depending on a prior knowledge of what visual task the user is performing. The method described should pertain to any type of rendering algorithm that might be used, from ray tracing to multi-pass hardware rendering techniques. The framework was shown to be working within the global illumination package Radiance. 
Examples of frames from an animation created using this framework are also shown, along with VDP comparisons, to show that this framework produces visually better images for people performing visual tasks in a fraction of the time it would take to render the whole image to the same quality. This chapter therefore completed the goal of this thesis and showed that an inherent feature of the human visual system, Inattentional Blindness, can indeed be used as the basis for a selective rendering framework. In addition, such a framework can reduce computation time significantly, by a factor of at least seven, without users perceiving any noticeable difference in rendering quality.

Chapter 7
Conclusions and Future Work

The main aim of this thesis was to develop a methodology to help address the ongoing problem that creating realistic computer generated images in real time is, to date, impossible. Such images need vast amounts of computational power to be rendered in anything close to real time. Despite the availability of modern graphics hardware, this issue remains one of the key challenges in computer graphics.

7.1 Contributions of this thesis to the research field

Knowing that the goal is to compute images for humans to view, this thesis presented the concept of exploiting known flaws in the human visual system. These flaws limit the ability of humans to perceive what they are actually viewing, thus allowing graphical scenes to be selectively rendered to save computation time without viewers being aware of any difference in quality. A similar approach has been researched in the last few years from the point of view of saliency and peripheral vision, both bottom-up visual processing methods. Both of these methods were successful in saving computational time by rendering images selectively without users perceiving any quality differences; however, both have their disadvantages as well, as discussed in Section 3.4. Many open questions still remain even for bottom-up selective rendering.

From in-depth research in psychology, vision and computer graphics it was found and hypothesized that there were two particular flaws that had not previously been exploited in this field and which could potentially save significant rendering time. These were Change Blindness and Inattentional Blindness.

Firstly Change Blindness, the inability to detect what should be obvious changes in a scene, was investigated. As this was the first experiment to be run, many lessons had to be learnt about how to perform psychophysical experiments without introducing any bias. The methodology chosen was built upon that of Rensink et al. [1997] and O'Regan et al. [1999a], where a specific sequence of alternating images was displayed to the participants until they perceived a difference between the images. This specific alternation of images caused the participants to suffer from Change Blindness, making the detection of a difference much harder than under normal circumstances. Typically, observers suffering from Change Blindness who were shown a rendering quality change using the flicker paradigm took eight times longer to notice it than observers shown a presence or location alteration.
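The flicker paradigm used in that experiment alternates the original and modified images, separated by brief blank fields, until the observer reports the change. A minimal sketch of such a presentation schedule is shown below; the display and response routines are hypothetical, and the durations are illustrative values in the spirit of Rensink et al. [1997] rather than the exact timings used in these experiments:

```python
import time

def flicker_trial(image_a, image_b, show, show_blank, change_reported,
                  image_ms=240, blank_ms=80):
    """Alternate image_a and image_b with blank fields in between until the
    observer reports seeing the change; return the detection time in seconds.
    show(), show_blank() and change_reported() are hypothetical display and
    input routines; the durations are illustrative, not the thesis's values."""
    start = time.time()
    while not change_reported():
        for image in (image_a, image_b):
            show(image)
            time.sleep(image_ms / 1000.0)
            show_blank()
            time.sleep(blank_ms / 1000.0)
    return time.time() - start
```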
This meant that, since observers were slow to notice alterations in rendering quality when suffering from this defect of the HVS, it was highly plausible that this methodology could be used in a selective renderer to render specific parts of a scene to varying qualities without the observer perceiving any difference in quality. On further investigation, however, it was not clear how Change Blindness could be exploited in an animation without requiring a visual disruption, which would in itself affect the observer's overall experience of the animation. Thus another, similar flaw of the HVS was considered that would solve this very problem. This was Inattentional Blindness, the failure to perceive unattended items in a scene.

It was important firstly to find out whether Inattentional Blindness could indeed be used to selectively render an animation without viewers perceiving any difference in quality from a fully rendered animation. Thus a new experiment was conducted with an animation of a fly-through of four rooms. The task given to half the observers was to count the number of pencils in a mug located on the table of each room. The task was made deliberately hard by adding superfluous paintbrushes to the mug as well as the pencils; this was to focus the observer's attention on the task only. The other half of the observers were asked simply to watch the animations. Each observer was shown two animations: one was always a fully, high quality rendered animation, and the other was either rendered entirely in low quality or selectively rendered. The selectively rendered animation was created by rendering high quality circles (2 visual degrees) located on the mugs containing the pencils, then blending these circles into the low quality rendering used for the rest of the scene.

From the results it could be seen that Inattentional Blindness did indeed cause the observers not to notice that the animation had been selectively rendered; in fact these observers responded in exactly the same way as those who saw two fully, high quality rendered animations. When the observers were shown an animation that was low quality all over, however, they could detect the quality difference between the two animations. Thus it could be concluded that, as long as the aspects related to the task at hand were rendered in high quality, observers would not be able to detect the difference in rendering quality.

However, several unanswered questions needed to be addressed. What would happen if the task were not a visual one? Were the results actually due to the decreasing acuity of peripheral vision? To answer these questions more experiments had to be performed. The first was to see whether or not a non-visual task would cause the same effects as previously described. The non-visual task chosen was to count backwards from 1000 in steps of two. This resulted in the observers still noticing the quality difference between all cases (high, selective and low) even though they were performing the non-visual task. From this it was concluded that, for Inattentional Blindness to be exploited in this scenario, a visual task must be employed. Next was to resolve the question of whether the results being achieved were actually due to peripheral vision rather than Inattentional Blindness. To overcome this dilemma a final experiment was conducted, this time with a task that was spread across the whole scene.
This caused the participants to fixate on aspects of the scene that were rendered in both high and low quality when the scene was selectively rendered. The task in this experiment was to count the number of teapots. To make the participants fixate on aspects that were not teapots and were not rendered in high quality, other objects, such as vases similar in appearance to the teapots, were added to the scene. From the results of this experiment it could be seen that, even though participants fixated on the low quality rendered objects, it was only the quality of the teapots, i.e. the task aspects, which affected the observers' opinion of the overall quality of the scene whilst they were performing the task.

Having shown that Inattentional Blindness could indeed be used in selective rendering without observers perceiving any difference in quality, a selective rendering framework was developed. It was deemed important a) to create a framework that was transferable across a variety of different rendering techniques, and b) to design a system that combined spatiotemporal contrast sensitivity with pre-determined task maps. This second point was, as discussed in Chapter 6, due to the fact that if an observer does not focus 100% on the task at hand, then some of their attention may be attracted to the most salient aspects of the scene. By combining spatiotemporal contrast sensitivity with pre-determined task maps, the aspects of the scene that were most salient would also be rendered in a higher quality than the rest of the scene. This further decreased the chance that an observer would perceive any difference in rendering quality, beyond what was achieved by implementing the framework with task maps alone. The framework designed in this thesis guided a progressive animation system which also took full advantage of image-based rendering techniques. The framework was demonstrated with a Radiance implementation, resulting in a system that completes its renderings in approximately one seventh of the time needed to render an entire frame to the same level of detail as the task objects. As the system proposed in this thesis controlled only the sampling resolution over each frame, it can only be hypothesized that even greater gains should be achieved if the framework were also used to refine the global illumination calculation, especially when supported by the latest advances in graphics hardware. Nevertheless, this framework shows promise for achieving truly realistic renderings of frames in real time.

7.2 Future Research

7.2.1 Visual attention.

It is already known from visual psychology researchers such as Yarbus [1967], Itti and Koch [2000] and Yantis [1996] that the visual system is highly sensitive to features such as edges, abrupt changes in colour and sudden movements. Much evidence has accumulated in favour of a two-component framework for the control of where in a visual scene attention is deployed [James 1890; Treisman and Gelade 1980; Treisman 1985; Itti and Koch 2000]: a bottom-up, fast, primitive mechanism that biases the observer towards selecting stimuli based on their saliency (most likely encoded in terms of centre-surround mechanisms), and a second, slower, top-down mechanism with variable selection criteria, which directs the 'spotlight of attention' under cognitive, volitional control. Whether visual consciousness is achieved by saliency-based or top-down attentional selection, or by both, remains controversial.
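One illustrative way in which a pre-determined task map might be combined with a bottom-up importance estimate (saliency or contrast sensitivity) into a single per-pixel weighting is sketched below. The weights and the per-pixel maximum rule are hypothetical choices made purely for illustration; they are not the error conspicuity computation used in Chapter 6:

```python
import numpy as np

def combined_importance(task_map, bottom_up_map,
                        task_weight=1.0, bottom_up_weight=0.5):
    """Combine a top-down task map with a bottom-up (saliency or contrast
    sensitivity) map into one importance map in [0, 1]. Both inputs are 2-D
    arrays in [0, 1]; the weights and the per-pixel maximum are hypothetical
    illustration choices, not values taken from the thesis."""
    importance = np.maximum(task_weight * task_map,
                            bottom_up_weight * bottom_up_map)
    return np.clip(importance, 0.0, 1.0)

def rays_per_pixel(importance, min_rays=1, max_rays=16):
    """Turn the importance map into a per-pixel high-quality sample budget."""
    return np.rint(min_rays + importance * (max_rays - min_rays)).astype(int)
```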
Therefore, a further understanding of the complex interaction between the bottom-up and top-down visual attention processes of the human visual system would make it possible to combine, in the right proportions, saliency models and predictors of areas of visible difference with the Inattentional Blindness approach described in this thesis. This would then determine more precisely the "order" in which people may attend to objects in a scene. Figure 7.1 shows a hypothesis of how this prioritisation might be approached.

Figure 7.1: Hypothesis on how saliency (S), task (T) and visible difference (V) based methodologies might be combined in terms of their priority of rendering, from P1 (regions identified by all three criteria T, S and V) down through the intermediate combinations of the criteria to P7 (the rest of the scene, which does not fall under any of the criteria T, S or V).

Such knowledge would then also provide the framework with the high-level vision model that remains to be implemented. This would give the relative importance of all of the objects in the scene, i.e. a detailed "priority queue" for the selective rendering, providing the best perceptibly high-quality images within the time constraints of the interactive display system. Such a priority-rendering queue also offers exciting possibilities for efficient task scheduling within any parallel implementation of this methodology.

7.2.2 Peripheral vision.

Foveal information is clear and chromatic, whereas peripheral information is blurry and colour-weak to a degree that depends on the distance from the fovea. Thus more research could be done on decreasing the amount of time spent rendering accurate colour in the periphery. The fovea is also almost entirely devoid of S-cone (blue) photoreceptors, as described in Section 2.1. Thus, for a true representation of the human visual system, it is hypothesized that a selective renderer could get away with not rendering a blue value at the locations where the observer's fixations are believed to fall when studying the scene. Along the same lines, the small area of the visual field corresponding to where the optic nerve leaves the retina could be left unrendered altogether without the observer noticing the difference. However, with both of these conditions the frame would have to be refreshed for every fixation, so a model such as that of McConkie and Loschky [1997; 2000], which uses an eye-tracker to obtain exact fixation positions, would be more appropriate.

7.2.3 Multi-sensory experiences

The task undertaken is crucial in determining the eye-gaze patterns of users studying the images. The introduction of sound and motion within the virtual environments may further increase the level of Inattentional Blindness and the related Change Blindness [Cater et al. 2001]. This is based on research showing that Inattentional Blindness is affected by four factors, one of which is the amount of mental workload being performed [TMHF 2004]. This is because the amount of attention that humans have is roughly fixed: the more attention is focused on one task, the less there is for others. The hypothesis is that if an observer has to perform a visual task as well as an auditory task, or a secondary visual task, then there is a high likelihood that the observer would not only suffer from Inattentional Blindness, but that it would be greater than if the observer were performing just one visual task. This would need to be confirmed with detailed psychophysical studies.

7.2.4 Type of task.
The task undertaken is crucial in determining the eye-gaze patterns of users studying the images. In this thesis several different tasks were covered, from visual to non-visual; however, there are many more types of task a user could be asked to perform. Thus there is a significant area for future work in verifying whether or not each different type of task affects the observer's perception of the selective rendering quality.

7.2.5 Varying Applications

Although this thesis was primarily interested in high-quality image synthesis, the technique proposed may also be applied to other areas of computer graphics, for example: user interface design [Velichkovsky and Hansen 1996] and control [Hansen et al. 1995], geometry level of detail selection, video telephony and video compression [Yang et al. 1996]. Also, from discussions with artists, this type of selective rendering algorithm may be used to alter the viewer's impression of the meaning of an image by altering the quality of different parts of a computer generated artistic scene. This technique was used by artists during the Renaissance and Impressionist periods, who would deliberately render a particular aspect in detail while almost blurring the rest of the scene to highlight it even more (Figure 7.2) [Hockney and Falco 2000]. This is also a well known technique in photography and film.

Figure 7.2: Lorenzo Lotto, 'Husband and Wife' (c.1543). Note the incredible detail of the table cloth, which attracts the viewer's attention more than the rest of the scene [Hockney and Falco 2000].

7.2.6 Alterations for Experimental Procedures

The experiments performed in this thesis were based on experimental procedures used by vision psychologists, and at each stage of the thesis more was learnt about how to design and carry out a psychophysical experiment without introducing bias or testing any criteria other than those under investigation. This learning process would only continue to improve with additional experiments and research. As it took a great deal of time to design, test, re-design, perform and then analyse the data for each experiment, only a limited number of experiments could be carried out. Thus there are many different types of experiment whose results may affect the theories proposed in this thesis, both positively and negatively. For example, this thesis can only hypothesize that Inattentional Blindness can be recreated for all computer graphical scenes; to find out whether this is truly the case would take a lifetime of further experiments!

7.3 A Final Summary

The study of the limits of human perception, in order to exploit them for improving computer graphics rendering, is essential for further development of the computer graphics field. Such research has already resulted in numerous perceptually based algorithms and image based quality metrics. However, this topic is far from being exhausted, and thus will continue to lead the graphics community to exciting avenues of future research. It is always important to remember that satisfying human perception is the end goal of the rendering pipeline, and thus there is no point displaying what humans cannot perceive. This thesis demonstrated how flaws in the human visual system, such as Change Blindness and Inattentional Blindness, can be incorporated into a selective rendering framework.
Although this topic is by no means complete, this thesis having merely scratched the surface of a wide and important area of research, it is however a step in the right direction to achieving realistic graphical imagery in real time. 162 References/Bibliography [Akeine-Moller and Haines 2002] Akeine-Moller, T., and Haines, E. 2002. Real-time Rendering. Second Edition. A.K. Peters Ltd. [ANST 2003] American National Standard for Telecommunications [www], http://www.atis.org/tg2k/, (Accessed September 2003) [Ashdown 2004] Ian Ashdown, Radiosity Bibliography [www], (Accessed http://tralvex.com/pub/rover/abs-ian0.htm March 2004) [Balazas 1945] Balazas B. 1945. Theory of Film. New York: Dover. [Baylis and Driver 1993] Baylis, G.C., and Driver J. 1993. “Visual attention and objects: evidence for hierarchical coding of location.” Journal of Experimental Psychology: Human Perception and Performance. 19(3) 451-470. [Baylor et al. 1987] Baylor, D. A., Nunn B. J., and Schnapf J. L. 1987. “Spectral sensitivity of cones of the monkey Macaca Fascicularis.” Journal of Physiology 390:145-160. [Birn 2000] Birn, J. 2000. [digital] Lighting and Rendering. New Riders. [Bolin and Meyer 1998] Bolin M.R. and Meyer G.W. 1998 “A Perceptually Based Adaptive Sampling Algorithm”, In proceedings of ACM SIGGRAPH 1998, 299-309. [Bouknight 1970] Bouknight J. 1970. “A procedure for generation of threedimensional half-toned computer graphics presentations.” Communications of the ACM, Vol. 13 (9) 527—536. [Bouville et al. 1991] Bouville C., Tellier P. and Bouatouch K. 1991. “Low sampling densities using a psychovisual approach” In proceedings of EUROGRAPHICS ‘91, 167 -182. [Broadbent 1958] Broadbent, D.E. 1958. Perception and Communication. Oxford: Pergamon Press. [Buswell 1935] Buswell, G.T. 1935. How people look at pictures. Univ. of Chicago Press, Chicago. [Cater et al. 2001] Cater K., Chalmers A.G. and Dalton C. 2001 “Change blindness with varying rendering fidelity: looking but not seeing”, Sketch ACM SIGGRAPH 2001, Conference Abstracts and Applications. 163 [Cater et al. 2002] Cater K., Chalmers A.G., and Ledda P. 2002 “Selective Quality Rendering by Exploiting Human Inattentional Blindness: Looking but not Seeing”, In Proceedings of Symposium on Virtual Reality Software and Technology 2002, ACM, 17-24. [Cater et al. 2003a] Cater K., and Chalmers A.G. 2003 “Maintaining Perceived Quality For Interactive Tasks”, In IS&T/SPIE Conference on Human Vision and Electronic Imaging VIII, SPIE Proceedings Vol. 5007-21. [Cater et al. 2003b] Cater K., Chalmers A.G. and Dalton C. 2003 “Varying Rendering Fidelity by Exploiting Human Change Blindness” In proceedings of GRAPHITE 2003, ACM, 3946. [Cater et al. 2003c] Cater K., Chalmers A.G. and Ward G. 2003. “Detail to Attention: Exploiting Visual Tasks for selective Rendering” in the proceedings of the Eurographics Symposium on Rendering 2003, ACM, 270-280. [CCBFTRI 2004] A wellness Center of the Chaitanya Bach Flower Therapy Research Institute, Unwanted Thoughts [www], http://www.charminghealth.com/applicability/unwantedtho ughts.htm (Accessed January 2004) [Challis et al. 1996] Challis, B.H. Velichkovsky, B.M. and Craik, F.I.M. 1996. “Levels-of-processing effects on a variety of memory tasks: New findings and theoretical implications.” Consciousness and Cognition, 5 (1). [Chalmers et al. 2000] Chalmers A.G., McNamara A., Troscianko T., Daly S. and Myszkowski K. 2000. “Image Quality Metrics”, ACM SIGGRAPH 2000, course notes. [Chalmers and Cater 2002] Chalmers A.G, and Cater K. 
2002 “Realistic Rendering in Real-Time.” In proceedings of the 8th International EuroPar Conference on Parallel Processing, Springer-Verlag 21-28. [Chalmers et al. 2003] Chalmers A., Cater K. and Maffioli D. 2003. “Visual Attention Models for Producing High Fidelity Graphics Efficiently” in proceedings of the Spring Conference on Computer Graphics, ACM, 47-54 [Chalmers and Cater 2004] Chalmers A.G, and Cater K. 2004 “Exploiting human visual attentional models in visualisation.” In C. Hansen and C. Johnson (Eds.) Visualisation Handbook, Academic Press, to appear. 164 [Cherry 1953] Cherry, E.C. 1953. “Some experiments on the recognition of speech, with one and with two ears.” Journal of the Acoustical Society of America, 25(5):975-979. [CIT 2003] California Institute of Technology, Seeing the world through a retina [www], http://www.klab.caltech.edu/~itti/retina/index.html (Accessed September 2003) [Cohen and Greenberg 1985] Cohen M.F., and Greenberg D.P. 1985. “The Hemi-Cube: A radiosity solution for complex environments.” In B. A. Barsky, editor, Computer Graphics (SIGGRAPH ' 85 Proceedings), volume 19, pages 31-40. [Cohen et al. 1988] Cohen M.F., Chen S.E., Wallace J.R., and Greenberg D.P. 1988. “A progressive refinement approach to fast radiosity image generation.” In John Dill, editor, Computer Graphics (SIGGRAPH 1988 Proceedings), volume 22, pages 75—84. [Coolican 1999] Coolican, H. 1999. Research Methods and Statistics in Psychology, Hodder & Stoughton Educational, U.K. [CS 2003] Contrast Sensitivity, Channel Model [www], http://www.contrast-sensitivity.com/channel_model/ (Accessed January 2003) [CS 2004] Contrast Sensitivity, Visual System [www], http://www.contrast-sensitivity.com/visual_system/ (Accessed December 2004) [Daly 1993] Daly S. 1993. “The Visible Differences Predictor: an algorithm for the assessment of image fidelity.” In A.B. Watson, editor, Digital Image and Human Vision, Cambridge, MA: MIT Press, 179-206. [Daly 1998] Daly S. 1998. “Engineering observations from spatiovelocity and spatiotemporal visual models.” In IS&T/SPIE Conference on Human Vision and Electronic Imaging III, SPIE Proceedings Vol. 3299, 180-193. [Daly 2001] Daly S. 2001. “Engineering observations from spatiovelocity and spatiotemporal visual models”. Chapter 9 in Vision Models and Applications to Image and Video Processing, ed. C. J. van den Branden Lambrecht, Kluwer Academic Publishers. [Dingliana 2004] Dingliana, J., Image Synthesis Group, Trinity College, Dublin (Presentation Slides) [www], http://isg.cs.tcd.ie/dingliaj/3d4ii/Light1.ppt (Accessed March 2004) 165 [Dowling 1987] Dowling, J. E. 1987. The Retina: An Approachable Part of the Brain. Cambridge, MA: Harvard University Press. [Dumont et al. 2003] Dumont R., Pellacini F., and Ferwerda J. 2003. “Perceptually-Driven Decision Theory for Interactive Realistic Rendering.” ACM Transactions on Graphics, Vol. 22, 152- 181. [Duncan and Nimmo-Smith 1996] Duncan J., and Nimmo-Smith, I. 1996. “Objects and attributes in divided attention: surface and boundary systems.” Perception and Psychophysics. 58(7) 1076-1084 [D’Zmura 1991] D’Zmura, M. 1991. “Color in visual search.” Vision Research, 31, 951-966. [Ferwerda et al. 1996] Ferwerda J.A., Pattanaik S.N., Shirley P.S., and Greenberg D.P. 1996. “A Model of Visual Adaptation for Realistic Image Synthesis.” In proceedings of ACM SIGGRAPH 1996, ACM Press / ACM SIGGRAPH, New York. H. Rushmeier, Ed., Computer Graphics Proceedings, Annual Conference Series, ACM, 249-258. 
[Ferwerda 1997] Ferwerda J.A., Pattanaik S.N., Shirley P.S., and Greenberg, D.P. 1997. “A Model of Visual Masking for Computer Graphics.” In proceedings of ACM SIGGRAPH 1997, ACM Press / ACM SIGGRAPH, New York. T. Whitted, Ed., Computer Graphics Proceedings, Annual Conference Series, ACM, 143-152. [Ferwerda 2003] Ferwerda, J.A. 2003. “Three varieties of realism in computer graphics.” In IS&T/SPIE Conference on Human Vision and Electronic Imaging VIII, SPIE Proceedings Vol. 5007, 290-297. [Flavios 2004] Flavios, Light and optic theory and principles [www], http://homepages.tig.com.au/~flavios/diffrac.htm (Accessed March 2004) [Gibson and Hubbold 1997] Gibson, S., and Hubbold R.J. 1997. “Perceptually Driven Radiosity.” Computer Graphics Forum, 16 (2): 129-140. [Gilchrist et al. 1997] Gilchrist, A., Humphreys, G., Riddock, M., and Neumann, H. 1997. “Luminance and edge information in grouping: A study using visual search.” Journal of Experimental Psychology: Human Perception and Performance, 23, 464-480. [Glassner 1984] Glassner, A.S. 1984. “Space Subdivision for Fast Ray Tracing”, IEEE Computer Graphics & Applications, Vol. 4, No. 10, pp 15-22. 166 [Glassner 1989] Glassner, A.S. 1989. An Introduction to Ray tracing. Morgan Kaufmann. [Goldberg et al. 1991] Goldberg, M.E., Eggers, H.M., and Gluras, P. 1991. Ch. 43. The Ocular Motor System. In E.R. [Goral et al. 1984] Goral C.M., Torrance K.E., Greenberg B. 1984. “Modeling the interaction diffuse surfaces.” In proceedings of Conference on Computer Graphics Techniques. Vol. 18 (3) 212-222. [Gouraud 1971] Gouraud H., “Continuous shading of curved surfaces”, IEEE Transactions on Computers, 20(6): 623-628. [Green 1991] Green M. 1991. “Visual Search, visual streams and visual architectures.” Perception and Psychophysics 50, 388 – 403 [Green 1992] Green M. 1992. “Visual Search: detection, identification and localisation.” Perception, 21, 765 –777. [Greenberg et al. 1997] Greenberg D.P., Torrance K.E., Shirley P., Arvo J., Ferwerda J., Pattanaik S.N., Lafortune E., Walter B., Foo S-C. and Trumbore B. 1997 “A Framework for Realistic Image Synthesis”, In proceedings of SIGGRAPH 1997 (Special Session), ACM Press / ACM SIGGRAPH, New York. T. Whitted, Ed., Computer Graphics Proceedings, Annual Conference Series, ACM, 477-494. D.P., and Battaile of light between the 11th Annual and Interactive [Greene and D’Oliveira 1999] Greene J., and D’Oliveira M. 1999. Learning to use statistical tests in psychology. Open University Press. [Haber et al. 2001] Haber, J., Myskowski, K., Yamauchi, H., and Seidel, H-P. 2001 “Perceptually Guided Corrective Splatting”, In Proceedings of EuroGraphics 2001, (Manchester, UK, September 4-7). [Hall and Greenberg 1983] Hall R.A., and Greenberg D.P. 1983. “A Testbed for Realistic Image Synthesis.” IEEE Computer Graphics and Applications, Vol. 3, No. 8. pp. 10-19. [Hansen et al. 1995] Hansen, J.P., Andersen, A.W. and Roed, P. 1995. “EyeGaze control of multimedia Systems.” In Symbiosis of human and artifact. Proceedings of the 6th international conference on human computer interaction, Elsevier Science Publisher. [Heckbert 1989] Heckbert P.S. 1989. Fundamentals of Texture Mapping and Image Warping. Master’s thesis, University of California, Berkeley. 167 [Hockney and Falco 2000] Hockney, D. and Falco, C. 2000. “Optical insights into Renaissance art.” Optics and Photonics News, 11(7), 5259. [Hoffman 1979] Hoffman, J. 1979. “A two-stage model of visual search.” Perception and Psychophysics, 25, 319-327. [Itti et al. 
1998] Itti, L., Koch, C., and Niebur, E. 1998 “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 20, 11, 1254-1259. [Itti and Koch 2000] Itti, L., and Koch, C. 2000 “A saliency-based search mechanism for overt and covert shifts of visual attention”, In Vision research, Vol. 40, no 10-12, 1489-1506. [Itti and Koch 2001] Itti, L., and Koch, C. 2001 “Computational modeling of visual attention”, In Nature Reviews Neuroscience, Vol. 2(3), 194-203. [Itti 2003a] Itti, L. Bottom-up Visual Attention, University of Southern California, http://ilab.usc.edu/bu/theory/ (Accessed March 2003) [Itti 2003b] Itti, L. Visual Attention: Movies, University of Southern California [www], http://ilab.usc.edu/bu/movie/index.html (Accessed March 2003) [James 1890] James W. 1890 Principles of Psychology, New York: Holt. [JMEC 2003] John Morgan Eye Center, Webvision, The organisation of the retina and visual system [www], http://webvision.med.utah.edu/imageswv/Sagschem.jpeg, (Accessed March 2003) [Kajiya 1986] Kajiya, J.T. 1986. “The Rendering Equation.” ACM SIGGRAPH 1986 Conference Proceedings, volume 20,143-150. [Katedros 2004] Illumination: simulation and perception (course), http://www.maf.vu.lt/katedros/cs2/lietuva/courses/spalvos/ illumination5.pdf (Accessed February 2004) [Kelly 1979] Kelly D.H. 1979. “Motion and Vision 2. Stabilized spatiotemporal threshold surface.” Journal of the Optical Society of America, 69 (10): 1340-1349. [Khodlev and Kopylov 1996] Khodulev A., and Kopylov E. 1996. “Physically accurate lighting simulation in computer graphics software.” In GraphiCon ’96 – The sixth international conference on computer graphics and visualisation, volume 2, pages 111119. 168 [Kirk 2003] Kirk, D. 2003. “Graphics Architectures: The Dawn of Cinematic Computing.” In proceedings of GRAPHITE 2003, ACM, 9. [Koch and Ullman 1985] Koch C., and Ullman S. 1985. “Shifts in selective visual attention: towards the underlying neural circuitry.” Human Neurobiology, 219--227. [Kowler 1995] Kowler, E. 1995. “Eye movements.” In: S. Kosslyn and D.N. Osherson (Eds.): Visual Cognition. MIT Press, Camgbridge, MA. [Krivánek et al. 2003] Krivánek J., Zara J., and Bouatouch K. 2003. “Fast Depth of Field Rendering with Surface Splatting.” Computer Graphics International 2003: 196-201. [Land and Furneaux 1997] Land M.F., and Furneaux S. 1997. “The knowledge base of the oculomotor system.” In proceedings of the Royal Society Conference on Knowledge-Based Vision, February. [Langbein 2004] Langbein, F.C., Advanced Rendering - Radiosity, Cardiff University, (course notes) http://cyl.cs.cf.ac.uk/teaching/graphics/G-20-V_2.pdf (Accessed February 2004) [Languénou et al. 1992] Languénou E., Bouatouch K., and Tellier P. 1992. “An adaptive Discretization Method for Radiosity.” Computer Graphics Forum 11(3): 205-216. [Lee et al. 1985] Lee M., Redner R., and Uselton S. 1985. “Statistically Optimized Sampling for Distributed Ray Tracing.” In proceedings of ACM SIGGRAPH Vol. 19, No. 3. [Levin and Simons 1997] Levin, D.T., and Simons, D.J. 1997. “Failure to detect changes to attended objects in motion pictures.” Psychonomic Bulletin and Review, 4 (4) pp. 501-506. [Li et al. 1998] Li B, Meyer G.W., and Klassen V. 1998. “A Comparison of Two Image Quality Models.” In Human Vision and Electronic Imaging III (Proceedings of SPIE), vol 3299, p 98-109, San Jose, California. [Lightwave 2003] Lightwave! 
Finding your blindspot [www], http://www.lightwave.soton.ac.uk/experiments/blindspot/b lindspot.html (Accessed September 2003) [Lischinski 2003] Lischinski, D., Radiosity, (Lecture notes) http://www.cs.huji.ac.il/~danix/advanced/notes3.pdf (Accessed December 2003) 169 [Loschky and McConkie 1999] Loschky, L.C. and McConkie, G.W. 1999 “Gaze Contingent Displays: Maximizing Display Bandwidth Efficiency.” ARL Federated Laboratory Advanced Displays and Interactive Displays Consortium, Advanced Displays and Interactive Displays Third Annual Symposium, College Park, MD. February 2-4, 79-83. [Loschky et al. 2001] Loschky, L.C., McConkie, G.W., Yang, J and Miller, M.E. 2001 “Perceptual Effects of a Gaze-Contingent MultiResolution Display Based on a Model of Visual Sensitivity”. In the ARL Federated Laboratory 5th Annual Symposium - ADID Consortium Proceedings, 53-58, College Park, MD, March 20-22. [Lubin 1995] Lubin, J. 1995. “A Visual Discrimination Model for Imaging System Design and Evaluation.” In Vision Models for Target Detection and Recognition, 245-283, World Scientific, New Jersey. [Lubin 1997] Lubin J. 1997. “A human vision model for objective picture quality measurements.” Conference Publication No 447, IEE International Broadcasting Convention, 498-503. [Luebke et al. 2000] Luebke D., Reddy M., Watson B., Cohen J. and Varshney A. 2001. “Advanced Issues in Level of Detail.” Course #41 at ACM SIGGRAPH 2000. Los Angeles, CA. August 12-17. [Luebke and Hallen 2001] Luebke D. and Hallen B. 2001 “Perceptually driven simplification for interactive rendering”, 12th Eurographics Workshop on Rendering, 221-223. [Machover 2003] Machover Associates Corporation [www], http://www.siggraph.org/s2003/media/factsheets/forecasts. html, (Accessed August 2003) [Maciel and Shirley 1995] Maciel P.W.C. and Shirley P. 1995 “Visual Navigation of Large Environments Using Textured Clusters”, Symposium on Interactive 3D Graphics, 95-102. [Mack and Rock 1998] Mack, A. and Rock, I. 1998. Inattentional Blindness. Massachusetts Institute of Technology Press. [Marmitt and Duchowski 2002] Marmitt G., and Duchowski A.T. 2002. “Modeling Visual Attention in VR: Measuring the Accuracy of Predicted Scanpaths.” Eurographics 2002. Short Presentations, 217-226. [Marvie et al. 2003] Marvie J-E., Perret J., and Bouatouch K. 2003. “Remote Interactive Walkthrough of City Models.” Pacific Conference on Computer Graphics and Applications, 389393. 170 [Maya 2004] Alias Wavefront Maya [www], http://www.alias.com/eng/productsservices/maya/index.shtml (Accessed March 2004) [McConkie and Loschky 1997] McConkie, G.W. and Loschky, L.C. 1997 “Human Performance with a Gaze-Linked Multi-Resolutional Display”. ARL Federated Laboratory Advanced Displays and Interactive Displays Consortium, Advanced Displays and Interactive Displays First Annual Symposium, Adelphi, MD. January 28-29, (Pt. 1)25-34. [McConkie and Loschky 2000] McConkie, G.W. and Loschky, L.C. 2000. “Attending to Objects in a complex Display”. Proceedings of ARL Federated Laboratory Advanced Displays and Interactive Displays Consortium Fourth Annual Symposium, 21-25. [McNamara 2000] McNamara, Ann, Comparing Real and Synthetic Scenes using Human Judgements of Lightness, PhD Thesis, Bristol, October 2000. [McNamara et al. 2001] McNamara A., Chalmers A.G., Troscianko T., and Gilchrist I. 2001 “Comparing Real and Synthetic Scenes using Human Judgements of Lightness”. In B Peroche and H Rushmeier (eds), 12th Eurographics Workshop on Rendering. 
[MD support 2004] MD support, Snellen Chart http://www.mdsupport.org/snellen.html [ME 2004] Molecular Expressions, Physics of light and colour [www],http://micro.magnet.fsu.edu/primer/java/reflection/ specular/ (Accessed February 2004) [MM 2003] Movie Mistakes [www], http://www.movie-mistakes.com/ (Accessed December 2003) [Moray 1959] Moray N. 1959. “Attention in dichotic listening: Affective cues and the influence of instructions.” Quarterly Journal of Experimental Psychology, 11, 56-60. [Most et al. 2000] Most, S.B, Simons, D.J., Scholl, B.J. & Chabris, C.F. 2000. “Sustained Inattentional Blindness: The role location in the Detection of Unexpected Dynamic Events”. PSYCHE, 6(14). [www], [Myszkowski and Kunii 1995] Myszkowski, K., and Kunii T. L. 1995. “Texture Mapping as an Alternative for Meshing During Walkthrough Animation.” In G. Sakas, P. Shirley, and S. Müller, editors, Photorealistic Rendering Techniques, 389–400, Springer–Verlag. 171 [Myszkowski 1998] Myszkowski, K. 1998. “The Visible Differences Predictor: Applications to global illumination problems.” In proceedings of the 1998 Eurographics Workshop on Rendering Techniques, G. Drettakis and N. Max, Eds. 223236. [Myszkowski et al. 1999] Myszkowski K., Rokita P., and Tawara T. 1999. “Perceptually-informed Accelerated Rendering of High Quality Walkthrough Sequences.” In proceedings of the 1999 Eurographics Workshop on Rendering, G.W. Larson and D. Lischinksi, Eds., 5-18. [Myszkowski et al. 2001] Myszkowski K., Tawara T., Akamine H. and Seidel H-P. 2001. “Perception-Guided Global Illumination Solution for Animation Rendering.” In proceedings of SIGGRAPH 2001, ACM Press / ACM SIGGRAPH, New York. E. Fiume, Ed., Computer Graphics Proceedings, Annual Conference Series, ACM, 221-230. [Neisser 1967] Neisser, U. 1967. Cognitive psychology. New York: Appleton-Century Crofts. [Noe et al. 2000] Noe, A., Pessoa, L. & Thompson, E. 2000. “Beyond the Grand Illusion: What Change Blindness really teaches us about vision.” Visual Cognition 7. [Northdurft 1993] Nothdurft, H. 1993. “The role of features in preattentive vision: Comparison of orientation, motion, and color cues.” Vision Research, 33, 1937-1958. [Nusselt 1928] Nusselt, W. 1928. “Grapische Bestimmung des Winkelverhaltnisses bei der Warmestrahlung,” Zeitschrift des Vereines Deutscher Ingenieure 72(20):673. [OL 2003] Optical Illusions – index [www], http://members.lycos.co.uk/brisray/optill/oind.htm (Accessed November 2003) [O’Regan et al. 1999a] O’Regan, J.K., Deubel, H., Clark, J.J. and Rensink, R.A. 1999 “Picture changes during blinks: looking without seeing and seeing without looking.” Visual Cognition. [O’Regan et al. 1999b] O’Regan, J.K., Clark, J.J. and Rensink, R.A. 1999. “Change blindness as a result of mudsplashes”. Nature, 398(6722), 34. [O’Regan and Noe 2000] O’Regan, J.K and Noe, A. 2000. “Experience is not something we feel but something we do: a principled way of explaining sensory phenomenology, with Change Blindness and other empirical consequences.” The Unity of Consciousness: Binding, Integration, and Dissociation. 172 [O’Regan 2001] O’Regan, J.K. “Thoughts on Change Blindness.” L.R. Harris & M. Jenkin (Eds.) Vision and Attention. Springer, 281-302. [O’Regan et al. 2001] O’Regan, Rensink and Clark. Supplementary Information for “Change blindness as a result of mudsplashes” [www], http://nivea.psycho.univparis5.fr/Mudsplash/Nature_Supp_Inf/Nature_Supp_Inf.ht ml (Accessed March 2001) [Osberger 1999] Osberger W. 1999. 
Perceptual Vision Models for Picture Quality Assessment and Compression Applications. PhD thesis, Queensland University of Technology. [Osterberg 1935] Osterberg, G. 1935. “Topography of the layer of rods and cones in the human retina.” Acta Ophthalmologica, 6, 1102. [Palmer 1999] Palmer, S.E. 1999. Vision Science - Photons to Phenomenology. Massachusetts Institute of Technology Press. [Pannasch et al. 2001] Pannasch, S., Dornhoefer, S.M., Unema P.J.A. and Velichkovsky, B.M. 2001. “The omnipresent prolongation of visual fixations: Saccades are inhibited by changes in situation and in subject’s activity.” Vision Research, 41 (25-26). 3345-3351. [Parker 1999] Parker S., Martin W., Sloan P-P., Shirley P., Smits B., and Hansen C. 1999. “Interactive ray tracing.” In Symoposium on Interactive 3D Graphics, ACM, 119-126. [Pashler 1998] Pashler, H. 1988. “Familiarity and visual change detection.” Perception and Psychophysics, 44(4), 369–378. [Pattaniak et al. 1998] Pattanaik S.N., Ferwerda J, Fairchild M.D., and Greenberg D.P. 1998 “A Multiscale Model of Adaptation and Spatial Vision for Realistic Image Display”, Proceedings of ACM SIGGRAPH 1998, 287-298, Orlando, July. [Pattanaik et al. 2000] Pattanaik S.N., Tumblin J.E.,Yee H. and Greenberg D.P. 2000 “Time-Dependent Visual Adaptation for Realistic Real-Time Image Display”, Proceedings of ACM SIGGRAPH 2000, 47-54. [Phillips 1974] Phillips, W.A. 1974. “On the distinction between sensory storage and short-term visual memory.” Perception and Psychophysics , 16, 283–290. [Phong 1975] Phong, B.T. 1975. “Illumination for Computer-Generated Images.” Communications of the ACM, Vol 18 (6) 449— 455. 173 [Pixar 2004] Pixar, The Pixar Process http://www.pixar.com/howwedoit/index.html March 2004) [www], (Accessed [Prikryl and Purgathofer 1999] Prikryl J. and Purgathofer W. 1999. “Overview of Perceptually-Driven Radiosity Methods.” Institute of Computer Graphics, Vienna University of Technology, Technical Report, TR-186-2-99-26. [Privitera and Stark 2000] Privitera, C.M. and Stark, L.W. 2000. “Algorithms for defining visual regions-of-interest: Comparison with eye fixations.” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 22, 9, 970-982. [Radiance 2000] Radiance Home Page [www], http://radsite.lbl.gov/radiance/ (Accessed September 2000) [Ramasubramanian et al. 1999] Ramasubramanian M., Pattanaik S.N., Greenberg D.P. 1999 “A Perceptually Based Physical Error Metric for Realistic Image Synthesis”, Proceedings of ACM SIGGRAPH 1999, 73-82, Los Angeles, 8-13 August. [Rao et al. 1997] Rao R.P.N., Zelinsky G.J., Hayhoe M.M., and Ballard D.H. 1997. “Eye movements in visual cognition: a computational study.” In Technical Report 97.1, National Resource Laboratory for the Study of Brain and Behavior, University of Rochester. [Rensink et al. 1997] Rensink, R. A., O’Regan, J. K., and Clark, J. J. 1997. “To see or not to see: The need for attention to perceive changes in scenes.” Psychological Science, 8, 368-373. [Rensink 1999a] Rensink, R.A. 1999. “The Dynamic Representation of Scenes”. Visual Cognition. [Rensink 1999b] Rensink, R.A. 1999. “Visual Search for Change: A Probe into the Nature of Attentional Processing”. Visual Cognition. [Rensink et al. 1999] Rensink, R.A., O’Regan, J.K., and Clark, J.J. 1999. “On Failure to Detect Changes in Scenes Across Brief Interruptions”. Visual Cognition. [Rensink 2001] Rensink, R.A. 
The need for Attention to see change [www], http://www.psych.ubc.ca/~rensink/flicker/ (Accessed March 2001) [Siegel and Howell 1992] Siegel R., and Howell J.R. 1992. Thermal Radiation Heat Transfer, 3rd Edition. Hemisphere Publishing Corporation, New York, NY. 174 [Simons 1996] Simons, D. J. 1996. “In sight, out of mind: When object representations fail.” Psychological Science, 7(5), 301305. [Simons and Levin 1998] Simons, D. J., and Levin, D. T. 1998. “Failure to detect changes to people in a real-world interaction.” Psychonomic Bulletin and Review, 5(4), 644-649. [Stam 1994] Stam J. 1994. “Stochastic Rendering of Density Fields.” Proceedings of Graphics Interface 1994, 51-58. [Stroebel et al. 1986] Stroebel L., Compton J., Current I., and Zakia R. 1986. Photographic Materials and Processes, Boston: Focal Press. [SVE 2003] Summit View Eyecare, Bird in the cage [www], http://business.gorge.net/eyecare/explore/birdincage.asp (Accessed October 2003) [TMHF 2004] The Mental Health Foundation, Obsessive Compulsive Disorder [www], http://www.mentalhealth.org.uk/html/content/ocd.cfm (Accessed January 2004) [Treisman 1960] Treisman, A. 1960. “Contextual cues in selective listening.” Quarterly Journal of Experimental Psychology, 12, 242-248. [Treisman and Gelade 1980] Treisman A., and Gelade 1980. “A Feature Integration Theory of Attention”. Cognitive Psychology 12, 97-136. [Triesman 1982] Treisman, A. 1982. “Perceptual grouping and attention in visual search for features and for objects.” Journal of Experimental Psychology: Human Perception and Performance, 8(2), 194-214. [Treisman 1985] Treisman A. 1985. “Search asymmetry: A diagnostic for preattentive processing of separable features”. Journal of Experimental Psychology: General, 114 (3), 285-310. [Unema et al 2001] Unema, P.J.A., Dornhoefer, S.M., Steudel, S. and Velichkovsky, B.M. 2001. “An attentive look at driver' s fixation duration.” In A.G.Gale et al. (Eds.), Vision in vehicles VII. Amsterdam/NY: North Holland. [VEHF 2003a] Visual Expert Human Factors, Inattentional Blindness [www], http://www.visualexpert.com/Resources/inattentionalblind ness.html (Accessed November 2003) 175 [VEHF 2003b] Visual Expert Human Factors, Attention and Perception [www], http://www.visualexpert.com/Resources/attentionperceptio n.html (Accessed September 2003) [VEHF 2004] Visual Expert Human Factors, Visual Field [www], http://www.visualexpert.com/Resources/visualfield.html, (Accessed February 2004) [Velichkovsky and Hansen 1996] Velichkovsky, B.M., and Hansen. J.P. 1996. “New technological windows into mind: there is more in eyes and brains for human-computer interaction”, Proceedings of the SIGCHI conference on Human factors in computing systems: common ground, 496-503. [Volevich et al. 2000] Volevich V., Myszkowski K., Khodulev A., and Kopylov E.A. 2000. “Using the Visual Differences Predictor to Improve Performance of Progressive Global Illumination Computation.” ACM Transactions on Graphics, Vol. 19, No. 1, April 2000, 122–161. [Wald et al. 2002] Wald I., Kollig T., Benthin C., Keller A., and Slusallek P. 2002. “Interactive global illumination using fast ray tracing”. In proceedings of 13th Eurographics Workshop on Rendering, Springer-Verlag, 9-19. [Wang et al. 1994] Wang, Q., Cavanagh, P., and Green, M. 1994. “Familiarity and pop-out in visual search.” Perception and Psychophysics, 56, 495-500. [Ward and Heckbert 1992] Ward, G., and Heckbert P. 1992. “Irradiance Gradients.” Third Annual Eurographics Workshop on Rendering, Springer-Verlag. 
[Ward 1994] Ward, G. 1994. “The RADIANCE Lighting Simulation and Rendering System”. In Proceedings of ACM SIGGRAPH 1994, ACM Press / ACM SIGGRAPH, New York. Computer Graphics Proceedings, Annual Conference Series, ACM, 459-472. [Ward Larson and Shakespeare 1998] Ward Larson, G and Shakespeare, R. 1998. “Rendering with RADIANCE: The art and science of lighting simulation”, San Francisco: Morgan Kauffman. [Watson et al. 1997a] Watson, B., Friedman, A. and McGaffey, A. 1997 “An evaluation of Level of Detail Degradation in HeadMounted Display Peripheries”. Prescence, 6, 6, 630-637. 176 [Watson et al. 1997b] Watson, B., Walker, N., Hodges, L.F., and Worden, A. 1997 “Managing Level of Detail through Peripheral Degradation: Effects on Search Performance with a HeadMounted Display”. ACM Transactions on ComputerHuman Interaction 4, 4 (December 1997), 323-346. [Watson et al. 2000] Watson, B., Friedman, A. and McGaffey, A. 2000. “Using naming time to evaluate quality predictors for model simplification.” Proceedings of the SIGCHI conference on Human factors in computing systems, 113-120. [Watson et al. 2001] Watson, B., Friedman, A. and McGaffey, A 2001 “Measuring and Predicting Visual Fidelity”, Proceedings of ACM SIGGRAPH 2001. In Computer Graphics Proceedings, Annual Conference Series, 213 – 220. [Wertheimer 1924/1950] Wertheimer M 1924/1950 Gestalt Theory. In W.D. Ellis (ed.), A sourcebook of Gestalt psychology. New York: The Humanities Press. [Whitted 1980] Whitted T. 1980. “An Improved Illumination Model for Shaded Display,” Communications of the ACM, vol. 23, no. 6, 343-349. [Wickens 2001] Wickens, C.D. 2001. “Attention to safety and the Psychology of Surprise”, 11th International symposium on Aviation Psychology, Columbus, OH: The Ohio State University. [Wooding 2002] Wooding, D.S. 2002 “Eye movements of large populations: II. Deriving regions of interest, coverage, and similarity using fixation maps.” Behaviour Research Methods, Instruments & Computers. 34(4), 518-528. [Yang et al. 1996] Wang J, Wu L. and Waibel A. 1996. “Focus of Attention in Video Conferencing.” CMU CS technical report, CMUCS-96-150. [Yantis 1996] Yantis S. 1996 “Attentional capture in vision”, In A. Kramer, M. Coles and G. Logan (eds), Converging operations in the study of selective visual attention, 45-76, American Psychological Association. [Yarbus 1967] Yarbus, A. L.1967 “Eye movements during perception of complex objects”, in L. A. Riggs, ed., Eye Movements and Vision, Plenum Press, New York, chapter VII, 171-196. [Yee 2000] Yee H. 2000. Spatiotemporal sensitivity and visual attention for efficient rendering of dynamic environments. MSc Thesis, Program of Computer Graphics, Cornell University. 177 [Yee et al. 2001] Yee, H., Pattanaik, S., and Greenberg, D.P. 2001 “Spatiotemporal sensitivity and Visual Attention for efficient rendering of dynamic Environments”, In ACM Transactions on Computer Graphics, Vol. 20, 39-65. 178 APPENDIX A 179 Materials A.1 – Judges Responses Included in this appendix are all the results of the experimental data from experiment 1, which looked into whether or not change blindness can be induced whilst looking at computer generated images. 
Judge 1 Image 1 DOB: 24/04/73 Occupation: Teacher Glasses/Contacts: Contacts Time of Day: 2.40pm Wednesday 11th July Female/Male: Male Stripped Box, Mirror, Yellow Ball, Glass with Straw, Red Cup, White Beaker Image 2 Standing Lamp, Speaker, CD player, Table, Black Tube, CD, Carpet Image 3 Chest of draws, Candle, Bowl, Wine Glass, Mirror, Door Image 4 Image 5 Image 6 Image 7 Mantelpiece, Two candles, Wine Bottle, Wine Glass, Picture with Boat and a church Picture, Two candles, Wine Bottle, Wine Glass, Black Fire Guard, Another Candle, Mantelpiece Bed Frame, Chest of Draws, Wine Bottle, Wine Glass, Lamp, Beaker Glass, Wardrobe, Pencil Holder, Ball Box with Stripes, Beaker, Red Cup, Mirror, Glass with Straw, Yellow Ball Judge 2 DOB: 18/01/77 Occupation: Research Assistant Glasses/Contacts: None Time of Day: 11.20am Wednesday 11th July Female/Male: Female Glass with Straw, Orange and Green Box, Dark Ball, Orange thing on Image 1 the left hand side (maybe some kind of plastic cup), Another Ball, Mirror Table, CD, Amplifier, Two Beakers (one dark one light), Speaker and Image 2 Stand, Lamp Image 3 Image 4 Image 5 Image 6 Image 7 Chest of draws, Candle, Bowl, Glass, Mirror, Door Fireplace, Mantelpiece, Two candles, Picture with Boat, White Cliffs and a House, Wine Glass, Bottle Fireplace, Mantelpiece, Two candles, Bottle & Glass on mantelpiece, Candle in front of the Fireplace, Picture, Skirting Boards, Black bit in front of the Fireplace. Bed Frame, Chest of Draws, Lamp, Bottle, Glass, Beaker, Red Thing, Wardrobe Glass with Straw, Yellow Ball, Red Beaker, White Beaker, Mirror, Green and Orange Box Judge 3 DOB: 29/06/54 Occupation: Personal Assistant Glasses/Contacts: Glasses Time of Day: 2.15pm Monday 16th July Female/Male: Female 180 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Image 7 I would say these things in the scene are stuff that you would take on holiday with you, or are mainly outside objects. From what I can remember there was a ball, a striped cushion to sit on, a stick, and a mirror, however I’m not sure why a mirror would be in an outdoor scene. This is of a corner of a room. There is a record player on a table, a glass vase, a speaker and it’s a light room so maybe there is a window nearby that’s not in the picture or maybe a light stand but I’m not sure. Again the corner of a room, with a side door opening into the room. A mirror reflecting the candle, blue glass and another object on the chest of draws but I can’t remember what it is. There’s a table, possibly a coffee table with a mosaic appearance and burnt orange tiles on the front of it. There’s a picture on the wall of the sea with a yacht and a lighthouse, at first glance though I thought it was a submarine in the sea. There are candles burning, which reflect light behind onto the wall and picture, and something between the two candles but I don’t know what it is. There is a mantelpiece and a fireplace. There is a picture of the sea and a boat above the mantelpiece. There is a single candle on the floor projecting light up under the picture. I can’t remember if I saw a bottle or a glass, but something related to drinking! This is a bedroom scene, there’s an end of an iron bed without a mattress. There is a bedside table with three draws on top of which is a glass and a modern light with a shade. This table backs onto a single cupboard or a wardrobe that is white. There were some other objects on the table but I’m not sure what they were. Saw a similar scene before, the one with the striped cushion. 
This has in it a circular red object, a yellow ball, a stick, a mirror reflecting the scene. I would now say that this scene is of modern furniture in a room not the outside scene I originally said it was before. Judge 4 DOB: 14/04/75 Occupation: PhD Student – Geography Glasses/Contacts: None Time of Day: 5.30pm Friday 20th July Female/Male: Female It’s an office but it looks a kitchen. It’s got two chairs, one that swivels and one that’s rigid with 4 legs. umm. It’s got a mottled floor and then it’s got a set of three draws under the desk and it' s got a picture with a green blob in the middle, which looks like it’s meant to be a tree. It’s Image 1 got two yellow cupboards on the right hand side mounted up on the walls. There’s a light on a stand thing, which bends over beaming onto something and a cup of tea or something, I think it’s a cup of tea. Looks like a wine bar, a really snazzy wine bar. It has two stools and a table in the middle, which is rigid with two legs. The floor is tiled white and red. The walls are very dark; the ceiling is dark as well. Then in the Image 2 centre of the picture there is the mirror behind the table, which is reflecting a vase, a red vase, with a flower in it. And either side of that there are two lights and there are two lights on the ceiling. It’s three spherical objects on a piece of text and the three objects look Image 3 like a tennis ball an orange, but it’s not an orange, and a Christmas ball & 19 thing. It’s all set at an angle so the text gets smaller as it moves away. 181 Image 4 & 20 Image 5 & 11 Image 6 Image 7 Image 8, 14 & 18 Image 9 & 13 It’s a close up of a rectangular box, a rectangular box with red and green stripes. In the background looks like a cylindrical vase, I don’t know what it is but looks like it has a chop stick in it. Err and there’s a round yellow snooker cue ball with a mirror at the back of it. Looks like a really snazzy office or cafe. You look through the door and you are looking at two yellow chairs back to back and a table in front of it with a wine glass on it. To the right is a picture but you can see it behind another yellow chair. Right at the very back in the centre of the wall is a kind of rainbow picture mounted on the wall. On the ceiling there is ummmm a strip light running down the centre with tiny little lights either side of the room. And then there is some strange art works. On the back on the right is a kind of red base with a swirly art piece. On the other left hand wall is some other artwork mounted like a curly stepladder thing and the carpet is grey. This is a further away view of the other image and again it has the rectangular gift-wrapped box, green and red striped. umm. It’s got what looks like a top to a shaving foam red top or a hairspray can, and then it’s got a white empty cylindrical shape hollow cylindrical shape on top of the box. And behind that looks like it’s a plastic glass with a peachy straw in it. To the left of that is a yellow cue ball on top of a women’s vanity mirror, which has been opened up. Rory’s dinner. An empty dinner umm with a long bar with 5 pink stools, kind of rigid stools umm. The walls are vertical stripy kind of lilac and beige colour, brown beige ummm. At the very front there is kind of seating area which surrounds a table with blue chairs. umm Behind the counter of Rory’s cafe is kind of a glass cabinet that’s empty but I would imagine it would have food in it and a row of glasses at the back. 
And to the left of the image is a beige cylindrical object, which is holding up the building structure. Ummm same as the other one but closer ummm the thing that I thought was a bit odd was the office chair in the picture the base of it with the wheels the metal base is the same as the floor and that looks a bit odd. Then there is the office desk same as before and there is the reflection of the light coming down onto a book. Umm and a pink coffee mug and that’s it. Err looks like a scene in a kitchen with a close up of a unit. With in the very front a kind of empty cookie jar and behind that is a coffee grinder and a handle on top, which you twist, but the handle looks a bit odd. I don’t know the handle looks like it is sitting on top of the cookie jar. Anyway umm there are three mugs or two mugs and one glass on the panel behind, on the shelf behind umm and then ... and that’s it oh but it looks quite a pink blotchy image. 182 Image 10 Image 12 & 21 Image 15 Image 16 Image 17 Err again it looks like a funky office with yellow chairs with umm you are drawn to the scene at the back with the main table with the curved legs and it has two tea cups either side so it looks like people have just got up and left. There’s two chairs positioned not facing each other just slightly apart, slightly to the side of one another. Umm in front of that there is a yellow sofa on the right hand side and umm at the back there is writing on the wall looks like something gallery, art gallery? Umm and to the right of that is the window with the light shining through onto the picture. Umm there’s a series of lights on the walls mounted at head height but they don’t look on and then through to the left oh umm sorry still in that room there’s a blue picture mounted on the wall which is kind of blobby and 3D. And through that door is another sort of gold/brass-y looking similar kind of blobby picture 3D picture mounted on the wall and another yellow chair. Umm the image on the left hand side is a dark area which I’m not sure what it is but to the left of that is a free standing lamp on a long white stand which is switched on and umm and then there is a speaker which is free standing on the floor with two legs. And then on the table, a brown table is an amplifier that looks a bit featureless except for the volume umm err and then there’s goodness and another cylindrical object, which looks like a toilet roll, which is black. Looks like a scene from a bedroom it’s in the corner of a bedroom with a creamy coloured table and creamy coloured walls with a picture of a sailing boat and a beach in the background. Umm On the table there are two creamy coloured candles and what looks like a CD cover and an orange object - I think it’s a glass then to the right of that looks like and empty wine bottle which is green. Umm a sort of pine table with a strange feature which I don’t really know what it is, with a brassy base which curves up which has three stages which look like miniature chess boards red and white. Umm then on the table in front of that there is a glass full of red wine. Two blue chairs around the table but the chair furthest away the back of the chair and the legs don’t quite match. I don’t know if it’s just me but umm that' s it apart from a really non-descript background. Err same image as before but from a different angle. 
You can see more of the picture umm apart from that I don’t know what else, there is a chord to the left of the picture but apart from that I don’t know what’s really different apart from a different angle and further away and the shadows are falling to the right hand side of the picture. 183 Judge 5 Image 1 Image 2 Image 3 & 19 Image 4 & 20 Image 5 & 11 Image 6 Image 7 DOB: 23/07/1975 Occupation: PhD Student – Geography Glasses/Contacts: None Time of Day: 6.00pm Friday 20th July Female/Male: Male Err there’s a desk which goes all around the room with a lamp on it and there’s a picture of a tree on the wall and a filling cabinet under the desk and the window is reflecting the image back of the room. There is a blue chair, which has wheels on the bottom and a weird crosshatched floor that’s pink and there are cupboards above the desk, and that’s about it. Oh and there’s shutters slanted shutters on the window. A red and white chess floor with a table at the back, a high table with two high chairs and top of the table there is a red ball vase with a green thing in it which looks like a caterpillar. There are two diamond lights on the back wall and there are two lights on the ceiling and the walls are black and that’s about it. There’s three err round objects on a newspaper one of which is an orange which says Sunkist on it, one of which is a silver Christmas decoration and the other is a sort of white tennis ball. The text says windows 3 or something in the middle one and it says windows 2 in the left hand column and there is a 2.5 in the right hand column that sticks out and the wall is pink behind it. In the foreground there is a red candle to the left, there’s a sort of a gift box that is red and green striped in front. Then behind that there is a yellow ball from a pool table and there’s a makeup mirror that’s black that’s sort of open behind the cue ball. To the right there’s like a glass or something and a pool cue and the wall behind it is a sort of light yellow beige colour. It’s a sort of future waiting room. In the centre is a yellow chair which faces both ways and in front of that there is a chrome table with a glass on it to the left hand side. umm there’s sort of spot lights which hang down from the ceiling to the sides. On the rear wall there is a colourful spectrum sculpture thing. To the sides of both of them are reflective sculpture things on the walls, one sort of silver and one is sort of more golden. To the left on the left hand wall there is a weird mad sculpture, which is blue and sort of sticks out. The carpet is a grey fleck kind of like the carpet in here. This is a kind of repeat of the image before. Instead to the left though the red thing seems to be the top of a canister, to the right is a white cup or something. Behind that what we couldn’t see before is a purple ball. What I thought was a pool cue is now a straw. The makeup mirror, yellow cue ball and gift box, which is red and green, are all still there. It’s in the corner of a room with beige walls. It’s a sort of milk bar thing with five stools in front of it with pink tops and gold bases and there is a gold bar that you can put your feet on. In the room itself there is a white table with blue chairs. The walls are painted a sort of green with pink purple stripes going downwards. On the wall behind above are sort of cabinets where you would keep the food. And it’s called Rory’s and there is two columns of prices on the menu thing behind the counter. 
184 Image 8, 14 & 18 Image 9 & 13 Image 10 Image 12 & 21 Image 15 Image 16 Image 17 This is a closer image of the one with the desk. You can see in more detail the yellow chair with the rollers. There’s a pink mug on the desk and the spot light is highlighting a book that is open underneath it. You can see the tree picture on the wall at the back. It’s a black kitchen top with beige tiles behind it. There’s a white mug, a coffee grinder that says coffee on it, and a cafetierre in the foreground. Above that there is a shelf that’s got another mug and a cup to the left hand side of it. There’s a yellow bench seat in the foreground and then there’s a small coffee table and with a mug on it. Then there’s another yellow chair, a yellow desk behind that and another yellow chair behind that. On the yellow desk there is a white mug and a red blob thing. Behind that on the wall it says art gallery in big letters. There’s a blue painting hanging on the wall next to the desk. There are spot lights above the desk and to the left hand side wall is an opening and through that there is a gold thing on the rear wall behind that and there is sort of two wall lights hung by the opening and by the desk there is a window. It’s a corner of a room to the left is like a lamp stand, then there’s a speaker which is on a stand, then there is a brown table with a sort of stereo on top of it and on top of that is one black cylindrical object and one more lighter coloured cylindrical object and then there is a CD lying next to the unit on the table. umm that’s it. Corner of a room again on the back of the wall there is a big bright painting by umm looks like it' s that American bloke that paints in that style but I can’t remember his name. Then there is a coffee table thing, there is a CD sleeve on top with two white candles and an orange cup with a green wine bottle to the right. It’s a table shot again. There are two blue chairs and the table is fake wood looking. On top of the table is a weird futuristic chess game thing, which has three levels. The frame is gold and the things are red and white - the three sorts of chess bits. Then there is a wine glass with a rose wine in it. This is a repeat of the upper scene onto the coffee table with the painting in the background and the bloke was I think called Hopper. There is the toy story video again on the bottom, the coffee table thing has wheels, black wheels, and it is coloured beige and again two candles on the thing with the yellow mug and the CD case and the green wine bottle on the floor. 185 Judge 6 DOB: 21/10/73 Occupation: Research Assistant - Computer Science Glasses/Contacts: None Time of Day: 2.30pm Female/Male: Male It is an office with two chairs and a curved table that is in a C shape. There is a set of draws at one end and a cabinet on the wall as well. At Image 1 the back of the office there is a metal shiny blind. On the table there is a pink mug and a lamp which is shining on what looks like a book. There only seems to be on wall to this room…weird! The other walls are black so it looks like they aren’t there. On the wall that isn’t black Image 2 there is a mirror and two diamond lights. In front of it there is a table with a vase on it and two stools. The floor is chequered red and white. Image 3 There is an orange, a tennis ball and a shiny Christmas bauble all sat on & 19 a piece of newspaper or a book. There’s a glass with a straw in it, a yellow cue ball, a red beaker, the red one is to the left and the white one is to the right. 
umm There’s a Image 4 mirror which has a black rim, kind of like one of those lady’s makeup & 20 mirrors. Oh and there is a green and orange striped box in the foreground. This looks like an arty modern room, with lots of futuristic paintings Image 5 and sculptures on the walls. In the centre there are two yellow chairs & 11 back to back and in front of them there is a table. It’s the same scene again, the one with the objects in the corner of the room. There’s the glass and straw, the orange and green box. There is a Image 6 dark coloured ball which we couldn’t see before, a red beaker on the left hand side of the image, another ball in the centre and of course the mirror. Rory’s café. There is a high counter with five swivel stools in front of it. Behind the counter there is a white cabinet with lots of glasses Image 7 stacked on top of it. On the wall there is the menu. Oh and to the front of the picture there is a round table with some blue chairs. There is a rounded oak table which curves in an L shape. On the table is Image 8, a pink mug and there is something bright in the corner but I don’t know 14 & 18 what it is. Looks like maybe a bright piece of white paper, but it doesn’t look very realistic. There is a swivel yellow chair. Image 9 A coffee mug, coffee grinder and a coffee glass pot – one of those & 13 plunger things. They are all sat on a marble work surface. This is a reception area for an art gallery. There is a yellow table with yellow chairs to the back of the entrance. A blue piece of art on the wall Image 10 and another yellow chair but this time it is a long bench to the front of the picture. Then there is the entrance way which leads into the art gallery. It’s a corner of what looks like a sitting room. There is a table with a Image 12 CD and amplifier on it. There are two beakers, one dark and one light & 21 coloured. A speaker on its stand in the centre of the image and a lamp to the left of it. This image looks down onto a table top – there are two candles, an orange cup, the ones that tango were giving a way a few years ago, a Image 15 CD case and a green bottle of wine. There is a sailing picture on the wall. 186 There is a weird sculpture sitting on a round wooden table next to a glass of red wine. Two chairs are pulled up to the table. A picture of a sailing scene is on the wall, in front of this there is a table Image 17 with candles, an orange thing and a leaflet. Beneath is a blue box, maybe a video. On the floor next to the table is a green bottle. Image 16 Judge 7 DOB: 30/03/76 Occupation: PhD Student - Computer Science Glasses/Contacts: None Time of Day: 2.00pm Female/Male: Female An office with two chairs, one blue, one yellow. On the table is a book, Image 1 a pink mug and a lamp. A fancy looking bar, but it doesn’t have any side walls! The two chairs Image 2 and table look quite high to sit on. There is a mirror on the back wall which is reflecting the vase and flowers on the table. Image 3 A tennis ball, an orange and a Christmas tree decoration all sat on a & 19 piece of text. There are a lot of objects all crammed into the corner of a room. There Image 4 is a stripy box, a red cup on the left hand side of the image. umm a & 20 mirror at the back, a glass with a straw in it and what looks like a yellow pool table ball. It looks like a futuristic room, too clean for my liking, either that or a weird art gallery with lots of sculptures and paintings. 
There are two Image 5 wavy paintings, one gold and one silver, either side of a stripy rainbow & 11 coloured painting. In the centre of the room there are two chairs back to back and a table. It’s the corner with all those objects again but this time it’s from a different view. You can see all the same objects as in the other one, Image 6 that’s the stripy box, the mirror at the back, the yellow cue ball ummmm the glass with the straw in it, a red cup and a white beaker. A café, but it looks very American, with pink stools at the counter and Image 7 stripy walls. In the foreground there is a table with 4 blue chairs. It’s the office scene that I’ve seen before but a lot closer than before. Image 8, This time you can only see the yellow chair and the table. Again on the 14 & 18 table is the pink mug and the book but this time the lamp is out of view. Image 9 A kitchen scene with a shelf with mugs on it. Also there is a cafetierre & 13 and a coffee grinder. Oh it was an art gallery then, the image before. But this time it is a different room – looks like more the reception area. You can still see Image 10 the gold wavy painting but this time there is a blue one as well. There are several of the yellow chairs and a yellow table too. Above the table is a sign saying art gallery, that’s why I know it’s now an art gallery!! There is a standing lamp, which is on, shining very brightly. Next to it Image 12 is a speaker. Then there is a table with a CD player on it, a black tube & 21 and ummmm a CD, oh and the carpet is a kind of pink brown colour. It’s the living room I saw before but from a different angle, this time we are looking more down onto the trolley. Again on the trolley are two Image 15 candles, an orange cup and a CD case. Next to the trolley is the green wine bottle. 187 There is a round oak table with a glass of red wine and what looks like a weird cake stand or is it a three layered chess board? Anyway Image 16 whatever it is, it has 3 chequered red and white layers. Around the table are two blue chairs. It’s a living room scene with a trolley in the corner. On the trolley is an orange cup, a CD case and two white candles. There is a lower shelf Image 17 and on that is a video. Next to the trolley on the floor is a green wine bottle. Judge 8 DOB: 15/02/54 Occupation: Computer Science Departmental Secretary Glasses/Contacts: Glasses Time of Day: 3.00pm Female/Male: Female This is an office at night time because the shutters on the window are open but it’s dark outside. There are two chairs separated by a Image 1 ‘breakfast bar’ desk, a bright lamp shinning onto a book and a pink mug on the desk. This looks like a new bar with high chairs and a table. There are two Image 2 diamond shaped lights on the wall and a circular mirror which is reflecting what looks to be a vase with a flower in it. Image 3 & 19 There are three types of balls on a piece of newspaper, an orange, a Christmas bauble and a tennis ball. There’s a kind of striped cushion, a circular red object. A yellow ball, Image 4 maybe a beach ball, a stick and a mirror which reflects the scene. I & 20 would say that the scene is of furniture, modern furniture in a room. This looks like a modern art gallery, principally for weird sculptures, the view is looking through a doorway. In the mid-ground there is a Image 5 table with a glass on it suggesting that the gallery has recently hosted a social. Behind this table there are two yellow chairs with theirs backs to & 11 each other. 
There are several paintings and sculptures located in both corners of the room. There is a collection of objects clustered into a corner of a desk or Image 6 counter. There is a yellow ball, a striped box, a stick, a red top of a spray can and a mirror. This looks like a modern lounge, with a sound system consisting of a Image 7 speaker and amplifier and a CD. There is a bright lamp and some containers sitting on the amplifier. Image 8, This is an office scene with an oak table. There is a pink mug on the 14 & 18 table along with a piece of paper and there is a yellow chair. This is a shiny kitchen. There is a cafetierre and a coffee grinder along Image 9 with a white mug. Oh and there are another couple of mugs on the shelf & 13 about the work surface. This looks like the reception area of the art gallery we saw before. There are several chairs and a desk so it looks like this is where the art Image 10 is sold or where meetings are held. There are some art pieces on the wall and a big sign saying art gallery. This is of a corner of a room. There’s a record player on a table, a glass Image 12 vase, a speaker and its light room as there is a light stand giving off a & 21 lot of light. 188 This seems to be a strange arrangement of objects in the corner of a Image 15 living room on what seems to be a table. There is a huge painting of a lighthouse and sailing boat on the wall and a wine bottle on the ground. There is an oak table with a glass of red wine on one side and on the other a strange type of cake stand made of brass. It has three layers to it Image 16 which are chequered in red and white. There are also two blue/grey chairs. This is a corner of a room with a trolley and lots of stuff on it. There is Image 17 an orange cup, two white candles and a CD. On the lower shelf of the trolley is a video and next to that is a green bottle. Judge 9 Image 1 Image 2 Image 3 & 19 Image 4 & 20 Image 5 & 11 Image 6 Image 7 Image 8, 14 & 18 DOB: 12/08/1975 Occupation: Hairdresser Glasses/Contacts: None Time of Day: 7.00pm Female/Male: Male It’s a scene of an office. There is a desk which goes around the room in a sort of curve shape. On it there’s a pink mug and a lamp. There are two chairs, a filing cabinet underneath the desk, a picture of a tree on the wall and two cupboards on the wall above the desk. Looks like a futuristic cafe although it’s very dark and I wouldn’t want to go there. There is a table in the centre with two chairs; well they are kind of high stools really. On the table is a vase with a green swirl thing coming out of it. There are two diamond lights on the back wall and the floor is red and white, I think that’s it. It’s a dark image and doesn’t look very realistic. There are three ball objects on a piece of text. The balls were an orange which said Sunkist, a tennis ball and a Christmas silver decoration. There are lots of objects in this with no kind of theme. There is a red thing to the left, not sure what it is, maybe a plastic spray can top. There is a stripy green and red box in the centre of the image, a mirror at the back with a yellow cue ball in front of it and a long brown thing coming out of what looks like a glass at the back to the right of the mirror. It’s a scene looking through and archway looking at lots of futuristic looking things. There are two paintings at the back, one gold and one silver, a rainbow picture thing in the centre. In front of that there is a yellow chair and a chrome table. Errr the carpet is a blue speckled black & grey colour. 
It’s the scene that we had before, the one with lots of objects and you can now see that they were all in the corner of a beige coloured room. There is the red thing which now looks more like a top of a shaving foam can, a white beaker, the stripy box again. Then behind them is the mirror with the ball on it and to the right is the glass with a straw in it. This looks like a very cheesy American dinner. It has a sign above the bar saying Rory’s and it looks like it would display the prices or the menu. In front of the bar are stools to sit on and in front of that is a table with some chairs around it. To the left is a pillar which is a yellow beige colour and the walls are a pink and green stripy thing. We saw this image at the start, it’s the one of an office but this time it’s a lot closer, more focused on the pink cup on the desk. You can just see the picture of the tree still. Ummmm there is something bright white to the right but I can’t work out what it is. 189 Image 9 & 13 Image 10 Image 12 & 21 Image 15 Image 16 Image 17 This is definitely a kitchen. It has beige tiles and there is a coffee grinder, a cafetierre and a mug on the worktop. Oh and there is a shelf above with two mugs on it. This is of an art gallery but I only got that cause it says art gallery in big letters on the wall at the back. Below that there’s a desk with two yellow chairs and another yellow chair at the front of the image with a smaller coffee table. There is a window to the right which is shining onto the big table and a blue picture thing on the left hand wall. There’s also an archway which leads through to another room where there is a gold thing on the wall and something dark too. This looks like my sitting room. There is a free standing lamp which is shining brightly, a speaker, a table which is brown. On that there is a black thing which I think is a stereo, in front of that there is a CD. There is a white table trolley thing with objects on it. There is an orange cup thing, one of those ones which you can fold flat, a CD sleeve and two candles. Next to the trolley is a green wine bottle. On the wall above all of this is a painting of a boat and a beach with what looks like a lighthouse on it. Another futuristic looking thing - I think it’s meant to be a chess game of some sort. It has three layers with red and white checkers. The base of it which curves up to each layer is a brass colour and it is sitting on a pine table. In front of that is a full glass of red wine. There are two chairs. It’s that trolley table scene but at a lower angle, more looking at it all from the side. Again there’s the picture on the wall, a green wine bottle. And then ummm on the table there are two candles that orange thing and a CD sleeve. 190 Materials A.2 – Instructions and Questionnaire Please fill out this form. For the observers DOB: Occupation: Glasses/Contacts: Time of Day: Female/Male: Person Number: Remember for the results of this experiment to be accurate I need you not to discuss with anyone what you did in this room until all the experiments have been carried out. You will be shown two images, with either what I call a mudsplashed image, which is the image with random sized checkered blobs over the image, or a grey image in between them. Now these two images may be the same image or there may be an alteration of some type. What I want you to do is say stop when you know what it is that is being altered or that there is nothing changing. 
Don’t say stop until you are sure exactly what is being altered between the images, or that there is no change. I don’t want you to say stop when you merely notice that something is changing, for I will need you to describe verbally to me what it is that is actually changing. The images will keep on alternating between the original and the altered image, with either the mudsplashes or the grey image in between, for 60 seconds or until you say stop. Each image is only displayed for approximately 240ms before the next image is flashed up. Once all four images have been displayed the cycle is repeated. If you haven’t said stop in that 60 seconds we will move straight onto the next image set.
Here are some still examples:
[Figure: still frames of "A Mudsplashed Example" and "A Flicker Example", showing the cycle of images displayed for 240ms alternating with mudsplash or grey frames displayed for 290ms]
There are 10 different sets of images altogether that you will be shown. If you do not understand any of this please ask now before we start the experiments.
Image 1   Time:   Description:
Image 2   Time:   Description:
Image 3   Time:   Description:
Image 4   Time:   Description:
Image 5   Time:   Description:
Image 6   Time:   Description:
Image 7   Time:   Description:
Image 8   Time:   Description:
Image 9   Time:   Description:
Image 10   Time:   Description:
Do you have any general comments about the images that you have just seen?
APPENDIX B
Materials B.1 – Instructions for free viewing of the animation in chapter 5
Free For All Instructions
You will be shown two 30 second animations of a series of 4 rooms. During the animations, look around the rooms, see what you can see and generally enjoy the animation. There is a visual countdown before each animation starts. Both animations will last approximately 30 seconds. The second animation will start after the first one has finished. After the animations you will be asked to fill out a questionnaire about what you have seen. This will be carried out on the table outside this room. If you do not understand any of the instructions please ask now!
Materials B.2 – Instructions for observers performing the counting pencils task in chapter 5
Counting Pencils Instructions
You will be shown two 30 second animations of a series of 4 rooms. During the animations I want you to count the number of pencils located in the mug on the table in the centre of each room. There are therefore 4 mugs, each with a different number of pencils in them. As soon as you know the number of pencils say it out loud, then start counting the next set of pencils. Initially you will start far away from each mug but gradually you will get closer, until you fly right over the mug towards the next room. Thus the pencils (and mug) will increase in size, making them easier to count. Warning: there are some red herrings in the mug as well, namely paintbrushes, just to make life that much harder. So make sure you have counted the pencils and not the paintbrushes before you give your final answer for each room. There is a visual countdown, then a black image with a white mug (see below) is displayed for 1 second. This will alert you to the location of the first mug; then the first animation will start. Each animation will last approximately 30 seconds. Once the first animation has been played, the second animation will be started with the visual countdown. Again a black image with a white mug is displayed to alert you to the location of the first mug. After the animations you will be asked to fill out a questionnaire about what you have seen.
This will be carried out on the table outside this room. If you do not understand any of the instructions please ask now!
Materials B.3 – First attempt at the questionnaire for the experiment in chapter 5; it was rewritten (see Materials B.4) because the initial participants were unable to answer the questions.
Questionnaire
For the questions please be honest and DON’T GUESS! Don’t worry if you can’t remember, just circle the Don’t Remember. If you don’t understand a question please ask. Can I remind you, for this experiment to work PLEASE don’t discuss what you did with your friends, as it will affect the results. THANK YOU for doing this experiment and helping me get ever closer to achieving my PhD! Kirsten
Personal Details:
Name:   Age:   Male/Female:
Rating of Computer Graphics Knowledge: 1 (least knowledge) - 5 (most knowledge)
Circle which rating most suits you: 1 2 3 4 5
Example scale:
1 - Admires computer animations, e.g. Toy Story, but hasn’t a clue how they are computed.
2 - Has heard computer graphics concepts talked about but doesn’t understand what they mean.
3 - Attending a computer science course but hasn’t covered graphics in detail yet.
4 - Attended a computer graphics course.
5 - Researcher in computer graphics.
Questions:
1a) Ignoring the alteration of objects on the central table, were all the rooms the same? Yes No Don’t Know
If you circled Yes or Don’t Know please go straight to question 2, else answer 1b)
1b) Which of the rooms did you feel was different and why? Room 1 Room 2 Room 3 Room 4
2) What colour was the mug and what was written on it?
Room 1: Don’t Remember   Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
3a) How many books were there?
Room 1: Don’t Remember   Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
3b) What were the titles of the books?
Room 1: Don’t Remember   Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
3c) What colour was each book?
Room 1: Don’t Remember   Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
4) How many paintings were there in each room and what were they of?
Room 1: Don’t Remember   Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
5) Apart from the objects on the main table and the paintings, what other furnishings were in each room? (List the object, what colour it was and where it was situated in the room.)
E.g. Room 1:   Object: Bookcase   Colour: Brown   Position: Against the left wall
Room 1:   Object:   Colour:   Position:   Don’t Remember
Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
6) How many lights were in each room?
Room 1: Don’t Remember   Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
7) What colour was the carpet?
Room 1: Don’t Remember   Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
8) What colour were the walls?
Room 1: Don’t Remember   Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
9) What colour were the frames around the paintings?
Room 1: Don’t Remember   Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
10) What colour were the light fittings above the pictures?
Room 1: Don’t Remember   Room 2: Don’t Remember   Room 3: Don’t Remember   Room 4: Don’t Remember
11) Apart from the books and mugs with pencils, what objects appeared on the central table (and in which room)?
Answer:   Don’t Remember Any Other Objects
12) What out of all the objects in the rooms can you remember best – i.e. which sticks in your mind?
Answer:   Nothing
13) Did anything strike you as being odd or ‘non-realistic’? Is there anything in the animation that distracted your perception of the scene?
Materials B.4 – Final questionnaire for the experiment in chapter 5, for observers who watched the animations, performed the visual task (counting pencils), or performed the non-visual task (counting backwards from 1000 in steps of two).
Questionnaire
For the questions please be honest and DON’T GUESS! Don’t worry if you can’t remember, just circle the Don’t Remember. If you don’t understand a question please ask. Can I remind you, for this experiment to work PLEASE don’t discuss what you did with your friends, as it will affect the results. THANK YOU for doing this experiment and helping me get ever closer to achieving my PhD! Kirsten
Personal Details:
Name:   Age:   Male/Female:   Course:
Rating of Computer Graphics Knowledge: 1 (least knowledge) - 5 (most knowledge)
Circle which rating most suits you: 1 2 3 4 5
Example scale:
1 - Admires computer animations, e.g. Toy Story, but hasn’t a clue how they are computed.
2 - Has heard computer graphics concepts talked about but doesn’t understand what they mean.
3 - Attending a computer science course but hasn’t covered graphics in detail yet.
4 - Attended a computer graphics course.
5 - Researcher in computer graphics.
Questions:
1a) Ignoring the alteration of objects on the central table, were both animations the same? Yes No Don’t Know
If you circled Yes or Don’t Know please go straight to question 2, else answer 1b)
1b) What did you feel was different between the two, and in which animation?
Changing location of objects: Yes No Don’t Know
If yes please give a reason, see example: E.g. the bookcase was on the right wall in the second animation whereas it was on a different wall in the first animation.
Changing colour of objects: Yes No Don’t Know
If yes please give a reason, see example: E.g. the bookcase was green in the second animation whereas it was blue in the first animation.
Alteration of quality (rendering): Yes No Don’t Know
If yes please give a reason, see example: E.g. there was a badly rendered shadow from the bookcase in the first animation compared to the second animation.
Speed: Yes No Don’t Know
If yes please give a reason, see example: E.g. the first animation was faster compared to the second animation.
2) What colour was the mug and what was written on it?
Animation 1: Don’t Remember   Animation 2: Don’t Remember
3a) How many books were there?
Animation 1: Don’t Remember   Animation 2: Don’t Remember
3b) What were the titles of the books?
Animation 1: Don’t Remember   Animation 2: Don’t Remember
3c) What colour was each book?
Animation 1: Don’t Remember   Animation 2: Don’t Remember
4) How many paintings were there in each room and what were they of?
Animation 1: Don’t Remember   Animation 2: Don’t Remember
5) Apart from the objects on the main table and the paintings, what other furnishings were there in each animation? (List the object, what colour it was and where it was situated in the room.)
E.g. Animation 1:   Object: Bookcase   Colour: Brown   Position: Against the left wall
Animation 1: Don’t Remember   Animation 2: Don’t Remember
6) How many lights were in each room, in the animations?
Animation 1: Don’t Remember   Animation 2: Don’t Remember
7) What colour was the carpet?
Animation 1: Don’t Remember   Animation 2: Don’t Remember
8) What colour were the walls?
Animation 1: Don’t Remember   Animation 2: Don’t Remember
9) What colour were the frames around the paintings?
Animation 1: Don’t Remember   Animation 2: Don’t Remember
10) What colour were the light fittings above the pictures?
Animation 1: Don’t Remember   Animation 2: Don’t Remember
11) Apart from the books and mugs with pencils, what objects appeared on the central table?
Animation 1: Don’t Remember Any Other Objects   Animation 2: Don’t Remember Any Other Objects
12) What out of all the objects in the rooms can you remember best – i.e. which sticks in your mind?
Answer:   Nothing
13) Did anything strike you as being odd or ‘non-realistic’? Is there anything in the animation that distracted your perception of the scene?
Materials B.5 – Instructions for the non-visual task for the animation in chapter 5
Non-Visual Task Instructions
You will be shown two 30 second animations of a series of 4 rooms. During each animation you will be asked to count backwards from 1000 in steps of 2, continuously, from the end of the countdown until each animation has finished. For example: 1000, 998, 996, 994, 992, 990, 988, 986, 984, 982, 980 etc… If this task is easier counting in your own native language then please do so. When counting backwards please keep looking at the monitor and the animation being played. There is a visual countdown before each of the animations starts. Each animation will last approximately 30 seconds. Once the first animation has completed, the next animation will be started, again with a visual countdown. Once the countdown has finished, start counting backwards again from 1000. After the animations you will be asked to fill out a questionnaire about what you have seen. This will be carried out on the table outside this room. If you do not understand any of the instructions please ask now!
APPENDIX C
Materials C.1 – Questionnaire for the Pilot Study to deduce the visible threshold value of resolution
QUESTIONNAIRE FOR THRESHOLD VALUES:
Dear Participant,
RESEARCH PARTICIPANT CONSENT FORM
Title: ‘Performing Tasks in Computer Graphical Scenes’
Kirsten Cater, Department of Computer Science, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB. Tel: 01179545112. Email: [email protected]
Purpose of Research: To investigate performance of tasks in computer graphical scenes.
It is imperative to mention at this point that, in accordance with the ethical implications and psychological consequences of your participation in our research, there will be a form of deception taking place; however, all participants will be debriefed afterwards about the actual purpose of this empirical investigation. Moreover, all the information that will be obtained about you is confidential and anonymity can be guaranteed. Furthermore, you can withdraw your participation at any time if you so wish. Finally, you have the right to be debriefed about the outcome of this study. You can thus contact the researcher, who will be willing to give you details of the study and the outcome. Having made those various points, please sign the following consent form, which states that you agree to take part in the current study.
I HAVE HAD THE OPPORTUNITY TO READ THIS CONSENT FORM, ASK QUESTIONS ABOUT THE RESEARCH PROJECT AND I AM PREPARED TO PARTICIPATE IN THIS PROJECT.
____________________________ Participant’s Signature
____________________________ Researcher’s Signature
PERSONAL DETAILS:
AGE:   FEMALE/MALE
HAVE YOU DONE ANY EXPERIMENT INVOLVING PSYCHOPHYSICS AND COMPUTER GRAPHICS BEFORE?
(Please circle answer) Yes No
RATING OF COMPUTER GRAPHICS KNOWLEDGE: (Please circle answer)
1 - Admires computer animations, e.g. Toy Story, but hasn’t a clue how they are computed
2 - Has heard computer graphics concepts talked about but doesn’t understand what they mean
3 - Attending a computer science course but hasn’t covered graphics in detail yet
4 - Attended a computer graphics course
5 - Studying for/have got a PhD in Computer Graphics
QUESTIONS:
Can you please indicate whether there were any rendering quality differences between the two images? (Please circle the answer)
Trial 1: YES NO   Trial 2: YES NO   Trial 3: YES NO   Trial 4: YES NO
Trial 5: YES NO   Trial 6: YES NO   Trial 7: YES NO   Trial 8: YES NO
Trial 9: YES NO   Trial 10: YES NO   Trial 11: YES NO   Trial 12: YES NO
Trial 13: YES NO   Trial 14: YES NO   Trial 15: YES NO   Trial 16: YES NO
Trial 17: YES NO   Trial 18: YES NO   Trial 19: YES NO   Trial 20: YES NO
Trial 21: YES NO   Trial 22: YES NO   Trial 23: YES NO   Trial 24: YES NO
Thank you very much for your participation!
Materials C.2 – Instructions for counting the teapots in the peripheral vision vs inattentional blindness experiment
Counting Teapots Instructions
You will be shown two images of a room. Each of the images has teapots in it, similar to those in the image on this page. The teapots are a variety of colours and are located all over the image. Your task is to count as quickly as you can the number of teapots located in the image. As soon as you know how many teapots there are in the image, say it out loud. You must count the teapots as quickly as you can, for the images are only displayed for 2 seconds! Warning: there are some red herrings in the images as well, namely similar objects and colours, just to make life that much harder. So make sure you have counted the teapots! You will be given time for your eyes to adjust to the lighting conditions; then, when you are ready, the experiment will be started. Firstly you will be shown a black image, after which the first image will appear; you’ll count the number of teapots and say out loud how many you counted. Then another black image will appear, followed by the second image. Again you’ll count the number of teapots and state out loud how many you have counted. Finally the black image is shown once more. After the images you will be asked to fill out a questionnaire about what you have seen. This will be carried out on the table outside this room. If you do not understand any of the instructions please ask now!
Materials C.3 – Questionnaire for the peripheral vision vs inattentional blindness experiment
QUESTIONNAIRE
Dear Participant,
RESEARCH PARTICIPANT CONSENT FORM
Title: ‘Performing Tasks in Computer Graphical Scenes’
Kirsten Cater, Department of Computer Science, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB. Tel: 01179545112. Email: [email protected]
Purpose of Research: To investigate performance of tasks in computer graphical scenes.
It is imperative to mention at this point that, in accordance with the ethical implications and psychological consequences of your participation in our research, there will be a form of deception taking place; however, all participants will be debriefed afterwards about the actual purpose of this empirical investigation. Moreover, all the information that will be obtained about you is confidential and anonymity can be guaranteed. Furthermore, you can withdraw your participation at any time if you so wish.
Finally, you have the right to be debriefed about the outcome of this study. You can thus contact the researcher, who will be willing to give you details of the study and the outcome. Having made those various points, please sign the following consent form, which states that you agree to take part in the current study.
I HAVE HAD THE OPPORTUNITY TO READ THIS CONSENT FORM, ASK QUESTIONS ABOUT THE RESEARCH PROJECT AND I AM PREPARED TO PARTICIPATE IN THIS PROJECT.
____________________________ Participant’s Signature
____________________________ Researcher’s Signature
PERSONAL DETAILS:
AGE:   FEMALE/MALE
HAVE YOU DONE ANY EXPERIMENT INVOLVING PSYCHOPHYSICS AND COMPUTER GRAPHICS BEFORE? (Please circle answer) Yes No
RATING OF COMPUTER GRAPHICS KNOWLEDGE: (Please circle answer)
1 - Admires computer animations, e.g. Toy Story, but hasn’t a clue how they are computed
2 - Has heard computer graphics concepts talked about but doesn’t understand what they mean
3 - Attending a computer science course but hasn’t covered graphics in detail yet
4 - Attended a computer graphics course
5 - Studying for/have got a PhD in Computer Graphics or a similar related field
Part 1:
1. Can you please indicate the realism of the images? (Please answer this question by indicating your agreement or disagreement on the following 5-point scale, ranging from 1 (not realistic at all) to 5 (very realistic).)
Image 1:   1 (Not realistic at all)   2   3   4   5 (Very realistic)
Image 2:   1 (Not realistic at all)   2   3   4   5 (Very realistic)
2. Can you please indicate whether there were any observed differences between the images? (If yes, please use the space provided to justify your response.) YES NO
……………………………………………………………………………………………
3. Can you please rate the rendering quality of the two images? (Please answer this question by rating your agreement or disagreement on the following 5-point scale, ranging from 1 (High Quality) to 5 (Low Quality).)
Image 1:   1 (HQ)   2   3   4   5 (LQ)
Image 2:   1 (HQ)   2   3   4   5 (LQ)
Part 2:
4. Can you please indicate which of the following items you remember seeing in the scene? (Please place a tick in the appropriate box(es).)
Vase   Computer   Pens   Phone   Chair   Toy Car   Books   Video   Teapot   Bottle   Pictures   Clock   Crayons   Mr Potato Head from Toy Story   Palette   Teacup   Ashtray   Photo frame
5. Did you notice any difference in the quality of the colour of the teapots in the two images? (If yes, please justify your answer in the space provided below.) YES NO
……………………………………………………………………………………………
6. Did you notice any difference in the quality of the colour of the pictures in the two images? (If yes, please justify your answer in the space provided below.) YES NO
……………………………………………………………………………………………
7a. Out of the two images shown to you by the experimenter, which was the first image? (Please circle the answer; if you are not sure, circle Don’t Know.)
Image 1 or Image 2   Don’t Know
7b. Out of the two images shown to you by the experimenter, which was the second image? (Please circle the answer; if you are not sure, circle Don’t Know.)
Image 3 or Image 4   Don’t Know
Thank you very much for your participation!