Enhancing and Optimizing the
Render Cache
Bruce Walter
Cornell Program of Computer Graphics
George Drettakis
REVES/INRIA Sophia-Antipolis
Donald P. Greenberg
Cornell Program of Computer Graphics
Background
Render Cache
• “Interactive Rendering using the Render Cache”, Rendering Workshop 1999
• Goal
- Interactive Rendering
- Exploit frame-to-frame coherence
- Decouple renderer from display framerate
- Reuse “expensive” rendering results
Background
Goal: Interactive rendering
Ray tracing
Path tracing
Background
Modified Visual Feedback Loop
[Diagram labels: renderer, image, display, application, asynchronous interface, user]
Background
Reproject rendered points
Original view
New view
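A minimal sketch of the reprojection step, assuming a simple pinhole camera described by an eye position, an orthonormal basis (`right`, `up`, `forward`), and a focal length in pixels. The function name and all parameters are hypothetical and are not taken from the Render Cache itself.

```python
import math

def reproject(points, eye, right, up, forward, focal, width, height):
    """Project cached world-space points into a new view.

    Returns (px, py, depth, index) for every point that lands inside the image.
    """
    hits = []
    for i, p in enumerate(points):
        # Vector from the new eye position to the cached point.
        d = (p[0] - eye[0], p[1] - eye[1], p[2] - eye[2])
        z = d[0] * forward[0] + d[1] * forward[1] + d[2] * forward[2]
        if z <= 0.0:
            continue  # point is behind the new camera
        x = d[0] * right[0] + d[1] * right[1] + d[2] * right[2]
        y = d[0] * up[0] + d[1] * up[1] + d[2] * up[2]
        # Perspective divide, then shift to pixel coordinates (y grows downward).
        px = math.floor(width / 2 + focal * x / z)
        py = math.floor(height / 2 - focal * y / z)
        if 0 <= px < width and 0 <= py < height:
            hits.append((px, py, z, i))
    return hits
```

Calling `reproject` again with a different `eye` or basis moves the same cached points to new pixels without invoking the renderer, which is the coherence the slide refers to.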
Background
Display process
[Diagram: renderer and image connected by the display process stages: Update Points, Project/Z-Buffer, Depth Cull, Interpolate, Sampling]
Background
Results after each stage
Projection
Depth cull
Interpolation
Background
Sampling
Displayed image
Priority image
Requested pixels
Related Work
Faster ray engines
• Optimize and parallelize
- E.g., Wald et al.
Hardware-based display
• Mesh-based
- E.g., Tapestry, Holodeck, Tole et al.
• Texture-based
- E.g., Corrective textures
Motivation
Render Cache works well
• Can enable interactive use of higher quality ray-based renderers.
… but needs improvement
• Images too small (256x256)
• Gaps often visible during camera motion
• Not fast enough in tracking shading changes
Enhancements
Tiled Z-Buffer
• Better scalability and memory coherence
Larger Interpolation Prefilter
• Can fill larger gaps between points
Predictive Sampling
• Improved quality during camera motion
Point Eviction
• Faster update of shading changes
Enhancements
Code Optimization
• Use of SIMD (MMX/SSE/SSE2)
• Data layout, branch conversions, etc.
Publicly Available
• For evaluation, comparison, or use
- Non-commercial binary release
- URL is in the paper
Memory Coherence
Change from R10K to Pentium 4
• Cache reduced from 4MB to 256KB
• Clock increased from 195MHz to 1.7GHz
- Cache misses much more expensive
Change from 256x256 to 512x512
• Point data ~ 5MB, Image data ~ 3MB
- Much bigger than cache
Projection and Z-Buffer problematic
Projection and Z-Buffer
Random order memory access
- Read/modify/write operation is memory latency limited
[Diagram: Point Cloud (5MB), Image (3MB)]
Tiled Projection and Z-Buffer
Divide image into tiles
- Tiles sized to fit in cache
Project and bucket sort by tile
Z-Buffer each tile separately
Uses more memory and instructions
- But it is faster (25ms instead of 42ms)
[Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)]
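The two-pass tiling scheme described above can be sketched as follows. This is only an illustration of the control flow: Python does not expose cache behaviour, and the tile size, the tuple layout, and the function name are assumptions rather than details from the slides.

```python
def tiled_zbuffer(projected, width, height, tile=64):
    """Bucket projected points by tile, then depth-test one tile at a time.

    projected: iterable of (px, py, depth, point_id) with pixel coordinates
    already inside the image. Returns (depth, ids) as flat row-major lists.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    buckets = [[] for _ in range(tiles_x * tiles_y)]

    # Pass 1: bucket sort by tile. Each bucket is only appended to,
    # so writes are sequential within a bucket.
    for px, py, z, pid in projected:
        buckets[(py // tile) * tiles_x + (px // tile)].append((px, py, z, pid))

    depth = [float("inf")] * (width * height)
    ids = [-1] * (width * height)

    # Pass 2: z-buffer each tile separately. All read/modify/write traffic
    # for one bucket stays inside a single tile of the image.
    for bucket in buckets:
        for px, py, z, pid in bucket:
            k = py * width + px
            if z < depth[k]:
                depth[k] = z
                ids[k] = pid
    return depth, ids
```

The extra pass and the bucket storage are the "more memory and instructions" the slide mentions; the payoff is that the second pass touches only one tile-sized region of the image at a time.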
Interpolation Filters
Larger filters
• Fill larger gaps in point data
• Generally more expensive
• Result in more blurring of the image
The previous Render Cache
• Used a 3x3 weighted filter
- Can only fill very small gaps
- Introduces only a small amount of blurring
Prefilter
Add a larger “backup” filter
• Results used only when 3x3 filter fails
• Uses a uniform 7x7 filter
- Can be computed cheaply
• Can fill in much larger gaps
• Does not affect sampling priorities
• Actually executed first then overwritten
- Hence the name “prefilter”
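A rough sketch of the prefilter idea for a single-channel image: the 7x7 uniform filter runs first (cheaply, via separable running sums), and the 3x3 weighted filter then overwrites its result wherever the 3x3 neighbourhood contains data. The 1-2-1 weights and the rule that the 3x3 filter "fails" when it finds no valid pixel are assumptions made for illustration; the slides do not specify either.

```python
def box_sum(grid, radius):
    """Sum over a (2*radius+1)^2 window using two separable 1D passes."""
    def pass_1d(rows):
        out = []
        for row in rows:
            n = len(row)
            prefix = [0.0]
            for v in row:
                prefix.append(prefix[-1] + v)
            out.append([prefix[min(n, i + radius + 1)] - prefix[max(0, i - radius)]
                        for i in range(n)])
        return out
    horiz = pass_1d(grid)
    # Transpose, filter along the other axis, transpose back.
    vert = pass_1d([list(col) for col in zip(*horiz)])
    return [list(row) for row in zip(*vert)]

def fill_gaps(color, valid):
    """color: 2D list of floats (0.0 where empty); valid: 2D list of 0/1.

    Returns the filled image, with None where both filters find no data.
    """
    h, w = len(color), len(color[0])
    masked = [[color[y][x] * valid[y][x] for x in range(w)] for y in range(h)]
    # Prefilter: uniform 7x7 average of the valid pixels, executed first.
    s7, n7 = box_sum(masked, 3), box_sum(valid, 3)
    out = [[s7[y][x] / n7[y][x] if n7[y][x] > 0 else None for x in range(w)]
           for y in range(h)]
    # 3x3 weighted filter: overwrites the prefilter wherever it has data.
    weights = ((1, 2, 1), (2, 4, 2), (1, 2, 1))  # assumed weights
    for y in range(h):
        for x in range(w):
            acc = wsum = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and valid[yy][xx]:
                        wt = weights[dy + 1][dx + 1]
                        acc += wt * color[yy][xx]
                        wsum += wt
            if wsum > 0:
                out[y][x] = acc / wsum
    return out
```

With a single valid pixel, the 3x3 filter covers only its immediate neighbours, while the prefilter extends coverage to three pixels away in each direction.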
Prefilter
3x3 filter only
7x7 prefilter only
Both filters
Predictive Sampling
Sampling is purely reactive
• Helps to guide sparse sampling
• Samples returned in later frame
- Problem when large new regions become visible
Predict large gaps ahead of time
• Project using a predicted camera
• Request samples before they are needed
Predictive Sampling
Projection is expensive
• 47% of original render cache cost
Use simplified projection
• No Z-Buffer
- Only need to find regions with no points
• Reduced resolution
- 1/4 width and height (1/16 # of pixels)
• Store only 1 byte per pixel
- Occupancy image fits easily in cache
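The simplified projection above can be sketched as a low-resolution occupancy image: one byte per cell, no depth test, a quarter of the width and height. The input is assumed to be pixel coordinates already projected with the predicted camera; the function names are hypothetical.

```python
def occupancy_image(pixel_coords, width, height, scale=4):
    """Mark which low-resolution cells receive at least one projected point.

    pixel_coords: (px, py) pairs in full-resolution pixel coordinates.
    Returns (occ, low_w, low_h), where occ holds one byte per cell.
    """
    low_w = (width + scale - 1) // scale
    low_h = (height + scale - 1) // scale
    occ = bytearray(low_w * low_h)  # zero-initialised: every cell starts empty
    for px, py in pixel_coords:
        if 0 <= px < width and 0 <= py < height:
            # No Z-buffer: any point, visible or not, marks the cell occupied.
            occ[(py // scale) * low_w + (px // scale)] = 1
    return occ, low_w, low_h

def empty_cells(occ, low_w, low_h):
    """Cells with no points: candidates for requesting samples ahead of time."""
    return [(i % low_w, i // low_w) for i, v in enumerate(occ) if v == 0]
```

For a 512x512 frame this yields 128 x 128 = 16,384 one-byte cells, comfortably smaller than the 256K cache cited earlier.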
Predictive Sampling
Example during rapid camera rotation
No Prediction
With Prediction
Algorithm Overview
[Diagram: pipeline stages Update Points, Prediction, Project/Sort, Z-Buffer, Depth Cull, Prefilter, Interpolate, and Sampling, connecting the renderer to the displayed image]
Point Eviction
Stale data can be worse than no data
• Points may live a long time at high sparseness ratios (few new samples per frame relative to the pixel count)
- Not enough new samples to overwrite old
• Color change detection already exists
- Enhances sampling in regions of change
- Works by aging nearby points
Evict points beyond an age limit
• Speeds image convergence
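A minimal sketch of age-based eviction, assuming each cached point carries an integer age that grows by one per frame. The `extra_aging` argument is a hypothetical stand-in for the color change detector, which the slides say works by aging nearby points; the class and function names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class CachedPoint:
    position: tuple
    color: tuple
    age: int = 0

def age_and_evict(points, age_limit, extra_aging=None):
    """Advance every point by one frame and drop points past the age limit.

    extra_aging: optional {index: additional_age} map for points near a
    detected color change (hypothetical interface).
    """
    survivors = []
    for i, p in enumerate(points):
        p.age += 1 + (extra_aging.get(i, 0) if extra_aging else 0)
        if p.age <= age_limit:
            survivors.append(p)  # still fresh enough to display
    return survivors
```

Evicted points leave gaps, so the affected pixels are filled by interpolation or fresh samples rather than by stale colors.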
SIMD Optimizations
Utilize MMX/SSE/SSE2 instructions
• Project four points at once
• Process R, G, B channels simultaneously
• Add memory prefetches
- Automatic prefetch works well for linear access
• Convert branches to data dependencies
- Comparisons produce masks of all zeroes or all ones
- Use boolean operations on the masks instead of branches
• Roughly a factor of two total speedup
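The branch-to-data-dependency conversion can be illustrated on a single 32-bit lane with plain integers. This is a scalar sketch of the idea, not the SSE code itself; the helper names are hypothetical, and depths are assumed to be stored as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF

def compare_lt_mask(a, b):
    """Mimic a SIMD compare: all ones if a < b, otherwise all zeroes."""
    return -(a < b) & MASK32

def select(mask, if_set, if_clear):
    """Pick if_set where mask bits are one, if_clear where they are zero."""
    return (if_set & mask) | (if_clear & ~mask & MASK32)

def zbuffer_update(old_depth, old_id, new_depth, new_id):
    """Depth test written with masks and boolean operations, no if statement."""
    m = compare_lt_mask(new_depth, old_depth)
    return select(m, new_depth, old_depth), select(m, new_id, old_id)
```

For example, `zbuffer_update(100, 7, 50, 9)` keeps the nearer new sample and returns `(50, 9)`, while `zbuffer_update(100, 7, 150, 9)` leaves the stored `(100, 7)` unchanged. A SIMD version would apply the same compare and select to several lanes per instruction.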
Results
Single 1.7GHz processor - rotating camera
Ray trace only (1.8 fps)
Render Cache (9 fps)
Results
Timing: 62.1 ms (up to 16 fps)
• 512x512 image, render cache only
• 1.7GHz Pentium 4 processor
[Chart: frame time breakdown by stage: Sampling, Update Points, Filter / Smooth, Prediction, Prefilter, Depth Cull, Project, Z-Buffer]
Scalability with Image Size
[Plot: Frame Size (Pixels) versus Frame Time (ms), with data points labeled 512x512 and 1200x1200]
Results
Try it for yourself
• Download publicly available binary
- Includes Render Cache and simple Ray Tracer
- Requires a Pentium 4 and Java Web Start
- Free for evaluation and internal use
- http://www.graphics.cornell.edu/research/interactive/rendercache
Demo
The End