A Hybrid Computationally Efficient Parallel Algorithm for Best Visual
Quality 3D Real-time Graphics
Research Area: Rewriting Algorithms to help Parallel Programming
Authors: Amrit Asrani
Atheendra P Tarun
Athresh R Shigaval
Faculty Mentor: Mr. Sudhir Shenai
Name of the Institution: Global Academy Of Technology, Bangalore.
Abstract:
In the domain of computer graphics, real-time visualization is achieved by two
prominent techniques: the Z-buffer algorithm and the ray tracing algorithm. Each has its inherent
advantages and limitations. Ray tracing is capable of producing a very high
degree of photorealism, usually higher than that of typical scanline rendering methods, but at a
greater computational cost. The Z-buffer, on the other hand, is computationally fast but does not
give the best visual quality. This paper proposes a new hybrid parallel algorithm which exploits the speed
of the Z-buffer and the visual quality of ray tracing. The algorithm takes the merged kD-trees
proposed in the Razor architecture of the Copernicus system and combines them with the parallel
Z-buffer algorithm, which uses a hypercube topology. Multithreading is introduced in the
construction stage of the merged kD-trees when computing dynamic scenes. The trees serve as input
to the Z-buffer, which is itself parallel and inherently fast. Parallelism is thus introduced at
several stages of the algorithm, and this is the linchpin for high
quality 3D real-time graphics.
The new hybrid algorithm is presumed to be efficient because the Z-buffer [1] and ray tracing [3]
have each been shown experimentally to be efficient on their own
grounds, namely in computation and visual quality respectively.
Background:
The Z-buffer algorithm ensures that occlusion works the same way in the virtual
world as in the real world: nearer surfaces hide farther ones. It is a type of Visible Surface
Determination (VSD) algorithm[9].
Z-buffering works by comparing the depth (z coordinate) of each incoming pixel with the value
stored in a buffer (called the z-buffer) that holds the depth of the nearest surface drawn so far
at that pixel. The pixel which is closer to the viewer is the one that is displayed. This can be
seen when two squares overlap: the square on top is visible, but the covered part of the square
below it is not. The Z-buffer algorithm is used to render such situations correctly.
The generic sequential approach to the Z-buffer algorithm is[1]:
For all the objects in the scene
    Project the object in the image coordinate system
    For every scanline of that object
        For all the pixels in a scanline
            If Z coordinate of pixel < Zbuffer[pixel]
                Write [pixel]
                Zbuffer[pixel] = Z coordinate of pixel
            End If
        End For
    End For
End For
Since the scene contains several objects, the first step in the sequential algorithm is to project
each object onto the image coordinate system. Objects are then scanned row by row. For
each pixel in the scanline, its Z coordinate is compared with the value in the z-buffer, which is
initialized to infinity. If the z coordinate is less than the value in the
buffer, the new pixel overwrites the old pixel and its z coordinate is copied into the
z-buffer. Consequently, the pixels closest to the observer are displayed.
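The sequential loop above can be sketched in Python. This is a minimal illustration, not the implementation from [1]: it assumes every object is an axis-aligned rectangle with a single constant depth, and the names `zbuffer_render`, `squares`, `frame` and `zbuf` are hypothetical.

```python
import math

def zbuffer_render(width, height, rects):
    """Render axis-aligned rectangles with a sequential z-buffer.

    Each rect is (x0, y0, x1, y1, z, color); x1 and y1 are exclusive.
    A smaller z means the surface is closer to the viewer.
    """
    zbuf = [[math.inf] * width for _ in range(height)]   # depth per pixel, starts at infinity
    frame = [[None] * width for _ in range(height)]      # color per pixel, None = background
    for x0, y0, x1, y1, z, color in rects:               # for all objects in the scene
        for y in range(max(y0, 0), min(y1, height)):     # for every scanline of the object
            for x in range(max(x0, 0), min(x1, width)):  # for every pixel in the scanline
                if z < zbuf[y][x]:                       # closer than what is stored?
                    frame[y][x] = color                  # write the pixel
                    zbuf[y][x] = z                       # remember its depth
    return frame, zbuf

# Two overlapping squares: yellow (z = 1) lies in front of blue (z = 2).
squares = [(0, 0, 4, 4, 2.0, "blue"), (2, 2, 6, 6, 1.0, "yellow")]
frame, zbuf = zbuffer_render(6, 6, squares)
```

In the overlap region the yellow square wins regardless of the order in which the two squares are drawn, because the depth test, not the drawing order, decides visibility.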
The Parallel Approach:
The sequential algorithm is improved through a parallel approach.
To illustrate, consider two
squares, one partially overlapping the other, as shown. When the z-buffer algorithm is applied to
color the pixels, the blue square is partially obscured by the yellow square, which is in the
foreground.
fig. 1
The parallel algorithm for this problem is[1]:
ParallelZbuffer()
Begin
    Scatter(Vertices)
    Scatter(Squares)
    >> For all pictures to compute Do
        >> Project vertices from object to screen coordinate system
        MultiBroadcast(Projected Vertices)
        >> LocalLoad = Estimation(Local Squares)
        GlobalLoad = MultiReduce(LocalLoad)
        MultiScatter(Squares)
        >> Sequential Zbuffer
        Output the picture
    EndDo
End
The parallel parts of this algorithm have been marked with >>.
In order to optimize the memory and computation requirements, our scene is represented by a
two-level data structure: a set of vertices and a set of squares. A
vertex is a set of 6 real numbers which define a point in a coordinate system and a normal for
this point. A square is a set of 4 vertices' indices (4 integers).
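As a rough sketch of this two-level structure, the Python below stores vertices once and lets squares refer to them by index. The class and field names (`Vertex`, `Square`, `Scene`, `corner_positions`) are assumptions made for illustration; the source only fixes the counts of 6 reals per vertex and 4 integer indices per square.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Vertex:
    # Six real numbers: a position (x, y, z) and a normal (nx, ny, nz).
    x: float
    y: float
    z: float
    nx: float
    ny: float
    nz: float

@dataclass(frozen=True)
class Square:
    # Four integer indices into the scene's vertex list.
    corners: Tuple[int, int, int, int]

@dataclass
class Scene:
    vertices: List[Vertex]
    squares: List[Square]

    def corner_positions(self, square: Square):
        """Resolve a square's indices to the positions of its four corners."""
        return [(self.vertices[i].x, self.vertices[i].y, self.vertices[i].z)
                for i in square.corners]

# Two squares sharing an edge need 6 vertices rather than 8.
positions = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (2, 0, 0), (2, 1, 0)]
scene = Scene(
    vertices=[Vertex(x, y, z, 0.0, 0.0, 1.0) for x, y, z in positions],
    squares=[Square((0, 1, 2, 3)), Square((1, 4, 5, 2))],
)
```

Sharing vertices through indices is what saves memory and projection work: each vertex is stored and projected once, however many squares use it.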
We describe the different parts of this algorithm:
Scatter(Vertices): All the vertices of the scene are equally distributed on the parallel
computer. The vertices come from a disk or from a previous
computation on the parallel computer.
Scatter(Squares): We equally distribute the squares on the parallel computer.
Note that the squares of a given processor can reference vertices that might not be
present in the local memory of that processor.
Project vertices from object to screen coordinate system: The projections are done in
parallel. For each vertex we perform a matrix-vector multiplication. Furthermore, we use the
normal to shade each vertex and assign it an RGB color using the Gouraud model.
MultiBroadcast(Projected Vertices): After this step, each processor knows all the projected
vertices of the scene, even those it does not use.
LocalLoad = Estimation(Local Squares): Each processor computes in parallel an
estimate of the load due to its own squares. We approximate the load associated with each
row of the picture by the number of squares intersecting that row.
GlobalLoad = MultiReduce(LocalLoad): The global load allows us to compute, for each
processor, which part of the picture it should treat in order to balance the workload.
MultiScatter(Squares): Given the image partition, we can compute the squares required by each
processor.
Sequential Zbuffer: Each processor runs a sequential z-buffer, in parallel with the others, on the
part of the image it owns.
Output the picture: When all the sequential z-buffers have finished, we transfer the image to
an output device.
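The load estimation and partitioning steps can be sketched as follows. This is an illustration under stated assumptions, not the scheme from [1]: each square is reduced to the inclusive range of rows it touches, the image is split into contiguous bands of rows, and the functions `estimate_row_load`, `reduce_loads` and `partition_rows` are hypothetical stand-ins for Estimation, MultiReduce and the partition computation.

```python
def estimate_row_load(row_spans, height):
    """Per-row load: the number of squares whose rows [y_min, y_max] include that row."""
    load = [0] * height
    for y_min, y_max in row_spans:
        for y in range(max(y_min, 0), min(y_max, height - 1) + 1):
            load[y] += 1
    return load

def reduce_loads(local_loads):
    """Element-wise sum of the per-processor loads (stand-in for MultiReduce)."""
    return [sum(column) for column in zip(*local_loads)]

def partition_rows(global_load, n_procs):
    """Split the rows into n_procs contiguous bands of roughly equal load.

    Returns a list of (start, end) row ranges with end exclusive.
    """
    total = sum(global_load)
    bands, start, running = [], 0, 0
    for y, weight in enumerate(global_load):
        running += weight
        # Close band k once the running sum reaches k/n_procs of the total load.
        if len(bands) < n_procs - 1 and running * n_procs >= total * (len(bands) + 1):
            bands.append((start, y + 1))
            start = y + 1
    bands.append((start, len(global_load)))
    return bands

# Two processors, an 8-row picture, squares given as (first_row, last_row).
local_a = estimate_row_load([(0, 3), (0, 1)], 8)
local_b = estimate_row_load([(2, 7)], 8)
global_load = reduce_loads([local_a, local_b])
bands = partition_rows(global_load, 2)
```

Here the upper rows are busier, so the first processor receives a band of three rows and the second a band of five rows, each carrying the same estimated load.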
Ray Tracing:
Ray tracing is a technique for generating an image by tracing the path of light through pixels in
an image plane. Ray tracing gives the best visual quality but is not fast enough to support
real-time computation of graphics. There are several ways to make ray tracing or ray casting
faster. One class of approaches employs data structures that speed up the search for the
closest intersection along a ray. Data structures which support efficient geometric search allow us
to look at only a small percentage of the scene to determine the closest intersection. Octrees,
kD-trees, and nested bounding volumes are examples of explicitly hierarchical search
structures of this type. A kD-tree (k-dimensional tree) is a space-partitioning data structure
for organizing points in a k-dimensional space[10].
fig.2 - kD Tree Structure
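To make the idea of a space-partitioning search structure concrete, here is a small Python sketch of a kD-tree over points with a box query that skips subtrees which cannot overlap the box. It uses a plain median split and is only meant to show why such a tree lets a search look at a small part of the scene; it is not the merged kD-tree of Razor, and `KDNode`, `build_kdtree` and `range_query` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class KDNode:
    point: Point                       # the point stored at this node
    axis: int                          # the coordinate this node splits on
    left: Optional["KDNode"] = None    # points with coordinate <= split value
    right: Optional["KDNode"] = None   # points with coordinate >= split value

def build_kdtree(points, depth=0):
    """Build a kD-tree by splitting at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))

def range_query(node, lo, hi):
    """Return all points p with lo[i] <= p[i] <= hi[i] on every axis i."""
    if node is None:
        return []
    found = []
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        found.append(node.point)
    split = node.point[node.axis]
    if lo[node.axis] <= split:         # the box reaches into the left half-space
        found += range_query(node.left, lo, hi)
    if hi[node.axis] >= split:         # the box reaches into the right half-space
        found += range_query(node.right, lo, hi)
    return found

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
inside = sorted(range_query(tree, (3, 1), (8, 5)))
```

A ray tracer applies the same pruning idea to rays instead of boxes: a subtree is visited only if the ray can pass through its region.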
buildkd()[4]:
1) Create a root node for the kD-tree with the scene bounding box and the scene graph root
node.
2) Set the current node to be the root.
3) Set the current discrete LOD level to be the coarsest supported level.
4) Subdivide the geometry at the current node until it satisfies the current discrete
LOD criteria.
5) Build out the kD-tree from this node until the tree termination criteria are satisfied.
6) Retain the current geometry (these nodes are effectively leaves for the current discrete LOD
level).
7) Set the current discrete LOD level to the next finer level.
8) Go to step 4.
At the beginning of every frame, kD-tree construction is initialized with a single root kD-tree
node containing the bounding box of the entire scene and a single pointer to the root of the
scene graph. All further kD-tree building is triggered by traversal operations during ray tracing.
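The idea that the tree starts as a single root and grows only where traversal goes can be sketched as below. This is a simplified illustration and not the buildkd() procedure of [4]: it ignores levels of detail and the scene graph, works on points with distinct coordinates, and the names `LazyKDNode`, `expand`, `locate` and `count_expanded` are hypothetical.

```python
class LazyKDNode:
    """A kD-tree node that splits itself only when a traversal first visits it."""

    def __init__(self, points, depth=0, leaf_size=2):
        self.points = points          # geometry held until the node is split
        self.depth = depth
        self.leaf_size = leaf_size
        self.axis = None
        self.split = None
        self.left = None
        self.right = None
        self.expanded = False

    def expand(self):
        """Split this node on first visit; small nodes stay leaves."""
        if self.expanded:
            return
        self.expanded = True
        if len(self.points) <= self.leaf_size:
            return                    # few enough points: keep as a leaf
        self.axis = self.depth % len(self.points[0])
        pts = sorted(self.points, key=lambda p: p[self.axis])
        mid = len(pts) // 2
        self.split = pts[mid][self.axis]
        self.left = LazyKDNode(pts[:mid], self.depth + 1, self.leaf_size)
        self.right = LazyKDNode(pts[mid:], self.depth + 1, self.leaf_size)
        self.points = None            # geometry now lives in the children

    def locate(self, query):
        """Descend to the leaf whose region contains query, building nodes on the way."""
        node = self
        while True:
            node.expand()
            if node.left is None:
                return node
            node = node.left if query[node.axis] < node.split else node.right

def count_expanded(node):
    """Count the nodes that traversals have actually built so far."""
    if node is None or not node.expanded:
        return 0
    return 1 + count_expanded(node.left) + count_expanded(node.right)

points = [(1, 5), (2, 1), (3, 7), (4, 3), (5, 8), (6, 2), (7, 6), (8, 4)]
root = LazyKDNode(points)             # at the start of a frame: one root node only
before = count_expanded(root)
leaf = root.locate((6, 2))            # a single traversal builds just one path
after = count_expanded(root)
```

After one traversal only the path from the root to the visited leaf has been built; parts of the scene that no ray reaches are never subdivided.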
The Problem Statement:
The faster Z-buffer algorithm is not well suited for higher-level visibility and occlusion culling. It is
highly resolution dependent and prone to accuracy problems. Ray tracing, on the other hand,
supports dynamic scenes, delivers high image quality, and runs on programmable
multicore architectures[3]. But it is considerably slower, which motivates a new
algorithm that combines the advantages of both methods.
Our Hybrid Methodology:
Our hybrid approach combines the advantages of both the ray tracing algorithm and the
conventional Z-buffer algorithm: the kD-tree built for ray tracing provides the input to the
Z-buffer computation.
HybridZbuffer()
Begin
    >> If (frame received) Do
        BuildkD()
    >> For all picture to compute Do
        >> Project vertices from object to screen coordinate system
        MultiBroadcast(Projected Vertices)
        >> LocalLoad Estimation(Locals squares)
        GlobalLoad = MultiReduce(LocalLoad)
        MultiScatter(squares)
        >> Sequential Zbuffer
        Output the picture
    EndDo
End
Its flow chart is:
In this algorithm we first check whether a frame has been received; if so, the BuildkD function is
called to create the kD-tree. Since creating kD-trees is a time-consuming process, it benefits
from being parallelized. The tree serves as the input to the z-buffer algorithm. In each thread the
normal z-buffer calculations are carried out as discussed earlier. Thus we obtain a
new, improved hybrid algorithm which has the advantages of both the z-buffer algorithm and
the ray tracing algorithm.
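One possible reading of this pipeline is sketched below in Python. It is an interpretation under several assumptions, since the paper leaves the details open: geometry is reduced to axis-aligned rectangles of constant depth, the two top-level subtrees are built in separate threads, the resulting leaf groups are the input to the z-buffer, and each thread then runs a sequential z-buffer on its own band of rows. All function names are hypothetical. In standard CPython the threads illustrate the structure only; they do not speed up this CPU-bound work.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def build_leaves(rects, depth=0, leaf_size=2):
    """Median-split rects (x0, y0, x1, y1, z, color) and return the leaf groups."""
    if len(rects) <= leaf_size:
        return [rects]
    axis = depth % 2                               # alternate between x0 and y0
    rects = sorted(rects, key=lambda r: r[axis])
    mid = len(rects) // 2
    return (build_leaves(rects[:mid], depth + 1, leaf_size)
            + build_leaves(rects[mid:], depth + 1, leaf_size))

def build_leaves_threaded(rects, leaf_size=2):
    """Build the two top-level subtrees in separate threads."""
    if len(rects) <= leaf_size:
        return [rects]
    rects = sorted(rects, key=lambda r: r[0])
    mid = len(rects) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        halves = pool.map(lambda part: build_leaves(part, 1, leaf_size),
                          [rects[:mid], rects[mid:]])
        return [leaf for half in halves for leaf in half]

def zbuffer_band(leaves, width, y_start, y_end):
    """Sequential z-buffer restricted to rows [y_start, y_end)."""
    rows = y_end - y_start
    zbuf = [[math.inf] * width for _ in range(rows)]
    band = [[None] * width for _ in range(rows)]
    for leaf in leaves:
        for x0, y0, x1, y1, z, color in leaf:
            for y in range(max(y0, y_start), min(y1, y_end)):
                for x in range(max(x0, 0), min(x1, width)):
                    if z < zbuf[y - y_start][x]:
                        zbuf[y - y_start][x] = z
                        band[y - y_start][x] = color
    return band

def hybrid_render(rects, width, height, n_workers=2):
    """Build the leaf groups, then z-buffer one band of rows per worker thread."""
    leaves = build_leaves_threaded(rects)
    step = max(1, math.ceil(height / n_workers))
    bounds = [(y, min(y + step, height)) for y in range(0, height, step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        bands = pool.map(lambda b: zbuffer_band(leaves, width, *b), bounds)
        return [row for band in bands for row in band]

rects = [(0, 0, 4, 4, 2.0, "blue"), (2, 2, 6, 6, 1.0, "yellow"),
         (1, 1, 3, 3, 3.0, "red"), (4, 0, 6, 2, 0.5, "green")]
frame = hybrid_render(rects, 6, 6)
```

Because every band applies the same depth test to the same geometry, the stitched result matches what a single sequential z-buffer over the whole picture would produce.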
Key Results:
Consider the generation of the image of a teapot:
fig.3
For the hypercube topology[1] proposed by S. Miguet and J. Li, based on a ring of
processors, the variation of execution time with an increasing number of processors is shown
in the graph below[]. The times are given for two picture sizes: 256 by 256 pixels and 512
by 512 pixels. There is a sharp decrease in execution time
when we switch from a single core to multiple cores, but further increases in the number of cores
yield little additional improvement.
Discussion:
The conventional Z-buffer used for 3D graphics does not provide complex illumination effects such as
soft shadows, reflections and diffuse lighting interactions. The Copernicus system,
which uses ray tracing, has been considered as a substitute because it
supports dynamic scenes, high image quality and execution on programmable multicore
architectures, but it is considerably slower than the Z-buffer. Our algorithm is designed to
combine the advantages of both algorithms and is presumed to be well
suited to computing a set of images when the polygons are already present in local
memory and need only a global declaration to be correctly distributed among the processors.
Conclusion and Future Work:
Parallel implementations of various computer graphics algorithms such as Z-buffer, shadow
mapping and ray tracing achieve good speedup compared to their sequential counterparts.
The proposed algorithm, though untested, is expected to deliver satisfactory results and
to overcome the inadequacies of the z-buffer and ray tracing techniques. Buffering the output
of the kD-tree makes it possible to incorporate reflections, refractions and transparency while
reducing the complexity of the algorithm. This gives scope for achieving previously unattainable
image processing capabilities, but it must first be preceded by extensive testing and analysis of
our algorithm.
References:
[1] Henri-Pierre Charles, Laurent Lefèvre and Serge Miguet. An Optimized and Load-Balanced Portable Parallel ZBuffer, 2007.
[2] Paul S. Heckbert and Michael Herf. Simulating Soft Shadows with Graphics
Hardware. Carnegie Mellon University, Pittsburgh. January 15, 1997.
[3] Venkatraman Govindaraju, Peter Djeu, Karthikeyan Sankaralingam, Mary
Vernon, William R. Mark. Toward a Multicore Architecture for Real-time Ray-tracing. The University of Texas at Austin, 2008.
[4] Gordon Stoll, William R. Mark, Peter Djeu, Rui Wang, Ikrima Elhassan. Razor: An
Architecture for Dynamic Multiresolution Ray Tracing. The University of Texas at
Austin. April 26, 2006.
[5] Kenneth I. Joy. The Depth-Buffer Visible Surface Algorithm. University of
California, 1996.
[6] Karthik Ramani, Christiaan P. Gribble, Al Davis. StreamRay: A Stream Filtering
Architecture for Coherent Ray Tracing. University of Utah, 2009.
[7] Nelson Max, Keiichi Ohsaki. Rendering Trees from Precomputed Z-Buffer Views.
University of California, Davis.
[8] Michael Wand, Matthias Fischer, Ingmar Peter, Friedhelm Meyer auf der Heide,
Wolfgang Straßer. The Randomized z-Buffer Algorithm: Interactive Rendering of
Highly Complex Scenes. Universities of Tübingen and Paderborn, 2001.
[9] www.whatis.com
[10] www.wikipedia.org
Acknowledgements:
We are grateful to our college, the HOD and our faculty mentor for all the support and
encouragement we have received from them. We would also like to thank Intel for giving us an
opportunity to present this paper.