CGI2012 manuscript No.
(will be inserted by the editor)
GPU based Single-Pass Ray Casting of Large Heightfields Using Clipmaps
Dirk Feldmann · Klaus Hinrichs
Abstract Heightfields have proved to be useful for rendering terrains or polygonal surfaces with fine-structured details. While GPU-based ray casting has become popular for the latter setting, terrains are commonly rendered by using mesh-based techniques, because the heightfields can be very large and hence ray casting on these data is usually less efficient. Compared to mesh-based techniques, ray casting is attractive because it does not require dealing with mesh-related problems such as tessellation of the heightfield, frustum culling or mesh optimizations. In this paper we present an approach to render heightfields of almost arbitrary size at real-time frame rates by means of GPU-based ray casting and clipmaps. Our technique uses level-of-detail dependent early ray termination to accelerate ray casting and avoids aliasing caused by texture sampling or spatial sampling. Furthermore, we use two different methods to improve the visual quality of the reconstructed surfaces obtained from point-sampled data. We evaluate our implementation for four different data sets and two different hardware configurations.
Keywords ray casting · rendering · single-pass ·
clipmap · heightfield · terrain
Dirk Feldmann · Klaus Hinrichs
VisCG, Department of Computer Science, University of Münster, Germany
1 Introduction
Heightfield rendering has numerous applications in science and entertainment. One major application is terrain rendering, which is increasingly used to map high-resolution aerial photographs acquired by airplanes, satellites or unmanned aerial vehicles onto a digital surface model (DSM) of the covered area. This approach preserves depth perception and provides context and other information to the viewer. Popular examples are NASA World Wind [23] or Google Earth [15]. Since textured polygonal meshes can be processed and rendered by GPUs at high speed, a widely used rendering technique stores a DSM in (grayscale) texture maps (so-called heightmaps or heightfields) and uses them to displace the vertices of the corresponding polygonal mesh [6].
However, most renderers accept only triangle meshes, which can become rather complex and may easily consist of millions of triangles. During mesh generation, particular attention has to be paid to different issues, e. g., to avoid producing cracks, to choose appropriate tessellations and to avoid aliasing caused by small or distant triangles.
It therefore appears attractive to bypass the entire process of converting a heightfield into a mesh that is ultimately rasterized, especially since many of its triangles yield at most a few pixels whose fragments pass all of the numerous tests encountered on their way through the rendering pipeline.
Techniques like relief mapping [27] or parallax occlusion mapping [33] can use pixel shaders on modern GPUs to perform real-time ray casting on heightfields in order to calculate the displaced sample positions in the corresponding color textures, which contribute to the final fragment color. This ray casting adds fine-structured details to surfaces without further tessellating the underlying polygonal mesh. In many cases this even allows the polygonal mesh to be reduced to a single planar quadrilateral, which usually consists of only two triangles.
In order to speed up the ray casting and to achieve
real-time frame rates, many GPU-based heightfield rendering techniques employ maximum mipmaps to access
the DSM. As the size of texture maps that can be handled by GPUs is currently limited by vendor-specific
restrictions and ultimately by the amount of available
video memory, large DSMs cannot be stored in a single
heightfield texture for direct access during GPU-based
ray casting.
In this paper we present a GPU-based heightfield ray casting technique which performs single-pass rendering of heightfields of almost arbitrary size in real time. Our main contribution is to demonstrate how clipmaps and current graphics hardware can be used to speed up the ray casting and improve the image quality by early ray termination based on level-of-detail selection, while alleviating the aforementioned video memory limitations. Additionally, we use two different refinement methods to improve the appearance of the reconstructed surfaces in our renderings. We demonstrate the performance of our technique for four large data sets of up to 31 GB.
2 Related Work
Much research has been performed on CPU-based ray
casting of heightfields as well as on terrain rendering
based on polygonal meshes. Since summarizing these
two areas would exceed the scope of this paper, we
confine ourselves to an overview of recent GPU-based
heightfield ray casting methods related to our work.
Qu et al. [28] presented one of the first GPU-based ray
casting schemes for heightfields which primarily aims at
accurate surface reconstruction of heightfields but does
not use any sophisticated structures for acceleration.
Relief mapping [25] and parallax (occlusion) mapping [16] are techniques for adding structural details to polygonal surfaces which have their origin in CPU-based rendering and address the shortcomings of bump mapping [4]. Both techniques have been implemented for GPUs (e. g. [27, 33]) and benefit from programmable graphics pipelines. However, as most of these implementations follow the strategies used in CPU-based ray casting, such as iterative and/or binary search to detect heightfield intersections, they are prone to the same kind of rendering artifacts caused by missed intersections in data sets with high spatial variation. An introduction to these closely related techniques can be found, for instance, in [1], and more details are given in the comprehensive state-of-the-art report by Szirmay-Kalos and Umenhoffer [30], which focuses on GPU-based implementations.
Oh et al. [24] accelerate ray casting and achieve real-time frame rates by creating a bounding volume hierarchy (BVH) of the heightfield, which is stored in a maximum mipmap and allows the ray to safely advance over long distances (see section 3.3). They also
present a method based on bilinear interpolation of
heightfield values to improve the quality of the reconstructed surface obtained from point-sampled data. The
method presented by Tevs et al. [34] also relies on BVHs
stored in maximum mipmaps, but uses a different sampling strategy. Their method advances along the ray
from one intersection of the projected ray with a texel
boundary to the next such intersection, whereas Oh
et al. use a constant step size to advance along the ray.
In addition, Tevs et al. store in each heightfield texel the height values at the four corners of a quadrilateral, encoded as an RGBA value, instead of point samples, which allows the surface to be reconstructed from a parametric description.
Compared to other techniques that also rely on preprocessed information about the heightfield and on acceleration data structures, such as relaxed cone step mapping [10, 26, 19], maximum mipmap creation is much faster and can be performed on the GPU [34].
All these methods have in common that they operate on single heightfields of relatively small extent, which are intended to add details to surfaces at meso- or microscales instead of representing vast surfaces themselves. Recently, Dick et al. [8] have presented a method for ray casting terrains several square kilometers in extent at real-time frame rates. Their method also employs maximum mipmaps to accelerate the ray casting process, and a tiling approach to render data sets several hundred GB in size. They also presented a faster hybrid method which uses either ray casting or rasterization-based rendering, but requires knowledge of the employed GPU or a training phase to decide whether to use rasterization or ray casting [9].
Our method presented in this paper also aims at rendering very large heightfields solely by means of GPU ray casting. It has been inspired in large parts by the works of Dick et al. and Tevs et al., as we also employ a tile-based approach and their cell-precise ray traversal scheme. But in contrast to the technique by Dick et al., which creates a complete mipmap for each tile and requires additional rendering passes to determine the visibility of the tiles, our method further accelerates the ray casting process and requires only a single rendering pass by using a tile-based clipmap implementation.
The clipmap, as introduced by Tanner et al. [32], is
based on mipmaps [35] in order to handle very large
textures at several levels of detail which would exceed
the available video or main memory. While the original version requires special hardware, modern GPU features have superseded these requirements, and other clipmap implementations (or virtual textures) have become available [11, 7, 29, 20, 17, 31, 13], most of which rely on texture tiles and permit handling of arbitrarily large textures, as briefly described in section 3.1. Geometry clipmaps as introduced by Losasso et al. [18], and derived GPU-based variations [3, 5] have also been used in terrain rendering, but to our knowledge only in the context of mesh-based rendering and not for accelerating ray casting.
3 GPU-based Single-pass Ray Casting Using Clipmaps
In this section we briefly present our tile-based clipmap implementation, followed by a description of the storage scheme used for heightfields. Next we describe the ray traversal method we employ, which is basically the same as the one described in [8], and we discuss how we accelerate it and avoid aliasing by using clipmaps. Finally, we present two refinement methods which we use to improve the appearance of the reconstructed surfaces.
3.1 Tile-based Clipmap Implementations
Clipmaps are storage schemes for texture maps (textures) which are based on mipmaps and, like these, rely on the principle of using pre-filtered data to avoid aliasing artifacts when multiple texels are mapped to one pixel or less in screen space due to perspective projection (texture minification) [35]. In contrast to mipmaps, clipmaps keep in memory only those data which are relevant for rendering the current frame, and they use caching techniques to reload and update these data. This reduces the amount of (video) memory occupied by texture data and also allows handling textures which would by far exceed the limits of video or main memory. The clipmap by Tanner et al. [32] relies on special hardware to update the texels in video memory when the viewer's eye point is moved. Modern GPUs allow clipmaps to be implemented by using texture tiles and accessing them in fragment shaders, e. g., by means of texture arrays. Our implementation uses a Flexible Clipmap [13], which is constructed as described in the following.
At the level l = 0, which corresponds to the finest resolution, the original virtual texture is partitioned into smaller tiles of n × m texels (tile size). Like in a mipmap, each 2 × 2 neighboring texels at level l are combined in a certain way into a single texel at the next coarser level l + 1, which implies that 2 × 2 neighboring tiles at level l correspond to one tile of the same tile size at level l + 1. With color textures, for instance, the combination may simply be an averaging operation on the values of the four texels, but the operation depends on the kind of texture. This process is repeated until the original texture is completely covered by a single tile of n × m texels at the least detailed level l = L − 1, which can be used to derive an ordinary mipmap. In the following we use the term "clipmap" to refer only to these lower L levels of a complete, tile-based clipmap, but stick with the terminology as used by Tanner et al. [32].
A clip center depending on the current location and viewing direction of the virtual scene camera is used to determine for each level the tiles that are needed in the current frame. This group of neighboring tiles is called the active area and is located in video memory. The clip area formed by a larger superset of tiles is kept in main memory, and the remaining tiles are stored in secondary memory, e. g., on hard disk. Since the lower levels of a corresponding mipmap are effectively clipped to smaller areas, this data structure is called a clipmap. Figure 1 illustrates the principle of a tile-based clipmap.
Fig. 1: Structure of a tile-based clipmap with L = 4 clip levels with active areas of at most 3 × 3 tiles (dark gray) and clip areas of at most 5 × 5 tiles (light gray).
Once the tiles have been uploaded to video memory, they can be accessed by shaders for rendering. When the clip center is relocated, i. e., the virtual camera is moved, tiles stored in video memory and main memory can be replaced, if necessary, by neighboring ones from main memory and secondary memory, respectively. If, for instance, the virtual camera is located far away from the currently visible textured surface, only the coarser-resolution (higher) levels are required, as the texels from the lower levels would cause aliasing. Hence it is not always necessary to keep the active areas of all clip levels in video or main memory. Of course, the tile size has to be chosen carefully to ensure that the tiles themselves are manageable by the graphics hardware. More details on clipmap-specific issues can be found in [32, 7].
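The level and tile bookkeeping described in this section can be sketched on the CPU in a few lines. This is a minimal sketch under stated assumptions, not the Flexible Clipmap of [13]: the class `ClipmapLayout`, its methods and the `radius` parameter are hypothetical names; levels are added until a single n × m tile covers the whole texture; and the active area is modeled as the (2·radius + 1)² tiles around the tile that contains the clip center, clamped to the tile grid of each level (radius = 1 gives the 3 × 3 active areas of Fig. 1).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClipmapLayout:
    """Hypothetical description of a tile-based clipmap (not the API of [13])."""
    width: int   # W: texels of the virtual texture at level 0
    height: int  # H
    tile_w: int  # n: tile width in texels
    tile_h: int  # m: tile height in texels

    def num_levels(self) -> int:
        """L: halve the texture until a single n x m tile covers it."""
        levels, w, h = 1, self.width, self.height
        while w > self.tile_w or h > self.tile_h:
            w, h = math.ceil(w / 2), math.ceil(h / 2)
            levels += 1
        return levels

    def tiles_at(self, level: int) -> Tuple[int, int]:
        """Number of tile columns and rows at a level."""
        w = math.ceil(self.width / 2 ** level)
        h = math.ceil(self.height / 2 ** level)
        return math.ceil(w / self.tile_w), math.ceil(h / self.tile_h)

    def tile_of(self, x: int, y: int, level: int) -> Tuple[int, int]:
        """Tile at `level` containing level-0 texel (x, y), which maps to texel (x >> level, y >> level)."""
        return (x >> level) // self.tile_w, (y >> level) // self.tile_h

    def active_area(self, cx: int, cy: int, level: int,
                    radius: int = 1) -> List[Tuple[int, int]]:
        """Tiles around the clip center (cx, cy), given in level-0 texels, clamped to the grid."""
        tx, ty = self.tile_of(cx, cy, level)
        cols, rows = self.tiles_at(level)
        return [(i, j)
                for j in range(max(0, ty - radius), min(rows, ty + radius + 1))
                for i in range(max(0, tx - radius), min(cols, tx + radius + 1))]
```

For example, a grid of 5600 × 4000 samples with an assumed tile size of 512 × 512 yields `num_levels() == 5`. A clip center in the middle of the grid selects a full 3 × 3 active area at level 0, whereas at the top level the active area collapses to the single tile (0, 0).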
3.2 Clipmaps for DSM Storage
Due to their relation, clipmaps and mipmaps can be
created and used in very similar ways. To use a digital
surface model (DSM) for rendering, in our approach
the heightfield values are stored in the clipmap tiles at
the finest resolution (lowest) level l = 0. A texel at
level l > 0 obtains as height value the maximum height
value of the corresponding 2 × 2 subordinate texels at
level l − 1. If we identify each texel with a bounding box defined by its height value and its grid cell in the texture, we obtain a bounding volume hierarchy (BVH) of the underlying DSM as illustrated in figure 2. This is the same construction scheme as used with maximum mipmaps [24, 34, 8].
Fig. 2: BVH derived from a heightfield on a regular grid. Gray boxes correspond to samples at level 0. Bounding boxes on higher levels and their maximum value are highlighted by the same color.
In the method presented by
Dick et al. [8], the heightfield is split into tiles as well, but a separate maximum mipmap is created for each tile. To render vast DSMs, this approach may require either many tiles, and thus mipmaps, to be present in video memory or additional rendering passes, especially if the heightfield is shallow and there is little occlusion between tiles. Furthermore, the tiles located far away from the viewer may contain fine spatial details, e. g., steep summits of distant mountains, which are not only imperceptible from far away but may also cause spatial aliasing artifacts due to minification caused by perspective projection. The latter aspect is the same one that motivated the development of mipmaps for texture mapping, and it also applies to mesh-based rendering techniques, which therefore strive to determine an appropriate level of detail (LOD) in order to avoid rasterizing triangles that would be projected to less than one pixel in screen space.
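The maximum-based aggregation rule of this section can be sketched as follows. It is a minimal sketch that builds a complete maximum pyramid for one small heightfield held in nested lists, rather than per clipmap tile. The function name `build_max_pyramid` is hypothetical, and grids with odd sizes are handled by taking the maximum over the partial 2 × 2 blocks at the border, which is an assumption on our part.

```python
from typing import List

Grid = List[List[float]]


def build_max_pyramid(heights: Grid) -> List[Grid]:
    """Return [level 0, level 1, ...]; each texel of level l + 1 stores the
    maximum of the (up to) 2 x 2 texels it covers at level l."""
    levels = [heights]
    while len(levels[-1]) > 1 or len(levels[-1][0]) > 1:
        prev = levels[-1]
        rows, cols = len(prev), len(prev[0])
        coarser = []
        for r in range(0, rows, 2):
            coarser.append([
                max(prev[rr][cc]
                    for rr in range(r, min(r + 2, rows))
                    for cc in range(c, min(c + 2, cols)))
                for c in range(0, cols, 2)
            ])
        levels.append(coarser)
    return levels
```

Because every coarser texel stores the maximum of the texels it covers, its box encloses all boxes below it. A ray that misses a box at level l therefore also misses everything contained in it, which is exactly the property the traversal in section 3.3 relies on.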
The important difference between the usage of clipmaps
and multiple mipmaps is that in case of clipmaps the
BVH spans the entire domain at the topmost level. A
proper placement of the clip center results in the selection of only those tiles of highest resolution at level
l = 0 which are closest to the virtual camera and thus
potentially have to be rendered in full detail. Compared
to level l, at level l+1 the area of the heightfield covered
by a tile is four times larger, and the spatial resolution
is halved along each direction of the grid. Thus the entire domain is spatially pre-filtered, and the level of detail of the heightfield decreases with increasing distance from the viewer. Because higher clipmap levels
also correspond to larger bounding boxes, we can exploit this fact to accelerate GPU ray casting in the far
range of the scene as described in the following section.
3.3 Rendering and Accelerating Ray Casting
Given a DSM stored in a clipmap of L levels, we set
the clip center simply by projecting the center of the
viewport into the scene. We also ensure that all tiles in
the active areas of all clip levels or at least the highest
(coarsest) ones are stored in video memory by choosing
appropriate sizes for the tiles and the active area. The
axis-aligned bounding box of the entire DSM, which is associated with the topmost tile, has its base in the xz-plane of a left-handed world coordinate system. It is represented by a polygonal mesh consisting of 12 triangles, which serves as proxy geometry for the ray casting
process. A vertex shader calculates normalized 3D texture coordinates from the vertex coordinates of the box
corners, and the clipmap is positioned at the bottom
of the box corresponding to the minimum height value
y = Hmin of the DSM. Hmin and the maximum height
value Hmax are both determined during loading of the
topmost clipmap tile on the CPU. By rendering the
back faces of the proxy geometry we obtain each ray’s
exit point e, and we pass the camera position and the
geometry of the bounding box in world coordinates to
the fragment shader, which calculates each ray's direction d = (dx, dy, dz) and entry point s into the proxy geometry and transforms them into normalized 3D texture space. If the camera is located within the bounding box, the entry point s becomes the camera position (cf. [19]). To prevent faces of the proxy geometry from being clipped against the far plane of the view frustum of the virtual camera, which would leave rays without exit points, the box is fitted into the view frustum when the camera is translated.
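The per-fragment ray setup can be sketched on the CPU as follows. This is a minimal sketch with two assumptions. First, the exit point e is taken as an input, standing in for the value obtained by rasterizing the back faces. Second, the entry point s is found with the standard slab test against the axis-aligned box and clamped to the camera position when the camera lies inside the box. The function name `ray_setup` is hypothetical. For simplicity, the sketch maps all three components into normalized [0, 1]³ texture space, whereas the text keeps the ray height in world coordinates.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def ray_setup(camera: Vec3, exit_point: Vec3,
              box_min: Vec3, box_max: Vec3) -> Tuple[Vec3, Vec3]:
    """Return (s, d) in normalized texture space.

    d points from the camera to the exit point e on a back face; s is the
    slab-test entry point into the box, or the camera itself if it is inside.
    """
    d = tuple(e - c for e, c in zip(exit_point, camera))
    t_near = 0.0  # clamping at 0 keeps s at the camera when it is inside the box
    for axis in range(3):
        if d[axis] != 0.0:
            t0 = (box_min[axis] - camera[axis]) / d[axis]
            t1 = (box_max[axis] - camera[axis]) / d[axis]
            t_near = max(t_near, min(t0, t1))
    s_world = tuple(c + t_near * di for c, di in zip(camera, d))
    extent = tuple(hi - lo for lo, hi in zip(box_min, box_max))
    s_tex = tuple((p - lo) / ext for p, lo, ext in zip(s_world, box_min, extent))
    d_tex = tuple(di / ext for di, ext in zip(d, extent))
    length = math.sqrt(sum(x * x for x in d_tex))
    return s_tex, tuple(x / length for x in d_tex)
```

Because e lies on the box surface, the ray is guaranteed to intersect the box, so the largest per-axis entry parameter is the entry point whenever the camera is outside.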
The actual ray traversal is performed by projecting the ray onto a clip-level-dependent 2D grid. For a given clip level 0 ≤ l < L, the extents of this grid are determined by
$$ (G_u(l), G_v(l)) = \left( \frac{W}{2^l}, \frac{H}{2^l} \right), $$
with (W, H) being the extents of the DSM in sample points, i. e., texels. Hence, the grid at level l has the same size that a single texture containing the entire DSM would have at mipmap level l. The current height py of a location p = (px, py, pz) = s + k · d on the ray is retained and updated in world coordinates to test for intersections with the heightfield. During ray traversal we move from one intersection of the projected ray dp = (dx, dz) with a texel boundary to the next such intersection, i. e., from the projected ray's entry point enp into a grid cell directly to its exit point exp, as shown in figure 3. The only exception is the first entry point, which is the projection of s. We start ray casting at
the coarsest (highest) clip level L − 1 of the BVH, at which the entire DSM is given in a single tile and each texel corresponds to the maximum value, and thus the bounding box, of 2^(L−1) × 2^(L−1) texels at level 0. To determine whether a ray hits a bounding box at level l, the clipmap tile containing the grid cell which belongs to the current enp and exp has to be sampled for the associated height value h. Since the direction of the ray is needed to determine this grid cell, we store the sign bits of the components of d in the lower three bits of an integer. This bit mask is created once for each ray using bit-wise operations in the fragment shader, and it is evaluated as needed by switch statements to determine the direction of a ray, instead of duplicating the shader code of the ray casting loop for each of the eight possible branches.
Fig. 3: Rays are traversed from one intersection of the projected ray with a texel boundary to the next such intersection.
When moving along the ray from point en to point ex, we hit the box surface if the ray is directed downwards (resp. upwards) and ex (resp. en) lies below the top of the box (at height h). If a ray hits a bounding box B at the current level l, it may also hit a bounding box contained in B at a lower level of the BVH. Therefore the ray casting process is repeated at the next lower level l′ = l − 1 from the current position en of the ray, but only if it is possible and reasonable to proceed, as described in section 3.4. Otherwise the lowest possible level l = lmin has been reached, and the exact intersection i on the bounding box surface is calculated by
$$ i = \begin{cases} \mathit{en}, & d_y \ge 0, \\ \mathit{en} + d \cdot \max\left( \dfrac{h - \mathit{en}_y}{d_y},\ 0 \right), & d_y < 0. \end{cases} $$
If a ray does not intersect a bounding box B at level l, it cannot intersect any of the bounding boxes contained in B at any lower level either, and we therefore advance along the ray to ex, which becomes the entry point en of the next cell. Compared to a ray traversal performed only on level 0, just one instead of 2^l × 2^l samples has to be tested for intersection, which results in a significant speedup of the process (cf. [34], [8]).
If a ray hits a bounding box B at some level l > 0, it does not necessarily hit any of the bounding boxes contained in B at level l − 1. This cannot be determined without descending to the lower level. In order to avoid using the smaller step size over longer distances when it is not really necessary, we move up again to level l if we detect that the ray does not hit any bounding box at level l − 1 (cf. [34, 8]). These three different cases for the intersection of a ray with a bounding box are illustrated in figure 4.
Fig. 4: Intersection of ray with a height field. The green ray hits the left red box, but none of the black boxes
The ray casting process is terminated if either a valid intersection point i on a bounding box has been found, the ray leaves the domain of the DSM, or the number of ray casting steps exceeds 2 · max(n, m), with n, m as the tile size in texels. In the latter two cases, the fragment from which the ray originates is discarded by the shader.
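The traversal of this section can be sketched as a CPU reference in Python. It is a minimal sketch, not the authors' shader, and it rests on several stated assumptions. The maximum pyramid is given as a complete list `pyramid[l][row][col]` instead of clipmap tiles. Positions are measured in level-0 texel units, with x along columns, z along rows and y as height. `min_level` stands in for lmin, and `max_steps` defaults to 2 · 512, assuming n = m = 512. After every miss, the sketch simply ascends one level, which is a simpler rule than the one described above. Cells on a boundary are resolved by nudging en by `eps` along the ray. Exactly vertical rays are rejected. All function names are hypothetical. The sign-bit mask and the intersection formula for i follow the text.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def direction_mask(d: Vec3) -> int:
    """Pack the sign bits of (dx, dy, dz) into bits 0, 1 and 2 of an int."""
    return int(d[0] < 0) | (int(d[1] < 0) << 1) | (int(d[2] < 0) << 2)


def ray_cast(pyramid: Sequence[Sequence[Sequence[float]]], s: Vec3, d: Vec3,
             min_level: int = 0, max_steps: int = 1024,
             eps: float = 1e-6) -> Optional[Vec3]:
    """Return the hit point on a box at `min_level`, or None if the ray misses."""
    if d[0] == 0.0 and d[2] == 0.0:
        raise ValueError("vertical rays are not handled in this sketch")
    mask = direction_mask(d)
    top = len(pyramid) - 1
    rows0, cols0 = len(pyramid[0]), len(pyramid[0][0])
    level, en = top, s
    for _ in range(max_steps):
        # Nudge en along d so that points on a boundary select the next cell.
        qx, qz = en[0] + eps * d[0], en[2] + eps * d[2]
        if not (0.0 <= qx < cols0 and 0.0 <= qz < rows0):
            return None                          # ray left the DSM domain
        size = 2 ** level                        # cell size in level-0 texels
        cx, cz = int(qx // size), int(qz // size)
        # Ray parameter of the next texel boundary crossed by the projected ray.
        t_exit = math.inf
        if d[0] != 0.0:
            bx = (cx if mask & 1 else cx + 1) * size
            t_exit = min(t_exit, (bx - en[0]) / d[0])
        if d[2] != 0.0:
            bz = (cz if mask & 4 else cz + 1) * size
            t_exit = min(t_exit, (bz - en[2]) / d[2])
        ex = tuple(e + t_exit * di for e, di in zip(en, d))
        h = pyramid[level][cz][cx]
        # Downward rays hit if ex lies below the box top, upward rays if en does.
        hit = ex[1] <= h if mask & 2 else en[1] <= h
        if hit and level > min_level:
            level -= 1                           # test the contained boxes from en
            continue
        if hit:
            if mask & 2:                         # top face, or side if en is below h
                k = max((h - en[1]) / d[1], 0.0)
                return tuple(e + k * di for e, di in zip(en, d))
            return en                            # upward ray entered through a side
        en = ex                                  # miss: skip the whole box
        level = min(level + 1, top)              # simplified ascent
    return None
```

In the example pyramid used for testing, a single level-0 column of height 5 sits in cell (2, 2). A ray starting at (0.5, 10, 2.5) with direction (1, −3, 0) skips the empty level-1 box in one step and then descends to hit the column's top face at x = 2 + 1/6.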
3.4 LOD-determined Ray Termination
To decide whether we can terminate ray casting at the
current level, we check the following two conditions.
First, we determine at each intersection of a bounding
box the highest resolution available, i. e., the lowest clip
level llow of a tile which covers the corresponding area of
the DSM and is present in video memory. The clipmap
tiles from the active areas of all clip levels are stored in a
texture array which is accessed by the fragment shader.
The Flexible Clipmap uses a certain tile layout and an
additional texture, the tile map [7], to find llow and the
index in the texture array where the corresponding tile
has been stored during its upload into video memory
(see [13] for details). The tile map covers the entire domain of the DSM as well, but each texel corresponds to
one tile of n × m texels at the lowest level l = 0. Each
texel stores the lowest clip level of the tile which covers the corresponding area of the DSM and is currently
present in video memory. For instance, given a tile size of n = m = 512 texels, a tile map of 512 × 512 texels holds information about the clip levels of 512² × 512² heightfield samples. When tiles at and above level l ≥ 0 are available in video memory, the tile map contains a square region of 2^l × 2^l texels with value l (cf. [32]). The
tile map is created on the CPU whenever the cache for
the clipmap tiles is updated due to relocations of the
clip center, and tiles are uploaded in top-down order
to ensure that at least the highest levels are present if
secondary caching structures cause a delay, e. g., when
tiles have to be loaded from hard disk. Thus, by transforming the hit point i on the bounding box surface to
normalized texture coordinates the shader can determine llow by a single texel-precise texture lookup in the
tile map.
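The two termination conditions of this section can be sketched together on the CPU. The first is the tile-map lookup for llow described above. The second is the screen-space estimate lopt = −log2(A · n · m), which is derived in the following paragraph. Both are combined as lmin = max(llow, lopt). This is a minimal sketch, and all names are hypothetical. It assumes that the top level is always resident, that M is a row-major 4 × 4 matrix applied to column vectors, and that "normalized screen space" means x and y mapped to [0, 1] after perspective division. It also clamps lopt to [0, L − 1] and rounds it down to an integer level for lmin, which is our assumption.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def build_tile_map(tiles_u: int, tiles_v: int,
                   resident: Iterable[Tuple[int, int, int]],
                   num_levels: int) -> List[List[int]]:
    """One texel per level-0 tile storing the lowest resident clip level;
    `resident` lists (level, tx, ty) of the tiles currently in video memory."""
    tile_map = [[num_levels - 1] * tiles_u for _ in range(tiles_v)]
    for level, tx, ty in resident:
        span = 2 ** level  # a level-l tile covers 2^l x 2^l level-0 tiles
        for y in range(ty * span, min((ty + 1) * span, tiles_v)):
            for x in range(tx * span, min((tx + 1) * span, tiles_u)):
                tile_map[y][x] = min(tile_map[y][x], level)
    return tile_map


def lowest_resident_level(tile_map: List[List[int]], u: float, v: float) -> int:
    """llow: texel-precise lookup at normalized texture coordinates (u, v)."""
    rows, cols = len(tile_map), len(tile_map[0])
    return tile_map[min(int(v * rows), rows - 1)][min(int(u * cols), cols - 1)]


def project(M: Sequence[Sequence[float]], p: Vec3) -> Tuple[float, float]:
    """Transform p by M, divide by w and map x, y from [-1, 1] to [0, 1]."""
    clip = [sum(M[r][k] * c for k, c in enumerate((*p, 1.0))) for r in range(4)]
    return (clip[0] / clip[3] + 1.0) / 2.0, (clip[1] / clip[3] + 1.0) / 2.0


def optimal_level(M: Sequence[Sequence[float]], u: float, h_grid: float, v: float,
                  R: Vec3, n: int, m: int, num_levels: int) -> float:
    """lopt from the largest projected face A of the level-0 box at (u, h_grid, v)."""
    rx, ry, rz = R
    kappa = (math.floor(u) * rx, math.floor(h_grid) * ry, math.floor(v) * rz)
    corners = (kappa,
               (kappa[0] + rx, kappa[1], kappa[2]),   # lambda
               (kappa[0], kappa[1], kappa[2] + rz),   # mu
               (kappa[0], kappa[1] + ry, kappa[2]))   # nu
    a, b, c, f = (project(M, corner) for corner in corners)
    p = (b[0] - a[0], b[1] - a[1])
    q = (c[0] - a[0], c[1] - a[1])
    r = (f[0] - a[0], f[1] - a[1])

    def cross(s, t):
        return abs(s[0] * t[1] - s[1] * t[0])

    area = max(cross(p, q), cross(p, r), cross(q, r))
    if area == 0.0:
        return float(num_levels - 1)
    return min(max(-math.log2(area * n * m), 0.0), float(num_levels - 1))


def termination_level(l_low: int, l_opt: float) -> int:
    """lmin = max(llow, lopt), with lopt rounded down (an assumption)."""
    return max(l_low, math.floor(l_opt))
```

With an identity matrix, n = m = 512 and 2⁻¹⁰ world units per sample, the largest projected face covers 2⁻²² of the normalized screen. This gives lopt = −log2(2⁻²² · 2¹⁸) = 4, so a box that small on screen is tested at level 4 instead of level 0.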
Second, the optimal clip level lopt at the current hit point i = (u, hgrid, v) is determined by the minification of the corresponding box at level l = 0 in screen space (cf. [12]). We project the four corners of the cell's box
$$ \kappa = (\lfloor u \rfloor \cdot R_x,\ \lfloor h_{grid} \rfloor \cdot R_y,\ \lfloor v \rfloor \cdot R_z), \quad \lambda = \kappa + (R_x, 0, 0), \quad \mu = \kappa + (0, 0, R_z), \quad \nu = \kappa + (0, R_y, 0) $$
from world space into normalized screen space, using the model, view and projection matrices combined in M followed by perspective division, to obtain the vectors a, b, c and f, where Rx, Ry, Rz are the numbers of world space units per heightfield sample along the respective direction. Then we calculate the areas A1, A2 and A3 of the projected faces of a box in screen space:
$$ p = b - a, \quad q = c - a, \quad r = f - a $$
$$ A_1 = |p \times q| = |p_x q_y - p_y q_x| $$
$$ A_2 = |p \times r| = |p_x r_y - p_y r_x| $$
$$ A_3 = |q \times r| = |q_x r_y - q_y r_x| $$
We want the largest face of one box in screen space, A = max(A1, A2, A3), to correspond to one texel of a tile at level lopt in texture space, which itself has an area of P = 1/(n · m). Hence
$$ 2^{l_{opt}} = \frac{P}{A} \quad \text{and} \quad l_{opt} = -\log_2 (A \cdot n \cdot m). $$
Instead of descending to a full-resolution mipmap level, which may cause aliasing, we can now terminate ray casting already at level lmin = max(llow, lopt). The two different LODs llow and lopt are visualized in figure 5, where each level is coded by a different color.
Fig. 5: The two different LODs llow and lopt are used to terminate the ray traversal and to avoid aliasing. (a) llow, (b) lopt.
3.5 Sampling Color Textures
In our implementation, each clipmap tile can consist of several different texture layers, which are handled identically and differ only in the stored data and their texel aggregation scheme. For each tile we provide an additional layer for a registered color texture to texture the DSM. This color texture layer is uploaded along with the heightfield layer and accessed in the fragment shader via a second texture array. As long as they cover the same area in world space, the different layers of the tiles do not even need to have the same resolution. However, we have not yet implemented this, and therefore one heightfield sample corresponds to one color sample. In general, to avoid aliasing when sampling the color texture layer, we would have to determine the ideal LOD ltex at the final hit point i in the heightfield separately and transform it to the corresponding tile which holds the color texture layer. This LOD ltex can be calculated in the same way as lopt during ray casting (see section 3.4), but in case of a 1:1 relation between heightfield and color samples we can directly use lopt and the texture coordinate for the heightfield layer obtained during ray casting to sample the color texture. The final fragment color is obtained by linear interpolation between the bilinearly interpolated color values from the two LODs adjacent to ltex (trilinear interpolation).
3.6 Refinement of Block-sampled Heightfields
As pointed out by Oh et al. in [24], point-sampled DSMs and their treatment as boxes result in blocky images which, from a close-up view, are reminiscent of models built of bricks (see figure 6a). Because this effect may be unwanted in most applications, we also implemented two refinement methods to obtain smooth surfaces. Both refinement methods are applied after the intersection i on the bounding box surface has been determined as described in section 3.3.
Fig. 6: Demonstration of the improvement in surface quality achieved by different refinement methods: (a) none, (b) linear, (c) bicubic.
The first method is the one presented by Oh et al. [24]. It relies on linear interpolation between two samples obtained from the linearly interpolated heightfield, which are taken at a distance of one half cell from i in the forward and backward direction along the ray, respectively. This method works quite well and hardly slows down the overall performance on modern GPUs, but in our implementation some defects, presumably caused by numerical inaccuracies, remain on surfaces with steep slopes, as shown in figure 6b. Despite these small defects, which are barely noticeable during animations or from farther viewing distances, the surfaces look much smoother.
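The linear refinement step can be sketched as follows. This is a minimal sketch based on our reading of the description above, not the code of [24]. Around i, two ray points are taken half a cell before and behind it, measured horizontally. For each point, the signed height above the bilinearly interpolated heightfield is computed. The zero crossing between the two points is then located by linear interpolation. The function names, the placement of samples at texel centers, the clamp-to-edge addressing and the `cell` parameter (2^l texels at level l) are all assumptions.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def bilinear_height(heights: List[List[float]], x: float, z: float) -> float:
    """Bilinear interpolation; samples sit at texel centers (col + 0.5, row + 0.5)."""
    rows, cols = len(heights), len(heights[0])
    fx, fz = x - 0.5, z - 0.5
    x0, z0 = math.floor(fx), math.floor(fz)
    wx, wz = fx - x0, fz - z0

    def at(c: int, r: int) -> float:  # clamp-to-edge addressing
        return heights[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    row0 = at(x0, z0) * (1 - wx) + at(x0 + 1, z0) * wx
    row1 = at(x0, z0 + 1) * (1 - wx) + at(x0 + 1, z0 + 1) * wx
    return row0 * (1 - wz) + row1 * wz


def refine_linear(heights: List[List[float]], i: Vec3, d: Vec3,
                  cell: float = 1.0) -> Vec3:
    """Refine the box hit i by a linear root estimate between i -/+ half a cell."""
    horiz = math.hypot(d[0], d[2])
    if horiz == 0.0:
        return i
    k = 0.5 * cell / horiz                 # ray parameter for half a cell horizontally
    p0 = tuple(a - k * b for a, b in zip(i, d))
    p1 = tuple(a + k * b for a, b in zip(i, d))
    f0 = p0[1] - bilinear_height(heights, p0[0], p0[2])  # height above surface
    f1 = p1[1] - bilinear_height(heights, p1[0], p1[2])
    if f0 == f1:
        return i
    t = min(max(f0 / (f0 - f1), 0.0), 1.0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))
```

On a flat heightfield of height 2, a ray through (1.5, 2.2, 1.5) with direction (1, −1, 0) is moved to (1.7, 2.0, 1.5), where it actually meets the interpolated surface.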
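Our second method uses Hermite bicubic surfaces to improve the reconstruction of the heightfield. Let (u, v) denote the projection of i onto the grid of the heightfield where ray casting has been terminated. We interpret the junctions at the four corners of the grid cell containing (u, v) and its eight neighbors as the corners of a bicubic surface patch. The four junctions are given by
$$ \alpha = (\lfloor u \rfloor,\ \min(SW, S, C, W),\ \lfloor v \rfloor) $$
$$ \beta = (\lfloor u \rfloor + 1,\ \min(S, SE, E, C),\ \lfloor v \rfloor) $$
$$ \gamma = (\lfloor u \rfloor,\ \min(W, C, N, NW),\ \lfloor v \rfloor + 1) $$
$$ \delta = (\lfloor u \rfloor + 1,\ \min(C, E, NE, N),\ \lfloor v \rfloor + 1) $$
with C as the height value of the cell containing (u, v) and SW, S, SE, E, NE, N, NW, W as the height values of the neighboring cells, starting at the lower left cell adjacent to α and enumerating them in counterclockwise order (see figure 7).
Fig. 7: Construction scheme for a Hermite bicubic patch from 3 × 3 heightfield samples surrounding the projection of intersection point i on the bounding box.
Each patch is parametrized along the grid axes by (s, t) ∈ [0, 1]², with s along u and t along v, and the height h(s, t) on the surface patch is given by
$$ h(s, t) = \begin{pmatrix} s^3 & s^2 & s & 1 \end{pmatrix} \cdot H \cdot G \cdot H^T \cdot \begin{pmatrix} t^3 & t^2 & t & 1 \end{pmatrix}^T, $$
$$ H = \begin{pmatrix} 2 & -2 & 1 & 1 \\ -3 & 3 & -2 & -1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \quad G = \begin{pmatrix} \alpha_y & \gamma_y & \frac{\partial \alpha_y}{\partial v} & \frac{\partial \gamma_y}{\partial v} \\ \beta_y & \delta_y & \frac{\partial \beta_y}{\partial v} & \frac{\partial \delta_y}{\partial v} \\ \frac{\partial \alpha_y}{\partial u} & \frac{\partial \gamma_y}{\partial u} & \frac{\partial^2 \alpha_y}{\partial u \partial v} & \frac{\partial^2 \gamma_y}{\partial u \partial v} \\ \frac{\partial \beta_y}{\partial u} & \frac{\partial \delta_y}{\partial u} & \frac{\partial^2 \beta_y}{\partial u \partial v} & \frac{\partial^2 \delta_y}{\partial u \partial v} \end{pmatrix} $$
(cf. [14]). The partial derivatives which define the tangential planes on the patch are approximated by using forward and backward differences, respectively, and by making the following simplifications for the first-order derivatives:
$$ \frac{\partial \alpha_y}{\partial u} = \frac{\partial \gamma_y}{\partial u} \approx C - W, \qquad \frac{\partial \alpha_y}{\partial v} = \frac{\partial \beta_y}{\partial v} \approx C - S, \qquad \frac{\partial \gamma_y}{\partial v} = \frac{\partial \delta_y}{\partial v} \approx N - C $$
Although the matrix G is constant for each grid cell, i. e., for each texel of the clipmap storing the heightfield, we calculate it directly in the fragment shader as needed.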
The pair of parameters (s, t) which corresponds to an intersection with the bicubic patch instead of the bounding box is determined by a second ray casting. Starting at i on the bounding box surface, the ray p = i + k · d is advanced with a fixed step width until it either hits the bicubic patch, i. e., py ≤ h(s, t), or leaves the domain of the box without intersection. In the latter case, we treat i as an entry point s on the proxy geometry and proceed with the accelerated ray casting process described in section 3 from the current level. We found a subdivision into 16 steps for traversing the bounding box of a cell to be completely sufficient, independent of the clip level l. Fewer subdivision steps expose defects caused by missed intersections, whereas more subdivision steps only reduce frame rates without further improving the reconstruction of the surface.
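The Hermite patch and the fixed-step second ray casting can be sketched on the CPU for a single level-0 cell. This is a minimal sketch, and several parts are assumptions beyond the text. The function names are hypothetical. The neighborhood `nb` is a 3 × 3 list with row 0 to the south and column 0 to the west. The forward differences ∂β_y/∂u = ∂δ_y/∂u ≈ E − C are assumed by symmetry with the differences given above. The mixed second derivatives are set to zero. The box of the cell spans [cx, cx + 1] × [floor, C] × [cz, cz + 1] in texel units. Only the 16 uniform steps are taken from the text.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]

HERMITE = ((2, -2, 1, 1), (-3, 3, -2, -1), (0, 0, 1, 0), (1, 0, 0, 0))


def hermite_weights(x: float) -> List[float]:
    """Row vector (x^3, x^2, x, 1) multiplied by the Hermite matrix H."""
    powers = (x ** 3, x ** 2, x, 1.0)
    return [sum(powers[r] * HERMITE[r][c] for r in range(4)) for c in range(4)]


def geometry_matrix(nb: Sequence[Sequence[float]]) -> List[List[float]]:
    """G from the 3x3 neighborhood; nb[1][1] is C, row 0 is south, column 0 is west."""
    (sw, s, se), (w, c, e), (nw, n, ne) = nb
    alpha, beta = min(sw, s, c, w), min(s, se, e, c)
    gamma, delta = min(w, c, n, nw), min(c, e, ne, n)
    du_left, du_right = c - w, e - c     # d/du at alpha, gamma and at beta, delta
    dv_bottom, dv_top = c - s, n - c     # d/dv at alpha, beta and at gamma, delta
    return [[alpha, gamma, dv_bottom, dv_top],
            [beta, delta, dv_bottom, dv_top],
            [du_left, du_left, 0.0, 0.0],    # mixed derivatives set to zero
            [du_right, du_right, 0.0, 0.0]]


def patch_height(G: List[List[float]], s: float, t: float) -> float:
    """h(s, t) = (s^3 s^2 s 1) H G H^T (t^3 t^2 t 1)^T."""
    bs, bt = hermite_weights(s), hermite_weights(t)
    return sum(bs[i] * G[i][j] * bt[j] for i in range(4) for j in range(4))


def march_patch(nb: Sequence[Sequence[float]], cell: Tuple[int, int], i: Vec3,
                d: Vec3, floor: float = 0.0, steps: int = 16) -> Optional[Vec3]:
    """Advance from the box hit i in `steps` uniform steps until py <= h(s, t)."""
    cx, cz = cell
    G = geometry_matrix(nb)
    t_max = math.inf                      # ray parameter where the box is left
    for lo, hi, start, step in ((cx, cx + 1, i[0], d[0]),
                                (floor, nb[1][1], i[1], d[1]),
                                (cz, cz + 1, i[2], d[2])):
        if step > 0:
            t_max = min(t_max, (hi - start) / step)
        elif step < 0:
            t_max = min(t_max, (lo - start) / step)
    if math.isinf(t_max):
        return None
    t_max = max(t_max, 0.0)
    for k in range(steps + 1):
        p = tuple(a + (k / steps) * t_max * b for a, b in zip(i, d))
        if p[1] <= patch_height(G, p[0] - cx, p[2] - cz):
            return p
    return None                           # left the box: resume the BVH traversal
```

For an isolated peak of height 4 surrounded by zeros, all four junctions are 0 and the patch reaches only h(0.5, 0.5) = 2 at the cell center. This reproduces the flattening of isolated peaks discussed below.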
Besides their simplicity and the possibility to compute all relevant information in the fragment shader, we chose Hermite bicubic patches because we wanted to ensure that the surface remains inside the bounding boxes of the BVH. Since we control the defining tangential planes, patches constructed as described above stay completely inside their bounding boxes. Using the forward and backward differences directly avoids any scaling of the tangents and therefore yields the desired C¹ continuity between neighboring patches, because their tangents have the same direction and magnitude (cf. [14]). The most severe drawback of this method is its high computational cost, although we still achieve interactive frame rates (see section 4.2). Furthermore, since this method ensures that the height of each patch is less than or equal to the height of its bounding box, and the tangents are not scaled, isolated peaks in the heightfield are clearly flattened, as can be seen in figure 6c.
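The continuity argument can be checked numerically. The Python sketch below (not the authors' code) evaluates two Hermite patches that share an edge, the edge s = 1 of patch A and s = 0 of patch B, and compares height and ∂h/∂s along it. When both patches use the shared tangents unchanged, as with the unscaled differences described above, height and slope agree along the whole edge; scaling one patch's copy of the tangents keeps the surface closed but introduces a crease. The function names and all numeric values are hypothetical, and G follows the layout given with the patch equation.

```python
# Hermite basis matrix H; its rows are multiplied by [x^3, x^2, x, 1].
H = [
    [2.0, -2.0, 1.0, 1.0],
    [-3.0, 3.0, -2.0, -1.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]


def _times_h(p):
    """Row vector p (length 4) multiplied by H."""
    return [sum(p[k] * H[k][j] for k in range(4)) for j in range(4)]


def height(G, s, t):
    """h(s, t) = [s^3 s^2 s 1] H G H^T [t^3 t^2 t 1]^T."""
    ws = _times_h([s ** 3, s ** 2, s, 1.0])
    wt = _times_h([t ** 3, t ** 2, t, 1.0])
    return sum(ws[i] * G[i][j] * wt[j] for i in range(4) for j in range(4))


def dh_ds(G, s, t):
    """Partial derivative of h with respect to s (differentiated s basis row)."""
    ws = _times_h([3.0 * s ** 2, 2.0 * s, 1.0, 0.0])
    wt = _times_h([t ** 3, t ** 2, t, 1.0])
    return sum(ws[i] * G[i][j] * wt[j] for i in range(4) for j in range(4))


def neighbor_patches(tangent_scale_b=1.0):
    """Two patches sharing the edge s = 1 of A and s = 0 of B (made-up values).

    Row 1 of G_a and row 0 of G_b hold the shared edge heights and v-derivatives;
    row 3 of G_a and row 2 of G_b hold the shared u-derivatives.
    tangent_scale_b scales B's copy of the u-derivatives.
    """
    edge = [2.0, 3.0, 0.5, -0.5]
    edge_du = [1.0, -1.0, 0.0, 0.0]
    G_a = [[1.0, 1.5, 0.2, 0.1], edge, [0.8, 0.4, 0.0, 0.0], edge_du]
    G_b = [edge, [2.5, 2.0, -0.3, 0.2],
           [x * tangent_scale_b for x in edge_du], [0.3, 0.6, 0.0, 0.0]]
    return G_a, G_b


if __name__ == "__main__":
    ts = (0.0, 0.25, 0.5, 0.75, 1.0)
    for scale in (1.0, 2.0):
        G_a, G_b = neighbor_patches(scale)
        gap = max(abs(height(G_a, 1.0, t) - height(G_b, 0.0, t)) for t in ts)
        kink = max(abs(dh_ds(G_a, 1.0, t) - dh_ds(G_b, 0.0, t)) for t in ts)
        print(f"tangent scale {scale}: max height gap {gap}, max slope gap {kink}")
```

At s = 1 the height depends only on row 1 of G and the slope only on row 3; at s = 0 they depend on rows 0 and 2. Sharing those rows between neighbors is therefore exactly what makes the surface C¹ across the edge.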
However, both refinement methods presented in this section rely on interpolation of point-sampled data on a regular grid and only serve to make the resulting renderings visually more appealing. Moreover, even though it might appear sufficient to apply refinement only when the viewer is close to a highly detailed area where the block-sampled nature of the data becomes apparent, we refine the surface at all discrete LODs, because the transition between large distant boxes and smooth surfaces is rather disturbing during animations. In addition, smooth surfaces and blocks are lit differently because their surface normals differ.
data set      extent [km]            W × H            size DSM   size color texture
City 1        1.4 × 1.0              5600 × 4000      133 MB     99 MB
City 2        20.9 × 26.3            83600 × 105200   31.6 GB    none
ETOPO1        ≈ 40075.0 × 19970.0    21600 × 10800    1.3 GB     none
Blue Marble   ≈ 40075.0 × 19970.0    86400 × 43200    19.2 GB    14.4 GB
Table 1: Properties of the data sets used to evaluate performance. L denotes the total number of clip levels created, and W × H is the grid size at level 0, i.e., the size a single texture would have. Column time contains the durations of the virtual camera flights of our evaluation in minutes.
4 Performance Results and Discussion
The implementation of our technique relies on OpenGL and GLSL 1.50 shaders, and we demonstrate its performance by means of renderings of the four data sets listed in table 1. The data set City 2 was acquired by photogrammetric methods from aerial images. City 1 depicts a small area of City 2 for which a color texture derived from orthographic aerial images is available. The data sets ETOPO1 [2] and Blue Marble [21] depict the entire earth and are both largely derived from SRTM data [22], but ETOPO1 also contains bathymetric data, whereas Blue Marble possesses a color texture derived from satellite images.
When sampled in the fragment shader, the height values are scaled by the factors given in column scale of table 1 in order to avoid flattened surfaces. Shallow surfaces do not challenge our ray caster, because fewer mutual occlusions lead to fewer level changes in the BVH during ray traversal. Renderings of three data sets are shown in figure 8.
4.1 Evaluation Setup and Results
We used tile sizes of 512 × 512 texels, active area sizes of 5 × 5 tiles and clip area sizes of 7 × 7 tiles for all data sets in our tests. The near and far planes of the virtual camera were set to 1.0 and 2000.0 units, respectively. Heightfield layers consist of single-channel 32-bit floating-point textures, and color texture layers consist of 24-bit RGB textures. The results were recorded during virtual camera flights along fixed paths over the heightfields on a desktop computer with an Intel i7 860 CPU at 2.8 GHz, 6 GB RAM, an NVIDIA GeForce GTX 470 graphics adapter with 1280 MB dedicated VRAM, and Windows 7 OS (system A). To make our results comparable to those reported in [8], we additionally ran the same tests on a second desktop computer (system B) with a hardware configuration more similar to theirs (Intel Q6600 CPU at 2.4 GHz, 4 GB RAM, NVIDIA GeForce GTX 285 with 1024 MB dedicated VRAM, and Windows 7 OS). Table 2 shows the results for different screen resolutions on system A and system B in frames per second (fps). The frame rates include the delays caused by updating the tile caches in main memory and video memory as described in section 3.1. The duration of each camera flight is given in column time of table 1.
Table 2: Performance results of our rendering technique in frames per second for each data set at screen resolutions of 1024 × 768, 1280 × 1024 and 1920 × 1080: (a) System A, (b) System B.
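To put the tile parameters of the evaluation setup in perspective, the following Python sketch multiplies out the uncompressed size of one clip level's active area (5 × 5 tiles) and clip area (7 × 7 tiles) of 512 × 512 texels for both layer formats. It only combines the stated numbers; which region is kept in main memory and which in video memory follows section 3.1 and is not modeled here, and the function name `area_bytes` is hypothetical.

```python
TILE = 512           # texels per tile edge
ACTIVE_TILES = 5     # active area: 5 x 5 tiles per clip level
CLIP_TILES = 7       # clip area: 7 x 7 tiles per clip level
BYTES_HEIGHT = 4     # single-channel 32-bit float heightfield texel
BYTES_COLOR = 3      # 24-bit RGB color texel
MIB = 2 ** 20


def area_bytes(tiles_per_edge, bytes_per_texel, tile=TILE):
    """Uncompressed size in bytes of a square region of tiles_per_edge^2 tiles."""
    texels = (tiles_per_edge * tile) ** 2
    return texels * bytes_per_texel


if __name__ == "__main__":
    for name, n in (("active area", ACTIVE_TILES), ("clip area", CLIP_TILES)):
        h = area_bytes(n, BYTES_HEIGHT) / MIB
        c = area_bytes(n, BYTES_COLOR) / MIB
        print(f"{name}: heightfield {h} MiB, color {c} MiB per level")
```

With these settings, one heightfield level's clip area occupies 49 MiB (active area: 25 MiB) and one color level's clip area 36.75 MiB (active area: 18.75 MiB); the totals for a layer grow linearly with the number of clip levels L.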
4.2 Performance with Surface Refinement
All values given in table 2 were obtained without any of the surface refinement methods described in section 3.6. The impact of surface refinement on the rendering speed in our implementation, and the relative loss in performance, are shown in table 3. These data were acquired from a separate run of the same camera flight through the City 2 data set on system A and system B, because this data set has high spatial frequencies in the rendered regions and is the most challenging for our ray caster.
Fig. 8: Example renderings of the data sets which we used in our performance evaluations: (a) City 2, (b) ETOPO1, (c) Blue Marble. Color textures are only available for City 1 and Blue Marble; ETOPO1 was rendered using a pseudo-topographic color map.
(a) System A
resolution     linear           bicubic
1024 × 768     96.2 (-19.6%)    36.8 (-69.3%)
1280 × 1024    62.8 (-19.5%)    24.3 (-68.8%)
1920 × 1080    42.0 (-19.7%)    16.6 (-68.3%)
(b) System B
resolution     linear           bicubic
1024 × 768     75.4 (-12.4%)    36.4 (-57.1%)
1280 × 1024    49.8 (-13.5%)    17.4 (-69.8%)
1920 × 1080    33.7 (-12.9%)    11.9 (-69.3%)
Table 3: Impact of the surface refinement methods on performance in terms of average frames per second for the City 2 data set, and the relative loss compared to rendering without refinement.
4.3 Discussion
The results in table 2 show that, in accordance with the results of Dick et al. [8], very large DSMs can be rendered in real time by using only ray casting and acceleration data structures. Although the hybrid approach of Dick et al. [9] renders faster, it appears to be less flexible, because it requires selecting representative tiles from the data set and views of the scene during its training phase.
As expected, table 3 shows that the loss in performance is much bigger with bicubic surface refinement than with the linear method, but even at the highest resolution we still achieve interactive frame rates. The linear method may expose some defects, but offers a good compromise between quality and speed at higher resolutions. Moreover, refining the reconstructed surface only pays off for coarse-resolution DSMs, i.e., low grid densities, where the block structure becomes apparent. The differences in the frame rates between the two city data sets and the two earth data sets result from different grid densities.
5 Conclusions and Future Work
In this paper we have shown that by combining clipmaps
and ray casting very large DSMs can be rendered at
real-time frame rates in a single rendering pass. Our approach eliminates aliasing caused by texture sampling or spatial sampling. The same LOD selection method is also used to avoid unnecessary ray casting steps in regions distant from the viewer. The size of the rendered DSMs is mainly limited by the amount of available secondary memory. We also used surface refinement based on Hermite bicubic patches to improve the renderings of point-sampled data and still achieved interactive frame rates at high screen resolutions. However, assigning color values to reconstructed surfaces by means of orthographic photo textures suffers from the fact that such textures contain no information about surfaces oriented obliquely to the ground plane. This becomes especially apparent for our City 1 data set if we lower the camera to street level, where facades of buildings are missing. For untextured DSMs, the transition from one LOD to another in the heightfield layer can be perceived as disturbing, and therefore a method for smooth transitions between different LODs in this layer is needed. In addition, it would be desirable to compare the performance and rendering quality of our implementation directly with rasterization-based techniques and CPU ray casting implementations.
Acknowledgements Our work has been conducted within
the project AVIGLE, which is part of the Hightech.NRW initiative funded by the Ministry of Innovation, Science and
Research of the German State of North Rhine-Westphalia.
AVIGLE is a cooperation of several academic and industrial partners, and we thank all partners for their work and
contributions to the project with special thanks to Aerowest GmbH, Dortmund, Germany for providing us with data
for our City data sets. We would further like to thank the
anonymous reviewers for their valuable advice and all those
who were involved in providing the original data for our data
sets ETOPO1 and Blue Marble.
References
1. Akenine-Möller, T., Haines, E., Hoffman, N.: Real-Time
Rendering, 3rd edn. A K Peters, Ltd. (2008)
2. Amante, C., Eakins, B.W.: ETOPO1 1 Arc-Minute
Global Relief Model: Procedures, Data Sources and
Analysis. In: NOAA Technical Memorandum NESDIS
NGDC-24, 19 pp. (2009)
3. Asirvatham, A., Hoppe, H.: GPU Gems 2, chap. Terrain Rendering Using GPU-Based Geometry Clipmaps.
Addison-Wesley Longman (2005)
4. Blinn, J.F.: Simulation of Wrinkled Surfaces. In: SIGGRAPH ’78: Proceedings of the 5th annual conference on
Computer graphics and interactive techniques, pp. 286–
292. ACM (1978)
5. Clasen, M., Hege, H.C.: Terrain Rendering using Spherical Clipmaps. In: EuroVis06 Joint Eurographics - IEEE
VGTC Symposium on Visualization, pp. 91–98. Eurographics Association (2006)
6. Cook, R.L.: Shade trees. In: SIGGRAPH ’84: Proceedings of the 11th annual conference on Computer graphics
and interactive techniques, pp. 223–231. ACM (1984)
7. Crawfis, R., Noble, E., Ford, M., Kuck, F., Wagner, E.:
Clipmapping on the GPU. Tech. rep., Ohio State University, Columbus, OH, USA (2007)
8. Dick, C., Krüger, J., Westermann, R.: GPU Ray-Casting
for Scalable Terrain Rendering. In: Proceedings of Eurographics 2009 - Areas Papers, pp. 43–50 (2009)
9. Dick, C., Krüger, J., Westermann, R.: GPU-Aware Hybrid Terrain Rendering. In: Proceedings of IADIS Computer Graphics, Visualization, Computer Vision and Image Processing 2010, pp. 3–10 (2010)
10. Dummer, J.: Cone Step Mapping: An Iterative
11. Ephanov, A., Coleman, C.: Virtual Texture: A Large
Area Raster Resource for the GPU.
In: Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC) 2006, pp. 645–656 (2006)
12. Ewins, J.P., Waller, M.D., White, M., Lister, P.F.: MIPMap Level Selection for Texture Mapping. IEEE Transactions on Visualization and Computer Graphics 4(4),
317–329 (1998)
13. Feldmann, D., Steinicke, F., Hinrichs, K.: Flexible
Clipmaps for Managing Growing Textures. In: Proceedings of International Conference on Computer Graphics
Theory and Applications (GRAPP) (2011)
14. Foley, J.D., van Dam, A., Feiner, S.K., Hughes, J.F.:
Computer Graphics: Principles and Practice, 2nd edn. in C. Addison-Wesley (1995)
15. Google Inc.: Google Earth.
16. Kaneko, T., Takahei, T., Inami, M., Kawakami, N.,
Yanagida, Y., Maeda, T., Tachi, S.: Detailed Shape Representation with Parallax Mapping. In: Proceedings
of ICAT 2001, pp. 205–208 (2001)
17. Li, Z., Li, H., Zeng, A., Wang, L., Wang, Y.: Real-Time
Visualization of Virtual Huge Texture. In: ICDIP ’09:
Proceedings of the International Conference on Digital
Image Processing, pp. 132–136. IEEE Computer Society
18. Losasso, F., Hoppe, H.: Geometry clipmaps: Terrain Rendering Using Nested Regular Grids. ACM Transactions
on Graphics (TOG) (2004)
19. Microsoft: DirectX SDK Documentation: RaycastTerrain Sample. (2008)
20. Mittring, M., Crytek GmbH: Advanced Virtual Texture
Topics. In: SIGGRAPH ’08: ACM SIGGRAPH 2008
Classes, pp. 23–51. ACM (2008)
21. NASA: Visible Earth: Earth - The Blue Marble. (1997)
22. NASA:
Mission. (2000)
23. NASA: World Wind.
24. Oh, K., Ki, H., Lee, C.H.: Pyramidal Displacement Mapping: a GPU based Artifacts-free Ray Tracing through an
Image Pyramid. In: VRST ’06: Proceedings of the ACM
symposium on Virtual reality software and technology,
pp. 75–82. ACM (2006)
25. Oliveira, M.M., Bishop, G., McAllister, D.: Relief Texture Mapping. In: SIGGRAPH ’00: Proceedings of the
27th annual conference on Computer graphics and interactive techniques, pp. 359–368. ACM Press/Addison-Wesley Publishing Co. (2000)
26. Policarpo, F., Oliveira, M.M.: GPU Gems 3, chap. Relaxed Cone Stepping for Relief Mapping. Addison-Wesley
Professional (2007)
27. Policarpo, F., Oliveira, M.M., Comba, J.L.D.: Real-time Relief Mapping on Arbitrary Polygonal Surfaces.
In: Proceedings of the 2005 symposium on Interactive 3D
graphics and games, I3D ’05, pp. 155–162. ACM (2005)
28. Qu, H., Qiu, F., Zhang, N., Kaufman, A., Wan, M.:
Ray Tracing Height Fields. In: Proceedings of Computer
Graphics International, pp. 202–207 (2003)
29. Seoane, A., Taibo, J., Hernández, L.: Hardware-Independent Clipmapping. In: Journal of WSCG 2007,
pp. 177–183 (2007)
30. Szirmay-Kalos, L., Umenhoffer, T.: Displacement Mapping on the GPU - State of the Art (2006)
31. Taibo, J., Seoane, A., Hernández, L.: Dynamic Virtual
Textures. In: Journal of WSCG 2009, pp. 25–32. Eurographics Association (2009)
32. Tanner, C.C., Migdal, C.J., Jones, M.T.: The Clipmap:
a Virtual Mipmap. In: SIGGRAPH ’98: Proceedings of
the 25th Annual Conference on Computer Graphics and
Interactive Techniques, pp. 151–158. ACM (1998)
33. Tatarchuk, N.: Dynamic Parallax Occlusion Mapping
with Approximate Soft Shadows. In: SIGGRAPH ’06:
ACM SIGGRAPH 2006 Courses, pp. 63–69. ACM (2006)
34. Tevs, A., Ihrke, I., Seidel, H.P.: Maximum Mipmaps for
Fast, Accurate, and Scalable Dynamic Height Field Rendering. In: I3D ’08: Proceedings of the 2008 Symposium
on Interactive 3D Graphics and Games, pp. 183–190.
ACM (2008)
35. Williams, L.: Pyramidal Parametrics. In: SIGGRAPH
’83: Proceedings of the 10th Annual Conference on Computer Graphics and Interactive Techniques, pp. 1–11.
ACM (1983)