CGI2012 manuscript No.
(will be inserted by the editor)
GPU based Single-Pass Ray Casting of Large Heightfields
Using Clipmaps
Dirk Feldmann · Klaus Hinrichs

VisCG, Department of Computer Science, University of Münster, Germany
Abstract Heightfields have proved to be useful for rendering terrains or polygonal surfaces with fine-structured details. While GPU-based ray casting has become popular for the latter setting, terrains are commonly rendered with mesh-based techniques, because heightfields can be very large and ray casting on such data is usually less efficient. Compared to mesh-based techniques, ray casting is attractive because it does not require dealing with mesh-related problems such as tessellation of the heightfield, frustum culling or mesh optimizations. In this paper we present an approach for rendering heightfields of almost arbitrary size at real-time frame rates by means of GPU-based ray casting and clipmaps. Our technique uses level-of-detail dependent early ray termination to accelerate ray casting and avoids aliasing caused by texture sampling or spatial sampling. Furthermore, we use two different methods to improve the visual quality of the surfaces reconstructed from point-sampled data. We evaluate our implementation for four different data sets and two different hardware configurations.
Keywords ray casting · rendering · single-pass ·
clipmap · heightfield · terrain
1 Introduction
Heightfield rendering has numerous applications in science and entertainment. One major application is terrain rendering, which is increasingly used to map high resolution aerial photographs acquired by airplanes, satellites or unmanned aerial vehicles onto a digital surface model (DSM) of the covered area. This approach preserves depth perception and provides context and other information to the viewer. Popular examples are NASA World Wind [23] and Google Earth [15]. Since textured polygonal meshes can be processed and rendered by GPUs at high speed, a widely used rendering technique stores a DSM in (grayscale) texture maps (so-called heightmaps or heightfields) and uses them to displace the vertices of a corresponding polygonal mesh [6].
However, most renderers accept only triangle meshes, which can become rather complex and may easily consist of millions of triangles. During mesh generation particular attention has to be paid to several issues, e.g., not producing any cracks, choosing appropriate tessellations and avoiding aliasing caused by small or distant triangles.

Therefore it appears attractive to bypass the entire process of converting a heightfield into a mesh, which is finally rasterized and, for many triangles, results in at most a few pixels whose corresponding fragments pass all of the numerous tests encountered on their way through the rendering pipeline. Techniques like relief mapping [27] or parallax occlusion mapping [33] can use pixel shaders on modern GPUs to perform real-time ray casting on heightfields in order to calculate the displaced sample positions in corresponding color textures which contribute to the final fragment color. During this ray casting, fine-structured details can be added to surfaces without further tessellating the underlying polygonal mesh. In many cases this even allows reducing the polygonal mesh to a single planar quadrilateral, which usually consists of only two triangles.
In order to speed up the ray casting and to achieve real-time frame rates, many GPU-based heightfield rendering techniques employ maximum mipmaps to access the DSM. As the size of texture maps that can be handled by GPUs is currently limited by vendor-specific restrictions and ultimately by the amount of available video memory, large DSMs cannot be stored in a single heightfield texture for direct access during GPU-based ray casting.

In this paper we present a GPU-based heightfield ray casting technique which performs single-pass rendering of heightfields of almost arbitrary size in real time. Our main contribution is to demonstrate how clipmaps and current graphics hardware can be used to speed up the ray casting and improve the image quality by early ray termination based on level-of-detail selection, while alleviating the aforementioned video memory limitations. Additionally, we use two different refinement methods to improve the appearance of the reconstructed surfaces in our renderings. We demonstrate the performance of our technique for four large data sets of up to 31 GB in size.
2 Related Work
Much research has been performed on CPU-based ray casting of heightfields as well as on terrain rendering based on polygonal meshes. Since summarizing these two areas would exceed the scope of this paper, we confine ourselves to an overview of recent GPU-based heightfield ray casting methods related to our work.

Qu et al. [28] presented one of the first GPU-based ray casting schemes for heightfields, which primarily aims at accurate surface reconstruction but does not use any sophisticated acceleration structures. Relief mapping [25] and parallax (occlusion) mapping [16] are techniques for adding structural details to polygonal surfaces; they have their origin in CPU-based rendering and improve upon the disadvantages of bump mapping [4]. Both techniques have been implemented for GPUs (e.g., [27, 33]) and benefit from programmable graphics pipelines. But as most of these implementations resemble the strategies used in CPU-based ray casting, like iterative and/or binary search to detect heightfield intersections, they are prone to the same kind of rendering artifacts caused by missed intersections in highly spatially variant data sets. An introduction to these closely related techniques can be found for instance in [1], and more details are given in the comprehensive state-of-the-art report by Szirmay-Kalos and Umenhoffer [30], which focuses on GPU-based implementations.
Oh et al. [24] accelerate ray casting and achieve real-time frame rates by creating a bounding volume hierarchy (BVH) of the heightfield, which is stored in a maximum mipmap and allows the ray to safely advance over long distances (see section 3.3). They also present a method based on bilinear interpolation of heightfield values to improve the quality of the surface reconstructed from point-sampled data. The method presented by Tevs et al. [34] also relies on BVHs stored in maximum mipmaps, but uses a different sampling strategy. Their method advances along the ray from one intersection of the projected ray with a texel boundary to the next such intersection, whereas Oh et al. use a constant step size. In addition, Tevs et al. store in each heightfield texel the height values at the four corners of a quadrilateral encoded as an RGBA value instead of point samples, which allows surface reconstruction based on parametric descriptions.
Compared to other techniques which also rely on preprocessed information about the heightfield and acceleration data structures, like for instance relaxed cone step mapping [10, 26, 19], maximum mipmap creation is much faster and can be performed on the GPU [34].

All these methods have in common that they operate on single heightfields of relatively small extent which are intended to add details to surfaces at meso- or microscale instead of representing vast surfaces themselves. Recently, Dick et al. [8] presented a method for ray casting terrains of several square kilometers extent at real-time frame rates. Their method also employs maximum mipmaps to accelerate the ray casting process and uses a tiling approach to render data sets of several hundred GB in size. They also presented a faster hybrid method which uses either ray casting or rasterization-based rendering, but requires knowledge of the employed GPU or, respectively, a training phase to decide which of the two to use [9].
The method we present in this paper also aims at rendering very large heightfields solely by means of GPU ray casting. It has been inspired in large parts by the works of Dick et al. and Tevs et al., as we also employ a tile-based approach and their cell-precise ray traversal scheme. But in contrast to the technique by Dick et al., which creates a complete mipmap for each tile and requires additional rendering passes to determine the visibility of the tiles, our method further accelerates the ray casting process and requires only a single rendering pass by using a tile-based clipmap implementation.

The clipmap, as introduced by Tanner et al. [32], builds on mipmaps [35] in order to handle very large textures at several levels of detail which would exceed the available video or main memory. While the original version requires special hardware, modern GPU features have superseded these requirements, and other clipmap implementations (or virtual textures) have become available [11, 7, 29, 20, 17, 31, 13], most of which rely on texture tiles and permit handling of arbitrarily large textures, as briefly described in section 3.1. Geometry clipmaps as introduced by Losasso et al. [18] and derived GPU-based variations [3, 5] have also been used in terrain rendering, but to our knowledge only in the context of mesh-based rendering and not for accelerating ray casting.
3 GPU-based Single-pass Ray Casting Using Clipmaps

In this section we briefly present our tile-based clipmap implementation, followed by a description of the storage scheme used for heightfields. Next we describe the employed ray traversal method, which is basically the same as the one described in [8], and we discuss how we accelerate it and avoid aliasing by using clipmaps. Finally, we present two refinement methods which we use to improve the appearance of the reconstructed surfaces.
3.1 Tile-based Clipmap Implementations

Clipmaps are storage schemes for texture maps (textures) which are based on mipmaps and, like these, rely on the principle of using pre-filtered data to avoid aliasing artifacts when multiple texels are mapped to one pixel or less in screen space due to perspective projection (texture minification) [35]. In contrast to mipmaps, clipmaps keep only those data in memory which are relevant for rendering the current frame, and they use caching techniques to reload and update these data. This reduces the amount of (video) memory occupied by texture data and also allows handling textures which would by far exceed the limits of video or main memory. The clipmap by Tanner et al. [32] relies on special hardware to update the texels in video memory when the viewer's eye point is moved. Modern GPUs allow implementing clipmaps by using texture tiles and accessing them in fragment shaders, e.g., by means of texture arrays. Our implementation uses a Flexible Clipmap [13], which is constructed as follows.

At level l = 0, which corresponds to the finest resolution, the original virtual texture is partitioned into smaller tiles of n × m texels (tile size). Like in a mipmap, each 2 × 2 neighboring texels at level l are combined in a certain way into a single texel at the next coarser level l + 1, which implies that 2 × 2 neighboring tiles at level l correspond to one tile of the same tile size at level l + 1. With color textures, for instance, the combination may simply be an averaging operation on the values of the four texels, but the operation depends on the kind of texture. This process is repeated until the original texture is completely covered by a single tile of n × m texels at the least detailed level l = L − 1, which can be used to derive an ordinary mipmap. In the following we use the term "clipmap" to refer only to these lower L levels of a complete, tile-based clipmap, but we stick with the terminology as used by Tanner et al. [32].

A clip center, which depends on the current location and viewing direction of the virtual scene camera, is used to determine for each level the tiles that are needed in the current frame. This group of neighboring tiles is called the active area and is located in video memory. The clip area, formed by a larger superset of tiles, is kept in main memory, and the remaining tiles are stored in secondary memory, e.g., on hard disk. Since the lower levels of a corresponding mipmap are effectively clipped to smaller areas, this data structure is called a clipmap. Figure 1 illustrates the principle of a tile-based clipmap.

Fig. 1: Structure of a tile-based clipmap with L = 4 clip levels with active areas of at most 3 × 3 tiles (dark gray) and clip areas of at most 5 × 5 tiles (light gray).

Once the tiles have been uploaded to video memory, they can be accessed by shaders for rendering. When the clip center is relocated, i.e., the virtual camera is moved, tiles stored in video memory and main memory can be replaced by neighboring ones from main memory or secondary memory, respectively, if necessary. If the virtual camera is located, for instance, far away from the currently visible textured surface, only the coarser resolution (higher) levels are required, as texels from the lower levels would cause aliasing. Hence it is not always necessary to keep the active areas of all clip levels in video or main memory. Of course, the tile size has to be chosen carefully to ensure that the tiles themselves are manageable by the graphics hardware. More details on clipmap-specific issues can be found in [32, 7].
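To illustrate the relationship between levels, tiles and texels established by this construction, the following GLSL sketch maps normalized virtual texture coordinates to a tile index and an intra-tile texel offset at clip level l. It only demonstrates the addressing principle; the actual tile layout of the Flexible Clipmap differs (see [13]).

// Illustrative sketch only; the Flexible Clipmap's real layout differs [13].
// Maps normalized coordinates uv to the tile and intra-tile texel covering
// them at clip level l, for a virtual texture of dsmSize texels partitioned
// into tiles of tileSize (n x m) texels.
void virtualAddress(vec2 uv, int l, ivec2 dsmSize, ivec2 tileSize,
                    out ivec2 tile, out ivec2 texelInTile)
{
    ivec2 texel = ivec2(uv * vec2(dsmSize)) >> l; // texel index at level l
    tile        = texel / tileSize;               // tile covering this texel
    texelInTile = texel % tileSize;               // offset inside that tile
}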
3.2 Clipmaps for DSM Storage
Due to their relation, clipmaps and mipmaps can be created and used in very similar ways. To use a digital surface model (DSM) for rendering, our approach stores the heightfield values in the clipmap tiles at the finest resolution (lowest) level l = 0. A texel at level l > 0 obtains as height value the maximum height value of the corresponding 2 × 2 subordinate texels at level l − 1. If we identify each texel with a bounding box defined by its height value and its grid cell in the texture, we obtain a bounding volume hierarchy (BVH) of the underlying DSM, as illustrated in figure 2. This is the same construction scheme as used with maximum mipmaps [24, 34, 8].

Fig. 2: BVH derived from a heightfield on a regular grid. Gray boxes correspond to samples at level 0. Bounding boxes on higher levels and their maximum value are highlighted by the same color.
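Because each coarser texel depends only on its 2 × 2 subordinate texels, this maximum aggregation can, for instance, be performed in a simple render-to-texture pass (cf. [34]). The following fragment shader is a minimal sketch of one such reduction step; the names are illustrative and not taken from our implementation.

#version 150
// Minimal sketch of one maximum-reduction pass: each output texel at level
// l + 1 receives the maximum of its 2 x 2 subordinate texels at level l.
uniform sampler2D finerLevel;   // heightfield tile at level l
out float maxHeight;            // written to the tile at level l + 1

void main()
{
    ivec2 dst = ivec2(gl_FragCoord.xy);      // texel in the coarser tile
    ivec2 src = dst * 2;                     // upper-left of the 2 x 2 group
    float h00 = texelFetch(finerLevel, src,               0).r;
    float h10 = texelFetch(finerLevel, src + ivec2(1, 0), 0).r;
    float h01 = texelFetch(finerLevel, src + ivec2(0, 1), 0).r;
    float h11 = texelFetch(finerLevel, src + ivec2(1, 1), 0).r;
    maxHeight = max(max(h00, h10), max(h01, h11));
}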
In the method presented by Dick et al. [8], the heightfield is split into tiles as well, but a separate maximum mipmap is created for each tile. To render vast DSMs, this approach may require either many tiles, and thus mipmaps, to be present in video memory, or additional rendering passes, especially if the heightfield is shallow and there is little occlusion between tiles. Furthermore, tiles located far away from the viewer may contain fine spatial details, e.g., steep summits of distant mountains, which are not only imperceptible from far away but may also expose spatial aliasing artifacts due to minification caused by perspective projection. The latter aspect is the same one which motivated the development of mipmaps for texture mapping; it also applies to mesh-based rendering techniques, which therefore strive to determine an appropriate level of detail (LOD) in order to avoid rasterizing triangles that would be projected to less than one pixel in screen space.
The important difference between using a clipmap and multiple mipmaps is that in the case of a clipmap the BVH spans the entire domain at the topmost level. A proper placement of the clip center results in the selection of only those highest-resolution tiles at level l = 0 which are closest to the virtual camera and thus potentially have to be rendered in full detail. Compared to level l, the area of the heightfield covered by a tile at level l + 1 is four times larger, and the spatial resolution is halved along each direction of the grid. Thus the entire domain is spatially pre-filtered, and the level of detail of the heightfield decreases with increasing distance from the viewer. Because higher clipmap levels also correspond to larger bounding boxes, we can exploit this fact to accelerate GPU ray casting in the far range of the scene, as described in the following section.
3.3 Rendering and Accelerating Ray Casting
Given a DSM stored in a clipmap of L levels, we set the clip center simply by projecting the center of the viewport into the scene. We also ensure that all tiles in the active areas of all clip levels, or at least the highest (coarsest) ones, are stored in video memory by choosing appropriate sizes for the tiles and the active area. The axis-aligned bounding box of the entire DSM, which is associated with the topmost tile, is based in the xz-plane of a left-handed world coordinate system. It is represented by a polygonal mesh consisting of 12 triangles which serves as proxy geometry for the ray casting process. A vertex shader calculates normalized 3D texture coordinates from the vertex coordinates of the box corners, and the clipmap is positioned at the bottom of the box corresponding to the minimum height value y = Hmin of the DSM. Hmin and the maximum height value Hmax are both determined on the CPU while loading the topmost clipmap tile. By rendering the back faces of the proxy geometry we obtain each ray's exit point e, and we pass the camera position and the geometry of the bounding box in world coordinates to the fragment shader, which calculates each ray's direction d = (dx, dy, dz) and entry point s into the proxy geometry and transforms them into normalized 3D texture space. If the camera is located within the bounding box, the entry point s becomes the camera position (cf. [19]). In order to avoid faces of the proxy geometry being clipped against the far plane of the view frustum of the virtual camera, and hence exit points missing, the box is fitted into the view frustum when the camera is translated.
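As an illustration of this setup, a fragment shader can recover s with a standard ray-box slab test once the back faces have been rasterized. The following sketch uses hypothetical uniform and varying names and omits the transformation into normalized 3D texture space.

// Sketch, not our actual shader: computes the ray direction d and entry
// point s in world space from the rasterized back-face position (exit
// point e) and the camera position, using a slab test against the DSM's
// axis-aligned bounding box.
uniform vec3 camPos;            // camera position in world space
uniform vec3 boxMin, boxMax;    // bounding box of the entire DSM
in vec3 exitPoint;              // interpolated back-face position e

vec3 rayEntry(out vec3 d)
{
    d = normalize(exitPoint - camPos);
    vec3 t0   = (boxMin - camPos) / d;       // slab intersections
    vec3 t1   = (boxMax - camPos) / d;
    vec3 tMin = min(t0, t1);
    float tn  = max(max(tMin.x, tMin.y), tMin.z);
    return camPos + d * max(tn, 0.0);        // camera inside box: s = camPos
}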
The actual ray traversal is performed by projecting the ray onto a clip level dependent 2D grid. For a given clip level 0 ≤ l < L, the extents of this grid are determined by

$$(G_u(l),\, G_v(l)) = \left(\frac{W}{2^l},\, \frac{H}{2^l}\right)$$

with (W, H) being the extents of the DSM in sample points, i.e., texels. Hence, the grid at level l has the same size a single texture containing the entire DSM at mipmap level l would have. The current height py of a location p = (px, py, pz) = s + k · d on the ray is retained
and updated in world coordinates to test for intersections with the heightfield. During ray traversal we move from one intersection of the projected ray dp = (dx, dz) with a texel boundary to the next such intersection, i.e., from the projected ray's entry point enp into a grid cell directly to its exit point exp, as shown in figure 3. The only exception is the first entry point, which is the projection of s.

Fig. 3: Rays are traversed from one intersection of the projected ray with a texel boundary to the next such intersection.

We start ray casting at the coarsest (highest) clip level L − 1 of the BVH, at which the entire DSM is given in a single tile and each texel corresponds to the maximum value, and thus the bounding box, of 2^(L−1) × 2^(L−1) texels at level 0. To determine whether a ray hits a bounding box at level l, the clipmap tile containing the grid cell which belongs to the current enp and exp has to be sampled for the
associated height value h. Since the direction of the ray is needed to determine this grid cell, we store the sign bits of the components of d in the lower three bits of an integer. This bit mask is created once for each ray using bit-wise operations in the fragment shader, and it is evaluated as needed by switch statements to determine the direction of the ray, instead of duplicating the shader code of the ray casting loop for each of the overall eight possible branches.
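A minimal GLSL sketch of this bit mask (the function name is ours, for illustration):

// Pack the sign bits of d into bits 0..2; created once per ray. A switch
// over the resulting value (0..7) then selects the traversal case without
// duplicating the ray casting loop.
int makeDirectionMask(vec3 d)
{
    return (d.x < 0.0 ? 1 : 0) | (d.y < 0.0 ? 2 : 0) | (d.z < 0.0 ? 4 : 0);
}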
When moving along the ray from point en to point ex, we hit the box surface if the ray is directed downwards (resp. upwards) and ex (resp. en) lies below the top of the box (at height h). If a ray hits a bounding box B at the current level l, it may also hit a bounding box contained in B at a lower level of the BVH. Therefore the ray casting process is repeated at the next lower level l' = l − 1 from the current position en of the ray, but only if it is possible and reasonable to proceed, as described in section 3.4. Otherwise the lowest possible level l = lmin has been reached, and the exact intersection i on the bounding box surface is calculated by

$$i = \begin{cases} en & \text{if } d_y \ge 0 \\[4pt] en + d \cdot \max\left(\dfrac{h - en_y}{d_y},\, 0\right) & \text{if } d_y < 0 \end{cases}$$
If a ray does not intersect a bounding box B at level l, it cannot intersect any of the bounding boxes contained in B at any lower level either, and we therefore advance along the ray to ex, which becomes the entry point en of the next cell. Compared to a ray traversal performed solely on level 0, only one sample instead of 2^l × 2^l samples has to be tested for intersection, which results in a significant speed-up of the process (cf. [34], [8]).

If a ray hits a bounding box B at some level l > 0, it does not necessarily hit any of the bounding boxes contained in B at level l − 1. This cannot be determined without descending to the lower level. In order to avoid using the smaller step size over longer distances when it is not really necessary, we move up again to level l if we detect that the ray does not hit any bounding box at level l − 1 (cf. [34, 8]). These three different cases for the intersection of a ray with a bounding box are illustrated in figure 4.

Fig. 4: Intersection of a ray with a heightfield. The green ray hits the left red box, but none of the black boxes contained in it.

The ray casting process is terminated if either a valid intersection point i on a bounding box has been found, the ray leaves the domain of the DSM, or the maximal number of ray casting steps exceeds 2 · max(n, m), with n, m being the tile size in texels. In the latter two cases, the fragment from which the ray originates is discarded by the shader.
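For clarity, the traversal logic of this section can be summarized by the following strongly simplified GLSL sketch. The helper functions are hypothetical placeholders for the grid stepping and clipmap sampling described above, the level re-ascent is reduced to one step per cell, and the LOD-based choice of lmin is deferred to section 3.4.

// Hypothetical helpers standing in for sections 3.1 - 3.3:
vec3  nextCellExit(vec3 en, vec3 d, int l);  // next texel-boundary crossing
float cellTop(vec3 en, vec3 ex, int l);      // max height h of the cell's box
bool  insideDomain(vec3 p);                  // is p still over the DSM?

// Much simplified sketch of the traversal loop.
vec3 castRay(vec3 s, vec3 d, int L, int lMin, int maxSteps, out bool hit)
{
    int  l  = L - 1;                      // start at the coarsest level
    vec3 en = s;                          // entry point of the current cell
    for (int i = 0; i < maxSteps; ++i) {  // maxSteps = 2 * max(n, m)
        vec3  ex = nextCellExit(en, d, l);
        float h  = cellTop(en, ex, l);
        bool below = (d.y < 0.0) ? (ex.y <= h) : (en.y <= h);
        if (below) {                      // ray hits this bounding box
            if (l > lMin) { l = l - 1; continue; }   // descend and retest
            hit = true;                   // exact hit on the box surface:
            return (d.y >= 0.0) ? en : en + d * max((h - en.y) / d.y, 0.0);
        }
        en = ex;                          // no hit: advance to the next cell
        if (l < L - 1) l = l + 1;         // try to re-ascend (cf. [34, 8])
        if (!insideDomain(en)) break;     // ray left the DSM
    }
    hit = false;                          // no intersection: discard fragment
    return vec3(0.0);
}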
3.4 LOD-determined Ray Termination
To decide whether we can terminate ray casting at the current level, we check the following two conditions. First, we determine at each intersection with a bounding box the highest resolution available, i.e., the lowest clip level llow of a tile which covers the corresponding area of the DSM and is present in video memory. The clipmap tiles from the active areas of all clip levels are stored in a texture array which is accessed by the fragment shader. The Flexible Clipmap uses a certain tile layout and an additional texture, the tile map [7], to find llow and the index in the texture array where the corresponding tile has been stored during its upload into video memory (see [13] for details). The tile map covers the entire domain of the DSM as well, but each of its texels corresponds to one tile of n × m texels at the lowest level l = 0. Each texel stores the lowest clip level of the tile which covers the corresponding area of the DSM and is currently present in video memory. For instance, given a tile size of n = m = 512 texels, a tile map of 512 × 512 texels holds information about the clip levels of 512² × 512² heightfield samples. When tiles at and above level l ≥ 0 are available in video memory, the tile map contains a square region of 2^l × 2^l texels with value l (cf. [32]). The tile map is created on the CPU whenever the cache for the clipmap tiles is updated due to relocations of the clip center, and tiles are uploaded in top-down order to ensure that at least the highest levels are present if secondary caching structures cause a delay, e.g., when tiles have to be loaded from hard disk. Thus, by transforming the hit point i on the bounding box surface to normalized texture coordinates, the shader can determine llow by a single texel-precise texture lookup in the tile map.
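In GLSL, this lookup amounts to a single texelFetch; the sampler and function names below are illustrative:

// Texel-precise lookup of l_low in the tile map; uv are the normalized
// texture coordinates of the hit point i over the entire DSM domain.
uniform sampler2D tileMap;   // one texel per level-0 tile, stores l_low

float lookupLowestLevel(vec2 uv)
{
    ivec2 size = textureSize(tileMap, 0);
    ivec2 t    = clamp(ivec2(uv * vec2(size)), ivec2(0), size - 1);
    return texelFetch(tileMap, t, 0).r;
}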
Second, the optimal clip level lopt at the current hit point i = (u, hgrid, v) is determined by the minification of the corresponding box at level l = 0 in screen space (cf. [12]). We project the four corners of the cell's box, κ = (⌊u⌋ · Rx, ⌊hgrid⌋ · Ry, ⌊v⌋ · Rz), λ = κ + (Rx, 0, 0), μ = κ + (0, 0, Rz) and ν = κ + (0, Ry, 0), from world space into normalized screen space using the model, view and projection matrices combined in M, followed by perspective division, to obtain the vectors a, b, c and f. Here Rx, Ry, Rz are the numbers of world space units per heightfield sample along the respective directions. Then we calculate the areas A1, A2 and A3 of the projected faces of a box in screen space:

$$p = b - a, \quad q = c - a, \quad r = f - a$$

$$A_1 = |p \times q| = |p_x \cdot q_y - p_y \cdot q_x|, \quad A_2 = |p \times r| = |p_x \cdot r_y - p_y \cdot r_x|, \quad A_3 = |q \times r| = |q_x \cdot r_y - q_y \cdot r_x|$$

We want the largest face of one box in screen space, A = max(A1, A2, A3), to correspond to one texel of a tile at level lopt in texture space, which itself has an area of P = 1/(n · m). Hence 2^lopt = P/A and lopt = −log2(A · n · m).

Instead of descending to a full-resolution mipmap level, which may cause aliasing, we can now terminate ray casting already at level lmin = max(llow, lopt). The two different LODs llow and lopt are visualized in figure 5, where each level is coded by a different color.

(a) llow (b) lopt

Fig. 5: The two different LODs llow and lopt are used to terminate the ray traversal and to avoid aliasing.
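A compact GLSL version of the lopt computation might look as follows; it is a sketch under the notation above, with M combining the model, view and projection matrices, R = (Rx, Ry, Rz), and n, m the tile size:

// Screen-space minification test for l_opt (illustrative sketch).
float optimalLevel(vec3 i, mat4 M, vec3 R, float n, float m)
{
    vec3 kappa = floor(i) * R;                            // corner kappa
    vec4 pa = M * vec4(kappa, 1.0);                       // kappa
    vec4 pb = M * vec4(kappa + vec3(R.x, 0.0, 0.0), 1.0); // lambda
    vec4 pc = M * vec4(kappa + vec3(0.0, 0.0, R.z), 1.0); // mu
    vec4 pf = M * vec4(kappa + vec3(0.0, R.y, 0.0), 1.0); // nu
    vec2 a = pa.xy / pa.w;                                // perspective
    vec2 b = pb.xy / pb.w;                                //   division
    vec2 c = pc.xy / pc.w;
    vec2 f = pf.xy / pf.w;
    vec2 p = b - a, q = c - a, r = f - a;
    float A1 = abs(p.x * q.y - p.y * q.x);                // projected
    float A2 = abs(p.x * r.y - p.y * r.x);                //   face areas
    float A3 = abs(q.x * r.y - q.y * r.x);
    float A  = max(A1, max(A2, A3));
    return -log2(A * n * m);                              // l_opt
}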
3.5 Sampling Color Textures

In our implementation, each clipmap tile can consist of several different texture layers, which are handled identically and differ only in the stored data and their texel aggregation scheme. For each tile we provide an additional layer for a registered color texture used to texture the DSM. This color texture layer is uploaded along with the heightfield layer and accessed in the fragment shader via a second texture array. As long as they cover the same area in world space, the different layers of the tiles do not even need to have the same resolution. However, we have not yet implemented this, and therefore one heightfield sample corresponds to one color sample. In general, to avoid aliasing when sampling the color texture layer, we would have to determine the ideal LOD ltex at the final hit point i in the heightfield separately and transform it to the corresponding tile which holds the color texture layer. This LOD ltex can be calculated in the same way as lopt during ray casting (see section 3.4), but in the case of a 1:1 relation of heightfield and color samples we can directly use lopt and the texture coordinate for the heightfield layer obtained during ray casting to sample the color texture. The final fragment color is obtained by linear interpolation between the linearly interpolated color values from the two LODs adjacent to ltex (trilinear interpolation).
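Sketched in GLSL, with sampleColorLod() as a hypothetical helper that performs the bilinear fetch from the color tile of one clip level:

// Hypothetical helper: bilinear fetch from the color layer of one level.
vec4 sampleColorLod(vec2 uv, int level);

// Trilinear filtering across the two color LODs adjacent to lod; with a
// 1:1 mapping of heightfield and color samples, lod can simply be l_opt
// (clamping of l0 + 1 to the coarsest level is omitted here).
vec4 sampleColorTrilinear(vec2 uv, float lod)
{
    int  l0 = int(floor(lod));
    vec4 c0 = sampleColorLod(uv, l0);      // bilinear fetch at level l0
    vec4 c1 = sampleColorLod(uv, l0 + 1);  // bilinear fetch at level l0 + 1
    return mix(c0, c1, fract(lod));        // blend: trilinear interpolation
}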
3.6 Refinement of Block-sampled Heightfield Reconstruction

As pointed out by Oh et al. [24], point-sampled DSMs and their treatment as boxes result in blocky images which, viewed from close up, are reminiscent of models built of bricks (see figure 6a). Because this effect may be unwanted in most applications, we also implemented two refinement methods to obtain smooth surfaces. Both refinement methods are applied after the intersection i on the bounding box surface has been determined as described in section 3.3.

(a) none (b) linear (c) bicubic

Fig. 6: Demonstration of the improvement in surface quality achieved by different refinement methods.
The first method is the one presented by Oh et al. [24] and relies on linear interpolation between two samples of the bilinearly interpolated heightfield, which are taken at a distance of half a cell from i in forward and backward direction along the ray, respectively. This method works quite well and hardly slows down the overall performance on modern GPUs, but in our implementation some defects remain on surfaces with steep slopes, presumably caused by numerical inaccuracies, as shown in figure 6b. Despite these small defects, which are barely noticeable during animations or from farther viewing distances, the surfaces look much smoother.
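A possible GLSL formulation of this refinement, as our illustrative reading of [24] (heightBilinear() is a hypothetical bilinearly filtered heightfield fetch, and a non-vertical ray is assumed):

// Hypothetical helper: bilinearly filtered heightfield sample.
float heightBilinear(vec2 uv);

// Estimate the intersection with the bilinearly filtered surface from two
// samples taken half a cell before and behind i along the ray.
vec3 refineLinear(vec3 i, vec3 d, float cellSize)
{
    vec3  offs = d * (0.5 * cellSize / length(d.xz));    // half cell along ray
    vec3  p0 = i - offs, p1 = i + offs;
    float e0 = p0.y - heightBilinear(p0.xz);             // signed height error
    float e1 = p1.y - heightBilinear(p1.xz);             //   at both samples
    float w  = clamp(e0 / max(e0 - e1, 1e-6), 0.0, 1.0); // zero crossing
    return mix(p0, p1, w);
}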
Our second method uses Hermite bicubic surfaces to improve the reconstruction of the heightfield. Let (u, v) denote the projection of i onto the grid of the heightfield level at which ray casting has been terminated. We interpret the junctions at the four corners of the grid cell containing (u, v), together with its eight neighbors, as the corners of a bicubic surface patch. The four junctions are given by

α = (⌊u⌋, min(SW, S, C, W), ⌊v⌋)
β = (⌊u⌋ + 1, min(S, SE, E, C), ⌊v⌋)
γ = (⌊u⌋, min(W, C, N, NW), ⌊v⌋ + 1)
δ = (⌊u⌋ + 1, min(C, E, NE, N), ⌊v⌋ + 1)

with C as the height value of the cell containing (u, v) and SW, S, SE, E, NE, N, NW, W as the height values of the neighboring cells, starting at the lower left cell adjacent to α and enumerating them in counterclockwise order (see figure 7).

Fig. 7: Construction scheme for a Hermite bicubic patch from 3 × 3 heightfield samples surrounding the projection of intersection point i on the bounding box.
Each patch is parametrized along the grid axes by (s, t) ∈ [0, 1], and the height h(s, t) on the surface patch is given by

$$h(s,t) = \begin{pmatrix} s^3 & s^2 & s & 1 \end{pmatrix} \cdot H \cdot G \cdot H^T \cdot \begin{pmatrix} t^3 & t^2 & t & 1 \end{pmatrix}^T$$

with

$$H = \begin{pmatrix} 2 & -2 & 1 & 1 \\ -3 & 3 & -2 & -1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \qquad G = \begin{pmatrix} \alpha_y & \beta_y & \frac{\partial \alpha_y}{\partial v} & \frac{\partial \beta_y}{\partial v} \\ \gamma_y & \delta_y & \frac{\partial \gamma_y}{\partial v} & \frac{\partial \delta_y}{\partial v} \\ \frac{\partial \alpha_y}{\partial u} & \frac{\partial \beta_y}{\partial u} & \frac{\partial^2 \alpha_y}{\partial u \, \partial v} & \frac{\partial^2 \beta_y}{\partial u \, \partial v} \\ \frac{\partial \gamma_y}{\partial u} & \frac{\partial \delta_y}{\partial u} & \frac{\partial^2 \gamma_y}{\partial u \, \partial v} & \frac{\partial^2 \delta_y}{\partial u \, \partial v} \end{pmatrix}$$
(cf. [14]). The partial derivatives, which define the tangent planes on the patch, are approximated by forward and backward differences, respectively, making the following simplifications for the first-order derivatives:

$$\frac{\partial \alpha_y}{\partial u} = \frac{\partial \gamma_y}{\partial u} \approx C - W, \qquad \frac{\partial \beta_y}{\partial u} = \frac{\partial \delta_y}{\partial u} \approx E - C,$$

$$\frac{\partial \alpha_y}{\partial v} = \frac{\partial \beta_y}{\partial v} \approx C - S, \qquad \frac{\partial \gamma_y}{\partial v} = \frac{\partial \delta_y}{\partial v} \approx N - C$$
Although the matrix G is constant for each grid cell, i.e., for each texel of the clipmap storing the heightfield, we calculate it directly in the fragment shader as needed.
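For illustration, evaluating h(s, t) in the fragment shader may then look as in the following sketch (not our actual shader code). The mixed second-order (twist) terms of G are set to zero here, an assumption consistent with the unscaled difference tangents above.

// Sketch: evaluate the Hermite bicubic patch height h(s,t) from the corner
// heights alphaY..deltaY and the neighbor heights C, W, S, E, N.
float hermiteHeight(float s, float t,
                    float alphaY, float betaY, float gammaY, float deltaY,
                    float C, float W, float S, float E, float N)
{
    const mat4 H = mat4(vec4( 2.0, -3.0, 0.0, 1.0),  // Hermite basis matrix,
                        vec4(-2.0,  3.0, 0.0, 0.0),  // given column by column
                        vec4( 1.0, -2.0, 1.0, 0.0),
                        vec4( 1.0, -1.0, 0.0, 0.0));
    mat4 G = mat4(vec4(alphaY, gammaY, C - W, C - W), // geometry matrix G,
                  vec4(betaY,  deltaY, E - C, E - C), // columns of the matrix
                  vec4(C - S,  N - C,  0.0,   0.0),   // printed above, with
                  vec4(C - S,  N - C,  0.0,   0.0));  // twist terms set to 0
    vec4 sv = vec4(s * s * s, s * s, s, 1.0);
    vec4 tv = vec4(t * t * t, t * t, t, 1.0);
    return dot(sv * H * G * transpose(H), tv);  // h = S * H * G * H^T * T^T
}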
The pair of parameters (s, t) which corresponds to an intersection with the bicubic patch instead of the bounding box is determined by a second ray casting. Starting at i on the bounding box surface, the ray p = i + k · d is advanced with a fixed step width until it either hits the bicubic patch, i.e., py ≤ h(s, t), or leaves the domain of the box without an intersection. In the latter case, we treat i as an entry point s into the proxy geometry and proceed with the accelerated ray casting process described in section 3 from the current level. We found a subdivision into 16 steps for traversing the bounding box of a cell to be completely sufficient, independent of the clip level l. Fewer subdivision steps expose defects due to missed intersections, whereas increasing the number of subdivision steps only reduces frame rates without further improving the reconstruction of the surface.
Besides their simplicity and the possibility of calculating all the relevant information in the fragment shader, we decided to use Hermite bicubic patches because we wanted to ensure that the surface remains inside the bounding boxes of the BVH. By constructing the patches as described above, we can guarantee that they stay completely inside the bounding boxes, as we control the defining tangent planes. The direct use of forward and backward differences in our implementation avoids any scaling of the tangents and therefore leads to the desired C¹ continuity between neighboring patches, because their tangents have the same direction and magnitude (cf. [14]). The most severe drawback of this method is its high computational cost, although we may still achieve interactive frame rates (see section 4.2). Furthermore, as this method ensures that the height of each patch is less than or equal to the height of its bounding box, and the tangents are not scaled, isolated peaks in the heightfield become clearly flattened, as can be seen in figure 6c.
However, both refinement methods presented in this section rely on interpolation of point-sampled data on a regular grid and only serve to make the resulting renderings visually more appealing. Besides, even though it might appear sufficient to apply refinement only in cases where the viewer is close to a highly detailed area in which the block-sampled nature of the data becomes apparent, we refine the surface at all discrete LODs, because the transition between large distant boxes and smooth surfaces is rather disturbing during animations. In addition, the lighting conditions on smooth surfaces and on blocks differ due to distinct surface normals.
name          extent [km]            W × H             L   scale   size DSM   size color texture   time [min]
City 1        1.4 × 1.0              5600 × 4000       5   1.0     133 MB     99 MB                0:54
City 2        20.9 × 26.3            83600 × 105200    9   1.0     31.6 GB    –                    3:34
ETOPO1        ≈ 40075.0 × 19970.0    21600 × 10800     7   10.0    1.3 GB     –                    9:53
Blue Marble   ≈ 40075.0 × 19970.0    86400 × 43200     9   10.0    19.2 GB    14.4 GB              13:10

Table 1: Properties of the different data sets used to evaluate performance. L denotes the total number of clip levels created, and W × H is the grid size at level 0, i.e., the size a single texture would have. Column time contains the durations of the virtual camera flights of our evaluation in minutes.
4 Performance Results and Discussion

The implementation of our technique relies on OpenGL and GLSL 1.50 shaders, and we demonstrate its performance by means of renderings of the four data sets listed in table 1. The data set City 2 was acquired by photogrammetric methods from aerial images. City 1 depicts a small area within City 2 for which a color texture derived from orthographic aerial images is available. The data sets ETOPO1 [2] and Blue Marble [21] depict the entire earth and are both derived in large parts from SRTM data [22]; ETOPO1 additionally contains bathymetric data, whereas Blue Marble possesses a color texture derived from satellite images.

When sampled in the fragment shader, the height values are scaled by the factors given in column scale in order to avoid flattened surfaces. Shallow surfaces do not challenge our ray caster, because fewer mutual occlusions lead to fewer level changes in the BVH during ray traversal. Renderings of three data sets are shown in figure 8.

4.1 Evaluation Setup and Results

We used tile sizes of 512 × 512 texels, active area sizes of 5 × 5 tiles and clip area sizes of 7 × 7 tiles for all data sets in our tests. The near and far planes of the virtual camera were set to 1.0 and 2000.0 units, respectively. Heightfield layers consist of single-channel 32-bit floating point textures, and color texture layers consist of 24-bit RGB textures. The results were recorded during virtual camera flights along fixed paths over the heightfields on a desktop computer with an Intel i7 860 CPU at 2.8 GHz, 6 GB RAM, an NVIDIA GeForce GTX 470 graphics adapter with 1280 MB dedicated VRAM and Windows 7 (system A). To make our results comparable to those reported in [8], we additionally ran the same tests on a second desktop computer (system B) with a hardware configuration more similar to theirs (Intel Q6600 CPU at 2.4 GHz, 4 GB RAM, NVIDIA GeForce GTX 285 with 1024 MB dedicated VRAM, Windows 7). Table 2 shows the results for different screen resolutions on system A and system B in terms of frames per second (fps). The frame rates take into account the delays caused by updating the tile caches in main memory and video memory as described in section 3.1. The times for rendering the given numbers of frames are listed in column time of table 1.

data set      resolution [pixel]   frames    min. [fps]   avg. [fps]
City 1        1024 × 768           5450      6.2          100.9
              1280 × 1024          3464      5.8          64.1
              1920 × 1080          2217      5.1          41.0
City 2        1024 × 768           25624     5.6          119.7
              1280 × 1024          16700     5.6          78.0
              1920 × 1080          11189     5.1          52.3
ETOPO1        1024 × 768           105747    6.3          178.5
              1280 × 1024          66869     5.9          112.9
              1920 × 1080          42907     4.9          72.4
Blue Marble   1024 × 768           121844    3.3          154.2
              1280 × 1024          75721     4.0          95.8
              1920 × 1080          50028     1.8          63.3

(a) System A

data set      resolution [pixel]   frames    min. [fps]   avg. [fps]
City 1        1024 × 768           3848      5.1          71.2
              1280 × 1024          2450      4.5          45.3
              1920 × 1080          1574      5.1          29.1
City 2        1024 × 768           18174     3.6          84.9
              1280 × 1024          12327     3.6          57.6
              1920 × 1080          8290      3.3          38.7
ETOPO1        1024 × 768           75790     5.2          127.9
              1280 × 1024          48027     5.2          81.1
              1920 × 1080          31134     4.7          52.5
Blue Marble   1024 × 768           95285     2.8          120.6
              1280 × 1024          63409     2.5          80.3
              1920 × 1080          43314     1.0          54.8

(b) System B

Table 2: Performance results of our rendering technique.
4.2 Performance with Surface Refinement
All values given in table 2 were obtained without any of the surface refinement methods described in section 3.6. The impact on rendering speed and the relative loss in performance when using surface refinement in our implementation are shown in table 3. These data were acquired from another evaluation of the same camera flight through the City 2 data set on system A and system B, because this data set has high spatial frequencies in the rendered regions and is the most challenging one for our ray caster.
(a) City 2 (b) ETOPO1 (c) Blue Marble

Fig. 8: Example renderings of the data sets which we used in our performance evaluations. Color textures are only available for City 1 and Blue Marble; ETOPO1 was rendered using a pseudo-topographic color map.
method    1024 × 768      1280 × 1024     1920 × 1080
linear    96.2 (−19.6%)   62.8 (−19.5%)   42.0 (−19.7%)
bicubic   36.8 (−69.3%)   24.3 (−68.8%)   16.6 (−68.3%)

(a) System A

method    1024 × 768      1280 × 1024     1920 × 1080
linear    75.4 (−12.4%)   49.8 (−13.5%)   33.7 (−12.9%)
bicubic   36.4 (−57.1%)   17.4 (−69.8%)   11.9 (−69.3%)

(b) System B

Table 3: Impact of the surface refinement methods on performance in terms of average frames per second for the City 2 data set, and the relative loss compared to unrefined rendering.
4.3 Discussion

The results in table 2 show that, in accordance with the results of Dick et al. [8], very large DSMs can be rendered in real time by using only ray casting and acceleration data structures. Although the hybrid approach of Dick et al. [9] renders faster, it appears to be less flexible, because it requires selecting representative tiles from the data set and views of the scene during its training phase.

As expected, table 3 shows that the loss in performance is much bigger with bicubic surface refinement than with the linear method, but even at the highest resolution we still achieve interactive frame rates. The linear method may expose some defects, but offers a good compromise between quality and speed at higher resolutions. Besides, the refinement of the reconstructed surface only pays off for coarse-resolution DSMs or low grid densities, where the block structure becomes apparent. The differences in the frame rates between the two city data sets and the two earth data sets result from these different grid densities.
5 Conclusions and Future Work

In this paper we have shown that by combining clipmaps and ray casting, very large DSMs can be rendered at real-time frame rates in a single rendering pass. Our approach eliminates aliasing caused by texture sampling or spatial sampling. The same LOD selection method is used to avoid unnecessary ray casting steps in regions distant from the viewer. The size of the rendered DSMs is mainly limited by the amount of secondary memory available. We also used surface refinement based on Hermite bicubic patches to improve the renderings of point-sampled data and still achieved interactive frame rates at high screen resolutions. However, assigning color values to reconstructed surfaces by using orthographic photo textures suffers from the problem that such textures do not contain information about surfaces oriented obliquely to the ground plane. This becomes especially apparent for our City 1 data set if we lower the camera to street level, where the facades of buildings are missing. In the case of untextured DSMs, the transition from one LOD to another in the heightfield layer can be perceived as disturbing, and therefore a method for smooth transitions between different LODs in this layer is needed. Besides, it would be desirable to have direct comparisons of the performance and rendering quality of our implementation with rasterization-based techniques and with CPU ray casting implementations.
Acknowledgements Our work has been conducted within the project AVIGLE, which is part of the Hightech.NRW initiative funded by the Ministry of Innovation, Science and Research of the German State of North Rhine-Westphalia. AVIGLE is a cooperation of several academic and industrial partners, and we thank all partners for their work and contributions to the project, with special thanks to Aerowest GmbH, Dortmund, Germany, for providing us with data for our City data sets. We would further like to thank the anonymous reviewers for their valuable advice and all those who were involved in providing the original data for our data sets ETOPO1 and Blue Marble.
References
1. Akenine-Möller, T., Haines, E., Hoffman, N.: Real-Time Rendering, 3rd edn. A K Peters, Ltd. (2008)
2. Amante, C., Eakins, B.W.: ETOPO1 1 Arc-Minute Global Relief Model: Procedures, Data Sources and Analysis. NOAA Technical Memorandum NESDIS NGDC-24, 19 pp. (2009)
3. Asirvatham, A., Hoppe, H.: GPU Gems 2, chap. Terrain Rendering Using GPU-Based Geometry Clipmaps. Addison-Wesley Longman (2005)
4. Blinn, J.F.: Simulation of Wrinkled Surfaces. In: SIGGRAPH '78: Proceedings of the 5th Annual Conference on Computer Graphics and Interactive Techniques, pp. 286–292. ACM (1978)
5. Clasen, M., Hege, H.C.: Terrain Rendering using Spherical Clipmaps. In: EuroVis06, Joint Eurographics/IEEE VGTC Symposium on Visualization, pp. 91–98. Eurographics Association (2006)
6. Cook, R.L.: Shade Trees. In: SIGGRAPH '84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, pp. 223–231. ACM (1984)
7. Crawfis, R., Noble, E., Ford, M., Kuck, F., Wagner, E.: Clipmapping on the GPU. Tech. rep., Ohio State University, Columbus, OH, USA (2007)
8. Dick, C., Krüger, J., Westermann, R.: GPU Ray-Casting for Scalable Terrain Rendering. In: Proceedings of Eurographics 2009, Areas Papers, pp. 43–50 (2009)
9. Dick, C., Krüger, J., Westermann, R.: GPU-Aware Hybrid Terrain Rendering. In: Proceedings of IADIS Computer Graphics, Visualization, Computer Vision and Image Processing 2010, pp. 3–10 (2010)
10. Dummer, J.: Cone Step Mapping: An Iterative Ray-Heightfield Intersection Algorithm. http://www.lonesock.net/files/ConeStepMapping.pdf (2006)
11. Ephanov, A., Coleman, C.: Virtual Texture: A Large Area Raster Resource for the GPU. In: Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC) 2006, pp. 645–656 (2006)
12. Ewins, J.P., Waller, M.D., White, M., Lister, P.F.: MIP-Map Level Selection for Texture Mapping. IEEE Transactions on Visualization and Computer Graphics 4(4), 317–329 (1998)
13. Feldmann, D., Steinicke, F., Hinrichs, K.: Flexible Clipmaps for Managing Growing Textures. In: Proceedings of the International Conference on Computer Graphics Theory and Applications (GRAPP) (2011)
14. Foley, J.D., van Dam, A., Feiner, S.K., Hughes, J.F.: Computer Graphics: Principles and Practice, 2nd edn. in C. Addison-Wesley (1995)
15. Google Inc.: Google Earth. http://earth.google.com/ (2005)
16. Kaneko, T., Takahei, T., Inami, M., Kawakami, N., Yanagida, Y., Maeda, T., Tachi, S.: Detailed Shape Representation with Parallax Mapping. In: Proceedings of ICAT 2001, pp. 205–208 (2001)
17. Li, Z., Li, H., Zeng, A., Wang, L., Wang, Y.: Real-Time Visualization of Virtual Huge Texture. In: ICDIP '09: Proceedings of the International Conference on Digital Image Processing, pp. 132–136. IEEE Computer Society (2009)
18. Losasso, F., Hoppe, H.: Geometry Clipmaps: Terrain Rendering Using Nested Regular Grids. ACM Transactions on Graphics (TOG) (2004)
19. Microsoft: DirectX SDK Documentation: RaycastTerrain Sample. http://msdn.microsoft.com/en-us/library/ee416425(v=vs.85).aspx (2008)
20. Mittring, M., Crytek GmbH: Advanced Virtual Texture Topics. In: SIGGRAPH '08: ACM SIGGRAPH 2008 Classes, pp. 23–51. ACM (2008)
21. NASA: Visible Earth: Earth, The Blue Marble. http://visibleearth.nasa.gov/view.php?id=54388 (1997)
22. NASA: Shuttle Radar Topography Mission. http://www2.jpl.nasa.gov/srtm/ (2000)
23. NASA: World Wind. http://worldwind.arc.nasa.gov/ (2004). http://www.goworldwind.org
24. Oh, K., Ki, H., Lee, C.H.: Pyramidal Displacement Mapping: A GPU-based Artifacts-free Ray Tracing through an Image Pyramid. In: VRST '06: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, pp. 75–82. ACM (2006)
25. Oliveira, M.M., Bishop, G., McAllister, D.: Relief Texture Mapping. In: SIGGRAPH '00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 359–368. ACM Press/Addison-Wesley Publishing Co. (2000)
26. Policarpo, F., Oliveira, M.M.: GPU Gems 3, chap. Relaxed Cone Stepping for Relief Mapping. Addison-Wesley Professional (2007)
27. Policarpo, F., Oliveira, M.M., Comba, J.L.D.: Real-time Relief Mapping on Arbitrary Polygonal Surfaces. In: Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games, I3D '05, pp. 155–162. ACM (2005)
28. Qu, H., Qiu, F., Zhang, N., Kaufman, A., Wan, M.: Ray Tracing Height Fields. In: Proceedings of Computer Graphics International, pp. 202–207 (2003)
29. Seoane, A., Taibo, J., Hernández, L.: Hardware-Independent Clipmapping. In: Journal of WSCG 2007, pp. 177–183 (2007)
30. Szirmay-Kalos, L., Umenhoffer, T.: Displacement Mapping on the GPU: State of the Art (2006)
31. Taibo, J., Seoane, A., Hernández, L.: Dynamic Virtual Textures. In: Journal of WSCG 2009, pp. 25–32. Eurographics Association (2009)
32. Tanner, C.C., Migdal, C.J., Jones, M.T.: The Clipmap: A Virtual Mipmap. In: SIGGRAPH '98: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 151–158. ACM (1998)
33. Tatarchuk, N.: Dynamic Parallax Occlusion Mapping with Approximate Soft Shadows. In: SIGGRAPH '06: ACM SIGGRAPH 2006 Courses, pp. 63–69. ACM (2006)
34. Tevs, A., Ihrke, I., Seidel, H.P.: Maximum Mipmaps for Fast, Accurate, and Scalable Dynamic Height Field Rendering. In: I3D '08: Proceedings of the 2008 Symposium on Interactive 3D Graphics and Games, pp. 183–190. ACM (2008)
35. Williams, L.: Pyramidal Parametrics. In: SIGGRAPH '83: Proceedings of the 10th Annual Conference on Computer Graphics and Interactive Techniques, pp. 1–11. ACM (1983)