A New Object-Order Ray-Casting Algorithm

Benjamin Mora, Jean-Pierre Jessel, René Caubet
Institut de Recherche en Informatique de Toulouse (IRIT), Université Paul Sabatier, 31062 Toulouse, France
{mora, jessel}@irit.fr

Figure 1: Two-pass volume rendering of a 256³ bonsai dataset, visualized at approximately 3 frames per second.

ABSTRACT

Many direct volume rendering algorithms have been proposed during the last decade to render 256³ voxels interactively. However, many limitations are inherent to all of them, such as low-quality images, a small viewport size or a fixed classification. Interactive high-quality algorithms thus remain a challenge today. We introduce here an efficient and accurate technique called object-order ray-casting that can achieve up to 10 fps on current workstations. As in usual ray-casting, colors and opacities are evenly sampled along the ray, but now within a new object-order algorithm. It thus combines the main advantages of both worlds in terms of speed and quality. We also describe an efficient hidden volume removal technique to compensate for the loss of early ray termination.

CR Categories: I.3.3 [Computer Graphics]: Picture/Image Generation; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism, Raytracing, Visible Line/Surface Algorithms.

Keywords: Volume Rendering, Scientific Visualization, Medical Imaging, Ray Tracing.

1. INTRODUCTION

Volume visualization has been widely studied over the last decade due to the expansion of scientific devices producing such data. Many algorithms have been developed, and some of them are grouped under the category of direct volume rendering (DVR) methods, in which the whole original dataset is used for the rendering without any intermediate representation of the volume. In DVR algorithms, the interaction between rays traced from the viewpoint and the volume is studied, which allows high-quality images and great freedom of action. Several interaction models are widely used nowadays, like the maximum intensity projection (MIP) model, which returns the maximum value encountered along the ray, or the accumulation model, which integrates the signal. Here we focus only on optical models [10, 16] that are common to many volume rendering applications, even if our algorithm can easily be extended to other models. Kajiya's optical model in its low-albedo form is given by:

$$I_\lambda = \int_0^l C_\lambda(s)\,\tau(s)\,\exp\left(-\int_0^s \tau(t)\,dt\right) ds \qquad (1)$$

where $I_\lambda$ is the amount of light of wavelength $\lambda$ along the ray reaching the viewpoint. The contribution of the ray at location $s$ is given by $C_\lambda(s)$, weighted by the extinction coefficient $\tau(s)$ and by the percentage of occlusion, which depends on the opacity between the viewpoint and $s$. However, this integral cannot be evaluated in closed form, and a Riemann sum is often used to approximate it. Rays are thus usually sampled evenly with a distance $\Delta s$, and the accumulated color ($C_i$) and opacity ($\alpha_i$) are estimated with the recursive process given below (front-to-back order):

$$C_{i+1} = C_i + (1 - \alpha_i)\,\alpha_s C_s \qquad (2)$$
$$\alpha_{i+1} = \alpha_i + (1 - \alpha_i)\,\alpha_s$$
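The following minimal C++ sketch composites the samples of one ray according to recursion (2); it assumes the per-sample opacities and colors have already been produced by the reconstruction filter and the transfer function. The Sample type, the function name and the 0.995 threshold are illustrative, not taken from the paper; the early exit is the image-order early ray termination whose loss in object order is compensated in section 4 by hidden volume removal.

```cpp
#include <cstddef>

// A minimal sketch of recursion (2). One color channel is kept for brevity;
// alpha and color of each sample are assumed to come from the reconstruction
// filter and the transfer function.
struct Sample { float alpha, color; };

float compositeRay(const Sample* s, std::size_t n)
{
    float C = 0.0f, alpha = 0.0f;               // accumulated C_i and alpha_i
    for (std::size_t i = 0; i < n; ++i) {
        const float t = 1.0f - alpha;           // remaining transparency
        C     += t * s[i].alpha * s[i].color;   // C_{i+1} = C_i + (1-a_i) a_s C_s
        alpha += t * s[i].alpha;                // a_{i+1} = a_i + (1-a_i) a_s
        // Early ray termination, natural in image order only (threshold is
        // illustrative); object-order traversal cannot exploit it directly.
        if (alpha > 0.995f) break;
    }
    return C;
}
```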
The accuracy of the integral estimation obviously depends directly on the distance $\Delta s$ and on the evaluation of the values $\alpha_s$ and $C_s$. A large sampling distance can accelerate the rendering, but on the other hand it produces low-quality images. Furthermore, the sampled values $\alpha_s$ and $C_s$ have to be estimated from the discrete volume data with a reconstruction filter. The choice of the reconstruction filter is thus crucial, and nowadays only the trilinear filter and the Gaussian filter, usually considered reasonable-quality filters, can perform direct volume rendering in acceptable times, even if studies [15] have shown that a better quality can be obtained with more complex filters. A good volume rendering application should therefore provide the best compromise between quality and speed.

In this paper we intend to provide such a trade-off by using an efficient approach to ray-casting. Ray-casting usually makes trilinear interpolation easier, but its pixel-by-pixel approach (also called image-order) drastically slows down the rendering process, even if the hidden regions of the volume are not processed. In contrast, projection approaches (also called object-order) are well suited for skipping empty regions, but the filters usually associated with them are either low-quality or too complex to be interactive, and hidden volume removal is not very efficient. Our approach combines the advantages of both to produce interactive high-quality volume rendering. First, in section 2, we look at today's most widely used techniques and try to explain the strengths and weaknesses of each approach. Then, in section 3, we describe the new object-order ray-casting algorithm. Finally, we show in section 4 that efficient hidden volume removal is possible with it, before giving our results in section 5.

2. PREVIOUS WORK

Four main software algorithms have emerged in the last decade [18] and are widely used: the Shear-Warp method and hardware-assisted 3D texture mapping techniques, which are speed oriented, and the quality oriented ray-casting and splatting algorithms.

Shear-Warp [12] is currently considered the fastest software algorithm. It offers many optimizations that probably make its rendering times unbeatable. This object-order method considers the volume as a stack of 2D slices parallel to the face of the volume most nearly perpendicular to the view axis. Slices are accumulated on an intermediate image that undergoes a final resampling step to produce the final image. The intermediate image is aligned with the slices and has the same pixel density, allowing both the volume and the image to be traversed in an efficient memory order, and allowing fast projection (i.e., one voxel is projected onto one pixel). To improve quality, a bilinear interpolation is performed for every projection, with constant coefficients within a slice. Finally, an efficient pre-classified run-length encoding of the volume allows empty regions to be skipped quickly. Regarding quality, however, this algorithm has many drawbacks. First, the sampling rate on the z-axis is between 1 and 1.73 depending on the viewpoint, which is definitely not enough for the observation of thin volume structures. The pre-classification partially blurs the intermediate image, an effect amplified by the final resampling step. Finally, artifacts occur when the viewing angle is close to 45° due to the bilinear interpolation. Thus, the global quality provided by the original implementation turns out to be poor. A solution to these drawbacks is to use trilinear interpolation, post-classification and supersampling, as implemented in the VolumePro board [26]. This PCI board can render 500 million interpolated samples per second with a brute-force Shear-Warp algorithm (parallel projection), which is sufficient to render 256³ volumes at 30 frames per second. Supersampling can be computed in hardware in the z direction and in software in the x and y directions by rendering several images at different offsets. However, supersampling divides the frame rate by the number of samples per voxel, so if the sampling rate along the 3 axes is doubled to produce high-quality images, the frame rate can drop below 4 frames per second. Furthermore, applying these improvements to the original algorithm would also greatly reduce its performance. Thus, real-time high-quality volume rendering is not really possible yet with a Shear-Warp algorithm.

Another popular way to perform interactive volume rendering is to use 3D texture mapping hardware [1, 2, 4, 17, 29] by extracting and compositing 2D planes parallel to the image plane, but until recently the proposed approaches had several limitations, like binary classification or diffuse shading only. Engel et al. [3] have used the NVidia OpenGL extensions available with the new GeForce3 graphics hardware to circumvent these drawbacks. Although real-time rendering rates are possible on small volumes (…)

(…) a pixel of the last level (Level 5) of the hierarchy only represents one pixel. The updating process begins on the first HOM (i.e., the finest map) every time a pixel of the image plane becomes opaque. In this case, the HOM pixel containing the opaque pixel and its 3 nearest pixels in the map are incremented (fig. 6a). When an HOM pixel reaches its maximum value (i.e., it has been
incremented 16 times), updating starts again recursively at the next coarser level. The complexity of this process is thus only m²·log₂(m), where m is the image width. However, an opacity test must also be added for every ray treated.

The visibility process consists in determining the hidden nodes during the rendering. Because the projection of a node is also represented by a hexagon, an HOM level is associated with every octree level: the finest level such that an HOM pixel can include the entire projection of the octree node (fig. 6b). The visibility test is then performed for each node by checking whether the pixel of the corresponding occlusion map onto which the center of the node projects is equal to 16. This value means that an extended square around the pixel, made of 16 quarters of pixels (fig. 6b), is opaque.

Figure 7 shows an example of an octree traversal with efficient hidden volume removal. In this example, most of the encountered leaf nodes are located near the visible surface, though some of them lie surprisingly far beyond, because the recursive traversal of the volume allows efficient but non-optimal hidden volume removal. Another observation is the rareness of big block skipping, which is mainly due to the low accuracy of the corresponding occlusion maps.
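To make the update and test concrete, here is an illustrative C++ sketch of the occlusion maps, under assumptions the text does not fix: a 512² image, five map levels halving in resolution each time, 8-bit counters, and hypothetical names (Hom, markOpaque, isNodeHidden).

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr int kLevels = 5;

struct Hom { int size; std::vector<uint8_t> count; };   // size x size counters
std::array<Hom, kLevels> maps;                          // maps[0]: finest map

void initMaps(int imageSize = 512)
{
    for (int l = 0; l < kLevels; ++l) {
        maps[l].size = imageSize >> (l + 1);
        maps[l].count.assign(maps[l].size * maps[l].size, 0);
    }
}

// Called with the coordinates of a pixel that has just become opaque at the
// resolution of level - 1 (the image plane for level 0). The covering HOM
// pixel and its 3 nearest neighbours are incremented; a counter reaching 16
// means the extended 4x4 square of finer pixels around it is opaque, so the
// update recurses into the next coarser map.
void markOpaque(int level, int x, int y)
{
    if (level >= kLevels) return;
    Hom& m = maps[level];
    const int cx = x >> 1, cy = y >> 1;                 // covering pixel
    const int xs[2] = { cx, cx + ((x & 1) ? 1 : -1) };  // nearest neighbours
    const int ys[2] = { cy, cy + ((y & 1) ? 1 : -1) };
    for (int px : xs)
        for (int py : ys) {
            if (px < 0 || py < 0 || px >= m.size || py >= m.size) continue;
            if (++m.count[py * m.size + px] == 16)
                markOpaque(level + 1, px, py);
        }
}

// Visibility test for an octree node whose projection fits inside one pixel
// of map `level`: the node is hidden when the counter under its projected
// center (given in image coordinates) equals 16.
bool isNodeHidden(int level, int centerX, int centerY)
{
    const Hom& m = maps[level];
    return m.count[(centerY >> (level + 1)) * m.size
                 + (centerX >> (level + 1))] == 16;
}
```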

4.3 Other Optimizations

4.3.1 Fast Projection

Figure 6: Examples of an occlusion map used during the updating process (a) and the visibility test (b).

The projection of the centers of the octree nodes is a key element in our application, because it is used in the visibility process (§ 4.2) and for determining the rays intersecting a cell (§ 3.1). Here, the use of an orthogonal projection allows the projected center of a node to be computed from the projected center of its parent node by a simple 2D translation. Accordingly, eight constant 2D translation vectors are preprocessed for every level of the octree. 32-bit integer arithmetic is also used to quickly determine the subdivisions and the pixels affected by the projection, and to quickly update the occlusion maps recursively. Finally, the depth component used for fast depth cueing is computed in the same way.
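A hypothetical sketch of this incremental projection follows; the 16.16 fixed-point format and all names are assumptions, chosen to reflect the 32-bit integer arithmetic mentioned above.

```cpp
// With an orthogonal projection, the projected center of a child node is the
// projected center of its parent plus one of eight constant per-level 2D
// offsets.
struct Vec2i { int x, y; };                 // screen position, 16.16 fixed point

constexpr int kDepth = 8;                   // octree depth for a 256^3 volume
Vec2i childOffset[kDepth][8];               // preprocessed once per viewpoint

// Child centers lie half a node edge away from the parent center along each
// axis, so childOffset[level][c] sums +/- half the three projected edge
// vectors of a node of this level, according to the child index bits.
inline Vec2i projectChildCenter(const Vec2i& parent, int level, int child)
{
    return { parent.x + childOffset[level][child].x,
             parent.y + childOffset[level][child].y };
}
```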

Volume          | Size        | Rendering Mode | Cells          | Occluded Cells | Octree Nodes   | Sampled Points | Tbird 1.4 GHz | Duron 600 MHz
UNC Head (a)    | 256x256x225 | Single         | 343K (4093K)*  | 91.7%          | 907K (4711K)*  | 645K           | 6.2 fps       | 2.6 fps
UNC Head (b)    | 256x256x225 | Single         | 268K (1255K)*  | 79.7%          | 694K (1475K)*  | 453K           | 7.7 fps       | 3.4 fps
UNC Head (c)    | 256x256x225 | Double         | 617K (5120K)*  | 88%            | 1570K (6300K)* | 1077K          | 3.1 fps       | 1.4 fps
UNC Engine (d)  | 256x256x110 | Single         | 236K (1453K)*  | 83.8%          | 621K (1686K)*  | 356K           | 9.1 fps       | 4.5 fps
UNC Engine (e)  | 256x256x110 | Double         | 371K (1175K)*  | 68.5%          | 759K (1570K)*  | 564K           | 5.6 fps       | 2.7 fps
UNC Engine (f)  | 256x256x110 | Single         | 1544K (2150K)* | 28.2%          | 1822K (2750K)* | 6747K          | 1.4 fps       | 0.55 fps
UNC Brain (g)   | 256x256x167 | Single         | 226K (2434K)*  | 90.8%          | 593K (2808K)*  | 654K           | 7.7 fps       | 3.2 fps
Aneurism (h)    | 256x256x256 | Single         | 71K (104K)*    | 31.8%          | 113K (190K)*   | 223K           | 20 fps        | 10 fps

* Without hidden volume removal.

Table 1: Measurements for different renderings.

4.3.2 Classification and Shading

Classification and shading are computed at every sampled point along the ray from the volume samples and the gradient, using trilinear interpolation. Usually, classification and shading are performed either before the interpolation step (preclassification and preshading) or after it (postclassification and postshading). The first technique is fast, while the latter gives better results. In the case of preclassification and preshading, however, it is strongly recommended to use opacity-weighted color interpolation [13, 33]. Our approach is a hybrid method using postclassification but preshading, so as not to degrade speed. Here the volume gradient is precomputed, as in many algorithms [11, 12, 19], and a number indexing a set of quantized space directions is associated with every voxel. Shading is applied to every quantized direction before every rendering, and the resulting preshaded reflectance map is used during the rendering to shade the 8 vertices before the color interpolation at the sampling location. Mueller et al. [24] have shown an example of preshading combined with postclassification that exhibits bad artifacts. We want to point out that no such artifact is visible in our application, and we think that the interpretation given for this phenomenon is probably erroneous.
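As an illustration of this preshading scheme, the following sketch rebuilds a diffuse reflectance map over quantized directions once per rendering. The 8192-direction count comes from section 5, while the names and the purely Lambertian term are assumptions.

```cpp
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
constexpr int kDirections = 8192;

std::vector<Vec3>  quantizedDir(kDirections);  // fixed set of unit normals
std::vector<float> reflectance(kDirections);   // preshaded map, one per frame

// Re-shade every quantized direction before a rendering pass (light is a
// unit vector; only a diffuse term is shown here).
void shadeReflectanceMap(const Vec3& light)
{
    for (int i = 0; i < kDirections; ++i) {
        const Vec3& nrm = quantizedDir[i];
        const float d = nrm.x * light.x + nrm.y * light.y + nrm.z * light.z;
        reflectance[i] = d > 0.0f ? d : 0.0f;  // clamped Lambertian term
    }
}

// At a sampling location, the 8 cell vertices are shaded by a mere lookup
// before the color interpolation; the interpolated scalar is then classified
// through the transfer function (postclassification).
inline float vertexShade(uint16_t dirIndex) { return reflectance[dirIndex]; }
```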

4.3.3 Optimized Trilinear Reconstruction

The MMX, SSE and 3DNow! SIMD processor extensions have been used to improve memory copying and trilinear interpolation. Colors and opacities are computed as 16-bit unsigned integers. However, the code path for MMX-only processors does not allow the line equation tests to take place (cf. 3.1), so minor artifacts can occur in the image.
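A scalar C++ equivalent of this fixed-point reconstruction may look as follows; the real code packs four 16-bit channels into one SIMD register, which this version only imitates, and the vertex ordering and 8-bit weights are assumptions.

```cpp
#include <cstdint>

// Trilinear interpolation on 16-bit unsigned samples. Assumptions: v[0..7]
// holds the cell's vertex values with x varying fastest, and the weights are
// 8-bit fixed point in [0, 256].
inline uint16_t trilerp(const uint16_t v[8], int wx, int wy, int wz)
{
    // Linear interpolation in 8-bit fixed point: a + (b - a) * w / 256.
    auto lerp = [](int a, int b, int w) { return a + ((b - a) * w) / 256; };

    const int x00 = lerp(v[0], v[1], wx);   // four cell edges along x
    const int x10 = lerp(v[2], v[3], wx);
    const int x01 = lerp(v[4], v[5], wx);
    const int x11 = lerp(v[6], v[7], wx);
    const int y0  = lerp(x00, x10, wy);     // two faces along y
    const int y1  = lerp(x01, x11, wy);
    return static_cast<uint16_t>(lerp(y0, y1, wz));  // final sample
}
```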

4.3.4 Volume Interleaving

To reduce cache misses, the volume is decomposed into small blocks containing 32³ samples [11], where the bits of the coordinates are interleaved as follows:

(X_{n..0}, Y_{n..0}, Z_{n..0}) ⇒ Z_{n..5} Y_{n..5} X_{n..5} Z₄Y₄X₄ … Z₀Y₀X₀

A 32-entry table and binary operators are used to interleave the five lower bits and to generate the new memory address.
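For a 256³ volume split into 32³ blocks, the address computation could look like the following sketch; the table and function names are hypothetical.

```cpp
#include <cstdint>

static uint32_t interleave5[32];           // interleave5[v] spreads bits 0..4

// Build the table once: bit b of the coordinate moves to position 3b, so the
// three coordinates can be interleaved with two shifts and two ORs at run time.
void initInterleaveTable()
{
    for (uint32_t v = 0; v < 32; ++v) {
        uint32_t r = 0;
        for (int b = 0; b < 5; ++b)
            r |= ((v >> b) & 1u) << (3 * b);
        interleave5[v] = r;
    }
}

// (X, Y, Z) -> Z(n..5) Y(n..5) X(n..5) Z4 Y4 X4 ... Z0 Y0 X0
inline uint32_t volumeAddress(uint32_t x, uint32_t y, uint32_t z)
{
    const uint32_t inBlock = interleave5[x & 31]          // 15 interleaved bits
                           | (interleave5[y & 31] << 1)
                           | (interleave5[z & 31] << 2);
    const uint32_t block = (x >> 5) | ((y >> 5) << 3) | ((z >> 5) << 6);
    return (block << 15) | inBlock;                       // sample index
}
```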

5. RESULTS

Here we look at the first results of this new algorithm. The entire program, except trilinear interpolation and memory copying (SIMD instructions), is written in C++. In contrast with many other DVR methods that use small window sizes, all renderings here are made on 512² pixel images so as not to degrade image quality. The number of quantized space directions used for shading is 8192. Pixel subdivisions (fig. 2a) and precomputed rays (fig. 2b) are both fixed to 16², allowing up to 4x zoom without visible noise. The sampling_rate variable determining the distance between two consecutive interpolations is initialized such that Δs = 0.5, which allows between two and four samplings along a ray within a cell. These settings make high quality possible. Finally, due to the high efficiency of our algorithm when visualizing isosurfaces, the current implementation can also superimpose two rendering passes stemming from different transfer functions.

Benchmarks have been performed on two platforms supporting 3DNow! instructions: a low-end platform based on an AMD Duron 600 MHz (64/64 KB L1/L2 data cache) with 320 MB (SDRAM 100 MHz), and a more recent AMD Athlon 1.4 GHz (64/256 KB L1/L2 data cache) with 512 MB (DDR SDRAM 266 MHz). Four datasets have been used: the usual head, brain and engine from UNC Chapel Hill, plus a highly compact angiography dataset (courtesy of Philips Research Labs). An additional PIII platform has also been used for a comparison with the Ultravis system.

The main results are summarized in table 1. All the measurements are averaged over 24 renderings; the minimum and maximum rendering times diverge by less than 30%. Octree processing and volume preshading take approximately 20 seconds, but are computed only once per volume. The fourth column indicates the number of cells (in thousands) within the volume that are neither transparent nor removed by the hidden volume test, even if all the rays going through them may already be opaque. The next column gives the ratio of occluded cells. The sixth column reports the number of octree nodes traversed; the values within parentheses are measured with hidden volume removal disabled, in order to rate its efficiency. The seventh column indicates the number of points sampled along the rays, and the last two columns give the rendering rates.

Volume          | Ultravis | OORC
UNC Head (a)    | 0.8 fps  | 2.4 fps
UNC Engine (d)  | 3.5 fps  | 3.5 fps
UNC Brain (g)   | 1.0 fps  | 3.2 fps
Aneurism (h)    | 0.45 fps | 6.75 fps
Bonsai (fig. 1) | 0.5 fps  | 2.1 fps

Table 2: Comparisons with the Ultravis system (PIII 600 MHz, 512 MB).

The results show that interactive high-quality volume rendering is possible on current high-end platforms when visualizing isosurfaces. Better still, rendering remains interactive even with highly complex transfer functions that include a great part of the volume in the rendering process (f). No other algorithm running on a standard workstation is able to produce such a frame rate at this level of detail today. While methods based on 3D texture hardware currently do not exceed 2 fps on 256³ volumes, with limited accuracy, the previously mentioned ray-casting techniques are considerably slower. Only a Shear-Warp implementation might deliver a higher frame rate nowadays, but once again with a significant loss of quality that clearly limits its use.

We have compared the efficiency of our algorithm with the Ultravis system [11], which is one of the most advanced ray-casting platforms. The same parameters have been used with both methods, but the Ultravis system, which generates 256² pixel images only, uses post-shading and can handle perspective renderings. The latter is the main drawback of our algorithm, but it is only required in some specific applications, and many professional systems do not implement it [26]. The results clearly show the superiority of our approach; like many other ray-casting algorithms, Ultravis performs badly on datasets with much empty space (cf. the bonsai and aneurism datasets). Rendering times are equal only for the engine dataset, where early ray termination is very effective, showing that hidden volume removal (HVR) is an efficient alternative to the lack of early ray termination in object-order volume rendering. Last but not least, we have noticed that our algorithm produces a much better image quality, partially due to the low image resolution of Ultravis.

By studying table 1 in detail, we have come to several interesting conclusions. First, we can clearly see that hidden volume removal eliminates a large fraction of the cells and octree nodes within the opacity range when visualizing isosurfaces. Here, the predominant parts of the rendering are the octree traversal and the voxel loading, while trilinear interpolation becomes predominant in the case of semi-transparent transfer functions. An important fact is that the efficiency of HVR is much higher than with the method proposed by Lee and Ihm [9], where the ratio of occluded splats for the similar renderings (b) and (g) is only 25% and 67% respectively. This ratio can actually be considered a good estimate of the HVR speed-up, because our algorithm delivers an approximately constant cell throughput (between 1.4 and 2.1 million cells per second). Thus, hidden volume removal is a very aggressive optimization here, but the recursive traversal of the octree does not make the most of it (as seen previously). A better approach in future work might be a plane-by-plane traversal of the volume, allowing more effective occlusion tests; however, the non-leaf nodes would then be traversed several times. Another observation is that the rendering times seem to scale well with the processor clock frequency, even though the two configurations are quite different (memory clock, L2 cache size).

6. CONCLUSION

A volume rendering application must be carefully designed and should always provide user-friendliness and accuracy. Hardware-implemented techniques, although fast and sometimes inexpensive, have limited on-board memory and are not flexible. On the other hand, software algorithms can handle a wide variety of problems but suffer from a lack of performance. Our method offers a new way to perform ray-casting on rectilinear grids, achieving almost real-time volume rendering when visualizing isosurfaces and, at the least, interactive rendering in general. These achievements are mainly due to the efficient object-order ray-casting approach that we have introduced and to its optimizations, such as hidden volume removal. In contrast with the rare software techniques able to produce such frame rates, our algorithm reaches the high level of detail that scientific visualization requires. Indeed, features such as a randomized high sampling rate with trilinear interpolation, a large image size (512² pixels), and interactive post-classification are important assets. We have found two drawbacks to our method: the use of preshading, which slightly degrades the quality, and the lack of perspective projection. The latter is needed in stereo viewing applications or in virtual reality, for example, but it is not required most of the time for scientific visualization, where parallel projection is often preferred. In the future, we will look for a way to implement an efficient post-shading version of this algorithm that does not reduce the frame rate too much. We also plan to improve the rendering engine and to use multi-processor PCs.

7. REFERENCES

[1] B. Cabral, N. Cam and J. Foran. Accelerated volume rendering and tomographic reconstruction using texture mapping hardware. IEEE/ACM Siggraph Symposium on Volume Visualization, 1994, pp. 91-97.
[2] F. Dachille, K. Kreeger, B. Chen, I. Bitter and A. Kaufman. High-quality volume rendering using texture mapping hardware. Proc. 1998 Siggraph/Eurographics Workshop on Graphics Hardware, pp. 69-76.
[3] K. Engel, M. Kraus and T. Ertl. High-quality pre-integrated volume rendering using hardware-accelerated pixel shading. Proc. Eurographics/Siggraph Workshop on Graphics Hardware, 2001.
[4] A. Van Gelder and K. Kim. Direct volume rendering via 3D texture mapping hardware. Proc. Volume Rendering Symposium 1996, pp. 23-30, 1996.
[5] J. S. Gondek, G. W. Meyer and J. G. Newman. Wavelength dependent reflectance functions. Siggraph'94 Proc., 1994.
[6] N. Greene, M. Kass and G. Miller. Hierarchical Z-buffer visibility. SIGGRAPH'93 Proc., 1993, pp. 231-238.
[7] N. Greene. Hierarchical polygon tiling with coverage masks. SIGGRAPH'96 Proc., 1996, pp. 65-74.
[8] J. Huang, K. Mueller, N. Shareef and R. Crawfis. FastSplats: optimized splatting on rectilinear grids. IEEE Visualization'00 Proc., October 2000.
[9] R. K. Lee and I. Ihm. On enhancing the speed of splatting using both object- and image-space coherence. Graphical Models and Image Processing, vol. 62, no. 4, 2000, pp. 263-282.
[10] J. Kajiya and B. Von Herzen. Ray tracing volume densities. SIGGRAPH'84, July 1984, pp. 165-174.
[11] G. Knittel. The Ultravis system. IEEE/ACM SIGGRAPH Volume Visualization and Graphics Symposium 2000, October 2000, pp. 71-78.
[12] P. Lacroute and M. Levoy. Fast volume rendering using a shear-warp factorization of the viewing transformation. SIGGRAPH'94, 1994, pp. 451-458.
[13] M. Levoy. Display of surfaces from volume data. IEEE Computer Graphics & Applications, vol. 8, no. 5, 1988, pp. 29-37.
[14] M. Levoy. Efficient ray tracing of volume data. ACM Transactions on Graphics, vol. 9, no. 3, 1990, pp. 245-261.
[15] S. R. Marschner and R. J. Lobb. An evaluation of reconstruction filters for volume rendering. Proc. Visualization'94, October 1994, pp. 100-107.
[16] N. Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, vol. 1, no. 2, 1995, pp. 99-108.
[17] M. Meißner, U. Hoffmann and W. Straßer. Enabling classification and shading for 3D texture mapping based volume rendering using OpenGL and extensions. Proc. Visualization'99, 1999, pp. 207-214.
[18] M. Meißner, J. Huang, D. Bartz, K. Mueller and R. Crawfis. A practical evaluation of popular volume rendering algorithms. IEEE/ACM SIGGRAPH Volume Visualization and Graphics Symposium 2000, October 2000, pp. 81-90.
[19] B. Mora, J. P. Jessel and R. Caubet. Accelerating volume rendering with quantized voxels. IEEE/ACM SIGGRAPH Volume Visualization and Graphics Symposium 2000, October 2000, pp. 63-70.
[20] B. Mora, J. P. Jessel and R. Caubet. Visualization of isosurfaces with parametric cubes. Eurographics'01 Proc., vol. 20, no. 3, pp. 377-384, September 2001.
[21] T. Möller, R. Machiraju, K. Mueller and R. Yagel. Evaluation and design of filters using a Taylor series expansion. IEEE Transactions on Visualization and Computer Graphics, 3(2):184-199, June 1997.
[22] L. Mroz, H. Hauser and E. Gröller. Interactive high-quality maximum intensity projection. Eurographics'00, vol. 19, no. 3, 2000.
[23] K. Mueller and R. Crawfis. Eliminating popping artifacts in sheet buffer-based splatting. Proc. Visualization'98, 1998, pp. 239-245.
[24] K. Mueller, T. Möller and R. Crawfis. Splatting without the blur. Proc. Visualization'99, 1999, pp. 363-371.
[25] K. Mueller, N. Shareef, J. Huang and R. Crawfis. High-quality splatting on rectilinear grids with efficient culling of occluded voxels. IEEE TVCG, vol. 5, no. 2, 1999, pp. 116-134.
[26] H. Pfister, J. Hardenbergh, J. Knittel, H. Lauer and L. Seiler. The VolumePro real-time ray-casting system. SIGGRAPH'99, 1999, pp. 251-260.
[27] T. Theußl, T. Möller and M. E. Gröller. Optimal regular volume sampling. IEEE Visualization 2001 Proc., October 2001, San Diego.
[28] M. Wan, A. Kaufman and S. Bryson. High performance presence-accelerated ray casting. Proc. Visualization'99, 1999, pp. 363-371.
[29] R. Westermann and T. Ertl. Efficiently using graphics hardware in volume rendering applications. SIGGRAPH'98, 1998, pp. 169-177.
[30] R. Westermann and B. Sevenich. Accelerated volume ray-casting using texture mapping. Proc. IEEE Visualization 2001.
[31] L. Westover. Interactive volume rendering. Proc. Chapel Hill Workshop on Volume Visualization, May 1989.
[32] L. Westover. Footprint evaluation for volume rendering. SIGGRAPH'90 Proc., 1990, pp. 367-376.
[33] C. Wittenbrink, T. Malzbender and M. Goss. Opacity-weighted color interpolation for volume sampling. Symposium on Volume Visualization, 1998, pp. 135-142.
[34] R. Yagel and A. Kaufman. Template-based volume viewing. Proc. Eurographics'92, vol. 11, no. 3, pp. 153-167.
[35] H. Zhang, D. Manocha, T. Hudson and K. E. Hoff. Visibility culling using hierarchical occlusion maps. SIGGRAPH'97 Proc., 1997, pp. 77-88.
[36] J. Wilhelms and A. Van Gelder. A coherent projection approach for direct volume rendering. SIGGRAPH'91 Proc., pp. 275-284.
[37] L. Sobierajski and R. Avila. A hardware acceleration method for volumetric ray tracing. IEEE Visualization 1995.

Figure 8: The different renderings (a)-(h) used for the benchmarks.