Deferred Splatting
Gaël GUENNEBAUD, Loïc BARTHE, Mathias PAULIN
IRIT – UPS – CNRS TOULOUSE – FRANCE
http://www.irit.fr/~Gael.Guennebaud
IRIT – University of Toulouse – France
EG04
Plan
● Complex scenes: triangles or points?
● High-quality splatting: really efficient?
● Deferred splatting:
  ● Accurate point selection
  ● Temporal coherency
● Applications: occlusion culling & SPT
● Results
● Future works
Motivations
● Real-time rendering of complex scenes
● Triangles: fully supported by graphics HW, but...
  ● tiny triangles are inefficient
  ● multiresolution can be very tedious
● One solution is points: no connectivity, no texture map, no...
  ➔ multiresolution rendering: simple & efficient
● but...
Motivations
● One solution is points, but...
  ● large magnification: low quality
  ● flat surfaces: inefficiency
  ➔ hybrid: triangles and points are complementary; use triangles when points become less efficient
● High-quality point rendering is expensive ➔ deferred splatting!
Efficient Point Rendering
● 2 issues:
  ● How to select the points that have to be rendered?
  ● How to render the points?
Efficient Point Rendering
● How to select the points that have to be rendered?
  ● Store the points in a hierarchical data structure (kd-tree, octree, hierarchy of bounding spheres, ...)
  ● Recursive traversal with:
    ● visibility culling (view frustum, backface, occlusion, ...)
    ● LOD selection (local density estimation, removal of superfluous points)
● How to render the points?
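The recursive selection above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' data structure: it assumes a hierarchy of bounding spheres whose centers double as representative points, frustum planes given as (inward normal, offset) pairs, and an LOD criterion that stops refining once a sphere's projected diameter drops below `max_px` pixels. `Node`, `Plane`, `select_points` and `max_px` are hypothetical names; backface and occlusion culling are left out.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    center: Vec3   # center of the bounding sphere, reused as the node's representative point
    radius: float  # radius of the bounding sphere
    children: List["Node"] = field(default_factory=list)


@dataclass
class Plane:
    normal: Vec3  # unit normal pointing towards the inside of the frustum
    d: float      # a point p is inside when dot(normal, p) + d >= 0


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def outside_frustum(node: Node, planes: List[Plane]) -> bool:
    # The sphere is culled if it lies entirely on the outer side of any plane.
    return any(dot(p.normal, node.center) + p.d < -node.radius for p in planes)


def projected_size(node: Node, eye: Vec3, focal_px: float) -> float:
    # Approximate on-screen diameter of the bounding sphere, in pixels.
    dist = math.dist(node.center, eye)
    if dist <= node.radius:
        return math.inf  # the eye is inside the sphere: always refine
    return 2.0 * node.radius * focal_px / dist


def select_points(node: Node, planes: List[Plane], eye: Vec3,
                  focal_px: float, max_px: float, out: List[Vec3]) -> None:
    if outside_frustum(node, planes):
        return  # view-frustum culling of the whole subtree
    if not node.children or projected_size(node, eye, focal_px) <= max_px:
        out.append(node.center)  # LOD: a leaf, or small enough on screen
        return
    for child in node.children:
        select_points(child, planes, eye, focal_px, max_px, out)
```

With `max_px = 1`, refinement stops roughly when a node covers a single pixel, which is what removing superfluous points amounts to.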
Efficient Point Rendering
● How to select the points that have to be rendered?
● How to render the points?
  ● Efficiency => graphics HW
  ● splatting approach
GPU Point Rendering: quality & performance issues
Throughput in millions of points per second (GeForceFX 5900 under Linux):
● Standard GL_POINTS: 60-85
● Opaque ellipses: 35-45 (rendering a disk instead of a square is almost free)
● High-quality splatting (accumulation of elliptic Gaussians, e.g. EWA Surface Splatting): 6-10
● vs 44 M small triangles per second
Complex Scenes: Example
● Scene ~ 6800 trees
● 1 tree ~ 750k points ➔ ~5000 million points
● After high-level culling & LOD: ~4 M points are still potentially visible and have to be rendered
● But in fact only 150k are really visible!
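A back-of-the-envelope check of why those 4 M points matter, using the 6-10 M pts/s high-quality splatting throughput quoted for the GeForceFX 5900. This sketch assumes splatting throughput is the only per-frame cost, which ignores the CPU-side selection and any fixed overhead; `splat_time_ms` is a hypothetical helper.

```python
def splat_time_ms(num_points: int, rate_mpts_per_s: float) -> float:
    """Time to splat num_points at a throughput given in millions of points per second."""
    return num_points / (rate_mpts_per_s * 1e6) * 1e3


POTENTIALLY_VISIBLE = 4_000_000  # after high-level culling & LOD
REALLY_VISIBLE = 150_000         # points actually visible on screen

for rate in (6.0, 10.0):  # high-quality splatting throughput range (M pts/s)
    print(f"{rate:4.1f} M pts/s: "
          f"{splat_time_ms(POTENTIALLY_VISIBLE, rate):6.1f} ms for the potentially visible set, "
          f"{splat_time_ms(REALLY_VISIBLE, rate):5.1f} ms for the visible one")
print(f"visible fraction: {REALLY_VISIBLE / POTENTIALLY_VISIBLE:.2%}")
```

Under that assumption, splatting the potentially visible set takes roughly 400 to 670 ms per frame, while the 150k visible points (about 3.75% of them) would take only 15 to 25 ms.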
Our Solution: Deferred Splatting
● Similar to “deferred shading”: defer expensive rendering computations to visible points only
● Based on:
  ● an accurate point selection
  ● temporal coherency
High-Quality Splatting on GPU: a multipass algorithm
[Diagram: on the CPU, the data set is stored in a hierarchical & multiresolution data structure; a high-level point selection (culling, LOD, ...) extracts a subset, sent to the GPU as a list of selected points (indexes, list of ranges, ...). The GPU renders into the Z-buffer and the color buffer.]
High-Quality Splatting on GPU: (1) visibility splatting
● In order to accumulate visible splats only, precompute the depth buffer:
  ● standard GL_POINTS primitive
  ● + per-fragment shape & depth correction
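The per-fragment shape & depth correction of the visibility pass can be sketched as follows, assuming each splat is a disk given by its eye-space center, unit normal and radius, and approximating the projection as orthographic so that a fragment's offset (dx, dy) from the splat center is measured in eye-space units. The optional `epsilon` depth offset is an assumption (a common choice so that later color splats of the same surface still pass the depth test), not something stated on the slide; `splat_fragment_depth` is a hypothetical name, and on the GPU this logic would live in a fragment program.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def splat_fragment_depth(center: Vec3, normal: Vec3, radius: float,
                         dx: float, dy: float, epsilon: float = 0.0) -> Optional[float]:
    """Depth of the splat's disk at offset (dx, dy) from its center, or None if discarded.

    Orthographic approximation, eye looking down -z: depth is the positive
    distance -z. The disk projects to an ellipse; the shape test below is done
    on the 3D point of the disk plane seen through the fragment.
    """
    nx, ny, nz = normal
    if abs(nz) < 1e-6:
        return None  # disk seen edge-on: nothing to rasterize
    # Depth correction: the point p on the disk plane satisfies n . (p - c) = 0
    dz = -(nx * dx + ny * dy) / nz
    # Shape test: keep the fragment only if that point lies inside the disk
    if dx * dx + dy * dy + dz * dz > radius * radius:
        return None
    depth = -(center[2] + dz)
    return depth + epsilon  # optional offset pushing the surface slightly back
```

A disk facing the viewer (normal `(0, 0, 1)`) gets a constant depth over a circular footprint; a tilted normal makes the depth vary across the splat, which is what the correction is for.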
High-Quality Splatting on GPU: (2) color splatting
● standard GL_POINTS primitive
● + per-fragment Gaussian weight
● + accumulation
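A minimal CPU-side sketch of the color splatting pass: each fragment that survives the depth test against the visibility pass adds its Gaussian-weighted color and its weight to the pixel. It assumes fragments are already rasterized with their offset (dx, dy) in the splat's local frame, and it uses a circular Gaussian of hypothetical width `sigma` rather than the full EWA resampling filter; on the GPU the sums come from additive blending into the color buffer. `color_splatting` is a hypothetical name.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Color = Tuple[float, float, float]
# (pixel, depth, dx, dy, color): one fragment of a rasterized splat, where (dx, dy)
# is the fragment's offset from the splat center in the splat's local frame.
Fragment = Tuple[Pixel, float, float, float, Color]


def gaussian_weight(dx: float, dy: float, sigma: float) -> float:
    # Circular Gaussian kernel; EWA splatting would use an elliptical resampling filter.
    return math.exp(-0.5 * (dx * dx + dy * dy) / (sigma * sigma))


def color_splatting(fragments: List[Fragment], zbuffer: Dict[Pixel, float],
                    sigma: float) -> Dict[Pixel, List[float]]:
    """Accumulate weighted colors (r, g, b) and the weight sum (4th channel) per pixel."""
    accum: Dict[Pixel, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for pixel, depth, dx, dy, color in fragments:
        if depth > zbuffer.get(pixel, math.inf):
            continue  # behind the surface written by the visibility pass
        w = gaussian_weight(dx, dy, sigma)
        acc = accum[pixel]
        for k in range(3):
            acc[k] += w * color[k]  # additive blending of the weighted color
        acc[3] += w                 # running sum of weights
    return dict(accum)
```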
High-Quality Splatting on GPU: (3) normalization
● Needed because the accumulated weights do not sum to 1 (∑ weights ≠ 1)
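Because the accumulated weights do not sum to 1, each pixel's color has to be divided by its own weight sum. A minimal sketch, assuming the weighted colors are stored in the first three channels and the weight sum in the fourth (a hypothetical buffer layout); `normalize` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def normalize(accum: Dict[Pixel, List[float]]) -> Dict[Pixel, Tuple[float, float, float]]:
    """Divide each accumulated (r, g, b) by the weight sum stored in the 4th channel."""
    image = {}
    for pixel, (r, g, b, w) in accum.items():
        if w > 0.0:  # pixels never reached by a splat stay empty (background)
            image[pixel] = (r / w, g / w, b / w)
    return image
```

On the GPU this is a single full-screen pass over the color buffer.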
High-Quality Splatting on GPU: analysis
● The subset produced by the high-level point selection is coarse: > 4 M points, and it could be huge
● Visibility splatting (1) and color splatting (2) are expensive / slow: 12-20 M pts/s
The Deferred Splatting Algorithm
Accurate Point Selection
[Diagram: the subset from the high-level point selection (culling, LOD, ...) goes directly to visibility splatting (1), then to color splatting and normalization.]
Accurate Point Selection
● Break this direct path
● Add an accurate point selection
● Only visible points should pass the new test
Accurate Point Selection: (2) index
● Render the points as fast as possible:
  ● no shading, no blending, no...
  ● GL_POINTS, size = 1 pixel
  ● color = handle of the point = comb(object's id, point's id)
● Color buffer = {handles of the visible points}
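The slide only says that the handle combines the object's id and the point's id. One possible packing, sketched below, concatenates both ids into a 32-bit integer and spreads it over the four 8-bit components of an RGBA color. The 20/12 bit split, the reserved `BACKGROUND` clear value and all function names are hypothetical, and the sketch assumes the color buffer has an 8-bit alpha channel.

```python
POINT_BITS = 20                # hypothetical split: up to 2**20 points per object
OBJECT_BITS = 32 - POINT_BITS  # remaining 12 bits: up to 4096 objects
BACKGROUND = 0xFFFFFFFF        # hypothetical clear color meaning "no point here"


def make_handle(object_id: int, point_id: int) -> int:
    """Concatenate object and point ids into one 32-bit handle."""
    if not (0 <= object_id < (1 << OBJECT_BITS) and 0 <= point_id < (1 << POINT_BITS)):
        raise ValueError("id out of range for this bit layout")
    handle = (object_id << POINT_BITS) | point_id
    if handle == BACKGROUND:
        raise ValueError("handle collides with the clear color")
    return handle


def handle_to_rgba8(handle: int) -> tuple:
    """Spread a 32-bit handle over four 8-bit color components (R, G, B, A)."""
    return ((handle >> 24) & 0xFF, (handle >> 16) & 0xFF, (handle >> 8) & 0xFF, handle & 0xFF)


def rgba8_to_handle(rgba: tuple) -> int:
    """Rebuild the handle from the four components read back from the color buffer."""
    r, g, b, a = rgba
    return (r << 24) | (g << 16) | (b << 8) | a


def split_handle(handle: int) -> tuple:
    """Recover (object_id, point_id) from a handle."""
    return handle >> POINT_BITS, handle & ((1 << POINT_BITS) - 1)
```

The bit split is a trade-off: more point bits allow larger objects, more object bits allow more objects in the scene.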
Accurate Point Selection: (2') read & sort
● Read the color buffer
● Extract indices from handles
● Sort point indices by object => index arrays B_i
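After the index pass, the read-back buffer holds one handle per pixel. A minimal sketch of the read & sort step, assuming handles packed as `(object_id << 20) | point_id` with `0xFFFFFFFF` as the clear value (both hypothetical choices): it skips empty pixels, splits each handle, and builds one sorted index array B_i per object. `read_and_sort` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

POINT_BITS = 20          # hypothetical layout: handle = (object_id << 20) | point_id
BACKGROUND = 0xFFFFFFFF  # hypothetical clear value: no point covers the pixel


def read_and_sort(handles: Iterable[int]) -> Dict[int, List[int]]:
    """Build one sorted index array B_i per object from the read-back index buffer."""
    per_object: Dict[int, List[int]] = defaultdict(list)
    for handle in handles:
        if handle == BACKGROUND:
            continue  # empty pixel
        object_id = handle >> POINT_BITS
        point_id = handle & ((1 << POINT_BITS) - 1)
        per_object[object_id].append(point_id)
    # A 1-pixel point should appear at most once, but set() guards against duplicates;
    # sorting gives each object's indices in storage order.
    return {obj: sorted(set(ids)) for obj, ids in per_object.items()}
```

Each resulting array can then serve as the index list for drawing that object's visible points in the later splatting passes.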
Accurate Point Selection
[Diagram: the index arrays B_i feed color splatting (3), followed by normalization (4); visibility splatting (1) still processes the whole subset from the high-level point selection.]
Accurate Point Selection
● Break the remaining direct path (from the high-level point selection to visibility splatting (1)) by taking advantage of temporal coherency
Accurate Point Selection
● Visibility splatting (1): render only the points that were visible in the previous frame (B_{i-1})
● B_{i-1} ≠ B_i => holes
Temporal Coherency: Artifacts
[Figure: frame i and frame i+1]
● The temporal coherency approximation leads to artifacts
Temporal Coherency
● Render only the points that were visible in the previous frame (B_{i-1})
● B_{i-1} ≠ B_i => holes
Temporal Coherency
● Compute B_i from the incomplete Z-buffer
● Also compute B_i \ B_{i-1} (points visible now that were not visible in the previous frame)
● Update the Z-buffer with visibility splatting (3): render B_i \ B_{i-1}
● Then color splatting (4) of B_i and normalization (5)
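Computing B_i \ B_{i-1} is a per-object set difference. The sketch below assumes each frame's visible points are stored as sorted index arrays keyed by object id (a representation chosen for illustration; `newly_visible` is a hypothetical name). Only the returned points need the extra visibility splatting pass (3) that fills the holes of the incomplete Z-buffer.

```python
from typing import Dict, List

IndexArrays = Dict[int, List[int]]  # object id -> sorted indices of its visible points


def newly_visible(current: IndexArrays, previous: IndexArrays) -> IndexArrays:
    """Per-object set difference B_i minus B_{i-1}: visible now, not yet in the Z-buffer."""
    diff: IndexArrays = {}
    for obj, ids in current.items():
        before = set(previous.get(obj, ()))
        missing = [p for p in ids if p not in before]  # preserves the sorted order of B_i
        if missing:
            diff[obj] = missing
    return diff
```

Under strong temporal coherency the difference is small, so pass (3) is cheap; B_i is then kept as B_{i-1} for the next frame.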
The Complete Algorithm: step-by-step summary
● (1) Visibility splatting of B_{i-1} (points visible in the previous frame)
● (2) Index: render the subset selected by the high-level point selection (culling, LOD, ...), with point handles as colors
● (2') Read & sort => B_i and B_i \ B_{i-1}
● (3) Visibility splatting of B_i \ B_{i-1}
● (4) Color splatting of B_i
● (5) Normalization
One point per pixel...
● Deferred splatting allows only one point per pixel
● Advantages:
  ● removes superfluous points (LOD selection)
  ● solves color buffer overflow (only 8 bits per component)
One point per pixel: drawbacks
● We may lose texture information
● High-frequency textured models + coarse high-level LOD selection ➔ flickering artifacts...
● Can be solved using a surfel mipmap [Pfister et al. 00]
Deferred Splatting: Applications
● Occlusion culling
● Sequential Point Trees
High-Level Occlusion Culling
● Occluded nodes are removed from the high-level point selection (culling, LOD, ...)
● The occlusion tests use asynchronous HW occlusion queries against the Z-buffer produced by visibility splatting (1) of B_{i-1}
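The asynchronous use of occlusion queries can be illustrated with a software stand-in. In this sketch, which is an assumption about how the pieces fit together rather than the authors' code, each node's bounding volume is flattened to a screen rectangle at its nearest depth and tested against the Z-buffer from visibility splatting (1); results gathered in one frame are only consulted in the next, mimicking queries whose results are read back without stalling. `NodeBounds`, `query_samples` and `AsyncOcclusionCuller` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

ZBuffer = List[List[float]]  # zbuffer[y][x]: depth written by visibility splatting (1)


@dataclass
class NodeBounds:
    node_id: int
    x0: int  # inclusive screen-space rectangle covered by the node's bounding volume
    y0: int
    x1: int
    y1: int
    min_depth: float  # depth of the nearest point of the bounding volume


def query_samples(b: NodeBounds, zbuffer: ZBuffer) -> int:
    """Software stand-in for a HW occlusion query: number of pixels where the
    (conservatively flattened) bounding volume would pass the depth test."""
    return sum(1 for y in range(b.y0, b.y1 + 1)
               for x in range(b.x0, b.x1 + 1)
               if b.min_depth < zbuffer[y][x])


class AsyncOcclusionCuller:
    """Query results produced in frame i are only consumed in frame i+1,
    so the CPU never waits for the GPU."""

    def __init__(self) -> None:
        self.last_results: Dict[int, int] = {}  # node id -> visible sample count

    def is_occluded(self, node_id: int) -> bool:
        # Nodes without a result yet are treated as visible.
        return self.last_results.get(node_id, 1) == 0

    def issue_queries(self, nodes: List[NodeBounds], zbuffer: ZBuffer) -> None:
        self.last_results = {n.node_id: query_samples(n, zbuffer) for n in nodes}
```

Reading the previous frame's results means a node that has just become visible may be skipped for one frame, the usual price of not stalling on query results.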
Sequential Point Trees [Dachsbacher03]
● Preprocessing: build a sequential version of the hierarchy
● Rendering:
  ● CPU: fast & coarse selection of a prefix
  ● GPU: fine LOD selection at the point level
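A minimal sketch of the SPT idea as summarized above, assuming each point of the flattened hierarchy stores a view-distance interval [r_min, r_max) in which it should be drawn, and that the array is sorted by decreasing r_max (this follows our reading of [Dachsbacher03]; the field names, `SPTPoint` and the `distance_of` callback are hypothetical). The CPU finds the prefix with a binary search on the object's minimal distance; the per-point interval test stands in for the GPU vertex program.

```python
import bisect
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SPTPoint:
    point_id: int
    r_min: float  # the point is drawn when r_min <= distance < r_max
    r_max: float  # (view-distance interval precomputed from the hierarchy)


def build_sequential(points: List[SPTPoint]) -> List[SPTPoint]:
    """Preprocessing: flatten the hierarchy into one array sorted by decreasing r_max."""
    return sorted(points, key=lambda p: -p.r_max)


def coarse_prefix(seq: List[SPTPoint], d_min: float) -> int:
    """CPU: length of the prefix of points with r_max > d_min (the object's minimal distance)."""
    keys = [-p.r_max for p in seq]  # ascending keys for bisect (precomputed in practice)
    return bisect.bisect_left(keys, -d_min)


def fine_select(prefix: List[SPTPoint], distance_of: Callable[[SPTPoint], float]) -> List[int]:
    """Per-point test (done on the GPU in SPT): keep points whose interval contains their distance."""
    return [p.point_id for p in prefix if p.r_min <= distance_of(p) < p.r_max]
```

Every point past the prefix has r_max ≤ d_min and can never be selected, so the prefix is conservative; the fine test discards the remaining false positives.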
Sequential Point Trees with classical high-quality splatting
● CPU: coarse SPT selection => prefix
● GPU: visibility splatting (1) + SPT fine selection, then color splatting (2) + SPT fine selection
● All points of the coarse prefix are processed by 2 complex vertex programs => inefficient
Sequential Point Trees with deferred splatting
● CPU: coarse SPT selection => prefix
● GPU: index (2) + SPT fine selection
● All points of the coarse prefix are processed by 1 very simple vertex program => efficient
Results
Classical GPU-based high-quality splatting versus deferred splatting
Results: Simple Head
● 285k points
● Average FPS:
  ● EWA splatting: 34
  ● Deferred splatting: 41
● Speed-up: x1.2
● Culled points: 50-70%
[Chart: time breakdown, classic EWA splatting vs. with DS (visibility splatting, render indexes, reading buffer + sort, EWA splatting)]
Results: 200 Hugo
● 1 Hugo = 450k points
● Scene = 200 Hugo in motion
● Average FPS:
  ● EWA splatting: 11.5
  ● Deferred splatting: 34.5
● Speed-up: x3
● Culled points: 90%
[Chart: time breakdown, classic EWA splatting vs. with DS (visibility splatting, render indexes, reading buffer + sort, EWA splatting)]
Results: Forest
● 1 tree = 750k points
● Scene = 6800 trees
● Average FPS:
  ● EWA splatting: 1.1-1.8
  ● Deferred splatting: 11-20
● Speed-up: x10
● Culled points: 90-97%
[Chart: time breakdown for 1 tree, classic EWA splatting vs. with DS (visibility splatting, render indexes, reading buffer + sort, EWA splatting)]
What about screen resolutions?
● When the screen size increases:
  ● the rendering time increases linearly
  ● the speed-up of deferred splatting remains constant
● Large resolutions => reading the color buffer becomes expensive: 1024² => 25 ms!
  ● AGP limitation ➔ PCI Express?
[Chart: time breakdown (visibility splatting, render indexes, reading buffer + sort, EWA splatting) at 512x512, 724x724 and 1024x1024]
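A quick check of the read-back cost, assuming the index buffer stores 4 bytes per pixel (one RGBA8 handle) and that the 25 ms covers only the transfer; `readback_rate_mib_s` is a hypothetical helper. It also shows that the three measured resolutions roughly double the pixel count at each step.

```python
def readback_rate_mib_s(width: int, height: int, bytes_per_pixel: int, ms: float) -> float:
    """Effective bandwidth (MiB/s) of reading one full width x height buffer in `ms` milliseconds."""
    total_bytes = width * height * bytes_per_pixel
    return total_bytes / (ms / 1000.0) / (1024 * 1024)


for side in (512, 724, 1024):
    # 724 is close to 512 * sqrt(2), so the pixel count roughly doubles at each step
    print(f"{side}x{side}: {side * side} pixels")

# 4 bytes per pixel assumed (one RGBA8 handle), 25 ms measured at 1024x1024
print(f"{readback_rate_mib_s(1024, 1024, 4, 25.0):.0f} MiB/s")
```

Under these assumptions, reading a 1024x1024 buffer in 25 ms corresponds to about 160 MiB/s of effective bandwidth.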
Usability
Unsuitable for simple scenes (