Vincent Forest – VORTEX group
Accurate Shadows by Depth Complexity Sampling
Problem in real-time rendering
Solve the rendering equation [Kajiya 86]
$$L(x' \to x'') = L_e(x' \to x'') + \int_M L(x \to x')\, f_s(x \to x' \to x'')\, V(x \leftrightarrow x')\, \frac{\cos\theta_o \cos\theta_i'}{\|x - x'\|^2}\, dA(x)$$
Common simplifications for real-time rendering
Consider only the center of the lights
Approximate the energy from the other surfaces by a coarse ambient term E_A
$$L(x' \to x'') = E_A + \sum_{x \in pointLight} L(x \to x')\, f_s(x \to x' \to x'')\, V(x \leftrightarrow x')\, \frac{\cos\theta_o \cos\theta_i'}{\|x - x'\|^2}$$
What are shadows?
Shadows are visibility queries
Problem statement
Is it possible to generate accurate shadows in real time?
Hard shadows
Shadow-maps
✔ “Independent” of the geometric complexity
✔ Work with all rasterizable primitives
✗ Several renderings for omni-directional lights
✗ Aliasing artifacts
Examples: S.T.A.L.K.E.R. [GSC Game World 07], Crysis [Crytek 07]
Shadow-volumes
✔ Accurate hard shadows ↔ ray-traced hard shadows
✗ Dependent on the geometric complexity
✗ Primitive constraints
✗ Fill-rate bottleneck
Examples: Doom3 [id-software 04], F.E.A.R. [Monolith 05]
From hard shadows to soft shadows
In reality, light sources have an area!
Fragments can be lit, shadowed or in the penumbra
How large is the visible part of the light for each fragment?
Simplifications for real-time rendering
Modulate the fragment color by a visibility coefficient in [0, 1]
v_coef ↔ percentage of visible light area
$$V_{coef}(x', l) = \frac{1}{N} \sum_{x \in l} V(x \leftrightarrow x')$$
$$L(x' \to x'') = E_A + \sum_{x \in lightCenter} L(x \to x')\, f_s(x \to x' \to x'')\, V_{coef}(x', l)\, \frac{\cos\theta_o \cos\theta_i'}{\|x - x'\|^2}$$
Soft shadows
Visually plausible soft shadows (v_coef=filtered hard shadows)
Percentage Closer Filtering (PCF) [Reeves 87] [Fernando 05]
Linearize the depth data to allow the pre-filtering of the shadow-maps
Variance Shadow-Map (VSM) [Donnelly and Lauritzen 06]
Convolution Shadow-Map (CSM) [Annen et al. 07]
Physically plausible soft shadows (v_coef=approximation of occluded area)
Soft Shadow Mapping (SSM) [Guennebaud et al. 06]
Penumbra-wedge [Assarsson and Akenine-Möller 02 03]
Penumbra-wedge
A penumbra-wedge conservatively bounds the penumbra region defined by a silhouette edge
Algorithm
1) Initialize the per-fragment v_coef with the result of the shadow-volumes
2) Extrude a penumbra-wedge from each silhouette edge
3) For each fragment in the penumbra-wedge, update its v_coef according to the light area occluded by the corresponding silhouette edge
[Figure: a penumbra-wedge extruded from a silhouette edge, split into an inner and an outer half-wedge; a rasterized fragment lies in the inner penumbra]
Penumbra-wedge
v_coef update
Project the silhouette edge onto the light source as seen by the fragment
Add/Subtract the corresponding covered percentage
[Figure: occluder, light source, outer edge and inner edge. Example: v_coef == 1, then v_coef -= 0.35 (35% of the light source), then v_coef += 0.10 (10% of the light source), giving v_coef == 0.75]
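Silhouette overlapping [Forest et al. 06]
[Figure: two outer edges projected onto the light source, each covering 20% of it, with a 10% overlapped region. Example: v_coef == 1, then v_coef -= 0.20 twice, giving v_coef == 0.60]
The overlapped region is subtracted twice, so the visible light area is under-estimated (0.60 instead of 0.70)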
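The under-estimation can be checked with a little arithmetic. The sketch below is not the penumbra-wedge implementation: it only models the light source as a hypothetical grid of 10 equal cells and compares the per-edge subtraction with the true occluded area. The names `LIGHT_CELLS`, `edge_a` and `edge_b` are illustrative assumptions.

```python
# Hypothetical model: the light source is split into 10 equal cells.
LIGHT_CELLS = 10

# Each silhouette edge occludes 2 cells (20%); cell 4 is occluded by both (10% overlap).
edge_a = {3, 4}
edge_b = {4, 5}


def v_coef_summed(edges):
    """Penumbra-wedge style update: subtract each edge's coverage independently."""
    v = 1.0
    for cells in edges:
        v -= len(cells) / LIGHT_CELLS
    return v


def v_coef_exact(edges):
    """Reference value: the occluded area is the union of the covered cells."""
    occluded = set().union(*edges)
    return 1.0 - len(occluded) / LIGHT_CELLS


summed = v_coef_summed([edge_a, edge_b])
exact = v_coef_exact([edge_a, edge_b])
print(round(summed, 2), round(exact, 2))
```

Under these assumptions the summed coefficient comes out lower than the exact one, because the shared cell is counted once per edge.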
Soft shadow volumes for ray-tracing [Laine et al. 05]
Accurate off-line shadows using penumbra-wedges
Soft shadow volumes for ray-tracing
Local reconstruction of the visibility function
Define the number of occluders between a light sample and a fragment (i.e. the depth complexity)
Algorithm
1) Build a data structure storing the occluding silhouette edges [Laine et al. 05] [Lehtinen et al. 06]
2) Retrieve the silhouette edges affecting the fragment visibility
3) Update the depth complexities of the light samples according to the silhouette edges (integration rules)
4) Determine whether the light samples with the lowest depth complexity light the fragment: shoot a shadow ray
[Figure: fragment, light samples, silhouette edge, penumbra-wedges, occluded area and shadow ray]
Accurate shadows by DCS
Accurate Shadows by Depth Complexity Sampling
Accurate shadows by DCS
Accuracy of the off-line soft shadow volumes + advantages of the real-time penumbra-wedge
No data structure storing silhouette edges
Stream the silhouette edges → save memory and computation time
Avoid shooting a shadow ray
Does not require a ray-tracer → better suited for fully dynamic scenes
Planar and omni-directional area light sources
Animated textured lights & semi-opaque occluders
Allows real-time performance!
Accurate shadows by DCS
Overview
Store for each fragment the depth complexity with a set of light samples
Problems to solve
Efficiently compute the depth complexity: Initialization + update of the per fragment depth complexity counters
Store a sufficient amount of per fragment light samples
Define a sampling strategy for the light samples distribution
Depth complexity computation
Initialize the depth complexity (avoid shooting a ray)
Use the result of the shadow-volume pass ↔ penumbra-wedge initialization
Shadow-volume result = per-fragment number of occluding silhouette loops
Update the depth complexity
Use the penumbra-wedge rules ↔ split the wedges into an inner and an outer part
[Figure: per-sample depth complexity counters on the light source, all initialized to 0, then updated by +1 / -1 as the outer edge and the inner edge of an occluder are processed]
Silhouette overlapping is naturally handled: overlapping occluders simply increment the depth complexity
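A minimal sketch of the per-sample counters, assuming a hypothetical set of 10 light samples and made-up coverage lists. It only illustrates the initialization from the shadow-volume result and the ±1 updates; how the covered samples are found for a real wedge is not modelled here.

```python
def update_depth_complexity(counters, covered_samples, delta):
    """Add delta (+1 or -1) to the counter of every light sample covered by an edge."""
    for i in covered_samples:
        counters[i] += delta
    return counters


N = 10                      # hypothetical number of light samples
stencil = 0                 # shadow-volume result for this fragment (assumed 0 here)
counters = [stencil] * N    # initialization: every sample starts at the stencil value

# Two occluders whose projections overlap on sample 4 (illustrative data).
update_depth_complexity(counters, [3, 4], +1)
update_depth_complexity(counters, [4, 5], +1)

# A sample sees the fragment only when no occluder lies in between.
lit_samples = sum(1 for d in counters if d == 0)
```

With this data the overlapped sample ends with a depth complexity of 2 and is still counted as occluded only once, so the overlap does not distort the visible fraction.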
Per fragment depth complexity counters
Standard buffer stores at most 4 values (RGBA)
Only 4 counters!?
Counter packing representation
Obvious solution: pack several counters in a single value
Vectorized update
Example: 2 counters of 4 bits packed in 1 color channel of 8 bits (counter0 = high 4 bits, counter1 = low 4 bits)
V  = 0x00 → 00000000 → counter0 == 0; counter1 == 0
V += 0x11 → 00010001 → counter0 == 1; counter1 == 1
V += 0x46 → 01010111 → counter0 == 5; counter1 == 7
V -= 0x21 → 00110110 → counter0 == 3; counter1 == 6
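The example can be replayed in a few lines. This is only a sketch of the packing idea on plain integers; the helper name `unpack` is an assumption, and the vectorized update is valid only as long as no counter overflows or underflows its 4 bits.

```python
def unpack(v):
    """Split an 8-bit value into its two 4-bit counters (counter0 = high nibble)."""
    return (v >> 4) & 0xF, v & 0xF


v = 0x00
history = [unpack(v)]
for delta in (+0x11, +0x46, -0x21):
    v += delta              # one addition updates both counters at once
    history.append(unpack(v))
```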
Light samples distribution [Pharr and Humphreys 04]
Decorrelate the sampling pattern ↔ rotate the initial sampling pattern
Avoid visible sampling patterns
Use a stratified sampling strategy
Interleaved sampling [Keller et al. 01] [Segovia et al. 06]
See also GDC: Efficient interleaved sampling with modern graphics hardware and applications
Light samples distribution
64 correlated samples
64 decorrelated samples
16 decorrelated stratified samples
64 decorrelated stratified samples
16 samples + 4x4 interleaved sampling pattern
From depth complexity to visibility coefficient
Average the saturated depth complexity D between the fragment x' and the N samples s_{i,l} on the light l
$$V_{coef}(x', l) = 1 - \frac{1}{N} \sum_{i=0}^{N-1} Sat(D(x', s_{i,l}))$$
Introduce explicit sample luminance L_{i,l} ↔ textured lights
$$V^{rgb}_{coef}(x', l) = \frac{1}{N} \sum_{i=0}^{N-1} \left( L^{rgb}_{i,l} - L^{rgb}_{i,l}\, Sat(D(x', s_{i,l})) \right)$$
Simulate semi-opaque occluders
D is computed by summing the occluders' percentage of opaqueness instead of their number
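The two averages can be written down directly. The following is a CPU-side sketch on plain lists, not the GPU code; the function names and the sample data in the usage note are assumptions. Fractional depth complexities stand for semi-opaque occluders.

```python
def sat(x):
    """Clamp to [0, 1] (the Sat function)."""
    return max(0.0, min(1.0, x))


def v_coef(depth_complexities):
    """V_coef = 1 - (1/N) * sum_i Sat(D_i)."""
    n = len(depth_complexities)
    return 1.0 - sum(sat(d) for d in depth_complexities) / n


def v_coef_rgb(depth_complexities, luminances):
    """Textured light: average of L_i - L_i * Sat(D_i), per colour channel."""
    n = len(depth_complexities)
    return tuple(
        sum(lum[c] * (1.0 - sat(d)) for d, lum in zip(depth_complexities, luminances)) / n
        for c in range(3)
    )
```

For instance, `v_coef([0, 1, 2, 0.5])` treats the first sample as fully visible, the next two as fully occluded and the last as half occluded by a semi-opaque occluder.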
Sampling of the direct illumination
Solve the rendering equation for each sample on the light sources
Solve the visibility query from the depth complexity
$$V(x' \leftrightarrow x) = 1 - H(D(x', x))$$
H is the Heaviside function
$$H(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$$
For semi-opaque occluders, use the Sat function instead
Spatially varying luminance is naturally handled (textured lights)
$$L(x' \to x'') = E_A + \sum_{x \in lightSurface} L(x \to x')\, f_s(x \to x' \to x'')\, V(x' \leftrightarrow x)\, \frac{\cos\theta_o \cos\theta_i'}{\|x - x'\|^2}$$
GPU implementation: sampling distribution
Pre-compute the sampling distribution in a texture
Pack several 2D sample positions per texel (saves memory and texture fetches)
Sample positions are computed in normalized texture space → a precision of 1 byte per coordinate is sufficient
8 samples per texture fetch (texture format: RGBA32F)
GPU implementation: shadow-volumes framework
Perform the silhouette detection on the GPU (Geometry Program, GP)
Write the silhouette edges in a Transform Feedback Buffer (TFB)
Writing to the TFB is asynchronous → scissor rectangle definition, depth bounds evaluation [Lengyel 05]... can be performed in parallel
Perform the shadow-volume extrusion with a GP
Perform the shadow-volume capping in a GP (additional geometric pass)
Stencil test in a single pass using a simple fragment program
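# face.x = fragment is front-facing ? 1 : -1
ATTRIB face = fragment.facing;
ATTRIB fPos = fragment.position;
TEMP r0;
# texture[0] == zBuffer
TEX r0.x, fPos, texture[0], RECT;
# r0.x = (shadow-volume fragment depth >= scene depth) ? 1 : 0
SGE r0.x, fPos.z, r0.x;
# write -1 for front faces and +1 for back faces that fail the depth test
MUL result.color.w, r0.x, -face.x;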
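The fragment program can be mirrored on the CPU to see what it accumulates. This sketch assumes that the outputs of all shadow-volume fragments covering a pixel are summed (for example by additive blending); that accumulation step is an assumption, as are the function names and the depth values in the usage note.

```python
def stencil_contribution(frag_depth, z_buffer_depth, front_facing):
    """Mirror of the fragment program: SGE against the z-buffer, then MUL by -face."""
    face = 1.0 if front_facing else -1.0
    depth_fail = 1.0 if frag_depth >= z_buffer_depth else 0.0   # SGE
    return depth_fail * -face                                    # MUL


def stencil_value(volume_fragments, z_buffer_depth):
    """Sum the contributions of all shadow-volume fragments covering one pixel.

    volume_fragments is a list of (depth, front_facing) pairs.
    """
    return sum(stencil_contribution(d, z_buffer_depth, f) for d, f in volume_fragments)
```

With a visible surface at depth 0.5, a volume whose front face is at 0.2 and back face at 0.8 encloses the surface and yields 1, while a volume lying entirely behind the surface (faces at 0.6 and 0.8) yields 0.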
GPU implementation: wedges rendering
Robustly extruded to infinity with a GP
Performance is influenced by the amount of generated data
One triangle strip per half-wedge → 12 vertices per half-wedge
Use common fill-rate optimizations and fragment rejection [Lengyel 05]
GPU implementation: counter packing
No integer blending
Counter packing = base decomposition
Avoid a naive GPU implementation of the counter packing
Each base factor encodes a counter with a precision up to base - 1
float32 is supported for both buffer format and blending
With this precision one can count up to 2^24 - 1 without missing an integer
Use the following base decompositions of 2^24 - 1 to pack 3, 4, 6 or 8 counters in one value
$$\begin{aligned}
2^{24} - 1 &= 255 \cdot (256^0 + 256^1 + 256^2) \\
&= 63 \cdot (64^0 + 64^1 + 64^2 + 64^3) \\
&= 15 \cdot (16^0 + 16^1 + 16^2 + 16^3 + 16^4 + 16^5) \\
&= 7 \cdot (8^0 + 8^1 + 8^2 + 8^3 + 8^4 + 8^5 + 8^6 + 8^7)
\end{aligned}$$
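The decompositions can be verified numerically. The sketch below packs counters as digits in a given base and round-trips the result through an IEEE float32 to confirm that nothing is lost; `pack`, `unpack` and `as_float32` are illustrative helper names, not part of the original implementation.

```python
import struct


def pack(counters, base):
    """Encode counters as digits of a single number: sum_i c_i * base**i."""
    assert all(0 <= c < base for c in counters)
    return sum(c * base**i for i, c in enumerate(counters))


def unpack(value, base, count):
    """Recover the counters (digits) from the packed value."""
    return [(int(value) // base**i) % base for i in range(count)]


def as_float32(value):
    """Round-trip through an IEEE float32, as a float render target would store it."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


layouts = {256: 3, 64: 4, 16: 6, 8: 8}   # base -> counters per float32
for base, count in layouts.items():
    full = pack([base - 1] * count, base)
    assert full == 2**24 - 1                 # every layout fills exactly 24 bits
    assert as_float32(full) == full          # still exact in float32
    assert unpack(as_float32(full), base, count) == [base - 1] * count
```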
GPU implementation: counter packing initialization
Base 64 representation
B64 representation = 16 counters per pixel (RGBA32F)
Use 4 Multiple Render Targets (MRT) to increase the number of counters up to 64
ATTRIB fPos = fragment.position;
TEMP r0;
# texture[0].w == stencil value
# 266305 == 64^0 + 64^1 + 64^2 + 64^3
TEX r0.w, fPos, texture[0], RECT;
MUL r0.w, r0.w, 266305;
MOV result.color[0], r0.w;
MOV result.color[1], r0.w;
MOV result.color[2], r0.w;
MOV result.color[3], r0.w;
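The multiplication by 266305 sets every packed counter of a channel to the stencil value in one operation. A small sketch of that arithmetic on plain integers follows; the names `BROADCAST`, `init_channel` and `digits` are assumptions made for illustration.

```python
BASE = 64
COUNTERS_PER_CHANNEL = 4
# 64^0 + 64^1 + 64^2 + 64^3: a value whose four base-64 digits are all 1.
BROADCAST = sum(BASE**i for i in range(COUNTERS_PER_CHANNEL))


def init_channel(stencil):
    """One multiplication sets all four packed counters to the stencil value."""
    return stencil * BROADCAST


def digits(value):
    """Read back the four base-64 counters of a channel."""
    return [(value // BASE**i) % BASE for i in range(COUNTERS_PER_CHANNEL)]
```

This works as long as the stencil value does not exceed 63, the largest value a base-64 counter can hold.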
GPU implementation: depth complexity update
Naive approach
Check each light sample
Discrete approach (cf. figure)
Samples lying in the sector defined by the origin and the vector vx (x in {0, 1}) are encoded in a bit field and stored in a cube map
A 2D texture indexed by the orthogonal projection of the light center onto ecp stores the bit field of the samples covered by this line
Final bit mask of the covered samples = a simple logical combination
GPU implementation: direct illumination
Either numerically solve the rendering equation
Merge direct lighting with the visibility queries
Or compute a v_coef to modulate the direct lighting
Requires an additional pass
Add the contribution of the light to the color buffer
The interleaved sampling strategy requires an additional filtering step combining the interleaved sampling pattern [Segovia et al. 06]
Results
A complete image rendering
Indirect lighting is approximated by an irradiance-map [Ramamoorthi et al. 01]
Stress the algorithm
No light attenuation & no subjective data compression
Implementation
Per-pixel lighting (Blinn BRDF [Blinn 77])
OpenGL API + the NV_gpu_program4 extension for the shaders
Workstation
Core 2 Duo E6700, 4 GB of DDR2 800 MHz, GeForce 8800 GTX
Videos
Doom3: 17,693 visible polygons
Japan: 501,650 visible polygons
Kitchen: 146,131 visible polygons
Discussion
Not the exact solution
But an efficient algorithmic tool
Does not handle indirect illumination
Correct evaluation of the direct illumination for fully dynamic scenes
Unified object-based framework
Use the same framework and adjust the quality-performance ratio
Thank you
More details in the paper: “Accurate Shadows by Depth Complexity Sampling”
Eurographics 14th - 18th April 2008
Soon available at http://www.irit.fr/~Vincent.Forest
References
Assarsson U., Akenine-Möller T.: A geometry-based soft shadow volume algorithm using graphics hardware. ACM Transactions on Graphics, Proc. SIGGRAPH (2003), 511--520
Akenine-Möller T., Assarsson U.: Approximate soft shadows on arbitrary surfaces using penumbra wedges. In Proc. EG Workshop on Rendering Techniques (2002), 297--306
Annen T., Mertens T., Bekaert P., Seidel H.-P., Kautz J.: Convolution shadow maps. In Proc. EG Symposium on Rendering (2007), 51--60
Blinn J.F.: Models of light reflection for computer synthesized pictures. In Proc. SIGGRAPH (1977), ACM Press, 192--198
Crow F.C.: Shadow algorithms for computer graphics. In Proc. SIGGRAPH (1977), ACM Press, 242--248
Donnelly W., Lauritzen A.: Variance shadow maps. In Proc. Symposium on Interactive 3D Graphics and Games (2006), 161--165
Forest V., Barthe L., Paulin M.: Realistic soft shadows by penumbra-wedge blending. In Proc. SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware (2006), 39--48
Fernando R.: Percentage-closer soft shadows. In SIGGRAPH Sketches (2005), ACM Press, 39
Guennebaud G., Barthe L., Paulin M.: Real-time soft shadow mapping by backprojection. In Proc. EG Symposium on Rendering (2006), 227--234
Guennebaud G., Barthe L., Paulin M.: High-quality adaptive soft shadow mapping. vol. 26, Eurographics (2007), 525--533
Kajiya J.T.: The rendering equation. In Proc. SIGGRAPH (1986), ACM Press, 143--150
Keller A., Heidrich W.: Interleaved sampling. In Proc. Eurographics Workshop on Rendering Techniques (2001), 269--276
Laine S., Aila T., Assarsson U., Lehtinen J., Akenine-Möller T.: Soft shadow volumes for ray tracing. ACM Transactions on Graphics, Proc. SIGGRAPH (2005), 1156--1165
Lengyel E.: Advanced stencil shadow and penumbra wedge rendering. Game Developer Conference (2005)
Lehtinen J., Laine S., Aila T.: An improved physically-based soft shadow volume algorithm. Computer Graphics Forum, Proc. Eurographics (2006), 303--312
Pharr M., Humphreys G.: Physically based rendering: From theory to practice. Morgan Kaufmann (2004)
Ramamoorthi R., Hanrahan P.: An efficient representation for irradiance environment maps. In Proc. SIGGRAPH (2001), ACM Press, 497--500
Reeves W.T., Salesin D.H., Cook R.L.: Rendering antialiased shadows with depth maps. In Proc. SIGGRAPH (1987), 283--291
Segovia B., Iehl J.-C., Mitanchey R., Péroche B.: Non-interleaved deferred shading of interleaved sample patterns. In Proc. SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware (2006)
Schwarz M., Stamminger M.: Bitmask soft shadows. Eurographics (2007), 515--524
Hard shadows
Solve the visibility query from the light center
i.e. the light sources have no area (spot/point lights)
Fragments are either lit or shadowed ↔ binary test
Shadow-maps [Williams 78]: image based
1) Discretize the depth of the scene as seen from the light center
2) Check whether the fragment is visible from the light
Shadow-volumes [Crow 77]: object based
1) Find the silhouette edges of the objects as seen from the light center
2) Extrude the silhouette edges ↔ build the shadow volumes
3) Determine which fragments lie inside a shadow volume (stencil test)
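The binary test of the shadow-map approach can be sketched in a few lines. This is an illustrative CPU version on a tiny 2x2 depth map, not a GPU implementation; the `bias` parameter is an assumption added to avoid self-shadowing from limited precision and is not mentioned on the slide.

```python
def is_lit(shadow_map, u, v, depth_from_light, bias=1e-3):
    """Step 2: the fragment is lit if nothing closer to the light was recorded."""
    return depth_from_light <= shadow_map[v][u] + bias


# Step 1 (assumed already done): depths of the closest surfaces seen from the light.
shadow_map = [
    [1.0, 1.0],
    [0.4, 1.0],   # an occluder at depth 0.4 covers texel (u=0, v=1)
]

occluder_lit = is_lit(shadow_map, 0, 1, 0.4)      # the occluder itself
receiver_lit = is_lit(shadow_map, 0, 1, 0.9)      # a surface behind the occluder
free_lit = is_lit(shadow_map, 1, 0, 0.9)          # a surface with nothing in front
```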
Soft Shadow Mapping
The shadow map is considered as a discrete representation of the scene
Algorithm: for each fragment
1) Back-project and clip the shadow-map samples onto the light source
2) Subtract the area occluded by each sample
Define the back-projected samples [Guennebaud et al. 06] [Schwarz and Stamminger 07]
Light leaking [Guennebaud et al. 06]
Back-projected samples overlapping [Guennebaud et al. 07] [Schwarz and Stamminger 07]
[Figure: a shadow-map sample back-projected from the fragment p onto the light source, with its clipped occluded area]
Counter packing
Don't allow negative values
Not a problem since the depth complexity is always greater than or equal to zero
Render the outer half-wedges before the inner half-wedges (avoids temporary negative values)
Store only integer values
Discretize and map the opaqueness domain into integers
Precision vs. number of counters
Maximum depth complexity = maximum number of occluding silhouette loops in the frame ↔ result of the shadow-volumes
Adjust the precision/number of the counters according to the maximum value resulting from the shadow-volume pass
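One way to make this trade-off concrete is to pick, among the layouts that fit in a float32 (bases 8, 16, 64 and 256 for 8, 6, 4 and 3 counters), the densest one whose counters can hold the maximum depth complexity. The selection rule and the names `LAYOUTS` and `choose_layout` are assumptions for illustration, not the method of the paper.

```python
LAYOUTS = [(8, 8), (16, 6), (64, 4), (256, 3)]   # (base, counters per float32)


def choose_layout(max_depth_complexity):
    """Pick the densest layout whose counters can hold the maximum depth complexity."""
    for base, count in LAYOUTS:
        if max_depth_complexity <= base - 1:
            return base, count
    raise ValueError("depth complexity too large for the listed layouts")
```

For example, a frame whose shadow-volume pass reports at most 5 occluding loops can use 8 counters per value, whereas a maximum of 20 requires the base-64 layout with 4 counters per value.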
Memory cost
Depends only on the algorithm parametrization and the light types in the scene
Resolution: 1024x1024

Buffer                 | Format                         | Memory cost
Silhouette TFB         | 1,310,720 x 32F                | 5 MB
Samples position       | 64 samples packed in RGBA32F   | 128 B
Edge LUT               | (512² + 6*64²)*RGBA32F         | 4.375 MB
Per light LUT          | 64x64 RGB8                     | 12 KB
Color+Stencil buffer   | RGBA16F                        | 8 MB
Z buffer               | DEPTH_COMPONENT24              | 3 MB
V buffer               | RGB8                           | 3 MB
Discontinuity buffer   | ALPHA8                         | 1 MB
Temp filtered buffer   | RGBA8                          | 4 MB
Deferred buffer        | RGB32F                         | 12 MB
DC buffer              | RGBA32F                        | 16 MB
Performance (time in milliseconds)