Vincent Forest – VORTEX group

Accurate Shadows by Depth Complexity Sampling



Problem statement in real-time rendering: solve the rendering equation [Kajiya 86]

$$L(x' \to x'') = L_e(x' \to x'') + \int_M L(x \to x')\, f_s(x \to x' \to x'')\, V(x \leftrightarrow x')\, \frac{\cos\theta_o \cos\theta_{i'}}{\lVert x - x' \rVert^2}\, dA(x)$$

Common simplifications for real-time rendering

- Consider only the center of the lights
- Approximate the energy from the other surfaces by a coarse ambient term A

$$L(x' \to x'') = E + A + \sum_{x \in pointLight} L(x \to x')\, f_s(x \to x' \to x'')\, V(x \leftrightarrow x')\, \frac{\cos\theta_o \cos\theta_{i'}}{\lVert x - x' \rVert^2}$$

What are shadows?

- Shadows are visibility queries (the term V)

Research question

Is it possible to generate accurate shadows in real time?

Hard shadows

Shadow-maps (S.T.A.L.K.E.R. [GSC Game World 07], Crysis [Crytek 07])
✔ “Independent” of the geometric complexity
✔ Work with all rasterizable primitives
✗ Several renderings for omni-directional lights
✗ Aliasing artifacts

Shadow-volumes (Doom3 [id-software 04], F.E.A.R. [Monolith 05])
✔ Accurate hard shadows ↔ ray-traced hard shadows
✗ Dependent on the geometric complexity
✗ Primitive constraints
✗ Fill-rate bottleneck

From hard shadows to soft shadows

In reality, light sources have an area!

- Fragments can be lit, shadowed or in the penumbra
- How large is the visible part of the light for each fragment?

Simplifications for real-time rendering

Modulate the fragment color by a visibility coefficient in [0, 1]

- v_coef ↔ percentage of visible light area

$$V_{coef}(x', l) = \frac{1}{N} \sum_{x \in l} V(x \leftrightarrow x')$$

$$L(x' \to x'') = E + A + \sum_{x \in lightCenter} L(x \to x')\, f_s(x \to x' \to x'')\, V_{coef}(x', l)\, \frac{\cos\theta_o \cos\theta_{i'}}{\lVert x - x' \rVert^2}$$

Soft shadows

Visually plausible soft shadows (v_coef = filtered hard shadows)

- Percentage Closer Filtering (PCF) [Reeves 87] [Fernando 05]
- Linearize the depth data to allow the pre-filtering of the shadow-maps
  - Variance Shadow-Map (VSM) [Donnelly and Lauritzen 06]
  - Convolution Shadow-Map (CSM) [Annen et al. 07]

Physically plausible soft shadows (v_coef = approximation of the occluded area)

- Soft Shadow Mapping (SSM) [Guennebaud et al. 06]
- Penumbra-wedge [Assarsson and Akenine-Möller 02 03]

Penumbra-wedge

A penumbra-wedge conservatively bounds the penumbra region defined by a silhouette edge.

Algorithm
1) Initialize the per-fragment v_coef with the result of the shadow-volumes
2) Extrude a penumbra-wedge from each silhouette edge
3) For each fragment in the penumbra-wedge, update its v_coef according to the light area occluded by the corresponding silhouette edge

[Figure: a silhouette edge with its inner and outer half-wedges, a rasterized fragment and a fragment in the inner penumbra]

Penumbra-wedge

v_coef update

- Project the silhouette edge onto the light source as seen by the fragment
- Add/Subtract the corresponding covered percentage

[Figure: an occluder with an outer edge and an inner edge projected onto the light source. Starting from v_coef == 1, one edge covers 35% of the light (v_coef -= 0.35) and the other 10% (v_coef += 0.10), giving v_coef == 0.75]


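Silhouette overlapping [Forest et al. 06]

[Figure: two outer edges projected onto the light source. Starting from v_coef == 1, each edge covers 20% of the light (v_coef -= 0.20 twice), giving v_coef == 0.60. The 10% region where the two projections overlap is subtracted twice, so the visibility is under-estimated]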
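The under-estimation can be checked with a small worked example. The sketch below is only an illustration under assumed conditions: the light is discretized into 10 equal cells, each edge covers 2 cells and they share 1 cell (20%, 20% and a 10% overlap). The function names are hypothetical.

```python
from fractions import Fraction

def additive_v_coef(covered_fractions):
    """Penumbra-wedge style update: subtract each covered fraction independently."""
    v = Fraction(1)
    for covered in covered_fractions:
        v -= covered
    return v

def exact_v_coef(n_cells, covered_cell_sets):
    """Reference value: count every occluded cell of the light only once."""
    occluded = set().union(*covered_cell_sets)
    return Fraction(n_cells - len(occluded), n_cells)

# Two silhouette edges, each covering 20% of the light, overlapping on 10%.
edge_a = {0, 1}
edge_b = {1, 2}
additive = additive_v_coef([Fraction(2, 10), Fraction(2, 10)])
exact = exact_v_coef(10, [edge_a, edge_b])
print(float(additive), float(exact))  # 0.6 0.7
```

The additive update yields 0.60 whereas only 30% of the light is actually covered, which is the gap that per-sample counting is meant to close.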

Soft shadow volumes for ray-tracing [Laine et al. 05]

Accurate off-line shadows using penumbra-wedges



Soft shadow volumes for ray-tracing

Local reconstruction of the visibility function

- Define the number of occluders between a light sample and a fragment (i.e. the depth complexity)

Algorithm

- Build a data structure storing the occluding silhouette edges [Laine et al. 05] [Lehtinen et al. 06]
- Retrieve the silhouette edges affecting the fragment visibility
- Update the depth complexities of the light samples according to the silhouette edges (integration rules)
- Determine whether the light samples with the lowest depth complexity light the fragment: shoot a shadow ray

[Figure: a fragment, the light samples, a silhouette edge with its penumbra-wedges, the occluded area and the shadow ray]

Accurate shadows by DCS

Accurate Shadows by Depth Complexity Sampling




Accurate shadows by DCS

Accuracy of the off-line soft shadow volumes + advantages of the real-time penumbra-wedge

- No data structure storing silhouette edges
  - Stream the silhouette edges → saves memory and computation time
- Avoid shooting a shadow ray
  - No ray-tracer required → better suited for fully dynamic scenes
- Planar and omni-directional area light sources
- Animated textured lights & semi-opaque occluders
- Allows real-time performance!

Accurate shadows by DCS

Overview

- Store for each fragment the depth complexity for a set of light samples

Challenges

- Efficiently compute the depth complexity: initialization + update of the per-fragment depth complexity counters
- Store a sufficient number of per-fragment light samples
- Define a sampling strategy for the distribution of the light samples

Depth complexity computation

Initialize the depth complexity (avoid shooting a ray)

- Use the result of the shadow-volume pass ↔ penumbra-wedge initialization
- Shadow-volume result = per-fragment number of occluding silhouette loops

Update the depth complexity

- Use the penumbra-wedge rules ↔ split the wedges into an inner and an outer part

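[Figure: per-sample depth complexity counters on the light source, all initialized to 0 behind the occluder. The samples covered by an outer edge receive +1 and the samples covered by an inner edge receive -1]

Silhouette overlapping is handled naturally: overlapping silhouettes simply increment the depth complexity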
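A minimal sketch of the per-sample counter update follows. It assumes that each half-wedge provides a boolean mask of the light samples it covers, and that outer edges add 1 while inner edges subtract 1, as the figure suggests. The function names and the mask representation are hypothetical, and the integration rules of the actual method may be more involved.

```python
def init_counters(shadow_volume_count, n_samples):
    """Start every light sample at the shadow-volume result of the fragment."""
    return [shadow_volume_count] * n_samples

def apply_half_wedge(counters, covered_mask, is_outer):
    """Add +1 (outer edge) or -1 (inner edge) to the samples covered by the edge."""
    delta = 1 if is_outer else -1
    return [c + delta if covered else c
            for c, covered in zip(counters, covered_mask)]

# Four light samples, fragment outside every shadow volume (count 0).
counters = init_counters(0, 4)
# Outer half-wedges are applied first so that no counter becomes negative.
counters = apply_half_wedge(counters, [False, True, True, True], is_outer=True)
counters = apply_half_wedge(counters, [False, False, False, True], is_outer=False)
print(counters)  # [0, 1, 1, 0]
```

Two overlapping outer edges would simply raise the shared samples to 2, so no area is subtracted twice once the counters are saturated.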



Per-fragment depth complexity counters

A standard buffer stores at most 4 values (RGBA)

- Only 4 counters!?

Counter packing representation

- Obvious solution: pack several counters in a single value
- Vectorized update

Example: two 4-bit counters packed in one 8-bit color channel (counter0 in the high 4 bits, counter1 in the low 4 bits)

Operation    Bits         counter0   counter1
V  = 0x00    0000 0000    0          0
V += 0x11    0001 0001    1          1
V += 0x46    0101 0111    5          7
V -= 0x21    0011 0110    3          6
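The example above can be replayed in a few lines. This is only a sketch of the bit-level idea with integer arithmetic; the helper name `unpack` is an assumption made for the illustration.

```python
def unpack(v):
    """Split an 8-bit value into (counter0, counter1): high and low 4 bits."""
    return (v >> 4) & 0xF, v & 0xF

v = 0x00
steps = [unpack(v)]
v += 0x11          # +1 on both counters with a single addition
steps.append(unpack(v))
v += 0x46          # +4 on counter0, +6 on counter1
steps.append(unpack(v))
v -= 0x21          # -2 on counter0, -1 on counter1
steps.append(unpack(v))
print(steps)  # [(0, 0), (1, 1), (5, 7), (3, 6)]
```

A single addition updates both counters at once as long as neither of them overflows its 4 bits or goes below zero.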

Light samples distribution [Pharr and Humphreys 04]

- Decorrelate the sampling pattern ↔ rotate the initial sampling pattern
  - Avoids a visible sampling pattern
- Use a stratified sampling strategy
- Interleaved sampling [Keller et al. 01] [Segovia et al. 06]
  - See also GDC: Efficient interleaved sampling with modern graphics hardware and applications

[Figure: light samples distribution. Panels: 64 correlated samples; 64 decorrelated samples; 16 decorrelated stratified samples; 64 decorrelated stratified samples; 16 samples + 4x4 interleaved sampling pattern]
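As a rough illustration of the two ingredients named above, the sketch below draws one jittered sample per cell of a regular grid (stratification) and rotates the pattern about the light center (decorrelation). The unit-square light, the rotation about (0.5, 0.5) and the function names are assumptions made for the example; the slides do not say how the rotation angle is chosen per fragment.

```python
import math
import random

def stratified_samples(n_per_side, rng):
    """One uniformly jittered sample in each cell of an n x n grid over [0, 1)^2."""
    cell = 1.0 / n_per_side
    return [((i + rng.random()) * cell, (j + rng.random()) * cell)
            for j in range(n_per_side) for i in range(n_per_side)]

def rotate(samples, angle):
    """Rotate the pattern about the light center (0.5, 0.5)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(0.5 + c * (x - 0.5) - s * (y - 0.5),
             0.5 + s * (x - 0.5) + c * (y - 0.5)) for x, y in samples]

rng = random.Random(0)                  # fixed seed for a reproducible pattern
base = stratified_samples(4, rng)       # 16 stratified samples
rotated = rotate(base, rng.uniform(0.0, 2.0 * math.pi))
print(len(base), len(rotated))  # 16 16
```

A rotated square pattern can leave the unit square near its corners, so a real implementation would have to account for the shape of the light.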



From depth complexity to visibility coefficient

Average the saturated depth complexity D between the fragment x' and the N samples s_{i,l} on the light l

$$V_{coef}(x', l) = 1 - \frac{1}{N} \sum_{i=0}^{N-1} Sat\big(D(x', s_{i,l})\big)$$

Introduce an explicit sample luminance L_{i,l} ↔ textured lights

$$V^{rgb}_{coef}(x', l) = \frac{1}{N} \sum_{i=0}^{N-1} \Big( L^{rgb}_{i,l} - L^{rgb}_{i,l}\, Sat\big(D(x', s_{i,l})\big) \Big)$$

Simulate semi-opaque occluders

- D is computed by summing the occluders' percentage of opaqueness instead of their number
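The two sums above can be written directly. The sketch below assumes depth complexities given as a plain list of numbers and sample luminances as (r, g, b) tuples; the function names are hypothetical.

```python
def sat(x):
    """Clamp to [0, 1]."""
    return min(max(x, 0.0), 1.0)

def v_coef(depth_complexities):
    """V_coef = 1 - (1/N) * sum of Sat(D) over the light samples."""
    n = len(depth_complexities)
    return 1.0 - sum(sat(d) for d in depth_complexities) / n

def v_coef_rgb(luminances, depth_complexities):
    """Textured light: average of L - L * Sat(D), per color channel."""
    n = len(depth_complexities)
    return tuple(
        sum(lum[ch] - lum[ch] * sat(d)
            for lum, d in zip(luminances, depth_complexities)) / n
        for ch in range(3))

print(v_coef([0, 0, 1, 2]))         # 0.5
# Semi-opaque occluders: D sums opaqueness percentages instead of counts.
print(v_coef([0, 0.25, 1.5, 0]))    # 0.6875
```

With a uniformly white light, `v_coef_rgb` returns the same value as `v_coef` in each channel.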



Sampling of the direct illumination

Solve the rendering equation for each sample on the light sources

- Solve the visibility query from the depth complexity: a sample is visible only when its depth complexity is zero

$$V(x' \leftrightarrow x) = 1 - H\big(D(x', x)\big), \qquad H(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$$

- H is the Heaviside function
- For semi-opaque occluders, use the Sat function instead
- Spatially varying luminance (textured lights) is handled naturally

$$L(x' \to x'') = E + A + \sum_{x \in lightSurface} L(x \to x')\, f_s(x \to x' \to x'')\, V(x' \leftrightarrow x)\, \frac{\cos\theta_o \cos\theta_{i'}}{\lVert x - x' \rVert^2}$$

GPU implementation: sampling distribution

- Pre-compute the sampling distribution in a texture
- Pack several 2D sample positions per texel (saves memory and texture fetches)
  - Sample positions are computed in normalized texture space → a precision of 1 byte per coordinate is sufficient
  - 8 samples per texture fetch (texture format: RGBA32F)
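The arithmetic behind "8 samples per texture fetch" is 8 samples x 2 coordinates x 1 byte = 16 bytes, the size of one RGBA32F texel. The sketch below only illustrates this quantization and byte budget. How the bytes are laid out inside the four float channels on the GPU is not stated here, so the flat byte layout and the function names are assumptions.

```python
def pack_samples(samples):
    """Quantize eight (u, v) pairs in [0, 1] to one byte per coordinate (16 bytes)."""
    if len(samples) != 8:
        raise ValueError("exactly 8 samples fit in one 16-byte texel")
    data = bytearray()
    for u, v in samples:
        data.append(round(u * 255))
        data.append(round(v * 255))
    return bytes(data)

def unpack_samples(texel):
    """Recover the eight quantized (u, v) pairs from the 16 bytes."""
    return [(texel[i] / 255, texel[i + 1] / 255) for i in range(0, 16, 2)]

samples = [(i / 8, 1.0 - i / 8) for i in range(8)]
texel = pack_samples(samples)
restored = unpack_samples(texel)
print(len(texel))  # 16
```

The quantization error is at most 1/510 per coordinate, which is small compared with the spacing between samples.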



GPU implementation: shadow-volumes framework

Perform the silhouette detection on the GPU (Geometry Program)

- Write the silhouette edges in a Transform Feedback Buffer (TFB)
- Writing to the TFB is asynchronous → scissor rectangle definition, depth bounds evaluation [Lengyel 05]... can be performed in parallel

Perform the shadow-volume extrusion with a GP

- Perform the shadow-volume capping in a GP (additional geometric pass)

Stencil test in a single pass using a simple fragment program

    # face.x = fragment is front-facing ? 1 : -1
    ATTRIB face = fragment.facing;
    ATTRIB fPos = fragment.position;
    TEMP r0;
    # texture[0] == zBuffer
    TEX r0.x, fPos, texture[0], RECT;
    SGE r0.x, fPos.z, r0.x;
    MUL result.color.w, r0.x, -face.x;
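The arithmetic of this fragment program can be emulated on the CPU. The sketch below assumes that the per-fragment outputs are accumulated with additive blending and that depth grows away from the viewer; both points, as well as the function names, are assumptions made for the illustration.

```python
def stencil_contribution(frag_z, scene_z, front_facing):
    """Mirror of the shader: SGE gives 1 when the shadow-volume fragment is at or
    behind the visible surface, then the result is multiplied by -face."""
    face = 1 if front_facing else -1
    behind = 1 if frag_z >= scene_z else 0
    return behind * -face

def shadow_count(scene_z, volume_faces):
    """Sum the contributions of all shadow-volume faces covering the pixel."""
    return sum(stencil_contribution(z, scene_z, front)
               for z, front in volume_faces)

# One shadow volume: front face at depth 0.3, back face at depth 0.8.
volume = [(0.3, True), (0.8, False)]
print(shadow_count(0.5, volume))  # 1: surface inside the volume, shadowed
print(shadow_count(0.9, volume))  # 0: surface behind the volume
print(shadow_count(0.2, volume))  # 0: surface in front of the volume
```

Only the faces lying behind the visible surface contribute, so front and back faces can be rendered in the same pass.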

 

GPU implementation: wedges rendering

- Robustly extruded to infinity with a GP
- Performance is influenced by the amount of generated data
  - One triangle strip per half-wedge → 12 vertices per half-wedge
- Use common fill-rate optimizations and fragment rejection [Lengyel 05]



GPU implementation: counter packing

No integer blending

- Avoid a naive GPU implementation of the counter packing

Counter packing = base decomposition

- Each base factor encodes a counter with a precision of up to base - 1

float32 is supported for both buffer format and blending

- With this precision one can count up to 2^24 - 1 without missing an integer
- Use the following base decompositions of 2^24 - 1 to pack 3, 4, 6 or 8 counters in one value

$$\begin{aligned}
2^{24} - 1 &= 255\,(256^0 + 256^1 + 256^2) \\
&= 63\,(64^0 + 64^1 + 64^2 + 64^3) \\
&= 15\,(16^0 + 16^1 + 16^2 + 16^3 + 16^4 + 16^5) \\
&= 7\,(8^0 + 8^1 + 8^2 + 8^3 + 8^4 + 8^5 + 8^6 + 8^7)
\end{aligned}$$



GPU implementation: counter packing initialization

Base 64 representation

- B64 representation = 16 counters per pixel (RGBA32F)
- Use 4 Multiple Render Targets (MRT) to increase the number of counters up to 64

    ATTRIB fPos = fragment.position;
    TEMP r0;
    # texture[0].w == stencil value
    # 266305 == 64^0 + 64^1 + 64^2 + 64^3
    TEX r0.w, fPos, texture[0], RECT;
    MUL r0.w, r0.w, 266305;
    MOV result.color[0], r0.w;
    MOV result.color[1], r0.w;
    MOV result.color[2], r0.w;
    MOV result.color[3], r0.w;
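The base 64 packing can be checked on the CPU with values rounded to float32. The sketch below is an illustration only: `to_float32` and `unpack_counters` are hypothetical helpers, and the additive update stands in for what blending would do on the GPU.

```python
import struct

BASE = 64
COUNTERS_PER_CHANNEL = 4
# 64^0 + 64^1 + 64^2 + 64^3, the constant used by the initialization program.
INIT_FACTOR = sum(BASE ** i for i in range(COUNTERS_PER_CHANNEL))

def to_float32(x):
    """Round a Python number to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

def unpack_counters(value):
    """Extract the base-64 digits (counters) stored in one channel."""
    v = int(value)
    counters = []
    for _ in range(COUNTERS_PER_CHANNEL):
        counters.append(v % BASE)
        v //= BASE
    return counters

stencil = 3
channel = to_float32(stencil * INIT_FACTOR)   # every counter starts at 3
channel = to_float32(channel + BASE ** 2)     # additive update: +1 on counter 2
print(INIT_FACTOR, unpack_counters(channel))  # 266305 [3, 3, 4, 3]
```

Because 63 * 266305 equals 2^24 - 1, every reachable packed value stays within the range where float32 represents integers exactly.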



GPU implementation: depth complexity update

Naive approach

- Check each light sample

Discrete approach (cf. figure)

- Samples lying in the sector defined by the origin and the vector vx (x in {0, 1}) are encoded in a bit field and stored in a cube map
- A 2D texture indexed by the orthogonal projection of the light center onto ecp stores the bit field of the samples covered by this line
- Final bit mask of the covered samples = a simple logical combination



GPU implementation: direct illumination

Either numerically solve the rendering equation

- Merge direct lighting with the visibility queries

Or compute a v_coef to modulate the direct lighting

- Requires an additional pass



Add the contribution of the light to the color buffer

The interleaved sampling strategy requires an additional filtering step that combines the interleaved sampling patterns [Segovia et al. 06]

GPU implementation

Results

A complete image rendering

- Indirect lighting is approximated by an irradiance-map [Ramamoorthi et al. 01]

Stress the algorithm

- No light attenuation & no subjective data compression

Implementation

- Per-pixel lighting (Blinn BRDF [Blinn 77])
- OpenGL API + the NV_gpu_program4 extension for the shaders

Workstation

- Core 2 Duo E6700, 4 GB of DDR2 800 MHz, GeForce 8800 GTX

Videos

- Doom3: 17,693 visible polygons
- Japan: 501,650 visible polygons
- Kitchen: 146,131 visible polygons

Discussion

Not the exact solution

- Does not handle indirect illumination

But an efficient algorithmic tool

- Correct evaluation of the direct illumination for fully dynamic scenes

Unified object-based framework

- Use the same framework and adjust the quality/performance ratio

Thank you



More details in the paper: “Accurate Shadows by Depth Complexity Sampling”

Eurographics 14th - 18th April 2008



Soon available at http://www.irit.fr/~Vincent.Forest

References 















Assarsson U., Akenine-Möller T.: A geometry-based soft shadow volume algorithm using graphics hardware. ACM Transactions on Graphics, Proc. SIGGRAPH (2003), 511-520
Akenine-Möller T., Assarsson U.: Approximate soft shadows on arbitrary surfaces using penumbra wedges. In Proc. EG Workshop on Rendering Techniques (2002), 297-306
Annen T., Mertens T., Bekaert P., Seidel H.-P., Kautz J.: Convolution shadow maps. In Proc. EG Symposium on Rendering (2007), 51-60
Blinn J.F.: Models of light reflection for computer synthesized pictures. In Proc. SIGGRAPH (1977), ACM Press, 192-198
Crow F.C.: Shadow algorithms for computer graphics. In Proc. SIGGRAPH (1977), ACM Press, 242-248
Donnelly W., Lauritzen A.: Variance shadow maps. In Proc. Symposium on Interactive 3D Graphics and Games (2006), 161-165
Forest V., Barthe L., Paulin M.: Realistic soft shadows by penumbra-wedge blending. In Proc. SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware (2006), 39-48
Fernando R.: Percentage-closer soft shadows. In SIGGRAPH Sketches (2005), ACM Press, 39

Guennebaud G., Barthe L., Paulin M.: Real-time soft shadow mapping by backprojection. In Proc. EG Symposium on Rendering (2006), 227-234
Guennebaud G., Barthe L., Paulin M.: High-quality adaptive soft shadow mapping. Vol. 26, Eurographics (2007), 525-533
Kajiya J.T.: The rendering equation. In Proc. SIGGRAPH (1986), ACM Press, 143-150
Keller A., Heidrich W.: Interleaved sampling. In Proc. Eurographics Workshop on Rendering Techniques (2001), 269-276
Laine S., Aila T., Assarsson U., Lehtinen J., Akenine-Möller T.: Soft shadow volumes for ray tracing. ACM Transactions on Graphics, Proc. SIGGRAPH (2005), 1156-1165
Lengyel E.: Advanced stencil shadow and penumbra wedge rendering. Game Developers Conference (2005)
Lehtinen J., Laine S., Aila T.: An improved physically-based soft shadow volume algorithm. Computer Graphics Forum, Proc. Eurographics (2006), 303-312
Pharr M., Humphreys G.: Physically based rendering: From theory to practice. Morgan Kaufmann (2004)

Ramamoorthi R., Hanrahan P.: An efficient representation for irradiance environment maps. In Proc. SIGGRAPH (2001), ACM Press, 497-500
Reeves W.T., Salesin D.H., Cook R.L.: Rendering antialiased shadows with depth maps. In Proc. SIGGRAPH (1987), 283-291
Segovia B., Iehl J.-C., Mitanchey R., Péroche B.: Non-interleaved deferred shading of interleaved sample patterns. In Proc. SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware (2006)
Schwarz M., Stamminger M.: Bitmask soft shadows. Eurographics (2007), 515-524





Hard shadows

Solve the visibility query from the light center

- i.e. the light sources have no area (spot/point lights)
- Fragments are either lit or shadowed ↔ binary test

Shadow-maps [Williams 78]: image based
1) Discretize the depth of the scene as seen from the light center
2) Check whether the fragment is visible from the light

Shadow-volumes [Crow 77]: object based
1) Find the silhouette edges of the objects as seen from the light center
2) Extrude the silhouette edges ↔ build the shadow volumes
3) Determine whether the fragment lies inside a shadow volume (stencil test)



Soft Shadow Mapping

The shadow map is considered as a discrete representation of the scene

Algorithm: for each fragment
1) Back-project and clip the shadow-map samples onto the light source
2) Subtract the area occluded by each sample

- Define the back-projected samples [Guennebaud et al. 06] [Schwarz and Stamminger 07]
- Light leaking [Guennebaud et al. 06]
- Overlapping of the back-projected samples [Guennebaud et al. 07] [Schwarz and Stamminger 07]

[Figure: a shadow-map sample back-projected onto the light source. Labels: light source, shadow-map, sample, clipped occluded area, p]

Counter packing

Don't allow negative values

- Not a problem since the depth complexity is always greater than or equal to zero
- Render the outer half-wedges before the inner half-wedges (avoids temporary negative values)

Store only integer values

- Discretize and map the opaqueness domain into integers

Precision vs. number of counters

- Maximum depth complexity = maximum number of occluding silhouette loops in the frame ↔ result of the shadow-volumes
- Adjust the precision/number of the counters according to the maximum value resulting from the shadow-volume pass



Memory cost

Depends only on the algorithm parametrization and the light types in the scene (resolution: 1024x1024)

Buffer                  Format                           Memory cost
Silhouette TFB          1,310,720 x 32F                  5 MB
Samples position        64 samples packed in RGBA32F     128 B
Edge LUT                (512² + 6*64²) * RGBA32F         4.375 MB
Per light LUT           64x64 RGB8                       12 KB
Color+Stencil buffer    RGBA16F                          8 MB
Z-buffer                DEPTH_COMPONENT24                3 MB
V-buffer                RGB8                             3 MB
Discontinuity buffer    ALPHA8                           1 MB
Temp filtered buffer    RGBA8                            4 MB
Deferred buffer         RGB32F                           12 MB
DC-buffer               RGBA32F                          16 MB

Performance (time in milliseconds)