-
Basic Traversal Algorithm
- Recursive version (conceptual)
- Iterative version (GPU-friendly — no recursion)
- Use an explicit stack (array of node indices)
-
Ordered Traversal (Optimization)
- Visit the closer child first
- If closer child has a hit, the farther child may be skipped entirely
- How to determine closer child
- Compare AABB intersection distances:
t_left vs t_right
- Push farther child first (so closer child is on top of stack)
- Significant speedup for primary rays (coherent rays)
- Less benefit for secondary rays (incoherent)
-
Shadow Ray Optimization
- Shadow rays only need to know IF there’s a hit, not WHICH hit
- Use
gl_RayFlagsTerminateOnFirstHitEXT in Vulkan
- In software BVH: return immediately on first primitive hit
- Avoids traversing the entire BVH for occlusion queries
-
Stack Overflow
- BVH depth can exceed stack size for degenerate trees
- Typical max depth: 30-40 for well-built BVH
- Stack size of 64 is usually sufficient
- Degenerate case: all primitives in a line → depth = N (use SAH to avoid)
-
GPU Traversal Considerations
- Warp divergence
- Different threads in a warp may take different BVH paths
- Reduces GPU utilization (some threads idle while others traverse)
- Hardware RT (Vulkan) handles this internally — major advantage
- Memory access patterns
- BVH nodes should be in a contiguous buffer (good cache behavior)
- Leaf primitives should be sorted to match BVH order
- Persistent threads
- Keep threads alive across multiple rays (avoid kernel launch overhead)
- Used in production GPU path tracers